Reducing Asynchronous Serial Interrupts
Packey Velleca
Serial ports on UNIX machines are typically used for low-rate communications, such as human I/O via an ASCII terminal. In these applications, a person types a few characters per second at the keyboard, and the host returns a few bytes per second to the display. There are usually long idle periods on the serial port, from tens of milliseconds between keystrokes to hours or days between sessions.
There are times, though, when you are faced with a higher-rate application that you might initially want to implement using one or more serial ports on a UNIX box. After all, if you have a server designed for higher I/O rates, why not take advantage of its unused serial ports? One such application is time distribution, where the current time arrives at 19200 baud at regular intervals.
It sounds like an easy problem to solve: just connect your high-data-rate device to an unused serial port on your server, write a small program to handle I/O, and you're done! But there is a catch. On most UNIX machines, serial devices can only read a small number of bytes at a time, sometimes even a single byte. That is, the hardware interrupts the serial device driver every time it clocks in a byte. At 19200 baud, with 8 data bits, 1 start bit, no parity bits, and 1 stop bit, the interrupt rate at the host would be:
19200 bits/sec / 10 bits/byte * 1 interrupt/byte = 1920 interrupt/sec
For higher baud rates, or multiple serial lines, this can be a significant load on the server. Each interrupt must be serviced by the CPU, during which time it is unavailable to run application code. For example, if you are getting 1920 interrupts per second, the CPU is interrupted every 521 microseconds. If servicing a serial interrupt takes 100 microseconds (for example), then 100 out of every 521 microseconds go to handling interrupts, and about 19% of the CPU is consumed by system overhead alone.
Several Approaches
One approach to reducing CPU overhead is to buffer bytes from a serial device before interrupting, and thereby reduce the interrupt rate. For example, some hardware device drivers provide a STREAMS ioctl(2) that lets you set the number of system clock ticks (typically one tick is a 10-ms duration) for which the driver should buffer bytes before sending them up the stream. Setting this value to 0 yields one interrupt per byte, whereas a value of 2 buffers data for 20 milliseconds (2 ticks * 10 ms/tick). This approach is satisfactory for a single serial port on a server, but for multiple ports, or higher data rates, a more effective reduction is required.
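As a rough sketch of how such a driver-level buffering parameter might be set: the SIOC_ITIMER request name, the <sys/z8530.h> header, and the argument convention below follow the SGI IRIX serial driver and are stated here as assumptions; consult serial(7) or the equivalent documentation on your system before relying on them.

/*
 * Sketch: ask the serial driver to buffer received bytes for a number
 * of clock ticks before interrupting and sending them up the stream.
 * The SIOC_ITIMER request, the <sys/z8530.h> header, and the argument
 * convention are IRIX-style assumptions; see serial(7) on your system.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/z8530.h>              /* assumed home of SIOC_ITIMER */

int main(void)
{
    int fd = open("/dev/ttyd3", O_RDWR | O_NOCTTY);
    int ticks = 2;                  /* buffer for 2 ticks (about 20 ms);
                                       0 means one interrupt per byte */

    if (fd < 0) {
        perror("open /dev/ttyd3");
        return 1;
    }
    if (ioctl(fd, SIOC_ITIMER, &ticks) < 0)
        perror("SIOC_ITIMER");

    /* ... read(fd, ...) as usual; data now arrives in larger chunks ... */
    return 0;
}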
A classic method for handling large amounts of I/O is off-loading the processing from the server to a secondary device. Typically, the device is dedicated to that task, and therefore can be much more efficient. One such device is a serial I/O board installed in the system peripheral bus. There are VME boards that offer 32 or more serial ports, and they provide two key features: deterministic response time and data buffering. Deterministic response time, the guaranteed maximum time to process a request and transfer the data from the board to the application program, is required for realtime applications such as time distribution or fast event response in a control system. The other key feature, data buffering, gathers incoming serial data and then usually DMA-transfers it in blocks to main memory, relieving the CPU of performing the transfer. The downside is that these devices are more complicated to set up and can cost several thousand dollars.
Another way to off-load serial processing from the CPU is to use a communications server, or terminal server. These devices can have 32 or more serial ports connected by a specialized processor to a network, such as a TCP/IP/Ethernet LAN. The processor in the terminal server handles the serial data as it comes in on each line, buffers it up, and transfers it to the host via the LAN. The host then receives only one interrupt from its network interface per buffer (packet), which greatly reduces its overhead processing.
Application programs on the host connect to a specific serial port on the terminal server via a TCP socket. Typically, each physical serial port on the terminal server is mapped to a well-known TCP port number, which does not vary. For example, serial port 1 would be reached at TCP port 7001, port 2 at 7002, etc. An application program creates a socket(3N) and connect(3N)s to that TCP port, producing a full-duplex communication link that is effectively bound to that serial port on the terminal server. Data coming into the serial port is physically transferred to the host via the socket a packet at a time, with packets ranging in size (for Ethernet) from 64 bytes to about 1500 bytes. But to the application program, the data arrives as a contiguous stream of bytes.
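A minimal sketch of the host side of such a connection follows; the terminal server hostname ("annex") and the 7001 port number are illustrative assumptions that follow the numbering scheme just described.

/*
 * Sketch: connect to serial port 1 on a terminal server and read the
 * incoming byte stream.  The hostname "annex" and TCP port 7001 are
 * illustrative assumptions based on the numbering scheme above.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    struct hostent    *hp;
    struct sockaddr_in addr;
    char               buf[1500];          /* one Ethernet frame's worth */
    int                fd, n;

    if ((hp = gethostbyname("annex")) == NULL) {
        fprintf(stderr, "unknown host\n");
        return 1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(7001);                     /* serial port 1 */
    memcpy(&addr.sin_addr, hp->h_addr_list[0], hp->h_length);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Each read() may return anywhere from one byte to a full packet;
       either way the application sees one contiguous byte stream. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(STDOUT_FILENO, buf, n);

    close(fd);
    return 0;
}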
Note that since the data was buffered for some time in the terminal server, then packetized and sent, there is now a latency between two contiguous bytes at the output of the terminal server that was not present in the original byte stream at its input. For example, if the data were buffered in the terminal server for 50 milliseconds before the packet was sent, then some byte n+1 in the byte stream will appear at the host 50 milliseconds after byte n. The original byte stream at the input to the terminal server, however, did not contain this delay. Buffering reduces interrupts, but increases latency.
Which solution works best? There is no best solution for all system designs, but if you can characterize data flow and processing requirements, then you can make a better decision in terms of time, cost and risk. The remainder of this article will focus on characterizing the data flow for a terminal server at different baud rates. By understanding your specific data flow requirements, and then consulting the tables provided below, you can determine whether you need to use a terminal server, use a bus-level device, or simply connect directly to your host.
Test Setup
To make measurements of latency and interrupt rate, a test bed was created, which included two SGI Challenge L servers, a Xylogics Remote Annex 2000 communications server (terminal server), an Ethernet analyzer, and assorted network equipment. See Figure 1 for the topology.
Serial ports with no hardware flow control are named /dev/ttydN in SGI IRIX 6.2. The serial port /dev/ttyd3 on one SGI server (S1) was connected via an RS-232 cable to port 4 on the terminal server. The Ethernet port on the terminal server was connected to the other SGI server (S2) on the same subnet. The terminal server's utility program rtelnet was started on (S2) to create a raw socket between the terminal server and (S2), with one endpoint on the terminal server connected to port 4, and the other on (S2) connected to a pseudo-tty, /dev/test (see sidebar). A file of 48,890 bytes was created on (S1) and written to /dev/ttyd3 at a set of different baud rates, with port 4 on the terminal server set to the same baud rate as /dev/ttyd3 each time. The file was transferred to the terminal server, then across the Ethernet to (S2), where it was read from /dev/test and displayed in an xterm. File transfer time for each write was measured with /bin/time; the other data was gathered by the Ethernet analyzer.
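The write side of this test reduces to a few lines of code. The sketch below is an approximation (the actual test simply timed the write with /bin/time, and the file name here is illustrative): it opens the port, sets it to 19200 baud with 8 data bits, no parity, and raw output via the standard termios calls, and streams the file out.

/*
 * Sketch: configure /dev/ttyd3 for 19200 baud, 8N1, raw output and
 * stream a test file out the port.  This approximates the write side
 * of the test; the file name "testfile" is an illustrative assumption.
 */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios tio;
    char           buf[1024];
    int            sfd, ffd, n;

    sfd = open("/dev/ttyd3", O_WRONLY | O_NOCTTY);
    ffd = open("testfile", O_RDONLY);          /* 48,890-byte test file */
    if (sfd < 0 || ffd < 0) {
        perror("open");
        return 1;
    }

    tcgetattr(sfd, &tio);
    cfsetospeed(&tio, B19200);
    cfsetispeed(&tio, B19200);
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB); /* 8 data bits, no parity,
                                                  1 stop bit */
    tio.c_cflag |= CS8 | CLOCAL;
    tio.c_oflag &= ~OPOST;                     /* raw output, no mapping */
    tcsetattr(sfd, TCSANOW, &tio);

    while ((n = read(ffd, buf, sizeof(buf))) > 0)
        write(sfd, buf, n);

    tcdrain(sfd);      /* wait until the last byte has been clocked out */
    return 0;
}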
rtelnet is a utility provided by Xylogics for use with the terminal server. It is executed from the host, where it creates a socket that connects on one end to a specified terminal server port, and on the other end to a pseudo-tty. Application programs can then read and write the pseudo-tty as if it were attached directly to the device (i.e., like a device /dev/ttydN). In an actual system, an application program would be written to handle the creation of the socket, and would use the socket directly rather than via a pseudo-tty. rtelnet was used in this test because it was much simpler and faster than writing the code.
Flow control was provided only by TCP; no hardware handshaking was enabled between the terminal server and (S1). All TCP parameters were left at their defaults on the terminal server, (S1), and (S2). Though not specifically disabled, no XON/XOFF flow control was observed between the terminal server and (S1).
Table 1 summarizes the results. Note that with TCP over Ethernet, there is an interrupt for every data packet (frame) and another for its acknowledgment. For example, at 19200 baud, 128 packets of data were required to transfer the file. Adding 128 acknowledgment packets, plus a few more for setting up the connection, gives a total packet count of roughly 256, which is also the number of interrupts seen by the host.
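As a check on these numbers, and to see where the per-second rates used in the conclusions come from, spread the packet counts over the time the file takes to cross the serial line at 19200 baud:

48890 bytes * 10 bits/byte / 19200 bits/sec = 25.5 sec transfer time
128 data packets / 25.5 sec = 5.02 interrupt/sec (data packets only)
256 total packets / 25.5 sec = 10 interrupt/sec (data plus acknowledgments)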
Conclusions
Using a terminal server to buffer serial data streams, packetize the bytes, and send them to a server over a TCP connection off-loads the server by roughly two orders of magnitude. Each TCP data packet requires an interrupt, as does each TCP acknowledgment. Based on the data above, the interrupt rate at the server at 19200 bps could be reduced from 1920 interrupt/sec to about 10 interrupt/sec (data packets plus acknowledgments). Note that this would also increase worst-case data latency from:
1 / 1920 interrupt/sec = 0.521 ms latency between two bytes
for the original one-interrupt-per-byte scheme to:
1 / 5.02 interrupt/sec = about 200 ms latency between two bytes
for the one-interrupt-per-buffer scheme, where 5.02 interrupt/sec is the data-packet rate calculated above. Longer latency is the trade-off for a lower interrupt rate. On the other hand, compare CPU utilization for the two schemes, assuming that servicing a serial device interrupt takes the same 100 microseconds as servicing a network device interrupt:
100 microsec/interrupt * 1920 interrupt/sec = 19.2% CPU utilization
for the one-interrupt-per-byte scheme
100 microsec/interrupt * 5.02 interrupt/sec = 0.05% CPU utilization
for the one-interrupt-per-buffer scheme.
To decide which architecture is right for you, look at your processing requirements, then refer to Table 2. Note that "low data rate" is defined here as less than about 19,200 bits/sec, and refers to a continuous stream of data.
About the Author
Packey Velleca graduated with a BSEE in 1988, and is currently working on a realtime data processing system for Harris Corp., in Melbourne, FL.