Time Synchronization
Packey Velleca
When computers are networked together to share information, it is important that they have a common time reference. Networked filesystems, centralized compilers, license servers, and realtime control systems all need to be synchronized to some degree; what differentiates a realtime control system from a license server is the degree of accuracy needed. A license server might need to be synchronized only within a few minutes, whereas a control system may need to be within a millisecond. This article discusses some of the options available for synchronizing machines to within a millisecond for use in a realtime data processing system.
A few factors drive the design of a time synchronization system: the available time reference, price, reliability, and accuracy. This article discusses some of the resulting design choices, and then describes my choice, ntp, in detail.
Most time distribution schemes for computer systems deliver time-of-day, which is monotonic. This article does not cover the topic of non-realtime time distribution, such as time codes recorded and replayed on tape.
Time References
Every time distribution architecture requires a time reference (see the sidebar for a glossary). Here are a few of the choices available, with a short description of their relative merits.
GPS
The Global Positioning System (GPS) is a network of satellites maintained by the U.S. government that can deliver very accurate UTC (Coordinated Universal Time) to nearly any location on the globe. The system is capable of delivering an accuracy of a few microseconds, and requires an antenna and a GPS receiver. GPS receivers are available as VME boards and as standalone Local Area Network (LAN) nodes.
Local Time Code Generator
A Time Code Generator (TCG) has a highly stable, built-in oscillator that drives an onboard clock, which in turn drives a time encoder. It can be used as a time reference by distributing the encoded signal over standard coax and amplifiers. Popular encoding schemes include the Inter-Range Instrumentation Group (IRIG) family, which uses a serial time code that amplitude-modulates a sine wave carrier. IRIG resolutions vary from seconds to fractions of a millisecond. These systems can maintain an accuracy of several microseconds, but not inherently with respect to UTC, as the start time is generally set by hand.
ACTS
The National Institute of Standards and Technology (NIST) provides the Automated Computer Time Service (ACTS) for synchronizing to UTC. Access to ACTS is via a phone line and modem, with software running on the local computer to decode the time code and set the system clock. ACTS is capable of an accuracy of about 40 milliseconds.
Choosing An Architecture
There are several ways to distribute UTC from a primary signal. The task is to determine the parameters that will drive your design, usually cost, accuracy, and reliability. The available time reference may also drive your design, but it is normally an independent variable. Since this article is largely about time distribution, it is assumed the time reference is in place, and it is not included in any of the costing. Discussing each architecture requires a few terms, most of which are defined in the sidebar.
Option 1 - A Time Code Processor (TCP) Board in Every Node
A TCP is a board that resides on a computer's peripheral bus. The TCP receives an IRIG time code, decodes it, and makes the time of day available to an application program. The program can then set the system clock or timestamp data as needed (a minimal sketch of the application side appears just below). This method requires a single time reference, such as a GPS receiver or Time Code Generator (TCG), to produce the IRIG time code. The signal is distributed using low-bandwidth amplifiers and standard coaxial cable.
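To make the data path concrete, here is a minimal sketch of the application side: read the decoded time from the TCP and set the system clock with it. The device name, ioctl command, and time structure are hypothetical stand-ins; a real board's driver defines its own interface.

    /* Sketch only: the device node, ioctl, and tcp_time are hypothetical. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/ioctl.h>

    struct tcp_time { long sec; long usec; };   /* hypothetical layout */
    #define TCP_GET_TIME 1                      /* hypothetical ioctl  */

    int main(void)
    {
        int fd = open("/dev/tcp0", O_RDONLY);   /* hypothetical node */
        struct tcp_time t;
        struct timeval tv;

        if (fd < 0 || ioctl(fd, TCP_GET_TIME, &t) < 0) {
            perror("tcp");
            return 1;
        }
        tv.tv_sec  = t.sec;                     /* decoded IRIG time */
        tv.tv_usec = t.usec;
        if (settimeofday(&tv, NULL) < 0)        /* requires root */
            perror("settimeofday");
        close(fd);
        return 0;
    }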
Advantages: Very tight time synchronization due to deterministic propagation delay and low, fairly deterministic processing latency on the computer.
Disadvantages: Cost. Both initial and long-term costs are high. Initial cost includes the time reference (GPS/TCG), amplifiers, cabling, and TCP boards. Also, every node in the network is required to have a peripheral bus with an open slot for the TCP. If your bus of choice is VME, this is expensive and not always feasible. Long-term cost includes a TCP, an amplifier module, and cabling for every new node.
Costs: Initial cost: $2,700 per node. This cost is an average based on all costs divided by the number of nodes, and includes amps, cables, etc. Cost will be much higher if every node requires a VME bus. Long-term cost: $2,700 per node, since each additional node will require more hardware.
Option 2 - Single TCP with ntp LAN Distribution
A time code signal is fed to a TCP residing in a single primary machine on the network, where the application program ntp synchronizes that platform to the incoming time code. All other nodes in the network then synchronize to the primary machine via ntp over preexisting LANs.
Advantages: ntp is free, well-supported, and popular internetwork software (RFC-1305). The only software cost is for porting, and possibly integrating, ntp and the device driver. Except for coaxial cabling to the primary machine, all nodes use existing IP networks to synchronize to UTC. Good time synchronization, about 1 ms, is possible for small LANs. The TCP can accept IRIG-G, which has a 0.01 ms (10 microsecond) resolution. There are no long-term costs, since adding a node requires only ntp and a connection to the existing LAN.
Disadvantages: It may be difficult to locate a device driver to interface ntp to the TCP you have chosen on the platform you are using, and you may have to write it yourself. However, quite a few drivers for popular time references are already written and available in the ntp distribution. Also, as a LAN grows very large and routing delays become more non-deterministic, the accuracy ntp is able to maintain degrades to about 10 ms or worse.
Costs: Initial cost: $2,500 for the TCP, plus time for porting/writing the device driver. Long-term cost: $0.
Option 3 - Network-based Time Server with ntp LAN Distribution
An IRIG or GPS signal is fed to an off-the-shelf Time Server, which is an Ethernet-based, special-purpose computer running the ntp server software. ntp clients are set up on all the other nodes in the network to synchronize to the Time Server via the LAN. Option 3 is identical to Option 2, except that no TCP is needed in any of the machines in the network.
Advantages: Time distribution is now completely off-the-shelf, with no device drivers or custom software to port or write. There are no long-term costs, since adding a node requires only ntp and a connection to the existing LAN.
Disadvantages: The initial cost of the Time Server is fairly high. Also, most Time Servers accept only IRIG-B, which has a resolution of 1 ms, though this is likely to be good enough.
Costs: Initial cost: $6,000-$10,000 for a Time Server. Long-term cost: $0.
Option 4 - timed with LAN Distribution
timed(1M) is a standard UNIX program capable of synchronizing computers to each other, but not to an external reference. timed sets every machine to an average time computed from the times of all the machines.
Advantages: It's included in the operating system, so it's ported and supported.
Disadvantages: Since timed uses an average of the time kept by all systems on the network, some machines will be ahead and some will be behind the true time-of-day. It is even likely that all machines, though moderately synchronized to each other, could be very far from UTC.
Costs: Initial cost: $0. Long-term cost: $0.
Case Study
For the USAF Range Standardization and Automation (RSA) program, we had the following requirements:
- Our time reference would be supplied by the USAF, and it would be an IRIG-G time code.
- We needed to distribute UTC, with an accuracy of +/- 1 ms, to all nodes on a LAN, to allow timestamping of log files, operator commands, and displays.
- All systems must have no Single Point of Failure (SPOF), that is, all systems must be 100% redundant.
- The system is closed to the Internet and to other external networks.
Based on these requirements, we decided to distribute UTC per Option 2 above: a TCP-based, ntp-distributed system. The rationale was threefold. First, we eliminated the network Time Server (Option 3), since no Time Server was compatible with IRIG-G. It is possible to use a Time Code Translator (TCT) to decode IRIG-G and output IRIG-B, but this added too much initial cost, especially since all hardware had to be redundant. Second, we eliminated UNIX timed, because it does not distribute UTC at all. Third, we eliminated the TCP-per-node approach (Option 1), because we did not need that level of synchronization or its associated high cost.
This left Option 2, a very good choice because it met all of our requirements at a moderate cost. Since many of our machines already had VME buses, we found an affordable VME-based TCP that could decode IRIG-G (one for each of our redundant servers). As for 1 ms synchronization, there was some doubt at first that it could be achieved over an IP network, but as we will see, this turned out not to be an issue.
How ntp Works
ntp distributes UTC using a hierarchy that off-loads workload. The time reference itself, whatever its form, is known as a Stratum 0 source and denotes true UTC. A Stratum 1 server is a node that is directly synchronized to a Stratum 0 source. In our architecture, the VME TCP board is the Stratum 0 source, and the server in which the board resides is the Stratum 1 node. Synchronization between Stratum 0 and Stratum 1 is very tight, on the order of tens of microseconds. A Stratum 2 node then gets its UTC from a Stratum 1 server, and so on for the remaining strata. ntp runs as a single executable that can perform as an ntp client, a server, or both. In our case, the root node has ntp acting as both client and server (client to the Stratum 0 TCP and server to the rest of the LAN), and the other nodes on the LAN are all Stratum 2.
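Concretely, this hierarchy reduces to two small ntp.conf files. The following is a sketch with placeholder names; the X in the 127.127.X.0 refclock pseudo-address is the driver number, which depends on the TCP board and driver in use:

    # On the Stratum 1 server (the machine holding the TCP board):
    server 127.127.X.0        # refclock pseudo-address; X = driver number
    driftfile /etc/ntp.drift

    # On every other node on the LAN:
    server stratum1-host      # placeholder name of the Stratum 1 server
    driftfile /etc/ntp.drift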
As a node gets farther away from the Stratum 0 source, its offset from UTC becomes larger. This offset is due to a number of factors: dissimilar network propagation delays for round-trip message passing, the temperature at which a node operates, the stability of the hardware oscillator responsible for keeping the platform's system clock, and the kernel's mechanism for updating the system clock. Dissimilar propagation delay occurs when a packet travels over one network path on its way from the client to the server, and then returns to the client via a different (longer or shorter) route. Ethernet and FDDI networks are themselves non-deterministic (from an application program's point of view) in that a node does not know in advance how long it will have to wait to transmit a message, or how long the receiver will take to successfully receive it. Also, different machines have different interrupt latencies for network interface interrupts; even on like platforms, this value can differ depending on the load profile at any given time. Temperature affects the stability of an oscillator, since a changing temperature generally causes a changing frequency. Two identical machines set initially to the same time of day, but located in different rooms, may have oscillators running at slightly different frequencies, causing their system clocks to drift apart over time.
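The core of ntp's offset estimate is the four-timestamp, round-trip exchange defined in RFC-1305: T1 is the client transmit time, T2 the server receive time, T3 the server transmit time, and T4 the client receive time. A worked example in C (the timestamp values are made up for illustration):

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative timestamps, in seconds. */
        double t1 = 100.000;   /* client transmit */
        double t2 = 100.012;   /* server receive  */
        double t3 = 100.013;   /* server transmit */
        double t4 = 100.021;   /* client receive  */

        /* RFC-1305 estimates; offset is exact only if both paths are equal. */
        double offset = ((t2 - t1) + (t3 - t4)) / 2.0;   /* +0.002 s */
        double delay  = (t4 - t1) - (t3 - t2);           /*  0.020 s */

        printf("offset %.3f s, round-trip delay %.3f s\n", offset, delay);
        return 0;
    }

If the outbound path is slower than the return path (or vice versa), the asymmetry shows up directly as offset error, which is exactly the differential-delay effect described above.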
ntp uses a complex set of timestamps and statistics to continuously calculate this offset and adjust the platform's system clock toward zero offset from UTC. Theoretically, then, all nodes can be at zero offset from UTC. In practice this is not the case, especially for nodes at strata higher than 1. For these reasons, ntp can take as long as 24 hours to accurately characterize a platform and stabilize its clock.
There was some initial concern that a LAN-based time distribution approach would not work, due to the non-determinism of packet delivery and the fairly tight 1 ms synchronization requirement. To determine the limits of accuracy of ntp within our architecture, we created a testbed and measured ntp's performance.
The Testbed
Our test had two objectives: to verify that 1) ntp is capable of achieving a +/- 1 ms offset from UTC; and 2) ntp is capable of maintaining that offset at least 99% of the time. We built the testbed from eight SGI machines: three servers and five workstations, all on interconnected class B subnets. Some were separated by routers, and some were directly attached to the Stratum 1 server's network. We set up the ntp network as follows:
- One Stratum 1 server using a VME-based TCP board and a TCG providing IRIG-G time. The TCG was driven by an ovenized 5 MHz oscillator and was set by hand to local time.
- Seven Stratum 2 nodes, synchronized to the Stratum 1 via Ethernet.
- Of the Stratum 2 nodes, two were directly attached to the same Ethernet as the Stratum 1 server, and five were attached via one or more routers.
To verify our objectives, we let ntp run for 24 hours, then sampled each machine's offset every 4 seconds for 21 days (about 450,000 samples). The initial 24 hours allowed ntp to create a drift file for each machine and allowed the network to come to a steady state. We wrote the scripts and programs to sample offsets and compile statistics (see Listing 2); the sampling loop is sketched below. Note that the ntp distribution also includes hooks for gathering statistics.
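In outline, the sampling loop amounts to something like the following sketch. It assumes ntpq can be asked for the local clock offset with a readvar command; the output parsing is naive and would need to match your version's format.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char line[256];
        FILE *p;

        for (;;) {
            /* Ask the local daemon for its current offset estimate. */
            p = popen("ntpq -c \"rv 0 offset\"", "r");
            if (p != NULL) {
                while (fgets(line, sizeof line, p) != NULL)
                    if (strstr(line, "offset") != NULL)
                        fputs(line, stdout);    /* append to the log */
                pclose(p);
            }
            sleep(4);                           /* 4-second sample period */
        }
        /* not reached */
    }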
Table 1 shows the results of the test. Note that all times are in ms. "Stacks to Stratum 0" refers to the number of distinct IP stacks a packet has to traverse before it reaches ntp on the Stratum 1 server. "Average Offset" is the average number of ms the machine was offset from the Stratum 1 server. "Maximum Offset" is the largest absolute value for an offset. "Standard Deviation" shows how tightly the collected samples cluster around the average offset. "Accuracy" denotes the percentage of samples that fell within a 1 ms offset. Due to rounding, some accuracies calculate to 100%, when in fact they are slightly less.
The results in Table 1 indicate that:
- ntp is capable of keeping our Stratum 2 machines synchronized to within 1 ms of UTC.
- The Stratum 2 machines were within 1 ms of UTC more than 99.97% of the time (about four standard deviations), worst case. We set the ntp minimum and maximum poll intervals (the bounds on the interval between client requests for time from the server) to 2 and 4 seconds, respectively. ntp automatically adjusts the actual poll interval between these bounds, depending on how well it is able to discipline the system clock. Setting the minimum poll interval lower allows the client to correct itself more quickly, and therefore spend less time with an offset greater than 1 ms, at the expense of more CPU cycles and network traffic. (A configuration sketch follows this list.)
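For reference, the poll bounds are attached to the client's server line in ntp.conf, and the arguments are base-2 logarithms of seconds, not seconds. The sketch below polls between 2^4 = 16 s and 2^6 = 64 s; intervals as short as the 2 to 4 seconds used in our test may require smaller compiled-in limits in some versions, so treat this as a sketch.

    # Poll stratum1-host no more often than every 16 s and
    # no less often than every 64 s (arguments are log2 seconds):
    server stratum1-host minpoll 4 maxpoll 6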
The ntp step threshold can be set to something less than the default 128 ms, if desired. This can be used to ensure that the offset stays below some bound by stepping the clock rather than slowly slewing it. Be careful when changing this parameter: too small a value will cause ntp's clock-discipline loop to become unstable and never properly synchronize. The smallest workable step value will have to be determined empirically for your platform and network.
Also be aware that the system clock may NOT be monotonic below the millisecond resolution. For example, ntp can overshoot a time adjustment while slewing the clock (e.g., a 34-microsecond offset could be corrected by 54 microseconds), with the end result that the clock stays monotonic at 1 ms resolution, but not at resolutions below that. See Table 2 for an illustration.
Note that between UTC 24.100 and 25.100, the system clock on the client had the effect of running backward at sub-millisecond resolution (i.e., it went from 102 microseconds past one millisecond epoch to 95 microseconds past the next). This is not a problem in our case, since we are concerned with millisecond accuracy, not microsecond accuracy.
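A quick way to look for this effect on your own platform is to read the clock in a tight loop and flag any sample that is earlier than its predecessor. Whether this probe ever trips depends on how your kernel applies clock corrections; it is a diagnostic sketch, not a guarantee either way.

    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval prev, now;
        long i;

        gettimeofday(&prev, NULL);
        for (i = 0; i < 1000000; i++) {
            gettimeofday(&now, NULL);
            if (now.tv_sec < prev.tv_sec ||
                (now.tv_sec == prev.tv_sec && now.tv_usec < prev.tv_usec))
                printf("backward: %ld.%06ld -> %ld.%06ld\n",
                       (long)prev.tv_sec, (long)prev.tv_usec,
                       (long)now.tv_sec, (long)now.tv_usec);
            prev = now;
        }
        return 0;
    }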
ntp also allows any client to define any number of backup time servers, so that if one server becomes unavailable, time can be received from another. Also note that no other time protocol or application that sets the clock via settimeofday(3C), date(1), or adjtime(2) (e.g., timed, timeslave) can be running on the platform; ntp must be the only application in control of the system clock.
Another program that comes with the ntp distribution is ntpdate, which should be run at client bootup to roughly synchronize (to within several milliseconds) the client's system clock to the server's. This matters because if the client's offset is greater than 1000 seconds, ntp will refuse to run on that client.
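A typical client boot sequence is therefore just two commands (the hostname is a placeholder):

    ntpdate stratum1-host    # rough-set the clock once, before the daemon
    xntpd                    # then start the ntp (version 3) daemon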
CPU and memory utilization were measured on a multiprocessor server using ps and top. A 4- to 8-second poll interval sends two 64-byte packets every 4 seconds per client, so ten clients average 320 bytes/sec, or about 0.04% network utilization for 10Base-T. Worst-case utilization, if all ten clients hit the server at once, is about 0.4%. This is a light load for a client and, depending on the number of clients, a light load for the server.
ntp is capable of running indefinitely, and its accuracy actually improves (to a point) the longer it runs. The only case in which machines wander out of synchronization is a malfunction, such as a server or network outage. ntp takes just a few minutes to bring a machine into 1 ms synchronization if the platform's drift is already known. It does this by stepping the client clock to match the server clock after collecting several time samples from the server. Stepping can be avoided by allowing the clock only to be slewed, but then initial synchronization at startup can take as long as 4 hours.
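The step/slew distinction maps directly onto two system calls: settimeofday(2) jumps the clock instantly (possibly backward), while adjtime(2) slews it gradually by skewing the tick rate. A minimal sketch of a slew:

    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval adj, old;

        adj.tv_sec  = 0;
        adj.tv_usec = -34000;         /* slew the clock back 34 ms */
        if (adjtime(&adj, &old) < 0)  /* requires root */
            perror("adjtime");
        return 0;
    }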
Conclusions
ntp is well accepted by the Internet community; it is currently installed on more than 100,000 hosts around the world, and is in its second decade of use. Future releases focus mostly on feature enhancements; most, if not all, bugs have been worked out of the time-keeping code.
ntp also provides the capability to cryptographically authenticate all communications between client and server, making it very unlikely that time can be spoofed by an outside source.
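In the version 3 distribution this takes the form of keyed checksums on each packet rather than true encryption of the payload. A configuration sketch, with an illustrative key number and password:

    # In ntp.conf on both client and server:
    keys /etc/ntp.keys
    trustedkey 1
    server stratum1-host key 1    # client side: authenticate this server

    # In /etc/ntp.keys (readable by root only):
    1 M asharedsecret             # key 1, MD5 type, shared password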
Network delays introduce error when the transmit path is not the same length (in time) as the receive path; for large LANs and WANs, this single variable can limit the attainable accuracy of ntp to the order of 10-40 ms. On well-defined LANs (i.e., a single path between nodes), however, differential delays are much less of an issue, and ntp is able to perform very well.
Acknowledgements
Many thanks to Eldon Ideus for porting the HP/GPS-VME ntp driver to IRIX 6.2/Datum, and for his clear head. Also thanks to Harris Corporation for their support.
References
Mills, D.L. Network Time Protocol (Version 3) specification, implementation and analysis. Network Working Group Report RFC-1305, University of Delaware, March 1992, 113 pp.
Mills, D.L. Measured performance of the Network Time Protocol in the Internet system. Network Working Group Report RFC-1128, University of Delaware, October 1989.
About the Author
Packey Velleca graduated with a BSEE in 1988, and is currently working on a realtime data processing system for Harris Corp., in Melbourne, FL.