Why Is My NTP Server Costing Me $500/Year? Part 2: Characterizing the NTP Clients

July 14, 2014 Brian Cunnie

In the previous blog post, we concluded that providing an Amazon AWS-based NTP server that was a member of the NTP Pool Project was incurring ~$500/year in bandwidth charges.

In this blog post we examine the characteristics of NTP clients (mostly virtualized). We are particularly interested in the NTP polling interval, the frequency with which the NTP client polls its upstream server. The frequency with which our server is polled correlates directly with our costs (our $500 in Amazon AWS bandwidth corresponds to 46 billion NTP polls[1]). Determining which clients poll excessively may give us a tool to reduce the costs of maintaining our NTP server.

This blog post describes the polling intervals of several clients running under several hypervisors, and of one client running on bare metal (OS X). This post also describes our methodology in gathering those numbers.

NTP Polling Intervals

The default polling intervals of ntpd vary from 64 seconds (the minimum) to 1024 seconds (the maximum), a sixteenfold range. These values can be overridden in the configuration file, but for the purposes of our research we focus solely on the default values.
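
To make the sixteenfold range concrete, here is a minimal sketch. ntpd expresses poll intervals as powers of two, so the default minpoll of 6 means 64 seconds and the default maxpoll of 10 means 1024 seconds. The function names poll_seconds and polls_per_year are our own illustration and are not part of ntpd.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ntpd) for reasoning about poll intervals.

# ntpd stores poll intervals as exponents of two: 6 -> 64 s, 10 -> 1024 s.
poll_seconds() {
  echo $(( 1 << $1 ))
}

# Polls sent by one client to one server in a 365-day year at a fixed interval
# (integer division, so the result is rounded down).
polls_per_year() {
  echo $(( 365 * 24 * 60 * 60 / $1 ))
}

for EXPONENT in 6 7 8 9 10; do
  INTERVAL=$(poll_seconds "$EXPONENT")
  echo "poll exponent $EXPONENT = ${INTERVAL}s, $(polls_per_year "$INTERVAL") polls/year"
done
```

A client stuck at 64 seconds sends roughly sixteen times as many polls per year as one that settles at 1024 seconds, which is why the minimum-interval clients interest us most.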

We discover that clients running on certain hypervisors correlate strongly in the amount of polling (e.g. the VirtualBox NTP clients frequently poll at the default minimum poll interval, 64 seconds).

Chart of NTP Polling Intervals

NTP Polling Intervals over a 3-hour period. Note the heavy cluster of dots around 64 seconds—the minimum polling interval

Close-up of the 64-second polling interval

A close-up of the 64-second polling interval (“minpoll”). Notice the dots are mostly VirtualBox with a sprinkling of KVM. NTP clients perform poorly under those hypervisors.

By examining the chart (the chart and the underlying data can be viewed on Google Docs), we can see the following:

  • The guest VMs running under VirtualBox perform the worst (with one exception: Windows). Note that their polling intervals are clustered around the 64-second mark—the minimum allowed polling interval.
  • The Windows VM appears to query for time but once a day. It doesn’t appear to be running ntpd; rather, it appears to set the time via the NTP protocol with a proprietary Microsoft client.
  • The OS X host only queried its NTP server once during a 3-hour period. Since this value (10800 seconds) is more than the default maxpoll value (1024 seconds), we suspect that OS X uses a proprietary daemon and not ntpd.
  • The guest VM running under ESXi performs quite well; although its datapoint is obscured in the chart, if one were to browse the underlying data, one would see that its datapoints are clustered around maxpoll, i.e. 1024 seconds.
  • The guest VM running under Xen (AWS) also performs quite well; its datapoints are also clustered around maxpoll.
  • The guest VM running under KVM performs better than the VirtualBox VMs, which is admittedly damning with faint praise. Its polling intervals tend to cluster around 128 seconds, with smaller clusters at 64 and 256 seconds.

| Guest Operating System | Hypervisor | ntpd version | Average polling interval in seconds (higher is better) |
| Ubuntu 14.04 64-bit | VirtualBox 4.3.12 r93733 on OS X 10.9.4 | 4.2.6p5 | 126 |
| FreeBSD 10.0 64-bit | VirtualBox 4.3.12 r93733 on OS X 10.9.4 | 4.2.4p8 | 62 |
| Windows 7 Pro 64-bit | VirtualBox 4.3.12 r93733 on OS X 10.9.4 | N/A | 86400 |
| OS X 10.9.4 | N/A | N/A | 10800 |
| Ubuntu 13.04 64-bit | AWS (Xen), t1.micro | 4.2.6p5 | 1056 |
| FreeBSD 9.2 64-bit | Hetzner (KVM), VQ7 | 4.2.4p8 | 146 |
| Ubuntu 12.04 64-bit | ESXi 5.5 | 4.2.6p3 | 1048 |
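
The post does not say how the averages in the table were computed. One plausible approach, sketched below, is a weighted mean over the "polling interval,count" .csv files produced later in the Methodology section. The function name average_interval and the sample data are assumptions made up for illustration, not values from our captures.

```shell
#!/usr/bin/env bash
# Weighted mean of a "polling interval,count" CSV file.
# Lines whose first field is not a whole number (e.g. the header) are skipped.
average_interval() {
  awk -F, '$1 ~ /^[0-9]+$/ { total += $1 * $2; polls += $2 }
           END { if (polls > 0) printf "%d\n", total / polls }' "$1"
}

# Made-up sample: three 64-second intervals and one 128-second interval.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
polling interval (seconds), sample
64,3
128,1
EOF
average_interval "$SAMPLE"
```

For the sample above the weighted mean is (64 × 3 + 128 × 1) / 4 = 80 seconds.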

Methodology

1. Choosing the Hypervisors and OSes to Characterize

We decide to characterize the NTP traffic of four different operating systems:

  1. Windows 7 64-bit
  2. OS X 10.9.3
  3. Ubuntu 64-bit (14.04, 13.04, and 12.04)
  4. FreeBSD[2] 64-bit (10.0 and 9.2)

We decide to test the following Hypervisors:

  1. VirtualBox 4.3.12 r93733
  2. KVM (Hetzner)
  3. Xen (Amazon AWS)
  4. ESXi 5.5

Why We Are Not Characterizing NTP Clients on Embedded Systems

We’re ignoring embedded systems, a fairly broad category that ranges from devices as modest as a home WiFi access point to ones as complex as a high-end Juniper router.

There are two reasons we are ignoring those systems.

  • We don’t have the resources to test them (we don’t have the time or the money to purchase dozens of home gateways, configure them, and measure their NTP behavior, let alone the more-expensive higher-end equipment)
  • The operating systems of many embedded systems have roots in the Open Source community (e.g. dd-wrt is Linux-based, Juniper’s JunOS is FreeBSD-based). There’s reason to believe that the NTP clients of those systems would behave the same as those of the systems upon which they are based.

We wish we had the resources to characterize embedded systems—sometimes they are troublemakers:

  • The operating systems of embedded systems that do not have roots in the Open Source community have a poor track record of providing good NTP clients. Netgear, SMC, and D-Link, to mention a few, have had their missteps.

Why Windows and OS X NTP Clients Don’t Matter

Windows and Apple clients don’t matter. Why?

  • They are not our NTP clients. Both Microsoft and Apple have made NTP servers available (time.windows.com and time.apple.com, respectively) and have made them the default NTP server for their operating system.
  • They rarely query for time: Windows 7 only once a day, and OS X every few hours.

We suspect that fewer than 1% of our NTP clients are either Windows or OS X (but we have no data to confirm that).

Even though they matter little to our costs, we characterize the behavior of the Windows and OS X clients anyway.

2. Setting Up the NTP Clients

The ESXi, Xen (AWS), and KVM (Hetzner) clients have already been set up (not for characterizing NTP, but we’re temporarily borrowing them to perform our measurements); however, the VirtualBox clients (specifically the Ubuntu and FreeBSD guest VMs) need to be set up.

The 3 VirtualBox and 1 Bare-Iron NTP Clients

We choose one machine for each of the four primary operating systems (OS X, Windows, Linux, *BSD). We define hostnames, IP addresses, and, in the case of FreeBSD and Linux, Ethernet MAC addresses (we use locally-administered MAC addresses[3]). Strictly speaking, creating hostnames, defining MAC addresses, and creating DHCP entries are not necessary. We put in the effort because we prefer structure:

  • hostname↔IP address mappings are centralized in DNS (which is technically a distributed, not centralized, system, but we’re not here to quibble)
  • IP address↔MAC address mappings are centralized in one DHCP configuration file rather than being balkanized in various Vagrantfiles.

Here are the Four Hosts of the Apocalypse[4] (with apologies to St. John the Evangelist)

| Operating System | Fully-Qualified Domain Name | IP Address | MAC Address |
| OS X 10.9.3 | tara.nono.com | 10.9.9.30 | 00:3e:e1:c2:0e:1a |
| Windows 7 Pro 64-bit | w7.nono.com | 10.9.9.100 | 08:00:27:ea:2e:43 |
| Ubuntu 14.04 64-bit | vm-ubuntu.nono.com | 10.9.9.101 | 02:00:11:22:33:44 |
| FreeBSD 10.0 64-bit | vm-fbsd.nono.com | 10.9.9.102 | 02:00:11:22:33:55 |

Use Vagrant to Configure Ubuntu and FreeBSD VMs

We use Vagrant (a tool that automates the creation and configuration of VMs) to create our VMs. We add the Vagrant “boxes” (VM templates) and create & initialize the necessary directories:

vagrant box add ubuntu/trusty64
vagrant box add chef/freebsd-10.0
cd ~/workspace
mkdir vagrant_vms
cd vagrant_vms
for DIR in ubuntu_14.04 fbsd_10.0; do
  mkdir $DIR
  pushd $DIR
  vagrant init
  popd
done

Now let’s configure the Ubuntu VM. We have two goals:

  1. We want the Ubuntu VM to have an IP address that is distinct from the host machine’s. This will enable us to distinguish the Ubuntu VM’s NTP traffic from the host machine’s (the host machine, by the way, is an Apple Mac Pro running OS X 10.9.3).
  2. We want the Ubuntu VM to run NTP

The former is accomplished by modifying the config.vm.network setting in the Vagrantfile to use a bridged interface (in addition to Vagrant’s default use of a NAT interface); the latter is accomplished by creating a shell script that installs and runs NTP and modifying the Vagrantfile to run said script.

cd ubuntu_14.04/
vim Vagrantfile
  # set the following inside the Vagrant.configure block:
  config.vm.box = 'ubuntu/trusty64'
  config.vm.network :public_network, bridge: 'en0: Ethernet 1', mac: '020011223344', use_dhcp_assigned_default_route: true
  config.vm.provision :shell, path: 'ntp.sh'
# the heredoc body is not indented so that the shebang starts the file
cat > ntp.sh <<'EOF'
#!/usr/bin/env bash
apt-get install -y ntp
EOF
vagrant up

Now that we have set up an Ubuntu 14.04 as a client, let’s turn our attention to FreeBSD 10.0.

cd ../fbsd_10.0
vim Vagrantfile
  # set the following inside the Vagrant.configure block:
  config.vm.box = 'chef/freebsd-10.0'
  # Use NFS as a shared folder
  config.vm.network 'private_network', ip: '10.0.1.10'
  config.vm.network :public_network, bridge: 'en0: Ethernet 1', mac: '020011223355', use_dhcp_assigned_default_route: true
  config.vm.synced_folder ".", "/vagrant", :nfs => true, id: "vagrant-root"
  config.vm.provision :shell, path: 'gateway_and_ntp.sh'
# the heredoc body is not indented so that the shebang starts the file
cat > gateway_and_ntp.sh <<'EOF'
#!/usr/bin/env bash
# send outbound traffic via the bridged interface instead of the NAT interface
route delete default 10.0.2.2
route add default 10.9.9.1
grep ntpd_enable /etc/rc.conf || echo 'ntpd_enable="YES"' >> /etc/rc.conf
/etc/rc.d/ntpd start
EOF
vagrant up

The FreeBSD Vagrantfile is slightly different[5] than the Ubuntu Vagrantfile.

3. Capturing the NTP Traffic

We enable packet tracing on the upstream firewall (in the case of the VirtualBox guests or the bare-iron OS X host) or on the VM itself (in the case of our AWS/Xen, Hetzner/KVM, and ESXi guests).

Here are the commands we used:

# on our internal firewall
sudo tcpdump -ni em0 -w /tmp/ntp_vbox.pcap -W 1 -G 10800 port ntp
# on our AWS t1.micro instance (the filter is quoted to protect the parentheses from the shell)
sudo tcpdump -w /tmp/ntp_upstream_xen.pcap -W 1 -G 10800 'port ntp and ( host 216.66.0.142 or host 50.97.210.169 or host 72.14.183.239 or host 108.166.189.70 )'
# on our Hetzner FreeBSD instance
sudo tcpdump -i re0 -w /tmp/ntp_upstream_kvm.pcap -W 1 -G 10800 'port ntp and ( host 2a01:4f8:141:282::5:3 or host 2a01:4f8:201:4101::5 or host 78.46.60.42 or host 129.70.132.32 )'
# on our ESXi 5.5 instance
sudo tcpdump -w /tmp/ntp_upstream_esxi.pcap -W 1 -G 10800 port ntp and host 91.189.94.4

Notes

  • we passed the -W 1 -G 10800 to tcpdump; this is to enable packet capture for 10800 seconds (i.e. 3 hours) and then stop. This will allow us to capture the same duration of traffic from our machines, which makes certain comparisons easier (e.g. the number of times upstream servers were polled over the course of three hours).
  • we used the -w flag (e.g. -w /tmp/ntp_vbox.pcap) to save the output to a file. This enables us to make several passes at the capture data.
  • We filtered for ntp traffic (port ntp)
  • for machines that were NTP servers as well as clients, we restricted traffic capture to the machines that were its upstream server(s) (e.g. the ESXi’s Ubuntu VM’s upstream server is 91.189.94.4, so we appended and host 91.189.94.4 to the filter)
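
The filters above all follow the same pattern: "port ntp", optionally narrowed to a list of upstream servers. A minimal sketch of building such a filter from a list of addresses follows; ntp_filter is a hypothetical helper of our own, not a tcpdump feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a capture filter that matches NTP traffic,
# restricted to the upstream servers given as arguments (if any).
ntp_filter() {
  HOSTS=""
  for HOST in "$@"; do
    # Join the hosts with " or ", e.g. "host a or host b".
    HOSTS="${HOSTS:+$HOSTS or }host $HOST"
  done
  if [ -n "$HOSTS" ]; then
    echo "port ntp and ( $HOSTS )"
  else
    echo "port ntp"
  fi
}

ntp_filter 78.46.60.42 129.70.132.32
# The result is meant to be passed to tcpdump as ONE quoted argument, e.g.
#   sudo tcpdump -w /tmp/capture.pcap -W 1 -G 10800 "$(ntp_filter 91.189.94.4)"
```

Passing the filter as a single quoted argument keeps the shell from interpreting the parentheses.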

4. Converting NTP Capture to CSV

We need to convert our output into .csv (comma-separated values) files to enable us to import them into Google Docs.

VirtualBox Clients

Ubuntu 14.04

We determine the upstream NTP servers using ntpq:

vagrant@vagrant-ubuntu-trusty-64:~$ ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+74.120.8.2      128.4.1.1        2 u    4   64  377   52.586  -34.323   2.141
-50.7.64.4       71.40.128.146    2 u   63   64  375   84.136  -28.303   3.513
-2001:19f0:1590: 128.4.1.1        2 u   56   64  377   91.651  -24.310   2.218
*4.53.160.75     204.9.54.119     2 u   35   64  377   59.146  -32.741   3.297
+91.189.94.4     193.79.237.14    2 u    2   64  377  147.590  -32.185   1.860

Next we create a .csv file to be imported into Google Docs for additional manipulation:

# one .csv file per upstream server: "polling interval,number of responses"
for NTP_SERVER in \
  74.120.8.2 \
  50.7.64.4 \
  2001:19f0:1590:5123:1057:a11:da7a:1 \
  4.53.160.75 \
  91.189.94.4
do
  tcpdump -tt -nr ~/Downloads/ntp_vbox.pcap src host $NTP_SERVER |
   awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
   tail -n +2 | sort | uniq -c |
   sort -k 2 |
   awk "BEGIN { print \"polling interval (seconds), VB/Ubu/$NTP_SERVER\" }
        { printf \"%d,%d\n\", \$2, \$1 }" > /tmp/vb-ubu-$NTP_SERVER.csv
done

Notes regarding the shell script above:

  • tcpdump’s -tt flag prints each timestamp as seconds since the Epoch (rather than as a time of day), so that we may easily calculate the amount of time between each response
  • tcpdump’s src host parameter is to restrict the packets to NTP responses and not NTP queries (it’s simpler if we pay attention to half the conversation)
  • the first awk command prints the interval (in seconds) between each NTP response
  • the tail command strips the very first response whose time interval is pathological (i.e. whose time interval is the number of seconds since the Epoch, e.g. 1404857430)
  • the sort and uniq commands tell us the number of times a response was made for a given interval (e.g. “384 NTP responses had a 64-second polling interval”)
  • the second sort command sorts the output by polling interval, lexically (not numerically). We sort lexically because the join command, which we will use in the next step, requires lexical collation, not numerical (in other words, “1 < 120 < 16 < 2”, not “1 < 2 < 16 < 120”)
  • the second awk command puts the data in a format that’s friendly for Google spreadsheets
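
The steps described in the notes above can be tried without a packet capture. The sketch below feeds made-up Epoch timestamps (standing in for the first column of tcpdump -tt output) through the same awk, tail, sort, and uniq stages; interval_histogram is a name we made up for illustration.

```shell
#!/usr/bin/env bash
# Turn a stream of Epoch timestamps (one per line) into "interval,count" lines.
interval_histogram() {
  awk 'BEGIN { prev = 0 } { printf "%d\n", $1 - prev; prev = $1 }' |
    tail -n +2 |        # drop the first, pathological interval
    sort | uniq -c |    # count the occurrences of each interval
    awk '{ printf "%d,%d\n", $2, $1 }'
}

# Made-up timestamps with gaps of roughly 64, 64, and 128 seconds.
printf '%s\n' 1404857430.10 1404857494.12 1404857558.15 1404857686.20 |
  interval_histogram
```

The output lists 128 before 64 because the sort is lexical, which is the collation that join expects in the merging step.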

FreeBSD 10.0

We use ntpq -pn to determine the upstream NTP servers.

Then we create .csv files:

for NTP_SERVER in \
  72.20.40.62 \
  69.167.160.102 \
  108.166.189.70
do
  tcpdump -tt -nr ~/Downloads/ntp_vbox.pcap src host $NTP_SERVER |
   awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
   tail -n +2 | sort | uniq -c |
   awk "BEGIN { print \"polling interval (seconds), VB/FB/$NTP_SERVER\" }
        { printf \"%d,%d\n\", \$2, \$1 }" |
   sort > /tmp/vb-fb-$NTP_SERVER.csv
done

Windows 7

The Windows VM is easier: it queries only one NTP server (time.windows.com), so we can filter by our VM’s IP address rather than the NTP server’s IP address:

tcpdump -tt -nr ~/Downloads/ntp_vbox.pcap dst host w7.nono.com |
 awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
 tail -n +2 | sort | uniq -c |
 awk "BEGIN { print \"polling interval (seconds), VB/W7\" }
   { printf \"%d,%d\n\", \$2, \$1 }" |
 sort > /tmp/vb-w7.csv

OS X

Like Windows, OS X queries only one NTP server (time.apple.com), so we can filter by our host’s IP address:

tcpdump -tt -nr ~/Downloads/ntp_vbox.pcap dst host tara.nono.com |
 awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
 tail -n +2 | sort | uniq -c |
 awk "BEGIN { print \"polling interval (seconds), OS X\" }
   { printf \"%d,%d\n\", \$2, \$1 }" |
 sort > /tmp/osx.csv

Xen (AWS) Client

for NTP_SERVER in \
  216.66.0.142 \
  72.14.183.239 \
  108.166.189.70
do
  tcpdump -tt -nr ~/Downloads/ntp_upstream_xen.pcap src host $NTP_SERVER |
   awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
   tail -n +2 | sort | uniq -c |
   awk "BEGIN { print \"polling interval (seconds), Xen/Ubu/$NTP_SERVER\" }
        { printf \"%d,%d\n\", \$2, \$1 }" |
   sort > /tmp/xen-ubu-$NTP_SERVER.csv
done

KVM (Hetzner) Client

for NTP_SERVER in \
  2a01:4f8:141:282::5:3 \
  2a01:4f8:201:4101::5 \
  78.46.60.42
do
  tcpdump -tt -nr ~/Downloads/ntp_upstream_kvm.pcap src host $NTP_SERVER |
   awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
   tail -n +2 | sort | uniq -c |
   awk "BEGIN { print \"polling interval (seconds), KVM/FB/$NTP_SERVER\" }
        { printf \"%d,%d\n\", \$2, \$1 }" |
   sort > /tmp/kvm-fb-$NTP_SERVER.csv
done

ESXi Client

tcpdump -tt -nr ~/Downloads/ntp_upstream_esxi.pcap src host 91.189.94.4 |
 awk 'BEGIN {prev = 0 }; { printf "%d\n", $1 -prev; prev = $1 }' |
 tail -n +2 | sort | uniq -c |
 awk "BEGIN { print \"polling interval (seconds), ESXi/Ubu\" }
   { printf \"%d,%d\n\", \$2, \$1 }" |
 sort > /tmp/esxi-ubu.csv

5. Merging the 17 .csv Files

Next we need to merge the above files into one file that we can easily import into Google Docs.

COMMAS=''
CSV_INDEX=0
for CSV_FILE in *.csv; do
  CSV_TEMP=$$.$CSV_INDEX.csv
  CSV_INDEX=$(( CSV_INDEX + 1 ))
  [ ! -f "$CSV_TEMP" ] && touch "$CSV_TEMP"
  # rows in both files; rows only in the merged file; rows only in the new file
  ( join -t ,      "$CSV_TEMP" "$CSV_FILE"
    join -v 1 -t , "$CSV_TEMP" "$CSV_FILE" | sed "s/$/,/"
    join -v 2 -t , "$CSV_TEMP" "$CSV_FILE" | sed "s/\([^,]*$\)/$COMMAS\1/" ) |
  sort > $$.$CSV_INDEX.csv
  COMMAS="$COMMAS,"
done

Note:

  • we use the join command to merge the proper fields together; this is so our scatterplot will display properly. The join-field is the polling interval in seconds
  • we use 3 iterations of join
    1. the first one merges the fields with common polling intervals
    2. the second one merges the polling intervals that are present in the first file but not the second
    3. the final one merges the polling intervals that are present in the second file but not the first
  • we invoke sort in order to keep our temporary files lexically collated, a requirement of join
  • we create a series of temporary files, the last one of which (e.g. 5192.17.csv) we will import into Google Docs
  • we need to perform one final sort before import (we need to sort numerically, not lexically):
sort -g < 5192.17.csv > final.csv
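
To see what the three join invocations contribute, here is a minimal sketch that merges just two small files. The function name merge_two and the counts are made up for illustration. It sorts the result numerically on the first column with sort -n, a variation on the sort -g used above.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # make sort and join agree on byte-wise (lexical) collation

# Merge two "interval,count" files on the interval column, leaving an empty
# cell where one file has no entry. Both inputs must be lexically sorted.
merge_two() {
  { join -t ,      "$1" "$2"                     # intervals present in both
    join -v 1 -t , "$1" "$2" | sed 's/$/,/'      # only in the first file
    join -v 2 -t , "$1" "$2" | sed 's/,/,,/'     # only in the second file
  } | sort -t , -k 1,1n                          # final numeric sort
}

WORK=$(mktemp -d)
printf '%s\n' 128,5 64,9  > "$WORK/a.csv"   # made-up counts, lexically sorted
printf '%s\n' 1024,2 64,7 > "$WORK/b.csv"
merge_two "$WORK/a.csv" "$WORK/b.csv"
```

The 64-second row carries counts from both files, while the 128-second and 1024-second rows each have one empty cell, which is what lets the scatterplot place each series in its own column.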

6. Mastering Google Docs

In order to create our scatterplot, we must comply with Google’s requirements. For example, each column needs at least 1 datapoint.

  • we add a value of 1 polling interval of 10800 seconds to the OS X column. During our 3-hour packet capture, our OS X host queried its NTP server only once, and our pipeline discarded that packet (we measure intervals between packets, so we need at least 2 packets to measure one interval). Our data now indicates that OS X queries once every 3 hours.
  • we remove the column VB/FB/72.20.40.62. That NTP server is unreachable/broken and has no data points.
  • we add a value of 1 polling interval of 86400 seconds to the VB/W7 column. Windows 7 appears to only query for time information once per day (not discovered in this packet capture but in an earlier one)

Footnotes

1 Math is as follows:

90 B / NTP poll
$500 total
$0.12 / 1 GB

$500
× 1 GB / $0.12
× 1,000,000,000 bytes / GB
× 1 poll / 90 B

= 46296296296 polls = 46.29 Gpolls
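
The same arithmetic can be checked from the command line. This is a minimal sketch; polls_for_cost is a name made up for illustration, and it uses the figures above (90 bytes per poll, $0.12 per GB, 1 GB = 1,000,000,000 bytes).

```shell
#!/usr/bin/env bash
# Number of polls that a bandwidth bill pays for.
# Arguments: total dollars, dollars per GB, bytes per poll.
polls_for_cost() {
  awk -v dollars="$1" -v per_gb="$2" -v bytes="$3" \
    'BEGIN { printf "%.0f polls\n", dollars / per_gb * 1e9 / bytes }'
}

polls_for_cost 500 0.12 90
```

Changing the first argument shows how the poll count scales linearly with the bill.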

2 The inclusion of FreeBSD in the list of Operating Systems is made less for its prevalence (it is vastly overshadowed by Linux in terms of deployments) than for the strong emotional attachment the author has for it.

3 To define our own addresses without fear of colliding with an existing address, we set the locally administered bit (the second least significant bit of the most significant byte) to 1.
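
As a quick check of that bit, here is a minimal sketch that tests whether a MAC address is locally administered. The function name is_locally_administered is our own; it looks only at the first two hexadecimal digits, so it accepts addresses written with or without colons.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) when the MAC address has the locally administered
# bit set, i.e. bit 0x02 of the most significant (first) byte.
is_locally_administered() {
  FIRST_OCTET=$(printf '%.2s' "$1")           # first two hex digits, e.g. "02"
  VALUE=$(printf '%d' "0x$FIRST_OCTET")       # hex -> decimal
  [ $(( VALUE & 2 )) -ne 0 ]
}

for MAC in 02:00:11:22:33:44 08:00:27:ea:2e:43; do
  if is_locally_administered "$MAC"; then
    echo "$MAC is locally administered"
  else
    echo "$MAC is not locally administered"
  fi
done
```

The two addresses we assigned by hand (beginning with 02) pass the check, while the Windows VM’s address (beginning with 08) does not.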

4 The term “host” has a specific connotation within the context of virtualization, and we are deliberately misusing that term to achieve poetic effect (i.e. “hosts” sounds similar to “horsemen”). But let’s be clear on our terms: a “host” is an Operating System (usually running on bare-iron, but optionally running as a guest VM on another host) running virtualization software (e.g. VirtualBox, Fusion, ESXi, Xen); a “guest” is an operating system that’s running on top of the virtualization software which the host is providing.

In our example only one of the 4 hosts is truly a host—the OS X box is a true host (it provides the virtualization software (VirtualBox) on top of which the remaining 3 operating systems (Ubuntu, FreeBSD, and Windows 7) are running).

5 We’d like to point out the shortcomings of the FreeBSD setup versus the Ubuntu setup: in the Ubuntu setup, we were able to use a directive (use_dhcp_assigned_default_route) to configure Ubuntu to send outbound traffic via its bridged interface. Unfortunately, that directive didn’t work for our FreeBSD VM. So we used a script to set the default route, but the script is not executed when FreeBSD VM is rebooted, and the FreeBSD VM will revert to using the NAT interface instead of the bridged interface, which means we will no longer be able to distinguish the FreeBSD NTP traffic from the OS X host’s NTP traffic.

The workaround is to never reboot the FreeBSD VM. Instead, we use vagrant up and vagrant destroy when we need to bring up or shut down the FreeBSD VM. We incur a penalty in that it takes slightly longer to boot our machine via vagrant up.

Also note that we modified the config.vm.network to use a host-only network instead of the regular NAT network. That change was necessary for the FreeBSD guest to run the required gateway_and_ntp.sh script. VirtualBox was kind enough to warn us:

NFS requires a host-only network to be created.
Please add a host-only network to the machine (with either DHCP or a
static IP) for NFS to work.
