All posts by Neil

Google Ping Update

Update: here’s the traceroute when ping performance is 50+ msec:

1 pfsense.nw 1.578 ms
2 * * *
3 * * *
4 tge7-2.ausbtx5202h.texas.rr.com 22.567 ms
5 tge8-5.ausbtx5201h.texas.rr.com 18.703 ms
6 tge0-12-0-6.ausutxla01r.texas.rr.com 15.788 ms
7 agg22.dllatxl301r.texas.rr.com 21.964 ms
8 107.14.17.136 21.865 ms
9 ae1.pr1.dfw10.tbone.rr.com (107.14.17.234)
10 207.86.210.125 26.794 ms
11 207.88.14.182.ptr.us.xo.net 21.649 ms
12 207.88.14.189.ptr.us.xo.net 17.312 ms
13 ip65-47-204-58.z204-47-65.customer.algx.net 20.543 ms
14 72.14.233.85 24.646 ms
15 72.14.237.219 24.955 ms
16 209.85.243.178 27.354 ms
17 72.14.239.136 43.774 ms
18 216.239.50.237 48.918 ms
19 209.85.243.55 44.747 ms
20 ord08s11-in-f20.1e100.net 43.592 ms

Four extra hops, and the endpoint is a Google server presumably somewhere in Chicago instead of Dallas. ORD is the Chicago O’Hare airport code; I don’t really know whether the Google data centers are in fact at or near airports or whether the airport codes are just a convenient naming scheme for a general area.

So, sometimes I get directed to Dallas, sometimes to Chicago. Will report if I ever see any other server locations.
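
If the hostnames can be taken at face value, the location code is easy to pull out mechanically. Here is a minimal sketch. The function name site_code is made up for illustration, and the pattern only assumes the three-letter prefix seen in names like ord08s11-in-f20.1e100.net and dfw06s32-in-f18.1e100.net, a convention Google could change at any time.

```shell
# site_code: print the upper-cased three-letter prefix of a *.1e100.net
# hostname, or nothing if the name does not fit the assumed pattern.
site_code() {
  printf '%s\n' "$1" |
    sed -n 's/^\([a-z][a-z][a-z]\)[0-9].*\.1e100\.net$/\1/p' |
    tr '[:lower:]' '[:upper:]'
}
```

For example, `site_code ord08s11-in-f20.1e100.net` prints ORD. Feeding it the last hop of each traceroute would give a running tally of which locations turn up.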

As an aside, “1e100.net” is Google’s clever name for their network: 1e100 is scientific notation for 10 to the 100th power, which is a googol.

Hilltop Google Ping Performance

For the past month I have been measuring internet ping (ICMP ECHO) performance from my hilltop network. I do this with a script that runs every 30 minutes and measures the response time of www.google.com as reported by ping.

The script first pings www.google.com exactly once in an attempt to cache the DNS lookup (to avoid DNS time affecting the results). It then invokes the unix ping program with “-c 10” to do 10 pings. I throw out the highest and lowest result times and average the remaining 8. I record the results in a NumerousApp metric (of course).

The results are shown here:

[Chart: Hilltop Google Ping]

Raw data available in the numerous metric: Hilltop Google Ping

Ignore the occasional spikes, where some network disruption was obviously causing consistently high ping times for that measurement. There is also one zero data point, where a bug in the script produced a zero reading while the network was completely down.

What’s left is two different consistent readings – something in the low-mid 20msec response range and something averaging in the 50msec response range. It seems pretty apparent that there are two different routes between my hilltop network and www.google.com and for whatever reason sometimes I’m hooked up to the faster/shorter route and sometimes the longer one.

Here, for your amusement, is the heart of my “pinggoo” script:

ping -c 10 $TARGET |
grep from | grep 'time=' |
sed -e 's,.*time=,,' | 
awk ' { print $1 } ' | sort -n | sed -e '1d' -e '$d' | 
awk 'BEGIN {SUM=0; N=0} {SUM=SUM+$1; N=N+1} END {print SUM/( (N>0) ? N : 1)}'

Here’s what a typical output line from ping looks like on my mac:

64 bytes from 74.125.227.114: icmp_seq=0 ttl=53 time=22.388 ms

This is some truly fine shell hackery. It turns out the two grep statements are redundant (either one alone suffices) but I put them both in as a way to ensure I was really looking only at the successful ping lines (the ping program itself puts out a lot of other verbose output). Then the sed deletes everything prior to the ping time. The first awk program (print $1) separates the time from the trailing “ms”. What we then have is (hopefully) a list of 10 numbers, one per line. I use sort to put them in numeric order, then sed to delete the first and last lines (the lowest and highest ping times), and then the final awk program to calculate the average.

I’m sure I could have done all this with a single pass of awk or a python program or something along those lines; however, one nice thing about this hackery is that it is fairly robust across ping variants; so far this has worked just fine on my mac and on Debian wheezy, even though the two ping programs have different output formats (but the essential “time=” part is similar enough on both to work unchanged with this script).
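
For comparison, here is roughly what the single-pass awk version might look like. It is a sketch, not the script used for the measurements above: the function name trimmed_mean is hypothetical, and it rounds to three decimals, which the original pipeline does not do. It relies on the same “time=” marker, and like the original it prints 0 when two or fewer replies come back.

```shell
# trimmed_mean: read ping output on stdin and print the mean of the
# "time=" values after dropping one lowest and one highest sample.
trimmed_mean() {
  awk '
    /time=/ {
      t = $0
      sub(/.*time=/, "", t)   # drop everything up to the number
      t = t + 0               # numeric coercion discards the trailing " ms"
      if (n == 0 || t < min) min = t
      if (n == 0 || t > max) max = t
      sum += t
      n++
    }
    END {
      # With two or fewer samples nothing is left after trimming.
      if (n > 2) printf "%.3f\n", (sum - min - max) / (n - 2)
      else       print 0
    }
  '
}
```

It would be used as `ping -c 10 "$TARGET" | trimmed_mean`. Tracking the minimum and maximum as the lines go by replaces the sort and the two sed deletions, at the cost of being a little harder to read than the pipeline.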

Here’s a traceroute I just did while I appear to be getting the faster performance:

 
 1  pfsense.nw 1.379 ms
 2  * * *
 3  * * *
 4  tge7-5.trswtx1202h.texas.rr.com 20.527 ms
 5  tge0-12-0-14.ausxtxir02r.texas.rr.com 18.915 ms
 6  agg22.hstqtxl301r.texas.rr.com 28.277 ms
 7  107.14.19.94 24.099 ms
 8  ae-0-0.cr0.dfw10.tbone.rr.com 27.132 ms
 9  ae0.pr1.dfw10.tbone.rr.com 24.956 ms
10  207.86.210.125 21.302 ms
11  207.88.14.182.ptr.us.xo.net 25.221 ms
12  207.88.14.189.ptr.us.xo.net 24.043 ms
13  ip65-47-204-58.z204-47-65.customer.algx.net 27.043 ms
14  72.14.233.77 26.225 ms
15  64.233.174.137 25.225 ms
16  dfw06s32-in-f18.1e100.net 23.943 ms

I edited out a bunch of the output detail to make it fit on this page better. Sixteen hops to whichever google server is serving me. If we interpret the hostname at face value my google server (at this moment) is in DFW somewhere.

I’m on Time Warner cable and was recently upgraded to 100 Mbps service. This (unsurprisingly) doesn’t seem to have had any material impact on the ping times (throughput and latency being somewhat independent).

I’ll report back if I get any other interesting traceroute data especially when I’m in the 50msec performance arena.

Murphy’s Law

This will reaffirm your faith in Murphy’s Law.

Time Warner recently replaced my cable modem – upgraded for higher performance.

On Friday I went to check the physical install. The modem is down the hill – a quarter mile away – and connects to the house network via a long fiber run.

A long time ago I installed a “remote power rebooter” device for the cable modem so that on those all-too-frequent occasions when the modem needs to be physically reset it could be power cycled remotely/automatically. Of course the cable guy didn’t plug the new modem into this device; instead he plugged it directly into the wall.

As an aside: the remote control power gizmo I’m using is from Synaccess and it is awesome:

http://www.synaccess-net.com/remote-power.php/1/8

You set this box up to ping a remote address (e.g., www.google.com) and if it loses connectivity it will power-cycle the outlet. So any time my cable modem wedges it gets rebooted automatically when this device detects loss of internet connectivity.
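
The logic the box implements is simple enough to sketch in shell. This is only an illustration of the idea, not Synaccess’s firmware or API: probe, power_cycle, watchdog_step, and the three-failure threshold are all assumptions made up for the example.

```shell
# Hypothetical names throughout: probe and power_cycle are placeholders
# for whatever actually tests connectivity and toggles the outlet.
FAILS=0        # consecutive failed probes so far
MAX_FAILS=3    # assumed threshold before power cycling

# One connectivity test: succeeds if a single ping gets a reply.
probe() {
  ping -c 1 www.google.com >/dev/null 2>&1
}

# Placeholder: a real device would switch the outlet off and back on.
power_cycle() {
  echo "power cycling outlet"
}

# watchdog_step: run one probe and power cycle only after MAX_FAILS
# consecutive failures, so a single dropped ping does not reboot the modem.
watchdog_step() {
  if probe; then
    FAILS=0
  else
    FAILS=$((FAILS + 1))
    if [ "$FAILS" -ge "$MAX_FAILS" ]; then
      power_cycle
      FAILS=0
    fi
  fi
}
```

Calling watchdog_step from a loop with a sleep between iterations gives the basic behavior. Requiring several failures in a row is the design choice that matters, since an occasional lost ping is normal.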

On Friday I should have unplugged the cable modem and moved it back to the rebooter power outlet. But I was literally on my way out of the house to go out of town for the weekend. The Number One Rule of IT was looming large in my mind: “If it’s working, don’t mess with it”.

The cable modem was working. Power cycling it right before leaving seemed foolish. I could fix it at my leisure on Monday when I got back.

Ah, Murphy, I am so sorry I tempted you that way.

Of course within hours of me actually *being* out of town, the cable modem wedged for some inexplicable reason. I had no VPN access to my home network the entire time I was gone. It wasn’t a big deal, but it was annoying, especially since I knew exactly WHY the modem hadn’t rebooted itself automatically after getting wedged, and yet I was a few thousand miles away and unable to fix it.

Maybe Thoreau was right.

NumerousApp

Here’s what I’ve been working on in the “office” … integrating different things into www.numerousapp.com

You can monitor:

Check out the iPhone numerous application (Android coming soon; these guys have just gotten started). Disclaimer: they are friends of mine and I am an investor/advisor.

SNMP at home

Several devices on my home network support SNMP. Here’s one way I’m making use of this.

Want to know the last time your router rebooted? Try this:

snmpget -c public server-name system.sysUpTime.0

That UNIX command will work as-is on a Mac. It will also work on Debian Linux, but you may have to install the snmp package first and you’ll also have to download the so-called “MIB” that defines the “system.sysUpTime.0” name.

Without the MIB installed you can still just use the raw low-level object ID:

snmpget -c public server-name 1.3.6.1.2.1.1.3.0

Some (older) Apple airports, most Cisco devices, many routers, and other such networking equipment support SNMP. So, for example, on my (large) home network I have four Cisco WiFi transmitters. I’ve been interested in charting how often the power fails at my house. So I wrote a script to send each WiFi box this command once a day:

nw% snmpget -c public wap-1 system.sysUpTime.0

DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1122240526) 129 days, 21:20:05.26

(The command for server “wap-1” is shown.) I record the results. Obviously, when there’s been a power failure all the WAPs reboot, and now I can tell when that has happened (the server these monitoring scripts run on has a UPS so it stays up; otherwise I could of course just look at the server’s own uptime).
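
Turning that output into something a script can compare takes only a little parsing. The sketch below is one way to do it, not the script described above: the function names uptime_seconds and rebooted are hypothetical. The number in parentheses is a count of Timeticks, which are hundredths of a second, so the sample line works out to 11222405 seconds.

```shell
# uptime_seconds: read one snmpget sysUpTime line on stdin and print the
# uptime in whole seconds. Timeticks are hundredths of a second.
uptime_seconds() {
  ticks=$(sed -n 's/.*Timeticks: (\([0-9][0-9]*\)).*/\1/p')
  [ -n "$ticks" ] || return 1     # no Timeticks value found
  echo $((ticks / 100))
}

# rebooted: succeed if the uptime went backwards between two readings,
# i.e. the device restarted somewhere in between.
# usage: rebooted PREVIOUS_SECONDS CURRENT_SECONDS
rebooted() {
  [ "$2" -lt "$1" ]
}
```

A daily job could pipe `snmpget -c public wap-1 system.sysUpTime.0` into uptime_seconds and compare the result with the previous day’s value. One caveat: the counter is 32 bits wide and wraps after roughly 497 days, which this check would misread as a reboot.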

If you google “snmp system object” you will find some pages with more names/OIDs for other variables that might be useful. Also your devices likely have many interesting device-specific parameters you can fetch. Google is your friend for finding more MIB/OIDs to try.

My pfSense router supports SNMP but you have to enable the SNMP service first (not enabled by default). This may be true on other device types as well so if they don’t respond to SNMP browse through their admin interfaces and see if you have to (i.e., can) enable SNMP.