Random Python Performance Posting

I’ve been writing some very elementary probability simulations in python and wanted to simulate millions of coin flips. I started out with this somewhat-obvious code:

    ht = [0, 0]
    for i in range(n):
        ht[random.choice((0, 1))] += 1

which counts heads/tails in ht[0] and ht[1] and performs n flips.

Can this code be sped up? Inserting time.time() calls before/after the “for” loop, I found that this code can perform 10 million flips in 11.8 seconds. So it’s about 1.18 microseconds per flip.
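
For reference, the timing harness was nothing fancier than this; the snippet below is a reconstruction of the idea rather than the exact code I ran:

    import random
    import time

    n = 10 * 1000 * 1000        # ten million flips

    ht = [0, 0]
    t0 = time.time()
    for i in range(n):
        ht[random.choice((0, 1))] += 1
    elapsed = time.time() - t0
    print("%.2f seconds total, %.3f microseconds per flip" % (elapsed, (elapsed / n) * 1e6))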

[ aside: all measurements were made on my MacBook Air. I tried to avoid doing anything else while running the timing tests, and in all cases took the fastest result I could get. All results for a given method were similar and were run multiple times at multiple different points in time, so I believe there were no confounding factors with background activities. Your mileage may vary. ]

Borrowing a trick from the old geezer days, I thought “surely if we unroll the loop this will go faster”.  So I changed it to “for i in range(n//100)” and then put 100 of the “ht[random.choice(…” statements in the body of the loop.

Result? 1.16 microseconds per flip. Ok, that’s faster, but by less than 2%. Kudos to python for having efficient looping, plus, of course, the random() call dominates the cost anyway.

Ok maybe random.choice is slower than explicitly asking for a random number:

    ht = [0, 0]
    for i in range(n):
        ht[random.randrange(2)] += 1

Nope, that came in significantly slower! 1.33 microseconds per flip.

Maybe I could speed it up by eliminating array access and just adding up +1 and -1 values in a loop and reconstructing the heads vs. tails numbers after:

    delta = 0
    for i in range(n):
        delta += random.choice((-1, 1))

That ran as fast as 1.14 microseconds per flip, but still just an insignificant improvement over the original (and arguably more “obvious”) code.

Of course, by this point readers are screaming at me “duh, all the time is in the random function and there’s nothing you can do about it” and they are right. I should have tested that first. For example, this code:

    ht = [0, 0]
    for i in range(n):
        ht[1] += 1

runs at 0.11 microseconds per “flip” (albeit there is no flip, it just always adds to ht[1] in this example). Or if we go all the way to the delta method, it will be as little as 0.06 microseconds per “flip” for this:

    delta = 0
    for i in range(n):
        delta += 1

Obviously there is no “flip” here, but it’s fast.

Of course, we can use a single call to random to generate more than one random bit at a time. In C we might generate a 32-bit random number and then peel off the bits one by one. I’m guessing that doing this in python won’t be as efficient, but there’s still a way to easily amortize one call to the random method over multiple flips. For example, we can enumerate all sixteen possibilities for four flips. In binary, they would be: 0000, 0001, 0010, 0011, … 1100, 1101, 1110, 1111. We can note that the first (“0000”) is 4 heads (arbitrarily calling 0 heads; it doesn’t matter which) and 0 tails. The next one (“0001”) is 3 heads and 1 tail, and so on. The complete enumeration of the outcomes looks like this:

    fourflips = ((4, 0),    # 0000                                              
                 (3, 1),    # 0001                                              
                 (3, 1),    # 0010                                              
                 (2, 2),    # 0011                                              
                 (3, 1),    # 0100                                              
                 (2, 2),    # 0101                                              
                 (2, 2),    # 0110                                              
                 (1, 3),    # 0111                                              
                 (3, 1),    # 1000                                              
                 (2, 2),    # 1001                                              
                 (2, 2),    # 1010                                              
                 (1, 3),    # 1011                                              
                 (2, 2),    # 1100                                              
                 (1, 3),    # 1101                                              
                 (1, 3),    # 1110                                              
                 (0, 4))    # 1111           

and now our loop could look like this:

    ht = [0, 0]
    for i in range(n//4):
        flips = random.choice(fourflips)
        ht[0] += flips[0]
        ht[1] += flips[1]

Running this code we get 0.330 microseconds per flip. About three and a half times as fast as the original code – which makes sense, because we are calling the random function one fourth as many times and that call dominates the timing.

Of course, in doing this optimization I’ve lost the data on individual flips and now I only know the relative number of heads and tails for the four flips that are grouped together. In other words, I can no longer distinguish between a group of four flips that were “HTHT” vs “HHTT” or many other combinations that will all appear as just (2, 2) when returned by random.choice. In my application this is unimportant. If it were important, I could of course have stored the individual flips in the fourflips variable, constructing it this way:

    fourflips = (
            (0, 0, 0, 0),
            (0, 0, 0, 1),
            (0, 0, 1, 0),
     ... etc

and processing it accordingly. I stuck with the aggregated counts because that’s all I cared about in my simulations.

We can take this one level further and combine the running delta method (+1 for a heads, -1 for a tails, and reconstruct the final number of heads and tails from the total number of flips and the delta), plus we can compute the delta ahead of time for each element of the fourflips list (e.g., (0, 4) is a delta of -4, (2, 2) is a delta of 0, etc). Combining those concepts we get:

    fourdeltas = (4, 2, 2, 0, 2, 0, 0, -2,
                  2, 0, 0, -2, 0, -2, -2, -4)
    delta = 0
    for i in range(n//4):
        delta += random.choice(fourdeltas)
    # reconstruct the counts: heads + tails = flips performed, heads - tails = delta
    flips = (n//4) * 4
    ht = [(flips + delta) // 2, (flips - delta) // 2]

which gets us all the way down to 0.285 microseconds per flip.

How far can this optimization go? As far as you have patience for and as large as you want the list of deltas to be. I tested the same idea with five flips precomputed as deltas (i.e., five bits at a time) and got down to 0.232 microseconds per flip. There’s a diminishing return for each additional bit: at five bits we’re calling the random function 20% as often as the original code, but at four bits we were already down to 25%, so the improvement isn’t as dramatic. The deltas list also doubles in size with each additional bit, though since it can be precomputed by a program it’s not clear there’s a pragmatic limit. I don’t know whether at some point the length of the deltas list makes random.choice execute slower; I doubt it (random.choice just picks a random index into the sequence), but it depends on implementation details.
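
For what it’s worth, the deltas table doesn’t have to be typed out by hand. Here’s a sketch of generating it for any number of bits (not the code from my simulations, just an illustration):

    import itertools
    import random

    def make_deltas(k):
        # every k-bit outcome, in binary counting order; a 0 bit is a head (+1),
        # a 1 bit is a tail (-1), and each table entry is heads minus tails
        return tuple(sum(1 if bit == 0 else -1 for bit in bits)
                     for bits in itertools.product((0, 1), repeat=k))

    k = 5
    deltas = make_deltas(k)          # 2**k entries; make_deltas(4) reproduces fourdeltas
    delta = 0
    for i in range(n // k):          # n = total number of flips, as before
        delta += random.choice(deltas)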

For my purposes I was happy at five bits per call to random.choice and 0.232 microseconds per flip, which is more or less five times faster than the original.

 

USENET lives – sort of

I was “there” during the early days of Usenet, back when:

  • News was updated only twice a day, when our server dialed up another machine we had a mutual exchange with.
  • Business cards had bang-paths on them. For example: harvard!cfisun!palladium!nw on an old one of mine.
  • Articles were expired after just a few days because disk space was perpetually scarce.
  • The Great Renaming happened (1987) and we went from everything being “net.foo” to comp.foo, news.foo, talk.foo, etc.

I just naturally assumed the modern world had done away with usenet, so I was amused/surprised to find it (“USENET”) as the answer to this clue in today’s Wall Street Journal crossword puzzle:

I have to wonder what percentage of Wall Street Journal crossword puzzle enthusiasts have ever heard of usenet, let alone ever posted on it!

P.S. There is no Cabal.

Fanless FreeBSD – Kingdel PC

Being a crusty UNIX guy, sometimes I prefer FreeBSD as a dedicated headless server instead of Linux. I recently needed a quiet (fanless) box and purchased this Kingdel box from Amazon.

Front view:

Rear panel:

It came with Windows 10 pre-installed, which I promptly wiped out with a full installation of FreeBSD 11.1 (amd64). There were only two tricky parts that I’m documenting here in the hopes that someone’s google search will stumble upon them if needed.

First, the BIOS was configured with only a 1 second delay for hitting the magic key (DELETE) to abort the Windows 10 boot. I couldn’t remember the right key (is it always DELETE these days?) and since the delay was so short I couldn’t read the “hit DELETE to stop boot” message on the power-up screen. Google to the rescue, and then “keep pressing DELETE over and over again during power up” worked.

Second, I had to fool with the BIOS settings to get it to recognize my external USB CD-ROM drive (containing the FreeBSD iso installation image). I had to change the power-on device recognition delay from “automatic” to “manual” and put in a 5 second delay, which made it work. Your mileage may vary depending on what external CD-ROM drive you have. I’m using one that is literally a decade old. It seems clear the Kingdel people (reasonably) turned all the delay knobs to the minimum values to speed bootup into the pre-installed Windows.

A note on how to make the WiFi work. The FreeBSD name for the WiFi device is iwn0. Follow the standard instructions for configuring FreeBSD WiFi, but note that they are written for the “ath” driver not the “iwn” driver (so substitute accordingly).

This means put the following into /etc/rc.conf:

wlans_iwn0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"

and create the file /etc/wpa_supplicant.conf containing (for example):

network={
   ssid="put SSID here"
   psk="put password here"
}
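
One more note in case the adapter doesn’t show up at all: depending on the exact Intel chipset, the iwn driver and its firmware may need to be loaded at boot via /boot/loader.conf, along these lines (the firmware module name varies by chipset, so adjust accordingly):

if_iwn_load="YES"
iwn6000fw_load="YES"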

Your mileage may vary depending on your specific WiFi configuration requirements; in my case I tested this procedure just to make sure the WiFi adapter works (it did) but for my application my device is hardwired.

Two LAN interfaces, a WiFi interface, four serial ports, a zillion USB ports; Kingdel markets this as an “Industrail” computer (note misspelling, lmao). I’m using it to run a bunch of automation scripts and python code and the like, for which it is overkill (I had been running this on a Pi) but still silent.

Christmas Lights 2017

This is an update on the technology being used to drive my Christmas light display this year.

QUICK OVERVIEW

I have about 1100 feet of Red/Green/Blue LED strip around the perimeter of my roof. Because of length limitations, this is implemented as eight separate sections, each with its own separate controller. There is also a separate section on the observation tower, with its own controller.

These controllers can receive infrared (i.e., standard remote-control) commands. To control them I built an IR repeater circuit using an Arduino and wrote some python code to drive everything.

Last year’s detailed write-up is still generally correct, aside from anything new I write about here.

THIS YEAR’S IR REPEATER CIRCUITRY

At the end of last year I prototyped a modification for controlling the roof perimeter (8 strands / 8 controllers) separately from the observation tower. With this modification I can have the roof perimeter all be one color while the observation tower is another color, or blink the tower but not the roof, and vice versa. I still can’t control individual strands on the roof perimeter (entire perimeter will always display identical color); however, since the boundaries of these strands are haphazard (they occur wherever one strand ends and another begins) and coarse (there are only 8 strands across the entire 1100 feet of perimeter), it’s not clear that viable effects could be had by controlling those individually. Though I may try that next year anyway (haha of course).

This year I implemented a new version of the IR repeater circuitry to let me control the roof perimeter (as a whole) separately from the tower. The new circuit looks like this (click image to open it full size if you want):

This is just a refined version of the modification I performed last year.

To control the roof perimeter, I have eight individual infrared emitters taped onto the receiving area of each of the eight controllers (one controller per strand) scattered around my roof perimeter. The leads from these emitters are connected to wires that all run back to one spot at my house where I have connected them all in series, with additional components as shown in the above circuit diagram.

The advantage, in my application, of connecting them all in series is that if any one of them fails, all eight of the lighting strands become non-responsive; I’d rather have that than have all but one of them responding to commands. One strand stuck on the wrong color would look wrong, and a total failure would make me notice the problem right away. This is all theoretical, as no IR emitter (or the wire leads to one) has failed this year or last.

Power for the IR emitters is supplied this year by a 12V regulated power source.  Last year I used an unregulated wall-wart; this year I am using a scavenged PC board power supply.

The 38kHz IR digital PWM signal comes out of my Arduino on pin 3, all as described in last year’s write-up. This signal drives a MOSFET gate to modulate the power to the entire string of IR emitters (which together require more power than the Arduino can drive directly; hence the 12V supply for that part of the circuit).

However, rather than feeding the 38kHz signal directly to the MOSFET gate, it is split into two and fed into two separate AND gates on an SN74HCT08 quad two-input AND chip. The two “enable” lines – ENA1 and ENA2 – are just simple digital outputs from the Arduino and allow me to separately enable the signal on its way to the two different MOSFETs. By turning ENA1 and ENA2 on/off in my code, I can determine whether IR commands will go out to just the roof perimeter, just the tower, or both.

Although we might casually think of the HIGH and LOW inputs on logic chips as being 5 volts vs zero, the TTL spec is broader than that and allows an output HIGH to be as low as 2.7 volts. It turns out the SN74HCT08 AND gate output is higher than that, but it is still not high enough to drive the MOSFET gate directly the way I was doing when it was driven straight from the Arduino output pin. For this reason I also inserted a TC427 MOSFET gate driver into the MOSFET gate path. This chip converts a TTL-level input into a rail-to-rail signal (5V/0V in this case) suitable for driving a MOSFET gate. In general it’s probably a best practice to use a driver chip like this for MOSFETs anyway, even if you are coming directly out of an Arduino pin with enough voltage for the 4.5V logic-level requirement of this particular MOSFET’s gate.

SOFTWARE

I wrote about my software extensively before and put a repository, arduino-json-IO, on github that implements a tiny web server in an Arduino and allows you to send it commands to perform various digital I/O operations. One of those commands allows you to send PWM-modulated IR codes. This makes extensive use of the Arduino IRRemote library to do the actual PWM control.

The IRRemote library outputs these PWM waveforms on pin 3, which becomes the “IR” signal in my circuit diagram above.

The new question, with the enable lines, becomes how to manage those. I could just have used the existing capabilities of arduino-json-IO and explicitly managed the enable lines by writing pseudo-code like this:

# to do something with just the tower
POST "set ENA1 low" command to arduino
POST "set ENA2 high"
POST "IR command for a tower color"

but this is cumbersome and, more importantly, it requires multiple HTTP transactions between the python code driving all this and the poor little arduino generating the IR codes (and enables). Of course, this could be factored out, since we only need to send the enable line commands when they need to change from their current state, but that would require keeping track of the output enable states and would be subject to getting “out of sync” if, for example, the arduino server rebooted due to a bug or a power glitch.

To avoid all that, I decided to customize the generic arduino-json-IO library to add the enable lines directly into the JSON structure sent along with each POST request to the IR emitter code. The way it works now is that the enable lines are set high when an “enable: xxx” directive is encountered in the JSON (“xxx” being the pin to set high) and any pin that was set high as a result of doing that is returned to LOW when the POST request processing is finished. This makes the management of the enable pins be, essentially, an “atomic” operation tied in with each individual POST request that sends IR codes.

The revised code is available here:

neilwebber.com/files/xmas-led/IR-enables.ino

Admittedly this isn’t as “generic” as it could be, but the beauty of something like Arduino is that it’s not unreasonable to customize the embedded software for a specific application, which is exactly what is going on here with this modification.

Given all that let’s review the old way a “heartbeat” effect was created with the arduino-json-IO IR POST command. The JSON I sent looked like this:

[{"codes":
  [{"protocol":"NEC","bits":32},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":175000},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":1050000}
  ],
 "repeat":10}]

The minimum gap that I found reliable between IR commands was 175msec (175000 microseconds). Call that period of time a “beat”. The above JSON commands the lights to be RED (16718565) for 3 “beats” (about half a second – 525msec), OFF for one beat (175msec), RED for 3 beats, OFF for 6 beats, and then repeats that entire cycle 10 times. This creates a “heart beat” like effect on the lights, all with one POST operation to the arduino server.

With the enable-line modification, that POST request now looks like this:

[{"codes":
  [{"enable":6},
   {"enable":7},
   {"protocol":"NEC","bits":32},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":175000},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":1050000}
  ],
 "repeat":10}]

Here, pins 6 and 7 are my ENA1 and ENA2 pins (roof perimeter enable and tower enable). The arduino server will drive those pins HIGH when the “enable” element is encountered in the “codes” sequence, and will return them to LOW at the end of the “codes” sequence. In this way the management of the enable pins becomes atomic and stateless with respect to any given POST operation.
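
For the record, sending this from python is nothing exotic; stripped of the XMASLED wrapper described below, it amounts to something like this (the URL, including any path, is made up – use whatever address your arduino-json-IO server actually answers on):

import requests

ARDUINO_URL = "http://192.168.1.50/"       # hypothetical address for the arduino server

heartbeat = [{"codes":
              [{"enable": 6},
               {"enable": 7},
               {"protocol": "NEC", "bits": 32},
               {"code": 16718565, "delay": 525000},
               {"code": 16732335, "delay": 175000},
               {"code": 16718565, "delay": 525000},
               {"code": 16732335, "delay": 1050000}],
              "repeat": 10}]

requests.post(ARDUINO_URL, json=heartbeat)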

I wrote some python library code to encapsulate all this into an “XMASLED” object, with methods such as “heartbeat” that generate the above JSON and post it to the server. The question then became how to control which enable lines to turn on/off in any given request. I decided to use python context managers for this, instead of explicit “enable” / “disable” method calls. Conceptually the XMASLED object contains two state variables for the enables – “enable_tower” and “enable_perim” – and the various methods such as heartbeat() use them to form the above JSON. The only thing the context managers do is provide syntactic sugar allowing these variables to be saved and automatically restored to their prior values on return (or exception) from a nested block. Thus, the python code to run the heartbeat routine only on the tower, while having the roof be green, looks something like this:

# "X" is the XMASLED object
X.send(X.GREEN)    
with X.tower_only():
    X.heartbeat()

Arguably this is overkill; it wouldn’t have been the end of the world to write:

# (assume both enables are ON already)
X.send(X.GREEN)
X.disable(X.PERIMETER)
X.heartbeat()
X.enable(X.PERIMETER)

but the context manager way seemed a lot prettier, and it is robust against any exceptions (e.g., network down) that might throw us out of heartbeat and up to some higher level without knowing that the internal state for the perimeter enable was still “off”.
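
For the curious, the context-manager mechanics are only a few lines. This is a stripped-down sketch (not my actual class) of how tower_only() can save and restore the enable flags:

import contextlib

class XMASLED:
    def __init__(self):
        # both enables default to on: commands go to the roof perimeter and the tower
        self.enable_perim = True
        self.enable_tower = True

    @contextlib.contextmanager
    def tower_only(self):
        # save the current enable flags, switch to tower-only, and restore the
        # saved values on the way out -- even if the body raises an exception
        saved = (self.enable_perim, self.enable_tower)
        self.enable_perim, self.enable_tower = False, True
        try:
            yield self
        finally:
            self.enable_perim, self.enable_tower = saved

    def heartbeat(self):
        # builds the JSON shown earlier, including an "enable" entry for each
        # enable flag that is currently True, and POSTs it to the arduino
        ...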

It’s hard to know where to stop with this idea of using a JSON data structure as a primitive programming language to have the arduino drive the IR emitters on its own. I’ve drawn the line at the spot we see here; enable pin management, sequences of individual codes, intra-code delays, and repeat counts can all be specified in a single POST command. Anything else requires multiple posts to the Arduino and management by higher-level code (i.e., python in my case).

NEXT YEAR

As I wrote about last year, these cheap controller boxes for the LED strands are really the wrong solution for this application. It’s fun that I’ve managed to build an integrated control system to operate 9 of them in unison via wifi and a baby web server interpreting JSON POSTs,  but every now and then one of the controllers misses a code (just like sometimes your TV seems to miss a button press on your remote control) and shows the wrong color. Plus there are other features people clamor for (“Can the lights change with the music?”) that can never be pragmatically implemented so long as my only control mechanism is limited to imitating an IR remote control.

So, I’m not sure about next year; I think I will be investigating higher-end commercial-grade control systems that already have integrated networking capability and are meant to be controlled “at scale” with multiple units at once. We’ll see…

Redeploying the Christmas Lights!

Almost ready to go again this year. I had a case from my now-dead soekris system; took everything out except the regulated 12V power supply and repurposed the case to hold all this nonsense (including the new “control the tower separately from the roof perimeter” circuit design).

Working in the lab – now to hook it up to the real LED strings (they went up on the roof this week but are currently dark until I hook this back up). Hoping nothing explodes!

My Lutron Experience

I have three Lutron home automation controllers in my house. They operate the motorized window shades and the exterior landscape lighting. My architect wanted me to have many more of these – to control all of the interior lighting. I vetoed that idea and insisted on regular, “you can buy them at home depot” switches for all my interior circuits. I am so glad I did that!

Here’s my Lutron installation with the covers off:

Three Lutron Automation Processors

The reason the covers are off today is that two of them died in a recent power failure. This happens “often”: this is the third time in nine years of owning these that I’ve had to call the automation company in to replace them.

Maybe you don’t think three times in nine years is “often” – but let me ask you this. When was the last time you replaced your microwave oven because of a power failure? How about your TV? Look around your house at all the equipment these days that has a computer inside it – pretty much every appliance you own has one. How many of them have you ever had to replace simply because the voltage fluctuated during a storm and killed the device?

I’m sure it happens from time to time, but the consumer-grade appliance manufacturers know that they would have a very bad reputation if their equipment died all the time in power failures. Lutron? Apparently doesn’t care. These processors must have little or no input voltage protection and any glitch on the power lines burns them out. Then, even if just one of them burns out, you end up having to replace all three because the company is constantly obsoleting old versions of these processors when new ones are released. New ones won’t interoperate with old ones.

It’s outrageously bad engineering and it’s hard not to point out that this bad engineering increases sales of the Lutron devices and the billable-hours of the installation/programming service providers.

I “fixed” the “one failed, but you have to replace all three” dilemma by stocking several additional processors the first time I got hosed by that. Unfortunately, today I am having the last spare installed and the next power-glitch will force an upgrade of all three even if just one dies. I am now investigating front-ending the power inputs on these devices with some server-room grade power conditioning instead.

Never, ever, ever, ever, ever allow anyone to talk you into installing this product in your house.

Netgate SG-4860 installed

Finally got rid of the last soekris/pfsense router in my empire. This sg-4860 replaces a net6501-70 that had 8 intel interfaces. I “need” (well, use) five, and have plans for a sixth subnet. The Netgate box has six interfaces so it suffices both for the current needs and the planned one additional subnet. I don’t anticipate ever going beyond the sixth subnet, and if I do there are always VLAN trunking options to get more interfaces out of the existing box (and/or multi-hop routing via a secondary router).

Installation went without any glitches. Still running pfsense in basically the same configuration; just had to update the interface names in the configuration XML file.

Now the question is what to do with an old, but perfectly functional, nanoBSD/freeBSD box…

Ordered replacement for my last Soekris router

I am down to my last (and largest configuration) www.soekris.com net 6501 pfsense router, and just ordered a replacement for it from netgate. I’ve already replaced two other routers in my world (at other locations) with netgate products. The nice thing about them is they are directly supported with pfsense, so it’s just an easy way to go once you’ve decided to run pfsense.

This last one, at the hilltop, has been up now for over 454 days:

The router is (obviously) on a UPS. I’ve had the router for much longer than that; I’m not entirely sure what made me reboot it over a year ago – probably a software upgrade.

Alas, it is time to replace it, primarily because I want to be able to run the newest versions of pfsense that no longer support 32-bit platforms. This box can run in 64-bit mode, but the board itself lacks one specific feature the generic freeBSD 64-bit build requires. I know I can still run pfsense by taking the stock distribution and wedging in a custom kernel build, but it just seems wiser to replace this box with something newer and fully supported anyway.

I took the easy (albeit expensive-ish) way out and ordered a netgate SG-4860-1U. I use 5 different networks in my configuration (only four made it into the screen capture) and though I could certainly achieve that via “router on a stick” with VLAN trunking and a suitable switch, I prefer to have a router with true multiple NICs on general principles.

Not sure what I will do with the soekris box when the new netgate gear arrives; it makes a great Unix freeBSD sandbox but I really have no use for such a thing. Maybe I can turn it into some ridiculous lego contraption controller someday 🙂

Amazon AWS Route53 Region: us-east-1

This is one of those things that seems hard to find even though it is in fact documented, so I thought I’d post this note in the hope that someday it will pop up on someone’s google and be helpful.

So, here are some keywords of note: This is about Route53, the DNS service in Amazon AWS, and the “region” field. I ran into it while using the DynamicDNS feature in my router (pfsense), which can directly update a Route53 record. But it wants the ZoneID in this form:

REGION/ZONEID

I had a ZoneID — they look something like “Z2X8NGLIQTGFO4” (I’ve altered this from what my real ZoneID is of course). But I didn’t know what my region is. In general “my” (best/default) region is “us-west-2” but that didn’t work (generated a complaint about an invalid region). I couldn’t find any way to reveal what the correct region for my Route53 service was.

The reason is … all Route53 services are in us-east-1. That is in fact documented but you really have to dig into the AWS docs to find it if you didn’t already know where to look. So, since it took me a while to find, I wrote this note, in the hope that someone else might stumble onto it via google and get to this answer more easily than I did.
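
Spelled out with the (altered) ZoneID from above, the value pfsense wants is us-east-1/Z2X8NGLIQTGFO4, no matter what region the rest of your AWS resources live in.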

It’s extremely frustrating because the user interface will show you the ZoneID but seems to have no information at all on the Region. It would have been nice if they threw that in the info panel even though the answer is always just us-east-1. Oh well.