Building an HTTP/POST request/response protocol

In the previous post, Python Simple Client/Server Socket Communication Module, I began exploring using python’s http.server module to build a simple HTTP server as a framework for a request/response protocol that could invoke a remote function and return some results to the client.

My goal is to be able to write code like this:

# ON A SERVER MACHINE
def foo(input_dictionary):
  results = ... do something with input_dictionary
  return results

server = MakeServer(port=12345, func=foo)

# ON A CLIENT MACHINE
client = MakeClient(port=12345)

input_dictionary = { something ... }
results = client.command(input_dictionary)

and essentially have a trivial remote procedure call mechanism allowing the client code to invoke the foo() function on the given input.

We’re going to build this using python’s http.server module. We are going to POST to “/” (indeed, we are going to ignore the path parameter), and the data of the post itself will be a JSON encoded python object. Rather than having to parse a “Data-length” line or anything like that ourselves, the HTTP protocol will handle that part for us; all we have to do is pluck the data length value out of the HTTP header and then read and decode the posted body ourselves.

Expanding on the do_GET example from my previous post, here’s a do_POST server function:

from http.server import HTTPServer
from http.server import BaseHTTPRequestHandler
from http.server import HTTPStatus
import json

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        data = self.rfile.read(datalen)
        obj = json.loads(data)
        print("Got object: {}".format(obj))
        self.send_response(HTTPStatus.OK)
        self.end_headers()

server = HTTPServer(('', 12345), MyRequestHandler)
server.serve_forever()

This simple code needs more error checking. There is also an implicit conversion between 8-bit over-the-wire bytes and python internal string representation (full unicode) hiding inside the json.loads function. In fact, if you are running a version of python older than 3.6 you might be getting an error message related to this and the distinction between a python bytes object (returned by read) and a python string (expected by json.loads). So let’s fix that before going further.

In the good old days, all computing was done in English and the 7-bit ASCII character set was good enough for everyone. All characters fit into one byte, and bytes and strings could be treated as more or less interchangeable things. Obviously, those days are long gone. Even if you say you don’t care about Japanese/Chinese/etc speakers and their ability to send their characters (which would be a mistake of course), even English users demand full Unicode support if for no other reason than to be able to put smiley faces and other emojis into their data. Unicode 😀❤️🐳💡🎉 Happens!

Starting with Python 3, Python uses Unicode as the native format for strings. The good part about this is that all your code will automatically work with all of those character types. The bad part is that you have to be cognizant of how Unicode (more than 8 bits per character) interfaces with parts of the world that operate on 8-bit bytes – like TCP streams, for example. Arguably this “bad” thing is actually a “good” thing as it forces you to make your code work for everyone, even people who don’t use only ASCII.

So to see Unicode in action in python, try this:

s = '\U0001F600'
#     ^ **NOTE** that's an uppercase U
print("s = /{}/ and len(s) = {}".format(s, len(s)))

This will show you:

s = /😀/ and len(s) = 1

Note that len(s) is 1: the emoji is a single character in Python’s internal representation.

It’s beyond the scope of this post to explain why we usually consider “native Unicode” to be an internal representation of characters and use a different external representation when sending Unicode strings “over the wire” in a protocol. I am just pointing out that this is something we have to do – pick an encoding method and use it properly on both ends.

The most common standard encoding for applications like this is utf-8. It has the advantage that the first 128 ASCII characters (all the “good old days” characters) are still encoded the same way, as a single byte, which tends to increase interoperability with naive/old programs that are not Unicode enabled (in fact this is one of the “beyond the scope of this post” reasons to encode characters rather than send everything in its 32-bit raw Unicode glory).

So we are going to convert our internal strings into a python bytes object on one side, and back on the other. A bytes object is an iterable sequence of 8-bit integers (0 .. 255), and has a decode method for converting that sequence of bytes into a Unicode string. For example:

>>> letterA = bytes([65]).decode('utf-8')
>>> print(letterA)
A

What this is showing you is that a single byte, with value 65, when decoded using the ‘utf-8’ encoding, becomes a string of one character, an uppercase A. This is demonstrating a property of utf-8, namely that the original (“good old days”) 8-bit ASCII character values are encoded as themselves. Other characters are encoded using multiple bytes, so for example:

>>> jpnhouse = bytes([229, 174, 182]).decode('utf-8')
>>> print(jpnhouse)
家

this is demonstrating that the three-byte sequence [229, 174, 182], when decoded using ‘utf-8’, will become a single character that is (I think) the word “house” in Japanese.

We don’t really need to understand encodings other than to know the encode/decode steps are there, have to be performed, and have to use the same encoding on both sides of the wire. Starting in python version 3.6, the json.loads function will accept a bytes object and do an implicit utf-8 decoding for you. This is why the first example code given up above “works” if you are running python 3.6 or later, but will fail with a complaint about strings versus bytes on earlier versions. I think it is better practice for us to make that step explicit, which also has the benefit of making the example code work on older versions of Python 3.

With the explicit decode step the code becomes:

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        data_bytes = self.rfile.read(datalen)
        data_str = data_bytes.decode('utf-8')
        obj = json.loads(data_str)
        print("Got object: {}".format(obj))
        self.send_response(HTTPStatus.OK)
        self.end_headers()

This still isn’t sending any response (other than the “OK” HTTP code) so let’s add that. This revised handler takes out the print on the server side and wraps the received object inside another dictionary {'echo': obj} and then sends that back to the client as the response data:

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        obj = json.loads(
            self.rfile.read(datalen).decode('utf-8'))
        rslt_str = json.dumps({'echo': obj})
        rslt_bytes = rslt_str.encode('utf-8')
        self.send_response(HTTPStatus.OK)
        self.end_headers()
        self.wfile.write(rslt_bytes)

This works – but before putting it into production it should probably be enhanced in several ways. It leaves out several HTTP headers in the response; we should probably fill in Content-Type (‘application/json’ would be appropriate) and Content-Length. It turns out Content-Length isn’t “needed” because the default response format is HTTP/1.0 which defines the length by closing the stream at the end. But we might want to use HTTP/1.1 in the response format and include a Content-Length (and thus also allow for persistent connections which were not supported under the HTTP/1.0 format). All of these elaborations are left as an exercise for the reader at this point, in consultation with the http.server module documentation. I made many of these improvements in the code that I will also be posting on github (TBD at the time of writing this posting).
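For what it’s worth, here is a sketch of what those header enhancements might look like (not the fully error-checked version; just illustrating send_header and the HTTP/1.1 protocol_version attribute):

class MyRequestHandler(BaseHTTPRequestHandler):
    # Responding as HTTP/1.1 allows persistent connections, but then we
    # must tell the client how long the body is via Content-Length.
    protocol_version = 'HTTP/1.1'

    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        obj = json.loads(self.rfile.read(datalen).decode('utf-8'))
        rslt_bytes = json.dumps({'echo': obj}).encode('utf-8')
        self.send_response(HTTPStatus.OK)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(rslt_bytes)))
        self.end_headers()
        self.wfile.write(rslt_bytes)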

With this primitive server we have enough of a framework to use curl to fire commands at a server and have it invoke some function (in this case hardwired to “encapsulate the object and return it”) and return results to the client. This works really well with not very many lines of code!
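For example, with the server above running, firing a POST at it from another window looks something like this (the exact formatting of the reply may differ slightly):

% curl -X POST -d '{"hello": "world"}' http://localhost:12345/
{"echo": {"hello": "world"}}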

Let’s build an explicit client instead of using curl (although being able to use curl does demonstrate one of the advantages of picking a standard transport protocol such as HTTP/POST). Here is a bare bones test request function:

from http.client import HTTPConnection
import json

def testrq(obj):
    c = HTTPConnection('localhost', 12345)
    c.connect()
    encoded = json.dumps(obj).encode('utf-8')
    c.request("POST", "/", body=encoded, headers={})
    response_bytes = c.getresponse().read()
    response_string = response_bytes.decode('utf-8')
    return json.loads(response_string)

This connects (to hardwired localhost:12345) and sends whatever object (obj) you provide, gets the response (see the http.client / HTTPConnection documentation) and decodes it as a JSON object. Obviously all semblance of generalization and error checking has been omitted here. But this works.
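Using it from the interactive interpreter, against the echo server above, looks something like this:

>>> testrq({'hello': 'world'})
{'echo': {'hello': 'world'}}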

Now that we know how to send arbitrary python objects back and forth from client and server we can work on building a real, but still simple, generalized framework for all this. That will be the next post.

 

Python Simple Client/Server Socket Communication Module

I wanted a python module with a simple client/server request/response protocol … something that would let me invoke a function remotely, with code looking something like:

def server_function(input_dictionary):
  ... do something with the input_dictionary ...
  return {'result': 'blah blah blah'}
  
def client():
  server = ... connect to the server ...
  request = { some dictionary of stuff here }
  result = server.command(request)
  ... result is a dictionary as returned by the server

The client would formulate a request as a dictionary, send it to the server, the server processes it, formulates a dictionary as a reply, and returns that to the client. In effect this is a simplified form of a general remote procedure call system.

Google to the rescue? Yes, there are any number of github repositories and official modules out there that sort of do this. However, I had three problems with all the ones I found:

  • They had bugs in them regarding TCP stream semantics or, if not outright bugs, limitations that wouldn’t generalize to transferring large amounts of data in a single request/response pair.
  • They worked at a byte stream level of abstraction, and I wanted a higher-level “message” or “packet” abstraction (more like the above “send a dictionary, get a dictionary” model).
  • OR … they were huge frameworks that were very powerful but felt like overkill for my application. I didn’t need to set up an entire REST API, I didn’t need the features of a “real” application server, etc. Of course this is dangerous thinking, in that anything that starts out as a trivial 100-line hack might someday grow into something real. Nevertheless, I decided to proceed with a small implementation of something for my specific purpose. Though, as we’ll see, I did conclude that HTTP/POST was a suitable transport mechanism and ended up implementing what can only be thought of as a completely degenerate (one URL) imitation of a REST API. It’s up to you whether to think of what I’ve done here as something useful, perhaps for limited applications or at least as a learning environment, or a complete waste of time.

On the bugs front, let’s take a look at this code from section 21.21.4.1 of the Python 3.6 library documentation for the socketserver module. Here’s the relevant part of their server example:

# self.request is the TCP socket connected to the client
data = self.request.recv(1024).strip()
# just send back the same data, but upper-cased
self.request.sendall(data.upper())

Leveraging the surrounding socketserver framework, this code receives a string over a socket, converts it to upper case, and sends that back as the response. It’s not hard to imagine generalizing this to where perhaps it is passing JSON-serialized python dictionaries (or other arbitrary serializable objects) – voila! Just what I was looking for.

But there’s a problem: TCP is a stream oriented protocol and not a “message oriented” protocol. There are potential bugs lurking in some subtle assumptions in the above code.

What happens if you want to send a 4000 byte string? Let’s try it. I modified the above code to say “4096” in the recv call and similarly modified the client. In fact, the python socket module documentation recommends 4096 as a reasonable value to specify in calls to recv():

Note For best match with hardware and network
realities, the value of bufsize should be a relatively
small power of 2, for example, 4096.

So with that in mind here are the relevant excerpts of the two sides modified to send 4000 bytes (and using a 4096 byte recv() buffer):

# Server, modified to read up to 4096 bytes
    def handle(self):
        data = self.request.recv(4096).strip()
        self.request.sendall(data.upper())
# client, modified to send/recv up to 4096 bytes
# and report the amount of data sent/received
with socket.socket(socket.AF_INET,
                   socket.SOCK_STREAM) as sock:
    data = "0123456789" * 400        # 4000 bytes total
    sock.connect((HOST, PORT))
    sock.sendall(bytes(data, "utf-8"))

    received = str(sock.recv(4096), "utf-8")
    print(
      "Bytes sent {}, bytes received {}".format(
         len(data), len(received)))

I ran the client in a loop and got output like this:

Bytes sent 4000, bytes received 4000
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 4000
Bytes sent 4000, bytes received 1448
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896

In fact it’s worse than this. If you put a print(len(data)) statement in the server to see what it thinks it is getting, you will find that sometimes the mismatch is on the server side (i.e., the server thinks the client sent less than 4000 bytes), and sometimes it is on the client side (i.e., the server got all the bytes, but the client thinks it got fewer bytes in the reply), and sometimes it is on both.

What’s going on???

TCP is a byte-stream protocol. It reliably delivers the bytes you give to it, and it delivers them in order. What it does NOT do is preserve any notion of message boundaries within that byte stream. We might make one call to sendall() with 4000 bytes, but that does not guarantee that a single recv() on the other side will get all 4000 bytes at once.

When TCP packets are sent “over the wire” (between two distinct machines, via some form of underlying network technology) they will be broken up into segments that have a maximum size, called, unsurprisingly, the maximum segment size, and abbreviated as MSS. The specific MSS value varies network by network, sometimes because of network technology differences, sometimes because of other factors.

If you run the above client and server on a single machine then there is no actual network between the two endpoints. The communication happens over TCP, but entirely within the confines of one machine. In that case it is likely that the code will work perfectly; 4000 bytes will be sent/received consistently on both sides (however this depends on operating system implementation details).

But if you split the client and server across two distinct machines, bringing real network technology (and limitations) into play, you will see results like those I got above. You might even see different results depending on whether your client and server are separated by a wired network, a WiFi network, a cellular network, and so forth, and whether it is a local network (from machine A to machine B in your own house, for example) or a true WAN (from machine A in your house to perhaps machine B running as an AWS cloud instance).

The fact that this simple-minded code “works” when you try it just on your own machine and doesn’t show any bugs until deployed on a larger network is a common source of misunderstanding – leading to hard-to-find bugs.

If the network can only send 1448 bytes in one “segment”, then sending 4000 bytes becomes (at least) a total of three TCP segments over the wire. They can be sent very quickly, so in many cases all three will arrive so fast that the other side will see all the data at once and will receive the full 4000 bytes. But this won’t always happen; sometimes the first segment will arrive and perhaps the second segment is delayed just long enough for the other side to not see it before the recv() call thinks it is finished. And that’s how we end up seeing only 1448 bytes transmitted sometimes, because the recv() call completes its work before the second chunk of data arrives at the destination machine.

Aside: sendall is a python-level convenience function and not an operating system (socket API) primitive. We don’t know whether sendall makes a single call to the underlying socket API to send data or breaks your data down into chunks and loops over multiple OS calls. However, it doesn’t matter. This segmentation of data sent over TCP will happen no matter what any time the MSS is smaller than the total amount of data you are trying to send.  

Emphasizing again: TCP is a byte-stream protocol, not a message protocol. If we want to send and receive structured messages, we have to layer our own application-layer protocol on top of the byte stream provided by TCP. So instead of just sending a JSON-encoded dictionary from client to server, which could easily be 4000 bytes (or more!) and trigger the problems we’re seeing above, we have to put some “decorations” on our data so that each side can parse it and structure it accordingly.

It’s common (and has been for decades) to define protocols to do this using a line-oriented format of headers and data. So, for example, instead of just sending JSON data we can send something that looks like this:

Data-length: 37
["this is a JSON list of one string"]

In this case we have a simple header “Data-length: 37” followed by (if wordpress hasn’t mangled it too badly) exactly 37 characters which are a JSON presentation of a list containing one string.

Now instead of just calling recv() and trying to read the entire message, we would read data line-by-line, parse the fields as they come in, and we can loop over multiple recv() calls if necessary because the “Data-length” header tells us how many bytes to expect after that.

If you are paying attention, you’ll realize there are still problems lurking here. Even with this “send a line telling us how long the data will be” protocol, it is still conceptually possible that even that line, no matter how short it is, might be broken up into multiple TCP segments. In other words, just because this is short:

Data-length: 37

doesn’t mean we will always get it in one read() operation when we are using a TCP stream. Fortunately python (like many other languages) provides io libraries that essentially read an input stream character by character. That low level code has routines such as “readline” that go character-by-character (waiting for additional input from the TCP stream as necessary) building up a string into a “line” until a newline character is reached. This means we can write our code in terms of “read one line” and not worry about TCP segmentation; the newline character in effect becomes an in-band message boundary character that we can parse out and reconstruct the line abstraction from the raw TCP byte stream.
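To make that concrete, here is a minimal sketch (my own illustration, not code from any particular library) of reading one “Data-length” framed message from a connected socket. Wrapping the socket with makefile() gives us a buffered file-like object whose readline() and read(n) calls keep pulling from the TCP stream until a full line, or exactly n bytes, have arrived:

import json

def read_one_message(sock):
    f = sock.makefile('rb')
    # e.g. b"Data-length: 37\n" -- readline handles any TCP segmentation
    name, _, value = f.readline().decode('utf-8').partition(':')
    if name.strip() != 'Data-length':
        raise ValueError('expected a Data-length header')
    datalen = int(value)
    payload = f.read(datalen)    # loops over the stream until datalen bytes arrive
    return json.loads(payload.decode('utf-8'))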

So we could go off and (carefully) write a bunch of code to implement this sort of protocol on top of TCP, and in fact I’ve done that as an exercise, but pretty soon it becomes apparent that we are in essence re-inventing something that already exists… HTTP requests, and especially HTTP POST requests of (perhaps) JSON encoded application data.

At which point “just go install one of those big REST frameworks” is a plausible answer to my question. Nevertheless, I decided to try to implement a smaller example of HTTP/POST transport for this if for no other reason than as an interesting learning exercise.

In fact python has some modules that make implementing a special purpose HTTP server pretty simple. The http.server module includes an HTTPServer class that allows you to pretty quickly set up a trivial server. Here is a server that just prints out what path name it received and returns an HTTP OK response (with no other data) to the client:

from http.server import HTTPServer
from http.server import BaseHTTPRequestHandler
from http.server import HTTPStatus

class MyRequestHandler(BaseHTTPRequestHandler):
  def do_GET(self):
    print("GET request on {}".format(self.path))
    self.send_response(HTTPStatus.OK)
    self.end_headers()

server = HTTPServer(('', 12345), MyRequestHandler)
server.serve_forever()

Running this establishes an HTTP server on port 12345 and you can play with it by sending it GET requests using something like curl:

In window A:

% python3 theabovecode.py

In window B:

% curl localhost:12345/testpath

and you’ll see output indicating that the server received a request for path “/testpath”

In my next post I’ll write up how to use this framework to process an HTTP POST request of JSON-encoded data to/from a trivial server.

Random Python Performance Posting

I’ve been writing some very elementary probability simulations in python and wanted to simulate millions of coin flips. I started out with this somewhat-obvious code:

    ht = [0, 0]
    for i in range(n):
        ht[random.choice((0, 1))] += 1

which counts heads/tails in ht[0] and ht[1] and performs n flips.

Can this code be sped up? Inserting time.time() calls before/after the “for” loop, I found that this code performs 10 million flips in 11.8 seconds. So it’s about 1.18 microseconds per flip.
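The measurement itself was nothing fancier than something along these lines (a sketch of what I mean by inserting time.time() calls):

import random
import time

n = 10 * 1000 * 1000              # ten million flips
ht = [0, 0]
t0 = time.time()
for i in range(n):
    ht[random.choice((0, 1))] += 1
elapsed = time.time() - t0
print("{:.1f} seconds, {:.2f} usec/flip".format(elapsed, elapsed / n * 1e6))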

[ aside: all measurements were made on my MacBook Air. I tried to avoid doing anything else while running the timing tests, and in all cases took the fastest result I could get. All results for a given method were similar and were run multiple times at multiple different points in time, so I believe there were no confounding factors with background activities. Your mileage may vary. ]

Borrowing a trick from the old geezer days, I thought “surely if we unroll the loop this will go faster”.  So I changed it to “for i in range(n//100)” and then put 100 of the “ht[random.choice(…” statements in the body of the loop.

Result? 1.16 microseconds per flip. Ok, that’s faster, but only by about 2%. Kudos to python for having efficient looping, plus, of course, the random.choice() call dominates the cost anyway.

Ok maybe random.choice is slower than explicitly asking for a random number:

    ht = [0, 0]
    for i in range(n):
        ht[random.randrange(2)] += 1

Nope, that came in significantly slower! 1.33 microseconds per flip.

Maybe I could speed it up by eliminating array access and just adding up +1 and -1 values in a loop and reconstructing the heads vs. tails numbers after:

    delta = 0
    for i in range(n):
        delta += random.choice((-1, 1))

That ran as fast as 1.14 microseconds per flip, but still just an insignificant improvement over the original and arguably more “obvious” code.

Of course, by this point readers are screaming at me “duh, all the time is in the random function and there’s nothing you can do about it” and they are right. I should have tested that first. For example, this code:

    ht = [0, 0]
    for i in range(n):
        ht[1] += 1

runs at 0.11 microseconds per “flip” (albeit there is no flip, it just always adds to ht[1] in this example). Or if we go all the way to the delta method, it will be as little as 0.06 microseconds per “flip” for this:

    delta = 0
    for i in range(n):
        delta += 1

Obviously there is no “flip” here, but it’s fast.

Of course, we can use a single call to random to generate more than one random bit at a time. In C we might generate a 32-bit random number and then peel off the bits one by one. I’m guessing that doing this in python won’t be as efficient, but there’s still a way to easily amortize one call to the random method over multiple flips. For example we can enumerate all sixteen possibilities for four flips. In binary, they would be: 0000, 0001, 0010, 0011, … 1100 1101 1110 1111. We can note that the first (“0000”) is 4 heads (arbitrarily calling the 0 heads, it doesn’t matter which) and 0 tails. The next one (“0001”) is 3 heads and 1 tails, and so on. The complete enumeration of the outcomes looks like this:

    fourflips = ((4, 0),    # 0000                                              
                 (3, 1),    # 0001                                              
                 (3, 1),    # 0010                                              
                 (2, 2),    # 0011                                              
                 (3, 1),    # 0100                                              
                 (2, 2),    # 0101                                              
                 (2, 2),    # 0110                                              
                 (1, 3),    # 0111                                              
                 (3, 1),    # 1000                                              
                 (2, 2),    # 1001                                              
                 (2, 2),    # 1010                                              
                 (1, 3),    # 1011                                              
                 (2, 2),    # 1100                                              
                 (1, 3),    # 1101                                              
                 (1, 3),    # 1110                                              
                 (0, 4))    # 1111           

and now our loop could look like this:

    ht = [0, 0]
    for i in range(n//4):
        flips = random.choice(fourflips)
        ht[0] += flips[0]
        ht[1] += flips[1]

Running this code we get 0.330 microseconds per flip. About three and a half times as fast as the original code – which makes sense, because we are calling the random function one fourth as many times and that call dominates the timing.

Of course, in doing this optimization I’ve lost the data on individual flips and now I only know the relative number of heads and tails for the four flips that are grouped together. In other words, I can no longer distinguish between a group of four flips that were “HTHT” vs “HHTT” or many other combinations that will all appear as just (2, 2) when returned by random.choice. In my application this is unimportant. If it were important, I could of course have stored the individual flips in the fourflips variable, constructing it this way:

    fourflips = (
            (0, 0, 0, 0),
            (0, 0, 0, 1),
            (0, 0, 1, 0),
     ... etc

and processing it accordingly. I stuck with the aggregated counts because that’s all I cared about in my simulations.

We can take this one level further and combine the running delta method (+1 for a heads, -1 for a tails, and reconstruct the final number of heads and tails from the total number of flips and the delta), plus we can compute the delta ahead of time for each element of the fourflips list (e.g., (0, 4) is a delta of -4, (2, 2) is a delta of 0, etc). Combining those concepts we get:

    ht = [0, 0]
    fourdeltas = (4, 2, 2, 0, 2, 0, 0, -2,
                  2, 0, 0, -2, 0, -2, -2, -4)
    delta = 0
    for i in range(n//4):
        delta += random.choice(fourdeltas)

which gets us all the way down to 0.285 microseconds per flip.

How far can this optimization go? As far as you have patience for and as large as you want the list of deltas to be. I tested the same idea at five flips precomputed as deltas (i.e., five bits at a time) and got down to 0.232 microseconds per flip. There’s a diminishing return for each additional bit added – at five deltas we’re calling the random function 20% as often as the original code, but at four deltas we were already down to 25%, so the improvement isn’t as dramatic. Also the deltas list doubles in size each time, though of course it can be precomputed by a program, so it’s not clear that there’s a pragmatic limit. I don’t know if at some point the length of the deltas list makes random.choice execute slower; I don’t think it would, but it depends on implementation details.
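If you want to experiment with wider groupings, the deltas table for any number of bits can be generated rather than typed in by hand, and the final heads/tails counts recovered from the accumulated delta. A sketch (my own helper names, nothing standard):

def make_deltas(bits):
    # each of the 2**bits equally likely patterns contributes
    # (number of 0 bits) - (number of 1 bits), i.e., heads minus tails
    return tuple(bits - 2 * bin(i).count('1') for i in range(2 ** bits))

def heads_tails(total_flips, delta):
    # heads + tails == total_flips and heads - tails == delta
    heads = (total_flips + delta) // 2
    return heads, total_flips - heads

make_deltas(4) reproduces the fourdeltas tuple shown above.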

For my purposes I was happy at five bits per call to random.choice and 0.232 microseconds per flip, which is more or less five times faster than the original.

 

USENET lives – sort of

I was “there” during the early days of Usenet, back when:

  • News was updated only twice a day, when our server dialed up another machine we had a mutual exchange with.
  • Business cards had bang-paths on them. For example: harvard!cfisun!palladium!nw on an old one of mine.
  • Articles were expired after just a few days because disk space was perpetually scarce.
  • The Great Renaming happened (1987) and we went from everything being “net.foo” to comp.foo, news.foo, talk.foo, etc.

I just naturally assumed the modern world had done away with usenet, so I was amused/surprised to find it (“USENET”) as the answer to a clue in today’s Wall Street Journal crossword puzzle.

I have to wonder what percentage of Wall St Journal crossword puzzle enthusiasts have ever heard of usenet, let alone ever posted on it!

P.S. There is no Cabal.

Fanless FreeBSD – Kingdel PC

Being a crusty UNIX guy, sometimes I prefer FreeBSD as a dedicated headless server instead of Linux. I recently needed a quiet (fanless) box and purchased this Kingdel box from Amazon.

Front view:

 

Rear panel:

 

It came with Windows 10 pre-installed, which I promptly wiped out with a full installation of FreeBSD 11.1 (amd64). There were only two tricky parts that I’m documenting here in the hopes that someone’s google search will stumble upon them if needed.

First, the BIOS was configured with only a 1 second delay for hitting the magic key (DELETE) to abort the Windows 10 boot. I couldn’t remember the right key (is it always DELETE these days?) and since the delay was so short I couldn’t read the “hit DELETE to stop boot” message on the power-up screen. Google to the rescue, and then “keep pressing DELETE over and over again during power up” worked.

Second, I had to fool with the BIOS settings to get it to recognize my external USB CD-ROM drive (containing the FreeBSD iso installation image). I had to change the power-on device recognition delay from “automatic” to “manual” and put in a 5 second delay, which made it work. Your mileage may vary depending on what external CD-ROM drive you have. I’m using one that is literally a decade old. It seems clear the Kingdel people (reasonably) turned all the delay knobs to the minimum values to speed bootup into the pre-installed Windows.

A note on how to make the WiFi work. The FreeBSD name for the WiFi device is iwn0. Follow the standard instructions for configuring FreeBSD WiFi, but note that they are written for the “ath” driver not the “iwn” driver (so substitute accordingly).

This means put the following into /etc/rc.conf:

wlans_iwn0="wlan0"
ifconfig_wlan0="WPA SYNCDHCP"

and create the file /etc/wpa_supplicant.conf containing (for example):

network={
   ssid="put SSID here"
   psk="put password here"
}

Your mileage may vary depending on your specific WiFi configuration requirements; in my case I tested this procedure just to make sure the WiFi adapter works (it did) but for my application my device is hardwired.

Two LAN interfaces, a WiFi interface, four serial ports, a zillion USB ports; Kingdel markets this as an “Industrail” computer (note misspelling, lmao). I’m using it to run a bunch of automation scripts and python code and the like, for which it is overkill (I had been running this on a Pi) but still silent.

Christmas Lights 2017

This is an update on the technology being used to drive my Christmas light display this year.

QUICK OVERVIEW

I have about 1100 feet of Red/Green/Blue LED strip around the perimeter of my roof. Because of length limitations, this is implemented as eight separate sections, each with its own separate controller. There is also a separate section on the observation tower, with its own controller.

These controllers can receive infrared (i.e., standard remote-control) commands. To control them I built an IR repeater circuit using an Arduino and wrote some python code to drive everything.

Last year’s detailed write-up is still generally correct, aside from anything new I write about here.

THIS YEAR’S IR REPEATER CIRCUITRY

At the end of last year I prototyped a modification for controlling the roof perimeter (8 strands / 8 controllers) separately from the observation tower. With this modification I can have the roof perimeter all be one color while the observation tower is another color, or blink the tower but not the roof, and vice versa. I still can’t control individual strands on the roof perimeter (entire perimeter will always display identical color); however, since the boundaries of these strands are haphazard (they occur wherever one strand ends and another begins) and coarse (there are only 8 strands across the entire 1100 feet of perimeter), it’s not clear that viable effects could be had by controlling those individually. Though I may try that next year anyway (haha of course).

This year I implemented a new version of the IR repeater circuitry to let me control the roof perimeter (as a whole) separately from the tower. The new circuit looks like this (click image to open it full size if you want):

 

 

This is just a refined version of the modification I performed last year.

To control the roof perimeter, I have eight individual infrared emitters taped onto the receiving area of each of the eight controllers (one controller per strand) scattered around my roof perimeter. The leads from these emitters are connected to wires that all run back to one spot at my house where I have connected them all in series, with additional components as shown in the above circuit diagram.

The advantage, in my application, of connecting them all in series is that if any one of them fails, I’d rather have all eight of the lighting strands become non-responsive than have all but one of them responding to commands. Having one strand out of step would look wrong; it’s better to have them all stuck on one color, and that would also make me notice the problem right away. This is all theoretical, as no IR emitter (or the wire leads to them) has failed this year or last.

Power for the IR emitters is supplied this year by a 12V regulated power source.  Last year I used an unregulated wall-wart; this year I am using a scavenged PC board power supply.

The 38KHz IR digital PWM signal comes out of my Arduino on pin 3, as described in last year’s write-up. This signal drives a MOSFET gate to modulate the power to the entire string of IR emitters (which together require more power than the Arduino can drive directly; hence the 12V supply for that part of the circuit).

However, rather than feeding the 38KHz signal directly to the MOSFET gate, it is split in two and fed into two separate AND gates from an SN74HCT08 quad two-input AND chip. The two “enable” lines – ENA1 and ENA2 – are just simple digital outputs from the Arduino and allow me to separately enable the signal on its way to the two different MOSFETs. By turning ENA1 and ENA2 on/off in my code, I can determine whether IR commands will go out to just the roof perimeter, just the tower, or both.

Although we might casually think of the HIGH and LOW inputs on logic chips as being 5 volts vs zero, the TTL spec is broader than that and allows a HIGH to be as low as 2.7 volts. It turns out the SN74HCT08 AND gate output is higher than that, but it is still not high enough to drive the MOSFET gate directly like I was doing when it was being driven directly from the Arduino output pin. For this reason I also inserted a TC427 MOSFET gate driver into the MOSFET gate path. This chip converts a TTL-level input into a rail-to-rail signal (5V/0V in this case) suitable for driving a MOSFET gate input. In general it’s probably a best practice to use a driver chip like this for MOSFETs anyway, even if you are coming directly out of an Arduino with sufficient voltage for the 4.5V logic-level requirement of this particular MOSFET gate.

SOFTWARE

I wrote about my software extensively before and put a repository, arduino-json-IO, on github that implements a tiny web server in an Arduino and allows you to send it commands to perform various digital I/O operations. One of those commands allows you to send PWM-modulated IR codes. This makes extensive use of the Arduino IRRemote library to do the actual PWM control.

The IRRemote library outputs these PWM waveforms on pin 3, which becomes the “IR” signal in my circuit diagram above.

The new question, with the enable lines, becomes how to manage those. I could just have used the existing capabilities of arduino-json-IO and explicitly managed the enable lines by writing pseudo-code like this:

# to do something with just the tower
POST "set ENA1 low" command to arduino
POST "set ENA2 high"
POST "IR command for a tower color"

but this is cumbersome and, more importantly, it requires multiple HTTP transactions between the python code driving all this and the poor little arduino generating the IR codes (and enables). Of course, this could be factored out since we only need to send the enable line commands when they need to change from their current state, but that would then require keeping track of the output enable states, and also would be subject to getting “out of sync” if, for example, the arduino server rebooted due to a bug, or a power glitch.

To avoid all that, I decided to customize the generic arduino-json-IO library to add the enable lines directly into the JSON structure sent along with each POST request to the IR emitter code. The way it works now is that the enable lines are set high when an “enable: xxx” directive is encountered in the JSON (“xxx” being the pin to set high) and any pin that was set high as a result of doing that is returned to LOW when the POST request processing is finished. This makes the management of the enable pins be, essentially, an “atomic” operation tied in with each individual POST request that sends IR codes.

The revised code is available here:

neilwebber.com/files/xmas-led/IR-enables.ino

Admittedly this isn’t as “generic” as it could be, but the beauty of something like Arduino is that it’s not unreasonable to customize the embedded software for a specific application, which is exactly what is going on here with this modification.

Given all that let’s review the old way a “heartbeat” effect was created with the arduino-json-IO IR POST command. The JSON I sent looked like this:

[{"codes":
  [{"protocol":"NEC","bits":32},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":175000},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":1050000}
  ],
 "repeat":10}]

The minimum gap that I found reliable between IR commands was 175msec (175000 microseconds). Call that period of time a “beat”. The above JSON commands the lights to be RED (16718565) for 3 “beats” (about half a second – 525msec), OFF for one beat (175msec), RED for 3 beats, OFF for 6 beats, and then repeats that entire cycle 10 times. This creates a “heart beat” like effect on the lights, all with one POST operation to the arduino server.

With the enable-line modification, that POST request now looks like this:

[{"codes":
  [{"enable":6},
   {"enable":7},
   {"protocol":"NEC","bits":32},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":175000},
   {"code":16718565,"delay":525000},
   {"code":16732335,"delay":1050000}
  ],
 "repeat":10}]

Here pins 6 and 7 are my ENA1 and ENA2 pins (roof perimeter enable and tower enable). The arduino server will drive those pins HIGH when the “enable” element is encountered in the “codes” sequence, and will return them to LOW at the end of the “codes” sequence. In this way the management of the enable pins becomes atomic and stateless with respect to any given POST operation.

I wrote some python library code to encapsulate all this into an “XMASLED” object, with methods such as “heartbeat” that would generate the above JSON code and post it to the server. The question then became how to control which enable lines to turn on/off in any given request. I decided to use python context managers for this, instead of explicit “enable” / “disable” method calls. Conceptually the XMASLED object contains two state variables for the enables – “enable_tower”, and “enable_perim”, and the various methods such as heartbeat() use them to form the above JSON. The only thing the context managers do is provide a syntactic sugar allowing these variables to be saved/restored and automatically returned to the prior values on return (or exception) from a nested structure. Thus, the python code to run the heartbeat routine only on the tower, while having the roof be green, looks something like this:

# "X" is the XMASLED object
X.send(X.GREEN)    
with X.tower_only():
    X.heartbeat()

Arguably this is overkill; it wouldn’t have been the end of the world to write:

# (assume both enables are ON already)
X.send(X.GREEN)
X.disable(X.PERIMETER)
X.heartbeat()
X.enable(X.PERIMETER)

but the context manager way seemed a lot prettier, and it is robust against any exceptions (e.g., network down) that might throw us out of heartbeat and up to some higher level without knowing that the internal state for the perimeter enable was still “off”.
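For illustration, tower_only() can be little more than a save/set/restore of those two state variables. Something along these lines (a sketch using the attribute names described above; the real class obviously has more to it):

from contextlib import contextmanager

class XMASLED:
    def __init__(self):
        self.enable_tower = True
        self.enable_perim = True

    @contextmanager
    def tower_only(self):
        # save the current enables, switch to tower-only, and restore the
        # saved values on the way out -- even if an exception is raised
        saved = (self.enable_tower, self.enable_perim)
        self.enable_tower, self.enable_perim = True, False
        try:
            yield self
        finally:
            self.enable_tower, self.enable_perim = saved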

It’s hard to know where to stop with this idea of using a JSON data structure as a primitive programming language to have the arduino drive the IR emitters on its own. I’ve drawn the line at the spot we see here; enable pin management, sequences of individual codes, intra-code delays, and repeat counts can all be specified in a single POST command. Anything else requires multiple posts to the Arduino and management by higher-level code (i.e., python in my case).

NEXT YEAR

As I wrote about last year, these cheap controller boxes for the LED strands are really the wrong solution for this application. It’s fun that I’ve managed to build an integrated control system to operate 9 of them in unison via wifi and a baby web server interpreting JSON POSTs,  but every now and then one of the controllers misses a code (just like sometimes your TV seems to miss a button press on your remote control) and shows the wrong color. Plus there are other features people clamor for (“Can the lights change with the music?”) that can never be pragmatically implemented so long as my only control mechanism is limited to imitating an IR remote control.

So, I’m not sure about next year; I think I will be investigating higher-end commercial-grade control systems that already have integrated networking capability and are meant to be controlled “at scale” with multiple units at once. We’ll see…

Redeploying the Christmas Lights!

Almost ready to go again this year. I had a case from my now-dead soekris system; took everything out except the regulated 12V power supply and repurposed the case to hold all this nonsense (including the new “control the tower separately from the roof perimeter” circuit design).

Working in the lab – now to hook it up to the real LED strings (they went up on the roof this week but are currently dark until I hook this back up). Hoping nothing explodes!

My Lutron Experience

I have three Lutron home automation controllers in my house. They operate the motorized window shades and the exterior landscape lighting. My architect wanted me to have many more of these – to control all of the interior lighting. I vetoed that idea and insisted on regular, “you can buy them at home depot” switches for all my interior circuits. I am so glad I did that!

Here’s my Lutron installation with the covers off:

Three Lutron Automation Processors

The reason the covers are off today is because two of them died in a recent power failure. This happens “often”: this is the third time in nine years of owning these that I’ve had to call the automation company in to replace them.

Maybe you don’t think three times in nine years is “often” – but let me ask you this. When was the last time you replaced your microwave oven because of a power failure? How about your TV? Look around your house at all the equipment these days that has a computer inside it – pretty much every appliance you own has one. How many of them have you ever had to replace simply because the voltage fluctuated during a storm and killed the device?

I’m sure it happens from time-to-time, but the consumer-grade appliance manufacturers know that they would have a very bad reputation if their equipment died all the time in power failures. Lutron? Apparently doesn’t care. These processors must have little or no input voltage protection and any glitch on the power lines burns them out. Then, even if just one of them burns out, you end up having to replace all three because the company is constantly obsoleting old versions of these processors when new ones are released. New ones won’t interoperate with old ones.

It’s outrageously bad engineering and it’s hard not to point out that this bad engineering increases sales of the Lutron devices and the billable-hours of the installation/programming service providers.

I “fixed” the “one failed, but you have to replace all three” dilemma by stocking several additional processors the first time I got hosed by that. Unfortunately, today I am having the last spare installed and the next power-glitch will force an upgrade of all three even if just one dies. I am now investigating front-ending the power inputs on these devices with some server-room grade power conditioning instead.

Never, ever, ever, ever, ever allow anyone to talk you into installing this product in your house.

Why the “new” NIST password recommendation makes sense

The National Institute of Standards and Technology (NIST) recently released a new recommendation on authentication, including best practices for constructing passwords.

DISCLAIMER: I am not a password security expert. But I can do some math.

You are already familiar with the previous/old NIST recommendations because these are the recommendations that drive you crazy:

  • Use upper case and lower case
  • Use numbers
  • Use special characters (!@#$% etc)

One way or another those recommendations have worked their way into almost every system in use today, with the corresponding rules that you curse at when you are setting up a new account.

The new rules say that it’s better to just use some number of words in a phrase. No digits or special characters needed.

Why?

Let’s look at the history of password technology and do some math. Don’t be scared – we won’t be doing anything more difficult than raising a number to a power — which, in a throwback to the old days of Fortran, I will represent in this note using ** as in: 2**3 is 8:

2 ** 3 = 2 * 2 * 2 = 8

If I happen to know that your password is only two characters long, perhaps because I heard how many keyclicks there were when you typed it in, and I can guess that (like most people) you picked your password only from lowercase letters from a to z, then how many passwords would I have to try to guess yours? The answer is that there are 26 letters to choose from, therefore:

N = 26 ** 2 = 676

There are only 676 two-character lowercase passwords I have to try if I want to search all the possibilities to break your password. I can break your password by simply trying every combination “aa”, “ab”, “ac” … “zx”, “zy”, “zz” until I find the one that works.

In the old days passwords were usually limited to 8 characters. This limit can be traced all the way back to late 1970s Unix implementations of the DES password encryption algorithms. In the early days of the web most web site servers were running on Unix boxes that still used the same password code from the 1970s and often still had the eight character limit.

Obviously, 676 passwords won’t take very long for someone to try (by computer), which is why password software usually required you to use more characters – oftentimes making you use an eight character password. A dirty little secret of some of those older systems is that they’d let you set a longer password, but in fact only ever computed the encrypted form based on the first eight characters. The old NIST recommendations were written during a time when that was still a consideration.

If I still know that you only used lowercase letters and there is a maximum of 8 characters, there are:

N = 26 ** 8 = approximately 208 billion

password possibilities.

When crackers “steal password files” from hacked web sites, what they get is not the passwords themselves, but rather their encrypted forms. This looks like a bunch of gibberish characters. When a web site checks your password, it asks you for your password, encrypts it, and sees if it gets the same gibberish it got back when you first set your password.

Web sites generally never store your original password and there is no way to recover the original password from this encrypted gibberish. Thus, when the bad guys steal a “password” file what they really have to do is just guess every possible password, putting each guess through the encryption software, until they find one that matches the gibberish string they have gotten their hands on.
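Conceptually (and glossing over important real-world details such as salting and deliberately slow algorithms like bcrypt), the check works something like this:

import hashlib

def scrambled(password):
    # one-way transformation; no practical way to reverse it
    return hashlib.sha256(password.encode('utf-8')).hexdigest()

stored_gibberish = scrambled('my actual password')   # what the site keeps on file

def password_is_correct(attempt):
    return scrambled(attempt) == stored_gibberish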

So we can see the advantage of an 8 character password, instead of a 2 character password, is that they will have to try roughly 208 billion guesses to find your password. Technically, on average, they will have to try half of that before they get lucky and find yours, but for the rest of this memo I will ignore that factor of 2 because it’s not really significant and just clutters the discussion.

When computers were slower, running the DES algorithm 208 billion times took long enough that it wasn’t much of a threat. The calculations could take weeks, but as computers got faster and faster that time gradually came down, and with modern machines this is now a practical method of attack.

This is why the old password recommendations suggested that you use more characters than just lowercase a to z. If, for example, you randomly picked from uppercase and lowercase characters, there would be 52 possibilities for each position in your password, and the number of guesses required to crack your password went up dramatically:

N = 52 ** 8 = 53.4 TRILLION

Simply by adding upper case into the equation the number of possible passwords increases by a factor of 256 (those of you who are insightful with math will note that we doubled the choices – from 26 to 52, and since there are 8 password characters the possibilities increased by a factor of 2 ** 8 = 256)

If digits (another 10 possible characters) and special characters (!@#$% etc) are added, the possible choices go up to 80 or more. Let’s take 80 possible characters and see what we get:

N = 80 ** 8 = 1677 TRILLION

That looks like a lot of possibilities. And it could be even higher because there are actually more than 80 choices of possible characters people could use in their passwords. But there are some problems. In reality, humans get annoyed by all those rules, usually pick passwords that aren’t really randomly selected from all possible characters, and do other things that reduce the possible number of passwords that have to be guessed.

Let’s go back to the upper and lower case combinations (and ignore digits and special characters for now). I said there were

N = 52 ** 8 = 53.4 TRILLION

possible combinations for choosing 52 characters (upper and lower case a to z) eight times. But when most people see this message:

Password must contain at least one upper case character

what do they do in reality?

They take their lame password, and capitalize one letter of it to get past this rule.

How many combinations of passwords are there, if as a bad guy I am reasonably assured that your password only has one uppercase character? Now instead of 52 possibilities for each character, there are still only 26 possibilities, and then there are 8 choices for which one of the positions is going to be upper case.  Therefore, instead of:

N = 52 ** 8 = 53.4 TRILLION

possibilities, there are really only:

N = 26 ** 8 * 8 = 1.6 TRILLION

A similar problem occurs with the digits and special character rules. Many people just substitute numbers for letters in a fairly predictable way, e.g., using the digit zero for the letter “o”, and the digit 3 for the letter “e”, and similar things like that. We all do this, thus many passwords in the real world look like these:

pa55w0rd
dumbrul3
thissux!

The bad guys know that people do this, and when they write their guessing software they don’t have to go through all of the character possibilities. The real number of strings they have to guess is much, much, lower than the simple exponentiation math would imply. This knowledge dramatically decreases the number of possibilities that have to be computed to try to crack your password, and the sophisticated cracking software incorporates knowledge such as “try ordinary words but substitute the number 3 for e” and similar tendencies.

Over time the eight character limit went away, so longer passwords became possible, and many web sites now allow you to have fairly long passwords but still encourage you to use all sorts of random characters in an attempt to make that exponentiation math work out to a large number.

But people still pick bad passwords because a truly random password like “x@8Q-99!va@:d” is just impossible to remember; no one picks passwords like that.

The new recommendation from NIST takes that into account, and instead recommends that you just pick a phrase that you can remember and no one else would know. This assumes that modern password systems can accept much longer passwords – which most can (it is likely that there is no practical limit in most software these days, though sometimes the web designers impose limits on the login screens).

So let’s look at some math. Suppose you picked a four word phrase from the vocabulary of an 8 year old child. How many passwords are possible?

According to various studies, the average 8 year old native speaker has a vocabulary of about 10000 words. This means that there are:

N = 10000 ** 4 = 10,000 TRILLION

This number is already 6 times higher than the fully-random 8 character calculation with 80 possible characters, and keep in mind that we already debunked that math as overly generous because no real human being ever actually picks those gibberish characters randomly. This implies that the advantage of the four word random phrase is far greater than the factor of six we just calculated here.
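You can check all of this arithmetic yourself in a few lines of Python:

>>> 80 ** 8          # fully random, 8 characters, 80 possible characters
1677721600000000
>>> 26 ** 8 * 8      # 8 lowercase letters with exactly one capitalized
1670616516608
>>> 10000 ** 4       # four words from a 10,000 word vocabulary
10000000000000000
>>> 10000 ** 4 / (80 ** 8)
5.9604644775390625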

Most adults will have even larger vocabularies, in the neighborhood of 20,000 to 35,000 words, so the number of four-word phrases you might pick for your password becomes even larger.

Now, of course, people are still people, and they might still pick bad passwords even if they are made out of multiple words:

this is my password
I hate password rules
you can't guess this

and so forth. But if you pick a password that:

  • is selected from a wide range of words
  • uses at least one “unusual” word
  • isn’t obviously based on something people might know about you
  • but is still easy for you to remember

then simply combining four words into a phrase and using that as your password is likely to be more secure than eight characters of gibberish. So, as systems around the web start getting updated to conform to the new password recommendations, hopefully you’ll be able to use passwords like these:

lemon blue flying campfire
tree eating pickle moon
disintegrating alien cheese sundae

It would be best if you tried to include some unusual words; remember, you are trying to make the bad guys have to guess from as many words as possible. Though, even if you stick to “just words an eight year old would know” there are roughly 10,000 choices for each word, and that already makes your password harder to guess than a realistic eight character “old style” password. Personally I can type pretty well, so “disintegrating alien cheese sundae” is something I could potentially envision using as a password (ooops, ok, not now that I’ve published this haha).

The beauty of the new NIST recommendations is that most people should be able to come up with memorable passwords that are difficult to guess and draw from between 10,000 and 20,000 words for each word in the phrase. The math is inexorable: there are more combinations for these passwords than there are for shorter gibberish passwords.

Of course, if you pick an obvious phrase that a bad guy can guess, that’s your fault. Don’t set your new password to “I love my cat” if everyone knows you love your cat.
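If you’d like the computer to pick the words for you, a few lines of Python and a word list will do it. This is just a sketch; /usr/share/dict/words is a common location on Unix-like systems, but your system may keep its word list elsewhere:

import secrets

# build a list of candidate words (lowercase, letters only)
with open('/usr/share/dict/words') as f:
    words = [w.strip().lower() for w in f if w.strip().isalpha()]

# secrets.choice draws from a cryptographically strong random source
passphrase = ' '.join(secrets.choice(words) for _ in range(4))
print(passphrase)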

If you are paying attention, you will note that the new NIST recommendations are somewhat equivalent to saying “hey, just use a longer password”. So my example of “disintegrating alien cheese sundae” is actually a password of length 34 (including the spaces). Thus in some sense the NIST recommendation isn’t really anything new or earth-shattering. We already know that every time you add one character to a password, it gets harder to guess by a factor related to how many possible characters there are. In fact, a 34 character random password made out of only lowercase letters would have:

N = 26 ** 34 = an enormously large number (about 10 to the 48th)

possibilities. But, of course, no one is going to have a 34 character random password because it would be impossible to remember. So the NIST recommendation is actually a sneaky way to get us to have longer passwords, at the cost of choosing from a less-than-random set of characters (i.e., those that combine into actual words). There’s no magic here, it’s simply the observation that the longer the password is the better it is, and if we have to give up some randomness (fewer character choices than totally random) to get to this longer password length, the math still works out favorably.

I’m looking forward to getting rid of my ridiculous eight character gibberish passwords and replacing them with easier to remember phrases, though I imagine it may take many years for the tedious old NIST suggestions to become thoroughly debunked and for the newer methodology to find its way into account password rules.

If you’d like to dig deeper into the details of how encryption works, and some other privacy and security topics, here’s a good place to start: https://pixelprivacy.com/resources/what-is-encryption/