Building an HTTP/POST request/response protocol

In the previous post, Python Simple Client/Server Socket Communication Module, I began exploring using python’s http.server module to build a simple HTTP server as a framework for a request/response protocol that could invoke a remote function and return some results to the client.

My goal is to be able to write code like this:

# ON A SERVER MACHINE
def foo(input_dictionary):
  results = ... do something with input_dictionary
  return results

server = MakeServer(port=12345, func=foo)

# ON A CLIENT MACHINE
client = MakeClient(port=12345)

input_dictionary = { something ... }
results = client.command(input_dictionary)

and essentially have a trivial remote procedure call mechanism allowing the client code to invoke the foo() function on the given input.

We’re going to build this using python’s http.server module. We are going to POST to “/” (indeed, we are going to ignore the path parameter), and the body of the POST itself will be a JSON-encoded python object. Rather than having to parse a “Data-length” line or anything like that ourselves, the HTTP protocol will handle that part for us; all we have to do is pluck the length value out of the Content-Length header and then read and decode the posted body ourselves.

Expanding on the do_GET example from my previous post, here’s a do_POST server function:

from http.server import HTTPServer
from http.server import BaseHTTPRequestHandler
from http.server import HTTPStatus
import json

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # the Content-Length header tells us how many body bytes to read
        datalen = int(self.headers['Content-Length'])
        # read exactly that many bytes: the JSON-encoded request
        data = self.rfile.read(datalen)
        obj = json.loads(data)
        print("Got object: {}".format(obj))
        self.send_response(HTTPStatus.OK)
        self.end_headers()

server = HTTPServer(('', 12345), MyRequestHandler)
server.serve_forever()

This simple code needs more error checking. There is also an implicit conversion between 8-bit over-the-wire bytes and python’s internal string representation (full Unicode) hiding inside the json.loads function. In fact, if you are running a version of python older than 3.6 you might be getting an error message about exactly this: the distinction between a python bytes object (returned by read) and a python string (expected by json.loads). So let’s fix that before going further.

In the good old days, all computing was done in English and the 7-bit ASCII character set was good enough for everyone. Every character fit into one byte, and bytes and strings could be considered more or less interchangeable things. Obviously, those days are long gone. Even if you say you don’t care about Japanese/Chinese/etc speakers and their ability to send their characters (which would be a mistake of course), even English users demand full Unicode support if for no other reason than to be able to put smiley faces and other emojis into their data. Unicode 😀❤️🐳💡🎉 Happens!

Starting with python 3, Unicode is the native format for strings. The good part is that all your code automatically works with all of those character types. The bad part is that you have to be cognizant of how Unicode (more than 8 bits per character) interfaces with parts of the world that operate on 8-bit bytes – like TCP streams, for example. Arguably this “bad” thing is actually a “good” thing, as it forces you to make your code work for everyone, even people who don’t use only ASCII.

So to see Unicode in action in python, try this:

s = '\U0001F600'
#     ^ **NOTE** that's an uppercase U
print("s = /{}/ and len(s) = {}".format(s, len(s)))

This will show you:

s = /😀/ and len(s) = 1

That is: a string of length one, containing the single grinning-face character.

It’s beyond the scope of this post to explain why we usually consider “native Unicode” to be an internal representation of characters and use a different external representation when sending Unicode strings “over the wire” in a protocol. I am just pointing out that this is something we have to do – pick an encoding method and use it properly on both ends.

The most common, standard encoding used for applications like this is called utf-8. It has the advantage that the first 128 ASCII characters (all the “good old days” characters) are still encoded the same way, as a single byte, which tends to increase interoperability with naive/old programs that are not Unicode enabled. In fact this is one of the “beyond the scope of this post” reasons to encode characters rather than send everything in its 32-bit raw Unicode glory.

So we are going to convert our internal strings into a python bytes object on one side, and back on the other. A bytes object is an iterable sequence of 8-bit integers (0 .. 255), and has a decode method for converting that sequence of bytes into a Unicode string. For example:

>>> letterA = bytes([65]).decode('utf-8')
>>> print(letterA)
A

What this is showing you is that a single byte, with value 65, when decoded using the ‘utf-8’ encoding, becomes a string of one character, an uppercase A. This is demonstrating a property of utf-8, namely that the original (“good old days”) 8-bit ASCII character values are encoded as themselves. Other characters are encoded using multiple bytes, so for example:

>>> jpnhouse = bytes([229, 174, 182]).decode('utf-8')
>>> print(jpnhouse)
家

this is demonstrating that the three-byte sequence [229, 174, 182], when decoded using ‘utf-8’, will become a single character that is (I think) the word “house” in Japanese.

We don’t really need to understand encodings other than to know the encode/decode steps are there, have to be performed, and have to use the same encoding on both sides of the wire. Starting with python version 3.6, the json.loads function will accept a bytes object and do an implicit utf-8 decoding for you. This is why the first example code given up above “works” if you are running python 3.6 or later, but fails with a complaint about strings versus bytes on earlier versions. I think it is better practice for us to make that step explicit, which will also have the benefit of making the example code work on older versions of python 3.

With the explicit decode step the code becomes:

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        data_bytes = self.rfile.read(datalen)
        data_str = data_bytes.decode('utf-8')
        obj = json.loads(data_str)
        print("Got object: {}".format(obj))
        self.send_response(HTTPStatus.OK)
        self.end_headers()

This still isn’t sending any response (other than the “OK” HTTP code) so let’s add that. This revised handler takes out the print on the server side and wraps the received object inside another dictionary {'echo': obj} and then sends that back to the client as the response data:

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        obj = json.loads(
            self.rfile.read(datalen).decode('utf-8'))
        rslt_str = json.dumps({'echo': obj})
        rslt_bytes = rslt_str.encode('utf-8')
        self.send_response(HTTPStatus.OK)
        self.end_headers()
        self.wfile.write(rslt_bytes)

This works – but before putting it into production it should probably be enhanced in several ways. It leaves out several HTTP headers in the response; we should probably fill in Content-Type (‘application/json’ would be appropriate) and Content-Length. It turns out Content-Length isn’t strictly needed here, because the default response format is HTTP/1.0, which delimits the response by closing the stream at the end. But we might want to use HTTP/1.1 in the response format and include a Content-Length (and thus also allow for persistent connections, which were not supported under the HTTP/1.0 format). These elaborations are left as an exercise for the reader at this point, in consultation with the http.server module documentation; I made many of these improvements in the code that I will also be posting on github (TBD at the time of writing this posting).
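To give the flavor of it, here is a sketch of what the HTTP/1.1 variant might look like (an illustration, not the final github code):

class MyRequestHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 allows persistent connections, but then an accurate
    # Content-Length header is mandatory on every response
    protocol_version = 'HTTP/1.1'

    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        obj = json.loads(self.rfile.read(datalen).decode('utf-8'))
        rslt_bytes = json.dumps({'echo': obj}).encode('utf-8')
        self.send_response(HTTPStatus.OK)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(rslt_bytes)))
        self.end_headers()
        self.wfile.write(rslt_bytes)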

With this primitive server we have enough framework to use curl to fire commands at a server and have it invoke some function (in this case hardwired into “encapsulate the object and return it”) and return results to the client. This works really well with not very many lines of code!
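For example, with the echo server above running, a curl exchange looks something like this (response shown on the second line):

% curl -X POST -d '{"example": 123}' localhost:12345/
{"echo": {"example": 123}}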

Let’s build an explicit client instead of using curl (although being able to use curl does demonstrate one of the advantages of picking a standard transport protocol such as HTTP/POST). Here is a bare bones test request function:

from http.client import HTTPConnection
import json

def testrq(obj):
    c = HTTPConnection('localhost', 12345)
    c.connect()
    encoded = json.dumps(obj).encode('utf-8')
    c.request("POST", "/", body=encoded, headers={})
    response_bytes = c.getresponse().read()
    response_string = response_bytes.decode('utf-8')
    return json.loads(response_string)

This connects (to hardwired localhost:12345) and sends whatever object (obj) you provide, gets the response (see the http.client / HTTPConnection documentation) and decodes it as a JSON object. Obviously all semblance of generalization and error checking has been omitted here. But this works.
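A quick interactive sanity check, assuming the echo server from above is running on the same machine:

>>> testrq({'hello': 'world'})
{'echo': {'hello': 'world'}}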

Now that we know how to send arbitrary python objects back and forth from client and server we can work on building a real, but still simple, generalized framework for all this. That will be the next post.


Python Simple Client/Server Socket Communication Module

I wanted a python module with a simple client/server request/response protocol … something that would let me invoke a function remotely, with code looking something like:

def server_function(input_dictionary):
  ... do something with the input_dictionary ...
  return {'result': 'blah blah blah'}
  
def client():
  server = ... connect to the server ...
  request = { some dictionary of stuff here }
  result = server.command(request)
  ... result is a dictionary as returned by the server

The client would formulate a request as a dictionary, send it to the server, the server processes it, formulates a dictionary as a reply, and returns that to the client. In effect this is a simplified form of a general remote procedure call system.

Google to the rescue? Yes, there are any number of github repositories and official modules out there that sort of do this. However, I had three problems with all the ones I found:

  • They had bugs in them regarding TCP stream semantics, or, if not outright bugs, limitations that wouldn’t generalize to transferring large amounts of data in a single request/response pair.
  • They worked at a byte stream level of abstraction, and I wanted a higher-level “message” or “packet” abstraction (more like the above “send a dictionary, get a dictionary” model).
  • OR … they were huge frameworks that were very powerful but felt like overkill for my application. I didn’t need to set up an entire REST API, I didn’t need the features of a “real” application server, etc. Of course this is dangerous thinking, in that anything that starts out as a trivial 100-line hack might someday grow into something real. Nevertheless, I decided to proceed with a small implementation of something for my specific purpose. Though, as we’ll see, I did conclude that HTTP/POST was a suitable transport mechanism and ended up implementing what can only be thought of as a completely degenerate (one URL) imitation of a REST API. It’s up to you whether to think of what I’ve done here as something useful, perhaps for limited applications or at least as a learning environment, or a complete waste of time.

On the bugs front, let’s take a look at this code from 21.21.4.1 in the python version 3.6 library documentation for the socketserver module. Here’s the relevant part of their server example:

# self.request is the TCP socket connected to the client
data = self.request.recv(1024).strip()
# just send back the same data, but upper-cased
self.request.sendall(data.upper())

Leveraging the surrounding socketserver framework, this code receives some data over a socket, converts it to upper case, and sends that back as the response. It’s not hard to imagine generalizing this to where perhaps it is passing JSON-serialized python dictionaries (or other arbitrary serializable objects) – voila! Just what I was looking for.

But there’s a problem: TCP is a stream oriented protocol and not a “message oriented” protocol. There are potential bugs lurking in some subtle assumptions in the above code.

What happens if you want to send a 4000 byte string? Let’s try it. I modified the above code to say “4096” in the recv call and similarly modified the client. In fact, the python socket module documentation recommends 4096 as a reasonable value to specify in calls to recv():

Note For best match with hardware and network
realities, the value of bufsize should be a relatively
small power of 2, for example, 4096.

So with that in mind here are the relevant excerpts of the two sides modified to send 4000 bytes (and using a 4096 byte recv() buffer):

# Server, modified to read up to 4096 bytes
    def handle(self):
        data = self.request.recv(4096).strip()
        self.request.sendall(data.upper())
# client, modified to send/recv up to 4096 bytes
# and report the amount of data sent/received
with socket.socket(socket.AF_INET,
                   socket.SOCK_STREAM) as sock:
    data = "0123456789" * 400        # 4000 bytes total
    sock.connect((HOST, PORT))
    sock.sendall(bytes(data, "utf-8"))

    received = str(sock.recv(4096), "utf-8")
    print(
      "Bytes sent {}, bytes received {}".format(
         len(data), len(received)))

I ran the client in a loop and got output like this:

Bytes sent 4000, bytes received 4000
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 4000
Bytes sent 4000, bytes received 1448
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896
Bytes sent 4000, bytes received 2896

In fact it’s worse than this. If you put a print(len(data)) statement in the server to see what it thinks it is getting, you will find that sometimes the mismatch is on the server side (i.e., the server thinks the client sent less than 4000 bytes), and sometimes it is on the client side (i.e., the server got all the bytes, but the client thinks it got fewer bytes in the reply), and sometimes it is on both.

What’s going on???

TCP is a byte-stream protocol. It reliably delivers the bytes you give to it, and it delivers them in order. What it does NOT do is preserve any notion of message boundaries within that byte stream. We might make one call to sendall() with 4000 bytes, but that does not guarantee that a single recv() on the other side will get all 4000 bytes at once.

When TCP packets are sent “over the wire” (between two distinct machines, via some form of underlying network technology) they will be broken up into segments that have a maximum size, called, unsurprisingly, the maximum segment size, and abbreviated as MSS. The specific MSS value varies network by network, sometimes because of network technology differences, sometimes because of other factors.

If you run the above client and server on a single machine then there is no actual network between the two endpoints. The communication happens over TCP, but entirely within the confines of one machine. In that case it is likely that the code will work perfectly; 4000 bytes will be sent/received consistently on both sides (however this depends on operating system implementation details).

But if you split the client and server across two distinct machines, bringing real network technology (and limitations) into play, you will see results like those I got above. You might even see different results depending on whether your client and server are separated by a wired network, a WiFi network, a cellular network, and so forth, and whether it is a local network (from machine A to machine B in your own house, for example) or a true WAN (from machine A in your house to perhaps machine B running as an AWS cloud instance).

The fact that this simple-minded code “works” when you try it just on your own machine and doesn’t show any bugs until deployed on a larger network is a common source of misunderstanding – leading to hard-to-find bugs.

If the network can only send 1448 bytes in one “segment”, then sending 4000 bytes becomes (at least) a total of three TCP segments over the wire. They can be sent very quickly, so in many cases all three will arrive so fast that the other side will see all the data at once and will receive the full 4000 bytes. But this won’t always happen; sometimes the first segment will arrive and perhaps the second segment is delayed just long enough for the other side to not see it before the recv() call thinks it is finished. And that’s how we end up seeing only 1448 bytes transmitted sometimes, because the recv() call completes its work before the second chunk of data arrives at the destination machine.

Aside: sendall is a python-level convenience function and not an operating system (socket API) primitive. We don’t know whether sendall makes a single call to the underlying socket API to send data or breaks your data down into chunks and loops over multiple OS calls. However, it doesn’t matter. This segmentation of data sent over TCP will happen no matter what any time the MSS is smaller than the total amount of data you are trying to send.  

Emphasizing again: TCP is a byte-stream protocol, not a message protocol. If we want to send and receive structured messages, we have to layer our own application-layer protocol on top of the byte stream provided by TCP. So instead of just sending a JSON-encoded dictionary from client to server, which could easily be 4000 bytes (or more!) and trigger the problems we’re seeing above, we have to put some “decorations” on our data so that each side can parse it and structure it accordingly.

It’s common (and has been for decades) to define protocols to do this using a line-oriented format of headers and data. So, for example, instead of just sending JSON data we can send something that looks like this:

Data-length: 37
["this is a JSON list of one string"]

In this case we have a simple header “Data-length: 37” followed by (if wordpress hasn’t mangled it too badly) exactly 37 characters which are a JSON presentation of a list containing one string.

Now instead of just calling recv() and trying to read the entire message, we read the data line-by-line, parsing the fields as they come in, and we can loop over multiple recv() calls if necessary, because the “Data-length” header tells us how many bytes to expect after that, as sketched below.
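Here is a sketch of that receive loop (recv_exactly is my name for it, not a standard library call):

def recv_exactly(sock, count):
    # keep calling recv() until all `count` bytes have arrived
    chunks = []
    while count > 0:
        chunk = sock.recv(min(count, 4096))
        if not chunk:    # peer closed the connection early
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b''.join(chunks)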

If you are paying attention, you’ll realize there are still problems lurking here. Even with this “send a line telling us how long the data will be” protocol, it is still conceptually possible that even that line, no matter how short it is, might be broken up into multiple TCP segments. In other words, just because this is short:

Data-length: 37

doesn’t mean we will always get it in one read() operation when we are using a TCP stream. Fortunately python (like many other languages) provides io libraries that essentially read an input stream character by character. That low level code has routines such as “readline” that go character-by-character (waiting for additional input from the TCP stream as necessary), building up a string until a newline character is reached. This means we can write our code in terms of “read one line” and not worry about TCP segmentation; the newline character in effect becomes an in-band message boundary character that lets us reconstruct the line abstraction from the raw TCP byte stream.
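In python the easiest way to get that line abstraction on top of a socket is its makefile() method, which wraps the socket in a buffered file object. A sketch of the receiving side of our hypothetical “Data-length” protocol might look like:

def read_message(sock):
    f = sock.makefile('rb')
    # readline() hides the TCP segmentation from us, waiting for
    # more data until it sees the newline
    header = f.readline().decode('utf-8')
    if not header.startswith('Data-length:'):
        raise ValueError('malformed header: ' + header)
    datalen = int(header.split(':', 1)[1])
    # read(n) on a buffered file object likewise loops until n bytes arrive
    return f.read(datalen).decode('utf-8')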

So we could go off and (carefully) write a bunch of code to implement this sort of protocol on top of TCP, and in fact I’ve done that as an exercise, but pretty soon it becomes apparent that we are in essence re-inventing something that already exists… HTTP requests, and especially HTTP POST requests of (perhaps) JSON encoded application data.

At which point “just go install one of those big REST frameworks” is a plausible answer to my question. Nevertheless, I decided to try to implement a smaller example of HTTP/POST transport for this if for no other reason than as an interesting learning exercise.

In fact python has some modules that make implementing a special purpose HTTP server pretty simple. The http.server module includes an HTTPServer class that lets you quickly set up a trivial server. Here is a server that just prints out what path name it received and returns an HTTP OK response (with no other data) to the client:

from http.server import HTTPServer
from http.server import BaseHTTPRequestHandler
from http.server import HTTPStatus

class MyRequestHandler(BaseHTTPRequestHandler):
  def do_GET(self):
    print("GET request on {}".format(self.path))
    self.send_response(HTTPStatus.OK)
    self.end_headers()

server = HTTPServer(('', 12345), MyRequestHandler)
server.serve_forever()

Running this establishes an HTTP server on port 12345 and you can play with it by sending it GET requests using something like curl:

In window A:

% python3 theabovecode.py

In window B:

% curl localhost:12345/testpath

and you’ll see output indicating that the server received a request for path “/testpath”.

In my next post I’ll write up how to use this framework to process an HTTP POST request of JSON-encoded data to/from a trivial server.

Random Python Performance Posting

I’ve been writing some very elementary probability simulations in python and wanted to simulate millions of coin flips. I started out with this somewhat-obvious code:

    ht = [0, 0]
    for i in range(n):
        ht[random.choice((0, 1))] += 1

which counts heads/tails in ht[0] and ht[1] and performs n flips.

Can this code be sped up? Inserting time.time() calls before/after the “for” loop I found that this code can perform 10 million flips in 11.8 seconds. So it’s about 1.18 microseconds per flip.
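The harness was nothing fancier than something along these lines (my reconstruction; the real runs used n of 10 million):

    import random
    import time

    n = 10**7
    ht = [0, 0]
    t0 = time.time()
    for i in range(n):
        ht[random.choice((0, 1))] += 1
    elapsed = time.time() - t0
    print("{:.1f} seconds, {:.2f} usec/flip".format(
        elapsed, elapsed * 1e6 / n))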

[ aside: all measurements were made on my MacBook Air. I tried to avoid doing anything else while running the timing tests, and in all cases took the fastest result I could get. All results for a given method were similar and were run multiple times at multiple different points in time, so I believe there were no confounding factors with background activities. Your mileage may vary. ]

Borrowing a trick from the old geezer days, I thought “surely if we unroll the loop this will go faster”.  So I changed it to “for i in range(n//100)” and then put 100 of the “ht[random.choice(…” statements in the body of the loop.

Result? 1.16 microseconds per flip. Ok, that’s faster, but only by about 2%. Kudos to python for having efficient looping, plus, of course, the random.choice() call dominates the cost anyway.

Ok maybe random.choice is slower than explicitly asking for a random number:

    ht = [0, 0]
    for i in range(n):
        ht[random.randrange(2)] += 1

Nope, that came in significantly slower! 1.33 microseconds per flip.

Maybe I could speed it up by eliminating array access and just adding up +1 and -1 values in a loop and reconstructing the heads vs. tails numbers after:

    delta = 0
    for i in range(n):
        delta += random.choice((-1, 1))

That ran as fast as 1.14 microseconds per flip, but still just an insignificant improvement over the original and arguably more “obvious” code.
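(Reconstructing the final counts from the delta is just arithmetic, since heads + tails = n and heads - tails = delta:)

    heads = (n + delta) // 2
    tails = n - heads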

Of course, by this point readers are screaming at me “duh, all the time is in the random function and there’s nothing you can do about it” and they are right. I should have tested that first. For example, this code:

    ht = [0, 0]
    for i in range(n):
        ht[1] += 1

runs at 0.11 microseconds per “flip” (albeit there is no flip, it just always adds to ht[1] in this example). Or if we go all the way to the delta method, it will be as little as 0.06 microseconds per “flip” for this:

    delta = 0
    for i in range(n):
        delta += 1

obviously there is no “flip” here but it’s fast.

Of course, we can use a single call to random to generate more than one random bit at a time. In C we might generate a 32-bit random number and then peel off the bits one by one. I’m guessing that doing this in python won’t be as efficient, but there’s still a way to easily amortize one call to the random method over multiple flips. For example, we can enumerate all sixteen possibilities for four flips. In binary they are: 0000, 0001, 0010, 0011, … 1100, 1101, 1110, 1111. The first (“0000”) is 4 heads (arbitrarily calling 0 heads; it doesn’t matter which) and 0 tails. The next one (“0001”) is 3 heads and 1 tail, and so on. The complete enumeration of the outcomes looks like this:

    fourflips = ((4, 0),    # 0000                                              
                 (3, 1),    # 0001                                              
                 (3, 1),    # 0010                                              
                 (2, 2),    # 0011                                              
                 (3, 1),    # 0100                                              
                 (2, 2),    # 0101                                              
                 (2, 2),    # 0110                                              
                 (1, 3),    # 0111                                              
                 (3, 1),    # 1000                                              
                 (2, 2),    # 1001                                              
                 (2, 2),    # 1010                                              
                 (1, 3),    # 1011                                              
                 (2, 2),    # 1100                                              
                 (1, 3),    # 1101                                              
                 (1, 3),    # 1110                                              
                 (0, 4))    # 1111           

and now our loop could look like this:

    ht = [0, 0]
    for i in range(n//4):
        flips = random.choice(fourflips)
        ht[0] += flips[0]
        ht[1] += flips[1]

Running this code we get 0.330 microseconds per flip. About three and a half times as fast as the original code – which makes sense, because we are calling the random function one fourth as many times and that call dominates the timing.

Of course, in doing this optimization I’ve lost the data on individual flips and now I only know the relative number of heads and tails for the four flips that are grouped together. In other words, I can no longer distinguish between a group of four flips that were “HTHT” vs “HHTT” or many other combinations that will all appear as just (2, 2) when returned by random.choice. In my application this is unimportant. If it were important, I could of course have stored the individual flips in the fourflips variable, constructing it this way:

    fourflips = (
            (0, 0, 0, 0),
            (0, 0, 0, 1),
            (0, 0, 1, 0),
     ... etc

and processing it accordingly. I stuck with the aggregated counts because that’s all I cared about in my simulations.

We can take this one level further and combine the running delta method (+1 for a heads, -1 for a tails, and reconstruct the final number of heads and tails from the total number of flips and the delta), plus we can compute the delta ahead of time for each element of the fourflips list (e.g., (0, 4) is a delta of -4, (2, 2) is a delta of 0, etc). Combining those concepts we get:

    ht = [0, 0]
    fourdeltas = (4, 2, 2, 0, 2, 0, 0, -2,
                  2, 0, 0, -2, 0, -2, -2, -4)
    delta = 0
    for i in range(n//4):
        delta += random.choice(fourdeltas)

which gets us all the way down to 0.285 microseconds per flip.

How far can this optimization go? As far as you have patience for, and as large as you want the list of deltas to be. I tested the same idea at five flips precomputed as deltas (i.e., five bits at a time) and got down to 0.232 microseconds per flip. There’s a diminishing return for each additional bit: at five deltas we’re calling the random function 20% as often as the original code, but at four deltas we were already down to 25%, so the improvement isn’t as dramatic. The deltas list also doubles in size with each additional flip, though since it can be precomputed by a program it’s not clear there’s a pragmatic limit. I don’t know if at some point the length of the deltas list makes random.choice execute slower; I don’t think it would, but it depends on implementation details.
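Generating the deltas table for any number of flips is a short helper (make_deltas is my name for it); each bit pattern i encodes one sequence of flips, with each 1 bit counting as a tail:

    def make_deltas(k):
        # delta = heads - tails = k - 2 * (number of 1 bits)
        return tuple(k - 2 * bin(i).count("1") for i in range(2 ** k))

    fivedeltas = make_deltas(5)  # make_deltas(4) reproduces fourdeltas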

For my purposes I was happy at five bits per call to random.choice and 0.232 microseconds per flip, which is more or less five times faster than the original.


Correlation vs. Causation

In case you needed any more evidence of the link between fossil fuel imports (at least, from Norway) and harm to people’s health, I present you:

Lego Build: Porsche GT3

Finally getting around to building this Porsche GT3  lego kit I received as a gift. It’s insane!

More posts later, but for now I just want to post these two pictures:

1) Modeled pistons inside an engine

2) Covered up in the final assembly

So, “you know it’s there, even if you can’t see it”. That is some serious attention to detail!


Later I’ll be gushing about the fact this thing has an actual, functioning, transmission in it.