Building an HTTP/POST request/response protocol

In the previous post, Python Simple Client/Server Socket Communication Module, I began exploring the use of python's http.server module to build a simple HTTP server as a framework for a request/response protocol that could invoke a remote function and return some results to the client.

My goal is to be able to write code like this:

# ON A SERVER MACHINE
def foo(input_dictionary):
  results = ... do something with input_dictionary
  return results

server = MakeServer(port=12345, func=foo)

# ON A CLIENT MACHINE
client = MakeClient(port=12345)

input_dictionary = { something ... }
results = client.command(input_dictionary)

and essentially have a trivial remote procedure call mechanism allowing the client code to invoke the foo() function on the given input.

We're going to build this using python's http.server module. The client will POST to "/" (indeed, we are going to ignore the path parameter entirely), and the body of the POST will be a JSON-encoded python object. Rather than having to parse a "data length" line or anything like that ourselves, the HTTP protocol handles that part for us; all we have to do is pluck the Content-Length value out of the HTTP headers and then read and decode the posted body ourselves.

Expanding on the do_GET example from my previous post, here’s a do_POST server function:

from http.server import HTTPServer
from http.server import BaseHTTPRequestHandler
from http.server import HTTPStatus
import json

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # the Content-Length header tells us how many bytes of posted data to read
        datalen = int(self.headers['Content-Length'])
        data = self.rfile.read(datalen)
        # note: data is a bytes object here; more on that below
        obj = json.loads(data)
        print("Got object: {}".format(obj))
        self.send_response(HTTPStatus.OK)
        self.end_headers()

server = HTTPServer(('', 12345), MyRequestHandler)
server.serve_forever()

This simple code needs more error checking. There is also an implicit conversion between 8-bit over-the-wire bytes and python's internal string representation (full Unicode) hiding inside the json.loads call. In fact, if you are running a version of python older than 3.6 you will get an error from that line, complaining about the distinction between a python bytes object (returned by read) and a python string (expected by json.loads). So let's fix that before going further.

In the good old days, all computing was done in English and the 7-bit ASCII character set was good enough for everyone. Every character fit into one byte, so bytes and strings could be treated as more or less interchangeable things. Obviously, those days are long gone. Even if you say you don't care about Japanese/Chinese/etc. speakers and their ability to send their own characters (which would be a mistake, of course), even English users demand full Unicode support, if for no other reason than to be able to put smiley faces and other emojis into their data. Unicode 😀❤️🐳💡🎉 Happens!

Starting with python 3, Unicode is the native format for strings. The good part is that all your code automatically works with all of those character types. The bad part is that you have to be cognizant of how Unicode (more than 8 bits per character) interfaces with the parts of the world that operate on 8-bit bytes, such as TCP streams. Arguably this "bad" thing is actually a "good" thing, as it forces you to make your code work for everyone, even people who write in something other than plain ASCII.

So to see Unicode in action in python, try this:

s = '\U0001F600'
#     ^ **NOTE** that's an uppercase U
print("s = /{}/ and len(s) = {}".format(s, len(s)))

This will show you:

s = /😀/ and len(s) = 1

Note that len(s) is 1: however many bytes it may take to transmit, the emoji is a single character (a single Unicode code point) as far as python is concerned.

It’s beyond the scope of this post to explain why we usually consider “native Unicode” to be an internal representation of characters and use a different external representation when sending Unicode strings “over the wire” in a protocol. I am just pointing out that this is something we have to do – pick an encoding method and use it properly on both ends.

The most common standard encoding for applications like this is called utf-8. It has the advantage that the first 128 ASCII characters (all the "good old days" characters) are still encoded the same way, as a single byte, which tends to increase interoperability with naive/old programs that are not Unicode-enabled. In fact, this is one of the "beyond the scope of this post" reasons to encode characters rather than send everything in its 32-bit raw Unicode glory.
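You can see this single-byte property directly in the interpreter: a plain ASCII character encodes to one byte under utf-8, while the emoji from earlier takes four:

>>> 'A'.encode('utf-8')
b'A'
>>> len('\U0001F600'.encode('utf-8'))
4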

So we are going to convert our internal strings into a python bytes object on one side, and back on the other. A bytes object is an iterable sequence of 8-bit integers (0 .. 255), and has a decode method for converting that sequence of bytes into a Unicode string. For example:

>>> letterA = bytes([65]).decode('utf-8')
>>> print(letterA)
A

What this is showing you is that a single byte, with value 65, when decoded using the ‘utf-8’ encoding, becomes a string of one character, an uppercase A. This is demonstrating a property of utf-8, namely that the original (“good old days”) 8-bit ASCII character values are encoded as themselves. Other characters are encoded using multiple bytes, so for example:

>>> jpnhouse = bytes([229, 174, 182]).decode('utf-8')
>>> print(jpnhouse)
家

This demonstrates that the three-byte sequence [229, 174, 182], when decoded using 'utf-8', becomes a single character that is (I think) the word "house" in Japanese.
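The encode method goes the other direction, and you can see exactly those three byte values come back out:

>>> list('家'.encode('utf-8'))
[229, 174, 182]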

We don't really need to understand encodings in any depth; we just need to know the encode/decode steps are there, have to be performed, and have to use the same encoding on both sides of the wire. Starting in python 3.6, the json.loads function will accept a bytes object and do an implicit utf-8 decode for you. This is why the first example code given above "works" if you are running python 3.6 or later, but fails with a complaint about strings versus bytes on earlier versions. I think it is better practice for us to make that step explicit, which also has the benefit of making the example code work on older versions of python 3.

With the explicit decode step the code becomes:

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        data_bytes = self.rfile.read(datalen)
        data_str = data_bytes.decode('utf-8')
        obj = json.loads(data_str)
        print("Got object: {}".format(obj))
        self.send_response(HTTPStatus.OK)
        self.end_headers()

This still isn’t sending any response (other than the “OK” HTTP code) so let’s add that. This revised handler takes out the print on the server side and wraps the received object inside another dictionary {'echo': obj} and then sends that back to the client as the response data:

class MyRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        obj = json.loads(
            self.rfile.read(datalen).decode('utf-8'))
        # wrap the received object in {'echo': ...} and encode the reply
        rslt_str = json.dumps({'echo': obj})
        rslt_bytes = rslt_str.encode('utf-8')
        self.send_response(HTTPStatus.OK)
        self.end_headers()
        self.wfile.write(rslt_bytes)

This works, but before putting it into production it should probably be enhanced in several ways. It leaves out several HTTP headers in the response; we should probably fill in Content-Type ('application/json' would be appropriate) and Content-Length. It turns out Content-Length isn't strictly needed, because the default response format is HTTP/1.0, which delimits the body simply by closing the stream at the end. But we might want to use HTTP/1.1 in the response and include a Content-Length, which would also allow for persistent connections (not supported under HTTP/1.0). Most of these elaborations are left as an exercise for the reader, in consultation with the http.server module documentation; I made many of these improvements in the code that I will also be posting on github (TBD at the time of writing this posting).
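For concreteness, here is a minimal sketch of what those enhancements might look like (my illustration, not necessarily what will end up on github). Setting the protocol_version class attribute to 'HTTP/1.1' switches the response format; once we do that, an accurate Content-Length becomes mandatory:

class MyRequestHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 allows persistent connections, but in return every
    # response we send must carry an accurate Content-Length.
    protocol_version = 'HTTP/1.1'

    def do_POST(self):
        datalen = int(self.headers['Content-Length'])
        obj = json.loads(
            self.rfile.read(datalen).decode('utf-8'))
        rslt_bytes = json.dumps({'echo': obj}).encode('utf-8')
        self.send_response(HTTPStatus.OK)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(rslt_bytes)))
        self.end_headers()
        self.wfile.write(rslt_bytes)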

With this primitive server we have enough framework to use curl to fire commands at a server and have it invoke some function (in this case hardwired into “encapsulate the object and return it”) and return results to the client. This works really well with not very many lines of code!
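For example, with the echo server running, a curl one-liner along these lines posts a JSON object and prints the wrapped response (these are standard curl flags, though exact behavior may vary by version):

curl -X POST -H 'Content-Type: application/json' -d '{"x": 1}' http://localhost:12345/

which should print {"echo": {"x": 1}}.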

Let's build an explicit client instead of using curl (although being able to use curl does demonstrate one of the advantages of picking a standard transport protocol such as HTTP/POST). Here is a bare-bones test request function:

from http.client import HTTPConnection
import json

def testrq(obj):
    # connect to the (hardwired) server address
    c = HTTPConnection('localhost', 12345)
    c.connect()
    # JSON-encode the object and POST it as the request body
    encoded = json.dumps(obj).encode('utf-8')
    c.request("POST", "/", body=encoded, headers={})
    # read the response body and decode it back into a python object
    response_bytes = c.getresponse().read()
    response_string = response_bytes.decode('utf-8')
    return json.loads(response_string)

This connects (to a hardwired localhost:12345), sends whatever object (obj) you provide, gets the response (see the http.client / HTTPConnection documentation), and decodes it as a JSON object. Obviously all semblance of generalization and error checking has been omitted here. But it works.
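With the echo server from earlier running, a quick interactive check would look something like this:

>>> testrq({'hello': 'world'})
{'echo': {'hello': 'world'}}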

Now that we know how to send arbitrary python objects back and forth between client and server, we can work on building a real, but still simple, generalized framework for all of this. That will be the next post.