protlib builds on the struct and SocketServer modules in the standard library to make it easy to implement binary network protocols. It provides support for default and constant struct fields, nested structs, arrays of structs, better handling for strings and arrays, struct inheritance, and convenient syntax for instantiating and using your custom structs.
Here’s an example of defining, instantiating, writing, and reading a struct using file i/o:
from protlib import *

class Point(CStruct):
    x = CInt()
    y = CInt()

p1 = Point(5, 6)
p2 = Point(x=5, y=6)
p3 = Point(y=6, x=5)
assert p1 == p2 == p3

with open("point.dat", "wb") as f:
    f.write( p1.serialize() )

with open("point.dat", "rb") as f:
    p4 = Point.parse(f)

assert p1 == p4
You may use the socket.makefile method to use this file i/o approach for sockets.
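As a rough illustration of the makefile approach, the sketch below sends one packed point across a local socket pair and reads it back through file objects. It uses only the standard library rather than protlib itself; the "!ii" format (two big-endian ints standing in for Point) and the round_trip helper are assumptions of this sketch.

```python
import socket
import struct

# Assumption: two big-endian 32-bit ints stand in for Point(x, y).
POINT_FORMAT = "!ii"
POINT_SIZE = struct.calcsize(POINT_FORMAT)

def round_trip(x, y):
    """Write a packed point to one end of a socket pair and read it from the other."""
    left, right = socket.socketpair()
    try:
        writer = left.makefile("wb")
        reader = right.makefile("rb")
        try:
            writer.write(struct.pack(POINT_FORMAT, x, y))
            writer.flush()  # push the buffered bytes onto the socket
            return struct.unpack(POINT_FORMAT, reader.read(POINT_SIZE))
        finally:
            writer.close()
            reader.close()
    finally:
        left.close()
        right.close()

print(round_trip(5, 6))
```

The explicit flush matters because file objects created by makefile are buffered by default, so the bytes may not reach the socket until the buffer is flushed.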
protlib is free for use under the BSD license. It requires Python 2.6 and will presumably work with Python 2.7, although this hasn’t been tested. It has no other dependencies.
You may click here to download protlib. You may also run easy_install protlib if you have EasyInstall on your system. The project page for protlib in the Cheese Shop (aka the Python Package Index or PyPI) may be found here.
You may also check out the development version of protlib with this command:
svn checkout http://courtwright.org/svn/protlib
This is the root class of all classes representing C data types in the protlib library. It may not be directly instantiated; you must always use one of its subtypes instead. There are three optional arguments which you may pass to a CType:
Accepts either a string or a file-like object (anything with a read method) and returns a Python object with the appropriate value.
>>> raw = "\x00\x00\x00\x05"
>>> i = CInt().parse(raw)
>>> assert i == 5
Note that unlike the struct module, strings are stripped of trailing null bytes when they’re parsed. For example:
>>> raw = "foo\x00\x00"
>>> import struct
>>> s = struct.unpack("5s", raw)[0]
>>> assert s == "foo\x00\x00"
>>>
>>> from protlib import *
>>> s = CString(length = 5).parse(raw)
>>> assert s == "foo"
Note that this is a classmethod on subclasses of CStruct.
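To make the two behaviours above concrete (accepting either raw data or a file-like object, and stripping trailing null bytes), here is a minimal standard-library sketch. The read_fixed_string helper is hypothetical and is not part of protlib; it is written with byte strings so that it also runs on Python 3.

```python
import io
import struct

def read_fixed_string(source, length):
    """Read a char[length] field from bytes or a file-like object, dropping trailing NULs."""
    # Wrap raw bytes so that both kinds of input expose a read() method.
    stream = source if hasattr(source, "read") else io.BytesIO(source)
    raw = stream.read(length)
    if len(raw) < length:
        raise ValueError("needed %d bytes, got %d" % (length, len(raw)))
    (value,) = struct.unpack("%ds" % length, raw)
    return value.rstrip(b"\x00")

print(read_fixed_string(b"foo\x00\x00", 5))
print(read_fixed_string(io.BytesIO(b"foo\x00\x00"), 5))
```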
Because protlib is built on top of the struct module, each basic data type in protlib uses a struct format string. The list of struct format strings may be found here and the protlib types which use them are listed below:
C data type | protlib class | struct format string |
---|---|---|
char | CChar | b |
unsigned char | CUChar | B |
short | CShort | h |
unsigned short | CUShort | H |
int | CInt | i |
unsigned int | CUInt | I |
long | CLong | l |
unsigned long | CULong | L |
float | CFloat | f |
double | CDouble | d |
char[] | CString | Xs (e.g. 5s for char[5]) |
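The format strings in this table can be explored directly with the struct module. The sketch below prints the packed size of each one; the "!" prefix (network byte order, standard sizes) is an assumption made here because the parse example earlier reads "\x00\x00\x00\x05" as 5, which is consistent with big-endian data.

```python
import struct

# (C data type, struct format string) pairs taken from the table above.
FORMATS = [
    ("char", "b"), ("unsigned char", "B"),
    ("short", "h"), ("unsigned short", "H"),
    ("int", "i"), ("unsigned int", "I"),
    ("long", "l"), ("unsigned long", "L"),
    ("float", "f"), ("double", "d"),
    ("char[5]", "5s"),
]

# Assumption: "!" selects network (big-endian) byte order with standard sizes.
sizes = {c_type: struct.calcsize("!" + fmt) for c_type, fmt in FORMATS}
for c_type, fmt in FORMATS:
    print("%-15s %-3s %d bytes" % (c_type, fmt, sizes[c_type]))

packed_five = struct.pack("!i", 5)
print(repr(packed_five))
```

With standard sizes, int and long both occupy 4 bytes regardless of the platform's native C types.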
You can make an array of any CType. Arrays pack and unpack to and from Python lists. For example:
>>> ca = CArray(5, CInt)
>>> raw = ca.serialize( [5,6,7,8,9] )
>>> xs = ca.parse(raw)
>>> assert xs == [5,6,7,8,9]
Arrays may either be given default/always values themselves or use the default/always values of the CType they are given. For example:
>>> class Triangle(CStruct):
...     xcoords = CArray(3, CInt(default=0))
...     ycoords = CArray(3, CInt, default=[0,0,0])
...
>>> tri = Triangle()
>>> assert tri.xcoords == tri.ycoords == [0,0,0]
This should never be instantiated directly. Instead, you should subclass this when defining a custom struct. Your subclass will be given a constructor which takes the fields of your struct as positional and/or keyword arguments. However, you don’t have to provide values for your fields at this time. For example:
>>> class Point(CStruct):
...     x = CInt()
...     y = CInt()
...
>>> p1 = Point(5, 6)
>>> p2 = Point()
>>> p2.x = 5
>>> p2.y = 6
>>> assert p1 == p2
Returns a literal representation of the CStruct. For example:
>>> class Point(CStruct):
...     x = CInt()
...     y = CInt()
...
>>> p = Point(x=5, y=6)
>>> p
Point(x=5, y=6)
Returns an object which may be used to declare a CStruct as a field in another CStruct. This accepts the same default and always parameters as the CType constructor. For example:
>>> class Point(CStruct):
...     x = CInt()
...     y = CInt()
...
>>> class Vector(CStruct):
...     p1 = Point.get_type()
...     p2 = Point.get_type(default = Point(0,0))
...
>>> v = Vector(p1 = Point(5,6))
Warning
The order of struct fields is defined by the order in which the CType subclasses for those fields were instantiated. In other words, if you say
from protlib import *
y_field = CInt()
x_field = CInt()
class Point(CStruct):
    x = x_field
    y = y_field
then when you serialize your struct, the y field will come before the x field because its CInt value was instantiated first. Similarly, if you say
from protlib import *
class Point(CStruct):
    x = y = CInt()
then the order of the x and y fields is undefined since they share the same CInt instance. In this second case, a warning will be raised, but the first case is not automatically detected by the protlib library.
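One common way to get this kind of ordering is a counter that stamps each field object when it is created. The sketch below shows that idea in isolation; Field, creation_index, and ordered_field_names are hypothetical names, and this is not necessarily how protlib implements it.

```python
import itertools

class Field(object):
    """Hypothetical field type that remembers when it was instantiated."""
    _counter = itertools.count()

    def __init__(self):
        self.creation_index = next(Field._counter)

def ordered_field_names(struct_class):
    """Return the Field attribute names of a class, sorted by instantiation order."""
    fields = [(value.creation_index, name)
              for name, value in vars(struct_class).items()
              if isinstance(value, Field)]
    return [name for _, name in sorted(fields)]

# Mirrors the first case in the warning: y_field is created before x_field.
y_field = Field()
x_field = Field()

class Point(object):
    x = x_field
    y = y_field

print(ordered_field_names(Point))  # ['y', 'x']
```

Under this scheme two attributes that share one Field instance also share one index, so nothing distinguishes their positions, which matches the second case in the warning.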
protlib also provides a convenient framework for implementing servers which receive and return CStruct objects. This makes it easy to implement custom binary protocols in which structs are passed back and forth over socket connections. This is based on the SocketServer module in the Python standard library.
In order to use these examples, you must do only two things.
Let’s walk through a simple example. We’ll define several structs to represent geometric concepts: a Point, a Vector, and a Rectangle. Each of these structs is a message which can be sent between the client and server. We’ll also define a variable-length message called PointGroup; this struct contains the number of Point messages which immediately follow the PointGroup struct in the message.
Note that the first field in each of these messages is a constant value that uniquely identifies the message.
This entire example can be found in the examples/geocalc directory. Here’s the common.py file, which is imported by both the server.py and client.py programs:
import sys
sys.path.append("../..")

from protlib import *

SERVER_ADDR = ("127.0.0.1", 12321)

class Point(CStruct):
    code = CShort(always = 1)
    x = CFloat()
    y = CFloat()

class Vector(CStruct):
    code = CShort(always = 2)
    p1 = Point.get_type()
    p2 = Point.get_type()

class PointGroup(CStruct):
    code = CShort(always = 3)
    count = CInt()

class Rectangle(CStruct):
    code = CShort(always = 4)
    points = CArray(4, Point)
For our server, we define a handler class with a handler method for each message we wish to accept. The name of each handler method should be the name of the message class in lower case with the words separated by underscores. For example, the Vector class is handled by the vector method, and the PointGroup class is handled by the point_group method. Each of these handler methods takes a single parameter other than self which is the actual message read and parsed from the socket.
Here’s the server.py file which uses our subclasses of the SocketServer module classes to accept and handle incoming messages:
from common import *

from math import sqrt

class Handler(TCPHandler):
    LOG_TO_SCREEN = True

    def vector(self, v):
        """returns the mid-point of the line segment"""
        return Point(x = (v.p1.x + v.p2.x) / 2,
                     y = (v.p1.y + v.p2.y) / 2)

    def rectangle(self, r):
        """returns the endpoint closest to the origin"""
        dists = [(sqrt(p.x**2 + p.y**2), p) for p in r.points]
        return min(dists)[1]

    def point_group(self, pg):
        """returns a rectangle which encompasses all points in the group"""
        points = [Point.parse(self.rfile) for i in range(pg.count)]
        xmin = min(p.x for p in points)
        xmax = max(p.x for p in points)
        ymin = min(p.y for p in points)
        ymax = max(p.y for p in points)
        return Rectangle(points=[
            Point(x=xmin, y=ymin), Point(x=xmin, y=ymax),
            Point(x=xmax, y=ymin), Point(x=xmax, y=ymax)
        ])

server = LoggingTCPServer(SERVER_ADDR, Handler)
if __name__ == "__main__":
    server.serve_forever()
To test this server, we have a simple client which sends a series of messages to the server and then reads back the responses, logging everything with our protlib.Logger class. Here’s our client.py script:
from common import *

import socket  # imported explicitly rather than relying on the star import
from random import randrange

def rand_point():
    return Point(x=randrange(100), y=randrange(100))

logger = Logger(also_print = True)
parser = Parser(logger = logger)

sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("r+b", bufsize=0)

vec = Vector(p1=rand_point(), p2=rand_point())
logger.log_and_write(f, vec)
pt = parser.parse(f)
assert vec.p1.x < pt.x < vec.p2.x or vec.p1.x > pt.x > vec.p2.x
assert vec.p1.y < pt.y < vec.p2.y or vec.p1.y > pt.y > vec.p2.y

rect = Rectangle(points=[Point(x=1, y=1),
                         Point(x=1, y=5),
                         Point(x=5, y=1),
                         Point(x=5, y=5)])
logger.log_and_write(f, rect)
pt = parser.parse(f)
assert pt.x == pt.y == 1

logger.log_and_write(f, PointGroup(count=10))
for i in range(10):
    logger.log_and_write(f, rand_point())
rect = parser.parse(f)
assert rect.code == Rectangle.code.always

sock.close()
Our server does all of our logging automatically, but we need to manually invoke the logger on the client. The logs created and their format are explained below.
If you use the LoggingTCPServer and LoggingUDPServer classes, then everything is logged for us. There are 4 logs created, a struct_log, a raw_log, an error_log, and a stack_log. The log prefix is the name of the script being executed. So if we’re executing server.py then our log files will be server.struct_log, server.raw_log, etc. All logs are opened in append mode.
Each log message contains a timestamp followed by a unique identifier which indicates the specific message being received. This makes it easy to match the log messages in the different files to one another, since the unique message identifier will be present in each of the 4 logs.
Here’s a description of each log:
This contains the literal representation of each request and response, for example:
2010-01-14 11:14:37.771015 (1263485677_0): received Vector(code=2, p1=Point(code=1, x=31.0, y=24.0), p2=Point(code=1, x=12.0, y=52.0))
2010-01-14 11:14:37.771170 (1263485677_0): sending Point(code=1, x=21.5, y=38.0)
This is convenient because the structs are logged with the Python code which represents them. Therefore we can paste them directly into a Python command prompt to inspect and play around with them:
>>> from common import *
>>> p = Point(code=1, x=21.5, y=38.0)
>>> p
Point(code=1, x=21.5, y=38.0)
This contains the raw data in the form of a Python string of each request and response, for example:
2010-01-14 11:14:37.770419 (1263485677_0): received '\x00\x02\x00\x01A\xf8\x00\x00A\xc0\x00\x00\x00\x01A@\x00\x00BP\x00\x00'
2010-01-14 11:14:37.771247 (1263485677_0): sending '\x00\x01A\xac\x00\x00B\x18\x00\x00'
This is convenient because we can paste these strings into a Python command prompt and play around with them. If they are valid then we can parse them into structs, and if they aren’t then we can examine exactly why; this log will always log what we receive even in the case of unparsable binary data:
>>> from common import *
>>> s = '\x00\x01A\xac\x00\x00B\x18\x00\x00'
>>> p = Point.parse(s)
>>> p
Point(code=1, x=21.5, y=38.0)
>>>
>>> s = "bad"
>>> Point.parse(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "protlib.py", line 230, in parse
    return cls.get_type(cached=True).parse(f)
  File "protlib.py", line 141, in parse
    raise CError("{0} requires {1} bytes and was only given {2} ({3!r})".format(self.subclass.__name__, self.sizeof, len(buf), buf))
protlib.CError: Point requires 10 bytes and was only given 3 ('bad')
>>>
>>> s = "invalid but with enough data"
>>> p = Point.parse(s)
../../protlib.py:148: CWarning: code should always be 1
  warn("{0} should always be {1}".format(name, ctype.always), CWarning)
>>> p
Point(code=26990, x=1.1430328245747994e+33, y=1.1834294514326081e+22)
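The logged bytes above can also be decoded with the struct module alone, which is a handy cross-check when protlib is not at hand. This sketch assumes that a Point is laid out as a big-endian short followed by two floats ("!hff"); that layout is inferred from the 10-byte size and the logged values, not taken from protlib's source.

```python
import struct

# Assumption: code (short), x (float), y (float) in network byte order.
POINT_FORMAT = "!hff"
size = struct.calcsize(POINT_FORMAT)

# The "sending" entry from the raw_log example.
logged = b"\x00\x01A\xac\x00\x00B\x18\x00\x00"
code, x, y = struct.unpack(POINT_FORMAT, logged)
print(code, x, y)  # 1 21.5 38.0

# Any 10 bytes unpack without error; only the constant code field reveals the problem.
garbage = b"invalid but with enough data"
bad_code = struct.unpack(POINT_FORMAT, garbage[:size])[0]
print(bad_code)  # 26990
```

The value 26990 is simply the bytes "in" read as a big-endian short (0x696e), which is why the session above warns that code should always be 1.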
The user does not instantiate this class or any of its subclasses directly. Instead, you declare your own handler class which subclasses either TCPHandler or UDPHandler, which are themselves subclasses of ProtHandler. They also extend the StreamRequestHandler and DatagramRequestHandler classes of the SocketServer module, respectively.
Here are all fields and methods which the user is expected to call and/or override:
By default, your handler will detect all messages present in the same module where the handler class itself is defined. So you can either define your handler in the same module where your structs are defined, or you can import those structs into the handler module. This is the recommended way to integrate your handlers with your struct definitions.
However, you may instead set the STRUCT_MOD field to the module where the structs are declared. (Technically this can be anything with __dict__ and __name__ fields.) You may also set this to a string which is the name of the module where they are declared. For example:
import module_with_structs

class SomeHandler(TCPHandler):
    STRUCT_MOD = module_with_structs
    # handler methods go here

class AnotherHandler(UDPHandler):
    STRUCT_MOD = "module_with_structs"
    # handler methods go here
This is False by default, but if set to True, the raw_log will contain a nicely formatted hex dump of the binary data sent and received. For example:
2010-01-14 13:47:37.237310 (1263494857_2): sending '\x00\x04\x00\x01A\x98\x00\x00AP\x00\x00\x00\x01A\x98\x00\x00B\xae\x00\x00\x00\x01B\xc2\x00\x00AP\x00\x00\x00\x01B\xc2\x00\x00B\xae\x00\x00'
     0  1  2  3  4  5  6  7
 0  00 04 00 01 41 98 00 00
 8  41 50 00 00 00 01 41 98
16  00 00 42 ae 00 00 00 01
24  42 c2 00 00 41 50 00 00
32  00 01 42 c2 00 00 42 ae
40  00 00
These are best set where your custom handler class is defined, for example:
class Handler(TCPHandler):
    HEX_LOGGING = LOG_TO_SCREEN = True
    # handler methods would go here
Anything you return from a handler method is sent back to the client, whether it’s a struct or just binary data in a string. However, sometimes you may need to send multiple messages back to the client. You can manually concatenate the binary data strings, or you can use the reply method, for example:
class RepeatRequest(CStruct):
    code = CShort(always = 1)
    name = CString(length = 25)
    repetitions = CInt()

class Handler(TCPHandler):
    def repeat_request(self, rr):
        # send one greeting per requested repetition
        for i in range(rr.repetitions):
            self.reply("Hello " + rr.name + "!\n")
A logging object which creates the 4 logs listed above. If prefix is omitted, then all log messages are printed to standard output and no files are created.
If you use the logging servers listed above then the logging is done automatically, but if you’re implementing a client program, then you can use this class to perform the same type of logging.
Logs the repr of an instance of a CStruct subclass to the struct_log.
Logs the repr of the packed binary data to the raw_log. If hex logging is enabled, this will also log a nicely formatted table of the hexadecimal values of this data immediately after the log message.
If you know what struct you want, then you can use the CStruct.parse classmethod to read an instance of that struct from a file, e.g. p = Point.parse(f). However, in some cases you want to read some data from a file or socket but aren’t sure what message is coming across. This class’s parse method figures out which message is being read and returns an instance of the correct struct.
This method accepts a string or file and returns an instance of the struct it reads from that string/file. If the data it finds cannot be parsed into a struct, then it just returns all of the data it is able to read. This may be an empty string if no data is available.
None will be returned in the case of an incomplete message. In this case a message will be written to the error_log if a logger was provided.
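The general idea of picking the right struct from incoming data can be sketched with the struct module: read the constant leading field, look it up, then read the rest of that message. MESSAGES, CODE_FORMAT, and parse_next are hypothetical names, the layouts mirror the geocalc example, and the error handling is a simplification rather than a copy of the Parser class.

```python
import io
import struct

CODE_FORMAT = "!h"  # assumption: every message starts with a big-endian short
CODE_SIZE = struct.calcsize(CODE_FORMAT)

# Hypothetical registry: leading code -> (message name, format of the remaining fields).
MESSAGES = {
    1: ("Point", "!ff"),
    3: ("PointGroup", "!i"),
}

def parse_next(stream):
    """Read one message from a file-like object.

    Returns (name, fields) for a known message, the raw bytes read so far
    for an unknown code, or None when the message is incomplete.
    """
    header = stream.read(CODE_SIZE)
    if len(header) < CODE_SIZE:
        return None
    (code,) = struct.unpack(CODE_FORMAT, header)
    if code not in MESSAGES:
        return header
    name, body_format = MESSAGES[code]
    body = stream.read(struct.calcsize(body_format))
    if len(body) < struct.calcsize(body_format):
        return None
    return name, struct.unpack(body_format, body)

stream = io.BytesIO(struct.pack("!hff", 1, 21.5, 38.0) + struct.pack("!hi", 3, 10))
print(parse_next(stream))
print(parse_next(stream))
print(parse_next(stream))
```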
Many binary protocols have many message types, but every message has exactly the same fields, even if those fields have different lengths. It would be annoying if you had to write a bunch of mostly-identical struct definitions, so protlib lets you subclass your custom structs and only override the fields which are different in some way, such as having a default value in some subclasses but not others, etc.
Let’s walk through a simple example, which is available in the examples/struct_inheritance directory. First, we define our messages in common.py:
from random import randrange
import sys
sys.path.append("../..")

from protlib import *
from datetime import datetime  # imported explicitly for the timestamp default below

SERVER_ADDR = ("127.0.0.1", 5665)

class Message(CStruct):
    code = CInt()
    timestamp = CString(length=20, default=lambda: datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    comment = CString(length=100, default="")
    params = CArray(20, CInt(default=0))

class ErrorMessage(Message): code = CInt(always = 0)
class CCRequest(Message): code = CInt(always = 1)
class CCResponse(Message): code = CInt(always = 2)
class ZipRequest(Message): code = CInt(always = 3)
class ZipResponse(Message): code = CInt(always = 4)
In this case we have a standard message format, and the only thing that varies is the value of the code field, so we need only specify that field in our subclasses. If we needed to override other fields, we could do so in any order; the order of fields would remain as however they were declared in the parent class.
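The ordering rule for overridden fields can be pictured as a merge in which the parent decides the positions and the subclass only swaps in new definitions. The sketch below models fields as (name, struct format) pairs; merge_fields and message_size are hypothetical helpers, and the sketch only covers overriding existing fields.

```python
import struct

def merge_fields(parent_fields, overrides):
    """Combine parent (name, spec) pairs with a dict of subclass overrides.

    The parent's order is kept; an override only replaces the spec.
    """
    unknown = set(overrides) - {name for name, _ in parent_fields}
    if unknown:
        # This sketch does not model subclasses that add brand-new fields.
        raise KeyError("not fields of the parent: %s" % sorted(unknown))
    return [(name, overrides.get(name, spec)) for name, spec in parent_fields]

def message_size(fields):
    """Packed size of the fields in network byte order."""
    return struct.calcsize("!" + "".join(spec for _, spec in fields))

# Layout modelled on the Message struct above.
MESSAGE_FIELDS = [("code", "i"), ("timestamp", "20s"), ("comment", "100s"), ("params", "20i")]

# Overrides are given in a different order than the parent declares them.
variant = merge_fields(MESSAGE_FIELDS, {"comment": "50s", "code": "h"})
print([name for name, _ in variant])
print(message_size(MESSAGE_FIELDS), message_size(variant))
```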
Since these messages all have different constant values in their first field, we can write a normal handler class in our server.py:
from common import *

def credit_card_lookup(ssn):
    if ssn != [0] * 9:
        return [randrange(10) for i in range(12)]

def zip_lookup(ssn):
    if ssn != [0] * 9:
        return [randrange(10) for i in range(5)]

class Handler(TCPHandler):
    LOG_TO_SCREEN = True

    def cc_request(self, ccr):
        """return the credit card number of the person with the given SSN"""
        ssn = ccr.params[:9]
        cc_num = credit_card_lookup(ssn)
        if cc_num:
            return CCResponse(params = cc_num)
        else:
            return ErrorMessage(params=ssn, comment="No matching SSN")

    def zip_request(self, zr):
        """return the zip code of the person with the given SSN"""
        ssn = zr.params[:9]
        zip_code = zip_lookup(ssn)
        if zip_code:
            return ZipResponse(params = zip_code)
        else:
            return ErrorMessage(params=ssn, comment="No matching SSN")

server = LoggingTCPServer(SERVER_ADDR, Handler)
if __name__ == "__main__":
    server.serve_forever()
Since our handler can return different types of messages depending on whether our lookup was successful, our client.py uses the Parser class to parse all incoming messages:
from common import *

import socket  # imported explicitly rather than relying on the star import

logger = Logger(also_print = True)
parser = Parser(logger = logger)

def rand_ssn():
    return [randrange(10) for i in range(9)]

sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("r+b", bufsize=0)

logger.log_and_write(f, CCRequest(params=rand_ssn()))
ccresp = parser.parse(f)
assert ccresp.code == CCResponse.code.always

logger.log_and_write(f, ZipRequest(params=rand_ssn()))
zresp = parser.parse(f)
assert zresp.code == ZipResponse.code.always

logger.log_and_write(f, ZipRequest())
err = parser.parse(f)
assert err.code == ErrorMessage.code.always

sock.close()
This is the function used to convert between camelCased and separated_with_underscores names. Pass it a string and it returns an all-lower-case string with underscores inserted in the appropriate places. You never have to call this function yourself, but you can use it as a test if you’re unsure of the correct handler method name for one of your CStruct classes. To make it even clearer, here are some examples:
SomeStruct -> some_struct
SSNLookup -> ssn_lookup
RS485Adaptor -> rs485_adaptor
Rot13Encoded -> rot13_encoded
RequestQ -> request_q
John316 -> john316
If your struct names are already lower case then this function will just return the original string, whether or not you are already using underscores. So the rs485adaptor struct would be handled by the rs485adaptor handler method, and the rot13_encoded struct would be handled by the rot13_encoded handler method, etc.
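For readers who want to predict handler names without importing protlib, the conversion can be approximated with two regular expressions. The to_handler_name function below is a hypothetical stand-in, not protlib's own implementation, though it reproduces every example listed above.

```python
import re

def to_handler_name(class_name):
    """Hypothetical helper: convert a CamelCase class name to a handler method name."""
    # Split before a capitalized word that follows any character: "SSNLookup" -> "SSN_Lookup".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    # Split between a lower-case letter or digit and a capital: "RequestQ" -> "Request_Q".
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()

for name in ["SomeStruct", "SSNLookup", "RS485Adaptor", "Rot13Encoded", "RequestQ", "John316"]:
    print("%s -> %s" % (name, to_handler_name(name)))
```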
Takes a string and returns a string containing a nicely formatted table of the hexadecimal values of the data in that string. For example:
>>> from protlib import *
>>> print hexdump("Hello World!")
     0  1  2  3  4  5  6  7
 0  48 65 6c 6c 6f 20 57 6f
 8  72 6c 64 21
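A table like this takes only a few lines to produce. The sketch below is a hypothetical hexdump_rows helper, not protlib's hexdump; it returns the offset and hex text for each row and leaves the exact column layout to the caller.

```python
def hexdump_rows(data, width=8):
    """Return (offset, hex string) pairs, `width` bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        # bytearray yields integers on both Python 2 and Python 3.
        chunk = bytearray(data[offset:offset + width])
        rows.append((offset, " ".join("%02x" % byte for byte in chunk)))
    return rows

for offset, hex_bytes in hexdump_rows(b"Hello World!"):
    print("%2d  %s" % (offset, hex_bytes))
```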
When protlib is imported, it checks whether anyone has set a default socket timeout with the socket.setdefaulttimeout method and if a default does not already exist, it sets the timeout to this value, which is 5 seconds.
If you completely unset the default timeout, this value will still be used in calls to select by the TCPHandler class. However, if you set your own default timeout value, that value will be used.
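The import-time check described above amounts to setting a default only when nobody else has. A minimal sketch using the standard socket functions follows; FALLBACK_TIMEOUT and ensure_default_timeout are hypothetical names, and the value 5 is the one stated above.

```python
import socket

FALLBACK_TIMEOUT = 5  # seconds; hypothetical name for the value described above

def ensure_default_timeout(fallback=FALLBACK_TIMEOUT):
    """Set a process-wide socket timeout only if none has been chosen yet."""
    if socket.getdefaulttimeout() is None:
        socket.setdefaulttimeout(fallback)
    return socket.getdefaulttimeout()

print(ensure_default_timeout())
```

Because the check runs only once at import time in this scheme, a timeout set before the import is preserved, which matches the behaviour described above.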