Parsley Tutorial Part III: Parsing Network Data

This tutorial assumes basic knowledge of writing Twisted TCP clients or servers.

Basic parsing

Parsing data that arrives over the network can be difficult because there is no guarantee of receiving whole messages. Buffering is often complicated by protocols that switch between fixed-width messages and delimiters for framing. Fortunately, Parsley can remove all of this tedium.

With parsley.makeProtocol(), Parsley can generate a Twisted IProtocol-implementing class which will match incoming network data using Parsley grammar rules. Before getting started with makeProtocol(), let’s build a grammar for netstrings. The netstrings protocol is very simple:

4:spam,4:eggs,

This stream contains two netstrings: spam and eggs. Each netstring’s data is prefixed with its length, written as one or more ASCII digits, followed by a :, and suffixed with a ,. So, a Parsley grammar to match a netstring would look like:

nonzeroDigit = digit:x ?(x != '0')
digits = <'0' | nonzeroDigit digit*>:i -> int(i)

netstring = digits:length ':' <anything{length}>:string ',' -> string
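
To make the grammar concrete, here is a hand-written Python function that accepts the same input as the netstring rule. It is only an illustrative sketch: parse_netstring is a hypothetical name, not part of Parsley, and unlike the ParserProtocol described below it expects a whole message to be present already.

```python
ASCII_DIGITS = '0123456789'


def parse_netstring(data):
    """Parse one netstring from the front of `data`.

    Returns (payload, remainder). Raises ValueError if the input is
    malformed or incomplete.
    """
    length_text, colon, rest = data.partition(':')
    if not colon:
        raise ValueError('missing ":" after the length')
    # Mirrors `digits`: a lone '0', or digits with no leading zero.
    if not length_text or any(c not in ASCII_DIGITS for c in length_text):
        raise ValueError('length must be one or more ASCII digits')
    if length_text != '0' and length_text.startswith('0'):
        raise ValueError('length must not have leading zeros')
    length = int(length_text)
    # Mirrors `<anything{length}>:string ','`.
    if len(rest) < length + 1:
        raise ValueError('incomplete netstring')
    if rest[length] != ',':
        raise ValueError('missing "," after the data')
    return rest[:length], rest[length + 1:]


payload, remainder = parse_netstring('4:spam,4:eggs,')
```

Here payload is 'spam' and remainder is '4:eggs,', which a second call would parse in the same way.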

makeProtocol() takes, in addition to a grammar, a factory for a “sender” and a factory for a “receiver”. In the system of objects managed by the ParserProtocol, the sender is in charge of writing data to the wire, and the receiver has methods called on it by the Parsley rules. To demonstrate it, here is the final piece needed in the Parsley grammar for netstrings:

receiveNetstring = netstring:string -> receiver.netstringReceived(string)

The receiver is always available in Parsley rules with the name receiver, allowing Parsley rules to call methods on it.

When data is received over the wire, the ParserProtocol tries to match the received data against the current rule. If the current rule requires more data to finish matching, the ParserProtocol stops and waits until more data comes in, then tries to continue matching. This repeats until the current rule is completely matched, and then the ParserProtocol starts matching any leftover data against the current rule again.
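
The buffering behaviour described above can be pictured with a small hand-written parser. This is only a sketch of the idea, not Parsley’s implementation: BufferingNetstringParser and its callback argument are hypothetical names, and it simply retries from the start of its buffer, whereas the ParserProtocol is described as continuing a partially matched rule.

```python
class BufferingNetstringParser:
    """Collects chunks and reports each complete netstring to `callback`."""

    def __init__(self, callback):
        self.callback = callback
        self.buffer = ''

    def dataReceived(self, data):
        self.buffer += data
        while True:
            match = self._tryMatch()
            if match is None:
                return  # Not enough data yet: wait for the next chunk.
            payload, consumed = match
            self.buffer = self.buffer[consumed:]
            self.callback(payload)
            # Loop again: leftover data is matched against the same rule.

    def _tryMatch(self):
        head, colon, rest = self.buffer.partition(':')
        if not colon:
            return None  # The length prefix has not fully arrived.
        if not head.isdigit():
            raise ValueError('bad length prefix: %r' % head)
        length = int(head)
        if len(rest) < length + 1:
            return None  # The data or trailing ',' has not fully arrived.
        if rest[length] != ',':
            raise ValueError('missing "," after the data')
        return rest[:length], len(head) + 1 + length + 1


received = []
parser = BufferingNetstringParser(received.append)
# The two netstrings arrive split across three arbitrary chunks.
for chunk in ['4:sp', 'am,4:eg', 'gs,']:
    parser.dataReceived(chunk)
```

After the loop, received holds 'spam' and 'eggs' even though no single chunk contained a whole message.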

One specifies the current rule by setting a currentRule attribute on the receiver, which the ParserProtocol looks at before doing any parsing. Changing the current rule is addressed in the Switching rules section.

Since the ParserProtocol never modifies the currentRule attribute itself, the default behavior is to keep using the same rule. Parsing netstrings doesn’t require any rule changes, so this default is fine.

Both the sender factory and the receiver factory are called when the ParserProtocol’s connection is established. The sender factory is a one-argument callable which will be passed the ParserProtocol’s transport. This allows the sender to send data over the transport. For example:

class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))

The receiver factory is another one-argument callable which is passed the constructed sender. The returned object must at least have prepareParsing() and finishParsing() methods. prepareParsing() is called with the ParserProtocol instance when a connection is established (i.e. in the connectionMade of the ParserProtocol) and finishParsing() is called when a connection is closed (i.e. in the connectionLost of the ParserProtocol).

Note

Both the receiver factory and its returned object’s prepareParsing() are called in the ParserProtocol’s connectionMade method; this separation is for ease of testing receivers.

To demonstrate a receiver, here is a simple receiver that receives netstrings and echoes the same netstrings back:

class NetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.sender.sendNetstring(string)
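
Because the sender and receiver are plain objects, they can be exercised without a network connection. The sketch below follows the construction order described above. ListTransport and connect are hypothetical helpers written for this illustration (they are not part of Parsley or Twisted), and None stands in for the ParserProtocol instance that prepareParsing() would normally receive.

```python
class ListTransport:
    """Hypothetical stand-in for a transport; records what is written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class NetstringSender:
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))


class NetstringReceiver:
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.sender.sendNetstring(string)


def connect(transport, senderFactory, receiverFactory):
    """Hypothetical helper mimicking the order used at connection time."""
    sender = senderFactory(transport)    # The sender factory gets the transport.
    receiver = receiverFactory(sender)   # The receiver factory gets the sender.
    receiver.prepareParsing(None)        # Placeholder for the ParserProtocol.
    return receiver


transport = ListTransport()
receiver = connect(transport, NetstringSender, NetstringReceiver)
receiver.netstringReceived('spam')       # What the grammar rule would call.
```

The transport ends up with a single write, '4:spam,', which is the echoed netstring.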

Putting it all together, the Protocol is constructed using the grammar, sender factory, and receiver factory:

NetstringProtocol = makeProtocol(
    grammar, NetstringSender, NetstringReceiver)

The complete script is also available for download.

Intermezzo: error reporting

If an exception is raised from within Parsley during parsing, whether it’s due to input not matching the current rule or an exception being raised from code the grammar calls, the connection will be immediately closed. The traceback will be captured as a Failure and passed to the finishParsing() method of the receiver.

At present, there is no way to recover from failure.

Composing senders and receivers

Senders and receivers are deliberately designed to make composition easy: no subclassing is required. While the composition is easy enough to do on your own, Parsley provides a helper function, stack(). It takes zero or more wrappers followed by a base factory.

Its use is simple: stack(x, y, z) will return a callable suitable as either a sender factory or a receiver factory which will, when called with an argument, return x(y(z(argument))). Here z is the base factory and x is the outermost wrapper.
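
The composition order can be illustrated with a few lines of plain Python. stack_sketch below is a hypothetical re-creation of the documented behaviour, not Parsley’s own code, and x, y, and z are toy factories that only tag their argument.

```python
def stack_sketch(*factories):
    """Compose one-argument factories; the last one is called first."""
    def factory(argument):
        result = argument
        for wrap in reversed(factories):
            result = wrap(result)
        return result
    return factory


def x(value):
    return ('x', value)


def y(value):
    return ('y', value)


def z(value):
    return ('z', value)


combined = stack_sketch(x, y, z)
nested = combined('argument')
```

The result, nested, is the same value as x(y(z('argument'))): the last factory receives the raw argument and the first one produces the outermost object.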

An example of wrapping a sender factory:

class NetstringReversalWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def sendNetstring(self, string):
        self.wrapped.sendNetstring(string[::-1])

And then, constructing the Protocol:

NetstringProtocol = makeProtocol(
    grammar,
    stack(NetstringReversalWrapper, NetstringSender),
    NetstringReceiver)

A wrapper doesn’t need to call the same methods on the thing it’s wrapping. Also note that in most cases, it’s important to forward unknown methods on to the wrapped object. An example of wrapping a receiver:

class NetstringSplittingWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def netstringReceived(self, string):
        splitpoint = len(string) // 2
        self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
        self.wrapped.netstringSecondHalfReceived(string[splitpoint:])

    def __getattr__(self, attr):
        return getattr(self.wrapped, attr)
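
The effect of forwarding through __getattr__ can be seen in isolation. In this sketch, PlainReceiver and UppercasingWrapper are hypothetical classes invented for the illustration; only the forwarding pattern itself comes from the example above.

```python
class PlainReceiver:
    currentRule = 'receiveNetstring'

    def __init__(self):
        self.received = []

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.received.append(string)


class UppercasingWrapper:
    """Hypothetical wrapper: overrides one method, forwards the rest."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def netstringReceived(self, string):
        self.wrapped.netstringReceived(string.upper())

    def __getattr__(self, attr):
        # Only consulted when normal lookup on the wrapper fails.
        return getattr(self.wrapped, attr)


inner = PlainReceiver()
outer = UppercasingWrapper(inner)
outer.netstringReceived('spam')   # Handled by the wrapper itself.
outer.prepareParsing(None)        # Forwarded to the wrapped receiver.
rule = outer.currentRule          # Attribute reads are forwarded too.
```

Without the __getattr__ method, the last two lines would raise AttributeError, because the wrapper defines neither prepareParsing nor currentRule. Note that this pattern forwards reads only: assigning to outer.currentRule would create a new attribute on the wrapper and leave the wrapped receiver unchanged.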

The corresponding receiver and, once again, the construction of the Protocol:

class SplitNetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringFirstHalfReceived(self, string):
        self.sender.sendNetstring(string)

    def netstringSecondHalfReceived(self, string):
        pass

NetstringProtocol = makeProtocol(
    grammar,
    stack(NetstringReversalWrapper, NetstringSender),
    stack(NetstringSplittingWrapper, SplitNetstringReceiver))

The complete script is also available for download.

Switching rules

As mentioned before, it’s possible to change the current rule. Imagine a “netstrings2” protocol that looks like this:

3:foo,3;bar,4:spam,4;eggs,

That is, the protocol alternates between : and ; as the delimiter between the data length and the data. The amended grammar would look something like this:

nonzeroDigit = digit:x ?(x != '0')
digits = <'0' | nonzeroDigit digit*>:i -> int(i)
netstring :delimiter = digits:length delimiter <anything{length}>:string ',' -> string

colon = digits:length ':' <anything{length}>:string ',' -> receiver.netstringReceived(':', string)
semicolon = digits:length ';' <anything{length}>:string ',' -> receiver.netstringReceived(';', string)

Changing the current rule is as simple as changing the currentRule attribute on the receiver. So, the netstringReceived method could look like this:

    def netstringReceived(self, delimiter, string):
        self.sender.sendNetstring(string)
        if delimiter == ':':
            self.currentRule = 'semicolon'
        else:
            self.currentRule = 'colon'

While changing the currentRule attribute can be done at any time, the ParserProtocol only examines the currentRule at the beginning of parsing and after a rule has finished matching. As a result, if the currentRule changes, the ParserProtocol will wait until the current rule is completely matched before switching rules.
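
The timing of rule switches can be sketched with a small hand-written driver. Everything here is hypothetical and written only for illustration (AlternatingReceiver, try_match, feed, and the DELIMITERS table are not part of Parsley). The point is that currentRule is read once before each match attempt and never in the middle of one.

```python
DELIMITERS = {'colon': ':', 'semicolon': ';'}


class AlternatingReceiver:
    currentRule = 'colon'

    def __init__(self):
        self.received = []

    def netstringReceived(self, delimiter, string):
        self.received.append((delimiter, string))
        self.currentRule = 'semicolon' if delimiter == ':' else 'colon'


def try_match(rule, buffer):
    """Return (string, rest) for one message, or None if it is incomplete."""
    delimiter = DELIMITERS[rule]
    i = 0
    while i < len(buffer) and buffer[i] in '0123456789':
        i += 1
    if i == len(buffer):
        return None  # Still reading the length (or nothing has arrived).
    if i == 0 or buffer[i] != delimiter:
        raise ValueError('expected %r after the length' % delimiter)
    end = i + 1 + int(buffer[:i])
    if len(buffer) < end + 1:
        return None  # The data or trailing ',' has not fully arrived.
    if buffer[end] != ',':
        raise ValueError('expected "," after the data')
    return buffer[i + 1:end], buffer[end + 1:]


def feed(receiver, buffer):
    """Match as many complete messages as possible; return leftover data."""
    while True:
        rule = receiver.currentRule  # Examined only between matches.
        match = try_match(rule, buffer)
        if match is None:
            return buffer
        string, buffer = match
        receiver.netstringReceived(DELIMITERS[rule], string)


receiver = AlternatingReceiver()
leftover = feed(receiver, '3:foo,3;bar,4:spam,4;eg')
```

Three messages are delivered with alternating delimiters, and the incomplete '4;eg' is left over to wait for more data under the semicolon rule.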

The complete script is also available for download.