Parsley Tutorial Part III: Parsing Network Data
This tutorial assumes basic knowledge of writing Twisted TCP clients or servers.
Basic parsing
Parsing data that comes in over the network can be difficult because there is no guarantee of receiving whole messages. Buffering is often complicated by protocols that switch between fixed-width messages and delimiters for framing. Fortunately, Parsley can remove all of this tedium.
With parsley.makeProtocol(), Parsley can generate a Twisted
IProtocol-implementing class which will match incoming network data using
Parsley grammar rules. Before getting started with makeProtocol(), let’s
build a grammar for netstrings. The netstrings protocol is very simple:
4:spam,4:eggs,
This stream contains two netstrings: spam and eggs. Each payload is prefixed
with its length, written as one or more ASCII digits followed by a ':', and
suffixed with a ','. A Parsley grammar to match a netstring would look like:
nonzeroDigit = digit:x ?(x != '0')
digits = <'0' | nonzeroDigit digit*>:i -> int(i)
netstring = digits:length ':' <anything{length}>:string ',' -> string
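To make the framing concrete, here is a pure-Python sketch of what the netstring rule accepts. It is only an illustration and not how Parsley matches input internally; is_length and parse_netstring are hypothetical helper names that do not exist in Parsley.

```python
DIGITS = '0123456789'


def is_length(text):
    """Mirror the digits rule: '0' alone, or a nonzero digit then more digits."""
    if text == '0':
        return True
    return text != '' and text[0] != '0' and all(c in DIGITS for c in text)


def parse_netstring(data):
    """Return (payload, rest), or None when more data is needed."""
    head, colon, tail = data.partition(':')
    if not colon:
        # No ':' yet; an empty or valid length prefix may still be completed.
        if head and not is_length(head):
            raise ValueError('bad length prefix: %r' % head)
        return None
    if not is_length(head):
        raise ValueError('bad length prefix: %r' % head)
    length = int(head)
    if len(tail) < length + 1:
        return None  # the payload or the trailing ',' has not arrived yet
    if tail[length] != ',':
        raise ValueError("expected ',' after payload")
    return tail[:length], tail[length + 1:]


first = parse_netstring('4:spam,4:eggs,')   # ('spam', '4:eggs,')
partial = parse_netstring('4:sp')           # None: incomplete message
```

Even this small decoder has to track three states (no delimiter yet, payload incomplete, complete), which is the kind of bookkeeping the grammar expresses in a single rule.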
makeProtocol() takes, in addition to a grammar, a factory for a “sender”
and a factory for a “receiver”. In the system of objects managed by the
ParserProtocol, the sender is in charge of writing data to the wire,
and the receiver has methods called on it by the Parsley rules. To demonstrate
this, here is the final piece needed in the Parsley grammar for netstrings:
receiveNetstring = netstring:string -> receiver.netstringReceived(string)
The receiver is always available in Parsley rules under the name receiver,
allowing Parsley rules to call methods on it.
When data is received over the wire, the ParserProtocol tries to
match the received data against the current rule. If the current rule requires
more data to finish matching, the ParserProtocol stops and waits
until more data comes in, then tries to continue matching. This repeats until
the current rule is completely matched, and then the ParserProtocol
starts matching any leftover data against the current rule again.
One specifies the current rule by setting a currentRule attribute on
the receiver, which the ParserProtocol looks at before doing any
parsing. Changing the current rule is addressed in the Switching rules section.
Since the ParserProtocol will never modify the currentRule
attribute itself, the default behavior is to keep using the same rule. Parsing
netstrings doesn’t require any rule changing, so the default behavior of
continuing to use the same rule is fine.
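The match, wait, and repeat cycle described above can be modelled in a few lines of plain Python. This is a toy model under stated assumptions, not Parsley's implementation: MatchLoop, CollectingReceiver, and receive_netstring are hypothetical names, and the rule function skips validation of malformed input.

```python
def receive_netstring(receiver, data):
    """Toy version of the receiveNetstring rule.

    Returns the leftover data after one netstring, or None if data does not
    yet hold a complete netstring. Malformed input is not checked here.
    """
    length, colon, tail = data.partition(':')
    if not colon or len(tail) < int(length) + 1:
        return None  # need more data before the rule can finish matching
    size = int(length)
    receiver.netstringReceived(tail[:size])
    return tail[size + 1:]  # skip the payload and the trailing ','


class MatchLoop(object):
    """Hypothetical stand-in for the buffering done by ParserProtocol."""

    def __init__(self, rules, receiver):
        self.rules = rules          # maps rule names to rule functions
        self.receiver = receiver
        self.buffer = ''

    def dataReceived(self, data):
        self.buffer += data
        while self.buffer:
            rule = self.rules[self.receiver.currentRule]
            leftover = rule(self.receiver, self.buffer)
            if leftover is None:
                return  # stop and wait for more data
            self.buffer = leftover  # match leftover data against the rule again


class CollectingReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self):
        self.received = []

    def netstringReceived(self, string):
        self.received.append(string)


receiver = CollectingReceiver()
loop = MatchLoop({'receiveNetstring': receive_netstring}, receiver)
for chunk in ['4:sp', 'am,4:eg', 'gs,']:
    loop.dataReceived(chunk)
print(receiver.received)  # prints ['spam', 'eggs']
```

The chunk boundaries fall in the middle of both messages, yet the receiver only ever sees whole netstrings.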
Both the sender factory and receiver factory are called when the
ParserProtocol's connection is established. The sender factory is a
one-argument callable which will be passed the ParserProtocol's
transport. This allows the sender to send data over the transport. For
example:
class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport
    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))
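Because the sender only needs an object with a write() method, it can be exercised without a network connection. The sketch below assumes a hypothetical FakeTransport class (not part of Twisted or Parsley) that records what is written, and uses text strings as the tutorial's own code does.

```python
class FakeTransport(object):
    """Hypothetical stand-in for a Twisted transport; it only records writes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        # Frame the payload as <length>:<payload>,
        self.transport.write('%d:%s,' % (len(string), string))


transport = FakeTransport()
sender = NetstringSender(transport)   # what the sender factory call amounts to
sender.sendNetstring('spam')
sender.sendNetstring('eggs')
wire = ''.join(transport.written)
print(wire)  # prints 4:spam,4:eggs,
```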
The receiver factory is another one-argument callable which is passed the
constructed sender. The returned object must at least have prepareParsing()
and finishParsing() methods. prepareParsing() is called with the
ParserProtocol instance when a connection is established (i.e. in the
connectionMade of the ParserProtocol) and finishParsing() is called when a
connection is closed (i.e. in the connectionLost of the ParserProtocol).
Note
Both the receiver factory and its returned object’s prepareParsing()
are called in the ParserProtocol's connectionMade method;
this separation is for ease of testing receivers.
To demonstrate a receiver, here is a simple receiver that receives netstrings and echoes the same netstrings back:
class NetstringReceiver(object):
    currentRule = 'receiveNetstring'
    def __init__(self, sender):
        self.sender = sender
    def prepareParsing(self, parser):
        pass
    def finishParsing(self, reason):
        pass
    def netstringReceived(self, string):
        self.sender.sendNetstring(string)
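Since a receiver is built from nothing more than a sender, it can be tested in isolation by handing it a test double. The following is a minimal sketch; RecordingSender is a hypothetical helper, and None is passed where the real ParserProtocol would pass itself and a Failure.

```python
class RecordingSender(object):
    """Hypothetical test double: remembers what it was asked to send."""

    def __init__(self):
        self.sent = []

    def sendNetstring(self, string):
        self.sent.append(string)


class NetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringReceived(self, string):
        self.sender.sendNetstring(string)


sender = RecordingSender()
receiver = NetstringReceiver(sender)   # what the receiver factory call amounts to
receiver.prepareParsing(None)          # placeholder for the protocol instance
receiver.netstringReceived('spam')     # what the grammar rule would trigger
receiver.finishParsing(None)           # placeholder for the close reason
print(sender.sent)  # prints ['spam']
```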
Putting it all together, the Protocol is constructed using the grammar, sender factory, and receiver factory:
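from parsley import makeProtocol
# grammar is a string holding the netstring grammar rules shown above.
NetstringProtocol = makeProtocol(
    grammar, NetstringSender, NetstringReceiver)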
Intermezzo: error reporting
If an exception is raised from within Parsley during parsing, whether it’s due
to input not matching the current rule or an exception being raised from code
the grammar calls, the connection will be immediately closed. The traceback
will be captured as a Failure and passed to the finishParsing() method of the
receiver.
At present, there is no way to recover from failure.
Composing senders and receivers
Senders and receivers are intentionally designed to make composition easy: no
subclassing is required. While the composition is easy enough to do on your
own, Parsley provides a function for it: stack(). It takes zero or more
wrappers followed by a base factory.
Its use is extremely simple: stack(x, y, z) will return a callable suitable
either as a sender or receiver factory which will, when called with an
argument, return x(y(z(argument))).
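The composition order can be illustrated with a small stand-in written in plain Python. This is a sketch of the behavior described above, not Parsley's actual implementation; stack_sketch and the three tagging functions are hypothetical names.

```python
def stack_sketch(*factories):
    """Compose factories so that stack_sketch(x, y, z)(arg) == x(y(z(arg)))."""
    if not factories:
        raise TypeError('at least one factory is required')

    def factory(argument):
        result = argument
        # Apply the last factory first so the first one ends up outermost.
        for f in reversed(factories):
            result = f(result)
        return result

    return factory


def x(value):
    return 'x(%s)' % value


def y(value):
    return 'y(%s)' % value


def z(value):
    return 'z(%s)' % value


composed = stack_sketch(x, y, z)('argument')
print(composed)  # prints x(y(z(argument)))
```

The last factory in the argument list is the one that receives the original argument, so it plays the role of the base sender or receiver factory.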
An example of wrapping a sender factory:
class NetstringReversalWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def sendNetstring(self, string):
        self.wrapped.sendNetstring(string[::-1])
And then, constructing the Protocol:
NetstringProtocol = makeProtocol(
    grammar,
    stack(NetstringReversalWrapper, NetstringSender),
    NetstringReceiver)
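To see what the stacked sender factory produces, the same composition can be done by hand. The sketch below assumes a hypothetical FakeTransport that records writes; the two sender classes are the ones from this tutorial.

```python
class FakeTransport(object):
    """Hypothetical stand-in for a Twisted transport; it only records writes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class NetstringSender(object):
    def __init__(self, transport):
        self.transport = transport

    def sendNetstring(self, string):
        self.transport.write('%d:%s,' % (len(string), string))


class NetstringReversalWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def sendNetstring(self, string):
        self.wrapped.sendNetstring(string[::-1])


transport = FakeTransport()
# Equivalent to calling the stacked factory with the transport by hand.
sender = NetstringReversalWrapper(NetstringSender(transport))
sender.sendNetstring('spam')
print(transport.written)  # prints ['4:maps,']
```

The wrapper reverses the payload before the base sender frames it, so the length prefix is unaffected.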
A wrapper doesn’t need to call the same methods on the thing it’s wrapping. Also note that in most cases, it’s important to forward unknown methods on to the wrapped object. An example of wrapping a receiver:
class NetstringSplittingWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def netstringReceived(self, string):
        splitpoint = len(string) // 2
        self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
        self.wrapped.netstringSecondHalfReceived(string[splitpoint:])
    def __getattr__(self, attr):
        return getattr(self.wrapped, attr)
The corresponding receiver and, again, the construction of the Protocol:
class SplitNetstringReceiver(object):
    currentRule = 'receiveNetstring'
    def __init__(self, sender):
        self.sender = sender
    def prepareParsing(self, parser):
        pass
    def finishParsing(self, reason):
        pass
    def netstringFirstHalfReceived(self, string):
        self.sender.sendNetstring(string)
    def netstringSecondHalfReceived(self, string):
        pass
NetstringProtocol = makeProtocol(
    grammar,
    stack(NetstringReversalWrapper, NetstringSender),
    stack(NetstringSplittingWrapper, SplitNetstringReceiver))
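The role of __getattr__ in the receiver wrapper can be checked by composing the two receiver classes by hand. This is a minimal sketch; RecordingSender is a hypothetical test double, and None stands in for the protocol instance that prepareParsing() would normally receive.

```python
class RecordingSender(object):
    """Hypothetical test double: remembers what it was asked to send."""

    def __init__(self):
        self.sent = []

    def sendNetstring(self, string):
        self.sent.append(string)


class NetstringSplittingWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def netstringReceived(self, string):
        splitpoint = len(string) // 2
        self.wrapped.netstringFirstHalfReceived(string[:splitpoint])
        self.wrapped.netstringSecondHalfReceived(string[splitpoint:])

    def __getattr__(self, attr):
        # Only reached for names the wrapper itself does not define.
        return getattr(self.wrapped, attr)


class SplitNetstringReceiver(object):
    currentRule = 'receiveNetstring'

    def __init__(self, sender):
        self.sender = sender

    def prepareParsing(self, parser):
        pass

    def finishParsing(self, reason):
        pass

    def netstringFirstHalfReceived(self, string):
        self.sender.sendNetstring(string)

    def netstringSecondHalfReceived(self, string):
        pass


sender = RecordingSender()
receiver = NetstringSplittingWrapper(SplitNetstringReceiver(sender))
receiver.prepareParsing(None)            # forwarded to the wrapped receiver
rule = receiver.currentRule              # also forwarded: 'receiveNetstring'
receiver.netstringReceived('spameggs')   # handled by the wrapper itself
print(sender.sent)  # prints ['spam']
```

Without the __getattr__ forwarding, looking up currentRule, prepareParsing(), or finishParsing() on the wrapper would raise AttributeError.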
Switching rules
As mentioned before, it’s possible to change the current rule. Imagine a “netstrings2” protocol that looks like this:
3:foo,3;bar,4:spam,4;eggs,
That is, the protocol alternates between using :
and using ;
delimiting
data length and the data. The amended grammar would look something like this:
nonzeroDigit = digit:x ?(x != '0')
digits = <'0' | nonzeroDigit digit*>:i -> int(i)
netstring :delimiter = digits:length delimiter <anything{length}>:string ',' -> string
colon = digits:length ':' <anything{length}>:string ',' -> receiver.netstringReceived(':', string)
semicolon = digits:length ';' <anything{length}>:string ',' -> receiver.netstringReceived(';', string)
Changing the current rule is as simple as changing the currentRule
attribute on the receiver. So, the netstringReceived method could look like
this:
def netstringReceived(self, delimiter, string):
    self.sender.sendNetstring(string)
    if delimiter == ':':
        self.currentRule = 'semicolon'
    else:
        self.currentRule = 'colon'
While changing the currentRule attribute can be done at any time, the
ParserProtocol only examines the currentRule at the
beginning of parsing and after a rule has finished matching. As a result, if
the currentRule changes, the ParserProtocol will wait until
the current rule is completely matched before switching rules.
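The timing of rule switches can be modelled in plain Python. The sketch below is a toy model, not Parsley's implementation: RuleSwitchingLoop, AlternatingReceiver, and make_rule are hypothetical names, the rules do no validation, and the model reads currentRule when the first data arrives rather than when the connection is made.

```python
def make_rule(delimiter):
    """Build a toy rule for '<length><delimiter><payload>,' (no validation)."""
    def rule(receiver, data):
        length, found, tail = data.partition(delimiter)
        if not found or len(tail) < int(length) + 1:
            return None  # incomplete: wait for more data
        size = int(length)
        receiver.netstringReceived(delimiter, tail[:size])
        return tail[size + 1:]  # skip the payload and the trailing ','
    return rule


class AlternatingReceiver(object):
    currentRule = 'colon'

    def __init__(self):
        self.received = []

    def netstringReceived(self, delimiter, string):
        self.received.append((delimiter, string))
        self.currentRule = 'semicolon' if delimiter == ':' else 'colon'


class RuleSwitchingLoop(object):
    """Hypothetical model of when currentRule is examined."""

    def __init__(self, rules, receiver):
        self.rules = rules
        self.receiver = receiver
        self.buffer = ''
        self.active = None  # rule currently being matched, if any

    def dataReceived(self, data):
        self.buffer += data
        while self.buffer:
            if self.active is None:
                # currentRule is read only when a new match starts.
                self.active = self.rules[self.receiver.currentRule]
            leftover = self.active(self.receiver, self.buffer)
            if leftover is None:
                return  # keep the same rule until it finishes matching
            self.buffer, self.active = leftover, None


rules = {'colon': make_rule(':'), 'semicolon': make_rule(';')}
receiver = AlternatingReceiver()
loop = RuleSwitchingLoop(rules, receiver)
for chunk in ['3:fo', 'o,3;bar,4:sp', 'am,4;eggs,']:
    loop.dataReceived(chunk)
print(receiver.received)
# prints [(':', 'foo'), (';', 'bar'), (':', 'spam'), (';', 'eggs')]
```

In this model, assigning to currentRule while a netstring is only partly received has no effect until that netstring has been matched in full.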