Thursday, July 14, 2005

py lib gems: greenlets and py.xml

I've been experimenting with various tools in the py library lately, in preparation for a presentation I'll give to the SoCal Piggies group meeting this month. The py lib is choke-full of gems that are waiting to be discovered. In this post, I'll talk a little about greenlets, the creation of Armin Rigo. I'll also briefly mention py.xml.

Greenlets implement coroutines in Python. Coroutines can be seen as a generalization of generators, and it looks like the standard Python libray will support them in the future via 'enhanced generators' (see PEP 342). Coroutines allow you to exit a function by 'yielding' a value and switching to another function. The original function can then be re-entered, and it will continue execution from exactly where it left off.

The greenlet documentation offers some really eye-opening examples of how they can be used to implement generators for example. Another typical use case for greenlets/coroutines is turning asynchronous or event-based code into normal sequential control flow code -- the Python Desktop Server project has a good example of exactly such a transformation.

I've also been reading and looking at the code from Armin's EuroPython talk on greenlets. The talk itself must have been highly entertaining, since it is presented as a PyGame-based game. In one of the code examples I downloaded, I noticed yet another application of the asynchronous-to-sequential transformation, this time related to parsing XML data. In a few lines of code, Armin showed how to turn an asynchronous, Expat-based parsing mechanism into a generator that yields the XML elements one at a time. This approach combines the advantages of 1) using a stream oriented parser (and thus being able to process large amounts of XML data via handlers) with 2) using a generator to expose the XML parsing code in the shape of an iterator.

Here is Armin's code which I saved in a module called iterxml.py (I made a few minor modifications to make the code more general-purpose):

from py.magic import greenlet
import xml.parsers.expat

def send(arg):
greenlet.getcurrent().parent.switch(arg)

# 3 handler functions
def start_element(name, attrs):
send(('START', name, attrs))
def end_element(name):
send(('END', name))
def char_data(data):
data = data.strip()
if data:
send(('DATA', data))

def greenparse(xmldata):
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
p.Parse(xmldata, 1)

def iterxml(xmldata):
g = greenlet(greenparse)
data = g.switch(xmldata)
while data is not None:
yield data
data = g.switch()

Consumers of this code can pass a string containing an XML document to the iterxml function and then use a for loop to iterate through the elements yielded by the function, like this:

for data in iterxml(xmldata):
print data

When iterxml first executes, it instantiates a greenlet object and associates it with the greenparse function. Then it 'switches' into the greenlet and thus is calls that function with the given xmldata argument. There is nothing out of the ordinary in the greenparse function, which simply assigns the 3 handler functions to the xpat parser object, then calls its Parse method. However, the 3 handler functions all use greenlets via the send method, which sends the parsed data to the parent of the current greenlet. The parent in this case is the iterxml function, which yields the data at that point, then switches back into the greenparse function. The handler functions then get called again whenever a new XML element is encountered, and the switching back and forth continues until there is no more data to be parsed.

I've wanted for a while to check out the REST API offered by upcoming.org (which is a free alternative to meetup.com), so I used it in conjunction with the XML parsing stuff via greenlets.

Here's some code that uses the iterxml module to parse the response returned by upcoming.org when a request for searching the events in the L.A. metro area is sent to their server:

import sys, urllib
from iterxml import iterxml
import py

baseurl = "http://www.upcoming.org/services/rest/"
api_key = "YOUR_API_KEY_HERE"
metro_id = "1" # L.A. metro area
log = py.log.Producer("")

def get_venue_info(venue_id):
method = "venue.getInfo"
request = "%s?api_key=%s&method=%s&venue_id=%s" %
(baseurl, api_key, method, venue_id)
response = urllib.urlopen(request).read()
for data in iterxml(response):
if data[0] == 'START' and data[1] == 'venue':
attr = data[2]
venue_info = "%(name)s in %(city)s" % attr
break
return venue_info

def search_events(keywords):
method = "event.search"
request = "%s?api_key=%s&method=%s&metro_id=%s&search_text=%s" %
(baseurl, api_key, method, metro_id, keywords)
response = urllib.urlopen(request).read()
for data in iterxml(response):
if data[0] == 'START' and data[1] == 'event':
attr = data[2]
log("\n" + "-" * 80)
log.EVENT("%(name)s" % attr)
log.WHAT("%(description)s" % attr)
log.WHERE(get_venue_info(attr['venue_id']))
log.WHEN("%(start_date)s @ %(start_time)s" % attr)

if __name__ == "__main__":
if len(sys.argv) < 2:
print "Usage: %s " % sys.argv[0]
sys.exit(1)

keywords = "%%20".join(sys.argv[1:])
search_events(keywords)

An upcoming.org API key is automatically generated for you when you click here.

I tested the script by searching for Python-related events in L.A.:

./upcoming_search.py python

--------------------------------------------------------------------------------
[EVENT] SoCal Piggies July Meeting
[WHAT] Monthly meeting of the Southern California Python Interest Group.
[WHERE] USC in Los Angeles
[WHEN] 2005-07-26 @ 19:00:00

Note that I'm also using the py.log facilities I mentioned in a previous post. The only thing I needed to do was to instantiate a log object via log = py.log.Producer("") and then use it via keywords such as EVENT, WHAT, WHERE and WHEN. Since I didn't declare any log consumer, the default consumer is used, which prints its messages to stdout. Each message string is nicely prefixed by the corresponding keyword.

I'm still experimenting with greenlets, and I'm sure I'll use them in the future especially for event-based GUI code.

I want to also briefly touch on py.xml, a tool that allows you to generate XML and HTML documents almost painlessly from your Python code.

Here's the XML returned by the event.search method of upcoming.org when called with a search text of 'python':

<rsp stat="ok" version="1.0">
<event id="24868" name="SoCal Piggies July Meeting" description="Monthly meeting of the Southern California Python Interest Group." start_date="2005-07-26" end_date="0000-00-00" start_time="19:00:00" end_time="21:00:00" personal="0" selfpromotion="0" metro_id="1" venue_id="7425" user_id="14959" category_id="4" date_posted="2005-07-13" />
</rsp>


And here's how to generate the same XML output with py.xml:

import py

class ns(py.xml.Namespace):
"my custom xml namespace"

doc = ns.rsp(
ns.event(
id="24868",
name="SoCal Piggies July Meeting",
description="Monthly meeting of the Southern California Python Interest Group.",
start_date="2005-07-26",
end_date="0000-00-00",
start_time="19:00:00",
end_time="21:00:00",
personal="0",
selfpromotion="0",
metro_id="1",
venue_id="7425",
user_id="14959",
category_id="4",
date_posted="2005-07-13"),
stat="OK",
version="1.0",
)

print doc.unicode(indent=2).encode('utf8')


The code above prints:

<rsp stat="OK" version="1.0">
<event category_id="4" date_posted="2005-07-13" description="Monthly meeting of the Southern California Python Interest Group." end_date="0000-00-00" end_time="21:00:00" id="24868" metro_id="1" name="SoCal Piggies July Meeting" personal="0" selfpromotion="0" start_date="2005-07-26" start_time="19:00:00" user_id="14959" venue_id="7425"/></rsp>


As the py.xml documentation succintly puts it, positional arguments are child-tags and keyword-arguments are attributes. In addition, indentation is also available via the argument to the unicode method.

I intend to cover other tools from the py library in future posts. Stay tuned for discussions on py.execnet, little-known aspects of py.test and more!

1 comment:

simon said...

Very interesting, thanks!

Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...