PyCon 2010: I Want to Present Something

I’ve been racking my brains to find something to present at PyCon 2010. I have been trying to find something good to present since PyCon 2003, when I last presented at PyCon.  (I talked about Ape, the Adaptable Persistence Engine for Zope.)  I really liked the experience of presenting and it led to a lot of interesting conversations. Since then, however, nothing has really struck me as a good idea for a presentation. The right idea has to fit at least these criteria:

  • It’s something I’m good at.
  • At least a handful of PyCon attendees would like to learn more about it in a presentation.
  • It can’t take weeks to prepare.

This year, I was planning to do a really fun presentation: I would write some Python scripts for controlling a RepRap, then present the hardware and software.  That didn’t work out, however, because the warping issues with ABS (a type of plastic) are just too severe to print useful parts, so my RepRap has sat idle.  (I’m now considering PLA, but it’s a newish and expensive material.)

Here are a few other ideas for a presentation topic.  What do you think?  Any other ideas?

  • RelStorage.  I could talk about future plans, why I think it is an improvement over ZEO, and why you should use it.  A lot of people seem to have questions about RelStorage, so it would be nice to answer them in a big room where others can hear the answers.
  • KARL and BFG.  This would probably be a team presentation.  KARL is some fairly interesting software that I got to help develop this year.  I could talk about the software, along with the development experience and style.  In KARL we made a conscious choice to ignore certain apparent DRY violations, leading to significant productivity gains.
  • A friendly introduction to Buildout.  Buildout is a tool that a lot of developers need but don’t know about yet.  The function that Buildout performs is as important as version control and automated testing.  Come find out why Buildout is far better than a pile of Makefiles.
  • An introduction to Buildout (zc.buildout) for people familiar with Apache Maven.  Buildout and Maven fill approximately the same niche, but for different audiences.  (Buildout for Python, Maven for Java.)  Maybe there are Mavenites at the conference who would like to switch to a more Python-centric system.
  • A discussion of text indexing in Plone and BFG.  This might be a narrow topic, but I find it interesting and important.  I have found ways to reduce complex 90-second text searches to 1 second.  The solution is not pure Python, unfortunately. 😉  I have also thought about how to expand into areas like faceted search/browse functionality.

Feedback encouraged!

RelStorage 1.3.0b1, Now With Blob Support

I have just released two versions of RelStorage. Version 1.3.0b1 adds full support for ZODB blobs stored on the filesystem. Version 1.2.0 is currently the better choice if you’re upgrading a production system and don’t need blob support.

People have been asking for blob support for months. I am glad to finally get it done, with a little help from a customer. With blob support, now we can easily store large artifacts on the filesystem, while keeping all metadata in the database.

To celebrate the new release, I have created a sample buildout.cfg that builds Plone with RelStorage, PostgreSQL, and blob support. (Thanks go to Hanno Schlichting, who released a compatible version of plone.recipe.zope2instance only moments after I requested it.) Here it is:

[buildout]
parts = plone zope2 instance zopepy
find-links =
    http://dist.plone.org
    http://download.zope.org/ppix/
    http://download.zope.org/distribution/
    http://effbot.org/downloads
    http://packages.willowrise.org
eggs =
    elementtree
    PILwoTk
    RelStorage
    psycopg2
versions = versions

[versions]
ZODB3 = 3.8.3-polling
RelStorage = 1.3.0b1

[plone]
recipe = plone.recipe.plone

[zope2]
recipe = plone.recipe.zope2install
url = ${plone:zope2-url}

[instance]
recipe = plone.recipe.zope2instance
zope2-location = ${zope2:location}
user = admin:admin
products = ${plone:products}
eggs =
    ${buildout:eggs}
    ${plone:eggs}
    plone.app.blob
zcml = plone.app.blob
rel-storage =
    type postgresql
    dsn dbname='plone' user='plone' host='localhost' password='plone'
    blob-dir var/blobs

[zopepy]
recipe = zc.recipe.egg
eggs = ${instance:eggs}
interpreter = zopepy
extra-paths = ${instance:zope2-location}/lib/python
scripts = zopepy zodbconvert

P.S. I have been told that a very prominent Plone developer recently configured RelStorage with master/slave replication on MySQL, and that it works smoothly. I expect him to announce his success soon!

Book Review: Practical Plone 3

Packt Publishing asked me to review their new book, Practical Plone 3: A Beginner’s Guide to Building Powerful Websites. The book impressed me, but not in the way I expected at first.

As I read the instructions in chapter two about how to install Plone, I considered the experience my Dad would have gone through if he had had this book when we were setting up Plone to run his company’s web site. My Dad is a power user, but not a programmer or systems administrator, so with this book, he probably would have installed Plone himself on a spare Windows computer. This book would have given him enough direction to set up a lot of the functionality he needed without my help, and he would have immediately started publishing pages with Plone’s many features.

However, I imagine that a short time later, something would go seriously wrong. The computer’s IP address would change because the DHCP lease expired, the database would lose some transactions due to some misbehaving application, or a mischievous virus would rename files with a “py” extension to “rb”. All of those problems are outside Plone’s control, so this book does not try to address them.

Plone beginners like my Dad are not prepared to handle the problems that occur when a computer is used as a web server. In that light, I wondered whether it is really possible to run Plone (or any content management system) without deep technical experience. For a moment, I thought this book was not for beginners after all.

Even so, I decided that I still very much want to give my Dad a copy of this book the next time we set up a Plone web site. He will read it to find out what the latest version of Plone can do. He will install it on his own computer for his own education, but I will set up the production web site on a server.

The first twelve chapters (250 pages) are intended for Plone users. Beginners will enjoy all of those chapters, I think. As I read them, I even picked up a few things I hadn’t learned, like how to use content rules.

I think beginners might struggle the most with chapter nine, which explains how to control workflow. Controlling workflow in Plone is harder than most other Plone tasks, because Plone falls back to the less polished Zope Management Interface for workflow design. Matt Bowen handled the difficult topic gracefully.

The rest of the book (almost 300 pages) is for developers, not power users. The contrast is sharp. While the first half of the book tells the reader what buttons to push, the second half tells the reader how to modify their Buildout configuration and what to type in a terminal session.

Each chapter is written by a different author. I noticed two interesting effects of multiple authorship. First, each author is enthusiastic about the particular topic, so even LDAP (which I generally find quite boring) gets a chapter of quality treatment. Second, there is more redundancy than you would find in most technical books, but redundancy is probably good in this case.

I do have one quibble with the book’s organization. When the technology behind Plone was invented, CSS was still a baby and browsers did not support it well. Back then, changing a site’s appearance meant changing nested tables in HTML, so the developers of Zope (including myself) invented ways to manage that task. That is how the portal_skins tool came about. The theming chapters explain how to use the latest version of that technology.

Today, we can expect all of our customers to use browsers that support CSS, so the chapters on theming should start by explaining how to customize the web site’s CSS. Developers will make much faster progress that way than if they have to learn the many theming-related abstractions Plone has today.

In conclusion, Practical Plone 3 is more than just a beginner’s book. I plan to use this book as a communication tool with my Plone customers. For customers who are new to Plone, the book is a menu of what we can set up together without a lot of work. I will also use it to help developers come up to speed on Plone.

The Fastest WSGI Server for Zope

I have been planning to compare mod_wsgi with paste.httpserver, which Zope 3 uses by default.  I guessed the improvement would be small since parsing HTTP isn’t exactly computationally intensive.  Today I finally had a good chance to perform the test on a new linode virtual host.

The difference blew me away.  I couldn’t believe it at first, so I double-checked everything.  The results came out about the same every time, though:

[Chart: wsgi-zope1 benchmark results]

I used the ab command to run this test, like so:

ab -n 1000 -c 8 http://localhost/

The requests are directed at a simple Zope page template with no dynamic content (yet), but the Zope publisher, security, and component architecture are all deeply involved.  The Paste HTTP server handles up to 276 requests per second, while a simple configuration of mod_wsgi handles up to 1476 per second.  Apparently, Graham’s beautiful Apache module is over 5 times as fast for this workload.  Amazing!

Well, admittedly, no… it’s not actually amazing. I ran this test on a Xen guest that has access to 4 cores.  I configured mod_wsgi to run 4 processes, each with 1 thread, so each process has its own interpreter and there is no lock contention.  The Paste HTTP server lets you run multiple threads, but not multiple processes, which leads to enormous contention for Python’s global interpreter lock.  The Paste HTTP server is easier to get running, but it’s clearly not intended to compete with the likes of mod_wsgi for production use.

I confirmed this explanation by running “ab -n 1000 -c 1 http://localhost/”; in this case, both servers handled just under 400 requests per second.  Clearly, running multiple processes is a much better idea than running multiple threads, and with mod_wsgi, running multiple processes is now easy.  My instance of Zope 3 is running RelStorage 1.1.3 on MySQL.  (This also confirms that the MySQL connector in RelStorage can poll the database at least 1476 times per second.  That’s good to know, although even higher speeds should be attainable by enabling the memcached integration.)
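To see the threads-versus-processes effect without Zope, here is a rough sketch using only the standard library’s concurrent.futures. The cpu_bound and timed_run names are made up for illustration, pure-Python busy work stands in for request handling, and the printed timings depend entirely on the machine. On a standard CPython build with the global interpreter lock, the thread pool should show little or no speedup over running the tasks one at a time, while the process pool can spread the work across cores.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python busy work that holds the GIL the whole time it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_cls, workers, tasks, n):
    """Run `tasks` copies of cpu_bound(n) in a pool; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(cpu_bound, [n] * tasks))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Rough analogue of 4 Paste threads vs. 4 single-threaded mod_wsgi
    # processes; the guard keeps worker processes from re-running this.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        seconds, _ = timed_run(cls, workers=4, tasks=8, n=1_000_000)
        print("%-20s %.2f s" % (cls.__name__, seconds))
```

The sketch only illustrates the scheduling difference; it says nothing about HTTP parsing or Zope itself.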

I mostly followed the repoze.grok on mod_wsgi tutorial, except that I used zopeproject instead of Repoze or Grok.  The key ingredient is the WSGI script that passes requests to my Zope application.  Here is my WSGI script (sanitized):

# Set up sys.path by running the project's control script.
with open('/opt/myapp/bin/myapp-ctl') as f:
    code = f.read()
exec(code)

# load the app
from paste.deploy import loadapp
zope_app = loadapp('config:/opt/myapp/deploy.ini')

def application(environ, start_response):
    # translate the path
    path = environ['PATH_INFO']
    host = environ['SERVER_NAME']
    port = environ['SERVER_PORT']
    scheme = environ['wsgi.url_scheme']
    environ['PATH_INFO'] = (
        '/myapp/++vh++%s:%s:%s/++%s' % (scheme, host, port, path))
    # call Zope
    return zope_app(environ, start_response)

This script is mostly trivial, except that it modifies the PATH_INFO variable to map the root URL to a folder inside Zope. I’m sure the same path translation is possible with Apache rewrite rules, but this way is easier, I think.
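The path translation is easy to pull out and test on its own. Here is a minimal sketch of the same rewrite as a standalone helper plus a small wrapping middleware. The names make_vh_path and vhost_middleware are hypothetical, and /myapp is the same placeholder folder used in the script above. Unlike the script, the wrapper copies environ before changing PATH_INFO.

```python
def make_vh_path(scheme, host, port, path, folder="/myapp"):
    """Build the Zope 3 virtual-hosting path used by the WSGI script.

    Everything before "++vh++" is the folder inside Zope; the scheme,
    host, and port tell Zope which public URL to generate links against.
    """
    return "%s/++vh++%s:%s:%s/++%s" % (folder, scheme, host, port, path)


def vhost_middleware(app, folder="/myapp"):
    """Wrap a WSGI app, rewriting PATH_INFO the same way the script does."""
    def wrapper(environ, start_response):
        environ = dict(environ)  # leave the server's dict untouched
        environ["PATH_INFO"] = make_vh_path(
            environ["wsgi.url_scheme"],
            environ["SERVER_NAME"],
            environ["SERVER_PORT"],
            environ.get("PATH_INFO", ""),
            folder,
        )
        return app(environ, start_response)
    return wrapper
```

With this in place, the script's application could be written as `application = vhost_middleware(zope_app)`.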

Limits of zope.pipeline

I’m starting to get a sense of what publisher functionality I can put in a WSGI pipeline and what I shouldn’t.

The pipeline is very useful for specifying the order things should happen.  For example, the error handling should be as early in the pipeline as possible, so it can handle many kinds of errors, but it has to come after the pipeline element that opens and closes the root database connection.  Constraints like that have never been expressed clearly in the current publisher.
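Here is a minimal sketch of that ordering constraint as plain WSGI middleware. It is not zope.pipeline code: open_root, handle_errors, FakeConnection, and the example.connection environ key are all hypothetical. The error handler renders its page using the root connection, so it only works when it sits inside the element that opens and closes that connection.

```python
import sys


class FakeConnection:
    """Stand-in for a root database connection (hypothetical)."""

    def __init__(self):
        self.closed = False

    def root(self):
        if self.closed:
            raise RuntimeError("connection is closed")
        return {"site_name": "Example Site"}

    def close(self):
        self.closed = True


def open_root(app):
    """Open a connection before the rest of the pipeline; close it after."""
    def wrapper(environ, start_response):
        conn = environ["example.connection"] = FakeConnection()
        try:
            # list() produces the whole body while the connection is open
            # (a simplification; real code would close it in close()).
            return list(app(environ, start_response))
        finally:
            conn.close()
    return wrapper


def handle_errors(app):
    """Turn exceptions into an error page that reads from the connection."""
    def wrapper(environ, start_response):
        try:
            return list(app(environ, start_response))
        except Exception as exc:
            site = environ["example.connection"].root()["site_name"]
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")], sys.exc_info())
            return [("Error on %s: %s" % (site, exc)).encode("utf-8")]
    return wrapper


def broken_app(environ, start_response):
    raise ValueError("boom")


# Correct order: open_root outermost, error handling just inside it.
pipeline = open_root(handle_errors(broken_app))
```

Calling `pipeline` returns an error page naming the site. Reversing the order, `handle_errors(open_root(broken_app))`, raises RuntimeError instead, because the connection has already been closed by the time the error handler runs.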

I was planning to encapsulate the <base> tag mangling logic in a simple pipeline step, but after studying how it currently works, I realize now that WSGI doesn’t provide a good abstraction for the kind of heuristics Zope uses to make the <base> tag logic fast.  I am considering several choices:

  1. Split the base tag handling between a pipeline element and a new adapter.
  2. Add short-lived output filter hooks to the response, similar to the traversal_hooks I added to requests, which I think turned out quite nice.
  3. Stick to the original plan, which might cause performance problems since Zope would then have to buffer potentially large output streams.

I need to choose the pattern that maximizes clarity for readers.  #1 and #2 are very similar.  #1 is less direct and thus more ambiguous than #2, but #1 is used more often in Zope code.
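To make the cost of option 3 concrete, here is a rough sketch of a pipeline element that buffers an HTML response and inserts a <base> tag after the opening <head> tag. It is not Zope’s implementation: the base_tag_element name and the example.base_url environ key are assumptions, and it skips Zope’s heuristics (for example, it does not check for an existing <base> tag). The buffering step is where large output streams would hurt.

```python
import html
import re

# Matches the opening <head> tag, with or without attributes.
HEAD_RE = re.compile(rb"<head\b[^>]*>", re.IGNORECASE)


def base_tag_element(app, base_key="example.base_url"):
    """Hypothetical option-3 element: buffer the body, then insert <base>.

    `base_key` is an assumed environ key that an earlier step (such as
    traversal) would set; it is not a real Zope or zope.pipeline key.
    """
    def wrapper(environ, start_response):
        captured = []
        chunks = []

        def capture(status, headers, exc_info=None):
            captured[:] = [status, list(headers), exc_info]
            return chunks.append  # data passed to write() is buffered too

        result = app(environ, capture)
        try:
            # The performance concern: the entire body is held in memory.
            chunks.extend(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        body = b"".join(chunks)
        status, headers, exc_info = captured
        ctype = next((v for k, v in headers if k.lower() == "content-type"), "")
        base_url = environ.get(base_key)
        if base_url and ctype.startswith("text/html"):
            tag = '<base href="%s" />' % html.escape(base_url, quote=True)
            tag = tag.encode("utf-8")
            body, count = HEAD_RE.subn(lambda m: m.group(0) + tag, body, count=1)
            if count:
                # The body grew, so the old Content-Length is wrong.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "content-length"]
                headers.append(("Content-Length", str(len(body))))
        start_response(status, headers, exc_info)
        return [body]
    return wrapper
```

Options 1 and 2 would avoid the full buffering by letting code that already knows the page is HTML do the insertion as output is produced.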

RelStorage: A New ZODB Storage

I’m writing RelStorage, a new storage implementation for ZODB / Zope / Plone. RelStorage replaces PGStorage. I’ve put up a RelStorage Wiki page and the zodb-dev mailing list has been discussing it. There is no stable release yet, but one is planned for this month.

While performance is not the main goal (reliability and scalability are more important), I was pleasantly surprised to discover last week that creating a Plone 3 site in RelStorage on PostgreSQL 8.1 is a bit faster than doing the same thing in FileStorage, the default ZODB storage. Clearly, the PostgreSQL team is doing a great job!

Several years ago I put together an early prototype of PGStorage. I recall discovering that PostgreSQL was terribly slow at storing a lot of BLOBs. I read about the soon-to-come TOAST feature, but I wasn’t sure it would solve the problem, so I discarded the whole idea for years. Today, PostgreSQL seems to have no problem at all with this kind of work. It sure has come a long way.

RelStorage also connects to Oracle 10g. According to benchmarks, Oracle has a slight performance advantage, perhaps due to the “read only” isolation mode that Oracle provides. It might be useful for PostgreSQL to get that feature too.

I’m considering setting up a MySQL adapter for RelStorage as well. When the database is in MySQL and Zope is running in mod_wsgi, we could say that the “P” in LAMP stands for Plone!