Must… Write… Something…

Life is really great–so great that I would generally prefer to spend more time living it rather than writing about it.  Still, I’ve got to write, or I’ll forget things and I’ll miss opportunities to communicate.

Yet there’s so much to write.  Where should I start?  Anywhere will do.  Ok.  Time to blog again, and not necessarily about what is most important or what happened most recently.  I’ll write about what I want to write about.

RelStorage 1.1 beta 1

This release has greatly optimized packing. This time for real, I hope! 🙂

At this point, I am confident in the correctness of the pack algorithm. Version 1.0 had it right as well. However, version 1.0 was naive about just how many object states might be packed all at once. It is unwise to attempt to delete gigabytes of data in millions of rows with a single, simple delete statement; it turns out that can take multiple days to complete. Even worse is to do that while holding the commit lock.

So version 1.1 packs one transaction at a time and periodically releases the commit lock. It also has numerous other optimizations in the pack algorithm that should make packing faster and less obtrusive. So, ZODB/Zope/Plone users of the world, please take the beta release for a spin. I’ll help if you have problems with it.

Update: I did not know until now that advisory locks were only recently added to PostgreSQL.  So if you’re using PostgreSQL, at least version 8.2 is required for this beta.  I think I’ll relax that requirement before RelStorage version 1.1.

In Defense of Narrative Doctests

The Zope community likes to innovate on programming. One such innovation is to use extensive “doctests” in place of conventional code tests. This practice, like other innovations, stirs up a little controversy now and then, since it’s often not clear how widely the idea should be applied.

In that context, may I just say that the doctests for zope.wfmc are a pleasure to read. Now if this package just had a wiki-based homepage and linked online API docs, I think it would be a fine model for other packages.

RelStorage 1.0 and Measurements

RelStorage 1.0 is out!

To celebrate, I thought I’d post some measurements. They are the results of a speed test I’ve been running on RelStorage. I created the speed test for the purpose of identifying performance problems. The test certainly helped, but I found that the results varied dramatically between runs, so don’t take these too seriously.

The test hardware is an AMD 5200+ (2 cores, 2.6 GHz). The kernel is 64 bit Linux 2.6.24, but userland is a 32 bit Debian “sid” environment in a chroot area (because it’s much easier to install Oracle in 32 bit Debian than in 64 bit Gentoo). The test is running on Python 2.4 and ZODB 3.7.1 (from Zope 2.10.5). The contenders are ZEO + FileStorage, PostgreSQL 8.3, MySQL 5.0.51a, and Oracle 10g XE. I am using the default configuration of all databases.

The first test shows the time spent per transaction to write 100 objects, followed by the average time spent to read those 100 objects in a different transaction. The time shown is the best of 3 runs, where each run measures the mean duration of each of 10 transactions.

Timing with 100 Objects Per Transaction

This chart says a few interesting things:

  • MySQL is the champ in both read and write speed. I tried hard to make PostgreSQL match, but I haven’t yet found a way. This measurement is backed up by other general observations such as the speed of the unit tests. Both databases run the same 100+ tests, but MySQL runs them in about 17 seconds, while PostgreSQL runs them in about 29 seconds.
  • Oracle 10g XE, unlike the standard Oracle product, is not intended for production. It’s limited in several ways. This test is probably not fair to Oracle, because I expect the standard edition of Oracle to fare better.
  • The ZEO + FileStorage results become unpredictable as the concurrency increases. I have a strong suspicion why that occurs: the ZEO client storage often crashes with a segfault! The speed test automatically recovers from the crash and re-runs the test, but I don’t feel like I can trust the results when ZEO is segfaulting. The crash occurs in zrpc when a lot of clients connect at once.
  • PostgreSQL write speed was only a little slower than ZEO in this test.

The second test shows the time spent per transaction to write and read 10,000 objects.

Time per transaction for 10,000 objects

In this case, MySQL still leads in read speed, but both ZEO and PostgreSQL beat MySQL in write speed. Surprisingly, the ZEO read speed turns out to be the slowest of all the tests, perhaps because 10,000 objects do not fit in the ZEO cache.

The ZEO measurements have major aberrations on the right side of the chart. This may be caused by the fact that during these tests, the ZEO client segfaults on nearly every test run. Fortunately, the fault is in the client, not the server, and these tests put unusually high stress on the ZEO connection setup and teardown code. ZEO connection setup and teardown usually only occurs when Zope starts or stops.

While the primary goal for RelStorage is reliability, not performance, it’s nice to see test results that suggest RelStorage can compete and often win.

UPDATE 12/2008: I clarified the statement above about Oracle 10g XE.

RelStorage 1.0 Beta

I am pleased to announce the beta release of RelStorage 1.0! RelStorage is a storage implementation for ZODB that stores pickles in a relational database.

Features in this release:

  • Full support for PostgreSQL 8.1+, Oracle 10g, and MySQL 5.0+.
  • Supports whole-database migration to and from FileStorage, including all history.
  • A number of optimizations have been implemented, making RelStorage performance comparable to that of FileStorage.
  • There is now a poll-interval option, which reduces the frequency of database polls and helps the database scale better for read-intensive loads.
  • There is now a pack-gc option, which makes it possible to retain at least one revision of every object during packing.

Please help test right away so that we can make a solid 1.0 release.

Get it here:

http://www.zope.org/Members/shane/RelStorage

For more information, see the wiki:

http://wiki.zope.org/ZODB/RelStorage

RelStorage: New MySQL Adapter

I was interested in adding MySQL 5.0+ support to RelStorage, so I went ahead and did it. The code is in Subversion now.

The tests I’ve run suggest the MySQL adapter is already a lot faster than ZEO and both of the other adapters. The MySQL adapter has consistently finished the tests 40% faster than the PostgreSQL adapter. I haven’t tested PostgreSQL 8.3 yet; perhaps that will make a difference. I don’t know yet how the reliability of each adapter compares.

The port took all day. (My wife and I both have a cold, making it unwise to go to work today.) I didn’t think it would take that long, but I had to slow down to figure out the strange relationship between locks and transactions in MySQL. The LOCK TABLE statement is full of surprises! I ended up using GET_LOCK and RELEASE_LOCK and row-level locks instead.

So after editing queries to fit MySQL’s rules and fixing miscellaneous details, RelStorage now has a third database adapter. I’m excited to see what happens next. For one thing, MySQL support could have a major positive effect on Plone.

By the way, MySQLdb version 1.2.2 is required. Version 1.2.1 has a bug involving BLOBs; RelStorage depends heavily on BLOBs.

RelStorage: A New ZODB Storage

I’m writing RelStorage, a new storage implementation for ZODB / Zope / Plone. RelStorage replaces PGStorage. I’ve put up a RelStorage Wiki page and the zodb-dev mailing list has been discussing it. There is no stable release yet, but a stable release is planned for this month.

While performance is not the main goal (reliability and scalability are more important), I was pleasantly surprised to discover last week that creating a Plone 3 site in RelStorage on PostgreSQL 8.1 is a bit faster than doing the same thing in FileStorage, the default ZODB storage. Clearly, the PostgreSQL team is doing a great job!

Several years ago I put together an early prototype of PGStorage. I recall discovering that PostgreSQL was terribly slow at storing a lot of BLOBs. I read about the soon-to-come TOAST feature, but I wasn’t sure it would solve the problem, so I discarded the whole idea for years. Today, PostgreSQL seems to have no problem at all with this kind of work. It sure has come a long way.

RelStorage also connects to Oracle 10g. According to benchmarks, Oracle has a slight performance advantage, perhaps due to the “read only” isolation mode that Oracle provides. It might be useful for PostgreSQL to get that feature too.

I’m considering setting up a MySQL adapter for RelStorage as well. When the database is in MySQL and Zope is running in mod_wsgi, we could say that the “P” in LAMP stands for Plone!