RelStorage 1.1.2

This release has two new useful features, one for performance, one for safety.

The performance feature is that if you use both the cache-servers and poll-interval options, RelStorage will use the cache to distribute basic change notifications.  That means we get to lighten the load on the database using the poll-interval, yet changes should still be seen instantly on all clients.  Yay! 🙂

The only drawback I expect is that caching makes debugging more difficult.  Still, this option should help people build enormous clusters, like the one my current customer was planning to build, although I got word today that they have changed their mind.

The new safety feature is the pack-dry-run option, which lets you run only the nondestructive pre_pack phase to get a list of everything that would be deleted by the pack phase.  This should be particularly useful if you’re trying out packing for the first time on a big database.  My current customer would have benefited from this too.

I also fixed a bug that caused the pack code to not remove as much old stuff as it should and I started using PyPI instead of the wiki as the main web page.  Using PyPI means I have to maintain only one README, which gets translated automatically into the PyPI page.  Until now I’ve had to maintain both the README and the wiki page.

RelStorage 1.1.1 Released

For those just tuning in: RelStorage is an adapter for ZODB (the database behind Zope) that stores data in a relational database such as MySQL, PostgreSQL, or Oracle.  It has advantages over FileStorage and ZEO, the default storage methods.  Read more in the ZODB wiki.

Download this release from PyPI.  As I mentioned in another post, this release works around MySQL performance bugs to make it possible to pack large databases in a reasonable amount of time.

If you are upgrading from previous versions, a simple schema migration is required.  See the migration instructions.

Packing With MySQL

I’ve been working on packing a 6.5GB production database based on RelStorage and MySQL 5.0.  It contains 6.4 million objects, meaning “select count(1) from current_object” returns 6426879.  I would call this a moderately sized database.  When I first approached it, it had not been packed for at least 3 months.  At first, I could not pack it, because RelStorage was using statements that MySQL mis-optimizes.  I will discuss how I fixed the problem.

The first time I tried to pack this database, the pre_pack phase took several hours before I stopped it by killing the Python process.  I studied the problem using “show processlist” and the MySQL docs online.  I found a statement in RelStorage that MySQL mis-optimizes.  I tried again and it still took too long; this time I discovered that a workaround I had applied earlier for a mis-optimized query did not actually help this instance of MySQL.  I decided to make the workaround more robust by using an explicit temporary table.

Then, on Saturday, I started a pack again.  The pre_pack phase completed in less than an hour this time and the final phase began.  The final phase gets from the pre_pack phase a complete list of objects and states to delete; all it really has to do is delete them.  I let it run all weekend.  Today (Monday), using “show processlist” again, I measured 17 seconds per transaction, with 26,000 transactions to go.  With a 50% duty cycle, this would have taken about 10 days!  (The 50% duty cycle is a configurable parameter that causes RelStorage to periodically sleep during packing, allowing other clients to write to the database without significant interruption.)

I read up on the delete statement.  It turns out that delete with a subquery is often very expensive in MySQL.  I converted the delete statements to use a table join instead of a subquery, and the time per transaction dropped from 17 seconds to about 3 milliseconds.  This time the pack finished in minutes.

Does this mean I should optimize every statement this way?  Certainly not.  The SQL syntax I had to use to optimize MySQL is not portable, so I had to sacrifice portability, readability, and maintainability, in order to gain acceptable performance.  I will only make that sacrifice when necessary.

I am very happy that I did not try to optimize this code until I was in a position where I could properly measure the performance with millions of real objects.  If I had optimized prematurely, I would have focused on micro-optimizations that might have only doubled or tripled the performance.  By waiting until the time was ripe, I multiplied the pack performance by a factor of at least 5,000, with only a few days of work and only minor code changes.  Next time someone asks why premature optimization is so evil, I will try to remember this experience.  Most of the best software developers I know skip micro-optimizations and focus on readability instead, helping to ensure that much larger optimizations will be possible later.

Note that during my optimization work, I never once needed to interrupt the main site nor restart MySQL.  I only restarted my own Python program repeatedly.  I doubt anyone else knew there was a pack going on.

RelStorage 1.1.1 Coming Soon

After a lot of testing and performance profiling, on Friday I released RelStorage 1.1.  On Saturday, I was planning to blog about the good news and the bad news, but I was unhappy enough with the bad news that I decided to turn as much of the bad news as I could into good news.  So I fixed the performance regressions and will release RelStorage 1.1.1 next week.

The performance charts have changed since RelStorage 1.0.

Chart 1: Speed tests with 100 objects per transaction.

Chart 2: Speed tests with 10,000 objects per transaction.

I’m still using 32 bit Python 2.4 and the same hardware (AMD 5200+), but this time, I ran the tests using ZODB 3.8.1.

The good news:

  • Compared with RelStorage 1.0, reads from MySQL and Oracle are definitely faster.  In some tests, the Oracle adapter now reads as quickly as MySQL.  To get the Oracle optimization, you need cx_Oracle 5, which was just released this week.  Thanks again to Helge Tesdal and Anthony Tuininga for making this happen.
  • Loading 100 objects or less per transaction via MySQL is now so fast that my speed test can no longer measure it.  The time per transaction is 2 ms whether I’m running one ZODB client or 15 concurrently.  (The bad news is now I probably ought to refine the speed test.)

The bad news in version 1.1 that became good news in version 1.1.1:

  • The Oracle adapter took a hit in write speed in 1.1, but a simple optimization (using setinputsizes()) fixed that and writes to Oracle are now slightly faster than they were in RelStorage 1.0.
  • MySQL performance bugs continue to be a problem for packing.  I attempted to pack a 5 GB customer database, with garbage collection enabled, using RelStorage 1.1, but MySQL mis-optimized some queries and wanted to spend multiple days on an operation that should only take a few seconds.  I replaced two queries involving subqueries with temporary table manipulations.  Version 1.1.1 has the workarounds.

The bad news that I haven’t resolved:

  • Writing a lot of objects to MySQL is now apparently a little slower.  With 15 concurrent writers, it used to take 2.2 seconds for each of them to write 10,000 objects, but now it takes 2.9 seconds.  This puts MySQL in last place for write speed under pressure.  That could be a concern, but I don’t think it should hold up the release.

Of the three databases RelStorage supports, MySQL tends to be both the fastest and the slowest.  MySQL is fast when RelStorage executes simple statements that the optimizer can hardly get wrong, but MySQL is glacial when its query optimizer makes a bad choice.  PostgreSQL performance seems more predictable.  PostgreSQL’s optimizer seems to handle a subquery just as well as a table join, and sometimes SQL syntax rules make it hard to avoid a subquery.  So, to support MySQL, I have to convert moderately complex statements into a series of statements that force the database to make reasonable decisions.  MySQL is smart, but absent-minded.

The ZEO tests no longer segfault in ZODB 3.8.1, which is very good news.  However, ZEO still appears to be the slowest at reading a lot of objects at once.  While it would be easy to blame the global interpreter lock, the second chart doesn’t agree with that assumption, since it shows that ZEO reads are slow regardless of concurrency level.  ZEO clients write to a cache after every read and perhaps that is causing the overhead.

I will recommend that all users of RelStorage upgrade to this version.  The packing improvements alone make it worth the effort.

RelStorage 1.1: All Green

I said earlier that before I release RelStorage 1.1, I wanted to set up thorough automated testing of RelStorage with different combinations of ZODB versions, RelStorage versions, Python versions, and platform types.  I originally intended to use hand crafted scripts to run the tests, but this past week I learned Buildbot’s model through experience working on a customer’s Buildbot configuration, so now I’m comfortable with it and I decided to use Buildbot after all.

Here are the results.

I could just say all the tests are passing, but it only feels real when there is a table with many green cells and no red cells.  This system tests a lot of software combinations.  There are three platforms: 32 bit Debian Etch, 32 bit Debian Lenny, and 64 bit Debian Lenny.  On each platform, I’m testing the RelStorage trunk and 1.1 branch combined with ZODB 3.7, ZODB 3.8, and the ZODB shane-poll-invalidations branch, except that the 64 bit platform with Python 2.5 does not test the ZODB 3.7 branch, since ZODB 3.7 is not compatible with 64 bit Python 2.5.  Each 32 bit platform runs tests with MySQL, PostgreSQL, and Oracle.  The 64 bit platform runs only the MySQL and PostgreSQL tests.

That adds up to 44 tested combinations, all passing.  I can relax now!  My Buildbot is running on a private server, but I can expose a read-only web view if there’s interest.

Always Run the RelStorage Tests

I would like to advise all users of RelStorage to run the RelStorage test suite on their staging and production servers before running any application that uses RelStorage.  There are a number of ways to misconfigure the database and the tests will reveal many kinds of issues.  The test suite is in very good shape and should pass every test, every time.

Here are a few MySQL-specific misconfigurations that the tests will reveal:

  • An incorrect database adapter.  For example, Debian Etch still ships MySQLdb 1.2.1, causing most of the tests to fail.
  • Insufficient packet size.  In my.cnf, you should increase max_allowed_packet from 16M to at least 32M.  This parameter limits the maximum size of an object stored by RelStorage.
  • Insufficient space in /tmp.  You need a lot more than 16M available, but linux-vserver limits /tmp to 16M by default.

To run the suite, do something like this:

cd /path/to/relstorage
export PYTHONPATH=`pwd`:/path/to/ZODB/src
python relstorage/tests/  # or or

Personally, every time I run these tests on a new database, I discover something misconfigured.  It is worth the time to run the tests.

Back to Python, Zope, and Plone

I’ve been writing code for the family history department of the Church of Jesus Christ of Latter-day Saints for the last 4+ years.  It has been a good experience and I feel like I’ve contributed and learned a lot, but now I have a new opportunity with my family that I shouldn’t pass up.  My family business is now doing software consulting!  And guess what kind of consulting offers we’re getting?  It’s all Python, Zope, and Plone!  I’m really happy about that.  I feel like working in Python gives us a good chance to advance the state of the art.

Our first full time gig starts in December.  It will involve a lot of work with RelStorage, which is another cool bonus.  We expect the work to take six months, maybe a little more.  Among other work, we are going to make RelStorage rock on MySQL (even more than it does already).

I also need to tie up some loose ends on RelStorage: I need to integrate the optimized Oracle queries, finish setting up the test environment, and release 1.1.  (It’s about time, eh?  Version 1.1c2 has been out there a long time.  Blame Java… 😛 )

RelStorage Test Plan

Before I release RelStorage 1.1, I believe I should set up a complete test environment.  The test environment will test different versions of Python, ZODB, the supported databases, and the supported database adapters.

I started by installing buildbot, hoping it would solve most of the problem for me.  However, I wanted a builder that would check out two things, ZODB and RelStorage, and install them both; there is no obvious way to do that in buildbot.  So I uninstalled buildbot and decided to write scripts instead.  Maybe I’ll figure out how to make buildbot run my scripts later.

Using Linux-VServer, which I already have set up, I will set up 3 new virtual servers and name them after birds that honk when you annoy them.  “goose1” will be 32 bit Debian Etch with Python 2.4, PostgreSQL 8.1, MySQL, Oracle 10g XE, and cx_Oracle 4.x.  “goose2” will be 32 bit Debian Lenny with Python 2.5, PostgreSQL 8.3, MySQL, Oracle 10g XE, and pre-release cx_Oracle 5.  “goose3” will be 64 bit Debian Lenny with Python 2.5, PostgreSQL 8.3, and MySQL.  Each of the servers will test several combinations of RelStorage checkouts (the trunk, the 1.0 branch, and the 1.1 branch) and ZODB checkouts (the 3.7 and 3.8 branches at least) with all of the installed databases.

This will not test all possible combinations, but should be enough to catch a lot of problems early.  Combinations that I would classify as unimportant at this time include 64 bit Python 2.4 (since Python 2.5 supports 64 bit much better) and Oracle on a 64 bit host (since 10g XE only comes in 32 bit).

RelStorage: MD5 sums

If you study RelStorage a bit, you’ll discover that every object it stores is accompanied by an MD5 sum of the object state.  Then you’ll probably wonder why, since MD5 computation is cheap but not free.  We do it to support undo.

ZODB expects the storage to check whether an undo operation is safe before actually doing it.  FileStorage performs that verification using the following algorithm: if each object’s state in the transaction to undo matches the object’s current state, it is safe to undo.  If any object does not fit that rule, raise an UndoError instead.

RelStorage uses the same algorithm, but it compares states using the MD5 sum rather than the full state, allowing the comparison to proceed quickly.  Actually, the real issue is not speed, but portability. Do all of the supported relational databases have the ability to compare the contents of BLOBs in a query?   It’s hard to find documentation on questions like that.  It’s much easier to just compare MD5 sums.

Besides, it generally feels good to keep MD5 sums around.  If the filesystem hosting your database ever accumulates some corruption, you can use the MD5 sums to help sort out the mess.

RelStorage 1.1 beta 1

This release has greatly optimized packing. This time for real, I hope! 🙂

At this point, I am confident in the correctness of the pack algorithm. Version 1.0 had it right as well. However, version 1.0 was naive about just how many object states might be packed all at once. It is unwise to attempt to delete gigabytes of data in millions of rows with a single, simple delete statement; it turns out that can take multiple days to complete. Even worse is to do that while holding the commit lock.

So version 1.1 packs one transaction at a time and periodically releases the commit lock. It also has numerous other optimizations in the pack algorithm that should make packing faster and less obtrusive. So, ZODB/Zope/Plone users of the world, please take the beta release for a spin. I’ll help if you have problems with it.

Update: I did not know until now that advisory locks were only recently added to PostgreSQL.  So if you’re using PostgreSQL, at least version 8.2 is required for this beta.  I think I’ll relax that requirement before RelStorage version 1.1.