RelStorage 1.1.1 Coming Soon

After a lot of testing and performance profiling, on Friday I released RelStorage 1.1.  On Saturday, I was planning to blog about the good news and the bad news, but I was unhappy enough with the bad news that I decided to turn as much of the bad news as I could into good news.  So I fixed the performance regressions and will release RelStorage 1.1.1 next week.

The performance charts have changed since RelStorage 1.0.

Chart 1: Speed tests with 100 objects per transaction.

Chart 2: Speed tests with 10,000 objects per transaction.

I’m still using 32 bit Python 2.4 and the same hardware (AMD 5200+), but this time, I ran the tests using ZODB 3.8.1.

The good news:

  • Compared with RelStorage 1.0, reads from MySQL and Oracle are definitely faster.  In some tests, the Oracle adapter now reads as quickly as MySQL.  To get the Oracle optimization, you need cx_Oracle 5, which was just released this week.  Thanks again to Helge Tesdal and Anthony Tuininga for making this happen.
  • Loading 100 objects or fewer per transaction via MySQL is now so fast that my speed test can no longer measure it.  The time per transaction is 2 ms whether I’m running one ZODB client or 15 concurrently.  (The bad news is now I probably ought to refine the speed test.)

The bad news in version 1.1 that became good news in version 1.1.1:

  • The Oracle adapter took a hit in write speed in 1.1, but a simple optimization (using setinputsizes()) fixed that and writes to Oracle are now slightly faster than they were in RelStorage 1.0.
  • MySQL performance bugs continue to be a problem for packing.  I attempted to pack a 5 GB customer database, with garbage collection enabled, using RelStorage 1.1, but MySQL mis-optimized some queries and wanted to spend multiple days on an operation that should only take a few seconds.  I replaced two queries involving subqueries with temporary table manipulations.  Version 1.1.1 has the workarounds.

The bad news that I haven’t resolved:

  • Writing a lot of objects to MySQL is now apparently a little slower.  With 15 concurrent writers, it used to take 2.2 seconds for each of them to write 10,000 objects, but now it takes 2.9 seconds.  This puts MySQL in last place for write speed under pressure.  That could be a concern, but I don’t think it should hold up the release.

Of the three databases RelStorage supports, MySQL tends to be both the fastest and the slowest.  MySQL is fast when RelStorage executes simple statements that the optimizer can hardly get wrong, but MySQL is glacial when its query optimizer makes a bad choice.  PostgreSQL performance seems more predictable.  PostgreSQL’s optimizer seems to handle a subquery just as well as a table join, and sometimes SQL syntax rules make it hard to avoid a subquery.  So, to support MySQL, I have to convert moderately complex statements into a series of statements that force the database to make reasonable decisions.  MySQL is smart, but absent-minded.
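The subquery-to-temp-table rewrite can be sketched with a toy example.  This uses sqlite3 purely for illustration, and the table names merely echo RelStorage's schema; these are not the actual pack queries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE object_state (zoid INTEGER, tid INTEGER)")
cur.execute("CREATE TABLE pack_object (zoid INTEGER, keep INTEGER)")
cur.executemany("INSERT INTO object_state VALUES (?, ?)",
                [(1, 10), (2, 10), (3, 20)])
cur.executemany("INSERT INTO pack_object VALUES (?, ?)",
                [(1, 0), (2, 1), (3, 0)])

# Form 1: a subquery, which an optimizer can mishandle on large tables.
cur.execute("""
    SELECT zoid FROM object_state
    WHERE zoid IN (SELECT zoid FROM pack_object WHERE keep = 0)
""")
with_subquery = sorted(row[0] for row in cur.fetchall())

# Form 2: materialize the inner query into a temporary table first,
# then join against it -- the workaround pattern described above,
# which forces the database to make a reasonable plan.
cur.execute("CREATE TEMP TABLE temp_pack AS "
            "SELECT zoid FROM pack_object WHERE keep = 0")
cur.execute("""
    SELECT s.zoid FROM object_state s
    JOIN temp_pack t ON t.zoid = s.zoid
""")
with_temp_table = sorted(row[0] for row in cur.fetchall())

print(with_subquery == with_temp_table)  # both forms select zoids 1 and 3
```

Both forms return the same rows; only the plan the optimizer chooses differs.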

The ZEO tests no longer segfault in ZODB 3.8.1, which is very good news.  However, ZEO still appears to be the slowest at reading a lot of objects at once.  While it would be easy to blame the global interpreter lock, the second chart doesn’t agree with that assumption, since it shows that ZEO reads are slow regardless of concurrency level.  ZEO clients write to a cache after every read and perhaps that is causing the overhead.

I will recommend that all users of RelStorage upgrade to this version.  The packing improvements alone make it worth the effort.

RelStorage 1.1: All Green

I said earlier that before I release RelStorage 1.1, I wanted to set up thorough automated testing of RelStorage with different combinations of ZODB versions, RelStorage versions, Python versions, and platform types.  I originally intended to use hand-crafted scripts to run the tests, but this past week I learned Buildbot’s model through experience working on a customer’s Buildbot configuration, so now I’m comfortable with it and I decided to use Buildbot after all.

Here are the results.

I could just say all the tests are passing, but it only feels real when there is a table with many green cells and no red cells.  This system tests a lot of software combinations.  There are three platforms: 32 bit Debian Etch, 32 bit Debian Lenny, and 64 bit Debian Lenny.  On each platform, I’m testing the RelStorage trunk and 1.1 branch combined with ZODB 3.7, ZODB 3.8, and the ZODB shane-poll-invalidations branch, except that the 64 bit platform with Python 2.5 does not test the ZODB 3.7 branch, since ZODB 3.7 is not compatible with 64 bit Python 2.5.  Each 32 bit platform runs tests with MySQL, PostgreSQL, and Oracle.  The 64 bit platform runs only the MySQL and PostgreSQL tests.

That adds up to 44 tested combinations, all passing.  I can relax now!  My Buildbot is running on a private server, but I can expose a read-only web view if there’s interest.

z3c.sharding

I just uploaded to svn.zope.org a new package called z3c.sharding.  It is a BTree container that spreads its contents among multiple databases in an automatically balanced way.  It does all of its work with pure and simple ZODB mechanisms.  It should be useful for solving certain kinds of scaling problems.  Enjoy!  There are no releases yet, but go ahead and try it out; if it solves a real problem in its current state, I’ll make a release.

Back to Python, Zope, and Plone

I’ve been writing code for the family history department of the Church of Jesus Christ of Latter-day Saints for the last 4+ years.  It has been a good experience and I feel like I’ve contributed and learned a lot, but now I have a new opportunity with my family that I shouldn’t pass up.  My family business is now doing software consulting!  And guess what kind of consulting offers we’re getting?  It’s all Python, Zope, and Plone!  I’m really happy about that.  I feel like working in Python gives us a good chance to advance the state of the art.

Our first full time gig starts in December.  It will involve a lot of work with RelStorage, which is another cool bonus.  We expect the work to take six months, maybe a little more.  Among other work, we are going to make RelStorage rock on MySQL (even more than it does already).

I also need to tie up some loose ends on RelStorage: I need to integrate the optimized Oracle queries, finish setting up the test environment, and release 1.1.  (It’s about time, eh?  Version 1.1c2 has been out there a long time.  Blame Java… 😛 )

An Abominable, Crazy Idea

What if I were to take a Plone install, delete all of the Zope 2 code (leaving mainly Zope 3 stuff), then mangle the code (including CMF) until Plone works again?

I know I’m foolish for saying this, but my intuition tells me I could actually do it, especially if I got a lot of help from others.  I think it would take 1 to 3 months of full time effort.  I would redesign all sorts of stuff in the process.  The resulting code would not be compatible with any existing site; the necessary migration scripts would take many more months, I think.  But the code would be enough to create new sites from scratch and put them on the web.  The intent would be to make Plone more maintainable.

It would be really crazy… but fun, I think.

The FamilySearch Security Policy Editor and the Zope Component Architecture

Over the past couple of months, I have been working to make it easy for administrators to create and maintain a complex security policy for a giant archive of digital artifacts.  In the process, I think I have found a useful way to configure complex software systems such as Zope 3.

A Security Policy for Dead People

The archive in question stores images, documents, and various other records about dead people.  (Genealogy is mostly about dead people, after all!)  The archive has not yet been deployed, but it will replace an existing simpler system.  Assuming the archive is successful, developers at familysearch.org (my employer) will want to adopt it for their own purposes.  As adoption grows, so will the complexity of the security policy applied to the archive.  Therefore, the security policy must be manageable.  People should not fear the prospect of making changes to the security policy.  Changes in how the system is used should lead to changes in the policy.  If the policy does not evolve with usage, the archive will stagnate to some extent and so will some of the work being done.

Because the requirements are complex, the security policy is also complex.  There are currently six degrees of freedom, meaning that there are six independent variables that affect the outcome of a security policy check.  I don’t know about everyone else, but my quick intuition is typically limited to three dimensions; any more requires a great deal more rational exercise.  Six dimensions is often too much to work with quickly and confidently.

However, I believe the right user interface can optimize that kind of rational exercise.  Following that belief, I created a graphical tool for managing the security policy.  It can answer questions with simple interactions, increasing people’s confidence that they are changing the policy correctly.  I eliminated the need for humans to parse and generate XML, which I think they will find helpful.  But the best part, I think, is that I put test-first methodology right in front of the user.  A screen shot follows.

FamilySearch RBAC Policy Editor

The acronym RBAC in the title stands for Role-Based Access Control.  The six trees in the top left represent the six degrees of freedom; each degree has a grouping hierarchy.  On the right is a report of whether users attempting the selected combination would be granted access.  The reports are updated instantly whenever the user selects a tree node.  The screen shot posted here is showing that according to the policy.xml file in my home directory, users with any role can retrieve any image stream of any published image artifact, regardless of license.  This interface is the place to change that policy.

At the bottom, there are three tabs.  The first tab has a table showing all policy directives.  A directive states that access is to be allowed or denied if the request fits the specified combination.  To change the policy so that people must at least be authenticated before viewing images, the user of this application simply selects the directive shown, clicks the Edit button, chooses a different role, and clicks Ok.

In the status bar is a report of how many tests are passing.  If people use this feature, I expect the application to be quite successful.  The tests tab contains a matrix of tests and test users; each test user has a list of roles.  The cells of the matrix each have a checkbox that shows whether a given test user is expected to be able to do something according to the policy.  If the outcome of the policy does not match the expectation, the cell turns red and the number of passing tests decreases.

RBAC Editor showing tests tab

The report panels on the right feature the ability to show all directives or tests that meet some criteria.  If I want to know why someone’s access is denied when I thought some directive allowed it, I select the conditions of their request, then look on the right to see what the policy says about it.  If it says no directives match, then I select or deselect conditions on the left until I find the directive that needs to change.  If there really is no directive that matches, I add a new directive (and a test!) and verify the change using the report panels again.

The application has other goodies designed to increase users’ confidence, such as fully integrated undo/redo, error and warning highlights instead of cryptic dialog boxes, and “find” fields that filter the rows of the tables.  I expect that this is enough for a security policy administrator.  To make it as friendly as an iPod is not a goal and would even be a disservice for people who are responsible for complex things like a security policy.

A Configuration for Living People

Throughout the process of designing and implementing this, I have kept one thought in my mind: could I use something like this to configure components in the Zope component architecture?  The component architecture solves big, interesting problems, but it also makes the outcome of configuration decisions much less obvious.  If I made an application like this that lets you see and modify the outcome of configuration decisions interactively, would it be useful to the developer community at large?

Boy, would I love to find out.  I started the Zope Jam project some time ago and haven’t done anything with it since, although I thought my initial prototypes looked promising.  I stopped the project because I felt something nagging at me that the design was wrong.  Now I think I see one specific blocker: the whole thing was designed around ZCML.  It appears today that the Zope community strongly supports the component architecture, but not necessarily ZCML.  So the new project would be an interactive configuration browser and it may support more than one way of modifying the configuration.

I still prefer to make it a desktop GUI application (written in Python, rather than Java Swing, which was required for the policy editor), with a variety of low-latency widgets and no access control issues, rather than a browser-based application.  It should run user code directly, so that when the user asks what the outcome of an adapter lookup would be, the GUI’s answer would always be correct.  It should integrate tests of the configuration much like I did with the policy editor.  It should do everything possible to increase the software developer’s confidence in the component architecture.

Let’s Build This

Does anyone else get excited about this?  I love finding ways to make complex things simple.  If I could find a company to fund the development of this, I would work on it full time.  I think it would be a major time saver for any company that is doing significant software development using the Zope component architecture.

RelStorage: MD5 sums

If you study RelStorage a bit, you’ll discover that every object it stores is accompanied by an MD5 sum of the object state.  Then you’ll probably wonder why, since MD5 computation is cheap but not free.  We do it to support undo.

ZODB expects the storage to check whether an undo operation is safe before actually doing it.  FileStorage performs that verification using the following algorithm: if every object’s state in the transaction to be undone matches that object’s current state, the undo is safe; otherwise, the storage raises an UndoError.

RelStorage uses the same algorithm, but it compares states using the MD5 sum rather than the full state, allowing the comparison to proceed quickly.  Actually, the real issue is not speed, but portability. Do all of the supported relational databases have the ability to compare the contents of BLOBs in a query?   It’s hard to find documentation on questions like that.  It’s much easier to just compare MD5 sums.
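The comparison can be sketched like this.  The function names and data shapes are illustrative, not RelStorage's actual implementation; in the real storage the digests live in database rows and the comparison happens in SQL:

```python
import hashlib

def state_md5(state):
    """Return the hex MD5 digest of a pickled object state."""
    return hashlib.md5(state).hexdigest()

def can_undo(undone_states, current_md5s):
    """Check whether undoing a transaction is safe.

    undone_states: {zoid: pickled state} for the transaction to undo.
    current_md5s:  {zoid: hex digest} of each object's current state.

    Undo is safe only if every object's state in the transaction being
    undone matches that object's current state.  Comparing MD5 sums
    avoids comparing (or even fetching) the full state BLOBs.
    """
    for zoid, state in undone_states.items():
        if state_md5(state) != current_md5s.get(zoid):
            return False  # the storage would raise UndoError here
    return True
```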

Besides, it generally feels good to keep MD5 sums around.  If the filesystem hosting your database ever accumulates some corruption, you can use the MD5 sums to help sort out the mess.

RelStorage 1.1 beta 1

This release has greatly optimized packing. This time for real, I hope! 🙂

At this point, I am confident in the correctness of the pack algorithm. Version 1.0 had it right as well. However, version 1.0 was naive about just how many object states might be packed all at once. It is unwise to attempt to delete gigabytes of data in millions of rows with a single, simple delete statement; it turns out that can take multiple days to complete. Even worse is to do that while holding the commit lock.

So version 1.1 packs one transaction at a time and periodically releases the commit lock. It also has numerous other optimizations in the pack algorithm that should make packing faster and less obtrusive. So, ZODB/Zope/Plone users of the world, please take the beta release for a spin. I’ll help if you have problems with it.
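The shape of that change can be sketched as follows.  The names (`commit_lock`, `delete_states_for`) are hypothetical; the real pack code is considerably more involved:

```python
import threading

commit_lock = threading.Lock()  # stands in for the database commit lock

def pack(packable_tids, delete_states_for):
    """Pack one transaction at a time, releasing the commit lock between
    transactions so that normal commits can proceed during a long pack.

    Sketch only: each delete is small and bounded, instead of one
    enormous delete statement that holds the lock for its whole run.
    """
    for tid in packable_tids:
        with commit_lock:           # hold the lock per transaction only
            delete_states_for(tid)  # delete just this transaction's garbage
        # lock is released here, letting queued commits through
```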

Update: I did not know until now that advisory locks were only recently added to PostgreSQL.  So if you’re using PostgreSQL, at least version 8.2 is required for this beta.  I think I’ll relax that requirement before RelStorage version 1.1.

In Defense of Narrative Doctests

The Zope community likes to innovate on programming. One such innovation is to use extensive “doctests” in place of conventional code tests. This practice, like other innovations, stirs up a little controversy now and then, since it’s often not clear how widely the idea should be applied.

In that context, may I just say that the doctests for zope.wfmc are a pleasure to read. Now if this package just had a wiki-based homepage and linked online API docs, I think it would be a fine model for other packages.

RelStorage 1.0 and Measurements

RelStorage 1.0 is out!

To celebrate, I thought I’d post some measurements. They are the results of a speed test I’ve been running on RelStorage. I created the speed test for the purpose of identifying performance problems. The test certainly helped, but I found that the results varied dramatically between runs, so don’t take these too seriously.

The test hardware is an AMD 5200+ (2 cores, 2.6 GHz). The kernel is 64 bit Linux 2.6.24, but userland is a 32 bit Debian “sid” environment in a chroot area (because it’s much easier to install Oracle in 32 bit Debian than in 64 bit Gentoo). The test is running on Python 2.4 and ZODB 3.7.1 (from Zope 2.10.5). The contenders are ZEO + FileStorage, PostgreSQL 8.3, MySQL 5.0.51a, and Oracle 10g XE. I am using the default configuration of all databases.

The first test shows the time spent per transaction to write 100 objects, followed by the average time spent to read those 100 objects in a different transaction. The time shown is the best of 3 runs, where each run measures the mean duration of each of 10 transactions.
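The measurement scheme amounts to something like the sketch below (an illustration of the method, not the actual speed test code; `run_transaction` stands in for a write or read of the test objects):

```python
import time

def measure(run_transaction, transactions=10, runs=3):
    """Return the best-of-N-runs result, where each run is the mean
    duration of `transactions` transactions -- the scheme behind the
    charts above."""
    best = None
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(transactions):
            run_transaction()
        mean = (time.perf_counter() - start) / transactions
        if best is None or mean < best:
            best = mean
    return best
```

Taking the best of several runs filters out some of the run-to-run noise, though as noted the results still varied dramatically.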

Timing with 100 Objects Per Transaction

This chart says a few interesting things:

  • MySQL is the champ in both read and write speed. I tried hard to make PostgreSQL match, but I haven’t yet found a way. This measurement is backed up by other general observations such as the speed of the unit tests. Both databases run the same 100+ tests, but MySQL runs them in about 17 seconds, while PostgreSQL runs them in about 29 seconds.
  • Oracle 10g XE, unlike the standard Oracle product, is not intended for production. It’s limited in several ways. This test is probably not fair to Oracle, because I expect the standard edition of Oracle to fare better.
  • The ZEO + FileStorage results become unpredictable as the concurrency increases. I have a strong suspicion why that occurs: the ZEO client storage often crashes with a segfault! The speed test automatically recovers from the crash and re-runs the test, but I don’t feel like I can trust the results when ZEO is segfaulting. The crash occurs in zrpc when a lot of clients connect at once.
  • PostgreSQL write speed was only a little slower than ZEO in this test.

The second test shows the time spent per transaction to write and read 10,000 objects.

Time per transaction for 10,000 objects

In this case, MySQL still leads in read speed, but both ZEO and PostgreSQL beat MySQL in write speed. Surprisingly, the ZEO read speed turns out to be the slowest of all the tests, perhaps because 10,000 objects do not fit in the ZEO cache.

The ZEO measurements have major aberrations on the right side of the chart. This may be caused by the fact that during these tests, the ZEO client segfaults on nearly every test run. Fortunately, the fault is in the client, not the server, and these tests put unusually high stress on the ZEO connection setup and teardown code. ZEO connection setup and teardown usually only occurs when Zope starts or stops.

While the primary goal for RelStorage is reliability, not performance, it’s nice to see test results that suggest RelStorage can compete and often win.

UPDATE 12/2008: I clarified the statement above about Oracle 10g XE.