RelStorage 1.4.0b1 and zodbshootout

I just released RelStorage 1.4.0b1.  New features:

  • More documentation.
  • Support for history-free storage on PostgreSQL, MySQL, and Oracle.  This reduces the need to pack and makes RelStorage more appropriate for session storage.
  • Speed.  New tests prompted several optimizations that reduced the effect of network latency in both read and write operations.  Memcached support is now integrated in a much better way.
  • Support for asynchronous database replication.  Previous versions of RelStorage worked with MySQL replication, but did not keep ZODB caches in sync when failing over to a slave that was slightly out of date.
  • The Oracle adapter now uses PL/SQL for speed and lock timeouts.  Lock timeouts are important for preventing cluster lockup.
  • Moved the speed test script into a separate package named zodbshootout, making it easier for developers and administrators to run comparative performance tests.
  • The adapter code is more modular, making it easier to support new kinds of databases and database adapter modules.

The zodbshootout script tells me this release of RelStorage is faster than ever.  It reports objects read or written per second, so unlike the previous charts I’ve made, bigger is now better.  Here are the results:

[Chart: relstorage-speed, objects read or written per second]

PostgreSQL now beats MySQL in some of the tests.  Oracle (not on this chart) is now looking pretty good too.
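To make the metric concrete, here is a minimal sketch of how an objects-per-second figure can be computed.  This is not zodbshootout's actual code: the objects_per_second helper and the plain dict standing in for a storage are assumptions made for illustration only.

```python
import time


def objects_per_second(operation, count):
    """Call operation(i) for i in range(count) and return the rate per second."""
    start = time.perf_counter()
    for i in range(count):
        operation(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on a very coarse clock.
    return count / max(elapsed, 1e-9)


# A plain dict stands in for a real storage (hypothetical stand-in).
store = {}
payload = b"x" * 100

write_rate = objects_per_second(lambda i: store.__setitem__(i, payload), 10000)
read_rate = objects_per_second(lambda i: store[i], 10000)

print("write: %.0f objects/s, read: %.0f objects/s" % (write_rate, read_rate))
```

Because the result is a rate rather than a duration, a higher number means a faster storage.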

The new features led to far more automated tests.  My private Buildbot, which tests RelStorage with several combinations of Python, ZODB, and operating systems (in virtual private servers), now takes 2 hours to run all the tests.  Maybe I need to upgrade that server or investigate the possibility of making Buildbot launch an Amazon EC2 instance.

The previous release was 1.3.0b1, which added ZODB blob support.  Several customers asked for new features right after I released 1.3.0b1, so I decided to jump to version 1.4.0b1 rather than finalize the 1.3 series.  The 1.2 series has had more extensive testing, so use that for a while if you have troubles with 1.4.0b1.

This new release should be particularly interesting for Plone users, since Plone is always hungry for faster infrastructure.

PyCon 2010: I Want to Present Something

I’ve been racking my brains to find something to present at PyCon 2010. I last presented at PyCon 2003 (I talked about Ape, the Adaptable Persistence Engine for Zope) and I have been looking for a good topic ever since.  I really liked the experience of presenting and it led to a lot of interesting conversations. Since then, however, nothing has really struck me as a good idea for a presentation. The right idea has to fit at least these criteria:

  • It’s something I’m good at.
  • At least a handful of PyCon attendees would like to learn more about it in a presentation.
  • It won’t take me weeks to prepare.

This year, I was planning to do a really fun presentation: write some Python scripts for controlling a RepRap, then present the hardware and software.  That didn’t work out, however, because the warping issues with ABS (a type of plastic) are just too severe to print useful parts, so my RepRap has sat idle.  (I’m now considering PLA, but it’s a newish and expensive material.)

Here are a few other ideas for a presentation topic.  What do you think?  Any other ideas?

  • RelStorage.  I could talk about future plans, why I think it is an improvement over ZEO, and why you should use it.  There seem to be a lot of people with questions about RelStorage, so it would be nice to have some time to answer the questions in a big room so others can hear the answers.
  • KARL and BFG.  This would probably be a team presentation.  KARL is some fairly interesting software that I got to help develop this year.  I could talk about the software, along with the development experience and style.  In KARL we made a conscious choice to ignore certain apparent DRY violations, leading to significant productivity gains.
  • A friendly introduction to Buildout.  Buildout is a tool that a lot of developers need, but don’t know it yet.  The function that Buildout performs is as important as version control and automated testing.  Come find out why Buildout is far better than a pile of Makefiles.
  • An introduction to Buildout (zc.buildout) for people familiar with Apache Maven.  Buildout and Maven fill approximately the same niche, but for different audiences.  (Buildout for Python, Maven for Java.)  Maybe there are Mavenites at the conference who would like to switch to a more Python-centric system.
  • A discussion of text indexing in Plone and BFG.  This might be a narrow topic, but I find it interesting and important.  I have found ways to reduce complex 90-second text searches to 1 second.  The solution is not pure Python, unfortunately. 😉  I have also thought about how to expand into areas like faceted search/browse functionality.

Feedback encouraged!

RelStorage 1.3.0b1, Now With Blob Support

I have just released two versions of RelStorage. Version 1.3.0b1 adds full support for ZODB blobs stored on the filesystem. Version 1.2.0 is currently the better choice if you’re upgrading a production system and don’t need blob support.

People have been asking for blob support for months. I am glad to finally get it done, with a little help from a customer. With blob support, now we can easily store large artifacts on the filesystem, while keeping all metadata in the database.

To celebrate the new release, I have created a sample buildout.cfg that builds Plone with RelStorage, PostgreSQL, and blob support. (Thanks go to Hanno Schlichting, who released a compatible version of plone.recipe.zope2instance only moments after I requested it.) Here it is:

[buildout]
parts = plone zope2 instance zopepy
find-links =
    http://dist.plone.org
    http://download.zope.org/ppix/
    http://download.zope.org/distribution/
    http://effbot.org/downloads
    http://packages.willowrise.org
eggs =
    elementtree
    PILwoTk
    RelStorage
    psycopg2
versions = versions

[versions]
ZODB3 = 3.8.3-polling
RelStorage = 1.3.0b1

[plone]
recipe = plone.recipe.plone

[zope2]
recipe = plone.recipe.zope2install
url = ${plone:zope2-url}

[instance]
recipe = plone.recipe.zope2instance
zope2-location = ${zope2:location}
user = admin:admin
products = ${plone:products}
eggs =
    ${buildout:eggs}
    ${plone:eggs}
    plone.app.blob
zcml = plone.app.blob
rel-storage =
    type postgresql
    dsn dbname='plone' user='plone' host='localhost' password='plone'
    blob-dir var/blobs

[zopepy]
recipe = zc.recipe.egg
eggs = ${instance:eggs}
interpreter = zopepy
extra-paths = ${instance:zope2-location}/lib/python
scripts = zopepy zodbconvert

P.S. I have been told that a very prominent Plone developer recently configured RelStorage with master/slave replication on MySQL, and that it works smoothly. I expect him to announce his success soon!

RelStorage 1.2.0b2 Released

This release works with unpatched versions of ZODB 3.9!  A big thank-you to Jim Fulton for including support for RelStorage in ZODB.  This release also continues to support patched versions of ZODB 3.7 and 3.8.

I have been doing a lot of testing, and I have found MySQL 5.1.34 to be a lot more stable than earlier releases of MySQL 5.1, so I am now declaring MySQL 5.1.34 and above supportable, meaning that if you ask questions about it, I am no longer going to request that you revert to MySQL 5.0. 🙂

Finally, I recently expanded my private RelStorage Buildbot to include a Windows XP slave.  After solving a couple of minor test glitches, the test results are now all consistently green on 4 platforms:

  • Debian Etch, 32 bit (Python 2.4.4, MySQL 5.0.32, PostgreSQL 8.1.17, Oracle 10g XE)
  • Debian Lenny, 32 bit (Python 2.5.2, MySQL 5.0.51a, PostgreSQL 8.3.7, Oracle 10g XE)
  • Debian Lenny, 64 bit (same as above but no Oracle)
  • Windows XP, 32 bit (Python 2.6.2, MySQL 5.1.34, PostgreSQL 8.3.7)

I’m thinking about adding another Linux slave that runs MySQL 5.1 and Python 2.6.

Anyway, enjoy the release!

P.S. You may be wondering why I released 1.2.0b2 instead of 1.2.0b1.  A little slip ruined the web page on PyPI, so I fixed the slip and skipped to the next version number.

How to Install Plone with RelStorage and MySQL

These step-by-step instructions describe how to install Plone on Ubuntu with RelStorage connected to MySQL as the main database. Familiarity with Linux systems administration is expected. Update: These instructions were revised in August 2009 for Plone 3.2.3 and RelStorage 1.2.0.


RelStorage for Sessions?

I’ve been working on a document explaining how to install Plone with RelStorage, starting from a basic Linux server.  As always, the basic procedure is simple, but there are all sorts of interesting little complications.  One detail that bugged me today is the need for a shared session database.

Session storage is a little different from normal storage because keeping a history of session state becomes expensive quickly.  For session storage, I think we still want all the goodness of ZODB transactions, conflict detection, distributed caching, and so on, but in this case, the ability to undo is pointless and the need to pack is a liability.

The database schema I’ve been using in RelStorage is a mismatch for this need.  A history-free storage should not have a “transaction” table, there should be no need for MD5 sums and prev_tid pointers, and the compound primary keys consisting of oid and tid should become simple primary keys indexed by oid.  A history-free storage still needs garbage collection, but not packing.
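The difference between the two layouts can be sketched with SQLite from the Python standard library.  This is only an illustration under stated assumptions: the table and column names (object_history, object_current, zoid, tid, state) are hypothetical and are not RelStorage's actual schema, and keeping a tid column in the history-free table is my assumption about what conflict detection would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- History-preserving layout: one row per object revision,
    -- identified by the compound key (zoid, tid).
    CREATE TABLE object_history (
        zoid  INTEGER NOT NULL,
        tid   INTEGER NOT NULL,
        state BLOB,
        PRIMARY KEY (zoid, tid)
    );
    -- History-free layout: one row per object, keyed by oid alone.
    CREATE TABLE object_current (
        zoid  INTEGER PRIMARY KEY,
        tid   INTEGER NOT NULL,
        state BLOB
    );
""")


def store_with_history(zoid, tid, state):
    # Every commit adds a row; old revisions stay until a pack removes them.
    conn.execute(
        "INSERT INTO object_history VALUES (?, ?, ?)", (zoid, tid, state))


def store_history_free(zoid, tid, state):
    # Every commit overwrites the single row for this object.
    conn.execute(
        "INSERT OR REPLACE INTO object_current VALUES (?, ?, ?)",
        (zoid, tid, state))


# Commit three revisions of the same object to both layouts.
for tid in (1, 2, 3):
    state = ("state-%d" % tid).encode("ascii")
    store_with_history(42, tid, state)
    store_history_free(42, tid, state)

history_rows = conn.execute(
    "SELECT COUNT(*) FROM object_history").fetchone()[0]
current_rows = conn.execute(
    "SELECT COUNT(*) FROM object_current").fetchone()[0]
latest_state = conn.execute(
    "SELECT state FROM object_current WHERE zoid = 42").fetchone()[0]

print(history_rows, current_rows)
```

After three commits the history-preserving table holds three rows for the object while the history-free table holds one, which is why the second layout has nothing to pack.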

I’m thinking that the main RelStorage class will need few changes to support history-free storages, while the database adapter class will change so much that it would be best to just create a different adapter.  I like that.  It won’t be possible to switch history on and off without an export and import operation, but that seems reasonable.  To put a positive spin on the new adapters, I think I’ll call them “packless”, like the old BerkeleyDB storage.

I think that’s a good plan.  While the purpose of the packless adapters is initially session storage, they will certainly also be usable for other databases, including the main database.  I expect them to be slightly simpler and faster than the history-preserving adapters we have now.

I encourage anyone interested to leave a comment, even if all you wish to say is “I want that!” 🙂

The Fastest WSGI Server for Zope

I have been planning to compare mod_wsgi with paste.httpserver, which Zope 3 uses by default.  I guessed the improvement would be small since parsing HTTP isn’t exactly computationally intensive.  Today I finally had a good chance to perform the test on a new Linode virtual host.

The difference blew me away.  I couldn’t believe it at first, so I double-checked everything.  The results came out about the same every time, though:

[Chart: wsgi-zope1, requests per second for the Paste HTTP server and mod_wsgi]

I used the ab command to run this test, like so:

ab -n 1000 -c 8 http://localhost/

The requests are directed at a simple Zope page template with no dynamic content (yet), but the Zope publisher, security, and component architecture are all deeply involved.  The Paste HTTP server handles up to 276 requests per second, while a simple configuration of mod_wsgi handles up to 1476 per second.  Apparently, Graham’s beautiful Apache module is over 5 times as fast for this workload.  Amazing!

Well, admittedly, no… it’s not actually amazing. I ran this test on a Xen guest that has access to 4 cores.  I configured mod_wsgi to run 4 processes, each with 1 thread. This mod_wsgi configuration has no lock contention.  The Paste HTTP server lets you run multiple threads, but not multiple processes, leading to enormous contention for Python’s global interpreter lock.  The Paste HTTP server is easier to get running, but it’s clearly not intended to compete with the likes of mod_wsgi for production use.

I confirmed this explanation by running “ab -n 1000 -c 1 http://localhost/”; in this case, both servers handled just under 400 requests per second.  Clearly, running multiple processes is a much better idea than running multiple threads, and with mod_wsgi, running multiple processes is now easy.  My instance of Zope 3 is running RelStorage 1.1.3 on MySQL.  (This also confirms that the MySQL connector in RelStorage can poll the database at least 1476 times per second.  That’s good to know, although even higher speeds should be attainable by enabling the memcached integration.)
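The figures quoted above can be checked with a little arithmetic.  The sketch below uses only the numbers from this post; the even split of requests across the 4 mod_wsgi processes is my simplifying assumption.

```python
paste_rps = 276.0        # Paste HTTP server, ab -c 8
mod_wsgi_rps = 1476.0    # mod_wsgi with 4 single-threaded processes, ab -c 8
processes = 4

# Overall speedup of mod_wsgi over the Paste HTTP server.
speedup = mod_wsgi_rps / paste_rps

# Throughput per process, assuming the load is spread evenly.
per_process_rps = mod_wsgi_rps / processes

print("speedup: %.2fx" % speedup)
print("per process: %.0f requests/s" % per_process_rps)
```

Each process works out to 369 requests per second, in the same range as the single-client rate of just under 400, which fits the explanation that the processes run without contending for a shared lock.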

I mostly followed the repoze.grok on mod_wsgi tutorial, except that I used zopeproject instead of Repoze or Grok.  The key ingredient is the WSGI script that hits my Zope application to handle requests.  Here is my WSGI script (sanitized):

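# Set up sys.path by running the generated control script.
# exec(...) with parentheses works on both Python 2 and Python 3.
code = open('/opt/myapp/bin/myapp-ctl').read()
exec(code)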

# load the app
from paste.deploy import loadapp
zope_app = loadapp('config:/opt/myapp/deploy.ini')

def application(environ, start_response):
    # translate the path
    path = environ['PATH_INFO']
    host = environ['SERVER_NAME']
    port = environ['SERVER_PORT']
    scheme = environ['wsgi.url_scheme']
    environ['PATH_INFO'] = (
        '/myapp/++vh++%s:%s:%s/++%s' % (scheme, host, port, path))
    # call Zope
    return zope_app(environ, start_response)

This script is mostly trivial, except that it modifies the PATH_INFO variable to map the root URL to a folder inside Zope. I’m sure the same path translation is possible with Apache rewrite rules, but this way is easier, I think.
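To see what the translation produces, the string formatting can be pulled out into a small function and tried on its own.  The function name translate_path and the example host are hypothetical; the format string is the one from the script above.

```python
def translate_path(scheme, host, port, path, root='/myapp'):
    """Build the virtual-host path that maps the site root to a Zope folder."""
    return '%s/++vh++%s:%s:%s/++%s' % (root, scheme, host, port, path)


print(translate_path('http', 'example.com', '80', '/about'))
# → /myapp/++vh++http:example.com:80/++/about
```

Since PATH_INFO already starts with a slash, the part after ++ is the path as the browser sees it, while everything before it tells Zope which folder to publish and which public URL to use when generating links.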

How to Fix the MySQL Write Speed

Last time I ran the RelStorage performance tests, the write speed to a MySQL database appeared to be slow and getting slower.  I suspected, however, that all I needed to do was tune the database.  Today I changed some InnoDB configuration parameters from the defaults.  The simple changes solved the MySQL performance problem completely.

The new 10K chart, using RelStorage 1.1.3 on Debian Sid with Python 2.4 and the same hardware as before:

I added the following lines to my.cnf to get this speed:

innodb_data_file_path = ibdata1:10M:autoextend
innodb_buffer_pool_size=256M
innodb_additional_mem_pool_size=20M
innodb_log_file_size=64M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
innodb_file_per_table

This is similar to the configuration suggested by the InnoDB documentation for a 512 MB database server.  Even if you have a 16 GB server, I would suggest starting with the settings for a 512 MB server, then watching what happens to the RAM and CPU on the database server when you connect all of your client machines simultaneously.  You want to leave at least half the RAM available for disk cache and usage spikes.
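As a rough aid for that kind of sizing, the sketch below adds up the memory-related settings from the listing above and compares the total with the RAM of a server.  It is only a back-of-the-envelope estimate: the helper names are hypothetical, innodb_log_file_size is left out because it sizes a file on disk, and per-connection buffers and the rest of MySQL are ignored.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a my.cnf style size such as '256M' to a number of bytes."""
    value = value.strip().upper()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


# The settings from the listing above that allocate memory.
memory_settings = {
    "innodb_buffer_pool_size": "256M",
    "innodb_additional_mem_pool_size": "20M",
    "innodb_log_buffer_size": "8M",
}

innodb_bytes = sum(parse_size(v) for v in memory_settings.values())


def fraction_of_ram(total_ram):
    """Return the share of total_ram (e.g. '16G') taken by these settings."""
    return innodb_bytes / float(parse_size(total_ram))


print("InnoDB memory settings: %d MB" % (innodb_bytes // UNITS["M"]))
print("Share of a 16 GB server: %.1f%%" % (100 * fraction_of_ram("16G")))
```

On a 16 GB server these settings claim only a small share of the RAM, which leaves plenty of room to observe real usage before raising them.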

Not all of these changes are related to speed.  The innodb_file_per_table option just seems like a good idea because it makes tables visible on the filesystem, which should improve manageability.  I think it might improve cache locality as well.

With these changes to my.cnf, ZEO, PostgreSQL, and MySQL all perform about the same for writes, with MySQL having a slight lead.  I suspect all three are hitting hardware and kernel limits.  I think the differences would be more pronounced on higher-end storage hardware.

A big caveat: It’s risky to change InnoDB settings unless you’re familiar with all the effects.  Some changes break compatibility with existing table data.  Get to know the InnoDB documentation very well before you change these settings, and make backups using mysqldump, as always.

Meanwhile, Oracle XE continues to write slowly and ZEO read performance is so bad that it’s off the chart.  I bet ZEO read performance could be improved with some simple optimizations somewhere, but I don’t have an incentive to fix that. 🙂  Perhaps it has been fixed in ZODB 3.9.

RelStorage Support

I am more than happy to support RelStorage as best I can by email.  Every time I do, however, I get a nagging feeling that I could help RelStorage users a lot better if we set up a short-term support contract.  I would very much appreciate a chance to optimize their systems by testing the performance of different configurations.  When the communication is limited to email, neither of us gets a chance to discover how we might help each other better.

So if you’re a RelStorage user and your database is growing by tens of gigabytes, please seriously consider a short-term support contract with my little company.  A little tuning or code revision in the right place could yield orders-of-magnitude performance gains.  I really want to help you directly.  Contact me at shane (at) willowrise (dot) com.