Contemplating Integration of Protocol Buffers into ZODB

I am now looking at integrating Google’s Protocol Buffers into ZODB.  This means that wherever possible, ZODB should store a serialized protocol buffer rather than a pickle.  The main thing my customer hopes to gain is language independence, but storing protocol buffers could open other possibilities as well.  I think this is a very good idea and I’m going to pursue it.

Although ZODB has always clung tightly to Python pickles, I don’t think that moving to a different serialization format violates any basic ZODB design principle.  On the other hand, I have tried to change the serialization format before; in particular, the APE project tried to make the serialization format completely pluggable.  The APE project turned out not to be viable.  Therefore, this project must not repeat the mistake that broke APE.

Why did APE fail?  APE was not viable because it was too hard to debug.  It was too hard to debug because storage errors never occurred near their source, so tracebacks were never very helpful.  The only people who could solve such problems were those who were intimately familiar with the application, APE, and ZODB, all at the same time.  The storage errors indicated that some application code was trying to store something the serialization layer did not like.  The serialization layer had no ability to voice its objections until transaction commit, at which time the incompatible application code was no longer on the stack (and might even be on a different machine if ZEO was involved).

This turned out to be a much more serious problem than I anticipated.  It made me realize that one of the top design concerns for large applications is producing high quality stack tracebacks in error messages.  A quality traceback can pinpoint the cause of an error in seconds.  I am not aware of any substitute for quality tracebacks, so I am now willing to sacrifice a lot to get them.

So I am faced with a basic design choice.  Should protocol buffers be integrated into ZODB at the application layer, rather than behind a storage layer like APE did?  If I choose to take this route, Persistent subclasses will need to explicitly store a protocol buffer.  Storage errors will indeed occur mostly at their source, since the protocol buffer classes will check validity immediately.
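
To make the difference concrete, here is a minimal sketch in plain Python.  It is not ZODB, APE, or protocol buffer code: CheckedRecord is a hypothetical stand-in for a generated protocol buffer class that validates on assignment, and DeferredStore is a hypothetical stand-in for a serialization layer that only validates at commit.  The sketch records which function names appear in the traceback in each case.

```python
import traceback

# Hypothetical schema; a generated protocol buffer class would carry its own.
FIELDS = {"name": str, "count": int}


def check(key, value):
    """Raise if the value does not match the schema."""
    expected = FIELDS.get(key)
    if expected is None:
        raise AttributeError("unknown field %r" % (key,))
    if not isinstance(value, expected):
        raise TypeError("%s must be %s" % (key, expected.__name__))


class CheckedRecord:
    """Stand-in for a generated class: validates on assignment."""

    def __setattr__(self, key, value):
        check(key, value)
        object.__setattr__(self, key, value)


class DeferredStore:
    """Stand-in for a serialization layer that validates only at commit."""

    def __init__(self):
        self.pending = []

    def add(self, state):
        self.pending.append(state)  # accepted without any check

    def commit(self):
        for state in self.pending:
            for key, value in state.items():
                check(key, value)


def application_code_eager():
    record = CheckedRecord()
    record.count = "three"  # TypeError is raised on this line


def application_code_deferred(store):
    store.add({"count": "three"})  # the mistake goes unnoticed here


def frames_in_error(func, *args):
    """Run func and return the function names found in its traceback."""
    try:
        func(*args)
    except TypeError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


eager_frames = frames_in_error(application_code_eager)

store = DeferredStore()
application_code_deferred(store)
deferred_frames = frames_in_error(store.commit)

print("eager:   ", eager_frames)
print("deferred:", deferred_frames)
```

In the eager case the offending application function is on the traceback.  In the deferred case it is not, because the error surfaces only inside commit.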

Now that I’ve written this out, the right choice seems obvious: the main integration should indeed be done at the application layer.  Until now, it was hard to distinguish this issue from other issues like persistent references and the format of database records.

Furthermore, the simplest thing to do at first is to store a protocol buffer object as an attribute of a normal persistent object, rather than my initial idea of creating classes that join Persistent with Google’s generated classes.  That means we will still store a pickle, but the pickle will contain a serialized protocol buffer.  Later on, I will figure out how to store a protocol buffer without a pickle surrounding it.  I will also provide a method of storing persistent references, though it might be different from the method ZODB users are accustomed to.
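
Here is a rough sketch of that first step, under loud assumptions.  ContactMessage is a hypothetical stand-in for a generated protocol buffer class; only the method names SerializeToString and ParseFromString mirror the real protobuf Python API, and the JSON encoding inside is a placeholder rather than the protobuf wire format.  Contact would subclass persistent.Persistent in a real application; here it is a plain class, and the sketch pickles only its state dictionary.

```python
import json
import pickle


class ContactMessage:
    """Hypothetical stand-in for a generated protocol buffer class."""

    def __init__(self, name="", email=""):
        self.name = name
        self.email = email

    def SerializeToString(self):
        # Placeholder encoding; real protobuf emits its own binary format.
        fields = {"name": self.name, "email": self.email}
        return json.dumps(fields).encode("utf-8")

    def ParseFromString(self, data):
        fields = json.loads(data.decode("utf-8"))
        self.name = fields["name"]
        self.email = fields["email"]


class Contact:
    """Plain class here; assumed to be a Persistent subclass in practice."""

    def __init__(self, name, email):
        self.message = ContactMessage(name, email)

    def __getstate__(self):
        # The state holds serialized bytes, not the message object.
        return {"message": self.message.SerializeToString()}

    def __setstate__(self, state):
        self.message = ContactMessage()
        self.message.ParseFromString(state["message"])


original = Contact("Ada", "ada@example.com")

# Pickle the state: the record is a pickle wrapping the serialized message.
record = pickle.dumps(original.__getstate__())

# Rebuild an object from the stored record.
restored = Contact.__new__(Contact)
restored.__setstate__(pickle.loads(record))
```

The record is still a pickle, but the only thing inside it is a byte string that another language could decode, given a real protocol buffer in place of the stand-in.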

Wing IDE Broke?

Seeing this in a text editor makes me nervous:

That’s invalid code, but I didn’t write it: the IDE is displaying my file completely incorrectly. There are lines missing. There is some kind of repaint bug and it has something to do with scrolling. No matter how featureful an IDE might be, I can’t use it if it can’t show me a text file without jumbling the lines. When I once saw an open source text editor do the same kind of thing, I dropped that editor so fast that I no longer remember its name. 🙂

Wing IDE?

I have been trying out Wing IDE.  It’s nice that it shows me instant documentation as I’m typing, but there’s still a lot I’d like to see.  I have some feature requests:

  • The file dialog in Wing IDE is a royal pain, just like most file dialogs.  KDE is the only system I’ve seen with a consistently good file dialog, so please let me use that instead.  Provide some configuration option that tells the IDE to use a shell command like “kdialog --getopenfilename /” whenever I want to open a file.
  • NetBeans has the right idea for renaming symbols.  It’s even better than Eclipse.  In NetBeans, Ctrl-R doesn’t open a search/replace dialog, nor does it open a refactoring dialog if the symbol is private.  NetBeans does something much more clever: it selects all instances of the symbol, then as you type, all instances of that symbol change simultaneously.  No dialog is necessary.  That feature alone tempts me to use NetBeans for Python code, even though NetBeans is as oversized as Eclipse.
  • When I’m typing code, the main documentation I’m interested in is interface documentation, not implementation documentation.  So Wing IDE really needs to support zope.interface.
  • In both Eclipse and NetBeans, I can almost completely ignore import statements.  Auto-completion adds the necessary import statements automatically.  Eclipse goes even further and generates import statements when I paste code from another file, but that’s just icing on the cake.

If only Wing IDE supported these features, buying a license would be an easy decision.  A promise from the developers that those features are coming soon would be very encouraging.

Bootstrap.py versus pkg_resources.py

I’ve been using zc.buildout quite a bit over the past month.  Although it has been working, it has been doing strange things like using the wrong version of zope.interface.  Yesterday I finally figured out why, and today I found a possible solution.

It turns out that Ubuntu (8.10) provides a package called python-pkg-resources.  At least one Ubuntu package (Snowballz, a strategy game written in Python) pulls in that package automatically.  It installs a pkg_resources module in Python’s site-packages directory, but it does not install the rest of setuptools.

I can understand why Ubuntu chose to split up setuptools, but that choice causes havoc for the bootstrap.py module people use to install zc.buildout.  Here is what bootstrap.py is supposed to do:

  1. Download ez_setup.py and run it.
  2. ez_setup tries to import the pkg_resources module, but fails.
  3. The setuptools package is not found, so ez_setup downloads setuptools into a temporary directory.
  4. ez_setup alters sys.path to include the new setuptools package.
  5. bootstrap.py imports the pkg_resources module from the version of setuptools just downloaded.
  6. Ask pkg_resources about the installed setuptools package.
  7. Use setuptools to install zc.buildout.

Here is what bootstrap.py actually does when pkg_resources.py exists in the site-packages directory (steps 2, 5, and 7 differ):

  1. Download ez_setup.py and run it.
  2. ez_setup successfully imports the pkg_resources module from site-packages.
  3. The setuptools package is not found, so ez_setup downloads setuptools into a temporary directory.
  4. ez_setup alters sys.path to include the new setuptools package.
  5. bootstrap.py continues to use the previously imported pkg_resources module.
  6. Ask pkg_resources about the installed setuptools package.
  7. pkg_resources does not find setuptools because pkg_resources does not notice the change to sys.path.  bootstrap.py fails.
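
The underlying mechanism can be reproduced with nothing but the standard library.  This is a small sketch, not bootstrap.py itself: the module name demo_resources and the two temporary directories are made up for the demonstration, with one directory playing the role of site-packages and the other playing the role of the freshly downloaded setuptools.

```python
import importlib
import os
import shutil
import sys
import tempfile

MODULE_NAME = "demo_resources"  # hypothetical; stands in for pkg_resources


def write_module(directory, origin):
    """Create a one-line module that remembers where it came from."""
    path = os.path.join(directory, MODULE_NAME + ".py")
    with open(path, "w") as f:
        f.write("ORIGIN = %r\n" % (origin,))


system_dir = tempfile.mkdtemp()    # plays the role of site-packages
download_dir = tempfile.mkdtemp()  # plays the role of the new download
write_module(system_dir, "system")
write_module(download_dir, "download")

# First import finds the system-wide copy (like ez_setup's import).
sys.path.insert(0, system_dir)
importlib.invalidate_caches()
first = importlib.import_module(MODULE_NAME)

# sys.path now prefers the downloaded copy, but the import system
# returns the module already cached in sys.modules.
sys.path.insert(0, download_dir)
importlib.invalidate_caches()
second = importlib.import_module(MODULE_NAME)

# Dropping the cached entry forces a fresh search of sys.path.
del sys.modules[MODULE_NAME]
third = importlib.import_module(MODULE_NAME)

print(first.ORIGIN, second is first, third.ORIGIN)

# Clean up the demonstration's side effects.
del sys.modules[MODULE_NAME]
sys.path.remove(download_dir)
sys.path.remove(system_dir)
shutil.rmtree(download_dir)
shutil.rmtree(system_dir)
```

The second import never looks at sys.path at all, which is why a later change to sys.path goes unnoticed by code that relies on the module imported earlier.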

At first, following ideas I gleaned from various posts about zc.buildout, I worked around this by deleting the setuptools egg and the pkg_resources module from site-packages.  I didn’t know exactly why this helped until I studied the problem.  It turns out that bootstrap.py was just not written to cope with a system-wide installation of pkg_resources.

Now I think I recognize another bad choice that zc.buildout has been making.  zc.buildout generates a “bin” directory full of Python scripts.  Those scripts prepend egg directories and egg zip files to sys.path before doing their work.  I noticed that sometimes the list of paths to prepend includes “/usr/lib/python2.5/site-packages”, which is already on sys.path.  I now suspect that whenever zc.buildout includes paths like that, it’s wrong, and the cause is a mixup involving a system-wide installation of pkg_resources, setuptools, or some other foundational package.
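
One way to spot the suspicious entries is to compare the list a generated script wants to prepend against what is already on sys.path.  The helper below is only a sketch; already_on_path is a hypothetical name and is not part of zc.buildout.

```python
import os
import sys


def already_on_path(candidates, existing=None):
    """Return the candidate entries that are already present in existing.

    existing defaults to sys.path.  Paths are compared after
    normalization, so a trailing slash does not hide a duplicate.
    """
    if existing is None:
        existing = sys.path
    normalized = {os.path.normpath(p) for p in existing if p}
    return [p for p in candidates if os.path.normpath(p) in normalized]


# Example with made-up paths resembling a generated script's list.
suspicious = already_on_path(
    ["/home/me/buildout/eggs/example.egg",
     "/usr/lib/python2.5/site-packages/"],
    existing=["/usr/lib/python2.5/site-packages"],
)
```

Any path this returns for a generated script would be a hint that a system-wide package leaked into the buildout.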

Here is a possible way to fix bootstrap.py.  Just before the “import pkg_resources” line, add this:

del sys.modules['pkg_resources']

This solved the bootstrap.py problem for me.  Altering sys.modules is rarely a good idea, but this might be a good exception to the rule.  I don’t believe we need to catch KeyError because ez_setup should have imported pkg_resources already.
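
If that assumption about ez_setup ever turned out not to hold, a more tolerant variant is possible.  The sketch below uses dict.pop with a default so that a missing entry is not an error; forget_module is a hypothetical helper name, and the demonstration uses the standard colorsys module rather than pkg_resources.

```python
import sys


def forget_module(name):
    """Drop a cached module so the next import searches sys.path again.

    Returns True if the module had been cached, False otherwise.
    """
    return sys.modules.pop(name, None) is not None


import colorsys  # any already-imported module works for the demonstration

was_cached = forget_module("colorsys")    # entry existed and is removed
cached_again = forget_module("colorsys")  # nothing left to remove, no KeyError

import colorsys  # a fresh import repopulates sys.modules
```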

Beyond this, there is probably more work to do to make zc.buildout produce correct scripts.

Whoever said computers behave logically must have been joking or delusional.  The people who provide the software never fully agree with each other, nor even with themselves!