Repozublisher

Back in 2002-2004, I was part of the team that redesigned a lot of code for Zope 3.  The publisher was one of the first things we redesigned.  While I’m sure ZPublisher started out pretty clean when it was first coded, over the years it had grown massive special cases and odd dependencies.  For Zope 3, we reduced the publisher to something clean again and provided a way for it to grow as requirements grew.

Unfortunately, it seems the growth strategy did not work.  The publisher I see now in Zope 3 is spread out among at least 3 packages (zope.publisher, zope.app.publisher, zope.app.publication) and it’s quite a tangle.  Sure, there are interfaces, but a lot of code calls undocumented methods.  It especially saddens me that the publisher implements its own version of object traversal rather than using zope.traversing.

We can’t let this code languish.  Remember what the P in Zope stands for?  We have to get this right.

So what went wrong?  Perhaps we didn’t provide the right kind of extensibility.  We provided a series of documented hooks (the IPublication interface) that applications like Zope could easily override.  Wait, did I say “applications like Zope”?  Over the past 5 years, most Zope development has focused on building solid libraries with minimal dependencies, rather than building another application like Zope 2.  The audience has changed, so the requirements have changed.

I think zope.publisher is a classical framework with all the classical problems of a framework. I suppose we could try to solve the problem with another cleanup.  We could mash together zope.publisher, zope.app.publisher, and zope.app.publication into one package, which would untangle a lot, but the combined package would probably have many unnecessary dependencies.  Then we could factor stuff out of that new package to reduce the dependencies.  After all that work, though, the publisher would still be a framework.  I expect that it would yet again accumulate special cases quickly.

The Repoze team may have provided a different solution to our publisher woes.  They have built a system based on WSGI filter pipelines.  I’ve looked at Repoze and I’m very intrigued.  For example, if you want your application to support Zope-style transactions, all you have to do is include a transaction manager in the pipeline; no extra baggage will come along, now that the transaction package has been split from ZODB.  It seems like most functions of the current publisher could be rebuilt on this design.  This is a different kind of extensibility that could control cruft accumulation much better.
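To make the pipeline idea concrete, here is a minimal sketch of the shape I have in mind.  The transaction middleware below is hypothetical (the real Repoze middleware certainly spells things differently); the point is only that each concern becomes one more WSGI wrapper around the application, using nothing but the transaction package:

import transaction

def transaction_middleware(app):
    # Hypothetical WSGI filter: begin a transaction per request,
    # commit on success, abort on error.
    def wrapper(environ, start_response):
        txn = transaction.begin()
        try:
            result = app(environ, start_response)
        except Exception:
            txn.abort()
            raise
        txn.commit()
        return result
    return wrapper

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello from the end of the pipeline\n']

# A pipeline is nothing more than composed wrappers:
pipeline = transaction_middleware(application)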

So, should we replace zope.publisher with Repoze packages?  How would others feel about that, particularly the Repoze team?  It’s really fortunate that the Repoze packages are BSD licensed rather than GPL, otherwise I would not even consider this.

This is not yet a proposal.  This is just a little discussion that happens to span the planet.  The next step, if this idea doesn’t get shot down, is to create a more formal proposal.  The proposal would specify which Zope release should deprecate the current publisher packages.  (3.5? 3.6?  I don’t know.)

keas.pbpersist and keas.pbstate

Remember how I was talking about serializing data in ZODB using Google Protocol Buffers instead of pickles?  Well, Keas Inc. suggested the idea and asked me to work on it.  The first release of a package combination that implements the idea is ready:

  • keas.pbstate 0.1.1 (on PyPI), which helps you write classes that store all state in a Protocol Buffer message.
  • A patch for ZODB (available at packages.willowrise.org) that makes it possible to plug in serializers other than ZODB’s standard pickle format.
  • keas.pbpersist 0.1 (on PyPI), which registers a special serializer with ZODB so that ProtobufState objects get stored without any pickling.
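In case the Protocol Buffers part is unfamiliar, here is roughly what “state in a message” means at the protobuf level.  This is plain protobuf usage, not the keas.pbstate API; myapp_pb2 and ContactMessage are made-up names standing in for a module that protoc would generate from a .proto file:

from myapp_pb2 import ContactMessage  # hypothetical generated module

msg = ContactMessage()
msg.name = 'Bob'
msg.email = 'bob@example.com'

data = msg.SerializeToString()    # compact binary bytes; no pickle involved

copy = ContactMessage()
copy.ParseFromString(data)
assert copy.name == 'Bob'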

This code is new, but the tests all pass and the coverage tests claim that every line of code is run by the tests at least once.  I hope these packages get used so we can find out their flaws.  I did my best to make keas.pbstate friendly, but there are some areas, object references in particular, where I don’t like the current syntax.  I don’t know if this code is fast; optimization would be premature!

I should mention that while working with protobuf, I got the feeling that C++ is the native language and Java and Python are second class citizens.  I wonder if I’d have the same feeling with other serialization formats.

RelStorage 1.1.2

This release has two useful new features: one for performance and one for safety.

The performance feature is that if you use both the cache-servers and poll-interval options, RelStorage will use the cache to distribute basic change notifications.  That means we get to lighten the load on the database using the poll-interval, yet changes should still be seen instantly on all clients.  Yay! 🙂

The only drawback I expect is that caching makes debugging more difficult.  Still, this option should help people build enormous clusters, like the one my current customer was planning to build, although I got word today that they have changed their mind.

The new safety feature is the pack-dry-run option, which lets you run only the nondestructive pre_pack phase to get a list of everything that would be deleted by the pack phase.  This should be particularly useful if you’re trying out packing for the first time on a big database.  My current customer would have benefited from this too.
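For reference, both features are driven by storage options in the configuration.  The sketch below (for a PostgreSQL backend) shows roughly where they go; treat the exact layout as approximate and check the README for the authoritative option list:

%import relstorage

<zodb main>
  <relstorage>
    # distribute change notifications through the cache servers
    poll-interval 5
    cache-servers 10.0.0.1:11211 10.0.0.2:11211
    # only run the nondestructive pre_pack phase
    pack-dry-run true
    <postgresql>
      dsn dbname='zodb' user='zodb' host='db1'
    </postgresql>
  </relstorage>
</zodb>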

I also fixed a bug that caused the pack code to remove less old data than it should have, and I started using PyPI instead of the wiki as the project’s main web page.  Using PyPI means I have to maintain only one README, which gets translated automatically into the PyPI page; until now I’ve had to maintain both the README and the wiki page.

http://pypi.python.org/pypi/RelStorage/1.1.2

Patched ZODB3 Eggs Available

This week, I put up some ZODB3 eggs and source distributions with the patch required for RelStorage already applied. I built both ZODB3-3.8.1-polling and ZODB3-3.7.3-polling.  I even made eggs for Windows developers who have not yet taken the time to set up MinGW. 😉

http://packages.willowrise.org/

Developers can use this web site in buildout.cfg to incorporate RelStorage in their applications.  Feel free to mirror the site if you need to.

Promoting the Zope Component Architecture

I just sent this bit of promotion for the Zope Component Architecture to a friend (paraphrased slightly):

The Zope community uses adaptation for a new purpose. While adaptation is classically a way to force uncooperative classes to communicate, that is not the intent of the component architecture.

The intent of adapters in Zope is to expand objects’ contracts to fulfill application requirements while not polluting reusable code.  Adapters act like a streamlined form of model/view separation.

The community’s experience with Zope 2 was that while we did our best to keep code modular and reusable, there were simply too many cases where we needed to change the behavior of a base component in order to fulfill application requirements. We badly needed a way to use some kind of model/view separation at any point in the code.

Using Zope 2, we tried several industry solutions to address this, but using any of those solutions, we still found that application-specific dependencies had to creep into otherwise reusable code in order to meet reasonable deadlines. That really hurt too, because we were a bunch of hard-core OO developers and we really hated breaking modularity. We knew we were incurring a long term debt.

The Zope component architecture is the little gem that resulted from that long experience. It helps programmers avoid creating application-specific dependencies at every level. The component architecture resembles other indirection frameworks like Spring and AOP, but I believe it solves more problems elegantly.

That said, the Zope community now has enough experience with the component architecture to know that the first time we applied it in Zope 3, we applied too much of it in some places. Thus Zope 3 is currently somewhat overgeneralized. Like any indirection framework, you have to gain some experience before you learn what indirections are appropriate.

How am I doing? I think the ZCA would be quite valuable for my friend, who is today an excellent Java designer and coder. I want to find the right words to express why it would be valuable to him.
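Maybe code says it better than prose.  Here is a minimal sketch of the pattern; all the names are made up for illustration:

from zope.interface import Interface, implements
from zope.component import adapts, getGlobalSiteManager

class IEmailable(Interface):
    """Application-level contract: something we can send mail to."""
    def email_address():
        """Return the address to send mail to."""

class Person(object):
    """Reusable domain class; knows nothing about email or the application."""
    def __init__(self, name, contact_info):
        self.name = name
        self.contact_info = contact_info

class PersonEmailAdapter(object):
    """Fulfills IEmailable for Person without polluting Person itself."""
    implements(IEmailable)
    adapts(Person)

    def __init__(self, person):
        self.person = person

    def email_address(self):
        return self.person.contact_info['email']

getGlobalSiteManager().registerAdapter(PersonEmailAdapter)

# Application code asks for the contract, not the concrete class:
bob = Person('Bob', {'email': 'bob@example.com'})
print IEmailable(bob).email_address()

The Person class stays reusable; the application-specific contract lives entirely in the adapter and its registration.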

Easy Workaround for zc.buildout

Problem: running “python bootstrap.py” or “bin/buildout” often produces scripts that mix up the Python package search path due to some packages being installed system-wide.  Version conflicts result.

Workaround: use “python -S bootstrap.py” and “python -S bin/buildout”.  Magically, no more version conflicts.

I wish I had thought of that before.  Duh!

Update: Another tip for new zc.buildout users I’ve been meaning to mention is that you should create a preferences file in your home directory so that downloads and eggs are cached centrally.  This makes zc.buildout much friendlier.  Do this:

mkdir ~/.buildout
echo "[buildout]" >> ~/.buildout/default.cfg
echo "eggs-directory = $HOME/.buildout/eggs" >> ~/.buildout/default.cfg
echo "download-cache = $HOME/.buildout/cache" >> ~/.buildout/default.cfg

It seems a bit silly that zc.buildout doesn’t have these settings by default.  They make zc.buildout behave a lot like Apache Maven, which is what a lot of Java shops are using these days.  Both zc.buildout and Maven are great tools once you get to know them, but both are a real pain to understand at first.

What Would ZODB + Paxos Look Like?

I just learned about the Paxos algorithm. I think we might be able to use it to create a fully distributed version of ZODB. I found a document that explains Paxos in simple terms.  Now I’m interested in learning about any ideas and software that might support integration of Paxos into ZODB.  I would also like to know how much interest people have in such a project.

I think ZODB’s transaction layer already implements a sort of squashed version of Paxos, but it’s not currently possible to separate the pieces to make it distributed.  To me, “distributed ZODB” means multiple servers accept writes while assuring consistency at all times.  I also require sub-millisecond response timing on the majority of read operations, since that is what ZODB applications have come to rely upon.  I suspect the speed requirement disqualifies systems like CouchDB.
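For readers who haven’t looked inside the transaction machinery, here is a rough sketch of the two-phase commit I’m calling “squashed Paxos”.  The tpc_* names are the real ZODB storage API; the driver loop itself is simplified for illustration:

def commit(txn, storages):
    for s in storages:
        s.tpc_begin(txn)           # announce intent (roughly: prepare)
    try:
        for s in storages:
            s.tpc_vote(txn)        # each participant promises it can commit
    except Exception:
        for s in storages:
            s.tpc_abort(txn)
        raise
    for s in storages:
        s.tpc_finish(txn)          # the outcome becomes durable everywhere

In real ZODB the “participants” are data managers attached to one transaction manager in one process; making them independent servers that can disagree and recover is where Paxos would have to come in.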

Egg Patching Solution #3

Martijn Faassen suggested this solution in a comment on my previous post, and I think it’s the best one.  I created a new service:

http://packages.willowrise.org

I simply posted a patched ZODB3 source distribution on a virtual-hosted server.  The first tarball, “ZODB3-3.8.1-polling-serial.tar.gz”, includes both the invalidation polling patch and the framework I created for plugging in data serialization formats other than pickles, but in the near future I plan to also post distributions with just the polling patch and some eggs for Windows users.

It would not make sense for me to post the patched tarballs and eggs on PyPI because I don’t want people to pull these patched versions accidentally.  Pulling these needs to be an explicit step.

Thanks to setuptools and zc.buildout, it turns out that creating a Python code distribution server is a piece of cake.  The buildout process scans the HTML pages on distribution servers for <a> links.  Any link that points to a tarball or egg with a version number is considered a candidate.  A static web site can easily fulfill these requirements. I imagine it gets deeper than that, but for now, that’s all I need.
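In other words, an index page containing nothing more than one link per file is enough, something like:

<a href="ZODB3-3.8.1-polling-serial.tar.gz">ZODB3-3.8.1-polling-serial.tar.gz</a>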

To use this tarball, buildout.cfg just needs to include lines something like:

[buildout]
find-links = http://packages.willowrise.org
versions = versions

[versions]
ZODB3 = 3.8.1-polling-serial

zc.buildout does the rest.

It took a while to find this solution because, upon encountering the need to distribute patched eggs, I guessed it would be difficult to set up and maintain my own package distribution server. I also guessed setuptools had no support for patches in its versioning scheme. I’m glad I was completely wrong.

By the way, Ian suggested pip as a solution, but I don’t yet see how it helps. I am interested. I hope to see more of pip on Ian’s great blog.

Egg Patching Solution #2

I’ve been thinking more about patching Python eggs.  All I really need is for buildout.cfg to use a patched egg.  It doesn’t matter when the patching happens (although monkey patching is unacceptable; the changes I’m making are too complex for that).  So the buildout process should download an egg that has already been patched.  That solution is probably less error-prone anyway.

So, I could create a “ZODB3-polling” egg that contains ZODB 3.8.1 with the invalidation polling patch, then upload that to PyPI.  All I have to do is tell people how to change their buildout.cfg to use my egg in place of the ZODB3 egg.

Ah, but there’s trouble: the ZODB3 egg is pulled in automatically through egg dependencies.  If people simply add my new egg to their buildout.cfg, they will end up with two ZODB versions in the Python path at once.  Which one wins?!

Therefore, it seems like zc.buildout should have a way to express, in buildout.cfg, “any requirement for egg X should instead be satisfied by egg Y”.  I am going to study how that might be done.

Poaching (Patching) Eggs

The term “egg” as used in the Python community seems so whimsical.  It deserves lots of puns.  A couple of weeks ago, I made a little utility for myself that takes all the eggs from an egg farm produced by zc.buildout and makes a single directory tree full of Python packages and modules.  I called it Omelette.  Get it?  Ha!  (I can hear chickens groaning already…)  The surprising thing about Omelette is it typically finishes in less than 1 second, even with dozens of eggs and thousands of modules.  It mostly produces symlinks, but it also unpacks zip files.  I plan to share it, but I don’t know when I’ll get around to packaging it.
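For the curious, the core of Omelette is only a handful of lines.  This is a rough reconstruction of the idea from memory, not the actual utility (namespace packages, for one, need more care than this):

import os, zipfile

def omelette(eggs_dir, target):
    """Symlink the packages from each unzipped egg into one tree;
    unpack zipped eggs."""
    os.makedirs(target)
    for name in os.listdir(eggs_dir):
        egg = os.path.join(eggs_dir, name)
        if zipfile.is_zipfile(egg):
            zipfile.ZipFile(egg).extractall(target)
        else:
            for pkg in os.listdir(egg):
                if pkg == 'EGG-INFO':
                    continue
                os.symlink(os.path.join(egg, pkg),
                           os.path.join(target, pkg))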

Anyway, I want to talk about poaching (patching) eggs.  As systems grow in complexity, patching becomes more important.  Linux distributors, for example, solve a really complex problem, and their solution is to patch nearly every package.  If they didn’t, installed systems would be an unstable and unglued mess.  I imagine distributors’ patches usually reach upstream maintainers, but I also imagine it often takes months or years for those patches to trickle into a stable release of each package.

I really want to find a good way to integrate patching into the Python egg installation process.  I want to be able to say, in package metadata, that my package requires ZODB with a certain patch.  That patch would take the form of a diff file that might be downloaded from the Web.  I also want to be able to say that another package requires ZODB with a different patch, and assuming those patches have no conflicts, I want the Python package installation system to install ZODB with both patches.  Moreover, I want other buildouts to use ZODB without patches, even though I have a centralized cache of eggs in my ~/.buildout directory.

So let’s say my Python package installation system is zc.buildout, setuptools, and distutils.  Which layer should be modified to support patching?  I don’t think the need for automated patching arises until you’re combining a lot of packages, so it would seem most natural to put patching in zc.buildout.  I can imagine a buildout.cfg like this:

[versions]
ZODB3=3.8.1 +poll-invalidations

[patches]
poll-invalidations=http://example.com/path/poll-invalidations-3.8.1.diff

I wonder how difficult it would be to achieve that.  Some modification of setuptools might be required.  Alternatively, can Paver patch eggs?  I suspect Paver is not very good at patching eggs either.