RelStorage: MD5 sums

If you study RelStorage a bit, you’ll discover that every object it stores is accompanied by an MD5 sum of the object state.  Then you’ll probably wonder why, since MD5 computation is cheap but not free.  We do it to support undo.

ZODB expects the storage to check whether an undo operation is safe before actually doing it.  FileStorage performs that verification using the following algorithm: if each object’s state in the transaction to undo matches the object’s current state, it is safe to undo.  If any object fails that test, the storage raises an UndoError instead.
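Here is a minimal sketch of that rule in Python.  It is not FileStorage’s actual code: the dictionaries mapping object ids to states, the function name check_undo_safe, and the locally defined UndoError class are all stand-ins invented for illustration.

```python
class UndoError(Exception):
    """Local stand-in for ZODB's UndoError, so the sketch is self-contained."""


def check_undo_safe(undo_states, current_states):
    """Raise UndoError unless every object is unchanged since the transaction.

    Both arguments are hypothetical dicts mapping an object id to the
    object's pickled state (bytes): undo_states holds the states written
    by the transaction to undo, current_states holds the latest states.
    """
    for oid, state in undo_states.items():
        if current_states.get(oid) != state:
            raise UndoError(
                f"object {oid!r} was modified after the transaction to undo"
            )


# Object 1 is unchanged, so undoing is safe and nothing is raised.
check_undo_safe({1: b"v2"}, {1: b"v2", 2: b"other"})
```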

RelStorage uses the same algorithm, but it compares MD5 sums rather than full states, which keeps the comparison quick.  The real issue, though, is not speed but portability.  Can every supported relational database compare the contents of BLOBs in a query?  It’s hard to find documentation on questions like that.  It’s much easier to just compare MD5 sums.
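To show why comparing sums is the portable choice, here is a rough sketch using Python’s built-in sqlite3 and hashlib modules.  The table layout (object_state with zoid, tid, md5 and state columns) is a simplified assumption made up for this example, not RelStorage’s real schema, and the helper names are hypothetical.  The point is only that the query compares short text values and never has to look inside a BLOB.

```python
import hashlib
import sqlite3


def md5_hex(state: bytes) -> str:
    """Return the hex MD5 sum of an object state."""
    return hashlib.md5(state).hexdigest()


def make_demo_db():
    """Create an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object_state ("
        " zoid INTEGER, tid INTEGER, md5 TEXT, state BLOB,"
        " PRIMARY KEY (zoid, tid))"
    )
    return conn


def store(conn, zoid, tid, state):
    """Store one revision of an object along with its MD5 sum."""
    conn.execute(
        "INSERT INTO object_state VALUES (?, ?, ?, ?)",
        (zoid, tid, md5_hex(state), state),
    )


def conflicting_oids(conn, undo_tid):
    """Return ids of objects whose current MD5 differs from the one
    recorded in the transaction to undo.  An empty list means the
    undo passes the safety check."""
    rows = conn.execute(
        """
        SELECT undo.zoid
        FROM object_state AS undo
        JOIN object_state AS cur ON cur.zoid = undo.zoid
        WHERE undo.tid = ?
          AND cur.tid = (SELECT MAX(latest.tid)
                         FROM object_state AS latest
                         WHERE latest.zoid = undo.zoid)
          AND cur.md5 <> undo.md5
        ORDER BY undo.zoid
        """,
        (undo_tid,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, a caller would raise UndoError whenever conflicting_oids returns a non-empty list.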

Besides, it generally feels good to keep MD5 sums around.  If the filesystem hosting your database ever accumulates some corruption, you can recompute the MD5 sum of each stored state and compare it with the recorded sum to help sort out the mess.
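A corruption check along those lines could look something like this.  It is only a sketch: the find_corrupt function and the (oid, state, stored_md5) record shape are assumptions for illustration, and in practice the records would come from a query against your database.

```python
import hashlib


def find_corrupt(records):
    """Return the ids of records whose state no longer matches its MD5 sum.

    records is a hypothetical iterable of (oid, state_bytes, stored_md5_hex)
    tuples.
    """
    return [
        oid
        for oid, state, stored_md5 in records
        if hashlib.md5(state).hexdigest() != stored_md5
    ]


good = (1, b"intact", hashlib.md5(b"intact").hexdigest())
bad = (2, b"damaged!", hashlib.md5(b"original").hexdigest())
suspects = find_corrupt([good, bad])  # only object 2 fails the check
```

A mismatch tells you which record to distrust, though not whether the state or the stored sum is the damaged part.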