Picklebytes – Tech Notes from Shane Hathaway

When There’s No Telnet or Netcat

Sometimes I need to test an Internet connection from a machine where neither telnet nor netcat is installed. Python, which is practically always preinstalled on Linux servers, provides a simple solution.

# python
>>> import socket
>>> s = socket.socket()
>>> s.connect((host, port))

If the connect() call returns immediately, then the connection (including DNS lookup) was successful.

New RelStorage Maintainer

I’d like to thank Jason Madden for taking over as the maintainer of RelStorage! He recently released version 2.0 and what I’ve seen of it looks excellent. Version 1.6 still works well, but version 2 is probably the way to go for new projects.

Secure Random Code Generator

Long time no blog. I’d like to break the ice with a lighthearted Python code readability optimization discussion.

I need a Python function that generates short, random, numeric codes of a specified length using a secure random number generator. The codes must not start with 0, but the output should be a string. This was my first draft:

import struct, os

def make_code(digits):
    """Generate a random short code."""
    if digits > 9:
        return make_code(9) + make_code(digits - 9)
    [rand] = struct.unpack('>L', os.urandom(4))
    min_code = 10 ** (digits - 1)
    code = min_code + rand % (9 * min_code)
    return unicode(code)

Two things about this solution bothered me:

The math I used looks too clever. I’m not sure future maintainers will be able to understand why it works.
It discards a lot of the entropy provided by /dev/urandom, which could be expensive on some servers. (I confess this thought is premature optimization.)

To try to clean this up, I created a generator that produces random secure digits and based the make_code function on it:

def random_digits():
    while True:
        [n] = struct.unpack('>L', os.urandom(4))
        s = ('%d' % n)[1:]
        for c in s:
            yield c

next_random_digit = random_digits().next

def make_code(digits):
    return ''.join(next_random_digit() for _ in range(digits))

This felt better, but it has two critical problems. It can return 0 for the first digit, and due to the next_random_digit global, this solution is not thread safe! I’ve never tried to run a generator in multiple concurrent threads. I can only imagine the possible fireworks.

So I solved the problems as follows.

class RandomDigitStream(object):
    def __init__(self):
        self.streams = threading.local()

    def stream(self):
        while True:
            [n] = struct.unpack('>L', os.urandom(4))
            s = ('%d' % n)[1:]
            for c in s:
                yield c

    def make_code(self, digits):
        """Generate a random short code containing only digits."""
        try:
            next_digit = self.streams.next_digit
        except AttributeError:
            self.streams.next_digit = next_digit = self.stream().next

        while True:
            c = next_digit()
            # Don't use 0 for the first digit.
            if c != '0':
                break
        lst = [c]
        lst.extend(next_digit() for _ in range(1, digits))
        return ''.join(lst)

make_code = RandomDigitStream().make_code

Now the solution is large and I’m pretty sure it’s less readable than the original solution. Anyone want to take a shot at making this better? I’m currently thinking that the original solution was better than I thought.

Compositing in Plone

Seven years ago, I started a project called CompositePage. It added a drag and drop UI to Zope 2. Others built on it and created CompositePack. I applaud them for their efforts.

However, today I think the CompositePage UI pattern is the wrong idea. The CompositePage UI pattern is to render the page with extra markup that allows people with the necessary privileges to move objects around the page. The idea works technically, but it incurs a lot of technical issues for template authors and it is frequently difficult to explain to users.

I saw an implementation of a much better idea a few years ago. The better idea is to present a greatly simplified representation of the page when in design mode. Ideally, the simplified page should contain only basic colors and fuzzy text, so as to draw the user’s attention to the widgets, not the page.

Now it just occurred to me that CompositePage could implement this idea easily by letting developers define two templates per layout: the real template and the representative design template. Why didn’t I think of that before? That would be quite friendly.

Pyramid and Traversal

There has been a lot of discussion in the Pylons/Pyramid community about the concept of traversal. Some people, when faced with traversal for the first time, get confused and want it removed from Pyramid. They want Pyramid to support only routes, a different way of achieving a similar goal.

Personally, I think traversal is simpler and more powerful than routes. Conceptually, traversal is simply an interpretation of path segments as item lookup: the URL path “/spam/eggs/ham” is interpreted in Python as “root[‘spam’][‘eggs’][‘ham’]”. The root object is some object with a __getitem__ method; root.__getitem__(‘spam’) returns some object which has its own __getitem__ method, and so on. If __getitem__ raises a KeyError, then Pyramid generates an HTTP 404 error (which you can customize).

The objects along the path are often persistent, but there is no reason they have to be. Pyramid only requires __getitem__. Well, actually, they should also have __parent__ and __name__ attributes if you want Pyramid to generate correct URLs for you. WingCash, which will soon use Pyramid, uses traversal over non-persistent objects that only survive for one request.

Routes, on the other hand, requires a mini-language (similar to regular expressions) and iteration over possible path interpretations. Expanding the model requires changing the central list of all possible routes. Routes limits the complexity of the model, while traversal allows arbitrarily complex models.

Traversal is one of the best ideas from the Zope world. Traversal was painful to do correctly in Zope, yet we still loved it because it allowed us to expand web applications organically. I am very glad that Pyramid has simplified traversal and made the mechanism easily accessible. I think traversal is a major ingredient that led to the success of Plone.

The Holy Grail server framework

… and I think I just wrote it.

For years, I have wanted a Python TCP server framework with the following features.

Multi-process: I want the framework to be able to use all available processors and cores with no GIL (global interpreter lock) contention. This is important for CPU-bound applications.
Multi-threaded: Each process should have a fixed number of threads so it can multiplex tasks. This is important for I/O-bound applications.
Graceful shutdown: when I issue the kill signal, I want the server to close its server socket immediately, but I want existing connections to have a chance to finish their work. This is important if the connections are involved in transactions.
Graceful code reloading and reconfiguration: when I issue the HUP (hangup) signal, I want the server reload its configuration, spin up new processes, and tell the old processes to stop accepting new connections and shut down only when finished. Also, the server should only load application code in child processes, so that loading new code is simply a matter of issuing a HUP signal.
Transaction integration, including two-phase commit: If the connection dies, I want to roll back the associated transaction automatically, unless the connection dies after the first phase of two-phase commit, in which case I want to leave the transaction in a partially committed state and sort out the issue at a higher level.
No proxying: I don’t want all traffic to flow through the master process, which would be bad for both reliability and performance. Linux (and maybe any Unix system) allows multiple processes to listen on the same server socket; I want to use that feature.

What I have described is well beyond what the Python standard library provides. In theory, it will let me have all the joys of transactions and zero-downtime updates in a high performance environment. I have been searching for an implementation of this design for years. This week, I have been reworking some WingCash code to make better use of transactions and I realized I really need this design.

So, on Monday, I wrote my dream server framework and tried it out… and to my surprise, it actually worked! I was surprised because I thought that some Python hacker, somewhere, must have tried this experiment by now, and must have failed; otherwise the experiment results would surely be published or mentioned somewhere. Perhaps my requirements are not very common, the design is not obvious, or I did not type the right search terms.

I am now writing tests for the new server framework, which I am tentatively calling fdtserver (forking distributed transaction server). I won’t put it into production until it has 100% statement coverage like the rest of WingCash. I wonder whether fdtserver should be open source. Doesn’t the need for all those features, particularly graceful code reloading, exist elsewhere?

Cash and hors d’oeuvres

My friends and I have discovered an interesting way to celebrate events like finishing a milestone or the coming of a new year: we’re giving each other cash. Specifically, today we’re giving each other $20.11. I’ve never had so much fun passing cash around before. Suddenly, cash is as cool as a party favor or hors d’oeuvre. What is it about WingCash that makes cash fun, and will others have this experience, or is it just me?

Cash has always been serious thing for me. I count and save it carefully. I have devoted a lot of thought to deciding how to spend it or give it away. For most of my life, I operated under the impression that money I give away never comes back to me; I believed every cent I receive comes from personal labor.

I would like to relate a little story. Around the age of ten, I once noticed the design of the packaging of my family’s Commodore 64. The entire box was a single sheet of corrugated cardboard with no staples, glue, or tape required to maintain the structure. It had an origami-like flavor: with nothing but folds and cuts, someone had created a box that was strong, easy to open and close, and stayed closed when you wanted it closed. (At the time, I attributed the brilliant design to Commodore, but today I suspect it is actually a very old design.)

I made my own cardboard box after the same design by cutting a laundry detergent box. I was impressed to discover my box was built well even though my measurements were not very precise. That box became one of my treasures. I made a second, larger one and painted it silver, but the first one lasted longer. I still have that box now, and it is still sturdy, though it is worn.

That treasured box is where I kept my pile of coins. I collected coins there for a long time, saving them for something important, though I did not know what. Those coins had extra value to me because they were stored in my box. For years, nothing seemed valuable enough to pay for with my coin collection.

In 1999, I finally found a good way to spend those coins. I spent them on something useful, fun, and not just for me. I spent them as part of the down payment on my first new car. I still thoroughly enjoy that car and it still seems to work as well as the day I bought it. But why did it take so long for me to find something valuable enough to spend my special coins?

I did not think it was right to spend those coins until I found something useful, fun, and not just for me. Nothing before my car qualified: everything was either just for me or just for someone else, or it was fun but not very useful, or it was useful but not something to get excited about. A new car is certainly useful, the car I chose is definitely fun (for the driver at least), and it has 4 seats so it was obviously not just for me.

So now here I am, a penny pincher from birth, passing coins around freely. What changed? Well, cash has always been useful, but not really very fun, and too formal to call it “not just for me”. I feel like WingCash is changing that. The formality of cash is gone; what replaces it seems to be an emerging culture of passing cash around at will and taking what you need.

What makes it fun? It’s easy, but ease alone does not make something fun. There has to be some challenge. The challenge is to use the cash wisely. I think the wisest thing to do with cash is to help the people around you, especially your family and friends, but also the good people who run businesses and non-profit organizations. That is who I want WingCash to be good for, in that order: family, friends, businesses, and non-profits.

So happy new year, everyone! It looks like 2011 is going to be a great year.

Breaking Down Barriers (so I can send you cash)

So here’s the situation: I have some cash I want to give away to people just for signing up at WingCash, but there are still some issues preventing me from doing that:

WingCash currently supports very limited jurisdictions for individuals. Specifically, South Carolina, New Mexico, and Montana. The support for only those states is based on advice from our lawyers. We can’t even offer cash to our own family and neighbors yet! The process of adding jurisdictions is complicated from a legal standpoint. We’re working on it, but we have to be patient.
The sign-up process is easy, but we were asking for people’s birth dates. Some people might consider that a privacy issue, yet we have no intention of using people’s birth dates in any way except to ensure they are at least 18 years old. (We do intend to support minors, but in a different way.)

We worked out a simple solution to the second issue: just use a check box instead of asking for a birth date. The lawyers agreed that it was a good idea, so we have changed the sign-up page. No birth date is required anymore.

Owners of a business in the United States can participate in WingCash now by signing up and then adding their business. There is a link to add your business on the WingCash home page. So, I extend my temporary offer to any legitimate business that signs up at WingCash: I will give you cash! Once you have cash, you can tweet that cash to other people, redeem it through your bank account, etc.

Of course, people won’t really have the full experience until they can send and receive cash as individuals. We’ll get there!

WingCash is Open!

I’ve been working on a cool project for the past year. I haven’t been able to share much about it, but now that it’s open to the public, I intend to talk about it and explore it in the open. It’s built on Python, Repoze.BFG (to be replaced with Pyramid soon), PostgreSQL, Buildout, and a big pile of other great software.

WingCash is a system for sending cash on the Internet between people and businesses. It is Brad Wilkes’s idea and dream. He had the idea while working at FamilySearch. FamilySearch is trying to help genealogical archives remain financially solvent so that the records they hold can be accessible to the world. WingCash is completely independent of FamilySearch, but one of the intents is to solve a major financial issue for genealogical archives.

To that end, a foundational principle of WingCash is to charge no fees for direct exchanges of cash between WingCash users. On WingCash, sending a penny is sensible and economical. It is also sensible for millions to send a penny apiece to a single recipient.

WingCash does have to charge fees for other kinds of transfers, such as transfers to a bank, since banks charge WingCash for the service. WingCash will also offer premium services. But the core service of holding cash and sending it to anyone for no charge will remain free for good.

WingCash currently has limited resources, so please don’t hammer the newborn server, but WingCash does have a design that should allow it to expand into many servers over time. Check it out and say what you think. Please add your ideas to the feedback forum.

Here are some topics I expect to cover in future posts about WingCash:

I heart pythonsecurity.org
Bank notes are back
Integration with web stores and donation pages
Plans for the API

Python JSON Performance

This test compares the speed of various object serializers available for Python. I should have run it long ago. It opened my eyes a bit!

import time
import json
import simplejson
import marshal
import pickle
import cPickle
import cerealizer

def time_module(module):
    d = {}
    for k in range(100000):
        d[str(k)] = -k

    start = time.time()
    enc = module.dumps(d)
    stop = time.time()
    enc_time = stop - start

    start = time.time()
    got = module.loads(enc)
    stop = time.time()
    dec_time = stop - start

    if got != d:
        raise AssertionError("Module %s failed encoding" % module.__name__)

    print '%s: %.3f encode, %.3f decode' % (
        module.__name__, enc_time, dec_time)

if __name__ == '__main__':
    time_module(marshal)
    time_module(simplejson)
    time_module(cPickle)
    time_module(pickle)
    time_module(cerealizer)

    class EnhancedCPickle:
        __name__ = 'EnhancedCPickle'
        def dumps(self, obj):
            return cPickle.dumps(obj, cPickle.HIGHEST_PROTOCOL)
        def loads(self, data):
            return cPickle.loads(data)

    time_module(EnhancedCPickle())

    time_module(json)

Here are the results I got.

marshal: 0.010 encode, 0.016 decode
simplejson: 0.056 encode, 0.035 decode
cPickle: 0.136 encode, 0.051 decode
pickle: 0.570 encode, 0.553 decode
cerealizer: 0.268 encode, 0.249 decode
EnhancedCPickle: 0.115 encode, 0.030 decode
json: 0.214 encode, 1.195 decode

According to the Python documentation, the built-in json module is based on simplejson. What the documentation fails to mention is that simplejson is at least an order of magnitude faster than the built-in module. It’s even faster than cPickle, according to this microbenchmark. I would use the marshal module if I could, but that would introduce security issues in my application (as would the pickle modules).