Secure Random Code Generator

Long time no blog. I’d like to break the ice with a lighthearted Python code readability optimization discussion.

I need a Python function that generates short, random, numeric codes of a specified length using a secure random number generator. The codes must not start with 0, but the output should be a string. This was my first draft:

import struct, os

def make_code(digits):
    """Generate a random short code."""
    if digits > 9:
        return make_code(9) + make_code(digits - 9)
    [rand] = struct.unpack('>L', os.urandom(4))
    min_code = 10 ** (digits - 1)
    code = min_code + rand % (9 * min_code)
    return unicode(code)

Two things about this solution bothered me:

  • The math I used looks too clever. I’m not sure future maintainers will be able to understand why it works.
  • It discards a lot of the entropy provided by /dev/urandom, which could be expensive on some servers. (I confess this thought is premature optimization.)

To try to clean this up, I created a generator that produces random secure digits and based the make_code function on it:

def random_digits():
    while True:
        [n] = struct.unpack('>L', os.urandom(4))
        s = ('%d' % n)[1:]
        for c in s:
            yield c

next_random_digit = random_digits().next

def make_code(digits):
    return ''.join(next_random_digit() for _ in range(digits))

This felt better, but it has two critical problems. It can return 0 for the first digit, and due to the next_random_digit global, this solution is not thread safe! I’ve never tried to run a generator in multiple concurrent threads. I can only imagine the possible fireworks.

So I solved the problems as follows.

class RandomDigitStream(object):
    def __init__(self):
        self.streams = threading.local()

    def stream(self):
        while True:
            [n] = struct.unpack('>L', os.urandom(4))
            s = ('%d' % n)[1:]
            for c in s:
                yield c

    def make_code(self, digits):
        """Generate a random short code containing only digits."""
        try:
            next_digit = self.streams.next_digit
        except AttributeError:
            self.streams.next_digit = next_digit = self.stream().next

        while True:
            c = next_digit()
            # Don't use 0 for the first digit.
            if c != '0':
                break
        lst = [c]
        lst.extend(next_digit() for _ in range(1, digits))
        return ''.join(lst)

make_code = RandomDigitStream().make_code

Now the solution is large and I’m pretty sure it’s less readable than the original solution. Anyone want to take a shot at making this better? I’m currently thinking that the original solution was better than I thought.

Compositing in Plone

Seven years ago, I started a project called CompositePage.  It added a drag and drop UI to Zope 2.  Others built on it and created CompositePack.  I applaud them for their efforts.

However, today I think the CompositePage UI pattern is the wrong idea.  The CompositePage UI pattern is to render the page with extra markup that allows people with the necessary privileges to move objects around the page.  The idea works technically, but it incurs a lot of technical issues for template authors and it is frequently difficult to explain to users.

I saw an implementation of a much better idea a few years ago.  The better idea is to present a greatly simplified representation of the page when in design mode.  Ideally, the simplified page should contain only basic colors and fuzzy text, so as to draw the user’s attention to the widgets, not the page.

Now it just occurred to me that CompositePage could implement this idea easily by letting developers define two templates per layout: the real template and the representative design template.  Why didn’t I think of that before?  That would be quite friendly.

Pyramid and Traversal

There has been a lot of discussion in the Pylons/Pyramid community about the concept of traversal.  Some people, when faced with traversal for the first time, get confused and want it removed from Pyramid.  They want Pyramid to support only routes, a different way of achieving a similar goal.

Personally, I think traversal is simpler and more powerful than routes.  Conceptually, traversal is simply an interpretation of path segments as item lookup: the URL path “/spam/eggs/ham” is interpreted in Python as “root['spam']['eggs']['ham']“.  The root object is some object with a __getitem__ method; root.__getitem__(‘spam’) returns some object which has its own __getitem__ method, and so on.  If __getitem__ raises a KeyError, then Pyramid generates an HTTP 404 error (which you can customize).

The objects along the path are often persistent, but there is no reason they have to be.  Pyramid only requires __getitem__.  Well, actually, they should also have __parent__ and __name__ attributes if you want Pyramid to generate correct URLs for you.  WingCash, which will soon use Pyramid, uses traversal over non-persistent objects that only survive for one request.

Routes, on the other hand, requires a mini-language (similar to regular expressions) and iteration over possible path interpretations.  Expanding the model requires changing the central list of all possible routes.  Routes limits the complexity of the model, while traversal allows arbitrarily complex models.

Traversal is one of the best ideas from the Zope world.  Traversal was painful to do correctly in Zope, yet we still loved it because it allowed us to expand web applications organically.  I am very glad that Pyramid has simplified traversal and made the mechanism easily accessible.  I think traversal is a major ingredient that led to the success of Plone.

The Holy Grail server framework

… and I think I just wrote it.

For years, I have wanted a Python TCP server framework with the following features.

  • Multi-process: I want the framework to be able to use all available processors and cores with no GIL (global interpreter lock) contention.  This is important for CPU-bound applications.
  • Multi-threaded: Each process should have a fixed number of threads so it can multiplex tasks.  This is important for I/O-bound applications.
  • Graceful shutdown: when I issue the kill signal, I want the server to close its server socket immediately, but I want existing connections to have a chance to finish their work.  This is important if the connections are involved in transactions.
  • Graceful code reloading and reconfiguration: when I issue the HUP (hangup) signal, I want the server reload its configuration, spin up new processes, and tell the old processes to stop accepting new connections and shut down only when finished.  Also, the server should only load application code in child processes, so that loading new code is simply a matter of issuing a HUP signal.
  • Transaction integration, including two-phase commit: If the connection dies, I want to roll back the associated transaction automatically, unless the connection dies after the first phase of two-phase commit, in which case I want to leave the transaction in a partially committed state and sort out the issue at a higher level.
  • No proxying: I don’t want all traffic to flow through the master process, which would be bad for both reliability and performance.  Linux (and maybe any Unix system) allows multiple processes to listen on the same server socket; I want to use that feature.

What I have described is well beyond what the Python standard library provides.  In theory, it will let me have all the joys of transactions and zero-downtime updates in a high performance environment.  I have been searching for an implementation of this design for years.  This week, I have been reworking some WingCash code to make better use of transactions and I realized I really need this design.

So, on Monday, I wrote my dream server framework and tried it out… and to my surprise, it actually worked!  I was surprised because I thought that some Python hacker, somewhere, must have tried this experiment by now, and must have failed; otherwise the experiment results would surely be published or mentioned somewhere.  Perhaps my requirements are not very common, the design is not obvious, or I did not type the right search terms.

I am now writing tests for the new server framework, which I am tentatively calling fdtserver (forking distributed transaction server).  I won’t put it into production until it has 100% statement coverage like the rest of WingCash.  I wonder whether fdtserver should be open source.  Doesn’t the need for all those features, particularly graceful code reloading, exist elsewhere?

Cash and hors d’oeuvres

My friends and I have discovered an interesting way to celebrate events like finishing a milestone or the coming of a new year: we’re giving each other cash.  Specifically, today we’re giving each other $20.11.  I’ve never had so much fun passing cash around before.  Suddenly, cash is as cool as a party favor or hors d’oeuvre.  What is it about WingCash that makes cash fun, and will others have this experience, or is it just me?

Cash has always been serious thing for me.  I count and save it carefully.  I have devoted a lot of thought to deciding how to spend it or give it away.  For most of my life, I operated under the impression that money I give away never comes back to me; I believed every cent I receive comes from personal labor.

I would like to relate a little story.  Around the age of ten, I once noticed the design of the packaging of my family’s Commodore 64.  The entire box was a single sheet of corrugated cardboard with no staples, glue, or tape required to maintain the structure.  It had an origami-like flavor: with nothing but folds and cuts, someone had created a box that was strong, easy to open and close, and stayed closed when you wanted it closed.  (At the time, I attributed the brilliant design to Commodore, but today I suspect it is actually a very old design.)

I made my own cardboard box after the same design by cutting a laundry detergent box.  I was impressed to discover my box was built well even though my measurements were not very precise.  That box became one of my treasures.  I made a second, larger one and painted it silver, but the first one lasted longer.  I still have that box now, and it is still sturdy, though it is worn.

That treasured box is where I kept my pile of coins.  I collected coins there for a long time, saving them for something important, though I did not know what.  Those coins had extra value to me because they were stored in my box.  For years, nothing seemed valuable enough to pay for with my coin collection.

In 1999, I finally found a good way to spend those coins.  I spent them on something useful, fun, and not just for me.  I spent them as part of the down payment on my first new car.  I still thoroughly enjoy that car and it still seems to work as well as the day I bought it.  But why did it take so long for me to find something valuable enough to spend my special coins?

I did not think it was right to spend those coins until I found something useful, fun, and not just for me.  Nothing before my car qualified: everything was either just for me or just for someone else, or it was fun but not very useful, or it was useful but not something to get excited about.  A new car is certainly useful, the car I chose is definitely fun (for the driver at least), and it has 4 seats so it was obviously not just for me.

So now here I am, a penny pincher from birth, passing coins around freely.  What changed?  Well, cash has always been useful, but not really very fun, and too formal to call it “not just for me”.  I feel like WingCash is changing that.  The formality of cash is gone; what replaces it seems to be an emerging culture of passing cash around at will and taking what you need.

What makes it fun?  It’s easy, but ease alone does not make something fun.  There has to be some challenge.  The challenge is to use the cash wisely.  I think the wisest thing to do with cash is to help the people around you, especially your family and friends, but also the good people who run businesses and non-profit organizations.  That is who I want WingCash to be good for, in that order: family, friends, businesses, and non-profits.

So happy new year, everyone!  It looks like 2011 is going to be a great year.

Breaking Down Barriers (so I can send you cash)

So here’s the situation: I have some cash I want to give away to people just for signing up at WingCash, but there are still some issues preventing me from doing that:

  • WingCash currently supports very limited jurisdictions for individuals.  Specifically, South Carolina, New Mexico, and Montana.  The support for only those states is based on advice from our lawyers.  We can’t even offer cash to our own family and neighbors yet!  The process of adding jurisdictions is complicated from a legal standpoint.  We’re working on it, but we have to be patient.
  • The sign-up process is easy, but we were asking for people’s birth dates.  Some people might consider that a privacy issue, yet we have no intention of using people’s birth dates in any way except to ensure they are at least 18 years old.  (We do intend to support minors, but in a different way.)

We worked out a simple solution to the second issue: just use a check box instead of asking for a birth date.  The lawyers agreed that it was a good idea, so we have changed the sign-up page.  No birth date is required anymore.

Owners of a business in the United States can participate in WingCash now by signing up and then adding their business.  There is a link to add your business on the WingCash home page.  So, I extend my temporary offer to any legitimate business that signs up at WingCash: I will give you cash!  Once you have cash, you can tweet that cash to other people, redeem it through your bank account, etc.

Of course, people won’t really have the full experience until they can send and receive cash as individuals.  We’ll get there!

WingCash is Open!

I’ve been working on a cool project for the past year.  I haven’t been able to share much about it, but now that it’s open to the public, I intend to talk about it and explore it in the open.  It’s built on Python, Repoze.BFG (to be replaced with Pyramid soon), PostgreSQL, Buildout, and a big pile of other great software.

WingCash is a system for sending cash on the Internet between people and businesses.  It is Brad Wilkes’s idea and dream.  He had the idea while working at FamilySearch.  FamilySearch is trying to help genealogical archives remain financially solvent so that the records they hold can be accessible to the world.  WingCash is completely independent of FamilySearch, but one of the intents is to solve a major financial issue for genealogical archives.

To that end, a foundational principle of WingCash is to charge no fees for direct exchanges of cash between WingCash users.  On WingCash, sending a penny is sensible and economical.  It is also sensible for millions to send a penny apiece to a single recipient.

WingCash does have to charge fees for other kinds of transfers, such as transfers to a bank, since banks charge WingCash for the service.  WingCash will also offer premium services.  But the core service of holding cash and sending it to anyone for no charge will remain free for good.

WingCash currently has limited resources, so please don’t hammer the newborn server, but WingCash does have a design that should allow it to expand into many servers over time.  Check it out and say what you think.  Please add your ideas to the feedback forum.

Here are some topics I expect to cover in future posts about WingCash:

  • I heart pythonsecurity.org
  • Bank notes are back
  • Integration with web stores and donation pages
  • Plans for the API

Python JSON Performance

This test compares the speed of various object serializers available for Python. I should have run it long ago. It opened my eyes a bit!

import time
import json
import simplejson
import marshal
import pickle
import cPickle
import cerealizer

def time_module(module):
    d = {}
    for k in range(100000):
        d[str(k)] = -k

    start = time.time()
    enc = module.dumps(d)
    stop = time.time()
    enc_time = stop - start

    start = time.time()
    got = module.loads(enc)
    stop = time.time()
    dec_time = stop - start

    if got != d:
        raise AssertionError("Module %s failed encoding" % module.__name__)

    print '%s: %.3f encode, %.3f decode' % (
        module.__name__, enc_time, dec_time)

if __name__ == '__main__':
    time_module(marshal)
    time_module(simplejson)
    time_module(cPickle)
    time_module(pickle)
    time_module(cerealizer)

    class EnhancedCPickle:
        __name__ = 'EnhancedCPickle'
        def dumps(self, obj):
            return cPickle.dumps(obj, cPickle.HIGHEST_PROTOCOL)
        def loads(self, data):
            return cPickle.loads(data)

    time_module(EnhancedCPickle())

    time_module(json)

Here are the results I got.

marshal: 0.010 encode, 0.016 decode
simplejson: 0.056 encode, 0.035 decode
cPickle: 0.136 encode, 0.051 decode
pickle: 0.570 encode, 0.553 decode
cerealizer: 0.268 encode, 0.249 decode
EnhancedCPickle: 0.115 encode, 0.030 decode
json: 0.214 encode, 1.195 decode

According to the Python documentation, the built-in json module is based on simplejson. What the documentation fails to mention is that simplejson is at least an order of magnitude faster than the built-in module. It’s even faster than cPickle, according to this microbenchmark. I would use the marshal module if I could, but that would introduce security issues in my application (as would the pickle modules).

How to get a fast response from me

I’m swamped with work.  I can’t yet talk much about the work I’ve been doing until we go public, but it’s a cool project, although its coolness may not be immediately obvious at first. ;-)

Meanwhile, I have been rather slow to answer some emails.  Here is what you need to do to get a fast email response from me: present me a viable option that allows someone other than me to do the work.  For example, if you want one of my open source projects changed, the best thing to suggest is that you will do the work and all I have to do is review a patch.  (Ideally, include the unified diff in the email so I can review instantly.)

RepStrap Notes for my Past Self

Here are some bits of wisdom I have gained about building and using a Bits From Bytes RepStrap 2.0, in stream-of-thought order.  I wish I had known this stuff before!  I imagine much of this applies to newer machines as well.

  • Essential tools that you may not realize are essential: a small benchtop drill press (AKA pillar drill) if you don’t have a lathe, a good soldering iron (a Hakko 936 is great), solder wick (size #1), a heat gun (Black and Decker makes a good one) and heat shrink, a range of drill bits that differ in size by 1/64″, a bench vice, a good hack saw, a chisel, a metal file, a set of small diamond files, a multimeter with a temperature probe, and a dremel-style tool.  BTW, a drill press is much quieter than a hand-held drill.
  • As a beginner, buy only PLA.  ABS and HDPE are more difficult because they shrink too much while cooling, pulling the lower layers upward, warping objects with dimensions of about 5 cm or larger.  ABS also smells bad while building.  Warping happens with PLA objects by about 15 cm, but most objects in the community aren’t that big.
  • Warping can break the extruder mount.  Don’t accept warping.  If you break the extruder mount, fix it with super glue and tie the extruder to its mount in more than one place.
  • Loctite Super Glue for All Plastics, #01-07011, works well on acrylic.  Apply it generously and always allow the full 8 hour set time.  If the lid gets filled with glue, drill it out.
  • Loctite Vinyl and Plastic, #01-06998-01, works well for flexible plastic and rubber, such as the Z axis belt.
  • Kapton tape is helpful, although I don’t trust it enough to apply it directly to the nichrome wire on the heater.  I’ve heard that Kapton tape can catch fire.  I use Kapton tape to insulate the copper leads connecting to the nichrome.
  • When building the heater, let the nichrome dangle outside the barrel a bit to avoid breaking the nichrome wire.  The copper/nichrome connection does not need to be inside the heater.
  • Twisting copper and nichrome together, then applying solder, makes a strong bond.
  • Build a 4 ohm heater, not an 8 ohm heater.  A 4 ohm heater can reach the specified temperature more quickly and maintain it better.  The standard transistor can handle 5 amps; a 4 ohm heater at 12 volts consumes only 3 amps.
  • If you short the PWM transistors, they will stay on forever, which could cause a fire.  Use Nophead’s suggested replacement.
  • Running the object cooling fan continuously (with a PWM setting of 64) seems to prevent jams and burned filament.  To compensate for the fan, add extra cement as insulation and raise the temperature by about 10 degrees.  I used to add fiber glass insulation to the heater, but that made it harder to fix the heater.
  • Ace Hardware sells an 8oz tub of Chimney Sweep Furnace Cement.  I’ve been happy with it.  It cures in minutes with a heat gun. :-)
  • The copper pipe on the heater seems unnecessary.  I have left it off and I’m happy.
  • Don’t bother with the 30mm washer holding the heater in place.  Cut a piece of thin aluminum instead.
  • The 5V motor can run at 12V, but it’s loud and has a short life (about 2 weeks when I used the machine a lot).  If you use a Solarbiotics DC motor, it *must* be the 12V type.
  • The 12V DC motor, running at full speed, extrudes about 1.6 mm3/s, translating to about 4 or 5 cm3/h, including pauses.  The 5V motor runs faster and can extrude 3 mm3/s.  Too bad it doesn’t work for long.  Trying to extrude more than 3 mm3/s leads to frequent jamming.
  • The extruder motor has more than enough power.  The much harder problem to solve is gripping the filament; the motor tends to turn the same speed whether grip is achieved or not.  My latest solution for gripping is to sharpen the stainless steel (SS) M8 screw with a diamond file (the drill press is handy for this process) and move the SS washer upward by filing a small slot, so that the washer doesn’t dull the important part of the screw.  This has worked well so far.
  • Make the extruder detachable.  A 9 pin connector is good.  You will remove and improve the extruder dozens (hundreds?) of times before you’re satisfied that it is reliable.
  • Make the heater detachable too.
  • To make a thermocouple, get type K thermocouple wire (with glass insulation) and twist the wires together at the point where you want to measure the temperature.  I don’t think welding the thermocouple is necessary at all.
  • The alarm LED on the thermocouple PCB is very useful.  If it flickers, you probably have broken thermocouple wire.
  • Build on blue painter’s tape (the non-stick side).  I stuck the tape to the acrylic base that came with the BitsFromBytes 2.0 kit.
  • The primary source of wobble is the 608ZZ bearings.  Don’t rely on their accuracy.  I had to change both the Y and Z axes from the standard design so that the bearings could wobble without affecting the machine.  (The X axis might need a similar adjustment.)
  • Some of the acrylic takes too much load and breaks.  I replaced the acrylic piece that connects to the X axis stepper with hacked 1/4″ aluminum.  The fix has been quite reliable.
  • Run the machine from a server so that you can get other work done on your laptop/desktop at the same time.
  • Art of Illusion is very buggy, Blender is just as buggy when performing boolean operations, and OpenSCAD is great once you’ve gone through the slightly difficult process of compiling it.  Use OpenSCAD for anything that requires exactness.  Use Blender for artwork.
  • Both the RepRap host software and Skeinforge are hard to configure correctly.  Many of the defaults are wrong.  Tweaking them takes a lot of time.  Skeinforge is much more flexible.  Get Skeinforge from Subversion and plan to edit the code to fix bugs such as division by zero.
  • Use a G-code firmware, but plan to tweak the firmware.  I hope you can program in both C and Python. :-)
  • The Oozebane plugin is no good.  In my experience, a much better solution is to add code to the firmware that pauses when starting a segment and temporarily reverses the motor (while the steppers are moving!) after stopping a segment.  The length of the pause depends on how much time has passed since the extruder was last turned off.  A Skeinforge plugin could emulate this behavior, but it’s simpler to do it in the firmware.
  • A new Sanguino board requires you to upload a bootloader, which requires either a programmer or a breadboard and a parallel port.  Most new computers don’t have a parallel port.  I got by with Xubuntu loaded on an old Pentium II that had a parallel port.
  • Both OpenSCAD and Skeinforge often need lots of CPU (2-3 GHz) and RAM (1 GB or more).  OpenSCAD needs a modern 3D video card.  A $30 fanless nVidia card from NewEgg is sufficient.
  • I wrote my own Python scripts for running the machine.  The scripts prepare the machine before starting a build, and can pause and resume a large build.  I’m not sure how others are handling that part.  Maybe I should post my scripts on some server.

Having learned all that and more, I think I am finally ready to build a Mendel. :-)