Long time no blog. I’d like to break the ice with a lighthearted Python code readability optimization discussion.
I need a Python function that generates short, random, numeric codes of a specified length using a secure random number generator. The codes must not start with 0, but the output should be a string. This was my first draft:
import struct, os def make_code(digits): """Generate a random short code.""" if digits > 9: return make_code(9) + make_code(digits - 9) [rand] = struct.unpack('>L', os.urandom(4)) min_code = 10 ** (digits - 1) code = min_code + rand % (9 * min_code) return unicode(code)
Two things about this solution bothered me:
- The math I used looks too clever. I’m not sure future maintainers will be able to understand why it works.
- It discards a lot of the entropy provided by /dev/urandom, which could be expensive on some servers. (I confess this thought is premature optimization.)
To try to clean this up, I created a generator that produces random secure digits and based the make_code function on it:
def random_digits(): while True: [n] = struct.unpack('>L', os.urandom(4)) s = ('%d' % n)[1:] for c in s: yield c next_random_digit = random_digits().next def make_code(digits): return ''.join(next_random_digit() for _ in range(digits))
This felt better, but it has two critical problems. It can return 0 for the first digit, and due to the next_random_digit global, this solution is not thread safe! I’ve never tried to run a generator in multiple concurrent threads. I can only imagine the possible fireworks.
So I solved the problems as follows.
class RandomDigitStream(object): def __init__(self): self.streams = threading.local() def stream(self): while True: [n] = struct.unpack('>L', os.urandom(4)) s = ('%d' % n)[1:] for c in s: yield c def make_code(self, digits): """Generate a random short code containing only digits.""" try: next_digit = self.streams.next_digit except AttributeError: self.streams.next_digit = next_digit = self.stream().next while True: c = next_digit() # Don't use 0 for the first digit. if c != '0': break lst = [c] lst.extend(next_digit() for _ in range(1, digits)) return ''.join(lst) make_code = RandomDigitStream().make_code
Now the solution is large and I’m pretty sure it’s less readable than the original solution. Anyone want to take a shot at making this better? I’m currently thinking that the original solution was better than I thought.
Anyone looking at this ought to realize that there is a solution in the Python System libs already, called SystemRandom, since Python 2.4.
Thus something like
import itertools
def randdigits():
rand = SystemRandom()
while True:
yield str(rand.rand_int(0, 9))
def make_code(digits):
rand = SystemRandom()
omit_leading_0 = itertools.dropwhile(lambda x: x == '0', randdigits())
return join(itertools.islice(omit_leading_0, digits))