Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

[STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head



I have talked with a number of people who have been surprised at the probability of at least one collision occurring when n participants each independently and randomly chose one of of m values. While the answer to this problem, for any particular n, m can be found by using a handy app or searching for a slide presentation, I personally find resorting to electronics to do the calculation unsatisfying as it doesn't provide any insight it to what is happening - or any help with evaluating alternatives. [For a brief anecdote as to why I feel this way see the end of this note].

For the range of collision probabilities that are of interest to us it is trivial to perform the mental calculation. A good approximation for p (probability of at least one collision) is:

p = n**2/2m

In other words if we have M bits of random space to pick from, and the number of participants can fit in N bits then an upper bound for the probability of at least one collision is:

1 in 2**(M+1-2N)

e.g. given 32 bits to pick from, and 2**11 participants picking there is a 1 in 2**(32+1-22) = 1 in 2**11 = 1 in 2048 chance of at least one collision. [Always accepting the questionable assumption that each of the participants is using a good random number generator to make the choice].

The above formula, p = n**2/2m is given on the Wikipedia page for the birthday paradox (I changed my initial calculation to use p, n, and m to agree with that page). The page notes that the approximation is good for p up to 0.5, way beyond the figures of interest to us I suspect. For p = 0.5 we get the simple n ~ sqrt(m), 19 for real birthdays (choice from 365) which is just a little lower than the real 23 (for uniform distribution).

It's worth looking at the math to get some feel for what is going on. A crude argument may provide more insight that the full analysis, which I'll give later. Consider each of our n participants picking a number (address) in turn. The first will have no chance of collision, the second 1 in m, the third 2 in m, ... the last (n-1) in m - assuming there has been no collision so far. So the *average* participant will have an ((n-1)/2) in m of picking a value that has already been picked by someone else. So the chance of *someone* colliding equals the number of participants times the average collision probability i.e.

p = n(n-1)/2m ~ n**2/2m

So the 'paradox' in the birthday paradox comes from the fact that not only is the chance of any one individual hitting upon a value that has been chosen before proportional to the density of choices, but that probability is repeated for every participant, thus making the square of the number of participants the significant factor.

More rigorously, we consider the number of possible arrangements of independent choices. The number of ways that n participants can choose values from a field of m is n**m. For non-colliding choices the first participant has no choice of collision (can choose any value without collision), the second can choose any of the remaining (m-1), the third any of the remaining (m-2), the last any of the remaining (m-n-1). So the number of non-colliding choices is m!/(m-(n-1))! [ m factorial divided by (m-(n-1)) factorial. We can write this as the product of the terms:

  (1 - 0/m).(1-1/m).(1-2/m)....(1-(n-3)/m).(1-(n-2)/m).(1-(n-1)/m)

multiplying this out we have 1 minus terms in (1/m)**1, (1/m)**2, etc.
Taking the terms in (1/m) in pairs (pairing first with last, second with next to last etc. so each pair sums to (n-1) and then multiplying by the number of pairs (n/2) we have:

- 1/m(n-1)(n/2)   ... as by the crude argument above

The terms in (1/m)**2 will have n**3 in the denominator, but for small n/m they do not concern us unduly and the series converges rapidly and with alternating sign. For more than you wanted to know see

http://en.wikipedia.org/wiki/Birthday_problem

Mick
...
[Why I think using an app to calculate these probabilities instead of simple approximation blunts the intellect. Years ago (many many years ago) when I was supervising an undergraduate physics practical class and almost no student could afford their own calculator, the Cavendish had bought a number of desktop calculators for shared use, and I asked a student who was queuing to use one of these what he wanted to calculate - thinking I might display some mental wizardry and get the guy back to work at the same time. He answered "the square root of 25". He got it when I said "think about it".]