[STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head
I have talked with a number of people who have been surprised at the
probability of at least one collision occurring when n participants each
independently and randomly chose one of of m values. While the answer to
this problem, for any particular n, m can be found by using a handy app
or searching for a slide presentation, I personally find resorting to
electronics to do the calculation unsatisfying as it doesn't provide any
insight it to what is happening - or any help with evaluating
alternatives. [For a brief anecdote as to why I feel this way see the
end of this note].
For the range of collision probabilities that are of interest to us it
is trivial to perform the mental calculation. A good approximation for p
(probability of at least one collision) is:
p = n**2/2m
In other words if we have M bits of random space to pick from, and the
number of participants can fit in N bits then an upper bound for the
probability of at least one collision is:
1 in 2**(M+1-2N)
e.g. given 32 bits to pick from, and 2**11 participants picking there is
a 1 in 2**(32+1-22) = 1 in 2**11 = 1 in 2048 chance of at least one
collision. [Always accepting the questionable assumption that each of
the participants is using a good random number generator to make the
choice].
The above formula, p = n**2/2m is given on the Wikipedia page for the
birthday paradox (I changed my initial calculation to use p, n, and m to
agree with that page). The page notes that the approximation is good for
p up to 0.5, way beyond the figures of interest to us I suspect. For p =
0.5 we get the simple n ~ sqrt(m), 19 for real birthdays (choice from
365) which is just a little lower than the real 23 (for uniform
distribution).
It's worth looking at the math to get some feel for what is going on. A
crude argument may provide more insight that the full analysis, which
I'll give later. Consider each of our n participants picking a number
(address) in turn. The first will have no chance of collision, the
second 1 in m, the third 2 in m, ... the last (n-1) in m - assuming
there has been no collision so far. So the *average* participant will
have an ((n-1)/2) in m of picking a value that has already been picked
by someone else. So the chance of *someone* colliding equals the number
of participants times the average collision probability i.e.
p = n(n-1)/2m ~ n**2/2m
So the 'paradox' in the birthday paradox comes from the fact that not
only is the chance of any one individual hitting upon a value that has
been chosen before proportional to the density of choices, but that
probability is repeated for every participant, thus making the square of
the number of participants the significant factor.
More rigorously, we consider the number of possible arrangements of
independent choices. The number of ways that n participants can choose
values from a field of m is n**m. For non-colliding choices the first
participant has no choice of collision (can choose any value without
collision), the second can choose any of the remaining (m-1), the third
any of the remaining (m-2), the last any of the remaining (m-n-1). So
the number of non-colliding choices is m!/(m-(n-1))! [ m factorial
divided by (m-(n-1)) factorial. We can write this as the product of the
terms:
(1 - 0/m).(1-1/m).(1-2/m)....(1-(n-3)/m).(1-(n-2)/m).(1-(n-1)/m)
multiplying this out we have 1 minus terms in (1/m)**1, (1/m)**2, etc.
Taking the terms in (1/m) in pairs (pairing first with last, second with
next to last etc. so each pair sums to (n-1) and then multiplying by the
number of pairs (n/2) we have:
- 1/m(n-1)(n/2) ... as by the crude argument above
The terms in (1/m)**2 will have n**3 in the denominator, but for small
n/m they do not concern us unduly and the series converges rapidly and
with alternating sign. For more than you wanted to know see
http://en.wikipedia.org/wiki/Birthday_problem
Mick
...
[Why I think using an app to calculate these probabilities instead of
simple approximation blunts the intellect. Years ago (many many years
ago) when I was supervising an undergraduate physics practical class and
almost no student could afford their own calculator, the Cavendish had
bought a number of desktop calculators for shared use, and I asked a
student who was queuing to use one of these what he wanted to calculate
- thinking I might display some mental wizardry and get the guy back to
work at the same time. He answered "the square root of 25". He got it
when I said "think about it".]