Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head
- To: STDS-802-PRIVACY@xxxxxxxxxxxxxxxxx
- Subject: Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head
- From: Paul Lambert <paul@xxxxxxxxxxx>
- Date: Thu, 5 Feb 2015 13:00:00 -0800
I find the metric of a 50% collision probability for a single
collision to be the most useful for engineering comparisons.
The slope of the probability from very low to high is very steep
on a logarithmic scale (powers of 2) at the 50% break point.
Knowing where the break point happens (at 50%) gives a
uniform way to compare different probabilities for different
numbers of bits. It¹s also easy to calculate in your head.
Paul
On 2/5/15, 11:08 PM, "Rene Struik" <rstruik.ext@xxxxxxxxx> wrote:
>Hi Mick:
>
>In case you are interested in extensions of the birthday attack to k
>collisions (instead of one), please see the attached SAC 2001 paper of
>Fabian Kuhn and myself. The basic result is that if the sample space
>size N is not too small, one can expect to need roughly SQRT(kN) samples
>to get k collisions (i.e., roughly SQRT(k) times the number for one
>collision). More precise results are in the paper. It was recently shown
>(paper to be presented at Eurocrypt 2015) that for the discrete log
>problem, these "random walks" are essentially the best one can do.
>
>Best regards, Rene
>
>
>On 2/5/2015 1:20 AM, Mick Seaman wrote:
>> I have talked with a number of people who have been surprised at the
>> probability of at least one collision occurring when n participants
>> each independently and randomly chose one of of m values. While the
>> answer to this problem, for any particular n, m can be found by using
>> a handy app or searching for a slide presentation, I personally find
>> resorting to electronics to do the calculation unsatisfying as it
>> doesn't provide any insight it to what is happening - or any help with
>> evaluating alternatives. [For a brief anecdote as to why I feel this
>> way see the end of this note].
>>
>> For the range of collision probabilities that are of interest to us it
>> is trivial to perform the mental calculation. A good approximation for
>> p (probability of at least one collision) is:
>>
>> p = n**2/2m
>>
>> In other words if we have M bits of random space to pick from, and the
>> number of participants can fit in N bits then an upper bound for the
>> probability of at least one collision is:
>>
>> 1 in 2**(M+1-2N)
>>
>> e.g. given 32 bits to pick from, and 2**11 participants picking there
>> is a 1 in 2**(32+1-22) = 1 in 2**11 = 1 in 2048 chance of at least one
>> collision. [Always accepting the questionable assumption that each of
>> the participants is using a good random number generator to make the
>> choice].
>>
>> The above formula, p = n**2/2m is given on the Wikipedia page for the
>> birthday paradox (I changed my initial calculation to use p, n, and m
>> to agree with that page). The page notes that the approximation is
>> good for p up to 0.5, way beyond the figures of interest to us I
>> suspect. For p = 0.5 we get the simple n ~ sqrt(m), 19 for real
>> birthdays (choice from 365) which is just a little lower than the real
>> 23 (for uniform distribution).
>>
>> It's worth looking at the math to get some feel for what is going on.
>> A crude argument may provide more insight that the full analysis,
>> which I'll give later. Consider each of our n participants picking a
>> number (address) in turn. The first will have no chance of collision,
>> the second 1 in m, the third 2 in m, ... the last (n-1) in m -
>> assuming there has been no collision so far. So the *average*
>> participant will have an ((n-1)/2) in m of picking a value that has
>> already been picked by someone else. So the chance of *someone*
>> colliding equals the number of participants times the average
>> collision probability i.e.
>>
>> p = n(n-1)/2m ~ n**2/2m
>>
>> So the 'paradox' in the birthday paradox comes from the fact that not
>> only is the chance of any one individual hitting upon a value that has
>> been chosen before proportional to the density of choices, but that
>> probability is repeated for every participant, thus making the square
>> of the number of participants the significant factor.
>>
>> More rigorously, we consider the number of possible arrangements of
>> independent choices. The number of ways that n participants can choose
>> values from a field of m is n**m. For non-colliding choices the first
>> participant has no choice of collision (can choose any value without
>> collision), the second can choose any of the remaining (m-1), the
>> third any of the remaining (m-2), the last any of the remaining
>> (m-n-1). So the number of non-colliding choices is m!/(m-(n-1))! [ m
>> factorial divided by (m-(n-1)) factorial. We can write this as the
>> product of the terms:
>>
>> (1 - 0/m).(1-1/m).(1-2/m)....(1-(n-3)/m).(1-(n-2)/m).(1-(n-1)/m)
>>
>> multiplying this out we have 1 minus terms in (1/m)**1, (1/m)**2, etc.
>> Taking the terms in (1/m) in pairs (pairing first with last, second
>> with next to last etc. so each pair sums to (n-1) and then multiplying
>> by the number of pairs (n/2) we have:
>>
>> - 1/m(n-1)(n/2) ... as by the crude argument above
>>
>> The terms in (1/m)**2 will have n**3 in the denominator, but for small
>> n/m they do not concern us unduly and the series converges rapidly and
>> with alternating sign. For more than you wanted to know see
>>
>> http://en.wikipedia.org/wiki/Birthday_problem
>>
>> Mick
>> ...
>> [Why I think using an app to calculate these probabilities instead of
>> simple approximation blunts the intellect. Years ago (many many years
>> ago) when I was supervising an undergraduate physics practical class
>> and almost no student could afford their own calculator, the Cavendish
>> had bought a number of desktop calculators for shared use, and I asked
>> a student who was queuing to use one of these what he wanted to
>> calculate - thinking I might display some mental wizardry and get the
>> guy back to work at the same time. He answered "the square root of
>> 25". He got it when I said "think about it".]
>
>
>--
>email: rstruik.ext@xxxxxxxxx | Skype: rstruik
>cell: +1 (647) 867-5658 | US: +1 (415) 690-7363
>