Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head

To: STDS-802-PRIVACY@xxxxxxxxxxxxxxxxx
Subject: Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head
From: Rene Struik <rstruik.ext@xxxxxxxxx>
Date: Thu, 5 Feb 2015 09:08:03 -0500

Hi Mick:

In case you are interested in extensions of the birthday attack to k collisions (instead of one), please see the attached SAC 2001 paper of Fabian Kuhn and myself. The basic result is that if the sample space size N is not too small, one can expect to need roughly SQRT(kN) samples to get k collisions (i.e., roughly SQRT(k) times the number for one collision). More precise results are in the paper. It was recently shown (paper to be presented at Eurocrypt 2015) that for the discrete log problem, these "random walks" are essentially the best one can do.


Best regards, Rene


On 2/5/2015 1:20 AM, Mick Seaman wrote:

I have talked with a number of people who have been surprised at the probability of at least one collision occurring when n participants each independently and randomly choose one of m values. While the answer to this problem, for any particular n, m, can be found by using a handy app or searching for a slide presentation, I personally find resorting to electronics to do the calculation unsatisfying as it doesn't provide any insight into what is happening - or any help with evaluating alternatives. [For a brief anecdote as to why I feel this way see the end of this note].
For the range of collision probabilities that are of interest to us it is trivial to perform the mental calculation. A good approximation for p (probability of at least one collision) is:
p = n**2/2m
In other words if we have M bits of random space to pick from, and the number of participants can fit in N bits then an upper bound for the probability of at least one collision is:
1 in 2**(M+1-2N)
e.g. given 32 bits to pick from, and 2**11 participants picking there is a 1 in 2**(32+1-22) = 1 in 2**11 = 1 in 2048 chance of at least one collision. [Always accepting the questionable assumption that each of the participants is using a good random number generator to make the choice].
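As a cross-check of the 32-bit example, the sketch below compares the mental-arithmetic approximation with the exact product form of the collision probability. The function names are illustrative only, and the code assumes uniformly random, independent choices.

```python
import math

def approx_collision_probability(n: int, m: int) -> float:
    """Mental-arithmetic approximation p = n**2 / (2*m)."""
    return n * n / (2 * m)

def exact_collision_probability(n: int, m: int) -> float:
    """1 - (1 - 0/m)(1 - 1/m)...(1 - (n-1)/m), summed in log space for accuracy."""
    log_no_collision = sum(math.log1p(-i / m) for i in range(n))
    return -math.expm1(log_no_collision)

M_BITS, N_BITS = 32, 11              # the worked example: 32 bits, 2**11 participants
m, n = 2 ** M_BITS, 2 ** N_BITS

approx = approx_collision_probability(n, m)
exact = exact_collision_probability(n, m)

print(f"approximation: 1 in {1 / approx:.0f}")
print(f"exact:         1 in {1 / exact:.1f}")
```

The approximation sits slightly above the exact value, which is consistent with treating it as an upper bound.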
The above formula, p = n**2/2m, is given on the Wikipedia page for the birthday paradox (I changed my initial calculation to use p, n, and m to agree with that page). The page notes that the approximation is good for p up to 0.5, way beyond the figures of interest to us I suspect. For p = 0.5 we get the simple n ~ sqrt(m), 19 for real birthdays (choice from 365) which is just a little lower than the real 23 (for uniform distribution).
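The birthday figures can be checked with a short search for the smallest group whose exact collision probability reaches 0.5. This is a minimal sketch assuming a uniform distribution over m values; smallest_group_for is a hypothetical helper name.

```python
import math

def smallest_group_for(p_target: float, m: int) -> int:
    """Smallest n whose exact collision probability reaches p_target."""
    no_collision = 1.0
    n = 0
    while 1.0 - no_collision < p_target:
        # The next picker must avoid the n values already taken.
        no_collision *= 1.0 - n / m
        n += 1
    return n

m = 365
print("n ~ sqrt(m):", math.isqrt(m))                 # rule-of-thumb estimate, rounded down
print("exact n for p >= 0.5:", smallest_group_for(0.5, m))
```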
It's worth looking at the math to get some feel for what is going on. A crude argument may provide more insight than the full analysis, which I'll give later. Consider each of our n participants picking a number (address) in turn. The first will have no chance of collision, the second 1 in m, the third 2 in m, ... the last (n-1) in m - assuming there has been no collision so far. So the *average* participant will have an ((n-1)/2) in m chance of picking a value that has already been picked by someone else. So the chance of *someone* colliding equals the number of participants times the average collision probability i.e.
p = n(n-1)/2m ~ n**2/2m
So the 'paradox' in the birthday paradox comes from the fact that not only is the chance of any one individual hitting upon a value that has been chosen before proportional to the density of choices, but that probability is repeated for every participant, thus making the square of the number of participants the significant factor.
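The crude argument amounts to adding up the per-participant chances 0/m, 1/m, ..., (n-1)/m. A small sketch with exact rational arithmetic (the function name is illustrative) confirms that this sum equals n(n-1)/2m:

```python
from fractions import Fraction

def crude_collision_estimate(n: int, m: int) -> Fraction:
    """Sum of per-participant collision chances: the k-th picker
    (k = 0 .. n-1) faces k values that are already taken."""
    return sum((Fraction(k, m) for k in range(n)), Fraction(0))

n, m = 2 ** 11, 2 ** 32
estimate = crude_collision_estimate(n, m)

# Closed form from the text, and the n**2/2m simplification for comparison.
print(estimate == Fraction(n * (n - 1), 2 * m))
print(float(estimate), n * n / (2 * m))
```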
More rigorously, we consider the number of possible arrangements of independent choices. The number of ways that n participants can choose values from a field of m is m**n. For non-colliding choices the first participant has no chance of collision (can choose any of the m values), the second can choose any of the remaining (m-1), the third any of the remaining (m-2), the last any of the remaining (m-(n-1)). So the number of non-colliding choices is m!/(m-n)! [m factorial divided by (m-n) factorial]. Dividing by m**n gives the probability of no collision, which we can write as the product of the terms:
  (1 - 0/m).(1-1/m).(1-2/m)....(1-(n-3)/m).(1-(n-2)/m).(1-(n-1)/m)

multiplying this out we have 1 minus terms in (1/m)**1, (1/m)**2, etc.
Taking the terms in (1/m) in pairs (pairing first with last, second with next to last etc. so each pair sums to (n-1)) and then multiplying by the number of pairs (n/2) we have:
-(1/m)(n-1)(n/2)   ... as by the crude argument above
The terms in (1/m)**2 will have n**4 in the numerator, so they are roughly the square of the first-order term; for small n**2/m they do not concern us unduly, and the series converges rapidly and with alternating sign. For more than you wanted to know see
http://en.wikipedia.org/wiki/Birthday_problem
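The counting argument and the series expansion can be checked with exact rational arithmetic. The sketch below assumes m!/(m-n)! collision-free arrangements out of m**n equally likely ones, and uses the birthday numbers (n = 23, m = 365) purely as a small test case; all function names are illustrative.

```python
from fractions import Fraction
from math import perm

def no_collision_by_counting(n: int, m: int) -> Fraction:
    """Collision-free arrangements m!/(m-n)! divided by all m**n arrangements."""
    return Fraction(perm(m, n), m ** n)

def no_collision_by_product(n: int, m: int) -> Fraction:
    """The product (1 - 0/m)(1 - 1/m)...(1 - (n-1)/m)."""
    result = Fraction(1)
    for i in range(n):
        result *= 1 - Fraction(i, m)
    return result

def series_terms(n: int, m: int):
    """Magnitudes of the (1/m)**1 and (1/m)**2 terms of the expanded product."""
    values = range(n)
    e1 = sum(values)                                    # equals n(n-1)/2
    e2 = (e1 * e1 - sum(i * i for i in values)) // 2    # sum of i*j over pairs i < j
    return Fraction(e1, m), Fraction(e2, m * m)

n, m = 23, 365
exact = 1 - no_collision_by_product(n, m)
first, second = series_terms(n, m)

print(no_collision_by_counting(n, m) == no_collision_by_product(n, m))
print(float(first - second), float(exact), float(first))
```

Because the series alternates in sign, stopping after the first-order term overestimates the collision probability and stopping after the second-order term underestimates it.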

Mick
...
[Why I think using an app to calculate these probabilities instead of simple approximation blunts the intellect. Years ago (many many years ago) when I was supervising an undergraduate physics practical class and almost no student could afford their own calculator, the Cavendish had bought a number of desktop calculators for shared use, and I asked a student who was queuing to use one of these what he wanted to calculate - thinking I might display some mental wizardry and get the guy back to work at the same time. He answered "the square root of 25". He got it when I said "think about it".]



--
email: rstruik.ext@xxxxxxxxx | Skype: rstruik
cell: +1 (647) 867-5658 | US: +1 (415) 690-7363

Attachment: kuhn-struik-Pollard-rho-for-multiple-DLPs-SAC-2001.pdf
Description: Adobe PDF document
