Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC a

Thread Links	Date Links
Thread Prev	Thread Next	Thread Index	Date Prev	Date Next	Date Index

Thread Links

Date Links

Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head

To: STDS-802-PRIVACY@xxxxxxxxxxxxxxxxx

Subject: Re: [STDS-802-Privacy] Collision probabilities for randomly chosen MAC addresses for privacy (aka the birthday paradox) - doing the sums in your head

From: Robert Moskowitz <rgm@xxxxxxxxxxxxxxxxxxxx>

Date: Thu, 5 Feb 2015 16:54:35 -0500

Title: Standard

To me, 50% is "expect this to happen, it might not but expect it."

Whereas say 10% is "plan on this happening, but don't be supprised if it does not."

I have used this in classrooms with success.

But then this is what Sir R.A. Fisher had to deal with in 1905 when he set the 'bar' for statistical reporting in botanical experiments in the Royal Academy of Science. The literature of the time was rife with studies that something occured in the field 60% so it was significant (more than half the time!). No, said SIr Fisher, you need 87% to be significant and 95% to be very significant. Or some such; this was a part of my education some 50 years after he set the standard. :)

I guess in our case we flip the numbers. If a collision has a 13% chance of occuring that is significant?

Ah Stadistics. I HAD the course work for the degree, but did not persue it. Good thing for me and the field...

What is next? A discussion on Renewal Theory? How many IP addresses do you have to stock in your warehouse given a field usage rate of N and a restocking time of M? The classic lemma only took a week of lectures to cover. Oh, but that was for lightbulb warehousing...

Well, back to comment resolution review.

On 02/05/2015 04:00 PM, Paul Lambert wrote:

I find the metric of a 50% collision probability for a single
collision to be the most useful for engineering comparisons.

The slope of the probability from very low to high is very steep
on a logarithmic scale (powers of 2) at the 50% break point.
Knowing where the break point happens (at 50%) gives a
uniform way to compare different probabilities for different
numbers of bits.  It¹s also easy to calculate in your head.

Paul


On 2/5/15, 11:08 PM, "Rene Struik" <rstruik.ext@xxxxxxxxx> wrote:

Hi Mick:

In case you are interested in extensions of the birthday attack to k
collisions (instead of one), please see the attached SAC 2001 paper of
Fabian Kuhn and myself. The basic result is that if the sample space
size N is not too small, one can expect to need roughly SQRT(kN) samples
to get k collisions (i.e., roughly SQRT(k) times the number for one
collision). More precise results are in the paper. It was recently shown
(paper to be presented at Eurocrypt 2015) that for the discrete log
problem, these "random walks" are essentially the best one can do.

Best regards, Rene


On 2/5/2015 1:20 AM, Mick Seaman wrote:

I have talked with a number of people who have been surprised at the
probability of at least one collision occurring when n participants
each independently and randomly chose one of of m values. While the
answer to this problem, for any particular n, m can be found by using
a handy app or searching for a slide presentation, I personally find
resorting to electronics to do the calculation unsatisfying as it
doesn't provide any insight it to what is happening - or any help with
evaluating alternatives. [For a brief anecdote as to why I feel this
way see the end of this note].

For the range of collision probabilities that are of interest to us it
is trivial to perform the mental calculation. A good approximation for
p (probability of at least one collision) is:

p = n**2/2m

In other words if we have M bits of random space to pick from, and the
number of participants can fit in N bits then an upper bound for the
probability of at least one collision is:

1 in 2**(M+1-2N)

e.g. given 32 bits to pick from, and 2**11 participants picking there
is a 1 in 2**(32+1-22) = 1 in 2**11 = 1 in 2048 chance of at least one
collision. [Always accepting the questionable assumption that each of
the participants is using a good random number generator to make the
choice].

The above formula, p = n**2/2m is given on the Wikipedia page for the
birthday paradox (I changed my initial calculation to use p, n, and m
to agree with that page). The page notes that the approximation is
good for p up to 0.5, way beyond the figures of interest to us I
suspect. For p = 0.5 we get the simple n ~ sqrt(m), 19 for real
birthdays (choice from 365) which is just a little lower than the real
23 (for uniform distribution).

It's worth looking at the math to get some feel for what is going on.
A crude argument may provide more insight that the full analysis,
which I'll give later. Consider each of our n participants picking a
number (address) in turn. The first will have no chance of collision,
the second 1 in m, the third 2 in m, ... the last (n-1) in m -
assuming there has been no collision so far. So the *average*
participant will have an ((n-1)/2) in m of picking a value that has
already been picked by someone else. So the chance of *someone*
colliding equals the number of participants times the average
collision probability i.e.

p = n(n-1)/2m ~ n**2/2m

So the 'paradox' in the birthday paradox comes from the fact that not
only is the chance of any one individual hitting upon a value that has
been chosen before proportional to the density of choices, but that
probability is repeated for every participant, thus making the square
of the number of participants the significant factor.

More rigorously, we consider the number of possible arrangements of
independent choices. The number of ways that n participants can choose
values from a field of m is n**m. For non-colliding choices the first
participant has no choice of collision (can choose any value without
collision), the second can choose any of the remaining (m-1), the
third any of the remaining (m-2), the last any of the remaining
(m-n-1). So the number of non-colliding choices is m!/(m-(n-1))! [ m
factorial divided by (m-(n-1)) factorial. We can write this as the
product of the terms:

  (1 - 0/m).(1-1/m).(1-2/m)....(1-(n-3)/m).(1-(n-2)/m).(1-(n-1)/m)

multiplying this out we have 1 minus terms in (1/m)**1, (1/m)**2, etc.
Taking the terms in (1/m) in pairs (pairing first with last, second
with next to last etc. so each pair sums to (n-1) and then multiplying
by the number of pairs (n/2) we have:

- 1/m(n-1)(n/2)   ... as by the crude argument above

The terms in (1/m)**2 will have n**3 in the denominator, but for small
n/m they do not concern us unduly and the series converges rapidly and
with alternating sign. For more than you wanted to know see

http://en.wikipedia.org/wiki/Birthday_problem

Mick
...
[Why I think using an app to calculate these probabilities instead of
simple approximation blunts the intellect. Years ago (many many years
ago) when I was supervising an undergraduate physics practical class
and almost no student could afford their own calculator, the Cavendish
had bought a number of desktop calculators for shared use, and I asked
a student who was queuing to use one of these what he wanted to
calculate - thinking I might display some mental wizardry and get the
guy back to work at the same time. He answered "the square root of
25". He got it when I said "think about it".]


-- 
email: rstruik.ext@xxxxxxxxx | Skype: rstruik
cell: +1 (647) 867-5658 | US: +1 (415) 690-7363

--
Robert Moskowitz
Owner
HTT Consulting
C:

248-219-2059
F:

248-968-2824
E:

rgm@xxxxxxxxxxxxxxxxxxxx

There's no limit to what can be accomplished if it doesn't matter who gets the credit