Thread Links | Date Links | ||||
---|---|---|---|---|---|
Thread Prev | Thread Next | Thread Index | Date Prev | Date Next | Date Index |
To me, 50% is "expect this to happen, it might not but expect it." Whereas say 10% is "plan on this happening, but don't be supprised if it does not." I have used this in classrooms with success. But then this is what Sir R.A. Fisher had to deal with in 1905 when he set the 'bar' for statistical reporting in botanical experiments in the Royal Academy of Science. The literature of the time was rife with studies that something occured in the field 60% so it was significant (more than half the time!). No, said SIr Fisher, you need 87% to be significant and 95% to be very significant. Or some such; this was a part of my education some 50 years after he set the standard. :) I guess in our case we flip the numbers. If a collision has a 13% chance of occuring that is significant? Ah Stadistics. I HAD the course work for the degree, but did not persue it. Good thing for me and the field... What is next? A discussion on Renewal Theory? How many IP addresses do you have to stock in your warehouse given a field usage rate of N and a restocking time of M? The classic lemma only took a week of lectures to cover. Oh, but that was for lightbulb warehousing... Well, back to comment resolution review. On 02/05/2015 04:00 PM, Paul Lambert
wrote:
I find the metric of a 50% collision probability for a single collision to be the most useful for engineering comparisons. The slope of the probability from very low to high is very steep on a logarithmic scale (powers of 2) at the 50% break point. Knowing where the break point happens (at 50%) gives a uniform way to compare different probabilities for different numbers of bits. It¹s also easy to calculate in your head. Paul On 2/5/15, 11:08 PM, "Rene Struik" <rstruik.ext@xxxxxxxxx> wrote:Hi Mick: In case you are interested in extensions of the birthday attack to k collisions (instead of one), please see the attached SAC 2001 paper of Fabian Kuhn and myself. The basic result is that if the sample space size N is not too small, one can expect to need roughly SQRT(kN) samples to get k collisions (i.e., roughly SQRT(k) times the number for one collision). More precise results are in the paper. It was recently shown (paper to be presented at Eurocrypt 2015) that for the discrete log problem, these "random walks" are essentially the best one can do. Best regards, Rene On 2/5/2015 1:20 AM, Mick Seaman wrote:I have talked with a number of people who have been surprised at the probability of at least one collision occurring when n participants each independently and randomly chose one of of m values. While the answer to this problem, for any particular n, m can be found by using a handy app or searching for a slide presentation, I personally find resorting to electronics to do the calculation unsatisfying as it doesn't provide any insight it to what is happening - or any help with evaluating alternatives. [For a brief anecdote as to why I feel this way see the end of this note]. For the range of collision probabilities that are of interest to us it is trivial to perform the mental calculation. A good approximation for p (probability of at least one collision) is: p = n**2/2m In other words if we have M bits of random space to pick from, and the number of participants can fit in N bits then an upper bound for the probability of at least one collision is: 1 in 2**(M+1-2N) e.g. given 32 bits to pick from, and 2**11 participants picking there is a 1 in 2**(32+1-22) = 1 in 2**11 = 1 in 2048 chance of at least one collision. [Always accepting the questionable assumption that each of the participants is using a good random number generator to make the choice]. The above formula, p = n**2/2m is given on the Wikipedia page for the birthday paradox (I changed my initial calculation to use p, n, and m to agree with that page). The page notes that the approximation is good for p up to 0.5, way beyond the figures of interest to us I suspect. For p = 0.5 we get the simple n ~ sqrt(m), 19 for real birthdays (choice from 365) which is just a little lower than the real 23 (for uniform distribution). It's worth looking at the math to get some feel for what is going on. A crude argument may provide more insight that the full analysis, which I'll give later. Consider each of our n participants picking a number (address) in turn. The first will have no chance of collision, the second 1 in m, the third 2 in m, ... the last (n-1) in m - assuming there has been no collision so far. So the *average* participant will have an ((n-1)/2) in m of picking a value that has already been picked by someone else. So the chance of *someone* colliding equals the number of participants times the average collision probability i.e. p = n(n-1)/2m ~ n**2/2m So the 'paradox' in the birthday paradox comes from the fact that not only is the chance of any one individual hitting upon a value that has been chosen before proportional to the density of choices, but that probability is repeated for every participant, thus making the square of the number of participants the significant factor. More rigorously, we consider the number of possible arrangements of independent choices. The number of ways that n participants can choose values from a field of m is n**m. For non-colliding choices the first participant has no choice of collision (can choose any value without collision), the second can choose any of the remaining (m-1), the third any of the remaining (m-2), the last any of the remaining (m-n-1). So the number of non-colliding choices is m!/(m-(n-1))! [ m factorial divided by (m-(n-1)) factorial. We can write this as the product of the terms: (1 - 0/m).(1-1/m).(1-2/m)....(1-(n-3)/m).(1-(n-2)/m).(1-(n-1)/m) multiplying this out we have 1 minus terms in (1/m)**1, (1/m)**2, etc. Taking the terms in (1/m) in pairs (pairing first with last, second with next to last etc. so each pair sums to (n-1) and then multiplying by the number of pairs (n/2) we have: - 1/m(n-1)(n/2) ... as by the crude argument above The terms in (1/m)**2 will have n**3 in the denominator, but for small n/m they do not concern us unduly and the series converges rapidly and with alternating sign. For more than you wanted to know see http://en.wikipedia.org/wiki/Birthday_problem Mick ... [Why I think using an app to calculate these probabilities instead of simple approximation blunts the intellect. Years ago (many many years ago) when I was supervising an undergraduate physics practical class and almost no student could afford their own calculator, the Cavendish had bought a number of desktop calculators for shared use, and I asked a student who was queuing to use one of these what he wanted to calculate - thinking I might display some mental wizardry and get the guy back to work at the same time. He answered "the square root of 25". He got it when I said "think about it".]-- email: rstruik.ext@xxxxxxxxx | Skype: rstruik cell: +1 (647) 867-5658 | US: +1 (415) 690-7363 --
Robert Moskowitz Owner HTT Consulting C: F: E: There's no limit to what can be accomplished if it doesn't matter who gets the credit |