Sunday, March 06, 2011

Skewed distribution

Distributing items (IP packets, phone calls, http requests) randomly onto a set of machines (say, four) is easy (in C here):

int slot = random_nonnegative () % 4;

The number space returned by the random number generator is big enough that the items are essentially evenly distributed even if the number of slots wasn't a divider of the size of the random number space.

I want a skewed distribution, however. The slots shall be loaded unevenly. Thus:

int slot = random_nonnegative () % 4;
slot = random_nonnegative () % (slot + 1);

That is, if slot is 3 after the first line, then we have an even change to get to any of the slots in the second line. If it is 0 we can only end up there. In essence there are four ways to get to slot 0, three to slot 1, two to slot 2, and one to slot 3. Zeroth-order intuition would thus say that four out of ten (1+2+3+4) items would end up in slot zero.

Except that the experimental results don't match. And for a reason. For slot being 3 in the first line, a fourth of the item go to each slot; for slot being 2, a third goes to each of the three first slots, and so on. Thus slot 3 receives a quarter of a quarter, or 1/16th of all items (and not one tenth); slot 2 receives a quarter of a quarter likewise, plus a third of a quarter via slot being 2 on first line; slot 0 finally gets 1/16+1/12+1/8+1/4, which is 3/48+4/48+6/48+12/84, which is 25/48, or about 52%. Which is a bit more than the 40% of our first guess.


char *target = targetarray [
random_nonneg () % (
random_nonneg () % (sizeof (targetarray) /
sizeof (targetarray [0]))
+ 1)];

for the golfing old-style C hacker.)