Back in the spring of 2011 I ran a little data/stats experiment through Mechanical Turk, asking as many people as I could afford to pick a random number between 1 and 100. I wanted to see whether random numbers picked by a crowd had a characteristic “human” signature.
Unfortunately, that test was made useless by a single spammer in India who figured out how to submit “7” over and over and collect all $15 I’d allocated for the project (“I’d pay $15 to find that out”). So I ended up observing something else I already knew about humans that day instead.
But today I discovered that I may have just been trying to get to what Wikipedia calls Benford’s Law. Somehow I’ve only ever properly studied statistics through high school biology class, and given that Benford’s project happened in 1938 I can only say of my hunch about such a pattern’s existence, “not bad for an amateur!”
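Still, Benford’s law and findings remain fascinating to me. What did he find? In short: in many real-world data sets the leading digit is 1 about 30% of the time, and larger digits turn up progressively less often.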
A popular example:
a list of the heights of the 60 tallest structures in the world by category shows that 1 is by far the most common leading digit, irrespective of the unit of measurement
Isn’t that amazing?
For numbers drawn from certain distributions, for example IQ scores, human heights or other variables following normal distributions, the law is not valid. However, if one “mixes” numbers from those distributions, for example by taking numbers from newspaper articles, Benford’s law reappears.
Two more cases:
This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).
Some well-known infinite integer sequences provably satisfy Benford’s law exactly (in the asymptotic limit as more and more terms of the sequence are included). Among these are the Fibonacci numbers, the factorials, the powers of 2, and the powers of almost any other number.
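The powers-of-2 claim is easy to poke at yourself. Here’s a quick sketch of my own (not from Wikipedia or MathWorld) that tallies the first digits of 2^1 through 2^1000 and prints them next to Benford’s prediction, which for a leading digit d is log10(1 + 1/d). The function names and the cutoff of 1000 terms are just my choices for illustration.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Benford's predicted share of numbers whose first digit is d (1-9)."""
    return math.log10(1 + 1 / d)


def leading_digit_shares(values):
    """Fraction of the positive integers in `values` that start with each digit 1-9."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Arbitrary cutoff: the first 1000 powers of 2 (2^1 .. 2^1000).
powers_of_two = [2 ** n for n in range(1, 1001)]
observed = leading_digit_shares(powers_of_two)

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}  expected {benford_expected(d):.3f}")
```

Swapping `powers_of_two` for a list of Fibonacci numbers or factorials should let you try the other sequences mentioned above; the match is only expected to tighten as more terms are included.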
There are even practical uses. Benford’s Law is borne out well enough that the IRS has used it as a basic check to determine whether tax forms contain inconsistencies (if the distribution of first digits doesn’t follow Benford’s Law, something may be amiss), and apparently it can sometimes also be used to test for election fraud. I can’t, off the cuff, conceive of a reason why this should be so, but I’m fascinated.
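I have no idea what the IRS actually runs, but a first-digit check along these lines can be sketched with an ordinary chi-square goodness-of-fit statistic. Everything here is my own assumption for illustration: the function names, the choice of chi-square as the test, and the 5% cutoff (15.507 is the standard critical value for 8 degrees of freedom).

```python
import math
from collections import Counter

# Standard 5% critical value for a chi-square test with 8 degrees of freedom
# (9 possible first digits minus 1). The 5% level is my assumption.
CHI_SQUARE_CUTOFF = 15.507


def first_digit(x) -> int:
    """First significant digit of a nonzero number, e.g. 0.042 -> 4."""
    return int(str(abs(x)).lstrip("0.")[0])


def benford_chi_square(amounts) -> float:
    """Chi-square distance between observed first digits and Benford's prediction."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


def looks_suspicious(amounts) -> bool:
    """True when the first digits stray further from Benford than the cutoff allows."""
    return benford_chi_square(amounts) > CHI_SQUARE_CUTOFF


# Hypothetical data: every first digit equally common, which is not Benford-like.
evenly_spread = list(range(100, 1000))
print(looks_suspicious(evenly_spread))
```

A high score only says the digits don’t look Benford-like. Plenty of honest data sets don’t follow the law either (see the note on normal distributions above), so this is a prompt to look closer, not proof that anything is amiss.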
Via MathWorld, here’s the graph: