*Update:* Check out Benford’s Law on R-Bloggers (via @BenfordsLaw) for lots, lots more.

---

Back in the spring of 2011 I ran a little data/stats experiment through Mechanical Turk, asking as many people as I could afford to pick a random number between 1 and 100. I wanted to see if there was a characteristic “human” pattern in random numbers picked by a crowd.

Unfortunately, that test was made useless by a single spammer in India who figured out how to submit “7” over and over and collect all $15 I’d allocated for the project (“I’d pay $15 to find that out”), so I ended up observing something else I already knew about humans that day instead.

But today I discovered that I may have just been trying to get to what Wikipedia calls Benford’s Law. Somehow I’ve only ever properly studied statistics through high school biology class, and given that Benford’s project happened in 1938, I can only say of my hunch about such a pattern’s existence, “not bad for an amateur!”

Still, Benford’s law and findings remain fascinating to me. What did he find?

A popular example:

a list of the heights of the 60 tallest structures in the world by category shows that 1 is by far the most common leading digit, irrespective of the unit of measurement

Isn’t that amazing?
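The unit-independence part can be tried out on a small scale. The sketch below is only an illustration under stated assumptions: it does not use the real list of tall structures. Instead it treats the first 1,000 powers of 2 as stand-in “heights in metres”, converts them to feet, and compares how often the leading digit is 1 in each unit. The helper names (`leading_digit`, `share_of_ones`) are made up for this example.

```python
FEET_PER_METRE = 3.28084

def leading_digit(x):
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.16e}"[0])

def share_of_ones(values):
    """Fraction of the values whose leading digit is 1."""
    values = list(values)
    return sum(1 for v in values if leading_digit(v) == 1) / len(values)

# Stand-in data (an assumption, not real heights): powers of 2 as "metres".
metres = [float(2 ** k) for k in range(1, 1001)]
feet = [m * FEET_PER_METRE for m in metres]

share_m = share_of_ones(metres)
share_ft = share_of_ones(feet)
print(f"leading digit 1 in metres: {share_m:.3f}")
print(f"leading digit 1 in feet:   {share_ft:.3f}")
```

If the data follows Benford’s Law, both shares should land near 30%, even though almost every individual number changes its leading digit when the unit changes.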

Further,

For numbers drawn from certain distributions, for example IQ scores, human heights or other variables following normal distributions, the law is not valid. However, if one “mixes” numbers from those distributions, for example by taking numbers from newspaper articles, Benford’s law reappears.

Two more cases:

This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).

Some well-known infinite integer sequences provably satisfy Benford’s law exactly (in the asymptotic limit as more and more terms of the sequence are included). Among these are the Fibonacci numbers, the factorials, the powers of 2, and the powers of almost any other number.
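The powers-of-2 claim is easy to check numerically. Benford’s Law predicts that a leading digit d shows up with probability log10(1 + 1/d). The sketch below compares that prediction with the leading digits of the first 1,000 powers of 2; the function names are my own, and 1,000 terms is an arbitrary cutoff, so expect a close match rather than an exact one.

```python
import math
from collections import Counter

def benford_expected(d):
    """Benford's predicted share for leading digit d (1 through 9)."""
    return math.log10(1 + 1 / d)

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_freqs(values):
    """Map each digit 1-9 to its observed share among the leading digits."""
    values = list(values)
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

powers_of_two = [2 ** k for k in range(1, 1001)]
observed = leading_digit_freqs(powers_of_two)

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}, Benford {benford_expected(d):.3f}")
```

Swapping `powers_of_two` for Fibonacci numbers or factorials is a one-line change if you want to try the other sequences mentioned in the quote.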

There are even practical uses: Benford’s Law is borne out well enough that the IRS has used it as a basic check on whether tax forms contain inconsistencies (if the distribution of first digits doesn’t follow Benford’s Law, something may be amiss), and apparently it can sometimes also be used to test for election fraud. I can’t off-the-cuff conceive of a reason why this should be so, but I’m fascinated.
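To get a feel for what such a first-digit check might look like, here is a minimal sketch. It is not the IRS’s actual procedure, which I don’t know; it just computes a standard chi-square goodness-of-fit statistic against Benford’s expected counts. The two test sets are assumptions chosen for illustration: Fibonacci numbers (known to follow the law) and the integers 100 to 999 (evenly spread leading digits, so they should fail).

```python
import math

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CRITICAL_5PCT_8DF = 15.507

def first_digit_counts(values):
    """Counts of leading digits 1-9 for a list of positive integers."""
    counts = [0] * 10
    for v in values:
        counts[int(str(v)[0])] += 1
    return counts[1:]

def benford_chi_square(values):
    """Chi-square statistic comparing observed first digits to Benford's Law."""
    values = [v for v in values if v >= 1]  # leading digit undefined for 0
    n = len(values)
    stat = 0.0
    for d, obs in enumerate(first_digit_counts(values), start=1):
        expected = n * math.log10(1 + 1 / d)
        stat += (obs - expected) ** 2 / expected
    return stat

def fibonacci(n):
    """First n Fibonacci numbers, starting 1, 1, 2, ..."""
    a, b, out = 1, 1, []
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out

fib_stat = benford_chi_square(fibonacci(500))
uniform_stat = benford_chi_square(list(range(100, 1000)))

print(f"Fibonacci: {fib_stat:.2f} (flagged: {fib_stat > CRITICAL_5PCT_8DF})")
print(f"100..999:  {uniform_stat:.2f} (flagged: {uniform_stat > CRITICAL_5PCT_8DF})")
```

A large statistic only says the digits don’t look Benford-like. Whether that means anything depends on whether the data should have followed the law in the first place.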

Via Mathworld, here’s the graph:

*thanks Jevgenij!*

I don’t think Benford’s Law would actually apply for “random numbers from 1-100”, but I think you’d have found something close to it if you’d given the range 1-200. It relies roughly on this idea: if you gave people the range [1:200], and they chose evenly from among that range, 111/200 of the responses would have a leading digit of 1, 12/200 of them would have a leading digit of 2, while each of the other seven possible leading digits would account for only 11/200. To get the nice curve requires more orders of magnitude, and it’s partly reliant on the assumption that the range of possible values either doesn’t cap out at a round power of 10 or tails off gradually to that rather than being evenly distributed.
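Those counts can be confirmed by brute force. The short sketch below (the function name is just for illustration) tallies leading digits for every integer in 1-200 and, for comparison, in 1-100, which is the same as assuming each number is picked exactly once.

```python
from collections import Counter

def leading_digit_counts(lo, hi):
    """Count how many integers in [lo, hi] start with each digit 1-9."""
    counts = Counter(int(str(n)[0]) for n in range(lo, hi + 1))
    return {d: counts[d] for d in range(1, 10)}

counts_200 = leading_digit_counts(1, 200)
counts_100 = leading_digit_counts(1, 100)

print("1-200:", counts_200)
print("1-100:", counts_100)
```

For 1-200 this gives 111 numbers starting with 1, 12 starting with 2, and 11 for each remaining digit. For 1-100 the counts are nearly flat (12 for digit 1 because of 100 itself, 11 for every other digit), which is why an even spread over that range would not look like Benford’s curve.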

That said, if your experiment hadn’t been taken over by that spammer, I’m almost certain you’d have found a distribution statistically distinguishable from random, because that’s just how people work. This is what the application of Benford’s Law to catch fraud relies on, though again it only works in the right conditions. For example, here’s someone trying to apply it to Russia’s recent election, finding that it doesn’t fit, but ending unconvinced that he’s proven anything because the underlying conditions may not lead to a fit even in the absence of fraud:

http://www.badscience.net/2012/03/is-there-statistical-evidence-of-fraud-in-the-russian-election-data/

Yes, there is a “characteristic “human” aspect to picking random numbers”, at least on an individual level. No, Benford does not work for truly random numbers (because each number has the same chance of showing up as any other number). See the following for how, and with what reasoning, people make up fake numbers:

Gambarara, F., 2004: “Benford Distribution in Science”, ETH Zürich. Available at http://www.socio.ethz.ch/education/mtu/downloads/gambarara_nagy_2004_mtu.pdf

Judge, G. and Schechter, L., 2007: “Detecting Problems in Survey Data using Benford’s Law,” Journal of Human Resources, 44:1–24.

McGregor, G.: “Long Arm of Benford’s ‘Law’ Helps CRA Track Tax Cheats”, Ottawa Citizen; April 29, 2009.

Hill, T., 1988: “Random-Number Guessing and the First Digit Phenomenon”, Psychological Reports 62:967-71.

Both of you — thank you for the thoughtful and reasoned responses!

BTW, if you are interested in learning more about the underlying math:

http://www.quora.com/Why-is-it-that-in-many-data-sets-there-are-about-six-times-more-numbers-starting-with-the-digit-1-than-with-the-digit-9

Good read there

There’s an Excel spreadsheet to investigate Benford’s Law at http://investexcel.net/3420/benfords-law-excel/

That’s awesome! Thanks, Simon!