Benford’s Law

Benford’s Law predicts how often each of the digits 1-9 appears as the first digit of values in many real-world data sets. Small digits dominate: a value starts with 1 about 30% of the time but with 9 less than 5% of the time. The law holds often enough that forensic accountants become suspicious of financial data whose first digits don’t follow it. You can read more about Benford’s Law on Wikipedia and in “The Drunkard’s Walk” by Leonard Mlodinow.
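For reference, the law is usually stated as a formula for the probability that a value’s first digit is d. For d = 1 it gives log10(2) ≈ 0.301, which is where the 30.1% figure comes from.

    P(d) = \log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}\frac{d + 1}{d}, \qquad d = 1, 2, \ldots, 9

Summed over all nine digits, the terms telescope: log10(2/1) + log10(3/2) + … + log10(10/9) = log10(10) = 1, so the predicted percentages add up to 100%.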

This creates a table of the percentages Benford’s Law predicts for each first digit:

    CREATE TABLE benfordtemp (
        d INT,
        p DOUBLE
    );

    INSERT INTO benfordtemp
        VALUES
        (1, 30.1),
        (2, 17.6),
        (3, 12.5),
        (4, 9.7),
        (5, 7.9),
        (6, 6.7),
        (7, 5.8),
        (8, 5.1),
        (9, 4.6);
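The stored values are rounded to one decimal place. As a quick sanity check, here is a sketch (assuming MySQL, where LOG10() takes a single argument) that recomputes each percentage from the formula and shows it next to the stored value. The column aliases stored_pct and computed_pct are just illustrative names.

    -- Recompute each Benford percentage from P(d) = log10(1 + 1/d)
    -- and show it beside the rounded value stored in benfordtemp
    SELECT
        d,
        p AS stored_pct,
        ROUND(100 * LOG10(1 + 1 / d), 1) AS computed_pct
    FROM benfordtemp
    ORDER BY d;

The two columns should agree for all nine digits.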

This is how to calculate the percentage of values that begin with each digit in a particular data set, in this case country populations:

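    CREATE TABLE valtemp (
        k INT  -- first digit of each population value
    );

    -- Store the first digit of every population value; SUBSTRING
    -- returns a string, which is converted to an integer on insert
    INSERT INTO valtemp
        SELECT SUBSTRING(population, 1, 1)
        FROM Country;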

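    -- Count only values with a leading digit of 1-9; zero populations
    -- have a leading 0 and are left out of the comparison
    SET @total = (SELECT COUNT(*) FROM valtemp WHERE k > 0);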

    SELECT
        k AS first_digit,
        COUNT(*) / @total * 100 AS observed_pct,
        (SELECT p FROM benfordtemp WHERE d = k) AS benford_pct
        FROM valtemp
        WHERE k > 0
        GROUP BY k
        ORDER BY k;
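If you would rather not create valtemp or rely on a session variable, the same comparison can be written as a single query. This is a sketch assuming MySQL (for CAST … AS UNSIGNED) and an integer population column in Country, as above; the derived-table and column names (digits, totals, first_digit, observed_pct, benford_pct, difference) are illustrative. It also adds a column showing how far each observed percentage is from Benford’s prediction.

    -- One-pass version: extract first digits, count them, and join to
    -- the expected percentages stored in benfordtemp
    SELECT
        digits.first_digit,
        ROUND(100 * COUNT(*) / totals.n, 1)        AS observed_pct,
        b.p                                        AS benford_pct,
        ROUND(100 * COUNT(*) / totals.n - b.p, 1)  AS difference
    FROM (
        -- Leading digit of every positive population value (always 1-9)
        SELECT CAST(SUBSTRING(population, 1, 1) AS UNSIGNED) AS first_digit
        FROM Country
        WHERE population > 0
    ) AS digits
    CROSS JOIN (
        -- Denominator: number of values with a leading digit of 1-9
        SELECT COUNT(*) AS n FROM Country WHERE population > 0
    ) AS totals
    JOIN benfordtemp AS b ON b.d = digits.first_digit
    GROUP BY digits.first_digit, totals.n, b.p
    ORDER BY digits.first_digit;

A positive difference means the digit shows up more often than Benford’s Law predicts; a negative one means less often.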

Use PLOT to show the observed and expected percentages side by side in a bar chart like the one above:

    PLOT
        AXISLABELS, VERY LIGHT BLUE BARS, VERY BIG BLUE HORTICK
      WITH
        TITLE
          "Benford's Law Applied to Country Populations"
        FORMAT Y DECIMAL "#'%'"
        TITLE "First Digit of Population Value (1-9)"
        LEGEND AT 550 275
        NO SIDES
        NO TICKS X
    SELECT
        k, (COUNT(*) / @total) * 100,
        (SELECT p FROM benfordtemp WHERE d = k)
        FROM valtemp
        WHERE k > 0
        GROUP BY k
        ORDER BY k;
