Pareto Charts

I’ve been reading Stephen Few’s excellent book, “Now You See It”. Its a big book (300+) pages, packed with practical examples, and one that I think any Entrance user will find useful. The Pareto Chart above is loosely based on one of his examples (p. 196).

In a Pareto Chart, bars are drawn with a line showing accumulated sums. The bars are ordered for decreasing height, so the changes in the sums decrease as well. This makes it possible to read off the largest contributors to the total. In our example, you can see that the first three categories account for 88.9% of total sales.

One way to accumulate sums is to use a MySQL variable, like this:

    SET @sum=0;
    SELECT @sum := @sum+sales FROM t;

To re-express values in a table column as percentages, use a pattern like this:

    SET @total = (SELECT SUM(sales) FROM t);
    SELECT (sales / @total) * 100 FROM t;

Putting the pieces together, adding PLOT, and filling in a few details we have the Pareto Chart shown above:

    -- DROP TABLE IF EXISTS t;
    -- DROP TABLE IF EXISTS t2;

-- the sample data
    CREATE TABLE t (what varchar(255), sales double);
    INSERT INTO t VALUES
      ('bagels', 20), ('donuts', 10), ('coffee', 80), ('prune danish', '40'),
      ('scones', 2.5), ('toast', 5);

-- accumulate totals
    SET @sum = 0.0;
    CREATE TABLE t2
    SELECT what, sales, @sum := sales+@sum AS accum
      FROM t
      ORDER BY sales DESC
      ;

-- calculate percentages
    SET @total = (SELECT SUM(sales) FROM t2);
    ALTER TABLE t2 ADD COLUMN accum_pct DOUBLE AFTER accum;
    ALTER TABLE t2 ADD COLUMN sales_pct DOULBE AFTER accum;
    UPDATE t2
      SETt accum_pct = (accum/@total) * 100.0,
      sales_pct = (sales/@total) * 100.0;

-- make the plot
    PLOT
      AXISLABELS, LIGHT GRAY BAR,
      RED LINE, BLACK FILLED CIRCLE, DATALABELS
    WITH
      SCALE Y 0 100 25
      FORMAT Y DECIMAL "#'%'"
      NO SIDES
      TITLE " "
      TITLE Y "Percentage of Sales"
      TITLE X "Item"
    SELECT
      what, sales_pct,
      accum_pct, accum_pct, CONCAT(FORMAT(accum_pct, 1), '% ')
      FROM t2
      ORDER BY sales DESC
      ;

Note: You have to be careful about updating user defined variables in this way, as discussed in “High Performance SQL” by Schartz et al. and on Roland Bouman’s blog. This script works because we haven’t made assumptions about the evaluation order of the columns.

Comments are closed.