I’ve been reading Stephen Few’s excellent book, “Now You See It”. Its a big book (300+) pages, packed with practical examples, and one that I think any Entrance user will find useful. The Pareto Chart above is loosely based on one of his examples (p. 196).
In a Pareto Chart, bars are drawn with a line showing accumulated sums. The bars are ordered for decreasing height, so the changes in the sums decrease as well. This makes it possible to read off the largest contributors to the total. In our example, you can see that the first three categories account for 88.9% of total sales.
One way to accumulate sums is to use a MySQL variable, like this:
SET @sum=0;
SELECT @sum := @sum+sales FROM t;
To re-express values in a table column as percentages, use a pattern like this:
SET @total = (SELECT SUM(sales) FROM t);
SELECT (sales / @total) * 100 FROM t;
Putting the pieces together, adding PLOT, and filling in a few details we have the Pareto Chart shown above:
-- DROP TABLE IF EXISTS t;
-- DROP TABLE IF EXISTS t2;
-- the sample data
CREATE TABLE t (what varchar(255), sales double);
INSERT INTO t VALUES
('bagels', 20), ('donuts', 10), ('coffee', 80), ('prune danish', '40'),
('scones', 2.5), ('toast', 5);
-- accumulate totals
SET @sum = 0.0;
CREATE TABLE t2
SELECT what, sales, @sum := sales+@sum AS accum
FROM t
ORDER BY sales DESC
;
-- calculate percentages
SET @total = (SELECT SUM(sales) FROM t2);
ALTER TABLE t2 ADD COLUMN accum_pct DOUBLE AFTER accum;
ALTER TABLE t2 ADD COLUMN sales_pct DOULBE AFTER accum;
UPDATE t2
SETt accum_pct = (accum/@total) * 100.0,
sales_pct = (sales/@total) * 100.0;
-- make the plot
PLOT
AXISLABELS, LIGHT GRAY BAR,
RED LINE, BLACK FILLED CIRCLE, DATALABELS
WITH
SCALE Y 0 100 25
FORMAT Y DECIMAL "#'%'"
NO SIDES
TITLE " "
TITLE Y "Percentage of Sales"
TITLE X "Item"
SELECT
what, sales_pct,
accum_pct, accum_pct, CONCAT(FORMAT(accum_pct, 1), '% ')
FROM t2
ORDER BY sales DESC
;
Note: You have to be careful about updating user defined variables in this way, as discussed in “High Performance SQL” by Schartz et al. and on Roland Bouman’s blog. This script works because we haven’t made assumptions about the evaluation order of the columns.
