I’ve been reading Stephen Few’s excellent book, “Now You See It”. Its a big book (300+) pages, packed with practical examples, and one that I think any Entrance user will find useful. The Pareto Chart above is loosely based on one of his examples (p. 196).
In a Pareto Chart, bars are drawn with a line showing accumulated sums. The bars are ordered for decreasing height, so the changes in the sums decrease as well. This makes it possible to read off the largest contributors to the total. In our example, you can see that the first three categories account for 88.9% of total sales.
One way to accumulate sums is to use a MySQL variable, like this:
SELECT @sum := @sum+sales FROM t;
To re-express values in a table column as percentages, use a pattern like this:
SET @total = (SELECT SUM(sales) FROM t);
SELECT (sales / @total) * 100 FROM t;
Putting the pieces together, adding PLOT, and filling in a few details we have the Pareto Chart shown above:
-- DROP TABLE IF EXISTS t;
-- DROP TABLE IF EXISTS t2;
-- the sample data
CREATE TABLE t (what varchar(255), sales double);
INSERT INTO t VALUES
('bagels', 20), ('donuts', 10), ('coffee', 80), ('prune danish', '40'),
('scones', 2.5), ('toast', 5);
-- accumulate totals
SET @sum = 0.0;
CREATE TABLE t2
SELECT what, sales, @sum := sales+@sum AS accum
ORDER BY sales DESC
-- calculate percentages
SET @total = (SELECT SUM(sales) FROM t2);
ALTER TABLE t2 ADD COLUMN accum_pct DOUBLE AFTER accum;
ALTER TABLE t2 ADD COLUMN sales_pct DOULBE AFTER accum;
SETt accum_pct = (accum/@total) * 100.0,
sales_pct = (sales/@total) * 100.0;
-- make the plot
AXISLABELS, LIGHT GRAY BAR,
RED LINE, BLACK FILLED CIRCLE, DATALABELS
SCALE Y 0 100 25
FORMAT Y DECIMAL "#'%'"
TITLE " "
TITLE Y "Percentage of Sales"
TITLE X "Item"
accum_pct, accum_pct, CONCAT(FORMAT(accum_pct, 1), '% ')
ORDER BY sales DESC
Note: You have to be careful about updating user defined variables in this way, as discussed in “High Performance SQL” by Schartz et al. and on Roland Bouman’s blog. This script works because we haven’t made assumptions about the evaluation order of the columns.