Archive for October, 2009

Benford’s Law

Friday, October 30th, 2009

Benford’s Law predicts the frequency with which each of the digits 1-9 will appear as the first digit in values taken from a real world data set. Benford’s law holds true often enough that forensic accountants suspect financial fraud when it doesn’t. More about Benford’s Law on Wikipedia. and in “The Drunkard’s Walk” by Leonard Mlodinow.

This will make a table of the Benford’s percentages:

    CREATE TABLE benfordtemp (
        d INT,
        p DOUBLE
    );

    INSERT INTO benfordtemp
        VALUES
        (1, 30.1),
        (2, 17.6),
        (3, 12.5),
        (4, 9.7),
        (5, 7.9),
        (6, 6.7),
        (7, 5.8),
        (8, 5.1),
        (9, 4.6);

This is how to calculate percentages for the first digit counts in a particular data set, in this case country populations:

    CREATE TABLE valtemp (
        k int
    );

    INSERT INTO valtemp
        SELECT SUBSTRING(population, 1, 1)
        FROM Country;

    SET @TOTAL = (SELECT COUNT(*) FROM valtemp);

    SELECT
        k, (COUNT(*) / @total) * 100 ,
        (SELECT p FROM benfordtemp WHERE d=k)
        FROM valtemp
        GROUP BY k
        HAVING k > 0;

Use PLOT to put the two together in a bar chart like the one above:

    PLOT
        AXISLABELS, VERY LIGHT BLUE BARS, VERY BIG BLUE HORTICK
      WITH
        TITLE
          "Benford's Law Applied to Country Populations"
        FORMAT Y DECIMAL "#'%'"
        TITLE "First Digit of Population Value (1-9)"
        LEGEND AT 550 275
        NO SIDES
        NO TICKS X
    SELECT
        k, (COUNT(*) / @total) * 100 ,
        (SELECT p FROM benfordtemp WHERE d=k)
        FROM valtemp
        GROUP BY k
        HAVING k > 0;

The EntranceServlet

Friday, October 30th, 2009

(Please be aware that EntranceServlet is changing soon! Watch for another blog post)

The same scripts you use to make charts in the Entrance Desktop application can be used in the Entrance web application to serve up charts. These charts can be embedded in web pages using HTML IMG tags, just as you would any other image.

The Entrance web application requires Java and a servlet container, like Tomcat or Jetty. To install it, decide on the settings for these parameters:

Name Required? Description
script-root Yes The path to the Entrance script directory. The servlet will only execute scripts from this directory and its subdirectoies.
jdbc-url Yes A JDBC URL. The servlet will only execute scripts from this connection URL. eg. jdbc:mysql://localhost/test
username No The username used to connect to the database, defaults to ‘root’
password No The Password used to connect to the database, defaults to ” (empty string)
allowparameter   No True to allow parameter substituion, using
param.(parameter name)=(value)
 syntax.
WARNING:
Set this ‘true’ only when you are not concerned about “SQL Injection” attacks. If you need parameters, use PreparedStatements and make calls to the Entrance API instead. Defaults to false.

Use the “Install Servlet” tool, in the Entrance Desktop X Menu to make an Entrance Web Application Archive (WAR) file with the settings you want. (If you don’t see it there, follow the instructions below for adding it):

If you don’t have write permissions for the webapps directory on your server, write the Entrance WAR file somewhere else, eg. your home directory, then copy it as a super user.

After deploying Entrance, start and stop the server, then navigate your web browser to:

    http://(the URL of your server)/entrance/

You should see the Entrance welcome screen, which includes a link to a rowser that will let you see the PLOT scripts you have installed.

Use the normal HTML IMG tag to embed Entrance dynamic charts in a web page as you would any online PNG image. Image URLs take the form:

    http://(your server)/entrance/plot/(path to script)

where the ‘path’ is the page from script-root to the script.

Here’s a tip: If you are running Entrance Desktop on the machine where you are running the Entrance web application, add the script-root directory to the Entrance tree using File | Add folder… It will be easy to find and edit Entrance scripts. We’ve used this with remote desktops, too!

Adding InstallServletX
If you have upgraded Entrance from an older version, the installer servlet will not appear in the X Menu unless you have added it. To add the installer: Select “Edit this menu…” from the X Menu, then fill in the dialog that appears to match this:

Editing web.xml
There can be cases when you want to edit the web.xml file youself, without relying on InstallServelX to write the file for you. For example, this is the way to specify mutliple Entrance servlet contexts using different url/user/password combinations. If you need help with this, contact us at “support at entrance.com” or send email to the Entrance Google Group.

The Command Line Utility

Friday, October 30th, 2009

A command line utility is bundled with the Entrance desktop application. This utiility uses the same code to draw charts that the Entrance desktop application uses, but output is sent to the JPG or PNG file you specify.

It can be used to “batch” generation of PNG files from shell scripts.

Usage:

java -jar entrance.jar (options) -c (connection_name)
  -o (output file) (script file)

java -jar entrance.jar (options) -url (jdbc url)
  -u (user) -p (password) -o (output file) (script file)

Options:

-c (Entrance connection name)
-url (jdbc url), eg. jdbc:mysql://(host):(port)/(db)
-u (user name)
-p (password)
-w (integer width of the resulting image)
-h (integer height of the resulting image)
-o (PNG or JPG output file)
-d (JDBC driver class)


Examples:

java -jar entrance.jar
  -url jdbc:mysql://localhost/test -u joe -p bagadonuts
  -o out.png ~/area.sql

java -jar entrance.jar -c test -o out.png ~/area.sql

java -jar entrance.jar -c test -o out.png ~/area.sql
  -w 800 -h 400

The Levy Distribution

Friday, October 30th, 2009

The plot above shows several Levy distributions using formulas provided in the Wikipedia article about them.

Its fairly easy to make function plots like this. Use the Tools | “Generate a reference table…” to generate x values:

then use ALTER, UPDATE and the MySQL Mathematical Functions to add a column and fill it in with function values:

    ALTER TABLE ref1
        ADD COLUMN y double;

    SET @c = 1.0;
    UPDATE ref1
        SET y = SQRT(@c / (PI() * 2.0))
            * (EXP(-@c/ (2.0 * x)))
            / (POW(x, 1.5 )) ;

Finally, do the plot:

    PLOT SCATTER
        X, RED LINE, DARK GREEN LINE, BLUE LINE
      WITH
        TITLE "Levy Distributions"
        SCALE Y 0 1.0 0.25
        FORMAT Y DECIMAL "#.##"

        SCALE Y2 0 1.0 0.25
        NO LABELS Y2
        TICKS X2 LEFT

        SCALE X 0 3.0 1.0
        SCALE X2 0 3.0 1.0
        NO LABELS X2
        TICKS X2 BOTTOM
        LIGHT GRAY GRID LINES
        LEGEND BOTTOM
    SELECT
        x,
        y AS "c = 0.5",
        y2 AS "c = 1.0",
        y3 AS "c = 2.0"
        FROM ref1;

Comparing Today to Yesterday

Friday, October 30th, 2009

This is output from a script I cooked up for comparing today’s visitor counts to yesterday’s. The idea is to see if the counts are in line with normal activity: Hourly counts that went low would mean something wrong with the web site. High counts, which happen occasionally, mean that someone gave Entrance a plug somewhere.

(This chart was made a little after 9 AM this morning, so the red bar for 9 is low representing an incomplete count)

To ensure that any 0 counts show up, I include a LEFT JOIN with a table containing the values 0-23. SKINNY makes the second set of bars 1/3 the width of the first set. Here’s the complete script:

-- hours 0 - 23
create temporary table hours_t
    (h int, today int, yesterday int);

insert into hours_t(h) values
    (0), (1), (2), (3), (4), (5), (6), (7), (8),
    (9), (10), (11), (12), (13), (14), (15), (16),
    (17), (18), (19), (20), (21), (22), (23);

create temporary table today_t
    select h, count(distinct ip) c
      from ACCESS
      where day(dtime) = day(curdate())
      group by h;

create temporary table yesterday_t
    select h, count(distinct ip) c2
      from ACCESS
      where date(dtime) =
        date(date_sub(curdate(), interval 1 day))
      group by h;

plot
    all axislabels, very light blue bar, light red bar
    with
      no sides
      baroverlap 1
      skinny
      comment left "Number of Visitors"
      title x "Hour of Day"
select hours_t.h, c2, c
    from (hours_t LEFT JOIN today_t
    on hours_t.h = today_t.h) LEFT JOIN
    yesterday_t
    on hours_t.h = yesterday_t.h;

With a similar technique, you could compare this months performance to last months, or days in this weeks to last weeks, etc.

Bar Charts

Friday, October 30th, 2009


This PLOT command made the simple bar chart above (click the chart for a complete script):

    PLOT
        AXISLABELS, LIGHT GRAY BARS, LIGHT RED BARS
      WITH
        TITLE "Main Title"
        TITLE X "X Axis"
        NO SIDES
        TITLE Y "Y Axis"
        GRIDLINES HORIZONTAL
        LEGEND BOTTOM
    SELECT
        lbl, Series1, Series2
        FROM foof;

Use OUTLINES to add bar outlines:

Use BARWIDTH (0.0-1.0) to control the bar widths (these are “BARWIDTH .5″):

Use BARSEPARATION (number of pixels) to control the space between bars in individual clusters (these are “BARSEPARATION 10″):

And use BAROVERLAP (0 or 1) to determine whether bars overlap or not (these are “BAROVERLAP 1″):

Sparklines

Friday, October 30th, 2009

Sparklines are “small, simple, intense datawords” like this spark1 this: and this:

Edward Tufte describes sparklines and provides examples in in his book, Beautiful Evidence.

Use Entrance PAGE and FRAME to make sparkline-sized graphics:

    PLOT
        SKYBLUE LINE
      WITH
        PAGE 0 0 100 20
        FRAME 0 0 100 20
        NO XAXIS
        NO YAXIS
        NO SIDES
    SELECT price FROM price_earnings;

Export the resulting images to PNG files from the chart popup. You can also automate PNG generation by running Entrance from the command line or serve sparkline images using EntranceServlet.

Here are a few more examples, along the lines of the ones in Tufte’s book (click the image to see the Entrance script that made it):

This is an HTML table containing three RotatedBarChart sparklines:

this is the first row in the table
this is the second
this is the third

and this is how you script them:

    PLOT ROTATEDBARCHART
        SKYBLUE BAR
      WITH
        PAGE 0 0 200 20
        FRAME 0 0 100 20
        NO YAXIS
        NO SIDES
        NO LABELS X
        NO TICKS X
        SCALE 0 100 100
    SELECT 75;

Bubble Charts

Friday, October 30th, 2009


 
    CREATE TEMPORARY TABLE t(
        l VARCHAR(255), x DOUBLE, y DOUBLE,
      s INT, c VARCHAR(255)
    );

    INSERT INTO t
      VALUES('A', 10,10, 30, 'transparent light green'),
        ('B', 10,20, 40, 'transparent light yellow'),
        ('C', 20, 40, 120, 'transparent light blue'),
        ('D', 40, 60, 180, 'transparent light red'),
        ('E', 55, 50, 70, 'transparent light purple');

    PLOT SCATTER
        X, FILLED CIRCLE, DATALABELS CENTERED,
        SIZE OVERRIDE, COLOR OVERRIDE
      WITH
        SCALE X 0 75 25
        SCALE Y 0 120 40
        NO XAXIS
        NO YAXIS
        NO SIDES
    SELECT x,y, l, s, c
        FROM t;

To make a Bubble Chart, use SCATTER and assign columns for the SIZE OVERRIDE and COLOR OVERRIDE.

The color column should should contain an Entrance color name or the empty string for each row. The size column should contain an integer or null. (Size null causes nothing to be drawn)

Something similar can be used to make a Venn diagrams:


    CREATE TEMPORARY TABLE t(
        lc VARCHAR(255),
        x DOUBLE, y DOUBLE, s INT, c VARCHAR(255)
    );

    INSERT INTO t
        VALUES (
        ' LEFT',
          28, 15, 150, 'very trnasparent light blue'),
        ('RIGHT ',
          18, 15, 150,'very transparent yellow'),
        ('TOP',
          23, 33, 150, 'very transparent light red');

    PLOT SCATTER
        X, FILLED CIRCLE
        DATALABELS CENTERED,
    SIZE OVERRIDE, COLOR OVERRIDE
      WITH
        SCALE X 0 50 25
        SCALE Y 0 50 25
        NO XAXIS
        NO YAXIS
        NO SIDES
    SELECT x,y, lc, s, c
        FROM t;

Please note: Beginning with version 1.4.11, we have changed the units used for SIZE OVERRIDEs. The size is now specified in virtual page coordinates, 1-1000, with 1000 equal to the full width of the current page. With the desktop window sized for the full screen you won’t see much change. However, “bubbles” can now resize nicely to fit on small pages, like iPhone screens.

Legends

Friday, October 30th, 2009



    PLOT
        AXISLABELS, VERY LIGHT BLUE BAR,
        LIGHT GRAY BAR, VERY LIGHT RED BAR
     WITH
        LEGEND
        PAGE THIRD
        OUTLINE
    SELECT
        labels, col1, col2, col3 FROM bartemp;

Use LEGEND to draw a legend and specify its location. By default, the legend is drawn on the right hand side of the chart. Use LEGEND BOTTOM to place it at the bottom:



    PLOT
        AXISLABELS, VERY LIGHT BLUE BAR,
        LIGHT GRAY BAR, VERY LIGHT RED BAR
     WITH
        LEGEND
        PAGE THIRD
        OUTLINE
    SELECT
        labels, col1, col2, col3 FROM bartemp;

Use LEGEND AT (x) (y) to position a vertical legend somewhere else on the page. Specify coordinates using virtual page coordinates, 0-1000.


    PLOT
        AXISLABELS, VERY LIGHT BLUE BAR,
        LIGHT GRAY BAR, VERY LIGHT RED BAR
     WITH
        LEGEND AT 200 300
        PAGE THIRD
        OUTLINE
    SELECT
        labels, col1, col2, col3 FROM bartemp;

The problem with LEGEND AT:: Using LEGEND AT and a fixed FRAME without a fixed PAGE will allow the legend to move in relation to the frame when the page is resized. (The problem is that the FRAME is set using pixels) This will be fixed when we change FRAME to use the same coordinates as LEGEND AT.

Legend makes reasonable choices when a chart mixes types, eg:


    PLOT
        AXISLABELS, VERY LIGHT BLUE AREA,
        LIGHT BLUE BAR, SCARLETRED CIRCLES AND LINE
      WITH
        LEGEND
        PAGE THIRD
        OUTLINE
    SELECT
        labels, col1, col2, col3 FROM bartemp;

Legends for Overrides
There are several tricks you can use to provide Entrance legends for color or size overrides.

Use LEGENDITEM to add a legend item for a bar or area color set with a COLOR OVERRIDE:


    PLOT BARCHART
        LIGHT BLUE BARS, COLOR OVERRIDE
      WITH
        LEGEND RIGHT
        LIGHT RED LEGENDITEM "Interesting"
        NO YAXIS
        NO SIDES
    SELECT
        jan AS 'Not interesting',
          IF(year = 1920 or year = 1921,
            'light red', '') FROM CPI
        LIMIT 12;

Use NULL in the select statement to add additional legend items for line color overrides:


    PLOT BARCHART
        LIGHT GRAY THICK LINE,
        SCARLETRED THICK LINE,
        COLOR OVERRIDE
      WITH
        LEGEND RIGHT
        NO YAXIS
        NO SIDES
        COLLAR
    SELECT
        jan AS 'Not interesting', null as 'Interesting',
        IF(year = 1920 or year = 1921, 'scarletred', '') FROM CPI
        LIMIT 12;

Use AS in a SELECT statement to change the legend label, or hide it using a an empty string:


    PLOT
        AXISLABELS, BIG FILLED CHAMELEON BOX AND LINE,
        BIG FILLED SKYBLUE TRIANGLE AND LINE,
        BIG FILLED SCARLETRED CIRCLE AND LINE
      WITH
        LEGEND
        COLLAR
        NO YAXIS
        NO XAXIS
        NO SIDES
    SELECT
        labels,
        col4 AS "",       -- hidden
        col3 AS "",       -- hidden
        col2 AS "Just Me!"       -- changed
        FROM bartemp;

Use repeated column names to create legend items for series where series type is drawn on top of another:


    PLOT XYCHART
        X,
        LIGHT BLUE THICK LINE,
        BIG PURPLE FILLED BOX,
        GREEN THICK LINE,
        BLUE BIG FILLED CIRCLE
        BUTTER THICK LINE,
        BIG RED FILLED CIRCLE
      WITH
        LEGEND
        NO YAXIS
        NO XAXIS
        NO SIDES
    SELECT
        C0, C1, C1, C2, C2, C3, C3
        FROM carnivalTEMP;

Color a Swath

Friday, October 30th, 2009



    PLOT
        AXISLABELS,
        SWATH,
        RGB(255,64,64) SWATH,
        RGB(255,96,96) SWATH,
        RGB(55,128,128) SWATH,
        RGB(55,160,160) SWATH,
        RGB(255,192,192) SWATH,
        RGB(255,224,224) SWATH

Use SWATH to paint color between two series. SWATHes can be mixed and matched with other series types like LINE, BAR, and AREA.

The first swath you specify determines a baseline. Each swath after the first paints a band of color. Another example:

    PLOT
        AXISLABELS, SWATH, LIGHT RED SWATH,
          LIGHT GREEN SWATH, LIGHT BLUE SWATH
    SELECT
        year, jan, jan*4, jan *5, jan*6
        FROM CPI
        WHERE year < 2007;

Use a completely transparent swath, when you need to start a new baseline:

  PLOT
        AXISLABELS,
        SWATH,
        LIGHT RED SWATH,
        RGBA(255,255,255,0) SWATH,
        LIGHT BLUE SWATH
      WITH
        SOLID THICK GRIDLINES
    SELECT
        year, jan, jan*4 , jan*5,
        jan*6
        FROM CPI
        WHERE year < 2007;

More examples (click an image to see the script that made it):