
Data Tables

Overview

Data tables, with rows containing observations and columns containing variables or series, are arguably the cornerstone of science. Much of the functionality of Toyplot or any other plotting package can be reduced to a process of mapping data series from tables to properties like coordinates and colors. To facilitate this, Toyplot provides toyplot.data.Table - an ordered, heterogeneous collection of named, equal-length columns, where each column is a Numpy masked array. Toyplot data tables are used for internal storage and manipulation by all of the individual types of plot, and can be useful for managing data prior to ingestion into Toyplot.
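Because each column is a Numpy masked array, missing observations can be marked rather than replaced with placeholder values. The following is a minimal sketch using plain Numpy only; the sample values are made up for illustration and it does not touch Toyplot itself.

```python
import numpy

# A column with one missing reading, flagged by an explicit mask
# (True marks an entry as missing). The values are illustrative only.
column = numpy.ma.array([1.5, 2.5, 4.0], mask=[False, True, False])

# Reductions skip masked entries instead of being corrupted by them.
mean = column.mean()          # averages only 1.5 and 4.0
valid_count = column.count()  # number of unmasked entries

print(mean, valid_count)
```

The mask travels with the data, so a column keeps its full length even when some entries are unknown.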

Be careful not to confuse the data tables described in this section with Table Coordinates, which are used to visualize tabular data.

import toyplot.data
import numpy
table = toyplot.data.Table()
table["x"] = numpy.arange(10)
table["x*2"] = table["x"] * 2
table["x^2"] = table["x"] ** 2
table
x    x*2    x^2
0    0      0
1    2      1
2    4      4
3    6      9
4    8      16
5    10     25
6    12     36
7    14     49
8    16     64
9    18     81

You can see from this small example that Toyplot tables, like other Toyplot objects, provide automatic pretty-printing when used with Jupyter notebooks. Jupyter pretty-printing is provided as a convenience; to create tabular data graphics, you will likely want the additional control provided by Table Coordinates.

A Toyplot table behaves like a Python dict that maps column names (keys) to the column values (1D arrays). For example, you assign and access individual columns using normal indexing notation with column names:

print(table["x"])
print(table["x*2"])
print(table["x^2"])
[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0  1  4  9 16 25 36 49 64 81]

In addition, the keys(), values(), and items() methods also act like their standard library counterparts, providing column-oriented access to the table contents. However, unlike a normal Python dict, a Toyplot table remembers the order in which columns were added to the table, and always returns them in the same order:

for name in table.keys():
    print(name)
x
x*2
x^2

for column in table.values():
    print(column)
[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0  1  4  9 16 25 36 49 64 81]

for name, column in table.items():
    print(name, column)
x [0 1 2 3 4 5 6 7 8 9]
x*2 [ 0  2  4  6  8 10 12 14 16 18]
x^2 [ 0  1  4  9 16 25 36 49 64 81]

That’s all straightforward, but Toyplot tables also behave like Python lists. For example, you can use the normal Python length function to get the number of rows:

len(table)
10

And you can access a single row using its integer index:

table[3]
x    x*2    x^2
3    6      9

And you can retrieve a range of rows using slice notation:

table[2:5]
x    x*2    x^2
2    4      4
3    6      9
4    8      16

You can also retrieve a noncontiguous set of rows using Numpy advanced indexing:

table[[1, 3, 4]]
x    x*2    x^2
1    2      1
3    6      9
4    8      16

Finally, you can mix both forms of indexing (rows and columns) to retrieve arbitrary subsets of a table:

table[3, "x*2"]
x*2
6

table[3:6, ["x", "x*2"]]
x    x*2
3    6
4    8
5    10

table[[1, 3, 4], ["x", "x*2"]]
x    x*2
1    2
3    6
4    8
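To make the dual dict-like and list-like behavior concrete, here is a minimal sketch of how a column-oriented container could dispatch on the type of its index. MiniTable is a hypothetical class written for this illustration; it is not Toyplot's implementation and omits masking, validation, and pretty-printing.

```python
import numpy


class MiniTable(object):
    """Hypothetical stand-in for a column-oriented table (not Toyplot's code)."""

    def __init__(self):
        self._names = []     # column names, in insertion order
        self._columns = {}   # column name -> 1D numpy array

    def __setitem__(self, name, values):
        if name not in self._columns:
            self._names.append(name)
        self._columns[name] = numpy.asarray(values)

    def __len__(self):
        # Number of rows, i.e. the common column length.
        return len(self._columns[self._names[0]]) if self._names else 0

    def keys(self):
        return list(self._names)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._columns[key]        # dict-style: a single column
        if isinstance(key, tuple):
            rows, names = key                # (rows, columns) pair
            if isinstance(names, str):
                names = [names]
        elif isinstance(key, list) and key and all(isinstance(k, str) for k in key):
            rows, names = slice(None), key   # list of names: select or reorder columns
        else:
            rows, names = key, self._names   # int, slice, or list of ints: rows
        if isinstance(rows, int):
            rows = [rows]                    # keep a single row as a length-1 column
        result = MiniTable()
        for name in names:
            result[name] = self._columns[name][rows]
        return result


table = MiniTable()
table["x"] = numpy.arange(10)
table["x*2"] = table["x"] * 2
subset = table[3:6, ["x", "x*2"]]
print(subset.keys(), subset["x*2"])
```

The design choice illustrated here is that strings always refer to columns, while integers, slices, and integer lists always refer to rows, so the two kinds of index can be mixed without ambiguity.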

Passing a sequence of column names allows you to reorder the columns in a table if necessary:

table[["x*2", "x"]]
x*2    x
0      0
2      1
4      2
6      3
8      4
10     5
12     6
14     7
16     8
18     9

Note that all of these operations are taking views into the underlying column storage, so no data is copied.
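For readers who want to check view semantics themselves, the sketch below uses plain Numpy arrays only. In Numpy, basic slices are views onto the original buffer, while indexing with a list of integers normally produces a new array; whether a Toyplot table behaves identically for every indexing form is an assumption this sketch does not verify.

```python
import numpy

column = numpy.arange(10)

sliced = column[2:5]         # basic slice
picked = column[[1, 3, 4]]   # advanced (integer-list) indexing

# shares_memory reports whether two arrays overlap in memory.
slice_shares = numpy.shares_memory(column, sliced)
picked_shares = numpy.shares_memory(column, picked)

# Writing through the slice is visible in the original column.
sliced[0] = -1

print(slice_shares, picked_shares, column[2])
```

If you intend to modify a subset without affecting its source, make an explicit copy first.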

Initialization

In the above example, we created a data table from scratch, adding data one-column-at-a-time to an empty table. However, there are many different ways to create a table. For example, you can pass a dictionary that maps column names to column values:

data = dict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)
Age    Name
45     Tim
32     Fred
43     Jane

You can also pass any other object that implements Python’s collections.Mapping protocol (collections.abc.Mapping in Python 3). Note that the columns are inserted into the table in alphabetical order, because mappings in general do not guarantee an explicit ordering.

If column order matters to you, you can use an instance of collections.OrderedDict instead, which remembers the order in which data was added:

import collections
data = collections.OrderedDict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)
Name    Age
Tim     45
Fred    32
Jane    43
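The two column orderings can be compared without Toyplot. This sketch only inspects key order using the standard library; sorting the plain dictionary's keys is a way to mimic the alphabetical insertion described above, and is an assumption about how to reproduce it rather than Toyplot's own code.

```python
import collections

plain = {"Name": ["Tim", "Fred", "Jane"], "Age": [45, 32, 43]}

ordered = collections.OrderedDict()
ordered["Name"] = ["Tim", "Fred", "Jane"]
ordered["Age"] = [45, 32, 43]

# Alphabetical order, as described for generic mappings.
alphabetical = sorted(plain.keys())

# Insertion order, as remembered by OrderedDict.
insertion = list(ordered.keys())

print(alphabetical, insertion)
```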

Another way to add ordered data to a data table would be to use a sequence of (column name, column values) tuples:

data = [("Name", ["Tim", "Fred", "Jane"]), ("Age", [45, 32, 43])]
toyplot.data.Table(data)
Name    Age
Tim     45
Fred    32
Jane    43

Again, you can use any collections.Sequence type (collections.abc.Sequence in Python 3), not just a Python list as in this example.

You can also treat any two-dimensional numpy array (matrix) as a table:

data = numpy.array([["Tim", 45],["Fred", 32], ["Jane", 43]])
toyplot.data.Table(data)
0       1
Tim     45
Fred    32
Jane    43

Note that the ordering of the matrix columns is retained, and column names are created for you.
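One thing to keep in mind with this form is a Numpy behavior rather than a Toyplot one: a single array holds one dtype, so mixing names and numbers turns every value into a string. The sketch below uses plain Numpy to show this, and how a numeric column can be recovered with astype.

```python
import numpy

data = numpy.array([["Tim", 45], ["Fred", 32], ["Jane", 43]])

# Numpy stores the whole matrix with a single (unicode string) dtype.
kind = data.dtype.kind
first_age = data[0, 1]

# Convert the second matrix column back to integers when numbers are needed.
ages = data[:, 1].astype("int64")

print(kind, repr(first_age), ages)
```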

You can also convert a Pandas data frame into a table:

import pandas
df = pandas.read_csv("temperatures.csv")
toyplot.data.Table(df)[0:5]
STATION            STATION_NAME     DATE      TMAX  TMIN  TOBS
GHCND:USC00294366  JEMEZ DAM NM US  20130101  39    -72   -67
GHCND:USC00294366  JEMEZ DAM NM US  20130102  0     -133  -133
GHCND:USC00294366  JEMEZ DAM NM US  20130103  11    -139  -89
GHCND:USC00294366  JEMEZ DAM NM US  20130104  11    -139  -89
GHCND:USC00294366  JEMEZ DAM NM US  20130105  22    -144  -111

By default, the data frame index isn’t included in the conversion to a table, but you can override this if you wish:

toyplot.data.Table(df, index=True)[0:5]
index0  STATION            STATION_NAME     DATE      TMAX  TMIN  TOBS
0       GHCND:USC00294366  JEMEZ DAM NM US  20130101  39    -72   -67
1       GHCND:USC00294366  JEMEZ DAM NM US  20130102  0     -133  -133
2       GHCND:USC00294366  JEMEZ DAM NM US  20130103  11    -139  -89
3       GHCND:USC00294366  JEMEZ DAM NM US  20130104  11    -139  -89
4       GHCND:USC00294366  JEMEZ DAM NM US  20130105  22    -144  -111

If you don’t like the automatically generated name of the index column, you can provide an alternate name of your own:

toyplot.data.Table(df, index="INDEX")[0:5]
INDEX  STATION            STATION_NAME     DATE      TMAX  TMIN  TOBS
0      GHCND:USC00294366  JEMEZ DAM NM US  20130101  39    -72   -67
1      GHCND:USC00294366  JEMEZ DAM NM US  20130102  0     -133  -133
2      GHCND:USC00294366  JEMEZ DAM NM US  20130103  11    -139  -89
3      GHCND:USC00294366  JEMEZ DAM NM US  20130104  11    -139  -89
4      GHCND:USC00294366  JEMEZ DAM NM US  20130105  22    -144  -111

Note that hierarchical data frame indices will be converted to multiple table columns.

Demonstration

As a convenience for pedagogical purposes only, Toyplot provides basic functionality to load a table from a CSV file - but please note that Toyplot is emphatically not a data manipulation library! For real work you should use the Python standard library csv module to load data, or functionality provided by libraries such as Numpy or Pandas. In the following example, we will load a set of temperature readings into a data table to use for a visualization:

table = toyplot.data.read_csv("temperatures.csv")
table[0:10]
STATION            STATION_NAME     DATE      TMAX  TMIN  TOBS
GHCND:USC00294366  JEMEZ DAM NM US  20130101  39    -72   -67
GHCND:USC00294366  JEMEZ DAM NM US  20130102  0     -133  -133
GHCND:USC00294366  JEMEZ DAM NM US  20130103  11    -139  -89
GHCND:USC00294366  JEMEZ DAM NM US  20130104  11    -139  -89
GHCND:USC00294366  JEMEZ DAM NM US  20130105  22    -144  -111
GHCND:USC00294366  JEMEZ DAM NM US  20130106  44    -122  -100
GHCND:USC00294366  JEMEZ DAM NM US  20130107  56    -122  -11
GHCND:USC00294366  JEMEZ DAM NM US  20130108  100   -83   -78
GHCND:USC00294366  JEMEZ DAM NM US  20130109  72    -83   -33
GHCND:USC00294366  JEMEZ DAM NM US  20130111  89    -50   22

Then, we convert the readings (which are stored as tenths of a degree Celsius) to Fahrenheit:

table["TMAX_F"] = ((table["TMAX"].astype("float64") * 0.1) * 1.8) + 32
table["TMIN_F"] = ((table["TMIN"].astype("float64") * 0.1) * 1.8) + 32
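As a quick check of the arithmetic, the same conversion can be applied to a few of the TMAX and TMIN readings shown in the table above, using a plain Numpy array in place of a table column. A reading of 100 is 10.0 degrees Celsius, which is 50 degrees Fahrenheit.

```python
import numpy

# Readings in tenths of a degree Celsius, taken from the rows shown above.
tenths_c = numpy.array([39, 0, 100, -72])

# Tenths of a degree -> degrees Celsius -> degrees Fahrenheit.
fahrenheit = tenths_c.astype("float64") * 0.1 * 1.8 + 32

print(fahrenheit)
```

Converting to float64 first avoids any integer arithmetic surprises if the source column happens to be stored with an integer or string dtype.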

Finally, we can pass table columns directly to Toyplot for plotting:

canvas = toyplot.Canvas(width=600, height=300)
axes = canvas.cartesian(xlabel="Day", ylabel=u"Temperature \u00b0F")
axes.plot(table["TMAX_F"], color="red", stroke_width=1)
axes.plot(table["TMIN_F"], color="blue", stroke_width=1);
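[Figure: line plot of TMAX_F (red) and TMIN_F (blue). X axis: "Day", ticks from 0 to 400. Y axis: "Temperature °F", ticks from 0 to 100.]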
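The earlier advice to load real data with the standard library csv module can be sketched as follows. To stay self-contained, this example parses a short inline excerpt modeled on the first rows of the temperature table instead of reading temperatures.csv from disk; the excerpt and the variable names are illustrative assumptions.

```python
import csv
import io

import numpy

# Inline excerpt standing in for the contents of a CSV file (illustrative only).
text = u"""STATION,DATE,TMAX,TMIN
GHCND:USC00294366,20130101,39,-72
GHCND:USC00294366,20130102,0,-133
"""

reader = csv.reader(io.StringIO(text))
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows hold the observations

# Transpose rows into (column name, column values) pairs. This is the same
# "sequence of tuples" form described in the Initialization section.
columns = [(name, numpy.array(values)) for name, values in zip(header, zip(*rows))]

# The csv module yields strings, so convert numeric columns explicitly.
tmax = dict(columns)["TMAX"].astype("float64")

print(header, tmax)
```

Keeping the result as an ordered list of pairs preserves the column order of the file.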