Data Tables

Overview

Data tables, with rows containing observations and columns containing variables or series, are arguably the cornerstone of science. Much of the functionality of Toyplot or any other plotting package can be reduced to a process of mapping data series from tables to properties like coordinates and colors. To facilitate this, Toyplot provides toyplot.data.Table - an ordered, heterogeneous collection of named, equal-length columns, where each column is a Numpy masked array. Toyplot data tables are used for internal storage and manipulation by all of the individual types of plot, and can be useful for managing data prior to ingestion into Toyplot.

Be careful not to confuse the data tables described in this section with Table Coordinates, which are used to visualize tabular data.

import toyplot.data
import numpy

table = toyplot.data.Table()
table["x"] = numpy.arange(10)
table["x*2"] = table["x"] * 2
table["x^2"] = table["x"] ** 2
table

x	x*2	x^2
0	0	0
1	2	1
2	4	4
3	6	9
4	8	16
5	10	25
6	12	36
7	14	49
8	16	64
9	18	81

You can see from this small example that Toyplot tables, like other Toyplot objects, provide automatic pretty-printing when used with Jupyter notebooks. This pretty-printing is a convenience; to create tabular data graphics, you will likely want the additional control provided by Table Coordinates.

A Toyplot table behaves like a Python dict that maps column names (keys) to column values (1D arrays). For example, you can assign and access individual columns using normal indexing notation with column names:

print(table["x"])
print(table["x*2"])
print(table["x^2"])

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0  1  4  9 16 25 36 49 64 81]
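Because each column is a Numpy masked array, missing observations can be masked rather than removed. The following is a minimal sketch that uses numpy.ma directly; the values are made up for illustration and no Toyplot calls are involved:

```python
import numpy

# A column with one missing observation: the third value is masked.
column = numpy.ma.masked_array(
    [1.0, 2.0, 0.0, 4.0],
    mask=[False, False, True, False],
)

# Reductions skip masked entries.
mean = column.mean()    # average of 1.0, 2.0 and 4.0
count = column.count()  # number of unmasked values

# Arithmetic carries the mask over to the result.
doubled = column * 2
print(mean, count, doubled)
```

Columns built from ordinary arrays simply have nothing masked, so they behave like regular Numpy arrays in expressions such as table["x"] * 2.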

The keys(), values(), and items() methods also act like their standard library counterparts, providing column-oriented access to the table contents. A Toyplot table remembers the order in which columns were added, and always returns them in that order:

for name in table.keys():
    print(name)

x
x*2
x^2

for column in table.values():
    print(column)

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0  1  4  9 16 25 36 49 64 81]

for name, column in table.items():
    print(name, column)

x [0 1 2 3 4 5 6 7 8 9]
x*2 [ 0  2  4  6  8 10 12 14 16 18]
x^2 [ 0  1  4  9 16 25 36 49 64 81]

That’s all straightforward, but Toyplot tables also behave like Python lists. For example, you can use the normal Python length function to get the number of rows:

len(table)

10

And you can access a single row using its integer index:

table[3]

x	x*2	x^2
3	6	9

And you can retrieve a range of rows using slice notation:

table[2:5]

x	x*2	x^2
2	4	4
3	6	9
4	8	16

You can also retrieve a noncontiguous set of rows using Numpy advanced indexing:

table[[1, 3, 4]]

x	x*2	x^2
1	2	1
3	6	9
4	8	16

Finally, you can mix both forms of indexing (rows and columns) to retrieve arbitrary subsets of a table:

table[3, "x*2"]

x*2
6

table[3:6, ["x", "x*2"]]

x	x*2
3	6
4	8
5	10

table[[1, 3, 4], ["x", "x*2"]]

x	x*2
1	2
3	6
4	8

Passing a sequence of column names allows you to reorder the columns in a table if necessary:

table[["x*2", "x"]]

x*2	x
0	0
2	1
4	2
6	3
8	4
10	5
12	6
14	7
16	8
18	9

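Note that column selection and slice-based row selection take views into the underlying column storage, so no data is copied. Selecting rows with a list of indices uses Numpy advanced indexing, which returns a copy in Numpy itself.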
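The view-versus-copy behavior is inherited from Numpy. The sketch below uses a plain Numpy array as a stand-in for a table column (an assumption; it does not call Toyplot) and uses numpy.shares_memory to check which kinds of indexing share storage with the original:

```python
import numpy

column = numpy.arange(10)

sliced = column[2:5]        # basic slicing
picked = column[[1, 3, 4]]  # advanced indexing with a list of positions

# Basic slices are views onto the same buffer.
slice_is_view = numpy.shares_memory(column, sliced)

# Advanced indexing allocates a new array.
picked_is_view = numpy.shares_memory(column, picked)

# Writing through the view changes the original; writing to the copy does not.
sliced[0] = 100
picked[0] = -1
print(slice_is_view, picked_is_view, column[:5])
```

If you intend to modify a subset without touching the source data, taking an explicit copy of the selected values is the safer choice.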

Initialization

In the above example, we created a data table from scratch, adding data one column at a time to an empty table. However, there are many other ways to create a table. For example, you can pass a dictionary that maps column names to column values:

data = dict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)

Age	Name
45	Tim
32	Fred
43	Jane

You can also pass any other object that implements Python’s collections.abc.Mapping protocol. Note that the columns are inserted into the table in alphabetical order by name (Age before Name in the output above) rather than in insertion order, because a plain dictionary or mapping is not treated as having an explicit ordering.
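A quick way to predict the resulting column order is to sort the keys yourself. This is a minimal sketch in plain Python, assuming (as the output above suggests) that the ordering is an ordinary alphabetical sort on column names:

```python
data = dict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]

# The order in which the keys were added to the dictionary.
insertion_order = list(data.keys())

# The order in which the columns appear in the table shown above.
alphabetical_order = sorted(data.keys())

print(insertion_order)
print(alphabetical_order)
```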

If column order matters to you, you can use an instance of collections.OrderedDict instead, which remembers the order in which data was added:

import collections
data = collections.OrderedDict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)

Name	Age
Tim	45
Fred	32
Jane	43

Another way to create a table with an explicit column order is to pass a sequence of (column name, column values) tuples:

data = [("Name", ["Tim", "Fred", "Jane"]), ("Age", [45, 32, 43])]
toyplot.data.Table(data)

Name	Age
Tim	45
Fred	32
Jane	43

Again, you can use any collections.abc.Sequence type, not just a Python list as in this example.

You can also treat any two-dimensional numpy array (matrix) as a table:

data = numpy.array([["Tim", 45], ["Fred", 32], ["Jane", 43]])
toyplot.data.Table(data)

0	1
Tim	45
Fred	32
Jane	43

Note that the ordering of the matrix columns is retained, and column names are created for you.
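One thing to keep in mind with the matrix form is that a Numpy array has a single dtype, so mixing names and numbers turns every value into a string. The sketch below uses Numpy only; what a table does with such a column afterwards is not covered here:

```python
import numpy

data = numpy.array([["Tim", 45], ["Fred", 32], ["Jane", 43]])

# Mixed strings and integers are promoted to a unicode string dtype.
kind = data.dtype.kind

# The second matrix column therefore holds text such as "45", not numbers.
first_age = data[0, 1]

# Convert explicitly when numeric values are needed.
ages = data[:, 1].astype("int64")
print(kind, repr(first_age), ages)
```

If the columns have different types, the dictionary or sequence-of-tuples forms described above keep each column's own type.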

You can also convert a Pandas data frame into a table:

import pandas
df = pandas.read_csv(toyplot.data.temperatures.path)
toyplot.data.Table(df)[0:5]

STATION	STATION_NAME	DATE	TMAX	TMIN	TOBS
GHCND:USC00294366	JEMEZ DAM NM US	20130101	39	-72	-67
GHCND:USC00294366	JEMEZ DAM NM US	20130102	0	-133	-133
GHCND:USC00294366	JEMEZ DAM NM US	20130103	11	-139	-89
GHCND:USC00294366	JEMEZ DAM NM US	20130104	11	-139	-89
GHCND:USC00294366	JEMEZ DAM NM US	20130105	22	-144	-111

By default, the data frame index isn’t included in the conversion to a table, but you can override this if you wish:

toyplot.data.Table(df, index=True)[0:5]

index0	STATION	STATION_NAME	DATE	TMAX	TMIN	TOBS
0	GHCND:USC00294366	JEMEZ DAM NM US	20130101	39	-72	-67
1	GHCND:USC00294366	JEMEZ DAM NM US	20130102	0	-133	-133
2	GHCND:USC00294366	JEMEZ DAM NM US	20130103	11	-139	-89
3	GHCND:USC00294366	JEMEZ DAM NM US	20130104	11	-139	-89
4	GHCND:USC00294366	JEMEZ DAM NM US	20130105	22	-144	-111

If you don’t like the auto-generated name of the index column, you can provide an alternate name of your own:

toyplot.data.Table(df, index="INDEX")[0:5]

INDEX	STATION	STATION_NAME	DATE	TMAX	TMIN	TOBS
0	GHCND:USC00294366	JEMEZ DAM NM US	20130101	39	-72	-67
1	GHCND:USC00294366	JEMEZ DAM NM US	20130102	0	-133	-133
2	GHCND:USC00294366	JEMEZ DAM NM US	20130103	11	-139	-89
3	GHCND:USC00294366	JEMEZ DAM NM US	20130104	11	-139	-89
4	GHCND:USC00294366	JEMEZ DAM NM US	20130105	22	-144	-111

Note that hierarchical data frame indices will be converted to multiple table columns.
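To get a feel for what a hierarchical index looks like once it is spread across several columns, the sketch below flattens one on the Pandas side with reset_index. The data frame and its level names are made up for illustration, and the names Toyplot gives the resulting index columns may differ:

```python
import pandas

# A small frame with a two-level (hierarchical) index; values are illustrative.
index = pandas.MultiIndex.from_tuples(
    [("NM", 2013), ("NM", 2014), ("CO", 2013)],
    names=["state", "year"],
)
df = pandas.DataFrame({"TMAX": [39, 41, 35]}, index=index)

# reset_index() moves each index level into an ordinary column.
flat = df.reset_index()
flat_columns = list(flat.columns)
print(flat_columns)
```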

Demonstration

As a convenience, for pedagogical purposes only, Toyplot provides basic functionality to load a table from a CSV file, but please note that Toyplot is emphatically not a data manipulation library! For real work you should use the Python standard library csv module to load data, or functionality provided by libraries such as Numpy or Pandas. In the following example, we will load a set of temperature readings into a data table to use for a visualization:

table = toyplot.data.read_csv(toyplot.data.temperatures.path)
table[0:10]

STATION	STATION_NAME	DATE	TMAX	TMIN	TOBS
GHCND:USC00294366	JEMEZ DAM NM US	20130101	39	-72	-67
GHCND:USC00294366	JEMEZ DAM NM US	20130102	0	-133	-133
GHCND:USC00294366	JEMEZ DAM NM US	20130103	11	-139	-89
GHCND:USC00294366	JEMEZ DAM NM US	20130104	11	-139	-89
GHCND:USC00294366	JEMEZ DAM NM US	20130105	22	-144	-111
GHCND:USC00294366	JEMEZ DAM NM US	20130106	44	-122	-100
GHCND:USC00294366	JEMEZ DAM NM US	20130107	56	-122	-11
GHCND:USC00294366	JEMEZ DAM NM US	20130108	100	-83	-78
GHCND:USC00294366	JEMEZ DAM NM US	20130109	72	-83	-33
GHCND:USC00294366	JEMEZ DAM NM US	20130111	89	-50	22

Then, we convert the readings (which are stored as tenths of a degree Celsius) to Fahrenheit:

table["TMAX_F"] = ((table["TMAX"].astype("float64") * 0.1) * 1.8) + 32
table["TMIN_F"] = ((table["TMIN"].astype("float64") * 0.1) * 1.8) + 32
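As a worked check of the conversion, the sketch below applies the same arithmetic to the first three TMAX readings shown above. It uses a plain Numpy array of strings as a stand-in for the table column, on the assumption that values parsed from a CSV file arrive as text, which is why the astype("float64") step is needed:

```python
import numpy

# First three TMAX readings from the table above, in tenths of a degree Celsius.
tmax = numpy.array(["39", "0", "11"])

# Tenths of a degree -> degrees Celsius -> degrees Fahrenheit.
tmax_c = tmax.astype("float64") * 0.1
tmax_f = (tmax_c * 1.8) + 32

print(tmax_c)
print(tmax_f)
```

A reading of 39 is 3.9 degrees Celsius, which is about 39.02 degrees Fahrenheit, and a reading of 0 maps to exactly 32.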

Finally, we can pass table columns directly to Toyplot for plotting:

canvas = toyplot.Canvas(width=600, height=300)
axes = canvas.cartesian(xlabel="Day", ylabel=u"Temperature \u00b0F")
axes.plot(table["TMAX_F"], color="red", stroke_width=1)
axes.plot(table["TMIN_F"], color="blue", stroke_width=1);
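For loading data outside of a demonstration, the standard library csv module mentioned at the start of this section is enough to build a dictionary of columns. This is a minimal sketch that reads from an in-memory string instead of a file; the sample rows reuse a few fields from the table above, and the variable names are illustrative:

```python
import csv
import io

# In-memory stand-in for a CSV file, using a subset of the fields shown above.
raw = io.StringIO(
    "STATION_NAME,DATE,TMAX,TMIN\n"
    "JEMEZ DAM NM US,20130101,39,-72\n"
    "JEMEZ DAM NM US,20130102,0,-133\n"
)

reader = csv.DictReader(raw)

# Build one list per column, keyed by the header names.
columns = {name: [] for name in reader.fieldnames}
for row in reader:
    for name, value in row.items():
        columns[name].append(value)

print(columns["TMAX"])
```

The result is a mapping from column names to equal-length lists of strings, which is the shape described in the Initialization section; numeric columns still need an explicit conversion before arithmetic.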