Data Tables
Overview
Data tables, with rows containing observations and columns
containing variables or series, are arguably the cornerstone of
science. Much of the functionality of Toyplot or any other plotting
package can be reduced to a process of mapping data series from tables
to properties like coordinates and colors. To facilitate this, Toyplot
provides toyplot.data.Table, an ordered, heterogeneous
collection of named, equal-length columns, where each column is a
Numpy masked array. Toyplot data tables are used for internal storage
and manipulation by all of the individual types of plot, and can be
useful for managing data prior to ingestion into Toyplot.
Be careful not to confuse the data tables described in this section with Table Coordinates, which are used to visualize tabular data.
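Because each column is a Numpy masked array, missing observations can be marked rather than deleted. The following is a minimal sketch using numpy.ma directly, without Toyplot; the readings and the -9999.0 sentinel are made-up values chosen only for illustration.

```python
import numpy

# Hypothetical readings; the third one is missing and recorded as a sentinel.
raw = numpy.array([39.0, 0.0, -9999.0, 11.0])

# Mask the sentinel so that later computations ignore it.
column = numpy.ma.masked_equal(raw, -9999.0)

print(column)          # the masked entry is displayed as --
print(column.count())  # number of unmasked values
print(column.mean())   # mean over the unmasked values only
```

The masked entry keeps its position, so the column stays the same length as the other columns in a table.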
import toyplot.data
import numpy
table = toyplot.data.Table()
table["x"] = numpy.arange(10)
table["x*2"] = table["x"] * 2
table["x^2"] = table["x"] ** 2
table
x | x*2 | x^2 |
---|---|---|
0 | 0 | 0 |
1 | 2 | 1 |
2 | 4 | 4 |
3 | 6 | 9 |
4 | 8 | 16 |
5 | 10 | 25 |
6 | 12 | 36 |
7 | 14 | 49 |
8 | 16 | 64 |
9 | 18 | 81 |
You can see from this small example that Toyplot tables, like other Toyplot objects, are pretty-printed automatically in Jupyter notebooks. This pretty-printing is provided as a convenience; to create tabular data graphics, you will likely want the additional control provided by Table Coordinates.
A Toyplot table behaves like a Python dict that maps column names (keys) to the column values (1D arrays). For example, you assign and access individual columns using normal indexing notation with column names:
print(table["x"])
print(table["x*2"])
print(table["x^2"])
[0 1 2 3 4 5 6 7 8 9]
[ 0 2 4 6 8 10 12 14 16 18]
[ 0 1 4 9 16 25 36 49 64 81]
In addition, the keys(), values(), and items() methods
act like their standard library counterparts, providing column-oriented
access to the table contents. A Toyplot table remembers the order in
which columns were added to the table, and always returns them in that
order:
for name in table.keys():
    print(name)
x
x*2
x^2
for column in table.values():
    print(column)
[0 1 2 3 4 5 6 7 8 9]
[ 0 2 4 6 8 10 12 14 16 18]
[ 0 1 4 9 16 25 36 49 64 81]
for name, column in table.items():
    print(name, column)
x [0 1 2 3 4 5 6 7 8 9]
x*2 [ 0 2 4 6 8 10 12 14 16 18]
x^2 [ 0 1 4 9 16 25 36 49 64 81]
That’s all straightforward, but Toyplot tables also behave like Python lists. For example, you can use the normal Python length function to get the number of rows:
len(table)
10
And you can access a single row using its integer index:
table[3]
x | x*2 | x^2 |
---|---|---|
3 | 6 | 9 |
And you can retrieve a range of rows using slice notation:
table[2:5]
x | x*2 | x^2 |
---|---|---|
2 | 4 | 4 |
3 | 6 | 9 |
4 | 8 | 16 |
You can also retrieve a noncontiguous set of rows using Numpy advanced indexing:
table[[1, 3, 4]]
x | x*2 | x^2 |
---|---|---|
1 | 2 | 1 |
3 | 6 | 9 |
4 | 8 | 16 |
Finally, you can mix both forms of indexing (rows and columns) to retrieve arbitrary subsets of a table:
table[3, "x*2"]
x*2 |
---|
6 |
table[3:6, ["x", "x*2"]]
x | x*2 |
---|---|
3 | 6 |
4 | 8 |
5 | 10 |
table[[1, 3, 4], ["x", "x*2"]]
x | x*2 |
---|---|
1 | 2 |
3 | 6 |
4 | 8 |
Passing a sequence of column names allows you to reorder the columns in a table if necessary:
table[["x*2", "x"]]
x*2 | x |
---|---|
0 | 0 |
2 | 1 |
4 | 2 |
6 | 3 |
8 | 4 |
10 | 5 |
12 | 6 |
14 | 7 |
16 | 8 |
18 | 9 |
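Note that row slices and column selections take views into the underlying column storage, so no data is copied. Selecting rows with a list of indices follows Numpy's advanced indexing rules, which produce a copy.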
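Whether an indexing operation yields a view or a copy is ultimately decided by Numpy. The following sketch uses a plain Numpy array rather than a Toyplot table, so it illustrates Numpy's rules and not Toyplot's internals: a basic slice shares memory with the original array, while selection with a list of integers produces a new array.

```python
import numpy

column = numpy.arange(10)

sliced = column[2:5]        # basic slicing
picked = column[[1, 3, 4]]  # integer-list (advanced) indexing

print(numpy.shares_memory(column, sliced))  # True: a view
print(numpy.shares_memory(column, picked))  # False: a copy

# Writing through the view changes the original array.
sliced[0] = 100
print(column[2])  # 100
```

numpy.shares_memory is a convenient way to check any result whose storage you intend to modify in place.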
Initialization
In the above example, we created a data table from scratch, adding data one column at a time to an empty table. However, there are many other ways to create a table. For example, you can pass a dictionary that maps column names to column values:
data = dict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)
Age | Name |
---|---|
45 | Tim |
32 | Fred |
43 | Jane |
You can also pass any other object that implements Python's
collections.abc.Mapping protocol. Note that the columns are
inserted into the table in alphabetical order, as in the output above;
the insertion order of a plain dictionary or mapping is not used.
If column order matters to you, you can use an instance of
collections.OrderedDict instead, which remembers the order in
which data was added:
import collections
data = collections.OrderedDict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)
Name | Age |
---|---|
Tim | 45 |
Fred | 32 |
Jane | 43 |
Another way to add ordered data to a data table is to use a sequence of (column name, column values) tuples:
data = [("Name", ["Tim", "Fred", "Jane"]), ("Age", [45, 32, 43])]
toyplot.data.Table(data)
Name | Age |
---|---|
Tim | 45 |
Fred | 32 |
Jane | 43 |
Again, you can use any collections.abc.Sequence type, not
just a Python list as in this example.
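If your data already lives in a plain dictionary, one way to control column order is to build the sequence of tuples yourself. This is a small sketch in pure Python; the column_order list is a hypothetical choice, and the sorted() call only mirrors the alphabetical ordering seen in the plain-dictionary output earlier.

```python
import collections.abc

data = {"Name": ["Tim", "Fred", "Jane"], "Age": [45, 32, 43]}

# Alphabetical order, matching the plain-dictionary output shown earlier.
alphabetical = sorted(data.keys())

# Explicit order: a list of (column name, column values) tuples.
column_order = ["Name", "Age"]  # hypothetical choice of ordering
pairs = [(name, data[name]) for name in column_order]

print(alphabetical)
print([name for name, values in pairs])
print(isinstance(pairs, collections.abc.Sequence))
```

The pairs list has the same shape as the sequence passed to toyplot.data.Table in the example above.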
You can also treat any two-dimensional numpy array (matrix) as a table:
data = numpy.array([["Tim", 45],["Fred", 32], ["Jane", 43]])
toyplot.data.Table(data)
0 | 1 |
---|---|
Tim | 45 |
Fred | 32 |
Jane | 43 |
Note that the ordering of the matrix columns is retained, and column names are created for you.
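One detail worth knowing about the matrix above is that a Numpy array has a single dtype, so mixing names and ages turns every element into a string. The sketch below uses Numpy only; generating the names "0" and "1" from the column positions is an assumption that mirrors the output shown, not a description of Toyplot's implementation.

```python
import numpy

data = numpy.array([["Tim", 45], ["Fred", 32], ["Jane", 43]])

# "U" means unicode strings: the ages were converted to text.
print(data.dtype.kind)

# Split the matrix into columns named after their positions.
names = [str(i) for i in range(data.shape[1])]
columns = [data[:, i] for i in range(data.shape[1])]

# Convert the age column back to integers before doing arithmetic.
ages = columns[1].astype("int64")
print(names)
print(ages.sum())
```

If you need numeric columns, it may be simpler to pass each column separately, as in the dictionary and sequence examples, so that each keeps its own type.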
You can also convert a Pandas data frame into a table:
import pandas
df = pandas.read_csv(toyplot.data.temperatures.path)
toyplot.data.Table(df)[0:5]
STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|
GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
By default, the data frame index isn’t included in the conversion to a table, but you can override this if you wish:
toyplot.data.Table(df, index=True)[0:5]
index0 | STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|---|
0 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
1 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
2 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
3 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
4 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
If you don’t like the automatically generated name of the index column, you can provide a name of your own:
toyplot.data.Table(df, index="INDEX")[0:5]
INDEX | STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|---|
0 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
1 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
2 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
3 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
4 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
Note that hierarchical data frame indices will be converted to multiple table columns.
Demonstration
As a convenience, for pedagogical purposes only, Toyplot provides
basic functionality to load a table from a CSV file, but please note
that Toyplot is emphatically not a data manipulation library! For real
work you should use the Python standard library csv module to
load data, or functionality provided by libraries such as Numpy or
Pandas. In the following example, we will load a set of temperature
readings into a data table to use for a visualization:
table = toyplot.data.read_csv(toyplot.data.temperatures.path)
table[0:10]
STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|
GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130106 | 44 | -122 | -100 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130107 | 56 | -122 | -11 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130108 | 100 | -83 | -78 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130109 | 72 | -83 | -33 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130111 | 89 | -50 | 22 |
Then, we convert the readings (which are stored as tenths of a degree Celsius) to Fahrenheit:
table["TMAX_F"] = ((table["TMAX"].astype("float64") * 0.1) * 1.8) + 32
table["TMIN_F"] = ((table["TMIN"].astype("float64") * 0.1) * 1.8) + 32
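The conversion is easy to check by hand on a few readings. The helper below is a hypothetical function written for this illustration; it applies the same arithmetic as the two lines above to single values.

```python
def tenths_celsius_to_fahrenheit(value):
    """Convert a reading in tenths of a degree Celsius to degrees Fahrenheit."""
    celsius = value * 0.1       # tenths of a degree -> degrees Celsius
    return celsius * 1.8 + 32   # degrees Celsius -> degrees Fahrenheit

# 39 tenths is 3.9 degrees Celsius, and -72 tenths is -7.2 degrees Celsius.
for reading in (39, -72, 0):
    print(reading, round(tenths_celsius_to_fahrenheit(reading), 2))
```

For example, the first TMAX reading of 39 corresponds to 3.9 °C, which is 39.02 °F.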
Finally, we can pass table columns directly to Toyplot for plotting:
canvas = toyplot.Canvas(width=600, height=300)
axes = canvas.cartesian(xlabel="Day", ylabel=u"Temperature \u00b0F")
axes.plot(table["TMAX_F"], color="red", stroke_width=1)
axes.plot(table["TMIN_F"], color="blue", stroke_width=1);
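For real work, the text above recommends the standard library csv module over Toyplot's CSV reader. The following is a minimal sketch of that approach. It assumes a comma-separated file with a header row, and it uses io.StringIO with three rows copied from the table above in place of a file on disk.

```python
import csv
import io

# Sample rows copied from the temperature table shown above; a real
# script would open a file on disk instead of using io.StringIO.
text = """STATION,STATION_NAME,DATE,TMAX,TMIN,TOBS
GHCND:USC00294366,JEMEZ DAM NM US,20130101,39,-72,-67
GHCND:USC00294366,JEMEZ DAM NM US,20130102,0,-133,-133
GHCND:USC00294366,JEMEZ DAM NM US,20130103,11,-139,-89
"""

reader = csv.reader(io.StringIO(text))
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows hold the observations

# Transpose the rows into (column name, column values) pairs.
columns = [(name, [row[i] for row in rows]) for i, name in enumerate(header)]

# csv yields strings, so numeric columns need an explicit conversion.
tmax = [int(value) for value in dict(columns)["TMAX"]]
print(tmax)
```

The resulting list of pairs has the (column name, column values) shape described in the Initialization section, so it could be passed to toyplot.data.Table once the numeric columns have been converted.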