Data Tables
Overview
Data tables, with rows containing observations and columns containing variables or series, are arguably the cornerstone of science. Much of the functionality of Toyplot, or of any other plotting package, can be reduced to a process of mapping data series from tables to properties like coordinates and colors. To facilitate this, Toyplot provides toyplot.data.Table: an ordered, heterogeneous collection of named, equal-length columns, where each column is a NumPy masked array. Toyplot data tables are used for internal storage and manipulation by all of the individual plot types, and can be useful for managing data prior to ingestion into Toyplot.
Be careful not to confuse the data tables described in this section with Table Coordinates, which are used to visualize tabular data.
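Because each column is stored as a NumPy masked array, a column can flag individual entries as missing without discarding the rest of the row. The following is a minimal sketch that uses only numpy.ma and does not call Toyplot; the sample values and variable names are illustrative assumptions.

```python
import numpy

# Illustrative values (an assumption, not taken from Toyplot itself).
# The third entry is flagged as missing through the mask.
column = numpy.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, False])

valid_count = column.count()  # number of unmasked entries
mean = column.mean()          # computed over the unmasked entries only

print(valid_count, mean)
```

Reductions such as count() and mean() skip the masked entry, which is the behavior that makes masked arrays a reasonable fit for columns with gaps.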
[1]:
import toyplot.data
import numpy
[2]:
table = toyplot.data.Table()
table["x"] = numpy.arange(10)
table["x*2"] = table["x"] * 2
table["x^2"] = table["x"] ** 2
table
[2]:
x | x*2 | x^2 |
---|---|---|
0 | 0 | 0 |
1 | 2 | 1 |
2 | 4 | 4 |
3 | 6 | 9 |
4 | 8 | 16 |
5 | 10 | 25 |
6 | 12 | 36 |
7 | 14 | 49 |
8 | 16 | 64 |
9 | 18 | 81 |
You can see from this small example that Toyplot tables, like other Toyplot objects, are automatically pretty-printed in Jupyter notebooks. This pretty-printing is provided as a convenience; to create tabular data graphics, you will likely want the additional control provided by Table Coordinates.
A Toyplot table behaves like a Python dict that maps column names (keys) to the column values (1D arrays). For example, you assign and access individual columns using normal indexing notation with column names:
[3]:
print(table["x"])
print(table["x*2"])
print(table["x^2"])
[0 1 2 3 4 5 6 7 8 9]
[ 0 2 4 6 8 10 12 14 16 18]
[ 0 1 4 9 16 25 36 49 64 81]
In addition, the keys(), values(), and items() methods act like their standard library counterparts, providing column-oriented access to the table contents. However, a Toyplot table remembers the order in which columns were added and always returns them in that order (a guarantee that plain Python dicts only gained in Python 3.7):
[4]:
for name in table.keys():
    print(name)
x
x*2
x^2
[5]:
for column in table.values():
    print(column)
[0 1 2 3 4 5 6 7 8 9]
[ 0 2 4 6 8 10 12 14 16 18]
[ 0 1 4 9 16 25 36 49 64 81]
[6]:
for name, column in table.items():
    print(name, column)
x [0 1 2 3 4 5 6 7 8 9]
x*2 [ 0 2 4 6 8 10 12 14 16 18]
x^2 [ 0 1 4 9 16 25 36 49 64 81]
That’s all straightforward, but Toyplot tables also behave like Python lists. For example, you can use the normal Python length function to get the number of rows:
[7]:
len(table)
[7]:
10
And you can access a single row using its integer index:
[8]:
table[3]
[8]:
x | x*2 | x^2 |
---|---|---|
3 | 6 | 9 |
And you can retrieve a range of rows using slice notation:
[9]:
table[2:5]
[9]:
x | x*2 | x^2 |
---|---|---|
2 | 4 | 4 |
3 | 6 | 9 |
4 | 8 | 16 |
You can also retrieve a noncontiguous set of rows using NumPy advanced indexing:
[10]:
table[[1, 3, 4]]
[10]:
x | x*2 | x^2 |
---|---|---|
1 | 2 | 1 |
3 | 6 | 9 |
4 | 8 | 16 |
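One way to picture how a single object can behave like both a dict and a list is to dispatch on the type of the index. The sketch below is a hypothetical, heavily simplified stand-in (the MiniTable class and everything inside it are assumptions made for illustration); it is not Toyplot's implementation and omits validation such as checking that columns have equal length.

```python
import collections

import numpy


class MiniTable:
    """Hypothetical stand-in for illustration only; not toyplot.data.Table."""

    def __init__(self):
        self._columns = collections.OrderedDict()

    def __setitem__(self, name, values):
        # Store every column as a masked array, keyed by column name.
        self._columns[name] = numpy.ma.array(values)

    def __len__(self):
        # The number of rows is the length of any one column.
        if not self._columns:
            return 0
        return len(next(iter(self._columns.values())))

    def __getitem__(self, index):
        # A string selects a single column and returns its array.
        if isinstance(index, str):
            return self._columns[index]
        # Anything else selects rows and returns a new table.
        if isinstance(index, int):
            index = [index]
        result = MiniTable()
        for name, column in self._columns.items():
            result[name] = column[index]
        return result


mini = MiniTable()
mini["x"] = numpy.arange(10)
mini["x*2"] = mini["x"] * 2
mini["x^2"] = mini["x"] ** 2

row = mini[3]             # single row, by integer
window = mini[2:5]        # contiguous rows, by slice
picked = mini[[1, 3, 4]]  # noncontiguous rows, by list of integers
```

String keys take the column path and every other index takes the row path, so the two styles of access never collide.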
Finally, you can mix both forms of indexing (rows and columns) to retrieve arbitrary subsets of a table:
[11]:
table[3, "x*2"]
[11]:
x*2 |
---|
6 |
[12]:
table[3:6, ["x", "x*2"]]
[12]:
x | x*2 |
---|---|
3 | 6 |
4 | 8 |
5 | 10 |
[13]:
table[[1, 3, 4], ["x", "x*2"]]
[13]:
x | x*2 |
---|---|
1 | 2 |
3 | 6 |
4 | 8 |
Passing a sequence of column names allows you to reorder the columns in a table if necessary:
[14]:
table[["x*2", "x"]]
[14]:
x*2 | x |
---|---|
0 | 0 |
2 | 1 |
4 | 2 |
6 | 3 |
8 | 4 |
10 | 5 |
12 | 6 |
14 | 7 |
16 | 8 |
18 | 9 |
Note that all of these operations are taking views into the underlying column storage, so no data is copied.
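In plain NumPy, whether an indexing operation yields a view depends on the kind of index: a basic slice shares memory with the original array, whereas indexing with a list of integers produces a new array. The sketch below demonstrates this with NumPy alone; how each table operation above maps onto these rules internally is an assumption that is not verified here.

```python
import numpy

column = numpy.arange(10)

window = column[2:5]        # basic slice: a view onto the same memory
picked = column[[1, 3, 4]]  # list-of-integers index: a separate array

slice_shares = numpy.shares_memory(column, window)
list_shares = numpy.shares_memory(column, picked)

# Writing through the view changes the original array.
window[0] = 99

print(slice_shares, list_shares, column[2])
```

A practical consequence is that modifying a sliced result can modify the data it was taken from, so copy explicitly when you need an independent array.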
Initialization
In the above example, we created a data table from scratch, adding data one column at a time to an empty table. However, there are many other ways to create a table. For example, you can pass a dictionary that maps column names to column values:
[15]:
data = dict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)
[15]:
Age | Name |
---|---|
45 | Tim |
32 | Fred |
43 | Jane |
You can also pass any other object that implements Python’s collections.abc.Mapping protocol. Note that the columns are inserted into the table in alphabetical order, since mappings in general do not guarantee an explicit ordering.
If column order matters to you, you can use an instance of collections.OrderedDict instead, which remembers the order in which data was added:
[16]:
import collections
data = collections.OrderedDict()
data["Name"] = ["Tim", "Fred", "Jane"]
data["Age"] = [45, 32, 43]
toyplot.data.Table(data)
[16]:
Name | Age |
---|---|
Tim | 45 |
Fred | 32 |
Jane | 43 |
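The two column orders shown above can be reproduced with the standard library alone. This sketch assumes, based only on the outputs above, that column names from a plain mapping end up sorted by name while the key order of an OrderedDict is kept; it does not call Toyplot, and the variable names are illustrative.

```python
import collections

plain = {"Name": ["Tim", "Fred", "Jane"], "Age": [45, 32, 43]}

# Sorting the keys gives the alphabetical column order.
alphabetical = sorted(plain.keys())

ordered = collections.OrderedDict()
ordered["Name"] = ["Tim", "Fred", "Jane"]
ordered["Age"] = [45, 32, 43]

# Iterating an OrderedDict gives the keys in insertion order.
insertion = list(ordered.keys())

print(alphabetical)
print(insertion)
```

Since Python 3.7 a plain dict also iterates in insertion order, so the alphabetical result for a plain mapping is best understood as a property of the table rather than of the dict.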
Another way to add ordered data to a table is to pass a sequence of (column name, column values) tuples:
[17]:
data = [("Name", ["Tim", "Fred", "Jane"]), ("Age", [45, 32, 43])]
toyplot.data.Table(data)
[17]:
Name | Age |
---|---|
Tim | 45 |
Fred | 32 |
Jane | 43 |
Again, you can use any collections.abc.Sequence type, not just a Python list as in this example.
You can also treat any two-dimensional NumPy array (matrix) as a table:
[18]:
data = numpy.array([["Tim", 45],["Fred", 32], ["Jane", 43]])
toyplot.data.Table(data)
[18]:
0 | 1 |
---|---|
Tim | 45 |
Fred | 32 |
Jane | 43 |
Note that the ordering of the matrix columns is retained, and column names are generated for you.
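To make the matrix case concrete, the sketch below splits the same array into named columns with NumPy alone. The naming scheme (the column position converted to a string) is an assumption chosen to mirror the output above, not a statement about Toyplot's internals.

```python
import numpy

data = numpy.array([["Tim", 45], ["Fred", 32], ["Jane", 43]])

# Hypothetical naming scheme: column i of the matrix is named str(i).
columns = {str(i): data[:, i] for i in range(data.shape[1])}

for name, column in columns.items():
    print(name, column)
```

Because a NumPy array has a single dtype, mixing strings and integers in one matrix stores every element as a string, so the ages in column "1" are text rather than numbers. If you need typed columns, the dictionary or sequence forms shown earlier keep each column separate.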
You can also convert a Pandas data frame into a table:
[19]:
import pandas
df = pandas.read_csv(toyplot.data.temperatures.path)
toyplot.data.Table(df)[0:5]
[19]:
STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|
GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
By default, the data frame index isn’t included in the conversion to a table, but you can override this if you wish:
[20]:
toyplot.data.Table(df, index=True)[0:5]
[20]:
index0 | STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|---|
0 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
1 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
2 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
3 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
4 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
If you don’t like the automatically generated name of the index column, you can provide an alternate name of your own:
[21]:
toyplot.data.Table(df, index="INDEX")[0:5]
[21]:
INDEX | STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|---|
0 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
1 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
2 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
3 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
4 | GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
Note that hierarchical data frame indices will be converted to multiple table columns.
Demonstration
As a convenience, for pedagogical purposes only, Toyplot provides basic functionality to load a table from a CSV file. Please note, however, that Toyplot is emphatically not a data manipulation library! For real work you should use the Python standard library csv module to load data, or functionality provided by libraries such as NumPy or Pandas. In the following example, we will load a set of temperature readings into a data table to use for a visualization:
[22]:
table = toyplot.data.read_csv(toyplot.data.temperatures.path)
table[0:10]
[22]:
STATION | STATION_NAME | DATE | TMAX | TMIN | TOBS |
---|---|---|---|---|---|
GHCND:USC00294366 | JEMEZ DAM NM US | 20130101 | 39 | -72 | -67 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130102 | 0 | -133 | -133 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130103 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130104 | 11 | -139 | -89 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130105 | 22 | -144 | -111 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130106 | 44 | -122 | -100 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130107 | 56 | -122 | -11 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130108 | 100 | -83 | -78 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130109 | 72 | -83 | -33 |
GHCND:USC00294366 | JEMEZ DAM NM US | 20130111 | 89 | -50 | 22 |
Then, we convert the readings (which are stored as tenths of a degree Celsius) to Fahrenheit:
[23]:
table["TMAX_F"] = ((table["TMAX"].astype("float64") * 0.1) * 1.8) + 32
table["TMIN_F"] = ((table["TMIN"].astype("float64") * 0.1) * 1.8) + 32
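As a quick check of the arithmetic, the same conversion can be applied to a single reading. The helper name below is hypothetical; the formula is the one used above (scale tenths of a degree to degrees Celsius, then multiply by 1.8 and add 32).

```python
def tenths_celsius_to_fahrenheit(value):
    """Convert a reading in tenths of a degree Celsius to degrees Fahrenheit."""
    celsius = value * 0.1
    return celsius * 1.8 + 32


# First row of the table above: TMAX = 39 and TMIN = -72 (tenths of a degree C).
tmax_f = tenths_celsius_to_fahrenheit(39)
tmin_f = tenths_celsius_to_fahrenheit(-72)

print(round(tmax_f, 2), round(tmin_f, 2))
```

A TMAX of 39 is 3.9 °C, or about 39.02 °F, and a TMIN of -72 is -7.2 °C, or about 19.04 °F.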
Finally, we can pass table columns directly to Toyplot for plotting:
[24]:
canvas = toyplot.Canvas(width=600, height=300)
axes = canvas.cartesian(xlabel="Day", ylabel=u"Temperature \u00b0F")
axes.plot(table["TMAX_F"], color="red", stroke_width=1)
axes.plot(table["TMIN_F"], color="blue", stroke_width=1);
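For the standard library route recommended at the start of this section, a minimal sketch with the csv module might look like the following. It parses an inline string so that it is self-contained; the three data rows are copied from the table shown earlier, while the comma-separated layout with a header row is an assumption about the file format.

```python
import collections
import csv
import io

# Inline sample: first three rows of the table shown earlier.
# The exact layout of the real file is an assumption.
text = """STATION,STATION_NAME,DATE,TMAX,TMIN,TOBS
GHCND:USC00294366,JEMEZ DAM NM US,20130101,39,-72,-67
GHCND:USC00294366,JEMEZ DAM NM US,20130102,0,-133,-133
GHCND:USC00294366,JEMEZ DAM NM US,20130103,11,-139,-89
"""

reader = csv.DictReader(io.StringIO(text))

# Build column-oriented storage that preserves the header order.
columns = collections.OrderedDict((name, []) for name in reader.fieldnames)
for row in reader:
    for name, value in row.items():
        columns[name].append(value)

# The csv module yields strings; convert the numeric columns explicitly.
for name in ("TMAX", "TMIN", "TOBS"):
    columns[name] = [int(value) for value in columns[name]]

print(columns["TMAX"])
```

An OrderedDict of column lists like this is one of the inputs that the Initialization section describes for toyplot.data.Table.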