_images/toyplot.png

Visualizing Community Detection in Graphs

The following explores how Toyplot’s graph visualization could be used to support development of a hypothetical new community detection algorithm:

Graph Data

First, we load the structure of a graph from a text file into an \(E \times 2\) matrix containing the source and target vertices for each of \(E\) graph edges:

import numpy

edges = numpy.array([row.split() for row in open("communities.edges")], dtype="int")
edges
array([[ 1,  2],
       [ 1,  3],
       [ 1,  4],
       [ 2,  3],
       [ 2,  4],
       [ 3,  4],
       [ 5,  6],
       [ 5,  7],
       [ 5,  8],
       [ 6,  7],
       [ 6,  8],
       [ 7,  8],
       [ 9, 10],
       [ 9, 11],
       [ 9, 12],
       [10, 11],
       [10, 12],
       [11, 12],
       [ 4,  5],
       [ 8,  9]])

Conveniently, Toyplot can display this matrix directly as a graph using a force-directed layout, inducing the vertex labels from the edge data:

import toyplot
toyplot.graph(edges);
123456789101112

Ground Truth Communities

Next, we load a set of ground truth community assignments from a text file into a \(V \times 2\) matrix containing the vertex ID and community ID for each of \(V\) vertices (note that in this case the assignments are already sorted in ID order, if they hadn’t been, we’d need to sort them explicitly):

ground_truth = numpy.array([row.split() for row in open("communities.ground-truth")], dtype="int")
ground_truth
array([[ 1,  1],
       [ 2,  1],
       [ 3,  1],
       [ 4,  1],
       [ 5,  2],
       [ 6,  2],
       [ 7,  2],
       [ 8,  2],
       [ 9,  3],
       [10,  3],
       [11,  3],
       [12,  3]])

We want to color each graph vertex by its community ID, so we create an appropriate categorical colormap:

colormap = toyplot.color.brewer.map("Set2")
colormap

Now we can rerender the graph, overriding the default edge color, vertex size, vertex label style, and vertex color:

ecolor = "lightgray"
vlstyle = {"fill":"white"}
vcolor = ground_truth[:,1]
toyplot.graph(edges, ecolor=ecolor, vsize=20, vlstyle=vlstyle, vcolor=(vcolor, colormap));
123456789101112

Assigned Communities

Next, we render the graph again using a different set of community ID values, generated using a hypothetical community detection algorithm that we are working on:

assigned = numpy.array([row.split() for row in open("communities.assigned")], dtype="int")
assigned
array([[ 1,  4],
       [ 2,  1],
       [ 3,  1],
       [ 4,  1],
       [ 5,  2],
       [ 6,  2],
       [ 7,  2],
       [ 8,  2],
       [ 9,  2],
       [10,  2],
       [11,  2],
       [12,  2]])
vcolor = assigned[:,1]
toyplot.graph(edges, ecolor=ecolor, vsize=20, vlstyle=vlstyle, vcolor=(vcolor, colormap));
123456789101112

We want to compare the ground truth communities with the communities assigned by our algorithm, but there are two challenges:

  • Flipping back-and-forth between the two visualizations and trying to understand what changed is extremely difficult.
  • Even if the algorithm we’re testing arrives at identical community memberships across the graph, it is unlikely that it would happen to assign the same ID for each community ... so even in the best case, the vertex colors won’t match the ground truth.

What we need is a way to highlight the differences between the two sets of communities, so we only have to look at a single visualization. The following is suggested by Jon Berry at Sandia National Laboratories:

Highlighting Edge Types

We will render the graph using the vertex colors assigned by the algorithm, but with the edges colored using a set of rules that highlight the differences between the ground truth and assigned communities:

  • Edges that are intra-community in both ground truth and assigned communities remain light gray.
  • Edges that are inter-community in both ground truth and assigned communities remain light gray.
  • Edges that are intra-community in the ground truth but inter-community for the assigned communities are displayed red (i.e. the assignment split up a ground truth community).
  • Edges that are inter-community in the ground truth but intra-community for the assigned communities are displayed green (i.e. the assignment merged two ground-truth communities).
ground_truth_map = dict(ground_truth.tolist())
assigned_map = dict(assigned.tolist())

ground_truth_source = [ground_truth_map[vertex] for vertex in edges[:,0]]
ground_truth_target = [ground_truth_map[vertex] for vertex in edges[:,1]]

assigned_source = [assigned_map[vertex] for vertex in edges[:,0]]
assigned_target = [assigned_map[vertex] for vertex in edges[:,1]]
# Default all edges to light gray.
ecolor = numpy.repeat("lightgray", len(edges))

selection = numpy.equal(ground_truth_source, ground_truth_target) & numpy.not_equal(assigned_source, assigned_target)
ecolor[selection] = "red"

selection = numpy.not_equal(ground_truth_source, ground_truth_target) & numpy.equal(assigned_source, assigned_target)
ecolor[selection] = "green"

canvas, axes, mark = toyplot.graph(edges, ecolor=ecolor, vsize=20, vlstyle=vlstyle, vcolor=(vcolor, colormap));
123456789101112

Now, we have a single visualization that emphasizes the differences between the two sets of community assignments, and we can clearly see the behavior of the tested algorithm: the green edge connecting vertices 8 and 9 indicate that two ground truth communities were merged together into one, while the red edges adjacent to vertex 1 show that it was “split” from its neighbors into its own community.

Now that that’s taken care of, we can create a publication-quality PDF version of the plot for incorporation into a paper:

import toyplot.pdf
toyplot.pdf.render(canvas, "communities.pdf")