Visualizing Community Detection in Graphs¶

The following explores how Toyplot’s graph visualization could be used to support development of a hypothetical new community detection algorithm:

Graph Data¶

First, we will load some sample graph data provided by Toyplot in three parts. The first part is an \(E \times 2\) matrix containing the source and target vertices for each of \(E\) graph edges:

import toyplot.data

edges, truth, assigned = toyplot.data.communities()
edges

array([[ 1,  2],
       [ 1,  3],
       [ 1,  4],
       [ 2,  3],
       [ 2,  4],
       [ 3,  4],
       [ 5,  6],
       [ 5,  7],
       [ 5,  8],
       [ 6,  7],
       [ 6,  8],
       [ 7,  8],
       [ 9, 10],
       [ 9, 11],
       [ 9, 12],
       [10, 11],
       [10, 12],
       [11, 12],
       [ 4,  5],
       [ 8,  9]])

Conveniently, Toyplot can display this matrix directly as a graph using a force-directed layout, and inducing the vertex labels from the edge data:

toyplot.graph(edges);

Ground Truth Communities¶

The second part of the data we loaded is a set of ground truth community assignments in the form of a \(V \times 2\) matrix containing the vertex ID and community ID for each of \(V\) vertices (note that in this case the assignments are already sorted in ID order, if they hadn’t been, we’d need to sort them explicitly):

truth

array([[ 1,  1],
       [ 2,  1],
       [ 3,  1],
       [ 4,  1],
       [ 5,  2],
       [ 6,  2],
       [ 7,  2],
       [ 8,  2],
       [ 9,  3],
       [10,  3],
       [11,  3],
       [12,  3]])

We want to color each graph vertex by its community ID, so we create an appropriate categorical colormap:

colormap = toyplot.color.brewer.map("Set2")
colormap

Now we can rerender the graph, overriding the default edge color, vertex size, vertex label style, and vertex color:

ecolor = "lightgray"
vlstyle = {"fill":"white"}
vcolor = truth[:,1]
toyplot.graph(edges, ecolor=ecolor, vsize=20, vlstyle=vlstyle, vcolor=(vcolor, colormap));

Assigned Communities¶

Next, we render the graph again using the third part of the loaded data: a different set of community ID values, generated using a hypothetical community detection algorithm that we are working on:

assigned

array([[ 1,  4],
       [ 2,  1],
       [ 3,  1],
       [ 4,  1],
       [ 5,  2],
       [ 6,  2],
       [ 7,  2],
       [ 8,  2],
       [ 9,  2],
       [10,  2],
       [11,  2],
       [12,  2]])

vcolor = assigned[:,1]
toyplot.graph(edges, ecolor=ecolor, vsize=20, vlstyle=vlstyle, vcolor=(vcolor, colormap));

We want to compare the ground truth communities with the communities assigned by our algorithm, but there are two challenges:

Flipping back-and-forth between the two visualizations and trying to understand what changed is extremely difficult.
Even if the algorithm we’re testing arrives at identical community memberships across the graph, it is unlikely that it would happen to assign the same ID for each community … so even in the best case, the vertex colors won’t match the ground truth.

What we need is a way to highlight the differences between the two sets of communities, so we only have to look at a single visualization. The following is suggested by Jon Berry at Sandia National Laboratories:

Highlighting Edge Types¶

We will render the graph using the vertex colors assigned by the algorithm, but with the edges colored using a set of rules that highlight the differences between the ground truth and assigned communities:

Edges that are intra-community in both ground truth and assigned communities remain light gray.
Edges that are inter-community in both ground truth and assigned communities remain light gray.
Edges that are intra-community in the ground truth but inter-community for the assigned communities are displayed red (i.e. the assignment split up a ground truth community).
Edges that are inter-community in the ground truth but intra-community for the assigned communities are displayed green (i.e. the assignment merged two ground-truth communities).

truth_map = dict(truth.tolist())
assigned_map = dict(assigned.tolist())

truth_source = [truth_map[vertex] for vertex in edges[:,0]]
truth_target = [truth_map[vertex] for vertex in edges[:,1]]

assigned_source = [assigned_map[vertex] for vertex in edges[:,0]]
assigned_target = [assigned_map[vertex] for vertex in edges[:,1]]

import numpy

# Default all edges to light gray.
ecolor = numpy.repeat("lightgray", len(edges))

selection = numpy.equal(truth_source, truth_target) & numpy.not_equal(assigned_source, assigned_target)
ecolor[selection] = "red"

selection = numpy.not_equal(truth_source, truth_target) & numpy.equal(assigned_source, assigned_target)
ecolor[selection] = "green"

canvas, axes, mark = toyplot.graph(edges, ecolor=ecolor, vsize=20, vlstyle=vlstyle, vcolor=(vcolor, colormap));

Now, we have a single visualization that emphasizes the differences between the two sets of community assignments, and we can clearly see the behavior of the tested algorithm: the green edge connecting vertices 8 and 9 indicate that two ground truth communities were merged together into one, while the red edges adjacent to vertex 1 show that it was “split” from its neighbors into its own community.

Now that that’s taken care of, we can create a publication-quality PDF version of the plot for incorporation into a paper:

import toyplot.pdf
toyplot.pdf.render(canvas, "communities.pdf")