Thursday, January 22, 2015

Spectral Graph Drawing: The Basics

Okay so it's been a ridiculously long time since I've posted anything. Yes I still love graphs and of course I still love graph drawing. The latter is what I have, in particular, been focusing on as of late. The particular topic that's been getting my undivided attention is spectral graph drawing. Under this approach the positions of a graph's nodes in 1, 2 or 3 dimensions are determined by the entries of certain eigenvectors of a graph related matrix. Many different matrices can be used for this purpose, though at the most basic level one utilizes the eigenvectors of the so-called Laplacian matrix of a graph. 

For starters what exactly is a graph's Laplacian? Well if one was to define D to be the diagonal matrix with a graph's degree sequence along the main diagonal and A as the typical adjacency matrix of the graph, then the Laplacian L is defined to be the difference L = D-A. It is generally assumed in the literature that the graph being visualized is undirected, and lacks loops and multiple edges. We can assume, however, that the edges of the graph are weighted, though only with nonnegative real numbers. 

Besides its cool name, the Laplacian receives a lot of interest in spectral drawing research because of several desirable properties that it has. I'll summarize these without proofs below as a 'Theorem'. 

Theorem: Let G be a graph on n nodes as described thus far and let L be its Laplacian matrix: 
a. L is symmetric positive semi definite
b. L has eigenvalue 0 with the corresponding eigenvector of all 1's
c. The multiplicity of the eigenvalue 0 is equal to the number of connected components in the graph. 


Positive semi-definite means that if you take any vector x with n entries and calculate the product x*Lx (where x* denotes the transpose of x) then the result will be a nonnegative number. Symmetric means that L and its transpose are the same. There happens to be a very nice result in matrix theory which says that the eigenvalues of a symmetric positive semi-definite matrix are all real and nonnegative. Thus all the eigenvalues of the Laplacian of an undirected graph with nonnegative weights are nonnegative real numbers. The eigenvectors of L also have the nice property that if you take any two eigenvectors xi and xj corresponding to different eigenvalues they will be orthogonal to each other (i.e. the dot product of xi and xj will come out to zero). :-0

Righteous! Okay. So up to this point we've seen something called the Laplacian of a graph and some technical properties of it that are supposedly nice, though things are still looking pretty abstract and the connection to graph drawing is not readily apparent. The thing or rather equation that helps to tie all of these loose strings together is what has come to be called Hall's Energy function. The function is named after Kenneth Hall. He was the first to suggest the utilization of graph spectra for drawing purposes and modeled the problem of visualizing a graph in one dimension by the following set of equations: 
The minimization is of Hall's energy function described in the first line. It is taken over all real n dimensional vectors x. The second equation puts a normalization constraint on the solution to ensure that x has unit norm and the area of the drawing remains compact. The final equation asks that your solution x be orthogonal to the vector of all one's. This is to avoid the trivial solution of placing all nodes on the same position of the real number line. As this suggests the minimizing vector will determine the x coordinates of the graph's nodes. 


To continue with the hand waving, this won't be proved here, but one can also show that the critical points for minimizing the energy of Hall's function are exactly the eigenvectors of L. It's pretty simple to prove using substitution into x*Lx that the resulting output values will be the corresponding eigenvalues. The absolute minimum of the function of course occurs when all xi are equal, but again that would pack all nodes into the same spot and would do nothing for us in terms of revealing information on the graph's structure. The best one can do then is to take the second best solution. This occurs at the eigenvector for the second smallest nonzero eigenvalue, call it x2 (it will be orthogonal to the all one's vector because of the pairwise orthogonality for eigenvectors of symmetric matrices). Turns out it has a very special name: the Fiedler vector ;D 

The above derivation can be generalized to 2, 3 or even k dimensions. For the case of 2 dimensions Hall describes two sets of equations where the second is much like the first from above. The only difference is that instead of a vector x we are looking for another one, call it x3, which satisfies the same constraints as above but is also orthogonal to x2. The result will be the third smallest nonzero eigenvector. x2 and x3 together will give the x and y coordinates of our graph's vertices. 

Phew! That was a lot of setup (mainly because of the underlying theoretical justification). The very nice thing about all of this eigenvector business though is that the graph drawing can be computed quickly and the result is exact and globally optimal. This is in contrast to other graph drawing problems which tend to be very VERY difficult (in the NP Hard class) and which often require inexact heuristics and suboptimal solutions. It also has a natural interpretation. Nodes that are more related to each other and have high edge weight should be placed closer together so that their coordinate differences contribute less to the sum. Those that are less related with smaller edge weights can be placed further apart without increasing the sum by a significant amount. 

This latter point also highlights a potential drawback, however, and that is the tendency of spectral drawings obtained through this method to smoosh clusters of highly related nodes into small spaces of the total drawing area and placing outlier nodes on the outer edges. When presented with a drawing like this it is difficult to make out much of the graph's structure, especially in the highly clustered regions. If all one wants to know is the existence of said clusters and doesn't care much for the finer details then that's all okay, but for those who want to gain a deeper insight into a graph's structure they may need to use more sophisticated spectral techniques. 


I will end it all here for the time being and move on with an example of a spectral graph visualization. The more sophisticated spectral drawing techniques will be reserved in later posts. 

What you see below is a graph visualization obtained using the two smallest nonzero Laplacian eigenvectors for a dolphin social network in New Zealand :-). 62 dolphins in total were recorded and an undirected social network was created to model their frequent associations with each other. Each dolphin was given its own name, but to clean up the visualization I decided to omit them here. The data comes from: D. Lusseau, K. Schneider, O.J. Boisseau, P. Haase, E. Slooten and S.M. Dawson, Behavioral Ecology and Sociobiology 54, 396-405 (2003). The actual visualization was produced by the draw_spectral method in python's networkx module. 

It's clear from the drawing that there are some cliques within the dolphin network. The details of the one at top right are easier to make out and there are some definite outliers with few associations like the ones at bottom right. However, the clique to the top left is much more collapsed and spiked in appearance. To make out the individual associations here better one would need a more sophisticated algorithm, and perhaps one with good zooming in features. These, as I said before, will be the topic of future posts, so if spectral graph drawing floats your boat, stay tuned! :-D