Wednesday, March 4, 2015

Drawing Bipartite Graphs: A Genetic Algorithmic Approach

It was somewhat recently in a course I'm taking that the basics

of so-called genetic algorithms were discussed. These "genetic" algorithms are ones that have been designed to mimic the various steps of natural selection. That is, a genetic algorithm (GA for short) consists of the following steps and components.  

1. Initialization: Start the process with some initial population, call it P0. The fitness of each individual in P0 is measured by some fitness evaluation function.  

2. Breeding: From P0 pairs of individuals are selected (usually with some probability relevant to their fitness values), to breed and produce offspring for the next generation. 

3. Genetic Operators: During the breeding phase of the algorithm,genetic material from selected parents undergo crossover and mutation operations. The crossover operation combines portions of the parent's genetic material into one genome, and mutation then randomly changes portions of it. The final result is the genome of a child from the pair. 

4. Selection: Following the breeding of pairs from P0, the fitness values of all individuals (children and parents) are evaluated for selection for the next generation. Typically the ones with the highest fitness values are chosen, with some selection rules also dictating that portions of the previous population also be included in subsequent iterations of the algorithm. 

Following the selection of a new (and hopefully fitter) population in step 4, steps 2 through 4 are repeated until some termination condition is reached (e.g. some predefined number of steps has passed or the average population fitness has stabilized). 

So that is the basic outline of a genetic algorithm. Why am I talking about all of this here? Well apart from it being inherently interesting I was amazed to hear that the approach has been applied to areas of graph drawing. :-0 (that was my face when I found that out). I'm always excited to learn about new graph drawing algorithms and even more so new areas of graph drawing, so you can probably imagine I was pretty stoked about all of this. Researching the topic a little bit more I discovered that quite a few articles had been written about it, with algorithms designed for drawing undirected, orthogonal, directed, bipartite, and lattice graphs. I'm still in the process of skimming over all of the articles I found, though one area of genetic graph drawing that appears promising to me is in the drawing of bipartite graphs. For the remainder of this post I've decided to summarize one such algorithm by Makinen and Sieranta from their article "Genetic Algorithms for Drawing Bipartite Graphs". It's very interesting and will also give a better idea through example of what genetic algorithms are like. 

Bipartite graphs are traditionally drawn in a layered fashion, with each vertex subset assigned to its own (usually horizontal) layer. Since the layers determine the y coordinates of the graph's nodes beforehand, what remains is to assign x coordinates to the vertices in a way that makes the drawing as clear as possible, typically by minimizing the number of edge crossings. One key observation relating to this is the fact that the number of edge crossings in a bipartite graph drawing depends on the ordering of nodes within layers rather than on the actual assigned x coordinates. Thus the problem of drawing bipartite graphs reduces to a permutation problem of nodes within layers that minimizes edge crossings. Vertices can be permuted in both layers to achieve this, though some algorithms take the alternative approach of fixing  the x coordinates in one layer and permuting the vertices in the other layer (referred to as the level permutation problem or the one sided crossing minimization problem). Both approaches to permuting layers to minimize edge crossings turn out to be NP-Hard, so heuristics are required to approximate solutions to them. Makinen and Sieranta take the approach of permuting a single layer in their algorithm. 

Makinen and Sieranta assume at the beginning that their graphs have equal sized vertex subsets.That is, given a bipartite graph G  with disjoint vertex sets V1 and V2, it is assumed throughout their paper that |V1| = |V2| = n. Here n is some positive integer called the size of the graph. The maximum number of edges that can occur in such a graph of size n is n*n = n^2. Letting E denote the edge set of G the authors define the edge density of G to be |E|/(n^2). What follows now are the key components to their genetic algorithm.  

1. Chromosome representation: Each individual is represented as a permutation of the numbers 1 through n, indicating an ordering of the graph's nodes within the layer that is to be rearranged. So for example suppose that the graph size is 3 and that the vertices of V2 = {v1, v2, v3} are to be permuted. A permutation written as (3,1,2) means that v3 has an x coordinate of 1, v1 an x coordinate of 2, and v2 an x coordinate of 3 (i.e. 3 goes to 1, 1 goes to 2, and 2 goes to 3).

2. Evaluation Function: the fitness of an individual is evaluated as its total number of edge crossings in the corresponding bipartite graph drawing. The desire is to have a drawing with fewer edge crossings and less clutter, so a permutation of the non-fixed layer is considered more fit in this context if it induces fewer edge crossings. 

3. Population Size: Through experiment (details omitted here) the authors found an optimal population size of 50. It is important to choose a good population size since one that is too small may not be representative of the space of potential solutions while one that is too large can hinder the search process of the algorithm. 

4. Crossover Operation: This operation takes two individuals x and y with permutations x = (x1, x2,...,xn) and y = (y1, y2,...,yn), and uses them to produce two new offspring. First, two random integers i and j with i < j are generated from the range [1...n]. Next the segments xi,...,xj and yi,...yj are retrieved from each parent. Each segment occupies positions i through j in the permutations of the generated children. The remaining permutation positions are augmented by numbers from the set {1,...,n} that haven't yet been included. They are listed in the order that they appear in the other parent. So for instance the child that has a segment from x is augmented by the remaining positive integers in the order that they appear from left to right in y. 

The authors illustrate this operation through the following example. Suppose that our parents are x = (1,2,3,4,5,6) and y = (2,4,5,1,6,3). Also assume that we have generated the random integers i = 3 and j = 5. Our x and y segments will be (3,4,5) and (5,1,6) respectively. The augmented x segment will become (2,1,3,4,5,6). The other child will be (2,3,5,1,6,4). 

5. Mutation Operation: Suppose that we are given a permutation x = (x1,x2,...,xn). For each xi a random real number r from [0...1] is generated and compared to some fixed mutation rate, call it r'. If r < r', then a mutation will proceed as follows. Suppose that we have generated a random real r that is less than r' for xi. Generate a random integer j from [1...n]. If i < j then move xi to position xj and shift all elements xk where k = i+1,...,j one position to the left. If i > j then move xi to the position of xj and shift all all elements xk with k = j,...,i-1 one position to the right. The authors suggest using a mutation rate of r' = 0.03 (derivation details omitted here). 

6. Selection Policy: The authors found the following selection/breeding policy to be the most helpful (here they are combining the two genetic algorithm steps into one). First, pick the fittest individual in the current population (this requires counting the number of edge crossings in all of the induced bipartite drawings). This individual will be included in the next generation. For the remainder of the population, create two subpopulations A and B. A (respectively B) consists of individuals with a crossing number that is below (respectively above) the average crossing number of the whole population. Randomly select one parent from A, one from B, and apply crossover and mutation to produce two children for the next generation. Repeat this step on sets A and B until a new population size of 50 is created. 

Thus the algorithm of Makinen and Sieranta begins with an initial population P0 of permutations (all corresponding to the same layer that is to be rearranged for the graph drawing), and applies the selection/breeding policy to produce a new population P1. This policy is again applied to P1, and continues iteratively until some termination condition is reached. Going through the paper I didn't see any fixed termination condition; it seemed to vary according to the graph size. For smaller graphs of size 15 (remember that this means 15 nodes in each layer) it appears that they terminated their genetic algorithm after a little over 60 iterations. For larger graphs of size 20 and 30 they specified using at least 100 generations (i.e. iterations of the algorithm), and stopping after 50 consecutive generations showed no improvements in terms of edge crossings. 

The authors did find favorable results with their algorithm. They compared their results to the classical barycenter heuristic (BC for short), which handles one sided permutation by moving a node in say V2 (the layer to be permuted) to the average x coordinate of its neighbors in V1. Specifically they found the following results: 

1. Comparing best results, they found their GA to perform as much as 2% better than BC on graphs of size 15 with 10% density (i.e. a low density). The results were about the same as BC with higher density (around 50%) size 15 graphs. 

2. For graphs of size 20 and density 30% 100 random bipartite graphs were generated. After running at least 100 iterations of the genetic algorithm until 50 consecutive generations showed no difference in total edge crossings, they found the average difference of their output compared to that generated by BC to be 0.3%. For graphs of size 30 and density 30% the average difference between their GA and BC was 0.5%.

Several comments should be made regarding these results, as some of these numbers may not appear that impressive at first glance. For starters, a graph size of 15, 20 or even 30 might not sound very large, though a lot of the bipartite subgraphs of multi-layered graphs that I've encountered haven't been very large, so if that is representative of bipartite graphs and mult-layered graphs in general then it really isn't such a bad thing that the graphs considered in the study are small.  

Also, a 2% difference in results may not sound very large, let alone differences of 0.5 or 0.3 percent. Remember though that this difference is being taken over the number of edge crossings, which can get quite large as the underlying combinatorics of the bipartite graph become more complicated. Thus even a small percentage difference can correspond to quite a few edge crossings and visually make quite the difference in the graph drawing's appearance. 

So, punchline is that the GA, which was quite simple, did in fact compare favorably to a well known heuristic in one sided crossing minimization. I'm currently studying more applications of genetic algorithms in graph drawing including another GA for drawing bipartite graphs. If I don't write another post that summarizing a genetic graph drawing algorithm I'll at least post a listing of the articles that I've found in the area for interested readers. :-) 


References: 
Genetic Algorithms for Drawing Bipartite Graphs by Erkki Makinen, Mika Sieranta-Intern. J. Computer Math, 1994






1 comment:

Anonymous said...

Wow very interesting! What an awesome idea.