Saturday, March 14, 2015

Drawing Bipartite Graphs: A Mutation-Based Approach

Given the popularity of my recent post on genetic algorithm (GA) basics and bipartite graph drawing applications, I decided to summarize another related article here. The paper, "An Experimental Comparison Between Evolutionary Algorithm and Barycenter Heuristic for the Bipartite Drawing Problem" by Zoheir Ezziane, not only presents a GA that compares favorably to the popular barycenter heuristic, but also highlights a topic that is often debated over: the need for a crossover operation in a genetic algorithm. 

Although genetic algorithms can greatly reduce search in the solution space for a problem, they also have the drawback of suffering from typically long computation times. The total computation time can be influenced by many different factors, such as sensitivity to GA parameters(e.g. mutation rate and population size), complexity of the evaluation function, or complexity of the crossover operation. Zoheir makes the same point in his paper. Based on experimental results on bipartite graphs from other papers (listed below), he designs a purely mutation-based GA for bipartite graph drawing (i.e. it has no crossover operation). Unlike the GA from my previous post which fixes the nodes of one of the layers and permutes those of the other, this algorithm rearranges nodes in both layers of the bipartite graph. The algorithm also differs from that described previously in that it does not require the two vertex subsets to be of equal size. 

Thus, assume that we are given a bipartite graph G with vertex sets 1 and 2 of sizes n1 and n2 respectively. The components of Zoheir's genetic algorithm are as follows: 

1. Chromosome representation: Each bipartite graph drawing is represented by two arrays of sizes n1 and n2. These hold the vertices of sets 1 and 2 respectively, and 

2. Evaluation function: The fitness of an individual is equal to the number of edge crossings in the graph drawing induced by its ordering of vertices in layers 1 and 2. 

3. Mutation Operation: This operation has three subparts. 
a. Each vertex v in set 1 mutates with a probability p to another vertex k in the range [1...n1] which has not been selected before. 

b. Each vertex v in set 2 mutates with a probability p to another vertex l in the range [1...n2] which has not been previously selected. 

c. After the mutation operator has been applied to all vertices, reevaluate the number of edge crossings (i.e. the fitness value). 


Parts a. and b. are the same operation applied to different vertex sets, so without loss of generality let's look at an example on the first vertex set of a graph. Suppose for instance that set 1 = {1,3,7,8} and that it's initial ordering within the associated array is [3,7,8,1]. Now let's apply mutations of part a. to the elements of the first set one by one, assuming a mutation rate of p = 0.6. 

1. Take element 1 and suppose that mutation is triggered at that node. We now randomly select some number k from [1...4] (here n1 = 4). Suppose that k = 2. The node located at position 2 of the array 7. 7 has not been selected before, so therefore we swap nodes 1 and 7, creating the new array [3,1,8,7] as a representation of set 1.  


2. Take element 3 and suppose that chance favors mutation here as well. We randomly select k from [1...4] and chance also has it that k = 2. The node located at location 2 of the array is now 1, but 1 has previously been selected/handled by the first step. We must therefore continue to generate random k's until we find corresponding to an array position containing a node that hasn't previously been handled. Let's say that at our second go k = 4. Node 7 is now located at position 4, and it hasn't been previously selected. We thus swap nodes 3 and 7 in the array, creating the new representation [7,1,8,3]. 

3. Take element 7, but suppose that probabilities are not in our favor this time and we skip a mutation step. The array then remains as it was at the end of step 2 as [7,1,8,3]. 

4. Take element 8 and suppose that mutation is triggered at that node. We randomly generate k which comes out to 3. The node located at position 3 is 8 itself. We therefore leave 8 alone and have [7,1,8,3] as our final array representation for vertex set 1. 

4. Selection Policy: Following a random generation of an initial population P0, individuals are created through crossover and mutation applied to pairs from P0 until the total number of new individuals is equal to the size of P0. An elitist selection policy is then applied in the following way. Take the best (i.e. the fittest) member of P0 and the best of the current population. If the best member of the current population is less fit than the best member of the previous, replace the worst member of the current population with the best of the previous. If, on the other hand, the best member of the current population is more fit than that of the previous generation, then leave the current population alone. This current population will then be taken as the new generation to be evaluated for subsequent iterations of the GA. After each new generation of individuals is created through crossover and mutation the selection policy will perform any replacements deemed necessary by the described elitist method. 

And there you have it, what I would consider to be the bulk of the genetic algorithm. :-) What remains to explain now are the specific parameters used by Zoheir and the results he found when comparing the algorithm to the barycenter heuristic. 

A mutation rate (the probability p used in the mutate operation) or 0.8 was used throughout the simulations. Also, the total size or order of a bipartite graph was considered rather than the sizes of the individual vertex subsets. So in his analysis a graph of order 6 could have vertex subsets of sizes 5 and 1, 2 and 4, or 3 and 3. 


An important graph metric considered throughout the simulations was that of graph density. In the context of a bipartite graph having vertex subsets of sizes m and n, the graph density is defined to be the ratio |E|/(m*n). |E| denotes the cardinality of its edge set. 

The following list includes some of the highlights of the simulated results: 

1. Starting with a small population size of 10 and assuming a bipartite graph of order 10, the GA always outperformed the barycenter heuristic regardless of density. The trade off here comes in terms of time. It was found that the total number of iterations needed to outperform the barycenter heuristic in this scenario reached 200. 

2. Not surprisingly with a larger population size of say 30, fewer iterations were needed to surpass the results of the one-sided barycenter heuristic. The total number of iterations needed for the GA to do so never exceeded 50 in this case due to the variety of solutions offered by the larger population size. 

3. The GA always eventually outperforms the barycenter heuristic on graphs with density greater than 50%. 

4. The GA always eventually outperforms the barycenter heuristic on graphs with density less than 50%, but with generally larger population sizes (specific population numbers are graphed in the results section). 

5. On graphs of order 17 the average number of iterations needed by the GA varied dramatically based on the density of the graph. 

So to summarize, what we can take away from all of this is that a purely mutation-based genetic algorithm also shows successful performance in drawing bipartite graphs. When tested on bipartite graphs of orders 10 and 17, it was found to eventually outperform the classical barycenter heuristic in terms of edge crossing minimization. The trade off came in terms of time, with the total number of iterations needed to exceed the barycenter result reaching a maximum of 200. If the ultimate goal is to have the cleanest possible graph drawing with the fewest number of edge crossings, or if one is dealing with bipartite graphs that are pretty small in size, then this is not that big of a sacrifice. 

Another interesting result to take out of this study is the finding that the number of iterations of the GA varied quite a bit with graph density. This highlights the importance of considering graph density, rather than graph size alone, as a complexity measurement in these types of graph drawing studies. 

References: 
1. Ezziane, Z. 2007. Experimental Comparison Between Evolutionary Algorithm and Barycenter Heuristic for the Bipartite Drawing Problem. J. Comput. Sci., 3: 717-722. 

2.  Newton, M., O. Sykora, M. Withall and I. Vrto, 2002. A PVM Computation of Bipartite Graph Drawings. http://parc.lboro.ac.uk/research/projects/parseqgd/s topara/stopara.pdf accessed on 6th August 2006. 

3. Rosete-Suarez, A., M. Sebag and A. OchoaRodriguez, 1999. A study of Evolutionary Graph Drawing. Technical Report, http://citeseer.ist.psu.edu/411217.html accessed on 7th November 2006. 

4. Branke, J., F. Bucher and H. Schmeck, 1997. A Genetic Algorithm for Undirected Graphs. In J. T. Alnader Edn., Proceedings of 3 rd Nordic Workshop on Genetic Algorithms and their Appli., 193-206.


No comments: