Wednesday, January 30, 2013

Graph Theory Applications: Databases!

So far I've been discussing some of the more what I will call "theoretical" aspects of graph theory; theoretical in that I haven't really delved into any concrete applications of the topic. Of course I love graph theory for its own sake but the fact that it has so many diverse applications makes it all the better and adds another layer to an already rich area of discrete math. Hence the topic of today's posting: applications of graph theory in databases! :-) 

I partly chose databases as the application focus because I've been giving a lot of lectures on so-called graph databases. These are databases where the information is stored within graph-theoretic entities such as nodes and edges. This contrasts with the traditional SQL approach of having tables of rows and columns hold your data. There are many different graph databases out there, but the one I have been focusing on the past few weeks is Neo4j. It is Java-based, the j standing for Java I believe. If you do a general google search online you'll come across various tutorials on Neo4j as well as links to the rich Neo4j apis. I have not experimented with any other graph databases besides Neo4j. As such I can't say how they compare to each other. However, that does not change my general opinion on the merits of Neo4j, which I believe to have a lot of really nice classes and methods that allow you to quickly create a graph database and add information to it. For instance, if you want to create a so-called embedded graph database, that can be accomplished through the following line of code: 

GraphDatabaseService graph = new GraphDatabaseFactory.newEmbeddedDatabase(pathName); 

The path name just indicates where you are going to store your graph database (i.e. the name of its path). 

After that it's just a matter of adding nodes and edges (in Neo4j lingo the edges are called relationships) to your database! Adding a node is as simple as declaring a Node variable and calling the createNode method of graph from above: 

Node me = graph.createNode(); 
me.setProperty("name", "Tanya"); 

Node myDog = graph.createNode(); 
myDog.setProperty("name", "Argos"); 

In this case I added two nodes, one for myself and one for my dog :-). I've also added names for the two of us through the  setProperty method of the Node class. The entire Neo4j framework follows the property graph model, which means that the nodes and edges in your graph database can have properties associated with them. These properties store the information you want within your database and give your graph more meaning. 

And as for edges, if you would like to create an edge from one node to another in your database, you can do the following:

me.createRelationshipTo(myDog, MyRelations.LOVES);

By invoking this createRelationshipTo method I am creating an edge/relationship from myself to Argos(first parameter in method) that is of type LOVES. The method requires that you specify the type of relationship that exists between two nodes. I won't get into the specifics of how you define your relationship types, but just know that it can involve a special type of class called an enum in Java. 


And that's it! 

Well.....errghh...
To be completely truthful adding the nodes and relationships to the graph takes a little bit more work than just this. In reality you have to do all of the node and relationship creation within the context of a transaction and declare a Transaction object to accomplish that. As with the relationship types I will not go into the details of that in this posting. What I will point out though regarding the sample code is how nice Neo4j is in the way  they've set up their classes and variables. With the exception of the whole Transaction business, adding nodes (such as people or dogs) and relationships between them to a graph is a cinch! 

Now all of this coding is fine and dandy, but what should be emphasized even more is just how powerful and novel the idea is of storing large quantities of information within graphs. If you do a regular image search of graph theory it is most likely that you will stumble across pictures of what look like big colorful hairballs. A zoom-in of that will show that there are actually distinct nodes and edges distributed throughout said hairballs.How does this relate to graph theory? Well, it is becoming more commonplace nowadays to attempt to visualize data in the real world in the form of graphs. The "hairballs" that you see in your image search were made to represent such data which in our current information age is becoming larger and more complex. Trying to visually make out parts of huge graphs can seem daunting, so imagine how hard it must be to try and analyze it analytically and locate something very specific within the graph! It's like trying to find a needle in a haystack! That is where graph databases make their contribution and why I think they are so revolutionary. Having a database store the information of a graph-like entity according to the exact same structure can make things much easier for searching purposes than the traditional relational database approach. The best example that I've seen discussed so far is the following. Say that you would like to determine all of the friends of my friends within our social network. Using Neo4j the search can be accomplished simply by traversing the relevant edges in the graph, starting of course at the node that represents me; just follow all edges pointing to my friends and the edges to pointing to their friends in turn. That's pretty intuitive and probably what many of us would do if given the problem. No need to do complicated and resource taxing join operations in the classic table-based approach! And if you want to be more specific about the friends of friends search, say that I'm only interested in knowing the friends of my friends that happen to live in Puerto Rico. By using a graph database that implements the graph property model,this more specific search can be accomplished by going through the properties of nodes that are traversed and only including those that have a location property of Puerto Rico. Sounds simple enough, but it's pretty amazing! Especially if said social network looks like one of those hairballs in your image search. ;D 

P.S. If any of you have had experiences with other graph databases, then please leave a comment! I am curious to hear how other graph db's handle things like node/edge creation and searches.