In this post, I'm introducing the powerful graph computing package of Apache Hama and semi-clustering algorithm described Google's Pregel paper[1]. A semi-cluster in a social graph is a group of people who interact frequently with each other and less frequently with others. It is different from ordinary clustering in the sense that a vertex may belong to more than one semi-cluster.
The algorithm is a greedy algorithm. Since Apache Hama provides complete clone of Pregel, you can write a semi-clustering program and run it on large graphs in a few minutes like this:
@Override public void compute(Iterator<SCMessage> messages) throws IOException { if (this.getSuperstepCount() == 0) { // In superstep 0, V enters itself in that list as a semi-cluster of // size 1 and score 1, and publishes itself to all of its neighbors. } // In subsequent supersteps, V circulates over the received semi-clusters SCMessage clusters = null; while ((clusters = messages.next()) != null) { for (SemiCluster c : clusters.getClusters()) { // If a semi-cluster c does not already contain V, V is added to c to // form c’. } } // The semi-clusters are sorted by their scores and the // best ones are sent to V’s neighbors. // Vertex V updates its list of semi-clusters with the // semi-clusters that contain V. boolean updated = updateLocalClusters(clusters); if (!updated) { // The algorithm terminates either when the semi- // clusters stop changing or the user may provide a // limit. // At that point, the list of best semi-cluster // candidates for each vertex may be aggregated // into a global list of best semi-clusters. voteToHalt(); } else { // The semi-clusters are sorted by their scores and the best ones are // sent to V’s neighbors. } }
What do you think - easy enough to use? :-)
P.S., the implementation issue is still left open[2]. If you want to contribute your code or just want to share something with us, Please feel free to contact Hama team.
1. Pregel: a system for large-scale graph processing
2. Implementation of a Semi-Clustering algorithm, described in Pregel paper.