December 26, 2010

Now I'm a user of HBase.

Today, I've changed the database of my toy project, a URL shortener service, from the MySQL to the Apache HBase. It seems run well with my Java real-time web application.

Actually, it's not my first time. I was contributed a HQL (HBase Query Language) long long time ago. If they didn't tackle me, I could made something like Pig or Hive. Haha, anyways.

My HBase cluster size is 5 nodes. There's no big data yet but, it is now used for some twitter clients and a number of web sites and the rows are increasing as almost 30 per second.

Later, I'll use them for my some research work e.g., information-flow analysis, user propensity analysis, web structure mining, trend mining with the Apache Hama. :-)

December 22, 2010

[Note] Several problems where Apache Hama can be used

  • Web Graph Structure Mining
  • Social Network Analysis
    • Information Flow On Social Network (finding top-K influential nodes)
    • Evolution Of Social Network
  • High Level Machine Learning (Bioinformatics, Chemical informatics .., etc)
.... and many others.

December 14, 2010

Serialize Printing of "Hello BSP" with Apache Hama

Serialize Printing of "Hello BSP"

Each BSP task of the HAMA cluster, will print the string "Hello BSP" in serial order. This example will help you to understand the concepts of the BSP computing model.

  • Each task gets its own hostname (hostname:port pair) and a sorted list containing the hostnames of all the other peers.
  • Each task prints the LOG string "Hello BSP" only when its turn comes at intervals of 5 seconds.

BSP implementation of Serialize Printing of "Hello BSP"

public class SerializePrinting {
  
  public static class HelloBSP extends BSP {
    public static final Log LOG = LogFactory.getLog(HelloBSP.class);
    private Configuration conf;
    private final static int PRINT_INTERVAL = 5000;

    @Override
    public void bsp(BSPPeer bspPeer) throws IOException, KeeperException,
        InterruptedException {
      int num = Integer.parseInt(conf.get("bsp.peers.num"));

      int i = 0;
      for (String otherPeer : bspPeer.getAllPeerNames()) {
        if (bspPeer.getPeerName().equals(otherPeer)) {
          LOG.info("Hello BSP from " + i + " of " + num + ": "
              + bspPeer.getPeerName());
        }
        
        Thread.sleep(PRINT_INTERVAL);
        bspPeer.sync();
        i++;
      }
    }

    @Override
    public Configuration getConf() {
      return conf;
    }

    @Override
    public void setConf(Configuration conf) {
      this.conf = conf;
    }

  }

  public static void main(String[] args) throws InterruptedException,
      IOException {
    // BSP job configuration
    HamaConfiguration conf = new HamaConfiguration();
    // Execute locally
    // conf.set("bsp.master.address", "local");

    BSPJob bsp = new BSPJob(conf, SerializePrinting.class);
    // Set the job name
    bsp.setJobName("serialize printing");
    bsp.setBspClass(HelloBSP.class);

    BSPJobClient.runJob(bsp);
  }
}

Hide the Maven Target Directory from Open Resource Shortcut

I use m2eclipse and I am a big Maven fan. Unfortunately, I don't want resources to show up from my target directory when I use the open resource shortcut. So how do we get around this? Simply, right click the target folder, click Properties, then check the derived checkbox and hit the Ok button.