January 15, 2009

Distributed matrix multiplication with HAMA

To multiply two matrices A and B, we first collect the blocks into 'collectionTable' using map/reduce.




Rows are named c(i, j) with the sequence number (N^2 * i) + (j * N) + k to avoid duplicate records. Each row holds two sub-matrices, a(i, k) and b(k, j), which minimizes data movement and network cost. Finally, we multiply and sum sequentially.
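The key scheme and the sequential multiply-and-sum step can be sketched in plain Java. This is only an illustration under my own assumptions (class and method names like `sequenceNumber` and `multiplyAndSum` are mine, not HAMA's API):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the block-key scheme and the sequential block multiply-sum.
// N is the number of blocks per matrix dimension; all names here are
// illustrative, not HAMA's actual API.
public class BlockMultiply {

    // Sequence number for the row holding sub-matrices a(i, k) and b(k, j):
    // (N^2 * i) + (j * N) + k -- distinct for every (i, j, k) triple,
    // which is what avoids duplicate records in collectionTable.
    static int sequenceNumber(int n, int i, int j, int k) {
        return n * n * i + j * n + k;
    }

    // Multiply two dense blocks and accumulate into c (c += a * b).
    static void multiplyAndSum(double[][] a, double[][] b, double[][] c) {
        int size = a.length;
        for (int i = 0; i < size; i++)
            for (int k = 0; k < size; k++)
                for (int j = 0; j < size; j++)
                    c[i][j] += a[i][k] * b[k][j];
    }

    public static void main(String[] args) {
        // Every (i, j, k) triple for N = 4 maps to a distinct key,
        // so there are N^3 = 64 rows in total.
        int n = 4;
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    seen.add(sequenceNumber(n, i, j, k));
        System.out.println(seen.size()); // 64

        // Tiny 2x2 block example: c = a * b
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = new double[2][2];
        multiplyAndSum(a, b, c);
        System.out.println(c[0][0] + " " + c[0][1]); // 19.0 22.0
        System.out.println(c[1][0] + " " + c[1][1]); // 43.0 50.0
    }
}
```

Note that the same a(i, k) block appears in N different rows (one per j), which is the duplication mentioned below.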

Don't look down on this naive method. There are a lot of duplicated blocks, but they are distributed at the Hadoop/HBase level. And as the number of nodes increases, the number of I/O channels increases linearly, so we see an approximately linear increase in speed as nodes are added.

Here is my test result with Hama:

* 4 x Intel(R) Xeon(R) CPU @ 2.33 GHz, SATA hard disk, 16,626,844 KB physical memory
* Replication factor : 3
* All other settings are defaults

= Matrix-Matrix Multiply of 5,000 by 5,000 dense matrix =

Unfortunately, HBase doesn't seem ready to store very large columns yet, so I tested with 5,000 by 5,000.

* 3 nodes : 602 seconds, 400 blocks
* 6 nodes : 270 seconds, 400 blocks

If you have a better idea, please share it with me!! :)

5 comments:

  1. Anonymous7/2/09 03:33

    I'm really looking forward to joining HAMA.
    If possible, I'll run a rough evaluation on the Y! grid ASAP.

    Now, could I officially become a committer on apache-hama?
    If so, please give me permission to commit.

    I feel that our visions are converging.

    It's now or never.
    See you soon.

    ReplyDelete
  2. The comment above was written by sangwon.
    Thanks.

    ReplyDelete
  3. Sounds good. Check this out -- http://incubator.apache.org/hama/developers.html

    ReplyDelete
  4. Hello Edward,

    I have been trying to use HAMA since it is stated that it can do matrix operations on massive matrices. However, I can't find this functionality in the source files of the latest release, 0.6.0!! Please tell me if I am missing something. Best regards, and please keep up your nice work.

    ReplyDelete
    Replies
    1. Hi CECUEG,

      This MapReduce-based approach was not good for performance and scalability, so we're re-implementing it based on the Bulk Synchronous Parallel model.

      Delete