Rows are named c(i, j) with the sequential number (N^2 * i) + (j * N) + k to avoid duplicate records. Each row holds the two sub-matrices a(i, k) and b(k, j), which minimizes data movement and network cost. Finally, we multiply each pair and sum the partial products sequentially.
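To make the keying scheme concrete, here is a minimal single-process sketch (not Hama's actual API; the class and method names are my own): each intermediate record gets the key N^2*i + N*j + k and identifies the block pair (a(i, k), b(k, j)), and the block products are summed into c(i, j).

```java
import java.util.HashMap;
import java.util.Map;

public class BlockMultiply {
    // Multiply two square matrices by splitting each into an N x N grid of blocks.
    // Each intermediate record is keyed N^2*i + N*j + k, mirroring the row-naming
    // scheme described above, and refers to the block pair (a(i,k), b(k,j)).
    static double[][] multiply(double[][] a, double[][] b, int N) {
        int size = a.length;
        int bs = size / N; // block size; assumes N divides the matrix dimension

        // "Rows" of the intermediate table: sequential key -> block indices [i, j, k].
        // The key formula guarantees every (i, j, k) triple maps to a distinct row.
        Map<Long, int[]> records = new HashMap<>();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    records.put((long) N * N * i + (long) N * j + k, new int[] {i, j, k});

        // Multiply each block pair and accumulate the partial product into c(i, j).
        double[][] c = new double[size][size];
        for (int[] r : records.values()) {
            int i = r[0], j = r[1], k = r[2];
            for (int x = 0; x < bs; x++)
                for (int y = 0; y < bs; y++) {
                    double s = 0;
                    for (int z = 0; z < bs; z++)
                        s += a[i * bs + x][k * bs + z] * b[k * bs + z][j * bs + y];
                    c[i * bs + x][j * bs + y] += s;
                }
        }
        return c;
    }
}
```

In the distributed version each record would live in its own HBase row, so the N partial products for a given c(i, j) can be computed independently before the final sum.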
Don't dismiss this naive method. There are many duplicated blocks, but they are distributed at the Hadoop/HBase level, and as nodes are added the number of I/O channels grows linearly. So we can expect an approximately linear speedup as the node count increases.
Here are my test results with Hama:
* 4 × Intel(R) Xeon(R) CPU 2.33 GHz, SATA hard disk, 16,626,844 KB physical memory
* Replication factor : 3
* All other settings are defaults
= Matrix-Matrix Multiply of 5,000 by 5,000 dense matrix =
Unfortunately, HBase does not yet seem ready to store very large columns, so I tested a 5,000 by 5,000 matrix.
* 3 nodes: 602 seconds, 400 blocks
* 6 nodes: 270 seconds, 400 blocks
If you have a better idea, please share it with me!! :)