In blocking, matrix multiplication of the original arrays is transformed into matrix multiplication of blocks. For example,
C_block(1,1)=A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1)
This is a well-known Cache Blocking technique for efficient memory accesses. Then, We can think about a Memory Blocking technique for efficient disk accesses and parallelism (or distributed?) on the Hadoop MapReduce as describe below.
Each mapper processes a matrix multiplication of blocks and reducer processes sum up all sub-matrices.
-> <C_block(1,1), A_block(1,1)*B_block(1,1)>, ...
-> <C_block(1,1), A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1)>, ...
In my own work, It runs greatly. But, I'm not sure it can be auto-optimized.