Fault-tolerance in Hama

Recently, Hama core committers Suraj Menon and Thomas Jungblut are working on Fault-tolerant BSP system. And I am trying to read the source code. Their design describes the new BSP computing system and API enabling checkpoint-based recovery. Furthermore, describes the confined recovery, which can be used to improve the cost and latency of recovery.

I didn't fully understand and test yet but quite nice!

Comments

Popular posts from this blog

일본만화 추천 100선

음성 인공지능 스타트업의 기회 분석

공유 모빌리티 회사로 합류

인간 본성의 법칙 (책 리뷰 + 잡담)