Fault-tolerance in Hama

Recently, Hama core committers Suraj Menon and Thomas Jungblut are working on Fault-tolerant BSP system. And I am trying to read the source code. Their design describes the new BSP computing system and API enabling checkpoint-based recovery. Furthermore, describes the confined recovery, which can be used to improve the cost and latency of recovery.

I didn't fully understand and test yet but quite nice!


Popular posts from this blog

일본만화 추천 100선

음성 인공지능 스타트업의 기회 분석

공유 모빌리티 회사로 합류

인간 본성의 법칙 (책 리뷰 + 잡담)