Introducing Rebase: a Firebase like backend service based on Node.js, Redis, and HBase

A piece from my time at Samsung Electronics (probably 2–3 years ago) that had received approval for external publication but was never posted. I'm finally putting it up now. Reading it again brings back memories.

Introducing Rebase: a Firebase like backend service based on Node.js, Redis, and HBase 

Guest post written by Edward J. Yoon and Sungwon Han, Senior Software Engineers, Samsung Electronics.

Event-driven applications that handle user actions, sensor outputs, or messages from other applications have become increasingly popular in recent years, partly as a result of the proliferation of smart devices and the Internet of Things. Accordingly, real-time databases such as Google's Firebase [1] and RethinkDB [2] are considered important technologies for supporting these non-traditional applications and workloads. In this guest post, we introduce Rebase, a Firebase-like backend service for mobile platforms that lets you store and synchronize data in online/offline modes, push notifications, and (in the near future) run analytics. Rebase was first deployed on Amazon AWS. However, for performance and cost reasons, we soon migrated it to Joyent's Triton Cloud. We share performance benchmarks for Rebase on Joyent Triton vs. Amazon AWS, which served as our key decision metric.

Rebase 

Recently, we developed a container-based real-time database for Samsung IoT applications, called Rebase. It uses an event-driven model to notify clients of changes and stores data as JSON objects in K/V databases. It also provides easy-to-use RESTful APIs and client libraries, available as SDKs across platforms such as iOS and Android. With Rebase, developers can focus on core functionality and don't have to worry about the infrastructure running the backend services. A rough sketch of what a RESTful read/write might look like is shown below.
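As an illustration only, a client could read and write a JSON node over HTTP roughly as follows. The base URL and endpoint shape here are hypothetical (modeled on Firebase-style path semantics) and assume a Node.js runtime with a global fetch (Node 18+); the actual Rebase API may differ.

// Hypothetical endpoint shape, for illustration only.
const BASE_URL = 'https://rebase.example.com/v1';

// Write a JSON object under a path.
async function setValue(path, value) {
  const res = await fetch(`${BASE_URL}/${path}.json`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(value),
  });
  return res.json();
}

// Read the JSON object back from the same path.
async function getValue(path) {
  const res = await fetch(`${BASE_URL}/${path}.json`);
  return res.json();
}

// Example usage:
// await setValue('rooms/lobby/messages/m1', { name: 'edward', text: 'hello' });
// const msg = await getValue('rooms/lobby/messages/m1');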

Below is a reference architecture for Rebase. It has the following key components:

 Rebase backend container stack 

NGINX, Node.js, and Standalone Redis 

We adopted NGINX, Node.js, and Redis as the backend servers. Node.js is a JavaScript runtime that processes incoming requests asynchronously in a loop called the event loop. Its event-driven, non-blocking I/O model lets it handle high data throughput and build scalable applications. NGINX is used as a load balancer, and a standalone Redis instance is used for message queueing and asynchronous messaging.
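As a minimal sketch of how a standalone Redis instance can fan out change events between Node.js workers, the snippet below uses the pub/sub API of the redis npm client. The channel name and payload shape are illustrative assumptions, not the actual Rebase internals.

const { createClient } = require('redis');

async function main() {
  const publisher = createClient();            // defaults to localhost:6379
  const subscriber = publisher.duplicate();
  await publisher.connect();
  await subscriber.connect();

  // Each backend worker listens for change notifications on a shared channel.
  await subscriber.subscribe('rebase:changes', (message) => {
    const change = JSON.parse(message);
    console.log('change at', change.path, change.value);
    // ...push the change to connected clients (e.g. over WebSocket)...
  });

  // When a write is accepted, notify all workers asynchronously.
  await publisher.publish('rebase:changes', JSON.stringify({
    path: 'rooms/lobby/messages/m1',
    value: { name: 'edward', text: 'hello' },
  }));
}

main().catch(console.error);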

Redis and HBase Cluster 

For persistent storage, we reviewed NoSQL and K/V stores such as Redis, HBase, and MongoDB. MongoDB isn't well suited to time-series workloads. Redis is very fast, but it has limited scalability and persistence capabilities. HBase is a column-oriented K/V store that scales horizontally and lets us store not only the JSON object itself but also parsed columns for fast columnar scans. So we decided to use a sharded Redis cluster as cache storage and HBase as persistent storage.
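A write path combining a Redis cache with HBase persistence could look roughly like the sketch below. Here persistToHBase is a hypothetical stand-in for whatever HBase client the backend actually uses; only the cache-in-front-of-persistent-store pattern is the point.

const { createClient } = require('redis');

// Hypothetical stand-in for the real HBase client; Rebase's actual
// persistence layer is not shown in this post.
async function persistToHBase(rowKey, json) {
  // e.g. store the raw JSON plus parsed columns for fast columnar scans
  console.log('persisting', rowKey, json);
}

async function writeValue(redis, path, value) {
  const json = JSON.stringify(value);
  // Write-through to the Redis cache so subsequent reads stay fast...
  await redis.set(`rebase:${path}`, json);
  // ...and persist to HBase, keyed by path, for durable storage.
  await persistToHBase(path, json);
}

async function readValue(redis, path) {
  // Serve from the cache when possible; a real implementation would
  // fall back to HBase on a cache miss.
  const cached = await redis.get(`rebase:${path}`);
  return cached ? JSON.parse(cached) : null;
}

async function demo() {
  const redis = createClient();
  await redis.connect();
  await writeValue(redis, 'rooms/lobby/messages/m1', { name: 'edward', text: 'hello' });
  console.log(await readValue(redis, 'rooms/lobby/messages/m1'));
  await redis.quit();
}

demo().catch(console.error);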

Programming Interface 

Rebase's client SDK is very similar to Google's Firebase. It provides basic CRUD operations and stream subscription APIs. For example, chat message synchronization can be written like the following:
// Renders a single chat message in the UI.
var setMessage = function(data) {
  var val = data.val();
  this.displayMessage(data.key, val.name, val.text);
}.bind(this);

// Loads the last 3 messages and listens for new ones.
this.messages.limitToLast(3).on('child_added', setMessage);
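For the write side, assuming a Firebase-style push() method on the same messages reference (the exact Rebase method name may differ), sending a new chat message would look roughly like:

// Append a new message; subscribed clients receive it via 'child_added'.
this.messages.push({
  name: 'edward',
  text: 'Hello, Rebase!'
});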

Joyent Triton vs. AWS 

Cost

In the early days of Rebase, we mainly used Amazon EC2 instances and the ECS service as container pools for the Rebase backend service. AWS served our needs well when our user base and traffic were small. However, as we started scaling out, it quickly turned into a very expensive option for our long-term needs.

Server Density

Additionally, AWS runs containers on VMs, which adds unnecessary overhead to the compute nodes. And if you end up running Kubernetes to manage containers on VMs running on EC2, the layers of abstraction keep adding overhead and reduce the efficiency of the compute resources. We tried Joyent and were impressed with its containers-on-bare-metal service: no VMs or abstraction layers to go through.

Performance

Moving to bare metal with Joyent's Triton service [3], we were able to get significant performance improvements, as the following throughput graph illustrates. For read operations, we increased throughput by 3~5% while keeping latency under 1 second. For write operations, we increased throughput by 63% at the same latency of under 1 second.

Deployment Time

Using Triton's optimized container provisioning, we were able to reduce the deployment time for a new cluster of the Rebase backend service drastically, from 5 minutes to 1 minute.

State of the Project 

We are still in the early stages, and the code isn't meant for production usage yet. Auto-scaling with minimal downtime is challenging; an elastic solution that scales up and down automatically, or migrates the service without downtime, is our next goal. ContainerPilot and the autopilot pattern look like the right tools to try.

We also plan to open source this project once we've made a few more refinements, so that we can tailor it and contribute it back to the wider community. We will also be looking to add contributors, so keep a lookout here for updates!

References

1. "Google Firebase". https://firebase.google.com/
2. "RethinkDB". http://www.rethinkdb.com/
3. "Joyent Triton". https://www.joyent.com/triton
