A neuron-centric programming model

According to Google's DistBelief paper, the user defines the computation that takes place at each node in each layer of the model, and the messages that should be passed during the upward and downward phases of computation. The paper doesn't describe the programming API in detail, but we can imagine something like the following:
 class MyNeuron extends Neuron
   method upward(messages [m1, m2, ..., ])
       sum ← 0
       for each m ∈ [m1, m2, ..., ] do
          sum ← sum + m.input * m.weight

       // propagate squashed output value to neurons of the next layer
       propagate(squashingFunction(sum));

   method downward(messages [m1, m2, ..., ])
       for each m ∈ [m1, m2, ..., ] do
          gradient ← this.output * (1 - this.output) * m.delta * m.weight
          propagate(gradient);

          // weight correction
          m.weight ← m.weight + (α * this.output * m.delta)

       // push updates to the parameter server
       push(weights);
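For concreteness, here is a minimal sketch of what such a user-facing API could look like in Java. The Neuron base class, the Message type, and the propagate/push hooks are my own assumptions, not an API from the paper:

// Hypothetical framework types; none of these come from the DistBelief paper.
abstract class Neuron {
    protected double output;  // cached activation from the upward phase

    protected abstract void upward(Iterable<Message> messages);
    protected abstract void downward(Iterable<Message> messages);

    protected void propagate(double value) { /* framework: send to adjacent layer */ }
    protected void push(double[] weights)  { /* framework: send to parameter server */ }
}

class Message {
    double input;   // sender's output (upward phase)
    double delta;   // sender's error term (downward phase)
    double weight;  // weight on the connecting edge
}

class SigmoidNeuron extends Neuron {
    static final double ALPHA = 0.1;  // learning rate (assumed)

    @Override
    protected void upward(Iterable<Message> messages) {
        double sum = 0.0;
        for (Message m : messages) {
            sum += m.input * m.weight;
        }
        output = 1.0 / (1.0 + Math.exp(-sum));  // sigmoid squashing function
        propagate(output);
    }

    @Override
    protected void downward(Iterable<Message> messages) {
        for (Message m : messages) {
            double gradient = output * (1.0 - output) * m.delta * m.weight;
            propagate(gradient);
            m.weight += ALPHA * output * m.delta;  // local weight correction
        }
        // push(...) of the corrected weights would go here
    }
}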
The reason I separate this into two methods (unlike Google's Pregel, which exposes a single compute() method) is that the framework can internally determine whether it is in the upward or downward phase from the message type, which reduces user-side code complexity. This design also fits multi-threaded programming well, since each neuron's gradient calculation can run in parallel.
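Internally, that dispatch could be as simple as inspecting the type of the incoming messages. The sketch below builds on the Message type from the previous example; the ForwardMessage/BackwardMessage subtypes and the compute() entry point are my own assumptions:

// Hypothetical internal dispatch; the message subtypes are illustrative only.
class ForwardMessage extends Message {}
class BackwardMessage extends Message {}

class NeuronRunner {
    private final Neuron neuron;

    NeuronRunner(Neuron neuron) { this.neuron = neuron; }

    // Single entry point called by the framework each superstep; each
    // NeuronRunner can be scheduled on its own thread.
    void compute(java.util.List<Message> messages) {
        if (messages.isEmpty()) return;
        if (messages.get(0) instanceof ForwardMessage) {
            neuron.upward(messages);    // forward/upward phase
        } else {
            neuron.downward(messages);  // backward/downward phase
        }
    }
}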

Additionally, instead of letting the user write formulas for these calculations directly within the neuron-centric programming model, we abstract the arithmetic operations and generate code for GPU acceleration internally. With this, the framework can compile the user's computation into GPU-oriented code that batches operations for speed.

That way, the user does not have to think about the specifics of the GPU and can instead focus on the algorithm.
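One way to picture this is a small expression-building API: the user composes abstract operations, and the framework is free to evaluate the resulting expression tree on the CPU or batch it into generated GPU kernels. Everything below (Expr, Ops, the operation names) is a hypothetical illustration, not an actual API:

// Hypothetical expression API; operation names are my own illustration.
import java.util.function.DoubleUnaryOperator;

interface Expr {
    double eval();  // reference CPU evaluation; a backend could instead emit GPU code
}

final class Ops {
    static Expr constant(double v)                    { return () -> v; }
    static Expr add(Expr a, Expr b)                   { return () -> a.eval() + b.eval(); }
    static Expr mul(Expr a, Expr b)                   { return () -> a.eval() * b.eval(); }
    static Expr apply(DoubleUnaryOperator f, Expr a)  { return () -> f.applyAsDouble(a.eval()); }
}

class Demo {
    public static void main(String[] args) {
        // User describes "sigmoid(x * w)" abstractly instead of writing raw arithmetic.
        Expr weightedInput = Ops.mul(Ops.constant(0.5), Ops.constant(0.8));
        Expr activation = Ops.apply(x -> 1.0 / (1.0 + Math.exp(-x)), weightedInput);
        System.out.println(activation.eval());  // ~0.5987
        // A GPU backend could walk the same expression tree and batch many
        // neurons' evaluations into a single generated kernel.
    }
}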
