The processor is an accelerator for ‘reinforcement learning’ – a behaviourist psychology-inspired learning algorithm that mimics the way dopamine encourages reward-motivated behaviour in human social interactions.
According to the paper presented at ISSCC, the chip implements reinforcement learning by inheriting properties of stochastic neural networks and recent advances in Q-learning.
Mixed-signal processing was adopted over an all-digital approach to save area and power. Estimates are: 3.4mm² footprint (against >5mm² for digital), 80µW interconnect loss (against 140µW) and <17µW leakage (against 20µW).
Executing the algorithms requires the equivalent of 4 to 8bits (1:16 to 1:256) accuracy, according to the researchers, which rules out analogue voltage computation because of the limiting effect of low supply voltage on dynamic range. Instead, analogue pulse-widths have been chosen, “thereby enabling large dynamic ranges. As a trade-off, the architectures are slower, which is perfectly acceptable for the applications in hand,” the paper says.
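To put numbers on the dynamic-range argument, here is a minimal Python sketch of the arithmetic. It assumes, purely for illustration, that an analogue voltage signal could swing across the full 0.8V supply quoted later in this article; the function names are hypothetical and nothing here comes from the paper itself.

```python
def levels(bits: int) -> int:
    """Number of distinguishable levels at a given bit resolution."""
    return 2 ** bits


def lsb_millivolts(full_scale_volts: float, bits: int) -> float:
    """Size of one step, in mV, if full scale is split into 2**bits levels."""
    return full_scale_volts / levels(bits) * 1000.0


# Assumption: the signal swing equals a 0.8V supply.
for bits in (4, 8):
    print(f"{bits} bits -> 1:{levels(bits)}, step of {lsb_millivolts(0.8, bits):.3f} mV")
```

Under that assumption, 8-bit accuracy leaves only about 3mV per step, which illustrates why encoding values as pulse-widths rather than voltages is attractive at low supply voltages.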
As an example of the mixed-signal processing within a time-domain neuron (see ‘hidden’ layer below), a time-domain multiply-and-accumulate (MAC) is implemented in a 21-bit counter which multiplies the 6-bit input from a pre-synaptic neuron by the 6-bit weight of the synapse.
The counter’s input is a pulse whose width is proportional to the input value, and the counter is clocked by a frequency proportional to the learned weighting, with the result that the count is proportional to one multiplied by the other. Using an up/down counter allows negative values of input to be accommodated.
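The counting scheme described above can be sketched as a behavioural model in Python. This is an illustration under stated assumptions, not the chip's circuit: time is quantised into units, the oscillator is assumed to emit exactly |weight| clock edges per time unit, signed weights are assumed to steer the count direction in the same way as signed inputs, and the function name is hypothetical.

```python
def time_domain_mac(inputs, weights, counter=0):
    """Behavioural model of a time-domain multiply-and-accumulate.

    Each input becomes a pulse lasting |x| time units; the oscillator is
    assumed to produce |w| clock edges per time unit. Counting edges while
    the pulse is high therefore adds |x| * |w| to the counter.
    """
    for x, w in zip(inputs, weights):
        pulse_width = abs(x)       # pulse duration, in time units
        edges_per_unit = abs(w)    # oscillator clock edges per time unit
        # Up/down control: count up when the product is positive, down otherwise
        # (assumption: the sign of the weight is handled like the sign of the input).
        step = 1 if (x >= 0) == (w >= 0) else -1
        for _ in range(pulse_width):          # while the pulse is high...
            for _ in range(edges_per_unit):   # ...each clock edge moves the counter
                counter += step
    return counter


# Two synapses: 3 * 5 counted up, then 2 * 4 counted down.
print(time_domain_mac([3, -2], [5, 4]))
```

The accumulated count matches the ordinary sum of products, while no explicit multiplier appears anywhere in the loop. With 6-bit operands, each product is at most 63 × 63 = 3,969, so a 21-bit counter leaves headroom to accumulate many synapses.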
This looks fairly digital up to here, but the weighting-to-frequency oscillator appears to be based on binary-weighted current sources – implemented as memory-in-logic to reduce data movement.
“The energy to perform a MAC is proportional to the magnitude of the operands and hence the importance of the computation in the neural network, a feature inherent in the brain but missing in digital logic,” said the researchers. Worst-case energy is 1.25pJ/MAC at 0.8V.
A detailed description of forward and training (reverse) paths was provided at the conference. Loosely, the chip takes in distance information from three forward-facing ultrasonic sensors (dead-ahead, at 2 o’clock and 10 o’clock). The feed-forward path is a three-layered neural network (input, ‘hidden’ and output) with the network sizes and bit-widths optimised for minimum power executing the chosen algorithm.
There are 84 time-domain neurons in the hidden layer, each performing a weighted sum of the inputs. Using an activation function, each neuron produces pulses that are re-transmitted via stochastic synapses to the output layer of neurons. The output layer, after a winner-takes-all comparator, produces straight-ahead, left or right commands.
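The feed-forward path described above can be sketched loosely in Python. Several details are assumptions, since the article does not give them: the activation function is taken to be a ReLU, each stochastic synapse is modelled as passing its pulse with a fixed probability (the hypothetical parameter p_transmit), and the ordering of the three output commands is arbitrary.

```python
import random

COMMANDS = ("straight", "left", "right")  # assumed ordering of the outputs
N_INPUTS, N_HIDDEN = 3, 84                # three sensors, 84 hidden neurons


def forward(distances, w_hidden, w_out, rng, p_transmit=0.5):
    """One inference pass: sensors -> hidden layer -> stochastic synapses -> winner."""
    # Hidden layer: weighted sum of the sensor readings, then ReLU (assumed activation).
    hidden = [max(0.0, sum(w * d for w, d in zip(row, distances))) for row in w_hidden]
    # Stochastic synapses (assumed Bernoulli model): each hidden-to-output
    # connection passes its pulse with probability p_transmit.
    scores = []
    for row in w_out:
        total = 0.0
        for w, h in zip(row, hidden):
            if rng.random() < p_transmit:
                total += w * h
        scores.append(total)
    # Winner-takes-all: the output neuron with the largest score issues the command.
    winner = max(range(len(scores)), key=scores.__getitem__)
    return COMMANDS[winner], scores


rng = random.Random(0)  # seeded so that runs are repeatable
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
w_out = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(len(COMMANDS))]
command, scores = forward([0.2, 0.9, 0.4], w_hidden, w_out, rng)
print(command)
```

Because the synapses are random, repeated passes with the same sensor readings can produce different commands, which gives the network a built-in way to explore alternative actions during learning.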
Weightings are updated by back-propagation based on reward criteria – avoiding an obstacle, for example.
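As a rough illustration of reward-driven weight updates, the Python sketch below shows one textbook Q-learning step with a linear value function. It is not the chip's training circuit, which the article does not detail: the learning rate alpha, the discount factor gamma, the feature vectors and the function names are all hypothetical.

```python
def q_values(weights, features):
    """Estimated value of each action: one dot product per action."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def q_learning_step(weights, features, action, reward, next_features,
                    alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the error that drove it."""
    # Target: the reward received plus the discounted best value of the next state.
    target = reward + gamma * max(q_values(weights, next_features))
    td_error = target - q_values(weights, features)[action]
    # For a linear model the gradient of Q(s, a) with respect to weights[action]
    # is simply the feature vector, so only the chosen action's row is adjusted.
    weights[action] = [w + alpha * td_error * f
                       for w, f in zip(weights[action], features)]
    return td_error


# Two actions, two features; a reward of 1.0 for taking action 0.
weights = [[0.0, 0.0], [0.0, 0.0]]
error = q_learning_step(weights, [1.0, 0.0], action=0, reward=1.0,
                        next_features=[0.0, 1.0])
print(error, weights)
```

In a multi-layer network, the same error term would be propagated backwards through the layers to adjust the earlier weights as well.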
Peak energy efficiency is at 0.8V, with 690pJ/inference and 1.5nJ/training cycle.
ISSCC paper 7.4
A 55nm Time-domain mixed-signal neuromorphic accelerator with stochastic synapses and embedded reinforcement learning for autonomous micro-robot.
The IEEE’s annual International Solid-State Circuits Conference is the place where the world’s companies and universities gather to show off their chip-based circuit developments, and where attending engineers get a first glimpse of the state-of-the‑art in digital, analogue, power and RF design techniques.