10bn Flops using logs

Roy Rubinstein  
Motorola has unveiled a 32-bit logarithmic co-processor aimed at 3D graphics and video processing applications.
The attraction of using logarithms as the Beta device's internal number format is that multiplications and divisions become single-cycle additions and subtractions. By comparison, Motorola cites a 32-bit divide on the PowerPC 603 processor, which takes between 18 and 33 cycles. A square root operation can take the 603 over 100 cycles.
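To illustrate why a logarithmic format makes these operations cheap, here is a minimal Python sketch. It is not Motorola's implementation: the sign-plus-base-2-log encoding and the function names (`to_log`, `from_log`, `log_mul`, `log_div`, `log_sqrt`) are assumptions made for illustration, and it uses ordinary double-precision floats rather than Beta's 32-bit format.

```python
import math

# Hypothetical encoding: a nonzero real x is stored as (is_negative, log2|x|).
# Zero has no logarithm, so a real design needs a special code for it;
# this sketch simply rejects it.

def to_log(x):
    """Encode a nonzero real number in the log domain."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a log format")
    return (x < 0, math.log2(abs(x)))

def from_log(v):
    """Decode a log-domain value back to a real number (the antilog)."""
    is_negative, exponent = v
    magnitude = 2.0 ** exponent
    return -magnitude if is_negative else magnitude

def log_mul(a, b):
    """Multiply: add the logs, XOR the signs."""
    return (a[0] != b[0], a[1] + b[1])

def log_div(a, b):
    """Divide: subtract the logs, XOR the signs."""
    return (a[0] != b[0], a[1] - b[1])

def log_sqrt(a):
    """Square root: halve the log (defined only for positive values)."""
    if a[0]:
        raise ValueError("square root of a negative number")
    return (False, a[1] / 2)

if __name__ == "__main__":
    print(from_log(log_mul(to_log(6.0), to_log(-7.0))))
    print(from_log(log_div(to_log(1.0), to_log(8.0))))
    print(from_log(log_sqrt(to_log(16.0))))
```

The costly step in such a scheme is the conversion into and out of the log domain, not the arithmetic itself, which is why it pays off most when many multiplies happen between conversions.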
This, claims ShaoWei Pan, one of the Beta’s designers, gives the device significant advantages in processing performance and power consumption over conventional processors.
The Beta's Formatter converts almost any data format into the 32-bit internal logarithm domain. Motorola claims that the absolute error of a logarithmic operation is less than one least significant bit in IEEE 32-bit floating-point format. The data is then passed to any number of the 64 on-chip compute units, each with 128 x 32 bits of local store.
The units are also linked via a local data bus. This enables data to be passed between the units during such operations as digital filtering and correlations.
According to Motorola, with one instruction Beta can implement a 64 by 64 filter bank or a 4096-tap finite impulse response filter, a 256-point or greater fast Fourier transform, or large matrix computations. The average sustained processing performance is of the order of 10 billion floating-point operations per second.
The results from the compute units are converted back into real values using one of the 32 anti-logarithm (Alog) units. If required, the results from the Alog units are accumulated using an adder-chain structure and passed to the reformatter which converts the data into the required format.
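The dataflow described above (format into logs, operate in the log domain, take the antilog, then accumulate) can be sketched for a small finite impulse response filter. This is an illustrative Python model only: the function name `fir_log_domain`, the sign-plus-base-2-log encoding and the treatment of zero are assumptions, and nothing here reflects how Beta's compute units are actually programmed.

```python
import math

def fir_log_domain(samples, coeffs):
    """FIR filter whose multiplies are performed as additions of base-2 logs."""

    def encode(x):
        # "Formatter" stage: nonzero x becomes (is_negative, log2|x|).
        # Zero has no logarithm, so this sketch marks it with None.
        return None if x == 0 else (x < 0, math.log2(abs(x)))

    log_x = [encode(x) for x in samples]
    log_h = [encode(h) for h in coeffs]

    output = []
    for n in range(len(samples)):
        acc = 0.0  # linear-domain accumulator, standing in for the adder chain
        for k, h in enumerate(log_h):
            if n - k < 0:
                break  # no earlier samples available
            x = log_x[n - k]
            if x is None or h is None:
                continue  # a zero operand contributes nothing
            log_product = x[1] + h[1]        # multiply = add the logs
            magnitude = 2.0 ** log_product   # antilog stage
            acc += -magnitude if x[0] != h[0] else magnitude
        output.append(acc)
    return output

if __name__ == "__main__":
    print(fir_log_domain([1.0, 2.0, 4.0, 0.0], [0.5, -0.25]))
```

Addition has no cheap equivalent in the log domain, which is consistent with the article's description of converting products back to real values before they are summed.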
Motorola claims that the Beta chip is ideal as a DSP co-processor for PC or embedded environments. It has already been implemented on a board containing an MPC823 processor. For certain wireless communication applications, the overall design achieved a 100-fold speed-up over using the MPC823 alone.
The Beta chip is clocked at 120MHz, as are the input and output buses. The sustained data processing rate of the Beta can be as high as 480Mbyte/s. Power consumption reaches 3W when all 64 compute units are being exercised.
