Show report – ARC enters signal processing market

Richard Ball
From its North London base, ARC is starting to make inroads into the embedded processor market.
At the Forum, the company is announcing version three of its processor architecture, which adds DSP extensions and debug support for multiple processors, and reduces power consumption.
Concentrating on the DSP extensions, EW spoke to James Hakewill, chief architect of the 32-bit processor.
“In version three, we’re announcing an integrated package of extensions,” said Hakewill. This includes a multiply-accumulate (MAC) unit, a block of XY memory, an improved instruction cache and a library of functions.
The previous ARC had a 16-bit multiplier, whose size limited its applications. The new 32-bit MAC is far more useful and can be configured as either a single 24-bit or a dual 16-bit unit. It can address applications such as audio, mobile phones, Dolby stereo or modems.
To work as a dual MAC, two words of 16-bit data are packed into a 32-bit word. Two 16-bit MACs can be issued on every cycle.
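The packing scheme can be illustrated with a small Python model. This is a sketch rather than ARC code: the names `pack16x2` and `dual_mac` are hypothetical, and both the half-word ordering (first sample in the low 16 bits) and the use of two separate accumulators are assumptions, since the operand layout and accumulator width of the real instruction are not given here.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack16x2(lo, hi):
    """Pack two signed 16-bit samples into one 32-bit word (lo in bits 0-15).

    The half-word ordering is an assumption made for this sketch.
    """
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def dual_mac(acc_lo, acc_hi, x_word, y_word):
    """Model one issue slot: two independent 16x16 multiply-accumulates."""
    acc_lo += to_signed16(x_word) * to_signed16(y_word)
    acc_hi += to_signed16(x_word >> 16) * to_signed16(y_word >> 16)
    return acc_lo, acc_hi


def dot_product(xs, ys):
    """Dot product of two even-length sample lists, two samples per step."""
    acc_lo = acc_hi = 0
    for i in range(0, len(xs), 2):
        x_word = pack16x2(xs[i], xs[i + 1])
        y_word = pack16x2(ys[i], ys[i + 1])
        acc_lo, acc_hi = dual_mac(acc_lo, acc_hi, x_word, y_word)
    return acc_lo + acc_hi
```

In this model a four-sample dot product takes two `dual_mac` steps instead of four single multiply-accumulates, which is the halving of the inner loop that packed operands are meant to deliver.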
Providing data for the MAC is an XY memory. “The XY memory is something that will be very familiar to DSP programmers.”
The XY memory is configurable in size up to 16kbyte and can be arranged in up to four banks. “There are multiple banks for context switching and also background DMA transfers to a bank that is not in context,” Hakewill said.
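The pattern Hakewill describes resembles double buffering: the processor works on the bank in context while a DMA transfer fills another. The Python model below is a minimal sketch of that idea only. The class `XYMemoryModel`, its methods and the two-bank arrangement are hypothetical, and it assumes equal-sized blocks; it says nothing about how the real hardware selects banks or schedules transfers.

```python
class XYMemoryModel:
    """Toy model of banked memory: one bank in context, the rest free for DMA."""

    def __init__(self, num_banks=4, words_per_bank=8):
        self.banks = [[0] * words_per_bank for _ in range(num_banks)]
        self.active = 0  # index of the bank the processor currently sees

    def dma_fill(self, bank, data):
        """Background transfer, allowed only into a bank that is not in context."""
        if bank == self.active:
            raise ValueError("DMA target must not be the bank in context")
        self.banks[bank][:len(data)] = data

    def switch_context(self, bank):
        """Make a different bank visible to the processor."""
        self.active = bank

    def read(self, index):
        return self.banks[self.active][index]


def process_blocks(blocks):
    """Sum each equal-sized block while the next one loads into the idle bank."""
    size = len(blocks[0])
    mem = XYMemoryModel(num_banks=2, words_per_bank=size)
    mem.dma_fill(1, blocks[0])  # preload the bank that is not in context
    mem.switch_context(1)
    results = []
    for n in range(len(blocks)):
        idle = 1 - mem.active
        if n + 1 < len(blocks):
            # On hardware this transfer would overlap the computation below.
            mem.dma_fill(idle, blocks[n + 1])
        results.append(sum(mem.read(i) for i in range(size)))
        mem.switch_context(idle)
    return results
```

Because the model is sequential, it shows only the ordering constraint (never fill the bank in context), not the time saved by overlapping transfer and computation.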
The nature of DSP code has led to an update of the instruction cache. “In release two we had a direct mapped cache,” said Hakewill. “But in DSP applications you want to lock an inner loop into the cache.”
This just doesn’t work with a direct mapped cache, a problem ARC originally solved with an option to split the cache into two, one half of which could be locked.
For flexibility, the new core uses a set associative cache, which can be two, four or eight way. Again, size and line length can be configured.
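Why associativity matters for an inner loop can be seen in a toy cache model. The sketch below is not a description of ARC’s cache: the class `CacheModel`, the 16-byte line, the 64-line size and the LRU replacement policy are all assumptions chosen for illustration, and line locking is not modelled. It shows only that two addresses which evict each other in a direct mapped cache can coexist once a set holds more than one line.

```python
from collections import OrderedDict


class CacheModel:
    """Toy cache with LRU replacement; ways=1 behaves as direct mapped."""

    def __init__(self, num_lines, ways, line_bytes=16):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = num_lines // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return False


def count_misses(cache, addresses, passes):
    """Replay the same address sequence several times and count the misses."""
    return sum(not cache.access(a) for _ in range(passes) for a in addresses)


# Two code addresses exactly one cache size (64 lines x 16 bytes) apart.
loop_addresses = [0x0000, 0x0400]
direct_mapped_misses = count_misses(CacheModel(64, ways=1), loop_addresses, 10)
two_way_misses = count_misses(CacheModel(64, ways=2), loop_addresses, 10)
```

With these assumed parameters the direct mapped model misses on every access, while the two-way model misses only on the first pass.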
Debugging, the bane of an embedded developer, has been improved. The single JTAG interface can debug up to eight ARCs on one chip. The Metaware debugger has been updated to support this.
Hardware breakpoints and watchpoints can halt the processor or trigger an interrupt which transfers data before restarting the application code.
A set of library functions is also freely available. They include FIR and IIR filters, fast Fourier transform, powers, logs, square root and matrix multiply or add. The assembly code functions can be called from C or C++.

ARC’s DSP extensions at a glance
32 x 32 multiply accumulate block configured as single 24-bit or dual 16-bit
XY memory has multiple banks
Power consumption reduced by 30 per cent, 0.5mW/MHz on 0.25µm CMOS
Single JTAG interface debugs up to eight ARCs

In terms of performance, ARC compares its processor against an Oak DSP core. With both running at 80MHz, Hakewill claims the ARC performs an FIR filter 1.7 times faster and a 256-point complex FFT 20 per cent faster than the Oak.
“And an ARC can run twice as fast on a 0.25µm process,” Hakewill claimed.
Power consumption has also been reduced, by up to 30 per cent over the previous version. “We’re also adding a sleep mode which removes clocks from the processor and restarts on an interrupt,” Hakewill said.
The consumption of the basic processor is now around 0.5mW/MHz using a bulk CMOS 0.25µm process. Using the low power 1.2V process from Xemics, this drops to less than 50µW/MHz.
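To put the per-MHz figures in context, they can be multiplied out at a given clock rate. The short calculation below is illustrative only: it assumes consumption scales linearly with frequency and borrows the 80MHz clock from the benchmark comparison, although nothing here states that the Xemics process reaches that speed. The function name `core_power_mw` is hypothetical.

```python
def core_power_mw(mw_per_mhz, clock_mhz):
    """Estimate core power in mW, assuming linear scaling with clock frequency."""
    return mw_per_mhz * clock_mhz


CLOCK_MHZ = 80  # assumed: the clock rate used in the benchmark comparison

# Bulk 0.25 micron CMOS at about 0.5 mW/MHz: roughly 40 mW.
bulk_cmos_mw = core_power_mw(0.5, CLOCK_MHZ)

# Low power 1.2V process at under 50 uW/MHz (0.05 mW/MHz):
# an upper bound of roughly 4 mW.
low_power_bound_mw = core_power_mw(0.05, CLOCK_MHZ)
```

On these assumptions the low power process cuts the estimate by about a factor of ten.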
