Show report – MIPS takes instruction

Richard Ball
MIPS Technologies has announced a complete revamp of its microprocessor architectures, reducing the number of instruction sets from five to two.
MIPS32 combines the existing MIPS I and II instruction set architectures (ISAs). It also takes an improved memory model from the R4000 and R5000 processors, plus an exception model.
MIPS64 covers ISAs III, IV and V, adding 64-bit floating point and a single instruction, multiple data (SIMD) capability.
The company has also unveiled the first three processors based on the new ISAs. Based on MIPS32, the 4Kc was previously codenamed Jade. Using the MIPS64 ISA are the 5Kc, which was called Opal, and the 20Kc, formerly Ruby.
At the Embedded Processor Forum, MIPS described in detail the 4Kc (Jade) core and 4Kp, a ‘petite’ version.
“4Kc is the first implementation of the MIPS32 architecture and is optimised for system-on-chip applications,” said Bruno Kajiyama, chip architect at MIPS.
For example, the single issue, five stage pipeline core uses one clock, and all logic is clocked on the rising edge. “This makes it easy to insert scan and add customer blocks,” Kajiyama said. “And we have restructured the whole pipeline so the core can be synthesised.”
Gated clocks are used to remove power to areas of the core not in use. “It also cuts the clock to parts of the pipeline when there is a stall,” said Kajiyama.
The 4Kc implements a single cycle, 32×16-bit multiply-accumulate (MAC) unit. 32×32-bit multiplies have a two cycle throughput.
Cache memory is available from 1kbyte to 16kbyte for both instructions and data. The caches can be direct mapped, or two-way or four-way set associative.
As with the majority of compact 32-bit embedded cores these days, 4Kc is available in both synthesisable and physical formats (soft and hard).
The former, said Kajiyama, does not increase power too much, but does come with a clock speed penalty. In a typical 0.25µm process, a soft core will run at 150MHz, he said. The hard macro, on the other hand, will reach 225MHz.
For either version, power is claimed to be under 2mW/MHz on a core using 8kbyte instruction and data caches. A sleep mode reduces power requirements to less than 100µW, said Kajiyama.
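To put the quoted figures in context, the short sketch below turns the "under 2mW/MHz" claim into an upper bound on total core power at the two clock speeds mentioned. It assumes power scales linearly with clock frequency at that rate, which the article does not state; the function and constant names are illustrative only.

```python
# Upper-bound power estimate from the figures quoted in the article.
# Assumption (not from the article): power scales linearly with clock
# frequency at the quoted "under 2mW/MHz" rate.

MAX_MW_PER_MHZ = 2.0  # claimed ceiling with 8kbyte instruction and data caches


def max_power_mw(clock_mhz: float, mw_per_mhz: float = MAX_MW_PER_MHZ) -> float:
    """Return the upper bound on core power in milliwatts."""
    return clock_mhz * mw_per_mhz


soft_core_mw = max_power_mw(150)  # soft core clock quoted for a 0.25µm process
hard_core_mw = max_power_mw(225)  # hard macro clock quoted for the same process

print(f"Soft core at 150MHz: under {soft_core_mw:.0f}mW")
print(f"Hard macro at 225MHz: under {hard_core_mw:.0f}mW")
```

On these assumptions the soft core would stay under 300mW and the hard macro under 450mW.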
Also announced at the Forum was the 4Kp, which uses a low cost multiply unit, taking 32 cycles for a MAC operation. Block address translation (BAT) is used for memory protection, which is simpler and smaller than the 4Kc’s translation lookaside buffer (TLB). Some of the embedded JTAG features for on-chip debug have been removed.
Both cores can achieve the same clock speeds, but “the main difference is in die size”, said Kajiyama. Without cache, the 4Kc occupies about 3mm² of silicon, rising to 10mm² with 8kbyte each of instruction and data cache.
The 4Kp is around 2mm² without cache, while power drops below 1mW/MHz, again without cache. Obviously, its slower MAC reduces performance on signal processing applications.
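As a rough illustration of that signal processing penalty, the sketch below compares the MAC cycles a filter kernel would need on each core, using the one-cycle and 32-cycle figures quoted above. It assumes one 32×16-bit MAC per filter tap and ignores loads, loop overhead and stalls; the 64-tap filter and all names are hypothetical, not taken from MIPS.

```python
# MAC cycle counts for a MAC-bound kernel, using the per-MAC figures
# quoted in the article. Assumptions (not from the article): one MAC per
# tap, and no load, loop or stall overhead.

MAC_CYCLES_4KC = 1   # single cycle 32x16-bit MAC unit
MAC_CYCLES_4KP = 32  # low cost multiply unit


def mac_cycles(taps: int, cycles_per_mac: int) -> int:
    """Cycles spent in MAC operations for one output sample."""
    return taps * cycles_per_mac


def time_us(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_mhz


TAPS = 64          # hypothetical filter length
CLOCK_MHZ = 150.0  # soft core clock; both cores reach the same speed

for name, per_mac in (("4Kc", MAC_CYCLES_4KC), ("4Kp", MAC_CYCLES_4KP)):
    cycles = mac_cycles(TAPS, per_mac)
    print(f"{name}: {cycles} MAC cycles, {time_us(cycles, CLOCK_MHZ):.2f}us per sample")
```

With these assumptions, the 64-tap example needs 64 MAC cycles per output sample on the 4Kc against 2,048 on the 4Kp. Real code would narrow the gap, since non-MAC instructions cost the same on both cores.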

