
Multi-core processing is the future. Everyone says so. So why is it not moving more quickly into the industry mainstream?
“Von Neumann is a poor use of scaling; all the energy is going on the communication between the processor and the memory. It’s much better to use 20 microprocessors running at 100MHz than one at 2GHz,” says IMEC co-founder Professor Hugo De Man.
“Five of the top ten supercomputers use massively parallel processing using Blue Gene,” says Dr Bernie Meyerson, CTO of IBM.
Although everyone accepts that multiple cores running at low clock speeds get around the power-density brick wall hit by single processors running at high clock speeds, designers are finding multi-core much more difficult to implement than expected.
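The power argument behind De Man’s 20 × 100MHz versus 1 × 2GHz comparison can be sketched with the usual CMOS switching-power approximation, P = activity × C × V² × f. The sketch below is illustrative only: the capacitance and supply voltages are hypothetical numbers, it assumes slower cores can run at a lower voltage, and it ignores leakage and the memory-traffic energy De Man highlights. It also assumes the work splits perfectly across all 20 cores, which is exactly the software problem discussed later.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hz: float, activity: float = 1.0) -> float:
    """Switching (dynamic) power of CMOS logic: P = activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hz


# Hypothetical figures, chosen only for illustration.
C_PER_CORE = 1e-9  # switched capacitance per core in farads (assumed equal for every core)

single_fast = dynamic_power(C_PER_CORE, v_volts=1.2, f_hz=2e9)       # 1 core at 2GHz, assumed 1.2V
many_slow = 20 * dynamic_power(C_PER_CORE, v_volts=0.6, f_hz=100e6)  # 20 cores at 100MHz, assumed 0.6V

# Both configurations offer the same aggregate 2e9 clock cycles per second.
print(f"one core: {single_fast:.2f} W")           # one core: 2.88 W
print(f"20 cores: {many_slow:.2f} W")             # 20 cores: 0.72 W
print(f"ratio: {many_slow / single_fast:.2f}")    # ratio: 0.25
```

Because voltage enters squared, even a modest voltage reduction at low clock speed outweighs the cost of adding cores, under these assumptions.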
[Image: Alan Gatherer]
“There’s a lot of snake oil around in the multi-core business,” says Alan Gatherer, CTO for communication infrastructure at Texas Instruments.
“I’m not claiming we understand the secret. It varies from application to application. I’m not sure anyone knows how to build a generic multi-core architecture. It’s a great goal, but the chances of failure are 100 per cent.”
Gatherer’s point is that multi-core delivers performance when it is targeted at a specific application, where you know what each part of the chip will be doing. It becomes a nightmare when you try to produce a microprocessor that can be programmed for many different applications, which, of course, is supposed to be the whole point of a microprocessor.
“The theoretical argument for parallelism is very compelling but has never been realised. You put a vast array of microprocessors down, and you get terrific performance,” says Chris Rowen, CEO of Tensilica. “It is so cheap to build microprocessors nowadays you can build in redundant microprocessors.
“We have had clients who have put in lots of redundant microprocessors because it doesn’t cost anything to do. But while the theoretical argument for parallelism grows in strength, why has it so often failed in the past?”
[Image: Transputer]
The most prominent failure was Inmos, the UK government-backed start-up founded in 1978, which developed the Transputer: a microprocessor with its own on-chip memory and serial links, designed to be connected to other Transputers to build arrays of almost any size.
Although used for some niche applications such as radar and fingerprint checking, the Transputer never became mainstream.
However, Inmos spawned a raft of UK companies specialising in multi-processing. Two of them, based in Inmos’ old home town of Bristol, are PicoChip and ClearSpeed. “The reason why Bristol is such a strong cluster in parallel processing, and why the UK developed real expertise in this space, has a lot to do with the funding of Inmos and the development of the Transputer,” says Tom Beese, CEO of ClearSpeed.
PicoChip’s devices process radio algorithms, while ClearSpeed’s handle maths problems that can be chopped into small pieces and run on many processors at once.
ClearSpeed’s chip is an array of 96 processing elements designed for 64-bit double-precision floating-point maths. Each element has an integer ALU, a dual-issue 64-bit FPU, 6Kbytes of local memory and a multiply-accumulate (MAC) unit. The array is clocked at 250MHz and delivers a sustained 25Gflops at less than 10W. ClearSpeed has worked with both Intel and AMD on multi-core implementations.
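The quoted ClearSpeed figures can be turned into two derived numbers: how much sustained floating-point work each element does per clock, and the energy efficiency. The sketch below uses only the figures in the paragraph above; the helper names are hypothetical, and since power is quoted as “less than 10W”, the efficiency it computes is a lower bound.

```python
def flops_per_pe_per_cycle(sustained_flops: float, n_pes: int, clock_hz: float) -> float:
    # Average floating-point operations each processing element completes per clock.
    return sustained_flops / (n_pes * clock_hz)


def flops_per_watt(sustained_flops: float, watts: float) -> float:
    # Energy efficiency: floating-point operations per second per watt.
    return sustained_flops / watts


# Figures quoted for the ClearSpeed array.
SUSTAINED = 25e9     # 25Gflops sustained
N_PES = 96           # processing elements
CLOCK = 250e6        # 250MHz
POWER_BOUND = 10.0   # "less than 10W", so efficiency is at least this good

per_pe = flops_per_pe_per_cycle(SUSTAINED, N_PES, CLOCK)
efficiency = flops_per_watt(SUSTAINED, POWER_BOUND)
print(f"{per_pe:.2f} flops per element per cycle")         # 1.04 flops per element per cycle
print(f"at least {efficiency / 1e9:.1f} Gflops per watt")  # at least 2.5 Gflops per watt
```

In other words, the sustained figure corresponds to roughly one double-precision operation per element per clock across all 96 elements.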
PicoChip’s architecture is tuned for WiMAX and HSDPA applications. The PC102 device has more than 300 processors, which run at 160MHz and between them deliver 200Ginstructions/s and 40GMAC/s. Unlike the ClearSpeed device, which relies on every processor doing the same thing at once (single instruction, multiple data, or SIMD), the PC102 is multiple instruction, multiple data (MIMD), so every processor can be doing something different.
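The SIMD/MIMD distinction comes down to how many instruction streams there are. The toy model below is a sketch, not a model of either chip: each list entry stands for one processor’s local data, and the functions `simd_step` and `mimd_step` are hypothetical names. It runs sequentially in Python; the point is only who decides what each processor does in a given step.

```python
from typing import Callable, List, Sequence

Op = Callable[[float], float]


def simd_step(op: Op, local_data: Sequence[float]) -> List[float]:
    # SIMD: a single instruction stream. Every processing element applies
    # the same operation to its own local value in the same step.
    return [op(x) for x in local_data]


def mimd_step(programs: Sequence[Op], local_data: Sequence[float]) -> List[float]:
    # MIMD: one instruction stream per processor. Processor i runs its own
    # program on its own data, so each can be doing something different.
    if len(programs) != len(local_data):
        raise ValueError("need exactly one program per processor")
    return [program(x) for program, x in zip(programs, local_data)]


data = [1.0, 2.0, 3.0, 4.0]
print(simd_step(lambda x: 2 * x, data))  # [2.0, 4.0, 6.0, 8.0]
print(mimd_step([lambda x: x + 1, lambda x: x * x, abs, lambda x: -x], data))  # [2.0, 4.0, 3.0, -4.0]
```

SIMD suits maths that chops into identical pieces, while MIMD lets a radio pipeline assign different tasks to different processors, at the cost of coordinating many separate programs.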
At the applications level, multi-processing is well understood and has been used for years. “TI has been a multi-core house for years and years. Customers have taken multiple DSPs and put them on a board and expected them to work together,” says Gatherer. “Now process technologies allow us to put several cores on a chip. It’s not so much a strategy, it’s a natural progression. We do it where it makes sense for the application.”
“The majority of ARM designs are multiprocessor chips,” says John Goodacre, ARM’s programme manager for multi-processing. “We have the tools to support that. It’s nothing new. The reason people do multi-processing is because specialist cores can be more power efficient than one big core.”
[Image: Cell, eight-way processing]
So multi-processing with various specialist cores, targeted at a specific application, is not the problem. For instance, Cell, the IBM/Toshiba/Sony microprocessor, is aimed at vector graphics processing and has a peak throughput of about a quarter of a trillion floating-point operations per second.
Producing a specialist super-cruncher is not the problem. The problem arises with chips built from multiple identical cores that are intended to work together as a general-purpose processor.
“We can produce a chip with a lot of theoretical MIPS, but it won’t be very programmable; the difficulty is partitioning the programming across the cores,” says TI’s Gatherer. “Our customers tell us that, if you have to partition an algorithm across multiple cores and expect them all to talk to each other in real-time, that’s a hard problem.”
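Partitioning in its simplest form means splitting the input into chunks, giving each core one chunk, and then combining the partial results. The sketch below shows that easy, embarrassingly parallel case; the function names are hypothetical and worker threads stand in for cores. Under CPython’s global interpreter lock these threads do not actually run the arithmetic in parallel, so this illustrates the structure rather than a speed-up. Gatherer’s hard case is when cores must also exchange intermediate results with each other in real-time, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(data: Sequence[float], n_cores: int) -> List[Sequence[float]]:
    """Split data into n_cores contiguous chunks whose sizes differ by at most one."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    size, extra = divmod(len(data), n_cores)
    chunks, start = [], 0
    for i in range(n_cores):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk: Sequence[float]) -> float:
    # Work done independently by one "core" on its own chunk.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: Sequence[float], n_cores: int) -> float:
    chunks = partition(data, n_cores)
    # Each chunk is handed to a separate worker thread.
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # Communication step: the workers' partial results are gathered and combined.
    return sum(partials)


print(parallel_sum_of_squares(list(range(10)), 3))  # 285
```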
That problem has to be solved by programmers, and everyone seems to agree that programmers are entrenched in a multi-core-resistant mind-set, which is the main hindrance to the wider adoption of multi-core architectures.
“The software guys don’t move very fast. That’s why multi-core processing looks as if it will be a long term play,” says ARM’s Goodacre. “Multi-threading was a stop-gap, but it doesn’t add anything, and can make software very difficult to write.”
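One reason multi-threaded code is hard to write is shared state. Incrementing a shared counter looks like one step but is a read-modify-write, so two threads can read the same old value and one update is lost. The sketch below, with hypothetical names, shows the standard fix of guarding the update with a lock; every such shared variable needs the same care, which is where the difficulty Goodacre describes comes from.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self.value += 1` reads, adds and writes back. Without the lock,
        # two threads may interleave those steps and lose an update.
        with self._lock:
            self.value += 1


def run(n_threads: int, n_increments: int) -> int:
    counter = Counter()

    def worker() -> None:
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the result
    return counter.value


print(run(4, 10_000))  # 40000
```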
[Image: Peter Claydon]
Peter Claydon, founder and COO of PicoChip, reckons: “A lot of the parallel processing start-ups have failed because they try to use a sequential programming language like C. The mentality of programmers is to expect everything to be serial. Multi-core is here to stay, but it needs a new way of thinking. I have talked to people who are thinking of doing start-ups to produce multi-core programming tools.”
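Claydon’s point about the serial mentality shows up in something as simple as a running total. Written the usual serial way, each step depends on the previous one, so the loop cannot just be dealt out to different cores. The sketch below contrasts that with a blocked two-pass prefix sum, a well-known restructuring for parallel hardware; it is written sequentially in Python, and the function names are hypothetical. In the restructured version, each chunk in pass 1 and each fix-up in pass 2 is independent and could run on its own core.

```python
from itertools import accumulate
from typing import List


def running_total(xs: List[int]) -> List[int]:
    # Serial habit: out[i] depends on out[i-1], a loop-carried dependency,
    # so the iterations cannot simply be handed to different cores.
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def chunked_running_total(xs: List[int], n_chunks: int) -> List[int]:
    # Restructured for parallel hardware: a two-pass blocked scan.
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not xs:
        return []
    step = -(-len(xs) // n_chunks)  # ceiling division: elements per chunk
    chunks = [xs[i:i + step] for i in range(0, len(xs), step)]
    # Pass 1: each chunk's local running total is independent of the others.
    local = [list(accumulate(chunk)) for chunk in chunks]
    # Pass 2: each chunk's offset is the sum of all earlier chunks
    # (an exclusive scan over the chunk totals); the fix-ups are independent.
    totals = [part[-1] for part in local]
    offsets = [0] + list(accumulate(totals))[:-1]
    return [v + off for part, off in zip(local, offsets) for v in part]


data = [1, 2, 3, 4, 5, 6, 7]
print(running_total(data))             # [1, 3, 6, 10, 15, 21, 28]
print(chunked_running_total(data, 3))  # [1, 3, 6, 10, 15, 21, 28]
```

The two versions give the same answer, but only the second exposes work that separate cores could do at once, which is the kind of rethinking the article says programmers and tools have yet to make routine.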
“Customers say: ‘Be very careful with parallel architectures’. They ask us: ‘You don’t know how to program these massively parallel devices yourselves, do you?’” says TI’s Gatherer. “That’s one of the reasons why there’s been no traction for those companies doing massively parallel architectures. These companies program their own devices because no one else can.
“When they say: ‘We’ve got a reference design so you don’t need to program it yourself’, what they really mean is, ‘we’ve got a reference design because we know you’ll never be able to program it yourself’. But a lot of our customers feel they have their own competitive differentiation, which they can provide from programming the chip themselves. That’s what we provide.”
The goal is a programming tool which is understandable, accessible and readily usable by programmers, and which fully exploits the power of parallelism. No one is expecting this anytime soon.
So there is the problem. Multi-core is perceived as difficult to program, programming tools do not really exist, and the programmers are entrenched in a serial mind-set.
Tensilica’s Rowen lists the needs as: “New tools and software for application-centric energy management; a need for new tools, software and training for multi-processor programming; significant silicon improvements for lower voltage and capacitance, and automation in multi-processor partitioning and interconnect.”
That is a lot of problems but, as everyone agrees, multi-core is the future.