In 1999, Fred Pollack from Intel famously pointed out that single-core high-performance x86 processors had reached a complexity level where there was a steep relationship between power consumption and performance. Since then, dual-core processors with a simpler architecture have taken over the PC desktop.
Many of the same principles apply to embedded devices, just on a much smaller scale. A single-chip ASIC platform can now easily contain a multiprocessor cluster of small and efficient processors that do not have the inherent instruction-set complexity overhead of an x86.
Many people think that “multicore” means “more performance”, but in the embedded world it also means “more power efficient”.
As far as dynamic energy goes, running concurrent tasks on a quad-core does not necessarily use more energy than time-slicing them on a single core. This is because the processors can be placed into standby as soon as they have no work to do. It holds only on efficient, tightly coupled multicore designs that offer such immediate power-save modes, and only where the cost of utilising the multiple cores is less than the overhead of time-slicing on a single processor.
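As a rough illustration of this argument, the sketch below compares the dynamic energy of the same workload when it is time-sliced on one core and when it is spread over four cores that enter standby as soon as they finish. The function names and every figure (power per core, context-switch overhead, wake-up cost) are assumptions chosen for illustration, not measurements of any real device.

```python
# Toy dynamic-energy model. Every number used here is an illustrative assumption.

def single_core_energy(work_s, tasks, active_w, switch_overhead_s, switches):
    """Energy in joules when all tasks are time-sliced on one core."""
    busy_s = work_s * tasks + switch_overhead_s * switches
    return active_w * busy_s


def multicore_energy(work_s, tasks, active_w, wake_cost_j):
    """Energy in joules when each task runs on its own core, which then
    enters standby.

    Standby dynamic power is taken as zero. wake_cost_j is the assumed
    per-core cost of leaving standby and synchronising with the others.
    """
    return tasks * (active_w * work_s + wake_cost_j)


if __name__ == "__main__":
    single = single_core_energy(work_s=0.010, tasks=4, active_w=0.2,
                                switch_overhead_s=0.0001, switches=40)
    multi = multicore_energy(work_s=0.010, tasks=4, active_w=0.2,
                             wake_cost_j=0.0001)
    print(f"single core: {single * 1000:.2f} mJ")
    print(f"quad core:   {multi * 1000:.2f} mJ")
```

With these assumed figures the quad-core case comes out slightly ahead. Raising `wake_cost_j` far enough reverses the result, which is the condition the paragraph above describes.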
Static power consumption is also a major issue with embedded devices, as it affects the standby time. It is a particular problem with the latest small-geometry processes, where the laws of physics dictate that leakage must increase. Turning off areas of unused logic with power gating is increasingly important for embedded devices.
Multicore designs can offer the ability to trade off maximum performance against static power, by powering entire CPUs on or off as they are required. A dual-core system has 50% and 100% performance levels using one or two CPUs, with corresponding 50% and 100% static power consumption. In a quad-core system there are 25%, 50%, 75% and 100% performance levels.
So, during idle periods or periods of low activity, it makes a lot of sense to have only one processor active, to reduce the static power drain. Electrically, this is fairly straightforward, as each CPU is placed in a different power domain.
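The performance and static power levels described above can be tabulated with a few lines of code. This is a minimal sketch under simplifying assumptions: identical cores, one power domain per core, performance and leakage scaling linearly with the number of powered cores, and shared logic such as caches and interconnect ignored. The helper names are hypothetical.

```python
def power_levels(num_cores):
    """Return (active_cores, performance_pct, static_power_pct) per level.

    Assumes identical cores whose performance and leakage both scale
    linearly with the number of cores that are powered on.
    """
    levels = []
    for active in range(1, num_cores + 1):
        pct = 100 * active / num_cores
        levels.append((active, pct, pct))
    return levels


def cores_needed(load_pct, num_cores):
    """Smallest number of powered cores whose performance covers load_pct."""
    for active, perf, _static in power_levels(num_cores):
        if perf >= load_pct:
            return active
    return num_cores


if __name__ == "__main__":
    for active, perf, static in power_levels(4):
        print(f"{active} core(s): {perf:.0f}% performance, {static:.0f}% static power")
```

A power manager built on this idea would call something like `cores_needed` with the current load and gate off the remaining cores, so that a lightly loaded quad-core pays only a quarter of the full static power.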
Software design
A lot of traditional embedded software engineers are concerned about the complexity of writing software for multicore systems, which is understandable for engineers who have always worked on uniprocessor systems. The good news is most of the complexity can be handled by the operating system.
A symmetric multiprocessing (SMP) operating system provides a high-level “threading” API which makes it easy to control multiple cores. Even real-time performance can be guaranteed by dedicating a task to a specific processor.
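Dedicating a task to a specific processor might look like the sketch below on an SMP Linux system. It uses Python's `os.sched_setaffinity`, which is only available on Linux; the helper names `pin_current_thread` and `run_pinned` are hypothetical. Pinning alone is only one ingredient of real-time behaviour, so scheduling policy and priorities, which also matter, are left out here.

```python
import os
import threading


def pin_current_thread(cpu):
    """Try to restrict the calling thread to a single CPU (Linux only).

    Returns True if the affinity was set, False if the platform does not
    support it, the CPU is not available, or the call is refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        if cpu not in os.sched_getaffinity(0):
            return False
        # On Linux, pid 0 refers to the calling thread.
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return False
    return True


def run_pinned(cpu, task, *args):
    """Run task(*args) in a new thread pinned to `cpu`, and wait for it."""
    result = {}

    def worker():
        result["pinned"] = pin_current_thread(cpu)
        result["value"] = task(*args)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result


if __name__ == "__main__":
    outcome = run_pinned(0, sum, [1, 2, 3])
    print(outcome)
```

The operating system still schedules every other thread freely across the remaining cores, which is why the application code stays this small.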
By choosing a standard architecture and a standard operating system, the task can be made much simpler. Most OS vendors have announced or are working on SMP support, which is the easiest way to code for a multicore solution. Linux has good SMP support for many architectures, including ARM, and takes full advantage of new architectural features such as power-efficient spinlocks, a thread ID register, and memory regions that support re-ordering of memory accesses, which together allow full performance to be achieved.
Even programmers who have programmed a multicore device in the past may have found the experience challenging, with the overheads to support concurrency often being greater than the benefit the programmer could bring by parallelising their application.
But tight integration of the multiprocessing capability ensures that the overheads to support concurrency are low and therefore enables very fine levels of software concurrency to realise a performance gain.
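The trade-off between concurrency overhead and parallel gain can be put into numbers with a simple model. The sketch below assumes the work divides evenly across the cores and that splitting it costs a single fixed overhead for forking, synchronising and joining. Both the model and the example figures are illustrative assumptions.

```python
def parallel_time(work_s, cores, overhead_s):
    """Wall-clock time when work_s is split evenly over `cores`,
    plus a fixed overhead_s for forking, synchronising and joining."""
    return work_s / cores + overhead_s


def speedup(work_s, cores, overhead_s):
    """Speed-up relative to running the whole task on one core."""
    return work_s / parallel_time(work_s, cores, overhead_s)


def break_even_work(cores, overhead_s):
    """Task length above which splitting over `cores` (> 1) is a gain.

    Derived from work/cores + overhead < work.
    """
    return overhead_s * cores / (cores - 1)


if __name__ == "__main__":
    task_s = 100e-6  # an assumed 100 microsecond task
    for overhead_s in (200e-6, 5e-6):
        print(f"overhead {overhead_s * 1e6:.0f} us: "
              f"speed-up on 4 cores = {speedup(task_s, 4, overhead_s):.2f}")
```

In this model a high overhead makes the split task slower than the original, while a low overhead pushes the break-even task length down to a few microseconds, which is what makes fine-grained concurrency worthwhile.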
It is really only where a single significant task exists that the programmer may need to consider decomposing it into smaller tasks. Otherwise, as in traditional single-core devices, tasks are split between other accelerators and processors to maximise performance and power efficiency.
So maybe this move to multicore is not such a big step when you realise that many of today’s single-core processor handhelds are already using multiple processors – the performance and power advantage of such multiple-processor embedded devices is simply being extended to include the multicore processor.
Ian Rickards is CPU product manager at ARM