Electronics Weekly Magazine

How to win with multi-core processor design

Tuesday 21 June 2011 12:45

Niall Cooling, managing director of embedded training specialist Feabhas, looks at the challenges and benefits of moving your embedded design to multi-core technology.

It feels as if not a day goes by without a new announcement about a major development in multi-core technology. With so much press surrounding multi-core, you have to ask the question “Is it for me?” – can I utilise multi-core technology in my embedded application?

However, from a software developer’s perspective, the code examples all seem to demonstrate the same massive performance improvements in “rendering fractals” or “ray tracing programs”. The examples always refer to Amdahl’s Law, showing gains when using, say, 16 or 128 cores.

This is all very interesting, but it is not what I imagine most embedded developers would consider “embedded”.

These types of programs are sometimes referred to as “embarrassingly parallel”. The work splits naturally into independent pieces that need little or no communication, so it is obvious they would benefit from parallel processing. In addition, the examples use vendor-specific libraries, such as Intel’s Threading Building Blocks (TBB), or language extensions with limited platform support, such as OpenMP.

So taking fractals and OpenMP out of the equation, is multi-core really useful for embedded systems?

For designs currently based around the traditional Linux POSIX threads (Pthreads) model of processes and threads, moving to the most common form of multi-core solution, symmetrical multi-processing (SMP), should be relatively straightforward, as the Linux kernel natively supports SMP.

SMP is where a single general-purpose operating system (GPOS) runs on top of multiple processing cores.

The GPOS manages the allocation and distribution of tasks across the cores, transparently to the application. Today it is almost impossible to buy a desktop PC that is not already running Windows on an SMP platform.

In reality this transition may not be so simple. A well-designed application should work on an SMP multi-core without modification, but many applications aren’t well designed for threading.

The first common problem is that subtle bugs may appear that did not exist when executing on a single core. These are generally due to poor design practices (for example, using same-priority FIFO scheduling to enforce mutual exclusion) or the misunderstanding or misuse of certain inter-thread synchronisation mechanisms (such as relying on thread priorities to guarantee ordering). Both tricks depend on only one thread being able to run at any instant, which no longer holds when threads execute simultaneously on different cores. So first and foremost we need to ensure an application is “SMP correct”.

Assuming our application is SMP correct, it will not necessarily make major performance gains. There are numerous reasons for this, but the major ones are that either the application or its libraries are not SMP optimised. For example, a library may be written in C++ using standard template library (STL) algorithms such as std::find_if or std::search.

The default standard library implementations of these algorithms run sequentially on a single thread, so they do not exploit multi-core parallelism. To become SMP optimised, the application would need to be reworked to use, for example, the GNU libstdc++ parallel mode for the C++ STL (built on top of OpenMP). That means replacing the standard library calls with their SMP-optimised equivalents: __gnu_parallel::find_if and __gnu_parallel::search.

Nevertheless, our biggest stumbling block is likely to be working out how an existing non-threaded application, built around a uni-core design, can utilise a multi-core solution.

Unfortunately an operating system cannot automatically parallelise your application; it will be left to you to partition it into threads. This is potentially the most difficult challenge as there are numerous and subtle complexities that come into play.

There are a number of approaches to aid this decomposition process, for example data decomposition, task decomposition and temporal decomposition. But can you work out where to refactor the code to utilise those extra cores without wasting a huge amount of time on trial and error?

For example, I have seen a ported application run slower on a quad-core system than on its original uni-core system. The cause was simple: multiple threads were sharing global data. That led to cache coherency issues, as the cache lines holding the data were repeatedly invalidated and transferred between the cores’ caches.

There are numerous games you can play to help here, such as processor affinity (binding a thread to a particular core), but all require a detailed understanding of multi-core technologies. Interestingly, commercial products are already appearing, such as Prism from Critical Blue and vfEmbedded from Vector Fabrics, which are specifically designed to aid and semi-automate this process.

Moving away from GPOSs, many of the high-end real-time operating systems (RTOSs) also support SMP, such as QNX, VxWorks from Wind River and Integrity from Green Hills Software. What makes these solutions very attractive to the real-time designer is their support for Hypervisor technology.

In short, a Hypervisor allows a GPOS to co-exist with an RTOS on the same multi-core device; you can, for example, run Android alongside the RTOS. This arrangement is often called asymmetric multi-processing (AMP), and it ensures the real-time aspects of the design are not compromised by the GPOS.

Modern multi-cores are adding native features to support Hypervisor technology (such as ARM’s TrustZone and Intel’s VT-x). For the real-time designer this gives detailed management of both software (locking tasks to cores) and hardware (mapping interrupts to cores), which, when done well, can reduce power consumption compared to a single-core solution.

Can traditional RTOS and bare-metal applications on lower-end processors such as the ARM7 utilise multi-core? In short, no, not without a major amount of rework. Even though it is possible to create multi-core solutions based on, say, the Cortex-M family, the smaller RTOSs are fundamentally not designed to support SMP.

But before we discount it completely, there is another model: hybrid multi-core. This is where the cores differ and each runs its own separate program. As an example, NXP have recently released a dual-core design, the LPC4300, based around an ARM Cortex-M4 and a Cortex-M0 on the same chip.

Hybrid designs are also starting to appear that mix multiple high-end cores (such as two Cortex-A8s) with one or more low-end cores (such as a Cortex-M4). In this model the Cortex-A8s run in SMP mode, with the Cortex-M4 doing the real-time work.

Multi-core is making major inroads into embedded computing. Recent developments in operating systems and support tools help that transition. However, for the smaller embedded system, until product evolution demands it, SMP multi-core may still be some way off.

www.feabhas.co.uk
