Pretty much any large system-on-chip (SoC) design today will have multiple processor cores in it. Often these are part of a subsystem handling communications, audio or video, alongside one or more host and application processors. An increasingly important challenge is how to debug these systems.
Better debug can improve system performance, sometimes dramatically. By using debug technology to tune the performance of the on-chip interconnect to the requirements of the different processors, Israel-based SoC design firm Mobileye increased the performance of its multicore image recognition chip by a factor of six with only a minimal increase in clock speed and power consumption.
The processor core designers are at the heart of the problem, as the chip designer needs access deep into all the different cores, cross-triggering on different conditions. Technology for this bus tuning is being commercialised by First Silicon Solutions (FS2), a division of processor design firm MIPS Technologies.
“There are well-established ways of doing processor-level debug and everyone has their own solution with run control, setting breakpoints and so on,” says Neal Stollon, director of technical marketing at FS2.
“The other problem is more communications related to latency especially, or peripherals which are being shared between cores that act as competing masters. It’s a bigger problem than you typically see in simulation because you have to simulate the entire system to solve this. We see companies facing the same problems over and over again.”
A lot of this technology has been used in-house for several years.
“In a set-top box you can have five, six or seven 32-bit or 64-bit cores running at several hundred MHz for HDTV, and we are getting more and more integrated,” says Andy Jones, consultant system architect for the home entertainment group at STMicroelectronics. “We have been doing bus monitoring for years internally. The performance tuning is something we get almost for free once we have the debug infrastructure in place, but more standard tools would definitely help.”
Tom Pennello, technical director of software tools at ARC International and co-founder of tool company Metaware, says one of the first petabit routers had 16 ARC cores in it. It supported multiple homogeneous processors with debug access for each core.
“From what I’ve seen of the different architectures, ARC is the only one where you can look at the contents of the cache and address tags for each core so you can look at the state of the core,” says Pennello. “We even do some of the bus transactions in simulation, but not in hardware.”
The key to the FS2 approach, called RRT, is agents in the cores and on the buses, linked to an external probe.
“These agents are extremely small and the path is extremely short. It’s very different from the idea of a big buffer for trace which increases the cost of the chip,” says Stollon. “The RRT agents on these chips have short buffers and can transmit the data to the probe online. We just have the possibility to slow down all the components, except the trace info. The MIPS cores, the image processing cores, the DMA, can be ten times slower so the tracing system has enough time to export the trace information to the probe. In this way we have no impact on the size of the chip.”
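The reasoning behind slowing the chip down can be shown with a little arithmetic. The sketch below is illustrative only: the function name and both data rates are assumptions chosen so that the answer matches the factor of ten mentioned in the quote, and none of them are FS2 or Mobileye figures. It assumes trace data is produced in proportion to the clock speed of the cores while the link to the probe runs at a fixed rate.

```python
def min_slowdown(trace_rate_full_speed, port_rate):
    """Smallest whole-number slowdown at which the trace port keeps up.

    Both rates are in the same unit (for example MB/s). Trace output is
    assumed to scale linearly with the clock speed of the traced components.
    """
    if trace_rate_full_speed <= 0 or port_rate <= 0:
        raise ValueError("rates must be positive")
    # Ceiling division: -(-a // b) rounds a / b up to the next integer.
    return max(1, -(-trace_rate_full_speed // port_rate))


# Hypothetical figures: 4000 MB/s of trace at full speed, 400 MB/s trace port.
slowdown = min_slowdown(4000, 400)
print(f"run the cores {slowdown}x slower to trace without on-chip buffering")
```

With these assumed numbers the cores would have to run ten times slower for the short agent buffers never to overflow. A faster port or a sparser trace would reduce the factor.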
According to Elchanan Rushinek, v-p engineering at Mobileye, being able to look at an entire frame of vision data is an important capability.
“We can have visibility of one or all of the agents on the bus so there can be a record of all the latencies on the bus,” explains Rushinek. “Once we have this fast analysis we can focus on problematic frames and then run only that frame in full mode, tracing both the MIPS cores and the RRT info. In this way we can have much better visibility including correlation between transactions of the bus and the code.”
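The two-pass workflow Rushinek describes, a fast latency survey followed by a full trace of only the problematic frames, might look something like the sketch below. It is a minimal illustration rather than Mobileye's tooling: the record layout (frame number, bus master, latency in cycles), the function names, the sample data and the threshold are all hypothetical.

```python
from collections import defaultdict


def worst_latency_per_frame(records):
    """Map each frame id to the worst bus latency seen in that frame.

    Each record is an assumed (frame_id, bus_master, latency_cycles) tuple.
    """
    worst = defaultdict(int)
    for frame_id, _master, latency in records:
        worst[frame_id] = max(worst[frame_id], latency)
    return dict(worst)


def frames_to_retrace(records, threshold_cycles):
    """Return the sorted frame ids whose worst latency exceeds the threshold."""
    worst = worst_latency_per_frame(records)
    return sorted(f for f, lat in worst.items() if lat > threshold_cycles)


# Made-up latency log from the fast first pass.
sample = [
    (0, "cpu0", 12), (0, "dma", 18),
    (1, "cpu0", 14), (1, "vision", 95),
    (2, "cpu1", 11), (2, "dma", 20),
]

problem_frames = frames_to_retrace(sample, threshold_cycles=50)
print(problem_frames)  # frames worth re-running in full trace mode
```

Only the frames returned by the second function would then be re-run with full core and bus tracing, which keeps the expensive pass short.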
Stollon says the approach allows 2GB of data to be buffered off chip rather than putting these buffers on the chip.
“This is a pretty radical change to the way things have been traditionally done,” says Stollon. “It does require extra pins on the device but if you look at different ways to handle that, there are a pretty large number of pins available and some of these can be multiplexed for different applications, including the debug.”
This can be an issue with some customers, says Jones at ST.
“Many designs are pad limited so the cost is increased by adding pins rather than silicon. And it is getting worse, so people are looking at high-speed serial links such as C-JTAG,” he says. “We are not deploying that today but it’s something we are keeping a very close eye on.”
But debug, and standards, are definitely more of an issue. “Resources for debug add cost but now our big customers are making it a requirement,” Jones says. “A standard would help significantly so the core support could be independent of the cores.”
ARM provides the CoreSight debug technology within its cores.
“The most common requirement we see today is for heterogeneous systems with ARM processors and additional processor elements,” says William Oram, product manager for the CoreSight debug technology at ARM. “I think we have really developed the CoreSight on-chip debug technology for the requirements of the system that will cover the visibility for the cores, but also for heterogeneous solutions - that’s the way we see it going forward.”
But this needs visibility of other cores, so ARM provides a framework with all the system-level components, with the correct on-chip interfaces to plug in the ARM cores. It also supplies debug hardware and defines the interfaces, for which the specification is openly available.
“There are levels of standardisation and the problem with multicore is the hardware level is allowing a level of commonality that is architecturally neutral so that the semantics of the trace data is not significant,” Oram says. “But it’s also important to allow the specifics of the analysers to fit into that solution and at that level you are starting to get quite specific. We do both.”
“It does come down to the enlightenment of customers who see the productivity advantages of reducing the time to market,” says Jones at ST. “Others have yet to get there.”