Intel projects faster execution through slow cooking


For the past couple of decades, Intel has followed the tenets of Pollack's Rule: that the performance of a microprocessor increases with the square root of the number of transistors used. Writing in the Communications of the ACM, Intel architects Shekhar Borkar and Andrew Chien argue that Pollack's Rule never lived up to its promise and that it is time to put it to rest.
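To put numbers on the square-root relationship, here is a minimal Python sketch. The function name `pollack_speedup` is made up for illustration and the rule is treated as an exact formula, which it is not: it is a rule of thumb.

```python
import math

def pollack_speedup(transistor_ratio: float) -> float:
    """Relative single-core performance suggested by Pollack's Rule.

    transistor_ratio is the new transistor count divided by the old one.
    Treating the rule as an exact formula is a simplifying assumption.
    """
    if transistor_ratio <= 0:
        raise ValueError("transistor_ratio must be positive")
    return math.sqrt(transistor_ratio)

# Show how slowly performance grows as one core's transistor budget grows.
for ratio in (2, 4, 16):
    print(f"{ratio}x transistors -> {pollack_speedup(ratio):.2f}x performance")
```

Under this reading, quadrupling the transistors spent on a single core only doubles its performance.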

Over the past 20 years, during which processor performance increased roughly 1000-fold, two orders of magnitude came from faster transistors alone. Microarchitectural enhancements made up the balance, a factor of roughly ten.

To keep up with the performance trend, a processor - well, a processor array - will need to deliver 30 times more instructions per second by 2020. The key to that is likely to be more memory - and lots of it. And slower individual processors.

One of the problems of simply adding more processors of the kind used today with the same sort of memory hierarchy, the authors argue, is that the power consumed by the on-chip network alone - as shared data shuffles around the array - will become a significant fraction of the total power budget. One projection puts the consumption at 35W out of an average desktop power budget of 65W, or more than half.

Making local caches much larger than those in use today cuts the amount of cross-chip traffic and can dramatically reduce this network power consumption. It's worth noting that Borkar is a fan of 3D architectures, claiming at the recent Design Automation Conference (DAC): "I've been trying to find a killer app for 3D for the past ten years."

Although it is not in the ACM paper, stacking local memory on top of processor cores might be a way to achieve the massive increase in local-memory capacity without eating too far into the logic transistor budget. Having said that, the processor of 2020 should have a lot of transistors available to it and, as the emphasis will be on capacity rather than raw access speed, single-transistor memories might provide a way to get the necessary capacity on one die.

A more radical departure for Intel is to move to near-threshold transistor operation. The company has worked on accelerators using this approach and Borkar and Chien reckon this could work for at least some of the cores on a future Intel processor.

Cycle times increase dramatically as you scale the supply voltage towards the threshold voltage. But so does energy efficiency, peaking as you head into the subthreshold region. As single-thread performance still matters and will matter in 2020, you cannot use near-threshold logic to run applications that do not parallelise well. So you reserve at least several cores for relatively high-voltage operation and then, for throughput, deploy tens or hundreds of much smaller, much slower near-threshold cores - either general-purpose or armed with application-specific accelerators.
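The shape of that trade-off can be sketched with two textbook first-order approximations: dynamic switching energy proportional to the square of the supply voltage, and gate delay following the alpha-power law. Neither model comes from the Borkar and Chien paper, and the threshold voltage and exponent below are illustrative assumptions rather than figures for any real process. Leakage, which is what ultimately sets the point at which efficiency peaks, is ignored, so the sketch only shows the direction of travel above threshold.

```python
VT = 0.3      # assumed threshold voltage in volts (illustrative only)
ALPHA = 1.5   # assumed alpha-power-law exponent (illustrative only)

def relative_energy(v: float, v_nominal: float = 1.0) -> float:
    """Dynamic energy per operation relative to nominal, using E ~ C * V^2."""
    return (v / v_nominal) ** 2

def _raw_delay(v: float) -> float:
    """Alpha-power-law gate delay, up to a constant: V / (V - VT)^ALPHA."""
    if v <= VT:
        raise ValueError("this simple model only holds above the threshold voltage")
    return v / (v - VT) ** ALPHA

def relative_delay(v: float, v_nominal: float = 1.0) -> float:
    """Gate delay (and so cycle time) relative to nominal supply."""
    return _raw_delay(v) / _raw_delay(v_nominal)

# Walk the supply voltage down towards the assumed threshold.
for v in (1.0, 0.7, 0.5, 0.4):
    print(f"{v:.1f} V: energy x{relative_energy(v):.2f}, "
          f"delay x{relative_delay(v):.2f}")
```

With these assumed values, halving the supply from 1.0V to 0.5V cuts switching energy per operation to a quarter while making each cycle a little over three times longer, which is why the slow cores only pay off on work that spreads across many of them.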

An even more radical change would be on the software side. Having cache-coherency protocols blasting the on-chip network with snoop traffic is not going to help with power efficiency.

Borkar and Chien argue: "...these future systems may drop hardware support for a single flat address space (which normally wastes energy on address manipulation/computing), single-memory hierarchy (coherence and monitoring energy overhead), and steady rate of execution (adapting to the available energy budget). These systems will place more of these components under software control, depending on increasingly sophisticated software tools to manage the hardware boundaries and irregularities with greater energy efficiency."

Languages such as Erlang make it easier to have threads cooperate without using shared memory and, at some point, C++ is going to have to acquire some primitives that make the language more suitable for parallelisation.
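For a flavour of that style, here is a minimal sketch in Python rather than Erlang. Each worker keeps its running total private and cooperates only through messages on queues. The names `worker` and `parallel_sum` are invented for this example, and Python threads do in fact share one address space, so this illustrates the programming model only, not the hardware behaviour the authors have in mind.

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Sum every chunk received; the running total is private to this worker."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:              # sentinel message: no more work
            break
        total += sum(msg)
    outbox.put(total)                # report the result as a message

def parallel_sum(data: list, n_workers: int = 4, chunk: int = 1000) -> int:
    """Add up data by mailing copies of each chunk to workers."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, results))
               for inbox in inboxes]
    for t in threads:
        t.start()
    # Deal the chunks out round-robin; slicing makes a copy for each message.
    for i in range(0, len(data), chunk):
        inboxes[(i // chunk) % n_workers].put(data[i:i + chunk])
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    # Exactly one result message arrives from each worker.
    return sum(results.get() for _ in range(n_workers))

print(parallel_sum(list(range(10_000))))
```

Because no worker ever reads another's state, nothing here needs a lock, and there is no shared data for a coherency protocol to keep in step.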

The authors stress: "Efficient data orchestration will increasingly be critical, evolving to more efficient memory hierarchies and new types of interconnect tailored for locality and that depend on sophisticated software to place computation and data so as to minimise data movement. The objective is ultimately the purest form of energy-proportional computing at the lowest-possible levels of energy."

The Low-Power Design Blog is enabled by Mentor Graphics. The company has focused years of R&D on low-power design techniques and is glad to support a resource that highlights creative methods for reducing the power consumption of electronic systems.


Author Profile

Chris Edwards
Chris is a freelance technology journalist. He writes regularly for Engineering & Technology and New Electronics.


About this Entry

This page contains a single entry by Chris Edwards published on June 17, 2011 5:05 PM.
