ISSCC: 64bit ARM v8 and POWER8
Aimed at servers, Applied Micro’s 84 million transistor Potenza processor module (PMD) has two ARM cores sharing a 256kbyte L2 cache.
Each core of the PMD has a four-wide out-of-order superscalar micro-architecture.
“Execution units are crafted with pipeline designs for concurrent handling of one load, one store, two integer, as well as multiple ASIMD/floating-point operations,” said the firm. “Micro-architectural elements include branch prediction, separate L1 instruction and data-caches, L1 and L2 data pre-fetch, and hardware table walk.”
The initial server configuration has four PMDs on the chip sharing 8Mbyte of L3 cache and four DRAM channels via a central switch.
Fabricated on 10 metal layer 40nm bulk CMOS, each PMD occupies 14.8mm² and runs at up to 3GHz, averaging 4.5W consumption from 0.9V “under representative workloads”, said the paper.
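As a rough sanity check on those figures, the sketch below derives transistor density and power density for one PMD, plus the combined core power and area of the four-PMD configuration. The constant and function names are illustrative only, and the chip-level totals assume that all four PMDs draw the quoted average at once and exclude the L3 cache, switch and I/O, which the article does not quantify.

```python
# Figures quoted in the article for one Potenza processor module (PMD).
PMD_TRANSISTORS = 84e6   # transistors per PMD
PMD_AREA_MM2 = 14.8      # silicon area per PMD, mm^2
PMD_POWER_W = 4.5        # average power "under representative workloads"
PMDS_PER_CHIP = 4        # initial server configuration


def pmd_summary():
    """Return derived per-PMD densities and four-PMD totals (cores only)."""
    return {
        "transistors_per_mm2": PMD_TRANSISTORS / PMD_AREA_MM2,
        "watts_per_mm2": PMD_POWER_W / PMD_AREA_MM2,
        # Assumption: every PMD draws the quoted average simultaneously.
        "chip_pmd_power_w": PMD_POWER_W * PMDS_PER_CHIP,
        "chip_pmd_area_mm2": PMD_AREA_MM2 * PMDS_PER_CHIP,
    }


if __name__ == "__main__":
    for name, value in pmd_summary().items():
        print(f"{name}: {value:.3g}")
```

On these numbers a PMD works out at roughly 5.7 million transistors per mm² and about 0.3W/mm², with the four PMDs together drawing around 18W.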
In detail, the ISSCC presentation deals with memory building blocks (largely a 2kbyte block built from 0.374µm² 6T SRAM cells), power distribution and on-chip clock distribution.
ISSCC paper 5.8, ‘A 3GHz 64b ARM v8 processor in 40nm bulk CMOS technology’
In another highlight, also aimed at servers, IBM is revealing its Power8 processor, which has 12 eight-threaded cores and 96Mbyte of L3, achieving a 2.5x performance improvement over its earlier Power7+.
Implemented in 22nm SOI eDRAM technology with 15 levels of metal, the processor has huge off-chip bandwidth, integrated voltage regulation and resonant clocking.
So novel are the last two features that the IEEE has allocated them their own separate papers at ISSCC 2014.
The distributed system of micro-regulators, implemented in the same 22nm eDRAM technology, can deliver 12.3A to the Power8 with 90.5% efficiency and a density of 36W/mm². Its multimode resonant clock can oscillate from 2.5GHz to greater than 5GHz and can “reduce clock grid power by 33% and dynamically switch between high and low resonant frequency modes without idle cycles”, claims IBM.
ISSCC paper 5.1, ‘POWER8: A 12-core server-class processor in 22nm SOI with 7.6Tbit/s off-chip bandwidth.’
“IBM and Applied Micro processors represent two major aspects of processor development: extreme performance for ‘big data’ handling and power efficiency for cloud computing,” said Session chair Atsuki Inoue of Fujitsu. “These greater levels of performance and energy efficiency in ever more dense form factors will extend our abilities from increasing multimedia social computing to scientific and medical applications, such as understanding human genomes.”
Both Intel and AMD are talking about their new processors in the same session.
Intel’s 4.31 billion transistor, 15-core Ivytown Xeon has 37.5MB of shared L3 cache on a 22nm finfet process with nine metal layers.
In a second paper, Intel is talking about its 22nm finfet CMOS graphics core with adaptive clocking to deal with voltage droops. Operation is down to 0.38V.
A third Intel paper covers its Haswell fourth-generation Core processor, also using 22nm finfets. It has integrated voltage regulators and graphics with embedded DRAM providing 102Gbyte/s bandwidth at 1.22pJ/bit. Compared with earlier versions, standby power is down by 95%, and floating point capability is doubled.
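The bandwidth and energy-per-bit figures together imply the power spent moving data over the embedded DRAM link. The sketch below does that arithmetic; the function name is illustrative, and it assumes 1Gbyte = 10^9 bytes, 8 bits per byte, and that the 1.22pJ/bit figure applies at the full 102Gbyte/s, none of which the article states explicitly.

```python
def link_power_watts(bandwidth_gbyte_s, energy_pj_per_bit):
    """Power implied by a link's bandwidth and its energy cost per bit.

    Assumes decimal gigabytes (1e9 bytes) and 8 bits per byte.
    """
    bits_per_second = bandwidth_gbyte_s * 1e9 * 8
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit


if __name__ == "__main__":
    # Figures quoted for Haswell's embedded DRAM interface.
    print(f"{link_power_watts(102, 1.22):.3f} W")
```

Under those assumptions the link would draw about 1W when running flat out.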
AMD’s 236 million transistor x86 Steamroller occupies 29.47mm² in 28nm and uses a shared 96kbyte 3-way instruction cache and 10kbyte branch target buffer to improve single and multithreaded performance compared with its earlier 32nm design. Steamroller is another processor using resonant clocking.
It also gets a second paper, once more dealing with clocks throttling back automatically to deal with voltage droops. AMD estimates it allows 7-15% better power efficiency, as frequency can be maintained at lower voltage.