AMD's K7 stands out among the x86 crowd

Richard Ball
 
Soon to be released to the public, the K7 is the latest-generation x86 processor from AMD. Two things make this device stand out from the crowd: first, it could mark the first time AMD takes a performance lead over Intel; and second, attention has been paid to the processor’s floating-point performance, often poor in x86 clone devices.
AMD’s paper at the conference details the architecture of the floating point unit (FPU) of the K7.
With 2.4 million transistors, occupying about 15 per cent of the die area, the FPU is no small part of the chip. It is responsible for executing all x86 floating point, MMX and 3DNow! instructions. The latter are AMD’s version of MMX for floating point numbers.

Operation      Latency (cycles)   Throughput (cycles)
FADD           4                  1
FMUL           4                  1
FDIV           16/20/24           13/17/21
MMX Add        2                  1
MMX Mul        3                  1
3DNow! Add     4                  1
3DNow! Mul     4                  1
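To see what the latency and throughput figures mean in practice, here is a rough sketch of the usual pipelining arithmetic. It reads the throughput column as the number of cycles between successive issues of the same operation, which is an interpretation rather than something the article states. The function names are made up for illustration, and the model ignores everything else going on in the chip.

```python
# (latency, throughput) in cycles, copied from the table above.
# FDIV is left out: the table gives three values for it without
# saying what distinguishes them.
FPU_TIMINGS = {
    "FADD": (4, 1),
    "FMUL": (4, 1),
    "MMX Add": (2, 1),
    "MMX Mul": (3, 1),
    "3DNow! Add": (4, 1),
    "3DNow! Mul": (4, 1),
}


def cycles_independent(op, count):
    """Cycles to finish `count` independent ops issued back to back.

    Simplified model: the first result appears after `latency` cycles,
    and each further result follows `throughput` cycles later.
    """
    if count <= 0:
        return 0
    latency, throughput = FPU_TIMINGS[op]
    return latency + (count - 1) * throughput


def cycles_dependent(op, count):
    """Cycles for a chain where each op needs the previous op's result.

    Nothing can overlap, so every op pays the full latency.
    """
    if count <= 0:
        return 0
    latency, _ = FPU_TIMINGS[op]
    return latency * count


if __name__ == "__main__":
    print(cycles_independent("FADD", 10))  # 4 + 9 * 1 = 13
    print(cycles_dependent("FADD", 10))    # 10 * 4 = 40
```

Under this model, ten independent additions cost little more than one, while ten chained additions cost ten times as much. That is why a throughput of 1 matters as much as the headline latency.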
 
The front end takes up to three x86 FP instructions per cycle and decodes them into internal execution op-codes. When the operands (data) are available, these are sent to the 36-entry scheduler.
The scheduler sends the operands to the relevant execution units when they become available. A retire block keeps the state of the core up to date, freeing up registers when their associated op-code is completed.
Three execution pipelines are available in the core: an add, a multiply and a store pipeline. The first carries out all FP addition, subtraction and compare operations.
The multiply pipe computes FP multiply, remainder, division, square roots, MMX operations and 3DNow! multiplications. A 76 x 76-bit multiplier takes four cycles to compute a multiply, and 16 to 24 cycles for a division.
Multiplies can fill in unused cycles during a division to maximise use of the pipeline.
The store pipe…? It stores.
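The scheduler-and-three-pipes arrangement described above can be illustrated with a toy model. This is only a sketch under heavy assumptions, not AMD's design: the `Op` record and `schedule` function are hypothetical, every result is assumed to be ready one cycle after issue, each pipe accepts one op per cycle, and all ops are assumed to sit in the scheduler at once. Only the 36-entry capacity and the three pipe names come from the article.

```python
from collections import namedtuple

# Hypothetical op record: a name, the pipe it needs, and the names of
# the ops whose results it consumes.
Op = namedtuple("Op", "name pipe deps")

SCHEDULER_ENTRIES = 36                  # capacity quoted in the article
PIPES = ("add", "multiply", "store")    # the three pipes in the article


def schedule(ops):
    """Return {op name: issue cycle} for a toy operand-driven scheduler.

    Assumptions (not from the article): one op per pipe per cycle,
    and a result is usable the cycle after its op issues.
    """
    if len(ops) > SCHEDULER_ENTRIES:
        raise ValueError("more ops than scheduler entries")
    for op in ops:
        if op.pipe not in PIPES:
            raise ValueError("unknown pipe: " + op.pipe)

    issued = {}
    waiting = list(ops)
    cycle = 0
    while waiting:
        busy = set()
        picked = []
        for op in waiting:
            # An op is ready once all its producers issued in an earlier cycle.
            ready = all(d in issued and issued[d] < cycle for d in op.deps)
            if ready and op.pipe not in busy:
                busy.add(op.pipe)
                picked.append(op)
        if not picked:
            # Nothing can ever become ready: a dependency is missing or cyclic.
            raise ValueError("unresolvable dependencies")
        for op in picked:
            issued[op.name] = cycle
            waiting.remove(op)
        cycle += 1
    return issued


if __name__ == "__main__":
    program = [
        Op("a", "add", ()),
        Op("b", "add", ()),
        Op("m", "multiply", ("a",)),
        Op("s", "store", ("m",)),
    ]
    print(schedule(program))  # {'a': 0, 'b': 1, 'm': 1, 's': 2}
```

In the example, `b` has to wait a cycle because `a` already occupies the add pipe, while `m` issues alongside `b` as soon as its operand is ready. Ops waiting on operands do not block unrelated ops bound for other pipes.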

