At the recent introduction of the 12,000-gate, 32bit ARM Cortex-M0, an ARM spokesman told EW the M0 was close to the 10,000 gates normally occupied by the 8bit 8051 microcontroller.
Inspired by this, EW wondered how far state-of-the-art design techniques could shrink the venerable 8051, and came across T8051, a 2,700 gate version from Polish intellectual property (IP) firm Evatronix.
"The gate count is the trump card of the T8051," said Evatronix. "There is no 8051-compatible microcontroller on the market that would even come close to the achieved values."
Evatronix's IP portfolio has several 8051s, including the R8051XC2 which it claims is the fastest available - more of that later.
Its T8051 is aimed at mixed-signal Asics made in older fabs - processing 500nm and 350nm silicon - where die area, and therefore gate count, are at a premium.
"It is for where something is needed to store or transmit data from an ADC, for example. People sometimes use a hard-coded state-machine for this, but a state machine cannot be modified without re-running all the synthesis and all the verification," Evatronix 8051 product line manager Maciej Pyka told EW. "We wanted a processor solution with the same number of gates as a state machine."
The T8051 had to be small.
"Simply, the core was designed from the beginning with size in mind. We defined the absolute minimum of resources to execute the 8051 instruction set," said Pyka.
For example, the design team gave up using a dedicated register to generate the memory address bus.
"Usually, this is coded as a separate register loaded with the programme pointer or data," explained Pyka. "We skipped that and said the programme counter register itself would generate the bus, and stored the programme count in a general purpose register when we loaded the programme counter register data."
The additional data path required to temporarily transfer the programme count to a multi-purpose register used fewer gates and less power, said Pyka, than a separate memory address register with its logic.
Even optimised for size, cycle for cycle Pyka claims the T8051 beats so-called 'four-cycle' 8051s - designs that need four clock cycles to execute the shortest instruction.
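The register-sharing idea Pyka describes can be illustrated with a toy behavioural model. This is only a sketch and not Evatronix's design: the class name TinyCore, the scratch register and the single shared memory list are all hypothetical, and a real 8051 keeps programme and data memory in separate address spaces.

```python
class TinyCore:
    """Toy model: the programme counter doubles as the memory address register.

    Hypothetical illustration only; names and the single flat memory
    are assumptions, not details of the T8051.
    """

    def __init__(self, memory):
        self.memory = memory
        self.pc = 0        # programme counter; also drives the address bus
        self.scratch = 0   # general purpose register used to park the PC

    def address_bus(self):
        # No dedicated address register: the bus is simply the PC value.
        return self.pc

    def fetch(self):
        # Normal instruction fetch: read at the PC, then advance it.
        byte = self.memory[self.address_bus()]
        self.pc += 1
        return byte

    def read_data(self, data_address):
        # Park the programme count, let the PC generate the data address,
        # then restore the programme count afterwards.
        self.scratch = self.pc
        self.pc = data_address
        value = self.memory[self.address_bus()]
        self.pc = self.scratch
        return value
```

In this sketch, a data read in the middle of a programme leaves the programme count where it was: after one fetch, calling read_data(3) returns the byte at address 3 and the next fetch still comes from address 1.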
"In Dhrystones it is twice as fast as the 8051 from Synopsys, and comes close to the two-cycle design from Mentor Graphics," he said.
Intel's 8051 was 12-cycle.
"The T8051 core is 4.1 times faster than the original 12-clock core in terms of Dhrystone benchmark," said Pyka.
Compared with both of these, Evatronix's latest design is a scorcher.
"The R8051XC2 is 9.4 times faster than the original," said Pyka. "With architectural extensions beyond the standard 8051 architecture, the R8051XC2 goes 12.1 times faster."
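To put the quoted Dhrystone ratios on a common footing, the short calculation below converts each speed-up into an equivalent number of clocks per original 12-clock machine cycle. It assumes all cores run at the same clock frequency, and the "equivalent clocks" figure is an illustrative derived number, not one published by Evatronix. The function and variable names are hypothetical.

```python
ORIGINAL_CLOCKS = 12  # Intel's original 8051 used a 12-clock machine cycle

# Dhrystone speed-ups over the original core, as quoted by Pyka.
quoted_speedups = {
    "T8051": 4.1,
    "R8051XC2": 9.4,
    "R8051XC2 with extensions": 12.1,
}


def equivalent_clocks(speedup, baseline=ORIGINAL_CLOCKS):
    """Clocks a core would need per original machine cycle's worth of work,
    assuming the same clock frequency (an assumption, not a vendor figure)."""
    return baseline / speedup


def relative_speed(faster, slower):
    """Ratio between two speed-ups measured against the same baseline."""
    return faster / slower


for name, speedup in quoted_speedups.items():
    print(f"{name}: about {equivalent_clocks(speedup):.2f} equivalent clocks")

ratio = relative_speed(quoted_speedups["R8051XC2"], quoted_speedups["T8051"])
print(f"R8051XC2 vs T8051: about {ratio:.2f} times faster")
```

On these figures the T8051 works out at roughly 2.93 equivalent clocks, and the R8051XC2 at roughly 2.29 times the T8051's Dhrystone performance.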
All the Evatronix 8051 cores are binary compatible with the standard 8051.
Several configurations of the R8051XC2 are also peripheral-set compatible.