Electronics Weekly Magazine

The way to fast multicore SPICE simulations

Tuesday 16 November 2010 11:16

Guest columnist Hany Elhak, analogue product marketing manager at Synopsys, believes that when carrying out SPICE simulation, accurate should not mean sluggish

When SPICE simulation was invented at UC Berkeley in the 1970s and later commercialised as HSPICE in the 1980s, analogue circuits consisted of a few hundred transistors. At the time, the technology was good enough for circuit designers to accurately simulate state-of-the-art designs using CRAY 1 and VAX mainframes.

Over the years, analogue circuits increased in size and complexity, and fabrication processes evolved, leading to more complicated device models.

While simulation technology continued to improve to keep up with the challenge, rapid advances in computer technology carried most of the burden. Circuit designers successfully simulated their ever-growing circuits in reasonable times because they ran the simulations on computers with faster microprocessors and larger memories. Unfortunately for the simulation industry, microprocessor clock speeds have saturated at around 3GHz, and the computational ‘free lunch’ is over.

During the last decade, FastSPICE technology evolved as a solution to the simulation bottleneck. FastSPICE simulators exploited the knowledge of circuit topologies to find shortcuts to reduce simulation time.

A divide-and-conquer approach was used to partition the circuit into manageable loosely-coupled blocks that could be solved independently. Simplified device models were also used to speed up computations.
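As a rough illustration of the partitioning idea, the sketch below groups circuit nodes into blocks by merging only those that are strongly coupled. The coupling strengths, the threshold and the function name are hypothetical, chosen purely for illustration; commercial FastSPICE tools use their own heuristics, which are not described here.

```python
def partition(nodes, couplings, threshold):
    """Group nodes into blocks, merging only across strong couplings.

    couplings is a list of (node_a, node_b, strength) tuples. The strength
    value is a made-up, unitless measure used only for this illustration.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, strength in couplings:
        if strength >= threshold:      # weak couplings do not merge blocks
            parent[find(a)] = find(b)

    blocks = {}
    for n in nodes:
        blocks.setdefault(find(n), []).append(n)
    return sorted(blocks.values())


example_blocks = partition(
    ["a", "b", "c", "d"],
    [("a", "b", 1.0), ("b", "c", 0.01), ("c", "d", 0.8)],
    threshold=0.1,
)
```

Here the weak link between b and c is ignored, so the four nodes fall into two blocks that could be solved independently, at the cost of the accuracy lost by neglecting that coupling.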

FastSPICE simulators calculated the timing and power characteristics of memory and custom digital circuits with hundreds of millions of transistors at a reasonable level of accuracy. Hence, only smaller analogue blocks with high accuracy requirements were left for SPICE simulators to crunch.

The problem we are facing today is that these analogue blocks with high-accuracy requirements such as phase locked loops (PLLs), data converters and power management circuits are getting larger, with their post-layout versions reaching millions of transistors.

Accurate SPICE simulators have run out of steam, computer platforms are not getting any faster and FastSPICE simulators are not delivering the accuracy required for characterising these analogue blocks.

Today, analogue designers must either cope with simulation runs that last days and sometimes weeks or resort to faster runs at lower accuracy. To speed up SPICE simulation without compromising accuracy, we need new simulation algorithms that take advantage of modern computer architectures.

Such computers consist of multiple processor cores sharing the same memory and use integrated cache to reduce memory communication. Traditional simulation algorithms based on sequential computations cannot benefit from these modern computers.

To appreciate the challenge of developing simulation algorithms on modern multicore computers, let us look at how circuit simulation works.

A typical SPICE simulation analyses the circuit at a large number of time steps. Each time step consists of multiple iterations, each of which can be broken down into two major tasks:

- Evaluating the devices in the circuit and loading them into a matrix

- Solving the matrix to calculate voltage and current at each node

The iterations continue until the circuit converges, then the simulator moves to the next time step and repeats the same process.
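The loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration, not how any production simulator is written: the two-node circuit (a 5V source behind a 1kΩ resistor, a 1µF capacitor to ground, then a diode in series with a 1kΩ load), the backward-Euler time stepping, the component values and the crude diode voltage clamp are all assumptions made for the example.

```python
import math

IS, VT = 1e-14, 0.02585         # assumed diode saturation current, thermal voltage
G1, G2, C = 1e-3, 1e-3, 1e-6    # 1 kOhm source resistor, 1 kOhm load, 1 uF capacitor
VS = 5.0                        # source voltage


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for a sparse solver)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x


def transient(t_stop=5e-3, h=1e-5, tol=1e-9, max_iter=50):
    """Return the node voltages [v0, v1] at t_stop."""
    v = [0.0, 0.0]
    for _ in range(round(t_stop / h)):           # time-step loop
        v_prev = v[:]
        for _ in range(max_iter):                # Newton iterations
            # Task 1: evaluate the nonlinear device and load the matrix
            vd = min(v[0] - v[1], 0.9)           # clamp keeps exp() in range
            e = math.exp(vd / VT)
            i_d = IS * (e - 1.0)                 # diode current
            gd = IS / VT * e                     # diode small-signal conductance
            ieq = i_d - gd * vd                  # linearised companion source
            a = [[G1 + C / h + gd, -gd],
                 [-gd, gd + G2]]
            b = [G1 * VS + C / h * v_prev[0] - ieq, ieq]
            # Task 2: solve the matrix for the node voltages
            v_new = solve(a, b)
            converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
            v = v_new
            if converged:
                break                            # move on to the next time step
        else:
            raise RuntimeError("Newton iteration did not converge")
    return v


final_voltages = transient()
```

Even in this toy case the structure is visible: the device evaluation is independent per device and easy to spread across cores, whereas the matrix solve is a chain of dependent operations.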

The percentage of simulation time spent evaluating the devices and solving the matrix depends on the circuit type.

The key to accelerating SPICE on a multicore computer is to be able to parallelise as much of each individual task as possible without sacrificing accuracy.

Device evaluation is the dominant activity for small pre-layout circuits.

This may take up to 75% of the simulation time and scales linearly with circuit size.

Traditional SPICE simulators distribute this task across multiple CPUs, achieving a modest level of parallelisation.


With more than a third of the simulation left in sequential tasks, one can expect only a 2-3x speed-up on an 8-core computer, as Amdahl’s law predicts (see figure 1). Scalability is even worse on large post-layout circuits, in which device evaluation represents less than half the simulation time.

Solving the matrix can consume more than 50% of the simulation time for large post-layout circuits.

Significant scaling can be achieved by running this task in parallel. However, solving a sparse matrix, the typical matrix form for electronic circuits, involves a great deal of sequential activity.
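One way to see where the sequential work comes from is to look at the triangular solve that follows matrix factorisation: each unknown can only be computed once the unknowns it depends on are known. The sketch below uses level scheduling, a standard textbook technique, to count how many dependent stages a given sparsity pattern forces. The patterns and the function name are hypothetical, and nothing here is claimed to reflect what any particular SPICE simulator does.

```python
def dependency_levels(lower_pattern):
    """Return the earliest stage at which each row of a sparse
    lower-triangular solve can be computed.

    lower_pattern[i] lists the columns j < i that hold a non-zero in row i,
    i.e. the unknowns that row i must wait for.
    """
    level = [0] * len(lower_pattern)
    for i, cols in enumerate(lower_pattern):
        # A row with no dependencies is at stage 0; otherwise it comes
        # one stage after the latest row it depends on.
        level[i] = 1 + max((level[j] for j in cols), default=-1)
    return level


# Chain: every row depends on the one before it, so nothing can overlap.
chain_levels = dependency_levels([[], [0], [1], [2]])

# Two independent pairs: rows 0 and 2 can run together, then rows 1 and 3.
paired_levels = dependency_levels([[], [0], [], [2]])
```

Rows that share a stage can be handed to different cores, but the stages themselves must run one after another, so a long dependency chain caps the speed-up however many cores are available.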

Most parallel SPICE simulators apply parallel matrix computation to the same old sequential algorithm. Hence, they scale poorly.

Even if a simulator performs 90% of the computation in parallel (90% parallelisation efficiency), Amdahl’s law predicts a theoretical speed-up of only about 4.7x on 8 cores.
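The Amdahl's law figures used in this discussion are easy to check. The short function below is a sketch of the standard formula, speed-up = 1 / ((1 - p) + p / n), where p is the fraction of the run that can be parallelised and n is the number of cores. The function name and the sample fractions are chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speed-up predicted by Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Ideal 8-core speed-up for a few parallel fractions
speedups = {p: round(amdahl_speedup(p, 8), 2) for p in (2 / 3, 0.75, 0.90)}
# two thirds parallel -> 2.4, 75% parallel -> 2.91, 90% parallel -> 4.71
```

These are upper bounds: the formula ignores memory traffic and synchronisation, which is why measured speed-ups fall below them.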

The scaling is further limited by the speed of data as it moves between processors and memory. The actual speed-up can be less than 3x on 8 cores.

To obtain highly-scalable computations, the parallel efficiency of the underlying code must be very close to 100%.

Another simulation challenge is that the order of data processing is not cache-efficient. Cache efficiency is important for multicore performance because processors compete for cache and memory access.



HSPICE approaches multicore architectures with a technology called HSPICE Precision Parallel (HPP). This uses an adaptive sub-matrix algorithm, a scalable approach that divides the matrix-solving stage into smaller tasks that can be efficiently performed on multiple cores.

Furthermore, it parallelises other small non-serial tasks such as output and time step control, achieving parallelisation efficiency close to 100%.

The algorithm dynamically balances thread computation loads and minimises communication between threads. The result is up to 7x scaling on 8 cores at full SPICE accuracy.
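Amdahl's law can also be turned around to ask what parallel fraction a given speed-up implies. The sketch below rearranges the formula to p = (1 - 1/S) / (1 - 1/n) for a speed-up S on n cores. It is an idealised estimate that ignores memory and communication overheads, and the function name is chosen for illustration.

```python
def required_parallel_fraction(speedup, cores):
    """Parallel fraction needed to reach `speedup` on `cores` cores,
    according to Amdahl's law (idealised, no communication overhead)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# 7x on 8 cores works out to 48/49 of the run, roughly 98%, executing in parallel
fraction_for_7x = required_parallel_fraction(7, 8)
```

On this simple model, 7x on 8 cores leaves room for only about 2% of the run to remain sequential.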

HPP is also optimised for integrated cache. It localises the data used by each thread and minimises cache misses by limiting data blocks to fit into the highest level cache.

Attention to memory efficiency also provides the benefit of high capacity. HSPICE is capable of simulating post-layout circuits in excess of 10 million elements.

As analogue circuits increase in size and complexity, simulation technology strives to keep pace. By using modern simulation technologies, analogue designers no longer need to tolerate weeks of simulation time or resort to less accurate simulations.

 
