Whack-a-Mole is an old (pre-electronic) arcade game where moles randomly pop out of holes. The aim is to hit each mole with a mallet, causing it to retreat and earning you points. I should point out these are not real moles.
Achieving timing closure on complex FPGA designs bears a strong resemblance to this game, but is a lot less fun.
Why FPGA design is like Whack-a-Mole
The most significant time-sink in FPGA design (apart from verification) is timing closure: the often iterative process of ensuring that every path in the design meets its required timing. Timing closure is easy and automatic for relatively small and slow designs, because they do not demand too much from the FPGA device or the implementation tools.
However, most designs do not fall into that happy category, and as each critical path is adjusted to meet timing, new ones are uncovered or created.
Looking at Figure 1, only path C has negative slack. However, as the design or the constraints are altered to improve path C, paths B and E become critical, and so on until finally all paths meet timing. The similarity to Whack-a-Mole becomes very apparent, but now it is irritating rather than fun.

Figure 1: Iterative timing closure
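The slack bookkeeping behind Figure 1 can be sketched in a few lines of Python. This is a minimal illustration only: the clock period, the path delays and the side effect of the "fix" are invented numbers, not measurements from any device or tool.

```python
# Hypothetical numbers throughout: a 10 ns clock and five paths, A to E.
CLOCK_PERIOD_NS = 10.0

def slack(delay_ns, period_ns=CLOCK_PERIOD_NS):
    """Slack is the required time minus the arrival time; negative means failing."""
    return period_ns - delay_ns

def failing_paths(delays):
    """Return the names of paths with negative slack, in sorted order."""
    return sorted(name for name, d in delays.items() if slack(d) < 0)

# Iteration 1: only path C misses timing.
delays = {"A": 8.0, "B": 9.5, "C": 11.0, "D": 7.0, "E": 9.8}
print(failing_paths(delays))  # ['C']

# Assumed side effect of fixing C: C gets 2 ns faster, but its neighbours
# B and E are pushed onto slower routing and each lose 0.6 ns.
after_fix = dict(delays, C=delays["C"] - 2.0, B=delays["B"] + 0.6, E=delays["E"] + 0.6)
print(failing_paths(after_fix))  # ['B', 'E']
```

One mole goes down and two come up: path C now passes, but B and E have slipped just past the clock period.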
Fixing it
So, surely, new FPGA families fix this problem?
Indeed, FPGA vendors are excellent at creating new generations of FPGAs in order to give higher performance and higher capacity. However, FPGA users are even better at finding ways to push FPGA capabilities beyond the limits.
Recognising that today’s leading FPGA designs contain many clock domains, use embedded multiply-accumulate functions, include embedded processors and need a variety of memory resources, vendors now offer FPGA devices with specific resources distributed throughout the device, ready to implement these functions.
This is very welcome but in some ways this distribution of embedded resources makes the timing closure problem more acute. The embedded functions are themselves a source of routing variance as shown by the following.
Suppose your design needs a change to increase the size of a RAM. There may be ample RAM resources in the FPGA device, but you may need to use a large block RAM rather than a collection of distributed RAMs.
The synthesis tool can easily map to the required RAM but the block RAM may only be available in specific columns on the FPGA device, so the placement of your design is distorted from the original placement. Critical routing is stretched to and from the column holding the new RAM.
Alternatively, the design is re-placed so the path is now closer to the new RAM but other converging or divergent paths are stretched instead. All of this would be unpredictable for traditional synthesis tools.

Figure 2: How paths are stretched by RTL changes
Why is routing prediction important?
Routing delay is inherently unpredictable because there are many different routing paths on the FPGA between a driver and a load.
Each path has a different delay, and logic synthesis at the start of the tool flow cannot predict which path will be chosen by the place and route algorithms at the end of the tool flow.
The fastest routing resources are usually the scarcest and routing congestion often leads to sub-optimal path delays. Just building FPGA devices with more and faster routing resources is not the answer because the FPGAs would be less area efficient, more expensive and power hungry.
One unfortunate by-product of the new FPGA generations is that routing delay has become by far the largest portion of the overall delay in the critical paths. As a result, the predictability of timing in a synthesis, place and route flow has degraded with each generation.
In the end, the cause of timing closure problems boils down to the discrepancy between the path timing predicted by synthesis and that actually achieved by place and route. The answer is for placement to be earlier in the flow and embedded within synthesis. A combined synthesis, place and route tool would dramatically reduce the number of iterations required for timing closure.
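To make that discrepancy concrete, here is a rough sketch contrasting a placement-blind delay estimate with a placement-aware one. The delay model (a fixed delay per load versus a delay per unit of Manhattan distance) and every constant are assumptions chosen for illustration; real tools use far more detailed models.

```python
# All constants are illustrative assumptions, not vendor data.
LOGIC_DELAY_NS = 3.0      # delay through the logic cells on the path
NS_PER_FANOUT = 0.5       # placement-blind routing guess, per load
NS_PER_GRID_UNIT = 0.25   # placement-aware routing delay, per unit of distance

def estimate_without_placement(fanout):
    """Synthesis-style guess: routing delay depends only on fanout."""
    return LOGIC_DELAY_NS + NS_PER_FANOUT * fanout

def estimate_with_placement(driver, load):
    """Placement-aware guess: routing delay grows with Manhattan distance."""
    distance = abs(driver[0] - load[0]) + abs(driver[1] - load[1])
    return LOGIC_DELAY_NS + NS_PER_GRID_UNIT * distance

# Same net, same fanout, two different placements of the load.
blind = estimate_without_placement(fanout=2)
near = estimate_with_placement((0, 0), (2, 2))    # 4 grid units apart
far = estimate_with_placement((0, 0), (20, 8))    # 28 grid units apart
print(blind, near, far)  # 4.0 4.0 10.0
```

The placement-blind figure happens to match the nearby placement, but it cannot tell the two placements apart, which is exactly the gap that opens up between synthesis and place and route.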
How to win at Whack-a-Mole
Consider a large design with multiple complex modules (Figure 2).
In many cases a fix to a timing problem involves changing the RTL and, usually, changes that improve timing also increase resource usage.
Module A has been placed next to modules B and C. When we ‘fix’ the RTL for module A to resolve a timing problem, the module expands to use resources previously used by B and C.
This forces components in B and C to move and paths to stretch, often creating new critical paths. Nothing about the logic in B or C has changed, and a normal logic synthesis flow would not change its results for them, because its estimate of interconnect delays in B or C would be exactly the same as before A grew.
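This knock-on effect can be illustrated with a toy one-dimensional floorplan. The module widths, the column positions and the idea of measuring a path by column distance are all hypothetical simplifications, not a model of any real placer.

```python
def place_in_row(widths, order):
    """Pack modules left to right; return each module's starting column."""
    start, column = {}, 0
    for name in order:
        start[name] = column
        column += widths[name]
    return start

def wire_length(start, module, fixed_column):
    """Column distance from a module's left edge to a fixed resource."""
    return abs(start[module] - fixed_column)

ORDER = ["A", "B", "C"]
IO_COLUMN = 0  # assume B talks to an I/O pin fixed at the left edge

before = place_in_row({"A": 4, "B": 3, "C": 3}, ORDER)
after = place_in_row({"A": 7, "B": 3, "C": 3}, ORDER)  # only A's RTL changed

print(wire_length(before, "B", IO_COLUMN))  # 4
print(wire_length(after, "B", IO_COLUMN))   # 7
```

B's own width and logic are identical in both runs, yet its connection to the fixed pin grows from 4 columns to 7 simply because A got bigger.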
The cause-and-effect coupling between RTL in module A and the new critical path in B or C is a physical coupling. In general, effects that lead to unpredictability in design iterations are physical in nature, which leads naturally to true physical synthesis.
In true physical synthesis, when the RTL for A is changed and the module enlarges, stretching the wires in B and C, the new, longer interconnect is correctly estimated and accounted for. A new combination of optimisations, placement and local routing along the new critical paths then fixes the problem automatically during the same physical synthesis run.
In this case many of the moles are automatically whacked for you and you never even see them try to pop up.
Estimating the total delay of a path in the FPGA has become very difficult, so timing closure has become the time-sink for many designers today.
Physical synthesis provides the required routing predictability in order to enable controlled, more predictable timing convergence. True physical synthesis systems help bring FPGA projects to successful conclusions in a shorter time, allowing you more time to go down the fairground and have some fun.
True physical synthesis
Many tools claim to be performing physical synthesis but in reality are using timing information from previous place and route runs to re-optimise the synthesis results. However, the original synthesis is unaffected so it is too late to undo major structuring decisions made by the synthesis.
In true physical synthesis, the synthesis does the placement itself so it knows with confidence where the routing will go on the final FPGA. Timing prediction is thus much more accurate and timing closure problems are avoided.
Doug Amos is director of European business development at Synplicity