Today’s technology nodes, at 90nm, 65nm, and below, enable a tremendous amount of functionality on a single chip. To meet the high-performance and low-power requirements of consumer-centric, high-volume, convergence devices, designers use increasingly advanced multi-clocking architectures and sophisticated power modes.
In many projects, especially those that rely heavily on reuse of legacy and third-party IP blocks, it is not the design but the verification of these systems-on-chip that is the bottleneck. Many companies have therefore deployed new verification technologies - such as constrained-random stimulus generation, assertion-based verification, and coverage-driven verification - on their projects and have substantially raised their verification productivity.
However, a new class of problems has emerged that current simulation-based technologies cannot address very well. These issues pertain to data that has to cross from one clock domain to another, so-called clock domain crossing (CDC) signals. If the circuitry handling the data transfer across clock domains is not carefully designed, it will cause intermittent bugs that are very likely to slip through the verification process and end up in silicon, ultimately reducing yield and, in the worst case, forcing respins.
Fully understanding and planning every CDC is a difficult task. Collett (Collett International, 2005 IC/ASIC Design Closure study) reports that in 2005 over 15 per cent of designs already used 20 or more clock domains, which can mean thousands of CDC signals. Using simulation alone, it is virtually impossible to verify that these designs do not have clock domain crossing issues. Indeed, the same study finds that clocking issues are now the second most prevalent defect in chips that need a respin.
Illegal Crossings
When data is passed between two clock domains, it can be corrupted when the clock edges of the sending and receiving flip-flops fall close to each other. For example, if a receiving flip-flop latches in the data at the same moment that the sending flip-flop is changing its value, the captured value is neither a 1 nor a 0 but somewhere in between, a condition known as metastability. This value will eventually settle to a 1 or a 0, but because of the settling delay, the logic that the flip-flop drives may see unpredictable values in the meantime. Random values in the design are not good news.
If a bit crosses between the domains and fans out to more than one receiving flip-flop, or if multiple bits cross between the domains, corruption is even more likely, because the failure described above is combined with another failure mode: one receiving flip-flop may see a value change a cycle before or after another. For example, if three bits cross the clock domain and change from ‘111’ in one clock cycle to ‘000’ in the next, the receiving flip-flops may see ‘101’ for a cycle before ‘000’ is correctly latched in. This corrupt value could have unexpected consequences.
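The multi-bit failure mode can be made concrete with a small enumeration. The Python sketch below is only an illustrative model, not part of any CDC tool: it assumes that each changing bit is captured independently, either already updated or one cycle late, and the function names `possible_samples` and `corrupt_samples` are invented for this example.

```python
from itertools import product


def possible_samples(old, new):
    """Return every word the receiving flip-flops could latch on the
    cycle in which the sender changes from `old` to `new`.

    Assumed model: each bit that changes is captured independently,
    either already updated (new value) or one cycle late (old value).
    Bits that do not change are always read correctly.
    """
    if len(old) != len(new):
        raise ValueError("old and new words must have the same width")
    # For each bit position, collect the values that bit may show.
    choices = [{o, n} for o, n in zip(old, new)]
    return {"".join(bits) for bits in product(*choices)}


def corrupt_samples(old, new):
    """Words that were never sent but could still be observed."""
    return possible_samples(old, new) - {old, new}


if __name__ == "__main__":
    # Three bits changing at once, as in the '111' to '000' example.
    print(sorted(corrupt_samples("111", "000")))
    # Only one bit changing: no incoherent word exists in this model.
    print(sorted(corrupt_samples("011", "010")))
```

Under this model the ‘111’ to ‘000’ transition can produce six words that were never sent, ‘101’ among them, whereas a transition in which only one bit changes cannot produce an incoherent word.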
When these errors occur, they can be extremely difficult to resolve. The data corruption may only occur after hours or days of operation, when the clock edges line up in exactly the wrong way. Because every part can fail if the clocks line up in the wrong way, these issues cannot be screened out with test vectors on a test system. It is also unlikely that a firmware fix will be able to resolve them, as they often manifest as data corruption rather than simple control issues. It can take months of engineering time to find, understand, and fix these issues, and the problem may not be detected until a significant number of units have failed in customers’ hands. Especially with today’s complex chips, the low yields and design respins caused by these types of bugs are dangerously expensive.
Synchroniser crossing guards
Several synchroniser structures exist to transition data and control signals safely between clock domains. The challenge lies in ensuring that the structures in a given design are valid synchronisers. Engineers have tried for several years to address this issue using simulation techniques; however, simulation alone isn’t well suited to finding CDC bugs, resulting in mismatches between errors detected in simulation and those that show up in hardware. Even new advanced test bench methodologies that use constrained-random environments and assertions are highly unlikely to catch a missing synchronisation bug. Because simulation environments are designed to catch functional defects, most simulation runs do not account for indeterminate values and assume that values resolve instantly between clock cycles. Furthermore, timing information is often removed from the design to dramatically improve simulator performance through abstraction.
To compensate for this, two main simulation techniques have been used: one is to add synchronisers that jitter and the other is to sweep the clock frequencies of asynchronous clocks. But these do not work well for traditional asynchronous clock domain crossings and can be impossible to apply to synchronous crossings that are being treated as asynchronous crossings. Both carry the risk of false failures, rely to a large degree on luck, and lack the automation and coverage necessary to significantly reduce risk.
A Complete, Automated CDC Solution
What is required is an automated solution that completely verifies that all CDCs behave as they ought to. All of the clocks and CDCs must be identified and verified. It must be determined that all signals crossing clock domain boundaries are safely managed and that all of the synchronisers spanning clock domains are correctly structured.
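As a rough illustration of what identifying crossings and checking synchroniser structure involves, the Python sketch below walks a toy netlist. Everything in it is an assumption made for the example: the `Flop` record, the signal names, and the choice of a plain two-flop synchroniser as the only accepted structure. A real tool would work on the actual design database and recognise many more synchroniser styles.

```python
from dataclasses import dataclass, field


@dataclass
class Flop:
    clock: str                                    # clock domain of this flip-flop
    drivers: list = field(default_factory=list)   # flip-flops feeding its D input


def find_crossings(netlist):
    """Return (source, destination) pairs whose flops use different clocks."""
    return [
        (src, dst)
        for dst, flop in netlist.items()
        for src in flop.drivers
        if netlist[src].clock != flop.clock
    ]


def fanout(netlist, name):
    """Names of the flops directly driven by `name`."""
    return [dst for dst, flop in netlist.items() if name in flop.drivers]


def is_two_flop_synchronised(netlist, dst):
    """True if `dst` is the first stage of a plain two-flop synchroniser:
    a single driver, and a single load in the same domain that is driven
    by nothing else (so no logic merges in between the two stages)."""
    first = netlist[dst]
    if len(first.drivers) != 1:
        return False
    loads = fanout(netlist, dst)
    if len(loads) != 1:
        return False
    second = netlist[loads[0]]
    return second.clock == first.clock and second.drivers == [dst]


def unsafe_crossings(netlist):
    """Crossings that do not land on a two-flop synchroniser."""
    return [
        (src, dst)
        for src, dst in find_crossings(netlist)
        if not is_two_flop_synchronised(netlist, dst)
    ]


# Hypothetical design: tx_req is synchronised, tx_data is not.
EXAMPLE = {
    "tx_req": Flop("clk_a"),
    "tx_data": Flop("clk_a"),
    "sync1": Flop("clk_b", ["tx_req"]),
    "sync2": Flop("clk_b", ["sync1"]),
    "rx_data": Flop("clk_b", ["tx_data"]),
}

if __name__ == "__main__":
    print(find_crossings(EXAMPLE))
    print(unsafe_crossings(EXAMPLE))
```

In this toy design both `tx_req` and `tx_data` cross from `clk_a` to `clk_b`, but only the `tx_data` to `rx_data` crossing is reported, because nothing resembling a synchroniser sits on that path.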
Once the basic structures are validated, any parameters set by the user must be checked for correctness. A complete CDC solution should automatically generate assertions that can be used during simulation to validate these user definitions, and it must measure and report coverage metrics. It should also verify that the output of the design does not change when one synchroniser catches a data transition a cycle before or after another synchroniser, as such a dependence could be a CDC defect.
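To give a feel for what such a generated assertion and its coverage figure might look like, here is a minimal Python sketch. The user definition it checks, that a crossing signal holds each new value for at least `min_cycles` samples of the receiving clock, is a hypothetical example chosen for illustration; the function names are likewise invented, and a real flow would emit assertions in the simulator's own assertion language.

```python
def stability_violations(samples, min_cycles):
    """Return the sample indices at which the signal changed after
    holding its previous value for fewer than `min_cycles` samples.

    The first run of values is not checked, because the trace does not
    show how long the signal had been stable before it started.
    """
    violations = []
    run_start = None  # index where the current value began; unknown at first
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            if run_start is not None and i - run_start < min_cycles:
                violations.append(i)
            run_start = i
    return violations


def transition_coverage(samples):
    """Count the transitions seen, so a check that never fired because
    the signal never moved is not mistaken for a pass."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


if __name__ == "__main__":
    trace = [0, 0, 1, 1, 1, 0, 1, 1]   # one sample per receiving clock cycle
    print(stability_violations(trace, min_cycles=2))
    print(transition_coverage(trace))
```

For the trace shown, the value taken at index 5 is held for only one sample before changing again at index 6, so index 6 is reported, and the coverage count shows that three transitions exercised the check.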
Finally, a CDC verification solution must inject jitter on any signals that cross clock domains during simulation in order to verify that the design is immune to the issues it will face once in the field.
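Jitter injection of this kind can be sketched in a few lines of Python. The model below is an assumption made for illustration rather than a description of any particular tool: a trace holds one ideal word per receiving clock cycle, each bit transition is randomly allowed to arrive one cycle late, and the "design" is considered immune only if it never observes a word the sender did not send. The names `inject_jitter` and `unexpected_words` are invented for the example.

```python
import random


def inject_jitter(trace, rng):
    """Return a copy of `trace` in which every bit transition may be
    seen one receiving cycle late (with probability 0.5 per transition)."""
    width = len(trace[0])
    jittered = [list(word) for word in trace]
    for lane in range(width):
        for cycle in range(1, len(trace)):
            changed = trace[cycle][lane] != trace[cycle - 1][lane]
            if changed and rng.random() < 0.5:
                # Hold the old value of this bit for one more cycle.
                jittered[cycle][lane] = trace[cycle - 1][lane]
    return [tuple(word) for word in jittered]


def unexpected_words(trace, seeds=range(200)):
    """Run the trace under many jitter seeds and collect every received
    word that the sender never actually sent."""
    legal = set(trace)
    seen = set()
    for seed in seeds:
        seen.update(inject_jitter(trace, random.Random(seed)))
    return seen - legal


# Three bits changing together versus a single bit changing.
ALL_BITS = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0)]
ONE_BIT = [(0, 1, 1), (0, 1, 1), (0, 1, 0), (0, 1, 0)]

if __name__ == "__main__":
    print(sorted(unexpected_words(ALL_BITS)))
    print(sorted(unexpected_words(ONE_BIT)))
```

With jitter enabled, the trace in which three bits change together exposes words that were never sent, while the trace in which only one bit changes stays clean. Fixed seeds keep each run reproducible, which matters when a failure has to be replayed and debugged.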
Multiple clock domains are essential to the realisation of today’s performance- and power-optimised designs. Only through an advanced, automated CDC verification solution such as this can you have confidence that your multi-clock designs will function correctly the first time and that your target yield will be met.
Dan Cohen is an applications engineer with Mentor Graphics, and Curtis Banks is a staff design engineer with the storage component IC physical design group of LSI Logic.