Every digital / FPGA designer should be aware of clock domain crossing (CDC) between asynchronous clock domains. As our devices get more and more complex, it is easy for our designs to end up operating with several asynchronous clock domains. Of course, asynchronous clocks are those which do not have a fixed phase relationship. This means the clocks originate from different sources or, if they are divided from a common source, the division is not an integer one. As you would expect, we have looked before at how to detect and correct asynchronous CDC in AMD devices, along with how we can use third-party tools such as Blue Pearl's Visual Verification Suite to identify asynchronous CDC in our designs.
Synchronous CDC occurs when two clock domains are of different frequencies but have a defined phase relationship. This means the clocks are often generated from a common clock source and are integer divisions of that source. Synchronous CDC is simpler than asynchronous CDC when it comes to transferring data between registers clocked by different clocks, but there can still be complications.
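The rule of thumb above can be written down as a quick check. This is only a minimal sketch of the integer-division test described in the text, not how Vivado classifies clocks; the function names and return strings are made up for illustration.

```python
from fractions import Fraction


def is_integer_division(source_hz: int, derived_hz: int) -> bool:
    """Return True when derived_hz is source_hz divided by a whole number."""
    # Fraction keeps the ratio exact, so there are no floating point surprises.
    return Fraction(source_hz, derived_hz).denominator == 1


def classify_crossing(source_hz: int, clk_a_hz: int, clk_b_hz: int) -> str:
    """Apply the rule of thumb: both clocks must be integer divisions
    of the same common source to count as synchronous."""
    if is_integer_division(source_hz, clk_a_hz) and is_integer_division(source_hz, clk_b_hz):
        return "synchronous"
    return "treat as asynchronous"


if __name__ == "__main__":
    # 500 MHz and 250 MHz from a 500 MHz source: divisions of 1 and 2.
    print(classify_crossing(500_000_000, 500_000_000, 250_000_000))
    # 300 MHz is 500 MHz divided by 5/3, which is not a whole number.
    print(classify_crossing(500_000_000, 500_000_000, 300_000_000))
```

Clocks from entirely different sources never reach this check; they are asynchronous by definition.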
In our AMD devices, synchronous CDC most often occurs when the clocks originate from the same MMCM / PLL. Using different outputs from the same MMCM will result in the phase error adding to the clock uncertainty for the clock path.
As clock frequencies increase, this phase error and clock uncertainty can present issues with achieving timing closure for both setup and hold when transferring data between registers clocked by different sources.
One approach we can use when operating at higher clock frequencies, where clock uncertainty and phase error may cause timing closure issues, is to leverage the capabilities of the BUFGCE_DIV cell in UltraScale devices, a BUFGCE with a built-in clock divider.
In place of using different outputs from the MMCM to provide clocks synchronous to each other, we can leverage the BUFGCE_DIV. Let's look at a simple example which needs to generate clocks at 500 MHz and 250 MHz.
Our initial approach would be to consider using two MMCM outputs, one set for 500 MHz and the other set for 250 MHz. However, as stated above, this might lead to timing issues later in the implementation.
A better way is to use one output from the MMCM, in this case 500 MHz, and then use two BUFGCE_DIVs to provide the downstream clocks. One is set for a division ratio of 1, so it passes the 500 MHz clock straight through. The other is set for a division ratio of 2, so the 500 MHz clock is divided to create the 250 MHz clock.
Taking this approach prevents the phase error from contributing to the overall clock uncertainty, improving the final timing performance.
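To see why removing the phase error matters, here is a rough sketch of the clock uncertainty sum. It assumes the expression Vivado prints in its timing reports, ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, and every number below is made up for illustration rather than taken from a real device or report.

```python
import math


def clock_uncertainty_ns(tsj_ns: float, tij_ns: float, dj_ns: float, pe_ns: float) -> float:
    """Clock uncertainty, assuming the form shown in Vivado timing reports:
    ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    TSJ = total system jitter, TIJ = total input jitter,
    DJ = discrete jitter, PE = phase error.
    """
    return (math.sqrt(tsj_ns ** 2 + tij_ns ** 2) + dj_ns) / 2 + pe_ns


# Hypothetical jitter figures, in nanoseconds, for illustration only.
TSJ, TIJ, DJ = 0.071, 0.0, 0.100
PHASE_ERROR = 0.120  # hypothetical phase error between two MMCM outputs

# Two separate MMCM outputs: the phase error is part of the uncertainty.
two_outputs = clock_uncertainty_ns(TSJ, TIJ, DJ, PHASE_ERROR)
# One MMCM output feeding both buffers: no phase error term between the clocks.
one_output = clock_uncertainty_ns(TSJ, TIJ, DJ, 0.0)

if __name__ == "__main__":
    print(f"two MMCM outputs: {two_outputs:.4f} ns")
    print(f"one MMCM output:  {one_output:.4f} ns")
```

With a 2 ns launch-to-capture window at 500 MHz, every tenth of a nanosecond of uncertainty that can be removed is budget handed back to the data path.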
Let's take a look at this in Vivado by creating a simple application with a counter which runs at 250 MHz and can be restarted by an external signal. For this exercise, the external signal is sampled at 500 MHz to detect a rising edge. In reality, we may also use such an approach to implement a glitch filter.
The design implements two identical logic designs which are on different timing paths. The first uses separate clock outputs from the MMCM (Counter Path1), while the second uses a single output from the MMCM and generates the 250 MHz clock by dividing the 500 MHz clock with BUFGCE_DIVs (Counter Path). Counter Path1 requires the addition of the phase error to the clock uncertainty, while Counter Path does not.
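It can help to work out how much time a transfer between the 500 MHz and 250 MHz domains actually has. The sketch below is a simplified model, not a timing engine: it assumes both clocks have a rising edge aligned at time zero and takes the smallest gap from a launch edge to the next capture edge as the setup requirement. The function name is hypothetical.

```python
import math


def setup_requirement_ps(launch_period_ps: int, capture_period_ps: int) -> int:
    """Smallest gap from any launch rising edge to the next capture rising edge.

    Assumes ideal clocks with rising edges aligned at t = 0.
    Periods are integers in picoseconds so the arithmetic stays exact.
    """
    # The edge pattern repeats every least common multiple of the two periods.
    hyper = math.lcm(launch_period_ps, capture_period_ps)
    launch_edges = range(0, hyper, launch_period_ps)
    capture_edges = range(0, 2 * hyper + 1, capture_period_ps)
    return min(
        min(capture - launch for capture in capture_edges if capture > launch)
        for launch in launch_edges
    )


if __name__ == "__main__":
    clk500_ps = 2000  # 500 MHz period
    clk250_ps = 4000  # 250 MHz period
    print("500 MHz -> 250 MHz:", setup_requirement_ps(clk500_ps, clk250_ps), "ps")
    print("250 MHz -> 500 MHz:", setup_requirement_ps(clk250_ps, clk500_ps), "ps")
```

Under these assumptions both directions leave one 500 MHz period for the data path, which is why any extra clock uncertainty is felt so quickly at these frequencies.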
When we take such an approach, we do need to consider a few key rules:
1. The enables or resets on the BUFGCE_DIVs must be provided from the same source; using different sources may introduce a phase shift between the clocks which is not reported.
2. Ensure the buffers for the clocks are of the same type, for example both BUFGCE_DIV, not a mix of BUFG and BUFGCE_DIV, due to the different propagation delays in the cells.
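These two rules are easy to turn into a review checklist. The sketch below is purely illustrative: the ClockBuffer record, its fields, and the instance and net names are hypothetical and would have to be filled in by hand or from your own netlist export. It is not a Vivado API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClockBuffer:
    """Hypothetical description of one clock buffer in the design."""
    name: str        # instance name
    cell_type: str   # for example "BUFGCE_DIV" or "BUFGCE"
    ce_source: str   # net driving the enable input
    clr_source: str  # net driving the reset / clear input


def check_buffer_rules(buffers: List[ClockBuffer]) -> List[str]:
    """Return a list of rule violations for buffers fed from one MMCM output."""
    problems = []
    # Rule 1: enables and resets must come from the same source.
    if len({b.ce_source for b in buffers}) > 1:
        problems.append("enable inputs are driven from different sources")
    if len({b.clr_source for b in buffers}) > 1:
        problems.append("reset inputs are driven from different sources")
    # Rule 2: all buffers must be the same cell type.
    if len({b.cell_type for b in buffers}) > 1:
        problems.append("clock buffers are not all of the same cell type")
    return problems


if __name__ == "__main__":
    buffers = [
        ClockBuffer("bufg_500", "BUFGCE_DIV", "clk_en", "clk_clr"),
        ClockBuffer("bufg_250", "BUFGCE_DIV", "clk_en", "clk_clr"),
    ]
    print(check_buffer_rules(buffers) or "no rule violations found")
```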
Of course, if we do have several clocks emanating from the same MMCM, we can use the CLOCK_DELAY_GROUP constraint to balance the clock nets as necessary. We should be careful not to overuse this constraint, however, as it can put stress on the clock placer.
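As a small illustration, the helper below builds the XDC line for such a group. The set_property CLOCK_DELAY_GROUP form follows the usual XDC usage for this property, but the group and net names are hypothetical, and you should check UG949 for which net segments the property needs to be applied to in your design.

```python
from typing import Sequence


def clock_delay_group_xdc(group: str, nets: Sequence[str]) -> str:
    """Build one XDC line placing the given clock nets in a delay group.

    The group and net names are whatever the caller supplies; nothing here
    checks that they exist in a real design.
    """
    if len(nets) < 2:
        raise ValueError("a delay group only makes sense for two or more nets")
    net_list = " ".join(nets)
    return f"set_property CLOCK_DELAY_GROUP {group} [get_nets {{{net_list}}}]"


if __name__ == "__main__":
    # Hypothetical net names for the 500 MHz and 250 MHz clocks.
    print(clock_delay_group_xdc("grp_sync_cdc", ["clk500_net", "clk250_net"]))
```

Keeping the groups in one small script like this also makes it obvious when the constraint is starting to be overused.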
Hopefully, next time you are faced with architecting the clocking network of an AMD FPGA, you will be able to leverage these principles, along with others we have talked about recently, to organise your clocking better and achieve timing closure. Remember, we also have a webinar on timing closure coming up soon.
If you want to dive further into this topic, UG949 also has a great explanation.