Many factors come together to create a successful FPGA design. Not only do we need a sensible architecture, IP reuse, and high-quality code that exploits the architectural features of the FPGA we are targeting, but we also need to ensure our design can close timing. For beginners and experienced designers alike, achieving timing closure can be a challenge.
However, it does not have to be this way. In this blog, we are going to take a look at how we can work towards timing closure by creating a baseline.
The objective of timing closure is to ensure:
All active clock pins are covered by a clock definition.
All endpoints have a constraint with respect to a defined clock.
All input ports have an input delay constraint.
All output ports have an output delay constraint.
All timing exceptions have been correctly identified.
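As a rough illustration of the checklist above, the constraints might look like the following XDC sketch. The port names (clk, data_in, data_out, rst_n), the clock name sys_clk, and every numeric value are placeholders assumed for the example, not taken from a real design or datasheet.

```tcl
# Hypothetical ports and values throughout; substitute your own.

# Clock definition: a 100 MHz clock on the assumed input port "clk".
create_clock -name sys_clk -period 10.000 [get_ports clk]

# Input and output delays relative to that clock (placeholder numbers).
set_input_delay  -clock sys_clk -max 2.000 [get_ports data_in]
set_input_delay  -clock sys_clk -min 0.500 [get_ports data_in]
set_output_delay -clock sys_clk -max 1.500 [get_ports data_out]
set_output_delay -clock sys_clk -min 0.000 [get_ports data_out]

# Example timing exception: an asynchronous reset input we choose not to time.
set_false_path -from [get_ports rst_n]

# Run from the Tcl console with the design open: lists clock pins,
# endpoints, and ports that are still missing a constraint.
check_timing -verbose
```

Here check_timing is a console command rather than a constraint, so it would normally be run interactively instead of being stored in the XDC file.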
A design that has achieved timing closure has the following aspects:
Worst Negative Slack (WNS) greater than or equal to 0 ns and Total Negative Slack (TNS) equal to 0 ns under maximum delay (setup) analysis. The totals sum only the violations, so they read 0 ns when timing is met.
Worst Hold Slack (WHS) greater than or equal to 0 ns and Total Hold Slack (THS) equal to 0 ns under minimum delay (hold) analysis.
Worst Pulse Width Slack (WPWS) greater than or equal to 0 ns and Total Pulse Width Slack (TPWS) equal to 0 ns.
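One way to query the worst slack values from the Tcl console is sketched below. It assumes a design is open and already has clock constraints, so that timing paths exist; the report file name is arbitrary.

```tcl
# Worst setup path (max delay analysis) and worst hold path (min delay analysis).
set setup_path [get_timing_paths -setup -max_paths 1 -nworst 1]
set hold_path  [get_timing_paths -hold  -max_paths 1 -nworst 1]

# The SLACK property of the worst path corresponds to WNS / WHS.
set wns [get_property SLACK $setup_path]
set whs [get_property SLACK $hold_path]
puts "WNS = $wns ns, WHS = $whs ns"

if {$wns < 0 || $whs < 0} {
    puts "Timing is not yet closed"
}

# The full summary also lists TNS, THS, and the pulse width checks.
report_timing_summary -file timing_summary.rpt
```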
A baseline timing closure enables us to focus on the simplest set of timing constraints, which are predominantly our internal timing constraints.
To create a baseline timing closure, we undertake the following steps:
Post-Synthesis:
Once synthesis has completed, use the Timing Constraints Wizard to define the constraints of the design, skipping the IO constraints. Then run the methodology report to identify any major methodology issues, for example, clocks that are not on global clock routes.
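For those who prefer the Tcl console to the GUI, the post-synthesis checks could be run roughly as follows. This is a sketch that assumes a project-mode flow with a completed run named synth_1 (the Vivado default); the report file names are arbitrary.

```tcl
# Open the synthesized design (assumes project mode and a run called synth_1).
open_run synth_1

# Show the clock sources and whether each one has a clock definition.
report_clock_networks

# Methodology checks on the design and its constraints.
report_methodology -file methodology_synth.rpt

# First timing estimate, using only the internal (clock) constraints.
report_timing_summary -file timing_synth.rpt
```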
Post-Optimization:
Optimize the design using the Tcl command opt_design and report the timing. This will identify any failing paths. For example, in the design extract below, there are several timing failures between the AXI Smart Interconnect and the AXI BRAM Controller.
While UG949 recommends addressing pre-routing timing failures of more than 0.5 ns, in this instance I decided to insert an AXI register slice in the design and re-run synthesis and optimization to ensure the issue was addressed.
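A minimal sketch of this stage from the Tcl console, assuming the synthesized design is already open, might be:

```tcl
# Logic optimization of the open synthesized design.
opt_design

# Summary of WNS/TNS/WHS/THS after optimization (file name is arbitrary).
report_timing_summary -file timing_opt.rpt

# Detail for up to ten failing setup paths, showing the start and end points
# so the blocks involved can be identified.
report_timing -delay_type max -max_paths 10 -nworst 1 -slack_lesser_than 0
```

The start and end points in the report_timing output are what reveal which blocks a failing path runs between.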
With the identified timing issues corrected, the next stage is to run the placement of the design.
Post-Placement:
The flow post-placement is the same as for the post-optimization stage. By reporting the timing, we are able to see what the predicted timing performance will be and address any cases of negative slack. We should also address any large hold violations, those greater than 0.5 ns. WHS violations smaller than this may be resolved by the routing algorithms.
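The hold check described above can be sketched in Tcl as follows. The 0.5 ns threshold is the figure quoted in the text; the script assumes an optimized design is open and that constrained hold paths exist.

```tcl
# Place the optimized design.
place_design

# Predicted timing after placement (file name is arbitrary).
report_timing_summary -file timing_place.rpt

# Worst hold slack after placement.
set whs [get_property SLACK [get_timing_paths -hold -max_paths 1 -nworst 1]]

# Threshold taken from the text: violations worse than 0.5 ns need attention now.
if {$whs < -0.5} {
    puts "WHS = $whs ns: large hold violation, investigate before routing"
} elseif {$whs < 0} {
    puts "WHS = $whs ns: small hold violation, the router may be able to fix it"
} else {
    puts "WHS = $whs ns: no hold violations predicted"
}
```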
Post-Routing:
The completion of the routing provides the most accurate indication of the timing performance, and it is at this stage we need to address any timing issues in both the WNS and WHS. However, by looking at the timing at each stage (post-synthesis, optimization, and placement), we are trying to resolve the issues as they are presented to make the final elements easier to address.
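A sketch of the final stage, again assuming the placed design is open and using arbitrary file names:

```tcl
# Route the placed design.
route_design

# Confirm that every net was routed successfully.
report_route_status

# Final timing numbers; -warn_on_violation raises a critical warning
# if any timing check fails.
report_timing_summary -warn_on_violation -file timing_route.rpt

# Save the routed design so the results can be reopened later.
write_checkpoint -force post_route.dcp
```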
When it comes to addressing the issues reported at each stage of the design, we can extract information from Vivado’s reports including:
Quality of Result Analysis - An assessment score that is indicative of how likely your design is to meet performance targets. Also provides flow guidance.
Quality of Result Suggestions - Identifies issues with a design and offers solutions in the form of tool switches, properties that influence tool behavior on cells, and recommends text modifications where it is not possible to automate a solution.
Design Analysis Report - Provides information on timing path characteristics, design interconnect complexity, and congestion.
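In recent Vivado versions, these three reports can be generated from the Tcl console along the following lines. The output file names are arbitrary, and the congestion section is only meaningful once the design has been placed.

```tcl
# Quality of Result Analysis: assessment score and flow guidance.
report_qor_assessment -file qor_assessment.rpt

# Quality of Result Suggestions: proposed switches, properties, and text edits.
report_qor_suggestions -file qor_suggestions.rpt

# Design Analysis Report: timing path characteristics, logic level
# distribution, and congestion.
report_design_analysis -timing -logic_level_distribution -congestion \
    -file design_analysis.rpt
```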
Of course, to achieve timing closure, we must change either the constraints, the strategy, or the RTL. Things we may want to consider to achieve timing closure are:
How many logic levels are there?
Are we using constraints/attributes that might prevent optimization?
Does a path contain elements such as BRAM and DSPs which have high logic delays?
Are there any high fan-out nets?
Are the cells placed far apart?
Are we missing pipeline registers?
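Some of these questions can be answered directly from the Tcl console. The sketch below covers only logic levels and fan-out, and assumes a design with constrained timing paths is open.

```tcl
# Inspect the worst setup path.
set path [get_timing_paths -setup -max_paths 1 -nworst 1]

puts "Slack        : [get_property SLACK $path] ns"
puts "Logic levels : [get_property LOGIC_LEVELS $path]"
puts "Start point  : [get_property STARTPOINT_PIN $path]"
puts "End point    : [get_property ENDPOINT_PIN $path]"

# List the twenty highest fan-out nets along with their timing information.
report_high_fanout_nets -timing -max_nets 20
```

A high logic level count on the worst path often points to a missing pipeline register, while the start and end points show whether a BRAM or DSP is involved.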
Once we have achieved the baseline timing closure, we are able to progress further and work toward full timing closure with all IO correctly constrained.
Of course, this will take more time; however, the established baseline will help considerably.
Hopefully, this helps you understand the flow of achieving the baseline timing closure. I plan on running a workshop in June on just this subject.
Sponsored by AMD