Timing closure suggestions

Tags: fpga, optimization, timing, timing-analysis, vhdl

I have an FPGA design (I didn't write a single line of its source code) and I have to add a module to it. The design contains a Wishbone bus to which other Wishbone interfaces can be attached. The modules attached to the bus are UARTs and "custom UARTs"; there are 16 devices attached to the bus.

In my opinion the design is written really badly (many long combinational paths connected to the Wishbone bus, memory elements that are not registered, and a chaotic coding style that is far removed from the hardware implementation), but "it works and closes timing" without my module (the clock constraint is 125 MHz, and PAR achieves 125.109 MHz with a resource utilization of roughly 50%).

When I add my module, timing is not met on several paths. The slow paths are outside my module. Now the question: can I be sure that the problem isn't my module? Is the PAR report sufficient to prove that the problem lies in the other parts of the design?

Best Answer

Since you mentioned "par", it sounds like you're using Xilinx. You should run the static timing report tool, "trce", with verbose timing: "-v 10". This will show the 10 worst paths even if the constraint is met. Sometimes you want to optimize, pipeline, or register paths that aren't the very worst ones, because fixing these can help reduce routing congestion and makes overall timing closure easier to reach (faster MAP/PAR runs).

Related Solutions

Electronic – timing constraint for bus synchronizer circuits

I don't have experience with Quartus, so treat this as general advice.

When working on paths between clock domains, timing tools expand the clocks to the least common multiple of their periods and select the closest pair of edges.

For paths from a 36 MHz clock (27.777 ns) to a 100 MHz clock (10 ns), if I did my quick calculations correctly, the closest pair of rising edges is 138.888 ns on the source clock and 140 ns on the destination clock. That's effectively a 900 MHz constraint for those paths! Depending on rounding (or for clocks with no relationship), it could come out worse than that.
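
The edge arithmetic above can be checked with a short script. This is a minimal Python sketch, not a description of how any particular timing tool is implemented: it assumes ideal clocks with integer-MHz frequencies whose rising edges are aligned at t = 0, and the function name tightest_setup_window is made up for illustration.

```python
from fractions import Fraction
from math import gcd


def tightest_setup_window(src_mhz: int, dst_mhz: int):
    """Return (launch_ns, capture_ns, window_ns) for the closest pair of
    rising edges, with the capture edge strictly after the launch edge.

    Assumption: both clocks are ideal and have a rising edge at t = 0.
    """
    src_period = Fraction(1000, src_mhz)   # period in ns, kept exact
    dst_period = Fraction(1000, dst_mhz)
    # The edge pattern repeats every 1000 / gcd(f_src, f_dst) ns.
    common = Fraction(1000, gcd(src_mhz, dst_mhz))

    best = None
    for i in range(int(common / src_period)):
        launch = i * src_period
        # First destination edge strictly after this launch edge.
        capture = (launch // dst_period + 1) * dst_period
        window = capture - launch
        if best is None or window < best[2]:
            best = (launch, capture, window)
    return best


launch, capture, window = tightest_setup_window(36, 100)
print(float(launch), float(capture), float(window))
print("equivalent frequency in MHz:", float(1000 / window))
```

For 36 MHz into 100 MHz this finds a launch edge at 1250/9 ns (about 138.889 ns), a capture edge at 140 ns, and a window of 10/9 ns, which corresponds to the 900 MHz figure quoted above.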

There are at least three ways to write constraints for this structure. I am going to call the clocks fast_clk and slow_clk as I think that's clearer for illustration.

Option 1: disable timing with set_false_path

The easiest solution is to use set_false_path to disable timing between the clocks:

set_false_path -from [get_clocks fast_clk] -to [get_clocks slow_clk]
set_false_path -from [get_clocks slow_clk] -to [get_clocks fast_clk]

This is not strictly correct, since there are timing requirements for the synchronizer to work correctly. If the physical implementation delays the data too much relative to the control signal, then the synchronizer will not work. However, since there isn't any logic on the path, it's unlikely that the timing constraint will be violated. set_false_path is commonly used for this kind of structure, even in ASICs, where the effort vs. risk tradeoff for low-probability failures is more cautious than for FPGAs.

Option 2: relax the constraint with set_multicycle_path

You can allow additional time for certain paths with set_multicycle_path. It is more common to use multicycle paths with closely related clocks (e.g. interacting 1X and 2X clocks), but it will work here if the tool supports it sufficiently.

set_multicycle_path 2 -from [get_clocks slow_clk] -to [get_clocks fast_clk] -end -setup
set_multicycle_path 1 -from [get_clocks slow_clk] -to [get_clocks fast_clk] -end -hold

The default edge relationship for setup is single cycle, i.e. set_multicycle_path 1. These commands allow one more cycle of the endpoint clock (-end) for setup paths. The -hold adjustment, with a number one less than the setup constraint, is almost always needed when setting multicycle paths; see below for details.

To constrain paths in the other direction similarly (relaxing the constraint by one period of the faster clock), change -end to -start:

set_multicycle_path 2 -from [get_clocks fast_clk] -to [get_clocks slow_clk] -start -setup
set_multicycle_path 1 -from [get_clocks fast_clk] -to [get_clocks slow_clk] -start -hold

Option 3: specify requirement directly with set_max_delay

This is similar to the effect of set_multicycle_path but saves having to think through the edge relationships and the effect on hold constraints.

set_max_delay 10 -from [get_clocks fast_clk] -to [get_clocks slow_clk]
set_max_delay 10 -from [get_clocks slow_clk] -to [get_clocks fast_clk]

You may want to pair this with set_min_delay for hold checks, or leave the default hold check in place. You may also be able to do set_false_path -hold to disable hold checks, if your tool supports it.

Gory details of edge selection for multi-cycle paths

To understand the hold adjustment that gets paired with each setup adjustment, consider this simple example with a 3:2 relationship. Each digit represents a rising clock edge:

1     2     3
4   5   6   7

The default setup check uses edges 2 and 6. The default hold check uses edges 1 and 4.

Applying a multi-cycle constraint of 2 with -end adjusts the default setup and hold checks to use the next edge after what they were originally using, meaning the setup check now uses edges 2 and 7 and the hold check uses edges 1 and 5. For two clocks at the same frequency, this adjustment makes sense — each data launch corresponds with one data capture, and if the capture edge is moved out by one, the hold check should also move out by one. This kind of constraint might make sense for two branches of a single clock if one of the branches has a large delay. However, for the situation here, a hold check using edges 1 and 5 isn't desirable, since the only way to fix it is to add an entire clock cycle of delay on the path.

The multi-cycle hold constraint of 1 (for hold, the default is 0) moves the destination clock edge used for hold checks backwards by one edge. The combination of a 2-cycle setup MCP and a 1-cycle hold MCP results in a setup check using edges 2 and 7, and a hold check using edges 1 and 4.
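
The edge bookkeeping for the 3:2 example can be written down as a small table lookup. This is only a sketch of the description above, in Python, not the algorithm of any real timing tool: the default setup and hold pairs are taken from the text, and the helper names (shift_capture, checks_with_end_mcp) are made up for illustration.

```python
# Rising-edge labels from the 3:2 example: the launch clock has edges 1..3,
# the capture clock has edges 4..7.
CAPTURE_EDGES = [4, 5, 6, 7]

DEFAULT_SETUP = (2, 6)  # (launch edge, capture edge), as stated in the text
DEFAULT_HOLD = (1, 4)


def shift_capture(edge: int, by: int) -> int:
    """Move a capture-clock edge label forward (or backward) by `by` edges."""
    idx = CAPTURE_EDGES.index(edge) + by
    if not 0 <= idx < len(CAPTURE_EDGES):
        raise ValueError("shift leaves the window drawn in the example")
    return CAPTURE_EDGES[idx]


def checks_with_end_mcp(setup_mcp: int = 1, hold_mcp: int = 0):
    """Return (setup_pair, hold_pair) for -end multicycle values.

    A setup MCP of N moves both checks N - 1 capture edges later;
    a hold MCP of M then moves the hold check M capture edges earlier.
    """
    setup_shift = setup_mcp - 1
    hold_shift = setup_shift - hold_mcp
    setup = (DEFAULT_SETUP[0], shift_capture(DEFAULT_SETUP[1], setup_shift))
    hold = (DEFAULT_HOLD[0], shift_capture(DEFAULT_HOLD[1], hold_shift))
    return setup, hold


print(checks_with_end_mcp())       # defaults
print(checks_with_end_mcp(2, 0))   # setup MCP only
print(checks_with_end_mcp(2, 1))   # setup MCP paired with hold MCP
```

The three calls reproduce the three cases discussed: setup 2/6 with hold 1/4 by default, setup 2/7 with hold 1/5 when only the setup MCP is applied, and setup 2/7 with hold 1/4 once the hold MCP of 1 is added.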

Electronic – Making combinational component synchronous

I assume you mean that you have combinational (not asynchronous) logic that forms the critical timing path and limits the frequency, right?

Your Static Timing Analysis (STA) tool has no idea that you allow this component to take several clock cycles before you expect a valid output. Synthesis still assumes that propagation from the input stage to the output stage must take place in a single clock cycle. It does its best to optimize this path, but the propagation delay is still too long.

You have several options:

The best option in such cases is to split the combinational path into several smaller paths and add registers that sample the intermediate results. This reduces the propagation delay of each path but adds latency: the valid output is delayed by the number of sampling stages.
If you do not want to touch this logic (it is too complex, or it is "silicon proven", or for any other reason), you can do what you did: add sampling stages before and after the logic, plus additional control logic that knows how many cycles to wait for valid data (and does not allow the inputs to change in the meantime). However, you must communicate this unusual intent to all the tools. Information like this goes by the general name "design constraints". I don't know how to specify design constraints for your FPGA (I think it will not take you more than 10 minutes to find out). Which constraint do you need to add? Combinational paths that take more than one cycle are called "multicycle paths" (MCPs). Look for the term in the documentation of your FPGA/IDE.
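
The trade-off between the two options can be estimated with some back-of-the-envelope arithmetic. The Python sketch below is a rough model only: it assumes the logic can be split into equal stages and that each register adds a fixed overhead (clock-to-out plus setup). The delay values, the 8 ns clock period, and the function names are hypothetical, chosen for illustration.

```python
import math


def pipeline_estimate(comb_delay_ns: float, stages: int,
                      reg_overhead_ns: float = 1.0):
    """Option 1: split the path into `stages` equal pieces.

    Returns (fmax_mhz, added_latency_cycles). The even split and the
    fixed register overhead are simplifying assumptions.
    """
    if stages < 1:
        raise ValueError("need at least one stage")
    stage_delay_ns = comb_delay_ns / stages + reg_overhead_ns
    return 1000.0 / stage_delay_ns, stages - 1


def cycles_to_wait(comb_delay_ns: float, clk_period_ns: float,
                   reg_overhead_ns: float = 1.0) -> int:
    """Option 2: leave the logic alone and hold the inputs stable.

    Returns how many clock cycles the control logic must wait, which is
    also the value a multicycle setup constraint would need.
    """
    return math.ceil((comb_delay_ns + reg_overhead_ns) / clk_period_ns)


# Hypothetical numbers: 12 ns of logic, 1 ns register overhead, 8 ns clock.
print(pipeline_estimate(12.0, 1))   # unsplit path
print(pipeline_estimate(12.0, 3))   # three stages, two extra cycles of latency
print(cycles_to_wait(12.0, 8.0))    # cycles to wait if the logic is untouched
```

With these made-up numbers, three pipeline stages bring each stage down to 5 ns at the cost of two extra cycles of latency, while leaving the logic untouched requires waiting two cycles of the 8 ns clock per result.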

NOTE: there is another design constraint that may help you: "false path" (FP). When you define a path as an FP, no tool will try to derive any timing for it. This is useful for DFT logic, clock muxes, etc. However, even though you may be tempted to use it in this case (FPs are much simpler to define properly than MCPs), don't do it! MCP constraints must be validated by formal tools, so if you define an MCP as an FP you are masking many potential bugs.
