I don't have experience with Quartus, so treat this as general advice.
When working on paths between clock domains, timing tools expand the clocks to the least common multiple of their periods and select the closest pair of edges.
For paths from a 36 MHz clock (27.777 ns) to a 100 MHz clock (10 ns), if I did my quick calculations correctly, the closest pair of rising edges is 138.888 ns on the source clock and 140 ns on the destination clock. That's effectively a 900 MHz constraint for those paths! Depending on rounding (or for clocks with no relationship), it could come out worse than that.
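If it helps to sanity-check that arithmetic, the edge search can be sketched in a few lines of Python (a simplified model: ideal clocks, rising edges only, no clock latency or uncertainty):

```python
from fractions import Fraction
from math import gcd

def tightest_setup_ns(f_src_hz, f_dst_hz):
    """Smallest positive gap (in ns) from a rising source-clock edge to the
    next rising destination-clock edge, over one common period."""
    t_src = Fraction(1, f_src_hz)                  # source period, seconds
    t_dst = Fraction(1, f_dst_hz)                  # destination period
    common = Fraction(1, gcd(f_src_hz, f_dst_hz))  # LCM of the two periods
    src_edges = [k * t_src for k in range(int(common / t_src))]
    dst_edges = [m * t_dst for m in range(int(common / t_dst) + 1)]
    gaps = [d - s for s in src_edges for d in dst_edges if d > s]
    return min(gaps) * 10**9                       # seconds -> ns

req = tightest_setup_ns(36_000_000, 100_000_000)
print(req)             # 10/9 ns, i.e. ~1.111 ns
print(1 / req * 1000)  # equivalent frequency in MHz: 900
```

For 36 MHz and 100 MHz this reproduces the 1.111 ns (900 MHz) figure: the common period is 250 ns (9 cycles of the slow clock, 25 of the fast one), and the closest edges are the fifth slow edge at 138.888 ns and the fourteenth fast edge at 140 ns.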
There are at least three ways to write constraints for this structure. I am going to call the clocks fast_clk and slow_clk as I think that's clearer for illustration.
Option 1: disable timing with set_false_path
The easiest solution is to use set_false_path to disable timing between the clocks:
set_false_path -from [get_clocks fast_clk] -to [get_clocks slow_clk]
set_false_path -from [get_clocks slow_clk] -to [get_clocks fast_clk]
This is not strictly correct, since there are timing requirements for the synchronizer to work correctly. If the physical implementation delays the data too much relative to the control signal, then the synchronizer will not work. However, since there isn't any logic on the path, it's unlikely that the timing constraint will be violated. set_false_path is commonly used for this kind of structure, even in ASICs, where the effort vs. risk tradeoff for low-probability failures is more cautious than for FPGAs.
Option 2: relax the constraint with set_multicycle_path
You can allow additional time for certain paths with set_multicycle_path. It is more common to use multicycle paths with closely related clocks (e.g. interacting 1X and 2X clocks), but it will work here if the tool supports it sufficiently.
set_multicycle_path 2 -from [get_clocks slow_clk] -to [get_clocks fast_clk] -end -setup
set_multicycle_path 1 -from [get_clocks slow_clk] -to [get_clocks fast_clk] -end -hold
The default edge relationship for setup is single cycle, i.e. set_multicycle_path 1. These commands allow one more cycle of the endpoint clock (-end) for setup paths. The -hold adjustment, with a number one less than the setup constraint, is almost always needed when setting multi-cycle paths; see below for details.
To constrain paths in the other direction similarly (relaxing the constraint by one period of the faster clock), change -end to -start:
set_multicycle_path 2 -from [get_clocks fast_clk] -to [get_clocks slow_clk] -start -setup
set_multicycle_path 1 -from [get_clocks fast_clk] -to [get_clocks slow_clk] -start -hold
Option 3: specify requirement directly with set_max_delay
This is similar in effect to set_multicycle_path, but it saves having to think through the edge relationships and the effect on hold constraints.
set_max_delay 10 -from [get_clocks fast_clk] -to [get_clocks slow_clk]
set_max_delay 10 -from [get_clocks slow_clk] -to [get_clocks fast_clk]
You may want to pair this with set_min_delay for hold checks, or leave the default hold check in place. You may also be able to use set_false_path -hold to disable hold checks, if your tool supports it.
Gory details of edge selection for multi-cycle paths
To understand the hold adjustment that gets paired with each setup adjustment, consider this simple example with a 3:2 relationship. Each digit represents a rising clock edge:
1     2     3
4   5   6   7
The default setup check uses edges 2 and 6. The default hold check uses edges 1 and 4.
Applying a multi-cycle constraint of 2 with -end adjusts the default setup and hold checks to use the next edge after what they were originally using, meaning the setup check now uses edges 2 and 7 and the hold check uses edges 1 and 5. For two clocks at the same frequency, this adjustment makes sense — each data launch corresponds with one data capture, and if the capture edge is moved out by one, the hold check should also move out by one. This kind of constraint might make sense for two branches of a single clock if one of the branches has a large delay. However, for the situation here, a hold check using edges 1 and 5 isn't desirable, since the only way to fix it is to add an entire clock cycle of delay on the path.
The multi-cycle hold constraint of 1 (for hold, the default is 0) adjusts the edge of the destination clock used for hold checks backwards by one edge. The combination of the 2-cycle setup MCP and 1-cycle hold MCP constraints results in a setup check using edges 2 and 7, and a hold check using edges 1 and 4.
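The edge bookkeeping above can be sketched as a small model (my assumptions: ideal clocks as drawn, setup captures on the first destination edge strictly after launch, and the default hold check pairs a launch edge with the latest destination edge at or before it; real tools also fold in clock latency and uncertainty):

```python
# Model of multicycle edge selection for the 3:2 example above.
# Slow (launch) clock edges 1..3, fast (capture) clock edges 4..7.
SLOW = {1: 0, 2: 3, 3: 6}              # edge label -> time (arbitrary units)
FAST = {4: 0, 5: 2, 6: 4, 7: 6}
FAST_ORDER = sorted(FAST, key=FAST.get)

def setup_edges(setup_mcp=1):
    """Tightest setup pair: each launch edge captures on the first fast edge
    strictly after it, pushed (setup_mcp - 1) edges later by an -end MCP."""
    pairs = []
    for l, lt in SLOW.items():
        after = [f for f in FAST_ORDER if FAST[f] > lt]
        if len(after) >= setup_mcp:
            c = after[setup_mcp - 1]
            pairs.append((FAST[c] - lt, l, c))   # (margin, launch, capture)
    _, l, c = min(pairs)
    return l, c

def hold_edges(setup_mcp=1, hold_mcp=0):
    """Tightest hold pair: launch against the latest fast edge at or before
    it, shifted forward with the setup MCP and back by the hold MCP."""
    pairs = []
    for l, lt in SLOW.items():
        before = [f for f in FAST_ORDER if FAST[f] <= lt]
        if before:
            c = before[-1]
            pairs.append((lt - FAST[c], l, c))
    _, l, c = min(pairs)
    idx = FAST_ORDER.index(c) + (setup_mcp - 1) - hold_mcp
    return l, FAST_ORDER[idx]

print(setup_edges())     # default setup:                     (2, 6)
print(hold_edges())      # default hold:                      (1, 4)
print(setup_edges(2))    # 2-cycle -end setup MCP:            (2, 7)
print(hold_edges(2, 0))  # ...without the hold MCP:           (1, 5)
print(hold_edges(2, 1))  # ...with the 1-cycle hold MCP:      (1, 4)
```

This reproduces the sequence in the text: the setup MCP alone drags the hold check out to edges 1 and 5, and the paired 1-cycle hold MCP pulls it back to edges 1 and 4.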
Best Answer
There are certainly reasons why it is useful, but it really depends on the design.
For massively interconnected designs which don't have nice groupings (e.g. there are lots of processing cores which depend heavily on all the other cores, rather than each core operating independently), the synthesis tools can struggle to see the wood for the trees.
They try to bunch all of the logic as close together as possible for timing, but because the tools can't see how to group it into small sections, this can actually result in worse FMax as bits of cores get scattered within other cores due to resource scarcity or routing congestion.
By using LogicLock regions or equivalent, you can help the tools to see blocks which should be grouped together, and this can improve the timing performance as the tools can more tightly pack parts within the LogicLock regions.
If there are many clocks in a design, you can also LogicLock registers that belong to one clock into a specific region to try to reduce the number of global clocks required. The synthesis tools are quite good at this nowadays, though, so this is probably not needed.
Another reason is if you have logic which is being pulled strongly in two directions (e.g. a memory PHY in one corner, a processor in the other corner, and interconnect fabric in between). If one part is, say, running at a higher frequency than the other, then ideally any clock crossing would sit closer to the high-speed portion to cope with its timing requirements; however, if the logic is being pulled strongly in two directions, it can be hard for the tools to optimise. There have been times where adding a LogicLock region for this sort of reason has taken designs I've worked on from failing timing to passing.
For more exotic use cases, such as time-to-digital conversion, you would use long carry chains to convert a pulse width into a multi-bit code. This technique typically requires precisely controlled and repeatable propagation delays, so constraining placement down to the exact register or LUT can be required.
I can't speak for Libero, but in Quartus unconstrained logic can still be placed within unused portions of a LogicLock region (unless you specifically disallow this). If you add debug logic like SignalTap, the tools are free to place it wherever they want (unless you constrain SignalTap to a region), including adding the tap logic within the LogicLocked region.
Finally, you might want to save a region of the FPGA for a specific future expansion, so you might constrain the current design to a smaller portion of the FPGA so that you know you have the space you need later on.
Unless you have a reason to do so, it's usually best to leave it up to the synthesis tools and not overconstrain the design to begin with.
If you start running into issues with, say, timing analysis, then you could start to investigate whether there are lots of long timing paths that appear to be due to high-speed logic being widely distributed rather than packed tightly. The Chip Planner is quite useful here, as in Quartus at least you can get it to show timing paths.
The fix might be to add more pipelining, or to start constraining logic to certain regions. Adding regional constraints can also let you pick apart complex designs to, say, group high-speed logic, and then see how that affects other paths from lower-speed regions, which could point towards good places to add pipelining.