I don't have experience with Quartus, so treat this as general advice.
When working on paths between clock domains, timing tools expand the clocks to the least common multiple of their periods and select the closest pair of edges.
For paths from a 36 MHz clock (27.777 ns) to a 100 MHz clock (10 ns), if I did my quick calculations correctly, the closest pair of rising edges is at 138.888 ns on the source clock and 140 ns on the destination clock, a gap of only 1.111 ns. That's effectively a 900 MHz constraint for those paths! Depending on rounding (or for clocks with no relationship), it could come out worse than that.
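If it helps, the edge-expansion arithmetic can be sketched in a few lines of Python (exact rational arithmetic avoids rounding artifacts; the function name is mine, not from any tool):

```python
from fractions import Fraction
from math import lcm

def tightest_setup_requirement(src_period, dst_period):
    """Expand both clocks to the LCM of their periods and return the
    smallest positive gap between a source rising edge and the next
    destination rising edge -- the effective setup requirement."""
    src, dst = Fraction(src_period), Fraction(dst_period)
    # LCM of two rationals: lcm of the cross-multiplied numerators
    # over the product of the denominators
    common = Fraction(lcm(src.numerator * dst.denominator,
                          dst.numerator * src.denominator),
                      src.denominator * dst.denominator)
    best = common
    for k in range(int(common / src)):
        launch = k * src
        capture = (launch // dst + 1) * dst   # next destination edge
        gap = capture - launch
        if 0 < gap < best:
            best = gap
    return best

# 36 MHz source (1000/36 ns) to 100 MHz destination (10 ns)
req = tightest_setup_requirement(Fraction(1000, 36), 10)
print(float(req))   # ~1.111 ns, i.e. an effective ~900 MHz requirement
```

For same-frequency clocks the function simply returns one period, as expected.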
There are at least three ways to write constraints for this structure. I am going to call the clocks fast_clk and slow_clk as I think that's clearer for illustration.
Option 1: disable timing with set_false_path
The easiest solution is to use set_false_path to disable timing between the clocks:
set_false_path -from [get_clocks fast_clk] -to [get_clocks slow_clk]
set_false_path -from [get_clocks slow_clk] -to [get_clocks fast_clk]
This is not strictly correct, since there are timing requirements for the synchronizer to work correctly. If the physical implementation delays the data too much relative to the control signal, then the synchronizer will not work. However, since there isn't any logic on the path, it's unlikely that the timing constraint will be violated. set_false_path is commonly used for this kind of structure, even in ASICs, where the effort vs. risk tradeoff for low-probability failures is more cautious than for FPGAs.
Option 2: relax the constraint with set_multicycle_path
You can allow additional time for certain paths with set_multicycle_path. It is more common to use multicycle paths with closely related clocks (e.g. interacting 1X and 2X clocks), but it will work here if the tool supports it sufficiently.
set_multicycle_path 2 -from [get_clocks slow_clk] -to [get_clocks fast_clk] -end -setup
set_multicycle_path 1 -from [get_clocks slow_clk] -to [get_clocks fast_clk] -end -hold
The default edge relationship for setup is single cycle, i.e. set_multicycle_path 1. These commands allow one more cycle of the endpoint clock (-end) for setup paths. The -hold adjustment, with a number one less than the setup value, is almost always needed when setting multicycle paths; see the gory details below.
To constrain paths in the other direction similarly (relaxing the constraint by one period of the faster clock), change -end to -start:
set_multicycle_path 2 -from [get_clocks fast_clk] -to [get_clocks slow_clk] -start -setup
set_multicycle_path 1 -from [get_clocks fast_clk] -to [get_clocks slow_clk] -start -hold
Option 3: specify requirement directly with set_max_delay
This has a similar effect to set_multicycle_path but saves having to think through the edge relationships and the effect on hold constraints.
set_max_delay 10 -from [get_clocks fast_clk] -to [get_clocks slow_clk]
set_max_delay 10 -from [get_clocks slow_clk] -to [get_clocks fast_clk]
You may want to pair this with set_min_delay for hold checks, or leave the default hold check in place. You may also be able to use set_false_path -hold to disable hold checks, if your tool supports it.
Gory details of edge selection for multi-cycle paths
To understand the hold adjustment that gets paired with each setup adjustment, consider this simple example with a 3:2 relationship. Each digit represents a rising clock edge:
1 2 3
4 5 6 7
The default setup check uses edges 2 and 6. The default hold check uses edges 1 and 4.
Applying a multi-cycle constraint of 2 with -end adjusts the default setup and hold checks to use the next edge after what they were originally using, meaning the setup check now uses edges 2 and 7 and the hold check uses edges 1 and 5. For two clocks at the same frequency, this adjustment makes sense: each data launch corresponds with one data capture, and if the capture edge is moved out by one, the hold check should also move out by one. This kind of constraint might make sense for two branches of a single clock if one of the branches has a large delay. However, for the situation here, a hold check using edges 1 and 5 isn't desirable, since the only way to fix it is to add an entire clock cycle of delay on the path.
The multi-cycle hold constraint of 1 (for hold, the default is 0) adjusts the edge of the destination clock used for hold checks backwards by one edge. The combination of the 2-cycle setup MCP and the 1-cycle hold MCP constraints will result in a setup check using edges 2 and 7, and a hold check using edges 1 and 4.
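The edge bookkeeping above can be checked with a short script. This is a sketch under the assumptions in the text: the 3:2 example is modeled with source edges 1-3 at t = 0, 3, 6 and destination edges 4-7 at t = 0, 2, 4, 6; the binding setup pair is the one with the smallest launch-to-capture gap, and the binding hold pair the one with the largest:

```python
# The 3:2 example, each digit a rising edge:
# source (slow) edges 1,2,3 at t = 0,3,6; destination (fast) edges 4..7 at t = 0,2,4,6
src_edges = [("1", 0), ("2", 3), ("3", 6)]
dst_edges = [("4", 0), ("5", 2), ("6", 4), ("7", 6)]

def worst_pairs(setup_mcp=1, hold_mcp=0):
    """Return the binding (launch, capture, gap) pair for setup and hold.
    Setup captures on the setup_mcp-th destination edge strictly after
    the launch; hold captures one edge earlier, moved back hold_mcp more."""
    setup_best = hold_worst = None
    for s_lbl, s_t in src_edges:
        later = [i for i, (_, d_t) in enumerate(dst_edges) if d_t > s_t]
        if len(later) < setup_mcp:
            continue
        cap = later[setup_mcp - 1]
        gap = dst_edges[cap][1] - s_t
        if setup_best is None or gap < setup_best[2]:
            setup_best = (s_lbl, dst_edges[cap][0], gap)      # smallest gap binds setup
        h = cap - 1 - hold_mcp
        if h >= 0:
            h_gap = dst_edges[h][1] - s_t
            if hold_worst is None or h_gap > hold_worst[2]:
                hold_worst = (s_lbl, dst_edges[h][0], h_gap)  # largest gap binds hold
    return setup_best, hold_worst

print(worst_pairs())       # default:      setup (2, 6), hold (1, 4)
print(worst_pairs(2, 0))   # setup MCP 2:  setup (2, 7), hold (1, 5)
print(worst_pairs(2, 1))   # + hold MCP 1: setup (2, 7), hold (1, 4)
```

The three printed cases reproduce the edge pairs walked through in the text.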
Best Answer
If you want to use pulse synchronizers to keep the read/write pointers or counters synchronized to the respective clock domains, there are overheads to keep in mind. Normally you want to continuously enqueue/dequeue data to/from a FIFO every clock cycle. Suppose you keep the write signal high for, say, 2 clock cycles: you would need 2 synchronized pulses, or a two-cycle-long pulse, at the read-clock domain, which would then update the pointers and the FIFO would work flawlessly. But unfortunately pulse synchronizers don't work like that.
Pulse/Toggle Synchronizer
Consider a simple toggle/pulse synchronizer like this: (credits: edn.com)
For this pulse synchronizer to work correctly, the output signal from flop-A has to be stable for a minimum time period such that there is at least one clock edge at the destination clock that will sample the data correctly without metastability. This is because the signal may cause metastability on the first clock edge at flop-B1. After metastability, flop-B1 may settle to a wrong value, which is then propagated by the rest of the flops. However, if flop-A's output remains stable until the next destination-clock edge, the correct value is guaranteed to be sampled on that second edge.
Scenario
Suppose the write-clock is much faster than the read-clock. Say you keep the write signal asserted high for 2 successive clock cycles (as discussed at the beginning). What happens is that flop-A's output toggles for a single write-clock cycle, and it is never certain that this single-cycle pulse at flop-A is correctly synchronized to the read-clock because of metastability. Maybe this transition is missed completely before any sampling clock edge arrives at the read-clock. It is also possible that the '1' from flop-A was sampled so close to the read-clock edge that it settled to '0' after metastability. Then the sampled signal remains '0' in the remaining read-clock cycles as well, because flop-A's output has already de-asserted after one write-clock cycle. The result is that you miss the pulse completely. So the write pointer/counter will not be updated in the read-clock domain, the pointers go out of sync between the two clock domains, and the FIFO malfunctions.
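The missed-pulse scenario can be illustrated with an idealized simulation (a sketch, not real hardware behavior: it ignores metastability, treats sampling as deterministic, uses times in ns, and the function name is made up for illustration). Two toggle flips that both land between read-clock edges cancel out and are never seen:

```python
def pulses_detected(flip_times, read_period, sim_end):
    """Sample a toggle signal at read-clock edges (t = read_period,
    2*read_period, ...) and count detected pulses: a pulse is seen
    whenever a sample differs from the previous one."""
    flips = sorted(flip_times)
    level = prev = detected = i = 0
    t = read_period
    while t <= sim_end:
        while i < len(flips) and flips[i] <= t:
            level ^= 1          # flop-A toggles on each write pulse
            i += 1
        if level != prev:
            detected += 1       # edge detector on the read side fires
        prev = level
        t += read_period
    return detected

# Read clock: 100 ns period. Two writes 20 ns apart: both flips fall
# between read edges, cancel out, and no pulse is ever detected.
print(pulses_detected([50, 70], 100, 500))    # 0
# Two writes spaced wider than a read period: both pulses are seen.
print(pulses_detected([50, 250], 100, 500))   # 2
```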
Thus you can't really get full throughput if you design an asynchronous FIFO using pulse synchronizers. You have to pulse write and read properly for successive data transfers, with enough time between the pulses for the destination clock domain to correctly sample and update the pointers.
From the above discussion I guess it's already clear that there is a dependency on the read and write clock periods. Suppose the read-clock is at 10 MHz and the write-clock at 100 MHz; pulsing write every 2-3 clock cycles is not going to guarantee synchronization with the read-clock domain. Pulsing write for one cycle generates an active-high strobe (at flop-A) internally, which has to be sampled and converted to a pulse at the read-clock. You need a longer wait before you can pulse the next write and be absolutely sure that synchronization happened; in this case it must be greater than the period of the read-clock, i.e., \$> 10\$ write-clock cycles.
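As a back-of-the-envelope check (the helper below is hypothetical; the one-read-period bound is from the discussion above, and the default margin of one extra read period is an added safety assumption, not a derived requirement):

```python
from fractions import Fraction
from math import ceil

def min_write_gap_cycles(write_hz, read_hz, margin_read_cycles=1):
    """Lower bound, in write-clock cycles, on the spacing between
    successive write pulses: flop-A's toggle must stay stable for at
    least one read-clock period (plus margin) before the next flip."""
    # write cycles per read cycle, computed exactly
    ratio = Fraction(write_hz) / Fraction(read_hz)
    return ceil((1 + margin_read_cycles) * ratio)

# 100 MHz write, 10 MHz read: the bound above is > 10 write cycles;
# with one extra read period of margin this gives 20.
print(min_write_gap_cycles(100_000_000, 10_000_000))      # 20
print(min_write_gap_cycles(100_000_000, 10_000_000, 0))   # 10
```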