Electronic – fpga clock muxing

fpga vhdl

We are using an FPGA with limited resources, the IGLOO Nano, so to fit all our functionality we need to share a FIFO between two VHDL components that use different clocks.
The arrangement is shown below:

DATA(SCLK) -->|------|              |----------|
              | MUX  |------------->|DATA      |
DATA (CLK) -->|------|              |          |
                                    |   FIFO   |
    SCLK ---->|------|              |          |
              | MUX  |------------->|WCLK      |
    CLK  ---->|------|              |----------|

SCLK (27MHz) and CLK (13.5MHz) are not related.
DATA is synchronous with either SCLK or CLK, depending on which is selected in the MUX.
The synthesis tool shows a warning: "While analyzing gated clock network, ambiguities have been found on gates"
My problem is that DATA is not clocked correctly into the FIFO, and the post-place-and-route simulation confirms this. DATA is not correctly aligned with WCLK when it arrives at the FIFO input port.
How do I constrain a design like the one shown above so that DATA is always synchronous with WCLK?
EDIT: Additional information: the MUX select pin does not change often.

Best Answer

I would discourage you from trying to MUX the clocks as you show. You are already seeing the problems that come with using gated clocks.

My suggestions:

Find a larger FPGA that is not so resource constrained for your design. There are a lot of good choices out there that are economical.
Find a way to combine your clock domains into one so that one common clock can drive the whole design.
Partition your design to be in two separate devices with each device supporting a single clock domain.

Related Solutions

Electronic – fpga clock strategy

You can create as many clocks as you want, and you can use PLLs or DCMs to create arbitrary clocks. The question is whether you need to, or if you should be doing it a different way.

I find that I end up running as much logic as possible at a common or "core" clock frequency, say the 54MHz that you are using, but I still need to trigger certain processes to run periodically: a 100ms debounce, a 10kHz PWM update, a 1s timer tick for a wall clock, you get the idea. Instead of generating these clocks, I run everything at the core clock frequency and generate arbitrary clock enable signals.

You generally don't want to create divided clocks, for several reasons. Logic-generated clocks are jittery, and the tools may end up routing these "clock" signals along routing paths intended for logic (since they're generated from logic). As mentioned above and by others, PLLs and DCMs are much better options if you really need to generate a different clock.

Clock enables are what you want. The device primitives have an additional clock enable input which "gates" the clock signal, allowing it to propagate into the primitive or not. When the clock enable is negated, the FF doesn't see the clock and effectively holds its state as if the clock pulse never occurred. When the clock enable is asserted, the FF sees the clock normally and things proceed as expected. Clock enables are designed specifically to control an FF's access to its clock and as such don't have issues with generating runt clocks. They also don't take up any additional resources, so use them.

For example, here is a clock generated in logic. This is bad, don't do this:

gen_100ms_clk: process (clk, rst)
    -- 5,400,000 cycles of the 54MHz clock = 100ms
    constant CTR_MAX : integer := 5399999;
    variable ctr : integer range 0 to CTR_MAX;
begin
    if rst = '1' then
        ctr := 0;
        slow_clk <= '0';
    elsif rising_edge(clk) then
        if ctr = CTR_MAX then
            -- toggle the logic-generated "clock"
            slow_clk <= not slow_clk;
            ctr := 0;
        else
            ctr := ctr + 1;
        end if;
    end if;
end process gen_100ms_clk;

This code makes the slow_clk signal toggle state every 100ms. This signal would be a poor choice to use as the clock of a new process, such as here:

do_100ms: process (slow_clk, rst)
begin
    if rising_edge(slow_clk) then
        ...
    end if;
end process do_100ms;

This is bad because the FFs in the do_100ms() process are clocked by a signal created by the logic in the gen_100ms_clk() process.

Instead, use a clock enable, as shown here:

gen_100ms_ce: process (clk, rst)
    -- 5,400,000 cycles of the 54MHz clock = 100ms
    constant CTR_MAX : integer := 5399999;
    variable ctr : integer range 0 to CTR_MAX;
begin
    if rst = '1' then
        ctr := 0;
        ce_100ms <= '0';
    elsif rising_edge(clk) then
        if ctr = CTR_MAX then
            -- one-cycle enable pulse
            ce_100ms <= '1';
            ctr := 0;
        else
            ce_100ms <= '0';
            ctr := ctr + 1;
        end if;
    end if;
end process gen_100ms_ce;

Now gen_100ms_ce() creates a ce_100ms signal that is high for one clock period (1T) every 100ms. This is a great way to signal to your code that it's time to do something:

do_100ms: process (clk, rst)
begin
    if rising_edge(clk) then
        if ce_100ms = '1' then
            ...
        end if;
    end if;
end process do_100ms;

Now your do_100ms() process is running at the same 54MHz clock as everything else and it uses a proper clock enable to trigger whatever you want to happen every 100ms.

Take a look at the RTL output of your toolset; you'll see that the primitive used in your do_100ms() process will use its clock enable signal.
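
The same idea can be packaged as a reusable block. The following is only a sketch under assumptions: the entity name ce_gen, its generics CLK_HZ and TICK_HZ, and the port names are hypothetical and do not come from the answer above. It assumes TICK_HZ is no greater than CLK_HZ, and integer division truncates, so the pulse rate is exact only when CLK_HZ is a multiple of TICK_HZ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: emits a one-clock-wide enable pulse at TICK_HZ.
entity ce_gen is
    generic (
        CLK_HZ  : positive := 54_000_000;  -- core clock frequency (assumed)
        TICK_HZ : positive := 10           -- 10 Hz gives one pulse per 100 ms
    );
    port (
        clk : in  std_logic;
        rst : in  std_logic;
        ce  : out std_logic
    );
end entity ce_gen;

architecture rtl of ce_gen is
    -- Core clock cycles per tick, minus one because the counter starts at 0.
    -- With the defaults this is 5399999, as in the hand-written version.
    constant CTR_MAX : natural := CLK_HZ / TICK_HZ - 1;
begin
    gen_ce: process (clk, rst)
        variable ctr : natural range 0 to CTR_MAX;
    begin
        if rst = '1' then
            ctr := 0;
            ce  <= '0';
        elsif rising_edge(clk) then
            if ctr = CTR_MAX then
                ce  <= '1';   -- high for exactly one clk period
                ctr := 0;
            else
                ce  <= '0';
                ctr := ctr + 1;
            end if;
        end if;
    end process gen_ce;
end architecture rtl;
```

Instantiating it once per rate (for example TICK_HZ => 10_000 for a 10kHz PWM update) keeps every consumer on the core clock.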

This method also achieves power savings since there will be large swaths of logic that stay "static" for long amounts of time even though the global clock net is wiggling away at 54MHz in your case. Once every 100ms in my example above, all the clocks which are gated with the 100ms enable become active for 1T and then are static again for another 99.9999815ms. :-) CMOS consumes very little power when it's not changing state, so the only power consumption in the logic with the gated-off clock is in the leakage currents of its logic.

You can extend this into a full-blown means of power management. You create clock enables for all the subsystems, and your power manager negates the clock enable for whichever subsections you don't want powered.
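
As a rough illustration of that idea, a subsystem register could combine its periodic enable with a per-subsystem enable. Everything named here (the entity gated_subsystem, the pwr_en input from a hypothetical power manager, the 8-bit data width) is an assumption made for the sketch, not part of the original answer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical subsystem whose registers hold state unless it is enabled.
entity gated_subsystem is
    port (
        clk      : in  std_logic;
        rst      : in  std_logic;
        ce_100ms : in  std_logic;  -- periodic one-cycle enable pulse
        pwr_en   : in  std_logic;  -- driven by a hypothetical power manager
        d        : in  std_logic_vector(7 downto 0);
        q        : out std_logic_vector(7 downto 0)
    );
end entity gated_subsystem;

architecture rtl of gated_subsystem is
begin
    reg: process (clk, rst)
    begin
        if rst = '1' then
            q <= (others => '0');
        elsif rising_edge(clk) then
            -- Update only when the subsystem is enabled and its periodic
            -- enable fires; otherwise the register holds its value.
            if pwr_en = '1' and ce_100ms = '1' then
                q <= d;
            end if;
        end if;
    end process reg;
end architecture rtl;
```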

Electronic – fpga internal metastability

The output of a register whose input does not change within a specified margin of the clock will switch within a specified period of time after the clock. The output of a register whose input changes too close to a clock may change at some arbitrary time in the future, which might be near the next clock, though that is generally not likely.

If a register's input is derived from signals that are all clocked by the same signal as the register itself, and if the maximum propagation time is sufficiently shorter than the time between clocks, then provided the earlier registers switch as specified, the derived input will not change within the forbidden window. If the maximum propagation time is longer than the time between clocks, however, a bad situation will arise.

When latching truly asynchronous events, it's possible that events may occasionally put a latch into a metastable state, but feeding the output from that latch into a second latch will usually clear things up. If the first latch goes into a metastable state about once a minute, and one in ten million metastability events on the first latch causes the second latch to go metastable, problems on the second latch will only occur about once every twenty years.

If, rather than being asynchronous, the signals arriving at a latch switch at times which combinatorial logic delays by a time close to a clock period, it's possible that rather than the first latch going metastable once a minute, it may go metastable millions of times per second. Adding a second latch may improve things, but even with a 10,000,000:1 improvement the downstream latch would still go metastable many times per minute.
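
The "second latch" arrangement for a truly asynchronous input is commonly written as a two-flip-flop synchronizer. The following is a minimal sketch for a single-bit signal; the entity and signal names are hypothetical, and it is not suitable as-is for multi-bit buses, whose bits could be captured on different clock edges.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer for one asynchronous bit.
entity sync_2ff is
    port (
        clk      : in  std_logic;
        async_in : in  std_logic;  -- may change at any time relative to clk
        sync_out : out std_logic   -- safe to use in the clk domain
    );
end entity sync_2ff;

architecture rtl of sync_2ff is
    signal meta   : std_logic := '0';  -- first stage, may go metastable
    signal stable : std_logic := '0';  -- second stage, resamples the first
begin
    process (clk)
    begin
        if rising_edge(clk) then
            meta   <= async_in;
            stable <= meta;   -- gives meta a full clock period to settle
        end if;
    end process;

    sync_out <= stable;
end architecture rtl;
```

No logic should sit between the two stages, so that the first flip-flop gets as much of the clock period as possible to resolve.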

If your propagation time is too long relative to your clock period, you need to either add registers to ensure that the propagated result will be consistently seen some number of clock periods after the earlier-stage latches change, or else add logic to ensure that nothing will do anything with the output from a register which may have gone metastable. The former approach would be better if one wishes to handle one data item per clock cycle but can accept the pipeline delay. The latter approach may be better if there's lots of parallel data and it won't be necessary to have multiple calculations in the pipe simultaneously [the amount of logic required would be independent of the number of data paths]. The latter approach may be especially advantageous if the required number of delay cycles may be variable [e.g. if a circuit may operate at 100MHz, 50MHz, or 32MHz and the logic's propagation time is 25ns, one may use a two-cycle delay at 100MHz, a one-cycle delay at 50MHz, and no delay at 32MHz or below].
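
One way the latter approach might look is sketched below. The entity multicycle_capture, the WAIT_CYCLES generic, and all port names are assumptions made for illustration. The sketch also assumes that start is asserted in the same cycle in which the source registers are loaded, and that those registers then hold their values until valid pulses. A single counter qualifies the capture regardless of how wide comb_in is.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical capture stage for a slow combinational result.
entity multicycle_capture is
    generic (
        WAIT_CYCLES : natural := 2  -- extra cycles to wait (2 at 100MHz for 25ns)
    );
    port (
        clk     : in  std_logic;
        rst     : in  std_logic;
        start   : in  std_logic;  -- high in the cycle the source registers load
        comb_in : in  std_logic_vector(7 downto 0);  -- slow combinational result
        result  : out std_logic_vector(7 downto 0);
        valid   : out std_logic   -- one-cycle pulse when result is updated
    );
end entity multicycle_capture;

architecture rtl of multicycle_capture is
begin
    process (clk, rst)
        variable ctr  : natural range 0 to WAIT_CYCLES;
        variable busy : boolean;
    begin
        if rst = '1' then
            ctr    := 0;
            busy   := false;
            valid  <= '0';
            result <= (others => '0');
        elsif rising_edge(clk) then
            valid <= '0';
            if start = '1' then
                ctr  := 0;
                busy := true;
            elsif busy then
                if ctr = WAIT_CYCLES then
                    -- comb_in has had WAIT_CYCLES + 1 clock periods to settle
                    result <= comb_in;
                    valid  <= '1';
                    busy   := false;
                else
                    ctr := ctr + 1;
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

With WAIT_CYCLES set to 2, 1, or 0 this corresponds to the 100MHz, 50MHz, and 32MHz cases above. The timing tools would still need to be told that the path is allowed more than one clock period, which this sketch does not cover.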
