Electronic – How to create a triple redundant clock tree in FPGA manually

fpga redundancy skew timing-analysis

I am exploring a range of techniques to implement TMR clock trees as part of a global TMR (GTMR) design, in which all resources, including I/O pins, clock trees, reset trees, logic and registers, are implemented with triple redundancy. As I am not interested in being locked into any vendor's automated GTMR tools, I'm looking to do this by hand. My understanding is that GTMR is required in an FPGA because a single-event upset (SEU) in the CRAM bits of an SRAM-based FPGA could disconnect the clock tree from a large portion of the logic that it was driving…

I have run into difficulties with the following three approaches on an Altera Cyclone V SE:

1) Altera's Cyclone ALTPLL can generate up to 6 clock outputs from one clock source. It is technically possible to request 3 output clocks, each driven at the same target frequency, duty cycle and phase offset. Unfortunately the fitter tool "optimises away" the redundancy, so only a single global clock tree is driven. Can someone propose how to prevent this optimisation? (Yes, it is probably better to have three ALTPLLs; please see approach 2. However, approach 1 would still be interesting for exploring the jitter between registers driven from 3 different clock networks.)

2) In true GTMR fashion, let's assume we use three input clock pins, each input clock pin driving its own pair of {ALTPLL, ALTCLKCTRL} modules. With a rather generous application of guidance to the tools (e.g. syn_keep/syn_noprune/dont_touch-like controls), it is possible to drive three global clock tree networks with three different PLLs operating at the same speed. I have set up a very simple timing harness: 1 data input pin drives a 10-bit shift register. Those 10 input bits are XORed onto the state of two more 10-bit shift registers, each of which drives one data output pin. The three shift registers are each driven by their own clock pin (pin_m?_clock). With this simple test scheme in place, the TimeQuest Timing Analyzer "setup summary" complains that:
pin_m0_clock slack: 0.650 End Point TNS: 0.000
pin_m1_clock slack: -0.877 End Point TNS: -7.575 (flagged as an error)
pin_m2_clock slack: -0.855 End Point TNS: -7.9641 (flagged as an error)
I'm not sure what is required to address these errors in a safe way.

3) If we use three clock pins (driven either by a single clock source, or by three frequency- and phase-synchronised clock sources) and three clock trees, in which each clock pin directly drives its own ALTCLKCTRL module, it is possible to drive three global clock tree networks through the FPGA. Using the same test harness described in (2) above, TimeQuest Timing Analyzer does not make any complaints. There is around -0.339 (m0_clock->m1_clock) to -0.583 (m0_clock->m2_clock) clock skew. I note that the clock controls for M0 and M1 are located physically close together on the middle-left edge of the chip, whereas the third clock control for M2 is located on the middle-bottom edge of the chip, which may explain some of the difference in skew (339 vs 583). [9 Nov 2014: I was able to reduce the skew from 583 back down to ~300 by clustering the ALTCLKCTRL modules physically close together, and driving each of the 3 clock pins from a nearby dedicated positive-edge clock input pin.]

As a related item, I implemented a single-clock version of the same test harness to check the clock skew between registers on the same clock, and it was down to -0.071. I'm hoping there is some way to reduce that >300 skew to something much lower. (My understanding is that at least one of the primary goals of reducing the jitter between the TMR clock trees is to prevent metastability problems on the feedback loop of TMR finite state machines [ loopback -> voter -> FSM logic -> D-FF -> loopback ].)
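For readers unfamiliar with the voter in that feedback loop, here is a minimal sketch of a single-bit 2-of-3 majority voter. The entity and port names are hypothetical and chosen only for illustration. In a GTMR design each of the three replicas would normally get its own copy of the voter, clocked in its own clock tree, and the copies would have to be protected from being merged by synthesis in the same way as the clock trees discussed above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-bit 2-of-3 majority voter (names are illustrative).
entity tmr_voter is
    port (
        a_i : in  std_logic;  -- state bit from replica 0
        b_i : in  std_logic;  -- state bit from replica 1
        c_i : in  std_logic;  -- state bit from replica 2
        q_o : out std_logic   -- voted value fed back into the FSM logic
    );
end entity tmr_voter;

architecture rtl of tmr_voter is
begin
    -- The output follows whichever value at least two inputs agree on,
    -- so a single upset replica is out-voted.
    q_o <= (a_i and b_i) or (a_i and c_i) or (b_i and c_i);
end architecture rtl;
```

Because the three inputs are launched from three different clock trees, the skew between those trees eats directly into the setup and hold margins of the registers that capture the voted value, which is why the skew figures above matter.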

I'd be interested to hear advice on how to improve approach (3). I am not sure exactly what the pros and cons of driving the global clock networks directly from the pins are, but it seems that approach (2) would be better if the errors reported there could be overcome.

I appreciate all input, guidance and advice on how to correctly do or optimise any of the above three manual approaches. Please feel free to propose an even better manual approach for on-chip global TMR.

Thanks

The Happy Techy

Best Answer

That's a doozy of a question. What's the research for?

I know the approach I'm going to suggest won't resolve the problem with the tools complaining, but it might minimize the skew in an actual implementation. I'm not familiar with Altera FPGAs; I've worked mostly with Xilinx S3 and S6 parts. I know this approach can be made to work in an S3, but not in an S6, so it might be possible in your case.

In the S3, you can make small adjustments to the phase of the managed clocks (see pg4,5); in a research paper that I've read (I'm still looking for it, will update with a link if/when I find it), the researchers used this capability and a feedback loop to fine-tune the synchronization of a comms link.

If your FPGA has similar capabilities then you might be able to do something similar: use feedback loops to synchronize your three PLLs. It might also help with the problem that three external clock sources and three PLLs will never have identical frequencies. They may be close, to within a few ppm, but never identical.

Related Solutions

Electronic – Do I need to reset the FPGA design after startup

You should assume the clock input to your flip-flops is toggling unless you can prove otherwise (by a guaranteed power on or post configuration delay). All the flip-flops on a given clock domain are not guaranteed to start on the same clock edge based on GWE or GSR. Both act like an asynchronous reset and cause potential problems for some logic (counters, one-hot state machines, etc).

Specifically, a one-hot state machine that transitions immediately after configuration WILL (eventually) FAIL (transition to an invalid state). The frequency of failure will depend on the clock period compared to the device-specific (and place-and-route-specific) skew for your design.

Another simple experiment to see this behavior: initialize a relatively fast count-down counter with 10000000 and look at its behavior immediately after configuration. Some bits make the transition to 01111111 and some bits miss that first transition, but the subsequent counting sequence will be correct.

The white paper mentioned by Krunal Desai talks about this very problem and is a great reference. Any SRAM based FPGA will most likely have a similar issue.

There is no need to reset the registers to get a known value. If you have logic that is sensitive to all flip-flops starting on the same clock edge, you will need to add synchronization logic (this can consist of a synchronously de-asserted reset or other synchronous logic). Xilinx AR44174 talks about the issue a little more. I would add a third method of mitigation, which is to guarantee that clocked logic is not changing/transitioning during the first several clock cycles after startup.
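As an illustration of the "synchronously de-asserted reset" mentioned above, here is a minimal sketch of a common reset synchronizer. It is not taken from the white paper or from AR44174; the entity name, port names and the two-stage depth are assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reset synchronizer: asserts asynchronously,
-- de-asserts synchronously to clk_i (names are illustrative).
entity reset_sync is
    port (
        clk_i    : in  std_logic;
        arst_n_i : in  std_logic;  -- asynchronous reset, active low
        rst_n_o  : out std_logic   -- reset for the clk_i domain, active low
    );
end entity reset_sync;

architecture rtl of reset_sync is
    -- Initial value '0' keeps the reset asserted right after configuration.
    signal sync_ff : std_logic_vector(1 downto 0) := (others => '0');
begin
    process (clk_i, arst_n_i)
    begin
        if arst_n_i = '0' then
            sync_ff <= (others => '0');     -- assert immediately
        elsif rising_edge(clk_i) then
            sync_ff <= sync_ff(0) & '1';    -- release after two clock edges
        end if;
    end process;

    rst_n_o <= sync_ff(1);
end architecture rtl;
```

Logic that uses rst_n_o as a synchronous reset then leaves reset on a well-defined clock edge, provided the timing from this synchronizer to its loads is met.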

Electronic – Create multiple clocks on FPGA or create clock dividers

There are different reasons why someone wants/needs to have different clocks in a design. Most of the time I try to keep all my design running with the same clock.

When working with several clocks, we talk about different clock domains. Each signal has its home clock domain. These clock domains are needed by the routing tools to find a layout that meets the timing (= maximum acceptable delay).

If a signal crosses from one clock domain to the other, it is highly recommended to use some crossing logic in order to avoid problems like metastability. Most of the time this is done with FIFOs or dual-port BRAM. (More information about this topic can be found here: http://www.gstitt.ece.ufl.edu/courses/spring11/eel4712/lectures/metastability/EEIOL_2007DEC24_EDA_TA_01.pdf)

Often external interfaces are driven by their own clocks. In these cases we cannot avoid having different clock domains. Use the interface clock to sample the data, perhaps do some preprocessing, and then migrate the data to the system clock.
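For a single-bit control signal, a commonly used alternative to a FIFO is a two-flip-flop synchronizer in the destination domain. The sketch below is illustrative only; the entity and signal names are hypothetical, and it is not suitable for multi-bit buses, where the bits could be captured on different edges.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flip-flop synchronizer for ONE bit (names are illustrative).
entity bit_sync is
    port (
        clk_dst_i : in  std_logic;  -- destination clock domain
        d_async_i : in  std_logic;  -- signal coming from another clock domain
        d_sync_o  : out std_logic   -- safe to use in the clk_dst_i domain
    );
end entity bit_sync;

architecture rtl of bit_sync is
    signal meta_ff : std_logic := '0';  -- first stage, may go metastable
    signal sync_ff : std_logic := '0';  -- second stage, gives it time to settle
begin
    process (clk_dst_i)
    begin
        if rising_edge(clk_dst_i) then
            meta_ff <= d_async_i;
            sync_ff <= meta_ff;
        end if;
    end process;

    d_sync_o <= sync_ff;
end architecture rtl;
```

The output lags the input by one to two destination clock cycles, and the tools usually need to be told not to analyse the path into meta_ff as an ordinary synchronous path.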

Clock Enables: In FPGA designs, enabling a clock by putting logic into the clock path produces a gated clock and should be avoided by all means. Clocks are routed on special nets. This clock infrastructure takes care of delivering the clock signal to each flip-flop at the same time. After adding logic into the clock path (for example an enable), the clock signal will be routed on normal signal paths, and the place-and-route process will have a hard time making sure all timing requirements are still met. Rather than enabling the clock, it is better to add an ENABLE input to your logic. The clock stays untouched, but the logic knows whether it has to react or not. In VHDL this would look something like this:

enable_example: process(clk_C)
begin
    if rising_edge(clk_C) then
        -- The clock is left untouched; enable_DI decides whether the logic reacts.
        if enable_DI = '1' then
            -- << write your logic here >>
            null;
        end if;
    end if;
end process enable_example;
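The enable itself can replace a divided clock. As a rough sketch, the hypothetical entity below (its name, the DIVIDE_G generic and the enable_DO port are assumptions made for this example) produces a one-cycle enable pulse every DIVIDE_G clock cycles, which can then feed an input such as enable_DI in the process above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical enable generator: one-cycle pulse every DIVIDE_G cycles.
entity enable_gen is
    generic (
        DIVIDE_G : positive := 4            -- assumed division ratio
    );
    port (
        clk_C     : in  std_logic;
        enable_DO : out std_logic := '0'    -- high for one clk_C period
    );
end entity enable_gen;

architecture rtl of enable_gen is
    signal count : natural range 0 to DIVIDE_G - 1 := 0;
begin
    process (clk_C)
    begin
        if rising_edge(clk_C) then
            if count = DIVIDE_G - 1 then
                count     <= 0;
                enable_DO <= '1';           -- pulse on wrap-around
            else
                count     <= count + 1;
                enable_DO <= '0';
            end if;
        end if;
    end process;
end architecture rtl;
```

All logic stays on the single clock net clk_C, so no extra clock domain and no crossing logic are needed for the slower part of the design.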

Resources: In terms of resources, we need to specify what kind of resources we mean. There are mainly 3 types of resources we could talk about:

Time: Running everything with the highest clock makes your design easy to read and understand but may lead to bad timing. High frequencies allow only short routing and logic propagation delays. The lower your clock frequency, the better the chance of meeting the timing constraints.
FPGA Resources: Modern FPGAs have enough clock nets that you don't need to worry about running out of nets in most cases. But crossing clock domains may need quite some space (FIFO in BRAM or different logic).
Energy: (I don't have enough knowledge to give a good answer to this)
