Electronic – How to create a triple redundant clock tree in FPGA manually

fpga redundancy skew timing-analysis

I am exploring a range of techniques to implement TMR clock trees as part of a global TMR (GTMR) design, in which all resources, including I/O pins, clock trees, reset trees, logic and registers, are implemented with triple redundancy. As I am not interested in being locked into any vendor's automated GTMR tools, I'm looking to do this by hand. My understanding is that GTMR is required in an FPGA because a single-event upset (SEU) in the CRAM bits of an SRAM-based FPGA could disconnect the clock tree from a large portion of the logic that it was driving…

I have run into difficulties with the following three approaches on an Altera Cyclone V SE:

1) Altera's Cyclone ALTPLL can generate up to 6 clock outputs from one clock source. It is technically possible to request 3 output clocks, each driven at the same target frequency, duty cycle and phase offset. Unfortunately the redundancy is "optimised away" by the fitter tool, resulting in only a single global clock tree being driven. Can someone propose how to prevent this optimisation? (Yes, it is probably better to have three ALTPLLs; please see approach 2. However, approach 1 would still be interesting as a way to explore the jitter between registers driven from 3 different clock networks.)

2) In true GTMR fashion, let's assume we use three input clock pins, each input clock pin driving its own pair of {ALTPLL, ALTCLKCTRL} modules. With a rather generous application of guidance to the tools (e.g. syn_keep/syn_noprune/dont_touch-like controls), it is possible to drive three global clock tree networks with 3 different PLLs operating at the same speed. I have set up a very simple timing harness: 1 data input pin drives a 10-bit shift register. Those 10 bits are XORed onto the state of two further 10-bit shift registers, each of which drives one data output pin. The three shift registers are each driven by their own clock pin (pin_m?_clock). With this simple test scheme in place, the TimeQuest Timing Analyzer "setup summary" complains that:
pin_m0_clock slack: 0.650 End Point TNS: 0.000
pin_m1_clock slack: -0.877 End Point TNS: -7.575 (flagged as an error)
pin_m2_clock slack: -0.855 End Point TNS: -7.9641 (flagged as an error)
I'm not sure what is required to address these errors in a safe way.

3) If we use three clock pins (either driven by a single clock source, or by three frequency- and phase-synchronised clock sources) and three clock trees, in which each clock pin directly drives its own ALTCLKCTRL module, it is possible to drive three global clock tree networks through the FPGA. Using the same test harness described in (2) above, TimeQuest Timing Analyzer does not make any complaints. There is around -0.339 (m0_clock->m1_clock) to -0.583 (m0_clock->m2_clock) clock skew. I note that the clock controls for M0 and M1 are located physically close together on the middle-left edge of the chip, whereas the third clock control for M2 is located on the middle-bottom edge of the chip, which may explain some of the difference in skew (339 vs 583). [9 Nov 2014: I was able to reduce the skew from 583 back down to ~300 by clustering the ALTCLKCTRL modules physically close together, and driving each of the 3 clock pins from a nearby dedicated positive-edge clock input pin].

As a related item, I implemented a single-clock version of the same test harness to check the clock skew between registers on the same clock, and it was down to -0.071. I'm hoping there is some way to reduce that >300 skew to something much lower. (My understanding is that at least one of the primary goals of reducing the jitter between the TMR clock trees is to prevent metastability problems on the feedback loop of TMR finite state machines [ loopback -> voter -> FSM logic -> D-FF -> loopback ].)
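To make the feedback loop above concrete, here is a minimal behavioural sketch in Python of the loopback -> voter -> FSM logic -> D-FF path. It is only an illustration, not a model of the actual design: the 4-bit counter standing in for the FSM logic and the names `majority`, `tmr_step` and `counter` are assumptions made for the example, and all three copies are assumed to sample on one ideal clock edge (i.e. zero skew, which is exactly what the real clock trees do not give you).

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def tmr_step(states, next_state):
    """Advance three redundant FSM copies by one (ideal, skew-free) clock edge.

    The fed-back states are voted before the FSM logic is applied, so a
    single corrupted register is overwritten with the majority value.
    """
    voted = majority(*states)
    return [next_state(voted) for _ in states]


def counter(state: int) -> int:
    # Hypothetical FSM logic for illustration only: a 4-bit wrapping counter.
    return (state + 1) & 0xF


states = [5, 5, 5]
states[1] ^= 0b0100                  # inject a single upset into copy 1
states = tmr_step(states, counter)   # voter masks the upset on the next edge
print(states)                        # prints [6, 6, 6]
```

In hardware each voter sees registers clocked from the other two trees, so the skew between trees eats directly into the setup/hold margin of every loopback path; the sketch deliberately leaves that out.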

I'd be interested to hear advice on how to improve this third approach. I am not sure exactly what the pros and cons of driving the global clock networks directly from the pins are, but it seems like approach (2) above would be better if the errors reported in (2) could be overcome.

I appreciate all input, guidance and advice on how to correctly do / optimise any of the above three manual approaches. Please feel free to propose an even better manual approach for on-chip global TMR.

Thanks

The Happy Techy

Best Answer

That's a doozy of a question. What's the research for?

I know the approach I'm going to suggest won't resolve the problem with the tools complaining, but it might minimize the skew in an actual implementation. I'm not familiar with Altera FPGAs; I've worked mostly with Xilinx S3 and S6 parts. I know this approach can be made to work in an S3, but not an S6, so it might be possible in your case.

In the S3, you can make small adjustments to the phase of the managed clocks (see pg4,5); in a research paper that I've read (I'm still looking for it, will update with a link if/when I find it), the researchers used this capability and a feedback loop to fine-tune the synchronization of a comms link.

If your FPGA has similar capabilities then you might be able to do something similar: use feedback loops to synchronize your three PLLs. It might also help with the problem that three external clock sources and three PLLs will never have identical frequencies. They may be close, to within a few ppm, but never identical.
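To put a rough number on that last point, here is a small Python sketch of how quickly two free-running clocks drift apart. The 50 MHz nominal frequency, the 10 ppm offset and the 0.3 ns skew budget are made-up values for illustration, not figures from your design, and the function names are mine.

```python
def seconds_per_cycle_slip(nominal_hz: float, offset_ppm: float) -> float:
    """Time for two clocks that differ by offset_ppm to slip one full period."""
    delta_hz = nominal_hz * offset_ppm * 1e-6   # beat frequency between the clocks
    return 1.0 / delta_hz


def seconds_to_exceed_skew(offset_ppm: float, skew_budget_s: float) -> float:
    """Time for the accumulated drift to reach skew_budget_s seconds.

    A fractional offset of offset_ppm * 1e-6 means the edges move apart by
    that many seconds of skew per second of run time.
    """
    return skew_budget_s / (offset_ppm * 1e-6)


# Assumed example values (hypothetical): 50 MHz clocks, 10 ppm apart.
slip = seconds_per_cycle_slip(50e6, 10.0)
budget = seconds_to_exceed_skew(10.0, 0.3e-9)
print(f"one full period of slip every {slip * 1e3:.3f} ms")
print(f"0.3 ns of drift accumulated after {budget * 1e6:.1f} us")
```

With those assumed numbers the clocks walk through every possible phase relationship about every 2 ms, which is why independent sources need either a common reference or an active alignment loop rather than a one-off static adjustment.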