Electronic – Help wanted explaining signals coming at a higher frequency than the clock and how to handle them

clock, fpga, signal, simulation, verilog

I am getting started with Verilog and FPGA programming. So I wrote a simple module that handles two inputs – a button signal and a clock. Initially it lights up two LEDs, and when the user presses and releases the button, it turns those two off and lights up the other two. Here is my code so far (criticism is very welcome):

`timescale 1ns / 1ps

module led_flip_flop(input wire clk,
                     input wire button,
                     output wire [3:0] led);

   parameter state_on_idle = 3'b000;
   parameter state_on_push = 3'b001;
   parameter state_on_pop = 3'b010;
   parameter state_off_idle = 3'b011;
   parameter state_off_push = 3'b100;
   parameter state_off_pop = 3'b101;

   reg                                 led_sig;
   reg [2:0]                           state;
   reg [2:0]                           next_state;

   initial
     begin
        led_sig = 1'b1;
        state = state_on_idle;
        next_state = state_on_idle;
     end

   always @ (posedge clk)
     begin
        // Clock-synchronized state change feed-back loop.
        // No reset is implemented yet.
        state <= next_state;
     end

   // Button press/release & led on/off flip FSM.
   always @ (state or button)
     case (state)
       state_on_idle:
         // LED is on, button was untouched.
         // If the button is pressed now, switch to next state
         // to wait for release.
         begin
            led_sig = 1'b1;
            next_state = button ? state_on_push : state_on_idle;
         end

       state_on_push:
         // LED is on, button was pressed.
          // If the button is released now, switch to next state
         // that will turn off the LED and go IDLE with LEDs off.
         begin
            led_sig = 1'b1;
            next_state = button ? state_on_push : state_on_pop;
         end

       state_on_pop:
         begin
            // LED was on, button got pressed and released.
            // Switch the LED off, go into IDLE state.
            led_sig = 1'b0;
            next_state = state_off_idle;
         end

       state_off_idle:
         begin
            // IDLE state with LEDs off. Button was untouched.
            // If button is pressed then go to the next state
            // that waits for the button release in order to
            // turn on LEDs.
            led_sig = 1'b0;
            next_state = button ? state_off_push : state_off_idle;
         end

       state_off_push:
         begin
            // LEDs are off. Button was pressed. Wait for the button
            // release in order to enable LED(s).
            led_sig = 1'b0;
            next_state = button ? state_off_push : state_off_pop;
         end

       state_off_pop:
         begin
            // LEDs were off, button clicked. Enable LED(s) and
            // go into IDLE state with high beams.
            led_sig = 1'b1;
            next_state = state_on_idle;
         end

       default:
         // Some foobar state... Recover.
         begin
            led_sig = 1'b1;
            next_state = state_on_idle;
         end
     endcase

   assign led[0] = led_sig;
   assign led[1] = ~led_sig;
   assign led[2] = led_sig;
   assign led[3] = ~led_sig;

endmodule

It was synthesized for the Spartan-6 LX9 FPGA. Here is the schematic:

Xilinx ISE generated schematic

I tested it manually on a real device and it seems to work. So today I was practicing test bench simulation and wrote the following bench:

`timescale 1ns / 1ps

module test_bench();

   reg clk;
   reg button;
   wire [3:0] led;

   led_flip_flop prog (.clk(clk), .button(button), .led(led));

   initial begin
      #20000 $finish;
   end

   initial
     begin
       clk = 0;
       forever #5 clk = ~clk;
     end

   initial
     begin
       button = 0;
       forever #(({$random} % 10) + 1) button = ~button;  // random 1 to 10 ns delay; avoids a zero delay
     end

endmodule

Previously I had a fixed delay for the button = ~button statement, and the signals in simulation looked good. However, with the random delay it seems like button press+release pulses can fall in between clock edges, so the signals look like this:

[Simulation waveforms; the waves that look bad to me are marked in red.]

As I understand it, there is a problem if the signal from the button is short and arrives in between clock edges. Of course, in real life only Chuck Norris could manage to press and release a button fast enough to fit between the rising edges of a 66 MHz clock.

However, that raised a few questions in my head:

  1. If external signals arrive at a higher frequency than the clock frequency (or if processing a signal takes multiple clock cycles), what are the generic approaches to handling such signals?
  2. Should the clock always be involved? For example, can I achieve the same thing without using a clock at all?
  3. Is this what is called "crossing a clock domain"?

Or perhaps I am doing something totally wrong and silly?

Any help is appreciated. Thank you!

Best Answer

If external signals arrive at a higher frequency than the clock frequency (or if processing a signal takes multiple clock cycles), what are the generic approaches to handling such signals?

One approach is to use a faster clock. Most FPGAs (including your Spartan-6) have dedicated logic for generating different clocks from a source clock. Xilinx has called these DCMs (Digital Clock Managers) in the past, although I think the Spartan-6 has an actual PLL as well as DCM-like blocks (they may be called something else these days).

These are nice. You can put a 12 MHz crystal on your PCB to minimize emissions, and then inside the FPGA you can multiply the clock up to e.g. 96 MHz (an 8x multiplication).
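
For illustration, below is a rough sketch of what that could look like on the Spartan-6 using the DCM_SP primitive. Normally you would let the Clocking Wizard generate this for you; the parameter and port names here are from memory of the Xilinx libraries guide, so check them against your tools, and a BUFG on the output is omitted for brevity.

`timescale 1ns / 1ps

module clk_mult_sketch(input wire  clk_12mhz,   // 12 MHz crystal input
                       output wire clk_96mhz);  // multiplied clock

   wire clk0_fb;

   // Rough sketch only: verify parameters/ports against the Spartan-6
   // libraries guide before using.
   DCM_SP #(.CLKFX_MULTIPLY(8),     // 12 MHz * 8 = 96 MHz
            .CLKFX_DIVIDE(1),
            .CLKIN_PERIOD(83.333))  // input period in ns
   dcm_inst (.CLKIN(clk_12mhz),
             .CLKFB(clk0_fb),       // feedback from CLK0
             .CLK0(clk0_fb),
             .CLKFX(clk_96mhz),     // the multiplied clock
             .RST(1'b0),
             .PSEN(1'b0),
             .PSINCDEC(1'b0),
             .PSCLK(1'b0));

endmodule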

You could also try clocking logic on the falling edge and the rising edge. This would give you the effective appearance of a 2x clock (also called DDR), but your logic gets complicated because you must split the design between falling-edge and rising-edge logic.
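
As a minimal sketch (signal names made up), the "both edges" idea is just two capture registers, one per edge:

module ddr_sample_sketch(input wire clk,
                         input wire fast_input,   // made-up async input
                         output reg sample_rise,
                         output reg sample_fall);

   // Capture on both edges; downstream logic must then merge data from the
   // rising-edge and falling-edge halves, which is where the pain comes from.
   always @(posedge clk) sample_rise <= fast_input;
   always @(negedge clk) sample_fall <= fast_input;

endmodule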

DCMs can also generate clocks that are 90, 180, or 270 degrees out-of-phase with the incoming clock. So if you really needed to sample a signal faster than the FPGA can handle, you could use the 90 degree clock and do the same DDR trick on it. This would give you four chances to sample an incoming signal with the fastest possible FPGA clock, but again this gets complicated because now you're negotiating logic between two sets of rising and falling edges.
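
A sketch of that four-phase sampling, assuming clk0 and clk90 come from the DCM (all names here are illustrative):

module oversample_4x_sketch(input wire clk0,       // 0-degree clock from the DCM
                            input wire clk90,      // 90-degree clock from the DCM
                            input wire async_in,   // fast external signal
                            output reg [3:0] samples);

   // One capture per quarter period. Recombining these four bits into a
   // single domain still takes care, since each one changes on a different
   // edge; that is the "negotiating between edges" part.
   always @(posedge clk0)  samples[0] <= async_in;
   always @(negedge clk0)  samples[1] <= async_in;
   always @(posedge clk90) samples[2] <= async_in;
   always @(negedge clk90) samples[3] <= async_in;

endmodule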


Should the clock always be involved? For example, can I achieve the same thing without using clock at all?

You could give pure combinatorial logic a shot. But in general it represents a timing nightmare. FPGAs have a sea of flip flops and they're very important for achieving timing closure. It really is best practice to clock everything that you can afford to clock. FPGA IOBs (the input output buffers, which send or receive signals to the outside world) have integrated flip flops for the express purpose of making timing closure easier to achieve.

I'm not even really sure how you'd do this without clocking.


Is this what is called "crossing a clock domain"?

Almost. Crossing a clock domain, as mentioned above, happens when the output and input do not share a common clock. The reason this is bad is that those clocks are typically asynchronous. As an engineer, you should always consider that asynchronous I/O will go out of its way to change at the worst possible time.

Button inputs are asynchronous, so they suffer from the same problems as crossing clock domains. The problem with an async signal is that flip flops have parameters known as setup and hold times. Setup time is how long the signal must be steady before the clock edge; hold time is how long the signal must be steady after the clock edge. If you have a setup or hold violation, the flip flop can go metastable, which means you have no idea what the output will be. At the next clock edge it generally fixes itself. (Flip flops can also go metastable if a voltage between Vih and Vil is applied during the clock edge, but that's not a problem inside the FPGA.) I believe I also once read that hold time in FPGAs is typically considered to be 0, and that the primary reason you don't get timing closure is insufficient setup time.

To handle async inputs, we use synchronizers. Essentially we put multiple flip flops between the async input and the logic it's connected to. If the first flip flop goes metastable, it won't screw up the second flip flop... hopefully. It still can, but the mean time between failures (MTBF) could be years. If that's not enough, you add even more flip flops to push the MTBF out to acceptable levels.
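
For your button, a classic two-flop synchronizer is only a few lines. This is a sketch with made-up names; only the second stage should feed your FSM:

module button_sync(input wire  clk,
                   input wire  button_async,   // straight from the pad
                   output wire button_safe);   // use this in the clk domain

   reg sync_1, sync_2;

   always @(posedge clk)
     begin
        sync_1 <= button_async;  // this stage is allowed to go metastable
        sync_2 <= sync_1;        // almost certainly stable by the time it gets here
     end

   assign button_safe = sync_2;

endmodule

In your design you would run button through this and feed button_safe into the state machine. For a real push button you would normally also add a debouncer, since mechanical contacts bounce for milliseconds. Note that a press shorter than one clock period can still be missed entirely; the synchronizer protects you from metastability, not from missing fast events.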

Now, in terms of "crossing clock domains", you only need the synchronizer if your clocks are async to each other. One of the benefits of the DCM is that it can generate multiple clock frequencies that are perfectly synchronized. So if you have a "big" calculation that takes a long time to complete, you can use a 12 MHz clock, while the 96 MHz clock samples the inputs. Since they both came from the DCM, every 12 MHz edge perfectly lines up with a 96 MHz edge, so there's no need to use a synchronizer.
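
As a sketch of that last point (clock and signal names made up), a plain register transfer between two DCM-derived clocks is just an ordinary timed path:

module related_clock_xfer(input wire        clk_12,       // slow clock from the DCM
                          input wire        clk_96,       // fast clock from the DCM
                          input wire [31:0] slow_result,  // output of the big, slow calculation
                          output reg [31:0] fast_copy);

   reg [31:0] slow_reg;

   // No synchronizer: every clk_12 edge lines up with a clk_96 edge, so the
   // tools can time this path directly; it is not a clock domain crossing.
   always @(posedge clk_12) slow_reg  <= slow_result;
   always @(posedge clk_96) fast_copy <= slow_reg;

endmodule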

EDIT: response to comment

So clocks cannot pass my flip-flop asynchronously, obviously because there can be only one signal at a time (pure physics). So the synchronization is needed to handle one clock "ticking in between" the other clock's cycles. If that is correct, what happens if my logic (combined gate delay) is slower than the input clock? Do I get silently screwed, does the signal just "disappear" on its way through the gates, or does an input clock tick get lost?

I'm going to be a bit picky about the language here because it's important to make distinctions sometimes. The clocks are async to each other, but every flip flop is sync to its own clock. The async signal is the output of a flip flop in one clock domain feeding the input of a flip flop in the other domain.

Say you have two crystals on a board at 48 MHz. They are not both precisely 48 MHz. So if you clock two series flip flops from the two crystals, with an inverter on the first one so it toggles every cycle, you will eventually cause the first flip flop to output a signal that violates the setup time of the second flip flop's input relative to the second flip flop's clock; as far as the first flip flop's clock is concerned, everything is going according to plan. The second flip flop's output will then glitch for one cycle; it may go high, it may go low, it may go somewhere in between, it might start going one direction and then abruptly change to the other halfway through the clock cycle, etc. After its next clock tick, it will probably correctly sample its input and update its output... probably. Which is why for really high reliability you put more flip flops in the synchronizer, because then you have probably*probably*probably = all but certain.
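
If you want to watch that drift happen, a throwaway testbench along these lines will show it (illustrative only; a plain simulation won't model metastability, but you can see the first flop's output edge creep steadily toward the second flop's clock edge):

`timescale 1ns / 1ps

module drift_tb();

   reg clk_a = 0, clk_b = 0;
   reg ff_a = 0;
   reg ff_b = 0;

   // Two "48 MHz" clocks that aren't quite the same frequency.
   always #10.416 clk_a = ~clk_a;
   always #10.417 clk_b = ~clk_b;

   // First flop toggles in domain A, second flop samples it in domain B.
   always @(posedge clk_a) ff_a <= ~ff_a;
   always @(posedge clk_b) ff_b <= ff_a;

   initial #100000 $finish;

endmodule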

That's a somewhat contrived example, but it's easy to see how those clocks would have a periodic metastability incident that's a function of the difference in the crystal frequencies. For things like buttons, it happens essentially at random. This is also why some protocols carry their own clock in one way or another (e.g. RS-232 and USB have an implicit clock in the data; I2C and SPI have an explicit clock line), to avoid crossing clock domains.


When your design is synthesized/mapped/placed/routed, the tools will tell you the maximum propagation delay for any logic that touches a flip flop in your design, which translates into a maximum clock frequency. If you clock it faster than that, you've broken the rules and the FPGA isn't responsible for the result. In fact, you generally want to go a little slower. So what you do is tell the design tools (for Xilinx this is done with a constraint file) "when you're doing place and route, try to keep everything at most 19 ns apart" (~52 MHz), then you actually use a 48 MHz crystal (20.8 ns), giving you 1.8 ns of slack (about 9%).
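
In ISE that ends up as a PERIOD constraint in the UCF, something like the following (names are placeholders; check the exact syntax against the constraints guide):

# Ask the tools to meet 19 ns (~52 MHz) even though the board clock is 48 MHz.
NET "clk" TNM_NET = "clk_grp";
TIMESPEC "TS_clk" = PERIOD "clk_grp" 19 ns HIGH 50%;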

Getting your design to route at the desired speed is called "achieving timing closure". For more complex designs, you have to consider where to put extra flip flops to help pipeline the process and keep the clock speed up. The Pentium 4 is an extreme example: pipeline stages 5 and 20 existed for the sole purpose of allowing signals to propagate across the chip, which allowed the clock speed to be very high. Since the FPGA has a DCM, if you need to clock a very large piece of logic in one stage (e.g. a very large decoder or mux), you could have the DCM generate a "slow logic" clock to handle that part and a "fast logic" clock for the other parts. This also helps a bit with power consumption for any stage that runs on the slow clock.
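
Here is a tiny pipelining sketch (a made-up multiply-accumulate): splitting a long path with one extra register stage means each clock period only has to cover part of the logic, at the cost of one cycle of latency.

module pipelined_mac_sketch(input wire        clk,
                            input wire [15:0] a,
                            input wire [15:0] b,
                            input wire [31:0] c,
                            output reg [31:0] result);

   reg [31:0] product;  // pipeline register splitting the multiply from the add
   reg [31:0] c_dly;    // keeps c lined up with its own product

   always @(posedge clk)
     begin
        product <= a * b;            // stage 1
        c_dly   <= c;
        result  <= product + c_dly;  // stage 2
     end

endmodule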

For pure combinatorial logic, the tools will tell you the maximum pad-to-pad delay; that is, how much time a signal takes to get from the input pad, through all the logic, and back to the output pad. The problem with such logic is that it has no concept of the past: it cannot remember whether the light was on or off. You could probably design your own feedback elements (e.g. an SR latch), perhaps even using the set/resets of the integrated flip flops, but at that point your design is starting to become sequential logic anyway, so you might as well take the plunge and get everything clocked reliably.
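
For completeness, the SR latch idea looks like the sketch below (cross-coupled NORs). It works in principle, but it is feedback through combinational logic, which the synthesis and timing tools handle far less gracefully than ordinary clocked flip flops.

module sr_latch_sketch(input wire  s,    // set, active high
                       input wire  r,    // reset, active high
                       output wire q,
                       output wire q_n);

   // Cross-coupled NOR pair; s = r = 1 is the usual forbidden input.
   assign q   = ~(r | q_n);
   assign q_n = ~(s | q);

endmodule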