Electrical – Sampling data at 5MHz with 50MHz clock in Verilog

fpga oversampling verilog

I'm trying to make a controller for the MAX31855 thermocouple IC.
My FPGA works at 50MHz and this IC works at 5MHz, so I'm using a frequency divider to get the 5MHz clock signal.

Now the IC sends its 32 data bits to the FPGA at 1 bit per clock cycle, and I don't know exactly how to sample this 5MHz bit stream with a 50MHz clock signal.

I'm also worried about metastability problems.

Any ideas?

Best Answer

You say you have a frequency divider, but that is just the beginning. You also have to add a synchroniser for the serial input. I looked at the datasheet, and you need an SPI interface without the transmit part. That means you also need a chip select and a serial/parallel converter. I am not going to write all of that for you (after all, that is what I earn my money with), so I am going to give you the most important snippets:

  always @(posedge clk or negedge reset_n)
   begin
      if (!reset_n)
      begin
         ser_in_meta <= 1'b0;
         ser_in_sync <= 1'b0;     
      end
      else
      begin
         // Sync input on system clock 
         ser_in_meta <= ser_in;
         ser_in_sync <= ser_in_meta;
      end
   end

         // Divide by 10 counter
         if (clock_div==4'd9)
            clock_div <= 4'd0;
         else
            clock_div <= clock_div + 4'd1;

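        // Symmetrical 1/10 system clock: low for counts 1..5, high for counts 6..9 and 0
        // (comparing against 4'd4 would give a 4/6 split, not a symmetrical clock)
        if (clock_div==4'd0)
           ser_clk <= 1'b0;
        else
           if (clock_div==4'd5)
              ser_clk <= 1'b1;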

        if (sample)
        begin
           bit_count <= bit_count + 5'h1;
           // Receive: MS bit arrives first 
           shift_in  <= {shift_in[30:0],ser_in_sync};                    
        end        

   // pick up the data just before the falling clock edge 
   assign sample  = (clock_div==4'd9);

The Maxim datasheet says the data changes at most 40ns after the falling clock edge, so pick it up just before that edge.
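For orientation, here is a minimal sketch of how the snippets above could be assembled into one receive-only module. The module name, the `start`, `cs_n`, `data` and `data_valid` ports, and the idle/active handling are assumptions for illustration and are not part of the original answer. It assumes the first bit is already valid on the data pin once chip select is low, and it toggles `ser_clk` at counts 0 and 5 so that both clock phases last five system clocks. Chip-select setup and hold times are not checked here and should be verified against the datasheet.

```verilog
// Sketch only: hypothetical module and port names, not a verified MAX31855 driver.
module max31855_rx (
    input  wire        clk,        // 50 MHz system clock
    input  wire        reset_n,    // active-low asynchronous reset
    input  wire        start,      // one-cycle pulse: begin a 32-bit read
    input  wire        ser_in,     // serial data from the IC
    output reg         ser_clk,    // serial clock to the IC (clk / 10)
    output reg         cs_n,       // active-low chip select (assumed)
    output reg  [31:0] data,       // last complete word
    output reg         data_valid  // one-cycle pulse when data is updated
);
    reg        ser_in_meta, ser_in_sync;
    reg [3:0]  clock_div;
    reg [5:0]  bit_count;
    reg [31:0] shift_in;

    // Pick up the data just before the falling clock edge
    wire sample = (clock_div == 4'd9);

    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            ser_in_meta <= 1'b0;
            ser_in_sync <= 1'b0;
            clock_div   <= 4'd0;
            ser_clk     <= 1'b0;
            cs_n        <= 1'b1;
            bit_count   <= 6'd0;
            shift_in    <= 32'd0;
            data        <= 32'd0;
            data_valid  <= 1'b0;
        end else begin
            // Sync input on system clock
            ser_in_meta <= ser_in;
            ser_in_sync <= ser_in_meta;
            data_valid  <= 1'b0;

            if (cs_n) begin
                // Idle: hold the divider and the serial clock low
                clock_div <= 4'd0;
                ser_clk   <= 1'b0;
                bit_count <= 6'd0;
                if (start)
                    cs_n <= 1'b0;
            end else begin
                // Divide by 10 counter
                clock_div <= (clock_div == 4'd9) ? 4'd0 : clock_div + 4'd1;

                // Symmetrical 1/10 system clock
                if (clock_div == 4'd0)
                    ser_clk <= 1'b0;
                else if (clock_div == 4'd5)
                    ser_clk <= 1'b1;

                if (sample) begin
                    // Receive: MS bit arrives first
                    shift_in  <= {shift_in[30:0], ser_in_sync};
                    bit_count <= bit_count + 6'd1;
                    if (bit_count == 6'd31) begin
                        // 32nd bit: publish the word and release chip select
                        data       <= {shift_in[30:0], ser_in_sync};
                        data_valid <= 1'b1;
                        cs_n       <= 1'b1;
                    end
                end
            end
        end
    end
endmodule
```

Because of the two-flop synchroniser, the value shifted in when `sample` is high was actually present on the pin two system clocks earlier, which is still within the high phase of `ser_clk`.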

Related Solutions

Reading a serial data stream with Verilog

You have two requirements here:

1. You need to make sure you sample that data at the correct time according to the bus needs.
2. You need to synchronize this data into your system clock domain.

There are a couple of ways to meet these needs.

First, if the bus clock is sufficiently slower than your system clock, you can synchronize the bus clock to your clock domain with a double flop, then use a simple edge detector to find its rising edge. That detected edge is used to safely sample the data line of the bus. Note that this achieves requirement #2 automatically. Also, as noted here, this is the preferred way to do this in an FPGA.
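A minimal sketch of this first approach is shown here, assuming the system clock is several times faster than the bus clock. The module name and the `data_bit` and `data_strobe` outputs are hypothetical. The data line goes through the same two-flop delay as the clock so that the two stay aligned.

```verilog
// Sketch only: hypothetical names; assumes clk is several times faster than bclk.
module bus_sampler (
    input  wire clk,         // system clock
    input  wire bclk,        // bus clock from the pin (asynchronous to clk)
    input  wire data,        // bus data from the pin
    output reg  data_bit,    // data captured at the detected bclk rising edge
    output reg  data_strobe  // high for one clk cycle when data_bit is updated
);
    // bclk_sync[1:0] is the double flop, bclk_sync[2] is the previous value
    reg [2:0] bclk_sync = 3'b000;
    reg [1:0] data_sync = 2'b00;

    // Rising edge: synchronized clock is high now and was low one clk earlier
    wire bclk_rise = bclk_sync[1] & ~bclk_sync[2];

    initial begin
        data_bit    = 1'b0;
        data_strobe = 1'b0;
    end

    always @(posedge clk) begin
        bclk_sync   <= {bclk_sync[1:0], bclk};
        data_sync   <= {data_sync[0], data};
        data_strobe <= bclk_rise;
        if (bclk_rise)
            data_bit <= data_sync[1];
    end
endmodule
```

`data_bit` is already in the system clock domain, so downstream logic can shift it into a register whenever `data_strobe` is high.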

This method has a drawback in that the data is sampled somewhere between two and three system clocks later than the actual 'on the pin' bus clock edge. If this is too long (because your system clock is not fast enough in comparison), you have to take an alternative approach.

In the second method, you sample before synchronizing to the system clock domain. The reason is to make sure you are sampling at the right time according to the bus.

always @(posedge bclk) begin  // positive edge of bus clock
    sampled_data <= data;
end

At this point, you have sampled_data which is a signal in the bclk domain. You need to synchronize it to your system clock domain. To do this, you have to use handshaking or a FIFO.

One way that works is to do the shift register in the bus clock domain to get to parallel data. Then pass it through a dual clock FIFO to the other domain. FPGAs have primitives just for this use.

// Latch the data in the bclk domain
always @(posedge bclk) begin
    fifo_d <= {fifo_d[30:0], data};

    data_count <= data_count + 1;
    if (data_count == 31)
        fifo_wr <= 1'b1;  // This will latch fifo_d to the dual clock fifo input
    else
        fifo_wr <= 1'b0;
end

// Read the data out of the FIFO in the system clock domain.
always @(posedge clk) begin
    if (fifo_ready)
        synchronized_data <= fifo_q;   // Now the data is in your domain.
end
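For the handshaking alternative mentioned above, one common pattern is a toggle flag: the bus clock domain holds each completed word steady and flips a flag, and the system clock domain synchronizes the flag and copies the word when it sees a change. The sketch below is an illustration under assumptions, not the answer's own design. All names other than `bclk`, `clk`, `data` and `synchronized_data` are hypothetical, there is no framing (it simply counts 32 bits), and it relies on a word arriving only every 32 bus clocks so that `word_hold` is stable long before it is read.

```verilog
// Sketch only: toggle handshake with hypothetical names and no bus framing.
module word_handshake (
    input  wire        bclk,               // bus clock
    input  wire        data,               // bus data
    input  wire        clk,                // system clock
    output reg  [31:0] synchronized_data,  // word in the clk domain
    output reg         synchronized_valid  // one clk cycle high per new word
);
    // ---- bclk domain ----
    reg [31:0] shift_reg   = 32'd0;
    reg [31:0] word_hold   = 32'd0;  // stays stable until the next word completes
    reg [4:0]  data_count  = 5'd0;   // wraps from 31 back to 0
    reg        word_toggle = 1'b0;   // flips once per completed word

    always @(posedge bclk) begin
        shift_reg  <= {shift_reg[30:0], data};
        data_count <= data_count + 5'd1;
        if (data_count == 5'd31) begin
            word_hold   <= {shift_reg[30:0], data};
            word_toggle <= ~word_toggle;
        end
    end

    // ---- clk domain ----
    reg [2:0] toggle_sync = 3'b000;  // [1:0] double flop, [2] previous value
    wire      new_word    = toggle_sync[2] ^ toggle_sync[1];

    initial begin
        synchronized_data  = 32'd0;
        synchronized_valid = 1'b0;
    end

    always @(posedge clk) begin
        toggle_sync        <= {toggle_sync[1:0], word_toggle};
        synchronized_valid <= new_word;
        if (new_word)
            synchronized_data <= word_hold;  // safe: word_hold has been stable for 2+ clk
    end
endmodule
```

The path from `word_hold` to `synchronized_data` crosses clock domains on purpose, so it still needs an appropriate timing constraint in the tool flow.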

Other notes:

As with any I/O at the edge of the FPGA, as well as between clock domains, you will need to define the timing constraints correctly. Describing the details is a bit too broad for this forum, but as recommended in the comments by Greg, this paper is a good source for understanding the needs of clock domain crossing. The FPGA vendors tend to have decent write-ups on input delay and output delay definitions as well.

Electronic – digital bandpass filter with parallel inputs

It should be possible to unroll this, but it will require 64*16 = 1024 MAC operations per clock cycle. Think about it like this:

y[n] = a0 * x[n] + a1 * x[n-1] + ... + a63 * x[n-63]

That's the filter operation that you need to do. Let's simplify that a bit and only consider the first 3 terms:

y[n] = a0 * x[n] + a1 * x[n-1] + a2 * x[n-2]

Each -1 is one clock cycle of delay. If you get one x value per clock cycle, then you can implement that directly with three multipliers and three registers to store the x values. However, if you get two x values per clock cycle, you also need to produce two y values per clock cycle. In that case, you need to do something like this, presuming your input values are x[2n] and x[2n+1]:

y[2n]   = a0 * x[2n]   + a1 * x[2n+1-2] + a2 * x[2n-2]
y[2n+1] = a0 * x[2n+1] + a1 * x[2n]     + a2 * x[2n+1-2]

And you can continue this for more inputs:

y[3n]   = a0 * x[3n]   + a1 * x[3n+2-3] + a2 * x[3n+1-3]
y[3n+1] = a0 * x[3n+1] + a1 * x[3n]     + a2 * x[3n+2-3]
y[3n+2] = a0 * x[3n+2] + a1 * x[3n+1]   + a2 * x[3n]

Note that in this case, each clock cycle of delay is NOT a delay of 1, so I have rewritten the terms as a sum of the original term and the delay. For example, 2n gets moved to 2n-2 on the next cycle, and 2n+1 goes to 2n+1-2 on the next cycle. You can scale this pattern to what you need. However, I would recommend using a Python script or similar to generate your HDL, as this would be a nightmare to implement manually.
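As a concrete illustration of the two-samples-per-clock case, the 3-tap equations above map onto hardware roughly as follows. This is a sketch with assumed names and widths (`fir3_par2`, `DW`, `CW`, coefficient inputs `a0` to `a2`), with no pipelining inside the multiply-add, and with the outputs registered so they appear one clock after the inputs.

```verilog
// Sketch only: 3-tap FIR, two samples per clock, hypothetical names and widths.
module fir3_par2 #(
    parameter DW = 16,  // sample width (assumed)
    parameter CW = 16   // coefficient width (assumed)
) (
    input  wire                    clk,
    input  wire signed [DW-1:0]    x_even,      // x[2n]
    input  wire signed [DW-1:0]    x_odd,       // x[2n+1]
    input  wire signed [CW-1:0]    a0, a1, a2,  // filter coefficients
    output reg  signed [DW+CW+1:0] y_even,      // y[2n]
    output reg  signed [DW+CW+1:0] y_odd        // y[2n+1]
);
    // One register per input lane: one clock of delay is a delay of 2 samples
    reg signed [DW-1:0] x_even_d = 0;  // x[2n-2]
    reg signed [DW-1:0] x_odd_d  = 0;  // x[2n+1-2]

    always @(posedge clk) begin
        x_even_d <= x_even;
        x_odd_d  <= x_odd;

        // y[2n]   = a0*x[2n]   + a1*x[2n+1-2] + a2*x[2n-2]
        y_even <= a0 * x_even + a1 * x_odd_d + a2 * x_even_d;
        // y[2n+1] = a0*x[2n+1] + a1*x[2n]     + a2*x[2n+1-2]
        y_odd  <= a0 * x_odd  + a1 * x_even  + a2 * x_odd_d;
    end
endmodule
```

This uses six multipliers for two parallel samples and three taps, which matches the parallel sample count times filter length rule; the full 16-sample, 64-tap case follows the same pattern but is far too repetitive to write by hand.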

All in all, you will need parallel sample count * filter length MAC operations. Note that it may be possible in some cases to do two MAC operations in one DSP slice if it has a pre-adder and your filter coefficient list has a symmetry that you can exploit. So if you are using a modern Xilinx chip, it may be possible to implement this in 512 DSP slices.
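The pre-adder trick can be illustrated on the 3-tap example, under the assumption that the coefficients are symmetric (a2 equal to a0). The two samples that share a coefficient are added first, so one multiplier is saved. The names and widths below are hypothetical, and whether a synthesis tool actually maps the addition onto a DSP slice pre-adder depends on the tool and the device.

```verilog
// Sketch only: symmetric 3-tap FIR (a2 == a0 assumed), one sample per clock.
module fir3_symmetric #(
    parameter DW = 16,  // sample width (assumed)
    parameter CW = 16   // coefficient width (assumed)
) (
    input  wire                    clk,
    input  wire signed [DW-1:0]    x,       // x[n]
    input  wire signed [CW-1:0]    a0, a1,  // a2 is assumed equal to a0
    output reg  signed [DW+CW+1:0] y        // y[n], registered
);
    reg signed [DW-1:0] x_d1 = 0;  // x[n-1]
    reg signed [DW-1:0] x_d2 = 0;  // x[n-2]

    // Pre-adder: one extra bit so the sum cannot overflow
    wire signed [DW:0] pre_sum = x + x_d2;

    always @(posedge clk) begin
        x_d1 <= x;
        x_d2 <= x_d1;
        // y[n] = a0*(x[n] + x[n-2]) + a1*x[n-1]: two multipliers instead of three
        y    <= a0 * pre_sum + a1 * x_d1;
    end
endmodule
```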

Edit: Here's another option that's a little crazy, but it might be worth looking at. It is possible to build an FIR filter without using any DSP slices that's still reasonably fast; it's called a distributed arithmetic (DA) filter. The tradeoff is that, in its bit-serial form, an input sample width of M bits means it requires M clock cycles to compute the next sample. You're already doing 16 samples in parallel, so it might be worth trying a distributed arithmetic implementation that's 16*M in parallel. 16-bit samples * 16 samples would only be 256 parallel DA filter implementations. I have not done much with distributed arithmetic, so I'm not sure exactly how well it scales, but it's another possible way to implement your filter. I'm not sure what FPGA you're using, but it's possible that you won't have enough multipliers to build a more standard design with DSP slices, and DA may be the only option.
