Electronic – Is the theory of operation behind the FPGA design acceptable

fpga, led-strip

(This question is somewhat related to a previous question of mine.)

I'm trying to use an FPGA to drive an LED strip which contains several WS2801 ICs. (WS2801 datasheet)
The operating premise of the WS2801 is simple – clock in 24 bits of data (8 bits each for R,G,B) and then leave the clock low for 500µs. This causes the WS2801 to latch the data and change the LED color. If you have a strip of multiple WS2801s in series, you clock in 24 bits * (Number of ICs) and then hold the clock low to latch 'em all up. Simple, right?

So, I have created a "WS2801 Test Driver" module, clocked at 2 MHz. (The datasheet claims it can run as fast as 25 MHz, but I have yet to test this in practice.)

Basically, my driver is a shift register (with a pre-loaded 72-bit value) and a counter.
Why 72 bits? I wanted to test a string of 3 WS2801 ICs. In practice, I need to load in data from some kind of buffer…thing. (Any suggestions for that would be appreciated but that seems mostly out of the scope of this question.)

Here's a simple block diagram:
I will add a more accurate block diagram in a little bit; I don't think the clock enable is shown accurately.
[Figure: diagram of WS2801 test module]

The clock is shared between the shift register and the counter. After 72 ticks (all the data has now been shifted out), the counter's output goes low, disabling the clock output and preventing data from shifting out. This is the start of the 500 µs clock delay.
The clock is obviously still running the counter, which keeps counting. The counter then waits 1,000 ticks and drives its output high again, re-enabling the output clock and the serial data output.

Why 1,000 ticks? – At 2 MHz, the period is 0.5 µs. To get to 500 µs, we need 500/0.5 = 1,000 ticks. In practice, I've found I need to add a little fudge factor – 1,032 ticks, actually. This might be due to poor clock routing or propagation delay or something of that nature. I haven't really looked into it yet.
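
As a rough sketch of the counter described above (not the actual module from the question), the shift and latch phases could be derived from the clock frequency like this. The entity name ws2801_sequencer, the generics and the shift_active signal are all hypothetical, and the arithmetic assumes CLK_HZ is a whole number of MHz.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sequencer: counts the shift phase, then the latch phase.
entity ws2801_sequencer is
    generic (
        CLK_HZ   : positive := 2_000_000;  -- driver clock (2 MHz in the question)
        NUM_ICS  : positive := 3;          -- WS2801 chips in the string
        LATCH_US : positive := 500         -- clock-low time that latches the data
    );
    port (
        clk          : in  std_logic;
        shift_active : out std_logic       -- '1' while data bits are shifted out
    );
end entity ws2801_sequencer;

architecture rtl of ws2801_sequencer is
    -- 24 bits per IC: 72 ticks for three ICs.
    constant SHIFT_TICKS : positive := 24 * NUM_ICS;
    -- 2 ticks per microsecond * 500 us = 1000 ticks at 2 MHz.
    -- Assumes CLK_HZ is a multiple of 1 MHz; pad this value if a margin is needed.
    constant LATCH_TICKS : positive := (CLK_HZ / 1_000_000) * LATCH_US;
    constant FRAME_TICKS : positive := SHIFT_TICKS + LATCH_TICKS;

    signal count : natural range 0 to FRAME_TICKS - 1 := 0;
begin
    -- Free-running frame counter: 0 .. FRAME_TICKS - 1, then wrap.
    process (clk)
    begin
        if rising_edge(clk) then
            if count = FRAME_TICKS - 1 then
                count <= 0;
            else
                count <= count + 1;
            end if;
        end if;
    end process;

    -- High for the first SHIFT_TICKS counts, low during the latch delay.
    shift_active <= '1' when count < SHIFT_TICKS else '0';
end architecture rtl;
```

With the default generics this gives 72 shift ticks followed by 1,000 idle ticks; the 1,032-tick fudge factor would simply be a larger LATCH_TICKS.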

The design as implemented seems to work OK. I looked at the outputs on a logic analyzer and everything seems fine; I'm getting the colors on the LEDs that I expect.

My question is:

Is this a good design?
If there is a better method of going about this, please suggest!

If you read the link to my previous question: Does it seem like this design will integrate nicely into the bigger picture of creating an FPGA based Ambilight clone?

Thanks for reading!

Best Answer

I'll add a bit to Brian Carlton's answer.

Within an FPGA, it's correct: gated clocks are not at all recommended. The flip-flops have a separate enable (EN) input, so gating the clock is not necessary.

In your case, though, because your gated clock only goes to the output pin and isn't used internally to the FPGA, you can gate your clock without penalty. The way to do it is to make sure the clock gating is done in the output block. Assuming you're using Xilinx, instead of instantiating an OBUF for your clock output, use an OBUFT, and you'll get access to the tristate pin of the output buffer. If you're using another vendor's FPGAs there will be an equally easy way to do this.

If you prefer to do this using inference rather than instantiation, you'll need to be sure to enable an option during compiling to push logic into IO blocks. If the gated clock does actually fan out (but you didn't show it in your diagram), you'll also need to enable an option that allows duplicate logic to be generated.
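
A minimal sketch of the instantiation route, assuming a Xilinx part and the UNISIM library, might look like the following. The entity name sclk_gate and the shift_active input are hypothetical. Two further assumptions are not from the answer: an external pull-down holds the clock line low while the pin is tristated, and the tristate control is registered on the falling edge so it only changes while the clock is low.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical wrapper: gates the outgoing clock at the pin only.
entity sclk_gate is
    port (
        clk          : in  std_logic;  -- internal clock (2 MHz in the question)
        shift_active : in  std_logic;  -- '1' while data bits are being shifted out
        sclk_pin     : out std_logic   -- clock pin to the WS2801 strip
    );
end entity sclk_gate;

architecture rtl of sclk_gate is
    signal tristate : std_logic := '1';
begin
    -- Update the tristate control on the falling edge, so it changes
    -- while clk is low and does not cut a high pulse short.
    process (clk)
    begin
        if falling_edge(clk) then
            tristate <= not shift_active;
        end if;
    end process;

    -- OBUFT: T = '1' puts the pin in high impedance, T = '0' drives I onto O.
    -- Assumes an external pull-down keeps the line low while tristated.
    sclk_buf : OBUFT
        port map (
            O => sclk_pin,
            I => clk,
            T => tristate
        );
end architecture rtl;
```

Inside the FPGA, everything else would still run from the ungated clk and use flip-flop enables.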

Related Solutions

Electrical – Implementing a derived clock in a FPGA

You would not do something like this in a real design. See this for an example design rule you would be violating. Sometimes you cannot avoid this, but you should only violate the design rule if you know what you are doing. Instead, you should use an enable signal to enable the 1 Hz logic for 1 clock cycle every second. All the logic still runs at 50 MHz.

Now that's out of the way, there are many ways to implement what you have specified. You cannot avoid the comparator really, because even if you are counting down you still need to reset the counter. The comparator will not use much logic anyway. It is just a large AND gate.
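
One way to sketch the enable approach, with the comparator resetting the counter, is shown below. The entity name slow_blink and the LED toggle are placeholders for whatever the 1 Hz logic really is, and the 50 MHz default is taken from the figures above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: 1 Hz logic driven by a one-cycle enable pulse.
entity slow_blink is
    generic (
        CLK_HZ : positive := 50_000_000  -- assumed system clock frequency
    );
    port (
        clk : in  std_logic;
        led : out std_logic
    );
end entity slow_blink;

architecture rtl of slow_blink is
    signal count   : natural range 0 to CLK_HZ - 1 := 0;
    signal enable  : std_logic := '0';
    signal led_reg : std_logic := '0';
begin
    -- Enable generator: the comparator resets the counter once per second
    -- and raises enable for exactly one clk cycle.
    process (clk)
    begin
        if rising_edge(clk) then
            if count = CLK_HZ - 1 then
                count  <= 0;
                enable <= '1';
            else
                count  <= count + 1;
                enable <= '0';
            end if;
        end if;
    end process;

    -- "1 Hz" logic: still clocked at 50 MHz, but only acts when enabled.
    process (clk)
    begin
        if rising_edge(clk) then
            if enable = '1' then
                led_reg <= not led_reg;
            end if;
        end if;
    end process;

    led <= led_reg;
end architecture rtl;
```

There is only one clock domain here, so the timing tools see ordinary 50 MHz paths and no derived clock.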

You could have problems when using fast clocks, because wide counters result in long carry chains. The way to avoid this is to use linear feedback shift registers (LFSRs). They work like a shift register, but with XOR gates inserted between some of the DFFs. By choosing where to place the XOR gates, you can control the number of cycles it takes for the LFSR to return to its original state. The LFSR is like a counter, but it counts in a pseudo-random order. Since you do not care about the order, it will work in your application. The advantage of the LFSR is that the next state of a DFF depends only on the previous DFF and, at the XOR positions, the feedback bit. There is no carry to propagate.
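
As an illustration only, here is a small 8-bit LFSR of the kind described above (XOR gates between the flip-flops, i.e. the Galois form) used as a divider. It assumes the polynomial x^8 + x^6 + x^5 + x^4 + 1, which is a commonly tabulated maximal-length choice giving a period of 255 states. The entity name lfsr_tick and the seed value are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical divider: pulses tick once per full LFSR cycle.
entity lfsr_tick is
    port (
        clk  : in  std_logic;
        tick : out std_logic := '0'
    );
end entity lfsr_tick;

architecture rtl of lfsr_tick is
    -- Tap mask for x^8 + x^6 + x^5 + x^4 + 1 (assumed maximal length).
    constant TAPS : std_logic_vector(7 downto 0) := x"B8";
    -- Any non-zero seed works; all-zeros would lock the register up.
    constant SEED : std_logic_vector(7 downto 0) := x"01";

    signal lfsr : std_logic_vector(7 downto 0) := SEED;
begin
    process (clk)
        variable shifted : std_logic_vector(7 downto 0);
    begin
        if rising_edge(clk) then
            -- Shift right by one; each bit comes from its neighbour.
            shifted := '0' & lfsr(7 downto 1);
            -- When the bit shifted out is '1', XOR it into the tap positions.
            if lfsr(0) = '1' then
                lfsr <= shifted xor TAPS;
            else
                lfsr <= shifted;
            end if;

            -- The state equals SEED once per period: a one-cycle pulse.
            if lfsr = SEED then
                tick <= '1';
            else
                tick <= '0';
            end if;
        end if;
    end process;
end architecture rtl;
```

For a count other than the full period, you would compare against whichever state the LFSR reaches after the desired number of steps, which has to be worked out in advance.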

The other advantage is that you are using the carry chain in an FPGA. All LUTs have a fast connection to the next LUT called the carry chain. The LFSR uses the carry chain to connect to the next LUT instead of using the routing multiplexers. This is what your standard counter also uses, but the carry needs to propagate through the entire chain.

The problem with LFSRs is that you need to know where to place the XOR gates, i.e. which characteristic polynomial to use. This can be difficult to find. However, there are other things you can do. When you implement a counter in the naive way, A = A + 1, the synthesizer will implement it as a ripple adder. There are other adder topologies you can use. The Kogge-Stone adder, for example, becomes faster as the width becomes large. My own experiments on the Cyclone IV suggest that happens at a width of around 16 bits. However, you sacrifice area, because, unlike with LFSRs, you need more logic to implement it. It also does not use the carry chain, making it slower for low widths.

If you really need to gate a clock, see this as an example, or check your manufacturer's guidelines. Notice the presence of a falling edge DFF. This is to prevent glitches on the clock line from race conditions in the logic gate.

Electronic – Fairly Simple VHDL SPI bus working in simulation but not on FPGA (Lattice MACHOX3LF-6900C FPGA and Lattice Diamond software)

Answering my own question here: as it turns out, you are NOT supposed to use clock dividers in VHDL. I had falsely assumed this was fine as long as you treated each clock as a clock. However, the clock travels on a dedicated, hardware-specific route, and a second clock sourced from a divider simply does not have the same small-delay properties as that dedicated clock line.

I got all this from this forum post (edaboard.com/thread283723.html), in which they recommend replacing clock dividers with clock enables. So rather than having all your logic triggered by a divided clock edge, you have the main clock edge trigger a check of a conditional, which looks to see whether "clock enable" is high and runs the code only if it is. As long as you keep clock enable high for just one clock cycle, it acts the same as a divided clock. You basically drive it high on every 10 millionth cycle and low again on the next one, if you want a clock divided by 10 million, for example.

You want to avoid going

if (clk = '1' and clk'event and clock_enable = '1') then

as I'm fairly certain you're supposed to avoid logical operations with your clock (I'm new to VHDL so I've never experienced this myself, just read it in other forum posts). Instead you go

if (clk = '1' and clk'event) then
    if (clock_enable = '1') then
        ...
    end if;
end if;
