Electronic – Verilog “optimizes” the code to run slower

verilog

I am using a Lattice LC4256V-5T144 CPLD and their ispLever Classic software. Simulation says this chip can run the 20-bit counter at 227 MHz; my requirement is 200 MHz. I have found that if the clock enable is used, the code runs at only 182 MHz, which is too slow. I can design the circuit to avoid using the clock enable, but the software always "optimizes" it to use the clock enable, and thus slows it down. There is plenty of space in the CPLD.

The counter is OK; my problem is latching the counter value upon the rising edge of an independent OneKHz signal derived from a 10MHz timebase. I use a 2-FF synchronizer for it.

QUESTION: When setting latch, how can I force Verilog to use logic on the D input rather than "optimizing" it to use the clock enable?

Here's the relevant Verilog code:

wire counterInput, OneKHz;  // defined and driven elsewhere in the module
                            // counterInput is 10kHz to 200 MHz
reg [19:0] counter, latch;  // latch is read via SPI elsewhere in the module
reg latchEn1, latchEn2;     // 2-FF synchronizer for OneKHz
reg latchEnable;            // also sets Ready (which is cleared by SPI read)

// latchEn1 and latchEn2 form a 2-bit shift register of OneKHz, so
// (latchEn1 & !latchEn2) means a rising edge on OneKHz happened
// on posedge of counterInput one clock earlier.

always @ (posedge counterInput)
begin
    counter <= counter + 1'd1;
    latchEn2 <= latchEn1;
    latchEn1 <= OneKHz;
    latchEnable <= (latchEn1 & !latchEn2);
    latch <= latchEnable ? counter : latch; // THIS LINE IS THE PROBLEM
//  latch <= ({20{latchEnable}} & counter) | ({20{!latchEnable}} & latch); // also fails
//          (the 1-bit enable must be replicated to 20 bits for a bitwise mask;
//           it also fails if I write out each bit)
end

Best Answer

The software is Lattice's ispLever Classic, as I said, version 2.0 (the latest). That is a software suite, and the component doing this "optimization" is the fitter.

The clock enable is the issue: it must be valid before the clock edge, whereas the data input only needs to be valid at the clock edge. The post-fit equations in the "Fitter Report" show that the clock enable is being used, and the "Timing Report" shows Fmax=185MHz with the limiting path involving the clock enables.
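To make the two hardware forms concrete, here is a minimal sketch of the same 20-bit capture register written both ways. The module and port names (latch_ce, latch_dmux, clk, en, d, q) are hypothetical and are not from my design. Both modules behave identically in simulation; a synthesizer or fitter is free to map either one onto either hardware form, which is exactly the behavior I am fighting.

```verilog
// Form A: hold behavior written with no else branch.
// Tools commonly map this onto the flip-flop's clock enable (CE).
module latch_ce (
    input  wire        clk,
    input  wire        en,
    input  wire [19:0] d,
    output reg  [19:0] q
);
    always @(posedge clk)
        if (en) q <= d;              // q holds its value when en is low
endmodule

// Form B: the same behavior written as an explicit 2:1 feedback mux
// in front of the D input, so the flip-flop is clocked unconditionally.
module latch_dmux (
    input  wire        clk,
    input  wire        en,
    input  wire [19:0] d,
    output reg  [19:0] q
);
    wire [19:0] next_q = en ? d : q; // select new data or recirculate q
    always @(posedge clk)
        q <= next_q;
endmodule
```

Writing Form B does not by itself guarantee that the D input is used; the post-fit equations are the only reliable way to see which form the fitter actually chose.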

I kept experimenting and found that the presence of seemingly unrelated wires can change the fitter's results, sometimes with a significant difference in Fmax. Replacing the last line in the always block with this still uses the clock enable and gives Fmax=185MHz:

if(latchEnable) latch <= counter;

But if I also add a new output "dummy" to the module's port list and add these lines, I get Fmax=210MHz:

output dummy;
wire [19:0] latchSelect = (latchEn1 & !latchEn2) ? counter : latch;
assign dummy = latchSelect[0] | latchSelect[1] | latchSelect[2] | 
    latchSelect[3] | latchSelect[4] | latchSelect[5] | latchSelect[6] | 
    latchSelect[7] | latchSelect[8] | latchSelect[9] | latchSelect[10] | 
    latchSelect[11] | latchSelect[12] | latchSelect[13] | latchSelect[14] | 
    latchSelect[15] | latchSelect[16] | latchSelect[17] | latchSelect[18] | 
    latchSelect[19];

Note this new latchSelect is used ONLY to set dummy, which is an output pin (a real output is needed to prevent it being optimized away). But its presence changes the code used to set the latch. Now it uses the D input, not the clock enable; Fmax=210MHz. (Making dummy be a 20-bit bus slows it down.)

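I can now replace the last line in the always block with this, and get Fmax=208MHz (yes, slower than the previous result):

latch <= latchSelect;

Note that latchSelect tests (latchEn1 & !latchEn2) directly rather than the registered latchEnable, so this version captures the counter one counterInput cycle earlier than the original code.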

But if I remove dummy, latchSelect gets optimized away, the fitter goes back to using the clock enable, and Fmax=185MHz.
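Putting the pieces together, the Fmax=210MHz variant looks roughly like the sketch below. The module name counter_latch is hypothetical, and latch is exposed as an output port only to keep the example self-contained (in my real design it is read via SPI elsewhere in the module). I have also written dummy with the reduction-OR operator for brevity. That is logically the same as the 20-term OR chain, but I have not checked that the fitter gives the same Fmax for it, so treat that as an assumption.

```verilog
// Hypothetical wrapper around the counter and capture register.
module counter_latch (
    input  wire        counterInput,  // 10 kHz to 200 MHz clock being counted
    input  wire        OneKHz,        // independent 1 kHz capture signal
    output reg  [19:0] latch,         // a port here only for self-containment
    output wire        dummy          // exists only to keep latchSelect alive
);
    reg [19:0] counter;               // no reset, as in the original code;
                                      // initialize it before simulating
    reg        latchEn1, latchEn2;    // 2-FF synchronizer for OneKHz
    reg        latchEnable;           // registered rising-edge pulse

    // Mux that drives nothing but dummy; its presence is what changed
    // the fitter's choice from clock enable to D input in my runs.
    wire [19:0] latchSelect = (latchEn1 & !latchEn2) ? counter : latch;
    assign dummy = |latchSelect;      // reduction OR of all 20 bits

    always @(posedge counterInput) begin
        counter     <= counter + 1'd1;
        latchEn2    <= latchEn1;
        latchEn1    <= OneKHz;
        latchEnable <= latchEn1 & !latchEn2;
        if (latchEnable) latch <= counter;
    end
endmodule
```

After any change to this, re-read the post-fit equations and the Timing Report, since the result depends on details that look unrelated.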

Another poor "optimization" I found is that if I connect a FF output like latchEn1 to an output pin, it also slows the circuit down, because wherever that signal is used, it uses the output pin, not the FF output (look at the post-fit equations). I connect important signals to output pins so my logic analyzer can see them for debugging -- it's OK if those output pins run somewhat slower, but reducing the counter/latch Fmax means I must remove those assignments once the circuit works.

Bottom line: fitting Verilog equations to a device is complicated and subtle. Small changes in the code can make important changes in circuit performance. I do have code that meets my requirements with 5% margin (Fmax=210MHz, requirement is 200 MHz). Of course, I still need to install the circuit and test it with real signals.