Electrical – FPGA Internal Timing constraint failing

fifo fpga intel-fpga timing-analysis verilog

I'm currently trying to implement an IP-Core on a Cyclone V 5CSEBA6U23I7 FPGA-HPS System using Altera Quartus II and TimeQuest Analyzer.
The Verilog code pasted below fails timing on the assignment fifo_wdata_289[255:0] <= {fifo_out,fifo_wdata_289[255:16]};, which shifts the output of one FIFO into the write-data register of another FIFO.
Both FIFOs are asynchronous, but the signals involved are in the same clock domain.
The design is placed with a clock skew of -2.255 ns, which is roughly two thirds of a period (288 MHz clock => 3.47 ns), and TimeQuest reports that the constraints are violated.
TimeQuest recommends: Reduce the levels of combinational logic for the path (issue: Long Combinational Path), with Extra levels of combinational logic = 1.

The 288 MHz clock is generated by a PLL, and I'm using timing constraint (SDC) files with the commands derive_pll_clocks and derive_clock_uncertainty.

My question is how to solve this problem. I have no idea how to reduce the logic, since I only have one layer of combinational logic (Demuxer and routing, as suggested by the assignment). Is there any other way to make sure the timing requirement is met, or is there a better way to write this state machine?
Thanks for your help.

            case(stateProc)
                2'b00:
                begin
                    if(~fifo_empty&~fifo_full_289)
                    begin
                        fifo_wdata_289[255:0] <= {fifo_out,fifo_wdata_289[255:16]};
                        fifo_wdata_289[287:256] <= {2'b11,fifo_wdata_289[287:258]};
                        if(imgSize >= maxImgSize - 32'd2)//image done, transmit data and fire irq
                        begin
                            fifo_wdata_289[288] <= 1;
                            stateProc <= 2'b01;
                            imgSize <= 32'd0;
                        end
                        else if(fifo_wdata_289[258])//process transfer without irq //258 high means that 256 has been written in this cycle and 258 in the previous cycle
                        begin
                            fifo_wdata_289[288] <= 0;
                            stateProc <= 2'b01;
                            imgSize <= imgSize + 32'd2;
                        end
                        else//accumulate more data
                        begin
                            fifo_wdata_289[288] <= 0;
                            stateProc <= 2'b00;
                            imgSize <= imgSize + 32'd2;
                        end
                    end
                    else
                    begin
                        fifo_wdata_289 <= fifo_wdata_289;
                        imgSize <= imgSize;
                        stateProc <= 2'b00;
                    end
                end
                2'b01:
                begin
                    fifo_wdata_289 <= fifo_wdata_289;
                    imgSize <= imgSize;
                    stateProc <= 2'b11;
                end
                2'b11:
                begin
                    fifo_wdata_289 <= 0;
                    imgSize <= imgSize;
                    stateProc <= 2'b00;
                end
            endcase

EDIT:

The TimeQuest failing paths look mostly like this:
Slack:
-3.178

From Node:
soc_system:u0|CamConnector:camconnector_0|FIFO_289:FIFO_inst_289|dcfifo:dcfifo_component|dcfifo_6up1:auto_generated|wrptr_g[0]

To Node:
soc_system:u0|CamConnector:camconnector_0|fifo_wdata_289[267]

EDIT2:
I'm currently using MLAB cells as FIFO storage, which are rated for 290 MHz.
Both FIFOs are asynchronous and have a synchronization setting of 3.

The TimeQuest timing closure recommendations report the following two issues:

  1. Unbalanced Combinational Logic
  2. Long Combinational Path

Best Answer

since I only have one layer of combinational logic (Demuxer and routing, as suggested by the assignment)

No, that is not all you have. There is also the clearing in stateProc == 2'b11. But I assume the Cyclone LUTs are big enough to absorb an extra AND gate in the same level.

But I suggest you also post the failing path as reported by the tool, because something does not add up here. The path for fifo_wdata_289[288] is more complex, and you say that it does pass timing. I would not be surprised if the mux control signals (fifo_empty, fifo_full_289, stateProc, etc.) are also in the failing timing path, together with their fan-out tree, since you control a lot of flip-flops from one signal.

If not, you can really only run at a lower speed. There is no use in fiddling with the clock, because the circuit feeds back its own signals (fifo_wdata_289[255:0] <= {fifo_out,fifo_wdata_289[255:16]};). If you skew or invert the clock, fifo_wdata_289[255:16] will come out later and thus fail timing again. You may be lucky and find that fifo_out is the signal arriving late and causing the timing error, but again: look at the timing report of the failing path.

Anyway, skewing a clock is something that can be done in an ASIC, but I would not try it in an FPGA. I don't think you have the fine control to skew one clock ~300 ps against another. But I have not used Cyclones, so maybe...
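As a rough sanity check on the point about the clock, the numbers from the question can be put into a simple setup budget. This is only a back-of-the-envelope sketch: it assumes that the reported skew of -2.255 ns and the reported slack of -3.178 ns belong to the same path, and it lumps clock-to-output delay, data delay, setup time and clock uncertainty into one term $t_{\text{path}}$ (a symbol introduced here, not something TimeQuest reports).

```latex
\begin{align*}
T &= \frac{1}{288\,\text{MHz}} \approx 3.472\,\text{ns} \\
T + t_{\text{skew}} &\approx 3.472\,\text{ns} - 2.255\,\text{ns} = 1.217\,\text{ns} \\
t_{\text{path}} &\approx (T + t_{\text{skew}}) - \text{slack}
               = 1.217\,\text{ns} + 3.178\,\text{ns} = 4.395\,\text{ns}
\end{align*}
```

If those assumptions hold, the path itself is longer than one full period, so even with zero skew it would still miss timing by roughly 0.9 ns. The path has to get shorter; moving the clock alone would not be enough.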


Update:
Just what I expected: the path starts at wrptr_g... I assume that is the write pointer, which is probably compared with the read pointer to generate the fifo_full/empty flags that control the mux.

The best way to get rid of the timing issue is to make a registered version of the full/empty flags. That may mean you have to redesign some of the rest of the circuit.
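One possible way to register the "full" side is sketched below. It is only an illustration under assumptions, not a drop-in fix: the module name registered_almost_full, the parameters ADDR_W and MARGIN, and the usedw input are all hypothetical (on the Intel dcfifo the fill level would come from the optional wrusedw port, if it is enabled). Because a registered flag is one cycle old, the sketch asserts it a few words early; MARGIN has to cover that latency plus any latency of the fill-level output itself.

```verilog
// Hypothetical helper: registers a conservative "almost full" flag so that
// the wide data mux is controlled from a flip-flop instead of from the
// FIFO's pointer-compare logic.
module registered_almost_full #(
    parameter ADDR_W = 4,   // assumed: FIFO depth is 2**ADDR_W words
    parameter MARGIN = 2    // assumed safety margin in words
) (
    input  wire              clk,
    input  wire              rst,
    input  wire [ADDR_W-1:0] usedw,         // assumed fill-level output of the FIFO
    output reg               almost_full_r  // registered, asserted MARGIN words early
);
    localparam [ADDR_W:0] DEPTH = 1 << ADDR_W;

    always @(posedge clk) begin
        if (rst)
            almost_full_r <= 1'b1;  // block writes until a valid level was sampled
        else
            almost_full_r <= ({1'b0, usedw} >= DEPTH - MARGIN);
    end
endmodule
```

The state machine would then test ~almost_full_r in place of ~fifo_full_289. The cost is that the last MARGIN words of the FIFO are never used.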

If the FIFO is a synchronous one, there is a trick to generate registered full/empty flags: you check whether the fill level is ONE, and on a read without a simultaneous write you synchronously set the 'empty' flag. See here for some free example code of FIFOs which have that feature. Unfortunately, that does not work with asynchronous FIFOs.
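A minimal sketch of that trick for a single-clock FIFO could look like the following. It is not the example code referred to above; the module name sync_fifo_reg_flags, its ports and its parameters are made up for illustration. Both flags are flip-flops that are updated from the current fill level and the accepted read/write strobes, so nothing downstream sees a pointer comparison.

```verilog
// Illustrative single-clock FIFO with registered empty/full flags.
// All names and parameters are hypothetical.
module sync_fifo_reg_flags #(
    parameter WIDTH  = 16,
    parameter ADDR_W = 4               // depth is 2**ADDR_W words
) (
    input  wire             clk,
    input  wire             rst,
    input  wire             wr_en,
    input  wire             rd_en,
    input  wire [WIDTH-1:0] din,
    output wire [WIDTH-1:0] dout,      // show-ahead: valid whenever empty is low
    output reg              empty,
    output reg              full
);
    localparam [ADDR_W:0] DEPTH = 1 << ADDR_W;

    reg [WIDTH-1:0]  mem [0:(1 << ADDR_W) - 1];
    reg [ADDR_W-1:0] wptr, rptr;
    reg [ADDR_W:0]   level;            // number of stored words, 0..DEPTH

    // Accepted operations depend only on registered flags.
    wire do_wr = wr_en & ~full;
    wire do_rd = rd_en & ~empty;

    assign dout = mem[rptr];

    always @(posedge clk) begin
        if (rst) begin
            wptr  <= 0;
            rptr  <= 0;
            level <= 0;
            empty <= 1'b1;
            full  <= 1'b0;
        end else begin
            if (do_wr) begin
                mem[wptr] <= din;
                wptr      <= wptr + 1'b1;
            end
            if (do_rd)
                rptr <= rptr + 1'b1;

            case ({do_wr, do_rd})
                2'b10: begin                        // write only
                    level <= level + 1'b1;
                    empty <= 1'b0;
                    full  <= (level == DEPTH - 1);  // this write fills the FIFO
                end
                2'b01: begin                        // read only
                    level <= level - 1'b1;
                    empty <= (level == 1);          // level is ONE: this read empties it
                    full  <= 1'b0;
                end
                default: ;                          // both or neither: level and flags keep
            endcase
        end
    end
endmodule
```

The price is the extra level counter; in return, empty and full come straight out of flip-flops.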