Discrepancy between post-Place-and-Route static timing analysis and ISIM simulation results

Tags: constraints, fpga, simulation, spartan, timing

Overview

I'm implementing a simple Harvard-style CPU using Xilinx ISE version 14.1. I'm using settings compatible with a Digilent Nexys3 board, but for the time being the entire project is performed in simulation only.

I have the following entries in my UCF file. They specify the location (pin) of the clock on the Nexys3 board, along with a 100 MHz clock constraint, which corresponds to a period of 10 ns.

Net "clk" LOC=V10 | IOSTANDARD=LVCMOS33;
Net "clk" TNM_NET = sys_clk_pin;
TIMESPEC TS_sys_clk_pin = PERIOD sys_clk_pin 100000 kHz;

I am clocking all synchronous logic using the positive edge of this clock.

Post Place-and-Route static timing analysis suggests everything is fine:

Timing constraint: TS_sys_clk_pin = PERIOD TIMEGRP "sys_clk_pin" 100 MHz HIGH 
50%;
For more information, see Period Analysis in the Timing Closure User Guide (UG612).

 12987 paths analyzed, 961 endpoints analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors, 0 component switching limit errors)
 Minimum period is   4.003ns.

The minimum period is well within the 10ns target. There are no unconstrained paths in the report.

The report lists the following path first. Its delay accounts for the reported minimum period, so I assume it is the slowest path in the design (the one with the least slack). It runs from the Instruction Register to the highest bit of the stack pointer. The route (through the ALU and bus 3) looks sane for my design when the register is loaded with an immediate value. Push/pop operations use a different path.

Slack (setup path):     5.997ns (requirement - (data path - clock path skew + uncertainty))
  Source:               CONTROL/IR_15_2 (FF)
  Destination:          SP/VALUE_31 (FF)
  Requirement:          10.000ns
  Data Path Delay:      3.951ns (Levels of Logic = 9)
  Clock Path Skew:      -0.017ns (0.252 - 0.269)
  Source Clock:         clk_BUFGP rising at 0.000ns
  Destination Clock:    clk_BUFGP rising at 10.000ns
  Clock Uncertainty:    0.035ns

  Clock Uncertainty:          0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: CONTROL/IR_15_2 to SP/VALUE_31
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X13Y30.BQ      Tcko                  0.391   CONTROL/IR_15_3
                                                       CONTROL/IR_15_2
    SLICE_X5Y31.D3       net (fanout=12)       0.847   CONTROL/IR_15_2
    SLICE_X5Y31.D        Tilo                  0.259   RAM/read_address<1>
                                                       ALU1/Mmux_RR11241
    SLICE_X14Y24.B3      net (fanout=4)        1.283   bus3_1_OBUF
    SLICE_X14Y24.COUT    Topcyb                0.380   SP/VALUE<3>
                                                       SP/Mcount_VALUE_lut<1>
                                                       SP/Mcount_VALUE_cy<3>
    SLICE_X14Y25.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<3>
    SLICE_X14Y25.COUT    Tbyp                  0.076   SP/VALUE<7>
                                                       SP/Mcount_VALUE_cy<7>
    SLICE_X14Y26.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<7>
    SLICE_X14Y26.COUT    Tbyp                  0.076   SP/VALUE<11>
                                                       SP/Mcount_VALUE_cy<11>
    SLICE_X14Y27.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<11>
    SLICE_X14Y27.COUT    Tbyp                  0.076   SP/VALUE<15>
                                                       SP/Mcount_VALUE_cy<15>
    SLICE_X14Y28.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<15>
    SLICE_X14Y28.COUT    Tbyp                  0.076   SP/VALUE<19>
                                                       SP/Mcount_VALUE_cy<19>
    SLICE_X14Y29.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<19>
    SLICE_X14Y29.COUT    Tbyp                  0.076   SP/VALUE<23>
                                                       SP/Mcount_VALUE_cy<23>
    SLICE_X14Y30.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<23>
    SLICE_X14Y30.COUT    Tbyp                  0.076   SP/VALUE<27>
                                                       SP/Mcount_VALUE_cy<27>
    SLICE_X14Y31.CIN     net (fanout=1)        0.003   SP/Mcount_VALUE_cy<27>
    SLICE_X14Y31.CLK     Tcinck                0.314   SP/VALUE<31>
                                                       SP/Mcount_VALUE_xor<31>
                                                       SP/VALUE_31
    -------------------------------------------------  ---------------------------
    Total                                      3.951ns (1.800ns logic, 2.151ns route)
                                                       (45.6% logic, 54.4% route)
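The arithmetic in the report can be checked by hand. The Python sketch below is only a sanity check and not part of the Xilinx tools; the variable names are my own, and every number is copied from the report above. Treating the minimum period as requirement minus slack is an assumption that happens to match the reported 4.003 ns.

```python
import math

# Values copied from the timing report above (all in ns).
requirement = 10.000
data_path_delay = 3.951
clock_path_skew = 0.252 - 0.269          # reported as -0.017
tsj, tij, dj, pe = 0.070, 0.000, 0.000, 0.000

# Clock uncertainty = ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
clock_uncertainty = (math.sqrt(tsj**2 + tij**2) + dj) / 2 + pe

# Slack = requirement - (data path - clock path skew + uncertainty)
slack = requirement - (data_path_delay - clock_path_skew + clock_uncertainty)

# Assumption: the minimum period for this path is requirement minus slack.
min_period = requirement - slack

# Per-element delays from the data path table.
logic_delays = [0.391, 0.259, 0.380] + [0.076] * 6 + [0.314]
route_delays = [0.847, 1.283] + [0.003] * 7

print(f"clock uncertainty: {clock_uncertainty:.3f} ns")   # 0.035
print(f"slack:             {slack:.3f} ns")               # 5.997
print(f"minimum period:    {min_period:.3f} ns")          # 4.003
print(f"logic delay:       {sum(logic_delays):.3f} ns")   # 1.800
print(f"route delay:       {sum(route_delays):.3f} ns")   # 2.151
```

All five figures agree with the report, so the report is at least internally consistent for this path.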

Armed with this knowledge, I ran a post-Place-and-Route simulation with a 10 ns clock period, thinking everything would be fine. It is not. The signals do not settle in time for the next clock edge and everything is a mess. Relaxing the clock period to 50 ns (20 MHz) allows plenty of time for everything to settle.

[Figure: timings]

At 425ns we get the clock pulse that signals the start of the cycle in which we will execute the instruction SP <- 0xFFFFFFFF. IR_15_2 is the signal from the timing report. SP_value is a register, so it only assumes the value presented to it on the next rising edge. SP is loaded from bus3 so we use that as a proxy.

In the graph we see that it takes 3ns or so for IR_15_2 to be asserted at all. Then it takes over 10ns more for the signal to be taken over by bus1. At 451ns, a full 26ns later, the signal is available on bus3 and we can start thinking about loading SP with it.
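To put the waveform numbers next to the clock constraint, here is a small Python sketch. The timestamps are the ones read off the waveform above; the helper name `period_ns` is made up for illustration.

```python
import math

def period_ns(freq_khz: float) -> float:
    """Convert a clock frequency in kHz to a period in ns."""
    return 1e6 / freq_khz

clock_edge_ns = 425   # rising edge that starts the cycle
bus3_valid_ns = 451   # time at which bus3 carries the value
settle_ns = bus3_valid_ns - clock_edge_ns

# How many clock periods the observed settling time spans at each frequency.
cycles_needed = {}
for freq_khz in (100_000, 20_000):
    period = period_ns(freq_khz)
    cycles_needed[freq_khz] = math.ceil(settle_ns / period)
    print(f"{freq_khz // 1000} MHz: period {period:.0f} ns, "
          f"settling spans {cycles_needed[freq_khz]} period(s)")
```

At 100 MHz the observed 26 ns spans three clock periods, while at 20 MHz it fits inside one, which matches what I see in simulation.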

Question

Static timing analysis tells me that the longest register-to-register path in the design should take about 4 ns, whereas the simulation shows that the signals take about 26 ns to settle. What is going on here? Is the static timing analysis not finding all relevant paths? Did I use or configure the simulator wrongly? Did I misread the static timing analysis?

I'm OK running the design at 20 MHz; this is not a speed competition. I just have the feeling that I'm missing something important.

Additional Information

The complete project (VHDL files, XISE project) is available on bitbucket.

Best Answer

Looking through your timing report, there is nothing that indicates a potential issue. Since you have a problem, this means that the scenarios that static timing analysis (STA) is checking are not covering the actual usage of your circuit.

Without any serious setup of STA, some common assumptions are that all inputs are valid by the time the clock rises, and that all states are known (meaning a logic 1 or 0). Immediately, the UUUUUUUU on bus3 looks very suspicious, and points to a possible initialization issue. In logic simulation, U means uninitialized: the real line would be either a 1 or a 0, but the simulator cannot tell which because a register driving it was never initialized properly. This could cause the simulator to give weird answers until all registers are loaded or reset. However, your problem still shows up in later cycles, after all registers have been initialized, so initialization alone does not explain it.

The other potential issue is that bus1 starts in a high-impedance state (ZZZZZZZZ). Since tri-state is not usually assumed in timing analysis, this is the most likely source of the timing discrepancy. Tri-state conditions must be carefully described to your STA tool in order for them to be considered. This can be a very difficult task, and it is prone to error (incorrect setup, missed cases, etc.). I believe that including the tri-state delays would most likely give you an STA result that matches your simulation.

However, tri-state is usually a bad choice for on-chip communication, for both ASICs and FPGAs. The doubt it casts on STA results, the potential for bus contention, and the uncertainty about drive strength requirements make tri-state more likely to cause problems than to fix them. The safer method is to use a multiplexer to select which source "talks" to the bus, or to partition the design differently. I would only use tri-state when I know it will solve more issues than it can cause.
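To illustrate the contention point, here is a rough behavioral model in Python. It is not your VHDL and says nothing about timing; it only mimics how a simulator resolves several drivers on one wire, using a five-value subset (U, X, 0, 1, Z) of the IEEE 1164 std_logic resolution table. The function names are hypothetical.

```python
# Subset of the IEEE 1164 std_logic resolution table (U, X, 0, 1, Z only).
ORDER = "UX01Z"
RESOLVE = [
    "UUUUU",  # U against U, X, 0, 1, Z
    "UXXXX",  # X
    "UX0X0",  # 0
    "UXX11",  # 1
    "UX01Z",  # Z
]

def resolve(a: str, b: str) -> str:
    """Resolve two driver values on the same wire."""
    return RESOLVE[ORDER.index(a)][ORDER.index(b)]

def tristate_bus(drivers) -> str:
    """drivers: list of (enable, value); a disabled driver outputs 'Z'."""
    result = "Z"
    for enable, value in drivers:
        result = resolve(result, value if enable else "Z")
    return result

def mux_bus(select: int, sources) -> str:
    """A multiplexer always forwards exactly one source."""
    return sources[select]

print(tristate_bus([(False, "0"), (False, "1")]))  # Z: nobody drives the bus
print(tristate_bus([(True, "0"), (False, "1")]))   # 0: one driver enabled
print(tristate_bus([(True, "0"), (True, "1")]))    # X: contention
print(mux_bus(1, ["0", "1"]))                      # 1: always a defined value
```

With tri-state drivers, correctness depends on the enables never overlapping and on at least one being active whenever the bus is read. The multiplexer version cannot float or fight by construction.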