Overview
I'm implementing a simple Harvard-style CPU using Xilinx ISE version 14.1. I'm using settings compatible with a Digilent Nexys3 board, but for the time being the entire project is performed in simulation only.
I have the following entry in my UCF file that specifies the location (pin) of the clock on the Nexys3 board, along with a 100MHz period constraint. This means a period of 10ns.
Net "clk" LOC=V10 | IOSTANDARD=LVCMOS33;
Net "clk" TNM_NET = sys_clk_pin;
TIMESPEC TS_sys_clk_pin = PERIOD sys_clk_pin 100000 kHz;
I am clocking all synchronous logic using the positive edge of this clock.
Post Place-and-Route static timing analysis suggests everything is fine:
Timing constraint: TS_sys_clk_pin = PERIOD TIMEGRP "sys_clk_pin" 100 MHz HIGH 50%;
For more information, see Period Analysis in the Timing Closure User Guide (UG612).
12987 paths analyzed, 961 endpoints analyzed, 0 failing endpoints
0 timing errors detected. (0 setup errors, 0 hold errors, 0 component switching limit errors)
Minimum period is 4.003ns.
The minimum period is well within the 10ns target. There are no unconstrained paths in the report.
The report then lists this path first. Since its timing matches the minimum period, I assume it's the slowest path in the design (the path with the least slack). It runs from the Instruction Register to the highest bit of the stack pointer. The route (through the ALU and bus 3) looks sane for my design when the register is loaded with an immediate value. Push/pop operations take a different path.
Slack (setup path): 5.997ns (requirement - (data path - clock path skew + uncertainty))
Source: CONTROL/IR_15_2 (FF)
Destination: SP/VALUE_31 (FF)
Requirement: 10.000ns
Data Path Delay: 3.951ns (Levels of Logic = 9)
Clock Path Skew: -0.017ns (0.252 - 0.269)
Source Clock: clk_BUFGP rising at 0.000ns
Destination Clock: clk_BUFGP rising at 10.000ns
Clock Uncertainty: 0.035ns
Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
Total System Jitter (TSJ): 0.070ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.000ns
Phase Error (PE): 0.000ns
Maximum Data Path at Slow Process Corner: CONTROL/IR_15_2 to SP/VALUE_31
Location             Delay type         Delay(ns) Physical Resource
                                                  Logical Resource(s)
------------------------------------------------- -------------------
SLICE_X13Y30.BQ      Tcko               0.391     CONTROL/IR_15_3
                                                  CONTROL/IR_15_2
SLICE_X5Y31.D3       net (fanout=12)    0.847     CONTROL/IR_15_2
SLICE_X5Y31.D        Tilo               0.259     RAM/read_address<1>
                                                  ALU1/Mmux_RR11241
SLICE_X14Y24.B3      net (fanout=4)     1.283     bus3_1_OBUF
SLICE_X14Y24.COUT    Topcyb             0.380     SP/VALUE<3>
                                                  SP/Mcount_VALUE_lut<1>
                                                  SP/Mcount_VALUE_cy<3>
SLICE_X14Y25.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<3>
SLICE_X14Y25.COUT    Tbyp               0.076     SP/VALUE<7>
                                                  SP/Mcount_VALUE_cy<7>
SLICE_X14Y26.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<7>
SLICE_X14Y26.COUT    Tbyp               0.076     SP/VALUE<11>
                                                  SP/Mcount_VALUE_cy<11>
SLICE_X14Y27.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<11>
SLICE_X14Y27.COUT    Tbyp               0.076     SP/VALUE<15>
                                                  SP/Mcount_VALUE_cy<15>
SLICE_X14Y28.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<15>
SLICE_X14Y28.COUT    Tbyp               0.076     SP/VALUE<19>
                                                  SP/Mcount_VALUE_cy<19>
SLICE_X14Y29.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<19>
SLICE_X14Y29.COUT    Tbyp               0.076     SP/VALUE<23>
                                                  SP/Mcount_VALUE_cy<23>
SLICE_X14Y30.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<23>
SLICE_X14Y30.COUT    Tbyp               0.076     SP/VALUE<27>
                                                  SP/Mcount_VALUE_cy<27>
SLICE_X14Y31.CIN     net (fanout=1)     0.003     SP/Mcount_VALUE_cy<27>
SLICE_X14Y31.CLK     Tcinck             0.314     SP/VALUE<31>
                                                  SP/Mcount_VALUE_xor<31>
                                                  SP/VALUE_31
------------------------------------------------- ---------------------------
Total                                   3.951ns (1.800ns logic, 2.151ns route)
                                                (45.6% logic, 54.4% route)
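The figures in this report are internally consistent, which can be checked by redoing the arithmetic. The following is only a minimal sketch in Python: the variable names are illustrative and not part of any Xilinx tool, and the numbers are copied by hand from the report above.

```python
import math

# Values copied from the timing report above (all in ns).
requirement = 10.000
clock_skew = 0.252 - 0.269            # "Clock Path Skew" as printed in the report
tsj, tij, dj, pe = 0.070, 0.000, 0.000, 0.000

# Per-element delays along the reported path.
logic_delays = [0.391, 0.259, 0.380] + [0.076] * 6 + [0.314]
route_delays = [0.847, 1.283] + [0.003] * 7

data_path = sum(logic_delays) + sum(route_delays)

# Clock uncertainty: ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
uncertainty = (math.sqrt(tsj**2 + tij**2) + dj) / 2 + pe

# Slack = requirement - (data path - clock path skew + uncertainty)
min_period = data_path - clock_skew + uncertainty
slack = requirement - min_period

print(f"data path   = {data_path:.3f} ns")    # 3.951
print(f"uncertainty = {uncertainty:.3f} ns")  # 0.035
print(f"min period  = {min_period:.3f} ns")   # 4.003
print(f"slack       = {slack:.3f} ns")        # 5.997
```

The results reproduce the reported data path delay, minimum period, and slack, so the report itself has been read correctly. Whatever explains the discrepancy with simulation lies elsewhere.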
Armed with this knowledge, I ran a post-Place-and-Route simulation with a 10ns clock period, thinking everything would be fine. However, it is not. The signals do not settle in time for the next clock edge and everything is a mess. Relaxing the clock to 50ns (20 MHz) allows plenty of time for everything to settle.
At 425ns we get the clock pulse that signals the start of the cycle in which we will execute the instruction SP <- 0xFFFFFFFF. IR_15_2 is the signal from the timing report. SP_value is a register, so it only assumes the value presented to it on the next rising edge. SP is loaded from bus3 so we use that as a proxy.
In the graph we see that it takes 3ns or so for IR_15_2 to be asserted at all. Then it takes over 10ns more for the signal to be taken over by bus1. At 451ns, a full 26ns later, the signal is available on bus3 and we can start thinking about loading SP with it.
Question
Static timing analysis tells me that the longest register-to-register path in the design should take about 4ns, whereas the simulation shows that the signals take about 26ns to settle. What is going on here? Is the static timing analysis not finding all relevant paths? Did I use or configure the simulator incorrectly? Did I misread the static timing analysis?
I'm OK running the design at 20 MHz; this is not a speed competition. I just have the feeling that I'm missing something important.
Additional Information
The complete project (VHDL files, XISE project) is available on bitbucket.
Best Answer
Looking through your timing report, there is nothing that indicates a potential issue. Since you have a problem, this means that the scenarios that static timing analysis (STA) is checking are not covering the actual usage of your circuit.
Without any serious setup of STA, some common assumptions are that all inputs are valid by the time the clock rises, and that all states are known (meaning a logic 1 or 0). Immediately, the UUUUUUUU on bus3 looks very suspicious, and is a possible issue with initialization. In logic simulations, U implies that the line is either a 1 or 0, but a register driving it was not initialized properly. This could cause the simulator to give weird answers until all registers are loaded or reset. However, this problem manifests itself in later cycles after all registers have been initialized.
The other potential issue is that bus1 starts in a high impedance state (ZZZZZZZZ). Considering that tri-state is not usually assumed in timing analysis, this is the most likely source of the timing discrepancy. Tri-state conditions must be carefully coded into your STA tool in order for them to be considered. This can be a very difficult task, and is prone to error (incorrect programming, missed cases, etc.). I believe that programming in tri-state delays would most likely give you an accurate STA result that should match your simulation.
However, tri-state is usually a bad choice for on-chip communication for both ASICs and FPGAs. This ambiguity of STA reliability, potential for bus contention, and the uncertainty of drive strength requirements make tri-state more likely to cause problems than fix them. The safer method is to use a multiplexor to select which source "talks" to the bus, or partition the design differently. I would only use tri-state when I know it will solve more issues than it can cause.
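To illustrate the difference between the two bus styles, here is a rough single-bit behavioural model in Python. It is only a sketch: the function names (resolve, tri_state_bus, mux_bus) are hypothetical, the resolution rules cover only the U, X, 0, 1 and Z values of std_logic, and it models logic values only, not delays or your actual design.

```python
from functools import reduce

def resolve(a, b):
    """Resolve two drivers on one wire (simplified subset of std_logic rules)."""
    if a == 'U' or b == 'U':
        return 'U'          # an uninitialized driver dominates everything
    if a == 'Z':
        return b            # high impedance yields to the other driver
    if b == 'Z':
        return a
    if a == b:
        return a            # both drivers agree
    return 'X'              # conflicting drivers: contention

def tri_state_bus(drivers):
    """drivers: list of (enable, value) pairs; a disabled driver contributes 'Z'."""
    outputs = [value if enable else 'Z' for enable, value in drivers]
    return reduce(resolve, outputs, 'Z')

def mux_bus(select, sources):
    """Exactly one source reaches the bus; no floating or contended state."""
    return sources[select]

print(tri_state_bus([(False, '1'), (False, '0')]))  # Z: nobody drives the bus
print(tri_state_bus([(True, '1'), (False, '0')]))   # 1: a single driver
print(tri_state_bus([(True, '1'), (True, '0')]))    # X: bus contention
print(mux_bus(0, ['1', '0']))                       # 1: always a defined source
```

With the multiplexed version the bus value is always an ordinary combinational function of the select lines and the sources, which is the kind of path that STA handles without extra setup.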