Electronic – Why won’t the Xilinx block RAM in a Spartan-3E consistently return data in a single clock cycle?

fpga spartan-3 timing-analysis xilinx

I'm creating a design in Verilog on a Xilinx Spartan-3E (XC3S500E) that uses multiple dual-port block RAMs, all instantiated through Verilog primitives such as RAMB16_S18_S18. I am using one port for both reading and writing (using write-enable) and the second for reading only (by setting WEB to 0). Both ports share the same clock. The block RAM is configured as 18 bits wide, but I am ignoring the parity data (that is, I do not use its output value and always write zeros to the parity bits).

I am using Xilinx ISE 13.4 and synthesizing/implementing through the GUI workflow with default settings. (Non-default settings such as aggressive timing optimization and/or physical synthesis made no difference with regard to this issue.)

I have a timing constraint for my one and only clock net, and it is consistent with the actual clock signal being supplied: the constraint is for 50MHz, there is a 50MHz clock on the dev board I am using, and the timing report states that the maximum frequency for my design is 64.7MHz. The clock runs through a clock multiplexer used as a clock enable before going to the entirety of my logic.

In my code, I have a state machine that has three states (transitioning on rising edge of the same 50MHz clock that the block RAM uses):

  1. Write address to a register connected to the address inputs of ports A and B.
  2. Read data, perform some logic operations on it, and write it to a reg[15:0] that is connected to DIA (data in A) on the block RAM (with the address not changing). Turn on write enable for port A.
  3. Read some IO pins into unrelated registers. Turn off block RAM write enable.

This consistently succeeds in a behavioral simulation in ISim, and (although it had less testing than the simulator) consistently succeeds on port A. Port B has identical logic (simply bit-slicing for the address, and the same configuration for SSR, SRVAL, INIT) but does not succeed in reading during this one clock cycle. Simply adding an extra state between 1 and 2 (thus giving the address a setup time of more than one entire clock cycle) makes the design work, but as a learner of FPGA development I would like to know why, and how to avoid the problem.

According to Xilinx datasheet DS312 and timing diagrams within, this should be an acceptable way of using the block RAM. There are setup and hold times given in that same datasheet, but the ISE tools should already be aware of them and apply them during timing analysis, if I'm not mistaken. Additionally, I've re-read the block RAM section of UG331 (Spartan-3 Generation User Guide) a few times and could not find any inconsistencies between the instructions and my use of the block RAM.

The timing report list of the slowest paths mysteriously does not list any paths going to the offending RAM port.

If someone could make a recommendation, that would be appreciated, as I've spent quite a while debugging this and fear that I might be making some beginner's mistake. If any additional info is needed, please let me know so I can provide it.

Best Answer

Data written at a rising clock edge is available directly after that edge only on the same port: the data input is internally forwarded to the data output of that RAM port. This is called WRITE_FIRST mode.

However, it is never forwarded to the output of the other RAM port, regardless of the specified WRITE_MODE. There it becomes available for reading (at another rising edge, of course) only after the internal write to the memory has completed. In your example that is simply the next rising clock edge, because the internal write time is always shorter than the minimum allowed clock period.

This behavior is described in XAPP463, "Using Block RAM in Spartan-3 Generation FPGAs", in the section "Dual-Port RAM Conflicts and Resolution". The example given there uses different clocks, but it also applies when the same clock is used for both ports.
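To make the timing concrete, here is a minimal behavioral sketch of what is described above. It is not the Xilinx simulation model: the module name `dp_ram_model` and all signal names are made up for illustration, and it assumes a single shared clock, port A in WRITE_FIRST mode, and port B used for reading only. In this simplified model port B returns the old contents when it reads the address that port A is writing at the same edge. On real hardware the colliding read may return old or invalid data depending on the configuration, so check XAPP463 rather than relying on the model for that case.

```verilog
// Simplified behavioral model (illustrative only, names are hypothetical).
// Port A: read/write, WRITE_FIRST. Port B: read only. One shared clock.
module dp_ram_model (
    input  wire        clk,
    input  wire        we_a,
    input  wire [9:0]  addr_a,
    input  wire [15:0] di_a,
    output reg  [15:0] do_a,
    input  wire [9:0]  addr_b,
    output reg  [15:0] do_b
);
    reg [15:0] mem [0:1023];

    always @(posedge clk) begin
        if (we_a) begin
            mem[addr_a] <= di_a;  // array is updated after this edge
            do_a        <= di_a;  // same port: input forwarded to output
        end else begin
            do_a <= mem[addr_a];
        end
        // Other port: samples the array as it was before this edge, so a
        // value written by port A at this edge is not visible here until
        // port B reads the same address again at a later edge.
        do_b <= mem[addr_b];
    end
endmodule
```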

This behaviour is still the same in current FPGAs from Xilinx and Altera.

Forwarding to the other RAM port has to be implemented by you, in the surrounding logic.
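One possible way to build that forwarding is sketched below, assuming both ports run from the same clock, port A is the only writer, and both ports are always enabled. The wrapper name `bram_b_bypass` and the signals `hit`, `dia_q` and `do_b_raw` are hypothetical; only the RAMB16_S18_S18 primitive and its ports come from the Xilinx library. The idea is to detect, at the clock edge, that port A is writing the address port B is reading, remember the written word, and substitute it for the RAM's port B output in the following cycle.

```verilog
// Hypothetical bypass wrapper around the primitive (a sketch, not a
// drop-in replacement for the original design).
module bram_b_bypass (
    input  wire        clk,
    input  wire        we_a,
    input  wire [9:0]  addr_a,
    input  wire [15:0] di_a,
    output wire [15:0] do_a,
    input  wire [9:0]  addr_b,
    output wire [15:0] do_b
);
    wire [15:0] do_b_raw;

    RAMB16_S18_S18 #(
        .WRITE_MODE_A("WRITE_FIRST"),
        .WRITE_MODE_B("WRITE_FIRST")
    ) ram (
        // Port A: read/write, parity inputs tied to zero
        .CLKA(clk), .ENA(1'b1), .SSRA(1'b0), .WEA(we_a),
        .ADDRA(addr_a), .DIA(di_a), .DIPA(2'b00),
        .DOA(do_a), .DOPA(),
        // Port B: read only
        .CLKB(clk), .ENB(1'b1), .SSRB(1'b0), .WEB(1'b0),
        .ADDRB(addr_b), .DIB(16'h0000), .DIPB(2'b00),
        .DOB(do_b_raw), .DOPB()
    );

    reg        hit   = 1'b0;      // port A wrote the address port B read
    reg [15:0] dia_q = 16'h0000;  // the word port A wrote at that edge

    always @(posedge clk) begin
        hit   <= we_a && (addr_a == addr_b);
        dia_q <= di_a;
    end

    // The RAM output is registered, so do_b_raw, hit and dia_q all refer
    // to the same clock edge and can be muxed combinationally.
    assign do_b = hit ? dia_q : do_b_raw;
endmodule
```

The extra multiplexer sits on the port B data path, so it adds some delay after the block RAM's clock-to-output time; whether that still meets a 50MHz constraint would have to be checked in the timing report.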