Have I understood that situation correctly?
Yes - if some part of your output data is available later than other parts, you have to delay the other parts so they line up.
It's not a fudge, or a "bad" thing to do - it's just what has to be done to make the outputs right.
I could probably buffer the sync pulses too and delay them in the same way.
That's what I'd do. (EDIT: And as Yann reminded me, delaying signals can be very cheap in Xilinx FPGAs: a 16-tick delay can fit in a single look-up table, plus 1 more tick in the flip-flop next to the LUT.)
Or I could pre-adjust the calculated memory address to compensate in advance.
That's another option, but will probably take more logic.
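To make the delay approach concrete, here is a minimal sketch of a delay line (code not tested). The module name, the WIDTH and DEPTH parameters and the port names are all made up for illustration; set DEPTH to however many clock cycles the late part of your data lags behind.

```verilog
// Hypothetical parameterised delay line; names, widths and depth are illustrative.
module delay_line #(
    parameter WIDTH = 1,   // bits per sample (e.g. 2 for a pair of sync pulses)
    parameter DEPTH = 4    // number of clock cycles of delay; must be >= 1
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] din,
    output wire [WIDTH-1:0] dout
);
    // One register stage per clock cycle of delay.
    reg [WIDTH-1:0] stage [0:DEPTH-1];
    integer i;

    always @(posedge clk) begin
        stage[0] <= din;                  // newest sample enters the line
        for (i = 1; i < DEPTH; i = i + 1)
            stage[i] <= stage[i-1];       // everything else moves one step along
    end

    // The sample that entered DEPTH clock edges ago.
    assign dout = stage[DEPTH-1];
endmodule
```

There is deliberately no reset here. As far as I know, leaving the reset out is what usually lets the Xilinx tools pack a chain like this into shift-register LUTs rather than individual flip-flops, but check the synthesis report to see what you actually got.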
Verilog offers three system functions related to the simulation time:
$time returns the current time as a 64-bit integer, scaled to the time unit defined by the `timescale directive.
$stime returns the same value as a 32-bit integer (only the low-order 32 bits if the time does not fit).
$realtime returns the same value as a real number.
However, none of these system functions are likely to be useful for synthesis. They would normally be implemented by special code in the simulator that accesses variables in the simulator program.
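For simulation only, a small testbench like the following shows the three side by side (code not tested). The module name and the `timescale values are just examples I picked so that the delay has a fractional part.

```verilog
`timescale 1ns / 100ps

// Simulation-only demo; the module name and delay value are illustrative.
module time_demo;
    initial begin
        #12.5;  // advance simulation time by 12.5 ns
        $display("time     = %0d", $time);      // integer, in units of 1 ns
        $display("stime    = %0d", $stime);     // same value, 32-bit
        $display("realtime = %0.1f", $realtime); // real, keeps the fraction
        $finish;
    end
endmodule
```

The integer versions cannot represent the half nanosecond, so they report a rounded value, while $realtime keeps the fractional part.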
To keep track of time in a real circuit, you need to start with an input clock with a known frequency. For good accuracy you'd generate this with a crystal oscillator circuit, which would be entirely outside your device if you're working on an FPGA.
Then you simply build a counter to keep track of ticks of your clock (note code not tested):
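module time_counter(input clk, input rst, output reg [31:0] ticks);
    // Count rising edges of clk; an asynchronous reset clears the count.
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            ticks <= 32'h00000000;
        end
        else begin
            ticks <= ticks + 1;  // wraps to zero after 2^32 - 1
        end
    end
endmodule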
There is a limit to how fast a counter like this can operate, and various tricks to build faster counters if needed.
If you need to find the period of an input with an unknown period, you need to compare it to a clock with a known period. It's simplest conceptually to make the known clock much faster than the unknown clock and count ticks of the known clock for each period of the unknown clock. If that's not possible, you can divide the unknown clock by some factor (say 16 or 128 or 1024) and count ticks of the known clock for each cycle of the divider to work out the unknown period.
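As a rough sketch of the first approach (code not tested), the module below counts ticks of the known clock between successive rising edges of the unknown input. All names here are hypothetical, and it assumes the known clock is much faster than the input being measured.

```verilog
// Hypothetical period meter; module, port and signal names are illustrative.
module period_counter (
    input  wire        clk,     // known reference clock, assumed much faster than sig
    input  wire        rst,
    input  wire        sig,     // input with unknown period, asynchronous to clk
    output reg  [31:0] period,  // last measured period, in clk ticks
    output reg         valid    // high for one clk cycle when period is updated
);
    reg [2:0]  sync;       // synchroniser stages plus one bit of edge history
    reg [31:0] count;      // clk ticks since the last rising edge of sig
    reg        seen_edge;  // set once the first rising edge has been seen

    // Rising edge of the synchronised input.
    wire rising = sync[1] & ~sync[2];

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            sync      <= 3'b000;
            count     <= 32'd0;
            period    <= 32'd0;
            valid     <= 1'b0;
            seen_edge <= 1'b0;
        end
        else begin
            sync  <= {sync[1:0], sig};
            valid <= 1'b0;
            if (rising) begin
                // Only report once a full period has been counted.
                if (seen_edge) begin
                    period <= count + 1;
                    valid  <= 1'b1;
                end
                seen_edge <= 1'b1;
                count     <= 32'd0;
            end
            else begin
                count <= count + 1;
            end
        end
    end
endmodule
```

Because the input is sampled by clk, each measurement can be off by about one tick of the known clock. For the divider variant you would feed the divided-down input to sig and divide the reported count by the same factor.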
In this case, the execution order should be as follows (in this exact order):
What you are running into is confusion about blocking vs non-blocking assignment. This is a nice example of why it is called non-blocking assignment. When you hit the line
    #5 window[i]<=douta;
the simulator will wait 5 time units, then schedule window[i] to take on the value of douta at the end of the current time step. Because you used the non-blocking assignment operator, the execution of the always block continues to the next statement, which is the case statement. Since the assignments there are blocking assignments, they are executed immediately. Then the #5 is hit (after the case), so the scheduler schedules the next execution for 5 time units in the future. Then it moves on to anything else it needs to do. Finally, it comes back (still within time step X+5) and completes all the non-blocking assignments (like window[i]<=douta). That's why you are seeing what looks like out-of-order execution of your always block.
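The same effect can be reproduced in a few lines. This is a stripped-down sketch, not your code: the module and signal names (a, b, src) are made up for the demonstration.

```verilog
// Simulation-only demo of non-blocking vs blocking ordering; names are illustrative.
module nba_demo;
    reg [7:0] a, b, src;

    initial begin
        src = 8'd7;
        a   = 8'd0;
        b   = 8'd0;

        #5 a <= src;  // non-blocking: the update of a is deferred to the end of this time step
        b = a;        // blocking: runs immediately, so it still reads the old value of a
        $display("t=%0d same step: a=%0d b=%0d", $time, a, b);

        #5;           // by now the deferred update has taken effect
        $display("t=%0d later:     a=%0d b=%0d", $time, a, b);
        $finish;
    end
endmodule
```

The first $display reports a=0 and b=0 at time 5, because the non-blocking update has only been scheduled at that point. The second reports a=7 and b=0 at time 10: a was updated at the end of time step 5, after b had already copied the old value.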