I am currently working on my first-ever Verilog project for my company.
I am a newly minted engineer so my experience is just academical.
Forgive me if my English is not perfect, I'll try to do my best.
I need to implement a UART communication on a Xilinx Virtex-7 FPGA.
At the moment, I have worked on both the receiver and the transmitter by themselves AND connected to each other generating a loopback communication.
Since we had 1 bit error every 50 or so, they asked me to implement a checksum control inside my unit.
I have previously had troubles with blocking vs nonblocking assignments, solved thanks to the awesome StackOverflow community.
At the moment, when I try to send my packet (10 bytes + a checksum, that I precalculate), my unit seems to receive the content correctly (I linked RX_states to GPIO Leds to check if there was activity), when I send the checksum it activates and send back…. a completely wrong string, except for exactly the first character.
I would like to paste the Whole code here, but it exceed the maximum space allowed, so I will truncate where the code has been debugged as best as possible.
In particular, at the moment I feel like the issue resides in the Receiver-checksum-buffer chain
module async_receiver(
input reset,
input clk,
input RxD,
input Rx_enable, //buf_available from the buffer
output wire RxD_ready, //data_trigger for the buffer
output wire [7:0] RxD_data
//output wire [2:0] Rx_state_scope // data received, valid only (for one clock cycle) when RxD_ready is asserted
);
//functional parameters
parameter ClkFrequency = 200000000; // 200MHz
parameter Baud = 9600;
parameter Oversampling = 16;
reg RxD_ready_internal;
assign RxD_ready = RxD_ready_internal;
//state encoding
localparam
RX_IDLE = 3'b000,
RX_START = 3'b001,
RX_BITS_1 = 3'b010,
RX_BITS_2 = 3'b011,
RX_BITS_3 = 3'b100,
RX_BITS_4 = 3'b101,
RX_STOP = 3'b110,
RX_INIT = 3'b111;
//support variables
reg [7:0] RxD_data_int;
reg [2:0] Rx_state; //3 bits for 6 states
integer cnt_0, cnt_1; //input values for my variables
integer bit_counter; //bit counter variable
integer delay_cnt;
reg RxD_int;
integer i; //vector index for output data
//trigger recognition
wire START_Trigger;
StartDetectionUnit SDU ( .clk(clk), .state (Rx_state), .signal_in (RxD), .trigger (START_Trigger));
//tick gen
wire OversampledTick;
BaudTickGenOvers #(ClkFrequency, Baud, Oversampling) as (.reset(reset), .clk(clk), .trigger(START_Trigger), .tick(OversampledTick)); //generates Bitticks for recognition
//wire assignments
// assign Rx_state_scope = Rx_state;
assign RxD_data = RxD_data_int;
always @(posedge clk) begin
if (reset)
begin
RxD_ready_internal <= 0;
RxD_int <= 1'b0;
RxD_data_int <= 8'b00000000;
Rx_state <= RX_IDLE;
end
else if (Rx_enable)
case (Rx_state)
RX_INIT:
begin
i <= 0;
delay_cnt <= 0;
bit_counter <= 0;
cnt_0 <= 0;
cnt_1 <= 0;
Rx_state <= RX_IDLE;
end
RX_IDLE:
begin
//reinitialization after each cycle
RxD_ready_internal <= 0;
RxD_int <= 1'b0;
i <= 0;
delay_cnt <= 0;
bit_counter <= 0;
cnt_0 <= 0;
cnt_1 <= 0;
if (START_Trigger) Rx_state <= RX_START; //when RxD is received as zero during an oversampling tick, it starts the state machine
end
RX_START:
begin
if (OversampledTick)
begin
//initialize the length of the word
bit_counter <= 7;
if (delay_cnt > 17 ) //wait for a bit period + an initial delay to avoid sampling on the edge of the bit --> 16 + 2 ticks
begin
delay_cnt <= 0;
Rx_state <= RX_BITS_1;
end
//if 16 ticks have not passed, it increments the counter and stays in the same state
else begin
delay_cnt <= delay_cnt+1;
Rx_state <= RX_START;
end
end
end
RX_BITS_1:
begin
if (OversampledTick)
begin
if (delay_cnt < 1 ) //i want to execute the recognition ONLY at soon as I change state not for every wait tick
begin
RxD_int <= RxD; //copy the received bit into an internal "buffer"
if (RxD_int == 1) cnt_1 <= cnt_1+1;
else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
end
if (delay_cnt > 2) //4 ticks delay
begin
delay_cnt <= 0;
Rx_state <= RX_BITS_2; //wait for a few "baud ticks" so the samples are equally interspaced
end
else begin
delay_cnt <= delay_cnt+1; //if the wait time has not been completed it stays in the same state
Rx_state <= RX_BITS_1;
end
end
end
RX_BITS_2:
begin
if (OversampledTick) //I think this is the most important thing to verify about actual functionality
begin
if (delay_cnt < 1) //i want to execute the recognition ONLY at soon as I change state not for every wait tick
begin
RxD_int <= RxD; //copy the received bit into an internal "buffer"
// $display ("seconda acquisizione avviene al tempo", $time);
if (RxD_int == 1) cnt_1 <= cnt_1+1;
else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
end
if (delay_cnt > 2) //4 ticks delay
begin
delay_cnt <= 0;
Rx_state <= RX_BITS_3; //wait for a whole "baud cycle"
end
else begin
delay_cnt <= delay_cnt+1;
Rx_state <= RX_BITS_2;
end
end
end
RX_BITS_3:
begin
if (OversampledTick) //I think this is the most important thing to verify about actual functionality
begin
if (delay_cnt < 1)
begin
RxD_int <= RxD;
// $display ("terza acquisizione avviene al tempo", $time);
if (RxD_int == 1) cnt_1 <= cnt_1+1;
else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
end
if (delay_cnt > 2) //4 ticks delay
begin
delay_cnt <= 0;
Rx_state <= RX_BITS_4; //wait for a whole "baud cycle"
end
else begin
delay_cnt <= delay_cnt+1;
Rx_state <= RX_BITS_3;
end
end
end
RX_BITS_4:
begin
if (OversampledTick) //I think this is the most important thing to verify about actual functionality
begin
if (delay_cnt < 1)
begin
RxD_int <= RxD;
// $display ("quarta acquisizione avviene al tempo", $time);
if (RxD_int == 1) cnt_1 <= cnt_1+1;
else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
end
if (delay_cnt > 2) //4 ticks delay - one tick dedicated to writing
begin
//WRITE FUNCTION
//after checking whether there were 1s or 0s recognized, I write the appropriate value
//and increment the output index
if (cnt_1 >= 3)
begin
RxD_data_int[i] <= 1;
// $display ("OUTPUT_UART bit %b all'indice numero %d al tempo: ", RxD_data[i], i, $time);
i<=i+1;
end
else if (cnt_0 >=3)
begin
RxD_data_int[i] <= 0;
// $display ("OUTPUT_UART bit %b all'indice numero %d al tempo: ", RxD_data[i], i, $time);
i<=i+1;
end
else begin //error case when neither a 0 nor a 1 is recognized
// $display ("Frame Error!");
Rx_state <= RX_IDLE;
end
//END OF THE BYTE DESCRIPTION
if (bit_counter == 0) begin
delay_cnt <= 0;
// $display ("Detecto la fine del byte al tempo", $time);
Rx_state <= RX_STOP;
end
else begin //modify the RX_BITS variables here instead of an in an iterative state
delay_cnt <= 0;
cnt_0 <= 0;
cnt_1 <= 0;
bit_counter <= bit_counter -1;
Rx_state <= RX_BITS_1; //i have waited for the correct amount of time but the bit wasn't the last one
end
end
else begin //wait case when I have not reached the end of the 16 ticks
delay_cnt <= delay_cnt+1; //delay count
// $display ("valore del delay_cnt %d al tempo:", delay_cnt, $time);
Rx_state <= RX_BITS_4;
end
end
end
RX_STOP:
begin
if (OversampledTick)
begin
RxD_ready_internal <= 1;
// $display ("RX RECEIVED: %b aka %c @t=", RxD_data_int, RxD_data_int, $time);
Rx_state <= RX_IDLE; //wait for a whole "baud cycle"
end
end
default: Rx_state <= RX_IDLE;
endcase
end
endmodule
module checksum256 (
input reset,
input [7:0]data, //RxD_data is the data necessary for checksum
input enable, //RxD_ready flag enables the checksum
output reg sum_ok //flag to signal that my operation went correctly [to the control unit and to the transmission buffer unit]
);
//21.09 SUM_RESULT_INTERNAL becomes SUM_OK directly
reg [7:0]sum_value; //counting the sum
integer m; // count the # of byte till i am inside the control byte
reg sum_ok_internal;
// assign sum_ok = sum_ok_internal;
always @ (posedge enable or posedge reset) //triggers whenever enable (RxD_ready)signal has a rising edge
begin
if (reset)
begin
sum_value = 8'b00000000;
m = 0;
sum_ok = 0;
end
//11th byte is my control byte
else if (enable)
begin if (m == 10)
begin
//ROUTINE DI CONFRONTO DELLA SOMMA
if (sum_value == data) sum_ok = 1;
else sum_ok = 0;
// $display ("checksum_result vale %b al tempo: ", sum_ok, $time);
sum_value = 0; //reset the sum value
m = 0; //reset the word counter
end
else begin
sum_value = sum_value + data;
// $display ("sum_value vale %b al tempo:", sum_value, $time);
m = m + 1;
sum_ok =0;
end
end
end
endmodule
module output_Dbuffer (
input reset,
input [7:0]input_data, //data coming from the receiver
input data_trigger, //RxD_ready flag tells me that I can copy the data
input checksum_trigger, //SUM_OK flag from Checksum unit is the trigger for the output buffer
input RTS_n, //TX_BUSY flag of the transmitter is my ready to send flag
input clk, //ck needed for the FSM
output wire TX_trigger, //TX_START flag of the transmitter now comes from THIS unit instead of Receiver
output wire [7:0]TX_data, //byte for transmission
output wire buf_available ,
output wire [2:0]buf_state_scope
);
//internal variables
reg [7:0] mem[0:9]; //memory init, 10 * 8 bit locations
integer m, n, i, j ; //M = row [a.k.a. bytes], N = column [a.k.a. single bits]
reg TX_trigger_int;
reg [7:0] TX_data_int;
reg buf_available_flag;
reg sum256_ok, checksum_sent;
reg [7:0]checksum_buff ;
//buffer FSM required variables
localparam //state enumeration declaration
BUF_IDLE = 3'b000,
BUF_START = 3'b001,
BUF_BYTES = 3'b010,
BUF_BUSY = 3'b011,
BUF_TX_CHECKSUM = 3'b100,
BUF_INIT = 3'b101;
reg [2:0] buf_state; //2 bits for 4 states
//static assignments of OUTPUTS : Transmission Flag and Transmission Data (content)
assign TX_trigger = TX_trigger_int;
assign TX_data = TX_data_int;
//signal for sending outside of the system my buf state bits
assign buf_state_scope = buf_state;
//flag to signal if the buffer is available or busy
assign buf_available = buf_available_flag;
//Block for copying my input data inside the memory
always @(data_trigger or reset)
begin
if (reset)
begin
for (n = 0; n < 10; n = n+1) mem [n]= 8'h00;
i=0;
checksum_buff = 8'h00;
end
else if (data_trigger)
begin
if ( i < 10)
begin
mem[i] = input_data;
// $display ("mem[%d ] vale %b al tempo:", i, mem[i],$time);
i = i + 1; //the increment happens at the end of current iteration so it doesn't re-iterate on itself
end
else
begin
checksum_buff = input_data; //copy the checksum inside my buffer
// $display("checksum_buff vale %b al tempo:", checksum_buff, $time);
i = 0; //the increment happens at the end of current iteration so it doesn't re-iterate on itself
// for (n = 0; n < 10; n = n+1)
// $display ("mem[%d] = %b aka %c", n, mem[n], mem[n]);
// $display ("checksum = %b aka %c", checksum_buff, checksum_buff);
end
end
end
//Block for continuously checking the checksum flag
always @(*)
begin
sum256_ok =0;
if (reset) sum256_ok = 0;
else if (checksum_trigger == 1) sum256_ok = 1;
else if (checksum_sent == 1) sum256_ok = 0;
end
//Block for transmitting [here I manage the TX_Data and TX_Trigger functionality]
always @(posedge clk)
begin
if (reset)
begin
buf_state <= BUF_INIT;
TX_trigger_int <= 0;
TX_data_int <= 8'b00000000;
buf_available_flag <= 1;
end
else case (buf_state)
BUF_INIT:
begin
buf_state <= BUF_IDLE;
checksum_sent <=0;
j = 0;
end
BUF_IDLE:
begin
TX_trigger_int <= 0;
TX_data_int <= 8'b00000000;
m =0;
n =0;
j =0;
checksum_sent <=0;
if (sum256_ok)
begin
//check if the TX is not busy
if (RTS_n == 0) buf_state <= BUF_START;
end
end
BUF_START:
begin
TX_trigger_int <=0;
// if ((i == 0) || ( (j - i) > 1 ))
buf_state <= BUF_BYTES;
// else begin
// $display ("BUFFER BUSY @time:", $time);
// buf_available_flag <= 0;
// buf_state <= BUF_BUSY;
// end
end
BUF_BYTES:
begin
//check if the TX is busy
if (RTS_n==0)
begin
if (j > 9)
begin
TX_trigger_int <= 0;
buf_state <= BUF_TX_CHECKSUM;
end
else begin
TX_data_int <= mem[j];
// $display ("BUFFER: %b -> TX", mem[j]);
TX_trigger_int <= 1;
j <= j+1;
buf_state <= BUF_START;
end
end
end
BUF_BUSY:
begin
if (RTS_n == 0)
begin
// $display ("BUFFER AVAILABLE AGAIN @time:", $time);
buf_state <= BUF_START;
buf_available_flag <= 1;
end
end
BUF_TX_CHECKSUM:
begin
if (RTS_n==0) begin
TX_data_int <= checksum_buff;
checksum_sent <= 1;
TX_trigger_int <= 1;
buf_state <= BUF_IDLE;
end
end
default: buf_state = BUF_IDLE;
endcase
end
endmodule
Best Answer
As I stated in my comment,
mem
is an inferred level-sensitive latch (for rest of the answer I will use latch for anything level-sensitive, and flop for anything edge-sensitive). Latches are inferred in always blocks that are not edge-triggered (posedge
/negedge
) and is not assigned to a deterministic value if all logical branches.Here is your original logic reformatted with my comments:
Except for synchronous logic, synthesizers do not look at the sensitivity list. Therefore
input_data
andi
are actually part of the sensitivity list during synthesis. Simulation does use the sensitivity list, therefore it will ignore changes toinput_data
andi
. This is a place where simulation and synthesis will mismatch.i
is part of the real sensitivity and it is updated on every pass of the always block. This creates an asynchronous feedback willdata_trigger
is high andreset
is low. If you were to have addedi
to your sensitivity list, your simulation would likely hang from an infinite loop. On FPGA there is a slight propagation delay so you will results of seemingly random data. If you were measuring power you would also see high power consumption in this condition.If you look through your log files there may be a warning about the asynchronous feedback.
i
needs to be a flop.mem
andchecksum_buff
should be flops too. From an eyeball review of your code, it looks like everything is running on theposedge clk
domain. Therefore I suggest you change the code to:With this reset will also be synchronous. If you really want asynchronous reset, change it to
always @(posedge clk or posedge reset)
. If you really wantdata_trigger
to be the clocking event (I don't recommend it) then change theposedge clk
toposedge data_trigger
AND replaceelse if (data_trigger)
withelse
. Anyposedge
/negedge
signal referenced in the block is treated as an asynchronous trigger. You have this issue in yourchecksum256
module.Verilog combinational logic should be written in
always @*
or the synonymousalways @(*)
. You will run into code that give a list of signals instead of a literal*
; don't do that. That style is for legacy Verilog-1995. Since Verilog-2001@*
/@(*)
is the preferred strategy to limit the risk an incomplete sensitivity list. The other thing for proper combinational logic is that each left-hand-side (LHS) signal must be assigned to a determinate value for all possible logic branches. Assigning it to itself doesn't count. Assigning it a determinate value at the top of an always block before anif
orcase
gives it a default assignment. Combinational logic should be assigned with blocking statements (=
).Synchronous logic must be have a clocking trigger
posedge
ornegedge
. Do not reference the clock signal in the body of the of the always block. Up to two additionalposedge
ornegedge
signals can be in the sensitivity list for asynchronous reset/preset, and asynchronous must assign registers to constants. Not all FPGAs support flops with asynchronous reset/preset; some have a limited number. Check the manual for your board. Synchronous logic should be assigned with non-blocking statements (<=
).Latches usually should be avoided. Unfortunately they can be unintentionally inferred with what should be combinational logic. A latch happens when not all logical branches assign the signal to a deterministic value. If you need a latch, keep it logic simple and assign with non-blocking.
If you have SystemVerilog you can be more explicit by replacing
always @*
withalways_comb
andalways_latch
. This way your synthesizer can do a better job warning you about unintentional latches.FYI: Verilog non-blocking was inspired/copied form VHDL. The usage of blocking and non-blocking should be similar between the two.