Electrical – Verilog Vivado UART design : implementation / bitstream issues

fpgahdlverilogvivadoxilinx

I am currently working on my first-ever Verilog project for my company.
I am a newly minted engineer so my experience is just academical.
Forgive me if my English is not perfect, I'll try to do my best.

I need to implement a UART communication on a Xilinx Virtex-7 FPGA.
At the moment, I have worked on both the receiver and the transmitter by themselves AND connected to each other generating a loopback communication.
Since we had 1 bit error every 50 or so, they asked me to implement a checksum control inside my unit.

I have previously had troubles with blocking vs nonblocking assignments, solved thanks to the awesome StackOverflow community.

At the moment, when I try to send my packet (10 bytes + a checksum, that I precalculate), my unit seems to receive the content correctly (I linked RX_states to GPIO Leds to check if there was activity), when I send the checksum it activates and send back…. a completely wrong string, except for exactly the first character.

I would like to paste the Whole code here, but it exceed the maximum space allowed, so I will truncate where the code has been debugged as best as possible.

In particular, at the moment I feel like the issue resides in the Receiver-checksum-buffer chain

module async_receiver(
    input reset,
    input clk,
    input RxD, 
    input Rx_enable, //buf_available from the buffer
    output wire RxD_ready, //data_trigger for the buffer
    output wire [7:0] RxD_data
    //output wire [2:0] Rx_state_scope // data received, valid only (for one clock cycle) when RxD_ready is asserted   
    );
    //functional parameters
    parameter ClkFrequency = 200000000; // 200MHz
    parameter Baud = 9600;
    parameter Oversampling = 16;

    reg RxD_ready_internal;

    assign RxD_ready = RxD_ready_internal;


    //state encoding
    localparam  
    RX_IDLE = 3'b000, 
    RX_START = 3'b001,  
    RX_BITS_1 = 3'b010,
    RX_BITS_2 = 3'b011,
    RX_BITS_3 = 3'b100,
    RX_BITS_4 = 3'b101,
    RX_STOP = 3'b110,
    RX_INIT = 3'b111;

    //support variables
    reg [7:0]  RxD_data_int;  
    reg [2:0] Rx_state; //3 bits for 6 states
    integer cnt_0, cnt_1; //input values for my variables
    integer bit_counter;   //bit counter variable
    integer delay_cnt;
    reg RxD_int;
    integer i; //vector index for output data

    //trigger recognition
    wire START_Trigger;
    StartDetectionUnit SDU ( .clk(clk), .state (Rx_state), .signal_in (RxD), .trigger (START_Trigger));

    //tick gen
    wire OversampledTick;        
    BaudTickGenOvers #(ClkFrequency, Baud, Oversampling) as (.reset(reset), .clk(clk), .trigger(START_Trigger), .tick(OversampledTick)); //generates Bitticks for recognition

    //wire assignments     
  //  assign Rx_state_scope = Rx_state;
    assign RxD_data = RxD_data_int;



    always @(posedge clk) begin


        if (reset) 
            begin
                 RxD_ready_internal <= 0;
                 RxD_int <= 1'b0;
                 RxD_data_int <= 8'b00000000;
                 Rx_state <= RX_IDLE;

            end
            else if (Rx_enable)
                        case (Rx_state)


                        RX_INIT:
                                begin
                                    i <= 0;
                                    delay_cnt <= 0;
                                    bit_counter <= 0;
                                    cnt_0 <= 0;
                                    cnt_1 <= 0;
                                    Rx_state <= RX_IDLE;

                                end


                        RX_IDLE: 
                                begin
                                    //reinitialization after each cycle
                                    RxD_ready_internal <= 0;
                                    RxD_int <= 1'b0;
                                    i <= 0;
                                    delay_cnt <= 0;
                                    bit_counter <= 0;
                                    cnt_0 <= 0;
                                    cnt_1 <= 0;

                                        if (START_Trigger) Rx_state <= RX_START; //when RxD is received as zero during an oversampling tick, it starts the state machine

                                end                                                              


                        RX_START:
                                begin

                                if (OversampledTick)

                                   begin
                                            //initialize the length of the word
                                            bit_counter <= 7;
                                            if (delay_cnt > 17 ) //wait for a bit period + an initial delay to avoid sampling on the edge of the bit --> 16 + 2 ticks 
                                                begin
                                                    delay_cnt <= 0;
                                                    Rx_state <= RX_BITS_1;
                                                end
                                            //if 16 ticks have not passed, it increments the counter and stays in the same state
                                            else begin 
                                                 delay_cnt <= delay_cnt+1;
                                                 Rx_state <= RX_START;
                                                 end
                                   end
                                end     


                        RX_BITS_1:
                                    begin

                                        if (OversampledTick) 
                                        begin

                                            if (delay_cnt < 1 ) //i want to execute the recognition ONLY at soon as I change state not for every wait tick
                                                begin
                                                    RxD_int <= RxD; //copy the received bit into an internal "buffer"
                                                    if (RxD_int == 1) cnt_1 <= cnt_1+1;
                                                    else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
                                                end           
                                            if (delay_cnt > 2) //4 ticks delay
                                            begin
                                                delay_cnt <= 0;
                                                Rx_state <= RX_BITS_2; //wait for a few "baud ticks" so the samples are equally interspaced
                                            end
                                            else begin 
                                                     delay_cnt <= delay_cnt+1; //if the wait time has not been completed it stays in the same state
                                                     Rx_state <= RX_BITS_1;
                                                 end
                                            end
                                        end


                        RX_BITS_2:
                                begin
                                if (OversampledTick) //I think this is the most important thing to verify about actual functionality     
                                    begin
                                        if (delay_cnt < 1) //i want to execute the recognition ONLY at soon as I change state not for every wait tick
                                            begin
                                                RxD_int <= RxD; //copy the received bit into an internal "buffer"
                                   //        $display ("seconda acquisizione avviene al tempo", $time);
                                                if (RxD_int == 1) cnt_1 <= cnt_1+1;
                                                else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
                                            end                
                                        if (delay_cnt > 2) //4 ticks delay
                                        begin
                                            delay_cnt <= 0;
                                            Rx_state <= RX_BITS_3; //wait for a whole "baud cycle"
                                        end
                                        else begin
                                             delay_cnt <= delay_cnt+1;
                                             Rx_state <= RX_BITS_2;
                                             end
                                    end
                                end

                        RX_BITS_3:

                                begin
                                    if (OversampledTick) //I think this is the most important thing to verify about actual functionality     
                                    begin
                                        if (delay_cnt < 1)
                                            begin
                                                RxD_int <= RxD;
                                    //          $display ("terza acquisizione avviene al tempo", $time);
                                                if (RxD_int == 1) cnt_1 <= cnt_1+1;
                                                else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
                                            end                 
                                        if (delay_cnt > 2) //4 ticks delay
                                        begin
                                            delay_cnt <= 0;
                                            Rx_state <= RX_BITS_4; //wait for a whole "baud cycle"
                                        end
                                        else begin
                                             delay_cnt <= delay_cnt+1;
                                             Rx_state <= RX_BITS_3; 
                                             end                       
                                    end
                                end

                        RX_BITS_4:
                                begin
                                if (OversampledTick) //I think this is the most important thing to verify about actual functionality     
                                    begin
                                    if (delay_cnt < 1)
                                        begin
                                            RxD_int <= RxD;
                               //             $display ("quarta acquisizione avviene al tempo", $time);                                  
                                            if (RxD_int == 1) cnt_1 <= cnt_1+1;
                                            else if (RxD_int == 0) cnt_0 <= cnt_0 +1;
                                        end                
                                    if (delay_cnt > 2) //4 ticks delay - one tick dedicated to writing
                                    begin

                                        //WRITE FUNCTION
                                        //after checking whether there were 1s or 0s recognized, I write the appropriate value
                                        //and increment the output index
                                        if (cnt_1 >= 3) 
                                            begin
                                                RxD_data_int[i] <= 1;
                               //                 $display ("OUTPUT_UART bit %b all'indice numero %d al tempo: ", RxD_data[i], i, $time);
                                                i<=i+1;
                                            end

                                        else if (cnt_0 >=3)
                                            begin
                                                RxD_data_int[i] <= 0;
                               //                 $display ("OUTPUT_UART bit %b all'indice numero %d al tempo: ", RxD_data[i], i, $time);
                                                i<=i+1; 
                                            end                                

                                        else begin //error case when neither a 0 nor a 1 is recognized
                                         //   $display ("Frame Error!"); 
                                            Rx_state <= RX_IDLE;
                                                 end   

                                         //END OF THE BYTE DESCRIPTION
                                        if (bit_counter == 0) begin 
                                           delay_cnt <= 0;
                                 //          $display ("Detecto la fine del byte al tempo", $time);
                                           Rx_state <= RX_STOP; 
                                           end   
                                           else begin //modify the RX_BITS variables here instead of an in an iterative state
                                                delay_cnt <= 0;
                                                cnt_0 <= 0;
                                                cnt_1 <= 0;
                                                bit_counter <= bit_counter -1;
                                                Rx_state <= RX_BITS_1;    //i have waited for the correct amount of time but the bit wasn't the last one                              
                                                end

                                    end

                                    else begin //wait case when I have not reached the end of the 16 ticks
                                             delay_cnt <= delay_cnt+1; //delay count
                                           //  $display ("valore del delay_cnt %d al tempo:", delay_cnt, $time);
                                             Rx_state <= RX_BITS_4;   
                                         end 


                                end
                            end

                        RX_STOP: 
                                begin

                                if (OversampledTick) 
                                    begin
                                            RxD_ready_internal <= 1;
                                           // $display ("RX RECEIVED: %b aka %c @t=", RxD_data_int, RxD_data_int, $time);
                                            Rx_state <= RX_IDLE; //wait for a whole "baud cycle"
                                        end

                        end                


                        default: Rx_state <= RX_IDLE;
                        endcase
                    end


endmodule   

module checksum256 (
    input reset,
    input [7:0]data, //RxD_data is the data necessary for checksum
    input enable, //RxD_ready flag enables the checksum
    output reg sum_ok //flag to signal that my operation went correctly [to the control unit and to the transmission buffer unit]
    );

    //21.09 SUM_RESULT_INTERNAL becomes SUM_OK directly

    reg [7:0]sum_value; //counting the sum
    integer m; // count the # of byte till i am inside the control byte    
    reg sum_ok_internal;

   // assign sum_ok = sum_ok_internal;    

always @ (posedge enable or posedge reset) //triggers whenever enable (RxD_ready)signal has a rising edge
       begin
        if (reset) 
            begin
                sum_value = 8'b00000000;
                m = 0;
                sum_ok = 0;
            end
           //11th byte is my control byte
           else if (enable)
                begin if (m == 10) 
                    begin
                    //ROUTINE DI CONFRONTO DELLA SOMMA
                        if (sum_value == data) sum_ok = 1;
                        else sum_ok = 0;
                      //  $display ("checksum_result vale %b al tempo: ", sum_ok, $time);
                            sum_value = 0; //reset the sum value
                            m = 0; //reset the word counter    
                    end
                   else begin
                            sum_value = sum_value + data;
                           // $display ("sum_value vale %b al tempo:", sum_value, $time);
                            m = m + 1;
                            sum_ok =0;

                   end 
              end     
       end

endmodule

module output_Dbuffer (
    input reset,
    input [7:0]input_data, //data coming from the receiver
    input data_trigger, //RxD_ready flag tells me that I can copy the data
    input checksum_trigger, //SUM_OK flag from Checksum unit is the trigger for the output buffer
    input RTS_n, //TX_BUSY flag of the transmitter is my ready to send flag
    input clk, //ck needed for the FSM
    output wire TX_trigger, //TX_START flag of the transmitter now comes from THIS unit instead of Receiver
    output wire [7:0]TX_data, //byte for transmission
    output wire buf_available ,

    output wire [2:0]buf_state_scope
    );

    //internal variables
    reg [7:0] mem[0:9]; //memory init, 10 * 8 bit locations
    integer m, n, i, j ; //M = row [a.k.a. bytes], N = column [a.k.a. single bits]

    reg TX_trigger_int;
    reg [7:0] TX_data_int; 
    reg buf_available_flag;
    reg sum256_ok,  checksum_sent;        
    reg [7:0]checksum_buff ;

    //buffer FSM required variables 
    localparam  //state enumeration declaration
        BUF_IDLE = 3'b000, 
        BUF_START = 3'b001,
        BUF_BYTES = 3'b010,  
        BUF_BUSY = 3'b011,
        BUF_TX_CHECKSUM = 3'b100,
        BUF_INIT = 3'b101;

    reg [2:0] buf_state; //2 bits for 4 states

    //static assignments of OUTPUTS : Transmission Flag and Transmission Data (content)
    assign TX_trigger = TX_trigger_int;
    assign TX_data = TX_data_int;

    //signal for sending outside of the system my buf state bits
    assign buf_state_scope = buf_state;
    //flag to signal if the buffer is available or busy
    assign buf_available = buf_available_flag;


  //Block for copying my input data inside the memory    
    always @(data_trigger or reset)
                begin
                  if (reset)   
                    begin
                        for (n = 0; n < 10; n = n+1) mem [n]= 8'h00;
                        i=0;
                        checksum_buff = 8'h00;
                    end     
                   else if (data_trigger)
                        begin 
                            if ( i < 10) 
                            begin
                                mem[i] = input_data;
                              //  $display ("mem[%d ] vale %b al tempo:", i, mem[i],$time);
                                i = i + 1; //the increment happens at the end of current iteration so it doesn't re-iterate on itself
                            end
                        else 
                            begin
                                checksum_buff = input_data; //copy the checksum inside my buffer
                              //  $display("checksum_buff vale %b al tempo:", checksum_buff, $time);
                                i = 0; //the increment happens at the end of current iteration so it doesn't re-iterate on itself

                                 // for (n = 0; n < 10; n = n+1)
                                // $display ("mem[%d] = %b aka %c", n, mem[n], mem[n]);

                                // $display ("checksum = %b aka %c", checksum_buff, checksum_buff);
                            end
                       end
                end


//Block for continuously checking the checksum flag                                   
   always @(*)
    begin
         sum256_ok =0;
         if (reset) sum256_ok = 0;
         else if (checksum_trigger == 1) sum256_ok = 1;        
         else if (checksum_sent == 1) sum256_ok = 0;

    end    


    //Block for transmitting [here I manage the TX_Data and TX_Trigger functionality]
always @(posedge clk)
        begin
            if (reset)
                begin
                        buf_state <= BUF_INIT;
                        TX_trigger_int <= 0;
                        TX_data_int <= 8'b00000000;
                        buf_available_flag <= 1;        
                end

            else case (buf_state)

                BUF_INIT: 
                    begin
                            buf_state <= BUF_IDLE;
                            checksum_sent <=0;
                            j = 0;
                    end


                BUF_IDLE:
                    begin
                        TX_trigger_int <= 0;
                        TX_data_int <= 8'b00000000;
                        m =0;
                        n =0;
                        j =0; 
                        checksum_sent <=0;
                        if (sum256_ok) 
                            begin
                                //check if the TX is not busy
                                if (RTS_n == 0) buf_state <= BUF_START;
                            end


                    end

                BUF_START:
                        begin
                            TX_trigger_int <=0;

                           // if ((i == 0) || ( (j - i) > 1 ))

                            buf_state <= BUF_BYTES;
                               //     else begin
                                      //  $display ("BUFFER BUSY @time:", $time);
                                //        buf_available_flag <= 0;
                               //         buf_state <= BUF_BUSY;   
                               //     end 
                        end   

                BUF_BYTES:
                            begin

                                //check if the TX is busy
                                if (RTS_n==0) 
                                    begin

                                        if (j > 9) 
                                                begin
                                                    TX_trigger_int <= 0;
                                                    buf_state <= BUF_TX_CHECKSUM;
                                                end
                                        else begin
                                                TX_data_int <= mem[j];
                                             //   $display ("BUFFER: %b -> TX", mem[j]);
                                                TX_trigger_int <= 1;
                                                j <= j+1;
                                                buf_state <= BUF_START;
                                            end
                                    end

                            end

                BUF_BUSY:
                                begin
                                    if (RTS_n == 0)
                                        begin 
                                           // $display ("BUFFER AVAILABLE AGAIN @time:", $time);
                                            buf_state <= BUF_START;
                                            buf_available_flag <= 1;
                                        end

                                end

                BUF_TX_CHECKSUM:
                                begin

                                    if (RTS_n==0) begin
                                            TX_data_int <= checksum_buff;
                                            checksum_sent <= 1;
                                            TX_trigger_int <= 1;
                                            buf_state <= BUF_IDLE;

                                        end
                                end

                        default: buf_state = BUF_IDLE;

                        endcase


        end




endmodule

Best Answer

As I stated in my comment, mem is an inferred level-sensitive latch (for rest of the answer I will use latch for anything level-sensitive, and flop for anything edge-sensitive). Latches are inferred in always blocks that are not edge-triggered (posedge/negedge) and is not assigned to a deterministic value if all logical branches.

Here is your original logic reformatted with my comments:

always @(data_trigger or reset) // <-- poorly inferred latch
begin
    if (reset) // Asynchronous reset because 'reset' is in the sensitivity list
    begin
        for (n = 0; n < 10; n = n+1) mem[n] = 8'h00;
        i = 0;
        checksum_buff = 8'h00;
    end     
    else if (data_trigger) // <-- Also asynchronous
    begin 
        if ( i < 10) 
        begin
            mem[i] = input_data;
            i = i + 1; // <-- as a latch will cause feedback loop on FPGA, will not be detected on in Simulations
        end
        else 
        begin
            checksum_buff = input_data;
            i = 0; // <-- also will cause a feedback loop to 'if(i<10)' in FPGA
        end
    end
end 

Except for synchronous logic, synthesizers do not look at the sensitivity list. Therefore input_data and i are actually part of the sensitivity list during synthesis. Simulation does use the sensitivity list, therefore it will ignore changes to input_data and i. This is a place where simulation and synthesis will mismatch.

i is part of the real sensitivity and it is updated on every pass of the always block. This creates an asynchronous feedback will data_trigger is high and reset is low. If you were to have added i to your sensitivity list, your simulation would likely hang from an infinite loop. On FPGA there is a slight propagation delay so you will results of seemingly random data. If you were measuring power you would also see high power consumption in this condition.

If you look through your log files there may be a warning about the asynchronous feedback.

i needs to be a flop. mem and checksum_buff should be flops too. From an eyeball review of your code, it looks like everything is running on the posedge clk domain. Therefore I suggest you change the code to:

always @(posedge clk)
begin
    if (reset)   
    begin
        for (n = 0; n < 10; n = n+1) mem[n] <= 8'h00;
        i <= 0;
        checksum_buff <= 8'h00;
    end     
    else if (data_trigger)
    begin 
        if ( i < 10) 
        begin
            mem[i] <= input_data;
            i <= i + 1;
        end
        else 
        begin
            checksum_buff <= input_data;
            i <= 0;
        end
    end
end

With this reset will also be synchronous. If you really want asynchronous reset, change it to always @(posedge clk or posedge reset). If you really want data_trigger to be the clocking event (I don't recommend it) then change the posedge clk to posedge data_trigger AND replace else if (data_trigger) with else. Any posedge/negedge signal referenced in the block is treated as an asynchronous trigger. You have this issue in your checksum256 module.

Verilog combinational logic should be written in always @* or the synonymous always @(*). You will run into code that give a list of signals instead of a literal *; don't do that. That style is for legacy Verilog-1995. Since Verilog-2001 @*/@(*) is the preferred strategy to limit the risk an incomplete sensitivity list. The other thing for proper combinational logic is that each left-hand-side (LHS) signal must be assigned to a determinate value for all possible logic branches. Assigning it to itself doesn't count. Assigning it a determinate value at the top of an always block before an if or case gives it a default assignment. Combinational logic should be assigned with blocking statements (=).

Synchronous logic must be have a clocking trigger posedge or negedge. Do not reference the clock signal in the body of the of the always block. Up to two additional posedge or negedge signals can be in the sensitivity list for asynchronous reset/preset, and asynchronous must assign registers to constants. Not all FPGAs support flops with asynchronous reset/preset; some have a limited number. Check the manual for your board. Synchronous logic should be assigned with non-blocking statements (<=).

Latches usually should be avoided. Unfortunately they can be unintentionally inferred with what should be combinational logic. A latch happens when not all logical branches assign the signal to a deterministic value. If you need a latch, keep it logic simple and assign with non-blocking.

always @* begin
  if (enable) Q <= D;
end

If you have SystemVerilog you can be more explicit by replacing always @* with always_comb and always_latch. This way your synthesizer can do a better job warning you about unintentional latches.


FYI: Verilog non-blocking was inspired/copied form VHDL. The usage of blocking and non-blocking should be similar between the two.

Related Topic