Electronic – FPGA starts working after irrelevant changes, why

cyclonefpgaquartusuartverilog

I have written a UART module in Verilog. By using that module I get data from PC via UART and then send that data back again via this UART module. I uploaded it to FPGA for testing. It works flawless no matter how many characters I send with the below code (Actually you may not need the TX state but I put the full code for clearance):

module uart_transceiver

 #(parameter [15:0] CLK_PERIOD = 434) //50000000/115200

(
    input sys_rst,
    input sys_clk,

    input uart_rx,
    output reg uart_tx,

    //input [15:0] divisor,

    output reg [7:0] rx_data,
    output reg rx_err,
    output [1:0] state_uart,
    output [7:0] clock_most8,
    output [7:0] clock_least8,

    input [7:0] tx_data,
    input tx_wr,
    output reg rx_busy,
    output reg rx_done,
    output reg tx_busy
);

reg [1:0]   state_tx, state_rx;

//-----------------------------------------------------------------
// UART RX Logic
//-----------------------------------------------------------------
reg [3:0]   rx_bitcount;
reg [15:0]  clk_count_rx = 16'h0001;
reg [7:0]   rx_reg;
reg             rx1;

assign state_uart = state_rx;
//assign clock_least8 = clk_count_rx[7:0];
//assign clock_most8 = clk_count_rx[15:8];


parameter   IDLERX      = 2'd0, 
                TAKESTART   = 2'd1,
                TAKE8BITS   = 2'd2,
                TAKESTOP    = 2'd3;

parameter   [15:0] FIRST_CHECK = CLK_PERIOD / 2;



always @(posedge sys_clk) begin

    if(sys_rst) begin

        state_rx <= IDLERX;
        rx_err <= 1'b0;
        rx_done <= 1'b0;

    end else begin

        case(state_rx)

            IDLERX: begin
                rx_bitcount <= 4'd0;
                rx_busy <= 1'b0;
                if(uart_rx == 1'b0 && rx_err != 1'b1) begin
                    clk_count_rx <= 16'h0001;
                    state_rx <= TAKESTART;
                end
            end

            // Take one start signal and check if it is 0
            TAKESTART: begin
                rx_busy <= 1'b1;
                rx_done <= 1'b0;
                clk_count_rx <= clk_count_rx + 16'h0001;

                if(clk_count_rx == FIRST_CHECK) 
                    begin
                    rx1 <= uart_rx;
//                  clk_count_rx <= clk_count_rx + 16'd1;
                    end 

                if(clk_count_rx == CLK_PERIOD && rx1 != 1'b0) 
                    begin
                    rx_err <= 1'b1;
                    state_rx <= IDLERX;
                    end 

                else if(clk_count_rx == CLK_PERIOD && rx1 == 1'b0) 
                    begin
                    clk_count_rx <= 16'h0001;
                    state_rx <= TAKE8BITS;
                    end 

            end


            // Take 8 bits of data but be careful to wait for 1 CLK_PERIOD for each bit
            TAKE8BITS: begin

                if(rx_bitcount < 4'd8) 
                    begin               
                    clk_count_rx <= clk_count_rx + 16'h0001;

                    if(clk_count_rx == FIRST_CHECK) 
                        begin
                        rx1 <= uart_rx;
                        end 

                    else if(clk_count_rx == CLK_PERIOD) 
                        begin
                        clk_count_rx <= 16'h0001;
                        rx_reg <= {rx1, rx_reg[7:1]};   
                        rx_bitcount <= rx_bitcount + 1'b1;                                                  
                        end

                    end 

                else                
                    begin               
                    clk_count_rx <= 16'h0001;
                    state_rx <= TAKESTOP;
                    end

            end

            // Take 1 stop bit in last CLK_PERIOD
            TAKESTOP: begin

                clk_count_rx <= clk_count_rx + 16'h0001;

                if(clk_count_rx == FIRST_CHECK) 
                    begin
                    rx1 <= uart_rx;
                    end 

                else if(clk_count_rx == CLK_PERIOD && rx1 != 1'b1) 
                    begin
                    rx_err <= 1'b1;
                    state_rx <= IDLERX;
                    end 

                else if(clk_count_rx == CLK_PERIOD && rx1 == 1'b1) 
                    begin   
                    rx_data <= rx_reg;
                    rx_done <= 1'b1;
                    clk_count_rx <= 16'h0001;
                    state_rx <= IDLERX;
                    end 
                else if(clk_count_rx > CLK_PERIOD)
                    begin
                    rx_err <= 1'b1;
                    state_rx <= IDLERX;
                    end
            end

        endcase
    end

end



//-----------------------------------------------------------------
// UART TX Logic
//-----------------------------------------------------------------
parameter   IDLE            = 2'd0, 
                SENDSTART   = 2'd1,
                SEND8BITS   = 2'd2,
                SENDSTOP    = 2'd3;


reg [3:0] tx_bitcount;
reg [15:0] clk_count;
reg [7:0] tx_reg;

always @(posedge sys_clk) begin

    if(sys_rst) begin

        state_tx <= IDLE;

    end else begin

        case(state_tx)

            IDLE: begin
                uart_tx <= 1'b1;
                //tx_reg <= tx_data;
                clk_count <= 16'd0;
                tx_bitcount <= 4'd0;
                tx_busy <= 1'b0;
                if(tx_wr) begin
                    tx_reg <= tx_data;
                    state_tx <= SENDSTART;
                end
            end

            // Send one start signal for one CLK_PERIOD
            SENDSTART: begin
                tx_busy <= 1'b1;
                uart_tx <= 1'b0;
                if(clk_count == CLK_PERIOD) begin
                    clk_count <= 16'd0;
                    //tx_reg <= tx_data;
                    state_tx <= SEND8BITS;
                end else begin
                    clk_count <= clk_count + 16'd1; 
                end

            end


            // Send 8 bits of data but be careful to wait for 1 CLK_PERIOD for each bit
            SEND8BITS: begin
                if(tx_bitcount == 4'd0) begin
                    tx_bitcount <= tx_bitcount + 4'd1;
                    uart_tx <= tx_reg[0];   
                    tx_reg <= {1'b0, tx_reg[7:1]};
                end 
                if(clk_count == CLK_PERIOD) begin
                    clk_count <= 16'd0;
                    if(tx_bitcount < 4'd8) begin
                        tx_bitcount <= tx_bitcount + 4'd1;
                        uart_tx <= tx_reg[0];   
                        tx_reg <= {1'b0, tx_reg[7:1]};
                    end 
                    else begin
                        clk_count <= 16'd0;
                        state_tx <= SENDSTOP;
                    end
                end 
                else begin
                        clk_count <= clk_count + 16'd1;
                end
            end

            // Send 1 stop bit for one CLK_PERIOD
            SENDSTOP: begin
                uart_tx <= 1'b1;        
                if(clk_count == CLK_PERIOD) begin
                    clk_count <= 16'd0;
                    tx_busy <= 1'b0;
                    state_tx <= IDLE;
                end else begin
                    clk_count <= clk_count + 16'd1; 
                end                     
            end

        endcase
    end

end

endmodule

However, if I delete the last else if block below from TAKESTOP state shown above, the code stops working after working well for some random number of characters and becomes stuck at TAKESTOP state. When I apply sys_rst, it again works for some characters and then gets stuck at same TAKESTOP state. The interesting thing is that when this else if block is placed and the code working flawless, it never actually enters this else if. I knew that because otherwise it would be stuck at IDLERX state as rx_err would be 1.:

else if(clk_count_rx > CLK_PERIOD)
        begin
        rx_err <= 1'b1;
        state_rx <= IDLERX;
        end

I am getting mad about such problems. Instead of inserting that else if block, the problem also vanishes when I assign the clk_count_rx to an output by removing the comment from these two lines from the above full code shows:

assign clock_least8 = clk_count_rx[7:0];
assign clock_most8 = clk_count_rx[15:8];

In other words my state machine does not work well and stable if I do not assign a register to an output although I do not use that output. This has happened a couple more times for my different Verilog projects.

The question how can such irrelevant changes repair my code? What is wrong with me? The thing doesn't seem logical. Am I doing a structural conceptual mistake and this change seems to repair it by somehow? I do not know. I use Quartus 16.1 with Cyclone V GX Starter Board.

By the way this is a working UART module and any one is welcomed to use it. I will be surely very glad to whom helps me on this matter.

Best Answer

Your uart_rx signal is asynchronous to your clock. However, you have one place in your code where you use it directly in the state machine. This is curious, because in all other cases, you are careful to assign it to rx1 and then test that.

    IDLERX: begin
      rx_bitcount <= 4'd0;
      rx_busy <= 1'b0;
      if (uart_rx == 1'b0 && rx_err != 1'b1) begin
        clk_count_rx <= 16'h0001;
        state_rx <= TAKESTART;
      end
    end

The problem here is that this if statement modifies both the state register and the counter register, a total of 18 FFs. Each of those FFs has logic associated with it that determine what its next state is. When you feed this logic with an asynchronous input and that input changes close to the clock edge, some of the FFs will see it in one state, while others will see it in a different state. After the clock edge, they will be in an inconsistent state that has nothing to do with your actual logic.1

When you modify your code, you cause the hardware resources and the routing paths inside the chip to be configured differently, which means that the delays between uart_rx and those 18 FFs will change. This changes the behavior of the state machine in unpredictable ways.

Bottom line: ALL inputs to a state machine MUST be synchronized to the clock! In your case, the easiest way to do that is to put the assignment

rx1 <= uart_rx;

outside your case statement, and use rx1 exclusively inside it.

BTW, you're not the first to make this mistake: UART Receiver glitches ... and you won't be the last.


1 Note that this has nothing to do with the issue of metastability, which is also an issue with asynchronous inputs. For best results, you'd feed uart_rx through at least two FFs before using it.