Electronic – Why does this simple VHDL pattern for a shift register not work as expected

activehdlmodelsimsimulationvhdl

At first glance you would expect the VHDL source code below to behave as a shift register. In that q, over time would be

"UUUU0", "UUU00", "UU000", "U0000", "00000", ....

but instead it is always U after five (or more) consecutive clock cycles.

Why is this?

This code is actually a much simplified version of a far more complicated simulation. But it demonstrates the symptoms that I see.

It exhibits this interesting and unexpected result during simulation under both ModelSim and ActiveHDL, I have not tried other simulators and would (secondly to an explanation of the cause) like to know if others act in the same way.

To answer this question properly you must understand that:

I know this is not the best way of implementing a shift register
I know for RTL synthesis this should have a reset.
I know an array of std_logic is a std_logic_vector.
I know of the aggregation operator, &.

What I have also found:

If the assignment temp(0)<='0'; is moved inside the process, it works.
If the loop is unwrapped (see commented code), it works.

I will reiterate that this is a very simplified version of a much more complicated design (for a pipelined CPU), configured to purely show the unexpected simulation results. The actual signal types are just a simplification. For this reason you must consider your answers with the code in the form as it is.

My guess is that the VHDL simulation engine's optimiser is mistakenly (or perhaps as per specification) not bothering to run the expressions inside the loop as no signals outside change, though I can disprove this by placing the unwrapped loop in a loop.

So I expect that the answer to this question is more to do with the standards for VHDL simulation of inexplicit VHDL syntax and how VHDL simulation engines do their optimisations, rather than if given code example is the best way of doing something or not.

And now to the code I am simulating:

 library ieee;
 use ieee.std_logic_1164.all;   

 entity test_simple is
    port (
        clk : in  std_logic;
        q   : out std_logic
    );                   
 end entity;

 architecture example of test_simple is
    type   t_temp is array(4 downto 0) of std_logic;
    signal temp : t_temp;
 begin

    temp(0) <= '0';

    p : process (clk)
    begin               
        if rising_edge(clk) then
            for i in 1 to 4 loop
                    temp(i) <= temp(i - 1);
            end loop;

            --temp(1) <= temp(0);   
            --temp(2) <= temp(1);
            --temp(3) <= temp(2);
            --temp(4) <= temp(3);
        end if;
    end process p;
    q <= temp(4);
 end architecture;

And the test bench:

library ieee;
use ieee.std_logic_1164.all;

entity Bench is
end entity;

architecture tb of bench is

component test_simple is
    port (
        clk : in  std_logic;
        q   : out std_logic
    );                   
end component;

signal clk:std_logic:='0';
signal q:std_logic;     
signal rst:std_logic;

constant freq:real:=100.0e3;

begin                       
    clk<=not clk after 0.5 sec / freq;

    TB:process
    begin
        rst<='1';
        wait for 10 us;
        rst<='0';
        wait for 100 us;
        wait;
    end process;

     --Note: rst is not connected
    UUT:test_simple  port map (clk=>clk,q=>q) ;
end architecture;

Best Answer

It has to do with what can be easily evaluated at elaboration time, formally, what is called a "locally static expression". This is an obscure looking rule, but it deserves some thought - eventually it does make some sense, and your simulator is quite correct in alerting you by generating non-obvious results.

Now, temp(1) can be evaluated at compile time (even earlier than elaboration time) and it can generate a driver on bit 1 of "temp".

However, temp(i) involves a bit more work for the tools. Given the trivial nature of the loop bounds here ( 1 to 4 ) it is obvious to us humans that temp(0) cannot be driven and what you are doing is safe. But imagine the bounds were functions lower(foo) to upper(bar) in a package declared somewhere else... now the most you can say with certainty is that temp is driven - so the "locally static" expression is temp.

And that means that the process is constrained by these rules to drive all of temp, at which point you have multiple drivers on temp(0) - the process driving (no initial value, i.e. 'u') and the external temp(0) <= '0';. So naturally the two drivers resolve to 'U'.

The alternative would be a "hacky little rule" (opinion) that if the loop bounds were constants, do one thing, but if they were declared as something else, do something else, and so on ... the more such oddball little rules there are, the more complex the language becomes... in my opinion, not a better solution.

Related Solutions

VHDL simulation does not work

There are a number of style issues here and one major problem.

Fru1tbat has pointed out that you never initialise MainCounter : one way to do so would be to drive the Reset input with a simple waveform :

RESET <= '0', '1' after 100 ns, '0' after 200 ns;

in the parallel execution region (after the architecture "begin") would do so. Note that this is for simulation only, and it would be MUCH better to make RESET an input port and drive it from your simulation testbench.

Secondly (and this is the major problem, waiting to bite you randomly in real hardware) the RESET clause in process COUNT should be simplified to ONLY clear MainCounter on reset. Then the wrapround clear should be moved into the clocked part of the process:

if Reset = '1' then 
   MainCounter <= (others => '0');
elsif rising_edge(Clk) then
   if MainCounter >= 12345 then
      MainCounter <= (others => '0');
   else
      MainCounter <= MainCounter + 1;
   end if;
end if;

The reason for this is subtle if you're not used to hardware design : as expressed, you have a pile of asynchronous logic waiting to clear the counter if it momentarily transitions to a high number while counting. So counting from X"7FFF" to X"8000" it will probably momentarily pass through X"FFFF" ... and reset itself.

By moving that logic to the clocked part, it will only reset if the temporary state persists until the next clock edge : and the timing analysis tools will report timing failure if that is even a possibility.

Now the warnings : Metavalues at 0 ns are basically harmless. Metavalues persisting AFTER reset is complete are a different matter...

And the style issues :

You can compare Unsigned against simple integer literals (because numeric_std overloads comparison operators with mixed operand types) and that would make the code MUCH more readable. Better : use constant declarations instead of magic numbers. (As you discovered, you can't assign an integer literal to an unsigned; because assignment is not an operator. There are to_unsigned and to_integer conversion functions for that purpose)

There is no need for parentheses around boolean expressions e.g. in if-statements. Some C programmers use them out of habit, but it looks needlessly cluttered to me.

Your SLOW process is unclocked, despite the clock in its sensitivity list... I prefer simple clocked processes; they avoid the problems that plague combinational processes such as SLOW - which will not work correctly unless you add all the RHS signals (here - only Maincounter) to the sensitivity list.

Electrical – VHDL Wait until statement not behaving as expected

It's possible to recreate your problem with VHDL code distributed on the companion web site for the book. Note the entire chapter on the UART is available as a PDF sample.

It requires modifying the mod_m_counter to include a dvsr input: :

-- Listing 4.11
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mod_m_counter is
    generic (
        N: integer := 4         -- number of bits
        -- M: integer := 10     -- mod M      -- DELETED
    );
    port (
        clk, reset:   in  std_logic;
        dvsr:         in  std_logic_vector (10 downto 0); -- ADDED
        max_tick:     out std_logic;
        q:            out std_logic_vector(N - 1 downto 0)
    );
end entity mod_m_counter;

architecture arch of mod_m_counter is
    signal r_reg:   unsigned(N - 1 downto 0) := (others => '0'); -- ADD default
    signal r_next:  unsigned(N - 1 downto 0) := (others => '0'); -- ADD default
    signal M:       positive := 163;  -- ADDED defaults for metavalue WARNINGs
begin
    M <= to_integer(unsigned(dvsr)); -- ADDED
    -- register
    process (clk, reset)
    begin
        if reset = '1' then
            r_reg <= (others => '0');
        elsif clk'event and clk = '1' then
            r_reg <= r_next;
        end if;
    end process;
    -- next-state logic
    r_next <= (others => '0') when r_reg >= M - 1 else -- >= no reset required
              r_reg + 1;
    -- output logic
    q <= std_logic_vector(r_reg);
    max_tick <= '1' when r_reg = M - 1 else '0';
end architecture arch;

As well as creating a uart_no_fifo entity and architecture:

library ieee;
use ieee.std_logic_1164.all;

entity uart_no_fifo is
    generic (
        DBIT:     integer := 8;   -- number of  data bits
        SB_TICK:  integer := 16   -- number of clocks for stop bits, 16 per bit
    );
    port (
        clk:           in  std_logic;
        reset:         in  std_logic;
        dvsr:          in  std_logic_vector (10 downto 0);
        rx:            in  std_logic;
        tx_start:      in  std_logic;
        w_data:        in  std_logic_vector (7 downto 0);
        r_data:        out std_logic_vector (7 downto 0);
        tx:            out std_logic;
        rx_done_tick:  out std_logic;
        tx_done_tick:  out std_logic
    );
end entity uart_no_fifo;

architecture foo of uart_no_fifo is
    component mod_m_counter is  -- Chapter 4.11  (MODIFIED)
        generic (
            N: integer := 4      -- number of bits
            -- M: integer := 10     -- mod-M  -- DELETED
        );
        port (
            clk, reset:     in  std_logic;
            dvsr:           in  std_logic_vector (10 downto 0);  -- ADDED
            max_tick:       out std_logic;
            q:              out std_logic_vector(N - 1 downto 0)
        );
    end component mod_m_counter;
    component uart_tx is       -- Chapter 7.1
       generic (
          DBIT:     integer := 8;     -- # data bits
          SB_TICK:  integer := 16      -- # ticks for stop bits
       );
       port (
          clk, reset:   in  std_logic;
          tx_start:     in  std_logic;
          s_tick:       in  std_logic;
          din:          in  std_logic_vector(7 downto 0);
          tx_done_tick: out std_logic;
          tx:           out std_logic
       );
    end component uart_tx;
    component uart_rx is      -- Chapter 7.3
        generic (
            DBIT:       integer := 8;  -- # data bits
            SB_TICK:    integer := 16  -- # ticks for stop bits
        );
        port (
          clk, reset:   in  std_logic;
          rx:           in  std_logic;
          s_tick:       in  std_logic;
          rx_done_tick: out std_logic;
          dout:         out std_logic_vector(7 downto 0)
       );
    end component uart_rx;
    signal tick:    std_logic;
begin
BAUD_RATE_GENERATOR:
    mod_m_counter
        generic map (
            N => 16
        )
        port map (
            clk => clk,
            reset => reset,
            dvsr => dvsr,
            max_tick => tick,
            q => open
        );
UART_TXMT:
    uart_tx
        generic map (
            DBIT => DBIT,
            SB_TICK => SB_TICK
        )
        port map (
            clk => clk,
            reset => reset,
            tx_start => tx_start,
            s_tick => tick,
            din => w_data,
            tx_done_tick => tx_done_tick,
            tx => tx
        );
UART_RCV:
    uart_rx
        generic map (
            DBIT => DBIT,
            SB_TICK => SB_TICK
        )
        port map (
            clk => clk,
            reset => reset,
            rx => rx,
            s_tick => tick,
            rx_done_tick => rx_done_tick,
            dout => r_data
        );
end architecture foo;

Which replicates the glitch when used with your supplied testbench:

This is a static hazard where the output rx_done_tick goes from a '0' to a '1' and back to a '0', caused by evaluating s_tick in the uart_rx state machine:

                when stop =>
                    if s_tick = '1' then
                        if s_reg = SB_TICK - 1 then
                            state_next   <= idle;
                            rx_done_tick <= '1';
                        else
                            s_next <= s_reg + 1;
                        end if;
                    end if;
        end case;

It's necessary because the idle state doesn't take into effect s_tick. Note a common s_tick from the baud rate generator is used for both uart_tx and uart_rx. In duplex communications this wouldn't be the case, the two ends of the serial link would be running off different clocks and have different x16 baud sample ticks.

Evaluating s_tick could be added to to state idle, but wouldn't help. The output rx_done_tick isn't clock synchronous, the state machine is Mealy (when it doesn't have to be, it has separate processes for clocked state transitions and outputs) where in a Moore machine s_tick would be an enable for the clock and rx_done_tick would have to be predicatively set to a '1'.

One of the things you notice in Figure 7.5 is that the FIFO's you have eliminated are clock synchronous where outputs from the UART RX and TX are evaluated on a clock edge. Evaluating rx_done_tick on the rising edge of the clock overcomes the static hazard. That's not the case in your testbench.

Modifying your testbench to evaluate rx_done_tick synchronously:

  -- wait until rx_done_tick = '1';  -- CHANGE, now clock synchronous
  wait until rising_edge(clk) and rx_done_tick = '1';

yields useful results:

You'd also find waiting for expected clock synchronous events you could get rid of at least one other wait statement.

You can also eliminate the static hazard.

The idea here is to reduce impulse noise in digital logic.

We can modify the OP's version of uart_rx:

architecture a_bit of uart_rx is
    type state_type is (idle, start, data, stop);
    signal state_reg:       state_type;
    signal state_next:      state_type;
    signal s_reg, s_next:   unsigned(4 downto 0);
    signal n_reg, n_next:   unsigned(2 downto 0);
    signal b_reg, b_next:   std_logic_vector(7 downto 0);
    signal sync1_reg:       std_logic;
    signal sync2_reg:       std_logic;
    signal sync_rx:         std_logic;
    signal rx_done:         std_logic;
    signal rx_done_next:    std_logic;
begin
-- synchronization for rx
    process (clk, reset)
    begin
        if reset = '1' then
            sync1_reg <= '0';
            sync2_reg <= '0';
        elsif (clk'event and clk = '1') then
            sync1_reg <= rx;
            sync2_reg <= sync1_reg;
        end if;
    end process;
    sync_rx <= sync2_reg;
-- FSMD state & data registers
    process (clk, reset)
    begin
        if reset = '1' then
            state_reg <= idle;
            s_reg     <= (others => '0');
            n_reg     <= (others => '0');
            b_reg     <= (others => '0');
            rx_done   <= '0';
        elsif clk'event and clk = '1' then
            if s_tick = '0' then  -- ADDED enable
                state_reg <= state_next;
                s_reg     <= s_next;
                n_reg     <= n_next;
                b_reg     <= b_next;
                rx_done   <= rx_done_next;
            end if;
        end if;
end process;
-- next-state logic & data path
process (state_reg, s_reg, n_reg, b_reg, s_tick, sync_rx)
begin
   state_next   <= state_reg;
   s_next       <= s_reg;
   n_next       <= n_reg;
   b_next       <= b_reg;
   rx_done_next <= '0';
       case state_reg is
            when idle =>
                if sync_rx = '0' then
                    state_next <= start;
                    s_next     <= (others => '0');
                end if;
            when start =>
                -- if s_tick = '1' then
                    if s_reg = 7 then
                        state_next <= data;
                        s_next     <= (others => '0');
                        n_next     <= (others => '0');
                    else
                        s_next <= s_reg + 1;
                    end if;
                -- end if;
            when data =>
                -- if s_tick = '1' then 
                    if s_reg = 15 then
                        s_next <= (others => '0');
                        b_next <= sync_rx & b_reg(7 downto 1);
                        if n_reg = (DBIT - 1) then
                            state_next <= stop;
                        else
                            n_next <= n_reg + 1;
                        end if;
                        else
                            s_next <= s_reg + 1;
                        end if;
                    -- end if;
                when stop =>
                    -- if s_tick = '1' then
                        if s_reg = SB_TICK - 1 then
                            state_next   <= idle;
                            -- rx_done_next <= '1';
                        else
                            s_next <= s_reg + 1;
                        end if;
                        if s_reg = SB_TICK - 2 then  -- PREDICTIVELY
                            rx_done_next <= '1';
                        end if;
                    -- end if;
        end case;
    end process;
    rx_done_tick <= rx_done and s_tick; -- no static hazard
    dout <= b_reg;
end architecture a_bit;

where rx_done_tick is combinatorially derived from a register output rx_done and s_tick. This eliminates the immediate static hazard. With the original test bench evaluation of rx_done_tick:

It should be noted that a static hazard is still possible due to skew in routing of rx_done and s_tick (which isn't the clock) post synthesis. The testbench should still evaluate rx_done_tick on a clock edge.

Best Answer

Related Solutions

VHDL simulation does not work

Electrical – VHDL Wait until statement not behaving as expected

Related Topic