Storing values in a variable (FPGA, VHDL)

fpga vhdl

I want to develop an application that reads and stores two input values and then outputs the two stored values. For example, if the input string is "John", the application should get "J" from user_w_write_8_data and store it in my_buffer_1, then get "o" and store it in my_buffer_2, then output "J", then output "o". After that, the application should get "h" from user_w_write_8_data and store it in my_buffer_1, then get "n" and store it in my_buffer_2, then output "h", then output "n". In short, the application should store two characters and output them, then store the next two characters and output them, and so on.

As suggested, I started from the XillyBus loopback demo (xillydemo) and encapsulated my own logic in a separate entity that replaces the FIFO fifo_8.

First I created the my_buffer entity, which stores the data coming in from XillyBus and then outputs it.

EDIT 2: I had to add the if (counter=0) branch [my_buffer.vhd] because, when the simulation runs, the design first executes that branch, and only afterwards does the test bench apply the input assignments for the next rising edge [my_buffer_tb.vhd]. Note that the design enters the if (counter=0) branch [my_buffer.vhd] only once (no branch assigns counter <= 0).

Here is the code:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity my_buffer is
  port (
    bus_clk             : in  std_logic;
    reset_8             : in  std_logic;
    -- Data from CPU via XillyBus
    user_w_write_8_wren : in  std_logic;
    user_w_write_8_full : out std_logic;
    user_w_write_8_data : in  std_logic_vector(7 DOWNTO 0);
    -- Data to CPU via XillyBus
    user_r_read_8_rden  : in  std_logic;
    user_r_read_8_empty : out std_logic;
    user_r_read_8_data  : out std_logic_vector(7 DOWNTO 0));
end my_buffer;

architecture rtl of my_buffer is
signal tmp_1 : std_logic_vector(7 downto 0); --tmp for the first char
signal tmp_2 : std_logic_vector(7 downto 0); --tmp for the second char
signal counter : integer := 0; --counter used to control the if else statements

begin 

process (bus_clk)
   begin 
        if (bus_clk'event and bus_clk = '1') then 
              if (counter = 0) then --after this statement, the program adds input assignments for next rising edge here [my_buffer_tb.vhd]
                 user_r_read_8_empty <= '0';
                 user_w_write_8_full <= '0';
                 counter <= 1;
              elsif (counter = 1) then --data valid on user_w_write_8_data
                 user_r_read_8_empty <= '0';
                 tmp_1 <= user_w_write_8_data; --store the first char
                 user_w_write_8_full <= '0';
                 counter <= 2;
              elsif (counter = 2) then --data valid on user_w_write_8_data
                 user_r_read_8_empty <= '1';
                 tmp_2 <= user_w_write_8_data; --store the second char
                 user_w_write_8_full <= '1';
                 counter <= 3;
              elsif (counter = 3) then
                 user_r_read_8_empty <= '1';
                 user_r_read_8_data <= tmp_1; --put the first char on xillybus_read_8
                 user_w_write_8_full <= '1';
                 counter <= 4;
              elsif (counter = 4) then
                 user_r_read_8_empty <= '0';
                 user_r_read_8_data <= tmp_2; --put the second char on xillybus_read_8
                 user_w_write_8_full <= '0';
                 counter <= 1;
              end if;

        end if;

 end process;
end rtl; 

QUESTIONS ALREADY ANSWERED IN THE COMMENT (thank you very much!):

1) The Xillybus documentation explains that user_w_write_8_full signals that no more data can be accepted, and that it should be set to 1 only when user_w_write_8_wren='1'. The problem in my logic is that when user_w_write_8_wren='1' I have to store data into tmp_1 or tmp_2. So when should I set user_w_write_8_full, given that I have to take the data while user_w_write_8_wren='1'?

2) The Xillybus documentation explains that user_r_read_8_rden indicates that valid data is present on user_r_read_8_data. In my logic, it seems that I do not have to care about when data is present on user_r_read_8_data. When the program is in the elsif (counter = 2) or elsif (counter = 3) branch, I have to put the data stored in tmp_1 or tmp_2 into the xillybus_read_8 device file. Is this the proper way to move data from the FIFO (my_buffer) to the xillybus_read_8 device file?

3) The Xillybus documentation explains that user_r_read_8_empty signals that no more data can be read. When do I need to know that no more data can be read? Does my logic need this information?

The following code is xillydemo. I have commented out the lines that I do not use. Since I work only with the xillybus_write_8 and xillybus_read_8 device files, I commented out all signals related to the 32-bit device files and the mem device files. In addition, I added the my_buffer component.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;

entity xillydemo is
  port (
PCIE_PERST_B_LS : IN std_logic;
PCIE_REFCLK_N : IN std_logic;
PCIE_REFCLK_P : IN std_logic;
PCIE_RX_N : IN std_logic_vector(3 DOWNTO 0);
PCIE_RX_P : IN std_logic_vector(3 DOWNTO 0);
GPIO_LED : OUT std_logic_vector(3 DOWNTO 0);
PCIE_TX_N : OUT std_logic_vector(3 DOWNTO 0);
PCIE_TX_P : OUT std_logic_vector(3 DOWNTO 0));
end xillydemo;

architecture sample_arch of xillydemo is
  component xillybus
    port (
  PCIE_PERST_B_LS : IN std_logic;
  PCIE_REFCLK_N : IN std_logic;
  PCIE_REFCLK_P : IN std_logic;
  PCIE_RX_N : IN std_logic_vector(3 DOWNTO 0);
  PCIE_RX_P : IN std_logic_vector(3 DOWNTO 0);
  GPIO_LED : OUT std_logic_vector(3 DOWNTO 0);
  PCIE_TX_N : OUT std_logic_vector(3 DOWNTO 0);
  PCIE_TX_P : OUT std_logic_vector(3 DOWNTO 0);
  bus_clk : OUT std_logic;
  quiesce : OUT std_logic;
--      user_r_mem_8_rden : OUT std_logic;
--      user_r_mem_8_empty : IN std_logic;
--      user_r_mem_8_data : IN std_logic_vector(7 DOWNTO 0);
--      user_r_mem_8_eof : IN std_logic;
--      user_r_mem_8_open : OUT std_logic;
--      user_w_mem_8_wren : OUT std_logic;
--      user_w_mem_8_full : IN std_logic;
--      user_w_mem_8_data : OUT std_logic_vector(7 DOWNTO 0);
--      user_w_mem_8_open : OUT std_logic;
--      user_mem_8_addr : OUT std_logic_vector(4 DOWNTO 0);
--      user_mem_8_addr_update : OUT std_logic;
--      user_r_read_32_rden : OUT std_logic;
--      user_r_read_32_empty : IN std_logic;
--      user_r_read_32_data : IN std_logic_vector(31 DOWNTO 0);
--      user_r_read_32_eof : IN std_logic;
--      user_r_read_32_open : OUT std_logic;
  user_r_read_8_rden : OUT std_logic;
  user_r_read_8_empty : IN std_logic;
  user_r_read_8_data : IN std_logic_vector(7 DOWNTO 0);
  user_r_read_8_eof : IN std_logic;
  user_r_read_8_open : OUT std_logic;
--      user_w_write_32_wren : OUT std_logic;
--      user_w_write_32_full : IN std_logic;
--      user_w_write_32_data : OUT std_logic_vector(31 DOWNTO 0);
--      user_w_write_32_open : OUT std_logic;
  user_w_write_8_wren : OUT std_logic;
  user_w_write_8_full : IN std_logic;
  user_w_write_8_data : OUT std_logic_vector(7 DOWNTO 0);
  user_w_write_8_open : OUT std_logic);
  end component;

--  component fifo_8x2048
--    port (
--      clk: IN std_logic;
--      srst: IN std_logic;
--      din: IN std_logic_VECTOR(7 downto 0);
--      wr_en: IN std_logic;
--      rd_en: IN std_logic;
--      dout: OUT std_logic_VECTOR(7 downto 0);
--      full: OUT std_logic;
--      empty: OUT std_logic);
--  end component;

component my_buffer
port (
  bus_clk: IN std_logic;
  reset_8: IN std_logic;
  user_w_write_8_wren: IN std_logic;
  user_w_write_8_full: OUT std_logic;      
  user_w_write_8_data: IN std_logic_VECTOR(7 downto 0);
  user_r_read_8_rden: IN std_logic;
  user_r_read_8_empty: OUT std_logic;
  user_r_read_8_data: OUT std_logic_VECTOR(7 downto 0)
  );
end component;

--  component fifo_32x512
--    port (
--      clk: IN std_logic;
--      srst: IN std_logic;
--      din: IN std_logic_VECTOR(31 downto 0);
--      wr_en: IN std_logic;
--      rd_en: IN std_logic;
--      dout: OUT std_logic_VECTOR(31 downto 0);
--      full: OUT std_logic;
--      empty: OUT std_logic);
--  end component;

-- Synplicity black box declaration
  attribute syn_black_box : boolean;
--  attribute syn_black_box of fifo_32x512: component is true;
--  attribute syn_black_box of fifo_8x2048: component is true;

--  type demo_mem is array(0 TO 31) of std_logic_vector(7 DOWNTO 0);
--  signal demoarray : demo_mem;

  signal bus_clk :  std_logic;
  signal quiesce : std_logic;

  signal reset_8 : std_logic;
  signal reset_32 : std_logic;

  signal ram_addr : integer range 0 to 31;

--  signal user_r_mem_8_rden :  std_logic;
--  signal user_r_mem_8_empty :  std_logic;
--  signal user_r_mem_8_data :  std_logic_vector(7 DOWNTO 0);    
--  signal user_r_mem_8_eof :  std_logic;
--  signal user_r_mem_8_open :  std_logic;
--  signal user_w_mem_8_wren :  std_logic;
--  signal user_w_mem_8_full :  std_logic;
--  signal user_w_mem_8_data :  std_logic_vector(7 DOWNTO 0);
--  signal user_w_mem_8_open :  std_logic;
--  signal user_mem_8_addr :  std_logic_vector(4 DOWNTO 0);
--  signal user_mem_8_addr_update :  std_logic;
--  signal user_r_read_32_rden :  std_logic;
--  signal user_r_read_32_empty :  std_logic;
--  signal user_r_read_32_data :  std_logic_vector(31 DOWNTO 0);
--  signal user_r_read_32_eof :  std_logic;
--  signal user_r_read_32_open :  std_logic;
  signal user_r_read_8_rden :  std_logic;
  signal user_r_read_8_empty :  std_logic;
  signal user_r_read_8_data :  std_logic_vector(7 DOWNTO 0);
  signal user_r_read_8_eof :  std_logic;
  signal user_r_read_8_open :  std_logic;
--  signal user_w_write_32_wren :  std_logic;
--  signal user_w_write_32_full :  std_logic;
--  signal user_w_write_32_data :  std_logic_vector(31 DOWNTO 0);
--  signal user_w_write_32_open :  std_logic;
  signal user_w_write_8_wren :  std_logic;
  signal user_w_write_8_full :  std_logic;
  signal user_w_write_8_data :  std_logic_vector(7 DOWNTO 0);
  signal user_w_write_8_open :  std_logic;

begin
  xillybus_ins : xillybus
port map (
  -- Ports related to /dev/xillybus_mem_8
  -- FPGA to CPU signals:
--      user_r_mem_8_rden => user_r_mem_8_rden,
--      user_r_mem_8_empty => user_r_mem_8_empty,
--      user_r_mem_8_data => user_r_mem_8_data,
--      user_r_mem_8_eof => user_r_mem_8_eof,
--      user_r_mem_8_open => user_r_mem_8_open,
  -- CPU to FPGA signals:
--      user_w_mem_8_wren => user_w_mem_8_wren,
--      user_w_mem_8_full => user_w_mem_8_full,
--      user_w_mem_8_data => user_w_mem_8_data,
--      user_w_mem_8_open => user_w_mem_8_open,
  -- Address signals:
--      user_mem_8_addr => user_mem_8_addr,
--      user_mem_8_addr_update => user_mem_8_addr_update,

  -- Ports related to /dev/xillybus_read_32
  -- FPGA to CPU signals:
--      user_r_read_32_rden => user_r_read_32_rden,
--      user_r_read_32_empty => user_r_read_32_empty,
--      user_r_read_32_data => user_r_read_32_data,
--      user_r_read_32_eof => user_r_read_32_eof,
--      user_r_read_32_open => user_r_read_32_open,

  -- Ports related to /dev/xillybus_read_8
  -- FPGA to CPU signals:
  user_r_read_8_rden => user_r_read_8_rden,
  user_r_read_8_empty => user_r_read_8_empty,
  user_r_read_8_data => user_r_read_8_data,
  user_r_read_8_eof => user_r_read_8_eof,
  user_r_read_8_open => user_r_read_8_open,

  -- Ports related to /dev/xillybus_write_32
  -- CPU to FPGA signals:
--      user_w_write_32_wren => user_w_write_32_wren,
--      user_w_write_32_full => user_w_write_32_full,
--      user_w_write_32_data => user_w_write_32_data,
--      user_w_write_32_open => user_w_write_32_open,

  -- Ports related to /dev/xillybus_write_8
  -- CPU to FPGA signals:
  user_w_write_8_wren => user_w_write_8_wren,
  user_w_write_8_full => user_w_write_8_full,
  user_w_write_8_data => user_w_write_8_data,
  user_w_write_8_open => user_w_write_8_open,

  -- General signals
  PCIE_PERST_B_LS => PCIE_PERST_B_LS,
  PCIE_REFCLK_N => PCIE_REFCLK_N,
  PCIE_REFCLK_P => PCIE_REFCLK_P,
  PCIE_RX_N => PCIE_RX_N,
  PCIE_RX_P => PCIE_RX_P,
  GPIO_LED => GPIO_LED,
  PCIE_TX_N => PCIE_TX_N,
  PCIE_TX_P => PCIE_TX_P,
  bus_clk => bus_clk,
  quiesce => quiesce
  );

--  A simple inferred RAM

--  ram_addr <= conv_integer(user_mem_8_addr);

  process (bus_clk)
  begin
    if (bus_clk'event and bus_clk = '1') then
--      if (user_w_mem_8_wren = '1') then 
--        demoarray(ram_addr) <= user_w_mem_8_data;
--      end if;
--      if (user_r_mem_8_rden = '1') then
--        user_r_mem_8_data <= demoarray(ram_addr);
--      end if;
    end if;
  end process;

--  user_r_mem_8_empty <= '0';
--  user_r_mem_8_eof <= '0';
--  user_w_mem_8_full <= '0';

--  32-bit loopback

--  fifo_32 : fifo_32x512
--    port map(
--      clk        => bus_clk,
--      srst       => reset_32,
--      din        => user_w_write_32_data,
--      wr_en      => user_w_write_32_wren,
--      rd_en      => user_r_read_32_rden,
--      dout       => user_r_read_32_data,
--      full       => user_w_write_32_full,
--      empty      => user_r_read_32_empty
--      );

--  reset_32 <= not (user_w_write_32_open or user_r_read_32_open);

--  user_r_read_32_eof <= '0';

--  8-bit loopback

--  fifo_8 : fifo_8x2048
--    port map(
--      clk        => bus_clk,
--      srst       => reset_8,
--      din        => user_w_write_8_data,
--      wr_en      => user_w_write_8_wren,
--      rd_en      => user_r_read_8_rden,
--      dout       => user_r_read_8_data,
--      full       => user_w_write_8_full,
--      empty      => user_r_read_8_empty
--      );

--    reset_8 <= not (user_w_write_8_open or user_r_read_8_open);
--    user_r_read_8_eof <= '0';


my_buffer_1: my_buffer
  port map (
bus_clk             => bus_clk,
reset_8             => reset_8,
user_w_write_8_wren => user_w_write_8_wren,
user_w_write_8_full => user_w_write_8_full,
user_w_write_8_data => user_w_write_8_data,
user_r_read_8_rden  => user_r_read_8_rden,
user_r_read_8_empty => user_r_read_8_empty,
user_r_read_8_data  => user_r_read_8_data);

-- these lines must be preserved in the XillyDemo
reset_8 <= not (user_w_write_8_open or user_r_read_8_open);
user_r_read_8_eof <= user_r_read_8_empty and not(user_w_write_8_open);

end sample_arch;

EDIT 2: Test bench. The problem is that during simulation all input and output values are correct, but when I generate the bitfile and run the design on the FPGA, the output values are totally wrong. Note that during simulation the design assigns the right values to tmp_1 and tmp_2. In this test bench the program:

-executes the if (counter = 0) branch (basically it does nothing). This way, during the next clock cycle the program initializes the signals as described in my_buffer_tb.vhd
-writes 3 (00000011) into tmp_1
-writes 4 (00000100) into tmp_2
-writes 3 (00000011) into user_r_read_8_data
-writes 4 (00000100) into user_r_read_8_data

-writes 5 (00000101) into tmp_1
-writes 6 (00000110) into tmp_2
-writes 5 (00000101) into user_r_read_8_data
-writes 6 (00000110) into user_r_read_8_data

Here is the code:

library ieee;
use ieee.std_logic_1164.all;

entity my_buffer_tb is
end my_buffer_tb;

architecture sim of my_buffer_tb is
  signal bus_clk             : std_logic := '1';
  signal reset_8             : std_logic;
  signal user_w_write_8_wren : std_logic;
  signal user_w_write_8_full : std_logic;
  signal user_w_write_8_data : std_logic_vector(7 DOWNTO 0);
  signal user_r_read_8_rden  : std_logic;
  signal user_r_read_8_empty : std_logic;
  signal user_r_read_8_data  : std_logic_vector(7 DOWNTO 0);
begin
  -- component instantiation
  DUT: entity work.my_buffer
port map (
  bus_clk             => bus_clk,
  reset_8             => reset_8,
  user_w_write_8_wren => user_w_write_8_wren,
  user_w_write_8_full => user_w_write_8_full,
  user_w_write_8_data => user_w_write_8_data,
  user_r_read_8_rden  => user_r_read_8_rden,
  user_r_read_8_empty => user_r_read_8_empty,
  user_r_read_8_data  => user_r_read_8_data);

  -- clock generation
  bus_clk <= not bus_clk after 5 ns;

  -- waveform generation
  WaveGen_Proc: process
  begin
-- Input values sampled by the DUT with the first rising edge of bus_clk
reset_8 <= '1';                     -- apply reset
-- other input values don't care during reset
wait until rising_edge(bus_clk);

-- Input values sampled by DUT with second rising edge of bus_clk
--    reset_8 <= '0';
--    user_w_write_8_wren <= '0';
--    user_w_write_8_data <= (others => '-');
--    user_r_read_8_rden  <= '0';
--    wait until rising_edge(bus_clk);

--FIRST BUFFER

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_r_read_8_empty <= '0';
user_w_write_8_full <= '0';
user_w_write_8_data <= "00000011"; --3
wait until rising_edge(bus_clk);

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_r_read_8_empty <= '1';
user_w_write_8_full <= '1';
user_w_write_8_data <= "00000100"; --4
wait until rising_edge(bus_clk);

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_empty <= '1';
user_w_write_8_full <= '1';
wait until rising_edge(bus_clk);

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_empty <= '0';
user_w_write_8_full <= '0';
wait until rising_edge(bus_clk);


--SECOND BUFFER

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_r_read_8_empty <= '0';
user_w_write_8_full <= '0';
user_w_write_8_data <= "00000101"; --5
wait until rising_edge(bus_clk);

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_r_read_8_empty <= '1';
user_w_write_8_full <= '1';
user_w_write_8_data <= "00000110"; --6
wait until rising_edge(bus_clk);

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_empty <= '1';
user_w_write_8_full <= '1';
wait until rising_edge(bus_clk);

-- Add input assignments for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_empty <= '0';
user_w_write_8_full <= '0';
wait until rising_edge(bus_clk);     


-- finished
wait;
  end process WaveGen_Proc;
end sim;

Again, during simulation the signal values are correct, but when I generate the bitfile and run the design on the FPGA, the output values are totally wrong. Where am I going wrong?

EDIT 3:
Here is the logic after applying the suggestions for Edit 2:

begin 

process (bus_clk)
begin 
   if (bus_clk'event and bus_clk = '1') then
      if (reset_8 = '1') then
         -- reset has highest priority, initialize all state registers
         user_r_read_8_empty <= '1';  -- no data available
         user_w_write_8_full <= '1';  -- not ready yet
         counter <= 0;                -- counter starts at 0

      elsif (counter = 0) then
         -- just signal that we are ready for accepting data in the following cycle  
         user_w_write_8_full <= '0';
         counter <= 1;

      elsif (counter = 1) then
         -- wait for first byte!
         if user_w_write_8_wren = '1' then
            tmp_1 <= user_w_write_8_data; -- store the first char
            -- user_w_write_8_full is kept low, thus, we are ready for accepting more data
            counter <= 2;
         end if;

      elsif (counter = 2) then
         if user_w_write_8_wren = '1' then
            tmp_2 <= user_w_write_8_data; -- store the second char
            user_w_write_8_full <= '1';   -- now our data buffers are full
            user_r_read_8_empty <= '0';  -- we are ready for reading in the next cycle
            counter <= 3;
         end if;          


      elsif (counter = 3) then
          if user_r_read_8_rden = '1' then 
             user_r_read_8_data <= tmp_1;
             counter <= 4;
          end if;      

      elsif (counter = 4) then
        if user_r_read_8_rden = '1' then 
            user_r_read_8_data <= tmp_2;
            user_w_write_8_full <= '0';
            user_r_read_8_empty <= '1';
            counter <= 0;
        end if;  
      end if;
   end if;
end process;
end rtl;

Best Answer

I recommend restarting from the XillyBus loopback demo (xillydemo). Then you should encapsulate your own logic in a separate entity; let's call it my_buffer. This entity will replace the fifo_8 component instantiation. That is, your component will take its input directly from the XillyBus, and its output is also connected directly to the XillyBus.

As my_buffer will replace the FIFO, it has to handle all the FIFO-related control and data signals. Thus, you should create a new file (e.g. my_buffer.vhdl) and start with this implementation; of course, you have to add your custom logic:

library ieee;
use ieee.std_logic_1164.all;

entity my_buffer is
  port (
    bus_clk             : in  std_logic;
    reset_8             : in  std_logic;
    -- Data from CPU via XillyBus
    user_w_write_8_wren : in  std_logic;
    user_w_write_8_full : out std_logic;
    user_w_write_8_data : in  std_logic_vector(7 DOWNTO 0);
    -- Data to CPU via XillyBus
    user_r_read_8_rden  : in  std_logic;
    user_r_read_8_empty : out std_logic;
    user_r_read_8_data  : out std_logic_vector(7 DOWNTO 0));
end my_buffer;

architecture rtl of my_buffer is
begin  -- rtl

  -- add custom logic here

end rtl;

This entity can then be instantiated in the architecture of xillydemo with:

my_buffer_1: my_buffer
  port map (
    bus_clk             => bus_clk,
    reset_8             => reset_8,
    user_w_write_8_wren => user_w_write_8_wren,
    user_w_write_8_full => user_w_write_8_full,
    user_w_write_8_data => user_w_write_8_data,
    user_r_read_8_rden  => user_r_read_8_rden,
    user_r_read_8_empty => user_r_read_8_empty,
    user_r_read_8_data  => user_r_read_8_data);

-- these lines must be preserved in the XillyDemo
reset_8 <= not (user_w_write_8_open or user_r_read_8_open);
user_r_read_8_eof <= user_r_read_8_empty and not(user_w_write_8_open);

Before you synthesize xillydemo, you should check your implementation of my_buffer by simulation. Create a new file (e.g. my_buffer_tb.vhdl) and start with the following implementation of a testbench. This testbench provides the inputs from the XillyBus for each clock cycle. You can then check the response from your entity (instantiated as DUT) and compare it to the XillyBus documentation. I have already added the input assignments for the first two clock cycles. Just continue the same way.

library ieee;
use ieee.std_logic_1164.all;

entity my_buffer_tb is
end my_buffer_tb;

architecture sim of my_buffer_tb is
  signal bus_clk             : std_logic := '1';
  signal reset_8             : std_logic;
  signal user_w_write_8_wren : std_logic;
  signal user_w_write_8_full : std_logic;
  signal user_w_write_8_data : std_logic_vector(7 DOWNTO 0);
  signal user_r_read_8_rden  : std_logic;
  signal user_r_read_8_empty : std_logic;
  signal user_r_read_8_data  : std_logic_vector(7 DOWNTO 0);
begin
  -- component instantiation
  DUT: entity work.my_buffer
    port map (
      bus_clk             => bus_clk,
      reset_8             => reset_8,
      user_w_write_8_wren => user_w_write_8_wren,
      user_w_write_8_full => user_w_write_8_full,
      user_w_write_8_data => user_w_write_8_data,
      user_r_read_8_rden  => user_r_read_8_rden,
      user_r_read_8_empty => user_r_read_8_empty,
      user_r_read_8_data  => user_r_read_8_data);

  -- clock generation
  bus_clk <= not bus_clk after 5 ns;

  -- waveform generation
  WaveGen_Proc: process
  begin
    -- Input values sampled by the DUT with the first rising edge of bus_clk
    reset_8 <= '1';                     -- apply reset
    -- other input values don't care during reset
    wait until rising_edge(bus_clk);

    -- Input values sampled by DUT with second rising edge of bus_clk
    reset_8 <= '0';
    user_w_write_8_wren <= '0';
    user_w_write_8_data <= (others => '-');
    user_r_read_8_rden  <= '0';
    wait until rising_edge(bus_clk);

    -- Add input assignments for next rising edge here

    -- finished
    wait;
  end process WaveGen_Proc;
end sim;
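
As a rough sketch of how the skeleton above could be continued, the following testbench holds reset for several cycles, inserts idle cycles between every transfer, and drives only the buffer's inputs (wren, data, rden). It assumes that a complete my_buffer with the port list shown above is compiled into work and that read data is expected one clock cycle after rden is high. The testbench name my_buffer_idle_tb, the helper procedure idle, the idle-cycle counts and the byte values are illustrative choices, not taken from the XillyBus documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity my_buffer_idle_tb is  -- hypothetical testbench name
end my_buffer_idle_tb;

architecture sim of my_buffer_idle_tb is
  signal bus_clk             : std_logic := '1';
  signal reset_8             : std_logic := '1';
  signal user_w_write_8_wren : std_logic := '0';
  signal user_w_write_8_full : std_logic;
  signal user_w_write_8_data : std_logic_vector(7 downto 0) := (others => '0');
  signal user_r_read_8_rden  : std_logic := '0';
  signal user_r_read_8_empty : std_logic;
  signal user_r_read_8_data  : std_logic_vector(7 downto 0);
begin
  -- Assumes a complete my_buffer entity has been compiled into library work.
  DUT: entity work.my_buffer
    port map (
      bus_clk             => bus_clk,
      reset_8             => reset_8,
      user_w_write_8_wren => user_w_write_8_wren,
      user_w_write_8_full => user_w_write_8_full,
      user_w_write_8_data => user_w_write_8_data,
      user_r_read_8_rden  => user_r_read_8_rden,
      user_r_read_8_empty => user_r_read_8_empty,
      user_r_read_8_data  => user_r_read_8_data);

  bus_clk <= not bus_clk after 5 ns;

  WaveGen_Proc: process
    -- Let n rising edges pass without changing any input (idle cycles).
    procedure idle (n : positive) is
    begin
      for i in 1 to n loop
        wait until rising_edge(bus_clk);
      end loop;
    end procedure idle;
  begin
    reset_8 <= '1';                       -- hold reset for several cycles
    idle(3);
    reset_8 <= '0';
    idle(2);                              -- idle cycles before the first write

    assert user_w_write_8_full = '0' report "full still high before first write" severity error;
    user_w_write_8_wren <= '1';           -- first byte
    user_w_write_8_data <= "00000011";
    idle(1);
    user_w_write_8_wren <= '0';
    idle(2);                              -- idle cycles between the two bytes

    assert user_w_write_8_full = '0' report "full high before second write" severity error;
    user_w_write_8_wren <= '1';           -- second byte
    user_w_write_8_data <= "00000100";
    idle(1);
    user_w_write_8_wren <= '0';
    idle(2);                              -- idle cycles between writing and reading

    assert user_r_read_8_empty = '0' report "empty high before first read" severity error;
    user_r_read_8_rden <= '1';            -- request the first byte
    idle(1);
    user_r_read_8_rden <= '0';
    wait until falling_edge(bus_clk);     -- sample mid-cycle, after the DUT has updated
    assert user_r_read_8_data = "00000011" report "first byte mismatch" severity error;
    idle(2);

    assert user_r_read_8_empty = '0' report "empty high before second read" severity error;
    user_r_read_8_rden <= '1';            -- request the second byte
    idle(1);
    user_r_read_8_rden <= '0';
    wait until falling_edge(bus_clk);
    assert user_r_read_8_data = "00000100" report "second byte mismatch" severity error;
    assert user_r_read_8_empty = '1' report "empty not set after last read" severity error;

    report "stimulus finished";
    wait;
  end process WaveGen_Proc;
end sim;
```

Because the clock generator never stops, the simulation should be run for a bounded time (a few hundred nanoseconds is enough for this stimulus). Varying the arguments of idle is a cheap way to check that the buffer does not depend on back-to-back transfers.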

After the question was modified:

1) If you implement the full signal with a register, then initialize it to low. When XillyBus writes the first value, keep it low. When XillyBus writes the second one, set it to high (at the rising clock edge) because your buffer will not accept values in the subsequent counter states. When returning to counter 0, also set the full register back to low. You can ignore wren high from XillyBus while your buffer outputs full = high.

2) The rden signal is driven by XillyBus, not by your buffer. It is not driven high until your buffer signals empty low (see below). When rden is high, your buffer must present the data in the next clock cycle.

3) The empty signal, when low, indicates that data is available from your buffer, so that XillyBus can read it (which XillyBus indicates by driving rden high).

Hints regarding "Edit 2" of question:

First, the testbench is wrong because you assigned the empty output instead of the rden input of your buffer in the testbench.

Second, the testbench is too simple to detect the errors in your implementation of my_buffer. reset_8 is applied for only one cycle, but it might be applied for many cycles. Moreover, your testbench does not include enough idle cycles, where both rden and wren are low. There can and will be an arbitrary number of idle cycles before the first write (CPU->FPGA), between the first and second byte, and also between writing and reading and vice versa.

Thus, the testbench did not reveal that you forgot to check for reset, rden and wren in the implementation of my_buffer. To give you a hint, the process in my_buffer should start like this:

process (bus_clk)
begin 
   if (bus_clk'event and bus_clk = '1') then
      if reset_8 = '1' then
         -- reset has highest priority, initialize all state registers
         user_r_read_8_empty <= '1';  -- no data available
         user_w_write_8_full <= '1';  -- not ready yet
         counter <= 0;                -- counter starts at 0

      elsif (counter = 0) then
         -- just signal that we are ready for accepting data in the following cycle  
         user_w_write_8_full <= '0';
         counter <= 1;

      elsif (counter = 1) then
         -- wait for first byte!
         if user_w_write_8_wren = '1' then
            tmp_1 <= user_w_write_8_data; -- store the first char
            -- user_w_write_8_full is kept low, thus, we are ready for accepting more data
            counter <= 2;
         end if;

      elsif (counter = 2) then
         -- wait for second byte!
         if user_w_write_8_wren = '1' then
            tmp_2 <= user_w_write_8_data; -- store the second char
            user_w_write_8_full <= '1';   -- now our data buffers are full
            user_r_read_8_empty <= '0';  -- we are ready for reading in the next cycle
            counter <= 3;
         end if;

      -- and so on

In the testbench, at least one idle cycle is required after reset because full is initialized to high and goes low one cycle later.

Also note that I have removed unnecessary register assignments. Registers keep their values if they are not assigned a new value in the current clock cycle.
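
The process fragment above stops at "-- and so on". As a minimal sketch, the two remaining read states could be filled in along the following lines. It assumes, as stated in point 2, that the buffer must present data in the cycle after rden is high, and that empty should go high at the clock edge where the last byte is handed over. The architecture name rtl_sketch and the integer range of counter are illustrative choices, and the sketch has not been verified on hardware.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity my_buffer is
  port (
    bus_clk             : in  std_logic;
    reset_8             : in  std_logic;
    -- Data from CPU via XillyBus
    user_w_write_8_wren : in  std_logic;
    user_w_write_8_full : out std_logic;
    user_w_write_8_data : in  std_logic_vector(7 downto 0);
    -- Data to CPU via XillyBus
    user_r_read_8_rden  : in  std_logic;
    user_r_read_8_empty : out std_logic;
    user_r_read_8_data  : out std_logic_vector(7 downto 0));
end my_buffer;

architecture rtl_sketch of my_buffer is  -- hypothetical architecture name
  signal tmp_1   : std_logic_vector(7 downto 0);  -- first stored char
  signal tmp_2   : std_logic_vector(7 downto 0);  -- second stored char
  signal counter : integer range 0 to 4 := 0;     -- state of the two-byte cycle
begin
  process (bus_clk)
  begin
    if rising_edge(bus_clk) then
      if reset_8 = '1' then
        -- reset has highest priority
        user_r_read_8_empty <= '1';  -- no data available
        user_w_write_8_full <= '1';  -- not ready yet
        counter <= 0;
      elsif counter = 0 then
        user_w_write_8_full <= '0';  -- ready to accept data from the next cycle on
        counter <= 1;
      elsif counter = 1 then
        if user_w_write_8_wren = '1' then
          tmp_1 <= user_w_write_8_data;  -- store the first char
          counter <= 2;
        end if;
      elsif counter = 2 then
        if user_w_write_8_wren = '1' then
          tmp_2 <= user_w_write_8_data;  -- store the second char
          user_w_write_8_full <= '1';    -- both registers are occupied
          user_r_read_8_empty <= '0';    -- data can be read from the next cycle on
          counter <= 3;
        end if;
      elsif counter = 3 then
        -- wait for the first read request
        if user_r_read_8_rden = '1' then
          user_r_read_8_data <= tmp_1;   -- visible in the cycle after rden
          counter <= 4;
        end if;
      else
        -- counter = 4: wait for the second read request
        if user_r_read_8_rden = '1' then
          user_r_read_8_data <= tmp_2;
          user_r_read_8_empty <= '1';    -- nothing left to read
          counter <= 0;                  -- state 0 re-opens the write side
        end if;
      end if;
    end if;
  end process;
end rtl_sketch;
```

Each state assigns only the registers that change, relying on the others to hold their values. The outputs are undefined in simulation until reset_8 has been high for at least one rising edge, so a testbench for this sketch should begin with a reset.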