Electronic – Copy data into read_device_file after EOF from input text [Xillybus – VHDL]

fpga vhdl

I have developed an application that is able to:
1. get and store 8 input values (8 characters)
2. swap the case (upper to lower or vice versa) of the first character only
3. output the eight characters

E.g.: The input string is "hello this is" the application should:

  • get "h" from user_w_write_8_data, swap "h" into "H" and store "H" into tmp_1
  • get "e" from user_w_write_8_data and store it into tmp_2
  • get "l" from user_w_write_8_data and store it into tmp_3
  • get "l" from user_w_write_8_data and store it into tmp_4
  • get "o" from user_w_write_8_data and store it into tmp_5
  • get " " from user_w_write_8_data and store it into tmp_6
  • get "t" from user_w_write_8_data and store it into tmp_7
  • get "h" from user_w_write_8_data and store it into tmp_8
  • copy data from tmp_1 to user_r_read_8_data
  • copy data from tmp_2 to user_r_read_8_data
  • copy data from tmp_3 to user_r_read_8_data
  • copy data from tmp_4 to user_r_read_8_data
  • copy data from tmp_5 to user_r_read_8_data
  • copy data from tmp_6 to user_r_read_8_data
  • copy data from tmp_7 to user_r_read_8_data
  • copy data from tmp_8 to user_r_read_8_data

Then

  • get "i" from user_w_write_8_data, swap "i" into "I" and store "I" into tmp_1
  • get "s" from user_w_write_8_data and store it into tmp_2
  • get " " from user_w_write_8_data and store it into tmp_3
  • get "i" from user_w_write_8_data and store it into tmp_4
  • get "s" from user_w_write_8_data and store it into tmp_5
  • copy data from tmp_1 to user_r_read_8_data
  • copy data from tmp_2 to user_r_read_8_data
  • copy data from tmp_3 to user_r_read_8_data
  • copy data from tmp_4 to user_r_read_8_data
  • copy data from tmp_5 to user_r_read_8_data

Basically, the application should store 8 characters, swap the first char and then output these eight characters, store other eight characters, swap the first char and then output these eight characters etc. until it reads EOF at the end of the input file.

PROBLEM: my application works if the number of characters in the input text is a multiple of 8 (8, 16, 24, 32 etc. characters). It fails when the number of characters in the input file is not a multiple of 8.

E.g.: if the input string is "hello this is" (13 characters), the application correctly copies the first 8 characters ("hello th"), but it does not write the 5 remaining characters ("is is") into read_device_file.

This problem occurs because once the input file reaches EOF, the application can no longer increment the counter signal (because user_w_write_8_wren = '0'). To work around this, I use the check variable. More precisely, when the input file reaches EOF the application:

  • sets the check variable to 1
  • when check is 1, sets user_r_read_8_empty to '0' (meaning that data is available in the next clock cycle) and sets check to 2
  • when check is 2, should output the remaining characters ("is is" in the previous example)

The tmp_counter signal is used to copy only the remaining characters that are actually needed. E.g. if the remaining characters are "is is" (five characters), tmp_counter ensures that only tmp_1, tmp_2, tmp_3, tmp_4 and tmp_5 are copied into user_r_read_8_data, and not tmp_6, tmp_7 and tmp_8.

In simulation with the test bench the application works fine, but when I run it on the FPGA it does not copy the remaining characters. What am I doing wrong?

EDIT 1: my_buffer.vhd is the file in which I implemented the logic described above. I have improved the logic: instead of using EOF, I check for a full stop, as suggested in the answer (I have removed the case conversion because of the post length limit):

library ieee;
use ieee.std_logic_1164.all;

entity my_buffer is
  port (
bus_clk             : in  std_logic;
reset_8             : in  std_logic;
-- Data from CPU via XillyBus
user_w_write_8_wren : in  std_logic;
user_w_write_8_full : out std_logic;
user_w_write_8_data : in  std_logic_vector(7 DOWNTO 0);
-- Data to CPU via XillyBus
user_r_read_8_rden  : in  std_logic;
user_r_read_8_empty : out std_logic;
user_r_read_8_data  : out std_logic_vector(7 DOWNTO 0));
end my_buffer;

architecture rtl of my_buffer is

signal tmp_1 : std_logic_vector(7 downto 0); --tmp for the first char
signal tmp_2 : std_logic_vector(7 downto 0); --tmp for the second char
signal tmp_3 : std_logic_vector(7 downto 0); --tmp for the third char
signal tmp_4 : std_logic_vector(7 downto 0); --tmp for the fourth char
signal tmp_5 : std_logic_vector(7 downto 0); --tmp for the fifth char
signal tmp_6 : std_logic_vector(7 downto 0); --tmp for the sixth char
signal tmp_7 : std_logic_vector(7 downto 0); --tmp for the seventh char
signal tmp_8 : std_logic_vector(7 downto 0); --tmp for the eighth char

signal counter : integer := 0; --counter used to control the if else statements
shared variable check : integer := 0;
signal tmp_counter : integer := -1;

begin 

process (bus_clk)
begin 
   if (bus_clk'event and bus_clk = '1') then

      if user_w_write_8_wren = '1' then
          if (user_w_write_8_data="00101110" and check=0 and counter<9) then --EOT:00000100, .:00101110
              user_r_read_8_empty <= '0'; --data available in the next clock cycle 
              user_w_write_8_full <= '1';   -- now our data buffers are full
              tmp_counter <= 1; --used to copy only the necessary remaining chars. It is incremented by 1 on every read.
              check := 1; --force the app to go into elsif statement down here
          end if;
      end if;

      if (reset_8 = '1') then
         -- reset has highest priority, initialize all state registers
         user_r_read_8_empty <= '1';  -- no data available
         user_w_write_8_full <= '1';  -- not ready yet
         counter <= 0;                -- counter starts at 0             

      elsif (counter = 0) then
         -- just signal that we are ready for accepting data in the following cycle  
         user_w_write_8_full <= '0';
         counter <= 1;


      -- 1<=counter<=8 application copies data from user_w_write_8_data to tmp (swapping the first char) 
      elsif (counter = 1 and check=0) then
         -- wait for first byte!
         if user_w_write_8_wren = '1' then
            tmp_1 <= user_w_write_8_data; -- store the first char

            --From upper case to lower case                               
            --From lower case to upper case


            -- user_w_write_8_full is kept low, thus, we are ready for accepting more data
            counter <= 2;
         end if;

      elsif (counter = 2 and check=0) then
         -- wait for second byte!
         if user_w_write_8_wren = '1' then
            tmp_2 <= user_w_write_8_data; -- store the second char
            counter <= 3;
         end if;

      elsif (counter = 3 and check=0) then
         -- wait for third byte!
         if user_w_write_8_wren = '1' then
            tmp_3 <= user_w_write_8_data; -- store the third char
            counter <= 4;
         end if;

      elsif (counter = 4 and check=0) then
         -- wait for fourth byte!
         if user_w_write_8_wren = '1' then
            tmp_4 <= user_w_write_8_data; -- store the fourth char
            counter <= 5;
         end if;

      elsif (counter = 5 and check=0) then
         -- wait for fifth byte!
         if user_w_write_8_wren = '1' then
            tmp_5 <= user_w_write_8_data; -- store the fifth char
            counter <= 6;
         end if;

      elsif (counter = 6 and check=0) then
         -- wait for sixth byte!
         if user_w_write_8_wren = '1' then
            tmp_6 <= user_w_write_8_data; -- store the sixth char
            counter <= 7;
         end if;    

      elsif (counter = 7 and check=0) then
            -- wait for seventh byte!
            if user_w_write_8_wren = '1' then
               tmp_7 <= user_w_write_8_data; -- store the seventh char
               counter <= 8;
            end if;             

      elsif (counter = 8 and check=0) then
         -- wait for eighth byte!
         if user_w_write_8_wren = '1' then
            tmp_8 <= user_w_write_8_data; -- store the eighth char
            user_w_write_8_full <= '1';   -- now our data buffers are full
            user_r_read_8_empty <= '0';  -- we are ready for reading in the next cycle
            counter <= 9;
         end if;          

      -- 9<=counter<=16 application copies data from tmp to user_r_read_8_data                
      elsif (counter = 9 and check=0) then
        --pull out tmp_1
        if user_r_read_8_rden = '1' then 
            user_r_read_8_data <= tmp_1;
            counter <= 10;
        end if;

      elsif (counter = 10 and check=0) then
        --pull out tmp_2
        if user_r_read_8_rden = '1' then 
            user_r_read_8_data <= tmp_2;
            counter <= 11;
        end if;

      elsif (counter = 11 and check=0) then
          --pull out tmp_3
          if user_r_read_8_rden = '1' then 
              user_r_read_8_data <= tmp_3;
              counter <= 12;
          end if;            

      elsif (counter = 12 and check=0) then
          --pull out tmp_4
          if user_r_read_8_rden = '1' then 
             user_r_read_8_data <= tmp_4;
             counter <= 13;
          end if;      

      elsif (counter = 13 and check=0) then
          --pull out tmp_5
          if user_r_read_8_rden = '1' then 
             user_r_read_8_data <= tmp_5;
             counter <= 14;
          end if;      

      elsif (counter = 14 and check=0) then
          --pull out tmp_6
          if user_r_read_8_rden = '1' then 
             user_r_read_8_data <= tmp_6;
             counter <= 15;
          end if;      

      elsif (counter = 15 and check=0) then
          --pull out tmp_7
          if user_r_read_8_rden = '1' then 
             user_r_read_8_data <= tmp_7;
             counter <= 16;
          end if;      

      elsif (counter = 16 and check=0) then
        --pull out tmp_8
        if user_r_read_8_rden = '1' then 
            user_r_read_8_data <= tmp_8;
            user_w_write_8_full <= '0';
            user_r_read_8_empty <= '1';
            counter <= 1;
        end if;  
      end if;

    --when the application has reached EOF but has not yet copied the remaining characters
    if  (check = 1 and (tmp_counter<counter)) then --application is ready for copying the remaining chars
          if (tmp_counter=1 and (tmp_counter<(counter-1))) then --there data into tmp_1
              if user_r_read_8_rden = '1' then 
                 user_r_read_8_data <= tmp_1;
                 tmp_counter <= 2;
              end if;

          elsif (tmp_counter=2 and (tmp_counter<counter)) then --there data into tmp_2
              if user_r_read_8_rden = '1' then 
                 user_r_read_8_data <= tmp_2;
                 tmp_counter <= 3;
              end if;              

          elsif (tmp_counter=3 and (tmp_counter<counter)) then --there data into tmp_3
              if user_r_read_8_rden = '1' then 
                 user_r_read_8_data <= tmp_3;
                 tmp_counter <= 4;
              end if;

          elsif (tmp_counter=4 and (tmp_counter<counter)) then --there data into tmp_4
              if user_r_read_8_rden = '1' then 
                  user_r_read_8_data <= tmp_4;
                  tmp_counter <= 5;
              end if;  

          elsif (tmp_counter=5 and (tmp_counter<counter)) then --there data into tmp_5
              if user_r_read_8_rden = '1' then 
                  user_r_read_8_data <= tmp_5;
                  tmp_counter <= 6;
              end if;

          elsif (tmp_counter=6 and (tmp_counter<counter)) then --there data into tmp_6
              if user_r_read_8_rden = '1' then 
                  user_r_read_8_data <= tmp_6;
                  tmp_counter <= 7;
              end if;                  

          elsif (tmp_counter=7 and (tmp_counter<counter)) then --there data into tmp_7
              if user_r_read_8_rden = '1' then 
                  user_r_read_8_data <= tmp_7;
                  tmp_counter <= 8;
              end if;

          elsif (tmp_counter=8 and (tmp_counter<counter)) then --there data into tmp_8
              if user_r_read_8_rden = '1' then 
                  user_r_read_8_data <= tmp_8;
                  tmp_counter <= 9;
              end if;
          end if;
     end if;   

     if (tmp_counter = counter) then
         user_r_read_8_empty <= '1'; --no data available in the next clock cycle 
         user_w_write_8_full <= '1';   -- now our data buffers are full
         tmp_counter <= -1;
     end if;                                 

   end if;
end process;
end rtl;

Here is xillydemo.vhd. Because of the StackExchange body limit (30000 characters), I show only the my_buffer component that I declared in xillydemo.vhd:

--xillybus code here

component my_buffer
port (
  bus_clk: IN std_logic;
  reset_8: IN std_logic;
  user_w_write_8_wren: IN std_logic;
  user_w_write_8_full: OUT std_logic;      
  user_w_write_8_data: IN std_logic_VECTOR(7 downto 0);
  user_r_read_8_rden: IN std_logic;
  user_r_read_8_empty: OUT std_logic;
  user_r_read_8_data: OUT std_logic_VECTOR(7 downto 0)
  );
end component;

--xillybus code here  

my_buffer_1: my_buffer
  port map (
bus_clk             => bus_clk,
reset_8             => reset_8,
user_w_write_8_wren => user_w_write_8_wren,
user_w_write_8_full => user_w_write_8_full,
user_w_write_8_data => user_w_write_8_data,
user_r_read_8_rden  => user_r_read_8_rden,
user_r_read_8_empty => user_r_read_8_empty,
user_r_read_8_data  => user_r_read_8_data);

-- these lines must be preserved in the XillyDemo
reset_8 <= not (user_w_write_8_open or user_r_read_8_open);
user_r_read_8_eof <= user_r_read_8_empty and not(user_w_write_8_open);

--xillybus code here

EDIT 2: This is the test bench that I use to run the simulation as suggested in the answer:

library ieee;
use ieee.std_logic_1164.all;

entity my_buffer_tb is
end my_buffer_tb;

architecture sim of my_buffer_tb is
  signal bus_clk             : std_logic := '1';
  signal reset_8             : std_logic;
  signal user_w_write_8_wren : std_logic;
  signal user_w_write_8_full : std_logic;
  signal user_w_write_8_data : std_logic_vector(7 DOWNTO 0);
  signal user_r_read_8_rden  : std_logic;
  signal user_r_read_8_empty : std_logic;
  signal user_r_read_8_data  : std_logic_vector(7 DOWNTO 0);
begin
  -- component instantiation
  DUT: entity work.my_buffer
port map (
  bus_clk             => bus_clk,
  reset_8             => reset_8,
  user_w_write_8_wren => user_w_write_8_wren,
  user_w_write_8_full => user_w_write_8_full,
  user_w_write_8_data => user_w_write_8_data,
  user_r_read_8_rden  => user_r_read_8_rden,
  user_r_read_8_empty => user_r_read_8_empty,
  user_r_read_8_data  => user_r_read_8_data);

  -- clock generation
  bus_clk <= not bus_clk after 5 ns;

  -- waveform generation
  WaveGen_Proc: process
  begin
-- Input values sampled by the DUT with the first rising edge of bus_clk
reset_8 <= '1';                     -- apply reset
-- other input values don't care during reset
wait until rising_edge(bus_clk);

-- Input values sampled by DUT with second rising edge of bus_clk
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_w_write_8_data <= (others => '-');
user_r_read_8_rden  <= '0';
wait until rising_edge(bus_clk);

--FIRST BUFFER

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001000"; --H
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01000101"; --E
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001100"; --L
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001100"; --L
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001111"; --O
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "00100000"; -- 
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);    

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01010100"; --T
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

 -- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001000"; --H
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read H   
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read E
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read L   
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
  reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read L   
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read O
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read 
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read T
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '0';
user_r_read_8_rden <= '1'; --read H
wait until rising_edge(bus_clk);


--SECOND BUFFER   
-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001001"; --I
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01010011"; --S
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);   

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "00100000"; -- 
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);    

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001001"; --I
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01010011"; --S
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);   

-- Add input assigmnents for next rising edge here
reset_8 <= '0';
user_w_write_8_wren <= '1';
user_w_write_8_data <= "00101110"; --.
user_r_read_8_rden <= '0';    
wait until rising_edge(bus_clk);     

-- AFTER EOT
user_w_write_8_wren <= '0';
user_w_write_8_data <= (others => '-');


 -- Add input assigmnents for next rising edge here
 reset_8 <= '0';
 user_w_write_8_wren <= '0';
 user_r_read_8_rden <= '1'; --read I
 wait until rising_edge(bus_clk);

 -- Add input assigmnents for next rising edge here
 reset_8 <= '0';
 user_w_write_8_wren <= '0';
 user_r_read_8_rden <= '1'; --read S
 wait until rising_edge(bus_clk);

 -- Add input assigmnents for next rising edge here
 reset_8 <= '0';
 user_w_write_8_wren <= '0';
 user_r_read_8_rden <= '1'; --read ' '
 wait until rising_edge(bus_clk);

 -- Add input assigmnents for next rising edge here
 reset_8 <= '0';
 user_w_write_8_wren <= '0';
 user_r_read_8_rden <= '1'; --read I
 wait until rising_edge(bus_clk);

 -- Add input assigmnents for next rising edge here
 reset_8 <= '0';
 user_w_write_8_wren <= '0';
 user_r_read_8_rden <= '1'; --read S
 wait until rising_edge(bus_clk);

 -- Add input assigmnents for next rising edge here
 reset_8 <= '0';
 user_w_write_8_wren <= '0';
 user_r_read_8_rden <= '1'; --read .
 wait until rising_edge(bus_clk);

 -- no more data to read
 user_r_read_8_rden <= '0';      

-- finished
wait;
  end process WaveGen_Proc;
end sim;

SIMULATION OUTPUT OK BUT PROBLEMS DURING THE EXECUTION ON FPGA: the attached screenshot shows that the simulation works. In particular, the program outputs the correct characters.
The problem is that the program does not work when I execute it on the FPGA. More precisely, I developed a Java application that copies the input string to the write device file; the FPGA processes the input string, and as soon as data is available on the read device file, the Java application prints it on the terminal. I have tested the Java application and it works perfectly. The problems that occur when I execute this program on the FPGA are:

  1. I can execute this program only 3 times. After the third time, the program no longer outputs data. To work around this, I have to program the FPGA again, but then the program again works only 3 times.
  2. During the 3 times that the program works, the VHDL application outputs the last character twice (as in my previous screenshot)

[Screenshot: simulation output]

The screenshot shows that the simulation works, but I don't understand why the 2 problems described above occur when I execute this program on the FPGA. How is this possible?

Best Answer

Testbench Fixes

I have played with your testbench and found some minor problems. First, the posted testbench only inputs "HELLO THIS" followed by an EOT character to the component my_buffer. It also fails to set user_w_write_8_wren low at 210 ns and user_r_read_8_rden low at 240 ns, as can be seen in this simulator output:

[Figure: simulator output of the original testbench]

I have fixed this, by also inputting the missing " IS":

reset_8 <= '0';
user_r_read_8_rden <= '0';    
user_w_write_8_wren <= '1';
user_w_write_8_data <= "00100000"; -- ' '
wait until rising_edge(bus_clk);
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01001001"; -- 'I'
wait until rising_edge(bus_clk);
user_w_write_8_wren <= '1';
user_w_write_8_data <= "01010011"; -- 'S'
wait until rising_edge(bus_clk);   

as well as set user_w_write_8_wren low after the EOT character:

-- already there
user_w_write_8_data <= "00000100"; --EOT
wait until rising_edge(bus_clk);     
-- new code
user_w_write_8_wren <= '0';
user_w_write_8_data <= (others => '-');

Fixing user_r_read_8_rden is a little more complex. Once reading has started, XillyBus will keep this signal high until your component my_buffer signals user_r_read_8_empty high. To mimic this, I have appended one more read cycle before setting rden low:

--Read until "empty" goes high
reset_8 <= '0';
user_r_read_8_rden <= '1';    
wait until rising_edge(bus_clk);    
-- no more data to read
user_r_read_8_rden <= '0';    

The simulation output is now:

[Figure: simulation output after the bugfix]

As you can see, XillyBus will actually read 6 characters ("IS ISS") instead of the expected 5 characters ("IS IS").
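
The hand-written read cycles can also be wrapped in a small testbench helper. The following is only a sketch: the package name tb_read_pkg and the procedure read_until_empty are made up for illustration, and it models the read side the way this answer describes it (rden is held high until empty is seen high), not the exact timing of the real XillyBus core.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench helper, simulation only (not synthesizable).
package tb_read_pkg is
  procedure read_until_empty (
    signal clk   : in  std_logic;
    signal empty : in  std_logic;
    signal rden  : out std_logic);
end package tb_read_pkg;

package body tb_read_pkg is
  procedure read_until_empty (
    signal clk   : in  std_logic;
    signal empty : in  std_logic;
    signal rden  : out std_logic) is
  begin
    rden <= '1';                        -- request data
    wait until rising_edge(clk);
    -- The value of empty seen here is the one that was valid just
    -- before the clock edge, so rden stays high for one more clock
    -- after the DUT raises empty.
    while empty = '0' loop
      wait until rising_edge(clk);
    end loop;
    rden <= '0';                        -- no more data to read
  end procedure read_until_empty;
end package body tb_read_pkg;
```

After adding "use work.tb_read_pkg.all;" to the testbench, the read cycles in WaveGen_Proc could be replaced by read_until_empty(bus_clk, user_r_read_8_empty, user_r_read_8_rden);. The call has to be made after the clock edge at which the DUT lowers empty, otherwise the loop exits immediately.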

EOF Detection

I have even inserted some wait cycles into the testbench, but this does not provoke additional errors. Thus, I still believe that the application running on the CPU does not send the EOT character. This is also indicated in your question:

[...] until it reads EOF at the end of the input file.

More precisely, when the input file reaches EOF the application [...]

EOF is not a character; it is a condition which is signaled by the OS when the application running on the CPU tries to read beyond the end of the input file. Thus, when you just cat your input file to the XillyBus device file, no EOF, NUL or EOT will be sent to the FPGA. You have to send such a character explicitly, as the testbench does.

If you are still confused, select the dot ('.') as the terminating character and change the detection to:

  if user_w_write_8_wren = '1' then
      if (user_w_write_8_data="00101110" and check=0) then
          check := 1;
      end if;
  end if;

Now send "HELLO THIS IS." to the FPGA, don't forget the '.' at the end.
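
In simulation, the same input including the terminating '.' can be driven from a string instead of one hand-written block per character. This is a minimal sketch: the package tb_write_pkg and the procedure send_string are hypothetical names, and like the testbench in the question it does not look at user_w_write_8_full, so each call should send no more than one group of 8 characters.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench helper, simulation only (not synthesizable).
package tb_write_pkg is
  procedure send_string (
    constant msg  : in  string;
    signal   clk  : in  std_logic;
    signal   wren : out std_logic;
    signal   data : out std_logic_vector(7 downto 0));
end package tb_write_pkg;

package body tb_write_pkg is
  procedure send_string (
    constant msg  : in  string;
    signal   clk  : in  std_logic;
    signal   wren : out std_logic;
    signal   data : out std_logic_vector(7 downto 0)) is
  begin
    for i in msg'range loop
      wren <= '1';
      -- character'pos gives the character code, e.g. 46 for '.'
      data <= std_logic_vector(to_unsigned(character'pos(msg(i)), 8));
      wait until rising_edge(clk);      -- one character per clock
    end loop;
    wren <= '0';                        -- stop writing after the last one
    data <= (others => '-');
  end procedure send_string;
end package body tb_write_pkg;
```

With "use work.tb_write_pkg.all;" in the testbench, send_string("HELLO TH", bus_clk, user_w_write_8_wren, user_w_write_8_data); sends the first group. After the 8 characters have been read back, send_string("IS IS.", bus_clk, user_w_write_8_wren, user_w_write_8_data); sends the partial group with its terminator.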

Removal of Shared Variable

Please remove the shared variable check, as it is often not synthesized as intended. Instead, change the declaration to a signal. Then remove this code block at the beginning of the process:

  if user_w_write_8_wren = '1' then
      if (user_w_write_8_data="00101110" and check=0 and counter<9) then --EOT:00000100, .:00101110
          user_r_read_8_empty <= '0'; --data available in the next clock cycle 
          user_w_write_8_full <= '1';   -- now our data buffers are full
          tmp_counter <= 1; --used to copy only the necessary remaing char. During every clock cycle it is decremented of 1 unit.
          check := 1; --force the app to go into elsif statement down here
      end if;
  end if;

and insert it into each block for the counter values 1 to 8 as follows. That is, check for the terminating full stop in each counter state. The assignment to check must be changed to use <=. The original code for each counter value goes into the else branch:

  elsif (counter = 1 and check=0) then
     -- wait for first byte!
     if user_w_write_8_wren = '1' then
         if (user_w_write_8_data="00101110") then -- full dot
             user_r_read_8_empty <= '0'; --data available in the next clock cycle 
             user_w_write_8_full <= '1';   -- now our data buffers are full
             tmp_counter <= 1; --used to copy only the necessary remaing char. During every clock cycle it is decremented of 1 unit.
             check <= 1; --force the app to go into elsif statement down here
         else
             tmp_1 <= user_w_write_8_data; -- store the first char
             --From upper case to lower case                               
             --From lower case to upper case
             counter <= 2;
         end if;
     end if;

Repeat this for counter states 2 to 8, and don't forget to update the tmp suffix and the following counter state when copying and pasting.

Note that the assignment check <= 1; does not take effect until the next clock cycle. Thus, the code block at

if  (check = 1 and (tmp_counter<counter)) then --application is ready for coping the remaing char

is executed for the first time in the next clock cycle.

Notes Regarding "EDIT 2" of Question

To solve problem 1, you must reset check to 0 within the if (reset_8 = '1') then block.

Regarding problem 2: you set empty high one cycle too late. Your updated simulation shows that the last "S" is read twice. The transmitted data is always the data present in the cycle after rden is high. For example, the rden from 240 ns to 250 ns requests data that must be present on read_8_data from 250 ns to 260 ns. Thus, this rden reads 'i'. The rden from 280 ns to 290 ns reads 'S'. The rden from 290 ns to 300 ns reads another 'S'. Thus, XillyBus reads "iS ISS" as shown in your simulation output.
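
The points above (signals only, terminator checked while filling, all state cleared on reset, empty raised together with the last byte) can be combined in a compact variant. This is a sketch under assumptions, not a drop-in replacement: it swaps tmp_1 to tmp_8 for an array, the names my_buffer_sketch, TERMINATOR, fill, rd_pos and draining are invented for illustration, the '.' is consumed rather than forwarded, the case swap is left out, and the full/empty handshake timing simply mirrors the code in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity my_buffer_sketch is
  port (
    bus_clk             : in  std_logic;
    reset_8             : in  std_logic;
    user_w_write_8_wren : in  std_logic;
    user_w_write_8_full : out std_logic;
    user_w_write_8_data : in  std_logic_vector(7 downto 0);
    user_r_read_8_rden  : in  std_logic;
    user_r_read_8_empty : out std_logic;
    user_r_read_8_data  : out std_logic_vector(7 downto 0));
end entity my_buffer_sketch;

architecture rtl of my_buffer_sketch is
  constant TERMINATOR : std_logic_vector(7 downto 0) := "00101110";  -- '.'
  type byte_array is array (0 to 7) of std_logic_vector(7 downto 0);
  signal tmp      : byte_array;
  signal fill     : integer range 0 to 8 := 0;  -- bytes currently stored
  signal rd_pos   : integer range 0 to 7 := 0;  -- next byte to hand out
  signal draining : std_logic := '0';           -- '1' while emptying tmp
begin
  process (bus_clk)
  begin
    if rising_edge(bus_clk) then
      if reset_8 = '1' then
        -- every state register is cleared, so each run starts clean
        fill     <= 0;
        rd_pos   <= 0;
        draining <= '0';
        user_w_write_8_full <= '1';
        user_r_read_8_empty <= '1';
      elsif draining = '0' then
        user_w_write_8_full <= '0';             -- ready to accept data
        if user_w_write_8_wren = '1' then
          if user_w_write_8_data = TERMINATOR then
            if fill > 0 then                    -- partial group: flush it
              draining <= '1';
              user_w_write_8_full <= '1';
              user_r_read_8_empty <= '0';
            end if;
          else
            tmp(fill) <= user_w_write_8_data;
            if fill = 7 then                    -- eighth byte: group full
              fill     <= 8;
              draining <= '1';
              user_w_write_8_full <= '1';
              user_r_read_8_empty <= '0';
            else
              fill <= fill + 1;
            end if;
          end if;
        end if;
      elsif user_r_read_8_rden = '1' then       -- draining
        user_r_read_8_data <= tmp(rd_pos);
        if rd_pos + 1 = fill then
          -- last stored byte: raise empty in the same cycle,
          -- so it is not read a second time
          user_r_read_8_empty <= '1';
          user_w_write_8_full <= '0';
          draining <= '0';
          fill     <= 0;
          rd_pos   <= 0;
        else
          rd_pos <= rd_pos + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because fill records how many bytes were stored, the same read path serves both a full group of 8 and a shorter group ended by '.', which removes the need for a separate tmp_counter.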