Explorer

Simulating VHDL from exported HLS does not behave as expected


Hello,

I have several IPs designed in VHDL. For implementation and testing purposes, I am not using a block design at all, only pure VHDL. However, I need to incorporate one HLS IP into the architecture and keep testing it without resorting to a block design.

To test the behavior of the exported HLS IP, I included only the VHDL files generated by Vivado HLS and wrote a testbench in VHDL. The HLS code is described here in detail, as I originally thought the problem was in HLS. It receives a stream of bytes (via AXI Stream) that represent RGB images. I split the stream into its channels so that each one can be processed individually, and later regroup them into an AXI Stream frame in the same RGB order. In this case, as I just want to prove the methodology, there is no processing; the input bytes are simply copied to the output.

This is the simulation result I obtained using Vivado 2017.4:

Test_HLS_Simulation_2017_vhdl.png

Here the output does not match the input, although output = input is the expected behavior. I therefore created a block design, and I only managed to obtain the correct output by placing an AXI4 FIFO at the input and at the output of my HLS IP:

Screenshot from 2020-06-26 12-30-43.png

By using *the same testbench* as in the first test, I got the expected output (output = input with a delay):

Test_HLS_Simulation_2017_bd.png

To rule out a problem with an old Vivado version, I repeated the test with Vivado 2019.2 and 2020.1. The simulation with the block design matched the one shown above, i.e. the correct behavior. However, simulating the VHDL without a block design gave yet another result (again using the same VHDL module and testbench):

Test_HLS_Simulation_2019_vhdl.png

I could generate a complete block design with all the VHDL and HLS IPs. However, this is too time-consuming for debugging: every time something changes in the VHDL modules, I would need to edit each exported IP, generate output products and then simulate. By contrast, with all VHDL files directly in the Project Manager, I can make changes and rerun the simulation easily.

So, why don't I get the same, correct behavior when I add the VHDL files from an exported HLS IP directly to the simulation? I attached the testbenches for the simulations with and without a block design, and the top VHDL module instantiating the VHDL files from the HLS IP. The HLS files are in the thread linked at the beginning.

Thanks for the help.

Voyager

Re: Simulating VHDL from exported HLS does not behave as expected

@aripod 

The design is correct; the problem is in the test bench.
The stimulus signals must be delayed by at least one delta delay with respect to the clock.
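
One common way to obtain that delta delay, sketched here under assumptions rather than taken from the original test bench, is to drive the stimulus from a process that itself waits on `rising_edge` of the clock. The assignment then takes effect one delta cycle after the edge, so a register clocked on that same edge still samples the previous value. The entity `delta_delay_tb`, the signals `d`, `q` and `done`, and the `reg` process are hypothetical stand-ins for the real design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_delay_tb is
end entity delta_delay_tb;

architecture sim of delta_delay_tb is
  constant clock_period : time := 10 ns;
  signal clk  : std_logic := '0';
  signal d    : std_logic := '0';
  signal q    : std_logic := '0';
  signal done : boolean   := false;
begin

  -- free-running clock that stops once the stimulus is finished
  clocking : process
  begin
    while not done loop
      clk <= '0';
      wait for clock_period / 2;
      clk <= '1';
      wait for clock_period / 2;
    end loop;
    wait;
  end process;

  -- hypothetical stand-in for a clocked input register of the design
  reg : process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;

  stimulus : process
  begin
    wait until rising_edge(clk);  -- resumes in the same delta cycle as reg
    d <= '1';                     -- takes effect one delta cycle after the edge
    wait until rising_edge(clk);
    -- the register sampled the old d at the first edge, and its update
    -- from this second edge is not visible yet
    assert q = '0' report "q changed one edge too early" severity error;
    wait until rising_edge(clk);
    assert q = '1' report "q did not capture d" severity error;
    done <= true;
    wait;
  end process;

end architecture sim;
```

With this style the stimulus stays aligned to the rising edge, which may be preferable when the rest of the test environment is also rising-edge based.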

== If this was helpful, please feel free to give Kudos, and close if it answers your question ==

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

How do you add a delta cycle delay? Just with `after`?

    -- Put initialisation code here
    
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';
    M_AXIS_0_tready <= '1' after 1 ns;
    s_axis_aresetn_0 <= '0';
    wait for 1*clock_period;
    s_axis_aresetn_0 <= '1';
    wait for 10*clock_period;
    
    S_AXIS_0_tvalid <= '1' after 1 ns;
    s_counter_count := 0;
    while(s_counter_count<12) loop
        if S_AXIS_0_tready='1' then
            S_AXIS_0_tdata <= stream(95 downto 88)  after 1 ns;
            stream <= stream(87 downto 0) & "00000000";
            if(s_counter_count=12-1) then   -- TotalBytes - 1
                S_AXIS_0_tlast <= '1' after 1 ns;
            else
                S_AXIS_0_tlast <= '0' after 1 ns;
            end if;        
            s_counter_count := s_counter_count + 1;
        end if;
        wait for clock_period;
    end loop;
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';

    -- Put test bench stimulus code here

This didn't work.

Voyager

Re: Simulating VHDL from exported HLS does not behave as expected

 stimulus: process
    variable s_counter_count :integer := 0;
  begin
  
    -- Put initialisation code here
    
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';
    M_AXIS_0_tready <= '1';
    s_axis_aresetn_0 <= '0';
    wait for 1*clock_period;
    s_axis_aresetn_0 <= '1';
    wait for 10*clock_period;
    wait until s_axis_aclk_0 = '0' ;   -- align the stimulus with the falling clock edge
    S_AXIS_0_tvalid <= '1';
    s_counter_count := 0;
    while(s_counter_count<12) loop
        -- M_AXIS_0_tready is driven to '1' by this test bench, so one byte is sent per cycle
        if M_AXIS_0_tready='1' then
            S_AXIS_0_tdata <= stream(95 downto 88);
            stream <= stream(87 downto 0) & "00000000";
            if(s_counter_count=12-1) then   -- TotalBytes - 1
                S_AXIS_0_tlast <= '1';
            else
                S_AXIS_0_tlast <= '0';
            end if;        
            s_counter_count := s_counter_count + 1;
        end if;
        wait until s_axis_aclk_0 = '0' ;   -- next byte on the next falling edge
    end loop;
    end loop;
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';

    -- Put test bench stimulus code here

    wait;
  end process;

X.PNG

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

@calibra great, that is perfect. Thanks.

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

@calibra One thing I noticed now is that, because the stimulus waits until clk = '0', this whole part is synchronized to the falling edge of the clock, while my entire system is synchronized to the rising edge, and that causes some synchronization issues.

Voyager

Re: Simulating VHDL from exported HLS does not behave as expected

Because your design is synchronized with the rising edge, the waveform is easier to understand if the testbench is synchronized with the falling edge.

You said "that causes some synchronization issues."

Why ?

 

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

@calibra it worked. I just tried placing a very simple AXI Stream passthrough in VHDL at the input and at the output, with the HLS IP (also a passthrough) in the middle, and it worked as expected with the original testbench. I believe these VHDL axis modules are doing the same job as the FIFOs in the block design, acting as an interface from/to the HLS IP. Just to make things clear:

top.vhd:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity top is
  port (
    M_AXIS_0_tdata : out STD_LOGIC_VECTOR ( 7 downto 0 );
    M_AXIS_0_tlast : out STD_LOGIC;
    M_AXIS_0_tready : in STD_LOGIC;
    M_AXIS_0_tvalid : out STD_LOGIC;
    S_AXIS_0_tdata : in STD_LOGIC_VECTOR ( 7 downto 0 );
    S_AXIS_0_tlast : in STD_LOGIC;
    S_AXIS_0_tready : out STD_LOGIC;
    S_AXIS_0_tvalid : in STD_LOGIC;
    s_axis_aclk_0 : in STD_LOGIC;
    s_axis_aresetn_0 : in STD_LOGIC
  );
end top;

architecture behavior of top is

    signal s_in_tdata, s_out_tdata : std_logic_vector(7 downto 0);
    signal s_in_tvalid, s_out_tvalid : std_logic;
    signal s_in_tlast, s_out_tlast : std_logic;
    signal s_in_tready, s_out_tready : std_logic;
    
    signal s_in_hls_tlast, s_out_hls_tlast : std_logic_vector(0 downto 0);      

begin

    axis_in: entity work.axis(behavior)
      port map(
        M_AXIS_0_tdata => s_in_tdata,
        M_AXIS_0_tlast => s_in_tlast,
        M_AXIS_0_tready => s_in_tready,
        M_AXIS_0_tvalid => s_in_tvalid,
        S_AXIS_0_tdata => S_AXIS_0_tdata,
        S_AXIS_0_tlast => S_AXIS_0_tlast,
        S_AXIS_0_tready => S_AXIS_0_tready,
        S_AXIS_0_tvalid => S_AXIS_0_tvalid,
        s_axis_aclk_0 => s_axis_aclk_0,
        s_axis_aresetn_0 => s_axis_aresetn_0 
      );     
      
    s_in_hls_tlast(0) <= s_in_tlast;
    s_out_tlast <= s_out_hls_tlast(0);
    passthrough_0: entity work.Passthrough(behav)
      port map(
          input_r_TDATA => s_in_tdata,
          input_r_TLAST => s_in_hls_tlast,
          output_r_TDATA => s_out_tdata,
          output_r_TLAST => s_out_hls_tlast,
          ap_clk => s_axis_aclk_0,
          ap_rst_n => s_axis_aresetn_0,
          input_r_TVALID => s_in_tvalid,
          input_r_TREADY => s_in_tready,
          output_r_TVALID => s_out_tvalid,
          output_r_TREADY => s_out_tready
      );      
      
    axis_out: entity work.axis(behavior)
        port map(
          M_AXIS_0_tdata => M_AXIS_0_tdata,
          M_AXIS_0_tlast => M_AXIS_0_tlast,
          M_AXIS_0_tready => M_AXIS_0_tready,
          M_AXIS_0_tvalid => M_AXIS_0_tvalid,
          S_AXIS_0_tdata => s_out_tdata,
          S_AXIS_0_tlast => s_out_tlast,
          S_AXIS_0_tready => s_out_tready,
          S_AXIS_0_tvalid => s_out_tvalid,
          s_axis_aclk_0 => s_axis_aclk_0,
          s_axis_aresetn_0 => s_axis_aresetn_0 
        );         

end behavior;

axis.vhd ("interfaces" to HLS?):

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity axis is
  port (
    M_AXIS_0_tdata : out STD_LOGIC_VECTOR ( 7 downto 0 );
    M_AXIS_0_tlast : out STD_LOGIC;
    M_AXIS_0_tready : in STD_LOGIC;
    M_AXIS_0_tvalid : out STD_LOGIC;
    S_AXIS_0_tdata : in STD_LOGIC_VECTOR ( 7 downto 0 );
    S_AXIS_0_tlast : in STD_LOGIC;
    S_AXIS_0_tready : out STD_LOGIC;
    S_AXIS_0_tvalid : in STD_LOGIC;
    s_axis_aclk_0 : in STD_LOGIC;
    s_axis_aresetn_0 : in STD_LOGIC
  );
end axis;

architecture behavior of axis is

begin

    M_AXIS_0_tdata <= S_AXIS_0_tdata when (S_AXIS_0_tvalid='1' and M_AXIS_0_tready='1') else (others=>'0');
    M_AXIS_0_tvalid <= '1' when S_AXIS_0_tvalid='1' else '0';
    M_AXIS_0_tlast <= '1' when ((S_AXIS_0_tvalid='1' and S_AXIS_0_tlast='1') and M_AXIS_0_tready='1') else '0';
    S_AXIS_0_tready <= '1' when M_AXIS_0_tready='1' else '0';       

end behavior;

and the testbench:

library IEEE;
use IEEE.Std_logic_1164.all;
use IEEE.Numeric_Std.all;

entity top_tb is
end;

architecture bench of top_tb is

  component top
    port (
      M_AXIS_0_tdata : out STD_LOGIC_VECTOR ( 7 downto 0 );
      M_AXIS_0_tlast : out STD_LOGIC;
      M_AXIS_0_tready : in STD_LOGIC;
      M_AXIS_0_tvalid : out STD_LOGIC;
      S_AXIS_0_tdata : in STD_LOGIC_VECTOR ( 7 downto 0 );
      S_AXIS_0_tlast : in STD_LOGIC;
      S_AXIS_0_tready : out STD_LOGIC;
      S_AXIS_0_tvalid : in STD_LOGIC;
      s_axis_aclk_0 : in STD_LOGIC;
      s_axis_aresetn_0 : in STD_LOGIC
    );
  end component;

  signal M_AXIS_0_tdata: STD_LOGIC_VECTOR ( 7 downto 0 );
  signal M_AXIS_0_tlast: STD_LOGIC;
  signal M_AXIS_0_tready: STD_LOGIC;
  signal M_AXIS_0_tvalid: STD_LOGIC;
  signal S_AXIS_0_tdata: STD_LOGIC_VECTOR ( 7 downto 0 );
  signal S_AXIS_0_tlast: STD_LOGIC;
  signal S_AXIS_0_tready: STD_LOGIC;
  signal S_AXIS_0_tvalid: STD_LOGIC;
  signal s_axis_aclk_0: STD_LOGIC;
  signal s_axis_aresetn_0: STD_LOGIC;
  
  constant clock_period: time := 10 ns;
  signal stop_the_clock: boolean;  
  
  signal stream : std_logic_vector(95 downto 0) := x"111213212223313233414243";

begin

  uut: top port map ( M_AXIS_0_tdata   => M_AXIS_0_tdata,
                                   M_AXIS_0_tlast   => M_AXIS_0_tlast,
                                   M_AXIS_0_tready  => M_AXIS_0_tready,
                                   M_AXIS_0_tvalid  => M_AXIS_0_tvalid,
                                   S_AXIS_0_tdata   => S_AXIS_0_tdata,
                                   S_AXIS_0_tlast   => S_AXIS_0_tlast,
                                   S_AXIS_0_tready  => S_AXIS_0_tready,
                                   S_AXIS_0_tvalid  => S_AXIS_0_tvalid,
                                   s_axis_aclk_0    => s_axis_aclk_0,
                                   s_axis_aresetn_0 => s_axis_aresetn_0 );

  stimulus: process
    variable s_counter_count :integer := 0;
  begin
  
    -- Put initialisation code here
    
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';
    M_AXIS_0_tready <= '1';
    s_axis_aresetn_0 <= '0';
    wait for 1*clock_period;
    s_axis_aresetn_0 <= '1';
    wait for 10*clock_period;
    
    S_AXIS_0_tvalid <= '1';
    s_counter_count := 0;
    while(s_counter_count<12) loop
        if S_AXIS_0_tready='1' then
            S_AXIS_0_tdata <= stream(95 downto 88);
            stream <= stream(87 downto 0) & "00000000";
            if(s_counter_count=12-1) then   -- TotalBytes - 1
                S_AXIS_0_tlast <= '1';
            else
                S_AXIS_0_tlast <= '0';
            end if;        
            s_counter_count := s_counter_count + 1;
        end if;
        wait for clock_period;
    end loop;
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';

    -- Put test bench stimulus code here

    wait;
  end process;

  clocking: process
  begin
    while not stop_the_clock loop
      s_axis_aclk_0 <= '1', '0' after clock_period / 2;
      wait for clock_period;
    end loop;
    wait;
  end process;
  
end;

This is the original testbench, without `wait until clk='0'`.

The result is as expected:

Screenshot from 2020-06-30 09-59-49.png

Now the question is: why is that sort of VHDL-HLS "interface" needed?

Voyager

Re: Simulating VHDL from exported HLS does not behave as expected

Your testbench is wrong.

    S_AXIS_0_tvalid <= '1';
    s_counter_count := 0;
    while(s_counter_count<12) loop
        if S_AXIS_0_tready='1' then

On the first iteration S_AXIS_0_tready='0'.

When you set S_AXIS_0_tvalid = '1', S_AXIS_0_tready ='1' some delta delays later.
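
The effect can be reproduced in isolation. The following is only a sketch under assumptions: the entity `stale_read_tb` and the combinational slave `tready <= tvalid` are hypothetical and merely stand in for any design whose ready output depends combinationally on the valid input. It shows that a process reading `tready` right after assigning `tvalid` still sees the old value, and that the new value only appears two delta cycles later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity stale_read_tb is
end entity stale_read_tb;

architecture sim of stale_read_tb is
  signal tvalid : std_logic := '0';
  signal tready : std_logic;
begin

  -- hypothetical combinational slave: ready follows valid
  tready <= tvalid;

  stimulus : process
  begin
    wait for 10 ns;
    tvalid <= '1';
    -- no simulation time or delta cycle has passed yet,
    -- so tready is still the value computed from tvalid = '0'
    assert tready = '0' report "tready changed too early" severity error;
    wait for 0 ns;  -- first delta cycle: tvalid becomes '1'
    wait for 0 ns;  -- second delta cycle: tready becomes '1'
    assert tready = '1' report "tready did not follow tvalid" severity error;
    wait;
  end process;

end architecture sim;
```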

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

@calibra why do you say that S_AXIS_0_tready='0' on the first iteration? It is set to 1, followed by some clock periods (to toggle the reset signal). Do you mean that tvalid will be zero on the first iteration? If that is the case, I added a wait until tvalid=1. However, that wait makes no difference. I might have misunderstood you.

    M_AXIS_0_tready <= '1';
    s_axis_aresetn_0 <= '0';
    wait for 1*clock_period;
    s_axis_aresetn_0 <= '1';
    wait for 10*clock_period;
    
    S_AXIS_0_tvalid <= '1';
    s_counter_count := 0;
    wait until S_AXIS_0_tvalid='1'; 
    while(s_counter_count<12) loop
        if S_AXIS_0_tready='1' then

Voyager

Re: Simulating VHDL from exported HLS does not behave as expected

It is

wait until S_AXIS_0_tready='1' ;

 

The problem is actually that the test bench is not well constructed.
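
As one possible illustration of a more robust construction (a sketch under assumptions, not the test bench used in this thread), the AXI Stream master side can be wrapped in a procedure that presents a byte and holds it until a rising clock edge at which the ready signal is high. The procedure name `axis_send`, the entity `axis_master_tb` and the `sink` process that toggles `tready` to exercise backpressure are all hypothetical; the 96-bit `stream` constant reuses the test pattern from the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity axis_master_tb is
end entity axis_master_tb;

architecture sim of axis_master_tb is
  constant clock_period : time := 10 ns;
  constant stream : std_logic_vector(95 downto 0) := x"111213212223313233414243";

  signal clk    : std_logic := '0';
  signal done   : boolean   := false;
  signal tvalid : std_logic := '0';
  signal tlast  : std_logic := '0';
  signal tready : std_logic := '0';
  signal tdata  : std_logic_vector(7 downto 0) := (others => '0');

  -- hypothetical helper: send one byte, holding it until it is accepted
  procedure axis_send (
    signal   clk_i    : in  std_logic;
    signal   tready_i : in  std_logic;
    signal   tvalid_o : out std_logic;
    signal   tdata_o  : out std_logic_vector(7 downto 0);
    signal   tlast_o  : out std_logic;
    constant data     : in  std_logic_vector(7 downto 0);
    constant last     : in  std_logic) is
  begin
    tdata_o  <= data;
    tlast_o  <= last;
    tvalid_o <= '1';
    loop
      wait until rising_edge(clk_i);
      exit when tready_i = '1';  -- value of tready as sampled at this edge
    end loop;
    -- deasserted here; a back-to-back call reasserts tvalid in the same
    -- delta cycle, so no gap appears between consecutive bytes
    tvalid_o <= '0';
    tlast_o  <= '0';
  end procedure axis_send;
begin

  clocking : process
  begin
    while not done loop
      clk <= '0';
      wait for clock_period / 2;
      clk <= '1';
      wait for clock_period / 2;
    end loop;
    wait;
  end process;

  -- hypothetical sink that is ready only every other cycle
  sink : process (clk)
  begin
    if rising_edge(clk) then
      tready <= not tready;
    end if;
  end process;

  stimulus : process
    variable last : std_logic;
  begin
    wait until rising_edge(clk);
    for i in 11 downto 0 loop  -- most significant byte first
      if i = 0 then
        last := '1';
      else
        last := '0';
      end if;
      axis_send(clk, tready, tvalid, tdata, tlast,
                stream(8 * i + 7 downto 8 * i), last);
    end loop;
    done <= true;
    wait;
  end process;

end architecture sim;
```

Because every assignment happens right after a `rising_edge` wait, the stimulus changes one delta cycle after the clock edge, and the handshake no longer depends on the ready signal already being high when the loop starts.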

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

@calibra How should it be constructed properly? Thanks for the help.

Explorer

Re: Simulating VHDL from exported HLS does not behave as expected

My problem remains. Using the same testbench, I replaced those "interface" modules with some real functionality and I got the expected results. The goal is to receive a stream of bytes and convert it to parallel. The special thing here is that the first 16 bytes (4 x 32-bit values) are independent fields, and the last 12 bytes are meant to be forwarded further, also as AXI Stream:

 

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL; 

entity axisToIP is
    generic (
        BYTES   : natural := 17;
        BITS    : natural := 8
    );
    Port (
        clk : in std_logic;
        rst : in std_logic;
        -- AXIS Slave
        s_axis_tvalid               : in std_logic;
        s_axis_tdata                : in std_logic_vector (7 downto 0);
        s_axis_tlast                : in std_logic;
        s_axis_tready               : out std_logic;        
        -- Message definition
        total_length                : out std_logic_vector (31 downto 0);
        height                      : out std_logic_vector (31 downto 0);
        width                       : out std_logic_vector (31 downto 0);
        data_length                 : out std_logic_vector (31 downto 0);
        data_tdata                  : out std_logic_vector (7 downto 0);
        data_tvalid                 : out std_logic;
        data_tlast                  : out std_logic;
        data_tready                 : in std_logic
    );                
end axisToIP;

architecture Behavioral of axisToIP is

    --  Variable        --  Bits    --  Bytes   --  Count 
    -- total_length     -- 32       --  4       --  0, 1, 2, 3         
    -- height           -- 32       --  4       --  4, 5, 6, 7
    -- width            -- 32       --  4       --  8, 9, 10, 11
    -- data_length      -- 32       --  4       --  12, 13, 14, 15
    -- data             -- 8        --  1       --  16    

    signal s_total_length_A, s_total_length_B, s_total_length_C, s_total_length_D : std_logic_vector(7 downto 0) := (others=>'0');
    signal s_total_length : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_total_length_A_ce, s_total_length_B_ce, s_total_length_C_ce, s_total_length_D_ce : std_logic := '0';
    signal s_height_A, s_height_B, s_height_C, s_height_D : std_logic_vector(7 downto 0) := (others=>'0');
    signal s_height : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_width_A, s_width_B, s_width_C, s_width_D : std_logic_vector(7 downto 0) := (others=>'0');
    signal s_width : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_data_length_A, s_data_length_B, s_data_length_C, s_data_length_D : std_logic_vector(7 downto 0) := (others=>'0');
    signal s_data_length : std_logic_vector(31 downto 0) := (others=>'0');
    
    signal s_counter_count : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_axis_counter_count : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_counter_rst, s_axis_counter_rst: std_logic := '0'; 
    signal s_counter_CE, s_axis_counter_CE: std_logic;    
    
    signal tmp : std_logic_vector(7 downto 0);
begin

    s_counter_CE <= (s_axis_tvalid and data_tready) when (s_counter_count=16 and s_axis_counter_CE='0') else
                    '1' when ((s_axis_tvalid='1' and s_axis_counter_CE='0') or s_axis_counter_rst='1') else '0';
                    
    s_counter_rst <= '1' when (s_counter_count = 16 and s_axis_counter_rst='1') else '0';
    
    s_axis_counter_CE <= '1' when 
                            (s_counter_count = 16 and data_tready='1') else 
                             '0';
     s_axis_counter_rst <= '1' when
                            ((s_counter_count = 16 and s_axis_counter_count=s_data_length-1))
                            else '0';
    
    counters: process(clk)
    begin
        if(rising_edge(clk)) then
            if(s_counter_rst='1') then
                s_counter_count <= (others=>'0');
            elsif(s_counter_CE='1') then
                s_counter_count <= s_counter_count + '1';
            end if;
        
            if(s_axis_counter_rst='1') then
                s_axis_counter_count <= (others=>'0');
            elsif(s_axis_counter_CE='1') then
                s_axis_counter_count <= s_axis_counter_count + '1';
            end if;        
        end if;
    end process;

    data: process(clk)
    begin
        if(rising_edge(clk)) then
            tmp <= s_axis_tdata;
        
            if(s_counter_count=0) then
                s_total_length_A <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=1) then
                s_total_length_B <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=2) then
                s_total_length_C <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=3) then
                s_total_length_D <= tmp;--s_axis_tdata;
            end if;  
            
            if(s_counter_count=4) then
                s_height_A <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=5) then
                s_height_B <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=6) then
                s_height_C <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=7) then
                s_height_D <= tmp;--s_axis_tdata;
            end if;
            
            if(s_counter_count=8) then
                s_width_A <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=9) then
                s_width_B <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=10) then
                s_width_C <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=11) then
                s_width_D <= tmp;--s_axis_tdata;
            end if;      
            
            if(s_counter_count=12) then
                s_data_length_A <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=13) then
                s_data_length_B <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=14) then
                s_data_length_C <= tmp;--s_axis_tdata;
            end if;
            if(s_counter_count=15) then
                s_data_length_D <= tmp;--s_axis_tdata;
            end if;            
        end if;
    end process;    
    total_length <= s_total_length_D & s_total_length_C & s_total_length_B & s_total_length_A;
    height <= s_height_D & s_height_C & s_height_B & s_height_A;
    width <= s_width_D & s_width_C & s_width_B & s_width_A;
    data_length <= s_data_length; 
    s_data_length <= s_data_length_D & s_data_length_C & s_data_length_B & s_data_length_A;        
    
    data_tvalid <= '1' when (s_counter_count = 16 and s_axis_tvalid='1') else '0'; 
    data_tlast <= '1' when (s_counter_count = 16 and s_axis_counter_count=s_data_length-1) else '0';
    --data_tdata <= s_axis_tdata when ((s_counter_count = 16 and s_axis_tvalid='1') and data_tready='1') else (others=>'0');    
    data_tdata <= tmp when ((s_counter_count = 16 and s_axis_tvalid='1') and data_tready='1') else (others=>'0');
    
    s_axis_tready <=  data_tready when s_counter_count=16 else '1';

end Behavioral;

 

Which turns out to work as expected:

Screenshot from 2020-07-02 15-40-40.png

The testbench works only if I uncomment the `wait until s_axis_tready='1'`:

Screenshot from 2020-07-02 15-36-24.png

Testbench:

 

 

library IEEE;
use IEEE.Std_logic_1164.all;
use IEEE.Numeric_Std.all;

entity axisToIP_tb is
end;

architecture bench of axisToIP_tb is

  component axisToIP
    generic (
        BYTES   : natural := 17;
        BITS    : natural := 8
    );
      Port (
          clk : in std_logic;
          rst : in std_logic;
          s_axis_tvalid               : in std_logic;
          s_axis_tdata                : in std_logic_vector (7 downto 0);
          s_axis_tlast                : in std_logic;
          s_axis_tready               : out std_logic;
          total_length                : out std_logic_vector (31 downto 0);
          height                      : out std_logic_vector (31 downto 0);
          width                       : out std_logic_vector (31 downto 0);
          data_length                 : out std_logic_vector (31 downto 0);
          data_tdata                  : out std_logic_vector (7 downto 0);
          data_tvalid                 : out std_logic;
          data_tlast                  : out std_logic;
          data_tready                 : in std_logic
      );                
  end component;

  signal clk: std_logic;
  signal rst: std_logic;
  signal s_axis_tvalid: std_logic;
  signal s_axis_tdata: std_logic_vector (7 downto 0);
  signal s_axis_tlast: std_logic;
  signal s_axis_tready: std_logic;
  signal total_length: std_logic_vector (31 downto 0);
  signal height: std_logic_vector (31 downto 0);
  signal width: std_logic_vector (31 downto 0);
  signal data_length: std_logic_vector (31 downto 0);
  signal data_tdata: std_logic_vector (7 downto 0);
  signal data_tvalid: std_logic;
  signal data_tlast: std_logic;
  signal data_tready: std_logic;
  
  constant clock_period: time := 10 ns;
  signal stop_the_clock: boolean;  
      
  signal stream : std_logic_vector(223 downto 0) := x"1800000002000000020000000C000000111213212223313233414243";

begin

  -- Insert values for generic parameters !!
  uut: axisToIP generic map ( BYTES            =>  17,
                              BITS             => 8  )
                   port map ( clk           => clk,
                              rst           => rst,
                              s_axis_tvalid => s_axis_tvalid,
                              s_axis_tdata  => s_axis_tdata,
                              s_axis_tlast  => s_axis_tlast,
                              s_axis_tready => s_axis_tready,
                              total_length  => total_length,
                              height        => height,
                              width         => width,
                              data_length   => data_length,
                              data_tdata    => data_tdata,
                              data_tvalid   => data_tvalid,
                              data_tlast    => data_tlast,
                              data_tready   => data_tready );

  stimulus: process
    variable s_counter_count :integer := 0;
  begin
  
    -- Put initialisation code here

    s_axis_tdata <= (others=>'0');
    s_axis_tvalid <= '0';
    s_axis_tlast <= '0';
    rst <= '0';
    wait for 1*clock_period;
    rst <= '1';
    wait for 1*clock_period;
    rst <= '0';    
    data_tready <= '1';
    wait for 2*clock_period;
    
    s_axis_tvalid <= '1';
    s_counter_count := 0;
    --wait until s_axis_tready='1'; 
    while(s_counter_count<28) loop
        if s_axis_tready='1' then
            s_axis_tdata <= stream(223 downto 216);
            stream <= stream(215 downto 0) & "00000000";
            if(s_counter_count=28-1) then   -- TotalBytes - 1
                s_axis_tlast <= '1';
            else
                s_axis_tlast <= '0';
            end if;        
            s_counter_count := s_counter_count + 1;
        end if;
        wait for clock_period;
    end loop;
    s_axis_tdata <= (others=>'0');
    s_axis_tvalid <= '0';
    s_axis_tlast <= '0';

    -- Put test bench stimulus code here

    wait;
  end process;

  clocking: process
  begin
    while not stop_the_clock loop
      clk <= '1', '0' after clock_period / 2;
      wait for clock_period;
    end loop;
    wait;
  end process;

end;

So at this point I can say that the intermediate IP between the incoming AXI Stream and the HLS IP behaves as expected. Now I incorporate it into a top module to combine it with the HLS IP:

 

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity top_converters is
  port (
    M_AXIS_0_tdata : out STD_LOGIC_VECTOR ( 7 downto 0 );
    M_AXIS_0_tlast : out STD_LOGIC;
    M_AXIS_0_tready : in STD_LOGIC;
    M_AXIS_0_tvalid : out STD_LOGIC;
    S_AXIS_0_tdata : in STD_LOGIC_VECTOR ( 7 downto 0 );
    S_AXIS_0_tlast : in STD_LOGIC;
    S_AXIS_0_tready : out STD_LOGIC;
    S_AXIS_0_tvalid : in STD_LOGIC;
    s_axis_aclk_0 : in STD_LOGIC;
    s_axis_aresetn_0 : in STD_LOGIC
  );
end top_converters;

architecture behavior of top_converters is

    signal s_in_tdata, s_out_tdata : std_logic_vector(7 downto 0);
    signal s_in_tvalid, s_out_tvalid : std_logic;
    signal s_in_tlast, s_out_tlast : std_logic;
    signal s_in_tready, s_out_tready : std_logic;
    
    signal s_in_hls_tlast, s_out_hls_tlast : std_logic_vector(0 downto 0);      
    
    signal s_total_length : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_height : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_width : std_logic_vector(31 downto 0) := (others=>'0');
    signal s_data_length : std_logic_vector(31 downto 0) := (others=>'0');

begin

--    axis_in: entity work.axis(behavior)
--      port map(
--        M_AXIS_0_tdata => s_in_tdata,
--        M_AXIS_0_tlast => s_in_tlast,
--        M_AXIS_0_tready => s_in_tready,
--        M_AXIS_0_tvalid => s_in_tvalid,
--        S_AXIS_0_tdata => S_AXIS_0_tdata,
--        S_AXIS_0_tlast => S_AXIS_0_tlast,
--        S_AXIS_0_tready => S_AXIS_0_tready,
--        S_AXIS_0_tvalid => S_AXIS_0_tvalid,
--        s_axis_aclk_0 => s_axis_aclk_0,
--        s_axis_aresetn_0 => s_axis_aresetn_0 
--      );     
      
    axisToIP_0: entity work.axisToIP(Behavioral)
        generic map(
              BYTES => 17,
              BITS => 8
        )
        Port map (
              clk           => s_axis_aclk_0,
              rst           => s_axis_aresetn_0,
              s_axis_tvalid => S_AXIS_0_tvalid,
              s_axis_tdata  => S_AXIS_0_tdata,
              s_axis_tlast  => S_AXIS_0_tlast,
              s_axis_tready => S_AXIS_0_tready,        
              total_length  => open,
              height        => open,
              width         => open,
              data_length   => open,
              data_tdata    => s_in_tdata,
              data_tvalid   => s_in_tvalid,
              data_tlast    => s_in_tlast,
              data_tready   => s_in_tready
          );           
      
    s_in_hls_tlast(0) <= s_in_tlast;
    s_out_tlast <= s_out_hls_tlast(0);
    passthrough_0: entity work.Passthrough(behav)
      port map(
          input_r_TDATA => s_in_tdata,
          input_r_TLAST => s_in_hls_tlast,
          output_r_TDATA => s_out_tdata,
          output_r_TLAST => s_out_hls_tlast,
          ap_clk => s_axis_aclk_0,
          ap_rst_n => s_axis_aresetn_0,
          input_r_TVALID => s_in_tvalid,
          input_r_TREADY => s_in_tready,
          output_r_TVALID => s_out_tvalid,
          output_r_TREADY => s_out_tready
      );      
      
    axis_out: entity work.axis(behavior)
        port map(
          M_AXIS_0_tdata => M_AXIS_0_tdata,
          M_AXIS_0_tlast => M_AXIS_0_tlast,
          M_AXIS_0_tready => M_AXIS_0_tready,
          M_AXIS_0_tvalid => M_AXIS_0_tvalid,
          S_AXIS_0_tdata => s_out_tdata,
          S_AXIS_0_tlast => s_out_tlast,
          S_AXIS_0_tready => s_out_tready,
          S_AXIS_0_tvalid => s_out_tvalid,
          s_axis_aclk_0 => s_axis_aclk_0,
          s_axis_aresetn_0 => s_axis_aresetn_0 
        );         

end behavior;

With its testbench:

library IEEE;
use IEEE.Std_logic_1164.all;
use IEEE.Numeric_Std.all;

entity top_converters_tb is
end;

architecture bench of top_converters_tb is

  component top_converters
    port (
      M_AXIS_0_tdata : out STD_LOGIC_VECTOR ( 7 downto 0 );
      M_AXIS_0_tlast : out STD_LOGIC;
      M_AXIS_0_tready : in STD_LOGIC;
      M_AXIS_0_tvalid : out STD_LOGIC;
      S_AXIS_0_tdata : in STD_LOGIC_VECTOR ( 7 downto 0 );
      S_AXIS_0_tlast : in STD_LOGIC;
      S_AXIS_0_tready : out STD_LOGIC;
      S_AXIS_0_tvalid : in STD_LOGIC;
      s_axis_aclk_0 : in STD_LOGIC;
      s_axis_aresetn_0 : in STD_LOGIC
    );
  end component;

  signal M_AXIS_0_tdata: STD_LOGIC_VECTOR ( 7 downto 0 );
  signal M_AXIS_0_tlast: STD_LOGIC;
  signal M_AXIS_0_tready: STD_LOGIC;
  signal M_AXIS_0_tvalid: STD_LOGIC;
  signal S_AXIS_0_tdata: STD_LOGIC_VECTOR ( 7 downto 0 );
  signal S_AXIS_0_tlast: STD_LOGIC;
  signal S_AXIS_0_tready: STD_LOGIC;
  signal S_AXIS_0_tvalid: STD_LOGIC;
  signal s_axis_aclk_0: STD_LOGIC;
  signal s_axis_aresetn_0: STD_LOGIC;
  
  constant clock_period: time := 10 ns;
  signal stop_the_clock: boolean;  
  
  signal stream : std_logic_vector(223 downto 0) := x"1800000002000000020000000C000000111213212223313233414243";

begin

  uut: top_converters port map ( M_AXIS_0_tdata   => M_AXIS_0_tdata,
                                   M_AXIS_0_tlast   => M_AXIS_0_tlast,
                                   M_AXIS_0_tready  => M_AXIS_0_tready,
                                   M_AXIS_0_tvalid  => M_AXIS_0_tvalid,
                                   S_AXIS_0_tdata   => S_AXIS_0_tdata,
                                   S_AXIS_0_tlast   => S_AXIS_0_tlast,
                                   S_AXIS_0_tready  => S_AXIS_0_tready,
                                   S_AXIS_0_tvalid  => S_AXIS_0_tvalid,
                                   s_axis_aclk_0    => s_axis_aclk_0,
                                   s_axis_aresetn_0 => s_axis_aresetn_0 );

  stimulus: process
    variable s_counter_count :integer := 0;
  begin
  
    -- Put initialisation code here
    
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';
    M_AXIS_0_tready <= '1';
    s_axis_aresetn_0 <= '0';
    wait for 1*clock_period;
    s_axis_aresetn_0 <= '1';
    wait for 10*clock_period;
    
    S_AXIS_0_tvalid <= '1';
    s_counter_count := 0;
    wait until S_AXIS_0_tready='1'; 
    while(s_counter_count<28) loop
        if S_AXIS_0_tready='1' then
            S_AXIS_0_tdata <= stream(223 downto 216);
            stream <= stream(215 downto 0) & "00000000";
            if(s_counter_count=28-1) then   -- TotalBytes - 1
                S_AXIS_0_tlast <= '1';
            else
                S_AXIS_0_tlast <= '0';
            end if;        
            s_counter_count := s_counter_count + 1;
        end if;
        wait for clock_period;
    end loop;
    S_AXIS_0_tdata <= (others=>'0');
    S_AXIS_0_tvalid <= '0';
    S_AXIS_0_tlast <= '0';

    -- Put test bench stimulus code here

    wait;
  end process;

  clocking: process
  begin
    while not stop_the_clock loop
      s_axis_aclk_0 <= '1', '0' after clock_period / 2;
      wait for clock_period;
    end loop;
    wait;
  end process;
  
end;

However, the first block no longer behaves as it did when it was not connected to the HLS IP:

Screenshot from 2020-07-02 15-49-20.png

The first thing to notice is that the two counters I have there start counting correctly, as expected, but s_axis_tdata only starts streaming on the rising edge of s_axis_tready. Therefore, if s_axis_tready is already '1' and there are no further transitions to trigger on, how can this be simulated? To make it work, I also commented out the line "wait until S_AXIS_0_tready='1';" in this testbench, and then it worked.
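For reference, a VHDL "wait until" statement only resumes on an event on a signal that appears in its condition, so it suspends until the next change even when the condition is already true. The following is a minimal sketch of one way to guard against that, not a confirmed fix for the design above. The entity name wait_until_demo and the constant-high tready signal are hypothetical and only serve to reproduce the situation in isolation.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical stand-alone demo: tready is already '1' and never changes.
entity wait_until_demo is
end;

architecture bench of wait_until_demo is
  constant clock_period   : time := 10 ns;
  signal clk              : std_logic := '0';
  signal tready           : std_logic := '1';  -- assumed: slave is ready from time 0
  signal tvalid           : std_logic := '0';
  signal stop_the_clock   : boolean := false;
begin

  clocking: process
  begin
    while not stop_the_clock loop
      clk <= '1', '0' after clock_period / 2;
      wait for clock_period;
    end loop;
    wait;
  end process;

  stimulus: process
  begin
    wait for 3 * clock_period;

    -- A bare "wait until tready = '1';" would suspend here forever,
    -- because tready is already '1' and produces no further events.
    -- Only wait when the condition is not yet satisfied.
    if tready /= '1' then
      wait until tready = '1';
    end if;

    -- Drive the stream relative to a clock edge rather than raw time.
    wait until rising_edge(clk);
    tvalid <= '1';
    report "tready was already high: stimulus continues";

    wait until rising_edge(clk);
    tvalid <= '0';
    stop_the_clock <= true;  -- lets the clocking process finish
    wait;
  end process;

end;
```

An alternative under the same assumptions is "wait until rising_edge(clk) and tready = '1';", which resumes on the first clock edge at which tready is sampled high, whether or not tready itself has changed.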

@calibra is the issue in the testbench? If so, what is it?

Thanks for the help
