Scholar pedro_uno
Registered: 02-12-2013

How to exercise HLS block from Zynq processing system.

Hello,

 

I have written, compiled and co-simulated an HLS block for a small comms function.  Now I want to exercise that block using the Zynq processor, controlling the HLS block via the AXI bus.  In other words, I want to make my HLS block an AXI peripheral.

 

Can anyone tell me if there is a standard way to generate the HLS block so that it is ready to be controlled by the Zynq PS?

 

My block receives input as an array of data along with some discrete control lines.

 

What technique should I use to control it with the Zynq?

 

Thanks for any advice.

 

    Pete Dudley

----------------------------------------
DSP in hardware and software
-----------------------------------------

Accepted Solutions
Scholar pedro_uno
Registered: 02-12-2013

Re: How to exercise HLS block from Zynq processing system.

I can confirm that Vivado 2013.3 fixes this problem.  The data buses are now connected between the HLS port and the BRAM core.

There is still one new trick you need to use. On the HLS port that goes to the BRAM, you have to set the property->Config->Master Type = BRAM_CTRL

 

Good luck.

15 Replies
Xilinx Employee
Registered: 03-24-2010

Re: How to exercise HLS block from Zynq processing system.

Hello,

The basic flow is:

1. Specify bus interfaces. Please refer to chapter "Specifying Bus Interfaces" in UG902.

2. Export IP Catalog formatted IP for use with Vivado. Please refer to chapter "Exporting the RTL Design" in UG902.

3. Add the exported IP to Vivado project. Please refer to chapter "Adding New IP to the IP Catalog" in UG896.

4. Use IPI to create embedded design. Please refer to UG898.

5. Export hardware to SDK.

6. Develop software in SDK.

A hands-on tutorial can be found in the chapter "Exporting Hardware to the Software Development" in UG871.

Regards,
brucey

Scholar pedro_uno
Registered: 02-12-2013

Re: How to exercise HLS block from Zynq processing system.

Thanks, that will keep me busy for a while. I will report with results.

Scholar pedro_uno
Registered: 02-12-2013

Re: How to exercise HLS block from Zynq processing system.

Connecting HLS cores to the Zynq AXI bus is a very complicated process.

 

My block has this interface.

 

void di_dp(
    ap_int<8> data_in[8192],
    ap_uint<8> num_rows,
    ap_uint<7> num_cols,
    ap_uint<2> dp_mode,
    ap_int<8> data_out[8192])

 

data_in and data_out are arrays and I imagined feeding them through dual port RAMs.  One port of the BRAM would be connected to the AXI bus and the other to my HLS block. num_rows, num_cols and dp_mode are static configuration values.

 

I am hoping to fill the data_in BRAM, write the config values then set a start bit to make the core run.  I want to then poll a ready bit and read the results from the data_out BRAM when processing is complete. In preparation I created a Zynq design with BRAMs that I was able to write and read in SDK.

 

In order to export my HLS block as IP I added a bunch of different directives from the various UG's mentioned above but never really got anything that would wire up in IP Integrator.  Here is one set of directives that I tried.

 

set_directive_interface -mode ap_none -latency 0 "di_dp" num_rows
set_directive_interface -mode ap_none -latency 0 "di_dp" num_cols
set_directive_interface -mode ap_none -latency 0 "di_dp" dp_mode
set_directive_resource -core AXI4LiteS -metadata {-bus_bundle di_dp_axi} "di_dp" num_rows
set_directive_resource -core AXI4LiteS -metadata {-bus_bundle di_dp_axi} "di_dp" num_cols
set_directive_resource -core AXI4LiteS -metadata {-bus_bundle di_dp_axi} "di_dp" dp_mode
set_directive_interface -mode ap_memory "di_dp" data_in
set_directive_interface -mode ap_memory "di_dp" data_out

 

These directives gave me an AXI Lite port for my static values. Also, the data_in and data_out ports appeared to be ready to connect to BRAM blocks, but the IP Catalog does not offer BRAMs in widths other than 32 bits.  My ports are 8 bits wide.

 

Also, I was not sure how to wrap the ap_ready and ap_done signals around to the AXI bus.

 

Any tips are greatly appreciated.

 

    Pete

 

Xilinx Employee
Registered: 08-17-2011

Re: How to exercise HLS block from Zynq processing system.

Hello Pete,

 


@pedro_uno wrote:

(*1) Connecting HLS cores to the Zynq AXI bus is a very complicated process.

[...]

(*2)These directives gave me an AXI Lite port for my static values. Also, data_in and data_out ports appeared to be ready to connect to BRAM blocks but the IP Catalog does not offer BRAMs in size other than 32 wide.  My ports are 8 bit.

 

(*3)Also, I was not sure how to wrap the ap_ready and ap_done signals around to the AXI bus.

 

Any tips are greatly appreciated.

 

    Pete

 


so

*1: it's getting easier the second time.

*2: I'm not too sure about the IPI connection side. You probably noticed that the BRAM controller's "number of BRAM interfaces" parameter has to be set to 1 in order to connect it to one port of the BRAM, as you described.

For the 8 bits, I don't know, sorry. The only thing I can suggest is to look into the array_reshape directive with a factor of 4 and type cyclic, to turn your array of 8-bit elements into an array of 32-bit words. This may also improve your throughput, since 4 inputs will be read per BRAM access.

*3: the directive to get the control signals via AXI-Lite is :

#pragma HLS resource core=AXI4LiteS variable=return

(add your bus bundle name to match; from the GUI's directive tab you can assign it on the top level)
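
To make the array_reshape suggestion in *2 a bit more concrete, here is a minimal C++ sketch of what the software side might do once four 8-bit samples share one 32-bit BRAM word. The names pack_samples and unpack_samples are made up for illustration. The byte order (lowest array index in the least-significant byte) is an assumption, not something confirmed in this thread, so it should be checked against the generated RTL or a co-simulation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack signed 8-bit samples into 32-bit words, four samples per word.
// ASSUMPTION: sample 4*i lands in bits 7..0 of word i, sample 4*i+1 in
// bits 15..8, and so on. Verify this against the actual hardware layout.
std::vector<uint32_t> pack_samples(const std::vector<int8_t>& samples) {
    std::vector<uint32_t> words((samples.size() + 3) / 4, 0u);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // Go through uint8_t first so negative samples are not sign-extended.
        uint32_t byte = static_cast<uint8_t>(samples[i]);
        words[i / 4] |= byte << (8 * (i % 4));
    }
    return words;
}

// Inverse of pack_samples: recover `count` signed 8-bit samples.
std::vector<int8_t> unpack_samples(const std::vector<uint32_t>& words,
                                   std::size_t count) {
    std::vector<int8_t> samples(count);
    for (std::size_t i = 0; i < count; ++i) {
        uint8_t byte = static_cast<uint8_t>((words[i / 4] >> (8 * (i % 4))) & 0xFFu);
        samples[i] = static_cast<int8_t>(byte);
    }
    return samples;
}
```

With this layout, an 8192-sample buffer occupies 2048 words, which fits the 32-bit-wide BRAM that the IP Catalog offers.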

 

hopefully you'll be all good now.

- Hervé

Scholar pedro_uno
Registered: 02-12-2013

Re: How to exercise HLS block from Zynq processing system.

Hervé, merci for the suggestion to put the directive on the return port.  It looks like the control signals to start and end processing go to the AXI bus.

 

Now, to experiment more rapidly I created a super small HLS design.  It is a bubble sorter that takes input as an array of four numbers and writes the result as an array of four numbers.  I think my main confusion is how HLS handles arrays in the formal parameter list.  Here is my code for the sorter.

#include "ap_int.h"

int sorter(ap_int<12> data[4]){

    ap_int<12> temp;

    outer_for: for(int k=0;k<3;k++){
        inner_for: for(int i=0;i<3;i++){
            data_compare_if: if (data[i]>data[i+1]){
                temp = data[i];
                data[i] = data[i+1];
                data[i+1] = temp;
            }
        }
    }
    return(0);
}

 

Here is my directives.tcl file

 

############################################################
set_directive_resource -core AXI4LiteS -metadata {-bus_bundle axi_bundle} "sorter" return
set_directive_resource -core RAM_1P_LUTRAM "sorter" data

 

When I instantiate the "sorter" HLS core in IP Integrator it looks like there are two memory controller ports for the array parameter data_V_PORTA and data_V_PORTB.  Both are bidirectional and 16 bits wide. I am guessing it is up to the user to instantiate appropriate RAM and connect them to the AXI bus through an AXI BRAM Controller. Maybe PORTA is for input data and PORTB is for output data.

 

The vhdl is easier to interpret.

 


entity sorter is
port (
    ap_clk : IN STD_LOGIC;
    ap_rst : IN STD_LOGIC;
    ap_start : IN STD_LOGIC;
    ap_done : OUT STD_LOGIC;
    ap_idle : OUT STD_LOGIC;
    ap_ready : OUT STD_LOGIC;
    data_V_address0 : OUT STD_LOGIC_VECTOR (1 downto 0);
    data_V_ce0 : OUT STD_LOGIC;
    data_V_we0 : OUT STD_LOGIC;
    data_V_d0 : OUT STD_LOGIC_VECTOR (11 downto 0);
    data_V_q0 : IN STD_LOGIC_VECTOR (11 downto 0) );
end

 

For completeness here is my C++ testbench.

 

#include <cstdio>
#include <cstdlib>
#include "ap_int.h"

// The declaration must match the definition: sorter() returns int.
extern int sorter(ap_int<12> data[4]);

int main(){
    ap_int<12> data[4];

    srand(1);
    int test_error = 0;
    for(int i=0;i<100;i++){
        for(int j=0;j<4;j++){
            data[j]=rand();   // truncated to 12 bits on assignment
        }
        sorter(data);
        int error = 0;
        for(int j=1;j<4;j++){
            if(data[j] < data[j-1]) error = 1;
        }
        if(error){
            test_error = 1;
            printf("ERROR: %d, %d, %d, %d\n",int(data[0]),int(data[1]),int(data[2]),int(data[3]));
        }else {
            printf("CORRECT: %d, %d, %d, %d\n",int(data[0]),int(data[1]),int(data[2]),int(data[3]));
        }
    }
    if (test_error){ printf("Test Failed!\n"); }else{ printf("All tests pass!\n"); }
    return test_error;   // nonzero return reports failure to the HLS tool
}

Explorer
Registered: 06-17-2012

Re: How to exercise HLS block from Zynq processing system.

I have also come across a similar problem synthesizing an ap_memory interface using Vivado HLS.

I am trying to implement a vector dot product as an example.

As you mentioned, I added an AXI BRAM controller plus block RAM to attach the IP core to the Zynq system.

Everything seems to work: I can DMA a vector from main memory to the block RAM and vice versa.

I am also polling to check whether the IP core has completed the computation, and it does return the correct ap_done flag.

However, the computation result is always zero.

So far, I think the PS + AXI BRAM controller + true dual port BRAM part should be all right, as I can read and write data through it. In fact, I use the same method in some other designs and I am sure it is correct. As for the IP core control logic (ap_start, ap_done, etc.), it should be OK too. After all, the main program gets out of the polling loop.

I suspect there is something wrong in the connection between the BRAM and the IP core. I am trying to debug it, but debugging the co-design is time consuming. So I am writing to ask whether you have solved your problem. Maybe I missed something in the design, and your experience could help me.

 

Scholar pedro_uno
Registered: 02-12-2013

Re: How to exercise HLS block from Zynq processing system.

Liucheng,

 

I have not succeeded yet but I have not surrendered yet either. 

 

I created a trivial C = A + B design where all the I/O goes through the AXI interface of the HLS block. Then I run it with these calls.


        XHls_add_SetA_v(&hls_add, a);
        XHls_add_SetB_v(&hls_add, b);
        c = XHls_add_GetReturn(&hls_add);

 

That works and gives me confidence that the AXI register interface is working.

 

I am dropping back and trying an even simpler memory interface HLS block. It just negates the contents of a memory block.  I'll let you know if I get any kind of HLS to modify BRAM contents.

 

I think this is important because many applications will work like this. Load BRAM. Hit Start. Wait for Done. Read results from BRAM.
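
The Load / Start / Wait / Read sequence can be sketched on a host PC with a software stand-in for the hardware. Everything here is hypothetical: FakeNegateCore and run_core are made-up names, and the control bit positions (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, with ap_done cleared on read) are an assumption about the AXI-Lite control register that should be checked against the header files HLS generates for the actual core.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMED control-register bit layout; confirm in the generated driver header.
constexpr uint32_t AP_START = 1u << 0;
constexpr uint32_t AP_DONE  = 1u << 1;
constexpr uint32_t AP_IDLE  = 1u << 2;

// Hypothetical host-side model of a core that negates a shared BRAM.
struct FakeNegateCore {
    std::vector<int32_t> bram;  // stands in for the dual-port BRAM
    uint32_t num_locs = 0;      // stands in for an AXI-Lite argument register
    uint32_t ctrl = AP_IDLE;    // stands in for the AXI-Lite control register
    int busy_polls = 0;         // polls left before the fake run finishes

    explicit FakeNegateCore(std::size_t depth) : bram(depth, 0) {}

    void write_ctrl(uint32_t value) {
        if ((value & AP_START) && (ctrl & AP_IDLE)) {
            ctrl = AP_START;    // running: idle is cleared
            busy_polls = 3;     // pretend the run takes a few polls
        }
    }

    uint32_t read_ctrl() {
        if (ctrl & AP_START) {
            if (--busy_polls > 0) return ctrl;  // still running
            for (uint32_t i = 0; i < num_locs && i < bram.size(); ++i)
                bram[i] = -bram[i];
            ctrl = AP_DONE | AP_IDLE;
        }
        uint32_t value = ctrl;
        ctrl &= ~AP_DONE;       // modelled as clear-on-read
        return value;
    }
};

// The four steps: load BRAM, hit start, wait for done, read results.
std::vector<int32_t> run_core(FakeNegateCore& core,
                              const std::vector<int32_t>& input) {
    std::size_t count = std::min(input.size(), core.bram.size());
    for (std::size_t i = 0; i < count; ++i) core.bram[i] = input[i];  // load
    core.num_locs = static_cast<uint32_t>(count);
    core.write_ctrl(AP_START);                                        // start
    while ((core.read_ctrl() & AP_DONE) == 0) { /* poll */ }          // wait
    return std::vector<int32_t>(core.bram.begin(),                    // read
                                core.bram.begin() + static_cast<std::ptrdiff_t>(count));
}
```

On real hardware the struct would be replaced by reads and writes to the BRAM controller's address range and the core's AXI-Lite registers; the order of the four steps is the part this sketch is meant to show.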

 

  Pete

Explorer
Registered: 06-17-2012

Re: How to exercise HLS block from Zynq processing system.

Thanks for your reply. 

It seems we are making exactly the same progress. I could also make it work when all the arguments are variables or pointers. Anyway, I am trying to add a debugging block to the design to see what's going on in the connection between the BRAM and the IP core. I hope it helps figure out the problem.

By the way, there is a Xilinx training course on Vivado HLS: http://www.xilinx.com/training/dsp/high-level-synthesis-with-vivado-hls.htm I think it may help us solve the problem. However, it will not be held in my city; you may want to check whether it is available for you.

Scholar pedro_uno
Registered: 02-12-2013

Re: How to exercise HLS block from Zynq processing system.

Cheng,

 

I am still fighting to make my HLS block modify memory contents.  I think I have found a clue by trying to use Vivado Analyzer to look at the memory interface to my HLS block.

 

When I attached Analyzer probes to the busses of the memory interface I see the following signals 

 

zynq_system_i/mem_negate_1_data_v_porta_ADDR[*]

zynq_system_i/mem_negate_1_data_v_porta_DIN[*]

zynq_system_i/mem_negate_1_data_v_porta_DOUT[*]

zynq_system_i/mem_negate_1_data_v_porta_WE[*]

zynq_system_i/mem_negate_1_data_v_porta_EN

zynq_system_i/mem_negate_1_data_v_porta_RST

 

DOUT is the data output of my HLS block called mem_negate so it is the data input to the block ram.  I get an error when I try to put Analyzer probes on that bus because it is driven by GND. When I execute the HLS hardware from Zynq it writes all zeros to BRAM.  I think this is the same behavior you saw with your design. The results were always zero.

 

Here is the HLS code.

 

#include "ap_int.h"

void mem_negate(ap_uint<10> num_locs, ap_int<32> data[1024]){
    outer_for:for(int i=0;i<num_locs;i++){
        data[i] = -data[i];
    }
}

 

Can anyone tell me why this code only writes zeros to the BRAM?

 

My design intent is that the block read the value of each memory location, multiply by -1 and then write that value back to the BRAM.  The HLS C++ simulation works as expected.
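
For anyone wanting to reproduce the check, here is a rough self-checking testbench sketch for that intent. It is not the testbench used in this thread. As an assumption, it swaps ap_uint<10> and ap_int<32> for uint16_t and int32_t so that it builds without ap_int.h, and run_mem_negate_test is a made-up name.

```cpp
#include <cstdint>
#include <cstdio>

// Plain C++ stand-in for the HLS function.
// ASSUMPTION: uint16_t / int32_t replace ap_uint<10> / ap_int<32>.
void mem_negate(uint16_t num_locs, int32_t data[1024]) {
    for (int i = 0; i < num_locs; i++) {
        data[i] = -data[i];
    }
}

// Fill the memory, run the function, compare against expected values.
// Returns 0 on pass and 1 on failure.
int run_mem_negate_test(uint16_t num_locs) {
    static int32_t data[1024];
    static int32_t golden[1024];
    for (int i = 0; i < 1024; i++) {
        data[i] = i * 7 - 3000;  // mix of negative and positive values
        // Only the first num_locs locations should change.
        golden[i] = (i < num_locs) ? -data[i] : data[i];
    }

    mem_negate(num_locs, data);

    int errors = 0;
    for (int i = 0; i < 1024; i++) {
        if (data[i] != golden[i]) {
            errors++;
            printf("MISMATCH at %d: got %d, expected %d\n",
                   i, static_cast<int>(data[i]), static_cast<int>(golden[i]));
        }
    }
    return errors ? 1 : 0;
}
```

A main() that returns run_mem_negate_test(...) gives the tool a nonzero exit code on failure. One side observation: ap_uint<10> tops out at 1023, so the interface as written cannot request all 1024 locations in one call.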

 

Thanks for any advice.

 

    Pete

 

 

 

Teacher muzaffer
Registered: 03-31-2012

Re: How to exercise HLS block from Zynq processing system.

>> The HLS C++ simulation works as expected.

So after RTL generation, the HLS testbench code compare succeeds?
One thing to do is to dump the contents of the BRAM to a file after inversion and check if the values are what you want in C++ and/or RTL too.
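
A minimal sketch of that dump-and-compare idea, assuming the memory contents are available as 32-bit words on the host side. The names dump_words and first_mismatch and the one-value-per-line text format are made up for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write one decimal value per line. Returns false if the file cannot be written.
bool dump_words(const std::string& path, const std::vector<int32_t>& words) {
    std::ofstream out(path);
    if (!out) return false;
    for (int32_t w : words) out << w << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Compare two dump files value by value.
// Returns the index of the first difference, or -1 if the dumps match.
// A file that cannot be opened is reported as a mismatch at index 0.
long first_mismatch(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a), b(path_b);
    if (!a || !b) return 0;
    long index = 0;
    int32_t va = 0, vb = 0;
    while (true) {
        bool ok_a = static_cast<bool>(a >> va);
        bool ok_b = static_cast<bool>(b >> vb);
        if (!ok_a && !ok_b) return -1;                   // both ended together
        if (ok_a != ok_b || va != vb) return index;      // length or value differs
        ++index;
    }
}
```

Dumping once from the C++ simulation and once from the RTL simulation or the hardware readback, then calling first_mismatch on the two files, shows where the two first diverge.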

Scholar pedro_uno
Registered: 02-12-2013

possible IP Integrator bug

Yes, the RTL vs. C++ simulation succeeds in HLS.  Also, I wrote a VHDL testbench for the HLS RTL output code and that works correctly.  I believe that HLS is generating the correct logic to control a BRAM.

 

The problem seems to be a bug in the way that Vivado IP Integrator wires up the BRAM to the memory port of my HLS block.  I am working with Xilinx Tech support right now so I will report what we learn on this forum.

 

  Pete

Scholar pedro_uno
Registered: 02-12-2013

Re: possible IP Integrator bug

It looks like this is a definite bug in the Vivado IP Integrator. It does not connect the data busses between the HLS core and the Block RAM.

The local FAE tested it in Vivado 2013.3 and found it to be fixed.

In the meantime be careful if you are wiring up HLS blocks to BRAM in IP Integrator.

Pete

Explorer
Registered: 06-17-2012

Re: possible IP Integrator bug

I am not familiar with Vivado debugging, and it took me some time to get at the I/O ports of the BRAM. The BRAM port connected to the BRAM controller is correct, while the connection between the Vivado HLS IP core and the BRAM seems strange. In your case, it can't be analyzed using ILA because it is driven by GND. In my case, the driver cell seems to be OK: it is a BRAMB36 cell. However, I still can't get the data port signals, even though the address signals behave as expected. For now, I may go back to PlanAhead and XPS to continue my project, and I hope the bug can be fixed soon.

 

 

 

Explorer
Registered: 06-17-2012

Re: possible IP Integrator bug

I exported the design to PlanAhead and integrated it using XPS, and the problem remains the same.

So the bug may still be in Vivado HLS.
