dadduni
Observer
743 Views
Registered: ‎10-08-2020

Pass integer matrix AXI Stream


Hi,

In HLS I wrote an entity with an "int A[8]" input and used pragmas to accept all the bits in parallel. It works fine, but now I want the input to arrive as an AXI Stream so I can connect it to the Zynq platform. How can I do that?

 

1 Solution

Accepted Solutions
joancab
Teacher
571 Views
Registered: ‎05-11-2015

 

You can think of an HLS function as a loop where an iteration starts each time a data item comes in. Reading all the data in, storing it, and then processing it with a for loop is a waste.

You can do something like:

static unsigned int iter_count = 0;  // static, so it persists across calls

int x = stream.read();  // blocking read of one value
// do something with x, as if it were data[iter_count] in an array
...
iter_count++;
if(iter_count == last_data){
    // do something to finalize, write output, etc.
    ...
    iter_count = 0;
}
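As a minimal sketch of this pattern, assuming a frame of 8 values and a sum of squares as a placeholder for the real processing: the names accumulate_frame and N_SAMPLES are hypothetical, and the small hls::stream class in the #else branch is only a desktop stand-in so the sketch compiles without the Xilinx headers.

```cpp
#if __has_include(<hls_stream.h>)
#include <hls_stream.h>            // real Vitis HLS header, if installed
#else
#include <queue>
namespace hls {                    // desktop stand-in, NOT the Xilinx implementation
template <typename T>
class stream {
    std::queue<T> q_;
public:
    void write(const T &v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() { return q_.empty(); }
};
}  // namespace hls
#endif

constexpr unsigned N_SAMPLES = 8;  // assumed frame length ("last_data" in the post)

// Consumes exactly one value per call and emits one result per frame.
void accumulate_frame(hls::stream<int> &in, hls::stream<int> &out) {
    static unsigned iter_count = 0;  // position inside the current frame
    static int acc = 0;              // running result, kept between calls

    int x = in.read();
    acc += x * x;                    // placeholder for "do something with x"
    iter_count++;

    if (iter_count == N_SAMPLES) {   // frame complete: finalize and reset
        out.write(acc);
        acc = 0;
        iter_count = 0;
    }
}
```

No 8-element buffer is declared anywhere: the only state kept between calls is the counter and the accumulator.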

 


9 Replies
joancab
Teacher
733 Views
Registered: ‎05-11-2015

 

- include the hls_stream.h header (your source files need to be C++, not plain C)

- define your input parameter as an hls::stream

- mark the input port as axis with the INTERFACE pragma

- use the read() method to read from the stream (and write() to send the output over AXI Stream)
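The four steps above could look roughly like the sketch below. The function name increment_stream, the frame length N and the "+ 1" processing are made up for illustration, and the hls::stream class in the #else branch is only a desktop stand-in so the code compiles where the Xilinx headers are not installed.

```cpp
#if __has_include(<hls_stream.h>)
#include <hls_stream.h>            // step 1: the stream library (C++ only)
#else
#include <queue>
namespace hls {                    // desktop stand-in, NOT the Xilinx implementation
template <typename T>
class stream {
    std::queue<T> q_;
public:
    void write(const T &v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() { return q_.empty(); }
};
}  // namespace hls
#endif

constexpr int N = 8;               // assumed number of values per call

// Step 2: streams are passed by reference as function parameters.
void increment_stream(hls::stream<int> &in, hls::stream<int> &out) {
// Step 3: map both ports to AXI4-Stream (a normal compiler ignores these).
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    for (int i = 0; i < N; ++i) {
        int x = in.read();         // step 4: one 32-bit word per read
        out.write(x + 1);          // placeholder processing
    }
}
```

A plain hls::stream<int> like this carries only the data word. If the stream ends up on an AXI DMA, a TLAST side-channel is generally needed as well, which is what the ap_axis types in ap_axi_sdata.h are for.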

dadduni
Observer
726 Views
Registered: ‎10-08-2020

I have an input of 8 ints, so 8*32 = 256 bits.

Do I have to define a 256-bit stream? When I call read, how can I use the 256 bits as an int[8] array?

joancab
Teacher
671 Views
Registered: ‎05-11-2015

You could define a 256-bit stream but may run into trouble later on because of the width.

The question is, do you need 256 bits every clock cycle? The quick answer could be "oh, yes, I want it fast, very fast, faster than the neighbor". Speed is expensive, so it pays to look closer.

Alternatively, you could feed data one int at a time (or some intermediate number) and find your optimum between silicon usage and speed. Depending on your process, it can be pipelined so you keep a low II (initiation interval, i.e. the number of cycles between consecutive inputs), even if the latency (clocks between an input and its corresponding output) is big. Many processes, even the so-called "real-time" ones, have latencies in the 100s or 1000s of cycles, which, at nanoseconds per cycle, amount to barely noticeable microseconds.

To split a 256-bit word into 32-bit ints you can just assign like this:

my_plain_int[0] = (int) my_large_array;
my_plain_int[1] = (int) (my_large_array >> 32);
etc...

This works because casting to a narrower type keeps the LSBs.

You could probably rewrite the above as a smart and elegant loop.
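One way such a loop might look is sketched below. The name unpack_lanes is hypothetical, and the function is templated on the word type so it can be tried on a desktop with a 64-bit word. In HLS the word would presumably be an ap_uint<256>, where the same shift-and-cast should carry over, or the range(hi, lo) method could be used instead.

```cpp
#include <cstdint>

// Splits `word` into LANES consecutive 32-bit values, least significant first.
// Word can be any unsigned type that supports >> and conversion to uint32_t.
template <int LANES, typename Word>
void unpack_lanes(Word word, int32_t out[]) {
    for (int i = 0; i < LANES; ++i) {
        // Shift lane i down to bit 0; the narrowing cast keeps the low 32 bits.
        uint32_t lane = static_cast<uint32_t>(word >> (32 * i));
        out[i] = static_cast<int32_t>(lane);
    }
}
```

For example, calling unpack_lanes<2> on the 64-bit value 0xFFFFFFFF00000007 fills the output with 7 and -1. With a 256-bit word, LANES would be 8.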

dadduni
Observer
664 Views
Registered: ‎10-08-2020

Okay, so you are saying I have to accept all the bits as one big number and then slice it up myself.

I thought about another solution. Can you please tell me if it's okay?

typedef struct
{
    float array[MAT_COL];
} VEC;

void func(hls::stream<VEC> &inStream){
    VEC mat_in = inStream.read();  // one read returns the whole vector
    mat_in.array[0]; // first element
    mat_in.array[1]; // second element
...
}

Does that seem okay to you?

joancab
Teacher
653 Views
Registered: ‎05-11-2015

 

Yes, a structure made up of an array is also fine, although my first approach would be not to pass all the data at once unless absolutely necessary. Why? Because a 256-bit bus can be unroutable when you drop that thing into Vivado with 20 other blue boxes. At that point, you will have taken 5 steps forward and will have to take 6 backward.

Many processes using N values are, or can be written as, a loop. There is no need to hoard all the data before starting the process. You read one value and process it, read the next and process it, etc. Take, for example, average calculation. Every value is added to an accumulator and, at the end, you divide by N. You don't need all N values at once. It's more efficient this way.
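The average example might be sketched like this, with each value consumed as it arrives and no array holding the N inputs. The function name stream_average is hypothetical, integer division is assumed to be acceptable, and the hls::stream class in the #else branch is only a desktop stand-in for the Xilinx header.

```cpp
#if __has_include(<hls_stream.h>)
#include <hls_stream.h>            // real Vitis HLS header, if installed
#else
#include <queue>
namespace hls {                    // desktop stand-in, NOT the Xilinx implementation
template <typename T>
class stream {
    std::queue<T> q_;
public:
    void write(const T &v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() { return q_.empty(); }
};
}  // namespace hls
#endif

// Averages n values from the stream (n must be greater than 0).
int stream_average(hls::stream<int> &in, int n) {
    int acc = 0;                   // the only storage needed
    for (int i = 0; i < n; ++i) {
        acc += in.read();          // read one value, process it, move on
    }
    return acc / n;                // finalize once all n values have passed
}
```

Whether the same trick applies to a given algorithm depends on whether its result can be built up incrementally like this accumulator.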

dadduni
Observer
647 Views
Registered: ‎10-08-2020

Thank you for the answer. I'm not trying to achieve something specific, I'm just practicing with HLS and the interconnect in the Zynq.

I'm still trying to implement the covariance (you helped me with it), so I have 8 channels to multiply with each other. I can do that by transmitting the data serially, but after that the data has to be buffered inside the function and processed all at once.

What do you think is the best way to do that?

I was trying to set up an AXI Stream from the processor to the logic with all the columns in parallel so the IP could start all the multiplications.

Rmccarty
Adventurer
637 Views
Registered: ‎09-05-2020

Use 32-bit streams in and out. Connect the streams to an AXI DMA and the DMA to one of the PS HP slave interfaces.

 

dadduni
Observer
630 Views
Registered: ‎10-08-2020

Okay, in this case do I have to hand-write the internal buffer of the function that stores 8 values from the stream before computing the multiplications, or can I use the classic syntax of an input array and let the compiler instantiate the bus and the memory?
