a.turowski2
Visitor
8,607 Views
Registered: ‎01-15-2016

Merging hls::streams outputs from concurrent functions


Hi all,

 

I am trying to use Vivado HLS 2015.v4 to create a design in which the output streams of several concurrently running functions are merged into a single output stream. The code looks like this:

 

#include <hls_stream.h>

void hls_top_function(hls::stream<int> &outstream)
{
   hls::stream<int> algo_outstreams[2];

   algo_0(algo_outstreams[0]);
   algo_1(algo_outstreams[1]);

   // Forward whatever each algorithm has produced so far
   for(int i=0; i < 2; i++) {
#pragma HLS UNROLL
      if(algo_outstreams[i].empty() == false) {
         outstream.write(algo_outstreams[i].read());
      }
   }
}

 

The algo_0 and algo_1 functions each generate an output every couple of clock cycles. In the C/RTL cosimulation waveforms I can see that algo_0 and algo_1 run in parallel, as expected. The problem is that the for loop merging the algorithms' outputs waits for both algorithms to generate an output before it starts sending anything to the design output. The desired behaviour is to send an output whenever at least one algorithm has generated one. This would be easy to implement in RTL, but I can't find a way to implement it in HLS. Can anyone tell me how to implement this, please?

 

Best regards,
Adam


Accepted Solutions
a.turowski2
Visitor
15,080 Views
Registered: ‎01-15-2016

Hi herver,

 

Thank you for coming back to me with the suggestion. I played around with it and with some other ideas. Unfortunately, the only approach that convinced HLS to implement hardware in which the output from each algorithm was merged into the output stream immediately was to expose the algorithm outputs to the external world and merge them in RTL.

 

Best regards,

Adam

4 Replies
guillaumebres
Scholar
8,593 Views
Registered: ‎03-27-2014

Can't you simply build the output vector from the AXIS stream algo_outstreams?

output.tdata( 63, 32 ) = algo_outstreams[1].tdata
output.tdata( 31, 0 ) = algo_outstreams[0].tdata

This way you naturally take advantage of the synchronicity of algo_0 and algo_1.
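
The packing idea above can be sketched in plain C++. This is only an illustration under assumptions: each algorithm is taken to produce one 32-bit word, `pack_pair` is a hypothetical helper name, and standard integer types stand in for the AXIS `tdata` field. It applies only when both words are available at the same time.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): place algo_1's word in
// bits 63..32 and algo_0's word in bits 31..0 of one 64-bit word.
inline std::uint64_t pack_pair(std::int32_t algo0_word, std::int32_t algo1_word) {
    // Go through uint32_t first so a negative value is not sign-extended
    // into the upper half of the 64-bit result.
    const std::uint64_t low  = static_cast<std::uint32_t>(algo0_word);
    const std::uint64_t high = static_cast<std::uint32_t>(algo1_word);
    return (high << 32) | low;
}
```

Because the two words travel together, this does not by itself let one algorithm's output go out while the other is still computing.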

 

gw.
a.turowski2
Visitor
8,588 Views
Registered: ‎01-15-2016

Hi guillaumebres

 

The output from each algorithm is exactly what should appear on the AXIS bus. In other words, I want to put the algorithms' outputs on the AXIS bus one after another. That part works fine. The problem is that HLS starts to put values on the bus only when ALL algorithms have generated their output, whereas I want each algorithm's output to be put on the bus as soon as it is available, without waiting for the other algorithms. As I stated before, I have confirmed that the algorithms run concurrently, as expected. Only the merging behaviour is not as required and needs to change.

 

Best regards,

Adam

herver
Xilinx Employee
8,566 Views
Registered: ‎08-17-2011

Hello @a.turowski2

 

The main issue you are facing is that the "untimed" C simulation behaves differently from the RTL simulation or the C/RTL cosimulation, where everything runs in parallel.

In C simulation, the first function call has to complete before the second one can start.

Only then does the for loop run. In that loop you read at most one value from each stream, but you write up to N=2 values to the output stream.

Is your C testbench expecting between 1 and N output values, or any number?
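
To make the C-simulation behaviour described above concrete, here is a software-only model of one pass of the merge loop from the question. It is a sketch under assumptions: `std::queue` stands in for `hls::stream`, and `FifoModel` and `merge_once` are hypothetical names that do not appear in the thread.

```cpp
#include <cstddef>
#include <queue>

// Assumption: std::queue models an hls::stream for illustration only.
using FifoModel = std::queue<int>;

// One pass of the merge loop from the question. Each input is read at
// most once, so a single call forwards 0, 1 or 2 values.
inline std::size_t merge_once(FifoModel (&in)[2], FifoModel &out) {
    std::size_t written = 0;
    for (int i = 0; i < 2; i++) {
        if (!in[i].empty()) {
            out.push(in[i].front());  // models outstream.write(...read())
            in[i].pop();
            ++written;
        }
    }
    return written;
}
```

If an algorithm has already queued more than one value by the time the loop runs, as happens when the producers run to completion first, the extra values stay in the input FIFO after the pass.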

 

I think the only way to get the desired functionality is to use dataflow, get the functions to run at II=1 with the same or very similar latency, and then use a priority decoder for the output muxing:

 

   for(int i=0; i < 2; i++) {
#pragma HLS UNROLL
      if( ! algo_outstreams[i].empty() ) {
         outstream.write(algo_outstreams[i].read());
         break; // <<<<<< we will always write at most one output, never 2 or more
      }
   }

With the "break", the II of the unrolled loop will be II=1.

Stream [0] has priority over [1], [1] over [2], and so on.

You would then call your top function with II=1.
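
The priority behaviour of the loop with `break` can be checked with a small software-only model. This is a sketch under assumptions: `std::queue` again stands in for `hls::stream`, and `StreamModel`, `kNumInputs` and `priority_merge_step` are hypothetical names. It says nothing about how Vivado HLS schedules the real loop.

```cpp
#include <queue>

// Assumption: std::queue models an hls::stream for illustration only.
using StreamModel = std::queue<int>;
const int kNumInputs = 2;

// One call models one invocation of the top function: forward at most
// one value, taken from the lowest-index input that has data.
// Returns true if a value was forwarded.
inline bool priority_merge_step(StreamModel (&in)[kNumInputs], StreamModel &out) {
    for (int i = 0; i < kNumInputs; i++) {
        if (!in[i].empty()) {
            out.push(in[i].front());
            in[i].pop();
            return true;  // same effect as the `break` in the loop above
        }
    }
    return false;  // nothing available on any input
}
```

One consequence of fixed priority is visible in the model: as long as input 0 has data, input 1 is not served, so a continuously busy input 0 would starve input 1.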

 

I hope this helps or puts you on the right track.

 

- Hervé
