Visitor esterhui
Visitor
10,567 Views
Registered: ‎06-27-2013

HLS AXI Streams: Single producer, multiple consumer

Hi,

 

I'm trying to implement a single-producer/multi-consumer design with Vivado HLS. The top-level block receives data via an AXI Stream. Inside the block there are N channels running concurrently, each consuming the SAME DATA and performing a slightly different computation on it. Each channel then writes its results asynchronously to an AXI output stream (e.g., the channel0 result might be ready before channel1 has results).

 

The internal channel is called 'consumer_ram'. It might have to pause the incoming stream (thus halting all other channels as well). Once it is halted, software changes the register values of this channel and then tells it to continue reading from the stream (restarting this specific channel and unblocking all other channels as a side effect).

 

Here are my questions:

 

1) Is reading the input stream and writing to N intermediate streams, one per channel, the most efficient way to implement single producer/multiple consumer?

2) Currently, when #pragma HLS inline isn't specified in consumer_ram.cpp, HLS just hangs. Any ideas?

3) I tried to declare an array of output streams, hls::stream<int32_t> data_out[2], but am having trouble during cosimulation:

@E [SIM-346] C test bench instrumentation failed: Cannot determine C object for RTL port data_out_ch_V_0.

Does cosimulation support arrays of AXI stream classes?

4) Cosimulation does not pass. When examining the signals with ModelSim, it looks like the internal channel loop (SUM_STREAM_LOOP) never receives the ap_start signal. Any ideas?

 

I have attached example code, here it is as well (the consumer in this example is just a simple accumulator and mult):

 

consumer_ram_nch.h

#ifndef __CONSUMER_RAM_NCH_H
#define __CONSUMER_RAM_NCH_H

#include <stdio.h>
#include <inttypes.h>
#include <ap_int.h>
#include <hls_stream.h>

#include "consumer_ram.h"

#define NUM_CH 2

typedef struct {
    int16_t coeff[NUM_CH];
    uint32_t skip_samples[NUM_CH];
    uint32_t num_samples[NUM_CH];
} channel_config;

void consumer_ram_nch(hls::stream<int16_t> &data_in,
    channel_config config,
    hls::stream<int32_t> &data_out0,
    hls::stream<int32_t> &data_out1
    );

#endif

 

consumer_ram_nch.cpp

#include "consumer_ram_nch.h"

void consumer_ram_nch(hls::stream<int16_t> &data_in,
    channel_config config,
    hls::stream<int32_t> &data_out0,
    hls::stream<int32_t> &data_out1
    )
{
#pragma HLS DATAFLOW
    // Use array_reshape to map array into a register
#pragma HLS ARRAY_RESHAPE variable=config.coeff         complete dim=1
#pragma HLS ARRAY_RESHAPE variable=config.skip_samples  complete dim=1
#pragma HLS ARRAY_RESHAPE variable=config.num_samples   complete dim=1

    hls::stream<int16_t> data_in_ch[NUM_CH];

    int16_t data_temp;

#pragma HLS INTERFACE ap_none port=config
#pragma HLS INTERFACE ap_fifo port=data_in
#pragma HLS INTERFACE ap_fifo port=data_out0
#pragma HLS INTERFACE ap_fifo port=data_out1
        
#pragma HLS RESOURCE variable=config core=AXI4LiteS metadata="-bus_bundle BUS_CTRL"
#pragma HLS RESOURCE variable=data_in core=AXIS metadata="-bus_bundle STREAM_IN"
#pragma HLS RESOURCE variable=data_out0 core=AXIS metadata="-bus_bundle STREAM_OUT0"
#pragma HLS RESOURCE variable=data_out1 core=AXIS metadata="-bus_bundle STREAM_OUT1"

#pragma HLS RESOURCE core=AXI4LiteS metadata="-bus_bundle BUS_CTRL" variable=return

    // --- Read from single producer and write
    // --- to multiple consumers ---------- ///
    for (uint32_t i=0; i < config.num_samples[0]; i++) {
        data_temp=data_in.read();
        data_in_ch[0].write(data_temp);
        data_in_ch[1].write(data_temp);
    }

    consumer_ram(data_in_ch[0],
        config.coeff[0], config.skip_samples[0], config.num_samples[0],
        data_out0);

    consumer_ram(data_in_ch[1],
        config.coeff[1], config.skip_samples[1], config.num_samples[1],
        data_out1);

}

 consumer_ram.h

#ifndef __CONSUMER_RAM_H
#define __CONSUMER_RAM_H

#include <stdio.h>
#include <inttypes.h>
#include <ap_int.h>
#include <hls_stream.h>

#define MAX_PTS 20456

void consumer_ram(hls::stream<int16_t> &data_in,
    int16_t coeff,
    uint32_t skip_samples,
    uint32_t num_samples,
    hls::stream<int32_t> &data_out);

#endif

 consumer_ram.cpp

#include "consumer_ram.h"

void consumer_ram(hls::stream<int16_t> &data_in,
    int16_t coeff,
    uint32_t skip_samples,
    uint32_t num_samples,
    hls::stream<int32_t> &data_out)
{
// If I don't add the pragma inline below
// then HLS hangs here:
// @W [SYN-210] Renamed object name 'consumer_ram_nch_entry1_Loop_1_proc_U0' to 'consumer_ram_nch_entry1_Loop_1_proc_U0'
// @W [SYN-210] Renamed object name 'consumer_ram_U0' to 'consumer_ram_U0'
// @W [SYN-210] Renamed object name 'consumer_ram_old_U0' to 'consumer_ram_old_U0'
// @W [SYN-210] Renamed object name 'AP_FIFO_data_in_ch_0_V_U0' to 'data_in_ch_0_V'
// @W [SYN-210] Renamed object name 'AP_FIFO_data_in_ch_1_V_U0' to 'data_in_ch_1_V'
// @W [SYN-210] Renamed object name 'AP_FIFO_tmp_7_loc_channel_U0' to 'tmp_7_loc_channel'
//  <---- HANGS (checked over 1 hour)
#pragma HLS inline

    uint32_t i;
    int32_t accum=0;
    int16_t temp_in;

SUM_STREAM_LOOP:
    for (i=0; i < num_samples; i++) {
#pragma HLS loop_tripcount max=1048576
#pragma HLS pipeline
        // Fetch a word from the stream
        temp_in=data_in.read();
        //temp_in=data_in[i];

        // Skip this many samples
        if (i<skip_samples) {
            continue;
        }

        // Now accumulate
        accum+=temp_in*coeff;
    }

    // Write result to output stream;
    data_out.write(accum);
}

 consumer_ram_nch_test.cpp

#define NPTS 20456

#define EXP_VALUE 209213695

#include "consumer_ram_nch.h"

void consumer_sw(int16_t data_in[NPTS],
    int16_t coeff,
    uint32_t skip_samples,
    uint32_t num_samples,
    int32_t *data_out)
{

    uint32_t i;
    int32_t accum=0;
    int16_t temp_in;

    // Sum over num_samples but skip the first
    // skip_samples
    for (i=0; i < num_samples; i++) {
        temp_in=data_in[i];

        // Skip this many samples
        if (i<skip_samples) {
            continue;
        }

        // Now accumulate
        accum+=temp_in*coeff;
    }

    *data_out=accum;
}

int main()
{
    int retcode=0;
    hls::stream<int16_t> data_in;
    hls::stream<int32_t> data_out[NUM_CH];
    int16_t data_in_sw[NPTS];
    uint32_t skip_samples=10;
    uint32_t num_samples=NPTS;
    int32_t  data_out_sw, data_out_hw;
    int16_t coeff=3;

    channel_config config;

    // Fill SW and HW buffers
    for (int16_t i=0; i < NPTS; i++) {
        data_in_sw[i]=i;
        data_in.write(i);
    }

    // Init the variables
    for (int ch=0; ch < NUM_CH; ch++) {
        config.coeff[ch]=coeff;
        config.skip_samples[ch]=skip_samples;
        config.num_samples[ch]=num_samples;
    }

    // Run software version first
    consumer_sw(data_in_sw,coeff,skip_samples,num_samples,&data_out_sw);

    // Now run hardware
    consumer_ram_nch(data_in,config,data_out[0],data_out[1]);

    for (int ch=0; ch < NUM_CH; ch++) {
        fprintf(stdout,"Checking channel %d\n",ch);
        data_out_hw=data_out[ch].read();

        if (data_out_sw!=data_out_hw) {
            retcode=1;
            fprintf(stderr,"HW/SW accum didn't match\n");
            fprintf(stderr,"HW: %d\n",data_out_hw);
            fprintf(stderr,"SW: %d\n",data_out_sw);
        }
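        else if (data_out_hw != coeff*EXP_VALUE) {
            retcode=1;
            // Pass the value that the %d format expects
            fprintf(stderr,"HW accumulated to wrong value: %d\n",data_out_hw);
            // The comparison above is against coeff*EXP_VALUE, so report that
            fprintf(stderr,"expected %d\n",coeff*EXP_VALUE);
        }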
        else {
            fprintf(stdout,"Passed test\n");
        }
    }

    return retcode;
}
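
As a sanity check on the test bench constants: the input is the ramp 0..NPTS-1 and the first skip_samples=10 values are ignored, so the unscaled sum should equal EXP_VALUE, and the value each channel produces should be coeff times that. The snippet below is a minimal standalone sketch in plain C++ (no HLS headers); expected_sum and check_exp_value are hypothetical helper names, not part of the attached code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sum of the ramp 0..npts-1, ignoring the first `skip` samples.
// Uses 64-bit arithmetic so the check itself cannot overflow.
int64_t expected_sum(uint32_t npts, uint32_t skip) {
    int64_t accum = 0;
    for (uint32_t i = skip; i < npts; i++) {
        accum += i;
    }
    return accum;
}

// Checks the constants used in consumer_ram_nch_test.cpp (NPTS=20456, skip=10, coeff=3).
void check_exp_value() {
    assert(expected_sum(20456, 10) == 209213695);      // matches EXP_VALUE
    assert(3 * expected_sum(20456, 10) == 627641085);  // coeff*EXP_VALUE, still fits in int32_t
}
```

So the accumulator does not overflow for this test, and the comparison against coeff*EXP_VALUE is the right one for coeff=3.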

 

Cheers

 

- Stephan

 


Accepted Solutions
Visitor esterhui
Visitor
16,432 Views
Registered: ‎06-27-2013

Re: HLS AXI Streams: Single producer, multiple consumer

Hello Hervé,

 

Thank you for the input. I switched to a class-based approach as you suggested, ran into the same problem (cosimulation stalling), and finally tracked it down to the way consumer_ram was implemented. The simulation stalls because the second channel waits for the first channel to finish. I believe the dependency was enforced by the very last line, data_out.write(accum):

 

void consumer_ram(hls::stream<int16_t> &data_in,
    int16_t coeff,
    uint32_t skip_samples,
    uint32_t num_samples,
    hls::stream<int32_t> &data_out)
{
#pragma HLS inline

    uint32_t i;
    int32_t accum=0;
    int16_t temp_in;
SUM_STREAM_LOOP:
    for (i=0; i < num_samples; i++) {
#pragma HLS loop_tripcount max=1048576
#pragma HLS pipeline
        // Fetch a word from the stream
        temp_in=data_in.read();

        // Skip this many samples
        if (i<skip_samples) {
            continue;
        }

        // Now accumulate
        accum+=temp_in*coeff;
    }

    // Write result to output stream;
    data_out.write(accum);
}

After changing the code above so that the write happens inside the for loop, cosimulation now works:

#include "consumer_ram.h"

void consumer_ram(hls::stream<int16_t> &data_in,
    int16_t coeff,
    uint32_t skip_samples,
    uint32_t num_samples,
    hls::stream<int32_t> &data_out)
{
#pragma HLS inline

    uint32_t i;
    int32_t accum=0;
    int16_t temp_in;

SUM_STREAM_LOOP:
    for (i=0; i < num_samples; i++) {
#pragma HLS loop_tripcount max=1048576
#pragma HLS pipeline
        // Fetch a word from the stream
        temp_in=data_in.read();

        // Skip this many samples
        if (i<skip_samples) {
            continue;
        }

        // Now accumulate
        accum+=temp_in*coeff;
        if (i==num_samples-1) {
            // Write result to output stream;
            data_out.write(accum);
        }
    }
}
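
One caveat with moving the write inside the loop, as far as a plain C++ model can show: the write now sits after the `continue`, so if skip_samples >= num_samples no result is written at all, whereas the original version always wrote one value. The sketch below models both behaviours with std::queue standing in for hls::stream. The type aliases, the function names and the consumer_variant restructuring are assumptions for illustration only, and the variant has not been checked in Vivado HLS synthesis or cosimulation.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// std::queue stands in for hls::stream here (software model only, not synthesizable).
using Stream16 = std::queue<int16_t>;
using Stream32 = std::queue<int32_t>;

// Model of the posted fix: the write sits after the skip check.
void consumer_posted(Stream16 &in, int16_t coeff, uint32_t skip,
                     uint32_t n, Stream32 &out) {
    int32_t accum = 0;
    for (uint32_t i = 0; i < n; i++) {
        int16_t t = in.front();
        in.pop();
        if (i < skip) continue;           // also skips the write on the last iteration
        accum += t * coeff;
        if (i == n - 1) out.push(accum);
    }
}

// Hypothetical variant: the last-iteration write no longer depends on the skip check.
void consumer_variant(Stream16 &in, int16_t coeff, uint32_t skip,
                      uint32_t n, Stream32 &out) {
    int32_t accum = 0;
    for (uint32_t i = 0; i < n; i++) {
        int16_t t = in.front();
        in.pop();
        if (i >= skip) accum += t * coeff;
        if (i == n - 1) out.push(accum);  // reached on every last iteration
    }
}

// Helper: a stream holding the ramp 0..n-1, like the test bench input.
Stream16 make_ramp(uint32_t n) {
    Stream16 s;
    for (uint32_t i = 0; i < n; i++) s.push(static_cast<int16_t>(i));
    return s;
}

void demo() {
    Stream16 a = make_ramp(8), b = make_ramp(8), c = make_ramp(8);
    Stream32 oa, ob, oc;
    consumer_posted(a, 3, 2, 8, oa);   // normal case: one result, 3*(2+...+7)
    consumer_posted(b, 3, 8, 8, ob);   // skip == n: nothing is written
    consumer_variant(c, 3, 8, 8, oc);  // skip == n: an accumulated 0 is written
    assert(oa.size() == 1 && oa.front() == 81);
    assert(ob.empty());
    assert(oc.size() == 1 && oc.front() == 0);
}
```

In the model, a reader waiting on the output stream would block forever in the skip_samples >= num_samples case of the posted fix. Neither version writes anything when num_samples is 0.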

 

Thanks

 

 - Stephan

2 Replies
Xilinx Employee
Xilinx Employee
10,542 Views
Registered: ‎08-17-2011

Re: HLS AXI Streams: Single producer, multiple consumer

hello Stephan,

 

I haven't looked into this in detail, so I can only comment from my understanding.

 

1)

First, unless I missed it, I think the depths of the streams are missing. The SW uses "infinite sized queues", but as you'll appreciate, the HW needs a defined maximum depth.

e.g. #pragma HLS stream depth=8 variable=OutStream

Next, I'm not sure the loop inside your top-level function will get you what you want. I suspect there may be some lock / overflow / underflow issues.


 

 

consumer_ram_nch.cpp

    // --- Read from single producer and write
    // --- to multiple consumers ---------- ///
    for (uint32_t i=0; i < config.num_samples[0]; i++) {
        data_temp=data_in.read();
        data_in_ch[0].write(data_temp);
        data_in_ch[1].write(data_temp);
    }

    consumer_ram(data_in_ch[0],
        config.coeff[0], config.skip_samples[0], config.num_samples[0],
        data_out0);

    consumer_ram(data_in_ch[1],
        config.coeff[1], config.skip_samples[1], config.num_samples[1],
        data_out1);

 


I would be tempted to suggest adopting a coding style similar to the one in the example shipped with the tool: C:\Xilinx\Vivado_HLS\2013.2\examples\coding\cpp_FIR

and having something that looks like this:

 

 

consumer_ram_nch( /*IO declaration */ )
{
    // variables & directive ...

    static YOURCLASS<coef_t, data_t, acc_t> consum_ram0;
    static YOURCLASS<coef_t, data_t, acc_t> consum_ram1;

    data_temp=data_in.read();
    data_in_ch_0.write(data_temp);
    data_in_ch_1.write(data_temp);

    data_out0 << consum_ram0(data_in_ch_0);
    data_out1 << consum_ram1(data_in_ch_1);
}

Also, try not to use arrays in the proof-of-concept phase.

2) When using dataflow, the functions need to be inlined. Check UG902 on dataflow for details.

3) Not sure about the error, but AXI interface adaptors aren't simulated with 2013.2, i.e. simulation runs only at the level set by the top synthesizable function, so it doesn't include the bus adaptors.

4) Not sure about that one either, but if you make the changes above, I guess it may be fixed? (Make sure to use 2013.2 and not an earlier version.)
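
On point 1, the reason the missing depths matter can be illustrated with a plain C++ model (std::queue in place of hls::stream; the function names are hypothetical). In C simulation the fan-out loop in consumer_ram_nch runs to completion before consumer_ram is called, so each intermediate stream has to hold all num_samples words, while in hardware the FIFO only has the depth it was given. The interleaved case below is an idealization of a concurrent producer and consumer, not a statement about the schedule Vivado HLS actually generates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Peak FIFO occupancy when the producer writes all n words before the
// consumer reads any, which is how sequential C simulation executes.
std::size_t peak_sequential(uint32_t n) {
    std::queue<int16_t> fifo;
    std::size_t peak = 0;
    for (uint32_t i = 0; i < n; i++) {
        fifo.push(0);                                  // sample value is irrelevant here
        peak = std::max<std::size_t>(peak, fifo.size());
    }
    while (!fifo.empty()) fifo.pop();                  // consumer drains afterwards
    return peak;
}

// Peak FIFO occupancy when the consumer reads each word right after it is
// written (idealized concurrent execution).
std::size_t peak_interleaved(uint32_t n) {
    std::queue<int16_t> fifo;
    std::size_t peak = 0;
    for (uint32_t i = 0; i < n; i++) {
        fifo.push(0);
        peak = std::max<std::size_t>(peak, fifo.size());
        fifo.pop();                                    // consumer keeps up with the producer
    }
    return peak;
}

void demo_depths() {
    assert(peak_sequential(20456) == 20456);  // NPTS words queued per intermediate stream
    assert(peak_interleaved(20456) == 1);     // one word in flight at a time
}
```

With the test bench's 20456 samples, the sequential case needs 20456 entries per intermediate stream while the idealized interleaved case needs one, so a small depth such as 8 would only be enough in hardware if the producer and the consumers really do run concurrently.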

 

I hope this helps...

- Hervé
