Voyager
7,333 Views
Registered: ‎10-07-2011

Looking for coding tips

Hi folks,

 

I have to build a data processing pipe and I thought it would be a good opportunity to give HLS a try.

 

The very first module I have to build is a vector averaging module with the initiation interval set to 1 (I don't care about latency). Each of my datasets is, let's say, a vector of 256 values. I need to average 57 of them and produce an averaged 256-value vector.

 

If I was to do it in RTL, I would proceed as described in the attached block diagram file. I would use an AXI4-Stream Packet FIFO IP and a Divider IP. The other blocks would be hand-coded.

 

If I am to do the whole thing using HLS, how should I proceed? How can I tell HLS to use a FIFO? I think the divider IP will be derived from the 'C' / operator. But how do I tell HLS to use a FIFO? And how do I make sure it is configured properly?

 

I'm almost sure the answer is "let HLS decide how to store the accumulated data". If so, how can I do that?

 

Many thanks!

 

Claude

6 Replies
Xilinx Employee
7,317 Views
Registered: ‎03-24-2010

Re: Looking for coding tips

Just a conceptual illustration of how to realize ap_fifo and stream interfaces. Refer to UG902 for more about ap_fifo, stream, pipeline, etc.

 

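#define N 256  /* example length: the 256-value vector from the question */

void fun(int a[N], int b[N]) {
#pragma HLS interface ap_fifo port=a
#pragma HLS interface ap_fifo port=b
#pragma HLS resource core=AXI4Stream variable=a
#pragma HLS resource core=AXI4Stream variable=b
    int temp = 0;  /* running average of a[0..i] */
    int i;
    for (i = 0; i < N; i++) {
#pragma HLS PIPELINE
        /* integer arithmetic: both divisions truncate */
        temp = temp * i / (i + 1) + a[i] / (i + 1);
        b[i] = temp;
    }
}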
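For the case in the original question (averaging 57 vectors of 256 values element by element), a minimal sketch could look like the following. It is plain C++ rather than a verified HLS design: the names vector_average, VEC_LEN and NUM_VECTORS, the flat input array and the 16-bit sample type are all assumptions made for illustration. The local array acc stands in for the packet FIFO of the RTL block diagram, and the "/" operator is where a tool would infer a divider.

```cpp
#include <cassert>
#include <cstdint>

const int VEC_LEN = 256;     // values per vector (from the question)
const int NUM_VECTORS = 57;  // vectors to average (from the question)

// Reads NUM_VECTORS vectors back to back from `in` and writes their
// element-wise average to `out` while the last vector is arriving.
// Assumption: 16-bit samples, so a 32-bit accumulator cannot overflow.
void vector_average(const int16_t in[NUM_VECTORS * VEC_LEN],
                    int16_t out[VEC_LEN]) {
    assert(in != nullptr && out != nullptr);

    // Local storage for the running sums. An HLS tool would typically
    // map an array like this to block RAM.
    int32_t acc[VEC_LEN];

    for (int v = 0; v < NUM_VECTORS; v++) {
        for (int i = 0; i < VEC_LEN; i++) {
            // Under Vivado HLS, a "#pragma HLS PIPELINE II=1" placed here
            // would request one sample per clock cycle.
            int32_t prev = (v == 0) ? 0 : acc[i];  // first vector starts the sum
            int32_t sum = prev + in[v * VEC_LEN + i];
            acc[i] = sum;
            if (v == NUM_VECTORS - 1) {
                // Only the last pass produces output values.
                out[i] = (int16_t)(sum / NUM_VECTORS);
            }
        }
    }
}
```

Starting the sum on the first vector (instead of clearing acc in a separate loop) keeps the design to a single pass over the data, which is closer to what a streaming implementation would do.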

Regards,
brucey
Voyager
7,312 Views
Registered: ‎10-07-2011

Re: Looking for coding tips

Hello all,

 

Bruce, many thanks for the hint.

 

Here's where I'm getting confused. If I was to code the RTL myself, I would manage the pipe using TLAST, TVALID and TREADY stream signals. But when I'm coding in C, I have no access to these signals.

 

Two problems I'm facing:

 

1. I have no prior information on the incoming packet length. I know it won't exceed 4096 data beats but it can be anything. Hence, I must rely on TLAST to process the data correctly. How do I get access to TLAST?

 

2. The processing module is an averaging module, with no power consumption constraint. Hence, I don't care about having the divider output a new result every clock cycle even though this is garbage. If I was coding the RTL myself, I would simply manage the output TVALID signal to qualify each of the output TDATA. But what happens in the average module is that the only valid out packet is going out when the last (Nth) input packet is coming in. So how should I C-code such a behavior?

 

Cheers!

 

Claude

Teacher muzaffer
7,309 Views
Registered: ‎03-31-2012

Re: Looking for coding tips

I think there is a disconnect here. When you write your C code, you produce the data at its input and you receive the data at its output. Your description sounds as if the data is being sourced outside of your control. That may be true for the driving C program but not for the part you want to synthesize with HLS. Your higher-level program receives the data, and there is a signal which tells it where the data ends. You need to write the C function that goes to HLS with the same kind of valid signal for the input data, and add another signal to that function, such as output_valid, which becomes active once the processing is done in the HLS/C code. Then you can figure out how to map these signals to an AXI stream interface.
Does that make sense at all?
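As a rough sketch of that idea, the function below takes one sample per call together with an end-of-packet flag and returns a result with an output-valid flag. Everything here is hypothetical (AvgState, average_step, the argument names), and it assumes that all packets in one average have the same length. How these flags end up on TLAST and TVALID depends on the interface mapping and is not shown.

```cpp
#include <cassert>
#include <cstdint>

const int MAX_BEATS = 4096;  // upper bound on packet length, from the thread

// Hypothetical state kept between calls. Zero-initialize before first use.
struct AvgState {
    int32_t acc[MAX_BEATS];  // running sums, one per position in the packet
    int index;               // position of the next sample inside its packet
    int packets_seen;        // completed packets in the current average
};

// Consumes one sample per call. `in_last` marks the final sample of a packet.
// `*out_valid` is raised only while the final (num_packets-th) packet is
// arriving; on all other calls `*out` is set to 0 and should be ignored.
void average_step(AvgState &s, int32_t sample, bool in_last, int num_packets,
                  int32_t *out, bool *out_valid) {
    assert(s.index < MAX_BEATS && num_packets > 0);

    int32_t prev = (s.packets_seen == 0) ? 0 : s.acc[s.index];
    int32_t sum = prev + sample;
    s.acc[s.index] = sum;

    bool final_packet = (s.packets_seen == num_packets - 1);
    *out_valid = final_packet;
    *out = final_packet ? sum / num_packets : 0;

    if (in_last) {
        // End of packet: rewind, and restart the average after the final one.
        s.index = 0;
        s.packets_seen = final_packet ? 0 : s.packets_seen + 1;
    } else {
        s.index++;
    }
}
```

The divider still runs on every call, as in the RTL approach described in the question; out_valid is what qualifies its result.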
Visitor panyu
7,305 Views
Registered: ‎01-04-2013

Re: Looking for coding tips

@Claude, if you are coming from a hardware background, using the SystemC flavor of HLS will probably make a lot more sense. You have almost complete control of each interface, be it a signal, FIFO or AXI4 stream, and of the block structure, much the same way as with RTL coding. Meanwhile, you also have the flexibility of doing design space exploration more quickly with the power of HLS pragmas.

 

With that said, as a starting point you may take a look at my previous post for an example of SystemC HLS with AXI4 stream here: http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/SystemC-FIFO-Synthesis-issue-in-2013-3/td-p/373487 . You may use HLS 2013.2 to compile it.

 

If you need to use TLAST on the AXIS interface, in theory you can do it like what I did in another post here http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/Problem-doing-RTL-cosimulation-with-struct-types-in-SystemC-in/m-p/347685#M69 . However, as of 2013.2, it only works in SystemC simulation, but not synthesis. A workaround is to have TLAST on a separate AXIS interface, and access (read/write) it together with the companion DATA interface. There will be separate AXIS interfaces generated for TLAST and DATA, but you can always glue them together with a top-level wrapper of your own around the generated top level.

 

Happy coding in HLS.

Visitor panyu
7,304 Views
Registered: ‎01-04-2013

Re: Looking for coding tips

One more issue to notice is that, if you are doing deep pipelining in your code with a pipelined "/" operator inferred automatically by HLS (e.g., a throughput of one result per cycle on a 70~80 stage pipeline), it is better to ensure that data goes into and comes out of the pipeline in a continuous manner. I have experienced TVALID and output data that did not tally when doing deep pipelining with a "/" operator, and my workaround is to have a pair of FIFOs before and after the divider, and only kick-start the division loop when the input FIFO has enough data for a burst and the output FIFO has enough space to accommodate the results. (In addition, the num_available() and num_free() functions of the SystemC FIFO in HLS are not what they seem to be: their actual semantics are is_available() and is_free(), so you also need some tricks to make sure of the data/space availability, but this is another story...)
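The gating logic described above can be sketched in plain C++ as follows. This only models the behavior and is not SystemC or synthesizable code: std::deque stands in for the two FIFOs, and divide_burst, BURST and OUT_CAPACITY are made-up names and values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

const std::size_t BURST = 8;          // hypothetical burst length
const std::size_t OUT_CAPACITY = 16;  // hypothetical output FIFO depth

// Runs one burst through the divider only if the input FIFO holds a full
// burst and the output FIFO has room for every result of that burst.
// Returns true if the burst was processed, false if nothing was touched.
bool divide_burst(std::deque<int32_t> &in_fifo, std::deque<int32_t> &out_fifo,
                  int32_t divisor) {
    assert(divisor != 0);
    if (in_fifo.size() < BURST) {
        return false;  // not enough input yet
    }
    if (out_fifo.size() + BURST > OUT_CAPACITY) {
        return false;  // results would not fit
    }
    // Once started, the loop never has to stall on either side.
    for (std::size_t i = 0; i < BURST; i++) {
        out_fifo.push_back(in_fifo.front() / divisor);
        in_fifo.pop_front();
    }
    return true;
}
```

Checking both conditions before the loop starts is what keeps data moving through the divider without interruption for the whole burst.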

 

With the above being said, it is usually preferable to avoid a variable "/" at all cost. An inferred divider for 2x-bit operands may cost you thousands of LUTs and 50+ stages of pipelining. If "/" is not avoidable, and throughput is not so demanding, you may code a relatively simpler high radix divider and save a lot of resources.
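When the divisor is a constant, as with the 57 vectors in the original question, one common way to avoid "/" entirely is to multiply by a precomputed reciprocal and shift. This is a different technique from the high radix divider mentioned above. The sketch below assumes unsigned 16-bit samples, so the sum of 57 of them stays below 2^22; div_by_57, RECIP and MAX_SUM are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t DIVISOR = 57;                // number of vectors, from the question
const uint32_t MAX_SUM = DIVISOR * 65535u;  // assumes unsigned 16-bit samples

// ceil(2^32 / 57), computed at compile time.
const uint64_t RECIP = ((1ULL << 32) + DIVISOR - 1) / DIVISOR;

// Returns sum / 57 using one multiplication and one shift.
// The result matches integer division for every sum up to MAX_SUM, because
// the rounding error of RECIP (57 * RECIP - 2^32 = 32) times MAX_SUM stays
// below 2^32.
uint32_t div_by_57(uint32_t sum) {
    assert(sum <= MAX_SUM);
    return (uint32_t)((sum * RECIP) >> 32);
}
```

In hardware terms this trades the divider for a single wide multiplier, at the price of fixing the divisor at design time.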

Xilinx Employee
7,250 Views
Registered: ‎11-28-2007

Re: Looking for coding tips

Please take a look at the example below, in the Vivado installation directory, on how to use TLAST and other side-channel signals of an AXI4 stream in C++.

 

C:\Xilinx\Vivado_HLS\2013.3\examples\design\axi_stream_side_channel_data

 

Please also read about the hls::stream class (header hls_stream.h) in the HLS user guide, UG902.
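To give an idea of what reading a packet of unknown length looks like once TLAST travels with the data, here is a small model in standard C++. It does not use the HLS headers: the Beat struct and std::queue are stand-ins (in Vivado HLS the corresponding pieces would be a side-channel struct such as ap_axiu from ap_axi_sdata.h and hls::stream from hls_stream.h), and read_packet is a made-up name. The shipped example referred to above remains the reference.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

const int MAX_BEATS = 4096;  // upper bound on packet length, from the thread

// Stand-in for one AXI4-Stream beat: payload plus the TLAST side-channel bit.
struct Beat {
    int32_t data;
    bool last;
};

// Copies one packet from the stream into `buf` and returns its length.
// Stops after the beat flagged `last`, when the stream runs empty, or
// after MAX_BEATS beats, whichever comes first.
int read_packet(std::queue<Beat> &in, int32_t buf[MAX_BEATS]) {
    assert(buf != nullptr);
    int len = 0;
    while (!in.empty() && len < MAX_BEATS) {
        Beat b = in.front();
        in.pop();
        buf[len++] = b.data;
        if (b.last) {
            break;  // TLAST seen: the packet is complete
        }
    }
    return len;
}
```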

 


Cheers,
Jim