Visitor elfwind
Registered: ‎06-15-2017

What's the advantage of piped kernels?

I am new to OpenCL and SDAccel programming. I see many people using pipes to pass data between kernels, and I don't understand the advantage.

The kernels look roughly like this:

pipe p0
pipe p1

kernel1 {
    // read data from host, then write to p0
}

kernel2 {
    // read p0
    // calculations
    // write p1
}

kernel3 {
    // read p1
    // write to host
}

This design uses 3 kernels, and the maximum number of kernels is 10.
Why not merge them and do everything in 1 kernel?

Accepted Solutions
Xilinx Employee
Registered: ‎01-12-2017

Re: What's the advantage of piped kernels?

Hi,

When you use pipes, all the kernels run in parallel. The Read and Write kernels essentially do the I/O for you, and the Compute kernel uses P0 as its input and P1 as its output.

A pipe (think of a water tap :) ) of a specified width holds the data produced by the Read kernel (in P0) until the Compute kernel consumes it. As the Compute kernel produces output, it places that data in a different pipe (P1), and the Write kernel consumes it and writes it back to DDR.

If all the kernels are merged into a single kernel, execution becomes sequential. If you want similar behavior after merging the kernels, you can use dataflow.
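
To make the parallel behavior concrete, here is a minimal host-side model in Python. It is not SDAccel or OpenCL code: threads stand in for the three kernels and bounded `queue.Queue` objects stand in for pipes P0 and P1. The function names, the `DONE` end-of-stream marker, the `2 * x + 1` computation, and the default depth of 4 are all assumptions made for illustration.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, only for this model


def read_kernel(src, p0):
    """Model of the Read kernel: host data -> pipe P0."""
    for x in src:
        p0.put(x)  # blocks while P0 is full
    p0.put(DONE)


def compute_kernel(p0, p1):
    """Model of the Compute kernel: pipe P0 -> calculation -> pipe P1."""
    while (x := p0.get()) is not DONE:  # blocks while P0 is empty
        p1.put(2 * x + 1)  # placeholder calculation (assumption)
    p1.put(DONE)


def write_kernel(p1, dst):
    """Model of the Write kernel: pipe P1 -> host memory."""
    while (y := p1.get()) is not DONE:
        dst.append(y)


def run_pipeline(src, depth=4):
    """Run all three stages concurrently, joined by two bounded FIFOs."""
    p0 = queue.Queue(maxsize=depth)
    p1 = queue.Queue(maxsize=depth)
    dst = []
    stages = [
        threading.Thread(target=read_kernel, args=(src, p0)),
        threading.Thread(target=compute_kernel, args=(p0, p1)),
        threading.Thread(target=write_kernel, args=(p1, dst)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return dst
```

Calling `run_pipeline(range(10))` pushes ten items through all three stages. Because each FIFO holds at most `depth` items, a fast stage stalls until its neighbor catches up, which is roughly how the pipes keep the kernels in step.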

 

Please refer to this dataflow example:

https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/dataflow/dataflow_func_ocl/src/adder.cl

Thanks,

Kali

Scholar u4223374
Registered: ‎04-26-2015

Re: What's the advantage of piped kernels?

There are two reasons:

(1) As @kalib has said, the kernels can all be run in parallel with piped/streaming data. This has big advantages in terms of RAM bandwidth and total throughput. If they were all run sequentially inside a single function, that function would have to store the output from kernel1 until it finished (potentially a very large amount of data), then feed it to kernel2 (and store the output from that until it finished), etc.
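
The storage argument in (1) can be sketched with a small cycle-by-cycle model in Python. This is only an illustration, not a hardware simulation: `peak_sequential`, `peak_streaming`, the FIFO depth of 4, and the `consumer_period` parameter are hypothetical names and values chosen for the example.

```python
from collections import deque


def peak_sequential(n):
    """kernel1 finishes completely before kernel2 starts."""
    buf = deque()
    peak = 0
    for i in range(n):  # kernel1 produces everything first
        buf.append(i)
        peak = max(peak, len(buf))
    while buf:  # only then does kernel2 consume
        buf.popleft()
    return peak


def peak_streaming(n, depth=4, consumer_period=1):
    """Both kernels run concurrently, joined by a FIFO of `depth` entries."""
    fifo = deque()
    peak = produced = consumed = cycle = 0
    while consumed < n:
        # The consumer pops one item on the cycles when it is ready.
        if fifo and cycle % consumer_period == 0:
            fifo.popleft()
            consumed += 1
        # The producer pushes one item per cycle, stalling if the FIFO is full.
        if produced < n and len(fifo) < depth:
            fifo.append(produced)
            produced += 1
        peak = max(peak, len(fifo))
        cycle += 1
    return peak
```

In this model the sequential version has to buffer all `n` intermediate items, while the streaming version never holds more than `depth` of them, even when the consumer is slower than the producer.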

 

(2) You can write all the kernels to run in parallel in a single block, but it tends to get extremely messy for even fairly simple functions. A 3x3 image convolution function is straightforward, fairly neat, and can easily achieve 1 pixel/cycle throughput. Five 3x3 image convolution blocks with each one feeding into the next are really no more complex; you've just got five identical sets of code. They'll still achieve 1 pixel/cycle throughput. On the other hand, a single block that can perform five 3x3 convolutions sequentially on an image and provide 1 pixel/cycle throughput is going to be an extremely complex, error-prone piece of code. It could potentially be smaller if optimized properly, but the greatly increased development time and decreased maintainability will tend to be more of a concern.
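
As a rough illustration of (2), the sketch below chains identical streaming stages using Python generators. To stay short it uses a 1-D, 3-tap moving sum as a stand-in for a 3x3 image convolution, so it only shows the structure: `box3` and `chain` are hypothetical names, and the zero-padded start is an assumption.

```python
from collections import deque


def box3(stream):
    """One stage: 3-tap moving sum, one output per input (zero-padded start)."""
    window = deque([0, 0, 0], maxlen=3)  # small local buffer, like a line buffer
    for x in stream:
        window.append(x)  # oldest sample drops out automatically
        yield sum(window)


def chain(stream, n_stages):
    """Feed each stage's output stream directly into the next identical stage."""
    for _ in range(n_stages):
        stream = box3(stream)
    return stream
```

`list(chain(pixels, 5))` runs five copies of the same stage back to back. Each stage is the same few lines and still emits one output per input, and no stage needs to know how many stages come before or after it.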

Visitor elfwind
Registered: ‎06-15-2017

Re: What's the advantage of piped kernels?

Good to know!

Putting the I/O operations in separate functions that run in parallel with the main kernel seems like a good idea.

I still have 2 questions:

1. Can I merge kernel1 and kernel3 so that I save 1 CU?

2. Furthermore, what happens if I use pipes between many CUs, for example 1 or 2 kernels feeding 8 CUs through pipes?

Will this result in many FIFOs routed directly between kernel modules and cause congestion?

Or are pipes actually built from AXI buses and interconnects? In that case the pipes would be time-multiplexed and performance might degrade.

Which one is right?

Thanks,

Syu

Xilinx Employee
Registered: ‎07-18-2014

Re: What's the advantage of piped kernels?

Hi @elfwind,

I would suggest trying a dataflow-based implementation, as suggested by @kalib:

https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/dataflow/dataflow_func_ocl/src/adder.cl

The design is almost identical to three separate kernels connected through pipes. The big advantage is that you get the same performance from a single kernel, so you can go up to 10 such instances.

-Heera

Observer ywu1
Registered: ‎05-19-2017

Re: What's the advantage of piped kernels?

Pipes are AXI stream FIFOs, point to point. Having 1 CU feeding 8 CUs with PIPEs should be fine provided they aren't huge FIFOs. 
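
A minimal host-side model of that topology, in Python rather than OpenCL: one feeder thread writes round-robin into 8 separate bounded queues, each read by exactly one worker. This only illustrates the point-to-point structure and says nothing about routing or resource usage on the device. `feeder`, `worker`, `fan_out`, the `DONE` marker, the summing workload, and the depth of 16 are all assumptions for the sketch.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, only for this model


def feeder(data, pipes):
    """One producer CU: distribute items round-robin over dedicated FIFOs."""
    for i, x in enumerate(data):
        pipes[i % len(pipes)].put(x)  # blocks if that one FIFO is full
    for p in pipes:
        p.put(DONE)


def worker(pipe, results, idx):
    """One consumer CU: reads only from its own FIFO."""
    total = 0
    while (x := pipe.get()) is not DONE:
        total += x  # placeholder workload (assumption)
    results[idx] = total


def fan_out(data, n_workers=8, depth=16):
    """Connect one feeder to n_workers consumers, one small FIFO per link."""
    pipes = [queue.Queue(maxsize=depth) for _ in range(n_workers)]
    results = [0] * n_workers
    threads = [threading.Thread(target=feeder, args=(data, pipes))]
    threads += [
        threading.Thread(target=worker, args=(pipes[i], results, i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each link has its own FIFO, so no consumer shares a channel with another, and the total buffering is `n_workers * depth` entries. That is why keeping each FIFO small matters as the number of CUs grows.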
