apirrone
Observer
Registered: ‎04-26-2018

[SDx] Hardware function hangs data access_pattern SEQUENTIAL

Hello,

 

Following the example that starts on page 21 of the SDSoC optimization guide (https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug1235-sdsoc-optimization-guide.pdf), I wanted to implement a convolution algorithm.

 

I pretty much copied the example; here is what I wrote:

 

 


//.hpp
[...]
#pragma SDS data access_pattern(input:SEQUENTIAL, output:SEQUENTIAL)
void hw_new_convolution(int input[IM_W*IM_H], int output[IM_W*IM_H]);
[...]

// .cpp
[...]
void hw_new_convolution(int input[IM_W*IM_H], int output[IM_W*IM_H]) {
    int hconv_buffer[IM_W*IM_H];
    int vconv_buffer[IM_W*IM_H];
    int* phconv;
    int* pvconv;
    int hwin[K];
    int vwin[K];

    // Horizontal convolution
    phconv = hconv_buffer; // set pointer to start of buffer
    for (int col = 0; col < IM_H; col++) {
        for (int row = 0; row < IM_W; row++) { // row
#pragma HLS PIPELINE
            int in_val = *input++;
            int out_val = 0;
            for (int i = 0; i < K; i++) {
                hwin[i] = i < K-1 ? hwin[i+1] : in_val;
                out_val += hwin[i]*hcoeff[i];
            }
            if (row >= K-1)
                *phconv++ = out_val;
        }
    }

    // Vertical convolution
    int linebuf[K][IM_W];
    phconv = hconv_buffer; // reset pointer to start of buffer
    pvconv = vconv_buffer; // set pointer to start of buffer
    for (int col = 0; col < IM_H; col++) {
        for (int row = 0; row < IM_W-(K-1); row++) { // row
#pragma HLS DEPENDENCE variable=linebuf inter false
#pragma HLS PIPELINE
            int in_val = *phconv++;
            int out_val = 0;
            for (int i = 0; i < K; i++) {
                int vwin_val = i < K-1 ? linebuf[i][row] : in_val;
                out_val += vwin_val*vcoeff[i];
                if (i > 0)
                    linebuf[i-1][row] = vwin_val;
            }
            if (col >= K-1)
                *pvconv++ = out_val;
        }
    }

    // Border
    int borderBuf[IM_W];
    pvconv = vconv_buffer;
    for (int i = 0; i < IM_H; i++) {
        for (int j = 0; j < IM_W; j++) {
            int pix_in, l_edge_pix, r_edge_pix, pix_out;
#pragma HLS PIPELINE
            if (i == 0 || (i > BORDER && i < (IM_H - BORDER))) {
                // Read a pixel out of the video stream and cache it for
                // immediate use and later replication purposes
                if (j < IM_W - (K-1)) {
                    pix_in = *pvconv++;
                    borderBuf[j] = pix_in;
                }
                if (j == 0)
                    l_edge_pix = pix_in;
                if (j == IM_W - K)
                    r_edge_pix = pix_in;
            }
            // Select output value from the appropriate cache resource
            if (j <= BORDER+1)
                pix_out = l_edge_pix;
            else if (j >= (IM_W - BORDER - 1))
                pix_out = r_edge_pix;
            else
                pix_out = borderBuf[j-BORDER];
            *output++ = pix_out;
        }
    }
}
[...]

 

When executing on the board (zcu104), the process hangs and cannot be killed (even with kill -9).

 

Using the SDSoC debugging tools, I noticed that the function hangs at a "cf_wait(...)" call. According to https://www.xilinx.com/html_docs/xilinx2018_2/sdsoc_doc/srv1523743253596.html, this would indicate that the consumer is waiting for data to be sent by the producer, in the context of the streaming interface created by specifying:

 

#pragma SDS data access_pattern(input:SEQUENTIAL, output:SEQUENTIAL)
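
Since a SEQUENTIAL port behaves like a stream, one check that can be done in plain software is whether each stage consumes exactly as many elements as the previous one produces, and whether the function reads and writes exactly IM_W*IM_H elements on its ports. The sketch below replays only the loop bounds and conditions of hw_new_convolution and counts the accesses. The function name count_stage_accesses, the struct StageCounts and the sizes used in the usage note are hypothetical, since the post does not give the values of IM_W, IM_H, K or BORDER. A mismatch here would be one possible cause of a hang, but matching counts do not rule out other causes.

```cpp
#include <cassert>
#include <cstddef>

// Access counts for each buffer, obtained by replaying the loop structure only.
struct StageCounts {
    std::size_t input_reads;    // reads of `input` (horizontal stage)
    std::size_t hconv_writes;   // writes to hconv_buffer (horizontal stage)
    std::size_t hconv_reads;    // reads of hconv_buffer (vertical stage)
    std::size_t vconv_writes;   // writes to vconv_buffer (vertical stage)
    std::size_t vconv_reads;    // reads of vconv_buffer (border stage)
    std::size_t output_writes;  // writes to `output` (border stage)
};

// Hypothetical helper: same loop bounds and conditions as the posted function,
// but it only counts accesses instead of computing pixel values.
StageCounts count_stage_accesses(int im_w, int im_h, int k, int border) {
    StageCounts c{};

    // Horizontal convolution
    for (int col = 0; col < im_h; col++) {
        for (int row = 0; row < im_w; row++) {
            c.input_reads++;
            if (row >= k - 1) c.hconv_writes++;
        }
    }

    // Vertical convolution
    for (int col = 0; col < im_h; col++) {
        for (int row = 0; row < im_w - (k - 1); row++) {
            c.hconv_reads++;
            if (col >= k - 1) c.vconv_writes++;
        }
    }

    // Border
    for (int i = 0; i < im_h; i++) {
        for (int j = 0; j < im_w; j++) {
            if (i == 0 || (i > border && i < (im_h - border))) {
                if (j < im_w - (k - 1)) c.vconv_reads++;
            }
            c.output_writes++;
        }
    }
    return c;
}
```

For example, with the assumed values im_w = 64, im_h = 48, k = 11 and border = 5, the counts are consistent: input_reads and output_writes both equal 64*48, and each intermediate buffer is read as many times as it is written. That balance depends on K - 1 == 2 * BORDER; with other values the border stage would read a different number of elements than the vertical stage writes.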

 

I followed the advice in the documentation, "Debugging System Hangs and Runtime Errors" (the last link I provided), and I get the same results when running in emulation. When analysing the signals, I noticed the following on the signal corresponding to my output stream (sorry if I am not using the right terms, I'm new to all this):

 

[Screenshot: Capture d’écran de 2018-08-01 11-17-56.png]

 

I get the same behaviour when running this example, which is very close to what I want to do: https://github.com/Xilinx/SDSoC_Examples/tree/master/cpp/getting_started/dependence_inter. I made no code modifications; I only marked the "vconv_hw" function for hardware implementation in the SDx interface.

 

 

Any help is very much appreciated!

Thank you very much!

 

Apirrone

 

2 Replies
apirrone
Observer
Registered: ‎04-26-2018

I still haven't resolved this issue, but something is bugging me:

The point of using a sequential data access pattern is to avoid copying the entire image into the FPGA fabric, right?

Then what about these buffers:

 

[...]
int hconv_buffer[IM_W*IM_H];
int vconv_buffer[IM_W*IM_H];
[...]

 

The documentation says "[...] However, there are now intermediate buffers, hconv and vconv, between each loop. Because these are accessed in a streaming manner, they are optimized into single registers in the final implementation." (here: https://www.xilinx.com/html_docs/xilinx2018_2/sdsoc_doc/algorithm-with-optimal-data-access-patterns-seo1504034428669.html)

 

But do I have to do something for that to happen, or is it done automatically? I can't find any information about that.
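
To make the documentation's claim concrete, here is a small software model of what "accessed in a streaming manner" means for an intermediate buffer. It is only a sketch: Fifo is a stand-in written for this illustration and is not the hls::stream API, HorizontalStage and run_streamed are hypothetical names, K = 3 and all-ones coefficients are assumed for brevity, and the second stage is a placeholder that just doubles each value. As far as I can tell from the guide, its version of this example declares hconv and vconv as hls::stream objects in a function marked with #pragma HLS DATAFLOW, whereas plain int arrays of IM_W*IM_H elements without that pragma are more likely to be kept as full-size memories; treat that as an assumption to verify against the HLS report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Stand-in for a hardware FIFO between two stages (NOT the hls::stream API).
// It records the largest number of elements it ever held.
struct Fifo {
    std::queue<int> q;
    std::size_t max_depth = 0;
    void write(int v) {
        q.push(v);
        max_depth = std::max(max_depth, q.size());
    }
    int read() {
        int v = q.front();
        q.pop();
        return v;
    }
    bool empty() const { return q.empty(); }
};

constexpr int K = 3;  // assumed window size, for illustration only

// Producer stage: K-tap sliding window with all coefficients equal to 1.
struct HorizontalStage {
    int window[K] = {0};
    int seen = 0;
    void step(int in_val, Fifo& out) {
        int acc = 0;
        for (int i = 0; i < K; ++i) {
            window[i] = (i < K - 1) ? window[i + 1] : in_val;
            acc += window[i];
        }
        if (++seen >= K) out.write(acc);  // emit once the window is full
    }
};

// Runs producer and consumer element by element, the way concurrently
// running stages would, and reports the peak FIFO occupancy.
std::vector<int> run_streamed(const std::vector<int>& input,
                              std::size_t& max_depth) {
    Fifo hconv;
    HorizontalStage h;
    std::vector<int> result;
    for (int v : input) {
        h.step(v, hconv);
        while (!hconv.empty()) {
            result.push_back(hconv.read() * 2);  // placeholder consumer stage
        }
    }
    max_depth = hconv.max_depth;
    return result;
}
```

Because the consumer drains each value as soon as it is produced, the intermediate never holds more than one element in this model, which is the sense in which a streamed buffer can shrink to a register instead of a full IM_W*IM_H array.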

 

Oh, by the way, I don't really know what I changed, but now when I try to synthesize the project, I get this:

 

 

 

ERROR: [VPL 30-640] Place Check : This design requires more RAMB36/FIFO cells than are available in the target device. This design requires 414 of such cell types but only 312 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
ERROR: [VPL 30-640] Place Check : This design requires more RAMB18 and RAMB36/FIFO cells than are available in the target device. This design requires 868 of such cell types but only 624 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
ERROR: [VPL 30-640] Place Check : This design requires more RAMB36E2 cells than are available in the target device. This design requires 414 of such cell types but only 312 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
ERROR: [VPL 30-99] Placer failed with error: 'IO Clock Placer stopped due to earlier errors. Implementation Feasibility check failed, Please see the previously displayed individual error or warning messages for more details.'
Please review all ERROR and WARNING messages during placement to understand the cause for failure.
ERROR: [VPL 17-69] Command failed: Placer could not place all instances
ERROR: [VPL 60-704] Integration error, problem implementing dynamic region, place_design ERROR
ERROR: [VPL 60-806] Failed to finish platform linker
ERROR: [SdsCompiler 83-5019] Exiting sds++ : Error when calling '/home/apirrone/Xilinx/SDx/2018.2/bin/vpl   --iprepo /home/apirrone/workspace/convolution/Release/_sds/iprepo/repo  --iprepo /home/apirrone/Xilinx/SDx/2018.2/data/ip/xilinx  --platform /home/apirrone/zcu104-rv-ss-2018-2/zcu104_rv_ss/zcu104_rv_ss.xpfm  --temp_dir /home/apirrone/workspace/convolution/Release/_sds/p0  --output_dir /home/apirrone/workspace/convolution/Release/_sds/p0/vpl  --input_file /home/apirrone/workspace/convolution/Release/_sds/p0/.xsd/top.bd.tcl  --target hw   --save_temps  --kernels hw_new_convolution:adapter --webtalk_flag SDSoC  --remote_ip_cache /home/apirrone/workspace/ip_cache --xp "param:compiler.deleteDefaultReportConfigs=false" '
make: *** [convolution.elf] Erreur 1
sds++ log file saved as /home/apirrone/workspace/convolution/Release/_sds/reports/sds.log
ERROR: [SdsCompiler 83-5004] Build failed

 

Which fits nicely with my suspicions.
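
A rough back-of-the-envelope check of that suspicion: the sketch below computes a lower bound on the number of 36 Kb block RAMs needed to hold one full-frame int buffer. The function names are hypothetical, the 640x480 frame size in the usage note is an assumption (the post does not give IM_W and IM_H), and the real mapping chosen by the tools depends on memory width and depth, so the actual count can be higher. The 312 available sites come from the placer error above.

```cpp
#include <cassert>
#include <cstdint>

// One RAMB36 holds 36 Kb.
constexpr std::uint64_t kBram36Bits = 36ULL * 1024ULL;

// Lower bound on the RAMB36 blocks needed to store `words` elements of
// `bits_per_word` bits each (ceiling division, ignores mapping overhead).
constexpr std::uint64_t bram36_lower_bound(std::uint64_t words,
                                           std::uint64_t bits_per_word) {
    return (words * bits_per_word + kBram36Bits - 1) / kBram36Bits;
}

// True if `buffers` identical buffers cannot fit in `available` RAMB36 sites.
constexpr bool exceeds_device(std::uint64_t buffers,
                              std::uint64_t blocks_per_buffer,
                              std::uint64_t available) {
    return buffers * blocks_per_buffer > available;
}
```

With an assumed 640x480 frame of 32-bit values, bram36_lower_bound(640 * 480, 32) gives 267 blocks per buffer, so the two intermediate buffers alone would need at least 534 blocks, more than the 312 sites reported for the device.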

 

Thank you in advance for your help!

 

Antoine

apirrone
Observer
Registered: ‎04-26-2018

I still haven't found a solution ...

 

 
