Observer harryreid18
Registered: ‎01-05-2019

PYNQ AXI-4 stream freeze

Hi,

I've created a custom IP for my PYNQ board that implements a DCT algorithm in three steps: DCT, quantization, and IDCT. I feed a 400x8x8 array of 32-bit unsigned integers into the IP and get an array of the same shape and type back (the input is converted to float, and I use the round() function to convert back to unsigned integers at the end). The code, written in Vivado HLS, works perfectly in simulation; however, when I add it to my block design and call it from a Jupyter Notebook, it freezes. I have no idea why, because I have tried implementing only the Quantize part of the algorithm:

for (z = 0; z < 400; z++) {
	for (i = 0; i < 8; i++) {
		for (j = 0; j < 8; j++) {
			//#pragma HLS PIPELINE II=1

			tmp_data = 0;
			tmp_data_round = 0;
			tmp_data = tmp_int[z][i][j]/quantizeMatrix1[i][j];
			tmp_data_round = hls::round(tmp_data);
			quantized[z][i][j] = tmp_data_round*quantizeMatrix1[i][j];
		}
	}
}

And it works perfectly (I can send data to and receive data from the PYNQ through the DMA). But when I implement the full algorithm, it freezes. I do not think it is a data type problem, as the quantize function above handles the float/int conversion and works fine on the board.
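A side note on the quantize loop above, sketched in plain C++ rather than HLS: if both operands of the division are integer types (as `tmp_int` and `quantizeMatrix1` are declared in the full code later in this post), the division truncates before `round()` ever sees a fraction. The sketch uses `uint32_t` as an assumed stand-in for `ap_uint<32>`, and both function names are made up for illustration. It concerns numerical results only and does not by itself explain a freeze.

```cpp
#include <cmath>
#include <cstdint>

// Mirrors the loop above when both operands are integers (assumption:
// uint32_t behaves like ap_uint<32> here). The division truncates first.
uint32_t quantize_truncating(uint32_t coeff, uint32_t q) {
    float tmp = coeff / q;  // integer division, result then widened to float
    uint32_t rounded = static_cast<uint32_t>(std::round(tmp));  // tmp is already whole
    return rounded * q;
}

// Same step with the division done in float, so round() sees the fraction.
uint32_t quantize_rounding(uint32_t coeff, uint32_t q) {
    float tmp = static_cast<float>(coeff) / static_cast<float>(q);
    uint32_t rounded = static_cast<uint32_t>(std::round(tmp));
    return rounded * q;
}
```

For a coefficient of 30 and a step of 16, the first version yields 16 and the second yields 32, so the two can disagree by a full quantization step.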

I'm guessing it has something to do with how the cos/sqrt functions translate into hardware, but again this is a complete guess.
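One way to take the run-time cosine calls out of the picture while debugging is to precompute them: for an 8x8 DCT the cosine argument only takes 64 distinct values, so a small table can replace every call. The sketch below is plain C++ using `std::cos` (not `hls::cosf`); `build_cos_table` and the table layout are hypothetical names chosen for illustration. Note that it uses a full-precision value of pi, whereas the `pi` defined in the full code later in this post is 3.142857 (that is, 22/7), which differs from pi in the third decimal place.

```cpp
#include <cmath>

const int N = 8;  // block size used throughout the post

// Fill table[k][u] = cos((2k + 1) * u * pi / (2N)) once, so that the DCT
// inner loops can index the table instead of calling a cosine function.
void build_cos_table(float table[N][N]) {
    const double PI = 3.14159265358979323846;
    for (int k = 0; k < N; k++) {
        for (int u = 0; u < N; u++) {
            table[k][u] = static_cast<float>(
                std::cos((2 * k + 1) * u * PI / (2.0 * N)));
        }
    }
}
```

With such a table, a product like `cos((2k+1)*i*pi/16) * cos((2l+1)*j*pi/16)` becomes `table[k][i] * table[l][j]`. Whether this changes the behaviour on the board is untested here; it only removes one suspect.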

The fact that it works in C simulation is what really confuses me. My block design is also correct, and I am sending the correct data from the Jupyter notebook, but it seems to be getting stuck somewhere. Any help would be much appreciated. Here is the full code of my IP:

 

#include <complex> // std::complex<T>
#include <cmath>
#include <cassert>
#include <iostream>
#include <ap_int.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>
#include <hls_math.h>



#define pi 3.142857
typedef ap_axiu<32,1,1,1> stream_type;

void multiplyArray2(stream_type input[400][8][8], stream_type output[400][8][8]){

	#pragma HLS INTERFACE ap_ctrl_none PORT = return
	#pragma HLS INTERFACE axis PORT = input
	#pragma HLS INTERFACE axis PORT = output

	int z,i,j,k,l;
	float flt[400][8][8];
	float flt1[400][8][8];
	float sum1, sum2;
	float ci, cj;
	float dct1, dct2;
	ap_uint<32> rounded[400][8][8];
	//ap_uint<32> check = static_cast<ap_uint<32>>(flt);
    float constantEight = 8;
	float constantTwo = 2;
	float tmp_data;
	ap_uint<32> tmp_data_round;
	ap_uint<32> quantized[400][8][8];
	ap_uint<32> tmp_int[400][8][8];
	ap_uint<32> init_array[400][8][8];
	ap_uint<32> out_array[400][8][8];

	float quantizeMatrix[8][8] = {{16,12,14,14,18,24,49,72},
							        {11,12,13,17,22,35,64,92},
							        {10,14,16,22,37,55,78,95},
							        {16,19,24,29,56,64,87,98},
							        {24,26,40,51,68,81,103,112},
							        {40,58,57,87,109,104,121,100},
							        {51,60,69,80,103,113,120,103},
							        {61,55,56,62,77,92,101,99}};

	ap_uint<32> quantizeMatrix1[8][8] = {{16,12,14,14,18,24,49,72},
								        {11,12,13,17,22,35,64,92},
								        {10,14,16,22,37,55,78,95},
								        {16,19,24,29,56,64,87,98},
								        {24,26,40,51,68,81,103,112},
								        {40,58,57,87,109,104,121,100},
								        {51,60,69,80,103,113,120,103},
								        {61,55,56,62,77,92,101,99}};

	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {
				init_array[z][i][j] = input[z][i][j].data;
			}
		}
	}
	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
	        for (j = 0; j < 8; j++) {

				sum1 = 0;

	        	for (k = 0; k < 8; k++) {
					for (l = 0; l < 8; l++) {
						dct1 = init_array[z][i][j] *
						hls::cosf((2 * k + 1) * i * pi / (2 * 8)) *
						hls::cosf((2 * l + 1) * j * pi / (2 * 8));
						sum1 = sum1 + dct1;

	        	//flt[i][j] = hls::cosf(input[i][j])+1000;
	        	//output[i][j] = hls::round(flt[i][j]);
					}
	        	}

	        	if (i == 0)
					ci = 1 / hls::sqrt(constantEight);
				else
					ci = hls::sqrt(constantTwo) / hls::sqrt(constantEight);
				if (j == 0)
					cj = 1 / hls::sqrt(constantEight);
				else
					cj = hls::sqrt(constantTwo) / hls::sqrt(constantEight);

	        	flt[z][i][j] = sum1*ci*cj;
	        	tmp_int[z][i][j] = hls::round(flt[z][i][j]);
	        }
	    }
}

	    //QUANTIZE

	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {
				//#pragma HLS PIPELINE II=1

				tmp_data = 0;
				tmp_data_round = 0;
				tmp_data = tmp_int[z][i][j]/quantizeMatrix1[i][j];
				tmp_data_round = hls::round(tmp_data);
				quantized[z][i][j] = tmp_data_round*quantizeMatrix1[i][j];
			}
		}
	}

	for (z = 0; z < 400; z++) {
	    for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {


				sum2 = 0;

				for (k = 0; k < 8; k++) {
					for (l = 0; l < 8; l++) {

						dct2 = flt[z][k][l] *
						hls::cosf((2 * i + 1) * k * pi / (2 * 8)) *
						hls::cosf((2 * j + 1) * l * pi / (2 * 8));
						sum2 = sum2 + dct2;

				//flt1[i][j] = flt[i][j]/2*3.1423;
				//output[i][j] = hls::round(flt1[i][j]);
					}
				}

				if (k == 0)
					ci = 1 / hls::sqrt(constantEight);
				else
					ci = hls::sqrt(constantTwo) / hls::sqrt(constantEight);
				if (l == 0)
					cj = 1 / hls::sqrt(constantEight);
				else
					cj = hls::sqrt(constantTwo) / hls::sqrt(constantEight);

				flt1[z][i][j] = sum2*ci*cj;
				out_array[z][i][j] = hls::round(flt1[z][i][j]);


			}
		}
	}
	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {
				output[z][i][j].data = out_array[z][i][j];
				output[z][i][j].keep = input[z][i][j].keep;
				output[z][i][j].strb = input[z][i][j].strb;
				output[z][i][j].user = input[z][i][j].user;
				output[z][i][j].last = input[z][i][j].last;
				output[z][i][j].id = input[z][i][j].id;
				output[z][i][j].dest = input[z][i][j].dest;
			}
		}
	}

}
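For comparing hardware output against a known-good result, a small software reference for one 8x8 block can help. The sketch below is the textbook orthonormal 2D DCT-II and its inverse in plain C++ with `double` arithmetic; `dct8x8`, `idct8x8`, and `alpha` are names invented here, not part of the posted IP. Two points where it differs from the loops above may be worth checking: the forward sum multiplies the input sample at the summed indices (`in[x][y]`), and in the inverse the scale factors depend on the summed frequency indices, so they sit inside the sum.

```cpp
#include <cmath>

const int N = 8;
const double PI = 3.14159265358979323846;

// Orthonormal scale factor: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
static double alpha(int u) {
    return u == 0 ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
}

// Forward 2D DCT-II of one block: out[u][v] is the coefficient at
// vertical frequency u and horizontal frequency v.
void dct8x8(const double in[N][N], double out[N][N]) {
    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            double sum = 0.0;
            for (int x = 0; x < N; x++) {
                for (int y = 0; y < N; y++) {
                    sum += in[x][y]
                         * std::cos((2 * x + 1) * u * PI / (2.0 * N))
                         * std::cos((2 * y + 1) * v * PI / (2.0 * N));
                }
            }
            out[u][v] = alpha(u) * alpha(v) * sum;
        }
    }
}

// Inverse transform: the scale factors vary with (u, v), so they are
// applied to each term inside the sum rather than once at the end.
void idct8x8(const double in[N][N], double out[N][N]) {
    for (int x = 0; x < N; x++) {
        for (int y = 0; y < N; y++) {
            double sum = 0.0;
            for (int u = 0; u < N; u++) {
                for (int v = 0; v < N; v++) {
                    sum += alpha(u) * alpha(v) * in[u][v]
                         * std::cos((2 * x + 1) * u * PI / (2.0 * N))
                         * std::cos((2 * y + 1) * v * PI / (2.0 * N));
                }
            }
            out[x][y] = sum;
        }
    }
}
```

Without a quantization step in between, `idct8x8(dct8x8(block))` should reproduce the block up to floating-point error, which gives a quick sanity check before comparing against the board.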

 

Xilinx Employee
Registered: ‎09-05-2018

Re: PYNQ AXI-4 stream freeze

Hey @harryreid18 ,

I think this forum post relates to your issue: https://forums.xilinx.com/t5/Vivado-High-Level-Synthesis-HLS/IP-outputs-0s-instead-of-a-number/m-p/954827#M16102

Try preparing the axiu type in a temporary variable and then assigning that to the array.
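A rough sketch of that pattern, in plain C++ so it compiles without the Xilinx headers: `axis_word` is a hypothetical stand-in for `ap_axiu<32,1,1,1>`, and `write_output` is a made-up helper name. Driving `last` only on the final word is an assumption added here, based on the AXI DMA using TLAST to detect the end of a receive transfer; the suggestion above does not mention it.

```cpp
#include <cstdint>

// Hypothetical stand-in for ap_axiu<32,1,1,1>; the field names follow the
// members used in the posted code (data, keep, strb, user, last, id, dest).
struct axis_word {
    uint32_t data;
    uint8_t  keep;
    uint8_t  strb;
    bool     user;
    bool     last;
    bool     id;
    bool     dest;
};

// Build each stream word completely in a local variable, then write it to
// the output with a single assignment instead of field by field.
void write_output(const uint32_t* results, axis_word* out, int count) {
    for (int n = 0; n < count; n++) {
        axis_word tmp;
        tmp.data = results[n];
        tmp.keep = 0xF;                 // all four bytes valid
        tmp.strb = 0xF;
        tmp.user = false;
        tmp.id   = false;
        tmp.dest = false;
        tmp.last = (n == count - 1);    // assumption: TLAST on the final word only
        out[n] = tmp;                   // one write per word
    }
}
```

For the design in this thread, `count` would be 400 * 8 * 8 = 25600 words.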

Otherwise, you might consider double checking the waveform produced by the RTL co-simulation to make sure your notebook drives the signals in the same way.

Nicholas Moellers

Xilinx Worldwide Technical Support