Observer
Registered: ‎01-05-2019

PYNQ AXI-4 stream freeze

Hi,

I've created a custom IP for my PYNQ board that implements a DCT algorithm in three steps: perform the DCT, quantize, then perform the IDCT. I input a 32-bit uint array (400x8x8) into the IP and output an array of the same type and shape (the input is converted to float, and I use the round() function to convert back to uint at the end). The code for the algorithm, written in Vivado HLS, works perfectly when I simulate it; however, when I implement it in my block design and try to call it from a Jupyter notebook, it freezes. I have no idea why, because I have tried implementing only the Quantize part of the algorithm:

for (z = 0; z < 400; z++) {
	for (i = 0; i < 8; i++) {
		for (j = 0; j < 8; j++) {
			//#pragma HLS PIPELINE II=1

			tmp_data = 0;
			tmp_data_round = 0;
			tmp_data = tmp_int[z][i][j]/quantizeMatrix1[i][j];
			tmp_data_round = hls::round(tmp_data);
			quantized[z][i][j] = tmp_data_round*quantizeMatrix1[i][j];
		}
	}
}

And it works perfectly (I can send and receive data to and from the PYNQ through the DMA). But when I try to implement the full algorithm, it freezes. I don't think it is a data-type problem, as the quantize function above deals with float/int conversion and works fine on the board.
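As a side note on the float/int conversion in the quantize loop above: both operands of the division are ap_uint<32>, so, assuming ap_uint division behaves like ordinary unsigned integer division in C++, the quotient is already truncated before hls::round() sees it. The sketch below uses plain uint32_t as a stand-in and two hypothetical helper names (requantize_truncating, requantize_rounding) that are not part of the original IP; it only illustrates the difference between the two orderings.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration; uint32_t stands in for ap_uint<32>.

// Integer division first: the fractional part is discarded, so a later
// round() has nothing left to round.
inline uint32_t requantize_truncating(uint32_t coeff, uint32_t q) {
    assert(q != 0);
    return (coeff / q) * q;
}

// Divide in float, round to the nearest integer, then scale back.
inline uint32_t requantize_rounding(uint32_t coeff, uint32_t q) {
    assert(q != 0);
    const float ratio = static_cast<float>(coeff) / static_cast<float>(q);
    return static_cast<uint32_t>(std::lround(ratio)) * q;
}
```

With a coefficient of 30 and a step of 16, the truncating version gives 16 while the rounding version gives 32. This affects the numerical result only; it is not by itself a reason for the stream to stall.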

I'm guessing it has something to do with how the cos/sqrt functions translate into hardware, but again this is a complete guess.
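One way to test that guess would be to take cos and sqrt out of the inner loops entirely by precomputing an 8x8 coefficient table once. The following is a minimal software sketch in plain C++ (std::cos and std::sqrt instead of the hls:: versions), not the IP itself; the names Block, build_dct_table, dct8x8 and idct8x8 are assumptions made for illustration. It uses the standard orthonormal DCT-II definition, indexes the input with the summation indices, and keeps the scale factors inside the inverse sum. It also uses a more precise value of pi than 3.142857, which is 22/7.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int N = 8;
constexpr float kPi = 3.14159265f;

using Block = std::array<std::array<float, N>, N>;

// c[u][x] = a(u) * cos((2x + 1) * u * pi / 16), with a(0) = sqrt(1/8)
// and a(u > 0) = sqrt(2/8). Computed once, outside the per-block loops.
Block build_dct_table() {
    Block c{};
    for (int u = 0; u < N; ++u) {
        const float a = (u == 0) ? std::sqrt(1.0f / N) : std::sqrt(2.0f / N);
        for (int x = 0; x < N; ++x) {
            c[u][x] = a * std::cos((2 * x + 1) * u * kPi / (2 * N));
        }
    }
    return c;
}

// Forward 2-D DCT: out[u][v] = sum over x, y of c[u][x] * c[v][y] * in[x][y].
Block dct8x8(const Block& c, const Block& in) {
    Block out{};
    for (int u = 0; u < N; ++u) {
        for (int v = 0; v < N; ++v) {
            float sum = 0.0f;
            for (int x = 0; x < N; ++x)
                for (int y = 0; y < N; ++y)
                    sum += c[u][x] * c[v][y] * in[x][y];
            out[u][v] = sum;
        }
    }
    return out;
}

// Inverse 2-D DCT: out[x][y] = sum over u, v of c[u][x] * c[v][y] * in[u][v].
Block idct8x8(const Block& c, const Block& in) {
    Block out{};
    for (int x = 0; x < N; ++x) {
        for (int y = 0; y < N; ++y) {
            float sum = 0.0f;
            for (int u = 0; u < N; ++u)
                for (int v = 0; v < N; ++v)
                    sum += c[u][x] * c[v][y] * in[u][v];
            out[x][y] = sum;
        }
    }
    return out;
}
```

Running idct8x8 on the output of dct8x8 should reproduce the input block to within float rounding, which gives a quick reference to compare the hardware output against. Note that DCT coefficients can be negative, so a signed or float intermediate type is assumed here.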

The fact that it works in C simulation is what really confuses me. My block design is also correct and I am sending the correct data from the Jupyter notebook, but it seems to be getting stuck somewhere. Any help would be much appreciated. Here is the full code of my IP.

 

#include <complex> // std::complex<T>
#include <cmath>
#include <cassert>
#include <iostream>
#include <ap_int.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>
#include <hls_math.h>



#define pi 3.142857
typedef ap_axiu<32,1,1,1> stream_type;

void multiplyArray2(stream_type input[400][8][8], stream_type output[400][8][8]){

	#pragma HLS INTERFACE ap_ctrl_none PORT = return
	#pragma HLS INTERFACE axis PORT = input
	#pragma HLS INTERFACE axis PORT = output

	int z,i,j,k,l;
	float flt[400][8][8];
	float flt1[400][8][8];
	float sum1, sum2;
	float ci, cj;
	float dct1, dct2;
	ap_uint<32> rounded[400][8][8];
	//ap_uint<32> check = static_cast<ap_uint<32>>(flt);
	float constantEight = 8;
	float constantTwo = 2;
	float tmp_data;
	ap_uint<32> tmp_data_round;
	ap_uint<32> quantized[400][8][8];
	ap_uint<32> tmp_int[400][8][8];
	ap_uint<32> init_array[400][8][8];
	ap_uint<32> out_array[400][8][8];

	float quantizeMatrix[8][8] = {{16,12,14,14,18,24,49,72},
							        {11,12,13,17,22,35,64,92},
							        {10,14,16,22,37,55,78,95},
							        {16,19,24,29,56,64,87,98},
							        {24,26,40,51,68,81,103,112},
							        {40,58,57,87,109,104,121,100},
							        {51,60,69,80,103,113,120,103},
							        {61,55,56,62,77,92,101,99}};

	ap_uint<32> quantizeMatrix1[8][8] = {{16,12,14,14,18,24,49,72},
								        {11,12,13,17,22,35,64,92},
								        {10,14,16,22,37,55,78,95},
								        {16,19,24,29,56,64,87,98},
								        {24,26,40,51,68,81,103,112},
								        {40,58,57,87,109,104,121,100},
								        {51,60,69,80,103,113,120,103},
								        {61,55,56,62,77,92,101,99}};

	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {
				init_array[z][i][j] = input[z][i][j].data;
			}
		}
	}
	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {

				sum1 = 0;

				for (k = 0; k < 8; k++) {
					for (l = 0; l < 8; l++) {
						dct1 = init_array[z][i][j] *
							hls::cosf((2 * k + 1) * i * pi / (2 * 8)) *
							hls::cosf((2 * l + 1) * j * pi / (2 * 8));
						sum1 = sum1 + dct1;

						//flt[i][j] = hls::cosf(input[i][j])+1000;
						//output[i][j] = hls::round(flt[i][j]);
					}
				}

				if (i == 0)
					ci = 1 / hls::sqrt(constantEight);
				else
					ci = hls::sqrt(constantTwo) / hls::sqrt(constantEight);
				if (j == 0)
					cj = 1 / hls::sqrt(constantEight);
				else
					cj = hls::sqrt(constantTwo) / hls::sqrt(constantEight);

				flt[z][i][j] = sum1*ci*cj;
				tmp_int[z][i][j] = hls::round(flt[z][i][j]);
			}
		}
	}

	//QUANTIZE

	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {
				//#pragma HLS PIPELINE II=1

				tmp_data = 0;
				tmp_data_round = 0;
				tmp_data = tmp_int[z][i][j]/quantizeMatrix1[i][j];
				tmp_data_round = hls::round(tmp_data);
				quantized[z][i][j] = tmp_data_round*quantizeMatrix1[i][j];
			}
		}
	}

	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {

				sum2 = 0;

				for (k = 0; k < 8; k++) {
					for (l = 0; l < 8; l++) {
						dct2 = flt[z][k][l] *
							hls::cosf((2 * i + 1) * k * pi / (2 * 8)) *
							hls::cosf((2 * j + 1) * l * pi / (2 * 8));
						sum2 = sum2 + dct2;

						//flt1[i][j] = flt[i][j]/2*3.1423;
						//output[i][j] = hls::round(flt1[i][j]);
					}
				}

				if (k == 0)
					ci = 1 / hls::sqrt(constantEight);
				else
					ci = hls::sqrt(constantTwo) / hls::sqrt(constantEight);
				if (l == 0)
					cj = 1 / hls::sqrt(constantEight);
				else
					cj = hls::sqrt(constantTwo) / hls::sqrt(constantEight);

				flt1[z][i][j] = sum2*ci*cj;
				out_array[z][i][j] = hls::round(flt1[z][i][j]);
			}
		}
	}
	for (z = 0; z < 400; z++) {
		for (i = 0; i < 8; i++) {
			for (j = 0; j < 8; j++) {
				output[z][i][j].data = out_array[z][i][j];
				output[z][i][j].keep = input[z][i][j].keep;
				output[z][i][j].strb = input[z][i][j].strb;
				output[z][i][j].user = input[z][i][j].user;
				output[z][i][j].last = input[z][i][j].last;
				output[z][i][j].id = input[z][i][j].id;
				output[z][i][j].dest = input[z][i][j].dest;
			}
		}
	}

}

 

Xilinx Employee
Registered: ‎09-05-2018

Hey @harryreid18 ,

I think this forum post relates to your issue: https://forums.xilinx.com/t5/Vivado-High-Level-Synthesis-HLS/IP-outputs-0s-instead-of-a-number/m-p/954827#M16102

Try preparing the axiu type in a temporary variable and then assigning that to the array.

Otherwise, you might consider double-checking the waveform produced by RTL co-simulation to make sure your notebook drives the signals the same way.
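To illustrate the temporary-variable suggestion, here is a minimal software sketch of the pattern, under stated assumptions. It does not use the Xilinx headers: Beat is a hypothetical stand-in for ap_axiu<32,1,1,1> with only a few fields, Stream is a std::queue standing in for an AXI4-Stream port, and passthrough_plus_one is a placeholder for the real computation. The point it tries to show is that each input beat is read exactly once, with its side-channel signals captured in the same read, and that the output beat is built in a temporary and written once. The posted function reads input[z][i][j] in its first loop and again in its final loop (for keep, strb, user, last, id, dest); on a stream port a second read may be treated as a request for a new beat, which the DMA never sends, and that is one possible cause of a hang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for ap_axiu<32,1,1,1>; only the fields used here.
struct Beat {
    uint32_t data;
    uint8_t  keep;
    bool     last;
};

// Hypothetical stand-in for a stream port: a FIFO, each beat readable once.
using Stream = std::queue<Beat>;

// Consumes exactly `count` beats and produces exactly `count` beats.
void passthrough_plus_one(Stream& in, Stream& out, std::size_t count) {
    for (std::size_t n = 0; n < count; ++n) {
        assert(!in.empty());
        Beat tmp = in.front();        // single read: data and side channel
        in.pop();
        tmp.data = tmp.data + 1;      // placeholder for the real computation
        tmp.last = (n + 1 == count);  // mark the final beat of the transfer
        out.push(tmp);                // single write of the prepared beat
    }
}
```

In this sketch the last flag is set from the beat counter rather than copied from the input, on the assumption that the receiving DMA channel needs it asserted on the final output beat to finish the transfer. If the design buffers a whole block before producing output, the side-channel values would have to be stored alongside the data at read time rather than fetched from the input port again.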

Nicholas Moellers

Xilinx Worldwide Technical Support