Observer
Registered: ‎01-05-2019

## PYNQ AXI-4 stream freeze

Hi,

I've created a custom IP for my PYNQ board that implements a DCT algorithm in three steps: perform the DCT, perform quantization, perform the IDCT. I stream a 400x8x8 array of 32-bit unsigned integers into the IP and get an array of the same shape back (the input is converted to float, and I use the round() function to convert back to unsigned integers at the end). The algorithm, written in Vivado HLS, works perfectly in simulation, but when I add it to my block design and call it from a Jupyter notebook, it freezes. I have no idea why. I tried implementing only the quantize part of the algorithm:

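```
for (z = 0; z < 400; z++) {
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++) {
//#pragma HLS PIPELINE II=1

            tmp_data = 0;
            tmp_data_round = 0;
            // NOTE: if tmp_int and quantizeMatrix1 are both ap_uint<32>, as in
            // the full listing below, this is an integer division, so the
            // quotient is already truncated before hls::round() sees it.
            tmp_data = tmp_int[z][i][j]/quantizeMatrix1[i][j];
            tmp_data_round = hls::round(tmp_data);
            quantized[z][i][j] = tmp_data_round*quantizeMatrix1[i][j];

        }
    }

}
```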
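
For comparison, here is a minimal plain C++ sketch of the quantize/dequantize step, assuming ordinary `int` and `float` types instead of the `ap_uint<32>` types used in the HLS code. The function names `quantize_round` and `quantize_truncate` are hypothetical and exist only to contrast float division with integer division; this is not the IP's actual implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: divide as float, round to nearest, then scale back.
int quantize_round(int coeff, int q) {
    assert(q > 0);  // a quantization step of zero would divide by zero
    float ratio = static_cast<float>(coeff) / static_cast<float>(q);
    return static_cast<int>(std::round(ratio)) * q;
}

// Hypothetical helper: the same step with integer division. The quotient is
// truncated before it is stored in the float, so std::round() has no effect.
int quantize_truncate(int coeff, int q) {
    assert(q > 0);
    float ratio = coeff / q;  // integer division happens first
    return static_cast<int>(std::round(ratio)) * q;
}
```

For example, `quantize_round(30, 16)` gives 32, while `quantize_truncate(30, 16)` gives 16.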

That version works perfectly: I can send data to and receive data from the PYNQ through the DMA. But when I implement the full algorithm, it freezes. I don't think data types are the problem, because the quantize function above handles the float/int conversion and works fine on the board.

My guess is that it has something to do with how the cos/sqrt functions translate into hardware, but this is purely a guess.
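
One way to test this guess would be to take the cos/sqrt calls out of the loop body altogether. For an 8x8 block the cosine arguments only take 64 distinct (sample index, frequency index) combinations, and the scale factor only takes two values, so both can be tabulated once. The sketch below is plain C++ using `std::` math rather than `hls::` math; the names `build_cos_table`, `dct_scale`, `N`, and `kPi` are hypothetical. Whether this changes the behaviour on the board is not established here.

```cpp
#include <cassert>
#include <cmath>

const int N = 8;
const float kPi = 3.14159265358979f;

// Fill table[x][u] = cos((2x + 1) * u * pi / 16), the basis term used by
// both the forward and the inverse 8-point DCT.
void build_cos_table(float table[N][N]) {
    for (int x = 0; x < N; x++) {
        for (int u = 0; u < N; u++) {
            table[x][u] = std::cos((2 * x + 1) * u * kPi / (2 * N));
        }
    }
}

// DCT scale factor: sqrt(1/8) for frequency index 0, sqrt(2/8) otherwise.
float dct_scale(int u) {
    assert(u >= 0 && u < N);
    return (u == 0) ? std::sqrt(1.0f / N) : std::sqrt(2.0f / N);
}
```

With a table like this, the inner loops reduce to multiplies and adds on constants, which makes it easier to tell whether the math functions are involved in the problem at all.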

What really confuses me is that it works in C simulation. My block design is also correct and I am sending the correct data from the Jupyter notebook, but it seems to be getting stuck somewhere. Any help would be much appreciated. Here is the full code of my IP:

```
#include <complex> // std::complex<T>
#include <cmath>
#include <cassert>
#include <iostream>
#include <ap_int.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>
#include <hls_math.h>

#define pi 3.14159265f // single-precision pi
typedef ap_axiu<32,1,1,1> stream_type;

void multiplyArray2(stream_type input[400][8][8], stream_type output[400][8][8]){

#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=input
#pragma HLS INTERFACE axis port=output

    int z,i,j,k,l;
    float flt[400][8][8];
    float flt1[400][8][8];
    float sum1, sum2;
    float ci, cj;
    float dct1, dct2;
    ap_uint<32> rounded[400][8][8];
    //ap_uint<32> check = static_cast<ap_uint<32>>(flt);
    float constantEight = 8;
    float constantTwo = 2;
    float tmp_data;
    ap_uint<32> tmp_data_round;
    ap_uint<32> quantized[400][8][8];
    ap_uint<32> tmp_int[400][8][8];
    ap_uint<32> init_array[400][8][8];
    ap_uint<32> out_array[400][8][8];

    float quantizeMatrix[8][8] = {{16,12,14,14,18,24,49,72},
                                  {11,12,13,17,22,35,64,92},
                                  {10,14,16,22,37,55,78,95},
                                  {16,19,24,29,56,64,87,98},
                                  {24,26,40,51,68,81,103,112},
                                  {40,58,57,87,109,104,121,100},
                                  {51,60,69,80,103,113,120,103},
                                  {61,55,56,62,77,92,101,99}};

    ap_uint<32> quantizeMatrix1[8][8] = {{16,12,14,14,18,24,49,72},
                                         {11,12,13,17,22,35,64,92},
                                         {10,14,16,22,37,55,78,95},
                                         {16,19,24,29,56,64,87,98},
                                         {24,26,40,51,68,81,103,112},
                                         {40,58,57,87,109,104,121,100},
                                         {51,60,69,80,103,113,120,103},
                                         {61,55,56,62,77,92,101,99}};

    // Read the whole input stream into a local buffer.
    for (z = 0; z < 400; z++) {
        for (i = 0; i < 8; i++) {
            for (j = 0; j < 8; j++) {
                init_array[z][i][j] = input[z][i][j].data;
            }
        }
    }
    //DCT

    for (z = 0; z < 400; z++) {
        for (i = 0; i < 8; i++) {
            for (j = 0; j < 8; j++) {

                sum1 = 0;

                // Sum over the input samples (k, l) for output frequency (i, j).
                for (k = 0; k < 8; k++) {
                    for (l = 0; l < 8; l++) {
                        dct1 = init_array[z][k][l] *
                               hls::cosf((2 * k + 1) * i * pi / (2 * 8)) *
                               hls::cosf((2 * l + 1) * j * pi / (2 * 8));
                        sum1 = sum1 + dct1;

                        //flt[i][j] = hls::cosf(input[i][j])+1000;
                        //output[i][j] = hls::round(flt[i][j]);
                    }
                }

                // Scale factors depend on the output frequency indices i and j.
                if (i == 0)
                    ci = 1 / hls::sqrt(constantEight);
                else
                    ci = hls::sqrt(constantTwo) / hls::sqrt(constantEight);
                if (j == 0)
                    cj = 1 / hls::sqrt(constantEight);
                else
                    cj = hls::sqrt(constantTwo) / hls::sqrt(constantEight);

                flt[z][i][j] = sum1*ci*cj;
                tmp_int[z][i][j] = hls::round(flt[z][i][j]);
            }
        }
    }

    //QUANTIZE

    for (z = 0; z < 400; z++) {
        for (i = 0; i < 8; i++) {
            for (j = 0; j < 8; j++) {
//#pragma HLS PIPELINE II=1

                tmp_data = 0;
                tmp_data_round = 0;
                // NOTE: both operands are ap_uint<32>, so this is an integer
                // division and the quotient is truncated before hls::round().
                tmp_data = tmp_int[z][i][j]/quantizeMatrix1[i][j];
                tmp_data_round = hls::round(tmp_data);
                quantized[z][i][j] = tmp_data_round*quantizeMatrix1[i][j];

            }
        }

    }

    //IDCT

    for (z = 0; z < 400; z++) {
        for (i = 0; i < 8; i++) {
            for (j = 0; j < 8; j++) {

                sum2 = 0;

                for (k = 0; k < 8; k++) {
                    for (l = 0; l < 8; l++) {

                        // The scale factors depend on the summation indices
                        // k and l, so they are applied inside the sum.
                        if (k == 0)
                            ci = 1 / hls::sqrt(constantEight);
                        else
                            ci = hls::sqrt(constantTwo) / hls::sqrt(constantEight);
                        if (l == 0)
                            cj = 1 / hls::sqrt(constantEight);
                        else
                            cj = hls::sqrt(constantTwo) / hls::sqrt(constantEight);

                        // NOTE: flt holds the unquantized DCT output; the
                        // quantized array computed above is not read here.
                        dct2 = ci * cj * flt[z][k][l] *
                               hls::cosf((2 * i + 1) * k * pi / (2 * 8)) *
                               hls::cosf((2 * j + 1) * l * pi / (2 * 8));
                        sum2 = sum2 + dct2;

                        //flt1[i][j] = flt[i][j]/2*3.1423;
                        //output[i][j] = hls::round(flt1[i][j]);
                    }
                }

                flt1[z][i][j] = sum2;
                out_array[z][i][j] = hls::round(flt1[z][i][j]);

            }
        }
    }
    // Write the result to the output stream, copying the side-channel
    // signals from the corresponding input element.
    for (z = 0; z < 400; z++) {
        for (i = 0; i < 8; i++) {
            for (j = 0; j < 8; j++) {
                output[z][i][j].data = out_array[z][i][j];
                output[z][i][j].keep = input[z][i][j].keep;
                output[z][i][j].strb = input[z][i][j].strb;
                output[z][i][j].user = input[z][i][j].user;
                output[z][i][j].last = input[z][i][j].last;
                output[z][i][j].id = input[z][i][j].id;
                output[z][i][j].dest = input[z][i][j].dest;
            }
        }
    }

}
```
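
As a point of comparison for the C simulation, here is a minimal plain C++ sketch of an 8x8 forward and inverse DCT-II pair using `std::` math and `float` arrays. It assumes the standard orthonormal 2-D DCT definition; the names `dct8x8`, `idct8x8`, `scale`, and `basis` are hypothetical, and the quantization step and AXI stream interface are deliberately left out. It is meant as a golden model to check a single block against, not as the IP itself.

```cpp
#include <cmath>

const int N = 8;
const float PI_F = 3.14159265358979f;

// Scale factor c(u): sqrt(1/8) for u == 0, sqrt(2/8) otherwise.
static float scale(int u) {
    return (u == 0) ? std::sqrt(1.0f / N) : std::sqrt(2.0f / N);
}

// Basis term cos((2x + 1) * u * pi / 16).
static float basis(int x, int u) {
    return std::cos((2 * x + 1) * u * PI_F / (2 * N));
}

// Forward DCT: F(u,v) = c(u) c(v) * sum_x sum_y f(x,y) basis(x,u) basis(y,v).
void dct8x8(float in[N][N], float out[N][N]) {
    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            float sum = 0.0f;
            for (int x = 0; x < N; x++) {
                for (int y = 0; y < N; y++) {
                    sum += in[x][y] * basis(x, u) * basis(y, v);
                }
            }
            out[u][v] = scale(u) * scale(v) * sum;
        }
    }
}

// Inverse DCT: f(x,y) = sum_u sum_v c(u) c(v) F(u,v) basis(x,u) basis(y,v).
// The scale factors sit inside the sum because they depend on u and v.
void idct8x8(float in[N][N], float out[N][N]) {
    for (int x = 0; x < N; x++) {
        for (int y = 0; y < N; y++) {
            float sum = 0.0f;
            for (int u = 0; u < N; u++) {
                for (int v = 0; v < N; v++) {
                    sum += scale(u) * scale(v) * in[u][v] *
                           basis(x, u) * basis(y, v);
                }
            }
            out[x][y] = sum;
        }
    }
}
```

Running `dct8x8` followed by `idct8x8` on a block should reproduce the input up to floating-point error, which gives a quick sanity check before quantization is added back in.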

Xilinx Employee
Registered: ‎09-05-2018

## Re: PYNQ AXI-4 stream freeze

Hey @harryreid18,