Registered: ‎03-25-2020

Vitis DSP Library: 2D FFT Kernel Stalling

Hi, I am attempting to accelerate a series of FFT computations on an Alveo U250 card using the Vitis DSP library. I've written an example based on the documentation.

When invoked through OpenCL, the kernel works in software emulation (sw_emu) but stalls in hardware emulation (hw_emu) and on hardware. Hardware emulation prints the following repeatedly (for reference, 0.242 KB is about 10% of the total data):

INFO::[ Vitis-EM 22 ] [Time elapsed: 40 minute(s) 10 seconds, Emulation time: 0.355983 ms]
Data transfer between kernel(s) and global memory(s)
krnl_vitisdsp_fft2_1:m_axi_gmem-DDR[0] RD = 0.242 KB WR = 0.000 KB

Anyone got any ideas?

My code:

vitisdsp.hpp

#pragma once

#include "vitis_2dfft/float/vt_fft.hpp"
#include "vitis_2dfft/float/vitis_fft/hls_ssr_fft_2d_modeling_utilities.hpp"

#ifndef _TOP_2D_FFT_TEST_H_
#define _TOP_2D_FFT_TEST_H_
#ifndef __SYNTHESIS__
#include <iostream>
#endif



//typedef complex_wrapper<double> complex128;
typedef complex_wrapper<float> complex64;

typedef complex64 fft2_input;

// Width of the memory interface in bits, then in complex samples per wide word
const int memory_width_bits = 512; // Taken from documentation, need to investigate reasons for this number
const int memory_width = memory_width_bits / (sizeof(fft2_input) * 8);

// Number of Kernels to run in parallel
const int kernel_radix = 4;
const int num_kernels = memory_width / kernel_radix;
const int kernel_size = 16; //NUM_ROWS*NUM_COLUMNS;

// Size of the dataset (set at compile time)
const int num_rows = kernel_size;
const int num_columns = kernel_size;
const int data_size = num_rows * num_columns;


// Taken from documentation, not sure what these are for yet!
const int row_instance_id_offset = 40000;
const int column_instance_id_offset = 80000;

struct RowParams: vitis::dsp::fft::ssr_fft_default_params {
    static const int N = kernel_size;
    static const int R = kernel_radix;
    static const vitis::dsp::fft::scaling_mode_enum scaling_mode = vitis::dsp::fft::SSR_FFT_SCALE;

    static const vitis::dsp::fft::transform_direction_enum transform_direction = vitis::dsp::fft::FORWARD_TRANSFORM;
};
struct ColumnParams: vitis::dsp::fft::ssr_fft_default_params {
    static const int N = kernel_size;
    static const int R = kernel_radix;
    static const vitis::dsp::fft::scaling_mode_enum scaling_mode = vitis::dsp::fft::SSR_FFT_SCALE;

    static const vitis::dsp::fft::transform_direction_enum transform_direction = vitis::dsp::fft::FORWARD_TRANSFORM;
};

typedef vitis::dsp::fft::FFTIOTypes<RowParams, fft2_input>::T_outType fft2_output_row;
typedef vitis::dsp::fft::FFTIOTypes<ColumnParams, fft2_output_row>::T_outType fft2_output;


typedef vitis::dsp::fft::WideTypeDefs<memory_width, fft2_input>::WideIFType MemWideIFTypeIn;
typedef vitis::dsp::fft::WideTypeDefs<memory_width, fft2_input>::WideIFStreamType MemWideIFStreamTypeIn;

typedef vitis::dsp::fft::WideTypeDefs<memory_width, fft2_output>::WideIFType MemWideIFTypeOut;
typedef vitis::dsp::fft::WideTypeDefs<memory_width, fft2_output>::WideIFStreamType MemWideIFStreamTypeOut;

extern "C" {
void krnl_vitisdsp_fft2(fft2_input input_buffer[num_rows][num_columns], fft2_output output_buffer[num_rows][num_columns]);
}

#endif
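
For reference, the derived constants in the header work out as in the following standalone sketch. It uses std::complex<float> as a stand-in for complex_wrapper<float>, on the assumption (which I have not verified against the library) that both occupy 8 bytes; sample_t, total_bytes and wide_words are names made up for this illustration and do not appear in my kernel.

```cpp
#include <complex>
#include <cstddef>

// Stand-in for complex_wrapper<float>; assumed to have the same 8-byte size.
using sample_t = std::complex<float>;
static_assert(sizeof(sample_t) == 8, "expected two 32-bit floats per sample");

// Same arithmetic as in vitisdsp.hpp
constexpr int memory_width_bits = 512;
constexpr int memory_width =
    memory_width_bits / static_cast<int>(sizeof(sample_t) * 8); // samples per wide word

constexpr int kernel_radix = 4;
constexpr int num_kernels = memory_width / kernel_radix; // 1D kernels per dimension
constexpr int kernel_size = 16;

constexpr int num_rows = kernel_size;
constexpr int num_columns = kernel_size;
constexpr int data_size = num_rows * num_columns; // complex samples in one 2D input

// Illustrative totals (hypothetical names, not part of the kernel)
constexpr std::size_t total_bytes = data_size * sizeof(sample_t);
constexpr int wide_words = data_size / memory_width; // 512-bit words per 2D input

static_assert(memory_width == 8, "8 complex<float> samples fit in 512 bits");
static_assert(num_kernels == 2, "2 row kernels and 2 column kernels");
static_assert(total_bytes == 2048, "16 x 16 complex<float> is 2 KB");
static_assert(wide_words == 32, "32 wide words per input matrix");
```

So one 16 x 16 input is 2 KB, or 32 words of 512 bits, and the 0.242 KB read that the emulator reports is roughly a tenth of that.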

 

vitisdsp.cpp

#include <ap_int.h>
#include <hls_stream.h>
#include <string.h>

#include "vitisdsp.hpp"


//TRIPCOUNT identifiers
const unsigned int c_size = data_size;


// Simple debugging printout for emulator
void report_fft() {
	std::cout << "================================================================================" << std::endl;
	std::cout << "---------------------Calling 2D FFT Kernel with Parameters----------------------" << std::endl;
	std::cout << "================================================================================" << std::endl;
	std::cout << "    The Main Memory Width (no. complex<float>)   : " << memory_width << std::endl;
	std::cout << "    The Size of 1D Row Kernel                    : " << RowParams::N << std::endl;
	std::cout << "    The SSR for 1D Row Kernel                    : " << RowParams::R << std::endl;
	std::cout << "    The Transform Direction for Row Kernel       : "
			  << ((RowParams::transform_direction == vitis::dsp::fft::FORWARD_TRANSFORM) ? "Forward" : "Reverse");
	std::cout << std::endl;

	std::cout << "    The Size of 1D Column Kernel                 : " << ColumnParams::N << std::endl;
	std::cout << "    The SSR for 1D Column Kernel                 : " << ColumnParams::R << std::endl;
	std::cout << "    The Transform Direction for Column Kernel    : "
			  << ((ColumnParams::transform_direction == vitis::dsp::fft::FORWARD_TRANSFORM) ? "Forward" : "Reverse");
	std::cout << std::endl;

	std::cout << "    The Row Instance ID Offset                   : " << row_instance_id_offset << std::endl;
	std::cout << "    The Column Instance ID Offset                : " << column_instance_id_offset << std::endl;

	std::cout << "    Number of 1D Kernels Used Row/Col wise       : " << num_kernels << std::endl;
	std::cout << "    The Total Number of 1D Kernels Used(row+col) : " << 2 * num_kernels << std::endl;
	std::cout << "================================================================================" << std::endl;
}

extern "C" {
void krnl_vitisdsp_fft2(fft2_input input_buffer[num_rows][num_columns], fft2_output output_buffer[num_rows][num_columns]) {
	#pragma HLS INTERFACE m_axi port = input_buffer offset = slave bundle = gmem
	#pragma HLS INTERFACE m_axi port = output_buffer offset = slave bundle = gmem
	#pragma HLS INTERFACE s_axilite port = input_buffer bundle = control
	#pragma HLS INTERFACE s_axilite port = output_buffer bundle = control
	#pragma HLS INTERFACE s_axilite port = return bundle = control

	// Packs the fields of each struct element into a single wide word
	#pragma HLS data_pack variable = input_buffer
	#pragma HLS data_pack variable = output_buffer

	#ifndef __SYNTHESIS__
	report_fft();
	#endif

	// Declare input and output streams to global memory
	MemWideIFStreamTypeIn in_stream("fft2_instream");
	MemWideIFStreamTypeOut out_stream("fft2_outstream");

	// Enables task-level pipelining so the read, FFT and write stages can run concurrently
	#pragma HLS dataflow

	// Read input_buffer into stream
	stream2DMatrix<num_rows, num_columns, memory_width, fft2_input, MemWideIFTypeIn>(input_buffer, in_stream);

    // Call the library fft2 function, supplying many template parameters
	vitis::dsp::fft::fft2d<memory_width, num_rows, num_columns, num_kernels,
		RowParams, ColumnParams, row_instance_id_offset, column_instance_id_offset,
		fft2_input>(in_stream, out_stream);

	streamToMatrix<num_rows, num_columns, memory_width, fft2_output, MemWideIFTypeOut>(out_stream, output_buffer);


}
}

 

Registered: ‎07-02-2018

I am experiencing this exact same problem with the 1D FFT. I noticed that the example code did not run in hw_emu either, and I can see some other posts suggesting that the feature does not work. My questions:

1. Is this feature still supported?

2. Is it better to start with fixed point? I am currently using complex_wrapper<float>.

Registered: ‎07-10-2018

Dear @cf_diamond ,

Any news? Were you able to solve this?

Thanks and Regards,

Cristian
