Visitor ejcspii
Registered: ‎05-25-2018

Achieving low II with hls::fft

Hello,

I'm still trying to get the Xilinx factory example (single complex FFT) to work under HLS. The code below is essentially the same as the example supplied with Vivado.

#include "ap_fixed.h"
#include "hls_fft.h"
#include "hls_x_complex.h"

// configurable params
const char FFT_INPUT_WIDTH = 16;
const char FFT_OUTPUT_WIDTH = FFT_INPUT_WIDTH;
const char FFT_CONFIG_WIDTH = 16;
const char NFFT = 8;
const int FFT_LENGTH = (1 << NFFT);

struct config1 : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
    static const unsigned config_width = FFT_CONFIG_WIDTH;
    static const unsigned max_nfft = NFFT;
    static const unsigned input_width = FFT_INPUT_WIDTH;
    static const unsigned output_width = FFT_OUTPUT_WIDTH;
    static const bool has_nfft = false; // no runtime config of the size
};

typedef hls::ip_fft::config_t<config1> fft_config_t;
typedef hls::ip_fft::status_t<config1> fft_status_t;

typedef ap_fixed<FFT_INPUT_WIDTH, 1> data_in_t;
typedef ap_fixed<FFT_OUTPUT_WIDTH, FFT_OUTPUT_WIDTH - FFT_INPUT_WIDTH + 1> data_out_t;
typedef hls::x_complex<data_in_t> cplx_data_in_t;
typedef hls::x_complex<data_out_t> cplx_data_out_t;

void dummy_proc_fe(fft_config_t *config, cplx_data_in_t in[FFT_LENGTH], cplx_data_in_t out[FFT_LENGTH]) {
#pragma HLS INTERFACE ap_fifo port=config
    config->setDir(true);
    config->setSch(0x2AB);

    fe_loop: for (int i = 0; i < FFT_LENGTH; ++i) {
        out[i] = in[i];
    }
}

void dummy_proc_be(fft_status_t *status_in, bool *ovflo,
                   cplx_data_out_t in[FFT_LENGTH], cplx_data_out_t out[FFT_LENGTH]) {
#pragma HLS INTERFACE ap_fifo port=status_in
    be_loop: for (int i = 0; i < FFT_LENGTH; ++i) {
        out[i] = in[i];
    }
    *ovflo = status_in->getOvflo() & 0x1;
}

void spectrum_estimate(hls::x_complex<data_in_t> in[FFT_LENGTH],
                       hls::x_complex<data_out_t> out[FFT_LENGTH],
                       bool *ovflo) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE ap_fifo depth=1 port=ovflo
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
#pragma HLS dataflow
    hls::x_complex<data_in_t> xn[FFT_LENGTH];
    hls::x_complex<data_out_t> xk[FFT_LENGTH];
    fft_config_t fft_config;
    fft_status_t fft_status;

    dummy_proc_fe(&fft_config, in, xn);
    hls::fft<config1>(xn, xk, &fft_status, &fft_config);
    dummy_proc_be(&fft_status, ovflo, xk, out);
}

Even when synthesis succeeds (it does not always), HLS identifies three functions:

'dummy_proc_fe'
'_codeRepl__proc'
'dummy_proc_be'.

I can pipeline dummy_proc_fe and dummy_proc_be so that II=256 can be achieved for a 256-point FFT.

However, the II of the "black box" is always more than three times higher:

* Instance:
+--------------------+-----------------+-----+-----+-----+-----+----------+
|                    |                 |  Latency  |  Interval | Pipeline |
|      Instance      |      Module     | min | max | min | max |   Type   |
+--------------------+-----------------+-----+-----+-----+-----+----------+
|p_codeRepl_proc_U0  |p_codeRepl_proc  |  875|  875|  875|  875|   none   |
|dummy_proc_be_U0    |dummy_proc_be    |  256|  256|  256|  256| function |
|dummy_proc_fe_U0    |dummy_proc_fe    |  256|  256|  256|  256| function |
+--------------------+-----------------+-----+-----+-----+-----+----------+

This comes from the FFT instance inside it:

* Instance:
+-------------------------+---------------+-----+-----+-----+-----+---------+
|                         |               |  Latency  |  Interval | Pipeline|
|         Instance        |     Module    | min | max | min | max |   Type  |
+-------------------------+---------------+-----+-----+-----+-----+---------+
|grp_fft_config1_s_fu_68  |fft_config1_s  |  874|  874|  874|  874|   none  |
+-------------------------+---------------+-----+-----+-----+-----+---------+

This large II scales with the size of the FFT. Dataflow optimization is not enough, and the other optimizations I tried brought no gain either. I don't understand what grp_fft_config1_s_fu_68 is doing.

So my question: is it possible to achieve an II=1 (i.e., II=257 for a 256-pt transform) using HLS?

On a side note, I understand that the actual FFT size and the scaling schedule are treated as "dynamic" parameters (although I would bet that 99.999% of applications use a constant FFT size and the same scaling throughout). Is it possible to define these parameters once upfront and avoid passing the config object around, as the factory example does?
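On the scaling part of that question: whatever the answer for the config object, the schedule value itself can be computed once at compile time instead of being written as a magic number. The sketch below is C++11 and assumes the field layout described in the FFT IP product guide (PG109): two bits per stage, with the first stage in the two least significant bits. `pack_scaling_schedule` and `kScaleSch` are hypothetical names, and the number of stages depends on the FFT architecture and transform size, so check it against the guide for your configuration.

```cpp
// Hypothetical helper: packs per-stage right shifts (0 to 3) into a
// scaling-schedule word at compile time. Assumed layout (check PG109):
// 2 bits per stage, first stage in the two least significant bits.
// Single-return constexpr functions, so this also compiles as C++11.

// Base case: no stages left to pack.
constexpr unsigned pack_scaling_schedule() { return 0u; }

// The first argument is the first stage; the packed value of the later
// stages is shifted up by 2 bits at each level of the recursion.
template <typename... Rest>
constexpr unsigned pack_scaling_schedule(unsigned first, Rest... rest) {
    return (first & 0x3u) | (pack_scaling_schedule(rest...) << 2);
}

// Five stages: shift by 3 in the first stage, by 2 in each later stage.
constexpr unsigned kScaleSch = pack_scaling_schedule(3, 2, 2, 2, 2);
static_assert(kScaleSch == 0x2AB, "same value as setSch(0x2AB) in the code above");
```

Under that assumption, the `0x2AB` used in the code above decodes as five stages with shifts 3, 2, 2, 2, 2 (first stage first). The front end could then call `config->setSch(kScaleSch);`, but the config object still has to be passed to `hls::fft` as in the example; this sketch does not remove that.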

Thanks,

Peter

Xilinx Employee
Registered: ‎05-06-2008

Re: Achieving low II with hls::fft

Hello @ejcspii,

Does the code pass CSim after you modified the example code?

I was not able to get the code to pass CSim after making the modifications you listed.

I am still reviewing the rest of the code to see where we can make changes to get closer to II=1.

Thanks,
Chris

Visitor ejcspii
Registered: ‎05-25-2018

Re: Achieving low II with hls::fft

Hello @chrisz,

Thanks for looking into this issue. My code worked in C simulation, but let me reproduce the issue from scratch, using only the unmodified Xilinx example code.

1) Take the single complex FFT example from <Vivado installation>/Vivado/2017.4/examples/design/FFT/fft_single_x_complex/

(fft_top.h, fft_top.cpp, fft_tb.cpp as they are)

2) Build the C simulation. It is going to work.

3) Try to synthesize the design (xc7k410tffg900-2, Tclk = 5ns). It fails:

ERROR: [XFORM 203-801] Interface mode 'ap_auto' on the actual argument 'fft_status.data.V' (./fft_top.cpp:136) is incompatible with the mode 'ap_fifo' on the formal argument 'fft_status.data.V' for function '_codeRepl__proc' . Please consider to duplicate the function to avoid mode conflicts.
ERROR: [XFORM 203-801] Interface mode 'ap_auto' on the actual argument 'fft_config.data.V' (./fft_top.cpp:135) is incompatible with the mode 'ap_fifo' on the formal argument 'fft_config.data.V' for function '_codeRepl__proc' . Please consider to duplicate the function to avoid mode conflicts.
ERROR: [HLS 200-70] Failed building synthesis data model.
command 'ap_source' returned error code
while executing

4) Try to fix this somewhat mysterious issue by specifying FIFO interfaces in the front end and back end:

#pragma HLS INTERFACE ap_fifo port=config

in the fft_fe function, and

#pragma HLS INTERFACE ap_fifo port=status

in the fft_be function.

5) Looking at the report, and recalling that the FFT size in the example is 1024, 

+ Latency (clock cycles):
* Summary:
+------+------+------+------+----------+
|   Latency   |   Interval  | Pipeline |
|  min |  max |  min |  max |   Type   |
+------+------+------+------+----------+
|  3198|  3198|  3197|  3197| dataflow |
+------+------+------+------+----------+

+ Detail:
* Instance:
+--------------------+-----------------+------+------+------+------+---------+
|                    |                 |   Latency   |   Interval  | Pipeline|
|      Instance      |      Module     |  min |  max |  min |  max |   Type  |
+--------------------+-----------------+------+------+------+------+---------+
|p_codeRepl_proc_U0  |p_codeRepl_proc  |  3196|  3196|  3196|  3196|   none  |
|dummy_proc_fe_U0    |dummy_proc_fe    |  1025|  1025|  1025|  1025|   none  |
|dummy_proc_be_U0    |dummy_proc_be    |  1025|  1025|  1025|  1025|   none  |
+--------------------+-----------------+------+------+------+------+---------+

we see that the II of the "black box" p_codeRepl_proc_U0 (3196 cycles) is about 3x the interval required for an average II of 1 (about 1024 cycles).
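To put numbers on the "about 3x": the target for one sample per clock is an interval of about N cycles for an N-point frame. The hypothetical helpers below compare the reported intervals with that target and also subtract 3N. The subtraction is only meaningful under the unconfirmed guess that the FFT block runs three roughly N-cycle phases back to back (for example, loading the frame, the core's own processing, and unloading the result); nothing in the reports confirms that structure.

```cpp
// Hypothetical helpers for reading the interval figures in the HLS reports.
// Target for one sample per clock: an interval of about N cycles per N-point frame.

// Reported interval divided by the one-sample-per-clock target.
constexpr double ii_ratio(int reported_ii, int fft_length) {
    return static_cast<double>(reported_ii) / fft_length;
}

// Cycles left after subtracting 3N. Only meaningful under the unverified
// assumption that the FFT block runs three N-cycle phases back to back.
constexpr int residual_after_3n(int reported_ii, int fft_length) {
    return reported_ii - 3 * fft_length;
}

// Figures from the FFT-block rows of the reports in this thread.
static_assert(residual_after_3n(875, 256) == 107, "256-point FFT block");
static_assert(residual_after_3n(3196, 1024) == 124, "1024-point FFT block");
```

This gives ratios of roughly 3.4 (875/256) and 3.1 (3196/1024), with residuals of 107 and 124 cycles: the excess tracks 3N closely while the remainder barely grows as N quadruples. That is consistent with, but does not prove, a sequential load/process/unload structure inside the block.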

6) Side note: curiously, although the front-end and back-end functions achieve II=1025 for a 1024-point FFT out of the box, if I decrease the FFT size to 256, I need to apply explicit pipelining directives to achieve II=257 (256, actually) on these functions. I'm unsure why.
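For reference, an explicit directive of the kind mentioned in point 6 usually sits inside the copy loop, as sketched below. `std::complex<float>`, the local `FFT_LENGTH`, and the function name `fe_copy` are stand-ins (assumptions) so that the sketch builds without the HLS headers; the real design would keep the example's types and constants. A standard C++ compiler ignores the pragma, while Vivado HLS treats it as a request to start one loop iteration per clock.

```cpp
#include <complex>

// Stand-ins (assumptions) so the sketch builds without the HLS headers;
// the real design uses cplx_data_in_t and FFT_LENGTH from the example.
const int FFT_LENGTH = 256;
typedef std::complex<float> cplx_t;

// Hypothetical front-end copy with an explicit pipeline directive.
// Vivado HLS reads the pragma as a request for an initiation interval
// of 1 on fe_loop; an ordinary C++ compiler ignores it and just copies.
void fe_copy(const cplx_t in[FFT_LENGTH], cplx_t out[FFT_LENGTH]) {
    fe_loop: for (int i = 0; i < FFT_LENGTH; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i];
    }
}
```

The same directive would go into `be_loop` in the back end.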

Best regards,

Peter

Xilinx Employee
Registered: ‎05-06-2008

Re: Achieving low II with hls::fft

Hello @ejcspii,

It appears that some of these issues are resolved in Vivado HLS 2018.2. Can you migrate to Vivado HLS 2018.2?

Thanks,
Chris

Visitor ejcspii
Registered: ‎05-25-2018

Re: Achieving low II with hls::fft

Hello @chrisz,

Migrating to 2018.2 is something of a problem for our design, but I gave it a shot. The interface issues did in fact disappear, but the latency issue is still there:

+ Latency (clock cycles):
* Summary:
+------+------+------+------+----------+
|   Latency   |   Interval  | Pipeline |
|  min |  max |  min |  max |   Type   |
+------+------+------+------+----------+
|  3195|  3195|  3196|  3196| dataflow |
+------+------+------+------+----------+

+ Detail:
* Instance:
+------------------+---------------+------+------+------+------+---------+
|                  |               |   Latency   |   Interval  | Pipeline|
|     Instance     |     Module    |  min |  max |  min |  max |   Type  |
+------------------+---------------+------+------+------+------+---------+
|dummy_proc_fe_U0  |dummy_proc_fe  |  1025|  1025|  1025|  1025|   none  |
|dummy_proc_be_U0  |dummy_proc_be  |  1025|  1025|  1025|  1025|   none  |
|fft_config1_U0    |fft_config1_s  |  3195|  3195|  3195|  3195|   none  |
+------------------+---------------+------+------+------+------+---------+


Can I do something about it?

Peter

Xilinx Employee
Registered: ‎05-06-2008

Re: Achieving low II with hls::fft

Hello @ejcspii,

I was able to get similar results, but I do not have any suggestions to reduce the II at this time. I have asked the developer for assistance, but I have not heard back from them. I will keep you posted on their response.

Thanks,
Chris
