Registered: ‎03-29-2019

Doubts on FFT

  • What do the scaled and unscaled options mean from the IP's perspective? Both take the same input data width, so what is the difference between them?
  • We are applying a sinusoidal input to the FFT IP whose signal strength varies from -10 dBm to -30 dBm. Could you please suggest which option to use, scaled or unscaled?
  • What are the standard ways of calculating amplitude from the real and imaginary outputs of the FFT IP using Xilinx IPs?

Could you please tell me where I have gone wrong in the following simulation:


(Please find the attachment for Simulation screenshot)

The core is customized as follows:

Transform Length - 1024

Clk frequency - 100MHz

Pipelined streaming architecture

Data Format - Fixed point

Scaling options - Unscaled

Input Data width - 10

ARESETn is selected

Output ordering : Natural order

XKINDEX is selected

rest are default

Code :

`timescale 1ns / 1ps

`define clk_period 10

module FFT_matlab; // testbench, so no port list
// Inputs
reg aclk;
reg s_axis_config_tvalid;
reg s_axis_data_tvalid;
reg s_axis_data_tlast;
reg m_axis_data_tready;
//reg [15:0] s_axis_config_tdata;
reg [7:0] s_axis_config_tdata;
reg [31:0] s_axis_data_tdata;
reg aresetn;

// Outputs
wire s_axis_config_tready;
wire s_axis_data_tready;
wire m_axis_data_tvalid;
wire m_axis_data_tlast;
wire event_frame_started;
wire event_tlast_unexpected;
wire event_tlast_missing;
wire event_status_channel_halt;
wire event_data_in_channel_halt;
wire event_data_out_channel_halt;
//wire [31:0] m_axis_data_tdata;
wire [47:0] m_axis_data_tdata;
wire [15:0]m_axis_data_tuser;
integer i;
// generate clk
always #(`clk_period/2) aclk = ~aclk;

// Instantiate the Unit Under Test (UUT)
xfft_0 FFT_fltPt (
.aclk(aclk), // input wire aclk
.aresetn(aresetn), // input wire aresetn
.s_axis_config_tdata(s_axis_config_tdata), // input wire [7 : 0] s_axis_config_tdata
.s_axis_config_tvalid(s_axis_config_tvalid), // input wire s_axis_config_tvalid
.s_axis_config_tready(s_axis_config_tready), // output wire s_axis_config_tready
.s_axis_data_tdata(s_axis_data_tdata), // input wire [31 : 0] s_axis_data_tdata
.s_axis_data_tvalid(s_axis_data_tvalid), // input wire s_axis_data_tvalid
.s_axis_data_tready(s_axis_data_tready), // output wire s_axis_data_tready
.s_axis_data_tlast(s_axis_data_tlast), // input wire s_axis_data_tlast
.m_axis_data_tdata(m_axis_data_tdata), // output wire [47 : 0] m_axis_data_tdata
.m_axis_data_tuser(m_axis_data_tuser), // output wire [15 : 0] m_axis_data_tuser
.m_axis_data_tvalid(m_axis_data_tvalid), // output wire m_axis_data_tvalid
.m_axis_data_tready(m_axis_data_tready), // input wire m_axis_data_tready
.m_axis_data_tlast(m_axis_data_tlast), // output wire m_axis_data_tlast
.event_frame_started(event_frame_started), // output wire event_frame_started
.event_tlast_unexpected(event_tlast_unexpected), // output wire event_tlast_unexpected
.event_tlast_missing(event_tlast_missing), // output wire event_tlast_missing
.event_status_channel_halt(event_status_channel_halt), // output wire event_status_channel_halt
.event_data_in_channel_halt(event_data_in_channel_halt), // output wire event_data_in_channel_halt
.event_data_out_channel_halt(event_data_out_channel_halt) // output wire event_data_out_channel_halt
);
reg [9:0] data [1023:0];

initial $readmemb("InputData2.txt", data); // system tasks must be called from a procedural block

initial begin
// Initialize Inputs
aclk = 0;
s_axis_config_tvalid = 0;
s_axis_data_tvalid = 0;
s_axis_data_tlast = 0;
m_axis_data_tready = 0;
s_axis_config_tdata = 0;
s_axis_data_tdata = 0;
@(negedge aclk);
aresetn = 0;
// Wait 100 ns for global reset to finish
#100;
aresetn = 1;
// wait(event_frame_started ==1'b1);
// #(`clk_period);

m_axis_data_tready = 1;

s_axis_config_tvalid = 1;
//s_axis_config_tdata = 16'b0000000000000001; // FFT desired
s_axis_config_tdata = 8'b00000001;
wait(s_axis_config_tready == 1'b1);
s_axis_config_tvalid = 0;

s_axis_data_tvalid = 1;
s_axis_data_tdata[9:0] = data[i]; // I have a real input signal, so the upper half (corresponding to the imaginary part) is zero
s_axis_data_tdata[15:10] = 6'b000000;

s_axis_data_tdata[31:16] = 16'h0000;
wait(s_axis_data_tready == 1'b1);
s_axis_data_tlast = 1'b1;
s_axis_data_tvalid = 1'b0;
s_axis_data_tlast = 1'b0;
end
endmodule










Registered: ‎05-21-2015


According to the FFT specification, the unscaled output has not been scaled to remove the natural bit growth of the FFT.  This means you have more bits than you need.  A quick statistical analysis of the FFT shows that the noise grows by one bit every two stages, so scaling by one bit every two stages will help to tame the computational burden of the FFT.

My recommendation is to always use scaled, and then to scale the FFT by one bit for every two stages.
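
To put numbers on the bit growth described above, here is a minimal Python sketch. It is a floating-point model, not the Xilinx core, and it assumes the 1024-point transform and 10-bit signed input from the question. The `fft` and `rms` helpers are textbook routines written only for this illustration.

```python
import cmath
import math
import random


def fft(x):
    """Textbook recursive radix-2 FFT (illustrative; length must be a power of two)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def rms(values):
    """Root-mean-square magnitude of a list of real or complex values."""
    return math.sqrt(sum(abs(v) ** 2 for v in values) / len(values))


N = 1024                     # transform length from the question
STAGES = int(math.log2(N))   # number of radix-2 stages

# Worst case: a constant full-scale input piles up entirely in bin 0.
A = 511                      # largest positive 10-bit two's-complement value
dc_bins = fft([A] * N)
coherent_growth_bits = math.log2(abs(dc_bins[0]) / A)

# Noise-like input: energy spreads over all bins, so the RMS grows as sqrt(N).
random.seed(1)
noise = [random.randint(-512, 511) for _ in range(N)]
noise_growth_bits = math.log2(rms(fft(noise)) / rms(noise))

print(STAGES, round(coherent_growth_bits, 3), round(noise_growth_bits, 3))
```

In this model the constant input grows by 10 bits over 10 stages (one bit per stage), while the noise-like input grows by 5 bits (one bit every two stages). That is the reasoning behind scaling by one bit every two stages when the signal of interest is not a single full-scale tone.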

In general, I avoid calculating amplitude, as it rarely has the physical meaning I'm looking for.  I'm more likely to calculate the magnitude squared and then use that.  In one project, I wanted the log of the squared magnitude and was pleased to discover that if you shift the squared magnitude left until a "1" bit is in the MSB, the next several bits make a nice approximation to the logarithm (the shift count supplies the integer part).  Good enough for you?  Depends on your application.  Another approach to calculating amplitude, if that's what you actually need, would be to take the square root of the squared magnitude.  I might also be tempted to take the maximum absolute value of the real and imaginary channels and use it as the total absolute value, but again, whether such a cheap and dirty approximation would work for you is very application dependent.  A CORDIC could work here as well; it's just rather expensive.
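
The three cheap estimates mentioned above can be sketched in Python as follows. This is only an integer model of the arithmetic, not code for any Xilinx IP. The function names and the choice of four fraction bits are assumptions made for the illustration.

```python
import math


def magnitude_squared(re, im):
    """|z|^2 from the real and imaginary outputs: two multiplies and one add."""
    return re * re + im * im


def approx_log2(value, frac_bits=4):
    """Coarse log2 of a positive integer.

    The position of the leading 1 gives the integer part; the next
    `frac_bits` bits below it are used directly as the fraction.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    exponent = value.bit_length() - 1
    # Normalize so the leading 1 sits at bit position `frac_bits`.
    if exponent >= frac_bits:
        mantissa = value >> (exponent - frac_bits)
    else:
        mantissa = value << (frac_bits - exponent)
    fraction = mantissa - (1 << frac_bits)  # drop the leading 1
    return exponent + fraction / (1 << frac_bits)


def max_abs_estimate(re, im):
    """Cheap amplitude estimate: the larger of |re| and |im|."""
    return max(abs(re), abs(im))


re, im = 300, -400                 # hypothetical FFT output sample
power = magnitude_squared(re, im)
print(power, approx_log2(power), math.log2(power))
print(max_abs_estimate(re, im), math.isqrt(power))
```

For this sample the true amplitude is 500, and the max-abs estimate gives 400. In general that estimate can fall short of the true amplitude by up to a factor of sqrt(2), which is why it only suits applications that tolerate a coarse answer.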

As to what you are doing wrong below, I might suggest that your FFT isn't (yet) long enough.  Run about 4 FFT lengths through the FFT before deciding that you have the right (or wrong) data.  From my own experience, that's about how long it takes to get data through it.

Also, if you are debugging an FFT, consider this approach.  It's a bit easier to use and easier to integrate with other tools, MATLAB or Octave, than using the internal logic analyzer.


Registered: ‎03-29-2019

@dgisselq  Thank you for replying.

I have configured the IP with the scaled option. If I keep the input data width at 10, it accepts data as fix10_9, which means that 9 of the 10 input bits form the fractional part. What about the integer part? What should I do to the input data before applying it to the IP?

As per your suggestion, if the amplitude of the FFT output is not important, how should we detect the frequency without using amplitude data? (In my application the number of signals could be 1 to 5.)

Please could you tell me what's wrong in the simulation as I cannot go beyond using 1024 point FFT because of few conditions?




Registered: ‎05-21-2015


  1. The FFT is linear.  Incoming scale and outgoing scale are linearly related.  If you want the results to be 2x larger, multiply the input by a factor of two.  If you use a scale schedule, then for every shift identified within it the output will be divided by two.  If you aren't certain what the scale factor is when going through the FFT, then run a constant signal of all "1"s through it and examine the scale on the other end.  I'm pretty certain 10_9 going into the FFT won't be 10_9 on the output, but you'll need to check to know.  On the other hand, if your whole goal is just frequency estimation, then the output scale shouldn't matter as long as you aren't overflowing anywhere.
  2. If you read my discussion above, you'll remember that I compared the amplitude with the squared magnitude.  The difference between these two is the difference between |z| and |z|^2.  Both hold (roughly) the same information.  Both can be used as part of a peak finding algorithm to estimate frequency.  Neither is truly appropriate for estimating sub-bin frequency resolution.  Of the two, magnitude squared is easy to calculate but amplitude is not.  Now it's your turn--which of the two do you want to use and how much work do you want to put into finding the "right" bin?
  3. As for what's wrong with your simulation, the first thing I noticed was all the blocking assignments.  These will make your simulation asynchronous, yet plotted on a synchronous trace.  This practice is known for hiding nasty bugs that don't show up in the trace and so such simulations can be very difficult to debug.
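
Points 1 and 2 above can be illustrated with a small Python model of peak finding on the squared magnitude. This is a floating-point sketch, not the Xilinx core. The sample rate (one sample per 100 MHz clock), the tone bin and the amplitude are assumptions chosen for the example, and `fft` is a textbook helper.

```python
import cmath
import math


def fft(x):
    """Textbook recursive radix-2 FFT (illustrative; length must be a power of two)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


N = 1024          # transform length from the question
FS_HZ = 100e6     # assumption: one input sample per 100 MHz clock cycle
TONE_BIN = 100    # hypothetical test tone, placed exactly on a bin
AMPLITUDE = 400   # fits in a signed 10-bit input

# Real-valued test tone, rounded to integers like a 10-bit input would be.
samples = [round(AMPLITUDE * math.cos(2 * math.pi * TONE_BIN * n / N))
           for n in range(N)]
bins = fft(samples)

# The input is real, so only the first N/2 bins carry independent information.
power = [b.real * b.real + b.imag * b.imag for b in bins[: N // 2]]
peak_bin = max(range(N // 2), key=lambda k: power[k])
peak_freq_hz = peak_bin * FS_HZ / N

# Linearity: doubling the input doubles the output at every bin.
doubled = fft([2 * s for s in samples])
scale = abs(doubled[peak_bin]) / abs(bins[peak_bin])

print(peak_bin, peak_freq_hz, scale)
```

The peak lands in the same bin whether the input is doubled or not, which is why the output scale does not matter for frequency estimation as long as nothing overflows. For several simultaneous tones, one option would be to pick local maxima of `power` above a threshold rather than a single maximum.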

Finally, I have no idea what you mean by, "I cannot go beyond using 1024 point FFT because of few conditions."  That sentence doesn't make any sense to me.


Xilinx Employee
Registered: ‎09-18-2018

Hi @pavankumar,

Regarding the scaling option in the FFT: it is used to scale down the bit growth that the FFT produces at the output. This leads to an output width that is the same as the input width.

The input and output bus widths and the data format for each option are shown in the "Implementation" tab of the FFT IP GUI. Please see the snapshot, which shows the output width when unscaled is chosen.

The data format of the IP core indicates how the data has to be interpreted at the input and output of the core.
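
As an illustration of what such a format means, here is a minimal Python sketch of the fix10_9 input format discussed earlier in the thread: a 10-bit two's-complement word with 9 fractional bits, so the only bit above the binary point is the sign bit and the representable range is [-1, 1). The helper names are hypothetical, and saturating out-of-range values is an assumption of this sketch.

```python
WIDTH = 10       # total bits in the word
FRAC_BITS = 9    # bits to the right of the binary point


def to_fix10_9(x):
    """Quantize a real value to a 10-bit two's-complement word (saturating)."""
    raw = round(x * (1 << FRAC_BITS))
    lowest = -(1 << (WIDTH - 1))
    highest = (1 << (WIDTH - 1)) - 1
    raw = max(lowest, min(highest, raw))
    return raw & ((1 << WIDTH) - 1)


def from_fix10_9(word):
    """Interpret a 10-bit word as a signed value in [-1, 1)."""
    word &= (1 << WIDTH) - 1
    if word & (1 << (WIDTH - 1)):   # sign bit set
        word -= 1 << WIDTH
    return word / (1 << FRAC_BITS)


print(to_fix10_9(0.5), to_fix10_9(-1.0), to_fix10_9(1.0))
print(from_fix10_9(0b0100000000), from_fix10_9(0b1000000000))
```

The core only sees the 10-bit integer pattern; the position of the binary point is an interpretation that determines how the output values should be read.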

Also, in the simulation, could you insert zero samples into the core after the 1024 valid values have been provided? The unknown 'x' values on the input appear to be causing the unknown 'x' values on the output bus.

