The Xilinx Fast Fourier Transform (FFT) IP has a scaling feature to handle the bit growth in FFT output. This article provides insight into the scaling methods available in the IP and a method to select a scaling schedule to avoid overflow is discussed.
Reason for Scaling:
The Fast Fourier Transform Product guide (PG109) explains that scaling is used to avoid bit growth in the output as a result of the arithmetic operations used to implement the FFT algorithm. This can be seen from the worst case scenarios that can happen in Radix-2 and Radix-4 FFT implementation.
In a Radix-2 butterfly, the output of the butterfly can grow by a factor of 1+sqrt(2) which is 2.414. So, in this case the output grows by 2 bits based on the signed format used for FFT. Similarly, the output of Radix-4 can increase by a factor of 1+3sqrt(2)=5.24 which is a growth of 3 bits.
Every subsequent butterfly stage can introduce this bit growth as a worst case. If the number of bits in the output are required to be the same as number of bits in the input, then this bit growth needs to be handled.
So for Radix-4, the bits can be shifted by 1,2 or 3 bits to the right based on the value in the scaling schedule. Correspondingly the output value is scaled down by 2^(scaling value) and the result can be handled without any bit growth in the output.
If scaling is not used, then it results in an unscaled output which has an integer value for the result of the bit growth, as explained above. The output precision will now contain an integer part and the binary point remains in the same place as in the input precision. In total, the output precision is (C_INPUT_WIDTH + C_NFFT_MAX + 1) bits and there are still (C_INPUT_WIDTH - 1) fractional bits after the binary point.
Types of Scaling
The Scaling option is only available in the FFT IP when the data format is of fixed point type. Let's assume that the input precision for the FFT IP is of type Fix(C_INPUT_WIDTH,C_INPUT_WIDTH - 1) .
There are three scaling options available for the FFT IP:
Block Floating Point
Unscaled: As the name suggests, no scaling is applied on the data within the FFT IP. This leads to bit growth in the output. The output precision will now contain an integer part and the growth in the LSB is either truncated or rounded. Output precision will now be "Fix(C_INPUT_WIDTH + C_NFFT_MAX + 1, C_INPUT_WIDTH - 1)", where "C_NFFT_MAX" is given as log2(maximum FFT point size).
Scaled: The output of the FFT IP is scaled down through the right shifting of bits in each FFT stage. The amount of shift in each stage is set using scaling_sch and is supplied using the "Configuration interface" on the FFT IP. Because of scaling, the output precision of the FFT IP is the same as the input precision i.e. Fix((C_INPUT_WIDTH,C_INPUT_WIDTH - 1).
Care has to be taken to provide sufficient scaling for each stage, otherwise an overflow of the result occurs which leads to data wrapping around. If Overflow occurs, data might not be usable further in the application. (PG109) describes the width and format of the scaling schedule field of the FFT IP in Vivado, C/Mex-model, and the FFT IP block in System Generator.
Block Floating point (BFP): In this mode, the core automatically determines a scaling value which best uses the available dynamic range and avoids an overflow. The output precision is the same as the input precision i.e. Fix((C_INPUT_WIDTH,C_INPUT_WIDTH - 1). The total number of right bit shifts applied by the core is output as an exponent value (blk_exp) on the "m_axis_data_tuser" and m_axis_status_tdata" interfaces. The scaling factor is computed as (2^blk_exp). Because the core automatically computes the scaling value, it makes use of additional resources leading to a higher resource count than in "scaled" mode.
Determining a scaling schedule
When the scaled mode is used in the FFT IP, the scaling schedule to avoid an overflow needs to be determined by running system simulations.
It can be difficult to choose the best scaling schedule that avoids overflow in a sufficiently large proportion of frames for a particular type of input data. The C-model is a tool that can help with the selection of a scaling schedule.
The decision to increase or decrease the scaling in each stage can be determined by monitoring the overflow signal in the status bus while running simulations with a typical input data set. (PG109), chapter 5 explains about the FFT C-model and MATLAB model in detail.
(PG109) also describes a conservative scaling schedule for each FFT architecture type. This scaling schedule avoids overflow for all data types.
While the conservative scaling might prevent an overflow for all input values, it might also reduce the output values due to excessive scaling. This might result in inefficient use of the available dynamic range and overscale of the output value. As a result, simulations need to be performed to determine the scaling factor that provides an output which uses the available dynamic range efficiently as well as avoiding an overflow.
Another method to determine such a scaling value is by using the FFT IP in Block-Floating Point (BFP) mode. The FFT C-model can be used in BFP mode and simulations can be run with the input data. The BFP will yield a block exponent value in the status interface which indicates the amount of scaling applied by the core to best use the dynamic range and avoid the overflow in output.
Because the Block exponent in the BFP indicates the number of bit shifts internal to the core, the scaling schedule in the "Scaled" mode can be set to shift the same number of bits. For example, if in BFP mode and for N=1024, if the block exponent value is '9', then the "scaling_ sch" in the scaled mode can be set to shift 9 bits. There are many ways to achieve the same bit shift across different FFT stages, so a suitable way to configure the "scaling_sch" is to follow the conservative scaling schedule pattern.
In the above example the block exponent of 9 can be configured in the "Scaling_sch" as ( 00 10 10 10 11), which causes shifting by 9 bits. This scaling schedule provides better output compared to the conservative scaling schedule which can over-scale the data.
An example in MATLAB using the FFT Mex model is used to observe the effects of scaling on the output. The Mex model is used so that data plots can be used. In the example, FFT of 14 different types of input frames are calculated using different scaling schedules for each frame. The MEX file is included in the ZIP file attached to this blog.
Initially FFT configured in BFP mode is used to determine the block exponent for each frame. Then the FFT is configured in scaled mode and output is calculated for each frame by setting scaling_schedule to the following:
A conservative scaling value.
A scaling value equivalent to the block exponent obtained for that frame in BFP mode.
The magnitudes of FFT obtained through Block Floating pointing (BFP) mode and Scaled mode using the scaling factors described above are then plotted. Below is one such plot. It can be observed in the plot that while conservative scaling has over-scaled the data and avoided overflow, the BFP output and scaled FFT output with block exponent output provide a higher output magnitude while avoiding an overflow.
Limitations of the example: The example uses a .txt file containing 14 frames of 2048 samples. To check for other FFT point sizes 'N' , generate data frames that have 'N' samples.
FFT IP Output. Red line - BFP mode output, green dotted - FFT IP output in Scaled mode and scaling factor set to block exponent & Blue - output with conservative scale factor.
The m-script attached here requires FFT IP C-model files. To run this example, unzip the directory and copy the example files (m-script and the text file) into the C-model directory which is created when the FFT IP output products are generated in Vivado.
After copying the files, the Mex file provided for FFT (make_xfft_v9_1_mex.m) needs to be built. For further instructions on using the FFT C-model, please refer to (PG109), chapter 5: C-model.