04-26-2019 01:46 PM
I am trying to investigate how sparsity (the number of zeroes in the filter coefficients) affects the implementation of a filter on the FPGA. Specifically, I am interested in the number of BRAM blocks and DSP slices used for a sparse filter versus a non-sparse filter of the same length. I assume the filter coefficients to be symmetric and the number of taps to be odd.
I have been experimenting with the FIR compiler GUI and have observed the following.
For one output per cycle (no overclocking):
The filter coefficients [1 2 3 4 0 1 2 3 4] use 9 DSP slices. Shouldn't this be 4 DSP slices (if we use the symmetry of the coefficients)?
The filter coefficients [1 0 0 0 0 0 0 0 1] use 5 DSP slices. Only two coefficients are nonzero here, so only two multipliers should be needed. Why do we need 5 DSP slices?
For both of the filter coefficients, no BRAM units are used.
When I increase the clock frequency, the number of DSP slices goes down, but the number of BRAM blocks goes up. The reduction is not a clean division: doubling the clock frequency does not halve the DSP slice count. For example, when overclocked by a factor of 2, the filter coefficients [1 2 3 4 0 1 2 3 4] use 6 DSP slices (down from 9) and 5 BRAM blocks (up from 0 in the non-overclocked case). Similarly, overclocking by a factor of 3 reduces the DSP slice count to 4, and the BRAM count goes to 3. Is there some relation between the number of DSP slices and the BRAM count?
05-03-2019 12:43 AM
Hi @a.abbasi.01 ,
The filter exploits the symmetry of the coefficients to reduce the number of DSP slices. Please see the attached picture, and refer to PG149 (the FIR Compiler product guide), page 23, which explains how the symmetry is exploited.
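To illustrate the general idea of folding a symmetric filter, here is a minimal Python sketch (not taken from PG149; the function names `fir_direct` and `fir_folded` are made up for this example). It assumes an odd number of taps with even symmetry, adds the two input samples that share a coefficient before multiplying, and so needs only (N + 1) / 2 multiplications per output instead of N.

```python
def fir_direct(h, x):
    """Direct-form FIR: one multiplication per tap, zero initial state."""
    n_taps = len(h)
    return [
        sum(h[k] * x[n - k] for k in range(n_taps) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_folded(h, x):
    """Folded FIR for odd-length, symmetric h: pre-add mirrored samples."""
    n_taps = len(h)
    if n_taps % 2 != 1 or list(h) != list(h)[::-1]:
        raise ValueError("h must be symmetric with an odd number of taps")
    mid = n_taps // 2

    def sample(i):
        # Samples before the start of the input are treated as zero.
        return x[i] if i >= 0 else 0

    out = []
    for n in range(len(x)):
        acc = h[mid] * sample(n - mid)  # center tap has no mirror partner
        for k in range(mid):
            # One multiplication serves both taps k and n_taps - 1 - k.
            acc += h[k] * (sample(n - k) + sample(n - (n_taps - 1 - k)))
        out.append(acc)
    return out


h = [1, 0, 0, 0, 0, 0, 0, 0, 1]
x = list(range(1, 21))
same = fir_direct(h, x) == fir_folded(h, x)
```

Both functions produce the same output for a symmetric `h`; the folded version simply performs fewer multiplications per sample.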
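In your case the coefficient set [1 2 3 4 0 1 2 3 4] is not symmetric (it does not read the same forwards and backwards), so the DSP slice count was not reduced. However, the coefficient set [1 0 0 0 0 0 0 0 1] is symmetric, so symmetry was exploited and it uses 5 DSP slices: one for each of the four mirrored coefficient pairs plus one for the center tap. A '0' is still counted as a coefficient value.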
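A quick way to check the two coefficient sets from the question is sketched below. This is only a simple counting model under stated assumptions, not the FIR Compiler's actual resource estimator: the helper names are hypothetical, only even symmetry is considered, zero coefficients are counted like any other value, and one output per clock cycle is assumed.

```python
def is_symmetric(coeffs):
    """True if the coefficient list reads the same forwards and backwards."""
    coeffs = list(coeffs)
    return coeffs == coeffs[::-1]


def multipliers_needed(coeffs):
    """Multipliers for one output per cycle under the simple model above."""
    n_taps = len(coeffs)
    if is_symmetric(coeffs):
        # Mirrored taps share a multiplier; an odd center tap gets its own.
        return (n_taps + 1) // 2
    return n_taps


first = [1, 2, 3, 4, 0, 1, 2, 3, 4]
second = [1, 0, 0, 0, 0, 0, 0, 0, 1]

print(is_symmetric(first), multipliers_needed(first))    # False 9
print(is_symmetric(second), multipliers_needed(second))  # True 5
```

These counts agree with the 9 and 5 DSP slices reported in the question for the non-overclocked case.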
As for the BRAM, the block RAMs are used as MAC data buffers. When the clock frequency is increased relative to the sample rate, each MAC uses the spare clock cycles to perform several multiplications per output sample, and the MAC data is stored in the BRAM buffers.
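The time-sharing arithmetic can be sketched roughly as follows. This is an idealized estimate only: the function name is hypothetical, and it assumes every MAC is fully used in every clock cycle with no overhead, which the real core does not guarantee.

```python
import math


def ideal_mac_count(multiplies_per_output, clocks_per_sample):
    """Idealized number of MAC units when each unit is reused every clock."""
    if clocks_per_sample < 1:
        raise ValueError("clocks_per_sample must be at least 1")
    return math.ceil(multiplies_per_output / clocks_per_sample)


# The 9-tap non-symmetric filter from the question needs 9 multiplies per output.
for factor in (1, 2, 3):
    print(factor, ideal_mac_count(9, factor))
# 1 9
# 2 5
# 3 3
```

The tool reported 9, 6 and 4 DSP slices for these three cases, so the real implementation evidently carries some overhead that this sketch does not model. That is consistent with the observation that the slice count does not divide cleanly by the overclocking factor.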
The memory type used for each kind of data (input data, MAC data, output data) can be selected in the Detailed Implementation tab of the FIR Compiler IP.