Help converting floating point coeff to fixed poin...

m3atwad

Voyager

04-26-2018 06:16 PM

2,321 Views

Registered:
05-25-2016

Hello,

I'm having trouble getting started with my FIR filter. I've read around on these forums and it looks like a lot of people use the FIR Compiler. I've also read the user guide for it, but the guide doesn't contain a helpful overview for someone new to DSP. So the idea is...

I have a SPI module (I wrote) that is reading data from an ADC continuously. I want to pass this data into a FIR filter as soon as the data is rx'd and after coming out of the fir LPF I save it to a register and tell the microblaze to read it. So what I'm trying to make is a custom peripheral that consists of a SPI controller (which I already did and it works fine) and a fir filter to deliver filtered data samples to a microblaze. The part I'm missing is how to take my FIR coefficients (this is a very basic LPF) and convert them from very small fractional floats to integers. Below are my coefficients I got from scipy (python) which I used to generate a basic "model" of the filter.

[ -1.09615399e-18 7.96876090e-04 3.77600581e-03 1.01932394e-02

2.13824198e-02 3.79791502e-02 5.91696357e-02 8.23766260e-02

1.03640369e-01 1.18650046e-01 1.24071263e-01 1.18650046e-01

1.03640369e-01 8.23766260e-02 5.91696357e-02 3.79791502e-02

2.13824198e-02 1.01932394e-02 3.77600581e-03 7.96876090e-04

-1.09615399e-18]

As you can see, they look to be too small to apply a simple scaling factor. What should I do?

I know Xilinx has a FIR compiler axi interface module, but implementing a second axi interface in this simple little module doesn't make any sense as it is already going to be an axi peripheral that will connect into the uBlaze.

So my ultimate questions are:

1. What do you dsp guys generally do to convert float coefficients to fixed point or integers and what would you recommend I do?

2. If I use the FIR compiler can I just get integer coefficients out of it and use those in my own simple fir filter module?

3. Is there a way to implement a fir filter without having to use the axi interface? something similar to a simple little module one could write by hand and just clock data into?

I've uploaded a simple block diagram to try to help simplify this post. Hopefully it adds clarity to what I'm trying to develop.

Thanks!

1 Solution

Accepted Solutions

jmcclusk

Mentor

04-30-2018 04:46 AM - edited 05-18-2018 12:30 PM

2,173 Views

Registered:
02-24-2014

Here is an improved version of the filter, which uses symmetric folding and the preadders to reduce the number of DSP48's by a factor of 2. The coefficients are scaled for unity (0 dB) gain at DC, and this makes it impossible to have overflow. There is still a problem because the outside edge coefficients (0 and 20) are so small that they convert to zero. This could be fixed.

    ----------------------------------------------------------------------------------
    -- Engineer: John McCluskey
    --
    -- Publish Date: 04/29/2018 7:45:00 AM
    -- Design Name:
    -- Module Name: filter_example - Behavioral
    ----------------------------------------------------------------------------------
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;
    use IEEE.MATH_REAL.ALL;

    entity filter_example is
        Port (
            clk      : in  std_logic;
            in_data  : in  signed;
            out_data : out signed
        );
    end filter_example;

    architecture Behavioral of filter_example is

        constant c_width    : natural := 18;  -- adjust coefficient length here.. 18 bits fits into the B input of the DSP48E1
        constant c_num_coef : natural := 21;

        type t_real_vector   is array( natural range <>) of REAL;
        type t_signed_vector is array( natural range <>) of signed(c_width-1 downto 0);

        constant coeff_vector : t_real_vector(0 to c_num_coef-1) := (
            -1.09615399e-18, 7.96876090e-04, 3.77600581e-03, 1.01932394e-02,
             2.13824198e-02, 3.79791502e-02, 5.91696357e-02, 8.23766260e-02,
             1.03640369e-01, 1.18650046e-01, 1.24071263e-01, 1.18650046e-01,
             1.03640369e-01, 8.23766260e-02, 5.91696357e-02, 3.79791502e-02,
             2.13824198e-02, 1.01932394e-02, 3.77600581e-03, 7.96876090e-04,
            -1.09615399e-18
        );

        -- this function scales the floating point coefficients so the filter has unity gain at the highest response frequency
        function convert_coef( A : t_real_vector ) return t_signed_vector is
            variable B       : t_signed_vector(A'range);
            variable sum_abs : real := 0.0;
        begin
            for i in A'range loop  -- scale to give DC gain of unity
                sum_abs := sum_abs + abs(A(i));  -- calculate sum of absolute value
            end loop;
            sum_abs := sum_abs / real( 2**c_width - 10);  -- adjust the scaling HERE 10 is for safety
            for i in A'range loop
                B(i) := to_signed( integer( A(i)/sum_abs ), c_width);
            end loop;
            return B;
        end function convert_coef;

        constant filter_coef : t_signed_vector(0 to c_num_coef-1) := convert_coef( coeff_vector );

        type t_data_vector   is array(natural range <>) of signed(in_data'range);
        type t_preadd_vector is array(natural range <>) of signed(in_data'length downto 0);

        constant c_delay : natural := c_num_coef/2;

        signal pre_add   : t_preadd_vector(0 to c_delay - 1)  := (others => (others => '0'));
        signal delay_reg : t_data_vector(0 to 2*c_delay)      := (others => (others => '0'));
        signal shift_reg : t_data_vector(0 to 2*c_num_coef-1) := (others => (others => '0'));

        constant c_sum_len : natural := integer( log2(real(c_num_coef))) +1 + c_width + in_data'length;
        type t_sum_vector is array(natural range <>) of signed(c_sum_len-1 downto 0);
        signal sum_vector : t_sum_vector(0 to c_num_coef-1) := (others => (others => '0'));

        attribute use_dsp48 : string;
        attribute use_dsp48 of sum_vector : signal is "yes";

    begin

        process(clk) is
        begin
            if rising_edge(clk) then
                delay_reg(0) <= in_data;
                for j in 1 to 2*c_delay loop
                    delay_reg(j) <= delay_reg(j-1);  -- this shift register should be implemented with SRL32 elements
                end loop;

                shift_reg(1)  <= in_data;
                pre_add(0)    <= resize(shift_reg(1),in_data'length+1) + resize(delay_reg(2*c_delay),in_data'length+1);
                sum_vector(0) <= resize( pre_add(0) * filter_coef(0), c_sum_len);

                for i in 1 to c_delay - 1 loop
                    pre_add(i)       <= resize(shift_reg(2*i+1),in_data'length+1) + resize(delay_reg(2*c_delay),in_data'length+1);
                    shift_reg(2*i)   <= shift_reg(2*i-1);
                    shift_reg(2*i+1) <= shift_reg(2*i);
                    sum_vector(i)    <= sum_vector(i-1) + resize( pre_add(i) * filter_coef(i), c_sum_len);
                end loop;

                if c_num_coef mod 2 = 1 then
                    shift_reg(2*c_delay)   <= shift_reg(2*c_delay-1);
                    shift_reg(2*c_delay+1) <= shift_reg(2*c_delay);
                    shift_reg(2*c_delay+2) <= shift_reg(2*c_delay+1);
                    sum_vector(c_delay)    <= sum_vector(c_delay-1) + resize( shift_reg(2*c_delay+2) * filter_coef(c_delay), c_sum_len);
                    out_data <= resize( shift_right(sum_vector(c_delay), c_width), out_data'length);
                else
                    out_data <= resize( shift_right(sum_vector(c_delay-1), c_width), out_data'length);
                end if;
            end if;
        end process;

    end Behavioral;
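
For anyone who wants to convince themselves that the symmetric folding is arithmetically safe, here is a minimal Python sketch (not part of the original post). It compares a plain direct-form FIR against a folded one that adds each mirrored pair of samples before multiplying, which is the job the preadder does. The function names `tap`, `fir_direct`, `fir_folded` and `multiplier_count` are made up for illustration, and the model ignores pipelining and bit widths entirely.

```python
def tap(x, m, k):
    """Return sample x[m-k], treating samples before the start as zero."""
    return x[m - k] if m - k >= 0 else 0

def fir_direct(x, c):
    """Direct-form FIR: one multiply per coefficient."""
    return [sum(c[j] * tap(x, m, j) for j in range(len(c))) for m in range(len(x))]

def fir_folded(x, c):
    """Folded FIR: add mirrored samples first, then one multiply per pair."""
    n, half = len(c), len(c) // 2
    assert c == c[::-1], "folding only works for symmetric coefficients"
    out = []
    for m in range(len(x)):
        acc = 0
        for i in range(half):
            acc += c[i] * (tap(x, m, i) + tap(x, m, n - 1 - i))  # pre-add
        if n % 2:
            acc += c[half] * tap(x, m, half)  # unpaired centre tap
        out.append(acc)
    return out

def multiplier_count(n):
    """Multipliers needed by the folded structure for n taps."""
    return n // 2 + n % 2

x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
odd_taps = [1, 3, 5, 3, 1]
even_taps = [2, 4, 4, 2]
print(fir_folded(x, odd_taps) == fir_direct(x, odd_taps))
print(multiplier_count(21))
```

With 21 taps the folded structure needs 11 multipliers instead of 21, which is roughly the factor of 2 mentioned above.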

Don't forget to close a thread when possible by accepting a post as a solution.

9 Replies

m3atwad

Voyager

04-26-2018 06:55 PM

2,304 Views

Registered:
05-25-2016

b = signal.firwin(21, 0.01, window='blackman')

from scipy library. b is the list of coefficients.

avcon_lee

Explorer

04-26-2018 07:33 PM

2,292 Views

Registered:
07-17-2014

I am not familiar with DSP and software, but I know that Xilinx has an IP core that can be used to convert floating-point numbers to fixed point. You can find [Floating-point] in the IP catalog. Hope this helps.

jmcclusk

Mentor

04-26-2018 09:55 PM - edited 04-26-2018 09:57 PM

2,269 Views

Registered:
02-24-2014

Let me fix that for you..

    ----------------------------------------------------------------------------------
    -- Engineer: John McCluskey
    --
    -- Create Date: 04/26/2018 11:05:12 PM
    -- Design Name:
    -- Module Name: filter_example - Behavioral
    ----------------------------------------------------------------------------------
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;
    use IEEE.MATH_REAL.ALL;

    entity filter_example is
        Port (
            clk      : in  std_logic;
            in_data  : in  signed(15 downto 0);
            out_data : out signed(15 downto 0)
        );
    end filter_example;

    architecture Behavioral of filter_example is

        constant c_width    : natural := in_data'length;  -- adjust coefficient length here.. for convenience, we make it the same as in_data
        constant c_num_coef : natural := 21;

        type real_vector   is array( natural range <>) of REAL;
        type signed_vector is array( natural range <>) of signed(c_width-1 downto 0);

        constant coeff_vector : real_vector(0 to c_num_coef-1) := (
            -1.09615399e-18, 7.96876090e-04, 3.77600581e-03, 1.01932394e-02,
             2.13824198e-02, 3.79791502e-02, 5.91696357e-02, 8.23766260e-02,
             1.03640369e-01, 1.18650046e-01, 1.24071263e-01, 1.18650046e-01,
             1.03640369e-01, 8.23766260e-02, 5.91696357e-02, 3.79791502e-02,
             2.13824198e-02, 1.01932394e-02, 3.77600581e-03, 7.96876090e-04,
            -1.09615399e-18
        );

        -- this function scales the floating point coefficients so the largest one is just below the peak possible
        function convert_coef( A : real_vector ) return signed_vector is
            variable B : signed_vector(A'range);
            variable C : real := 0.0;  -- used for scaling
        begin
            for i in A'range loop  -- first find maximum value in array
                if C < abs(A(i)) then
                    C := abs(A(i));
                end if;
            end loop;
            C := C / real( 2**(c_width-1) - 1);  -- adjust the scaling HERE
            for i in A'range loop
                B(i) := to_signed( integer( A(i)/C ), c_width);
            end loop;
            return B;
        end function convert_coef;

        constant filter_coef : signed_vector(0 to c_num_coef-1) := convert_coef( coeff_vector );

        signal shift_reg : signed_vector(0 to c_num_coef-1);

        type t_product_vector is array(natural range <>) of signed(c_width+in_data'length-1 downto 0);
        signal product : t_product_vector(0 to c_num_coef -1);

        attribute use_dsp48 : string;
        attribute use_dsp48 of product : signal is "no";

        constant c_sum_len : natural := integer( log2(real(c_num_coef))) +1 + c_width + in_data'length;
        signal sum_d, sum_d2, sum_d3 : signed(c_sum_len-1 downto 0);

    begin

        process(clk) is
            variable sum : signed(c_sum_len-1 downto 0) := (others => '0');
        begin
            if rising_edge(clk) then
                shift_reg(0) <= in_data;
                product(0)   <= shift_reg(0) * filter_coef(0);
                for i in 1 to c_num_coef-1 loop
                    shift_reg(i) <= shift_reg(i-1);
                    product(i)   <= shift_reg(i) * filter_coef(i);
                end loop;
                sum := resize(product(0), c_sum_len);
                for j in 1 to c_num_coef-1 loop
                    sum := sum + resize(product(j), c_sum_len);
                end loop;
                sum_d  <= sum;  -- provide some pipelining registers
                sum_d2 <= sum_d;
                sum_d3 <= sum_d2;
                -- TODO...clip the output instead of truncating.
            end if;
        end process;

        out_data <= resize( sum_d3, out_data'length);

    end Behavioral;
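
Since the original poster mentioned building a Python model, here is a rough Python equivalent of what `convert_coef` does in the VHDL above: find the largest absolute coefficient, then scale everything so that value lands on the largest positive signed number. This is only a sketch, not jmcclusk's code. The name `convert_coef_peak` is invented, and Python's `round()` may treat exact .5 ties differently from the VHDL `integer()` conversion.

```python
# Coefficients copied from the original post (scipy output).
COEFFS = [
    -1.09615399e-18, 7.96876090e-04, 3.77600581e-03, 1.01932394e-02,
    2.13824198e-02, 3.79791502e-02, 5.91696357e-02, 8.23766260e-02,
    1.03640369e-01, 1.18650046e-01, 1.24071263e-01, 1.18650046e-01,
    1.03640369e-01, 8.23766260e-02, 5.91696357e-02, 3.79791502e-02,
    2.13824198e-02, 1.01932394e-02, 3.77600581e-03, 7.96876090e-04,
    -1.09615399e-18,
]

def convert_coef_peak(coeffs, width):
    """Scale so the largest |coefficient| maps to 2**(width-1) - 1, then round."""
    peak = max(abs(c) for c in coeffs)
    scale = (2 ** (width - 1) - 1) / peak
    return [round(c * scale) for c in coeffs]

fixed = convert_coef_peak(COEFFS, 16)
for real_value, int_value in zip(COEFFS, fixed):
    print(f"{real_value: .8e} -> {int_value}")
```

The two edge coefficients (about -1.1e-18) come out as 0 at any practical width, so they do not contribute to the filter.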

I was quite surprised by the synthesis results.. it created 20 DSP48's, all pipelined in a row... which was not my intention, actually. I was expecting a cloud of LUT's with an adder tree, all in fabric logic.

You might be interested by these systolic implementations of FIR filters: https://github.com/BBN-Q/VHDL-FIR-filters/tree/master/src

After inserting this attribute:

attribute use_dsp48 of product : signal is "no";

I got the expected cloud of luts and FF's, as shown.

Oh.. and here's a testbench:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    library xil_defaultlib;

    entity tb_filter_example is
    --  Port ( );
    end tb_filter_example;

    architecture Behavioral of tb_filter_example is

        signal clk               : std_logic := '0';
        signal in_data, out_data : signed(15 downto 0) := (others => '0');
        signal count             : unsigned(7 downto 0) := (others => '0');

    begin

        clk <= not clk after 5 ns;

        process(clk) is
        begin
            if rising_edge(clk) then
                count <= count + 1;
                if count = 32 then
                    in_data <= (0 => '1', others => '0');
                else
                    in_data <= (others => '0');
                end if;
            end if;
        end process;

        dut: entity xil_defaultlib.filter_example
            Port map (
                clk      => clk,
                in_data  => in_data,
                out_data => out_data
            );

    end Behavioral;

Your impulse response looks like this:

QED

jmcclusk

Mentor

04-27-2018 08:09 AM - edited 04-27-2018 08:10 AM

2,250 Views

Registered:
02-24-2014

So I wasn't very satisfied with the prior version, so I "improved" it. Now it has coefficient scaling so that the filter can't overflow, and it uses a systolic architecture that maps very nicely into the DSP48 elements.

Voila.

    ----------------------------------------------------------------------------------
    -- Engineer: John McCluskey
    --
    -- Create Date: 04/26/2018 11:05:12 PM
    -- Design Name:
    -- Module Name: filter_example - Behavioral and improved systolic
    ----------------------------------------------------------------------------------
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;
    use IEEE.MATH_REAL.ALL;

    entity filter_example is
        Port (
            clk      : in  std_logic;
            in_data  : in  signed(15 downto 0);  -- size can be changed, no problem
            out_data : out signed(15 downto 0)
        );
    end filter_example;

    architecture Behavioral of filter_example is

        constant c_width    : natural := 18;  -- adjust coefficient length here.. 18 bits fits into the B input of the DSP48E1
        constant c_num_coef : natural := 21;

        type t_real_vector   is array( natural range <>) of REAL;
        type t_signed_vector is array( natural range <>) of signed(c_width-1 downto 0);

        constant coeff_vector : t_real_vector(0 to c_num_coef-1) := (
            -1.09615399e-18, 7.96876090e-04, 3.77600581e-03, 1.01932394e-02,
             2.13824198e-02, 3.79791502e-02, 5.91696357e-02, 8.23766260e-02,
             1.03640369e-01, 1.18650046e-01, 1.24071263e-01, 1.18650046e-01,
             1.03640369e-01, 8.23766260e-02, 5.91696357e-02, 3.79791502e-02,
             2.13824198e-02, 1.01932394e-02, 3.77600581e-03, 7.96876090e-04,
            -1.09615399e-18
        );

        -- this function scales the floating point coefficients so the sum of their
        -- absolute values stays just below 2**c_width, so the filter can't overflow
        function convert_coef( A : t_real_vector ) return t_signed_vector is
            variable B       : t_signed_vector(A'range);
            variable sum_abs : real := 0.0;
        begin
            for i in A'range loop
                sum_abs := sum_abs + abs(A(i));  -- now calculate sum of absolute value
            end loop;
            sum_abs := sum_abs / real( 2**c_width - 10);  -- adjust the scaling HERE 10 is for safety
            for i in A'range loop
                B(i) := to_signed( integer( A(i)/sum_abs ), c_width);
            end loop;
            return B;
        end function convert_coef;

        constant filter_coef : t_signed_vector(0 to c_num_coef-1) := convert_coef( coeff_vector );

        type t_data_vector is array(natural range <>) of signed(in_data'range);
        signal shift_reg : t_data_vector(0 to 2*c_num_coef-1) := (others => (others => '0'));

        constant c_sum_len : natural := integer( log2(real(c_num_coef))) +1 + c_width + in_data'length;
        type t_sum_vector is array(natural range <>) of signed(c_sum_len-1 downto 0);
        signal sum_vector : t_sum_vector(0 to c_num_coef-1) := (others => (others => '0'));

        attribute use_dsp48 : string;
        attribute use_dsp48 of sum_vector : signal is "yes";

    begin

        process(clk) is
        begin
            if rising_edge(clk) then
                shift_reg(0)  <= in_data;
                shift_reg(1)  <= shift_reg(0);
                sum_vector(0) <= resize( shift_reg(0) * filter_coef(0), c_sum_len);
                for i in 1 to c_num_coef-1 loop
                    shift_reg(2*i)   <= shift_reg(2*i-1);
                    shift_reg(2*i+1) <= shift_reg(2*i);
                    sum_vector(i)    <= sum_vector(i-1) + resize( shift_reg(2*i) * filter_coef(i), c_sum_len);
                end loop;
            end if;
        end process;

        out_data <= resize( shift_right(sum_vector(c_num_coef-1), c_width), out_data'length);

    end Behavioral;
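
To see why this scaling keeps the filter from overflowing, here is a small Python sketch of the same idea (an illustration only, with the invented names `convert_coef_sum_abs`, `l1_gain` and `worst_case`). The worst-case output magnitude of an FIR filter is the largest input magnitude times the sum of the absolute coefficients. If that sum stays below 2**c_width, then after the right shift by c_width the result cannot be larger than the input range. Rounding of exact ties may differ between Python's `round()` and the VHDL `integer()` conversion.

```python
# Coefficients copied from the original post (scipy output).
COEFFS = [
    -1.09615399e-18, 7.96876090e-04, 3.77600581e-03, 1.01932394e-02,
    2.13824198e-02, 3.79791502e-02, 5.91696357e-02, 8.23766260e-02,
    1.03640369e-01, 1.18650046e-01, 1.24071263e-01, 1.18650046e-01,
    1.03640369e-01, 8.23766260e-02, 5.91696357e-02, 3.79791502e-02,
    2.13824198e-02, 1.01932394e-02, 3.77600581e-03, 7.96876090e-04,
    -1.09615399e-18,
]

def convert_coef_sum_abs(coeffs, width, margin=10):
    """Scale so that sum(|c|) maps to 2**width - margin, then round."""
    total = sum(abs(c) for c in coeffs)
    scale = (2 ** width - margin) / total
    return [round(c * scale) for c in coeffs]

C_WIDTH = 18
fixed = convert_coef_sum_abs(COEFFS, C_WIDTH)

# Sum of |integer coefficients|: the worst-case gain before the final shift.
l1_gain = sum(abs(c) for c in fixed)

# Largest possible |output| for a 16-bit signed input, after shifting right
# by C_WIDTH as the VHDL does.
worst_case = (l1_gain * 2 ** 15) >> C_WIDTH

print("largest coefficient:", max(fixed))
print("sum of |coefficients|:", l1_gain, "limit:", 2 ** C_WIDTH)
print("worst-case |output|:", worst_case)
```

The margin of 10 leaves room for the rounding of the individual coefficients, each of which can move the sum by up to half an LSB.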

m3atwad

Voyager

04-27-2018 10:26 AM

2,234 Views

Registered:
05-25-2016

avcon_lee - Thank you for the response. I will actually look into that module, just out of curiosity!

Jmcclusk - Thank you very much for such an awesome response!! I will be studying this today and attempt to understand and implement it.

Thank you both for taking the time to help me out! :)

maps-mpls

Mentor

04-27-2018 12:02 PM

2,228 Views

Registered:
06-20-2017

1. What do you dsp guys generally do to convert float coefficients to fixed point or integers and what would you recommend I do?

I use a spreadsheet or write some mcode to convert to integers. (But I like @jmcclusk's solution too).
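
As a rough idea of what such a conversion script can look like, here is a minimal Python sketch (not from the thread). It multiplies each coefficient by a power of two and rounds, i.e. a plain Q-format conversion with no normalization, then reports the quantization error. The name `quantize_q` and the choice of 15 fractional bits are assumptions made for the example.

```python
# Coefficients copied from the original post (scipy output).
COEFFS = [
    -1.09615399e-18, 7.96876090e-04, 3.77600581e-03, 1.01932394e-02,
    2.13824198e-02, 3.79791502e-02, 5.91696357e-02, 8.23766260e-02,
    1.03640369e-01, 1.18650046e-01, 1.24071263e-01, 1.18650046e-01,
    1.03640369e-01, 8.23766260e-02, 5.91696357e-02, 3.79791502e-02,
    2.13824198e-02, 1.01932394e-02, 3.77600581e-03, 7.96876090e-04,
    -1.09615399e-18,
]

def quantize_q(coeffs, frac_bits):
    """Convert reals to integers with `frac_bits` fractional bits."""
    return [round(c * (1 << frac_bits)) for c in coeffs]

FRAC_BITS = 15  # assumed: 16-bit signed coefficients, 1 sign bit + 15 fraction bits
q15 = quantize_q(COEFFS, FRAC_BITS)

# How far the integer filter is from the floating-point one.
dc_gain = sum(q15) / (1 << FRAC_BITS)
max_err = max(abs(q / (1 << FRAC_BITS) - c) for q, c in zip(q15, COEFFS))

print(q15)
print("DC gain of quantized filter:", dc_gain)
print("largest coefficient error:", max_err)
```

Even the small coefficients survive as non-zero integers here, apart from the two edge values of about -1.1e-18, which are effectively zero to begin with. To get real values back, divide the integers (or the filter output) by 2**FRAC_BITS.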

2. If I use the FIR compiler can I just get integer coefficients out of it and use those in my own simple fir filter module?

I don't understand this question. You put coefficients into the FIR compiler. You get the coefficients from some other program typically (e.g., FDATool, an Octave or python script you've written, etc.)

3. Is there a way to implement a fir filter without having to use the axi interface? something similar to a simple little module one could write by hand and just clock data into?

Yes, but the AXI interface is trivial. See also UG901, which has Verilog/VHDL for an FIR filter.

Also, I'm not sure if you need this or not, but hopefully it will help somebody if not you:

    -- S.FFF   integer   Dec. Frac.   real
    -- -----   -------   ----------   ------
    -- 1000      -8        -1.0       -1.000
    -- 1001      -7        -7/8       -0.875
    -- 1010      -6        -6/8       -0.750
    -- 1011      -5        -5/8       -0.625
    -- 1100      -4        -4/8       -0.500
    -- 1101      -3        -3/8       -0.375
    -- 1110      -2        -2/8       -0.250
    -- 1111      -1        -1/8       -0.125
    -- 0000       0         0          0.000
    -- 0001       1         1/8        0.125
    -- 0010       2         2/8        0.250
    -- 0011       3         3/8        0.375
    -- 0100       4         4/8        0.500
    -- 0101       5         5/8        0.625
    -- 0110       6         6/8        0.750
    -- 0111       7         7/8        0.875

So, a 4-bit number in 2s complement signed fractional form is trivial for positive numbers. For negative numbers, you can just google two's complement.
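
The table can also be generated programmatically. The following Python sketch (helper names `to_fixed`, `to_bits` and `from_fixed` are made up for this example) converts a real number to a two's-complement fractional integer, saturating at the ends of the range, and prints the same 4-bit S.FFF table.

```python
def to_fixed(value, frac_bits, width):
    """Quantize a real number to a signed integer with `frac_bits` fraction bits."""
    scaled = round(value * (1 << frac_bits))
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lowest, min(highest, scaled))  # saturate instead of wrapping

def to_bits(n, width):
    """Two's-complement bit pattern of a signed integer as a string."""
    return format(n & ((1 << width) - 1), f"0{width}b")

def from_fixed(n, frac_bits):
    """Real value represented by a signed fixed-point integer."""
    return n / (1 << frac_bits)

WIDTH, FRAC_BITS = 4, 3  # the S.FFF format from the table
for n in range(-(1 << (WIDTH - 1)), 1 << (WIDTH - 1)):
    print(to_bits(n, WIDTH), f"{n:3d}", f"{from_fixed(n, FRAC_BITS): .3f}")
```

The same helpers work for wider formats, for example `to_fixed(0.124071263, 17, 18)` for an 18-bit coefficient with 17 fraction bits.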

*** Destination: Rapid design and development cycles *** Unappreciated answers get deleted, unappreciative OPs get put on ignored list ***

m3atwad

Voyager

04-27-2018 04:25 PM

2,217 Views

Registered:
05-25-2016

Thanks for the response maps!

I believe I have a much better understanding of this now after working on this today with the input I got from you guys as well as spending the day working through the source code that was so graciously posted for my benefit :) *Thanks again jmcclusk!!

I've been working on getting the example jmcclusk posted to simulate in my verilog project as well as building a python model of it to check my verilog results - this has gone pretty well, but I have come up with another general question after not figuring it out searching the web. Where can I find documentation for verilog that is equivalent (relative use of equivalent here as they are very different) to numeric_std, math_real, abs() and the other library and built-in functions that jmcclusk used? I am not a pro at verilog yet and was a little lost in what documents I should be following to figure these things out. I'm just looking for reading material here to learn so this isn't a super specific question.

I think I understand the idea in the first example jmcclusk posted:

    process(clk) is
        variable sum : signed(c_sum_len-1 downto 0) := (others => '0');
    begin
        if rising_edge(clk) then
            shift_reg(0) <= in_data;
            product(0)   <= shift_reg(0) * filter_coef(0);
            for i in 1 to c_num_coef-1 loop
                shift_reg(i) <= shift_reg(i-1);
                product(i)   <= shift_reg(i) * filter_coef(i);
            end loop;
            sum := resize(product(0), c_sum_len);
            for j in 1 to c_num_coef-1 loop
                sum := sum + resize(product(j), c_sum_len);
            end loop;
            sum_d  <= sum;  -- provide some pipelining registers
            sum_d2 <= sum_d;
            sum_d3 <= sum_d2;
            -- TODO...clip the output instead of truncating.
        end if;
    end process;

    out_data <= resize( sum_d3, out_data'length);

You are using the for loop to replicate multipliers and adders it looks like. The multiplies look like there is a register between each multiplier creating a 1 clock delay. The adders look like they do not - instead they are all connected "combinatorially-no register delays". Is that correct?

The second example you posted that was a modification of the first one you said had coefficient scaling to prevent overflow. Code below.

    process(clk) is
    begin
        if rising_edge(clk) then
            shift_reg(0)  <= in_data;
            shift_reg(1)  <= shift_reg(0);
            sum_vector(0) <= resize( shift_reg(0) * filter_coef(0), c_sum_len);
            for i in 1 to c_num_coef-1 loop
                shift_reg(2*i)   <= shift_reg(2*i-1);
                shift_reg(2*i+1) <= shift_reg(2*i);
                sum_vector(i)    <= sum_vector(i-1) + resize( shift_reg(2*i) * filter_coef(i), c_sum_len);
            end loop;
        end if;
    end process;

I'm having a hard time figuring out where the 2*i part comes from and how it helps you scale the coefficients. Could you shed some light on what is going on here?

Next question, when you picked the amount of pipeline registers, shown below as an example,

    shift_reg(0)  <= in_data;
    shift_reg(1)  <= shift_reg(0);
    sum_vector(0) <= resize( shift_reg(0) * filter_coef(0), c_sum_len);

and

    shift_reg(2*i)   <= shift_reg(2*i-1);
    shift_reg(2*i+1) <= shift_reg(2*i);
    sum_vector(i)    <= sum_vector(i-1) + resize( shift_reg(2*i) * filter_coef(i), c_sum_len);

did you pick them based on what is shown in the DSP48E1 user guide, where it denotes the input register delays? Screenshots from the user guide are attached as 1.jpg and 2.jpg just to show what I'm referring to.

Thanks again!!

jmcclusk

Mentor

04-28-2018 04:53 AM

2,204 Views

Registered:
02-24-2014

@m3atwad wrote:

I think I understand the idea in the first example jmcclusk posted:

    process(clk) is
        variable sum : signed(c_sum_len-1 downto 0) := (others => '0');
    begin
        if rising_edge(clk) then
            shift_reg(0) <= in_data;
            product(0)   <= shift_reg(0) * filter_coef(0);
            for i in 1 to c_num_coef-1 loop
                shift_reg(i) <= shift_reg(i-1);
                product(i)   <= shift_reg(i) * filter_coef(i);
            end loop;
            sum := resize(product(0), c_sum_len);
            for j in 1 to c_num_coef-1 loop
                sum := sum + resize(product(j), c_sum_len);
            end loop;
            sum_d  <= sum;  -- provide some pipelining registers
            sum_d2 <= sum_d;
            sum_d3 <= sum_d2;
            -- TODO...clip the output instead of truncating.
        end if;
    end process;

    out_data <= resize( sum_d3, out_data'length);

You are using the for loop to replicate multipliers and adders it looks like. The multiplies look like there is a register between each multiplier creating a 1 clock delay. The adders look like they do not - instead they are all connected "combinatorially-no register delays". Is that correct?

Well, in this case, I was goofing around, because I wanted to see what Vivado would do with a purely behavioral description of an FIR filter architecture. I wanted to see how well it would collapse a bunch of constant multiplies and adders, using retiming. The result was not great, and should be considered purely an experiment.

The second example you posted that was a modification of the first one you said had coefficient scaling to prevent overflow. Code below.

    process(clk) is
    begin
        if rising_edge(clk) then
            shift_reg(0)  <= in_data;
            shift_reg(1)  <= shift_reg(0);
            sum_vector(0) <= resize( shift_reg(0) * filter_coef(0), c_sum_len);
            for i in 1 to c_num_coef-1 loop
                shift_reg(2*i)   <= shift_reg(2*i-1);
                shift_reg(2*i+1) <= shift_reg(2*i);
                sum_vector(i)    <= sum_vector(i-1) + resize( shift_reg(2*i) * filter_coef(i), c_sum_len);
            end loop;
        end if;
    end process;

I'm having a hard time figuring out where the 2*i part comes from and how it helps you scale the coefficients. Could you shed some light on what is going on here?

This had nothing at all to do with the coefficient scaling, and everything to do with the systolic "wave" filtering taken from the DSP48 User guide. See page 51 of the DSP48 User guide, and look at Figure 3-5. ALL these registers are already built into the DSP48 silicon, we just have to use them!
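
If it helps to see where the 2*i comes from, here is a small register-level Python model of that process (an illustrative sketch with the invented names `systolic_fir` and `fir_direct`; it ignores bit widths and the final right shift). The data moves two registers per tap while the partial sum moves one register per tap, so by the time a partial sum reaches tap i, the sample it needs has been delayed by exactly one more clock than at the previous tap.

```python
def systolic_fir(samples, coeffs):
    """Clock-by-clock model of the systolic process: every register is
    updated from the previous clock's values, like VHDL signal assignment."""
    n_taps = len(coeffs)
    shift = [0] * (2 * n_taps)   # shift_reg
    sums = [0] * n_taps          # sum_vector
    out = []
    for x in samples:            # one loop iteration per rising clock edge
        new_shift, new_sums = shift[:], sums[:]
        new_shift[0] = x
        new_shift[1] = shift[0]
        new_sums[0] = shift[0] * coeffs[0]
        for i in range(1, n_taps):
            new_shift[2 * i] = shift[2 * i - 1]
            new_shift[2 * i + 1] = shift[2 * i]
            new_sums[i] = sums[i - 1] + shift[2 * i] * coeffs[i]
        shift, sums = new_shift, new_sums
        out.append(sums[-1])
    return out

def fir_direct(samples, coeffs):
    """Textbook FIR: y[m] = sum over j of c[j] * x[m-j]."""
    return [
        sum(c * samples[m - j] for j, c in enumerate(coeffs) if m - j >= 0)
        for m in range(len(samples))
    ]

coeffs = [1, 2, 3, 4]
impulse = [1] + [0] * 11
print(systolic_fir(impulse, coeffs))
```

Feeding in an impulse, the coefficients come out in order after a latency of one clock per tap, and for any input the output matches the direct-form filter shifted by that latency.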

Next question, when you picked the amount of pipeline registers, shown below as an example,

    shift_reg(0)  <= in_data;
    shift_reg(1)  <= shift_reg(0);
    sum_vector(0) <= resize( shift_reg(0) * filter_coef(0), c_sum_len);

and

    shift_reg(2*i)   <= shift_reg(2*i-1);
    shift_reg(2*i+1) <= shift_reg(2*i);
    sum_vector(i)    <= sum_vector(i-1) + resize( shift_reg(2*i) * filter_coef(i), c_sum_len);

did you pick them based on what is shown in the DSP48E1 user guide, where it denotes the input register delays? Screenshots from the user guide are attached as 1.jpg and 2.jpg just to show what I'm referring to.

Again, Figure 3-5 is the crucial diagram. I'm working now on retooling the code to handle symmetric filters, which permits reducing the number of multipliers by a factor of 2.

Important note: There is a verilog example of the symmetric systolic FIR filter in this zip file:

Grab this and take a look.. It doesn't do anything useful in terms of coefficient scaling, but it gets the filter topology right.
