Logic utilization reduction

hugobpontes

Participant

08-08-2020 03:37 AM - edited 08-08-2020 08:25 AM

Registered:
11-09-2019

Hello, I'm a beginner in FPGA design and I'm designing a UART transceiver that receives eleven 16-bit numbers, does some computations and returns three 16-bit numbers.

Since these 11 inputs represent numbers of different magnitudes, I convert each one to a 48-bit fixed-point format (16 integer bits and 32 fractional bits) and multiply it by its scaling factor, which gives me the numbers in the appropriate scale and format.

After that I perform the computations in this format, and get the output, which I format for 16 bits again and then send it back.

However, when I try to generate the programming file, the tools report that the design is overmapped and that I've used too much logic. I'm using a Spartan 3E, which is quite old, but shouldn't I be able to store this kind of data? Is 48 bits too much?

I'm aware that I could scale the numbers differently, so that I wouldn't need to scale them inside the FPGA, and probably make this work another way. But surely 48-bit operations can be done? So I wonder how I can reduce the logic utilisation.

Anyway my code for that part is: (Formatting function)

    function formatandscale(
        small_n : signed(wordlength-1 downto 0);
        scaling : integer range 0 to 9
    ) return signed is
        variable formatted_and_scaled : signed(big_wordlength-1 downto 0);
        constant scale_9  : signed(big_wordlength-1 downto 0) := "000000000000000000000000000000000000000000000100";
        constant scale_6  : signed(big_wordlength-1 downto 0) := "000000000000000000000000000000000001000011000111";
        constant scale_4  : signed(big_wordlength-1 downto 0) := "000000000000000000000000000001101000110110111001";
        constant zeros_32 : signed(big_int-1 downto 0) := "00000000000000000000000000000000";
    begin
        if scaling = 9 then
            formatted_and_scaled := multiplysigned((small_n & zeros_32),scale_9);
        elsif scaling = 6 then
            formatted_and_scaled := multiplysigned((small_n & zeros_32),scale_6);
        elsif scaling = 4 then
            formatted_and_scaled := multiplysigned((small_n & zeros_32),scale_4);
        elsif scaling = 0 then
            formatted_and_scaled := (small_n & zeros_32);
        end if;
        return formatted_and_scaled;
    end function;
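A note on the three `scale_*` literals: assuming the Q16.32 format described in the post, they are the nearest integers to 1e-9, 1e-6 and 1e-4 times 2**32 (4, 4295 and 429497). The sketch below shows one way such constants could be derived rather than typed out bit by bit. The package name `scale_consts` and the constants `FRAC_BITS`, `WORD_BITS` and `ONE_FX` are hypothetical, and whether a given synthesis tool accepts `real` arithmetic in constant expressions is an assumption that should be checked.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: derives the Q16.32 scale factors from their
-- decimal values instead of writing 48-bit binary literals by hand.
package scale_consts is
    constant FRAC_BITS : natural := 32;   -- fractional bits (assumed Q16.32)
    constant WORD_BITS : natural := 48;   -- 16 integer + 32 fractional bits

    -- Value of 1.0 expressed in Q16.32 units (2**32), as a real number.
    constant ONE_FX : real := 2.0 ** FRAC_BITS;

    -- integer(<real>) rounds to the nearest integer in VHDL.
    constant SCALE_9 : signed(WORD_BITS-1 downto 0) :=
        to_signed(integer(1.0e-9 * ONE_FX), WORD_BITS);
    constant SCALE_6 : signed(WORD_BITS-1 downto 0) :=
        to_signed(integer(1.0e-6 * ONE_FX), WORD_BITS);
    constant SCALE_4 : signed(WORD_BITS-1 downto 0) :=
        to_signed(integer(1.0e-4 * ONE_FX), WORD_BITS);
end package scale_consts;
```

Because these are elaboration-time constants, they cost no logic by themselves; the resource usage comes from the multiplications that use them.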

Multiplication function:

    function multiplysigned(
        operand_a : signed(big_wordlength-1 downto 0);
        operand_b : signed(big_wordlength-1 downto 0)
    ) return signed is
        variable multiplication_x : signed (2*big_wordlength-1 downto 0);
    begin
        multiplication_x := operand_a * operand_b;
        return multiplication_x(2*(big_wordlength)-big_int-1 downto big_frac);
    end function;

Adding function:

    function addsigned(
        operand_a : signed(big_wordlength-1 downto 0);
        operand_b : signed(big_wordlength-1 downto 0)
    ) return signed is
        variable operand_a_s : signed (big_wordlength downto 0);
        variable operand_b_s : signed (big_wordlength downto 0);
        variable sum_x       : signed (big_wordlength downto 0);
    begin
        operand_a_s := resize(signed(operand_a), operand_a_s'length);
        operand_b_s := resize(signed(operand_b), operand_b_s'length);
        sum_x := operand_a_s + operand_b_s;
        return sum_x(big_wordlength-1 downto 0);
    end function;

Computations: (control law is a function that involves the two functions above)

    when computing =>
        sleds_2 <= "0101";
        i_data_bytes <= ((others=>(others=>(others=>'0'))));
        if computed = 0 then
            i_data_big(0)  <= formatandscale(i_data(0),0);
            i_data_big(1)  <= formatandscale(i_data(1),0);
            i_data_big(2)  <= formatandscale(i_data(2),9);
            i_data_big(3)  <= formatandscale(i_data(3),9);
            i_data_big(4)  <= formatandscale(i_data(4),9);
            i_data_big(5)  <= formatandscale(i_data(5),6);
            i_data_big(6)  <= formatandscale(i_data(6),6);
            i_data_big(7)  <= formatandscale(i_data(7),6);
            i_data_big(8)  <= formatandscale(i_data(8),4);
            i_data_big(9)  <= formatandscale(i_data(9),4);
            i_data_big(10) <= formatandscale(i_data(10),4);
            computed <= 1;
            state <= computing;
        elsif computed = 1 then
            o_data_big(0) <= control_law(i_data_big(0),i_data_big(1),i_data_big(3),i_data_big(4),i_data_big(6),i_data_big(7),i_data_big(9),i_data_big(10));
            o_data_big(1) <= control_law(i_data_big(0),i_data_big(1),i_data_big(4),i_data_big(2),i_data_big(7),i_data_big(5),i_data_big(10),i_data_big(8));
            o_data_big(2) <= control_law(i_data_big(0),i_data_big(1),i_data_big(2),i_data_big(3),i_data_big(5),i_data_big(6),i_data_big(8),i_data_big(9));
            computed <= 2;
            state <= computing;
        elsif computed = 2 then
            o_data(0) <= o_data_big(0)(15 downto 0);
            o_data(1) <= o_data_big(1)(15 downto 0);
            o_data(2) <= o_data_big(2)(15 downto 0);
            computed <= 3;
            state <= computing;
        elsif computed = 3 then
            o_data_bytes(0,0) <= o_data(0)(7 downto 0);
            o_data_bytes(0,1) <= o_data(0)(15 downto 8);
            o_data_bytes(1,0) <= o_data(1)(7 downto 0);
            o_data_bytes(1,1) <= o_data(1)(15 downto 8);
            o_data_bytes(2,0) <= o_data(2)(7 downto 0);
            o_data_bytes(2,1) <= o_data(2)(15 downto 8);
            computed <= 4;
            state <= computing;
        elsif computed = 4 then
            data_out <= '0';
            buffer_tx <= std_logic_vector(o_data_bytes(word_counter_tx,byte_counter));
            state <= sending;
        end if;

1 Solution

Accepted Solutions

hugobpontes

Participant

08-10-2020 06:05 PM - edited 08-10-2020 06:08 PM

Registered:
11-09-2019

I figured it out in the meantime.

The problem was that I was using functions that contained multiple nested multiplications and additions. Synthesis turns every function call into its own block of combinational logic, so everything inside the function has to be computed within one clock cycle and each call gets its own multipliers and adders, which causes the resource usage to explode.

The solution was to stop using functions and instead create separate processes for the multiplier, adder and formatter, whose inputs and outputs are set and retrieved in the UART process one step at a time. This creates only 1 multiplier, 1 adder and 1 formatter, rather than a whole bunch of them. With this I got my resource usage down from about 150% to about 50%, including LUTs and slices.
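The approach described in this answer might look roughly like the following sketch. It is not the poster's actual code: the package `fx_pkg`, the entity `shared_scaler` and all port and signal names are assumptions made for illustration, and it covers only the scaling step. The point it tries to show is that there is exactly one `*` in the design, and a small state machine feeds it one pair of operands at a time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical shared types: Q16.32 words and an array of 11 of them.
package fx_pkg is
    subtype fx_t is signed(47 downto 0);
    type fx_array is array (0 to 10) of fx_t;
end package fx_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.fx_pkg.all;

entity shared_scaler is
    port (
        clk    : in  std_logic;
        start  : in  std_logic;
        raw    : in  fx_array;   -- inputs already widened to Q16.32
        factor : in  fx_array;   -- scale factor for each input
        scaled : out fx_array;
        done   : out std_logic
    );
end entity shared_scaler;

architecture rtl of shared_scaler is
    type state_t is (idle, load, store);
    signal state      : state_t := idle;
    signal idx        : integer range 0 to 10 := 0;
    signal op_a, op_b : fx_t := (others => '0');
    signal product    : signed(95 downto 0);
begin
    -- The only multiplier in the design; it is reused for every input.
    product <= op_a * op_b;

    process (clk)
    begin
        if rising_edge(clk) then
            done <= '0';
            case state is
                when idle =>
                    if start = '1' then
                        idx   <= 0;
                        state <= load;
                    end if;
                when load =>
                    -- Present one operand pair to the shared multiplier.
                    op_a  <= raw(idx);
                    op_b  <= factor(idx);
                    state <= store;
                when store =>
                    -- Keep 16 integer and 32 fractional bits of the result.
                    scaled(idx) <= product(79 downto 32);
                    if idx = 10 then
                        done  <= '1';
                        state <= idle;
                    else
                        idx   <= idx + 1;
                        state <= load;
                    end if;
            end case;
        end if;
    end process;
end architecture rtl;
```

Scaling all eleven inputs this way takes 22 clock cycles after `start`, which is negligible next to UART byte times. The slice `product(79 downto 32)` assumes 16 integer and 32 fractional bits, as stated in the question.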

3 Replies

u4223374

Advisor

08-08-2020 04:31 AM

Registered:
04-26-2015

Which Spartan 3E?

My guess is that you're running out of dedicated multipliers rather than any general-purpose logic. With the 18-bit signed multipliers in the Spartan 3E, a 48-bit multiply is going to need nine multipliers. The smallest Spartan 3E only has four. Additionally, you've got a *lot* of multiply operations here (nine instantiations of formatandscale with non-zero scaling, plus any instantiations inside control_law), so it's most likely using 81 multipliers. Even the biggest Spartan 3E doesn't have that many.

Given the low speed of a UART, the obvious thing to do here is to multiplex all your operations through a single multiplier (i.e., so a 48-bit multiply will take at least 9 cycles to complete). This will make the code more complex, but drastically reduce resource usage.
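To make the suggestion concrete, here is a minimal sketch of a 48x48-bit signed multiply that reuses one 18x18 signed multiply over nine clock cycles. The entity name `seq_mult48`, its ports and the chunking scheme are assumptions chosen for illustration, not code from this thread, and it has not been run through synthesis for a Spartan 3E. Each operand is split into two unsigned 17-bit chunks and one signed 14-bit top chunk, all widened to 18-bit signed values, and the nine partial products are accumulated with the appropriate shift.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity seq_mult48 is
    port (
        clk   : in  std_logic;
        start : in  std_logic;
        a, b  : in  signed(47 downto 0);
        done  : out std_logic;
        p     : out signed(95 downto 0)
    );
end entity seq_mult48;

architecture rtl of seq_mult48 is
    type chunk_array is array (0 to 2) of signed(17 downto 0);

    -- Split x so that x = c(2)*2**34 + c(1)*2**17 + c(0).
    function split(x : signed(47 downto 0)) return chunk_array is
        variable c : chunk_array;
    begin
        c(0) := '0' & x(16 downto 0);           -- unsigned low chunk
        c(1) := '0' & x(33 downto 17);          -- unsigned middle chunk
        c(2) := resize(x(47 downto 34), 18);    -- signed top chunk
        return c;
    end function;

    signal a_c, b_c : chunk_array;
    signal i, j     : integer range 0 to 2 := 0;
    signal busy     : std_logic := '0';
    signal acc      : signed(95 downto 0) := (others => '0');
begin
    p <= acc;

    process (clk)
        variable pp   : signed(35 downto 0);
        variable term : signed(95 downto 0);
    begin
        if rising_edge(clk) then
            done <= '0';
            if busy = '0' then
                if start = '1' then
                    a_c  <= split(a);
                    b_c  <= split(b);
                    acc  <= (others => '0');
                    i    <= 0;
                    j    <= 0;
                    busy <= '1';
                end if;
            else
                -- The single 18x18 multiply, reused for all nine chunk pairs.
                pp   := a_c(i) * b_c(j);
                term := shift_left(resize(pp, 96), 17 * (i + j));
                acc  <= acc + term;
                if j = 2 then
                    j <= 0;
                    if i = 2 then
                        busy <= '0';
                        done <= '1';    -- p is valid together with done
                    else
                        i <= i + 1;
                    end if;
                else
                    j <= j + 1;
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

For the Q16.32 format used in the question, the scaled result would be `p(79 downto 32)`, which is the slice `multiplysigned` returns if `big_int` is 16 and `big_frac` is 32. The variable shift is written with `shift_left` for brevity; a `case` on `i + j` may map to less logic on some tools.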

hugobpontes

Participant

08-08-2020 05:00 AM - edited 08-08-2020 05:20 AM

Registered:
11-09-2019

It's an XC3S500E Spartan 3E.

The utilization I get is:

- Number of slice flip-flops: 13%
- Number of 4-input LUTs: 133%
- Number of occupied slices: 160%
- Total number of 4-input LUTs: 154%

I don't think I completely understood your suggestion. How am I to multiplex my operations in this context? The only thing I could think of would be doing the operations in different clock cycles (adding more steps to the "computed" variable). Would that help? (Processing time is not of the essence for this design.)

Edit: I tried and it didn't help so unfortunately I'm at a bit of a loss here.
