Participant
Registered: 11-09-2019

## Logic utilization reduction

Hello, I'm a beginner in FPGA design and I'm designing a UART transceiver that receives eleven 16-bit numbers, does some computations and returns three 16-bit numbers.

Since these 11 inputs represent numbers of different magnitudes, I convert them to a 48-bit fixed-point format with 16 integer and 32 fractional bits and multiply each one by its scaling factor, which gives me all the numbers in the same format at the appropriate scale.
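For reference, a word in this format (Q16.32) with raw two's-complement value N represents the number x below, so appending 32 zero bits to a 16-bit integer n represents n itself. N, n and x are notation introduced here, not names from the code.

$$
x = \frac{N}{2^{32}}, \qquad -2^{15} \le x \le 2^{15} - 2^{-32}, \qquad \text{resolution } 2^{-32} \approx 2.33 \times 10^{-10}
$$

$$
N = n \cdot 2^{32} \;\Longrightarrow\; x = n
$$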

I then perform the computations in this format, convert the outputs back to 16 bits and send them back.

However, when I try to generate the programming file, the tools report that the design is overmapped and uses too much logic. I'm using a Spartan 3E, which is quite old, but shouldn't it be able to handle this kind of data? Is 48 bits too much?

I'm aware that I could scale the numbers differently, so that I wouldn't need to scale them inside the FPGA, and probably make this work another way, but surely 48-bit operations can be done? So I wonder how I can reduce the logic utilisation.

Here is my code for that part. Formatting function:

```vhdl
function formatandscale(small_n : signed(wordlength-1 downto 0);
                        scaling : integer range 0 to 9) return signed is
  variable formatted_and_scaled : signed(big_wordlength-1 downto 0);
  -- Q16.32 scale factors for 10**-9, 10**-6 and 10**-4
  constant scale_9  : signed(big_wordlength-1 downto 0) := "000000000000000000000000000000000000000000000100";
  constant scale_6  : signed(big_wordlength-1 downto 0) := "000000000000000000000000000000000001000011000111";
  constant scale_4  : signed(big_wordlength-1 downto 0) := "000000000000000000000000000001101000110110111001";
  -- 32 zero fraction bits appended to the 16-bit integer input
  constant zeros_32 : signed(big_frac-1 downto 0) := (others => '0');
begin
  if scaling = 9 then
    formatted_and_scaled := multiplysigned((small_n & zeros_32), scale_9);
  elsif scaling = 6 then
    formatted_and_scaled := multiplysigned((small_n & zeros_32), scale_6);
  elsif scaling = 4 then
    formatted_and_scaled := multiplysigned((small_n & zeros_32), scale_4);
  else
    -- scaling = 0 (or any unsupported value): convert without scaling,
    -- so the result is never left unassigned
    formatted_and_scaled := (small_n & zeros_32);
  end if;

  return formatted_and_scaled;
end function;
```
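Decoding the three constants suggests each one is 10^-k rounded to the nearest multiple of 2^-32. This is an inference from the bit patterns, not something stated in the post; S_k is notation introduced here for the raw value of `scale_k`.

$$
S_k = \operatorname{round}\left(10^{-k} \cdot 2^{32}\right): \qquad
S_9 = 4, \quad S_6 = 4295 = \mathtt{0x10C7}, \quad S_4 = 429497 = \mathtt{0x68DB9}
$$

$$
\frac{S_9}{2^{32}} \approx 9.31 \times 10^{-10}, \qquad
\frac{S_6}{2^{32}} \approx 1.000008 \times 10^{-6}, \qquad
\frac{S_4}{2^{32}} \approx 1.0000006 \times 10^{-4}
$$

$$
\left\lfloor \frac{(n \cdot 2^{32}) \cdot S_k}{2^{32}} \right\rfloor = n \cdot S_k
$$

Two consequences follow if this reading is right: `scale_9` represents 10^-9 about 7% too small, and because the 32 appended zero bits cancel the divide by 2^32 exactly, the function's result is simply n · S_k (at most about 35 significant bits), which a 16 × 20-bit multiply could produce instead of a full 48 × 48 one.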

Multiplication function:

```vhdl
function multiplysigned(operand_a : signed(big_wordlength-1 downto 0);
                        operand_b : signed(big_wordlength-1 downto 0)) return signed is
  -- full-precision product (Q32.64 for Q16.32 operands)
  variable multiplication_x : signed(2*big_wordlength-1 downto 0);
begin
  multiplication_x := operand_a * operand_b;
  -- drop the extra integer bits and the low big_frac fraction bits to get back to Q16.32
  return multiplication_x(2*big_wordlength-big_int-1 downto big_frac);
end function;
```
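Assuming `big_wordlength = 48`, `big_int = 16` and `big_frac = 32` (these constants are not shown in the post, but match the stated 16 integer and 32 fractional bits), the slice returned by `multiplysigned` works out as follows, with N_a, N_b and N_p as notation for the raw operand and product values:

$$
N_p = N_a \cdot N_b \quad (96 \text{ bits, Q32.64}), \qquad
N_{\text{out}} = N_p[79:32] = \left\lfloor \frac{N_p}{2^{32}} \right\rfloor \bmod 2^{48}
$$

$$
2 \cdot 48 - 16 - 1 = 79, \qquad \mathtt{big\_frac} = 32
$$

The low 32 fraction bits are truncated (rounding toward minus infinity), and bits 95 to 80 are discarded, so the result wraps silently if the true product has magnitude of 2^15 or more.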

Addition function:

```vhdl
function addsigned(operand_a : signed(big_wordlength-1 downto 0);
                   operand_b : signed(big_wordlength-1 downto 0)) return signed is
  variable operand_a_s : signed(big_wordlength downto 0);
  variable operand_b_s : signed(big_wordlength downto 0);
  variable sum_x       : signed(big_wordlength downto 0);
begin
  -- sign-extend by one bit, add, then keep the low big_wordlength bits
  -- (the guard bit is discarded, so the result wraps on overflow)
  operand_a_s := resize(operand_a, operand_a_s'length);
  operand_b_s := resize(operand_b, operand_b_s'length);
  sum_x := operand_a_s + operand_b_s;
  return sum_x(big_wordlength-1 downto 0);
end function;
```

Computations (`control_law` is a function built from `multiplysigned` and `addsigned`):

```vhdl
when computing =>
  sleds_2 <= "0101";

  i_data_bytes <= ((others => (others => (others => '0'))));

  if computed = 0 then
    i_data_big(0)  <= formatandscale(i_data(0), 0);
    i_data_big(1)  <= formatandscale(i_data(1), 0);
    i_data_big(2)  <= formatandscale(i_data(2), 9);
    i_data_big(3)  <= formatandscale(i_data(3), 9);
    i_data_big(4)  <= formatandscale(i_data(4), 9);
    i_data_big(5)  <= formatandscale(i_data(5), 6);
    i_data_big(6)  <= formatandscale(i_data(6), 6);
    i_data_big(7)  <= formatandscale(i_data(7), 6);
    i_data_big(8)  <= formatandscale(i_data(8), 4);
    i_data_big(9)  <= formatandscale(i_data(9), 4);
    i_data_big(10) <= formatandscale(i_data(10), 4);
    computed <= 1;
    state    <= computing;

  elsif computed = 1 then
    o_data_big(0) <= control_law(i_data_big(0), i_data_big(1), i_data_big(3), i_data_big(4), i_data_big(6), i_data_big(7), i_data_big(9), i_data_big(10));
    o_data_big(1) <= control_law(i_data_big(0), i_data_big(1), i_data_big(4), i_data_big(2), i_data_big(7), i_data_big(5), i_data_big(10), i_data_big(8));
    o_data_big(2) <= control_law(i_data_big(0), i_data_big(1), i_data_big(2), i_data_big(3), i_data_big(5), i_data_big(6), i_data_big(8), i_data_big(9));
    computed <= 2;
    state    <= computing;

  elsif computed = 2 then
    o_data(0) <= o_data_big(0)(15 downto 0);
    o_data(1) <= o_data_big(1)(15 downto 0);
    o_data(2) <= o_data_big(2)(15 downto 0);
    computed <= 3;
    state    <= computing;

  elsif computed = 3 then
    o_data_bytes(0,0) <= o_data(0)(7 downto 0);
    o_data_bytes(0,1) <= o_data(0)(15 downto 8);
    o_data_bytes(1,0) <= o_data(1)(7 downto 0);
    o_data_bytes(1,1) <= o_data(1)(15 downto 8);
    o_data_bytes(2,0) <= o_data(2)(7 downto 0);
    o_data_bytes(2,1) <= o_data(2)(15 downto 8);
    computed <= 4;
    state    <= computing;

  elsif computed = 4 then
    data_out  <= '0';
    buffer_tx <= std_logic_vector(o_data_bytes(word_counter_tx, byte_counter));
    state     <= sending;
  end if;
```

1 Solution

Accepted Solutions
Participant
Registered: 11-09-2019

I figured it out in the meantime.

The problem was that I was using functions that contained multiple nested multiplications and additions. Every call to such a function is synthesized as its own copy of the logic, and everything inside it has to complete within one clock cycle, which makes the resource usage explode.

The solution was to stop using functions and instead create parallel processes whose inputs are set, and whose outputs are retrieved, by the UART process one step at a time. This creates only one multiplier, one adder and one formatter rather than a whole bunch of them. With this, my resource usage went from about 150% down to about 50%, including LUTs and slices.
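A minimal, untested sketch of that structure, assuming Q16.32 operands. The entity `shared_ops_demo` and the computation y = x0·k0 + x1·k1 are hypothetical stand-ins for the real control law: one process holds the only multiplier, another holds the only adder, and the FSM sets their inputs in one state and collects the registered results in a later one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo: y = x0*k0 + x1*k1 in Q16.32, computed with ONE
-- multiplier process and ONE adder process reused over several FSM states.
entity shared_ops_demo is
  port (
    clk    : in  std_logic;
    start  : in  std_logic;              -- one-cycle pulse, sampled in idle
    x0, k0 : in  signed(47 downto 0);
    x1, k1 : in  signed(47 downto 0);
    y      : out signed(47 downto 0);
    done   : out std_logic               -- one-cycle pulse when y is valid
  );
end entity;

architecture rtl of shared_ops_demo is
  signal mul_a, mul_b, mul_p : signed(47 downto 0) := (others => '0');
  signal add_a, add_b, add_s : signed(47 downto 0) := (others => '0');
  signal t0                  : signed(47 downto 0) := (others => '0');
  type state_t is (idle, mul0, mul1, add_setup, add_wait, finish);
  signal state : state_t := idle;
begin
  -- The only multiplier: registered Q16.32 x Q16.32 -> Q16.32 (1-cycle latency).
  mul_proc : process (clk)
    variable full : signed(95 downto 0);
  begin
    if rising_edge(clk) then
      full  := mul_a * mul_b;
      mul_p <= full(79 downto 32);
    end if;
  end process;

  -- The only adder: registered, wraps on overflow like addsigned (1-cycle latency).
  add_proc : process (clk)
  begin
    if rising_edge(clk) then
      add_s <= add_a + add_b;
    end if;
  end process;

  -- Control FSM: sets operands in one state and reads results in a later one.
  fsm : process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      case state is
        when idle =>
          if start = '1' then
            mul_a <= x0;
            mul_b <= k0;
            state <= mul0;
          end if;
        when mul0 =>                -- multiplier captures x0*k0 on this edge
          mul_a <= x1;
          mul_b <= k1;
          state <= mul1;
        when mul1 =>                -- mul_p = x0*k0; x1*k1 is captured on this edge
          t0    <= mul_p;
          state <= add_setup;
        when add_setup =>           -- mul_p = x1*k1
          add_a <= t0;
          add_b <= mul_p;
          state <= add_wait;
        when add_wait =>            -- adder captures t0 + x1*k1 on this edge
          state <= finish;
        when finish =>
          y     <= add_s;
          done  <= '1';
          state <= idle;
      end case;
    end if;
  end process;
end architecture;
```

Each extra term of the control law becomes one or two more FSM states instead of another multiplier. The 48 × 48 multiply in `mul_proc` still maps to several dedicated multipliers, but only once for the whole design.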

3 Replies
Registered: 04-26-2015

Which Spartan 3E?

My guess is that you're running out of dedicated multipliers rather than general-purpose logic. The Spartan 3E's multipliers are 18-bit signed, so a 48-bit multiply is going to need nine of them, and the smallest Spartan 3E only has four. Additionally, you've got a lot of multiply operations here (nine calls of formatandscale with non-zero scaling, plus any multiplications inside control_law), so it's most likely using 81 multipliers. Even the biggest Spartan 3E doesn't have that many.
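Spelling out that count, assuming each 18 × 18 signed multiplier (MULT18X18SIO) contributes 17 magnitude bits per operand and that none of the products is simplified by constant propagation:

$$
\left\lceil \frac{48}{17} \right\rceil = 3 \;\Longrightarrow\; 3 \times 3 = 9 \text{ multipliers per } 48 \times 48 \text{ product}
$$

$$
9 \text{ scaled inputs} \times 9 = 81 \text{ multipliers, before counting } \mathtt{control\_law}
$$

For comparison, the XC3S500E has 20 dedicated multipliers. Once they run out, the synthesizer typically builds the remaining multipliers from LUTs, which would be consistent with LUT utilization well above 100%.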

Given the low speed of a UART, the obvious thing to do here is to multiplex all your operations through a single multiplier (i.e. so that a 48-bit multiply takes at least nine cycles to complete). This will make the code more complex, but drastically reduce resource usage.
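A minimal, untested sketch of that idea, assuming Q16.32 operands; the entity name `seq_mult_q16_32` is hypothetical. The operand magnitudes are split into three 16-bit limbs and one 16 × 16 partial product is accumulated per clock, so a 48-bit multiply takes nine cycles and should need only one dedicated multiplier. The sign is restored at the end, and the result is truncated the same way as `multiplysigned`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sequential multiplier: Q16.32 x Q16.32 -> Q16.32 using a single
-- 16x16 unsigned multiply, one partial product per clock (9 cycles in total).
entity seq_mult_q16_32 is
  port (
    clk    : in  std_logic;
    start  : in  std_logic;              -- one-cycle pulse, sampled while idle
    a, b   : in  signed(47 downto 0);    -- Q16.32 operands
    result : out signed(47 downto 0);    -- Q16.32 product, truncated like multiplysigned
    done   : out std_logic               -- one-cycle pulse when result is valid
  );
end entity;

architecture rtl of seq_mult_q16_32 is
  signal mag_a, mag_b : unsigned(47 downto 0) := (others => '0');
  signal acc          : unsigned(95 downto 0) := (others => '0');
  signal i, j         : integer range 0 to 2 := 0;   -- limb indices of a and b
  signal negate       : std_logic := '0';
  signal busy         : std_logic := '0';
begin
  process (clk)
    variable partial : unsigned(31 downto 0);
    variable sum     : unsigned(95 downto 0);
    variable full    : signed(95 downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if busy = '0' then
        if start = '1' then
          -- Work on magnitudes; unsigned(abs(x)) is also correct for -2**47,
          -- whose bit pattern read as unsigned is 2**47.
          mag_a  <= unsigned(abs(a));
          mag_b  <= unsigned(abs(b));
          negate <= a(47) xor b(47);
          acc    <= (others => '0');
          i      <= 0;
          j      <= 0;
          busy   <= '1';
        end if;
      else
        -- The only multiplier: limb i of |a| times limb j of |b|.
        partial := resize(shift_right(mag_a, 16*i), 16) *
                   resize(shift_right(mag_b, 16*j), 16);
        -- Weight the partial product by 2**(16*(i+j)) and accumulate.
        sum := acc + shift_left(resize(partial, 96), 16*(i + j));
        acc <= sum;
        if i = 2 and j = 2 then
          -- Last partial product: restore the sign, then keep bits 79..32.
          full := signed(sum);
          if negate = '1' then
            full := -full;
          end if;
          result <= full(79 downto 32);
          done   <= '1';
          busy   <= '0';
        elsif j = 2 then
          j <= 0;
          i <= i + 1;
        else
          j <= j + 1;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

The 96-bit accumulator and the variable shift still cost LUTs; shifting the limbs through registers instead of using a variable shift would likely be cheaper, at the price of more bookkeeping.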

Participant
Registered: 11-09-2019

It's an XC3S500E Spartan 3E.

These are the utilization figures I get:

- Number of slice flip-flops: 13%
- Number of 4-input LUTs: 133%
- Number of occupied slices: 160%
- Total number of 4-input LUTs: 154%

I don't think I completely understood your suggestion: how should I multiplex my operations in this context? The only thing I could think of would be to do the operations in different clock cycles (adding more steps to the `computed` counter). Would that help? (Processing time is not critical in this design.)

Edit: I tried it and it didn't help, so I'm at a bit of a loss here.
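Adding states on its own generally does not help, because synthesis usually builds separate hardware for every place `formatandscale` or `control_law` is called, regardless of which state the call sits in. Sharing needs a single call site whose operands are multiplexed. A minimal, untested sketch for the formatting stage only; `fmt_types`, `format_inputs` and `SCALE_ROM` are hypothetical names. It also relies on the fact that appending 32 zero bits and then dividing by 2^32 cancels exactly, so each formatted value is just the 16-bit input times the raw scale constant.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package fmt_types is
  type small_array is array (0 to 10) of signed(15 downto 0);  -- 16-bit inputs
  type big_array   is array (0 to 10) of signed(47 downto 0);  -- Q16.32 values
end package;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.fmt_types.all;

-- Hypothetical formatter: converts the 11 inputs one per clock through a single multiply.
entity format_inputs is
  port (
    clk    : in  std_logic;
    start  : in  std_logic;     -- one-cycle pulse; i_data must stay stable until done
    i_data : in  small_array;
    i_big  : out big_array;
    done   : out std_logic      -- one-cycle pulse after all 11 inputs are formatted
  );
end entity;

architecture rtl of format_inputs is
  -- Raw scale per input index (round(10**-k * 2**32)); 0 means "no scaling".
  type scale_rom_t is array (0 to 10) of signed(19 downto 0);
  constant SCALE_ROM : scale_rom_t := (
    0 | 1  => to_signed(0, 20),
    2 to 4 => to_signed(4, 20),        -- 10**-9
    5 to 7 => to_signed(4295, 20),     -- 10**-6
    others => to_signed(429497, 20));  -- 10**-4 (indices 8 to 10)
  signal idx  : integer range 0 to 10 := 0;
  signal busy : std_logic := '0';
  signal big  : big_array := (others => (others => '0'));
begin
  i_big <= big;

  process (clk)
    variable n : signed(15 downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if busy = '0' then
        if start = '1' then
          idx  <= 0;
          busy <= '1';
        end if;
      else
        -- Operand mux: exactly one input passes through the multiply per cycle.
        n := i_data(idx);
        if SCALE_ROM(idx) = 0 then
          big(idx) <= n & to_signed(0, 32);          -- n.0 in Q16.32
        else
          -- (n * 2**32) * S / 2**32 = n * S, so a 16 x 20-bit multiply suffices.
          big(idx) <= resize(n * SCALE_ROM(idx), 48);
        end if;
        if idx = 10 then
          busy <= '0';
          done <= '1';
        else
          idx <= idx + 1;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

The three `control_law` calls would need the same treatment: one shared instance of each operator, fed by the FSM over several states.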
