nlacoustics
Observer
1,068 Views
Registered: ‎07-27-2016

Squaring signals with IP-core

Hi all,

I have a rather complex design in which I have to square a lot of parallel signals. Currently I am using the Multiplier IP core with two signed inputs of equal length, both fed by the same signal (hence squaring it).

As for resource utilization, I am wondering whether synthesis and implementation recognize that both inputs are the same.

For example, take a 3-bit signed vector, with values from -4 to 3. With normal multiplication the outputs range from -4*3 = -12 to (-4)*(-4) = 16: (-12,-9,-8,-6,-4,-3,-2,-1,0,1,2,3,4,6,8,9,12,16), if I calculated right...

But with squaring, the only outputs are 0,1,4,9,16.

That is far fewer outcomes for squaring, so a simple look-up table could already be enough.
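To make the comparison concrete, here is a minimal Python sketch (not HDL, and not anything the tools run) that enumerates both cases for the 3-bit example. The names `BITS`, `products` and `squares` are just illustrative. The point it shows is that a squarer depends on n input bits, while a general multiplier depends on 2n.

```python
from itertools import product

BITS = 3  # word width of the example; an assumption you can change

# Signed two's-complement range for BITS bits: -4 .. 3 when BITS = 3
lo = -(1 << (BITS - 1))
hi = (1 << (BITS - 1)) - 1
values = range(lo, hi + 1)

# All distinct results of a general signed multiply a*b
products = sorted({a * b for a, b in product(values, repeat=2)})

# All distinct results when both operands are the same signal
squares = sorted({a * a for a in values})

print("general products:", products)
print("squares:         ", squares)
print("table entries, multiplier vs squarer:", 1 << (2 * BITS), "vs", 1 << BITS)
```

A direct table for the general multiplier would need 2^(2n) entries, while the squarer needs only 2^n, which is why a look-up table becomes attractive for small widths.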

So my question is: can the optimizer take this into account and reduce the logic when a signed multiplier is only used for squaring?

Thank you for your input in advance!

Jonas

0 Kudos
7 Replies
baltintop
Voyager
1,052 Views
Registered: ‎06-28-2018

Hi @nlacoustics 

You can simply synthesize/implement the design and see how much and what kind of resources the design uses.

0 Kudos
nlacoustics
Observer
1,047 Views
Registered: ‎07-27-2016

Thank you for your answer.

That is absolutely true, although this would need a lot of re-coding on my part so that is why I wondered if anyone knew the answer directly.

Cheers,

Jonas

0 Kudos
drjohnsmith
Teacher
1,017 Views
Registered: ‎07-09-2009

Taking a step back, and assuming you're going to target an FPGA with this:

In the FPGA, the DSP blocks on the later 7 series devices have multipliers that can square in a one-clock pipeline.

Look here, page 9:
https://www.xilinx.com/support/documentation/user_guides/ug579-ultrascale-dsp.pdf
<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
dgisselq
Scholar
1,013 Views
Registered: ‎05-21-2015

@nlacoustics,

What the optimizer can do depends somewhat on the problem. If you are squaring 8-bit numbers, then the optimizer should be able to convert that into a series of 16 lookup tables (one per output bit), assuming you have no other logic in the same block. Fewer bits than 8 should also result in a simple lookup table implementation. If you are squaring 18-bit numbers (or more), the optimizer should convert your logic to use one (or more) of the hard multiplication resources (DSP) in your chip. If you go much wider, there's not much the hardware can do in a single clock tick. In the middle, the possibilities include using block RAM. This is the range where I'm not sure where the cutoff lies between optimizing with LUTs and optimizing with DSPs.
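As a rough illustration of the 8-bit case, the following Python sketch builds one truth table per output bit of a signed squarer. It is only a model of the logic function, not a description of what the synthesis tool does internally, and `square_truth_tables` is a hypothetical helper name. It also shows that some output bits are trivial, which is the kind of redundancy an optimizer (or a hand-written table) could exploit.

```python
def square_truth_tables(bits):
    """Return one truth table per output bit of x*x for a signed input.

    tables[k][p] is bit k of the square, where p is the raw
    two's-complement input pattern (0 .. 2**bits - 1).
    """
    out_bits = 2 * bits
    tables = [[0] * (1 << bits) for _ in range(out_bits)]
    for pattern in range(1 << bits):
        # Interpret the raw pattern as a signed two's-complement value
        if pattern >> (bits - 1):
            x = pattern - (1 << bits)
        else:
            x = pattern
        sq = x * x
        for k in range(out_bits):
            tables[k][pattern] = (sq >> k) & 1
    return tables


tables = square_truth_tables(8)

# Output bits whose truth table never changes need no logic at all
constant = [k for k, table in enumerate(tables) if len(set(table)) == 1]

print("output bits:", len(tables))
print("constant output bits:", constant)
print("bit 0 equals input bit 0:", tables[0] == [p & 1 for p in range(256)])
```

Each of the 16 tables is a function of the same 8 input bits. Bit 1 of a square is always zero, and bit 0 is just a copy of the input's least significant bit, so the real logic cost is lower than 16 full tables.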

Hence the answer above suggesting that you just try it out and see what happens.

Dan

nlacoustics
Observer
957 Views
Registered: ‎07-27-2016

Thank you for your answers! I really appreciate it.

I took the time to replace the instantiated IP-cores with my own lookup tables as described before with the same latency as the IPs. 

And the result was far fewer resources used! So I'll stick with my own module. Here's a good chance for Xilinx to improve the multiplier IP with a "square" option.
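The module itself is not shown in this thread, so the following is only a guess at how the contents of such a table could be generated. It is a small Python sketch that lists the square for every raw two's-complement input pattern and formats the words as hex strings. The helper names `square_rom` and `to_hex_lines` are made up for this example.

```python
def square_rom(bits):
    """Squares for every two's-complement pattern of width `bits`.

    The list is indexed by the raw input pattern, so entry p holds
    the square of the signed value that pattern p represents.
    """
    size = 1 << bits
    rom = []
    for pattern in range(size):
        # Patterns in the upper half encode negative values
        x = pattern - size if pattern >= size // 2 else pattern
        rom.append(x * x)
    return rom


def to_hex_lines(rom, out_bits):
    """Format each ROM word as a zero-padded hex string."""
    digits = (out_bits + 3) // 4
    return [format(word, "0{}x".format(digits)) for word in rom]


rom = square_rom(3)
print(rom)
print(to_hex_lines(rom, 6))
```

For the 3-bit example the table has 8 entries, and it is symmetric around the most negative value because x and -x have the same square.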

And yes, I know about DSP48, but my design is quite large, so it would use up >1000 DSP48 units, which would not be feasible with my FPGA.

Jonas

0 Kudos
drjohnsmith
Teacher
937 Views
Registered: ‎07-09-2009

Just for reference,

even the smallest part, the KU3P, has 1368 DSP blocks, and the VU13P has 12288 DSP blocks.

A thousand is very possible, and they can run at a 700 MHz clock.

No problem using a LUT if that's what you want, I'm all for that, but just be aware of what can be done.
nlacoustics
Observer
787 Views
Registered: ‎07-27-2016

Thank you for your response and input.

Yes, I am aware of those high-end products, but I can't work with a chip that costs more than $1000 apiece (Digikey).

That is why I am sticking with Artix and need to adapt my solutions. But it is also fun to evaluate and research alternative algorithms.

Thanks,

Jonas