floating point operator

Hi,
 
I'm brand new to FPGAs and VHDL, so I'm probably asking some silly questions, but anyway... I'm working on a project to synthesize some signals (a 90 Hz + 150 Hz wave with varying amplitude) with relatively high accuracy (16 bits). These signals will then be sent to an upconverting DAC to achieve AM-LC modulation in the neighborhood of 108 MHz. I'm using a DDS block to generate the 90 Hz and 150 Hz waves, and then I need to multiply each of them by some value and finally add the two waves together.
 
I figured (perhaps incorrectly) that I would use the floating point operator core on the DDS outputs, do all my operations (some adds and multiplies) in floating point with that core, and then, when I'm ready to go to the DAC, convert the floating point result to 16-bit fixed point and take my truncation hit once, there, rather than distributing it across multiple operations. My question relates to the floating point core (v3.0). It says the latency (defined as the number of clock cycles between an operand input and the corresponding result output) can be set anywhere from 0 up to a maximum that depends on the operation. It also seems to use far less FPGA logic if I specify 0 latency than if I use the maximum latency. Is this backwards? Furthermore, why would I want latency at all? I think it would require more complicated handshaking between cores to ensure everything arrives at the right time.
 
Again, I apologize if I'm being naive, and certainly if anyone has a better suggestion for anything, you won't hurt my feelings.
 
Thanks,

Sean

If the DDS output is fixed point already (and I imagine it is), why use a floating point add/multiply? Just carry on in fixed point. You can keep making the words wider to accommodate whatever width growth you need. The great thing about FPGAs is that you can efficiently have words of any width you like (within reason), not just 16/32/64/128 etc. like in software.
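As a rough illustration of how much wider the words get, here is a minimal Python sketch of the usual word-growth rules for signed (two's complement) fixed point: a full-precision product needs the sum of the operand widths, and a full-precision sum needs one extra bit. The function names are made up for this example, and the 16-bit operand widths are assumptions chosen to match the thread.

```python
# Word growth in signed (two's complement) fixed point.
# A minimal sketch; all widths are in bits and include the sign bit.

def mult_width(wa: int, wb: int) -> int:
    """Width of the full-precision product of two signed words."""
    return wa + wb

def add_width(wa: int, wb: int) -> int:
    """Width of the full-precision sum of two signed words (one carry bit)."""
    return max(wa, wb) + 1

def fits_signed(value: int, width: int) -> bool:
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1

# Assumed example: a 16-bit DDS sample times a 16-bit coefficient,
# then the sum of two such products.
prod_w = mult_width(16, 16)         # 32
sum_w = add_width(prod_w, prod_w)   # 33

# The worst-case values really do need those widths.
most_negative_16 = -(1 << 15)
worst_prod = most_negative_16 * most_negative_16   # (-32768) * (-32768) = 2**30
worst_sum = worst_prod + worst_prod                 # 2**31
assert fits_signed(worst_prod, prod_w)
assert fits_signed(worst_sum, sum_w)
assert not fits_signed(worst_sum, prod_w)           # one bit short without growth
print(prod_w, sum_w)
```

In hardware you would simply declare the intermediate signals at these widths; nothing is lost until you choose to drop bits at the end.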

How much less logic does it use with less latency? I would have imagined a bit of extra logic might be required for the higher latency, but not much.

Extra latency will improve your maximum clock rate (up to a point). You don't necessarily have to worry about handshaking; you just know it will take 3 (or however many) cycles for your answer to be ready.
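To show why a fixed latency usually means delay-matching rather than handshaking, here is a minimal cycle-by-cycle Python model. It assumes a hypothetical pipelined multiplier with a fixed latency of N cycles whose product must be added to a value `c` that takes a parallel path; that parallel path just gets an equal-length chain of registers so both arrive in the same cycle. `DelayLine` and `run` are illustrative names, not part of any core.

```python
from collections import deque

class DelayLine:
    """Cycle-accurate model of an N-stage register chain (a fixed-latency pipeline).

    Each call to clock() shifts in one new value and returns the value that
    was shifted in `latency` calls earlier (reset_value until the chain fills).
    """
    def __init__(self, latency: int, reset_value=0):
        self.latency = latency
        self.regs = deque([reset_value] * latency)

    def clock(self, value):
        if self.latency == 0:
            return value            # combinational: output follows input immediately
        self.regs.append(value)
        return self.regs.popleft()

def run(samples, latency):
    """Compute a*b + c for each (a, b, c), with a pipelined multiplier."""
    mult = DelayLine(latency)       # stands in for the operator core's internal pipeline
    match = DelayLine(latency)      # equal-length delay on the parallel path
    out = []
    for a, b, c in samples:         # a new "question" enters every clock cycle
        p = mult.clock(a * b)       # product of the inputs from `latency` cycles ago
        c_d = match.clock(c)        # c from that same cycle, so the two line up
        out.append(p + c_d)
    return out

print(run([(1, 2, 10), (3, 4, 20), (5, 6, 30), (0, 0, 0), (0, 0, 0), (0, 0, 0)], 3))
```

With a latency of 3, the first valid answer (1*2 + 10) appears on the fourth cycle and a new one follows every cycle after that; no request/acknowledge signals are involved.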

HTH,

Martin
Martin Thompson
martin.j.thompson@trw.com
http://www.conekt.co.uk/capabilities/electronic-hardware

Hi Martin,
 
I originally wanted to use fixed point (and still do), but I have to output 16 bits to a DAC, so if I start with 16 bits from a DDS and have to do some operations on them, I will quickly get into large bit counts. Should I just take the 16 MSBs of my answer? The equation I have to process is:
 
f(t,x) = {1 + 0.2*[(1-x*0.155)cos(2*pi*90*t) + (1+x*0.155)cos(2*pi*150*t)]}
 
The two cosine terms are taken care of inside the DDS, and the x variable will come from outside.  It looks like I would end up with an answer 50 bits long.
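One way to check a fixed-point plan for this equation before writing VHDL is a bit-exact integer model compared against a floating point reference. The sketch below is a minimal Python version under several assumptions that are not from the thread: the DDS outputs and x are signed 16-bit Q1.15 values (x assumed to lie in [-1, 1)), 0.155 and 0.2 are stored as 16-bit unsigned fractions (Q0.16), and the final result is kept as 16-bit unsigned Q1.15. The exact intermediate width depends on those choices, so it will not necessarily match the 50 bits estimated above.

```python
import math

def q15(v: float) -> int:
    """Quantize to signed 16-bit Q1.15 with saturation (value = int / 2**15)."""
    return max(-32768, min(32767, round(v * 32768)))

def f_fixed(c90: float, c150: float, x: float) -> int:
    """Bit-exact integer model; returns 16-bit unsigned Q1.15 (value = result / 2**15)."""
    c90_q, c150_q, x_q = q15(c90), q15(c150), q15(x)   # assumed 16-bit inputs
    K = round(0.155 * 2**16)     # 0.155 in Q0.16 -> 10158
    G = round(0.2 * 2**16)       # 0.2 in Q0.16 -> 13107
    xk = x_q * K                 # Q31 (15 + 16 fractional bits)
    one = 1 << 31                # 1.0 in Q31
    term = (one - xk) * c90_q + (one + xk) * c150_q    # Q46 (31 + 15)
    f = (1 << 62) + term * G     # Q62 full precision (46 + 16); f lies in about [0.6, 1.4]
    # Keep the top 16 bits (bits 62..47) and round using the next bit down (bit 46).
    return (f + (1 << 46)) >> 47

def f_float(c90: float, c150: float, x: float) -> float:
    """Floating point reference for the same equation."""
    return 1 + 0.2 * ((1 - x * 0.155) * c90 + (1 + x * 0.155) * c150)

worst = 0.0
for n in range(200):
    t = n / 2000.0               # hypothetical 2 kHz sample rate, for illustration only
    c90 = math.cos(2 * math.pi * 90 * t)
    c150 = math.cos(2 * math.pi * 150 * t)
    for x in (-1.0, -0.5, 0.0, 0.3, 0.99):
        err = abs(f_fixed(c90, c150, x) / 2**15 - f_float(c90, c150, x))
        worst = max(worst, err)
print(worst)                     # expected to stay within a few LSBs of the Q1.15 output
```

Carrying full precision through the adds and multiplies and rounding only once at the end gives the same "take the hit once" behaviour the floating point plan was after, without the floating point cores.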
 
As for the latency issue with the floating point operator, if I remember correctly it used roughly half as much logic for zero latency as for maximum latency (the default). I can see how latency might be useful if you are trying to synchronize the results of different operations, but I don't understand how it could improve the maximum clock rate. I don't doubt you; I just don't understand why, if you wouldn't mind explaining.
 
Thx,
 
Sean

A 50 bit answer is OK - just take the top 16 bits for your answer.  If you need to have a rounded answer, use the 17th bit from the top to decide whether to add an extra '1' to the 16 bit answer.
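Here is a minimal Python sketch of the truncate-then-round step described above, assuming the wide result is an unsigned integer (`take_top_bits` is a hypothetical helper name). It also covers one corner case worth handling in hardware too: if the kept 16 bits are already all ones, adding the rounding '1' would wrap to zero, so the sketch saturates instead.

```python
def take_top_bits(word: int, width: int, keep: int = 16, rounding: bool = True) -> int:
    """Reduce an unsigned `width`-bit word to its `keep` most significant bits.

    With rounding, the first discarded bit (the 17th from the top when keep=16)
    decides whether to add 1 (round half up). The result saturates rather than
    wrapping when the kept bits are already all ones.
    """
    if not (0 <= word < (1 << width)) or not (0 < keep < width):
        raise ValueError("word must fit in `width` bits and 0 < keep < width")
    drop = width - keep
    top = word >> drop                           # plain truncation
    if rounding and (word >> (drop - 1)) & 1:    # first discarded bit set?
        top = min(top + 1, (1 << keep) - 1)      # round up without wrapping to zero
    return top

# Usage with a 50-bit result:
print(hex(take_top_bits((0x1234 << 34) | (1 << 33), 50)))   # rounds up
print(hex(take_top_bits((1 << 50) - 1, 50)))                 # saturates at 0xffff
```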

Regarding pipelining - the more registers you put in your logic, the smaller each stage of logic is, so the faster the clock will run (well, up to a point)

E.g. if your logic operation takes 50 ns to run, you can clock it at 20 MHz. If you split it into 5 sections of 10 ns each, you can clock it at 100 MHz. You have to wait 5 cycles for an answer, but you can put a new "question" in every clock cycle, so your throughput has increased.

It's not quite as straightforward as that, as each additional register adds some extra "delay" of its own (its setup and clock-to-output times), but hopefully you get the idea.
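The arithmetic above can be written down as a small Python sketch. It assumes the logic splits evenly into `stages` pieces and adds a hypothetical fixed per-register overhead (setup plus clock-to-output time); real numbers come from the device's timing report, and routing delays are ignored here.

```python
def pipeline_timing(logic_delay_ns: float, stages: int, reg_overhead_ns: float = 0.0):
    """Return (fmax_MHz, latency_ns) for logic split evenly into `stages` register stages.

    reg_overhead_ns is an assumed per-stage cost for each register
    (setup time plus clock-to-output time).
    Throughput is one result per clock, i.e. fmax_MHz results per microsecond.
    """
    period_ns = logic_delay_ns / stages + reg_overhead_ns   # slowest stage sets the clock
    fmax_mhz = 1000.0 / period_ns
    latency_ns = stages * period_ns                          # `stages` cycles to the first answer
    return fmax_mhz, latency_ns

print(pipeline_timing(50, 1))                        # unpipelined: 20 MHz, 50 ns
print(pipeline_timing(50, 5))                        # 5 stages: 100 MHz, still 50 ns
print(pipeline_timing(50, 50, reg_overhead_ns=1.0))  # overhead now dominates the period
```

With a 1 ns overhead, 50 stages still raise the clock (to 500 MHz) but double the latency to 100 ns, and each further stage buys less; that is the "up to a point" in practice.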

I was looking for a decent web reference to point you at, but all I can find is material about processor pipelines, which, while relevant, clouds the issue somewhat from a "raw logic" point of view.

Hope that made some kind of sense!

Cheers,
Martin

Martin,
 
That explanation helps a lot.  Thank you for your help.
 
Thx,
 
Sean