Multiple sine and cosine lookups at high speed

leif@rdos.net

Visitor

05-04-2020 01:08 AM

1,167 Views

Registered:
12-29-2019

I have a 1 GHz ADC signal that enters the design at 250 MHz, with 4 samples per clock for each of 2 channels. I want to measure the power at a few frequencies in the 45 MHz band. This must run in real time, but latency is not an issue. The preferred solution is to multiply every sample by a sine and a cosine value whose phase advances according to the specific frequency I want to analyze. However, that means I need 8 sine and 8 cosine values at 250 MHz per frequency I want to analyze.

The DDS can only provide a single value per clock (although it might be able to give both the sine and cosine at the same time), but does it really work at 250 MHz? If I place the table in block or distributed RAM, I cannot read out more than one value per clock cycle. A more promising solution might be to generate a sine table up to pi / 2 in Verilog code, since I could then reference it from several blocks at the same time. However, will this work with a table of 64k entries of 14 bits? I'm using a Kintex-7 (KC705). Resource usage is not a big issue since I presently use only 10% of the available LUTs and FFs.

I realize I could skip 3 out of 4 samples and use the average of the channels, but this feels like a bad solution with poorer performance. I would also need to duplicate the block RAM or DDS if I want to analyze several frequencies at the same time.

So, is there a way to make multiple references in the same clock cycle to a sine and cosine table at high speed, like at 250 MHz?
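
As a rough illustration of the measurement described in this post (a sketch, not the poster's implementation): each sample is multiplied by a cosine and a sine at the frequency of interest, the products are accumulated, and the power estimate is I² + Q². With 4 samples arriving per 250 MHz clock, lane k of clock cycle n needs the phase of sample index 4n + k, which is why several sine/cosine pairs are needed per clock for each frequency. The function name, the `lanes` parameter, and the test tone are assumptions made for this example.

```python
import math

def tone_power(samples, f_target, f_sample, lanes=4):
    """Estimate power at f_target by quadrature correlation.

    Samples are consumed `lanes` at a time, mimicking a bus that
    delivers several ADC samples per fabric clock.
    """
    phase_step = 2.0 * math.pi * f_target / f_sample  # radians per sample
    acc_i = 0.0
    acc_q = 0.0
    n_cycles = len(samples) // lanes
    for cycle in range(n_cycles):
        for lane in range(lanes):
            idx = cycle * lanes + lane          # sample index 4n + k
            phase = phase_step * idx
            acc_i += samples[idx] * math.cos(phase)
            acc_q += samples[idx] * math.sin(phase)
    n = n_cycles * lanes
    # Normalise so a tone of amplitude A yields roughly A**2.
    return (acc_i ** 2 + acc_q ** 2) * 4.0 / (n * n)

fs = 1.0e9
tone = [1000.0 * math.cos(2 * math.pi * 45.0e6 * i / fs) for i in range(8192)]
print(tone_power(tone, 45.0e6, fs))  # close to 1000**2
print(tone_power(tone, 60.0e6, fs))  # much smaller
```

In hardware the floating-point `cos`/`sin` calls would be replaced by whatever coefficient source is chosen (table, DDS, or CORDIC); the sketch only shows which phases are needed in each clock cycle.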

1 Solution

Accepted Solutions

drjohnsmith

Teacher

05-04-2020 06:27 AM

1,013 Views

Registered:
07-09-2009

CORDIC

https://pdfs.semanticscholar.org/adab/cb3a48d2ebb2251bf298b5589598f3eefd04.pdf

14 Replies

archangel-lightworks

Scholar

05-04-2020 01:20 AM

1,156 Views

Registered:
07-23-2019

archangel-lightworks

Scholar

05-04-2020 01:22 AM

1,154 Views

Registered:
07-23-2019

leif@rdos.net

Visitor

05-04-2020 02:10 AM

1,136 Views

Registered:
12-29-2019

Latency is not a problem in the sense that it doesn't matter if the result is available 100 cycles later, but the solution must keep up in real time, so it must deliver 8 sine and 8 cosine values at 250 MHz.

The idea is that I can cache sample data in a FIFO until I have decided whether the frequency is present in the data. Having a FIFO with hundreds or thousands of values is not a problem. That way I can skip most of the data that is not interesting and stream only the relevant data over PCIe, which allows for much longer sampling periods.

I could possibly use smaller tables. 8 bits might be enough, and 32k or 16k entries might work too.

Sample values are 14-bit signed integers. The power estimate will also be some type of integer. In fact, it will come down to a yes-no decision once values have been accumulated over an 8k or 16k sequence.
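
The integer widths mentioned here determine how wide the accumulators have to be. A back-of-the-envelope sketch, assuming signed 14-bit samples, signed 8-bit or 14-bit coefficients, and 8k or 16k products accumulated per decision (the helper name is made up for this example):

```python
import math

def accumulator_bits(sample_bits, coeff_bits, n_accumulated):
    """Worst-case signed accumulator width for a sum of products."""
    product_bits = sample_bits + coeff_bits              # signed x signed product
    growth_bits = math.ceil(math.log2(n_accumulated))    # growth from summation
    return product_bits + growth_bits

print(accumulator_bits(14, 8, 8192))    # 35
print(accumulator_bits(14, 8, 16384))   # 36
print(accumulator_bits(14, 14, 16384))  # 42
```

All three cases fit within the 48-bit accumulator of a 7-series DSP48E1 slice, so the multiply-accumulate for one lane could plausibly map to a single DSP slice per sine or cosine term.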

archangel-lightworks

Scholar

05-04-2020 02:30 AM

1,125 Views

Registered:
07-23-2019

While not trivial, I don't see why it would be impossible to feed and multiply integer data on a Kintex-7 at 250 MHz.

archangel-lightworks

Scholar

05-04-2020 02:36 AM

1,121 Views

Registered:
07-23-2019

I would do a quick try with HLS.

bruce_karaffa

Scholar

05-04-2020 02:56 AM

1,108 Views

Registered:
06-21-2017

calibra

Voyager

05-04-2020 03:42 AM

1,094 Views

Registered:
06-20-2012

I assume that the two channels are independent because they do not share the same sine / cosine.

Therefore you have two independent phase accumulators.

But the 4 samples of the same channel are multiplied by 4 consecutive sine values in the table.

Is that right?

leif@rdos.net

Visitor

05-04-2020 05:16 AM

1,062 Views

Registered:
12-29-2019

Yes, the channels are somewhat separate, mostly because they can have phase differences caused by different arrival times of the signals. They come from two antennas placed at different locations.

My original idea was to create a general-purpose sine and cosine table with a fixed number of values up to pi / 2, and with that setup, the values for consecutive samples would not be consecutive in the sine and cosine table.

An alternative approach might be to create a more dedicated sine and cosine table that captures the frequency searched for closely enough with a much shorter table that is tied to the sampling frequency by some factor M / N. For instance, using 16 values and a 750 MHz sampling frequency would capture 46.875 MHz. 22 values and a 1 GHz sampling frequency would capture 45.45 MHz.
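
The two figures quoted here follow from f = fs * M / N, where N is the table length and M the number of whole sine periods stored in it. A small sketch to check them (the function name and the `cycles_per_table` parameter are assumptions for illustration):

```python
from fractions import Fraction

def table_frequency(f_sample_hz, table_len, cycles_per_table=1):
    """Frequency represented by a table that holds `cycles_per_table`
    whole sine periods in `table_len` samples (the M / N ratio)."""
    return Fraction(f_sample_hz) * cycles_per_table / table_len

print(float(table_frequency(750_000_000, 16)))      # 46875000.0
print(float(table_frequency(1_000_000_000, 22)))    # about 45454545.45
print(float(table_frequency(1_000_000_000, 200, 9)))  # 45000000.0
```

The last line is an extra example, not from the thread: storing 9 periods in 200 samples at 1 GHz would land exactly on 45 MHz, at the cost of a longer table.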

calibra

Voyager

05-04-2020 05:41 AM

1,044 Views

Registered:
06-20-2012

OK.

So divide the "general-purpose sine and cosine table with a fixed number of values up to pi / 2" into 4 tables, each with a quarter of the values, interleaved so that 4 consecutive sine values can be read in parallel.

Example:

1 ROM of 8 values: (1,2,3,4,5,6,7,8)

4 ROMs of 2 values each: (1,5) (2,6) (3,7) (4,8)

Same area, but the throughput is multiplied by 4.
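
The interleaved split in this post can be modelled in a few lines. This is only a software sketch of the idea (function names are made up): bank k holds every fourth entry starting at k, so any 4 consecutive addresses fall in 4 different banks and each bank is read once per clock.

```python
def split_into_banks(table, n_banks=4):
    """Bank k holds entries k, k + n_banks, k + 2*n_banks, ..."""
    return [table[k::n_banks] for k in range(n_banks)]

def read_consecutive(banks, start):
    """Read len(banks) consecutive entries starting at `start`,
    touching each bank exactly once (one read port per bank)."""
    n_banks = len(banks)
    depth = len(banks[0])
    out = []
    for offset in range(n_banks):
        addr = start + offset
        bank = addr % n_banks
        row = (addr // n_banks) % depth   # wrap around the end of the table
        out.append(banks[bank][row])
    return out

rom = [1, 2, 3, 4, 5, 6, 7, 8]
banks = split_into_banks(rom)
print(banks)                       # [[1, 5], [2, 6], [3, 7], [4, 8]]
print(read_consecutive(banks, 0))  # [1, 2, 3, 4]
print(read_consecutive(banks, 2))  # [3, 4, 5, 6]
```

The scheme relies on the 4 addresses landing in 4 different banks. That holds for consecutive addresses, but not for an arbitrary phase step into a general-purpose table, where two lanes can hit the same bank in one clock.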

leif@rdos.net

Visitor

05-04-2020 07:32 AM - edited 05-04-2020 07:45 AM

986 Views

Registered:
12-29-2019

It seems like the CORDIC IP can produce both sine and cosine values in just one clock cycle, and that it might work at 250 MHz with a Kintex-7 device. It doesn't have huge resource usage requirements, and so adding 8 of these should work. It will definitely be worth a try.

Correction: 4 should be enough, as both channels can operate on the same sine and cosine phase. Only the accumulators need to be separate.

bruce_karaffa

Scholar

05-04-2020 08:07 AM

977 Views

Registered:
06-21-2017

A CORDIC will require at least one clock cycle per bit of precision at the output.

avrumw

Guide

05-04-2020 08:25 AM

970 Views

Registered:
01-23-2009

A 64k x 14 table would require 32 block RAMs. Each block RAM can be true dual port, so you can use one table to look up two values per clock. To get 8 values per clock you would need 4 of these tables, for a total of 128 block RAMs. While this is a large number, the Kintex-7 325T (the device on the KC705) has 445 of them, so if you aren't using them heavily for other things, you should be able to use them.

Of course you will have to be careful using RAMs like this - they will be scattered throughout the die so you need to pipeline carefully to use them (including using the output registers of the RAM). But since latency isn't an issue, this shouldn't be a problem.

Avrum
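
The block RAM count in this reply can be reproduced with a short calculation. The sketch assumes each 36 Kb block RAM is configured as 2K x 18, which is one of the aspect ratios the 7-series primitive supports; other aspect ratios could give a slightly different count. The helper name is hypothetical.

```python
import math

def bram36_count(depth, width, bram_depth=2048, bram_width=18):
    """Block RAMs needed for a depth x width ROM built from primitives
    configured as bram_depth x bram_width (assumed 2K x 18 here)."""
    return math.ceil(depth / bram_depth) * math.ceil(width / bram_width)

per_table = bram36_count(64 * 1024, 14)     # one 64k x 14 table
ports_per_table = 2                         # true dual port: 2 lookups per clock
tables = math.ceil(8 / ports_per_table)     # copies needed for 8 lookups per clock
print(per_table, tables, per_table * tables)  # 32 4 128
```

With 445 block RAMs on the device, 128 is a little under 29% of the total.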

leif@rdos.net

Visitor

05-04-2020 11:14 AM - edited 05-04-2020 01:44 PM

894 Views

Registered:
12-29-2019

From the CORDIC PDF it seems like it can operate in parallel mode and use one pipeline stage per bit, while still being able to output one sine and one cosine per clock. I can just feed it with an increasing phase value and then use the outputs as coefficients without having to know which phase they correspond to.

I tested it on the KC705, and it uses only a bit over 1,000 LUTs and FFs with a 256k x 14 configuration. It works at 187.5 MHz and outputs a new value every clock.
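
For readers unfamiliar with the algorithm, here is a minimal software model of a rotation-mode CORDIC. It is a sketch under stated assumptions (16 iterations, 16 fractional bits, input angle limited to [-pi/2, pi/2]) and is not the vendor IP. Each loop iteration uses only shifts and adds and corresponds to one pipeline stage in an unrolled implementation, which is how the latency can be one cycle per bit while the throughput stays at one result per clock.

```python
import math

def cordic_sin_cos(angle, iterations=16, frac_bits=16):
    """Rotation-mode CORDIC for angle in [-pi/2, pi/2] (radians).

    Returns (sin, cos) as floats converted from fixed point.
    """
    scale = 1 << frac_bits
    # Pre-computed arctan(2**-i) constants, one per stage.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # Accumulated CORDIC gain, compensated in the start vector.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)   # start at (1/gain, 0) so the result has unit length
    y = 0
    z = round(angle * scale)  # residual angle still to rotate through
    for i in range(iterations):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / scale, x / scale

# Phases for the 4 lanes of one clock cycle at 45 MHz with 1 GHz sampling.
phase_step = 2.0 * math.pi * 45.0e6 / 1.0e9
for lane in range(4):
    s, c = cordic_sin_cos(lane * phase_step)
    print(lane, round(s, 4), round(c, 4))
```

A complete design would also need to fold the running phase into the supported angle range (quadrant mapping) before each rotation; that step is left out here.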