mistercoffee
Scholar
960 Views
Registered: ‎04-04-2014

My CORDIC fails timing, what speed should I expect? Can't find accurate info on this


I'll start off by pointing out that I have seen this page:

https://www.xilinx.com/support/documentation/ip_documentation/ru/cordic.html#kintex7

I have a CORDIC set up as follows:

- Sin/Cos

- Parallel Architecture

- Max Pipelining

- Scaled Radian mode

- Input Width 23 bits

- Output Width 29 bits

- Round mode Nearest Even

- Auto Iterations/Precision

- Blocking Flow Control

- Optimized for Performance

- Clock Speed 250MHz

- xc7z045tffg676-2 Zynq target

 

As you may have guessed, it fails timing by nearly 1 ns, due to a long carry chain within the IP core that I can't touch. The closest device in the above link is an xc7k480 K7, and that suggests I should be able to achieve 250 MHz easily.
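
To put a rough number on that failure, here is a small back-of-the-envelope sketch in plain Tcl. It assumes the worst negative slack is exactly -1.0 ns (the post only says "nearly 1ns") and ignores clock skew and uncertainty, so treat the result as an estimate only.

```tcl
# Assumed inputs: 250 MHz target from the post, WNS taken as -1.0 ns.
set target_mhz 250.0
set wns_ns     -1.0

set period_ns   [expr {1000.0 / $target_mhz}]   ;# 4.0 ns clock period
set achieved_ns [expr {$period_ns - $wns_ns}]   ;# about 5.0 ns on the worst path
set fmax_mhz    [expr {1000.0 / $achieved_ns}]  ;# about 200 MHz

puts [format "period %.3f ns, worst path about %.3f ns, fmax about %.1f MHz" \
    $period_ns $achieved_ns $fmax_mhz]
```

In other words, a 1 ns miss at 250 MHz corresponds to the path closing at roughly 200 MHz, which is a 25% shortfall in period rather than a marginal one.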

So, what should I expect? I don't see how, but can I speed this up? The PG says absolutely nothing about performance targets.

Thanks

 


Accepted Solutions
mistercoffee
Scholar
244 Views
Registered: ‎04-04-2014

Thanks for the tips, that's very helpful. We hadn't considered an open source CORDIC engine or an alternative solution because, to be frank, the Xilinx IP offers what we needed; it's just not doing what it's supposed to. But given that, we'll definitely look at other options.

Last night I did manage to improve things by adding a pre-route of the CORDIC carry chains that are proving difficult to route. The large project fails because the shorter routes are no longer available by the time the router gets round to those modules, so I figured that by routing them first we'd meet timing. So far so good. If it helps anyone else, here is what I did:

- Add the following command before the main route stage

route_design -nets [get_nets -hier -filter {NAME =~ *CORDIC_MODULE_NAME*gen_para_arch.gen_iteration[*]*i_lut6_addsub*}] -auto_delay

- Do your normal route but add -preserve, and if you're using the Explore directive, change to something else because it's incompatible with -preserve, e.g.

route_design -preserve -directive NoTimingRelaxation
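
For anyone wanting to try this outside the GUI, the two steps can be sketched as a non-project sequence like the one below. This is only a sketch under assumptions: the checkpoint and report file names are hypothetical, CORDIC_MODULE_NAME is the same placeholder as above and must be replaced with your own instance name, and NoTimingRelaxation is passed via -directive, which is how I understand that option is meant to be given.

```tcl
# Sketch only: file names are hypothetical, CORDIC_MODULE_NAME is a placeholder.
open_checkpoint ./post_place.dcp   ;# a placed but unrouted design

# Step 1: route the CORDIC add/sub carry-chain nets first, timing driven.
set cordic_nets [get_nets -hierarchical -filter \
    {NAME =~ *CORDIC_MODULE_NAME*gen_para_arch.gen_iteration[*]*i_lut6_addsub*}]

if {[llength $cordic_nets] == 0} {
    # Nothing matched, so the filter needs adjusting before this helps.
    puts "WARNING: net filter matched nothing, skipping CORDIC pre-route"
} else {
    route_design -nets $cordic_nets -auto_delay
}

# Step 2: route everything else while keeping the routes from step 1.
route_design -preserve -directive NoTimingRelaxation

# Check the result.
report_route_status
report_timing_summary -file ./post_route_timing.rpt
```

In project mode the first step would more likely live in a Tcl pre-hook script for the route_design step, with -preserve added to that step's options, but the ordering is the same.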

James


13 Replies
drjohnsmith
Teacher
941 Views
Registered: ‎07-09-2009

Your IP says
- xc7z045tffg676-2 Zynq target
but you mention xc7k480 K7

Which device are you looking at?
mistercoffee
Scholar
920 Views
Registered: ‎04-04-2014

@drjohnsmith wrote:

Your IP says
- xc7z045tffg676-2 Zynq target
but you mention xc7k480 K7

which device you looking at ?

The linked page does not have figures for every device. There are no standard Zynq devices listed, only UltraScale ones. Which figures do you think I should use? I figured the 480 was closest, but I dunno...

drjohnsmith
Teacher
911 Views
Registered: ‎07-09-2009
At the base level, you need to generate the IP for the same device that you want to use. As you quote both, do you want an IP for a Zynq or a Kintex?

mistercoffee
Scholar
909 Views
Registered: ‎04-04-2014

The IP IS set for the device I want to use, the Zynq 045 I listed above. I built and implemented my project for that part. It fails timing. I didn't expect this to happen because all of the performance figures quoted for the CORDIC for similar configurations and devices are far in excess of the speed I am trying to run at.

The Kintex part I quoted is simply the closest match to my device that I could find on that linked page. The page does not list my device, or in fact many, many other devices.

 

drjohnsmith
Teacher
890 Views
Registered: ‎07-09-2009
The data sheet with Fmax figures for that device is
https://www.xilinx.com/support/documentation/data_sheets/ds191-XC7Z030-XC7Z045-data-sheet.pdf

which doesn't help a lot, as that device should work at 250 MHz.

Obvious things first: are you registering the data into and out of the IP?

I've not seen an IP fail its required timing, and to be honest 250 MHz does not sound unreasonable.

I'd take a step back and try just the CORDIC in a blank design, with registers on all inputs and outputs, and see if that meets timing.
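
A blank-design check along those lines might look roughly like the following non-project script. Everything named here is an assumption for illustration: the .xci path, the wrapper file, the top-level name cordic_wrapper and its clk port are hypothetical, and the part string should be confirmed against what your Vivado install lists.

```tcl
# Sketch: implement only the CORDIC inside a wrapper that registers all I/O.
# File names, the top-level name and the clock port are hypothetical.
set part xc7z045ffg676-2   ;# assumed spelling; list candidates with: get_parts *7z045*
create_project -in_memory -part $part

read_ip      ./ip/cordic_0/cordic_0.xci   ;# hypothetical IP file
read_verilog ./src/cordic_wrapper.v       ;# hypothetical wrapper with I/O registers
generate_target all [get_ips]
synth_ip [get_ips]

# Out-of-context mode skips I/O buffers, so the wrapper registers are the endpoints.
synth_design -top cordic_wrapper -part $part -mode out_of_context
create_clock -name clk -period 4.000 [get_ports clk]   ;# 4 ns = 250 MHz

opt_design
place_design
route_design
report_timing_summary -file ./cordic_only_timing.rpt
```

If this meets timing comfortably on its own, the core itself is probably fine and the problem is more likely congestion or placement in the full design.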


mistercoffee
Scholar
880 Views
Registered: ‎04-04-2014

[Attached images: cording timing 1.png, cording timing 2.png]

OK, I may get round to creating a new project to prove this out. But if it helps in the meantime, here is a shot of the failing path and associated timing. It is a long logic chain, as you can see. All of it is entirely within the CORDIC core. If I expand the cone to leaf cells from the start/end FFs, all the attached logic is also within the core.

Also, looking at where the logic cells have been placed on the device shows they are all located close to each other, there are no long routing paths responsible for the failure.
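
The same logic-versus-routing breakdown can be pulled out in text form from a routed design, which is easier to share than screenshots. The sketch below prints the ten worst setup paths; the property names are the ones I believe timing path objects expose, so if one is rejected, `report_property [lindex $paths 0]` will show what your Vivado version offers.

```tcl
# Run with an implemented (routed) design open.
set paths [get_timing_paths -setup -max_paths 10 -nworst 1]

foreach p $paths {
    # Slack, number of logic levels, then the logic/net split of the data path.
    puts [format "slack %7.3f ns  levels %3d  logic %6.3f ns  net %6.3f ns  -> %s" \
        [get_property SLACK $p] \
        [get_property LOGIC_LEVELS $p] \
        [get_property DATAPATH_LOGIC_DELAY $p] \
        [get_property DATAPATH_NET_DELAY $p] \
        [get_property ENDPOINT_PIN $p]]
}
```

A path dominated by logic delay points at the carry chain itself, while a large net delay share points at placement or routing congestion around it.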

So, I'm not entirely sure adding registering on the in and out would help.

 

drjohnsmith
Teacher
874 Views
Registered: ‎07-09-2009
I wonder why it's using a ripple carry adder?
Sorry, I'm on my phone, so it's hard to see the detail.

mistercoffee
Scholar
400 Views
Registered: ‎04-04-2014

Ok, I know this is old but it never got resolved and it is now causing us some real issues so I would like to try and find a solution or a workaround.

To recap:

- I am using a CORDIC as per the parameters in the OP, but have now made it NonBlocking and removed the reset, to reduce complexity and help timing.

- I have added multiple stages of registering directly either side of the IP in my design.

- The long carry chain inside the IP is occasionally causing timing to fail in the larger project that contains it.

- A blank design with just the CORDIC IP in it obviously fails much less often, but any realistic design seems to fail due to slightly longer routing paths.

- I have reviewed the rest of the design and added much more timing slack (rewritten RTL, different directives, appropriately slackened constraints). It helps a lot, but we still get failures, and always with this IP carry chain.

- I am using the IP in a Zynq-7000 045 device and clocking it at 250 MHz. The Xilinx website lists expected speeds achievable with the core, and we are well inside these figures for our particular configuration.

 

So, what can we do? We are using Vivado 2017.2, which although old is probably not responsible for or hindering the timing, and upgrading likely won't help (if it would, please say why; we need a genuine reason to expect an improvement before my company will pay for an upgrade).

Is there anything we can do to the synthesis/implementation strategies to help? I tried making the IP Global instead of OOC and setting the synth strategy to "Fewer Carry Chains" as a wild shot in the dark, but it didn't help.

I really don't want to, but I suppose we could lock down the IP to specific sites/routes in P&R. That's really fiddly, though, and it's not something I'd often consider. But... I am running out of ideas.
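
A lighter-weight alternative to locking individual sites or routes is to confine the core to a Pblock so its carry chains stay in adjacent slice columns. The sketch below is only an illustration: the Pblock name, the instance path and the slice range are all hypothetical and would need to be chosen from the device view for the real design. Note this constrains placement only, not routing.

```tcl
# Hypothetical Pblock name, instance path and slice range.
create_pblock pblock_cordic

# Assign the whole CORDIC instance (placeholder hierarchy path) to the Pblock.
add_cells_to_pblock [get_pblocks pblock_cordic] \
    [get_cells u_dsp/CORDIC_MODULE_NAME]

# Give the Pblock a rectangle of slices large enough for the core plus margin.
resize_pblock [get_pblocks pblock_cordic] -add {SLICE_X0Y0:SLICE_X23Y49}
```

These commands can go in an XDC file used for implementation, so the constraint applies on every run rather than being a one-off manual fix.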

Can someone please chip in with some thoughts?

mistercoffee
Scholar
400 Views
Registered: ‎04-04-2014

Also, it seems I'm not the only one to find timing issues with this core.

https://forums.xilinx.com/t5/Timing-Analysis/Vivado-2017-2-Cordic-6-0-Timing-problem/m-p/811453

 

@guillermocj did you manage to find a solution in the end?

mistercoffee
Scholar
298 Views
Registered: ‎04-04-2014

Anyone? It seems to me the IP is not producing logic with enough margin to meet the stated speed. Is this something I can escalate to the superuser group, then?

dgisselq
Scholar
281 Views
Registered: ‎05-21-2015

@mistercoffee ,

My go-to answer to problems like this is to switch to an open source IP of some type. There are plenty of open source CORDIC implementations out there that you can use, this one for example. Running "gencordic -t p2r -p 23 -o 29" should generate the CORDIC you are looking for.

Personally, I'd say that if you need that kind of output width, a CORDIC is not the solution you want. You really want a quadratic lookup table (same program, gencordic -t qtbl -i 23 -p 23 -o 27). That'll get you close to the accuracy you are looking for (27 bits vs 29 bits) with about half the logic, less than half the latency, and no nasty CORDIC scale factor, but (sadly) only one of the two legs.

Either of those two approaches will at least get you to the point where you have source code in front of you that you can use to debug the problem.

Dan

mistercoffee
Scholar
219 Views
Registered: ‎04-04-2014

Hi @dgisselq. I have just realised that although this post was started when I was adding a CORDIC to generate sin/cos, the modules we are currently having timing trouble with are not this particular CORDIC, but two other types. One is a translate function (to do a linear mag conversion from complex to magnitude) and the other an Arc Tanh (to convert linear mag to dB mag). Your GitHub repositories and website are really good. It looks like you have similar generators for these types of CORDIC, so I will have a look there in case we have further trouble with our timing. At the minute the pre-route solution I found last night is working well, but this may change in future...

Thanks

James