07-18-2019 07:54 AM
I'll start off by pointing out that I have seen this page:
https://www.xilinx.com/support/documentation/ip_documentation/ru/cordic.html#kintex7
I have a CORDIC set up as follows:
- Sin/Cos
- Parallel Architecture
- Max Pipelining
- Scaled Radian mode
- Input Width 23 bits
- Output Width 29 bits
- Round mode Nearest Even
- Auto Iterations/Precision
- Blocking Flow Control
- Optimized for Performance
- Clock Speed 250MHz
- xc7z045tffg676-2 Zynq target
As you may have guessed, it fails timing by nearly 1 ns due to a long carry chain within the IP core that I can't touch. The closest device in the above link is an xc7k480 K7, and that suggests I should be able to achieve 250MHz easily.
So, what should I expect? I don't see how, but can I speed this up? The PG says absolutely nothing about performance targets.
Thanks
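To put the numbers above in perspective, here is a minimal plain-Tcl sketch of the timing budget. The slack value is an assumption: the post only says "nearly 1ns", so -1.0 ns stands in for the real worst negative slack.

```tcl
# Assumed figure: the post says "nearly 1ns", so -1.0 ns is a stand-in.
set clk_mhz   250.0
set wns_ns    -1.0                           ;# hypothetical worst negative slack
set period_ns [expr {1000.0 / $clk_mhz}]     ;# clock period in ns
set path_ns   [expr {$period_ns - $wns_ns}]  ;# delay the failing path needs
set fmax_mhz  [expr {1000.0 / $path_ns}]     ;# clock rate that path could support
puts [format "period %.2f ns, path %.2f ns, Fmax %.1f MHz" $period_ns $path_ns $fmax_mhz]
```

With those assumed figures the failing path needs about 5 ns against a 4 ns period, which corresponds to roughly 200 MHz rather than the 250 MHz target.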
07-18-2019 10:50 AM
07-19-2019 01:25 AM
@drjohnsmith wrote:
Your IP says
- xc7z045tffg676-2 Zynq target
but you mention xc7k480 K7
which device you looking at ?
The linked page does not have figures for every device. There are no standard Zynq devices listed, only ultrascale. Which figures do you think I should use? I figured the 480 was closest but I dunno...
07-19-2019 02:16 AM
07-19-2019 02:21 AM - edited 07-19-2019 02:21 AM
The IP IS set for the device I want to use, the Zynq 045 I listed above. I built and implemented my project for that part. It fails timing. I didn't expect this to happen because all of the performance speeds quoted for the CORDIC for similar configurations and devices are far in excess of the speed I am trying to run at.
The Kintex part I quoted is simply the closest match to my device that I could find on that linked page. The page does not list my device, or in fact many other devices.
07-19-2019 06:09 AM
07-19-2019 06:42 AM - edited 07-19-2019 06:44 AM
Ok, I may get round to creating a new project to prove this out. But if it helps in the meantime, here is a shot of the failing path and associated timing. It is a long logic chain, as you can see. All of it is entirely within the CORDIC core. If I expand the cone to leaf cells from the start/end FFs, all the attached logic is also within the core.
Also, looking at where the logic cells have been placed on the device shows they are all located close to each other, there are no long routing paths responsible for the failure.
So, I'm not entirely sure adding registering on the in and out would help.
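For anyone wanting to check the same thing numerically rather than from the device view, a rough sketch for the Vivado Tcl console follows. It lists the failing setup paths and splits each one into logic delay and net delay. The property names are as I recall them from `report_property` on a timing path, so treat them as an assumption and check them in your Vivado version.

```tcl
# Sketch only: run in the Vivado Tcl console with the routed design open.
# Collect up to 10 failing setup paths (slack below 0 ns).
set paths [get_timing_paths -setup -max_paths 10 -slack_lesser_than 0]

foreach p $paths {
    # Property names are assumed; confirm with: report_property [lindex $paths 0]
    set slack  [get_property SLACK $p]
    set levels [get_property LOGIC_LEVELS $p]
    set logic  [get_property DATAPATH_LOGIC_DELAY $p]
    set net    [get_property DATAPATH_NET_DELAY $p]
    set endpin [get_property ENDPOINT_PIN $p]
    puts "slack=$slack levels=$levels logic=${logic}ns net=${net}ns end=$endpin"
}
```

If the logic delay alone is close to the clock period, extra registers outside the core are unlikely to help, because the start and end flip-flops are both inside the CORDIC.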
07-19-2019 07:00 AM
02-14-2021 10:38 AM - edited 02-15-2021 01:27 AM
Ok, I know this is old but it never got resolved and it is now causing us some real issues so I would like to try and find a solution or a workaround.
To recap:
- I am using a cordic as per the params in the OP but have now made it NonBlocking and removed the reset, to reduce complexity and help timing.
- I have added multiple stages of registering directly either side of the IP in my design.
- The long carry chain inside the IP occasionally causes the larger project that contains it to fail timing.
- A blank design with just the cordic IP in obviously fails much less often, but any realistic design seems to fail due to slightly longer routing paths.
- I have reviewed the rest of the design and added a lot more timing slack (rewritten RTL, different directives, appropriately slackened constraints). It helps a lot, but we still get failures, and always with this IP carry chain.
- I am using the IP in a Zynq-7000 045 device and clocking at 250MHz. The Xilinx website lists expected speeds achievable with the core and we are well inside these figures for our particular configuration.
So, what can we do? We are using Vivado 2017.2, which although old is probably not what is hindering the timing, and upgrading likely won't help (if you think it would, please say why; we need a genuine reason that it will help before my company will pay for an upgrade).
Is there anything we can do to the synthesis/implementation strategies to help? I tried making the IP Global instead of OOC and setting the synth strategy to "Fewer Carry Chains" as a wild shot in the dark, but it didn't help.
I really don't want to but I suppose we could lock down the IP to specific sites/routes in P&R, but that's really fiddly and it's not something I'd often consider. But.... I am running out of ideas.
Can someone please chip in with some thoughts?
02-14-2021 10:41 AM
Also, it seems I'm not the only one to find issues with the timing of this core.
https://forums.xilinx.com/t5/Timing-Analysis/Vivado-2017-2-Cordic-6-0-Timing-problem/m-p/811453
@guillermocj did you manage to find a solution in the end?
02-17-2021 07:59 AM
Anyone? It seems to me the IP is not producing logic with enough margin to meet the stated speed. Is this something I can escalate to the superuser group then?
02-17-2021 12:03 PM
My go-to answer to problems like this is to switch to an open source IP of some type. There exist plenty of open source CORDIC implementations out there that you can use--this one for example. Running "gencordic -t p2r -p 23 -o 29" should generate the CORDIC you are looking for.
Personally, if you need that kind of output width, a CORDIC is not the solution you would want. You really want a quadratic lookup table. (Same program, gencordic -t qtbl -i 23 -p 23 -o 27) That'll get you close to the accuracy you are looking for (27 bits vs 29 bits) with about half the logic, less than half the latency, no nasty CORDIC scale factor, but (sadly) only one of the two legs.
Either of those two approaches will at least get you to the point where you have source code in front of you that you can use to debug the problem.
Dan
02-18-2021 12:52 AM - edited 02-18-2021 12:56 AM
Thanks for the tips, that's very helpful. We hadn't considered an open source CORDIC engine or an alternative solution because, to be frank, the Xilinx IP solution offers what we needed; it's just not doing what it's supposed to. But given that, we'll definitely look at other options.
Last night I did manage to improve things by adding a pre-route of the CORDIC carry chains that are proving difficult to route. The large project fails because the shorter routes aren't available once the router gets round to those modules, so I figured that by routing them first we'd meet timing. So far so good. If it helps anyone else, here is what I did:
- Add the following command before the main route stage
route_design -nets [get_nets -hier -filter {NAME =~ *CORDIC_MODULE_NAME*gen_para_arch.gen_iteration[*]*i_lut6_addsub*}] -auto_delay
- Do your normal route but add -preserve, and if you're using the Explore directive, change to something else because it's incompatible with -preserve, e.g.
route_design -directive NoTimingRelaxation -preserve
James
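Put together, the two steps above might look something like this in a non-project flow. This is a sketch, not the exact script from our build: the checkpoint and report file names are made up, CORDIC_MODULE_NAME is a placeholder for your own instance name, and the directive is written as `-directive NoTimingRelaxation`, which is how I understand route_design takes directives.

```tcl
# Sketch only. File names are hypothetical; start from a placed checkpoint.
open_checkpoint post_place.dcp

# 1. Route only the CORDIC add/sub carry-chain nets first.
#    CORDIC_MODULE_NAME is a placeholder for your instance name.
set cordic_nets [get_nets -hier -filter {NAME =~ *CORDIC_MODULE_NAME*gen_para_arch.gen_iteration[*]*i_lut6_addsub*}]
if {[llength $cordic_nets] == 0} {
    error "net filter matched nothing; check the CORDIC instance name"
}
route_design -nets $cordic_nets -auto_delay

# 2. Route everything else while keeping the routes from step 1.
route_design -directive NoTimingRelaxation -preserve

# 3. Check the result and save it.
report_timing_summary -file post_route_timing.rpt
write_checkpoint -force post_route.dcp
```

The empty-list check is there because a filter that silently matches nothing would leave the pre-route step doing no work. In project mode, the first route_design call could probably go in a tcl.pre hook script for the route step instead.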
02-18-2021 07:27 AM - edited 02-18-2021 07:27 AM
Hi @dgisselq. I have just realised that although this post was started when I was adding a CORDIC to generate sin/cos, the modules we are currently having timing trouble with are not this particular CORDIC, but two other types. One is a translate function (to do a linear mag conversion from complex to magnitude) and the other an Arc Tanh (to convert linear mag to dB mag). Your GitHub repositories and website are really good. It looks like you have similar generators for these types of CORDIC, so I will have a look there in case we have trouble with our timing. At the minute the pre-route solution I found last night is working well, but this may change in future...
Thanks
James