Registered: ‎07-02-2019

CORDIC 6.0 SQRT - wrong results

Hi,

I want to calculate a square root with the CORDIC 6.0 core. The configuration is unsigned integer, the input width is 33 bits (resulting in 40 bits and 24 bits output), and the rounding mode is truncate. The behavioural simulation runs correctly, but a timing simulation shows something strange - I'll attach a picture of the simulation results (only the upper 5 signals are of interest here):

The input value is X"8000", but at output valid the result is X"00B4" instead of the expected X"00B5", which appears one clock cycle later. Of the values I tested, this happens only for that single input value. Moreover, this X"8000" was also processed correctly at least once, giving the expected X"00B5".
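As a cross-check of the expected value, a minimal software reference can help. This sketch assumes that "truncate" on an unsigned integer result means the floor of the exact square root; the helper name `expected_sqrt` is made up for this illustration and is not part of any Xilinx tool.

```python
import math

def expected_sqrt(value: int) -> int:
    """Floor of the exact square root (assumed to match 'truncate' rounding)."""
    return math.isqrt(value)

x = 0x8000                    # 32768
root = expected_sqrt(x)
print(f"{root:04X}")          # prints 00B5 (181, since 181**2 = 32761 <= 32768 < 182**2)
print(f"{root - 1:04X}")      # prints 00B4, the value seen at output valid
```

So the value seen at output valid is exactly one LSB below the truncated result.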

What happens here? I don't think this is a timing problem (the timing report is OK); the signals coming out of the CORDIC are simply wrong. What can I do to avoid this behaviour? Might something similar also happen for higher bits, not only for the LSB?

cordic-sqrt-cycle_b4-vs-b5.png
7 Replies
Registered: ‎03-27-2014

axel.schmidt@htw-dresden.de,

in this exact setup, can you count the number of clock cycles between the input 'valid' and the output 'valid' for each of these cases?

  1. between 0x8000 (in) and 0x00B4 (out - almost correct)
  2. between 0x8000 (in) and 0x00B5 (out - now corrected)
  3. any other input (like 0xABCD) that you previously tested and its valid result (that you already verified)

From what I see, in this example that would be 17 clock cycles; but I need to be sure. I'm looking for any latency variation between correct and faulty results.
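One way to look for latency variation is to export the two valid signals from the simulator as one sample per clock cycle and pair the pulses in order. The sketch below assumes an in-order, fixed-latency pipeline; the function name `valid_latencies` and the trace are hypothetical.

```python
def valid_latencies(valid_in, valid_out):
    """Pair the n-th input-valid cycle with the n-th output-valid cycle
    and return the distance in clock cycles for each pair."""
    in_cycles = [n for n, v in enumerate(valid_in) if v]
    out_cycles = [n for n, v in enumerate(valid_out) if v]
    if len(in_cycles) != len(out_cycles):
        raise ValueError("unequal number of input and output valid pulses")
    return [o - i for i, o in zip(in_cycles, out_cycles)]

# Hypothetical trace: inputs accepted at cycles 2 and 30, results 17 cycles later.
valid_in = [0] * 60
valid_out = [0] * 60
for start in (2, 30):
    valid_in[start] = 1
    valid_out[start + 17] = 1

print(valid_latencies(valid_in, valid_out))   # prints [17, 17]
```

Any entry that differs from the others would point at a latency change between correct and faulty results.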

I also see that your signals change on the falling edge of the clock. I have never driven a Xilinx IP like that; I usually have both valid and data change on the rising edge.

Can you also quickly try it that way and see if the behaviour on the output side changes? What I really don't like is that the output data is somehow corrected (1 LSB is corrected) but the corrected value is not qualified by a valid. Maybe this will do the trick.

gw.
Embedded Systems, DSP, cyber
Registered: ‎07-02-2019

Hi guillaumebres,

thanks for your answer. Regarding your questions:

- The number of clock cycles between input valid and output valid is always as shown. Those 17 clock cycles are also what the generator reports when creating the component.

- The signals already change in a synchronous process on the rising edge of clk_200. I'll attach the behavioural simulation, showing the same calculation cycle. I control the whole process (there is some more around it) by means of a state machine, which is shown in the behavioural simulation as well. In s13 and s14 I do some processing (multiplication, addition), just so that not all of these things happen in one clock cycle. In s15 I set the input data and the input valid signal, in s16 I reset the input valid. What the timing simulation shows is what really happens, but there is sufficient setup time for the signals and the timing report doesn't mention any errors, i.e. from my point of view it should run.

- Currently, these are test data, just for verification. The test data run in a circle, 8 different values repeating (8 comes from the outside process; the other data are 0x8820, 0x9080, 0x9920, 0xa200, 0xab20, 0xb480, 0xbe20). The strange thing is that the other tested values are processed correctly, also in cases where there is an LSB change. And in the first test data cycle, the value in question, 0x8000, is also processed correctly - but this is the very first value, when the CORDIC comes out of the reset state.
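With a repeating vector set like this, a small scoreboard makes it easier to see which pass and which value goes wrong. The sketch below assumes the cycle consists of 0x8000 plus the seven values listed above, and that the expected output is the floor of the square root; `check_results` and the captured data are hypothetical.

```python
import math

# Assumed vector cycle: 0x8000 plus the seven other values listed above.
TEST_VECTORS = [0x8000, 0x8820, 0x9080, 0x9920, 0xA200, 0xAB20, 0xB480, 0xBE20]

def check_results(observed):
    """observed: (input_value, output_value) pairs sampled at output valid.
    Returns (index, input, got, expected) for every mismatch."""
    mismatches = []
    for index, (value, got) in enumerate(observed):
        expected = math.isqrt(value)
        if got != expected:
            mismatches.append((index, value, got, expected))
    return mismatches

# Hypothetical capture: first pass correct, 0x8000 one LSB low in the second pass.
first_pass = [(v, math.isqrt(v)) for v in TEST_VECTORS]
second_pass = [(0x8000, 0x00B4)] + first_pass[1:]

for index, value, got, expected in check_results(first_pass + second_pass):
    print(f"sample {index}: in=0x{value:04X} got=0x{got:04X} expected=0x{expected:04X}")
# prints: sample 8: in=0x8000 got=0x00B4 expected=0x00B5
```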

Do you have any idea?

Axel

cordic-sqrt_behav_ok.png
Registered: ‎07-02-2019

Hi guillaumebres,

maybe I solved it, but it is still a little strange.

Report_methodology includes several warnings, stating:

"Asynchronous driver check
DSP xxx input pin xxx/A[11] is connected to registers with an asynchronous reset. This is preventing the possibility of merging these
registers in to the DSP Block since the DSP block registers only possess synchronous reset capability. It is suggested to recode or change these registers
to remove the reset or use a synchronous reset to get the best optimization for performance, power and area."

The input register of the CORDIC SQRT is a few synchronously clocked registers away from the registers with the reset, and there is also a statement "Related violations: <none>" - that's why one might think it should run anyway, as long as I don't need the highest possible performance. Furthermore, as stated in my first message, there is no timing error reported, and the error didn't occur for all values but only for "special" ones.

After I removed the asynchronous reset from the registers in question, report_methodology no longer shows the warning and the post_implementation timing simulation is ok now. I hope, this can be verified by further tests I will run, otherwise I'll be back here.

Best regards

Axel

Registered: ‎03-27-2014

axel.schmidt@htw-dresden.de,

I'm glad you were able to figure this out; that was not easy.


axel.schmidt@htw-dresden.de wrote:

the post_implementation timing simulation is ok now. I hope, this can be verified by further tests I will run, otherwise I'll be back here.


I assume everything works as you want now, so you're not getting this weird one-clock-cycle-long faulty result anymore?

I would not have thought of a missed DSP slice optimization. Indeed, the CORDIC is by itself a bunch of accumulators and subtractors. Obviously this can be implemented by combining DSP slices (for optimized throughput), but that's not the natural option. The Xilinx core probably does it on its own.

One problem I have with Xilinx DSP cores is that you quite often end up with DSP slices that are not fully optimized (missing pipeline registers, etc.) and you cannot do anything about it.

Your problem actually demonstrates how important post-implementation simulation is in some cases.

IMO, the problem with Vivado (still true to this day, 2020.x) is that you get so many errors/warnings that don't affect anything (especially if you add a 'Zynq' interface) that, combined with a non-existent history manager, it gets very hard to find a message like the one you pointed out.

gw.
Embedded Systems, DSP, cyber
Registered: ‎07-02-2019

Hi Guillaumebres,

I'll keep an eye on that issue in the future, but currently I think it works well. And yes, the design seems not fully optimised: there is a message telling me about missing pipelining, although there is pipelining in my design. For what I want to do this is not important; my speed limitations don't come from the calculation speed, so it doesn't matter.

You're right; in my opinion, too, there are too many warnings generated by Vivado, and most of them don't really influence the result. And obviously there are some tricky points in the classification of those warnings, i.e. which ones really can cause problems and will do so in the concrete design. But I also understand that the whole process from VHDL/Verilog to an implementation for a given FPGA is an extremely complex one. Therefore, as a common saying tells us, one will always find only the next-to-last error.

Thanks for your support!

BR

Axel

Registered: ‎07-02-2019

Hi again,

unfortunately, the problem is not solved.

I continued to work on my design, adding some simple things to control my program. The SQRT part was not touched at all. After compilation and post-implementation timing simulation I again found an incorrectly working CORDIC SQRT. This time the effect was even worse than before. Sometimes there were wrong values, and at least once (within a sequence of several values) there was a new input value including the valid signal, and at the output there was a valid signal but no result change at all. I had a first look into the reports and I will do so again after this post, but up to now I couldn't find any hint regarding the CORDIC component.

Maybe it would be better to skip it and deal with the numbers without SQRT.

BR

Axel

Registered: ‎03-27-2014

axel.schmidt@htw-dresden.de ,

wow, sorry to hear that.

I think you are better off writing the SQRT function yourself at this point. I know it's probably not your initial plan, but you will be surprised: once you understand the algorithm, it's far from complex. Here is what I suggest:

  • check out the "opencores" website; they have a bunch of CORDIC cores you can try right out of the box. Just replace the function in your current simulation, maybe that will get you running almost instantly
  • if you do it yourself: forget about DSP slices and describe accumulations and shifting the usual way; this only becomes limiting at clock rates of around 150 to 200 MHz
  • determine the accuracy your application requires on the SQRT output and fix the number of CORDIC iterations / stages accordingly (do not bother trying to make it generic at this point because it's pointless, you will see that later)
  • best CORDIC related documentations I know:

you are obviously interested in the magnitude/hypotenuse related mode of operation
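To give a feeling for how little logic a shift-and-accumulate square root needs, here is a bit-accurate software sketch. Note that it is a digit-by-digit (shift-and-subtract) integer square root, a different technique from CORDIC and not what the Xilinx core does internally; it is only meant as a model of the "accumulations and shifting" style, with a fixed number of iterations that could each map to one pipeline stage. The name `isqrt_shift_sub` and the `in_bits` parameter are assumptions made for this example.

```python
import math

def isqrt_shift_sub(value: int, in_bits: int = 16) -> int:
    """Floor square root using only shifts, additions, subtractions and compares.
    Runs a fixed ceil(in_bits / 2) iterations, one result bit per iteration."""
    if value < 0 or value >= (1 << in_bits):
        raise ValueError("value does not fit in in_bits")
    stages = (in_bits + 1) // 2
    root = 0
    rem = value
    bit = 1 << (2 * (stages - 1))      # highest power of four covered by in_bits
    for _ in range(stages):
        trial = root + bit
        if rem >= trial:               # this result bit is 1
            rem -= trial
            root = (root >> 1) + bit
        else:                          # this result bit is 0
            root >>= 1
        bit >>= 2
    return root

print(hex(isqrt_shift_sub(0x8000)))    # prints 0xb5
```

Comparing such a model against `math.isqrt` over the whole input range is a cheap way to fix the number of stages before writing any HDL.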

gw.
Embedded Systems, DSP, cyber