08-05-2020 03:38 AM
I want to calculate a square root (SQRT) with the CORDIC 6.0 core. The configuration is unsigned integer, input width 33 bits (resulting in 40-bit input and 24-bit output data), rounding mode truncate. The behavioural simulation runs correctly, but a timing simulation shows something strange. I'll attach a picture of the simulation results (only the upper 5 signals are of interest here):
The input value is X"8000", but when the output valid is asserted the result is X"00B4" instead of the expected X"00B5"; X"00B5" only appears one clock later. Of the values I tested, this happens only for this single input value. Moreover, this X"8000" was processed correctly at least once, giving the expected X"00B5".
What is happening here? I don't think this is a timing problem (the timing report is OK); the signals out of the CORDIC are simply wrong. What can I do to avoid this behaviour? Could something similar also happen for higher bits, not only for the LSB?
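A quick software reference confirms which of the two values is correct. This minimal sketch assumes that truncate rounding should yield floor(sqrt(x)), which is exactly what Python's `math.isqrt` computes (Python 3.8+):

```python
import math

x = 0x8000                      # input value from the simulation
r = math.isqrt(x)               # floor(sqrt(x)), assumed to match "truncate" mode
print(f"sqrt({x:#06x}) -> {r:#06x}")   # prints: sqrt(0x8000) -> 0x00b5

# 0xB5 = 181 is the correct truncated result because 181**2 = 32761 <= 32768 < 182**2,
# so 0xB4 = 180 is exactly 1 LSB too low.
assert r * r <= x < (r + 1) ** 2
```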
08-05-2020 09:26 AM - edited 08-05-2020 09:34 AM
In this exact setup, can you count the number of clock cycles between the input 'valid' and the output 'valid' and tell us?
From what I see, in this example that would be 17 clock cycles, but I need to be sure. I'm looking for any latency variation between correct and faulty results.
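Counting cycles by hand gets error-prone once a trace contains many transactions. A minimal sketch, assuming the valid signals have been exported from the simulator as one 0/1 sample per rising clock edge (the export format and the `latencies` helper are hypothetical): pair each input valid with the next output valid in order and report the latency of every transaction.

```python
from collections import deque


def latencies(in_valid, out_valid):
    """Return the latency (in clock cycles) of each transaction.

    in_valid / out_valid: sequences with one 0/1 sample per rising clock edge.
    Inputs are matched to outputs in order (FIFO), as for an in-order pipeline.
    Output valids with no outstanding input are ignored in this sketch.
    """
    pending = deque()            # cycle numbers of inputs still waiting for a result
    result = []
    for cycle, (vi, vo) in enumerate(zip(in_valid, out_valid)):
        if vo and pending:       # oldest outstanding input completes now
            result.append(cycle - pending.popleft())
        if vi:
            pending.append(cycle)
    return result


# Two transactions, each completing 17 cycles after its input valid.
in_v = [0] * 40
out_v = [0] * 40
in_v[2] = in_v[10] = 1
out_v[19] = out_v[27] = 1
print(latencies(in_v, out_v))    # prints: [17, 17]
```

With a constant-latency core every entry should be 17; any other value would point to exactly the kind of variation being asked about here.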
I also see that your signals change on the falling edge of the clock. I have never driven a Xilinx IP like that; I usually have both valid and data change on the rising edge.
Can you quickly try it that way and see whether the behaviour on the output side changes? What I really don't like is that the output data is somehow corrected afterwards (by 1 LSB) without being qualified by a valid. Maybe this will do the trick.
08-11-2020 01:59 AM
Thanks for your answer. Regarding your questions:
- The number of clock cycles between input valid and output valid is always as shown: 17 clock cycles, which is also the latency given by the generator when creating the component.
- The signals are already driven from a synchronous process on the rising edge of clk_200. I'll attach the behavioural simulation showing the same calculation cycle. I control the whole process (there is more logic around it) with a state machine, which is shown in the behavioural simulation as well. In s13 and s14 I do some processing (multiplication, addition), just to avoid doing all of it in one clock cycle. In s15 I set the input data and the input valid signal; in s16 I reset the input valid. The timing simulation shows what really happens in hardware, including the signal delays, but there is sufficient setup time for the signals and the timing report doesn't mention any errors, so from my point of view it should work.
- Currently I am feeding test data, just for verification. The test data run in a loop of 8 different values (8 comes from the outside process; the other data are 0x8820, 0x9080, 0x9920, 0xa200, 0xab20, 0xb480, 0xbe20). The strange thing is that the other tested values are processed correctly, even where the LSB changes. And in the first test data cycle, the value in question, 0x8000, is also processed correctly, but that is the very first value, with the CORDIC just coming out of reset.
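Since only some values fail, it helps to have the expected results for the whole test cycle at hand. A minimal sketch, again assuming truncate mode means floor(sqrt(x)); the list order (starting with 0x8000) and the `check` helper are illustrative assumptions, not part of the original design:

```python
import math

# The eight test inputs from the post (order assumed, 0x8000 first).
TEST_INPUTS = [0x8000, 0x8820, 0x9080, 0x9920, 0xA200, 0xAB20, 0xB480, 0xBE20]


def check(inputs, outputs):
    """Return (index, input, got, expected) for every output that differs
    from the truncated square root of its input."""
    return [(k, x, got, math.isqrt(x))
            for k, (x, got) in enumerate(zip(inputs, outputs))
            if got != math.isqrt(x)]


# Expected results for one test cycle.
for x in TEST_INPUTS:
    print(f"{x:#06x} -> {math.isqrt(x):#06x}")

# Replaying the faulty capture: 0x8000 came out as 0x00B4.
observed = [math.isqrt(x) for x in TEST_INPUTS]
observed[0] = 0xB4
print(check(TEST_INPUTS, observed))   # prints: [(0, 32768, 180, 181)]
```

Note that sqrt(0x8000) is about 181.02, very close to an integer, so this is a value where even a small internal error would show up as a 1-LSB difference after truncation.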
Do you have any idea?
08-17-2020 07:08 AM
Maybe I have solved it, but it is still a little strange.
report_methodology includes several warnings stating:
"Asynchronous driver check
DSP xxx input pin xxx/A is connected to registers with an asynchronous reset. This is preventing the possibility of merging these
registers in to the DSP Block since the DSP block registers only possess synchronous reset capability. It is suggested to recode or change these registers
to remove the reset or use a synchronous reset to get the best optimization for performance, power and area."
The input register of the CORDIC SQRT is a few synchronously clocked register stages away from the registers with the reset, and the report also states "Related violations: <none>". That's why one might think it should work anyway, as long as I don't need the highest possible performance. Furthermore, as stated in my first message, no timing error is reported, and the error didn't occur for all values, only for "special" ones.
After I removed the asynchronous reset from the registers in question, report_methodology no longer shows the warning and the post-implementation timing simulation is OK now. I hope further tests will confirm this; otherwise I'll be back here.
08-17-2020 07:44 AM
I'm glad you were able to figure this out; that was not easy.
"the post_implementation timing simulation is ok now. I hope, this can be verified by further tests I will run, otherwise I'll be back here."
I assume everything works as you want now, so you're no longer getting this weird faulty result lasting one clock cycle?
I would not have thought of a lack of DSP slice optimization. Indeed, the CORDIC is by itself a bunch of accumulators and subtractors. Obviously this can be implemented by combining DSP slices (for optimized throughput), but that's not the natural option. The Xilinx core probably makes that choice on its own.
One problem I have with Xilinx DSP cores is that you quite often end up with DSP slices that are not fully optimized (missing pipelining registers, etc.) and you cannot do anything about it.
Your problem actually demonstrates how important post-implementation simulation can be in some cases.
IMO, the problem with Vivado (still true to this day, in 2020.x) is that you get so many errors/warnings that don't mean anything (especially if you add a 'Zynq' interface). Combined with a non-existent history manager, it gets very hard to find an error message like the one you pointed out.
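To illustrate that a CORDIC really is just adders, subtractors and shifts, here is a floating-point Python model of the textbook hyperbolic CORDIC square root: vectoring mode, starting from x = v + 1/4 and y = v - 1/4 so that x² - y² = v. It is a sketch of the general technique, not a description of how the CORDIC 6.0 core is built internally; the function names and the iteration count are illustrative.

```python
import math


def hyperbolic_schedule(n):
    """Shift indices 1, 2, 3, 4, 4, 5, ..., 13, 13, ... (indices 4, 13, 40, ... repeat)."""
    seq, i, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if i == repeat and len(seq) < n:
            seq.append(i)              # repeated index needed for hyperbolic convergence
            repeat = 3 * repeat + 1
        i += 1
    return seq


def cordic_sqrt(v, n=24):
    """sqrt(v) for roughly 0.03 <= v <= 2 via hyperbolic CORDIC vectoring."""
    x, y = v + 0.25, v - 0.25          # x^2 - y^2 == v
    gain = 1.0
    for i in hyperbolic_schedule(n):
        t = 2.0 ** -i                  # in hardware: an arithmetic right shift by i
        d = 1.0 if y < 0 else -1.0     # rotate so that y is driven toward 0
        x, y = x + d * y * t, y + d * x * t
        gain *= math.sqrt(1.0 - t * t) # each step scales x^2 - y^2 by (1 - t^2)
    return x / gain                    # y ~ 0 now, so x ~ gain * sqrt(v)


print(cordic_sqrt(2.0), math.sqrt(2.0))
```

Each loop iteration maps to one pipeline stage with two add/subtract units and two fixed shifts, and the gain is a constant that can be folded into a final multiplication. Inputs outside the convergence range are usually pre-scaled by an even power of two, since sqrt(4^k * v) = 2^k * sqrt(v), which in hardware is just a shift.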
08-18-2020 11:53 PM
I'll keep an eye on that issue in the future, but currently I think it works well. And yes, the design does not seem to be fully optimised: there is a message telling me about missing pipelining, even though my design does have pipelining. For what I want to do this is not important; my speed limitations don't come from the calculation speed, so it doesn't matter.
You're right; in my opinion too, Vivado generates too many warnings, and most of them don't really influence the result. And obviously the tricky part is classifying those warnings, i.e. which ones really can cause problems and will do so in the concrete design. But I also understand that the whole process from VHDL/Verilog to an implementation on a given FPGA is an extremely complex one. Therefore, as the common saying goes, one will only ever find the next-to-last error.
Thanks for your support!
08-20-2020 03:59 AM
Unfortunately, the problem is not solved.
I continued to work on my design, adding some simple features to control my program. The SQRT part was not touched at all. After compilation and post-implementation timing simulation I again found the CORDIC SQRT working incorrectly. This time the effect was even worse than before. Sometimes there were wrong values, and at least once (within a sequence of several values) a new input value was applied together with its valid signal, and at the output there was a valid signal but no change in the result at all. I had a first look at the reports and will do so again after this post, but so far I couldn't find any hint regarding the CORDIC component.
Maybe it would be better to skip it and deal with the numbers without SQRT.
08-20-2020 05:17 AM - edited 08-20-2020 05:23 AM
Wow, sorry to hear that.
I think you are better off writing the SQRT function yourself at this point. I know it's probably not your initial plan, but you will be surprised: once you understand the algorithm, it's far from complex. Here is what I suggest:
You are obviously interested in the magnitude/hypotenuse-related mode of operation
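Assuming the quantity being square-rooted is really a magnitude sqrt(x² + y²) (the multiply/add steps in s13/s14 hint at this, but the thread does not confirm it), a circular CORDIC in vectoring mode computes it directly with shifts and adds. The Python model below is a minimal fixed-point sketch, not the Xilinx implementation; `cordic_magnitude`, `GUARD_BITS` and `ITERATIONS` are illustrative choices.

```python
import math

GUARD_BITS = 8    # extra fractional bits to absorb shift truncation (illustrative)
ITERATIONS = 18   # roughly one iteration per output bit (illustrative)

# CORDIC gain K = prod(sqrt(1 + 2^-2i)) ~ 1.6468, compensated by a fixed-point 1/K.
CORDIC_GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))
INV_GAIN_Q30 = round((1 << 30) / CORDIC_GAIN)


def cordic_magnitude(x: int, y: int) -> int:
    """Approximate floor(sqrt(x*x + y*y)) with shift-and-add circular CORDIC vectoring."""
    # The magnitude does not depend on sign, so fold the vector into the first quadrant.
    x, y = abs(x) << GUARD_BITS, abs(y) << GUARD_BITS
    for i in range(ITERATIONS):
        if y > 0:   # rotate clockwise by atan(2^-i) to drive y toward 0
            x, y = x + (y >> i), y - (x >> i)
        else:       # rotate counter-clockwise
            x, y = x - (y >> i), y + (x >> i)
    # x now holds about K * magnitude * 2^GUARD_BITS; remove the gain and guard bits.
    return (x * INV_GAIN_Q30) >> (30 + GUARD_BITS)


print(cordic_magnitude(3, 4), cordic_magnitude(12345, 6789))
```

In hardware each iteration would become one pipeline stage (two adders/subtractors, two fixed shifts and a sign test), and the constant 1/K multiplication fits in a single DSP slice. Because of truncation inside the iterations, this sketch can differ from the exact floor(sqrt) by 1 LSB; more guard bits reduce how often that happens.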