javitxu
Visitor
Visitor
664 Views
Registered: ‎09-17-2018

Vivado timing analysis for a source synchronous interface

Hi everyone!

I have some issues with the timing analysis tool of Vivado which I can't really understand.

First, some context: I have a source synchronous interface between an ADC (AD9249, https://www.analog.com/media/en/technical-documentation/data-sheets/AD9249.pdf) and a 7 series Zynq. As the ADC provides its data serialized by 14 and at double data rate, I have instantiated an IBUFDS and two cascaded ISERDESE2 primitives, both using the Xilinx "SelectIO Wizard" utility (I could have done the instantiation manually, but this simplifies the process):

javitxu_1-1610712348066.png

The clocks for the ISERDESE2 come from a MMCM which takes the input clock (nominally at 350 MHz) and outputs two different clocks: one at 350 MHz and another at 50 MHz, both phase aligned (at 0 degrees) with the input clock.

The ADC provides the clock center-aligned with respect to the data, with an uncertainty of 300 ps, so the minimum input delay from each clock edge to the corresponding data edge is bit_period/2 - 300 ps and the maximum input delay is bit_period/2 + 300 ps, where the bit period is (1/350 MHz)/2 = 1.429 ns. Therefore the constraints are (note that ADC_CLKOUT_B1_P is the name of the input clock):

set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -clock_fall -min -add_delay 0.414 [get_ports ADC_*_N]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -clock_fall -max -add_delay 1.014 [get_ports ADC_*_N]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -min -add_delay 0.414 [get_ports ADC_*_N]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -max -add_delay 1.014 [get_ports ADC_*_N]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -clock_fall -min -add_delay 0.414 [get_ports ADC_*_P]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -clock_fall -max -add_delay 1.014 [get_ports ADC_*_P]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -min -add_delay 0.414 [get_ports ADC_*_P]
set_input_delay -clock [get_clocks ADC_CLKOUT_B1_P] -max -add_delay 1.014 [get_ports ADC_*_P] 
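
As a quick cross-check of the 0.414 and 1.014 values used above, here is a minimal Python sketch of the same arithmetic. It only uses the two figures quoted in the post (350 MHz clock, 300 ps uncertainty); the function and variable names are made up for illustration.

```python
# Sketch: derive the set_input_delay values from the figures quoted in the post.
CLK_FREQ_MHZ = 350.0   # input clock frequency stated in the post
SKEW_NS = 0.300        # clock-to-data uncertainty stated in the post


def input_delays_ns(clk_mhz, skew_ns):
    """Return (bit_period, min_delay, max_delay) in ns for a center-aligned DDR input."""
    clk_period = 1000.0 / clk_mhz   # clock period in ns
    bit_period = clk_period / 2.0   # DDR: one bit per clock edge
    centre = bit_period / 2.0       # clock edge sits mid-bit
    return bit_period, centre - skew_ns, centre + skew_ns


bit_period, dly_min, dly_max = input_delays_ns(CLK_FREQ_MHZ, SKEW_NS)
print(f"bit period = {bit_period:.3f} ns")
print(f"-min = {dly_min:.3f} ns, -max = {dly_max:.3f} ns")
```

With these inputs the sketch reproduces the 1.429 ns bit period and the 0.414 ns / 1.014 ns delays in the constraints.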

Ok, so I implement the design and run the timing report, and I get timing violations on every data signal from the ADC. This is the detailed analysis for a representative route:

TimingResult.png

So, now my questions:

First: why is the destination clock "advanced" in the MMCM? As far as I understand it, if I configure the MMCM (using the Clocking Wizard) to have its output and input phase aligned at zero degrees, the "time that the clock needs to pass through the MMCM" should be zero. Am I understanding it wrong? If it were not for that "advance", my clock would arrive at 7.235 ns, which is too late anyway. However, by changing the phase relationship in the MMCM I might be able to adjust the timing (or I could delay the clock via IODELAYs).

And second: is there any way to tell Vivado that I need some specific signals to be captured one cycle late? Instead of capturing them on the edge that follows the one that generated them, use the next one. I get why Vivado does not do this automatically, as it could desynchronize signals that are synchronized when they enter the FPGA, but is there any way I can force this? I think that "multicycle delay" is not what I'm looking for, as I need to capture one data sample per cycle. It is more like convincing the tool that I have something like a "pipeline" structure with a depth of 1. With the interface as I have it now, my clock could be delayed by a lot (but by approximately the same amount for all my input signals) and still sample my data correctly (1 or N cycles late), and the timing analysis tool would shout at me anyway.

 Thank you!

[EDIT: I attach the images as I don't know how to make them bigger]

 

TimingResult.png
schematic.png

Accepted Solutions
avrumw
Expert
Expert
624 Views
Registered: ‎01-23-2009

OK.

Before we get started, I have some bad news for you. There is no way this is going to work with static capture. Period.

Your bit period is 1.43ns, and the uncertainty is +/-300ps, so that means you lose 600ps on each bit. This leaves you with an 830ps valid window. This is WAY too small for static capture. Under ideal conditions, static capture only starts being viable with a valid window of somewhere between 1.5ns and 1.75ns with the best clocking structure.
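
The window arithmetic above can be sketched in a few lines of Python. The 1.5 ns threshold is the lower bound quoted in this reply for the best clocking structure; the names are illustrative only.

```python
# Sketch: data valid window versus the rule-of-thumb minimum quoted in the reply.
BIT_PERIOD_NS = 1000.0 / 350.0 / 2.0   # 350 MHz DDR clock -> about 1.43 ns per bit
UNCERTAINTY_NS = 0.300                 # +/-300 ps, lost on both sides of each bit
MIN_STATIC_WINDOW_NS = 1.5             # lower bound quoted for the best clocking structure


def valid_window_ns(bit_period_ns, uncertainty_ns):
    """Data valid window left after removing the uncertainty on both bit edges."""
    return bit_period_ns - 2.0 * uncertainty_ns


window = valid_window_ns(BIT_PERIOD_NS, UNCERTAINTY_NS)
shortfall = MIN_STATIC_WINDOW_NS - window
print(f"valid window = {window:.3f} ns")
print(f"short of the 1.5 ns guideline by {shortfall:.3f} ns")
```

The window comes out at roughly 0.83 ns, a little over half of the 1.5 ns guideline.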

Second, your clocking structure is not ideal. Take a look at this post on clocking structures for input capture; the best clocking structure is "ChipSync" - I hope your clock and data are all in the same clock region/IO bank.

But even with that, this isn't going to work. You will have to use dynamic capture, and that is a whole different can of worms. You can look at XAPP524 - it documents a capture scheme for exactly this case. I haven't looked at it recently (it is pretty old), and I am not certain I completely agree that this is the best way of doing things, but if you are going to run with this ADC at this rate, you have no choice but to implement something similar to this.

We could spend more time figuring out your constraints, but there isn't really any point. If you are going to use dynamic capture, then you don't need/can't use constraints - constraints are for static timing analysis (STA), and you are going to use dynamic capture - so static timing doesn't work. In fact this is one of the biggest problems with dynamic capture; it is nearly impossible to prove that your solution will work... If you are interested in how it would work if the interface were statically captured, you can look at this post on constraining center aligned interfaces; you may want to look at this one on edge aligned interfaces since it introduces many of the concepts used in the other one.

So you are in for a rough ride with this... If you are still able to, I would consider a different ADC - one that uses more pins at a slower rate, or one that uses JESD204. This "in between" interface (high speed serial LVDS data) is really hard for the FPGA to deal with...

Avrum

javitxu
Visitor
Visitor
489 Views
Registered: ‎09-17-2018

OK, first of all, thanks a lot for the response; the links you provided have helped me understand much better what STA actually does. In fact, even my question about how to tell Vivado that some signals should be captured one or more cycles late is answered in one of the posts you linked (multicycle delay after all!).

About what you mentioned on clocking structures, yes and no: I have most data channels in the same bank as the clock, but not all of them, so that also complicates things.
But why do you conclude so positively that static capture is impossible for this interface? With a data window of 830 ps, and considering a typical setup time for a 7 series ISERDES of negative 20 ps with a hold time of 120 ps (see attached image from https://www.xilinx.com/support/documentation/data_sheets/ds187-XC7Z010-XC7Z020-Data-Sheet.pdf),
I would still have "plenty of window left". I understand that data might suffer jitter once inside the fpga but as much as ~700 ps? Or are there more aspects that I should consider?
I would like to avoid using dynamic capture, and as the ADC cannot be changed I might have to end up reducing the sample rate, as the value of 50MHz was not so critical.
In that case and after looking at the posts you linked I would say that the timing constraints I am using (setting aside the "create clock" ones) are sufficient, are they not?
Anyway, the first question in my original post remains the same: why does the analyzer "advance" the clock as it goes through the MMCM?

SwitchingCharacteristics.JPG
avrumw
Expert
Expert
474 Views
Registered: ‎01-23-2009

But why do you conclude so positively that static capture is impossible for this interface? With a data window of 830 ps, and considering a typical setup time for a 7 series ISERDES of negative 20 ps with a hold time of 120 ps.

This is the setup and hold time of the ISERDES itself, between the CLK pin and the D pin of the ISERDES. There are lots of other things on this path:

  • Data path
    • The IBUF
    • Some internal routing in the IOB
  • Clock path 
    • The IBUF
    • The route to the MMCM
    • The MMCM itself 
    • The route to the BUFG
    • The BUFG itself
    • The clock network - this is a significant delay (especially from a BUFG)

Now the MMCM attempts to cancel out much of the stuff on the clock path, but it isn't perfect.

All of these delays are process/voltage/temperature (PVT) dependent. Again, the MMCM removes some of this PVT variation on the clock path, but what remains (and the PVT dependence of the data path IBUF) all contributes to widening the required setup and hold time. Jitter contributes as well...

If you look at the same datasheet at tables 79-82, this gives the "pin-to-pin" setup and hold time requirements for various clocking schemes. I mention these schemes in my post on clocking structures (and I even mention the appropriate timing parameter from these tables). As you can see in these tables, the windows are MUCH wider than 120ps. 

Furthermore, these numbers are guidelines only, and (with a TON of digging) they are not truly worst case. The "final" numbers come from static timing analysis with the tools (which is why you need constraints). [DON'T ask me why the numbers in the datasheet are not "worst case"...]

As a result, I have experimented with all the different clock structures in several different technologies to determine what the "real" limits are (as defined by the tools).

Using "ChipSync" (the BUFIO/BUFR clocking), the datasheet says the window is 1.17ns (for a -2 device - Tpscs/Tphcs). But again, the datasheet isn't truly worst case. Furthermore, this timing would be without an IDELAY, which would require the valid window to be in a very specific place; almost all interfaces will not "natively" meet the Tpscs/Tphcs, so will need the IDELAY to move it. So, for a "real" interface, using IDELAY and ChipSync clocking, after designing the capture mechanism, constraining it, and analyzing the results, I have found that 1.5-1.75ns is around the lower limit of what is possible with ChipSync (with all data in the same clock region - the BUFMR makes timing much worse). By the way, this number isn't highly dependent on device family or speedgrade (it has been in a similar range since Virtex-5); unfortunately there isn't a "true" equivalent to this in UltraScale/UltraScale+/Versal - and in these technologies, the best you can do is a bit worse (although the "native" mode I/O can give you significantly better results for some interfaces).

Other clocking structures (i.e. like the one you have, which results in Tpsmmcm/Tphmmcm) are significantly slower - for example for a 7020 in -2 the datasheet says 2.20ns is the minimum window - again, the tool is a bit more pessimistic, and I wouldn't expect an interface like this to work with less than about 2.5ns (this number is more family and speedgrade dependent, so I don't know it as well).
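
To put these window figures in terms of the sample rate question raised earlier in the thread, here is a rough Python sketch that inverts the window calculation. It assumes (this is an assumption, not something stated in the thread or checked against the AD9249 datasheet) that the +/-300ps uncertainty stays the same at lower rates and that the ADC can actually run at the resulting rate; the 14 bits per sample comes from the original post.

```python
# Sketch: highest sample rate that still leaves a given data valid window.
# Assumption: the +/-300 ps clock-to-data uncertainty does not change with rate.
BITS_PER_SAMPLE = 14     # serialization factor from the original post
UNCERTAINTY_NS = 0.300   # +/-300 ps from the original post


def max_sample_rate_msps(required_window_ns,
                         uncertainty_ns=UNCERTAINTY_NS,
                         bits=BITS_PER_SAMPLE):
    """Sample rate (MSPS) whose bit period leaves exactly the required window."""
    bit_period_ns = required_window_ns + 2.0 * uncertainty_ns
    bit_rate_mbps = 1000.0 / bit_period_ns
    return bit_rate_mbps / bits


# Window guidelines quoted in this reply for the two clocking structures.
for label, window_ns in (("ChipSync, 1.5 ns window", 1.5),
                         ("MMCM/BUFG, 2.5 ns window", 2.5)):
    print(f"{label}: up to about {max_sample_rate_msps(window_ns):.1f} MSPS")
```

Under those assumptions the guideline windows correspond to roughly 34 MSPS for ChipSync clocking and roughly 23 MSPS for the MMCM/BUFG structure, compared with the 50 MSPS of the current setup. Only static timing analysis on the real design can confirm either figure.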

So, I stand by my previous assertion (and with a fair amount of confidence) - with an 800ps data valid window, you aren't even close to minimum requirement for static capture.

Sorry....

Avrum

avrumw
Expert
Expert
471 Views
Registered: ‎01-23-2009

Anyway, the first question in my original post remains the same: why does the analyzer "advance" the clock as it goes through the MMCM?

This actually ties in with my previous response.

The main role of the MMCM in this case (with 1:1 clock ratios, and using BUFG feedback) is to "cancel out" the delay (and more importantly the PVT variable delay) of some of the stuff I mentioned in the clock path above. It does this by a complicated mechanism, but ultimately works on actively aligning the feedback input to the MMCM (CLKFB) with the clock input of the MMCM (CLKIN). From a static timing point of view, this looks like a negative propagation delay. In fact it is actively working to make the total sum of source clock delay (the entire path from port of the FPGA to clock pin of the flip-flops inside the design) "close to 0" (not actually 0 - in fact I think it is tuned to try and get a slightly negative total, which is advantageous for input hold times).
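
A toy Python model may help to picture why the timing report shows a negative delay through the MMCM. Everything in it is hypothetical: the delay values are invented, and treating the MMCM as "minus the feedback path minus a fixed compensation" is a simplification for illustration, not how Vivado actually computes the number.

```python
# Toy model (hypothetical numbers): clock insertion delay with an MMCM
# whose feedback runs through the BUFG and the clock network.
def clock_insertion_ns(ibuf, route_to_mmcm, bufg, clock_net, compensation):
    """Return (mmcm_delay, total_insertion_delay) in ns."""
    feedback_path = bufg + clock_net
    # Aligning the feedback input with CLKIN looks, to static timing, like a
    # negative delay that cancels the feedback path plus a fixed compensation.
    mmcm_delay = -(feedback_path + compensation)
    total = ibuf + route_to_mmcm + mmcm_delay + bufg + clock_net
    return mmcm_delay, total


# Same input path, two very different clock networks (e.g. fast vs slow corner).
fast = clock_insertion_ns(ibuf=0.9, route_to_mmcm=1.1, bufg=0.08,
                          clock_net=1.5, compensation=2.1)
slow = clock_insertion_ns(ibuf=0.9, route_to_mmcm=1.1, bufg=0.10,
                          clock_net=3.0, compensation=2.1)
print(f"fast: MMCM delay {fast[0]:.2f} ns, total insertion {fast[1]:.2f} ns")
print(f"slow: MMCM delay {slow[0]:.2f} ns, total insertion {slow[1]:.2f} ns")
```

In this model the BUFG and clock network delays drop out of the total, so the insertion delay stays the same (and slightly negative with these made-up numbers) however slow the clock network is, while the MMCM "delay" on its own is reported as a large negative value.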

Avrum

javitxu
Visitor
Visitor
416 Views
Registered: ‎09-17-2018

Ok, I think that answers all my questions. Thank you very much for the detailed answers!