charysm
Visitor
Registered: ‎10-22-2020

Source synchronous DDR input implementation

Hello,

We implement the following DDR input from an external ADC. The target device is an XC7Z045 (I have graphically simplified the hierarchy in the image):

ssync_ddr_in.png

SYMPTOM: We can successfully create a valid eye for a number of different IDELAY tap settings. However, under certain circumstances a sine test input shows "needles", i.e. isolated maximum and minimum values, especially at the zero crossings of the sine. The digital encoding is signed integer, hence we suspect bit flips on the MSB. Their occurrence is independent of the IDELAY setting. "Certain circumstances" means two things: with an identical bitfile, particular devices show the symptom while others don't; and on the same device, particular builds from similar or even identical sources show it while others don't.

The ADC_LVDS_CLKP input is on a clock-capable MRCC I/O in the top right corner of the device (clock region X1Y6), with f = 245 MHz. The local resources IBUFDS, BUFIO and IDELAY seem to be placed correctly in the same area.

Because we use IDELAY successfully but the symptom is independent of the IDELAY setting, I believe the straightforward assumption is a critical timing relationship on the path from the IDDR launch clock to the ADC_Data_Real_reg data capture. The tool calculates positive slack on that path. The PLL is set to a 1x period with no phase shift on CLKOUT0. The IDDRs have the SAME_EDGE_PIPELINED attribute set, and the clock path through the PLL and the two BUFGs to the ADC_Data_Real register is long enough to push the capture edge into the subsequent cycle with respect to the IDDR launch edge.

With no location constraints on the PLL the tool puts the PLL on the opposite physical corner of the device (Clock region X0Y0). BUFG driven by the PLL (clkout1_buf as well as clkdf_buf) are located in the same half of the device. If we locate the PLL in the same clock region as the input, calculated delays are very similar through the BUFGs. We cannot determine whether this substantially changes the result, as it is not safely reproducible across builds. Most of the design's logic is clocked by the same clock as ADC_Data_Real_reg, and in general we have very low logic resource utilization and no placement constraints on logic.

Are we doing something obviously incorrect with this topology? Are there any suggestions as to what to verify or how to debug the issue? We are now trying to implement ILA cores on the input and output nets of ADC_Data_Real_reg, but have so far not been able to build a failing design that way.

Thanks for any input!

1 Solution

Accepted Solutions
avrumw
Guide
Registered: ‎01-23-2009

You have an illegal (or at least questionable) clock crossing here....

You are capturing the input data on a BUFIO. From there, you are transferring the data to a clock that is BUFG->MMCM->BUFG. Aside from the fact that you shouldn't have a BUFG before the MMCM (it isn't necessary if your clock is on a clock-capable pin in the same clock region as the MMCM), this is not a valid combination of clocks. The only clock that you should use to capture the data from the output of your IDDR is the BUFR. You can even use the BUFR to clock both the IDDR and the output logic. I suspect that you are seeing failed transfers between the IDDR and the flip-flop clocked by the BUFG/MMCM/BUFG. These bad clock crossings would be independent of the IDELAY setting.

If you need to transfer data to a "global" clock, then you first need to get the data legally on the BUFR domain and then use a proper mesochronous clock domain crossing circuit to bring it to a global clock - almost certainly a clock domain crossing FIFO; a shallow one implemented in distributed RAM is enough for mesochronous clock crossing.
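
As a rough illustration of why a shallow FIFO is enough for a mesochronous crossing, the behavioral sketch below writes on one clock and reads on a second clock with the same period but an arbitrary fixed phase offset. It is a Python model under simplifying assumptions (ideal clocks, no jitter, no pointer synchronization latency), not HDL and not a description of any Xilinx primitive. The function name, the `prefill` threshold and the default depth are hypothetical choices made for the example.

```python
from collections import deque

def simulate_mesochronous_fifo(period_ns, phase_ns, depth=8, n_words=1000, prefill=2):
    """Behavioral model: same-frequency write/read clocks with a fixed phase offset.

    Returns (words_read, max_fill). Hypothetical helper, for illustration only.
    """
    # Time-ordered event list: kind 0 = write clock edge, kind 1 = read clock edge.
    events = [(k * period_ns, 0, k) for k in range(n_words)]
    events += [(k * period_ns + phase_ns, 1, -1) for k in range(n_words + prefill + 1)]
    events.sort()

    fifo = deque()
    words_read = []
    max_fill = 0
    reading = False
    for _time, kind, word in events:
        if kind == 0:
            if len(fifo) >= depth:
                raise OverflowError("FIFO overflow")
            fifo.append(word)
            max_fill = max(max_fill, len(fifo))
        else:
            # Start reading only once a few words are buffered.
            if not reading and len(fifo) >= prefill:
                reading = True
            if reading and fifo:
                words_read.append(fifo.popleft())
    return words_read, max_fill

period = 1e3 / 245.0  # ns, clock period at 245 MHz
for phase in (0.0, 0.25 * period, 0.5 * period, 0.9 * period):
    words, fill = simulate_mesochronous_fifo(period, phase)
    in_order = words == list(range(1000))
    print(f"phase={phase:.2f} ns  max fill={fill}  all words in order={in_order}")
```

In this idealized model the fill level stays constant once reading starts, because both clocks have exactly the same frequency and only the phase is unknown. Real hardware needs some extra margin for jitter and pointer handling, which is why a small FIFO is used rather than a single register.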

Avrum

13 Replies
hemangd
Moderator
Registered: ‎03-16-2017

@charysm 

So you are suspecting a bit-flip issue in hardware.

Were you able to find out in which register or other primitive, and in which clock domain, you are seeing this bit flip? If yes, can you share the routed checkpoint (dcp) with us via secure FTP so we can analyze it from the timing side?

Your debug approach should be:

1. Use ILAs on the faulty register's output pin, D pin and reset pin to check the results.

2. Resolve all the CDC violations.

 

Regards,
hemangd

Don't forget to give kudos and mark it as accepted solution if your issue gets resolved.
charysm
Visitor
Registered: ‎10-22-2020

@hemangd 

Thank you for your reply. We have not yet been able to determine where exactly the bit error (rather than bit flip) occurs. Adding ILAs has so far only led to designs where the issue doesn't occur.

One interesting observation is that so far we have observed erroneous '1's rather than erroneous '0's, i.e. the 1 seems to be stickier than the 0, with what we consider valid IDELAY tap values. A similar pattern can, however, be observed with any "good" implementation when the IDELAY of a particular DDR input is set to one particular value on the border of the valid tap range. This raises the question of whether the IDELAY is actually set and operated correctly. Is there a particular reset sequence to be observed for IDELAYCTRL? It is connected to peripheral_reset of the PS, which I understand is asserted for at least 2*16 clock cycles at 50 MHz. I believe this should satisfy the requirement "A reset pulse width Tidelayctrl_rpw is required". The PLLE2_ADV has its reset tied to ground. There is one IDELAYCTRL instance and all IDELAYs are in the same clock region.
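
The reset pulse-width arithmetic can be checked with a few lines of Python. This is only a sketch: the 2*16 cycles at 50 MHz come from the post above, while the minimum value used for T_IDELAYCTRL_RPW is an assumption that should be verified against the device datasheet (DS191 for Zynq-7000).

```python
def reset_pulse_width_ns(cycles, clock_mhz):
    """Duration in ns of a reset held for `cycles` periods of a clock at `clock_mhz`."""
    return cycles * 1e3 / clock_mhz

# Assumed datasheet minimum for T_IDELAYCTRL_RPW in ns; verify against DS191.
ASSUMED_MIN_RPW_NS = 59.28

pulse_ns = reset_pulse_width_ns(cycles=2 * 16, clock_mhz=50.0)
print(f"reset pulse: {pulse_ns:.0f} ns, assumed minimum: {ASSUMED_MIN_RPW_NS} ns")
print("pulse width satisfied:", pulse_ns >= ASSUMED_MIN_RPW_NS)
```

Under that assumption the pulse is longer than required by roughly an order of magnitude, so pulse width alone is unlikely to be the problem.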

We are still investigating your other suggestions. Thanks.

Registered: ‎01-22-2015

@charysm 

The PL side of your XC7Z045 is similar to a Kintex-7 (ref DS190).  Using the best capture architecture (BUFIO + BUFR (ChipSync)), a capture window of more than 1ns is needed.  However, your 245MHz DDR interface is starting with a window of only 2.04ns and the clock-to-out variation of your ADC is going to eat into this.

Point being, your interface is bordering on the limit of what is possible with static capture.

Please show us the set_input_delay constraints that you have written for this interface.

Cheers,
Mark

 

hemangd
Moderator
Registered: ‎03-16-2017

@charysm 

Sharing this blog here https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Using-the-Methodology-Report-Part-Four-Rare-Bit-Flips/ba-p/1164870 which may help here because the issue looks similar. 

Regards,
hemangd

charysm
Visitor
Registered: ‎10-22-2020

Thank you for all the suggestions. We will certainly try the suggested input architecture, however the question remains as to how to determine whether we solved anything in case we get a few "good" and only good builds, or just decreased the likelihood of a "bad" build.

These are the input constraints:

set_input_delay -clock [get_clocks HSCLK_IN] -max 1.484 [get_ports {ADC_LVDS_inB[*]}]
set_input_delay -clock [get_clocks HSCLK_IN] -min 0.550 [get_ports {ADC_LVDS_inB[*]}]
set_input_delay -clock [get_clocks HSCLK_IN] -max 1.484 [get_ports {ADC_LVDS_inB[*]}] -add_delay -clock_fall
set_input_delay -clock [get_clocks HSCLK_IN] -min 0.550 [get_ports {ADC_LVDS_inB[*]}] -add_delay -clock_fall

I can confirm that we do not achieve static timing closure on those inputs. At a certain IDELAY_VALUE property of the IDELAY, some inputs show setup and others hold violations. In practice, with a "good" build, we have +10 taps and -10 taps margin around that value during calibration of the interface. Recall: on a "bad" build, one out of 7 DDR channels shows permanent symptoms of failure regardless of IDELAY setting, while the others work as expected. 
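
To put the tap margin into time units, the sketch below converts IDELAY taps to picoseconds using the 7-series tap resolution formula 1 / (32 × 2 × F_REF). The 200 MHz IDELAYCTRL reference clock is an assumption, since the thread does not state it; `REFCLK_MHZ` and the helper name are illustrative.

```python
def idelay_tap_ps(refclk_mhz):
    """7-series IDELAY tap resolution, 1 / (32 * 2 * F_REF), returned in ps."""
    return 1e6 / (32 * 2 * refclk_mhz)

REFCLK_MHZ = 200.0  # assumption: IDELAYCTRL reference clock, not stated in the thread

tap_ps = idelay_tap_ps(REFCLK_MHZ)
margin_ps = 10 * tap_ps              # +/-10 taps of margin reported for a "good" build
bit_period_ps = 1e6 / 245.0 / 2.0    # DDR bit period at 245 MHz

print(f"tap resolution: {tap_ps:.3f} ps")
print(f"+/-10 taps:     +/-{margin_ps:.0f} ps")
print(f"bit period:     {bit_period_ps:.0f} ps")
```

With these assumptions, ±10 taps corresponds to a working window of roughly 1.5 ns inside a bit period of about 2 ns.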

Question: Since all input resources are routed within the I/O tile associated with the input, are these constraints even relevant to the implementation? Will they affect anything else beyond the IDDR?

CDC: There are no violations in the cdc_report for the above clocks. Other CDCs show critical warnings, but they are all related to AXI registers that influence the data path, and those registers do not toggle while the data path is active. The LUT3 in the above schematic involves such a path, where in practice I0 is switched to O at all times and the other inputs are static or ignored. Given the symptoms described, however, I still suspect the CDC rather than the path into the IDDR. I haven't been able to find a way to determine where the bit error occurs.

By the way, we are using FPGA modules on the same mainboard, which includes everything else (power, ADC, etc.). So the device-to-device variation that we see (besides build-to-build) relates only to the FPGA.

charysm
Visitor
Registered: ‎10-22-2020

Thanks a lot, @avrumw , this is very helpful. We will implement your suggestions.

Looking to improve our use of the available tools and documentation in the future: should there have been a corresponding warning in the implementation flow, or did we miss one? Are these principles part of the manuals somewhere? I didn't find anything on the capture of IDDR outputs, and regarding the PLL I had interpreted the diagram below from UG472 as describing a legal clocking path from BUFG to PLL:

 

clocking.PNG

Registered: ‎01-22-2015

@charysm 

"I can confirm that we do not achieve static timing closure on those inputs."

Q1: Which paths are failing timing analysis?  The paths from the ADC to the IDDR - or the paths from the IDDR to ADC_Data_Real_reg[.]?

Q2: What is the value of negative slack that you get for the failed timing paths?

 

charysm
Visitor
Registered: ‎10-22-2020

[edit: rewording and correcting slack] The paths from the LVDS I/Os to the IDDR fail timing by about -300 ps slack, setup and hold combined. That is, a path will show for example -200 ps setup and -100 ps hold slack for a particular IDELAY_VALUE property setting.

While any advice on dealing with those is highly welcome, I believe that Avrum's answer captures the problem at hand pretty well.

avrumw
Guide
Registered: ‎01-23-2009

It's not a matter of whether the MMCM can clock the IDDR - it can - there are clocking topologies where this is useful and recommended (although the BUFG before the MMCM isn't recommended). Take a look at this post on clocking topologies for input interfaces.

The question is about what are legal synchronous clock crossings inside the FPGA. The BUFIO and BUFR are designed to work together as low-skew buffer pairs - the BUFIO specifically tailored for the high speed side of the ISERDES (but it can also clock an IDDR) and the BUFR designed for the low speed side of the ISERDES as well as other clocked resources in its clock region (including, by the way, the IDDR - I have found that the BUFR alone works slightly better than the BUFIO when using the IDDR).

Other pairs of clocking resources are not balanced - specifically, the BUFIO is not balanced with the BUFG->MMCM->BUFG combination you have. This makes passing data between these domains difficult if not impossible; it is just another "bad clock crossing". What is odd, though, is that the tools understand the timing of all these elements and will time this path (from the IDDR to the clocked element on the BUFG) normally, including accounting for the large and PVT-variable skew between them. Normally, this would result in a static timing failure (which is what markg@prosensing.com was asking about). If you are not seeing a failure here, then, even though this isn't recommended, it should work... UNLESS you have some timing exception that is suppressing the violation on this path... Check your constraints...

Avrum

svdmark
Adventurer
Registered: ‎08-17-2009

This is exactly the question my colleague @charysm and I are wondering about:

Why do the tools not complain? 

We have checked the timing of the path from the BUFIO-driven IDDR to the BUFG-driven FDRE, and the tool calculates 90 ps of slack. Not a lot, but it should be enough. It also worked fine in series production for a year or so; only with the newest batch of Zynq boards we received did we see the timing issues (no changes in PCB layout). On some units only one of the data connections produces errors, on some several bits fail, and only with some builds (of identical sources). Some builds work on all units we currently have in the lab (about 10), and some older Zynq units work with all builds (all are 7045, speed grade 2).

We will certainly change the structure as you recommend, but the question still is important: obviously we need to be sure that the new solution works reliably for any future builds (there will be updates during the product life cycle) and for any future batches of Zynq.

Thanks a lot for your help!

Stefan

 

Registered: ‎01-22-2015

@charysm 

"The paths from the LVDS I/Os to the IDDR fail timing by about -300ps slack..."

If you are going to use static-capture for this interface, then your design must pass timing analysis.  Your constraints show that the ADC has a clock-to-out variation for HSCLK_IN of (1.484 - 0.550 = 0.934ns).  This will subtract from your DDR/245MHz window of 2.04ns giving the FPGA a capture window of (2.04 - 0.934 = 1.106ns) for the data.  A 1.106ns capture window is just barely enough for the best static-capture architecture in 7-Series devices.
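
The window arithmetic above can be reproduced with a short Python sketch. The numbers come from the set_input_delay constraints and the 245 MHz clock quoted in this thread; the function name is illustrative, and the result says nothing about what window a given capture architecture actually needs.

```python
def ddr_capture_window_ns(clock_mhz, input_delay_max_ns, input_delay_min_ns):
    """Return (bit period, data uncertainty, remaining window) in ns for a DDR input."""
    bit_period_ns = 1e3 / clock_mhz / 2.0          # two bits per clock period
    uncertainty_ns = input_delay_max_ns - input_delay_min_ns
    return bit_period_ns, uncertainty_ns, bit_period_ns - uncertainty_ns

bit_ns, unc_ns, window_ns = ddr_capture_window_ns(245.0, 1.484, 0.550)
print(f"bit period:     {bit_ns:.3f} ns")
print(f"uncertainty:    {unc_ns:.3f} ns")
print(f"capture window: {window_ns:.3f} ns")
```

Using the unrounded bit period gives a window of about 1.107 ns, consistent with the 1.106 ns figure obtained from the rounded 2.04 ns.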

If your static-capture design cannot pass timing analysis, then you can instead use dynamic-capture methods to keep the clock capture-edge in the middle of the data-eye.

 

 

"I believe that Avrum's answer captures the problem at hand pretty well."

I agree with Avrum's advice to replace the BUFG->MMCM->BUFG chain with a BUFR.  However, his advice will not solve "The paths from the LVDS I/Os to the IDDR fail timing".   The only solution I know for this problem is to use dynamic-capture methods.

 

 

@svdmark "Why do the tools not complain?"

Vivado assumes that all clocks are synchronous.  Therefore, Vivado will try to time data-crossings between all clock-domains.  It is our responsibility to know when clocks are asynchronous, which is not always easy as demonstrated in <this> post.  If two clocks are asynchronous (or mesochronous) then it is our responsibility to place a clock-domain-crossing-circuit (CDCC) at the crossing.

When you set up the MMCM with the Clocking Wizard, you should have selected "Phase Alignment", which ensures a known phase relationship between the clock-input and clock-output of the MMCM.  If you did this, then the clock-output of the BUFG->MMCM->BUFG chain should be synchronous with the clock-output of the BUFIO.  Therefore, I think that Vivado will time a crossing of data between the two clock-domains correctly.  However, I think Avrum's point is that the two clocks are highly skewed and this will make it difficult for a crossing of data between the two clock-domains to pass timing analysis.  You can reduce skew between the two clocks by replacing the BUFG->MMCM->BUFG chain with a BUFR (as Avrum recommends).

 

 

 

charysm
Visitor
Registered: ‎10-22-2020

Thank you all for your input. We are using the IN_FIFO now for the CDC. According to its description it is just meant for this type of application, and a depth of 8 seems quite suitable as well. If there are some other non-obvious concerns for that type of architecture, please let me know!
