Registered: ‎05-04-2016

Problem compensating BUFMR delay with PLLs for deserializer module in FPGA (Zedboard/Vivado 2016.4)

Good day,

 

I am using Vivado to develop a module that interfaces a 14-bit ADC with 16 channels to the FPGA of a Zedboard (I understand its FPGA is the XC7Z020CLG484). The interface is source synchronous and DDR center aligned. The serial data channels and the data clock (DCLK) come from the ADC as LVDS pairs. The 16 input channels are distributed across 2 adjacent clock regions of the FPGA. I am using a PLL in each region to cancel the clock skew introduced in DCLK by a regional clock buffer (BUFR), with another BUFR placed in the PLL feedback path; however, I am still not able to meet timing in my design's implementation. Apparently the problem is that the DCLK path also contains a multi-region clock buffer (BUFMR), which skews the clock as well, and I do not think I can include a BUFMR in the PLL feedback path to remove this effect. How safe would it be to estimate the phase delay needed to re-align (or totally deskew) DCLK with the serial data inputs, and enter this delay as a parameter to the PLL?

 

Below is the extended description of my problem and the setup I have.  I look forward to your comments:

 

The clock to data relationship is DDR center aligned, as shown in the next image. My design is constrained to a DCLK period of 100/7 ≈ 14.286 ns and an FCLK (sampling clock) period of 100 ns.
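The clock figures quoted above follow from the 14-bit word width and the DDR format. A minimal sketch of the arithmetic, assuming one bit is transferred per DCLK edge and that "center aligned" means the clock edge sits a quarter of a DCLK period away from the data transitions (the variable names are illustrative only):

```python
from fractions import Fraction

FCLK_PERIOD_NS = Fraction(100)   # frame (sampling) clock period from the post
BITS_PER_SAMPLE = 14             # ADC word width
BITS_PER_DCLK = 2                # DDR: one bit on each DCLK edge (assumption)

# 14 bits at 2 bits per DCLK cycle means 7 DCLK cycles per frame.
dclk_cycles_per_frame = Fraction(BITS_PER_SAMPLE, BITS_PER_DCLK)
dclk_period_ns = FCLK_PERIOD_NS / dclk_cycles_per_frame
dclk_freq_mhz = 1000 / dclk_period_ns

# Bit period (unit interval) and the quarter-period offset of a
# center-aligned clock edge relative to the data transitions.
bit_period_ns = dclk_period_ns / BITS_PER_DCLK
center_offset_ns = dclk_period_ns / 4

print(f"DCLK cycles per frame: {dclk_cycles_per_frame}")
print(f"DCLK period          : {float(dclk_period_ns):.3f} ns")
print(f"DCLK frequency       : {float(dclk_freq_mhz):.1f} MHz")
print(f"Bit period           : {float(bit_period_ns):.3f} ns")
print(f"Center offset (90°)  : {float(center_offset_ns):.3f} ns")
```

Exact fractions are used so that 100/7 ns is not rounded before the derived values are computed.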

 

Timing Diagram.JPG

 

Note: as I show in detail below, two points are worth keeping in mind:

  • I am ignoring the FCLK that the ADC provides. I generate my own from an output of the PLL.
  • I use the ISERDES primitive to de-serialize the data, and within it I use the "bitslip" feature together with a training pattern from the ADC to recover the sampling frame.
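The frame-recovery idea in the second point can be illustrated with a small software model. This is a simplified sketch, not the ISERDESE2 behaviour: it treats each bitslip as a one-bit rotation of the 14-bit word, whereas the real primitive follows its own bitslip sequence in DDR mode. The training pattern value is hypothetical.

```python
WORD_BITS = 14

def rotate_left(word, bits=WORD_BITS):
    """Rotate a `bits`-wide word left by one position."""
    mask = (1 << bits) - 1
    return ((word << 1) | (word >> (bits - 1))) & mask

def find_bitslip_count(captured, training_pattern, bits=WORD_BITS):
    """Return how many one-bit slips align `captured` with the pattern.

    Returns None if no rotation matches, which would point to a
    sampling problem rather than a framing problem.
    """
    word = captured
    for slips in range(bits):
        if word == training_pattern:
            return slips
        word = rotate_left(word, bits)
    return None

# Hypothetical training pattern: seven ones followed by seven zeros.
TRAINING = 0b11111110000000

# Build a misaligned capture by rotating the pattern 11 places left,
# so that 3 further slips complete the full 14-bit rotation.
misaligned = TRAINING
for _ in range(11):
    misaligned = rotate_left(misaligned)

print("slips needed:", find_bitslip_count(misaligned, TRAINING))
```

In hardware the same loop would be a small state machine that pulses the bitslip input and re-checks the parallel output until the training pattern appears.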

 

In my setup, the ADC board connects to the FPGA via standard FMC connectors such that:

 

  • 7 of the 16 ADC channels are mapped to BANK34 of the FPGA
  • The other 9 are mapped to BANK35.
  • The data clock input DCLK is mapped to BANK35 on a clock capable pin
  • BANK34 and BANK35 are located in different but adjacent clock regions in the FPGA

The image below shows the mapping of the inputs to the FPGA

 

 Clock_Input_Regions.JPG

 

Because the serial data input ports are distributed across adjacent clock regions, I used a multi-region clock buffer (BUFMR) to make the data clock signal available to both regions.

 

I use a PLL in each clock region to regenerate the data clock signal and compensate for the clock skew that the regional clock buffers introduce. The image below is from the RTL Elaborated Design schematic of my interface module:

 

Clock_Manager_Module_2.JPG

 

  • The IBUF_CLK buffer converts the data clock from LVDS to single ended. All the data inputs go through similar buffers, so I assume the data clock-to-data skew is cancelled for this input buffer. Note that DCLK comes through the clock region associated to Bank35.
  • The BUFMR_DCLK buffer is the multiregional clock buffer which goes to BUFR's (regional clock buffers) in both clock regions.
  • Each PLL is associated with a clock region.
  • Each PLL has a BUFR in its feedback path (between pins CLKFBIN and CLKFBOUT) to compensate for the clock skew introduced by the BUFRs. Note that I do not have a BUFMR in the PLL feedback path. I do not know if it is possible to include one there.
  • I had to add a BUFR at the outputs of each PLL to prevent Vivado from using global buffers (I saw it doing so).
  • Both outputs of the PLLs (dclk_bank* and fclk_bank*) are in phase with the PLL's clock input, but the frequency of the fclk_bank* outputs is the input frequency divided by 7.

Below is an RTL schematic of how I am using the ISERDESE2 primitive to deserialize the 14 bit word of one channel.  I have to use 2 primitives in master/slave mode:

 

Single 14 Bit Deserializer.JPG

 

Below are the constraints I used to define the timing of the clocks.  Note that I am using a virtual clock (VDCLK) to indicate that the data inputs are delayed by 90 degrees from DCLK.  CLOCK_FPGA_0 is the clock that I use for the rest of my circuit and it is generated by the Zynq processor.

 

Time Constraints.JPG

-----------------------------------

 

After implementation the design fails timing on most of the inputs in both banks:

 

FPGA with Timing Errors - both regions with errors.JPG

 

Then I tried bypassing the BUFMR for the clock region of BANK35, since the clock is already in that region. I still had to use the BUFMR to bring the clock to the region of BANK34.

 

Clock_Manager_Module.JPG

 

After implementation of this variant the tool indicates that ALL the paths associated with the inputs on BANK34 fail timing, but all the ones in BANK35 are OK.

 

 Paths with Timing Errors.JPG

 

and the errors are the following (may need to zoom in to see them clearly)

 

 Timing Erors.JPG

 

So to me it is clear that the BUFMR is skewing DCLK enough to break timing. But since I cannot introduce a BUFMR in the feedback path of the PLLs, I thought of simply estimating, from the timing report, the phase delay I need to introduce at the PLL output to get the proper alignment between the serial data inputs and the PLL's dclk output. I am somewhat concerned about variations with voltage and temperature, but I think the clock frequency is relatively low (70 MHz).
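For reference, turning a delay read from a timing report into a phase value is a simple proportion of the DCLK period. A minimal sketch, assuming the phase is entered in degrees and using a made-up 1.0 ns delay as the example input (the function name is illustrative):

```python
DCLK_PERIOD_NS = 100.0 / 7.0   # DCLK period from the post

def delay_to_phase_deg(delay_ns, period_ns=DCLK_PERIOD_NS):
    """Convert a time offset into a phase in the range [0, 360) degrees."""
    return (delay_ns / period_ns * 360.0) % 360.0

# Hypothetical delay estimate; a real value would come from the timing report.
example_delay_ns = 1.0
print(f"{example_delay_ns} ns -> {delay_to_phase_deg(example_delay_ns):.1f} deg")

# A negative offset (advancing the clock) wraps to the equivalent positive phase.
print(f"-1.0 ns -> {delay_to_phase_deg(-1.0):.1f} deg")
```

A phase entered this way is a fixed number, so it does not track changes in buffer delay with voltage and temperature, which is the concern raised above.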

 

I look forward to your comments,

 

Thanks

 

JCV

1 Solution

Accepted Solutions
Registered: ‎01-23-2009

Basically your clocking structure is "illegal" (or at least not recommended).

 

The BUFR is a regional clock resource. The PLL is a global clock resource. There are no (or limited) dedicated connections between these two, and hence you are ending up in fabric routing for at least some of these paths. As a result, the compensation provided by your BUFR on the feedback path is not actually going to compensate for the BUFR in the data path (since there is fabric routing in both).

 

You have two choices:

 

  - Don't use the BUFR. If you are using the global clocking resources (the PLL) then you are not getting the advantage of the regional clocking resources anyway - just use BUFGs for your clocking - have the PLL generate both the high speed and low speed (/7) clocks on different outputs and use BUFGs for both. Use the phase shift of the PLL (or an MMCM instead, if you need it) to adjust the clock/data relationship. At this clock frequency you shouldn't have any trouble capturing the data; 100MHz DDR is a 5ns valid window which is more than enough for this kind of clocking

 

  - Don't use the PLL. The BUFIO/BUFR are designed specifically for use with source synchronous interfaces. To use them properly you should only be using them (i.e. no PLL). Since your loads are in two banks you need to use the BUFMR and you need to synchronize the BUFRs - both of these are described in the user guide. To manage the clock/data timing relationship you should use the IDELAY on inputs. At 5ns periods, the IDELAYs may not give enough delay (they can only provide 2.5ns), but you can choose between pushing the clock forward up to 2.5ns or pushing the data forward up to 2.5ns (minus some jitter), so you should be able to find a combination that works. Note: the BUFMR adds a fair bit of uncertainty...

 

Both of these should yield viable solutions, but the hybrid you have now probably won't...
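The IDELAY range mentioned in the second option can be estimated from the tap resolution. A minimal sketch, assuming a 7-series IDELAYE2 with 32 tap settings (0 to 31), a 200 MHz reference clock, and an average tap resolution of 1/(32 × 2 × F_REF); the helper names and the example delays are illustrative:

```python
IDELAY_TAP_COUNT = 32   # tap settings 0..31 (assumed IDELAYE2)

def tap_resolution_ps(refclk_mhz):
    """Average delay per tap in picoseconds: 1 / (32 * 2 * F_REF)."""
    return 1.0e6 / (32 * 2 * refclk_mhz)

def taps_for_delay(delay_ps, refclk_mhz=200.0):
    """Nearest tap setting for a wanted delay, or None if out of range."""
    tap = round(delay_ps / tap_resolution_ps(refclk_mhz))
    if 0 <= tap < IDELAY_TAP_COUNT:
        return tap
    return None

resolution = tap_resolution_ps(200.0)
max_delay_ps = (IDELAY_TAP_COUNT - 1) * resolution
print(f"tap resolution: {resolution} ps")
print(f"maximum delay : {max_delay_ps} ps")
print("taps for 1000 ps:", taps_for_delay(1000.0))
print("taps for 3000 ps:", taps_for_delay(3000.0))
```

Under these assumptions the maximum delay comes out a little under the 2.5 ns quoted above, which is why a delay beyond that range has to be handled by delaying the other signal instead.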

 

Avrum

4 Replies
Registered: ‎01-23-2009

As for constraints, look at this post on constraining source synchronous center aligned interfaces. It is primarily written for the BUFIO/BUFR solution - the solution for the PLL/MMCM may be different depending on the phase shift required.

 

Avrum

Registered: ‎05-04-2016

Avrum,

 

Thanks for your input. I had tried to manage my clocks without the PLLs before but could never meet timing. With the clocking structure I presented in this post and some trial-and-error adjustments of the PLL's phase I was able to meet timing.

 

I will try the second option you mentioned (without PLLs) and adjust the clock/data timing relationship with IDELAYs. However, I am concerned about this option since you commented that "the BUFMR adds a fair bit of uncertainty..." If I "tune" the IDELAY properly and achieve a timing analysis report without failing paths, should I still be concerned about this uncertainty?

 

I found some information about BUFMR and BUFR alignment in UG472 (v1.13, 2017), page 110. I think it is something manageable.

 

Thanks for your help

jcv65

Registered: ‎01-23-2009

If I "tune" the IDELAY properly and achieve a timing analysis report without failing paths, should I still be concerned about this uncertainty?

 

No. Assuming your constraints are correct, then if the tools indicate that the design meets timing then it will be fine. The tools understand all the uncertainty in the system, including that coming from the BUFMR.

 

What I was pointing out here is the fact that different solutions have different uncertainties. These uncertainties ultimately determine how small the data valid window can be and still achieve reliable capture. A capture mechanism in a single bank (and hence using a single BUFIO/BUFR) will work with a smaller data valid window than a mechanism that is split over multiple banks and needs the BUFMR. The same is true of IDELAYs - if you are lucky and your timing window works without IDELAYs, then the required window will be smaller (your margins will be larger). If you need to adjust timing with the IDELAYs, the required window gets larger.

 

A mechanism with global clocking will require a different size window - normally larger than what is required when using a single BUFIO/BUFR, but maybe not smaller than what is required when using the BUFMR - particularly if you need IDELAYs.

 

I am puzzled by your statement that you were having trouble with the PLL solution (and I don't know if you meant an actual PLL or an MMCM - the MMCM gives better control over phase shift for data capture) - with a unit interval (bit period) of over 7ns, you should easily be able to meet timing with any of these solutions (unless the clock/data skew from your ADC is really bad) - generally with a window of more than 3ns (or maybe 4ns), things become pretty easy.
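The figures in this reply can be put side by side with a short calculation. A minimal sketch, assuming the unit interval is half of the 100/7 ns DCLK period from the original post and treating the 3 ns and 4 ns windows mentioned above as targets; the result is how much combined clock/data skew and uncertainty could be absorbed while still leaving that window (the function name is illustrative):

```python
DCLK_PERIOD_NS = 100.0 / 7.0             # from the original post
UNIT_INTERVAL_NS = DCLK_PERIOD_NS / 2    # DDR bit period, just over 7 ns

def skew_budget_ns(target_window_ns, unit_interval_ns=UNIT_INTERVAL_NS):
    """Skew and uncertainty that can be lost while keeping the target window."""
    return unit_interval_ns - target_window_ns

print(f"unit interval: {UNIT_INTERVAL_NS:.3f} ns")
for target in (3.0, 4.0):
    budget = skew_budget_ns(target)
    print(f"target window {target:.1f} ns -> budget {budget:.3f} ns")
```

This is only bookkeeping; the actual required window for each clocking scheme comes from the timing analysis, not from this estimate.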

 

Avrum
