Scholar
Registered: ‎06-23-2014

Please help with this timing failure

I'm using Vivado 2018.1 and SystemVerilog.  The custom target board is loosely derived from the Kintex-7 KC705.

 

This timing failure is inside a 3rd party IP to which I do not have source code.  It is a Camera Link framegrabber IP.

 

I had been building my bitstreams successfully, and they have been running successfully, but then it was recently discovered that I failed to constrain the incoming Camera Link clock to 85MHz.  So I added that constraint, and as fate would have it, the project no longer meets timing. 

 

The worst timing path doesn't make good sense to me.  I am accustomed to what the timing looks like for a failing 5ns path coming from my 200MHz system clock.  This timing failure doesn't look comparable.  I also see an MMCM involved.  The 85MHz means an 11.764ns period (yes, I rounded down).  The timing failure features times like 17.646 and 18.486.  Those confuse me.  I've only seen non-5ns times in the system clock scenario whenever I had a bad (unaccounted for) clock crossing.  That experience, applied to these numbers, plus the MMCM I see in the 0.840ns requirement near the top of the report, all make me wonder about what's actually going on inside the IP.

 

Any advice on this would be greatly appreciated.  The timing summary is below:

CL Timing Failure 201809150929_top.jpg

CL Timing Failure 201809150929_middle.jpg

CL Timing Failure 201809150929_bottom.jpg

6 Replies
Guide
Registered: ‎01-23-2009

It appears the destination clock is rise at 0 and fall at 1.681ns for a clock rate around 297.5MHz - which would be 3.5x the 85MHz. It looks like this camera link is therefore generating a high speed serial clock at 3.5x the 85MHz for capturing the data. It is also presumably DDR, which makes the bit period 1.68ns.

 

Given this, the requirement on this path (assuming the input is constrained from the "correct" clock) would be 1.68ns, but it's not - it is 0.84ns, which is 1/2 of this bit period - that doesn't look right. But without

  - the constraints

  - the programming of the MMCM

I can't tell why.
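
The arithmetic behind these numbers can be checked with a few lines of plain Tcl (tclsh or the Vivado Tcl console). This is only a sketch: the variable names are made up for illustration, and the 3.5x DDR clocking scheme is the inference made above, not something confirmed by the IP documentation.

```tcl
# Clock arithmetic for an 85 MHz Camera Link word clock (illustrative variable names).
set f_word_mhz 85.0
set t_word_ns  [expr {1000.0 / $f_word_mhz}]  ;# word clock period
set t_fast_ns  [expr {$t_word_ns / 3.5}]      ;# inferred 3.5x serial clock period
set t_bit_ns   [expr {$t_fast_ns / 2.0}]      ;# DDR: one bit per half period
set t_half_ns  [expr {$t_bit_ns / 2.0}]       ;# half of a bit period

puts [format "word clock period : %.3f ns" $t_word_ns]
puts [format "3.5x clock period : %.3f ns (%.1f MHz)" $t_fast_ns [expr {1000.0 / $t_fast_ns}]]
puts [format "DDR bit period    : %.3f ns" $t_bit_ns]
puts [format "half a bit period : %.3f ns" $t_half_ns]
```

The last value comes out at 0.840ns, which matches the requirement shown in the report and is why that requirement looks like half a bit period rather than a full one.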

 

A static data capture of 1.68ns is barely possible with a Kintex-7, but it is very tight, and it depends on the speedgrade of the part, and it depends on the clocking structure of the capture interface.

 

The capture interface is "odd" - it includes both an IDELAY and an MMCM - this shouldn't be necessary - if you have an MMCM you can use the fine phase shift of the MMCM to generate the delay you need - this comes basically "for free" with the MMCM. Using the IDELAY instead costs more delay, uncertainty and has coarser adjustment. You also have an IDELAY on the data (again, only one of these three should be necessary).

 

So

  - show us the constraints - both from the IP's XDC and from your master XDC

     - it is possible that the IP's XDC already constrained the interface, and the new constraints you added are somehow incorrect

     - we need both the

        - create_clock and create_generated_clock commands

          - (there probably shouldn't be any create_generated_clock commands) 

        - the set_input_delay commands

          - from the timing report I am not sure that there are any set_input_delay commands

  - show us the attributes of the MMCM - we need to know what the MMCM is doing

     - a report_clocks might also help
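
As a rough sketch, the information requested above could be collected from the Vivado Tcl console with the synthesized or implemented design open. The commands are standard Vivado Tcl, but the output file name and the cell name pattern are assumptions and will likely need adjusting to the actual hierarchy.

```tcl
# Dump every constraint the tool has actually loaded (user XDC plus any IP XDC).
write_xdc -force all_constraints.xdc   ;# file name is arbitrary

# List all clocks, including the ones auto-derived at MMCM outputs.
report_clocks

# Find MMCM cells; the name pattern is an assumption, adjust it to your hierarchy.
set mmcms [get_cells -hierarchical -filter {REF_NAME =~ MMCME2_* && NAME =~ *rx_mmcm*}]
foreach m $mmcms {
    puts "MMCM: $m"
    # Print the attributes that define the output clocks.
    foreach p {CLKIN1_PERIOD DIVCLK_DIVIDE CLKFBOUT_MULT_F CLKOUT0_DIVIDE_F CLKOUT0_PHASE CLKOUT1_DIVIDE CLKOUT1_PHASE} {
        puts "  $p = [get_property $p $m]"
    }
}
```

Since these attributes are stored on the netlist cell, this should work even when the IP is delivered as a netlist without source.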

 

But all of this leads to the point that this interface is questionable from the static timing point of view. But this may be OK - are you sure that the camera link is doing static capture? If the interface is already doing dynamic capture (which may be the case at these speeds and with all these programmable delays) then static timing is irrelevant, and constraints may have been (correctly) left out for a reason...

 

Avrum

 

Scholar
Registered: ‎06-23-2014

@avrumw , thanks once again for your very valuable reply.

 

Many of your questions are really getting outside my comfort zone and understanding.  I will try to collect answers as well as more certainty.  (There, I finished writing the below.  My action items are to confirm that I got the full IP and that it came with no constraints.  Otherwise, @avrumw , please tell me of anything else specific that might help.)

 

Note that I inherited a zip of the IP which I believe is complete, but I need to confirm with associates.  I find no constraints in it at all.  The doc is very light weight.  In essence, all I got that I can recognize is a thin pdf doc, plus three files: two .vhd's and one .edn.  My project has included as source both one of the vhd's and the edn.  I don't know what an edn file is, but it looks to me like a 9MB text netlist.

 

The ONLY constraint I can figure associated with this logic is the one I just added:

create_clock -period 11.764 -name A_CL_BASE_XCLK_P -waveform {0.000 5.882} [get_ports A_CL_BASE_XCLK_P]

(Oh, yes, I did include that in the original post, but didn't provide all the associated info above, of course.)

 

The 3.5x clock sounds reasonable.  Camera Link has 7 data bits per clock, so at DDR it takes a 3.5x clock.  I've done a little unrelated de-serialization and that's exactly what I used.  I don't know how you picked out the rise/fall of 0/1.681ns.  Perhaps you might educate me slightly on that as well.  The closest I can come to figuring it out is the upper "Requirement" spec where the 17.xxx and 18.xxx differ by 0.84 and, OH!, it says 0.840 right there anyway.  I do notice the relatively HUGE clock path skew and uncertainty.

 

MY BIG QUESTION is that I don't understand what you mean by static capture vs dynamic capture.  I assume you do know Camera Link already.  For our current usage, this IP simply takes in a base camera link signal that has 4 serial lines (thus up to 28 bits of which only 7 are data, as you probably know), and every CL clock cycle maps those bits to an 80-bit wide word.  (The word is 80-bits wide in case one uses the other medium or full cables.)  Then, that 80-bit word comes out of an AXI-Stream port.  That's pretty much all there is to it.  So, again, I don't know what you mean by static capture vs dynamic capture.

 

You asked about create_clock.  Well, there it is further above.

 

Nobody I know of inserted any set_input_delay commands.  Do I need to?  Might that be a second step after getting it to meet timing at all?

 

I'm not sure I can access any attributes of the MMCM.  It's inside the IP where I can't see.

 

Here comes the report_clocks.  I see the 3.5x clock in there, as well as a 1.75x one.  I see embedded in the path name the word "ddr".  Already knew it has to be DDR.  Note that since the IP can do not just base, but also medium and full, I connected two unused base clock inputs to this guy's medium and full.  That's why you see {A|B|C}_CL_BASE_XCLK_P.

report_clocks
Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
------------------------------------------------------------------------------------
| Tool Version : Vivado v.2018.1 (win64) Build 2188600 Wed Apr  4 18:40:38 MDT 2018
| Date         : Sat Sep 15 23:32:21 2018
| Host         : Madison running 64-bit major release  (build 9200)
| Command      : report_clocks
| Design       : MyProject_Top
| Device       : 7k160t-ffg676
| Speed File   : -2  PRODUCTION 1.12 2017-02-17
------------------------------------------------------------------------------------

Clock Report


Attributes
  P: Propagated
  G: Generated
  A: Auto-derived
  R: Renamed
  V: Virtual
  I: Inverted
  S: Pin phase-shifted with Latency mode

Clock                                                                                       Period(ns)  Waveform(ns)    Attributes  Sources
base_mb_wrapper/base_mb_i/mdm_1/U0/Use_E2.BSCAN_I/Use_E2.BSCANE2_I/DRCK                     33.333      {0.000 16.667}  P           {base_mb_wrapper/base_mb_i/mdm_1/U0/Use_E2.BSCAN_I/Use_E2.BSCANE2_I/DRCK}
base_mb_wrapper/base_mb_i/mdm_1/U0/Use_E2.BSCAN_I/Use_E2.BSCANE2_I/UPDATE                   33.333      {0.000 16.667}  P           {base_mb_wrapper/base_mb_i/mdm_1/U0/Use_E2.BSCAN_I/Use_E2.BSCANE2_I/UPDATE}
txoutclk_x0y0                                                                               10.000      {0.000 5.000}   P           {xillybus_ins/pcie/pcie_k7_vivado/inst/inst/gt_top_i/pipe_wrapper_i/pipe_lane[0].gt_wrapper_i/gtx_channel.gtxe2_channel_i/TXOUTCLK}
FPGA_SYSCLK                                                                                 5.000       {0.000 2.500}   P           {FPGA_SYSCLK_P}
clkfbout_base_mb_clk_wiz_1_0_1                                                              5.000       {0.000 2.500}   P,G,A       {base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKFBOUT}
clk_out1_base_mb_clk_wiz_1_0_1                                                              10.000      {0.000 5.000}   P,G,A       {base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKOUT0}
clk_out2_base_mb_clk_wiz_1_0_1                                                              5.000       {0.000 2.500}   P,G,A       {base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKOUT1}
mmcm_fb                                                                                     10.000      {0.000 5.000}   P,G,A       {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKFBOUT}
clk_125mhz                                                                                  8.000       {0.000 4.000}   P,G,A       {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKOUT0}
clk_250mhz                                                                                  4.000       {0.000 2.000}   P,G,A       {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKOUT1}
userclk1                                                                                    4.000       {0.000 2.000}   P,G,A       {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKOUT2}
A_CL_BASE_XCLK_P                                                                            11.764      {0.000 5.882}   P           {A_CL_BASE_XCLK_P}
B_CL_BASE_XCLK_P                                                                            11.764      {0.000 5.882}   P           {B_CL_BASE_XCLK_P}
C_CL_BASE_XCLK_P                                                                            11.764      {0.000 5.882}   P           {C_CL_BASE_XCLK_P}
loop8.rx_mmcm_adv_inst_n_0                                                                  11.764      {0.000 5.882}   P,G,A       {cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKFBOUT}
loop8.rx_mmcm_adv_inst_n_4                                                                  3.361       {0.000 1.681}   P,G,A       {cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKOUT0}
loop8.rx_mmcm_adv_inst_n_6                                                                  6.722       {0.210 3.571}   P,G,A       {cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKOUT1}
sys_clk                                                                                     10.000      {0.000 2.500}   P           {PCIE_REFCLK_P}
dbg_hub/inst/BSCANID.u_xsdbm_id/SWITCH_N_EXT_BSCAN.bscan_inst/SERIES7_BSCAN.bscan_inst/TCK  33.000      {0.000 16.500}  P           {dbg_hub/inst/BSCANID.u_xsdbm_id/SWITCH_N_EXT_BSCAN.bscan_inst/SERIES7_BSCAN.bscan_inst/TCK}


====================================================
Generated Clocks
====================================================

Generated Clock     : clkfbout_base_mb_clk_wiz_1_0_1
Master Source       : base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKIN1
Master Clock        : FPGA_SYSCLK
Multiply By         : 1
Generated Sources   : {base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKFBOUT}

Generated Clock     : clk_out1_base_mb_clk_wiz_1_0_1
Master Source       : base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKIN1
Master Clock        : FPGA_SYSCLK
Edges               : {1 2 3}
Edge Shifts(ns)     : {0.000 2.500 5.000}
Generated Sources   : {base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKOUT0}

Generated Clock     : clk_out2_base_mb_clk_wiz_1_0_1
Master Source       : base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKIN1
Master Clock        : FPGA_SYSCLK
Multiply By         : 1
Generated Sources   : {base_mb_wrapper/base_mb_i/clk_wiz_1/inst/mmcm_adv_inst/CLKOUT1}

Generated Clock     : mmcm_fb
Master Source       : xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKIN1
Master Clock        : txoutclk_x0y0
Multiply By         : 1
Generated Sources   : {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKFBOUT}

Generated Clock     : clk_125mhz
Master Source       : xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKIN1
Master Clock        : txoutclk_x0y0
Edges               : {1 2 3}
Edge Shifts(ns)     : {0.000 -1.000 -2.000}
Generated Sources   : {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKOUT0}

Generated Clock     : clk_250mhz
Master Source       : xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKIN1
Master Clock        : txoutclk_x0y0
Edges               : {1 2 3}
Edge Shifts(ns)     : {0.000 -3.000 -6.000}
Generated Sources   : {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKOUT1}

Generated Clock     : userclk1
Master Source       : xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKIN1
Master Clock        : txoutclk_x0y0
Edges               : {1 2 3}
Edge Shifts(ns)     : {0.000 -3.000 -6.000}
Generated Sources   : {xillybus_ins/pipe_clock/pipe_clock/mmcm_i/CLKOUT2}

Generated Clock     : loop8.rx_mmcm_adv_inst_n_0
Master Source       : cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKIN1
Master Clock        : A_CL_BASE_XCLK_P
Multiply By         : 1
Generated Sources   : {cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKFBOUT}

Generated Clock     : loop8.rx_mmcm_adv_inst_n_4
Master Source       : cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKIN1
Master Clock        : A_CL_BASE_XCLK_P
Edges               : {1 2 3}
Edge Shifts(ns)     : {0.000 -4.201 -8.403}
Generated Sources   : {cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKOUT0}

Generated Clock     : loop8.rx_mmcm_adv_inst_n_6
Master Source       : cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKIN1
Master Clock        : A_CL_BASE_XCLK_P
Edges               : {1 2 3}
Edge Shifts(ns)     : {0.210 -2.311 -4.832}
Generated Sources   : {cam_1/cam_A_core/design_1_i/FC_AXI_CL_R_0/U0/FC_AXI_CL_R_v1_0_M01_AXIS_inst/top4x3_7to1_ddr_rx_inst/rx0/rx0/loop8.rx_mmcm_adv_inst/CLKOUT1}



====================================================
User Uncertainty
====================================================

From Clock  To Clock    Setup(ns)  Hold(ns)  Edges
clk_125mhz  clk_125mhz  0.100      0.100     rf -> rf
clk_250mhz  clk_250mhz  0.100      0.100     rf -> rf


====================================================
User Jitter
====================================================

Clock        Jitter(ns)
FPGA_SYSCLK  0.050

 

 

 

Guide
Registered: ‎01-23-2009

"The doc is very light weight."

 

This is really the biggest problem. The clocking structure used here is "odd", and I have no way of knowing if it is odd for a reason, or it is odd because it is poorly designed. From what I can extract from the information you have posted, I have no idea of how the system is supposed to work, so there is no way for me to determine if it is done correctly or not.

 

The only thing I can try and help you with is static vs. dynamic capture.

 

In static (normal) input capture, the upstream device provides a guaranteed phase relationship between the clock and the data. While normally the clock rate is the same as the data rate (or 1/2 the data rate if the data is DDR), it is possible to send a phase related clock that is a division of the bitrate clock - this is done in HDMI and I presume cameralink is related to HDMI (I don't specifically know cameralink).

 

A static capture is done using this phase relationship. Using this phase relationship an internal capture clock can be generated based on the incoming clock that has edges at the "right" places to capture the data - somewhere in the middle of each valid data bit. Once this is done, the paths involved are normal static timing paths

  - the incoming clock is defined with a create_clock

  - the incoming data is defined with set_input_delay commands with respect to the incoming clock

  - if the incoming clock goes through an MMCM (which would be required for this system) the tool automatically creates generated clocks

  - the static timing path is analyzed for setup and hold using this timing information

 

If the setup and hold checks pass with a static configuration (i.e. all configurable parameters for the timing like MMCM phase shift and any IDELAY tap values set to constants in the bitstream), then this interface is guaranteed to have clean capture of the data across all combinations of legal process, voltage and temperature (PVT). (Assuming all the constraints are completely correct and accurate representations of what will really happen on the board).
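
A minimal sketch of what such static constraints might look like in XDC. It shows the simple DDR case where the forwarded clock runs at half the bit rate; a 7:1 interface with a word-rate clock, like the one in this thread, would need more than this. Everything specific is a placeholder: the port names `rx_clk` and `rx_data[*]`, the 5ns period, and the delay values are all invented, and real values have to come from the transmitting device and the board.

```tcl
# Hypothetical forwarded clock: 200 MHz, 50% duty cycle.
create_clock -period 5.000 -name rx_clk -waveform {0.000 2.500} [get_ports rx_clk]

# Placeholder delays in ns: latest (-max) and earliest (-min) data arrival
# relative to the rising edge of the forwarded clock.
set_input_delay -clock rx_clk -max  0.400 [get_ports {rx_data[*]}]
set_input_delay -clock rx_clk -min -0.400 [get_ports {rx_data[*]}]

# DDR: the same window relative to the falling edge, added to the constraints above.
set_input_delay -clock rx_clk -max  0.400 [get_ports {rx_data[*]}] -clock_fall -add_delay
set_input_delay -clock rx_clk -min -0.400 [get_ports {rx_data[*]}] -clock_fall -add_delay
```

With constraints of this kind in place, the setup and hold analysis of the input paths appears in the normal timing reports.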

 

For a (fairly complex) example of constraints and analysis of an interface see this post on constraints for a source synchronous edge aligned double data rate interface.

 

In some cases, this is not possible

  - the phase relationship between the input clock and input data is not defined

  - the phase relationship between the input clock and input data has too much variability

  - ...

 

Ultimately, there may not be enough static window: if, after all variations due to PVT are taken into account, the known valid time of each bit of the incoming data with respect to the incoming clock is not "wide enough" (and you need about 1.75ns or more depending on clocking structure), then you cannot capture this statically; the variations are too big.

 

In this case, you may be able to use dynamic capture.

 

In dynamic capture, the FPGA does some dynamic adjustment of some of its programmable delays (usually under the control of a state machine) to inspect the incoming clock and/or data and adjust the programmable delays to dynamically find the valid window of the incoming data. This can be done using power-up training if the incoming data has a known data pattern (a defined training window), or using some other characteristic of the incoming clock or data. There are many ways of doing dynamic capture (dynamic calibration), depending on tons of different requirements

  - is power on training sufficient or must it be continuous or periodic

  - is it acceptable for the training to be disruptive (i.e. drop data during training) or must all data make it through (once the link is established)

  - is there some phase guarantee between the clock and data (even if it is not enough for static capture)

  - is there enough transition density in the data

  - is there some protocol that allows you to determine if data is properly received (i.e. CRC or parity or something else)

  - (and probably a whole bunch more)

 

Based on all these things, a dynamic capture mechanism may be able to be designed.

 

So it all boils down to the IP. How is it trying to capture the data?

 

If it is static capture then it must have correct clock and set_input_delay constraints. These have to be extracted from your system (mostly from the transmitting device).

 

If it is dynamic, then you really have only two choices

 a) you trust that the designer designed the dynamic capture mechanism properly and the interface will be robust or

 b) you don't

 

Without any documentation it is impossible for you to analyze if the dynamic capture mechanism is "reasonable".

 

Avrum

Scholar
Registered: ‎06-23-2014

@avrumw , first of all, I realized I've specified the CL clock at 50% duty cycle, when in fact it's 3/7 low and 4/7 high.  I'm rebuilding now and we'll see what happens.
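
For reference, a hedged sketch of what the corrected constraint might look like. It assumes the clock rises at time 0 and stays high for 4/7 of the 11.764ns period (11.764 * 4/7 is about 6.722ns); whether the rising edge is the right reference for time 0 depends on the camera and is an assumption here.

```tcl
# 85 MHz Camera Link clock, high for 4/7 of the period
# (assumed: rising edge at 0 ns, falling edge at 6.722 ns).
create_clock -period 11.764 -name A_CL_BASE_XCLK_P -waveform {0.000 6.722} [get_ports A_CL_BASE_XCLK_P]
```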


Otherwise, my personal experience with a similar deserialization project was that I used the DYNAMIC method you describe.  The Xilinx example for 1:7 deserialization does it this way, mentions HDMI as I recall, and is applicable to CL.  The IDELAY is controlled by a state machine that finds the eye.

 

In that project, however, I specified the word clock.  (Call it a word clock since it's not a bit clock.  7-bit words in this case.)

 

As I understood what you wrote, you mentioned previously that with dynamic configuration, it might be possible to not specify the clock constraint.  I don't see how this can be.  I think it *must* be specified (and correctly 3/7-4/7 LOL) at a frequency high enough to cover the Camera Link clock speeds to be supported.  No amount of dynamic training could get over the fact that the logic might be slower than the clock (including the 3.5x clock) is going.  

 

Did I misunderstand you before?  Am I correct that I *must* have a constraint given what I said above?

 

Thanks again,

Helmut

Guide
Registered: ‎01-23-2009

"it might be possible to not specify the clock constraint"

 

Sorry - I was imprecise. Clock constraints are always needed for all clocks. It is only the set_input_delay constraints that are irrelevant for dynamically captured interfaces.

 

However, it is worth pointing out that without the set_input_delay commands, the design will fail check_timing for these inputs - the tool specifically checks that all inputs have input constraints.

 

In order to keep the tool happy, you need to

  - add set_input_delay commands on the inputs (the values are irrelevant)

  - put a set_false_path -from [get_ports {<the input ports>}]

 

This tells the tool - that you have not forgotten about these ports, and that the timing is known to be false (due to the dynamic capture).
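
The two steps above can be sketched in XDC as follows. The data port name `A_CL_BASE_X_P[*]` is hypothetical (the thread does not give the real data port names), and the delay value of 0 is an arbitrary placeholder since the paths are declared false anyway.

```tcl
# In the XDC file.
# Placeholder value: it only exists so that check_timing sees an input constraint.
set_input_delay -clock A_CL_BASE_XCLK_P 0.000 [get_ports {A_CL_BASE_X_P[*]}]

# Declare the input paths false because capture is calibrated dynamically.
set_false_path -from [get_ports {A_CL_BASE_X_P[*]}]
```

```tcl
# In the Tcl console, after the constraints are loaded:
# look for inputs that are still reported as unconstrained.
check_timing -verbose
```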

 

Avrum

Scholar
Registered: ‎06-23-2014

I was just given an XDC that I didn't have before.

 

The project still doesn't meet timing.

 

There are a number of set_false_path's for which I had to adapt the path names.  What's a good way to confirm that I got that part correct?  I ran report_clocks and it showed the CL clock constraints, but not any false path info.  I'm sure the success or failure of these are in there somewhere, but I don't know how to find them...
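
One possible way to check this from the Vivado Tcl console, as a sketch. `report_clocks` only covers clocks, so exceptions have to be queried separately. The cell name pattern below is a placeholder taken from the hierarchy in the clock report; substitute the patterns actually used in the adapted set_false_path commands.

```tcl
# List the timing exceptions (false paths, multicycle paths, ...) the tool has accepted.
report_exceptions

# List exceptions that are fully overridden by other constraints.
report_exceptions -ignored

# Check that an adapted name pattern matches at least one object.
# Placeholder pattern; use the one from your set_false_path.
set objs [get_cells -quiet -hierarchical -filter {NAME =~ *top4x3_7to1_ddr_rx_inst*}]
puts "matched [llength $objs] cells"
```

A pattern that matches nothing also shows up as a warning in the log when the XDC is read, so the messages from the constraint-loading step are worth checking as well.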
