Observer kuki
Registered: ‎05-24-2018

Right way to clock gate with BUFGCE.

Hi,

 

Setup:

Vivado 2018.2

Ultrascale+ Architecture.

 

I have always had trouble understanding the right way to use BUFG primitives to clock gate part of a design, so I would really appreciate expert advice.

 

In this design I am gating a clock with a BUFGCE whose CE input is driven from a flip-flop (FF).

Both the FF and the BUFGCE are clocked from the same source. The routing delay between them is very small: 0.248 ns.

However, the tool reports a much larger clock delay from the same CLOCK_ROOT (X2Y2) to the FF (3.237 ns) than to the BUFGCE (0.426 ns), hence the timing violation.

 

Here is the path:

 

Max Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) :        -1.570ns  (required time - arrival time)
  Source:                 CDC_STEP[2].u_cdc_step/enable_r_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_3  {rise@0.000ns fall@0.833ns period=1.667ns})
  Destination:            CDC_STEP[2].u_cdc_step/BUFGCE_inst/CE
                            (rising edge-triggered cell BUFGCE clocked by clk_3  {rise@0.000ns fall@0.833ns period=1.667ns})
  Path Group:             clk_3
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            1.667ns  (clk_3 rise@1.667ns - clk_3 rise@0.000ns)
  Data Path Delay:        0.327ns  (logic 0.079ns (24.159%)  route 0.248ns (75.841%))
  Logic Levels:           0  
  Clock Path Skew:        -2.794ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.426ns = ( 2.093 - 1.667 ) 
    Source Clock Delay      (SCD):    3.237ns
    Clock Pessimism Removal (CPR):    0.017ns
  Clock Uncertainty:      0.057ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.088ns
    Phase Error              (PE):    0.000ns
  Clock Net Delay (Source):      3.237ns (routing 1.675ns, distribution 1.562ns)
  Clock Net Delay (Destination): 0.426ns (routing 1.522ns, distribution -1.096ns)

    Location             Delay type                Incr(ns)  Path(ns)    Partition      Netlist Resource(s)
  -------------------------------------------------------------------    ----------------------------------
                         (clock clk_3 rise edge)      0.000     0.000 r                 
    BUFGCTRL_X1Y42       BUFGCTRL                     0.000     0.000 r  static         BUFGCE[3].BUFGMUX_sel_1/O
    X2Y2 (CLOCK_ROOT)    net (fo=144244, routed)      3.237     3.237    static         CDC_STEP[2].u_cdc_step/clk_3
    SLICE_X112Y330       FDRE                                         r  static         CDC_STEP[2].u_cdc_step/enable_r_reg/C
  -------------------------------------------------------------------    ----------------------------------
    SLICE_X112Y330       FDRE (Prop_FFF_SLICEL_C_Q)
                                                      0.079     3.316 r  static         CDC_STEP[2].u_cdc_step/enable_r_reg/Q
                         net (fo=1, routed)           0.248     3.564    static         CDC_STEP[2].u_cdc_step/enable_r
    BUFGCE_X1Y132        BUFGCE                                       r  static         CDC_STEP[2].u_cdc_step/BUFGCE_inst/CE
  -------------------------------------------------------------------    ----------------------------------

                         (clock clk_3 rise edge)      1.667     1.667 r                 
    BUFGCTRL_X1Y42       BUFGCTRL                     0.000     1.667 r  static         BUFGCE[3].BUFGMUX_sel_1/O
    X2Y2 (CLOCK_ROOT)    net (fo=144244, routed)      0.426     2.093    static         CDC_STEP[2].u_cdc_step/clk_3
    BUFGCE_X1Y132        BUFGCE                                       r  static         CDC_STEP[2].u_cdc_step/BUFGCE_inst/I
                         clock pessimism              0.017     2.110                     
                         clock uncertainty           -0.057     2.053                     
    BUFGCE_X1Y132        BUFGCE (Setup_BUFCE_BUFGCE_I_CE)
                                                     -0.059     1.994    static           CDC_STEP[2].u_cdc_step/BUFGCE_inst
  -------------------------------------------------------------------
                         required time                          1.994                     
                         arrival time                          -3.564                     
  -------------------------------------------------------------------
                         slack                                 -1.570                     

 

Could someone please explain what is going on?

 

 

Cheers,

Arsen.

 

Historian
Registered: ‎01-23-2009

Re: Right way to clock gate with BUFGCE.

Take a look at this post on using the BUFGCE. It was written for the 7 series, so the BUFHCE part doesn't apply, but the mechanism of using the BUFGCE in UltraScale is the same; the "base" clock (which clocks the flip-flops that generate the CE) needs to be on one BUFG/BUFGCE and the divided clock needs to be on another. In your architecture, you have your BUFGCE in series (not in parallel) with the BUFGCTRL, which is what is causing the extra skew.
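
To make the effect of that skew concrete, here is a minimal Python sketch that re-derives the skew and slack of the first timing report from its own figures. The function and constant names are illustrative only (they are not Vivado terms beyond the quoted report fields), and the setup equation is the usual required-minus-arrival form that the report itself prints.

```python
# Figures copied from the first timing report in this thread (all in ns).
REQUIREMENT = 1.667   # clk_3 period
SCD = 3.237           # Source Clock Delay: CLOCK_ROOT to enable_r_reg/C
DCD = 0.426           # Destination Clock Delay: CLOCK_ROOT to BUFGCE_inst/I
CPR = 0.017           # Clock Pessimism Removal
UNCERTAINTY = 0.057   # Clock Uncertainty
SETUP = 0.059         # Setup_BUFCE_BUFGCE_I_CE
DATA_DELAY = 0.327    # 0.079 logic + 0.248 route


def clock_skew(dcd, scd, cpr):
    """Clock Path Skew as defined in the report: DCD - SCD + CPR."""
    return dcd - scd + cpr


def setup_slack(requirement, dcd, scd, cpr, uncertainty, setup, data_delay):
    """Setup slack = required time - arrival time."""
    arrival = scd + data_delay
    required = requirement + dcd + cpr - uncertainty - setup
    return required - arrival


if __name__ == "__main__":
    skew = clock_skew(DCD, SCD, CPR)
    slack = setup_slack(REQUIREMENT, DCD, SCD, CPR, UNCERTAINTY, SETUP, DATA_DELAY)
    print(f"skew  = {skew:.3f} ns")   # skew  = -2.794 ns
    print(f"slack = {slack:.3f} ns")  # slack = -1.570 ns
```

With the skew term set to zero, the same inputs would leave roughly 1.22 ns of positive slack, so the violation comes from the clock path mismatch rather than from the 0.327 ns data path.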

 

In UltraScale/UltraScale+ you also need to ensure that the output nets of the two BUFGCEs are placed in the same CLOCK_DELAY_GROUP so that their CLOCK_ROOTs are co-located. Take a look at this post on the CLOCK_DELAY_GROUP.

 

Also, there is something odd with your constraints - this timing path starts at the output of a BUFGCTRL - it looks like there is a primary clock defined there. This is "bad form" (or incorrect) - primary clocks (create_clock) should only be defined on input clock ports. The only exceptions are

  - the PS_CLK, which is internal from the programmable system in a Zynq and

  - the RXCLKOUT/TXCLKOUT of a GTx in earlier devices

    - I think (but am not sure) that UltraScale/UltraScale+ can derive generated clocks through GTx from the REFCLK inputs

 

Avrum

Moderator
Registered: ‎01-16-2013

Re: Right way to clock gate with BUFGCE.

Hi,

 

I suggest going through UG949 to understand the correct methodology for clock gating.

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_1/ug949-vivado-design-methodology.pdf

 

Thanks,
Yash

Observer kuki
Registered: ‎05-24-2018

Re: Right way to clock gate with BUFGCE.

Thanks @avrumw for your response and detailed explanation.

I was hoping you would reply to my issue.

 

Dear @yashp, thanks a lot for the link. I had definitely missed this doc. I went through it quickly, but will read it in much more detail.

 

Avrum,

I went through each of the steps you suggested, and here is what I have now:

 

Max Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) :        -2.603ns  (required time - arrival time)
  Source:                 CDC_STEP[3].u_cdc_step/clk_on_reg/C
                            (rising edge-triggered cell FDCE clocked by clk_4  {rise@0.000ns fall@0.833ns period=1.666ns})
  Destination:            BUFGCE_GATED[4].u_BUFGCE/CE
                            (rising edge-triggered cell BUFGCE clocked by clk_out3_clk_wiz_0  {rise@0.000ns fall@0.833ns period=1.666ns})
  Path Group:             clk_out3_clk_wiz_0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            1.666ns  (clk_out3_clk_wiz_0 rise@1.666ns - clk_4 rise@0.000ns)
  Data Path Delay:        0.327ns  (logic 0.079ns (24.159%)  route 0.248ns (75.841%))
  Logic Levels:           0  
  Clock Path Skew:        -3.833ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.711ns = ( 2.377 - 1.666 ) 
    Source Clock Delay      (SCD):    4.331ns
    Clock Pessimism Removal (CPR):    -0.213ns
  Clock Uncertainty:      0.050ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.071ns
    Phase Error              (PE):    0.000ns
  Clock Net Delay (Source):      2.666ns (routing 0.874ns, distribution 1.792ns)
  Clock Net Delay (Destination): 0.425ns (routing 0.004ns, distribution 0.421ns)

    Location             Delay type                Incr(ns)  Path(ns)    Partition      Netlist Resource(s)
  -------------------------------------------------------------------    ----------------------------------
                         (clock clk_4 rise edge)      0.000     0.000 r                 
    F32                                               0.000     0.000 r  static         clk_in1_p (IN)
                         net (fo=0)                   0.000     0.000    static         u_pll/inst/clkin1_ibufds/I
    HPIOBDIFFINBUF_X0Y204
                         DIFFINBUF (Prop_DIFFINBUF_HPIOBDIFFINBUF_DIFF_IN_P_O)
                                                      0.569     0.569 r  static         u_pll/inst/clkin1_ibufds/DIFFINBUF_INST/O
                         net (fo=1, routed)           0.050     0.619    static         u_pll/inst/clkin1_ibufds/OUT
    F32                  IBUFCTRL (Prop_IBUFCTRL_HPIOB_M_I_O)
                                                      0.000     0.619 r  static         u_pll/inst/clkin1_ibufds/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.384     1.003    static         u_pll/inst/clk_in1_clk_wiz_0
    MMCM_X0Y8            MMCME4_ADV (Prop_MMCM_CLKIN1_CLKOUT2)
                                                     -2.416    -1.413 r  static         u_pll/inst/mmcme4_adv_inst/CLKOUT2
                         net (fo=1, routed)           0.248    -1.165    static         u_pll/inst/clk_out3_clk_wiz_0
    BUFGCE_X0Y204        BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.028    -1.137 r  static         u_pll/inst/clkout3_buf/O
                         net (fo=8, routed)           1.106    -0.031    static         clk_600mhz
    BUFGCTRL_X0Y76       BUFGCTRL (Prop_BUFGCTRL_I1_O)
                                                      0.093     0.062 r  static         BUFG_MUX[4].BUFGMUX_sel_1_0/O
                         net (fo=2, routed)           1.575     1.637    static         clk_l[4]
    BUFGCE_X0Y134        BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.028     1.665 r  static         BUFGCE[4].u_BUFGCE/O
    X1Y4 (CLOCK_ROOT)    net (fo=2316, routed)        2.666     4.331    static         CDC_STEP[3].u_cdc_step/out[0]
    SLICE_X55Y571        FDCE                                         r  static         CDC_STEP[3].u_cdc_step/clk_on_reg/C
  -------------------------------------------------------------------    ----------------------------------
    SLICE_X55Y571        FDCE (Prop_DFF_SLICEL_C_Q)
                                                      0.079     4.410 r  static         CDC_STEP[3].u_cdc_step/clk_on_reg/Q
                         net (fo=1, routed)           0.248     4.658    static         clk_on[4]
    BUFGCE_X0Y236        BUFGCE                                       r  static         BUFGCE_GATED[4].u_BUFGCE/CE
  -------------------------------------------------------------------    ----------------------------------

                         (clock clk_out3_clk_wiz_0 rise edge)
                                                      1.666     1.666 r                 
    F32                                               0.000     1.666 r  static         clk_in1_p (IN)
                         net (fo=0)                   0.000     1.666    static         u_pll/inst/clkin1_ibufds/I
    HPIOBDIFFINBUF_X0Y204
                         DIFFINBUF (Prop_DIFFINBUF_HPIOBDIFFINBUF_DIFF_IN_P_O)
                                                      0.472     2.138 r  static         u_pll/inst/clkin1_ibufds/DIFFINBUF_INST/O
                         net (fo=1, routed)           0.040     2.178    static         u_pll/inst/clkin1_ibufds/OUT
    F32                  IBUFCTRL (Prop_IBUFCTRL_HPIOB_M_I_O)
                                                      0.000     2.178 r  static         u_pll/inst/clkin1_ibufds/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.333     2.511    static         u_pll/inst/clk_in1_clk_wiz_0
    MMCM_X0Y8            MMCME4_ADV (Prop_MMCM_CLKIN1_CLKOUT2)
                                                     -1.888     0.623 r  static         u_pll/inst/mmcme4_adv_inst/CLKOUT2
                         net (fo=1, routed)           0.217     0.840    static         u_pll/inst/clk_out3_clk_wiz_0
    BUFGCE_X0Y204        BUFGCE (Prop_BUFCE_BUFGCE_I_O)
                                                      0.024     0.864 r  static         u_pll/inst/clkout3_buf/O
                         net (fo=8, routed)           1.008     1.872    static         clk_600mhz
    BUFGCTRL_X0Y76       BUFGCTRL (Prop_BUFGCTRL_I1_O)
                                                      0.080     1.952 r  static         BUFG_MUX[4].BUFGMUX_sel_1_0/O
    X2Y9 (CLOCK_ROOT)    net (fo=2, routed)           0.425     2.377    static         clk_l[4]
    BUFGCE_X0Y236        BUFGCE                                       r  static         BUFGCE_GATED[4].u_BUFGCE/I
                         clock pessimism             -0.213     2.164                     
                         clock uncertainty           -0.050     2.114                     
    BUFGCE_X0Y236        BUFGCE (Setup_BUFCE_BUFGCE_I_CE)
                                                     -0.059     2.055    static           BUFGCE_GATED[4].u_BUFGCE
  -------------------------------------------------------------------
                         required time                          2.055                     
                         arrival time                          -4.658                     
  -------------------------------------------------------------------
                         slack                                 -2.603                     

 

1) First of all, I was indeed creating a clock on the output of the BUFGCTRL; that is now removed.

2) I have placed the BUFGCEs in parallel. In my design the MMCM generates 3 different clocks. Two cascaded BUFGCTRLs select one of them, and the selected clock drives two independent BUFGCEs. One of them (BUFGCE_X0Y134) has its CE tied to 1, and the other (BUFGCE_X0Y236) is controlled by a FF that is clocked from BUFGCE_X0Y134.

 

3) I have set up the following constraint for CLOCK_ROOT balancing:

set_property CLOCK_DELAY_GROUP clk_and_clk_gated_4 [get_nets {clk[4] clk_gated[4]}]

 

clk[4] is the net driven by BUFGCE_X0Y134, whereas clk_gated[4] is the net driven by BUFGCE_X0Y236.

 

 

I don't know whether it makes any difference, but for completeness: clk[4] drives logic in both the Partial Reconfiguration region and the Static region, whereas clk_gated[4] drives logic located only in the Static region.

 

Honestly, simple clock gating should not be this complicated. I must still be doing something fundamentally wrong, so I would appreciate your further comments.

 

Cheers,

Arsen.

Historian
Registered: ‎01-23-2009

Re: Right way to clock gate with BUFGCE.

First, you still have too many clock buffers in this design. Normally the path would go straight from the MMCM to the two parallel BUFGCEs. You say you are also doing clock MUXing - this will add one more BUFGCTRL, but you have a total of 3 in series - that is one more than you need.

 

But the real problem is the route from your BUFGCTRL to the two BUFGCEs - the routes are VERY unbalanced; one is 1.575ns and the other is 0.425ns. These are not timed at the same corner - [SLOW_MAX] vs. [SLOW_MIN] - but the difference between them is much larger than what one would expect from the different timing corners (they should be closer to 10% different). I don't know why this is - it is the same net going to two different BUFGCEs - maybe the BUFGCEs are too far apart? If so, you may consider placing a LOC constraint on the BUFGCTRL and the 2 BUFGCEs.
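
The imbalance can be put into numbers with a short Python sketch. It uses only delays quoted in the second report; the clk_600mhz net (1.106ns on the source path, 1.008ns on the destination path) is taken here as an assumed reference for how much one net differs between the two corners, and the function names are illustrative.

```python
# Delays of net clk_l[4] from BUFGCTRL_X0Y76 to each BUFGCE input (ns),
# as listed in the second timing report.
ROUTE_TO_UNGATED = 1.575  # to BUFGCE_X0Y134/I, source clock path
ROUTE_TO_GATED = 0.425    # to BUFGCE_X0Y236/I, destination clock path

# Reference: the clk_600mhz net appears in both clock paths of the same
# report, so its two delays show the corner-to-corner spread of one net.
REF_MAX = 1.106
REF_MIN = 1.008


def corner_ratio(max_delay, min_delay):
    """Ratio between the two corner delays of one and the same route."""
    return max_delay / min_delay


def unexplained_delay(slow_route, fast_route, ratio):
    """Part of the route mismatch that corner scaling does not account for."""
    return slow_route - fast_route * ratio


if __name__ == "__main__":
    ref = corner_ratio(REF_MAX, REF_MIN)
    actual = corner_ratio(ROUTE_TO_UNGATED, ROUTE_TO_GATED)
    extra = unexplained_delay(ROUTE_TO_UNGATED, ROUTE_TO_GATED, ref)
    print(f"reference corner ratio : {ref:.2f}x")
    print(f"clk_l[4] branch ratio  : {actual:.2f}x")
    print(f"delay not explained by corners: {extra:.3f} ns")
```

Under that assumption the reference net differs by roughly 10% between corners, while the two branches of clk_l[4] differ by a factor of about 3.7, leaving more than 1ns that has to come from routing or placement.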

 

But, ultimately, you are trying to do clock gating at 600MHz - that simply may not be possible. The clock tree of your ungated clock (BUFGCE_X0Y134) takes 2.666ns to get to the gating flip-flop - this is significantly larger than your period of 1.666ns. Generally clock gating is used to generate really slow clocks from medium speed clocks - it may be impossible to use a clock this fast as the source for clock gating.
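
As a rough check of that claim, the Python sketch below works out how much negative skew the CE path could tolerate at this period, using only figures from the second report. The names are illustrative, and the comparison against the 2.666ns tree delay is a simplification that ignores any delay that could be matched on the destination side.

```python
# Figures copied from the second timing report (all in ns).
REQUIREMENT = 1.666        # clock period
UNCERTAINTY = 0.050        # Clock Uncertainty
SETUP = 0.059              # Setup_BUFCE_BUFGCE_I_CE
DATA_DELAY = 0.327         # clk_on_reg/Q to BUFGCE CE
REPORTED_SKEW = -3.833     # Clock Path Skew
SOURCE_TREE_DELAY = 2.666  # BUFGCE_X0Y134/O to clk_on_reg/C


def min_allowed_skew(requirement, uncertainty, setup, data_delay):
    """Most negative skew that still gives zero setup slack."""
    return -(requirement - uncertainty - setup - data_delay)


def slack_from_skew(requirement, skew, uncertainty, setup, data_delay):
    """Setup slack written in terms of clock skew."""
    return requirement + skew - uncertainty - setup - data_delay


if __name__ == "__main__":
    budget = min_allowed_skew(REQUIREMENT, UNCERTAINTY, SETUP, DATA_DELAY)
    slack = slack_from_skew(REQUIREMENT, REPORTED_SKEW, UNCERTAINTY, SETUP, DATA_DELAY)
    print(f"skew budget   = {budget:.3f} ns")  # skew budget   = -1.230 ns
    print(f"reported slack = {slack:.3f} ns")  # reported slack = -2.603 ns
    print(f"tree delay exceeds budget by {SOURCE_TREE_DELAY + budget:.3f} ns")
```

The path can absorb about 1.23ns of negative skew, but the ungated clock tree alone puts the gating flip-flop 2.666ns behind its own BUFGCE output, so balancing the upstream buffers would not by itself close timing at this frequency.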

 

Why don't you just generate the slower clock in the MMCM? If you need it to be MUXed, then generate both sets of clocks you need and MUX them both...

 

Avrum
