Intra-clock setup violation: how to fix it?


I am working on the timing issues of my design, and there are several intra-clock setup violations (among others). Below is one of the problematic paths.

The violating paths connect a lookup table to DSP48 blocks. There are 34 blocks spread throughout the design that are fed by the table, and I think this may be why some paths fail. Although these paths are clocked by an 80 MHz clock, their data stays stable for several clock periods before the DSP blocks actually process it.

So, my questions are:

1- Is there a straightforward way to fix these paths?

2- If not, is there a way to add a constraint that relaxes the requirement of these paths while keeping the clock rate?

 

One crazy idea that occurred to me was duplicating the lookup table logic (similar to what is done with flip-flops driving high-fanout nets). Is it possible to do this transparently, i.e., without explicitly adding the copies?

 

 

Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Tool Version : Vivado v.2018.3 (win64) Build 2405991 Thu Dec  6 23:38:27 MST 2018
| Date         : Wed Mar 13 20:47:18 2019
| Host         : TIMPEL-PD-0273 running 64-bit major release  (build 9200)
| Command      : report_timing -from [get_pins SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK] -to [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/A[16]}] -delay_type min_max -max_paths 10 -sort_by group -input_pins -routable_nets -file yy.txt
| Design       : SAR_ZYNQ_wrapper
| Device       : 7z020-clg484
| Speed File   : -1  PRODUCTION 1.11 2014-09-11
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Timing Report

Slack (VIOLATED) :        -0.421ns  (required time - arrival time)
  Source:                 SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK
                            (rising edge-triggered cell RAMB18E1 clocked by clk_80M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@6.250ns period=12.500ns})
  Destination:            SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/A[16]
                            (rising edge-triggered cell DSP48E1 clocked by clk_80M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@6.250ns period=12.500ns})
  Path Group:             clk_80M0_SAR_ZYNQ_clk_wiz_0_0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            12.500ns  (clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise@12.500ns - clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise@0.000ns)
  Data Path Delay:        12.226ns  (logic 2.454ns (20.072%)  route 9.772ns (79.928%))
  Logic Levels:           0  
  Clock Path Skew:        -0.240ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    4.525ns = ( 17.025 - 12.500 ) 
    Source Clock Delay      (SCD):    5.032ns
    Clock Pessimism Removal (CPR):    0.266ns
  Clock Uncertainty:      0.093ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.172ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise edge)
                                                      0.000     0.000 r  
    M19                                               0.000     0.000 r  CLK_SYS_P (IN)
                         net (fo=0)                   0.000     0.000    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                                                               r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/I
    M19                  IBUFDS (Prop_ibufds_I_O)     0.905     0.905 r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           2.205     3.110    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17                                                    r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/I
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.102     3.212 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.806     5.018    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0                                                   r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKIN1
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT1)
                                                     -3.793     1.225 r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT1
                         net (fo=1, routed)           1.889     3.114    SAR_ZYNQ_i/clk_wiz_0/inst/clk_80M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1                                                     r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/I
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.101     3.215 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/O
                         net (fo=3868, routed)        1.816     5.032    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/Clk
    RAMB18_X5Y31         RAMB18E1                                     r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK
  -------------------------------------------------------------------    -------------------
    RAMB18_X5Y31         RAMB18E1 (Prop_ramb18e1_CLKARDCLK_DOPADOP[0])
                                                      2.454     7.486 r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/DOPADOP[0]
                         net (fo=32, routed)          9.772    17.258    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/A[16]
    DSP48_X2Y32          DSP48E1                                      r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/A[16]
  -------------------------------------------------------------------    -------------------

                         (clock clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise edge)
                                                     12.500    12.500 r  
    M19                                               0.000    12.500 r  CLK_SYS_P (IN)
                         net (fo=0)                   0.000    12.500    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                                                               r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/I
    M19                  IBUFDS (Prop_ibufds_I_O)     0.862    13.362 r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           2.006    15.368    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17                                                    r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/I
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.092    15.460 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.612    17.073    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0                                                   r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKIN1
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT1)
                                                     -3.425    13.647 r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT1
                         net (fo=1, routed)           1.725    15.372    SAR_ZYNQ_i/clk_wiz_0/inst/clk_80M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1                                                     r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/I
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.091    15.463 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/O
                         net (fo=3868, routed)        1.562    17.025    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/Clk
    DSP48_X2Y32          DSP48E1                                      r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/CLK
                         clock pessimism              0.266    17.292    
                         clock uncertainty           -0.093    17.199    
    DSP48_X2Y32          DSP48E1 (Setup_dsp48e1_CLK_A[16])
                                                     -0.362    16.837    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg
  -------------------------------------------------------------------
                         required time                         16.837    
                         arrival time                         -17.258    
  -------------------------------------------------------------------
                         slack                                 -0.421    
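The setup slack above is plain bookkeeping, and re-adding the printed numbers makes it easier to see where the time goes: the route alone (9.772 ns, about 80% of the data path) is larger than any other term. The sum below only reuses values from the report (the tool rounds each line, so the last digit can differ by 1 ps). The last line is a hypothetical: it assumes the same placement is kept and a setup multiplier of 2 is applied, which is what question 2 asks about.

$$
\begin{aligned}
t_{\mathrm{arr}} &= \underbrace{5.032}_{\mathrm{SCD}} + \underbrace{2.454}_{\mathrm{RAMB18\ clk\to out}} + \underbrace{9.772}_{\mathrm{route}} = 17.258\ \mathrm{ns}\\
t_{\mathrm{req}} &= \underbrace{12.500}_{T} + \underbrace{4.525}_{\mathrm{DCD}} + \underbrace{0.266}_{\mathrm{CPR}} - \underbrace{0.093}_{\mathrm{uncertainty}} - \underbrace{0.362}_{\mathrm{DSP48\ setup}} \approx 16.837\ \mathrm{ns}\\
\mathrm{slack} &= t_{\mathrm{req}} - t_{\mathrm{arr}} = 16.837 - 17.258 = -0.421\ \mathrm{ns}\\
\mathrm{slack}_{2T} &\approx (16.837 + 12.500) - 17.258 = +12.079\ \mathrm{ns}
\end{aligned}
$$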




Slack (MET) :             4.633ns  (arrival time - required time)
  Source:                 SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK
                            (rising edge-triggered cell RAMB18E1 clocked by clk_80M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@6.250ns period=12.500ns})
  Destination:            SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/A[16]
                            (rising edge-triggered cell DSP48E1 clocked by clk_80M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@6.250ns period=12.500ns})
  Path Group:             clk_80M0_SAR_ZYNQ_clk_wiz_0_0
  Path Type:              Hold (Min at Fast Process Corner)
  Requirement:            0.000ns  (clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise@0.000ns - clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise@0.000ns)
  Data Path Delay:        4.959ns  (logic 0.585ns (11.797%)  route 4.374ns (88.203%))
  Logic Levels:           0  
  Clock Path Skew:        0.260ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    2.039ns
    Source Clock Delay      (SCD):    1.679ns
    Clock Pessimism Removal (CPR):    0.099ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise edge)
                                                      0.000     0.000 r  
    M19                                               0.000     0.000 r  CLK_SYS_P (IN)
                         net (fo=0)                   0.000     0.000    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                                                               r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/I
    M19                  IBUFDS (Prop_ibufds_I_O)     0.334     0.334 r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           0.674     1.008    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17                                                    r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/I
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.027     1.035 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           0.597     1.631    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0                                                   r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKIN1
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT1)
                                                     -1.150     0.482 r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT1
                         net (fo=1, routed)           0.529     1.011    SAR_ZYNQ_i/clk_wiz_0/inst/clk_80M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1                                                     r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/I
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.026     1.037 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/O
                         net (fo=3868, routed)        0.642     1.679    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/Clk
    RAMB18_X5Y31         RAMB18E1                                     r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK
  -------------------------------------------------------------------    -------------------
    RAMB18_X5Y31         RAMB18E1 (Prop_ramb18e1_CLKARDCLK_DOPADOP[0])
                                                      0.585     2.264 r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/DOPADOP[0]
                         net (fo=32, routed)          4.374     6.638    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/A[16]
    DSP48_X2Y32          DSP48E1                                      r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/A[16]
  -------------------------------------------------------------------    -------------------

                         (clock clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise edge)
                                                      0.000     0.000 r  
    M19                                               0.000     0.000 r  CLK_SYS_P (IN)
                         net (fo=0)                   0.000     0.000    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                                                               r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/I
    M19                  IBUFDS (Prop_ibufds_I_O)     0.368     0.368 r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           0.731     1.099    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17                                                    r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/I
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.030     1.129 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           0.864     1.993    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0                                                   r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKIN1
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT1)
                                                     -1.467     0.526 r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT1
                         net (fo=1, routed)           0.576     1.102    SAR_ZYNQ_i/clk_wiz_0/inst/clk_80M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1                                                     r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/I
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.029     1.131 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/O
                         net (fo=3868, routed)        0.908     2.039    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/Clk
    DSP48_X2Y32          DSP48E1                                      r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg/CLK
                         clock pessimism             -0.099     1.939    
    DSP48_X2Y32          DSP48E1 (Hold_dsp48e1_CLK_A[16])
                                                      0.066     2.005    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[6].demod_sin_inst/demod_mac_inst/SumReg_reg
  -------------------------------------------------------------------
                         required time                         -2.005    
                         arrival time                           6.638    
  -------------------------------------------------------------------
                         slack                                  4.633    

 

Accepted Solution

Re: Intra-clock setup violation: how to fix it?


Hi Elder,

     Though the current version is pipelined (see the code in my previous post) and I have checked that it works in simulation, the previous version was not (so it was fully combinatorial). The slack was similar or equal; P&R was similar, apparently.
In the original timing path, which started at your lookup table and ended at the output of a DSP48, most of the delay is inside the DSP48. So, to get more slack in this path, try NOT using the DONT_TOUCH attribute on the pipeline register that you placed in this path. Without the DONT_TOUCH attribute, the DSP48 will pull in the pipeline register to pipeline itself (and the delay will be redistributed more evenly on each side of the register).

If you let it, the DSP48 will pull in 4 registers to pipeline itself: two registers on its output port and one or two registers on each input port. It is the registers on the output port that help the most with DSP48 pipelining (and with balancing slack, as you say). So, another thing to try is placing a pipeline register on the output of the DSP48 as shown below. Again, do NOT use the DONT_TOUCH attribute on this pipeline register, because we want the DSP48 to pull in the register.

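-- P1P,P2P hold the DSP48 products; the other signals are declared as in my
-- earlier example, but this time WITHOUT the DONT_TOUCH attributes
signal P1P,P2P : unsigned(5 downto 0);
attribute USE_DSP : string;
attribute USE_DSP of P1P,P2P : signal is "YES";

PR1: process(CLK1)
begin
   if rising_edge(CLK1) then 
      LTP1 <= LT;   --pipeline from lookup table, LT, to each DSP48
      LTP2 <= LT;
      BP1 <= B1;    --pipeline from the other inputs, B1 and B2, to each DSP48
      BP2 <= B2;
      P1P <= LTP1 * BP1;  --P1P is calculated using DSP48
      P1 <= P1P;          --P1 is a pipeline register on the output of the DSP48
      P2P <= LTP2 * BP2;
      P2 <= P2P;
   end if; 
end process PR1;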

 

     I am trying to figure out what I did wrong, as I expected a larger slack since I supposedly set the constraint to two cycles.
You now have three methods of solving the timing analysis problem between your lookup table and the DSP48s: 1) pipelining in HDL, 2) pipelining with the help of Vivado post-place physical optimization, and 3) multicycle constraints. I argue that “pipelining in HDL” is best because it is easy to understand, easy to implement, and very portable. I appreciate that you are trying to understand the multicycle constraints. However, unless you use these constraints often (or you are Avrum :-) ), it is just too easy to make a mistake with them. If you are really interested in making the multicycle constraints work, then I can study them and get back to you. However, I strongly suggest that you keep things simple and use “pipelining in HDL” for this problem. Also, keep in mind that our current “pipelining in HDL” can easily be improved by adding more registers to the pipeline.
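Mark's last point (adding more registers to the pipeline) can be sketched as a generic delay line. This is a hypothetical helper: the entity name pipe_delay and the generics WIDTH and STAGES are illustrative, and it assumes the extra latency is acceptable. The SRL_STYLE attribute asks Vivado synthesis to keep the stages as ordinary flip-flops instead of packing them into a shift-register LUT, so the placer stays free to spread them along the long route. Any control signal that must stay aligned with the data (for example EnableSum/ResetSum in the MACC code) would need the same number of stages.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical N-stage pipeline: Dout = Din delayed by STAGES clocks.
entity pipe_delay is
  generic (
    WIDTH  : positive := 18;  -- assumed data width
    STAGES : positive := 2    -- number of register stages (latency in clocks)
  );
  port (
    Clk  : in  std_logic;
    Din  : in  std_logic_vector(WIDTH-1 downto 0);
    Dout : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity pipe_delay;

architecture rtl of pipe_delay is
  type pipe_t is array (1 to STAGES) of std_logic_vector(WIDTH-1 downto 0);
  signal pipe : pipe_t := (others => (others => '0'));
  -- Keep the stages as flip-flops (not SRLs) so they can be placed apart.
  attribute SRL_STYLE : string;
  attribute SRL_STYLE of pipe : signal is "register";
begin
  process (Clk)
  begin
    if rising_edge(Clk) then
      pipe(1) <= Din;
      for i in 2 to STAGES loop
        pipe(i) <= pipe(i-1);  -- shift one stage per clock
      end loop;
    end if;
  end process;
  Dout <= pipe(STAGES);
end architecture rtl;
```

Instantiating it once on the table output and once (with the same STAGES) on each qualifying control signal keeps the two in step.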

Cheers,
Mark

Xilinx Employee

Re: Intra-clock setup violation: how to fix it?


This is a path from Block RAM to DSP. The clk-to-out logic delay through BRAM is very large.

    RAMB18_X5Y31         RAMB18E1 (Prop_ramb18e1_CLKARDCLK_DOPADOP[0])
                                                      2.454     7.486 r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/DOPADOP[0]

Is the Block RAM inferred or generated from IP? If latency is not a problem, you may try adding a pipeline register at the BRAM output. That will substantially reduce the logic delay.
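For an inferred table, the suggestion can be sketched as below. This is a minimal, hypothetical example: it assumes (as the name SinData suggests) a sine lookup table, and the entity name sin_rom_reg, the generics, and the table contents are illustrative. The read is registered twice: the first register is the block RAM read itself, and the second is the extra pipeline stage, which synthesis can usually pack into the RAMB18's own output register (DOA_REG/DOB_REG), whose clock-to-out delay is much shorter than that of the unregistered read.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

-- Hypothetical sine table with a registered read plus one output register.
entity sin_rom_reg is
  generic (
    ADDR_W : natural := 10;  -- assumed table depth: 2**ADDR_W entries
    DATA_W : natural := 18   -- assumed sample width
  );
  port (
    Clk  : in  std_logic;
    Addr : in  unsigned(ADDR_W-1 downto 0);
    Dout : out signed(DATA_W-1 downto 0)  -- valid 2 clocks after Addr
  );
end entity sin_rom_reg;

architecture rtl of sin_rom_reg is
  type rom_t is array (0 to 2**ADDR_W-1) of signed(DATA_W-1 downto 0);

  -- Fill the table with one period of a full-scale sine wave.
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in r'range loop
      r(i) := to_signed(integer(round(real(2**(DATA_W-1) - 1) *
                sin(MATH_2_PI * real(i) / real(2**ADDR_W)))), DATA_W);
    end loop;
    return r;
  end function init_rom;

  constant ROM : rom_t := init_rom;

  signal rd_data : signed(DATA_W-1 downto 0);  -- synchronous BRAM read
begin
  process (Clk)
  begin
    if rising_edge(Clk) then
      rd_data <= ROM(to_integer(Addr));  -- stage 1: the block RAM read itself
      Dout    <= rd_data;                -- stage 2: candidate for the BRAM output register
    end if;
  end process;
end architecture rtl;
```

The cost is one extra clock of latency, so anything that must stay aligned with the table output needs the same extra delay.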


Re: Intra-clock setup violation: how to fix it?


@eldercosta 

     ...these particular paths stay stable for several clock periods 
It sounds like you’ve got clock cycles to burn. So, pipelining one or both of the inputs to your DSP48s should help with passing timing analysis. Here’s some example VHDL that does this for two DSP48s.

signal LT : unsigned(2 downto 0); --lookup table of values
signal B1,B2 : unsigned(2 downto 0); --other input to each DSP48
signal LTP1,LTP2,BP1,BP2 : unsigned(2 downto 0); --pipeline registers for LT, B1, B2
signal P1,P2 : unsigned(5 downto 0); --output from each DSP48

-- These attributes prevent the DSP48s from pulling in the pipeline registers
attribute DONT_TOUCH : string;
attribute DONT_TOUCH of LTP1,LTP2,BP1,BP2: signal is "TRUE"; 
-- These attributes force multiplications for P1 and P2 to be done in a DSP48
attribute USE_DSP : string;
attribute USE_DSP of P1,P2 : signal is "YES";

PR1: process(CLK1)
begin
   if rising_edge(CLK1) then 
      LTP1 <= LT; --pipeline from lookup table, LT, to each DSP48
      LTP2 <= LT;
      BP1 <= B1; --pipeline from the other inputs, B1 and B2, to each DSP48
      BP2 <= B2;
      P1 <= LTP1 * BP1; --P1 is calculated using DSP48
      P2 <= LTP2 * BP2;
   end if; 
end process PR1;


I’ve placed the DONT_TOUCH attribute on the pipeline registers to prevent the DSP48s from “pulling in” these registers and ruining the pipeline effect we are trying to achieve.  I can talk more about this “pulling-in” if you want.

Cheers,
Mark

Oops - looks like Grace and I were answering you at the same time.


Re: Intra-clock setup violation: how to fix it?


Hello Mark (markg@prosensing.com) and Grace (@graces),

 

First of all, thank you both for your quick responses. Grace, the RAM is being inferred.

 

I did think of pipelining after I posted but I was thinking of adding flip-flops between the table and the DSPs. More on that below.

 

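I tried different combinations of your suggestions:

1- Pipeline register on the BRAM side, per Grace's suggestion

2- Pipeline register with DONT_TOUCH on the RAM side, per Mark's suggestion

3- Pipeline register with DONT_TOUCH on the MACC side, per Mark's suggestion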

 

#1 passed with a very small worst-case slack (0.16 ns, IIRC). It used the BRAM output register.

#2 passed with an even smaller worst-case slack (0.07 ns, IIRC). It used fabric flip-flops placed physically near the BRAM.

#3 failed with a slack nearly identical to that of the original circuit. The flip-flops were placed physically near the DSPs, and I think that is why it failed.

For reference, here is the MACC code. Data_B_reg is 20 bits wide and it is the signal I applied DONT_TOUCH to. FWIW Data_A_reg is 18 bits wide.

process (Clk)
begin
	if rising_edge(Clk) then
		Data_A_reg <= Data_A;
		Data_B_reg <= Data_B;
		ProdX <= resize (Data_A_reg * Data_B_reg, ProdX'length);
		if ResetSum = '1' then
			SumReg <= (others => '0');
		elsif EnableSum = '1' then
			SumReg <= ProdX + SumReg;
		end if;
	end if;
end process;

I assumed that in either case P&R would place the flip-flop midway along the path. I also assumed it would replicate flip-flops and place them so that timing closure would be easier (assuming also that clock path delays are shorter than data path delays). I am under the impression that Vivado's P&R is not as smart (or tries not to be) as XST; I used the latter with Spartan-3 long ago, and I think it replicated flip-flops as needed, by default.

 

I also tried to set a multicycle path for these paths:

set_multicycle_path -setup -rise_from [get_pins SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK] \
    -rise_to [get_cells -hierarchical -filter { NAME =~ "*Data_B_reg_reg*" && NAME =~ "*demod_sin_inst*" }] 2
set_multicycle_path -hold -rise_from [get_pins SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKARDCLK] \
    -rise_to [get_cells -hierarchical -filter { NAME =~ "*Data_B_reg_reg*" && NAME =~ "*demod_sin_inst*" }] 1
set_multicycle_path -setup -rise_from [get_pins SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/CosData_reg/CLKARDCLK] \
    -rise_to [get_cells -hierarchical -filter { NAME =~ "*Data_B_reg_reg*" && NAME =~ "*demod_cos_inst*" }] 2
set_multicycle_path -hold -rise_from [get_pins SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/CosData_reg/CLKARDCLK] \
    -rise_to [get_cells -hierarchical -filter { NAME =~ "*Data_B_reg_reg*" && NAME =~ "*demod_cos_inst*" }] 1
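For reference, the setup/hold pair above relies on the following relationship, with T the 12.5 ns period and N_s, N_h the setup and hold multipliers. The -setup 2 moves the setup check out by one period; the -hold 1 then moves the hold check back to the original launch edge. Without it, the hold check would sit at (2 - 1 - 0) x 12.5 = 12.5 ns, and short paths like the min-delay path in the hold report above would fail hold.

$$
\begin{aligned}
t_{\mathrm{setup\ edge}} &= N_s\,T\\
t_{\mathrm{hold\ edge}} &= (N_s - 1 - N_h)\,T\\
N_s = 2,\ N_h = 1,\ T = 12.5\ \mathrm{ns} &\;\Rightarrow\; t_{\mathrm{setup\ edge}} = 25\ \mathrm{ns},\quad t_{\mathrm{hold\ edge}} = 0\ \mathrm{ns}
\end{aligned}
$$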

 

It worked for these paths (they do not appear in the timing summary). However, I expected a much larger slack than 1.45 ns (perhaps I misused the tool or misinterpreted the results; see below). I could live with the exceptions set as above, but I would rather fix these paths without them and with a reasonable margin, if possible. I have other timing violations to work on, and I fear these paths will come back to haunt me after I fix the others if their slack is too low.

 

-------------------------------------------------------------------------------------------------------------------------------------------
| Tool Version : Vivado v.2018.3 (win64) Build 2405991 Thu Dec  6 23:38:27 MST 2018
| Date         : Thu Mar 14 13:11:15 2019
| Host         : TIMPEL-PD-0273 running 64-bit major release  (build 9200)
| Command      : report_timing -to [get_cells -hierarchical -filter { NAME =~  "*Data_B_reg_reg*" && NAME =~  "*demod_sin_inst*" }] -setup
| Design       : SAR_ZYNQ_wrapper
| Device       : 7z020-clg484
| Speed File   : -1  PRODUCTION 1.11 2014-09-11
-------------------------------------------------------------------------------------------------------------------------------------------

Timing Report

Slack (MET) :             1.453ns  (required time - arrival time)
  Source:                 SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKBWRCLK
                            (rising edge-triggered cell RAMB18E1 clocked by clk_80M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@6.250ns period=12.500ns})
  Destination:            SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[18].demod_sin_inst/demod_mac_inst/Data_B_reg_reg[19]/D
                            (rising edge-triggered cell FDRE clocked by clk_80M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@6.250ns period=12.500ns})
  Path Group:             clk_80M0_SAR_ZYNQ_clk_wiz_0_0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            12.500ns  (clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise@12.500ns - clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise@0.000ns)
  Data Path Delay:        10.703ns  (logic 2.454ns (22.929%)  route 8.249ns (77.071%))
  Logic Levels:           0  
  Clock Path Skew:        -0.171ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    4.575ns = ( 17.075 - 12.500 ) 
    Source Clock Delay      (SCD):    5.012ns
    Clock Pessimism Removal (CPR):    0.266ns
  Clock Uncertainty:      0.093ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.172ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise edge)
                                                      0.000     0.000 r  
    M19                                               0.000     0.000 r  CLK_SYS_P (IN)
                         net (fo=0)                   0.000     0.000    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                  IBUFDS (Prop_ibufds_I_O)     0.905     0.905 r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           2.205     3.110    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.102     3.212 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.806     5.018    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT1)
                                                     -3.793     1.225 r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT1
                         net (fo=1, routed)           1.889     3.114    SAR_ZYNQ_i/clk_wiz_0/inst/clk_80M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.101     3.215 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/O
                         net (fo=5143, routed)        1.796     5.012    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/Clk
    RAMB18_X0Y1          RAMB18E1                                     r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/CLKBWRCLK
  -------------------------------------------------------------------    -------------------
    RAMB18_X0Y1          RAMB18E1 (Prop_ramb18e1_CLKBWRCLK_DOBDO[1])
                                                      2.454     7.466 r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/SinData_reg/DOBDO[1]
                         net (fo=32, routed)          8.249    15.714    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[18].demod_sin_inst/demod_mac_inst/D[19]
    SLICE_X103Y30        FDRE                                         r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[18].demod_sin_inst/demod_mac_inst/Data_B_reg_reg[19]/D
  -------------------------------------------------------------------    -------------------

                         (clock clk_80M0_SAR_ZYNQ_clk_wiz_0_0 rise edge)
                                                     12.500    12.500 r  
    M19                                               0.000    12.500 r  CLK_SYS_P (IN)
                         net (fo=0)                   0.000    12.500    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                  IBUFDS (Prop_ibufds_I_O)     0.862    13.362 r  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           2.006    15.368    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.092    15.460 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.612    17.073    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT1)
                                                     -3.425    13.647 r  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT1
                         net (fo=1, routed)           1.725    15.372    SAR_ZYNQ_i/clk_wiz_0/inst/clk_80M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.091    15.463 r  SAR_ZYNQ_i/clk_wiz_0/inst/clkout2_buf/O
                         net (fo=5143, routed)        1.611    17.075    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[18].demod_sin_inst/demod_mac_inst/Clk
    SLICE_X103Y30        FDRE                                         r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[18].demod_sin_inst/demod_mac_inst/Data_B_reg_reg[19]/C
                         clock pessimism              0.266    17.341    
                         clock uncertainty           -0.093    17.248    
    SLICE_X103Y30        FDRE (Setup_fdre_C_D)       -0.081    17.167    SAR_ZYNQ_i/sar_zynq_IP_0/U0/demod_inst/GEN_X[18].demod_sin_inst/demod_mac_inst/Data_B_reg_reg[19]
  -------------------------------------------------------------------
                         required time                         17.167    
                         arrival time                         -15.714    
  -------------------------------------------------------------------
                         slack                                  1.453    
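Re-adding the numbers in this report (rounding can again move the last digit) shows where the 1.453 ns comes from. Note that the Requirement line is still 12.500 ns, and that the source pin is CLKBWRCLK (port B of SinData_reg), whereas the constraints above only name CLKARDCLK. This suggests, but does not prove, that this particular path was timed as a single-cycle path. If the setup multiplier of 2 did apply to it, the required time would move out by one period, as in the last line:

$$
\begin{aligned}
t_{\mathrm{arr}} &= 5.012 + 2.454 + 8.249 \approx 15.714\ \mathrm{ns}\\
t_{\mathrm{req}} &= 12.500 + 4.575 + 0.266 - 0.093 - 0.081 = 17.167\ \mathrm{ns}\\
\mathrm{slack} &= 17.167 - 15.714 = 1.453\ \mathrm{ns}\\
\mathrm{slack}_{2T} &= (17.167 + 12.500) - 15.714 = 13.953\ \mathrm{ns}
\end{aligned}
$$

If that is the case, a -from that names the SinData_reg cell (or both of its clock pins) instead of only CLKARDCLK might be worth trying.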

 

 


Re: Intra-clock setup violation: how to fix it?

Hi Elder,

     I assumed in either case the P&R would place the flip-flop midway in the path.
Unless you set up the DSP48 to be pipelined (or allowed it to pull in registers and pipeline itself), the DSP48 is just a mass of combinational logic (i.e., no registers) with considerable insertion delay.  So, to balance delay on either side of the flip-flop that you added, P&R should have placed your flip-flop nearer the DSP48.

     Another thing I assumed was it would replicate flip-flops and place them so that time closure would be easier
Vivado can do this too, during post-place physical optimization.  On a problem very similar to yours (address lines into a large BRAM), jmcclusk showed me how to do it in this post.
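
Mark's answer relies on post-place physical optimization. A different, synthesis-time route to the duplication asked about in the original question is Vivado's MAX_FANOUT attribute on the register that follows the lookup table. The sketch below is only an illustration. The entity and signal names, the 18-bit width and the limit of 4 loads are all hypothetical, and it assumes the register's loads (the DSP48 inputs) end up in the same flattened netlist so that synthesis can count them.

```vhdl
-- Hypothetical sketch: synthesis-time replication of the register that
-- follows the lookup table (BRAM). Names and widths are illustrative only.
library ieee;
use ieee.std_logic_1164.all;

entity lt_fanout_reg is
   port (CLK1   : in  std_logic;
         LT_IN  : in  std_logic_vector(17 downto 0);   -- lookup table (BRAM) data
         LT_OUT : out std_logic_vector(17 downto 0));  -- drives many DSP48 inputs
end entity lt_fanout_reg;

architecture rtl of lt_fanout_reg is
   signal LT_REG : std_logic_vector(17 downto 0);
   -- Ask Vivado synthesis to replicate LT_REG so that no copy drives more
   -- than 4 loads; the value 4 is an arbitrary example.
   attribute MAX_FANOUT : integer;
   attribute MAX_FANOUT of LT_REG : signal is 4;
begin
   REG: process (CLK1)
   begin
      if rising_edge(CLK1) then
         LT_REG <= LT_IN;   -- one clock of latency
      end if;
   end process REG;

   LT_OUT <= LT_REG;
end architecture rtl;
```

Functionally this is still a single register; the replication only changes how many physical copies synthesis builds.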

     I am under the impression the P&R of Vivado is not as smart (or tries not to be) as XST;
Blasphemy 😊

     There are other timing violations I have to work on and I fear these come back to haunt me after I fix the others, if the slack is too low.
Like you, I have this desire to maximize slack, that is, to tweak my design to get the biggest positive slack possible. However, this is wrong, wrong, wrong.  Vivado implementation (together with timing analysis) does NOT optimize slack. Rather, it runs until it gets positive slack everywhere – and then stops.  So, for now, use whatever method you like to get positive slack between the BRAM and the DSP48s – and then move on to the next challenge in your project.

Cheers,
Mark

Adventurer
Registered: 10-31-2017

Re: Intra-clock setup violation: how to fix it?

Hello, Mark.

>> Unless you setup the DSP48 to be pipelined (or allowed it to pull-in registers and pipeline itself), the DSP48 is just a mass of combinational logic (ie. no registers) with considerable insertion delay.  So, to balance delay on either side of the flip-flop that you added, P&R should have placed your flip-flop nearer the DSP48.

The current version is pipelined (see the code in my previous post), and I have checked in simulation that it works; the previous version was not pipelined (i.e., it was fully combinatorial). The slack was similar or equal, and the P&R was apparently similar too.

>> Like you, I have this desire to maximize slack. That is, tweak my design to get the biggest positive slack possible. However, this is wrong, wrong, wrong.  Vivado timing analysis does NOT optimize slack. Rather, it runs until it gets positive slack everywhere – and then stops.  So, for now, use whatever method you like to get positive slack between the BRAM and the DSP48s – and then move on to the next challenge in your project.

OK, that makes sense, since it uses worst-case parameters. But in this particular case, I am trying to figure out what I did wrong: I expected a larger slack because I had supposedly set the constraint to two cycles, so I expected a setup slack of around one clock cycle.
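
As a rough check of that expectation, here is a sketch that ignores clock skew, uncertainty and setup-time terms, since these do not change between the two cases. At 80 MHz the period is 12.5 ns. A setup multiplier of N (set_multicycle_path N -setup) moves the capture edge to N periods after launch. Going from the default N = 1 to N = 2 should therefore add about one period of setup slack for the same data path delay, written here as t_path:

```latex
\begin{aligned}
T_{clk} &= \frac{1}{80\ \text{MHz}} = 12.5\ \text{ns} \\
\text{slack}_{N} &\approx N\,T_{clk} - t_{\text{path}} \\
\text{slack}_{2} - \text{slack}_{1} &\approx T_{clk} = 12.5\ \text{ns}
\end{aligned}
```

One common pitfall, noted here as general XDC/SDC behavior rather than as a diagnosis of this design: a setup multiplier of 2 also moves the default hold check one period later, so the hold check is usually brought back with a matching set_multicycle_path 1 -hold on the same paths.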

Registered: 01-22-2015

Re: Intra-clock setup violation: how to fix it?

Hi Elder,

>> Though the current version is pipelined (see code in my previous post) and I have checked it works on simulation, the previous version was not (so fully combinatorial). The slack was similar or equal; P&R was similar, apparently.
In the original timing path that started at your lookup table and ended at the output of a DSP48, most of the delay is inside the DSP48.  So, to get more slack in this path, try NOT using the DONT_TOUCH attribute on the pipeline register that you placed in this path. Without the DONT_TOUCH attribute, the DSP48 input will pull in the pipeline register to pipeline itself (and delay will be redistributed more evenly on each side of the register).

If you let it, the DSP48 will pull in four registers to pipeline itself: two registers on its output port and one or two registers on each input port.  It is the registers on the output port that help the most with DSP48 pipelining (and balancing slack – as you say).  So, another thing to try is placing a pipeline register on the output of the DSP48 as shown below.  Again, do NOT use the DONT_TOUCH attribute on this pipeline register – because we want the DSP48 to pull in the register.

attribute USE_DSP : string;                     --the attribute must be declared before it is applied
attribute USE_DSP of P1P,P2P : signal is "YES";

PR1: process(CLK1)
begin
   if rising_edge(CLK1) then 
      LTP1 <= LT; --pipeline from lookup table, LT, to each DSP48
      LTP2 <= LT;
      BP1 <= B1;   --pipeline from other input, B1, to the DSP48
      BP2 <= B2;
      P1P <= LTP1 * BP1;  --P1P is calculated using DSP48
      P1 <= P1P;          --P1P is a pipeline register on output of DSP48
      P2P <= LTP2 * BP2;
      P2 <= P2P;
   end if; 
end process PR1;

>> I am trying to figure out what I did wrong as I expected a larger slack as I supposedly set the constraint to two cycles.
You now have three methods of solving the timing analysis problem between your lookup table and the DSP48s: 1) pipelining in HDL, 2) pipelining with the help of Vivado post-place physical optimization, and 3) multicycle constraints.  I argue that “pipelining in HDL” is best because it is easy to understand, easy to implement, and very portable.  I appreciate that you are trying to understand the multicycle constraints.  However, unless you use these constraints often (or you are Avrum :-) ) then it is just too easy to make a mistake with them.  If you are really interested in making the multicycle constraints work, then I can study them and get back to you.  However, I strongly suggest that you keep things simple and use “pipelining in HDL” for this problem.  Also, keep in mind that our current “pipelining in HDL” can easily be improved – by adding more registers to the pipeline.
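
As a sketch of "adding more registers to the pipeline", the hypothetical entity below delays the lookup-table word by a configurable number of register stages before it reaches a DSP48. WIDTH, STAGES and all names are assumptions. The SHREG_EXTRACT attribute is included as a precaution so that synthesis keeps the chain in flip-flops instead of mapping it to an SRL shift register. No DONT_TOUCH is used, in line with the advice above.

```vhdl
-- Hypothetical sketch: a configurable register pipeline between the lookup
-- table and a DSP48 input. WIDTH and STAGES are illustrative assumptions.
library ieee;
use ieee.std_logic_1164.all;

entity lt_pipe is
   generic (WIDTH  : positive := 18;   -- lookup-table word width (assumed)
            STAGES : positive := 2);   -- number of pipeline registers
   port (CLK1 : in  std_logic;
         D    : in  std_logic_vector(WIDTH-1 downto 0);
         Q    : out std_logic_vector(WIDTH-1 downto 0));
end entity lt_pipe;

architecture rtl of lt_pipe is
   type pipe_t is array (0 to STAGES-1) of std_logic_vector(WIDTH-1 downto 0);
   signal pipe : pipe_t;
   -- Keep the chain in flip-flops rather than letting synthesis map it to an SRL.
   attribute SHREG_EXTRACT : string;
   attribute SHREG_EXTRACT of pipe : signal is "no";
begin
   PIPE_P: process (CLK1)
   begin
      if rising_edge(CLK1) then
         pipe(0) <= D;                    -- first stage samples the lookup table
         for i in 1 to STAGES-1 loop
            pipe(i) <= pipe(i-1);         -- each further stage adds one clock of latency
         end loop;
      end if;
   end process PIPE_P;

   Q <= pipe(STAGES-1);                   -- total latency: STAGES clocks
end architecture rtl;
```

Raising STAGES trades extra latency for shorter register-to-register paths; the rest of the design has to account for the added clocks.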

Cheers,
Mark

Adventurer
Registered: 10-31-2017

Re: Intra-clock setup violation: how to fix it?

Hello, Mark.

I invested a few hours playing with Vivado to analyze the paths and timings (and learned to use Vivado's search filters, which help build the Tcl syntax), and it paid off: I got a better understanding of the tool and of how to navigate through the logic.

>> I argue that “pipelining in HDL” is best because it is easy to understand, easy to implement, and very portable. 

I totally agree. I used pipelines aggressively in my previous designs and also in the parts of this one that I modified or created. But this is the first time I have needed to delve into the design internals for timing closure: my previous designs ran at lower frequencies, and synchronous design techniques along with the tools' default settings were enough to meet timing. I have again added the register to the output of the lookup tables, and this time there were no failures in the intra-clock paths. I also made several of the outputs use the IOB registers, which made their slacks positive (this relates to another question I posted in the forum, so I solved another issue without having to resort to a workaround).
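
For readers wondering how an output register is pushed into the IOB, one way is Vivado's IOB attribute on the register in the HDL. The same IOB property can also be set from XDC. The sketch below is hypothetical: the entity and signal names are made up. It assumes DOUT is a top-level output with no logic between the register and the pin, which the register needs in order to be packed into the I/O block.

```vhdl
-- Hypothetical sketch: an output register marked for packing into the IOB.
-- Names are illustrative; DOUT is assumed to be a top-level pin.
library ieee;
use ieee.std_logic_1164.all;

entity out_reg_iob is
   port (CLK1 : in  std_logic;
         DIN  : in  std_logic;
         DOUT : out std_logic);
end entity out_reg_iob;

architecture rtl of out_reg_iob is
   signal dout_r : std_logic := '0';
   -- IOB is read by Vivado implementation: place this register in the I/O block.
   attribute IOB : string;
   attribute IOB of dout_r : signal is "TRUE";
begin
   OUT_REG: process (CLK1)
   begin
      if rising_edge(CLK1) then
         dout_r <= DIN;
      end if;
   end process OUT_REG;

   DOUT <= dout_r;   -- no logic between the register and the pin
end architecture rtl;
```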

>> I appreciate that you are trying to understand the multicycle constraints.  However, unless you use these constraints often (or you are Avrum :-) ) then it is just too easy to make a mistake with them. 

I learned what I was doing wrong, and I set the multicycle paths just to see what would be reported; as expected, it showed a much larger setup slack. But I will not use them unless absolutely necessary.

Once more, thank you very much for your help.

Elder
