salcock
Contributor
584 Views
Registered: ‎12-19-2016

Clock Mux Hold Violation In DSP48 and ILA

I have a design with a clock multiplexor, implemented as an MMCM. There are two inputs, which are both nominally 88 MHz but driven from independent sources. The job of the MMCM is to select one of these inputs and propagate it to the first output. It also multiplies the selected input up to a higher frequency (158 MHz) and propagates this to the second output.

If I run report_clocks in a post-synthesised design, Vivado does indeed report the following outputs from the MMCM:

out_clk158_timing_clk_mux_2 6.309 {0.000 3.154} P,G,A {timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0}
out_clk158_timing_clk_mux_3 6.309 {0.000 3.154} P,G,A {timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0}
out_clk88_timing_clk_mux_2 11.356 {0.000 5.678} P,G,A {timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT1}
out_clk88_timing_clk_mux_3 11.356 {0.000 5.678} P,G,A {timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT1}

So far, so good (note: the suffixes used to be _0 and _1. I don't know why they are now _2 and _3).

Having read many forum posts about constraining mux outputs (for example, https://forums.xilinx.com/t5/Timing-Analysis/Vivado-12-4739-set-clock-groups-No-valid-object-s-found-for/td-p/1165772, https://forums.xilinx.com/t5/Timing-Analysis/Vivado-and-BUFGMUX-timing/td-p/444448), I added the following lines to my constraints file:

set_clock_groups -name clk_mux_exclusive_88 -physically_exclusive -group [get_clocks -of [get_pins timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0]]
set_clock_groups -name clk_mux_exclusive_158 -physically_exclusive -group [get_clocks -of [get_pins timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT1]]

My understanding is that these constraints prevent Vivado from analysing timing paths between two possible mux outputs, since they cannot exist at the same time.

HOWEVER

I still get hold violations - one set are for signals through a DSP48, and the other set are through a Vivado-generated ILA. For example:

Slack (VIOLATED) : -0.250ns (arrival time - required time)
Source: timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_ALU_INST/CLK
(rising edge-triggered cell DSP_ALU clocked by out_clk158_timing_clk_mux_3 {rise@0.000ns fall@3.154ns period=6.309ns})
Destination: timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_OUTPUT_INST/ALU_OUT[0]
(rising edge-triggered cell DSP_OUTPUT clocked by out_clk158_timing_clk_mux_2 {rise@0.000ns fall@3.154ns period=6.309ns})
Path Group: out_clk158_timing_clk_mux_2
Path Type: Hold (Min at Slow Process Corner)
Requirement: 0.000ns (out_clk158_timing_clk_mux_2 rise@0.000ns - out_clk158_timing_clk_mux_3 rise@0.000ns)
Data Path Delay: 0.200ns (logic 0.200ns (100.000%) route 0.000ns (0.000%))
Logic Levels: 0
Clock Path Skew: 0.333ns (DCD - SCD - CPR)
Destination Clock Delay (DCD): 2.714ns
Source Clock Delay (SCD): 2.018ns
Clock Pessimism Removal (CPR): 0.363ns
Clock Uncertainty: 0.055ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Discrete Jitter (DJ): 0.084ns
Phase Error (PE): 0.000ns
Clock Net Delay (Source): 3.198ns (routing 1.562ns, distribution 1.636ns)
Clock Net Delay (Destination): 3.578ns (routing 1.720ns, distribution 1.858ns)

Slack (VIOLATED) : -0.189ns (arrival time - required time)
Source: u_ila_6/inst/ila_core_inst/shifted_data_in_reg[7][0]_srl8/CLK
(rising edge-triggered cell SRL16E clocked by out_clk88_timing_clk_mux_3 {rise@0.000ns fall@5.678ns period=11.356ns})
Destination: u_ila_6/inst/ila_core_inst/shifted_data_in_reg[8][0]/D
(rising edge-triggered cell FDRE clocked by out_clk88_timing_clk_mux_2 {rise@0.000ns fall@5.678ns period=11.356ns})
Path Group: out_clk88_timing_clk_mux_2
Path Type: Hold (Min at Slow Process Corner)
Requirement: 0.000ns (out_clk88_timing_clk_mux_2 rise@0.000ns - out_clk88_timing_clk_mux_3 rise@0.000ns)
Data Path Delay: 0.260ns (logic 0.235ns (90.385%) route 0.025ns (9.615%))
Logic Levels: 0
Clock Path Skew: 0.330ns (DCD - SCD - CPR)
Destination Clock Delay (DCD): 3.251ns
Source Clock Delay (SCD): 2.548ns
Clock Pessimism Removal (CPR): 0.372ns
Clock Uncertainty: 0.058ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Discrete Jitter (DJ): 0.093ns
Phase Error (PE): 0.000ns
Clock Net Delay (Source): 3.724ns (routing 2.062ns, distribution 1.662ns)
Clock Net Delay (Destination): 4.110ns (routing 2.270ns, distribution 1.840ns)
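
As a sanity check on the reports above, the skew and uncertainty lines can be reproduced from the formulas printed beside them. This is a minimal sketch in plain Tcl (it should run in tclsh or in the Vivado Tcl console); the proc names hold_skew and clock_uncertainty are made up for illustration and are not Vivado commands:

```tcl
# Hold-analysis clock path skew as printed in the report: DCD - SCD - CPR
proc hold_skew {dcd scd cpr} {
    return [expr {$dcd - $scd - $cpr}]
}

# Clock uncertainty as printed in the report: sqrt(TSJ^2 + DJ^2) / 2 + PE
proc clock_uncertainty {tsj dj pe} {
    return [expr {sqrt($tsj*$tsj + $dj*$dj) / 2.0 + $pe}]
}

# Values copied from the DSP48 path in the first report (all in ns)
puts [format "skew        = %.3f ns" [hold_skew 2.714 2.018 0.363]]
puts [format "uncertainty = %.3f ns" [clock_uncertainty 0.071 0.084 0.000]]
```

Both results agree with the first report (0.333 ns and 0.055 ns). The data path delay there is only 0.200 ns against 0.333 ns of skew, which is consistent with the path failing hold whenever the two clocks are timed against each other.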


I am concerned that the failing path in the DSP48, for example, has a source clock of out_clk158_timing_clk_mux_3 and a destination clock of out_clk158_timing_clk_mux_2, EVEN THOUGH I HAVE CONSTRAINED THESE CLOCKS TO BE PHYSICALLY EXCLUSIVE.

It is interesting that the violations only occur through the DSP48 and the ILA, and not through any of the other logic clocked by the mux outputs.

What is happening? Does anyone know what I can do to fix this?

Any assistance would be much appreciated,

Steven

salcock
Contributor
499 Views
Registered: ‎12-19-2016

Update: in desperation, I tried explicitly false-pathing the failing nets, using the GUI to generate the constraint (the only difference is I have wild-carded the ALU_OUT[*] slice to avoid the need for 48 separate constraints):

set_false_path -from [get_pins {timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_ALU_INST/CLK}] -to [get_pins {timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_OUTPUT_INST/ALU_OUT[*]}]

I confirmed the constraint was valid by explicitly checking one of the slices:

report_timing -from [get_pins {timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_ALU_INST/CLK}] -to [get_pins {timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_OUTPUT_INST/ALU_OUT[0]}]
INFO: [Timing 38-91] UpdateTimingParams: Speed grade: -2L, Temperature grade: E, Delay Type: max.
INFO: [Timing 38-191] Multithreading enabled for timing update using a maximum of 8 CPUs
INFO: [Timing 38-78] ReportTimingParams: -from_pins -to_pins -max_paths 1 -nworst 1 -delay_type max -sort_by slack.
Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Tool Version : Vivado v.2019.1 (lin64) Build 2552052 Fri May 24 14:47:09 MDT 2019
| Date : Thu Apr 15 11:57:14 2021
| Host : localhost.localdomain running 64-bit unknown
| Command : report_timing -from [get_pins {timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_ALU_INST/CLK}] -to [get_pins {timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_OUTPUT_INST/ALU_OUT[0]}]
| Design : dgro_master_top
| Device : xcvu9p-flga2104
| Speed File : -2L PRODUCTION 1.23 03-18-2019
| Temperature Grade : E
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Timing Report

Slack: inf
Source: timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_ALU_INST/CLK
(rising edge-triggered cell DSP_ALU clocked by out_clk158_timing_clk_mux {rise@0.000ns fall@3.154ns period=6.309ns})
Destination: timing_wrapper_0/labtools_fmeter_0/FMETER_gen[1].COUNTER_F_inst/bl.DSP48E_2/DSP_OUTPUT_INST/ALU_OUT[0]
(rising edge-triggered cell DSP_OUTPUT clocked by out_clk158_timing_clk_mux_1 {rise@0.000ns fall@3.154ns period=6.309ns})
Path Group: (none)
Path Type: Setup (Max at Slow Process Corner)
Data Path Delay: 0.893ns (logic 0.893ns (100.000%) route 0.000ns (0.000%))
Logic Levels: 0
Clock Path Skew: -0.695ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): 2.014ns
Source Clock Delay (SCD): 2.709ns
Clock Pessimism Removal (CPR): 0.000ns
Clock Uncertainty: 0.055ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Discrete Jitter (DJ): 0.084ns
Phase Error (PE): 0.000ns
Clock Net Delay (Source): 3.573ns (routing 1.720ns, distribution 1.853ns)
Clock Net Delay (Destination): 3.194ns (routing 1.562ns, distribution 1.632ns)
Timing Exception: False Path

The tool reports the false path, yet the hold violation persists. How can this be?

avrumw
Expert
446 Views
Registered: ‎01-23-2009

I don't think your set_clock_groups commands are correct...

If I understand correctly you have two clocks coming in to your design both at the same frequency. You MUX them together, either before the MMCM or using the MMCM clock selection. The MMCM then generates two outputs at different frequencies - one on CLKOUT0 and one on CLKOUT1.

As you have observed, this results in two clocks on each of the two outputs of the MMCM.

What your set_clock_groups command does is probably wrong - it declares false any path between the clock(s) on CLKOUT0 and the clock(s) on CLKOUT1 (actually it declares them false from CLKOUT0 to any other clock and from CLKOUT1 to any other clock). This is probably:

  • Not what you want to do - are the paths from the 88MHz to the 158MHz domains really false paths?
    • If you cross between these two domains, you may (or may not) need a clock domain crossing circuit (CDCC) and this CDCC will need an exception
      • Depending on the CDCC a false path may be underconstraining the CDCC
  • Doesn't do what I think you are trying to do, which is declare the "paths" between the two different 158MHz clocks as false

When you do 

get_clocks -of [get_pins timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0]

You are getting all of the clocks on this pin - in this case you are getting both out_clk158_timing_clk_mux_2 and out_clk158_timing_clk_mux_3, so your command is effectively 

set_clock_groups -name clk_mux_exclusive_88 -physically_exclusive -group {out_clk158_timing_clk_mux_2 out_clk158_timing_clk_mux_3}

This is explicitly putting these two clocks in the same group - so timing is not disabled between them.
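
To see this for yourself, the clocks on the pin can be listed from the Vivado Tcl console with the design open. A small sketch, using only the pin path from the original post; it needs Vivado and will not run in a plain tclsh:

```tcl
# Run in the Vivado Tcl console with the synthesized or implemented design open
set pin  [get_pins timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0]
set clks [get_clocks -of_objects $pin]

# Every clock printed here ends up in the SAME -group when the whole
# result of get_clocks is passed to a single -group option
puts "clocks on CLKOUT0: [llength $clks]"
foreach c $clks {
    puts "  [get_property NAME $c]  period=[get_property PERIOD $c]"
}
```

If two names are printed, a single -group built from this query contains both of them.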

Probably what you really meant to do was this

set_clock_groups -name clk_mux -physically_exclusive -group [get_clocks -include_generated_clocks <input_clock1>] -group [get_clocks -include_generated_clocks <input_clock2>]

Where <input_clock1> and <input_clock2> are the names of the primary clocks - the ones that feed the input of the clock MUX (which should come from create_clock commands). 

This basically says that "any clocks that come through the opposite sides of the MUX, or any clocks derived from them, are false", which is probably what you want.

Avrum

salcock
Contributor
415 Views
Registered: ‎12-19-2016

Thank you Avrum! You are absolutely correct in your interpretation of what I am trying to achieve. The new constraint has indeed resolved the timing error.

I don't fully understand WHY, however.

Let's step away from the complexities of my specific example and consider a general case:

  • We have an MMCM acting as clock multiplexer. We have two inputs: CLK_IN_A and CLK_IN_B.
  • We have one multiplexed output: CLK_OUT.

My understanding was:

  • Vivado needs to perform timing analysis on both possible MMCM outputs, so it creates two clocks: CLK_OUT0 and CLK_OUT1, which I have assumed correspond to the different input clocks.
  • We need to tell Vivado that CLK_OUT0 and CLK_OUT1 cannot exist at the same time, and there is therefore no need to perform timing analysis between these two clock domains.
  • I therefore set a -physically_exclusive constraint on the two output clocks, but Vivado still tried to perform timing analysis, resulting in hold violations.

You have suggested modifying the constraint to set a -physically_exclusive constraint on the two INPUT clocks (and clocks derived from them), which has indeed resolved the timing issue. But surely it is the OUTPUT clocks that I care about? The two output clocks cannot physically exist at the same time, but the two input clocks definitely do. In fact, in my particular case, the two input clocks are actually used to clock logic upstream of the MMCM. Has this new constraint effectively false-pathed all this logic (at least with respect to the clocks in question)?

avrumw
Expert
402 Views
Registered: ‎01-23-2009

I think you may be misunderstanding what the MMCM is.

Fundamentally the MMCM takes in one clock - CLKIN1 (I will come back to CLKIN2 later) and generates several different outputs where the frequency of the outputs depends on the frequency of the input clock and some programmable parameters to the MMCM; M and D (of which there is only one of each per MMCM) and O0-O6, which are specific to each output (CLKOUT0-CLKOUT6):

  • f_CLKOUT0 = f_CLKIN1*M/D/O0
  • f_CLKOUT1 = f_CLKIN1*M/D/O1
  • ...
  • f_CLKOUT6 = f_CLKIN1*M/D/O6

(It also does deskew and phase shifting operations, but those aren't important for this discussion).

From the static timing point of view, Vivado mimics this behavior. If you have a create_clock upstream of the CLKIN1, it takes the attributes of that clock to generate new clocks on each of the MMCM outputs. Thus you end up with a new clock on the CLKOUT0 output that is a generated clock with the frequency of the original clock (the upstream create_clock) multiplied by M/D/O0 (and the same for the other outputs).

In your design your input clock is 88MHz, and you have programmed M, D, O0 and O1 so that M/D/O0 gives 1.8 and M/D/O1 gives 1 (generating an output with the same frequency as the input), thus giving you your 158MHz clock on CLKOUT0 and your 88MHz clock on CLKOUT1. These are two clocks that your design specifically created - presumably it needs them both, and it needs for them to be at these specific frequencies. How these two clocks interact is part of your design - are they completely independent (no crossing between them)? Are there paths that attempt synchronous crossing between them (I don't know, this might be possible with these ratios)? Are they treated as asynchronous clocks with true clock domain crossing circuits (and the required exceptions) between them? Without knowing your design, I can't tell.
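
The arithmetic can be checked against the periods in the report_clocks output. The following is only a sketch in plain Tcl: the proc name mmcm_out_period is made up, and the M, D and O values are illustrative assumptions chosen to give ratios of 1 and 9/5, not the actual settings of this MMCM.

```tcl
# Output period of one MMCM output: T_out = T_in * D * O / M
# (the same relationship as f_out = f_in * M / D / O)
proc mmcm_out_period {t_in m d o} {
    return [expr {double($t_in) * $d * $o / $m}]
}

set t_in 11.356 ;# ns, the nominal 88 MHz period shown by report_clocks

# M, D and O are assumed values for illustration only
puts [format "ratio 1   : %.3f ns" [mmcm_out_period $t_in 18 1 18]]
puts [format "ratio 9/5 : %.3f ns" [mmcm_out_period $t_in 18 1 10]]
```

With these assumed settings the two periods come out at 11.356 ns and 6.309 ns, matching the periods Vivado reports for the generated clocks.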

Now, one other function of the MMCM is that it has a MUX "in front" of the actual MMCM, which MUXes CLKIN1 and CLKIN2. This is a separate function. 

Presumably you have a create_clock (of 88MHz) upstream of CLKIN1 (I will call this clk_clkin1) and another create_clock (also at 88MHz) upstream of CLKIN2 (I will call this clk_clkin2).

When two clocks are MUXed together in Vivado, the output of the MUX carries both clocks - so the output of the MUX in front of the MMCM (which can't be queried since it is internal to the MMCM cell) would carry both clk_clkin1 and clk_clkin2. These two clocks downstream of the MUX are physically exclusive (so if these clocks go nowhere other than the CLKIN1 and CLKIN2 of the MMCM, then clk_clkin1 and clk_clkin2 are physically exclusive).

As I mentioned above, the MMCM uses the attributes of the clock on the MMCM input to generate the clock outputs. But now there are TWO clocks on the MMCM input, so it generates two clocks on each of the MMCM outputs:

  • CLKOUT0:
    • out_clk158_timing_clk_mux_2: clk_clkin1*M/D/O0
    • out_clk158_timing_clk_mux_3: clk_clkin2*M/D/O0
  • CLKOUT1:
    • out_clk88_timing_clk_mux_2: clk_clkin1*M/D/O1
    • out_clk88_timing_clk_mux_3: clk_clkin2*M/D/O1

Since clk_clkin1 and clk_clkin2 are physically exclusive:

  • out_clk158_timing_clk_mux_2 and out_clk158_timing_clk_mux_3 are physically exclusive and
  • out_clk88_timing_clk_mux_2 and out_clk88_timing_clk_mux_3 are physically exclusive

And that is what we informed the tools with the set_clock_groups I gave you above.

None of this tells us anything about the relationships that exist between out_clk158* and out_clk88* - as I said, these are clocks created intentionally by your design for specific reasons in your design.

Avrum

avrumw
Expert
398 Views
Registered: ‎01-23-2009

I just re-read your original post:

In fact, in my particular case, the two input clocks are actually used to clock logic upstream of the MMCM. Has this new constraint effectively false-pathed all this logic (at least with respect to the clocks in question)?

Yes, and this is a problem.

So since clk_clkin1 and clk_clkin2 DO go elsewhere in the design, we cannot declare them physically exclusive. So, we have to do it after the MUX.

If the MUX were a separate MUX (i.e. a BUFGMUX, not the MUX internal to the MMCM) then there is a syntax to do this (it's a bit messy and I won't go into it here, since it doesn't apply). But since the MUX is internal to the MMCM (and is the better solution from the design point of view) we can't do that, so we need to work with what we have, which is the outputs of the MMCM. But this is a bit more complicated.

There are multiple "formats" of the set_clock_groups command

  • set_clock_groups -group {clka}
    • clka is unrelated to any other clock in the design
  • set_clock_groups -group {clka} -group {clkb}
    • clka is unrelated to clkb
    • It says nothing about their relationship to other clocks
  • set_clock_groups -group {clka clkc} -group {clkb clkd}
    • clka is related to clkc
    • clkb is related to clkd
    • clka is unrelated to clkb
    • clka is unrelated to clkd
    • clkc is unrelated to clkb
    • clkc is unrelated to clkd
  • set_clock_groups -group {clka clkc}   # which is effectively what you had before
    • clka is related to clkc
    • other than that clka and clkc are unrelated to all other clocks in the design

So, what you really want is

set_clock_groups -physically_exclusive -group {out_clk158_timing_clk_mux_2} -group {out_clk158_timing_clk_mux_3}
set_clock_groups -physically_exclusive -group {out_clk88_timing_clk_mux_2} -group {out_clk88_timing_clk_mux_3}

Note: this is very different from what you originally had...

But this uses the automatically created names of the generated clocks, which may change as other parts of your design change, so it isn't an ideal solution. It is solvable (at least somewhat...), but this format should work for now.
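
One way to confirm that the constraints have taken effect is to ask Vivado directly for hold paths between the two clocks of a pair. This is a sketch for the Vivado Tcl console; the clock names are the auto-generated ones from the reports earlier in the thread and may well differ after the next synthesis run.

```tcl
# Run in the Vivado Tcl console after the set_clock_groups constraints are applied.
# Clock names are the auto-generated ones quoted in this thread and may change.
report_timing -hold \
    -from [get_clocks out_clk158_timing_clk_mux_3] \
    -to   [get_clocks out_clk158_timing_clk_mux_2]

# Overview of how each pair of clocks in the design is being timed
report_clock_interaction
```

With the exclusive groups in place, the first command should no longer report a violating path between the pair, and the interaction report should show the pair as covered by a clock group.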

Avrum

 

salcock
Contributor
363 Views
Registered: ‎12-19-2016

Your reading of the situation is correct. There are two clocks which have nominally the same frequency (88 MHz) but they are derived from different sources. The design requirement is to dynamically select one of these clocks and forward it to the rest of the design. A new clock at 9/5 the frequency must also be generated in relation to this multiplexed clock (I appreciate the exact numbers are a bit funky here - they are necessary in order to satisfy external requirements). All the input clocks and output clocks are completely independent: I would use CDC components if clock-crossing were required.

If it makes life easier in terms of syntax, there's nothing preventing me from instantiating a separate BUFGMUX primitive to handle the multiplexing. I could then pass its output to the MMCM for the frequency generation. I just thought it was cleaner to do everything inside the MMCM.

In this case, the main problem seems to be that I fundamentally misunderstood the syntax of the set_clock_groups constraint. I had the correct clocks, but I was erroneously putting them in the SAME group instead of in DIFFERENT groups. Your explanation has fully cleared this up and I am very grateful.

I ended up with a Tcl-based constraint to get around the necessity to use the generated clock name:

set clk_out0_list [get_clocks -of [get_pins timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0]]
set_clock_groups -physically_exclusive -group [lindex $clk_out0_list 0] -group [lindex $clk_out0_list 1]
set clk_out1_list [get_clocks -of [get_pins timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT1]]
set_clock_groups -physically_exclusive -group [lindex $clk_out1_list 0] -group [lindex $clk_out1_list 1]

It seems to work but I would of course be interested if there is a cleaner solution.

Once again, my thanks for your high-quality insight.

avrumw
Expert
359 Views
Registered: ‎01-23-2009

Yes - that is exactly what I was going to suggest (using the lindex). It's not perfect since it presumes that there are only two clocks (and not, say, three), but it is better than using the fixed clock names.
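
For the case where a pin carries more than two clocks, the lindex approach can be generalised by building one -group per clock. This is a sketch only: make_pin_clocks_exclusive is a hypothetical helper name, and it assumes that every clock on the pin really does arrive through the same MUX, so that all of them are mutually exclusive. Loops and procs are typically only accepted in a Tcl script added as a constraint source, not in a plain XDC file.

```tcl
# Sketch: give every clock on an MMCM output pin its own physically
# exclusive group, however many clocks the pin carries.
# make_pin_clocks_exclusive is a hypothetical helper, not a Vivado command.
proc make_pin_clocks_exclusive {pin_name} {
    set clks [get_clocks -of_objects [get_pins $pin_name]]
    if {[llength $clks] < 2} {
        return ;# zero or one clock: nothing to separate
    }
    set group_args {}
    foreach c $clks {
        # one -group option per clock, referenced by name
        lappend group_args -group [get_property NAME $c]
    }
    set_clock_groups -physically_exclusive {*}$group_args
}

make_pin_clocks_exclusive timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT0
make_pin_clocks_exclusive timing_wrapper_0/timing_clk_mux_0/inst/mmcme4_adv_inst/CLKOUT1
```

With exactly two clocks per pin this produces the same constraints as the lindex version above.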

As for using a separate BUFGMUX, the syntax isn't that much cleaner, and from an architectural point of view putting the BUFGMUX before the MMCM isn't ideal, so what you have now is preferable.

Avrum
