Basic understanding of the CLOCK_LOW_FANOUT Constraint.
The RQS_CLOCK-12 suggestion is an automatic, incremental-friendly suggestion generated for UltraScale and UltraScale+ devices.
It uses the CLOCK_LOW_FANOUT property, which is assigned either to a clock net or to a set of flip-flops driven by a global clock buffer, depending on the buffer's load count.
When applied on a clock net, the placement of the loads of a global clock buffer is constrained to a single clock region.
When applied on a set of flip-flops, opt_design creates a new global clock buffer in parallel with the existing global clock buffer. The new buffer drives only this set of flip-flops, and its loads are constrained to a single clock region.
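Because RQS_CLOCK-12 is an automatic suggestion, it is normally picked up through the QoR suggestions flow rather than written by hand. A minimal sketch of that flow follows, assuming a routed design is open in the first step and that the suggestion file name (rqs_clock.rqs) is purely illustrative:

```tcl
# Step 1: in the routed design that shows the failing path,
# generate the suggestions and write them to a file.
report_qor_suggestions
write_qor_suggestions -force rqs_clock.rqs   ;# hypothetical file name

# Step 2: in the next implementation run, with the synthesized design open,
# read the suggestions before opt_design so that CLOCK_LOW_FANOUT
# is in place when the BUFG optimization phase runs.
read_qor_suggestions rqs_clock.rqs
opt_design
place_design
phys_opt_design
route_design
```

Reading the file before opt_design matters for the flip-flop case, since the parallel buffer is created during opt_design.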
Now let us see how the RQS_CLOCK-12 suggestion applies CLOCK_LOW_FANOUT to help timing closure of the design by reducing the clock skew.
Consider the two scenarios below in a routed design, where excessive clock skew causes a timing violation on the path from a flip-flop to the control pins (CE/CLR) of a global clock buffer.
Scenario 1:
In this failing timing path, the clock buffer BUFGCE1 (clkout3_buf), the flip-flop it clocks, and BUFGCE2 (bufce_i), whose control pin the flip-flop drives, are all placed in the same clock region. BUFGCE1 has a high fanout (6419), and because of where its loads are placed, its clock net spans the device, as highlighted.
The tool selects a CLOCK_ROOT location far from the global clock buffer driving it, causing high clock net delay and high clock skew.
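To confirm this kind of situation in your own design, the fanout and the clock root of the net can be queried from the Tcl console. The commands below are a sketch only: the buffer instance name is taken from this example (adjust it to match your netlist), and the design is assumed to be placed or routed.

```tcl
# Assumption: clkout3_buf is the BUFGCE instance from this example.
set clk_net [get_nets -of_objects [get_pins clkout3_buf/O]]

# Pin count on the net (driver plus loads) gives a quick view of the fanout.
puts "Flat pin count: [get_property FLAT_PIN_COUNT $clk_net]"

# Clock region chosen as the clock root for this net after placement.
puts "Clock root: [get_property CLOCK_ROOT $clk_net]"

# Summary of the clock root assignments for all global clocks.
report_clock_utilization -clock_roots_only
```

A clock root that sits several clock regions away from both the buffer and the critical flip-flop is the symptom described above.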
Resolution for Scenario 1:
Apply CLOCK_LOW_FANOUT on the flip-flop so that opt_design replicates the original BUFGCE1 into a new BUFGCE (clkout3_buf_replica) that drives only this critical flip-flop. The new clock net is constrained to a single clock region, which reduces the clock net delay.
Also, because the source and loads are in the same clock region, CLOCK_LOW_FANOUT forces the clock root to be in the same clock region which helps in reducing the clock skew.
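If you want to apply the property manually instead of through the suggestion, the constraint for this scenario could look like the following. The cell name is hypothetical and stands for the critical flip-flop reported in the failing path.

```tcl
# Hypothetical cell name: replace with the critical flip-flop from your timing report.
# Must be set before opt_design, because the parallel BUFGCE is created there.
set_property CLOCK_LOW_FANOUT TRUE [get_cells my_critical_ff_reg]
```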
Schematic after CLOCK_LOW_FANOUT is applied on the critical flip-flop:
During the BUFG optimization phase of opt_design, the log reports the global clock buffer inserted for the CLOCK_LOW_FANOUT property:
INFO: [Opt 31-1077] Phase BUFG optimization inserted 1 global clock buffer(s) for CLOCK_LOW_FANOUT.
Scenario 2:
In this failing timing path, the clock buffer BUFGCE1 (clkout1_BUFG_inst), the flip-flop it clocks, and BUFGCE2, whose control pin the flip-flop drives, are again placed in the same clock region. BUFGCE1 has a low fanout (16), but its flip-flop loads are spread across multiple clock regions (marked in red). As a result, the tool selects a CLOCK_ROOT in a different clock region from the global clock buffer driving the net, which causes high clock net delay and high clock skew.
Resolution for Scenario 2:
When BUFGCE1 has a low fanout (fewer than 2000 loads) but its clock loads are spread across multiple clock regions, apply CLOCK_LOW_FANOUT on the clock net directly driven by BUFGCE1 so that all of its loads are placed in a single clock region. This reduces the clock net delay.
With source and loads now in the same clock region, CLOCK_LOW_FANOUT forces the clock root to be in the same clock region which helps to reduce the clock skew.
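Applied manually, the net-based form of the constraint could be written as below. The buffer instance name comes from this example; selecting the net through the buffer's O pin is one convenient way to make sure the property lands on the net directly driven by the buffer.

```tcl
# Assumption: clkout1_BUFG_inst is the BUFGCE instance from this example.
# Select the net directly driven by the buffer output and constrain its loads
# to a single clock region.
set_property CLOCK_LOW_FANOUT TRUE [get_nets -of_objects [get_pins clkout1_BUFG_inst/O]]
```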
Schematic after CLOCK_LOW_FANOUT is applied on the clock net:
In this blog, we have seen through two example designs how the RQS_CLOCK-12 suggestion applies the CLOCK_LOW_FANOUT property either to flip-flops or to a clock net directly driven by a global clock buffer.