Saving Compile Time Series 2: Using Incremental Implementation


The incremental implementation flow has evolved since it was first supported, and a number of enhancements targeting performance and compile time have been added.

It addresses the need for fast iteration during the implementation phase, delivers significant compile time savings, and also provides more predictable results and performance.

The following charts show the trend in compile time savings after adopting the incremental implementation flow on a suite of challenging designs:


Figure 1: 2019.1 Compile time savings with incremental implementation flow for both internal and external designs


Figure 2: 2019.1 QoR Predictability with incremental implementation flow

Figure 1 shows an average compile time saving of 2.12x with this flow for designs that run over 2 hours (with up to 10% design changes).

Figure 2 shows a measure of QoR predictability for small RTL changes. ΔWNS is the WNS degradation compared to the reference run. You can see that incremental compile enables better QoR predictability than default runs. Note, however, that there is less compile time benefit for smaller designs, because the initialization step takes up a larger share of the total compile time.

Flow:

The flow is supported in both Project Mode and Non-Project Mode. Loading the reference design checkpoint with the read_checkpoint -incremental <dcp_file> command, where <dcp_file> points to the location and name of the reference DCP, enables the incremental implementation flow for subsequent place and route operations.

In Non-Project Mode, read_checkpoint -incremental should follow opt_design and precede place_design.
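
For reference, here is a minimal Non-Project Mode sketch showing this ordering (the checkpoint file names are placeholders):

# Open the synthesized design (placeholder file name)
open_checkpoint top_post_synth.dcp
opt_design
# Load the routed reference DCP: after opt_design, before place_design
read_checkpoint -incremental reference_routed.dcp
place_design
phys_opt_design
route_design
write_checkpoint -force top_routed.dcp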

[Figure: blog2-3.png]

Currently, both automatic and manual modes are supported. To enable automatic mode, open the implementation settings and check the “Automatically use the checkpoint from the previous run” option. If automatic mode is not checked, you can instead specify a DCP of your choice as the reference checkpoint to guide subsequent runs.


Figure: Design Run Settings
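
The same settings can also be applied from Tcl; a minimal sketch, assuming the implementation run is named impl_1 and using a placeholder reference DCP path:

# Automatic mode: reuse the checkpoint from the previous run of impl_1
set_property AUTO_INCREMENTAL_CHECKPOINT 1 [get_runs impl_1]

# Manual mode: point the run at a user-chosen reference checkpoint instead
set_property AUTO_INCREMENTAL_CHECKPOINT 0 [get_runs impl_1]
set_property INCREMENTAL_CHECKPOINT ./reference/top_routed.dcp [get_runs impl_1]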

 

When to use this flow?


This flow is useful when your design code is stable and only small code modifications will be made later, or when you are on the last mile of timing closure and nearly there. In both situations you want a short cycle for generating each implementation version.


The incremental implementation flow reads placement and routing information from the reference checkpoint and matches it against the current post-opt_design netlist.

Matched cells are reused, while newly added logic is optimized as in the default flow. Optimization across the boundary between matched and unmatched cells is also performed.

As a result, this flow benefits compile time most when the majority of the logic can be reused and the design is close to meeting timing.


Another use case is a difficult design that is not close to timing closure, where you nevertheless want to reuse the design at some level (for example, the SLR, block type, or module level). In this case you can switch the incremental mode to partial reuse mode.

For example, the command below has the same effect as back-annotating all RAM locations from a netlist into an XDC file and applying those constraints in a subsequent run:


read_checkpoint -incremental routed.dcp -reuse_objects [all_rams] -fix_objects [all_rams]
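
Other object selections work the same way. For example, here is a sketch that reuses and locks only one level of hierarchy (the cell name top/u_core is a placeholder):

# Reuse and fix only a specific hierarchy from the reference run
read_checkpoint -incremental routed.dcp -reuse_objects [get_cells top/u_core] -fix_objects [get_cells top/u_core]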

 

Factors that can affect incremental implementation compile time

Before adopting this flow, be aware of the following points, which can help you get the most out of it:

  • Choose the right checkpoint. Make sure that the reference checkpoint targets the same part as the design to be guided and was implemented with the same Vivado release as the current run. DCPs generated by a different release can result in less cell matching, so the compile time savings will not be as expected.
  • Limit the amount of change in timing-critical areas to ensure consistent design and timing closure. Too many changes in the design logic can lead to poor guiding results or a longer compile time. If critical path placement and routing cannot be reused, more effort is required to preserve timing. Also, if small design changes introduce new timing problems that did not exist in the reference design, higher effort and run time might be required, and the design might not meet timing. Always match the opt_design directive of the reference run, as changes in opt_design can lead to more cell name changes.
  • If automatic mode is enabled, the reference checkpoint only gets updated if the WNS of the reference run is better than -0.250 ns, which means you need a reference checkpoint with good enough timing.
    A poor reference netlist could cause long compile times. When the checkpoint is not updated but one exists from an earlier run, Vivado attempts to use that as the reference checkpoint. Otherwise, when no reference exists, it reverts to the default implementation flow.
    When the default run behavior is followed, Vivado respects the run strategy selected by the user, and the compile time is similar to that of a non-incremental run.
  • If a reference checkpoint exists when a run begins (either newly updated or pre-existing), a second check of design netlist changes is run, and the incremental flow is only used if the required criteria are met. When these conditions are not met, the flow automatically falls back to the default implementation flow, and the following message is issued after read_checkpoint -incremental:

WARNING: [Project 1-964] Cell Matching is less than the threshold needed to run Incremental flow. Switching to default Implementation flow

  • High reuse mode: High reuse mode is entered when the cell reuse percentage is above 75%. In this mode, the place and route algorithms are optimized to reuse as much existing placement and routing information as possible. High reuse mode is most effective on designs where the reference checkpoint is timing closed and 95% or more of cells are reused, for example when there are only small design changes between the reference and the current design, or when debug cores are added to the design.
    Three directives are available for place_design and route_design in this mode (see the sketch after this list):
    • Default: Gets results that are as close to the reference run as possible. Targets reference design WNS. This mode is compile time optimal for the typical use case.
    • Explore: Tries to improve timing as much as possible. Targets 0.00 ns WNS. This takes more compile time.
    • Quick: Runs the place and route commands without calling the timing engine. This is compile time optimal and will not affect QoR in designs with a very high reuse rate (above 99.5%).
  • Low reuse mode: Low reuse mode is entered when there have been large changes to the design compared to the reference checkpoint, or when the user has specified that only a subset of cells from the reference checkpoint should be reused, via the -reuse_objects switch of the read_checkpoint command.
    In low reuse mode, all place_design and route_design directives are supported and the tool targets a WNS of 0.00 ns. This can take more compile time than high reuse mode. Low reuse mode is most effective on designs that exhibit place and route challenges in specific areas, for example reusing block RAM or DSP placement from a good run, or reusing a specific level of hierarchy that closes timing intermittently.
  • The initialization portion of the place and route run time.
    In short place and route runs, the initialization overhead of the Vivado placer and router might eliminate any gain from the incremental place and route process. For designs with longer run times, initialization becomes a smaller percentage of the run time, and we can see significant compile time gain.
  • Implementation compile time can be further shortened by enabling multi-threading. Currently the maximum is 8 threads on Linux systems:

set_param general.maxThreads 8
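
As mentioned in the high reuse mode notes above, the incremental directives are passed to place_design and route_design in the usual way; a minimal sketch, again with a placeholder checkpoint name:

# Explore: target 0.00 ns WNS at the cost of extra compile time
read_checkpoint -incremental reference_routed.dcp
place_design -directive Explore
route_design -directive Explore

# Quick: skip the timing engine for very high (> 99.5%) reuse
# place_design -directive Quick
# route_design -directive Quick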

 

Generate the incremental compile time saving report

Run the report_incremental_reuse command to generate a report showing the incremental reuse. Section 3 of the report lists the compile time (elapsed and CPU) spent in each step.
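
For example, to write the report to a file (the file name is a placeholder):

report_incremental_reuse -file incremental_reuse.rpt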

Because incremental guiding only begins at the place_design phase, be aware that the time counted for the incremental run includes the read_checkpoint step, which reads in the reference checkpoint; the compile time comparison should therefore start from place_design.

The following table shows the time for each phase, including the read_checkpoint step.

Additionally, the amount of change in the new netlist can have a significant impact on the incremental run time.

Note that the phys_opt_design step in the incremental run is optional. When it is called in the flow, it runs in default mode to further optimize unguided or changed paths, with no impact on reused paths.

[Table image: blog2-5.png]

Summary

By taking the factors listed above into account when running this flow, we can achieve fast iteration of implementation runs with significant compile time savings and predictable QoR.

More details about this flow can be found in the Vivado Design Suite User Guide: Implementation (UG904).
