Many FPGA designs struggle to meet their performance targets. The reasons vary, but some possible causes are:
Not following the UltraFast Design Methodology
Poor timing constraints
Too many control sets
High number of logic levels for target performance
Tool optimizations limited by constraints
How can these issues be identified and fixed quickly with minimum effort?
Report QoR Suggestions
Report QoR Suggestions (RQS) identifies issues in a design and offers solutions in the form of tool switches and cell properties that influence tool behavior. Where a solution cannot be automated, it recommends modifications to the design sources or constraints.
In Vivado 2019.1, RQS started to output suggestion objects. This makes it possible to track suggestions, automate their implementation, improve the validation of each suggestion and support more complex suggestions. It also brought new commands and some changes to the flow, which we look at here:
The 'report_qor_suggestions' command generates new suggestions and reports on existing ones. As shown in the following diagram, it can be run after any phase of the implementation process.
Once the suggestions have been reviewed, 'write_qor_suggestions' writes the selected suggestions to an RQS file. During this process, the suggestions are automatically ENABLED (capitalization indicates a property on the suggestion object).
Next, the RQS file is typically read into the Suggestion Run before 'synth_design' or 'opt_design'. In this flow, AUTOMATIC suggestions are applied when the flow passes through the stage named in their APPLICABLE_FOR property.
For AUTOMATIC suggestions to be APPLIED, they must be ENABLED at the point where their APPLICABLE_FOR stage runs in the Suggestion Run. The following diagram shows what happens to a suggestion as it passes through the APPLICABLE_FOR stage:
In the Suggestion Run, you can call 'report_qor_suggestions' again. The whole process can then be repeated, accumulating the suggestions from the previous run and the current run into a single file that can be fed into the next Suggestion Run.
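To make the ENABLED/APPLIED rule concrete, here is a toy model in Python. It is not Vivado code. The simplified stage list, the Suggestion class and the suggestion names are hypothetical, and only the property names (AUTOMATIC, ENABLED, APPLICABLE_FOR, APPLIED) come from the text. It also shows why the RQS file is read early in the Suggestion Run: a suggestion whose APPLICABLE_FOR stage has already run is never applied.

```python
from dataclasses import dataclass

# Hypothetical, simplified implementation flow (the real flow has more optional steps).
FLOW = ["synth_design", "opt_design", "place_design", "phys_opt_design", "route_design"]


@dataclass
class Suggestion:
    name: str              # hypothetical names such as "RQS_A-1-1"
    automatic: bool        # mirrors the AUTOMATIC property
    enabled: bool          # mirrors ENABLED (set when written with write_qor_suggestions)
    applicable_for: str    # mirrors APPLICABLE_FOR: the stage that applies it
    applied: bool = False  # mirrors APPLIED


def run_suggestion_flow(suggestions, read_before):
    """Toy Suggestion Run whose RQS file is read just before `read_before`."""
    for stage in FLOW[FLOW.index(read_before):]:
        for s in suggestions:
            # APPLIED only if AUTOMATIC and still ENABLED when its stage runs.
            if s.applicable_for == stage and s.automatic and s.enabled:
                s.applied = True
    return {s.name: s.applied for s in suggestions}


if __name__ == "__main__":
    rqs_file = [
        Suggestion("RQS_A-1-1", automatic=True, enabled=True, applicable_for="place_design"),
        Suggestion("RQS_B-1-1", automatic=True, enabled=False, applicable_for="opt_design"),
        Suggestion("RQS_C-1-1", automatic=False, enabled=True, applicable_for="synth_design"),
        Suggestion("RQS_D-1-1", automatic=True, enabled=True, applicable_for="synth_design"),
    ]
    print(run_suggestion_flow(rqs_file, read_before="opt_design"))
```

In this demo only RQS_A-1-1 is applied. RQS_B-1-1 is disabled and RQS_C-1-1 is manual. RQS_D-1-1 targets synth_design, which has already run when the file is read before opt_design.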
If some of the suggestions are not desired, you can limit what you write to the file using the following command:
If 'report_qor_suggestions' is run multiple times during the flow and generates the same suggestion at different stages, RQS automatically manages the duplicates.
It is possible to get overlapping suggestions. For example, a synthesis suggestion and an 'opt_design' suggestion can achieve the same outcome. In this case, RQS allows only one of them to be APPLIED, giving preference to the synthesis suggestion.
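The duplicate and overlap handling can be sketched in the same toy style. This is a hypothetical model, not how RQS identifies suggestions internally. It assumes each suggestion carries an `outcome` label describing the change it makes and treats the same outcome at the same stage as a duplicate. It also generalizes the synthesis preference to "the earliest stage in the flow wins", which is an assumption beyond what the text states.

```python
from dataclasses import dataclass

# Hypothetical, simplified stage order used to rank overlapping suggestions.
STAGE_ORDER = ["synth_design", "opt_design", "place_design", "phys_opt_design", "route_design"]


@dataclass(frozen=True)
class Suggestion:
    name: str
    outcome: str         # hypothetical: label for the netlist change it achieves
    applicable_for: str  # stage that would apply it
    generated_at: str    # stage after which report_qor_suggestions produced it


def select_applicable(suggestions):
    """Return the names of suggestions allowed to be APPLIED (toy model)."""
    # 1. Duplicates: the same fix reported at several stages is kept once.
    unique = {}
    for s in suggestions:
        unique.setdefault((s.outcome, s.applicable_for), s)

    # 2. Overlaps: different suggestions reaching the same outcome. Keep only the
    #    one applied earliest in the flow, so synth_design beats opt_design.
    chosen = {}
    for s in unique.values():
        best = chosen.get(s.outcome)
        if best is None or STAGE_ORDER.index(s.applicable_for) < STAGE_ORDER.index(best.applicable_for):
            chosen[s.outcome] = s
    return sorted(s.name for s in chosen.values())
```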
In addition, when a checkpoint is written, the current state of the suggestions is stored within it. So if suggestions have been read and a checkpoint is written afterwards, there is no need to read the suggestions again when that checkpoint is reopened.
Case Study Example
The best way to see how suggestions work is to take an example.
Below is the list of suggestions reported after 'place_design' for this example design:
First, let us focus on the name. The first suggestion has the NAME RQS_XDC-1-1. The NAME indicates the category of the suggestion, in this case XDC. There are 6 categories in total:
As a rule of thumb, suggestions that impact Utilization, XDC and Clocking should be resolved early in the design cycle as shown in the following figure:
These suggestions typically impact a large number of paths, and resolving them also reduces the severity of the congestion and timing issues seen later in the design closure process. Nothing prevents you from applying timing and congestion suggestions at the same time as clocking, utilization and XDC suggestions, but they might increase utilization and might not be required after a clocking fix.
For this reason, it is generally not recommended to try to solve Timing or Congestion issues before clocking and XDC are in line with the Xilinx UltraFast Design Methodology recommendations.
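The rule of thumb above can be expressed as a small triage helper. The NAME format it parses (RQS_<TOKEN>-<number>-<number>) is inferred only from the examples in this post. The category strings in PHASE are assumptions, and any category not listed (for example Strategy) simply sorts last.

```python
import re

# Assumed NAME format, inferred from RQS_XDC-1-1 and RQS_CLOCK-2-1.
NAME_RE = re.compile(r"^RQS_([A-Z]+)-(\d+)-(\d+)$")

# Rule of thumb: Utilization, XDC and Clocking first, then Congestion and Timing.
# Lower number = earlier in the design cycle; category strings are assumptions.
PHASE = {"Utilization": 0, "XDC": 0, "Clocking": 0, "Congestion": 1, "Timing": 1}


def name_token(name):
    """Return the category token embedded in a suggestion NAME."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected suggestion name: {name}")
    return m.group(1)


def triage(suggestions):
    """Order (name, category) pairs so early-phase categories come first.

    sorted() is stable, so the original order is kept within each phase.
    """
    return sorted(suggestions, key=lambda s: PHASE.get(s[1], 2))
```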
Timing and congestion issues tend to be focused on specific modules or specific timing paths.
Congestion can only be seen after placement, and its estimate becomes more accurate after routing.
Timing suggestions are typically reported only for paths where RQS sees a timing violation. By default, RQS analyzes 100 timing paths per clock group. If failing paths fall outside this set, RQS will not offer suggestions for them. To increase the number of paths analyzed, run the following command:
report_qor_suggestions -max_paths <value more than 100>
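The effect of this path limit can be illustrated with a toy model. As a simplification, it assumes RQS looks at the `max_paths` worst-slack paths in each clock group, so failing paths outside that window produce no timing suggestions until `-max_paths` is raised. The function name and the (clock_group, slack_ns) input format are hypothetical.

```python
from collections import defaultdict


def rqs_visible_failures(paths, max_paths=100):
    """Count failing paths inside and outside an RQS-style path window.

    `paths` is a list of (clock_group, slack_ns) tuples. The window is assumed
    to hold the `max_paths` worst-slack paths of each clock group.
    Returns (failing paths seen, failing paths missed).
    """
    by_group = defaultdict(list)
    for group, slack in paths:
        by_group[group].append(slack)

    seen, missed = 0, 0
    for slacks in by_group.values():
        slacks.sort()  # worst (most negative) slack first
        failing = sum(1 for s in slacks if s < 0)
        in_window = sum(1 for s in slacks[:max_paths] if s < 0)
        seen += in_window
        missed += failing - in_window
    return seen, missed
```

For example, a clock group with 150 failing paths leaves 50 of them unseen at the default limit, while raising `max_paths` to 200 brings all of them into view.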
Strategy suggestions will be covered in a future blog.
Next, look at the last suggestion in the table, RQS_CLOCK-1-1. The table shows that this is an AUTOMATIC suggestion. It applies the CLOCK_DELAY_GROUP property to nets driven by a BUFG.
The second to last suggestion, RQS_CLOCK-2-1, is a manual (AUTOMATIC = 0) suggestion. It recommends a more optimal clocking topology, replacing BUFGCE plus MMCM divider combinations with BUFGCE_DIVs, which have built-in dividers. Vivado cannot swap these buffers automatically, so an RTL edit is required.
As the name suggests, AUTOMATIC suggestions require little effort, whereas manual suggestions require more. The following shows the different approaches required for AUTOMATIC and manual suggestions:
AUTOMATIC suggestions:
Apply properties to objects
Add switches to commands
Minor constraint changes
Manual suggestions:
RTL design edit is required
Constraints update is required
More user analysis is required
In total, just under 80% of suggestions are automatic. Because manual suggestions require more effort, it might be acceptable to skip some manual Clocking or Utilization suggestions and move on to AUTOMATIC Congestion suggestions. However, the best QoR is achieved by resolving these issues first.
The QoR Gain
Shown below are the results from 30 designs using the following criteria:
'place_design' Explore directive
Reference Run without suggestions versus a Suggestion Run using an otherwise identical flow
Clocking suggestions generated at 'place_design'
All other suggestions generated at 'route_design'
Only AUTOMATIC suggestions are compared
The QoR gain is measured in two ways:
Looking at the absolute improvement in WNS (an easy-to-digest metric).
Looking at the geomean gain across all clocks that were failing in the reference run (a more solid measure of QoR gain).
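Both metrics can be computed from per-clock timing results. The post does not spell out the exact per-clock formula behind the geomean, so this sketch assumes an Fmax-based ratio, (period - WNS_ref) / (period - WNS_sugg), for each clock that fails in the Reference Run. The function names and the {clock: (period_ns, wns_ns)} input format are hypothetical.

```python
from statistics import geometric_mean


def design_wns(clocks):
    """Design WNS = worst slack over all clocks. clocks: {name: (period_ns, wns_ns)}."""
    return min(wns for _, wns in clocks.values())


def wns_gain_ns(ref, sugg):
    """Absolute WNS improvement of the Suggestion Run over the Reference Run."""
    return design_wns(sugg) - design_wns(ref)


def geomean_gain_pct(ref, sugg):
    """Geomean gain (%) over clocks that fail timing in the Reference Run.

    Assumed per-clock metric: achievable Fmax = 1 / (period - WNS), so each
    clock contributes the ratio (period - WNS_ref) / (period - WNS_sugg).
    """
    ratios = [
        (period - wns_ref) / (period - sugg[name][1])
        for name, (period, wns_ref) in ref.items()
        if wns_ref < 0
    ]
    return (geometric_mean(ratios) - 1) * 100


if __name__ == "__main__":
    ref = {"clk_a": (4.0, -1.0), "clk_b": (2.0, -0.2), "clk_c": (5.0, 0.1)}
    sugg = {"clk_a": (4.0, -0.5), "clk_b": (2.0, 0.0), "clk_c": (5.0, 0.1)}
    print(f"WNS gain: {wns_gain_ns(ref, sugg):.3f} ns")           # WNS gain: 0.500 ns
    print(f"Geomean gain: {geomean_gain_pct(ref, sugg):.1f} %")   # Geomean gain: 10.6 %
```

Here the WNS metric only sees clk_a, the worst clock, while the geomean also credits the improvement on clk_b. Averaging `wns_gain_ns` or `geomean_gain_pct` across designs gives the per-suite averages.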
The following example is from the design whose table we looked at previously:
The blue bars represent the Reference Run and the orange bars show the new WNS of the Suggestion Run. RQS can significantly improve a design's WNS: the average WNS gain across these 30 designs is 0.648 ns.
This graph shows a more robust measure. It looks at all failing clocks and takes the geomean % improvement. This prevents a single clock with a large failure from dominating the result when multiple clocks fail timing.
The average geomean gain in these designs is 12.1%.
There are of course standout gains too. In the top 4 designs, QoR was improved by an average of 34.7%.
If you profile the gains you will see the following:
20+% QoR gains are seen when a particular issue has a big impact on a small number of paths. This is low-hanging fruit.
10%+ QoR gains are seen when clocking issues are resolved.
Smaller gains come from work on individual timing paths, which typically happens at the end of the design closure cycle.
Gains are always more difficult to realize once the low-hanging fruit has been captured. This profile shows that RQS has an influence throughout the design cycle and should be rerun after significant changes to the design.
In addition, the numbers shown do not include manual suggestions, whose gain is not easy to measure, so users who carry out the manual modifications can see QoR gains that exceed the numbers shown here.
If you want to get started with RQS, the perfect place to start is the tutorial located here: