Explorer
Registered: 09-10-2019

[BIG PROBLEM] unused mark_debug not optimized away

I have a struct that is used only to make data easier to read in an ILA. I expected implementation to optimize it away when the signal isn't connected to anything.

(* mark_debug="true" *) t_axi_rq maincore_rq;

// Just for ILA
(* mark_debug="true" *) t_hdr_RQ   maincore_rq_probe_hdr;
assign maincore_rq_probe_hdr = maincore_rq.tdata;

However, while investigating why I can't meet timing, I was surprised to see that it hasn't been removed: a useless physical LUT was added in between, together with useless routing paths.

[Screenshot omitted]

[Screenshot omitted]

That's the routing that links the red to the yellow:

[Screenshot omitted]

My bus is 512 bits wide, so the wasted resources are astronomical. I don't even dare check all the other unused mark_debug signals in my design.

Why? That's a HUGE issue.

Explorer
Registered: 09-10-2019

Removing the mark_debug attributes fixes all the failing paths I had.
Xilinx Employee
Registered: 08-13-2007

You should be able to remove it via TCL before opt_design if that is easier than modifying the RTL:

https://forums.xilinx.com/t5/Implementation/Remove-MARK-DEBUG-to-allow-LUT-optimization/td-p/907170

I believe mark_debug explicitly disables normal synthesis and implementation optimizations to allow the signals to be probed if needed.

Cheers,

bt

 

Explorer
Registered: 09-10-2019

@barriet

Signal to be probed after implementation? I'm curious to know how a signal can be probed after the implementation.

mark_debug should have an effect only on synthesis, nothing else.

  • Why does it prevent implementation from optimizing them?
  • What's the point?
  • Why doesn't implementation remove the garbage LUTs during opt_design?

That's a very serious issue, and they are real questions.

I'm not interested in using Tcl commands to do manually what Vivado should do out of the box, especially if I have to manually remove mark_debug from all the signals that aren't connected (to any logic, or even to a debug core).

Xilinx Employee
Registered: 08-13-2007

There's a variety of possible debug flows, and some might use this flexibility.

See UG936, "Lab 6: Using the ECO Flow to Replace Debug Probes Post Implementation", for one example.

Also, from UG835 v2020.1, page 1325:

"• MARK DEBUG: number of cells in the path with a MARK_DEBUG value of TRUE. By default a
net with MARK_DEBUG has DONT_TOUCH set to TRUE which disables optimization on that
net. The DONT_TOUCH can be set to FALSE to enable optimization and potentially improve
timing."
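A minimal non-project Tcl sketch of acting on that note. The checkpoint names and the net pattern *maincore_rq_probe_hdr* are hypothetical placeholders. It first writes the per-path characteristics of the worst paths (the report assumed here to contain the MARK DEBUG count quoted above is report_design_analysis -timing), then clears DONT_TOUCH only on MARK_DEBUG nets you do not plan to probe, and runs opt_design. Whether this alone removes LUT1s that synthesis has already inserted is not confirmed in this thread.

```tcl
# Sketch only: checkpoint names and the net pattern are hypothetical placeholders.
open_checkpoint post_synth.dcp

# Per-path characteristics of the worst paths, including how many cells carry MARK_DEBUG
report_design_analysis -timing -max_paths 20 -file rda_timing.rpt

# MARK_DEBUG nets that you do NOT intend to connect to an ILA (hypothetical pattern)
set unprobed [get_nets -quiet -hierarchical -filter {MARK_DEBUG == "TRUE"} {*maincore_rq_probe_hdr*}]

# Per UG835, DONT_TOUCH can be set to FALSE to enable optimization on these nets
if {[llength $unprobed] > 0} {
    set_property DONT_TOUCH false $unprobed
}

opt_design
write_checkpoint -force post_opt.dcp
```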
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug835-vivado-tcl-commands.pdf

Cheers,

bt

Explorer
Registered: 09-10-2019

Thank you for your answer, @barriet.

Could you explain the reasoning behind your previous answer in more detail?

Signal to be probed after implementation? I'm curious to know how a signal can be probed after the implementation.

  • Why does it prevent implementation from optimizing them?
  • What's the point?
  • Why doesn't implementation remove the garbage LUTs during opt_design?

It makes sense to prevent optimization on these signals during Synthesis but I don't understand why the implementation doesn't optimize unused/unconnected signals.

What is the architectural reason behind that behavior?

The more I use Vivado, the more nonsensical behaviors like this one I find.

Xilinx Employee
Registered: 01-30-2019

Hi @alexis_jp 

Can you confirm whether the LUT1 is present after synthesis? If it is not, the tool may have added it on this path during post-place phys_opt_design to fix hold violations.

Would it be possible for you to share the design with me for debugging? Let me know, and I can send you an ezmove link so you can share it securely.

Explorer
Registered: 09-10-2019

@surajc

The LUT is present after synthesis and remains after implementation. I won't be able to share my code. I still need to check, but it seems that every mark_debug=true signal has a garbage LUT on its net.

Can you confirm whether this is a bug or intended behavior?

You should be able to reproduce the issue with any design that uses mark_debug on signals not tied to any debug core.

My design contains tens of thousands of those garbage LUTs placed and routed.

Explorer
Registered: 09-10-2019

Using "set_property mark_debug false ..." in an XDC file with PROCESSING_ORDER set to NORMAL or EARLY doesn't change anything.

The signals are no longer flagged as mark_debug, but the LUTs are still present and the implemented design has a hugely negative WNS. The result is nowhere near what I get when I remove all the mark_debug attributes from the code.

Scholar
Registered: 09-16-2009

The way the Xilinx tools handle the MARK_DEBUG attribute is a rather blunt instrument. It affects results both during synthesis and during place-and-route implementation.

The idea, I believe, is to allow for quicker iterations of changing ILA connections.  Customers can add/remove ILA connections and only run part of the implementation.  The MARK_DEBUG tags signals in such a way to allow these changes, at the cost of preventing some optimizations on these signals.

It sounds like your problem is that you have a large structure (512 bits) and are only interested in debugging some of its signals. But since MARK_DEBUG is such a blunt tool, you can only apply the attribute to the full 512-bit structure as one atomic unit.

Here's how I use MARK_DEBUG and ILAs to solve similar problems without the overhead. A detail to note: I use MARK_DEBUG simply to manage the naming of the probes within the Vivado analyzer. I wish the tool had a better way of manipulating the signal names. Either the names are derived (via MARK_DEBUG), or the user is left with awful hand-editing of generated LTX files.

I use ILA instantiation, not inference - I prefer to stick the ILA directly inside my RTL code.  To me this is MUCH easier than hunting through a gate level netlist in order to find the signal I'm interested in.  Some examples:

generate
  if ( DEBUG )
  begin : gen_debug
    (* mark_debug = "yes" *) wire [ 1 : 0 ] probe_state            = state;
    (* mark_debug = "yes" *) wire [ 1 : 0 ] probe_some_2bit_signal = large_signal[ 511 : 510 ];
    (* mark_debug = "yes" *) wire [ 7 : 0 ] probe_some_byte        = some_large_structure.subfield;

    wire [ 255 : 0 ] debug_bus =
    {
      probe_state,
      probe_some_2bit_signal,
      probe_some_byte
      // , ... further probe_* signals
    };

    ila_256_2048 my_ila
    (
      .clk( clk_i ), .probe0( debug_bus )
    );
  end
endgenerate

The idea here is that I explicitly create signals (usually named "probe*") for just the signals I'm interested in debugging with the ILA. These have the "mark_debug" attribute added. They are all grouped together and connected to a generic (2048 deep by 256 wide) ILA. Done this way, the mark_debug attribute is granular enough that ONLY the signals being probed are marked to prevent optimization. Further, the probed signals are correctly named in the Vivado analyzer via the LTX file.

To me, this method is much easier than the Xilinx-recommended use of MARK_DEBUG, which requires the user to hunt through gate-level netlists and uses post-implementation "magic" to hook up ILAs. The cost, however, is more machine time to build my debug bitfile: to change my ILA probes, I must re-run synthesis, not just place and route. To me, more machine time is almost always a better trade than more (tedious) human time.

Regards,

Mark

Xilinx Employee
Registered: 08-13-2007

Good input, Mark - thank you for sharing your experience and approach here.

 

Cheers,

bt

Guide
Registered: 01-23-2009

> It makes sense to prevent optimization on these signals during Synthesis but I don't understand why the implementation doesn't optimize unused/unconnected signals.

> What is the architectural reason behind that behavior?

This is intended behavior.

If you look at UG937 Chapter 7, you will see that there is a flow for changing your ILA connections post-implementation. In essence, you can use the ECO (engineering change order) mechanism of Vivado to change connections to the ILA. This requires only an ECO re-route, which is fast and leaves your design very similar to the design before the ECO. To do this, though, and expect to be able to connect any of the nets that were flagged as MARK_DEBUG, these nets need to be preserved. So, while your initial implementation didn't use these nets (or the LUT1s that were used to anchor them down), they are intentionally kept so that they are available for connection to the ILA in an ECO step.

Mechanically, the MARK_DEBUG sets the DONT_TOUCH, and the DONT_TOUCH is explicitly described as having an effect in both synthesis and implementation.

So there is a cost for using MARK_DEBUG, even if you don't actually use the signals for your ILA in the initial design. For this reason, you should be careful with MARK_DEBUG; when this is applied to only a small number of nets, the cost is pretty negligible, but you have found the problem - if you have lots of nets (including wide busses) with the MARK_DEBUG set, they can start to affect your utilization and timing results.
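Before deciding where MARK_DEBUG is worth that cost, it can help to see how many nets actually carry it. A short Tcl sketch for the Tcl console with the synthesized design open; the output file name mark_debug_nets.txt is a hypothetical choice. In a synthesized netlist, bus nets are typically listed bit by bit, so a single marked 512-bit bus can account for hundreds of entries.

```tcl
# Sketch: run in the Tcl console with the synthesized design open.
set dbg_nets [get_nets -quiet -hierarchical -filter {MARK_DEBUG == "TRUE"}]
puts "Nets with MARK_DEBUG = TRUE: [llength $dbg_nets]"

# Write one net name per line so the list can be reviewed or diffed between runs
set fh [open mark_debug_nets.txt w]
foreach net $dbg_nets {
    puts $fh $net
}
close $fh
```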

Avrum

Explorer
Registered: 09-10-2019

Thank you all for your answers.

Our goal is to touch the code as little as possible and take advantage of the tool's features.

We used MARK_DEBUG on all the important signals in all our modules "just in case". We were confident that Vivado was smart enough to discard unused flags after synthesis, and optimize those nets, if no ILA exists.

@markcurry 

> To me this is MUCH easier than hunting through a gate level netlist in order to find the signal I'm interested in.

That isn't true, and we don't agree at all. You're doing by hand what the tool is supposed to do for you automatically, without touching the code.

Your solution is limited: you assumed I only need part of the 512-bit structure, and I would still need to change the code each time I want to probe a signal.

Using "Set up debug" is easier and saves much more human time: keep MARK_DEBUG in the code on all the important signals, run "Set up debug", search for the signals by name and MARK_DEBUG=true, and that's all. Vivado does all the work, everything in the GUI, without changing a single line of code. It doesn't take more than 2 minutes.

Intel's Signaltap was very convenient for that: no special flag was needed, and we could always find the signals.

@avrumw 

That's true, but in an ideal world that ECO feature would be configurable, like incremental compilation. When it is disabled, the tool would treat mark_debug signals as normal and optimize them if they aren't connected to a debug core.

It'd be interesting to see usage statistics for the ECO feature among all Vivado users. Penalizing all users for such a rare use case... I'm curious about the real gain on a real-life design.

 

@avrumw @markcurry I removed the MARK_DEBUG during synthesis using set_property -verbose MARK_DEBUG false [get_nets -hierarchical -filter { MARK_DEBUG == "TRUE" } ].

The nets are no longer flagged in the synthesized netlist, but the garbage logic stays, which leads to exactly the same implemented design as if I hadn't removed the flag.

 

Overall conclusion: the MARK_DEBUG, DONT_TOUCH and KEEP flags are completely sh***y and not handled efficiently. The only way to use them is to manually add and remove them each time we need to probe signals. For a big design with lots of files, good luck. Such a waste of time.

Xilinx Employee
Registered: 01-30-2019

Hi @alexis_jp, and all,

I was able to reproduce the issue with your STR (RTL attached below).

As mentioned in the earlier email and in the forum thread, the presence of these LUT1s after synthesis is questionable.
The root cause is the following lines in your code (names changed to match my test case):
(* mark_debug = "yes" *) wire [3:0] temp_1,temp_2;
(* mark_debug = "yes" *) wire [3:0] probe_temp1;
assign probe_temp1 = temp_1;
What's happening here is that you are assigning a net that already has mark_debug to another mark_debug net. Since mark_debug is equivalent to dont_touch, the synthesis tool has to infer a single net with two names, temp_1 and probe_temp1. This can only be done by splitting the net into two sections separated by a garbage LUT: the net at the LUT input is named temp_1 and the net at its output is probe_temp1. This is why the LUT1s are there in the first place.
 
Since mark_debug acts as dont_touch, implementation does not touch the LUT1s, which is why you see timing issues in your design.
To conclude, this is expected behaviour of the tool for the above-mentioned use of the mark_debug attribute.

Since you do not want to modify the RTL, here is a Tcl workaround for this issue:
a. Get the nets of the garbage LUTs.
b. Remove mark_debug from them.
c. Run opt_design -retarget -debug_log.
d. Apply mark_debug to the other nets.
This workaround is in the attached Tcl file: simply open the synthesized design, source the Tcl file, save the checkpoint, and then run implementation as usual.
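The attached Tcl file is not reproduced in the thread, so the following is only a hypothetical reconstruction of steps a to d, not the file itself. It assumes a post-synthesis checkpoint named post_synth.dcp and a net pattern *probe_keep* for the nets you still want to probe (both placeholders). Step a as written here picks every MARK_DEBUG net touching any LUT1, which may be broader than the garbage LUTs you actually want removed; narrow the filter for a real design.

```tcl
# Hypothetical reconstruction of steps a-d; file names and patterns are placeholders.
open_checkpoint post_synth.dcp

# a. Nets that touch LUT1 cells and still carry MARK_DEBUG
set lut1_cells [get_cells -quiet -hierarchical -filter {REF_NAME == LUT1}]
set garbage_nets {}
if {[llength $lut1_cells] > 0} {
    set garbage_nets [get_nets -quiet -of_objects $lut1_cells -filter {MARK_DEBUG == "TRUE"}]
}
puts "MARK_DEBUG nets on LUT1 cells: [llength $garbage_nets]"

# b. Clear MARK_DEBUG so these LUT1s are no longer protected
if {[llength $garbage_nets] > 0} {
    set_property MARK_DEBUG false $garbage_nets
}

# c. Re-run the retarget phase; -debug_log reports why optimizations were blocked
opt_design -retarget -debug_log

# d. Apply MARK_DEBUG to the nets you still want for the ILA (hypothetical pattern)
set keep_nets [get_nets -quiet -hierarchical {*probe_keep*}]
if {[llength $keep_nets] > 0} {
    set_property MARK_DEBUG true $keep_nets
}

write_checkpoint -force post_synth_fixed.dcp
```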
Xilinx Employee
Registered: 01-30-2019

Hi All,

A CR (change request) was filed to modify this behaviour of the tool. It proposes the following:

During opt_design, the tool will check whether each mark_debug net is connected to an ILA. If a net is not connected to an ILA, the tool will treat it as a normal net (mark_debug false) and optimise it where the optimisation is valid.

For now, I personally recommend the following method:

Identify the nets that will not be connected to the ILA -> create another XDC (used in implementation only) -> set mark_debug false on those nets -> proceed as usual.
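A sketch of that workaround in project mode. The XDC name debug_impl_only.xdc and the net pattern are hypothetical; USED_IN_SYNTHESIS and USED_IN_IMPLEMENTATION are the file properties that control which steps read a constraint file. Earlier in this thread, setting mark_debug false in an XDC alone did not remove LUT1s that synthesis had already created, so this may still need to be combined with the opt_design -retarget step from the earlier workaround.

```tcl
# --- Contents of debug_impl_only.xdc (hypothetical file name) ---
# One line per MARK_DEBUG net that will NOT be connected to an ILA.
# The pattern below is a hypothetical placeholder for your own net names.
set_property MARK_DEBUG false [get_nets -hierarchical {*maincore_rq_probe_hdr*}]

# --- Tcl console, project mode: register that XDC for implementation only ---
add_files -fileset constrs_1 debug_impl_only.xdc
set_property USED_IN_SYNTHESIS false [get_files debug_impl_only.xdc]
set_property USED_IN_IMPLEMENTATION true [get_files debug_impl_only.xdc]
```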

Let us know if there are any concerns with this workaround.
