09-23-2019 12:26 PM
I have a design with pretty bad congestion (level 7) and timing. I suspect part of both is related to fanout. Utilization does not seem to be a problem, so I ought to be able to throw some FFs at it.
Does anyone have a rule of thumb as to what max fan-out should be set to globally? Or is it more often set locally within each module file? Does anyone have experience with the "rebuilt" hierarchy option? I see in the logs that max fan-out is ignored when a net crosses hierarchy, so it makes sense to flatten, but I don't want to lose hierarchy (at least in the short term, so I don't have to redo constraints).
09-23-2019 02:03 PM
09-23-2019 03:06 PM
If you have low utilization but high congestion, you probably have a design with very poor pipelining. Do you have some very complicated logic relying on a lot of inputs?
I would consider a fanout > 1000 to be rather large, and > 500 can make timing harder, but it really depends on the clock speed and the design.
Can you post your code?
09-23-2019 03:29 PM
Fanout goals are generally discouraged within Vivado Synthesis. Currently UG901 indicates the tool may even ignore any fanout limits that a user places on the design. Xilinx does this because the fanout problem is better solved by the back-end tools during place and route and the other implementation steps. Those steps sometimes actually have to work to undo the duplication done by synthesis (or done manually by hand). Yes, the back-end tools sometimes work against the front-end tools. We removed all fanout goals from synthesis years ago based on suggestions from Xilinx, and haven't had any trouble.
I suggest reviewing the UltraFast Design Methodology Guide for the Vivado Design Suite (UG949). There's lots of good info in there about tackling hard to implement designs.
09-24-2019 09:12 AM
Thank you all so far for your advice.
I'm trying to go through UltraFast Methodology for ideas.
It is an extremely parallel design. (doing something like 20K+ similar operations per cycle). Because of the type of algorithm, there is a large interconnection between many of the operations before the next set of operations. I think this is causing much of the congestion.
I tend never to worry about fan-out, but for this particular design, because of the congestion, I've been looking in that direction. I'm using timing and congestion reports as a guide as to where to look, and I may manually replicate where necessary. Since the bus/array is so large, any conditional statement (e.g. if valid_in, case state=something) that affects the large arrays creates logic that applies to all 20K+ bits, which is why I thought to focus on fanout. The tool may automatically replicate where necessary, but in this case it can't find a routing solution.
Pipelining makes sense, and I'm thinking about it, but I'm limited on how many cycles I have to complete the entire algorithm. Tough one.
09-24-2019 10:21 AM
You could try adding attributes to the signal in the code to limit its fanout. For example, in VHDL:
attribute max_fanout : integer;
attribute max_fanout of <your_signal> : signal is N;
Assuming register duplication is turned on, this will force duplication when the fanout goes beyond your specified limit for that signal only.
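To show where those two lines go, here is a minimal sketch of a registered enable driving a wide bus. The entity name, the WIDTH generic, the signal names, and the limit of 64 are all made up for illustration; pick a limit based on your own timing and congestion reports. As noted earlier in the thread, synthesis may not honour the limit in every case.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: names, width and the limit are placeholders.
entity fanout_limit_example is
  generic (
    WIDTH : positive := 1024  -- assumed bus width, for illustration only
  );
  port (
    clk      : in  std_logic;
    valid_in : in  std_logic;
    data_in  : in  std_logic_vector(WIDTH - 1 downto 0);
    data_out : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity fanout_limit_example;

architecture rtl of fanout_limit_example is
  -- Registered copy of valid_in; it gates every bit of data_out,
  -- so its fanout grows with WIDTH.
  signal valid_r : std_logic := '0';

  -- Ask synthesis to replicate valid_r once it drives more than 64 loads.
  attribute max_fanout : integer;
  attribute max_fanout of valid_r : signal is 64;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      valid_r <= valid_in;
      if valid_r = '1' then
        data_out <= data_in;
      end if;
    end if;
  end process;
end architecture rtl;
```

The attribute sits on the register output (valid_r), not on the input port, because replication needs a flip-flop to copy.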
09-24-2019 11:14 AM
Thank you. I've started that and had some initial success so may try to extend to more signals and see how it goes.
I've noticed that in some cases the tool may not know the total fanout if much of the fanout happens in lower levels of hierarchy than the level of hierarchy that created the flipflop (e.g. an input valid mapped into 20 instances of a block). So, I may have to explicitly create my own duplication.
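For the explicit duplication case, one possible sketch is to generate one copy of the valid register per downstream instance and mark the copies so they are not merged back together. The entity name, the N_COPIES generic, and the port names are hypothetical; using dont_touch here is an assumption about what is needed to keep the identical registers separate, so check the synthesis log and the post-synthesis netlist to confirm the copies survive.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: one registered copy of valid_in per consumer.
entity valid_replicate is
  generic (
    N_COPIES : positive := 20  -- e.g. one copy per instance of the block
  );
  port (
    clk       : in  std_logic;
    valid_in  : in  std_logic;
    valid_out : out std_logic_vector(N_COPIES - 1 downto 0)
  );
end entity valid_replicate;

architecture rtl of valid_replicate is
  signal valid_r : std_logic_vector(N_COPIES - 1 downto 0) := (others => '0');

  -- The copies are logically identical, so synthesis could merge them
  -- into a single register; dont_touch asks the tools to leave them alone.
  attribute dont_touch : string;
  attribute dont_touch of valid_r : signal is "true";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Every copy samples the same input on the same clock edge.
      valid_r <= (others => valid_in);
    end if;
  end process;

  -- Connect valid_out(i) to the i-th instance instead of sharing one net.
  valid_out <= valid_r;
end architecture rtl;
```

This block is meant to replace the single existing valid register, not to sit after it; adding it behind an already registered valid would cost an extra cycle of latency.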
I'm trying the "rebuilt" synthesis option, which I had hoped would flatten so that max_fanout would have best effect, but still have hierarchy for implementation constraints.