Visitor
Registered: 01-21-2013

Kintex 7, extra LUTs between input IO pad and FF (slice register)


On a Kintex 7 target (I'm using xc7k410t), after P&R, I found that ISE puts 3 extra LUTs on every data net between the input pads and the slice registers, acting as route-thrus. But in the device utilization report, those LUTs are counted as logic, not route-thru. I tried ISE 13.4 and 14.3; both show this behavior.

 

Does anyone know why these extra LUTs are there?

 

 

Below is the topology of a single data bit, between input Pad and Flip-flop defined in my VHDL:

 

And below is the interconnection in a slice. The O6 output of this LUT is equal to A1 input, this signal then goes out of the slice.

 



Visitor
Registered: 01-21-2013

Sorry, I forgot to attach the figures.

 

This attachment is the topology

 

topology.png

Visitor
Registered: 01-21-2013

This is the interconnect in a slice

 

slice view.PNG

Guide
Registered: 01-23-2009

You haven't given us much information, but my guess is that the tool is intentionally adding these to fix a hold time violation.

 

If you have an OFFSET IN with a VALID specification on an input, the tools will perform a hold check on the end of the VALID window. If the data would go away before the flip-flop has a chance to sample it (and meet its hold time requirement), then the tools will add delay to the incoming data path in order to keep the data around "longer" to meet the hold time requirement of the flop.
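
A minimal sketch of how such a constraint maps to a setup and hold window. The net and group names (clk_in, data_in, data_in_grp) and the timing values are hypothetical, chosen only for illustration:

```ucf
# Hypothetical clock net with a 10 ns period.
NET "clk_in" TNM_NET = clk_in;
TIMESPEC TS_clk_in = PERIOD "clk_in" 10 ns HIGH 50%;

# Hypothetical group containing the input data pads.
NET "data_in<*>" TNM = data_in_grp;

# Data becomes valid 2 ns before the rising edge and stays valid for 4 ns,
# so the window ends 2 ns after the edge (VALID - OFFSET = 4 - 2 = 2 ns).
# Setup is checked against the start of the window, hold against its end.
TIMEGRP "data_in_grp" OFFSET = IN 2 ns VALID 4 ns BEFORE "clk_in" RISING;
```

With these numbers, the data may disappear 2 ns after the clock edge at the pad, so the flip-flop must have captured it by then. If the clock path is slow enough that it has not, adding delay on the data path is how the hold check gets fixed.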

 

This seems to be confirmed by the fact that the tool has named the cells *_DELAY_*

 

If you are 100% certain that your OFFSET IN constraint is 100% correct (i.e. you have analyzed the input interface and included all sources of clock skew, jitter and propagation), and the tool meets timing with this configuration, then the interface should work. However, often it won't. The tool prioritizes fixing hold time violations over setup time violations. In adding these three LUTs of delay, the tool has added the minimal amount of delay needed to meet the hold time requirement. However, these delays are highly process/voltage/temperature dependent: adding a minimum of 1ns of delay at "best PVT" results in around 3ns of delay at worst PVT. This delay can (and often will) make it impossible to meet the setup time requirement on this input.

 

This is not (generally) a good mechanism for designing a reliable input interface. It is generally more reliable to use IOB FFs for capturing input data. If the data window is not in the proper place (after analysis) then you should design a mechanism for putting it in the right place - either capture it with a clock that arrives earlier (using a DCM/MMCM/PLL), or put some delay on the data using the IDELAY cell. The main advantage of these mechanisms is that they are all PVT compensated - the delay on the IDELAY and the phase adjustments on a DCM/PLL/MMCM are all quite precise; if you ask for 1ns of delay, you will get 1ns +/- a few hundred picoseconds - not 1-3ns as you would get with fabric delays.

 

Avrum

Visitor
Registered: 01-21-2013

Avrum,

 

Thanks for your reply, this is very helpful for solving general I/O input timing violation problems.

 

However, this still doesn't explain why ISE always inserts extra LUTs into my design. I ran 3 experiments, and every time ISE inserted exactly 3 LUTs between every I/O and the first register:

 

1. Use a loose clock period constraint, no OFFSET IN constraint. There is no setup/hold violation, so there is no need to delay data signals.

NET "Clk" TNM_NET = Clk;
TIMESPEC TS_Clk = PERIOD "Clk" 10 ns HIGH 50%;

 

2. Set a loose OFFSET IN constraint. Again, there is no need to delay data signals.

NET "Clk" TNM_NET = Clk;
TIMESPEC TS_Clk = PERIOD "Clk" 10 ns HIGH 50%;
TIMEGRP "DataIn" OFFSET = IN 5 ns VALID 10 ns BEFORE "Clk" RISING;

 

3. Set a very small OFFSET IN constraint, so that using LUTs to delay the data signal would result in a setup time violation. And it actually does cause a setup time violation.

NET "Clk" TNM_NET = Clk;
TIMESPEC TS_Clk = PERIOD "Clk" 10 ns HIGH 50%;
TIMEGRP "DataIn" OFFSET = IN 0.1 ns VALID 10 ns BEFORE "Clk" RISING;

In these cases, it's not necessary to use LUTs to delay the signal, so I'm wondering why this happens. This behavior causes trouble for me because I want to evaluate how many resources my module uses, and the number of LUTs is always higher than expected. I also tried Virtex 5, and there are no such LUTs there.

 

I have attached a sample project below. It's a simple 16-bit adder, which only needs 16 LUTs. But it actually uses 96 more LUTs to delay the input signals.

Xilinx Employee
Registered: 06-20-2008

If your design uses a BUFG without an MMCM, DCM or PLL (CMT) to remove the clock insertion delay, then ISE will insert 3 LUTs to help minimize the routing that must be added to resolve hold times. If your BUFG does not use the CMT to remove the clock insertion delay, then you will have a lot of added delay from the PAD to the BUFG and then from the BUFG to the FF. In previous architectures the tools would always enable the uncompensated IOBDELAY element for this circuit topology. An uncompensated IOBDELAY does not use the IDELAYCTRL to guarantee a fixed delay, so it also has the potential for large variation in delay over PVT, the same as the 3 LUTs and fabric routing. In 7 series, not all IOs have the uncompensated IOBDELAY, so the tools are not able to automatically insert it to account for the large delay of the clock (when a CMT is not used). So when this topology is identified, the tools will insert 3 LUTs.

 

If you do not want the 3 LUTs, then just set IOBDELAY=NONE on the component and this will disable the insertion of the delay.

You can also set IOB=FORCE. This will FORCE the FF into the IOB and prevent the insertion of the LUTs.

Set IOB=FORCE and set IOBDELAY=IFF for High Range IOs only, and this will turn on the uncompensated delay in the IO and also prevent the LUT insertion.

Note that there is no uncompensated IOBDELAY in the High Range IOs so you will get an error if you attempt to do this in an HR IO.

Setting IOB=TRUE for this topology (BUFG without a CMT to remove clock insertion delay) will still result in the FF being pushed into the fabric and the 3 LUTs being inserted, since the tools recognize that there will likely be a hold violation that must be resolved. If you want to override this, then set IOB=FALSE.

 

Please refer to the constraints guide for more information on using the IOBDELAY constraint.
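
As a rough sketch, the first two options above might look like this in a UCF file. The instance names ("data_in_ibuf_0", "data_reg_0") are hypothetical placeholders, and the exact element each constraint attaches to should be checked against the Constraints Guide for your ISE version. The two constraints are alternatives, so only one is active here:

```ucf
# Option 1: disable the automatic input delay so the 3 LUTs are not inserted.
# "data_in_ibuf_0" is a hypothetical input buffer instance name.
INST "data_in_ibuf_0" IOBDELAY = NONE;

# Option 2 (alternative): force the capture flip-flop into the IOB.
# "data_reg_0" is a hypothetical register instance name.
# INST "data_reg_0" IOB = FORCE;
```

Either way, it is worth re-checking the hold paths in the timing report afterwards, since removing the delay may expose the hold violation the LUTs were meant to fix.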


Visitor
Registered: 01-21-2013

Thanks llewis, this answers my question.

Xilinx Employee
Registered: 06-20-2008

To correct a typo in my post: in the paragraph below I said there is no uncompensated IOBDELAY (ZHOLD_DELAY) in the High Range IOs. This should say there are no ZHOLD_DELAYs in the High Performance IOBs. With High Performance IOs the delay variation of the ZHOLD would likely be too much, so when using High Performance IOs the designer should use the calibrated IOBDELAY with IDELAYCTRL to ensure good delay tracking.

 

Note that there is no uncompensated IOBDELAY in the High Range IOs so you will get an error if you attempt to do this in an HR IO. Setting IOB=TRUE for this topology (BUFG without a CMT to remove clock insertion delay) will still result in the FF being pushed into the fabric and the 3 LUTs being inserted, since the tools recognize that there will likely be a hold violation that must be resolved. If you want to override this, then set IOB=FALSE.
