Explorer
Registered: 12-01-2010

routing congestion

Hi, I'd like to ask about a PAR problem I ran into yesterday.

I used an IODELAYE1 to delay a signal named dqs by 90 degrees. Let's call the delayed signal dqs1.

Then I inverted dqs to get dqs2.

Next, I XORed dqs1 with dqs2 to get dqs3.

I tried to use dqs3 as a clock, but PAR fails: dqs1 cannot be routed.

WARNING:Route:543 - This design is experiencing routing congestion. Please review the Xilinx Routing Optimization White Paper, WP381 on www.xilinx.com, for guidelines and techniques in resolving this issue.

WARNING:Route:563 - Router will not fix hold error

I think the logic is quite simple, so why does it fail?

I also read WP381 and tried to fix it, but that didn't work.

Please give me some advice.

Thanks.

Community Manager
Registered: 06-14-2012

Which device and tools version are you using?

Explorer
Registered: 12-01-2010

Hi, thank you for your quick reply.

My FPGA is a Virtex-6 xc6vlx130t-2ff1156, and I am using ISE 13.2.

Community Manager
Registered: 06-14-2012

Is it possible to share your design? If not, I would suggest trying a few options:

1. SmartXplorer with the -cr switch (congestion reduction)
2. map with the -xe c option (extra effort)

Xilinx Employee
Registered: 07-01-2008

You are gating the clock through XOR logic, so a LUT drives the clock net on local routing resources rather than a BUFGCTRL driving it on dedicated clock routing resources. This is why you get congestion, and hold errors caused by the delay on the clock net.

Explorer
Registered: 12-01-2010

Hi, thanks for your reply.

My code is as follows.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;   -- "+" and "<" on std_logic_vector (count)

library UNISIM;
use UNISIM.VComponents.all;        -- IODELAYE1 and IDELAYCTRL primitives

entity XORDQS is
port(
	clk           : in  std_logic;	--200MHz
	rst           : in  std_logic := '0';
	ce            : in  std_logic;
	rden          : in  std_logic;
	dqs           : in  std_logic;	--100MHz
	NANDOUT       : in  std_logic_vector(7 downto 0);
	nand_rd_valid : out std_logic;	-- not driven in this excerpt
	rdreg         : out std_logic_vector(63 downto 0);
	done          : out std_logic;	-- not driven in this excerpt
	addrout       : out std_logic_vector(10 downto 0);	-- not driven in this excerpt
	en            : out std_logic	-- not driven in this excerpt
	);
end XORDQS;

architecture Behavioral of XORDQS is
signal dqs1 : std_logic;	-- dqs delayed by the IODELAYE1 (intended 90 degrees)
signal dqs2 : std_logic;	-- dqs inverted
signal dqs3 : std_logic;	-- dqs1 xor dqs2, used as a clock below
signal RDY : std_logic;
signal count : std_logic_vector(2 downto 0) := "000";
signal CNTVALUEOUT : std_logic_vector(4 downto 0);

  attribute keep : string;
  attribute keep of dqs1 : signal is "TRUE";
  attribute keep of dqs2 : signal is "TRUE";

begin
   IODELAYE1_inst : IODELAYE1
   generic map (
      CINVCTRL_SEL => FALSE,           -- Enable dynamic clock inversion ("TRUE"/"FALSE")
      DELAY_SRC => "I",                -- Delay input ("I", "CLKIN", "DATAIN", "IO", "O")
      HIGH_PERFORMANCE_MODE => TRUE,   -- Reduced jitter ("TRUE"), Reduced power ("FALSE")
      IDELAY_TYPE => "FIXED",          -- "DEFAULT", "FIXED", "VARIABLE", or "VAR_LOADABLE"
      IDELAY_VALUE => 30,              -- Input delay tap setting (0-32)
      ODELAY_TYPE => "FIXED",          -- "FIXED", "VARIABLE", or "VAR_LOADABLE"
      ODELAY_VALUE => 0,               -- Output delay tap setting (0-32)
      REFCLK_FREQUENCY => 200.0,       -- IDELAYCTRL clock input frequency in MHz
      SIGNAL_PATTERN => "CLOCK"        -- "DATA" or "CLOCK" input signal
   )
   port map (
      CNTVALUEOUT => CNTVALUEOUT, -- 5-bit output - Counter value for monitoring purpose
      DATAOUT => dqs1,            -- 1-bit output - Delayed data output
      C => '0',                   -- 1-bit input - Clock input
      CE => '0',                  -- 1-bit input - Active high enable increment/decrement function
      CINVCTRL => '0',            -- 1-bit input - Dynamically inverts the Clock (C) polarity
      CLKIN => clk,               -- 1-bit input - Clock Access into the IODELAY
      CNTVALUEIN => "00000",      -- 5-bit input - Counter value for loadable counter application
      DATAIN => '0',              -- 1-bit input - Internal delay data
      IDATAIN => dqs,             -- 1-bit input - Delay data input
      INC => '0',                 -- 1-bit input - Increment / Decrement tap delay
      ODATAIN => '0',             -- 1-bit input - Data input for the output datapath from the device
      RST => '0',                 -- 1-bit input - Active high, synchronous reset, resets delay chain to IDELAY_VALUE/
                                  -- ODELAY_VALUE tap. If no value is specified, the default is 0.
      T => '1'                    -- 1-bit input - 3-state input control. Tie high for input-only or internal delay or
                                  -- tie low for output only.
   );

   IDELAYCTRL_inst : IDELAYCTRL
   port map (
      RDY => RDY,       -- 1-bit output indicates validity of the REFCLK
      REFCLK => clk,    -- 1-bit reference clock input
      RST => rst        -- 1-bit reset input
   );

-- dqs3 is driven by a LUT (the XOR), so it is carried on local routing
-- rather than dedicated clock resources; this is the structure the
-- replies below discuss.
dqs2 <= not dqs;
dqs3 <= dqs2 xor dqs1;

process(dqs3)
begin
	if rising_edge(dqs3) then
		-- 3-bit byte counter: counts 0..7, then wraps back to 0
		if(count < "111")then
			count <= count + "001";
		elsif(count = "111")then
			count <= "000";
		end if;
		-- store NANDOUT into the byte lane selected by count
		case count is
			when "000" => rdreg(7 downto 0) <= NANDOUT;
			when "001" => rdreg(15 downto 8) <= NANDOUT;
			when "010" => rdreg(23 downto 16) <= NANDOUT;
			when "011" => rdreg(31 downto 24) <= NANDOUT;
			when "100" => rdreg(39 downto 32) <= NANDOUT;
			when "101" => rdreg(47 downto 40) <= NANDOUT;
			when "110" => rdreg(55 downto 48) <= NANDOUT;
			when "111" => rdreg(63 downto 56) <= NANDOUT;
			when others => rdreg <= (others => '1');
		end case;
	end if;
end process;

end Behavioral;

I have tried SmartXplorer, and all of its strategies failed.

So I strongly suspect that my code violates some design rule.

However, I have used a similar technique before to eliminate glitches on a gated clock, so I thought the XOR operation would be fine for this clock. But the router doesn't behave as I expected, and I can't see any details in PlanAhead because PAR failed.

The line in question is:

dqs3 <= dqs2 xor dqs1;

Anyway, thanks again for your quick reply.

Explorer
Registered: 12-01-2010

Oh, yes, that's why PAR failed. Thank you!

If I really want to use an XOR operation to generate the new DQS as a clock, is it possible to achieve that by adding some constraints or a BUFGCTRL?

I have to say I am still a beginner with FPGAs and don't know much about clocking issues. But I've realized that clock domains are always the main reason my designs fail to route or fail to meet setup/hold times.

Thanks again.

Community Manager
Registered: 06-14-2012

Can you try using a BUFG at the output of your gated clock?

This is not a recommended practice, as it can lead to clock skew issues and, in turn, timing errors.

Explorer
Registered: 12-01-2010

Hi, thanks for your reply.

I tried adding a BUFG to the output of my clock, but the clock is still unrouted.

Is it very unusual to put logic functions (such as XOR) on a clock? If so, maybe I should give up and find a new way to deal with the problem.

Xilinx Employee
Registered: 07-23-2012

Hi,

Are you still facing the routing congestion after driving the clock through a BUFG?

If so, could you try floorplanning (creating pblocks) in a way that reduces the congestion?

Gated clocks are not good practice because they introduce skew, but they don't cause congestion. Poor floorplanning or over-utilization causes congestion.

If possible, could you also post the map utilization summary?

Regards,
Krishna

Xilinx Employee
Registered: 07-01-2008

Krishna,

When a high-fanout net has to be routed on local resources instead of the available global resources, it can overload the local resources and cause congestion. Many of these connections need to span long distances across the chip.

The usual recommendation, as an alternative to clock gating, is to use clock enables instead.
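As a rough illustration of "clock enable instead of clock gating", the sketch below keeps every register on the global clock clk (on a BUFG) and turns the XOR into a one-cycle enable instead of a clock. The entity and port names are hypothetical, and it assumes slow_in is already synchronous to clk and stays stable for at least two clk cycles between changes. A 100 MHz DQS sampled by a 200 MHz clock does not satisfy that, so this pattern suits slow control signals rather than the DQS data capture itself.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical example: all logic runs on clk; the byte counter only
-- advances on cycles where ce = '1'.
entity ce_capture is
  port (
    clk     : in  std_logic;                      -- global clock, e.g. 200 MHz
    slow_in : in  std_logic;                      -- ASSUMED already synchronous to clk
    din     : in  std_logic_vector(7 downto 0);
    dout    : out std_logic_vector(63 downto 0)
  );
end ce_capture;

architecture rtl of ce_capture is
  signal slow_d : std_logic := '0';
  signal ce     : std_logic;
  signal count  : unsigned(2 downto 0) := (others => '0');
  signal shreg  : std_logic_vector(63 downto 0) := (others => '0');
begin
  -- ce is high for one clk cycle after each change (rising or falling)
  -- of slow_in: the XOR now produces an enable, not a clock.
  ce <= slow_in xor slow_d;

  process (clk)
  begin
    if rising_edge(clk) then
      slow_d <= slow_in;
      if ce = '1' then
        -- write din into the byte lane selected by count
        for i in 0 to 7 loop
          if to_integer(count) = i then
            shreg(8*i+7 downto 8*i) <= din;
          end if;
        end loop;
        count <= count + 1;                       -- 3-bit counter wraps 7 -> 0
      end if;
    end if;
  end process;

  dout <= shreg;
end rtl;
```

Because ce is an ordinary data signal, it uses the CE pins of the slice flip-flops, and clk stays on dedicated clock routing.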

Explorer
Registered: 12-01-2010

Hi, thanks a lot for your advice.

I've tried the BUFG, and it didn't work.

Since the unrouted part is due to the XOR/XNOR operation, I added constraints to the dqs-related signal.


dqs4 <= dqs2 xnor dqs;

WARNING:ParHelpers:360 - Design is not completely routed.

dqs_IBUF

INST "dqs_IBUF" AREA_GROUP = "pblock_dqs_IBUF";

I've never used floorplanning before, so I am not sure whether I understood your advice correctly.

I rerouted the design and the congestion is still there. My understanding is that the clock must be routed using clock resources instead of a LUT. If I use the XOR operation, the clock is forced through a LUT and then becomes unroutable. A usual gated clock, however, has simpler logic that the clock resources themselves can handle, so a normal gated clock doesn't need to be routed through a LUT. I'm not sure if I've got the point.

The following is the utilization summary.

[Screenshots of the map utilization summary: 1.jpg, 2.jpg]

Thanks again.

Historian
Registered: 01-23-2009

Based on the (extremely) low utilization of your FPGA, it is very unlikely that this is a real "congestion" issue. This is more likely something that is structurally unroutable.

In a normal IOB, the IBUF output goes either directly to the fabric, to the IDELAY, or to the IOB FF. The IDELAY output can also go directly to the fabric or to the IOB FF. I suspect that your combination of signals is messing things up: the mapper is trying to pack dqs2 into the IOB FF using the path from the IBUF to the IOB FF, while also trying to use the IDELAY for dqs1. This is probably illegal.

Nothing having to do with SmartXplorer or floorplanning is going to fix this - it is structurally impossible.

You may be able to work around it by forcing the dqs2 FF out of the IOB using the IOB=FALSE attribute on it...

But...

What exactly are you trying to do? This delay & XOR thing is a (cheating) way of doubling your clock frequency (or making a rising edge out of both the rising and falling edges of the DQS). I can't see any reason why you would want to do this. If you want to capture data on both edges of the DQS, then simply run the dqs input (either through an IDELAY or not) to a clock buffer (preferably a BUFIO) and use the IDDR flip-flops in the IOB to capture your incoming data on both edges of the DQS - there is no need to double the clock.

Furthermore - designing the PHY layer for a DDR SDRAM (I presume this is what you are trying to do) is VERY difficult. Once you have captured the data with the DQS, then what? You need to move it into some other clock domain, and you need to correlate it with your read commands... At high speeds the data eyes are narrow and hard to find, often requiring dynamic calibration...

This is why Xilinx provides the Memory Interface Generator - it contains the PHY for launching and capturing data to and from DDR memories (and it doesn't do it by clocking on the DQS - at least in most technologies). Even if you want to use your own controller, Xilinx recommends that you start with the PHY layer generated by the MIG, which does all the calibration, etc...

Avrum

Explorer
Registered: 12-01-2010

Thanks for your detailed answer.

I'll try the IOB=FALSE attribute and see if it works.

You have pinpointed the problem I am facing, and you are right that I need to move the data into my BRAM's clock domain. Yes, that's the annoying part of the design. However, what I am dealing with is not DDR3 SDRAM but flash memory. The difference is that DDR3 doesn't need its phase (I mean the IODELAY tap) recalibrated every time, whereas the flash DQS always has an unknown arrival time. So if we used the MIG's training method to deal with the DQS, we would have to calibrate the tap before every READ operation.

Actually, I have read the MIG code and found that it's not suitable for my architecture, and it's too complex. That's why I am trying to double the DQS, to see if I can use only the rising edge to capture the DQ data. Of course, I could use the SelectIO IP and let the IDDRs handle the double-edge data. But then I have to work out how to transfer the data to my memory, such as a BRAM, which means facing the clock domain crossing issue. I'm still puzzled about that.

Anyway, thanks again for your reply.

Historian
Registered: 01-23-2009
Accepted solution

If you really do need to capture on the DQS, then what I said before works - take the DQS (assuming it is on a clock-capable I/O - and it really needs to be), bring it to a BUFIO and use the IDDR flip-flops in the IOB to capture data on both edges.

Since each IDDR captures data on both edges, you end up with two bits of data for each clock period per bit - this can be moved to the rising edge within the IOB (using SAME_EDGE) so that you now have data only on the rising edge of clock, but twice the width of your interface.
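A minimal sketch of the capture path described above, assuming a Virtex-6 device, dqs on a clock-capable pin in the same bank as an 8-bit dq bus, and no IODELAYE1 on the strobe (left out for brevity). It uses the SAME_EDGE_PIPELINED variant of the SAME_EDGE mode, which presents both bits of each pair on the same rising edge, and registers the IDDR outputs on a BUFR driven from the same pin. The entity and port names are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

library UNISIM;
use UNISIM.VComponents.all;

-- Hypothetical sketch: dqs must be on a clock-capable (CC) pin in the
-- same I/O bank as dq. The IODELAYE1 on dqs is omitted for brevity.
entity dqs_iddr_capture is
  port (
    dqs     : in  std_logic;                      -- strobe from the flash
    dq      : in  std_logic_vector(7 downto 0);   -- DDR data bus
    rclk    : out std_logic;                      -- regional clock for fabric logic
    dq_word : out std_logic_vector(15 downto 0)   -- falling byte & rising byte
  );
end dqs_iddr_capture;

architecture rtl of dqs_iddr_capture is
  signal ioclk  : std_logic;                      -- BUFIO output: I/O logic only
  signal rclk_i : std_logic;                      -- BUFR output: fabric in this region
  signal q1, q2 : std_logic_vector(7 downto 0);
begin
  -- BUFIO clocks only the I/O logic (IDDRs) of this bank
  bufio_inst : BUFIO
    port map (I => dqs, O => ioclk);

  -- BUFR from the same CC pin clocks the fabric logic that receives the data
  bufr_inst : BUFR
    generic map (BUFR_DIVIDE => "BYPASS")
    port map (I => dqs, CE => '1', CLR => '0', O => rclk_i);

  gen_iddr : for i in 0 to 7 generate
    iddr_inst : IDDR
      generic map (
        DDR_CLK_EDGE => "SAME_EDGE_PIPELINED",    -- Q1 and Q2 valid on the same rising edge
        INIT_Q1      => '0',
        INIT_Q2      => '0',
        SRTYPE       => "SYNC")
      port map (
        Q1 => q1(i),     -- bit captured on the rising edge of ioclk
        Q2 => q2(i),     -- bit captured on the following falling edge
        C  => ioclk,
        CE => '1',
        D  => dq(i),
        R  => '0',
        S  => '0');
  end generate;

  -- First fabric register: BUFIO -> BUFR transfer within the same bank
  process (rclk_i)
  begin
    if rising_edge(rclk_i) then
      dq_word <= q2 & q1;
    end if;
  end process;

  rclk <= rclk_i;
end rtl;
```

The output is twice the width of the interface (16 bits per DQS period), all on the rising edge of the regional clock, as described above. A PERIOD constraint on dqs is still needed so the tools can time the IDDR-to-BUFR transfer.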

 

The data coming from the IDDRs clocked using the BUFIO can be transferred to any logic clocked on the BUFRs that are in the same bank (the clock capable I/O can drive both the BUFIO and BUFR). Thus you can take that data from the IDDR and (for example) write it into a BRAM that is clocked on the BUFR.

 

Since the BRAMs are true dual port, you can have the other port of the RAM clocked on your "main" clock domain to read the data from the BRAMs.
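The BRAM hand-off described in the last two paragraphs can be sketched as an inferred dual-clock RAM: one process writes on the BUFR clock, another reads on the main system clock. This assumes the synthesis tool infers a block RAM from this common two-process pattern; the entity name, generics and depth are illustrative only.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical sketch: write port in the DQS/BUFR domain,
-- read port in the main system clock domain.
entity capture_ram is
  generic (
    ADDR_W : natural := 9;    -- 512 entries (illustrative)
    DATA_W : natural := 16    -- e.g. one IDDR word: rising byte + falling byte
  );
  port (
    -- write port, clocked by the BUFR output
    wclk  : in  std_logic;
    we    : in  std_logic;
    waddr : in  unsigned(ADDR_W-1 downto 0);
    wdata : in  std_logic_vector(DATA_W-1 downto 0);
    -- read port, clocked by the main clock
    rclk  : in  std_logic;
    raddr : in  unsigned(ADDR_W-1 downto 0);
    rdata : out std_logic_vector(DATA_W-1 downto 0)
  );
end capture_ram;

architecture rtl of capture_ram is
  type ram_t is array (0 to 2**ADDR_W-1) of std_logic_vector(DATA_W-1 downto 0);
  signal ram : ram_t;
begin
  write_port : process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        ram(to_integer(waddr)) <= wdata;
      end if;
    end if;
  end process;

  -- registered (synchronous) read, as block RAM requires
  read_port : process (rclk)
  begin
    if rising_edge(rclk) then
      rdata <= ram(to_integer(raddr));
    end if;
  end process;
end rtl;
```

The RAM itself is the clock domain crossing here: no register in the rclk domain samples wclk-domain data directly. The reader must still only read addresses that have already been written.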

 

The main problem in DRAM-type applications is determining which edges of DQS carry valid return data and which don't - particularly since DQS is bidirectional and has different meanings during reads and writes. Thus, it's hard to know when to write the data into the BRAM.

If the DQS from your flash is unidirectional and (say) only runs during valid read operations, then what I propose here is all you need (basically use the BRAM as a FIFO and you are done). If the DQS does something else between reads, then you will have other issues to deal with.
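Using the BRAM as a FIFO means the reading side needs to know how many words have been written. The thread does not say how; one standard asynchronous-FIFO technique, swapped in here for illustration, is to keep a binary write address in the BUFR domain and pass a Gray-coded copy through a two-flop synchronizer into the main clock domain. Because only one bit changes per write, the synchronized value is either the old or the new pointer, provided the routing skew between the pointer bits stays below one clock period. All names are hypothetical, and the ASYNC_REG attribute is included on the assumption that the tools honor it.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical sketch: write pointer crossing for a BRAM used as a FIFO.
entity wptr_sync is
  generic ( ADDR_W : natural := 9 );
  port (
    wclk      : in  std_logic;                    -- BUFR (DQS-derived) clock
    we        : in  std_logic;                    -- one word written this cycle
    waddr     : out unsigned(ADDR_W-1 downto 0);  -- binary write address for the RAM
    rclk      : in  std_logic;                    -- main system clock
    wptr_rclk : out unsigned(ADDR_W-1 downto 0)   -- words written, as seen in rclk domain
  );
end wptr_sync;

architecture rtl of wptr_sync is
  signal wbin  : unsigned(ADDR_W-1 downto 0) := (others => '0');
  signal wgray : std_logic_vector(ADDR_W-1 downto 0) := (others => '0');
  signal sync1 : std_logic_vector(ADDR_W-1 downto 0) := (others => '0');
  signal sync2 : std_logic_vector(ADDR_W-1 downto 0) := (others => '0');

  attribute ASYNC_REG : string;                   -- assumed to be honored by the tools
  attribute ASYNC_REG of sync1 : signal is "TRUE";
  attribute ASYNC_REG of sync2 : signal is "TRUE";

  -- Gray to binary: each binary bit is the XOR of all higher Gray bits
  function gray2bin (g : std_logic_vector) return unsigned is
    variable b : unsigned(g'range);
  begin
    b(g'high) := g(g'high);
    for i in g'high-1 downto g'low loop
      b(i) := b(i+1) xor g(i);
    end loop;
    return b;
  end function;
begin
  -- Write side: RAM writes at wbin, then wbin and its Gray copy advance
  process (wclk)
    variable next_bin : unsigned(ADDR_W-1 downto 0);
  begin
    if rising_edge(wclk) then
      if we = '1' then
        next_bin := wbin + 1;
        wbin     <= next_bin;
        wgray    <= std_logic_vector(next_bin xor shift_right(next_bin, 1));
      end if;
    end if;
  end process;
  waddr <= wbin;

  -- Read side: two-flop synchronizer on the Gray-coded pointer
  process (rclk)
  begin
    if rising_edge(rclk) then
      sync1 <= wgray;
      sync2 <= sync1;
    end if;
  end process;

  wptr_rclk <= gray2bin(sync2);
end rtl;
```

The reader can then read every address below wptr_rclk (modulo the RAM depth). A complete FIFO would also pass the read pointer back the same way to detect a full condition.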

 

Avrum

Explorer
Registered: 12-01-2010

Hi, Avrum. Thanks for your reply.

Now I use the IODELAY to delay the DQS and instantiate eight IDDRs to capture the DQ bus. The behavioral simulation looks perfect, but a problem remains: timing fails. Many paths fail to meet their setup/hold constraints (I have only written constraints for the input clock).

It seems that I haven't taken the clock domain crossing problem seriously enough. I used to think that the BRAM only needs a few signals:

enable

wea

address

datain

So my original approach was to generate all these signals in the DQS clock domain (100 MHz) and then sample them on the rising edge of the 200 MHz system clock. Now I realize that's not a good approach. As you said, the data should be written into the BRAM on the DQS-derived clock. Otherwise I will certainly get an unusually high timing score. Have I understood your point?
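A common way to avoid sampling DQS-domain signals directly with the 200 MHz clock is to keep the BRAM write side in the DQS (BUFR) domain, as suggested earlier in the thread, and cross only a small amount of control information through a synchronizer. As one illustration, not the method used in this thread, the sketch below moves a one-cycle "burst done" pulse from the BUFR clock into the 200 MHz clock with a toggle synchronizer. The names are hypothetical, and it assumes pulses are spaced at least a few 200 MHz cycles apart so that each toggle is seen.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical sketch: toggle synchronizer for a single-cycle event.
entity pulse_sync is
  port (
    src_clk   : in  std_logic;  -- BUFR clock derived from DQS
    src_pulse : in  std_logic;  -- one src_clk cycle wide
    dst_clk   : in  std_logic;  -- 200 MHz system clock
    dst_pulse : out std_logic   -- one dst_clk cycle wide
  );
end pulse_sync;

architecture rtl of pulse_sync is
  signal toggle     : std_logic := '0';
  signal s1, s2, s3 : std_logic := '0';
begin
  -- Source side: turn each pulse into a level change
  process (src_clk)
  begin
    if rising_edge(src_clk) then
      if src_pulse = '1' then
        toggle <= not toggle;
      end if;
    end if;
  end process;

  -- Destination side: two-flop synchronizer (s1, s2), then edge detect (s3)
  process (dst_clk)
  begin
    if rising_edge(dst_clk) then
      s1 <= toggle;
      s2 <= s1;
      s3 <= s2;
    end if;
  end process;

  dst_pulse <= s2 xor s3;
end rtl;
```

Only the single toggle bit crosses domains, so the path into s1 is the only asynchronous path; the wide data stays in the BRAM.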

 

The post-map static timing report puzzles me again. The worst slack is on the path between the IDDR and a flash controller. I tried to solve the problem by adding some TIG constraints, but that didn't work well.

[Screenshot: 1.jpg]

Anyway, the problem has now moved far away from congestion. I think I can solve the remaining issues. Thanks again for your help.
