randraka
Observer
2,115 Views
Registered: ‎05-11-2010

Registering LUTRAM in same slice

Vivado 2017.4, XCKU035. I've got some dual-port LUT RAMs that I need to register in the same slice. The particular case that is giving me trouble with timing closure is a RAM256X1D with different read (500 MHz) and write clocks. The tools refuse to put the register on DPO in the same slice. I've tried inferring as well as instantiating with RLOCs, and RLOCs with BELs set. In the course of experimenting, I've also changed the DP read clock to match the LUTRAM's WCLK, with the same result. I've since gone back and looked at other LUTRAMs in the design (32xN) and see the same issue there; those just didn't show up as timing failures.

The timing is close enough (<10 ps in most cases) that it should meet timing easily if the register were placed in the slice with the RAM and routed using the in-slice connections. Has anyone else seen this behavior, and if so, what am I missing? I know many of the earlier architectures did not allow registers with a different clock than the RAM in the same slice, but it appears UltraScale should allow it. I do see the tool putting a few unrelated registers that are on the desired clock in the slice with the memory, so I don't believe there is a clock resource issue, and the registers are FDs (no reset or CE), so there should not be an issue there. I also tried breaking the RAM into RAM128s and then RAM64s and registering before the select muxes, and again it refused to place the registers in the same slice.

Any insight would be more than welcome. Thanks!
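For anyone who wants to reproduce the instantiated variant, here is a minimal sketch of what "instantiate and RLOC" can look like for one bit. It is not the code from the actual design: the entity name, signal names, and the "X0Y0" RLOC values are made up for illustration, and an FDRE with CE tied high and R tied low stands in for the plain FD. Per this thread, constraints of this kind did not get the DPO register into the RAM's slice in 2017.4, so treat it as a test vehicle rather than a fix.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- Hypothetical one-bit test vehicle: RAM256X1D with a register on DPO.
entity lutram_dpo_reg is
  port (
    wclk  : in  std_logic;
    rclk  : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(7 downto 0);
    raddr : in  std_logic_vector(7 downto 0);
    wdata : in  std_logic;
    rdata : out std_logic
  );
end entity lutram_dpo_reg;

architecture rtl of lutram_dpo_reg is
  signal dpo : std_logic;

  -- Same relative location on both cells asks the placer to keep them
  -- together (illustrative values, not taken from the original design).
  attribute RLOC : string;
  attribute RLOC of u_ram : label is "X0Y0";
  attribute RLOC of u_reg : label is "X0Y0";
begin

  -- Dual-port LUT RAM: write/SPO side on waddr, DPO side on raddr.
  u_ram : RAM256X1D
    port map (
      DPO  => dpo,
      SPO  => open,
      A    => waddr,
      D    => wdata,
      DPRA => raddr,
      WCLK => wclk,
      WE   => we
    );

  -- Register on DPO, clocked by the read clock; no reset or CE in use.
  u_reg : FDRE
    port map (
      Q  => rdata,
      C  => rclk,
      CE => '1',
      D  => dpo,
      R  => '0'
    );

end architecture rtl;
```

After place, checking the LOC of u_ram and u_reg in the device view shows whether the register actually landed in the RAM's slice.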
6 Replies
jmcclusk
Mentor
2,075 Views
Registered: ‎02-24-2014

Hi Ray,

 

I have an UltraScale design on an XCVU065 (Vivado 2017.4) with RAM32M16 elements that are properly registered, so it's definitely possible with some LUTRAM flavors (no constraints used, just RTL). If RLOCs and BELs don't work, it definitely sounds like a placer flaw. Got a test case that can be plugged into 2018.1? That should be out shortly.

Don't forget to close a thread when possible by accepting a post as a solution.
randraka
Observer
2,061 Views
Registered: ‎05-11-2010

Here is a test case. None of the RAMs in it get the register placed in the same slice as the memory, at least not without routing out of the CLB and back in. The same is true if rclk is changed to wclk in the second process. Hopefully I am overlooking something from being too close to the problem. As a sanity check, I ran this under ISE 14.7 targeting a Kintex-7. It correctly packed the registers with the memory slices for the 32 and 64 deep RAMs (the others are composite in K7). Not one of them packed in the KU035 under Vivado 2017.4. Hmmm.

library IEEE;
use IEEE.std_logic_1164.all;
use ieee.numeric_std.all;

entity testrams is
port(
wclk : in STD_LOGIC;
rclk : in STD_LOGIC;
we : in STD_LOGIC;
raddr : in STD_LOGIC_VECTOR(7 downto 0);
waddr : in STD_LOGIC_VECTOR(7 downto 0);
wdata : in STD_LOGIC_VECTOR(1 downto 0);
rdata256 : out STD_LOGIC_VECTOR(1 downto 0);
rdata128 : out STD_LOGIC_VECTOR(1 downto 0);
rdata64 : out STD_LOGIC_VECTOR(1 downto 0);
rdata32 : out STD_LOGIC_VECTOR(1 downto 0)
);
end testrams;

 

architecture testrams of testrams is
type mem_array is array (natural range <>) of std_logic_vector(1 downto 0);
signal ram256: mem_array(0 to 255);
signal ram128: mem_array(0 to 127);
signal ram64: mem_array(0 to 63);
signal ram32: mem_array(0 to 31);

begin

   -- Write port: all four RAMs are written on wclk
   process(wclk)
   begin
      if rising_edge(wclk) then
         if we='1' then
            ram256(to_integer(unsigned(waddr))) <= wdata;
            ram128(to_integer(unsigned(waddr(6 downto 0)))) <= wdata;
            ram64(to_integer(unsigned(waddr(5 downto 0)))) <= wdata;
            ram32(to_integer(unsigned(waddr(4 downto 0)))) <= wdata;
         end if;
      end if;
   end process;

   -- Read port: registered reads on rclk (these are the DPO registers in question)
   process(rclk)
   begin
      if rising_edge(rclk) then
         rdata256 <= ram256(to_integer(unsigned(raddr)));
         rdata128 <= ram128(to_integer(unsigned(raddr(6 downto 0))));
         rdata64 <= ram64(to_integer(unsigned(raddr(5 downto 0))));
         rdata32 <= ram32(to_integer(unsigned(raddr(4 downto 0))));
      end if;
   end process;

end testrams;

 

randraka
Observer
2,053 Views
Registered: ‎05-11-2010

It looks like maybe it only packs the register for the SPO output, not for the DPO. It did indeed place SPO registers when I added that to the model in my earlier post. I don't see anything in the routing view that would stop it from doing the same for the DPO, but it appears the placer doesn't know about it. Figure 2-6 of the UltraScale Architecture CLB User Guide (UG574 v1.5, February 28, 2017, page 27) implies the register is available for DPO, but for the life of me, I can't get the placer to agree.
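To make the SPO versus DPO comparison concrete, here is a rough sketch of a 32x2 version with both outputs registered. The exact form of the SPO read I added is not shown in this thread, so the entity name, the port names (spo_q, dpo_q), and the choice to clock the SPO register on wclk are all assumptions. The ram_style attribute is only there to steer Vivado toward distributed RAM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical comparison case: one LUT RAM, registered SPO and DPO reads.
entity testram_spo is
  port (
    wclk  : in  std_logic;
    rclk  : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(4 downto 0);
    raddr : in  std_logic_vector(4 downto 0);
    wdata : in  std_logic_vector(1 downto 0);
    spo_q : out std_logic_vector(1 downto 0);
    dpo_q : out std_logic_vector(1 downto 0)
  );
end entity testram_spo;

architecture rtl of testram_spo is
  type mem_array is array (0 to 31) of std_logic_vector(1 downto 0);
  signal ram32 : mem_array;

  -- Ask synthesis for LUT RAM rather than registers or block RAM.
  attribute ram_style : string;
  attribute ram_style of ram32 : signal is "distributed";
begin

  -- Write port plus registered read at the write address (SPO side).
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        ram32(to_integer(unsigned(waddr))) <= wdata;
      end if;
      spo_q <= ram32(to_integer(unsigned(waddr)));
    end if;
  end process;

  -- Registered read at the independent read address (DPO side).
  process (rclk)
  begin
    if rising_edge(rclk) then
      dpo_q <= ram32(to_integer(unsigned(raddr)));
    end if;
  end process;

end architecture rtl;
```

If the behavior matches what I saw, the spo_q flops should land in the RAM's slice after placement while the dpo_q flops end up elsewhere.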

Turns out the output from the Distributed Memory Generator 8.0 also does not place the DPO register in the same slice as the RAM, so either it is a placer issue or I missed something that prevents the DPO from being registered in the slice. From the routing resources view in plan-ahead, it looks like the resources are there, but for whatever reason the software is not allowing their use for registering DPO.

 

Looks like I'm going to have to find another way for this one. Has anybody else seen this and found a workaround?

randraka
Observer
2,023 Views
Registered: ‎05-11-2010

The resolution isn't ideal, but it works in this case. I replaced the 256x2 DPRAM, which gets refilled from a repository whenever the repository changes, with SRL32Es strung together to make an SRL256 for each bit, plus a pair of shadow bits so that shifting in new data does not disturb reads while the shift is in progress (basically a double-bucket memory). I still had to instantiate and RLOC the SRL32s, the F7, F8 and F9 muxes, and the flip-flop to convince the placer to put them all in the same slice, but at least it can make timing consistently now.
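For reference, the idea behind one bit of that SRL256 can be sketched behaviorally as below. This is only a model of the concept: the real fix used instantiated and RLOC'd primitives, which this sketch does not show, and the entity and port names are invented. It also assumes a single clock for shifting and reading and leaves out the shadow bits. Whether Vivado maps inferred code like this onto chained SRLs and packs the output register in the same slice is not something this thread confirms.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model: 256-deep shift register, addressable tap.
entity srl256_bit is
  port (
    clk   : in  std_logic;
    shift : in  std_logic;                     -- high while refilling
    din   : in  std_logic;                     -- serial refill data
    raddr : in  std_logic_vector(7 downto 0);  -- tap select
    dout  : out std_logic                      -- registered tap output
  );
end entity srl256_bit;

architecture rtl of srl256_bit is
  signal sr : std_logic_vector(255 downto 0) := (others => '0');
begin

  process (clk)
  begin
    if rising_edge(clk) then
      -- Refill: new data enters at index 0 and older bits move up.
      if shift = '1' then
        sr <= sr(254 downto 0) & din;
      end if;
      -- Registered read of the selected tap (the output flip-flop).
      dout <= sr(to_integer(unsigned(raddr)));
    end if;
  end process;

end architecture rtl;
```

Note that the contents move while shift is high, so reads during a refill see shifting data. That is the reason for the shadow bits in the real design.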

Xilinx, you need to look at the registering of the DPO on dual-ported LUT RAM. Your literature and the routing diagrams in the plan-ahead view all seem to indicate it is possible to register the DPO output in-slice for best timing, but the tools do not seem to know this and could not be coaxed into putting them together. Use my code above as a test case.

jmcclusk
Mentor
2,021 Views
Registered: ‎02-24-2014

You said it, Ray. We need to check 2018.1 when it's released and get a CR filed if it's still broken. This must not be allowed to persist.

seamusbleu
Adventurer
662 Views
Registered: ‎08-12-2008

Ray, did you ever hear back about this issue (or possibly get an answer record)? I'm using 2019.2 and I'm seeing that RAMD32s (still) aren't using the FDRE in the same slice.
