03-26-2018 07:10 AM
03-26-2018 08:54 AM
Hi Ray,
I have an UltraScale design on an XCVU065 (Vivado 2017.4) with RAM32M16 elements whose outputs are properly registered, so it's definitely possible with some LUTRAM flavors (no constraints used, just RTL). If RLOCs and BELs don't work, it definitely sounds like a placer flaw. Got a test case that can be plugged into 2018.1? That should be out shortly.
03-26-2018 11:16 AM
None of these puts the register in the same slice as the memory, at least not without routing out of the CLB and back in. The same is true if rclk is changed to wclk in the second process. Hopefully I am overlooking something from being too close to the problem. As a sanity check, I ran this under ISE 14.7 with a Kintex-7. It correctly packed the registers with the memory slices for the 32- and 64-deep RAMs (the others are composite in K7). Not one of them packed in a KU035 under Vivado 2017.4. Hmmm.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity testrams is
  port(
    wclk     : in  STD_LOGIC;
    rclk     : in  STD_LOGIC;
    we       : in  STD_LOGIC;
    raddr    : in  STD_LOGIC_VECTOR(7 downto 0);
    waddr    : in  STD_LOGIC_VECTOR(7 downto 0);
    wdata    : in  STD_LOGIC_VECTOR(1 downto 0);
    rdata256 : out STD_LOGIC_VECTOR(1 downto 0);
    rdata128 : out STD_LOGIC_VECTOR(1 downto 0);
    rdata64  : out STD_LOGIC_VECTOR(1 downto 0);
    rdata32  : out STD_LOGIC_VECTOR(1 downto 0)
  );
end testrams;

architecture testrams of testrams is
  type mem_array is array (natural range <>) of std_logic_vector(1 downto 0);
  -- Four 2-bit-wide dual-port RAMs of different depths
  signal ram256 : mem_array(0 to 255);
  signal ram128 : mem_array(0 to 127);
  signal ram64  : mem_array(0 to 63);
  signal ram32  : mem_array(0 to 31);
begin
  -- Write port: synchronous write into all four RAMs
  process(wclk)
  begin
    if rising_edge(wclk) then
      if we='1' then
        ram256(to_integer(unsigned(waddr))) <= wdata;
        ram128(to_integer(unsigned(waddr(6 downto 0)))) <= wdata;
        ram64(to_integer(unsigned(waddr(5 downto 0)))) <= wdata;
        ram32(to_integer(unsigned(waddr(4 downto 0)))) <= wdata;
      end if;
    end if;
  end process;

  -- Read port: registered DPO outputs (these are the registers that should pack with the RAM)
  process(rclk)
  begin
    if rising_edge(rclk) then
      rdata256 <= ram256(to_integer(unsigned(raddr)));
      rdata128 <= ram128(to_integer(unsigned(raddr(6 downto 0))));
      rdata64 <= ram64(to_integer(unsigned(raddr(5 downto 0))));
      rdata32 <= ram32(to_integer(unsigned(raddr(4 downto 0))));
    end if;
  end process;
end testrams;
03-26-2018 02:20 PM - edited 03-26-2018 03:05 PM
It looks like maybe it only packs the register for the SPO output, not for the DPO. It did indeed place SPO registers when I added that to the model in my earlier post. I don't see anything in the routing view that would stop it from doing the same for the DPO, but it appears the placer doesn't know about it. The UltraScale Architecture CLB User Guide (UG574 v1.5, February 28, 2017, page 27, www.xilinx.com), in Figure 2-6, implies the register is available for DPO, but for the life of me, I can't get the placer to agree.
Turns out the output from the Distributed RAM Generator 8.0 also does not place the DPO register in the same slice as the RAM, so either it is a placer issue or I missed some reason why the DPO can't be registered in the slice. From the routing resources view in the plan-ahead, it looks like the resources are there, but for whatever reason the software is not allowing their use for registering DPO.
Looks like I'm going to have to find another way for this one. Has anybody else seen this and found a work-around?
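The modified model itself was not posted, so the following is only a minimal sketch of what adding a registered SPO read to the 32-deep case might look like. The entity name testram_spo and the port names spo_q and dpo_q are made up for illustration; the SPO is assumed to be a read at the write address on the write clock, sitting next to the registered DPO read.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test entity: 32x2 dual-port LUT RAM with both outputs registered.
entity testram_spo is
  port (
    wclk  : in  std_logic;
    we    : in  std_logic;
    waddr : in  std_logic_vector(4 downto 0);
    raddr : in  std_logic_vector(4 downto 0);
    wdata : in  std_logic_vector(1 downto 0);
    spo_q : out std_logic_vector(1 downto 0);  -- registered read at the write address (SPO)
    dpo_q : out std_logic_vector(1 downto 0)   -- registered read at the read address (DPO)
  );
end entity testram_spo;

architecture rtl of testram_spo is
  type mem_array is array (natural range <>) of std_logic_vector(1 downto 0);
  signal ram32 : mem_array(0 to 31);
begin
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        ram32(to_integer(unsigned(waddr))) <= wdata;
      end if;
      -- Both reads see the RAM contents from before this clock edge.
      spo_q <= ram32(to_integer(unsigned(waddr)));
      dpo_q <= ram32(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture rtl;
```

Comparing where the two output registers land in the device view after place and route would show whether only the SPO register is packed with the RAM.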
03-28-2018 01:13 PM
The resolution isn't ideal, but it works in this case. I replaced the 256x2 DPRAM, which gets refilled from a repository whenever the repository changes, with SRL32Es strung together to make an SRL256 for each bit, plus a pair of shadow bits so that shifting in new data does not corrupt reads while the shifting is in progress (basically a double-bucket memory). I still had to instantiate and RLOC the SRL32s, the F7, F8 and F9 muxes and the flip-flop to convince the placer to put them all in the same slice, but at least it can make timing consistently now.
Xilinx, you need to look at the registering of the DPO on dual-ported LUT RAM. Your literature and the routing diagrams in the plan-ahead view all seem to indicate it is possible to register the DPO output in-slice for best timing, but the tools do not seem to know this and could not be coaxed into putting them together. Use my testrams code above as a test case.
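The SRL-based workaround described in this post was built from instantiated, RLOC'd primitives that were not posted. As a rough behavioral sketch of the idea only, an addressable 256-deep shift register with a registered output could be written as below. The entity and port names are hypothetical, the shadow bits and placement constraints are left out, and whether synthesis maps this onto SRL primitives plus the wide muxes is an assumption that depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: one bit of a 256-deep addressable shift register.
entity srl256_sketch is
  port (
    clk   : in  std_logic;
    shift : in  std_logic;                     -- shift enable while refilling
    din   : in  std_logic;                     -- serial data in
    addr  : in  std_logic_vector(7 downto 0);  -- tap select for reading
    q     : out std_logic                      -- registered read output
  );
end entity srl256_sketch;

architecture rtl of srl256_sketch is
  signal sr : std_logic_vector(255 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if shift = '1' then
        -- New data enters at tap 0; older data moves toward tap 255.
        sr <= sr(254 downto 0) & din;
      end if;
      -- Registered read of the selected tap (value from before this edge).
      q <= sr(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

Because the contents move while shifting, reads during a refill see shifted data, which is why the post pairs the SRLs with shadow bits.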
03-28-2018 01:19 PM
You said it, Ray. We need to check 2018.1 when it's released and get a CR filed if it's still broken. This must not be allowed to persist.
03-12-2020 05:35 AM
Ray, did you ever hear back about this issue (or possibly get an answer record)? I'm using 2019.2 and I'm seeing that RAMD32s (still) aren't using the FDRE in the same slice.