08-17-2015 10:56 AM - edited 08-17-2015 10:58 AM
I am doing a design that uses a lot of wide shallow fifos of different widths. Because they are shallow I want to use distributed LUT memory to make the fifos. I am instantiating them within for-gen loops.
The Vivado IP core generator can make nice distributed fifos but I would rather not have a different core for every different width of fifo. Word size grows incrementally as I go through the processing pipeline so I would end up with quite a few fifo cores that differ only in width.
Does Xilinx provide a parameterized fifo library component that can have width assigned at instantiation?
Is there a way to use the IP Core generator so that I get a core that is parameterized for width?
What about fifo inference? Is it practical to write a parameterized fifo model that will synthesize to good quality distributed fifos?
For standard memory components we really should not have to grope for the core generator. It should be possible to just instantiate parameterized library modules on the fly.
08-17-2015 01:11 PM
Inference is probably your best option for distributed RAM. For Block RAM there are macros you could use, like FIFO_SYNC_MACRO, but those seem to only use BRAM.
I do not know of any way to make the IP Core generator create parameterized cores.
You can build your own component from the RAMnnn primitives documented in the Libraries Guides (UG768 for 7-Series). These are typically just 1 bit wide and come in various depths, so that might be more work than you are looking for. If you follow the requirements for inference of distributed RAM, you should be fine.
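As a rough illustration of what "following the requirements for inference" usually means for distributed RAM: a synchronous write, an asynchronous read, and no reset on the memory array itself. The sketch below is not taken from any Xilinx template; the entity name `dist_ram_sdp` and its port names are made up for this example, and the `ram_style` attribute is only a hint that the tool is expected to honor.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity dist_ram_sdp is
  generic (
    awidth : natural := 5;    -- depth is 2**awidth words
    dwidth : natural := 16);  -- word width, set per instance
  port (
    clk     : in  std_logic;
    wr_ena  : in  std_logic;
    wr_addr : in  std_logic_vector(awidth-1 downto 0);
    wr_data : in  std_logic_vector(dwidth-1 downto 0);
    rd_addr : in  std_logic_vector(awidth-1 downto 0);
    rd_data : out std_logic_vector(dwidth-1 downto 0));
end entity dist_ram_sdp;

architecture rtl of dist_ram_sdp is
  type ram_t is array (0 to 2**awidth-1) of std_logic_vector(dwidth-1 downto 0);
  signal ram : ram_t;
  -- Synthesis hint requesting LUT RAM rather than block RAM.
  attribute ram_style : string;
  attribute ram_style of ram : signal is "distributed";
begin
  -- Synchronous write; the array itself is never reset.
  write_p : process (clk)
  begin
    if rising_edge(clk) then
      if wr_ena = '1' then
        ram(to_integer(unsigned(wr_addr))) <= wr_data;
      end if;
    end if;
  end process write_p;

  -- Asynchronous (combinational) read.
  rd_data <= ram(to_integer(unsigned(rd_addr)));
end architecture rtl;
```

Because the width and depth are generics, one source file covers every width; it is worth checking the synthesis report to confirm that LUT RAM was actually inferred.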
08-17-2015 02:12 PM - edited 08-17-2015 02:49 PM
I normally infer rams so that I can make them parameterized with the generic map feature of VHDL.
In this case I need FIFOs. I don't find information on inferring FIFOs but I have friends that just design their own FIFOs by inferring rams, read/write counters, etc. That is more than I want to get into but is possible.
I have noticed that when I generate a distributed memory FIFO using the core generator in Vivado it creates a .vhd file under the synth/ folder. Inside that file it instantiates a fifo_generator_v12_0 fifo from library fifo_generator_v12_0. On that instantiated fifo the generics are set to implement the fifo that I requested in the Vivado core generator.
I tried to take the design from the synth folder and compile it as source but got all kinds of missing black box errors. I wonder if it is possible to instantiate the fifo_generator_v12_0 component in my source.
While searching for answers I found some references to the Altera Library of Parameterized Modules (LPM). That seems to be what I want. I don't know how well it works and Xilinx has no support for LPM of course.
08-17-2015 02:39 PM
08-17-2015 02:52 PM
For now, I only need common clock FIFOs. That simplifies the logic a lot. Maybe it makes sense to write my own FIFO.
08-17-2015 03:57 PM
08-18-2015 07:20 AM
08-18-2015 12:20 PM
I wrote a simple common clock fifo just so that I could parameterize it. The source code is attached here but it has not been tested thoroughly. In a 48 wide by 128 deep configuration it barely meets my Fmax requirement of 250 MHz in a -1 speed grade Artix. Logic utilization is about the same as the Vivado core of the same dimensions.
Now I can go back to my actual work.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sync_fifo is
  generic (
    awidth : natural := 8;    -- depth is 2**awidth words
    dwidth : natural := 16);  -- word width
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;  -- synchronous, active high
    wr_ena  : in  std_logic;
    wr_data : in  std_logic_vector(dwidth-1 downto 0);
    rd_ena  : in  std_logic;
    rd_data : out std_logic_vector(dwidth-1 downto 0);
    empty   : out std_logic;
    full    : out std_logic);
end entity sync_fifo;

architecture rtl of sync_fifo is
  type d_array_t is array (2**awidth-1 downto 0) of std_logic_vector(dwidth-1 downto 0);
  signal d_array : d_array_t;
  attribute ram_style : string;
  attribute ram_style of d_array : signal is "distributed";
  signal wr_addr, rd_addr : unsigned(awidth-1 downto 0);
  signal wr_ptr, rd_ptr   : unsigned(awidth downto 0); -- extra bit for full/empty discrimination
  signal ram_dout  : std_logic_vector(dwidth-1 downto 0);
  signal full_int  : std_logic;
  signal empty_int : std_logic;
begin
  process
  begin
    wait until rising_edge(clk);
    if reset = '1' then
      wr_ptr  <= (others=>'0');
      rd_ptr  <= (others=>'0');
      rd_data <= (others=>'0');
    else
      -- writes to a full fifo are ignored
      if wr_ena='1' and full_int='0' then
        wr_ptr <= wr_ptr + 1;
        -- ram write is synchronous
        d_array(to_integer(wr_addr)) <= wr_data;
      end if;
      -- reads from an empty fifo are ignored; rd_data holds its last value
      if rd_ena='1' and empty_int='0' then
        rd_ptr  <= rd_ptr + 1;
        rd_data <= ram_dout;
      end if;
    end if;
  end process;

  -- rip off part of pointers that is memory address.
  wr_addr <= wr_ptr(wr_addr'range);
  rd_addr <= rd_ptr(rd_addr'range);
  -- ram read is asynchronous
  ram_dout <= d_array(to_integer(rd_addr));
  -- full and empty: addresses match in both cases, the extra pointer bit tells them apart
  empty_int <= '1' when ((wr_addr = rd_addr) and (wr_ptr(wr_ptr'left) = rd_ptr(rd_ptr'left))) else '0';
  full_int  <= '1' when ((wr_addr = rd_addr) and (wr_ptr(wr_ptr'left) /= rd_ptr(rd_ptr'left))) else '0';
  empty <= empty_int;
  full  <= full_int;
end architecture rtl;
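For anyone landing here with the original question, this is roughly how a FIFO like the one above could be dropped into a for-generate loop with the width growing per stage. It is only a sketch: it assumes `sync_fifo` is compiled into library `work` with `awidth` and `dwidth` generics and the ports listed in the post, and the wrapper `fifo_bank`, its generics (`stages`, `base_width`, `growth`) and the flattened data buses are all hypothetical names invented for this example. Each stage gets a fixed-size slot of `max_w` bits on the flat buses and uses only the low `w` bits of it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper; names and bus layout are illustrative only.
entity fifo_bank is
  generic (
    stages     : positive := 4;   -- number of FIFOs
    awidth     : natural  := 5;   -- every FIFO is 2**awidth deep
    base_width : positive := 16;  -- word width of stage 0
    growth     : natural  := 2);  -- bits added per stage
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;
    wr_ena  : in  std_logic_vector(stages-1 downto 0);
    rd_ena  : in  std_logic_vector(stages-1 downto 0);
    -- one slot of (base_width + (stages-1)*growth) bits per stage
    wr_data : in  std_logic_vector(stages*(base_width+(stages-1)*growth)-1 downto 0);
    rd_data : out std_logic_vector(stages*(base_width+(stages-1)*growth)-1 downto 0);
    empty   : out std_logic_vector(stages-1 downto 0);
    full    : out std_logic_vector(stages-1 downto 0));
end entity fifo_bank;

architecture rtl of fifo_bank is
  constant max_w : positive := base_width + (stages-1)*growth;
begin
  gen_fifo : for i in 0 to stages-1 generate
    constant w  : positive := base_width + i*growth;  -- width of this stage
    constant lo : natural  := i*max_w;                -- low bit of this slot
  begin
    fifo_i : entity work.sync_fifo
      generic map (
        awidth => awidth,
        dwidth => w)
      port map (
        clk     => clk,
        reset   => reset,
        wr_ena  => wr_ena(i),
        wr_data => wr_data(lo+w-1 downto lo),
        rd_ena  => rd_ena(i),
        rd_data => rd_data(lo+w-1 downto lo),
        empty   => empty(i),
        full    => full(i));

    -- Tie off the unused upper bits of narrower slots.
    pad : if w < max_w generate
      rd_data(lo+max_w-1 downto lo+w) <= (others => '0');
    end generate pad;
  end generate gen_fifo;
end architecture rtl;
```

In a real pipeline the data would more likely stay on internal signals between processing stages rather than on flat ports; the flat buses here just keep the example self-contained without a package of array types.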
08-18-2015 02:16 PM
08-18-2015 06:05 PM
08-21-2015 11:38 AM
I did try both of those fifos before I wrote my own. I believe both of them failed my Fmax timing spec.