Registered: ‎01-25-2012

AUTOPIPELINE example


I have been trying to use the AUTOPIPELINE feature on a custom interface as described in UG949 (v2019.2) UltraFast Design Methodology Guide. So far it hasn't worked. No registers have been added and my small test project fails to meet timing. Other than the mention in the UFDM, there doesn't seem to be any other information out there.

The description in the UFDM is detailed but it isn't clear what is required vs. what is just a recommendation. Can I use this with PBLOCKs, with soft PBLOCKs, or must it be with USER_SLR_ASSIGNMENT set to a string (and not SLRx)? Why are the attributes which are intended to target nets being applied to registers? Where should the register that is being targeted be located? ... etc.

Does anyone have an example which works? I guess I could dig through the AXI Register Slice RTL but I was hoping to avoid that.

I have tried lots of different variations but clearly I am missing something. This is just one of the many things I've tried:

 

library ieee;
use IEEE.std_logic_1164.ALL;


entity test_autopipe is
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end test_autopipe;

architecture Behavioral of test_autopipe is
component module1
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end component;
  signal dataA2B  : STD_LOGIC;
  signal dinA  : STD_LOGIC;
  signal doutB  : STD_LOGIC;
  attribute AUTOPIPELINE_MODULE : boolean;
  attribute AUTOPIPELINE_MODULE of instA         : label is TRUE;
  attribute AUTOPIPELINE_LIMIT : integer;
  attribute AUTOPIPELINE_LIMIT of dataA2B        : signal is 24;
  attribute AUTOPIPELINE_GROUP : string;
  attribute AUTOPIPELINE_GROUP of dataA2B        : signal is "A2B";
begin

  process (clk)
  begin
    if (rising_edge(clk)) then
      dinA      <= din;   -- these registers are just to force the signal to traverse the chip
      dout      <= doutB; -- otherwise the soft USER_SLR_ASSIGNMENT constraint lets the tools move the FFs
    end if;
  end process;

  instA : module1
  Port map (
      clk    => clk   ,
      din    => dinA   ,
      dout   => dataA2B
  );

  instB : module1
  Port map (
      clk    => clk   ,
      din    => dataA2B,
      dout   => doutB
  );

end Behavioral;

 

and

 

entity module1 is
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end module1;
architecture Behavioral of module1 is
begin
  process (clk)
  begin
    if (rising_edge(clk)) then
      dout      <= din;
    end if;
  end process;
end Behavioral;

 

and the XDC file (along with some of the failed attempts, which are now commented out):

 

create_clock -period 2.500 -name my_clk [get_ports clk]
set_false_path -from [get_ports * -filter direction==in]
set_false_path -to [get_ports * -filter direction==out]

set_property LOC SLICE_X0Y0 [get_cells dinA_reg]
set_property LOC SLICE_X232Y959 [get_cells dout_reg]

set_property USER_SLR_ASSIGNMENT SLR0 [get_cells instA]
set_property USER_SLR_ASSIGNMENT SLR3 [get_cells instB]

#set_property USER_SLR_ASSIGNMENT instAslr [get_cells instA]
#set_property USER_SLR_ASSIGNMENT instCslr [get_cells instB]

#create_pblock pblock_instA
#add_cells_to_pblock [get_pblocks pblock_instA] [get_cells instA]
#resize_pblock [get_pblocks pblock_instA] -add {SLR3}
#
#create_pblock pblock_instB
#add_cells_to_pblock [get_pblocks pblock_instB] [get_cells instB]
#resize_pblock [get_pblocks pblock_instB] -add {SLR0}

#set_property AUTOPIPELINE_MODULE true [get_cells instB]
#set_property AUTOPIPELINE_GROUP my_grp [get_nets instB/dout_reg]
#set_property AUTOPIPELINE_LIMIT 24 [get_nets instB/dout_reg]

 

I'm targeting the Alveo U250 board. The archived project is attached.

Ultimately what I'd really like is a language template or XPM component I could drop on the target net.

 

4 Replies
Registered: ‎01-25-2012

Well I find myself answering my own question once again...

I think these are most of the unstated requirements:

  1. Auto-pipelining attributes must be set in RTL. It doesn't work from XDC.
  2. The autopipeline_module and autopipeline_include attributes are not required.
  3. The autopipeline_group and autopipeline_limit attributes are required.
  4. The attributes should be applied to the net that is the output of a FF.
  5. The attributes should be applied to the net at the same hierarchical level as the FF.
  6. You can apply the attributes to nets at higher levels in the hierarchy, but -flatten_hierarchy must be set to full and a KEEP applied to the net. It doesn't work if the rebuilt setting is used.
  7. Applying DONT_TOUCH to the target net prevents it from working. Well, duh...
  8. No PBLOCKs or USER_SLR_ASSIGNMENTs are required.

This works:

library ieee;
use IEEE.std_logic_1164.ALL;

entity test_autopipe is
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end test_autopipe;

architecture Behavioral of test_autopipe is

component module1
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end component;

component module2
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end component;

  signal dataA2B  : STD_LOGIC;
  signal dinA  : STD_LOGIC;
  signal doutB  : STD_LOGIC;

begin

  process (clk)
  begin
    if (rising_edge(clk)) then
      dinA      <= din;   -- these registers are just to force the signal to traverse the chip
      dout      <= doutB; -- otherwise the soft USER_SLR_ASSIGNMENT constraint lets the tools
                          -- move the signal source and destination.
    end if;
  end process;

  instA : module2
  Port map (
      clk    => clk   ,
      din    => dinA   ,
      dout   => dataA2B
  );

  instB : module1
  Port map (
      clk    => clk   ,
      din    => dataA2B,
      dout   => doutB
  );
end Behavioral;

 

library ieee;
use IEEE.std_logic_1164.ALL;

entity module1 is
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end module1;

architecture Behavioral of module1 is

begin

  process (clk)
  begin
    if (rising_edge(clk)) then
      dout      <= din;
    end if;
  end process;

end Behavioral;


library ieee;
use IEEE.std_logic_1164.ALL;

entity module2 is
  Port (
      clk        : IN  STD_LOGIC;
      din        : IN  STD_LOGIC;
      dout       : OUT STD_LOGIC
  );
end module2;

architecture Behavioral of module2 is

  signal dout_r  : STD_LOGIC;
  attribute autopipeline_limit : integer;
  attribute autopipeline_limit of dout_r        : signal is 24;
  attribute autopipeline_group : string;
  attribute autopipeline_group of dout_r        : signal is "my_grp";

begin

  process (clk)
  begin
    if (rising_edge(clk)) then
      dout_r      <= din;
    end if;
  end process;

  dout <= dout_r;

end Behavioral;

 

set_property LOC SLICE_X0Y0 [get_cells dinA_reg]
set_property LOC SLICE_X232Y959 [get_cells dout_reg]

create_clock -period 2.500 -name my_clk [get_ports clk]
set_false_path -from [get_ports * -filter direction==in]
set_false_path -to [get_ports * -filter direction==out]

 

Registered: ‎01-25-2012

I jumped the gun. There are some other situations where the auto-pipeline won't work. If the output net fans out or is connected to LUTs (maybe, I didn't check this in detail), the auto-pipeline just doesn't happen. There are no warnings or notices; it just doesn't get implemented.

I have made a module that can be placed on a net and that seems to reliably generate auto-pipelines. The second FF isn't needed so long as the net is connected to a single FF externally. Having both FFs makes it much less likely to fail.

library ieee;
use IEEE.std_logic_1164.ALL;

entity autopipe_regs is
  generic (
    WIDTH          : integer;
    AP_GROUP       : string;
    AP_LIMIT       : integer
  );
  Port (
    clk        : IN  STD_LOGIC;
    din        : IN  std_logic_vector(WIDTH-1 downto 0);
    dout       : OUT std_logic_vector(WIDTH-1 downto 0)
  );
end autopipe_regs;

architecture Behavioral of autopipe_regs is

  signal autopipe  : std_logic_vector(WIDTH-1 downto 0);

  attribute KEEP : string;
  attribute KEEP of autopipe        : signal is "TRUE";

  attribute autopipeline_limit : integer;
  attribute autopipeline_group : string;
  attribute autopipeline_limit of autopipe        : signal is AP_LIMIT;
  attribute autopipeline_group of autopipe        : signal is AP_GROUP;

begin

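  process (clk)
  begin
    if (rising_edge(clk)) then
      autopipe <= din;      -- source FF: drives the net that carries the autopipeline attributes
      dout <= autopipe;     -- sink FF: gives the attributed net a single FF load (no fanout, no LUTs)
    end if;
  end process;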

end Behavioral;
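
For reference, a minimal sketch of how the module above might be dropped onto a net. The wrapper entity, its port names, and the generic values (32, "my_grp", 24) are illustrative assumptions, not taken from the attached project; it also assumes autopipe_regs has been compiled into library work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: entity and signal names are illustrative only.
entity autopipe_regs_example is
  port (
    clk      : in  std_logic;
    src_data : in  std_logic_vector(31 downto 0);
    dst_data : out std_logic_vector(31 downto 0)
  );
end autopipe_regs_example;

architecture rtl of autopipe_regs_example is
begin

  -- Place autopipe_regs on the net that has to cross the chip.
  -- WIDTH, AP_GROUP and AP_LIMIT are the generics declared by autopipe_regs;
  -- the values used here are example choices.
  u_pipe : entity work.autopipe_regs
    generic map (
      WIDTH    => 32,
      AP_GROUP => "my_grp",
      AP_LIMIT => 24
    )
    port map (
      clk  => clk,
      din  => src_data,
      dout => dst_data
    );

end rtl;
```

Note that the module itself adds two register stages before any auto-inserted ones, so the surrounding logic has to tolerate a latency that is only known after placement.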

 

Registered: ‎01-25-2012

But wait! There are more hoops you must jump through like a well-trained poodle...

When I forced the auto-pipeline feature to connect between different locations and gave the pipes a common group name, I expected them to end up with a common delay... but no...

 

Phase 2.1 Floorplanning

Summary of Latency Increase due to Auto-Pipeline Insertion
===========================================================
--------------------------------------------------------------
|  Module  |  Group   |  Limit  |  Actual  |  Include Group  |
--------------------------------------------------------------
|  pipeA   |  my_grp  |     24  |      16  |                 |
|  pipeB   |  my_grp  |     24  |       6  |                 |
|  pipeC   |  my_grp  |     24  |       8  |                 |
|  pipeD   |  my_grp  |     24  |       2  |                 |
--------------------------------------------------------------

 

 


I'm guessing this is a result of ignoring the poorly explained autopipeline_module attribute. Give me a minute (make that an hour) and I'll check.

Accepted Solution
Registered: ‎01-25-2012

I give up. It seemed like a great idea but it turns out to be just a giant time pit. My recommendation is to run away before it sucks you in.

I couldn't get the different pipes to give me a common depth. Maybe I could get it to work with another week of trial and error, but maybe not. Strike 1.

My design started having hold time failures. I think that was because the auto-pipeline was adding too many registers. The tools pushed the registers into the Laguna FFs. The data delay became negligible while the clock skew between the SLRs was 0.35 ns. Tada, hold failure. Strike 2.

I ended up with even more congestion problems from the excess FFs it was adding. Using a smaller number of hard-coded pipe registers gave better QoR. Strike 3.
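
As a rough sketch of why Strike 2 happens: the relation below is the generic textbook hold check, not numbers from the timing report. Only the 0.35 ns skew comes from the post; the symbols are assumptions introduced here, and the skew is taken as the capture clock arriving later than the launch clock.

```latex
\begin{align*}
\text{hold slack} &\approx (t_{cq} + t_{data}) - (t_{skew} + t_{hold}) \\
                  &\approx t_{cq} - 0.35\,\text{ns} - t_{hold}
                  \quad \text{when } t_{data} \approx 0
\end{align*}
```

So once the routing delay between back-to-back registers shrinks to almost nothing, the slack goes negative unless the clock-to-Q delay alone exceeds the skew plus the hold requirement.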

 
