jonbho
Observer
Registered: ‎10-23-2019

What is the reason behind 7 wait states when accessing DDR memory from Zynq 7000 PL?

Hi, we are using HLS-synthesized logic to process large amounts of data from DDR memory on a Zynq 7010, using the AXI_HP0 port. It works well and we get decent bandwidth. We have noticed that the start of a burst-mode DDR read operation takes 7 states in the finite state machine generated by HLS. This can be seen in the generated VHDL or in the intermediate report files:

...
State 2 <SV = 1> <Delay = 4.86>
ST_2 : Operation 19 [1/1] (0.00ns) ---> "%tmp_4 = zext i29 %input1_V3 to i32" ---> Operation 19 'zext' 'tmp_4' <Predicate = true> <Delay = 0.00>
ST_2 : Operation 20 [1/1] (0.00ns) ---> "%m_axi_hp2_addr = getelementptr i64* %m_axi_hp2, i32 %tmp_4" ---> Operation 20 'getelementptr' 'm_axi_hp2_addr' <Predicate = true> <Delay = 0.00>
ST_2 : Operation 21 [1/1] (0.00ns) ---> "%tmp_6 = zext i29 %input0_V1 to i32" ---> Operation 21 'zext' 'tmp_6' <Predicate = true> <Delay = 0.00>
ST_2 : Operation 22 [1/1] (0.00ns) ---> "%m_axi_hp0_addr = getelementptr i64* %m_axi_hp0, i32 %tmp_6" ---> Operation 22 'getelementptr' 'm_axi_hp0_addr' <Predicate = true> <Delay = 0.00>
ST_2 : Operation 23 [7/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 23 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_2 : Operation 24 [7/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 24 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>

State 3 <SV = 2> <Delay = 4.86>
ST_3 : Operation 25 [6/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 25 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_3 : Operation 26 [6/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 26 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>

State 4 <SV = 3> <Delay = 4.86>
ST_4 : Operation 27 [5/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 27 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_4 : Operation 28 [5/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 28 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>

State 5 <SV = 4> <Delay = 4.86>
ST_5 : Operation 29 [4/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 29 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_5 : Operation 30 [4/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 30 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>

State 6 <SV = 5> <Delay = 4.86>
ST_6 : Operation 31 [3/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 31 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_6 : Operation 32 [3/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 32 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>

State 7 <SV = 6> <Delay = 4.86>
ST_7 : Operation 33 [2/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 33 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_7 : Operation 34 [2/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 34 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>

State 8 <SV = 7> <Delay = 4.86>
ST_8 : Operation 35 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecBitsMap(i64* %m_axi_hp2), !map !45" ---> Operation 35 'specbitsmap' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 36 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecBitsMap(i64* %m_axi_hp0), !map !49" ---> Operation 36 'specbitsmap' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 37 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecBitsMap(i64* %out_V), !map !53" ---> Operation 37 'specbitsmap' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 38 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecBitsMap(i64 %len), !map !57" ---> Operation 38 'specbitsmap' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 39 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecTopModule([14 x i8]* @mem_read_test_str) nounwind" ---> Operation 39 'spectopmodule' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 40 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i64 %len, [10 x i8]* @p_str, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [1 x i8]* @p_str1, [1 x i8]* @p_str1, [1 x i8]* @p_str1, i32 0, i32 0, i32 0, i32 0, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:15] ---> Operation 40 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 41 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i64* %out_V, [10 x i8]* @p_str, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [1 x i8]* @p_str1, [1 x i8]* @p_str1, [1 x i8]* @p_str1, i32 0, i32 0, i32 0, i32 0, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:15] ---> Operation 41 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 42 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i32 0, [10 x i8]* @p_str, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [1 x i8]* @p_str1, [1 x i8]* @p_str1, [1 x i8]* @p_str1, i32 0, i32 0, i32 0, i32 0, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:15] ---> Operation 42 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 43 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecLatency(i32 1, i32 65535, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:15] ---> Operation 43 'speclatency' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 44 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i64* %m_axi_hp0, [6 x i8]* @p_str2, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [10 x i8]* @p_str3, [6 x i8]* @p_str4, [1 x i8]* @p_str1, i32 16, i32 16, i32 16, i32 16, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:15] ---> Operation 44 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 45 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i32 %input0_V, [10 x i8]* @mode, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [1 x i8]* @bundle, [6 x i8]* @p_str4, [1 x i8]* @p_str1, i32 16, i32 16, i32 16, i32 16, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:15] ---> Operation 45 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 46 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i64* %m_axi_hp2, [6 x i8]* @p_str2, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [10 x i8]* @p_str5, [6 x i8]* @p_str4, [1 x i8]* @p_str1, i32 16, i32 16, i32 16, i32 16, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:16] ---> Operation 46 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 47 [1/1] (0.00ns) ---> "call void (...)* @_ssdm_op_SpecInterface(i32 %input1_V, [10 x i8]* @mode1, i32 0, i32 0, [1 x i8]* @p_str1, i32 0, i32 0, [1 x i8]* @bundle2, [6 x i8]* @p_str4, [1 x i8]* @p_str1, i32 16, i32 16, i32 16, i32 16, [1 x i8]* @p_str1, [1 x i8]* @p_str1) nounwind" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:16] ---> Operation 47 'specinterface' <Predicate = true> <Delay = 0.00>
ST_8 : Operation 48 [1/7] (4.86ns) ---> "%m_axi_hp0_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp0_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:20] ---> Operation 48 'readreq' 'm_axi_hp0_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_8 : Operation 49 [1/7] (4.86ns) ---> "%m_axi_hp2_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.i64P(i64* %m_axi_hp2_addr, i32 %tmp_3)" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:21] ---> Operation 49 'readreq' 'm_axi_hp2_addr_rd_re' <Predicate = true> <Delay = 4.86> <Core = "m_axi"> ---> Core 9 'm_axi' <Latency = 6> <II = 1> <Delay = 1.00> <Adapter> <Opcode : 'read' 'write' 'readreq' 'writereq' 'writeresp'>
ST_8 : Operation 50 [1/1] (1.76ns) ---> "br label %1" [D:/Xilinx/workspace/membw/src/fpga-membw.cpp:18] ---> Operation 50 'br' <Predicate = true> <Delay = 1.76>
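For readers without the source at hand, a kernel of roughly the following shape would be expected to produce a schedule like the one in the report. This is only a hypothetical sketch and not the actual fpga-membw.cpp: the top-level name and the port names (`mem_read_test`, `hp0`, `hp2`, `len`, `out`) are taken from the report, while the argument types, the pragma options and the loop body are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical sketch of a two-port burst-read kernel (NOT the real source).
// A sequential, pipelined loop over an m_axi pointer is the usual pattern
// from which HLS infers one read request followed by a stream of data beats.
void mem_read_test(const uint64_t *hp0, const uint64_t *hp2,
                   uint32_t len, uint64_t *out) {
#pragma HLS INTERFACE m_axi port=hp0 offset=slave bundle=hp0
#pragma HLS INTERFACE m_axi port=hp2 offset=slave bundle=hp2
#pragma HLS INTERFACE s_axilite port=len
#pragma HLS INTERFACE s_axilite port=out
#pragma HLS INTERFACE s_axilite port=return
    uint64_t acc = 0;
    for (uint32_t i = 0; i < len; ++i) {
#pragma HLS PIPELINE II=1
        // The reduction is an assumption; it only keeps the reads from
        // being optimized away.
        acc += hp0[i] ^ hp2[i];
    }
    *out = acc;
}
```

Compiled as ordinary C++ the pragmas are ignored, so the function can be exercised in a plain software testbench before synthesis.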

This example reads from both HP0 and HP2 at the same time, but that is not relevant here. I'm most interested in states 2 to 7 of the state machine. I have checked the generated VHDL code and verified that those states actually do nothing, so they are just wait states. Crucially, if you use direct random access to DDR instead of burst-mode access, every operation goes through those 7 states, so it's quite slow. In the case above we're compiling at 150MHz, but even if we switch the clock to something different such as 100MHz, the number of wait states is still 7. The VHDL code seems to show that the first state already waits for a valid signal from the AXI interface and blocks before advancing to the next state, so it's not as if explicit wait states were needed in the finite state machine to wait for an externally triggered valid signal.
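To put a rough number on the cost, here is a small back-of-the-envelope model. It is a sketch under stated assumptions, not a measurement: it takes the 7 request states from the report, assumes one 64-bit word per cycle once data flows (the report lists II = 1 for the m_axi core), and ignores the real DDR and interconnect latency. The function names are made up for this example.

```cpp
#include <cstdint>

// Number of FSM states spent on each read request, as seen in the report.
constexpr uint64_t kRequestStates = 7;

// Burst access: one request up front, then (assumed) one word per cycle.
constexpr uint64_t burst_cycles(uint64_t words) {
    return words == 0 ? 0 : kRequestStates + words;
}

// Random access: every single word pays for its own request.
constexpr uint64_t random_cycles(uint64_t words) {
    return words * (kRequestStates + 1);
}

// Upper bound on throughput in MB/s (10^6 bytes/s) for 64-bit words.
// words * 8 bytes moved in cycles / (clock_mhz * 10^6) seconds.
double throughput_mb_per_s(uint64_t words, uint64_t cycles, double clock_mhz) {
    if (cycles == 0) return 0.0;
    return 8.0 * static_cast<double>(words) * clock_mhz
           / static_cast<double>(cycles);
}
```

Under these assumptions, 16 words cost 23 cycles as a single burst but 128 cycles as individual accesses, which at 150MHz caps random access at 150 MB/s regardless of how fast the memory behind the port responds.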

My question is: what are those 7 wait states for, and why are they necessary? If you use the ACP AXI port and the data is in the L2 cache, accessing it should be much faster than reading from DDR. Don't the 7 wait states make it impossible to take advantage of the L2 cache over the slower external DDR interface?

Thank you,

  -- Jon

0 Replies