Contributor
835 Views
Registered: ‎08-28-2020

Data transmission from PS to PL using DMA ACP

Hello, I'm trying to transfer data that is already cached to the PL via the ACP port. The transfer succeeds, but when the AXI DMA reads the data from the PS cache, RVALID deasserts for a few clock cycles on every burst (64 bytes, i.e. 16 beats of 4 bytes), as you can see in the image below. (Here the AXI DMA M_AXI_MM2S and M_AXIS_MM2S data widths are both set to 32 bits.)

slverr in M_AXI_MM2S AR channel RRESP.jpg

In this design, the ACP port is used only by the AXI DMA to send data to the PL; no other module in the FPGA uses it. I can't explain why this happens.

In a similar setup where I used the ACP port to transfer data as 64-byte bursts of 4 beats of 16 bytes, I found that RLAST sometimes stayed asserted for a long time while RVALID stayed deasserted. RVALID also deasserted for a single clock cycle on every burst (64-byte transfer). (Here the AXI DMA M_AXI_MM2S and M_AXIS_MM2S data widths are both set to 128 bits.)

data transmission discontinuity with RVALID deassertion.jpg
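
For reference, the two burst shapes described above (16 beats of 4 bytes, and 4 beats of 16 bytes) map onto the AXI read address fields as sketched below. This is only an illustration; the helper name `axi_burst_fields` is hypothetical and not part of any Xilinx tool. It relies on the standard AXI encoding where ARLEN is the number of beats minus one and ARSIZE is log2 of the bytes per beat.

```python
def axi_burst_fields(burst_bytes, beat_bytes):
    """Return (ARLEN, ARSIZE) for an INCR burst of burst_bytes made of beat_bytes-wide beats."""
    if beat_bytes <= 0 or beat_bytes & (beat_bytes - 1):
        raise ValueError("beat_bytes must be a power of two")
    if burst_bytes % beat_bytes:
        raise ValueError("burst_bytes must be a whole number of beats")
    beats = burst_bytes // beat_bytes
    arlen = beats - 1                      # AXI encodes burst length as beats - 1
    arsize = beat_bytes.bit_length() - 1   # log2(bytes per beat)
    return arlen, arsize


# 32-bit data path: 64 bytes = 16 beats of 4 bytes
narrow = axi_burst_fields(64, 4)
# 128-bit data path: 64 bytes = 4 beats of 16 bytes
wide = axi_burst_fields(64, 16)
print(narrow, wide)
```

Both configurations move the same 64 bytes per burst; only the number of beats and the beat width differ.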

 

I would like to understand what is causing this and how I can improve the design so that data is transferred continuously, without RVALID deasserting.

7 Replies
Adventurer
753 Views
Registered: ‎08-07-2014

hello @kavinduvsomadas 

 

Which SoC are you using: Zynq-7000 or Zynq UltraScale+ MPSoC?

 

-

Brasilino

Contributor
744 Views
Registered: ‎08-28-2020

@brasilino Zynq Ultrascale+ MPSoC

Adventurer
703 Views
Registered: ‎08-07-2014

Hello @kavinduvsomadas 

 

What does the AXI read address channel look like? ACP is an AXI4 interface, but other interfaces such as HP*_FPD are converted to AXI3 on the PS side and therefore have a burst limit of 16 beats. AXI DMA might be limiting the burst size for that reason.

Regarding RLAST: in the clock cycles where RVALID is deasserted, the level of RLAST doesn't matter. A deasserted RVALID means that everything on the read data channel (RDATA, RLAST, RRESP) during that time must be ignored; it is all invalid.
The circuitry that AXI DMA is connected to probably doesn't deassert RLAST in order to save resources, i.e., there is no need to add logic to deassert a signal that nobody is looking at.
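
To make the point about RLAST concrete, here is a minimal sketch of how a read-channel trace can be interpreted: a beat only counts in cycles where both RVALID and RREADY are high, so whatever RLAST shows in the other cycles is discarded. The trace format (a list of dicts) and the function name `accepted_beats` are assumptions made for this illustration, not an ILA export format.

```python
def accepted_beats(trace):
    """Keep only the cycles where a read data handshake happens (RVALID and RREADY both high)."""
    return [
        (sample["rdata"], sample["rlast"])
        for sample in trace
        if sample["rvalid"] and sample["rready"]
    ]


# Hypothetical trace: in the middle cycle RVALID is low, so RLAST = 1 there is meaningless.
trace = [
    {"rvalid": 1, "rready": 1, "rdata": 0xA, "rlast": 0},
    {"rvalid": 0, "rready": 1, "rdata": 0xA, "rlast": 1},
    {"rvalid": 1, "rready": 1, "rdata": 0xB, "rlast": 1},
]
beats = accepted_beats(trace)
print(beats)
```

Only the first and third cycles contribute beats; the stale RLAST in the second cycle never reaches the master's logic.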

 

Hope it helps!

Brasilino

Contributor
619 Views
Registered: ‎08-28-2020

I'm really sorry for the late reply, and thank you for looking into my issue. Here is what I have gathered so far.


How the AXI READ Address channel looks like?

I only took the following screenshot of the AR channel of the AXI DMA interface connected to the ACP port.

M_AXI_MM2S AR channel .jpg

Please ignore the slave error on the final burst transaction (axi_dma_1_M_AXI_MM2S_RRESP). It happened because that final burst did not consist of 64 bytes (masters targeting the ACP interface should issue 64-byte-aligned, 64-byte read/write INCR transactions - https://www.xilinx.com/support/answers/66643.html).
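
As a rough illustration of the rule quoted above, the following sketch checks whether a transfer can be cut into 64-byte-aligned, 64-byte transactions and computes the padded length otherwise. It only encodes the constraint as stated in this post; the function names and the example address are hypothetical, and the linked answer record remains the authoritative source for what the ACP actually accepts.

```python
ACP_LINE_BYTES = 64  # transaction size and alignment as quoted in the post


def is_acp_friendly(addr, length):
    """True if the buffer starts on a 64-byte boundary and is a whole number of 64-byte bursts."""
    return addr % ACP_LINE_BYTES == 0 and length > 0 and length % ACP_LINE_BYTES == 0


def pad_to_acp_line(length):
    """Round a transfer length up to the next multiple of 64 bytes."""
    return -(-length // ACP_LINE_BYTES) * ACP_LINE_BYTES


# Hypothetical buffer: a 1500-byte payload would end in a partial burst.
print(is_acp_friendly(0x1000, 1500))
print(pad_to_acp_line(1500))
print(is_acp_friendly(0x1000, pad_to_acp_line(1500)))
```

Padding the buffer length in software before programming the DMA is one way to avoid a short final burst, assuming the consumer in the PL can discard the padding.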

I wrote that "...RLAST kept asserted for a long time which drives RVALID to be kept deasserted". What I actually wanted to ask was why RVALID deasserts after every burst transaction, i.e. why the data is not continuous.

By comparison, when I transfer from DRAM to the PL via an HP port with AXI DMA, RVALID stays asserted until all the data has been transferred (HP port clock: 200 MHz, burst length: 16 beats, burst size: 16 bytes per beat). Through the HP port I transferred 1536 bytes (which requires a bandwidth of 16 * 200 = 3200 MB/s = 3.125 GB/s) without any RVALID deassertion until the end of the transfer (see the images below).

M_AXI_MM2S connected to S_AXI_HP0_FPD

BP overview.jpg

Here I'm using an Ultra96 V1, which has LPDDR4 DRAM running at 533 MHz with a 32-bit data bus (so it can provide a data bandwidth of roughly 2 * 4 * 533 = 4264 MB/s = 4.16 GB/s).

Since 4.16 GB/s > 3.125 GB/s, the HP port should be able to transfer data without any RVALID deassertion, and the image above confirms this.

The ACP port, however, connects directly to the cache of the MPSoC. Cache is much faster than DRAM, so I would expect it to transfer data continuously without RVALID deassertion. (In the ACP setup from my original post, the port runs at 100 MHz, so the required bandwidth is 16 * 100 = 1600 MB/s = 1.56 GB/s.)

1.56 GB/s is much lower than the 4.16 GB/s LPDDR4 bandwidth. I therefore believe I have made some mistake in configuring the ACP connection, which leads to the RVALID deassertion and prevents the ACP port from transferring data continuously.
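
The bandwidth figures used in this comparison can be reproduced with a few lines of arithmetic. This sketch follows the post's own convention (bytes per transfer times millions of transfers per second gives MB/s, and GB/s is taken as MB/s divided by 1024); the function and variable names are made up for the example.

```python
def bandwidth_mb_per_s(bytes_per_transfer, mega_transfers_per_s):
    """Peak bandwidth in MB/s for a bus moving bytes_per_transfer every transfer."""
    return bytes_per_transfer * mega_transfers_per_s


hp_required = bandwidth_mb_per_s(16, 200)      # 128-bit HP port at 200 MHz
acp_required = bandwidth_mb_per_s(16, 100)     # 128-bit ACP port at 100 MHz
lpddr4_peak = bandwidth_mb_per_s(4, 2 * 533)   # 32-bit LPDDR4, two transfers per 533 MHz clock

for name, mb in [("HP", hp_required), ("ACP", acp_required), ("LPDDR4", lpddr4_peak)]:
    print(f"{name}: {mb} MB/s = {mb / 1024:.3f} GB/s")
```

These are peak numbers only; they say nothing about latency or about how many transactions the port can keep in flight, which is what the replies below focus on.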

Can you please explain how I can improve it so that the ACP port transfers data continuously, like the HP port?

Contributor
550 Views
Registered: ‎04-02-2014

I don't know if it helps you, but this is my experience with the ACP port.

I have used the ACP port in the applications shown below.

These applications use the ZynqMP-ACP-Adapter, an adapter for connecting an AXI4 master to the ZynqMP Accelerator Coherency Port (ACP). ZynqMP-ACP-Adapter is available at the following URL.

 

The waveforms measured by the ILA (Integrated Logic Analyzer) when reading data from the ACP port in these applications are shown below.

fig12.png

This waveform shows a read transaction on the AXI side with Addr = 0x00_7010_6400 and burst length = 160 words, which the ZynqMP-ACP-Adapter divides into 40 read transactions of 4 words each. (The clock frequency is 250 MHz.)

In the lower half of the waveform, the ZynqMP ACP outputs the first read data 8 clocks after the read transaction request (ACP_ARVALID = 1 and ACP_ARREADY = 1), and then outputs the remaining words with no wait states.

Since the 160-word (2560-byte) read transaction completes in 688 ns (4 ns x 172 clocks), the transfer rate is approximately 3.7 GB/s.
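
The transfer-rate estimate above can be checked with a short calculation. This is just the arithmetic from the measurement (160 words of 16 bytes, 172 clocks at 250 MHz), using decimal GB (10^9 bytes); the function name is hypothetical.

```python
def throughput_gb_per_s(num_bytes, clocks, clock_mhz):
    """Average throughput in GB/s (10**9 bytes per second) over a measured window."""
    seconds = clocks / (clock_mhz * 1e6)
    return num_bytes / seconds / 1e9


words = 160
bytes_per_word = 16  # 128-bit data path
rate = throughput_gb_per_s(words * bytes_per_word, clocks=172, clock_mhz=250)
print(f"{rate:.2f} GB/s")
```

For comparison, the same port with no idle cycles at all would top out at 16 bytes x 250 MHz = 4.0 GB/s, so the measured window is close to the limit of the data path.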

I don't know how your application differs, but the following may be different:

  • It is directly connected to the ACP port without using AXI-Interconnect or AXI SmartConnect.
  • The data width of both ACP and AXI is 128 bits (16 bytes), so no extra bus width conversion circuit is included.
  • Address information is issued speculatively on the ACP AR channel as far ahead as possible.

 

Adventurer
493 Views
Registered: ‎08-07-2014

Hello @kavinduvsomadas 

 

Thanks for the graphs, they were very helpful.

But when it comes to ACP port, it directly connects with cache of the MPSoC. cache is much faster than DRAM, therefore it needs to transfer data continuously without RVALID deassertion. (for ACP port situation I've given in the issue, it's working at 100 MHz. Therefore required bandwidth = 16 * 100 = 1600 MB/s = 1.56 GB/s) 

Cache is much faster than DRAM in terms of access latency, but not necessarily in bandwidth. The ACP port treats every transaction as coherent, which triggers many housekeeping operations to keep the L2 and L1 caches up to date.
The Zynq UltraScale+ MPSoC documentation states that the ACP port has limited throughput:

brasilino_0-1604929114099.png

Can you please explain how can I improve it to transfer data continuously through ACP port similar to HP port ?



Going by the example and experience from @kawazome, it is possible to keep RVALID asserted when using multiple 16-byte transfers. So I would suggest splitting your 64-byte transfer into four 16-byte transfers, always keeping and tracking four outstanding transactions towards the ACP interface. The https://github.com/ikwzm/ZynqMP-ACP-Adapter project seems to do the splitting transparently.
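
To see why keeping several read requests in flight matters, here is a toy cycle-count model. It is only a sketch under stated assumptions, not a model of the real ACP: every transaction has a fixed latency from address handshake to first data beat (8 cycles, taken from the waveform described earlier in the thread), the data bus returns one beat per cycle in order, one address can be issued per cycle, and an outstanding slot is freed the cycle after a transaction's last beat. The function name `simulate_reads` and all parameters are hypothetical.

```python
def simulate_reads(num_txn, beats, latency, max_outstanding):
    """Return (total_cycles, idle_cycles) for num_txn back-to-back read bursts in the toy model."""
    issue = []     # cycle of each address handshake
    done = []      # cycle of each transaction's last data beat
    bus_free = 0   # first cycle at which the data bus is free again
    for i in range(num_txn):
        t = issue[-1] + 1 if issue else 0            # at most one address per cycle
        if i >= max_outstanding:
            t = max(t, done[i - max_outstanding] + 1)  # wait for a free outstanding slot
        first_beat = max(t + latency, bus_free)      # data returns in order
        last_beat = first_beat + beats - 1
        issue.append(t)
        done.append(last_beat)
        bus_free = last_beat + 1
    total = done[-1] + 1
    idle = total - num_txn * beats                   # cycles in which RVALID would be low
    return total, idle


# 40 transactions of 4 beats each, as in the 160-word read measured earlier in the thread
print(simulate_reads(40, beats=4, latency=8, max_outstanding=1))
print(simulate_reads(40, beats=4, latency=8, max_outstanding=4))
```

In this model, issuing one request at a time leaves the data bus idle for the full latency before every burst (480 cycles in total for 160 beats), while four requests in flight hide the latency after the first burst (168 cycles). The real port has additional effects, so the numbers are only meant to show the trend.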

regards

Brasilino

Contributor
316 Views
Registered: ‎08-28-2020

As you said, this happens because the ACP port supports only 4 outstanding transactions. As a result, AXI DMA can't keep RVALID asserted continuously. I have now tried using 2 clock domains: in one clock domain I read the data from the ACP and buffer it, then pass it to the next clock domain. But somehow that failed. Could you please look at that issue too?

https://forums.xilinx.com/t5/Processor-System-Design-and-AXI/Problem-in-AXI-Stream-FIFO-in-CDC/m-p/1177353

 
