Scholar norman_wong
11,528 Views
Registered: ‎05-28-2012

PL330 DMAC Test Code

Here's some test code for others who may be wrangling with the PL330 DMAC. Some notes:

- PL code that responds to PL330 DMA burst handshaking

- 2014.1 linux kernel from the Xilinx Git

- FPGA to DDR is tested

- DDR to FPGA is not tested.

No warranty. No guarantees.

 

6 Replies
Xilinx Employee
11,524 Views
Registered: ‎10-24-2013

Re: PL330 DMAC Test Code

Hi @norman_wong,
Thanks for the contribution. We really appreciate it.
Thanks, Vijay
Scholar norman_wong
11,499 Views
Registered: ‎05-28-2012

Re: PL330 DMAC Test Code

A bit more detailed release notes:

FPGA
- PL code needs to respond to PL330 DMA burst handshaking. See Zynq TRM. PL code is not included here.
- PL code needs to implement a 32-bit FIFO-style register at 0x40000000. This base address can be changed via module parameters.
- PL code needs to use peripheral channel 0. This can be changed via module parameters.

pl330.c
- Modification of the pl330.c of the 2014.1 linux kernel from the Xilinx Git.
- Memory to memory code is left unmodified. Only peripheral code has been modified.
- Add support for burst.
- Removed support for single mode. Burst mode with a length of 1 should be equivalent.
- Add support for unrolled loops
- Add support for reduced handshaking. The original code issues WFP and FLUSHP on every word; the new option issues WFP and FLUSHP once per transaction.
- Add support for PL330 specific config.
- General refactoring for readability or hoped-for speed. Added comments.
- In theory, modifications are backwards compatible. Should not break the existing platforms that use the PL330 driver.
- File passes the linux checkpatch script.
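
To give a feel for what the reduced handshaking saves, here is a rough counting sketch based only on the description above (one WFP plus one FLUSHP per word versus once per transaction). The function names are made up for illustration, and this is a simplified model, not the driver's actual microcode generator.

```c
#include <stddef.h>

/* Handshake instructions when WFP and FLUSHP are issued around every word
 * (the original behaviour as described in the notes above). */
static size_t handshake_insns_per_word(size_t words)
{
    return 2 * words;
}

/* Handshake instructions when WFP and FLUSHP are issued once for the
 * whole transaction (the added option). */
static size_t handshake_insns_per_transaction(size_t words)
{
    return words ? 2 : 0;
}
```

For a 128KB transfer of 32-bit words (32768 words), the first scheme issues 65536 handshake instructions and the second issues 2.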

hwfifo.ko
- Implements a devfs interface to the PL330 driver. Device file is "/dev/hwfifo".
- Assumes a 32-bit FIFO register in the FPGA at a physical address of 0x40000000. Address can be changed via module parameters.
- Assumes a peripheral channel of 0. This can be changed via module parameters.
- Uses the newly added PL330 specific config. Will work with original driver but transfer speeds will be slower as every word will have a WFP and FLUSHP.
- Burst length defaults to one.
- See readme.txt in source for details on module parameters and usage examples.
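
A minimal userspace sketch of reading from the device file follows. It assumes that "/dev/hwfifo" can be read with plain read() calls and returns raw 32-bit words; that is my assumption, so check readme.txt for the actual usage. The helper names read_full and hwfifo_read_words are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until len bytes have arrived, the source reports
 * end of data, or an error occurs. Returns bytes read, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            break;              /* no more data */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Open the device (or any file), read up to nwords 32-bit words into dst.
 * Returns the number of whole words read, or -1 on error. */
static long hwfifo_read_words(const char *path, uint32_t *dst, size_t nwords)
{
    int fd = open(path, O_RDONLY);
    ssize_t got;

    if (fd < 0)
        return -1;
    got = read_full(fd, dst, nwords * sizeof *dst);
    close(fd);
    if (got < 0)
        return -1;
    return (long)(got / (ssize_t)sizeof *dst);
}
```

A call would look like hwfifo_read_words("/dev/hwfifo", buf, 1024), with buf being an array of at least 1024 uint32_t.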

Test Case
- Zynq 7010
- PL at 100 MHz. PL implements a 32-bit register that returns an incrementing sequence for data verification.
- FPGA to DDR has been tested.
- DDR to FPGA has NOT been tested. Outside my project requirements.
- DDR is 16-bit, BL=8
- DMA burst=4. Higher burst lengths result in data corruption.

- Maximum transfer size tested is 128KB. At this size, transfer rate is about 200MB/s.
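
The data verification mentioned above can be sketched as a check that each received 32-bit word is the previous word plus one. This assumes the sequence increments by exactly one per word and wraps at 32 bits, which is my reading of "incrementing sequence"; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first word that is not (previous word + 1),
 * with 32-bit wrap-around, or n if the whole buffer is in sequence. */
static size_t first_sequence_break(const uint32_t *buf, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (buf[i] != (uint32_t)(buf[i - 1] + 1u))
            return i;
    }
    return n;
}
```

A return value smaller than n points at the first corrupted or dropped word, which is handy when experimenting with burst lengths.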

No warranty. No guarantees.

 

Adventurer
10,949 Views
Registered: ‎03-27-2014

Re: PL330 DMAC Test Code

Hello Norman,

 

I am trying to move data from DRAM to the M_AXI_GP port via the PS DMA controller. I haven't used Linux drivers before, so I am having trouble getting started. I would appreciate a few pointers to get going.

 

I have C code (Linux app) that does a memcpy() today to move data. This is very slow (I don't quite need the CPU cycles as the ARM processor doesn't have much to do in my case right now). I want to increase throughput and use the hard DMA controller (not instantiate a new DMA if I can avoid it).

 

What do I have to do to replace the memcpy() with DMA calls? I have a 3.14.0-69913493 kernel with CONFIG_XILINX_DMA=y in the .config and HAS_DMA=y as well. This suggests that I don't need to build a loadable kernel module, correct? 

 

As you can see, I am quite lost and something basic will help me a lot. After that I can perhaps make better use of your (and others') useful posts.

Scholar norman_wong
10,938 Views
Registered: ‎05-28-2012

Re: PL330 DMAC Test Code

For the PL330, the .config has "CONFIG_PL330_DMA=y". I believe CONFIG_XILINX_DMA is for the Xilinx soft IPs. In the 2014.3/3.15 kernel, both the PL330 and Xilinx DMA drivers are compiled into the kernel. The devicetree will also need a definition to select which one is active.
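
For reference, a .config fragment for a built-in PL330 driver might look like the lines below. CONFIG_PL330_DMA is the option named above; the other two are the generic dmaengine options that it sits under in mainline kernels, listed here on the assumption that your tree is laid out the same way.

```
CONFIG_DMADEVICES=y
CONFIG_DMA_ENGINE=y
CONFIG_PL330_DMA=y
```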

All DMA operations occur in kernel space. Userspace needs a way to access the kernel space. There is no standard way for userspace to access DMA. That is where the loadable kernel module comes in. You have to write a kernel module to use the DMA from userspace.

An additional complication is virtual versus physical memory. DMA requires physical addresses, which are usually only accessible in kernel space. Virtual memory also implies paging: pages may be swapped out at any time, whereas DMA requires the memory to stay resident for the duration of the transfer. For these reasons, DMA directly to userspace buffers is difficult, non-portable and usually avoided. Most code double buffers to move data between kernel and userspace. I believe John Linn posted some presentations showing how to do this specifically for the Zynq.
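
The double-buffering idea can be modelled in plain userspace C as below. This is only a sketch of the data flow, not driver code: the fill callback stands in for "DMA fills a kernel-side buffer", and the memcpy() stands in for the kernel-to-user copy (in a real driver that step would typically be copy_to_user()). BOUNCE_SIZE, fill_fn, double_buffered_read and counting_source are all hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096u   /* hypothetical size of the kernel-side buffer */

/* Stand-in for a DMA transfer: writes up to max bytes into the bounce
 * buffer and returns how many bytes it produced (0 means no more data). */
typedef size_t (*fill_fn)(void *bounce, size_t max, void *ctx);

/* Move len bytes to user_dst in two steps per chunk:
 * device -> bounce buffer, then bounce buffer -> user buffer. */
static size_t double_buffered_read(fill_fn fill, void *ctx,
                                   void *user_dst, size_t len)
{
    static unsigned char bounce[BOUNCE_SIZE];
    size_t done = 0;

    while (done < len) {
        size_t want = (len - done < BOUNCE_SIZE) ? len - done : BOUNCE_SIZE;
        size_t got = fill(bounce, want, ctx);       /* step 1 */

        if (got == 0)
            break;
        if (got > want)
            got = want;                             /* never overrun */
        memcpy((unsigned char *)user_dst + done, bounce, got); /* step 2 */
        done += got;
    }
    return done;
}

/* Demo source: produces an incrementing byte pattern, tracking the
 * running position in the size_t that ctx points to. */
static size_t counting_source(void *bounce, size_t max, void *ctx)
{
    size_t *pos = ctx;
    unsigned char *b = bounce;

    for (size_t i = 0; i < max; i++)
        b[i] = (unsigned char)((*pos)++);
    return max;
}
```

The extra copy is the price paid for keeping the DMA buffer entirely under kernel control.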

I'd suggest reading the Linux Device Drivers book (3rd or 4th edition). I think it is available online for free.

 

Visitor iviktorin
8,864 Views
Registered: ‎08-13-2014

Re: PL330 DMAC Test Code

Hello,

the Linux kernel supports the PL330, but it provides only an in-kernel API for it. The actual transfers are quite inefficient. Also, the way the PL330 works is strange because it is difficult to control the amount of transferred data (especially in the RX direction). You need quite a lot of cooperation from the FPGA side.

So, to run with the current Linux implementation, you need a kernel module that uses the dmaengine API and provides an interface to userspace (e.g. a character device). It is definitely not like memcpy. Userspace goes through the write and read syscalls, the kernel module has to take care of cache coherency, and your PL330-enabled peripheral in the FPGA must cooperate so that buffers do not overflow.

I can offer you a commercial solution that takes care of those.

Regards
Jan
Ph.D. student at Brno University of Technology | System Architect at RehiveTech spin-off company
Scholar milosoftware
8,813 Views
Registered: ‎10-26-2012

Re: PL330 DMAC Test Code

Using the PL330 to do memcpy() of data is not going to help. Been there, done that.

 

The PL330 doesn't have access to the CPU caches (coherency), so the caches must be flushed manually by the CPU. This operation is cumbersome and takes up a LOT of CPU cycles. So much so that it provides hardly any improvement at all over just copying the memory using the CPU, even if the memory blocks are larger than the L2 cache. The "flush cache" operation limited the throughput to about 200MB/s (mem-to-mem), occupying the CPU all that time.
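
Before comparing against the roughly 200MB/s figure, it may help to measure what plain memcpy() achieves on your own board. Below is a rough userspace sketch; the function name is hypothetical, it uses clock() and therefore measures CPU time rather than wall-clock time, and the result depends heavily on buffer size relative to the caches.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies `bytes` bytes `reps` times and returns the rate in MB/s
 * (1 MB = 10^6 bytes), or a negative value if it cannot measure. */
static double memcpy_mb_per_s(size_t bytes, int reps)
{
    unsigned char *src, *dst;
    volatile unsigned char sink = 0;
    clock_t t0, t1;
    double secs;

    if (bytes == 0 || reps <= 0)
        return -1.0;
    src = malloc(bytes);
    dst = malloc(bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, bytes);   /* touch both buffers so the pages exist */
    memset(dst, 0, bytes);

    t0 = clock();
    for (int i = 0; i < reps; i++) {
        src[0] = (unsigned char)i;      /* vary the input each pass */
        memcpy(dst, src, bytes);
        sink ^= dst[bytes - 1];         /* keep the copy observable */
    }
    t1 = clock();

    free(src);
    free(dst);
    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return -1.0;                    /* run too short to time */
    return (double)bytes * (double)reps / secs / 1e6;
}
```

Trying a buffer well above the L2 cache size, for example memcpy_mb_per_s(8u << 20, 20), gives a number closer to what DDR-to-DDR copies see.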

 

To get some real-world benefit, you could put a DMA controller in logic (e.g. axi-central-dma) and connect that to the ACP port. Then it can truly take over some memcpy tasks from the CPU.

 

The PL330 is useful for tasks like feeding audio data to/from logic. Without logic, it's utterly useless, because none of the Zynq's peripherals have any events linked to the controller, so you cannot even use it for SPI or UART access, for example. All it can do without logic is mem-to-mem transfers without external events, and even that it doesn't do well.

 

What could also work is to use non-cacheable memory regions. These can be copied by the DMA controller without synchronizing the cache. The penalty for accessing this memory from the CPU is quite high though: byte-by-byte access will run at only 1/20th of the normally cached speed. So this will only work well if you access the memory regions sequentially and using large word instructions (e.g. NEON code).
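
To illustrate the access-width point, here is a small sketch contrasting one access per byte with one access per 64-bit word. It is plain C rather than NEON, the function names are hypothetical, and whether a 64-bit access becomes a single wide instruction depends on the compiler and target. The volatile qualifiers are only there to stop the compiler from merging or eliding the individual accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory access per byte: `bytes` loads and `bytes` stores. */
static void copy_bytes(volatile uint8_t *dst, const volatile uint8_t *src,
                       size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] = src[i];
}

/* One access per 64-bit word: bytes/8 loads and bytes/8 stores.
 * Both pointers must be 8-byte aligned and bytes a multiple of 8. */
static void copy_words64(volatile uint64_t *dst, const volatile uint64_t *src,
                         size_t bytes)
{
    for (size_t i = 0; i < bytes / 8; i++)
        dst[i] = src[i];
}
```

On uncached memory every one of those accesses goes out to DRAM, so the second form makes one eighth as many trips for the same data.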
