07-16-2014 04:43 PM
07-16-2014 07:08 PM
07-17-2014 10:07 AM
A bit more detailed release notes:
- PL code needs to respond to PL330 DMA burst handshaking. See Zynq TRM. PL code is not included here.
- PL code needs to implement a 32-bit FIFO style register at 0x40000000. This base address can be changed via module parameters.
- PL code needs to use peripheral channel 0. This can be changed via module parameters.
- Modification of the pl330.c of the 2014.1 linux kernel from the Xilinx Git.
- Memory to memory code is left unmodified. Only peripheral code has been modified.
- Add support for burst.
- Removed support for single mode. Burst mode with a length of 1 should be equivalent.
- Add support for unrolled loops
- Add support for reduced handshaking. The original code issues WFP and FLUSHP on every word. This adds an option to issue WFP and FLUSHP once per transaction.
- Add support for PL330 specific config.
- General refactoring for readability or hoped for speed. Add comments.
- In theory, modifications are backwards compatible. Should not break the existing platforms that use the PL330 driver.
- File passes the linux checkpatch script.
- Implements a devfs interface to the PL330 driver. Device file is "/dev/hwfifo".
- Assumes a 32-bit FIFO register in the FPGA at a physical address of 0x40000000. Address can be changed via module parameters.
- Assumes a peripheral channel of 0. This can be changed via module parameters.
- Uses the newly added PL330 specific config. Will work with original driver but transfer speeds will be slower as every word will have a WFP and FLUSHP.
- Burst length defaults to one.
- See readme.txt in source for details on module parameters and usage examples.
- Zynq 7010
- PL at 100 MHz. PL implements a 32-bit register that returns an incrementing sequence for data verification.
- FPGA to DDR has been tested.
- DDR to FPGA has NOT been tested. Outside my project requirements.
- DDR is 16-bit, BL=8
- DMA burst=4. Higher burst lengths result in data corruption.
- Maximum transfer size tested is 128KB. At this size, transfer rate is about 200MB/s.
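A minimal userspace sketch of the data verification described above, assuming the device file can simply be opened and read with read(). The helper names read_words and first_sequence_break are hypothetical, and so is the assumption that the sequence increments by exactly one per word. The real read semantics (required transfer sizes, blocking behaviour) are in the readme.txt, not here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` 32-bit words from `path`
 * (e.g. "/dev/hwfifo") into `buf`.
 * Returns the number of whole words read, or -1 on error. */
ssize_t read_words(const char *path, uint32_t *buf, size_t count)
{
    assert(buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t want = count * sizeof(uint32_t);
    size_t got = 0;

    /* read() may return fewer bytes than requested, so loop. */
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)     /* end of data */
            break;
        got += (size_t)n;
    }

    close(fd);
    return (ssize_t)(got / sizeof(uint32_t));
}

/* Hypothetical helper: return the index of the first word that is not
 * exactly one more than its predecessor, or `count` if the whole buffer
 * is a clean incrementing sequence (assumed step of 1, wrapping at 2^32). */
size_t first_sequence_break(const uint32_t *buf, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (buf[i] != buf[i - 1] + 1u)
            return i;
    }
    return count;
}
```

Typical use would be to call read_words("/dev/hwfifo", buf, n) and then check that first_sequence_break(buf, n) returns n. A smaller return value is the index of the first corrupted word, which is one way to spot the corruption seen at higher burst lengths.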
No warranty. No guarantees.
10-31-2014 08:31 AM
I am trying to move data from the DRAM to the M_AXI_GP port via the PS DMA Controller. I haven't used Linux drivers before, so I am having trouble getting started. I would appreciate a few pointers to get going.
I have C code (Linux app) that does a memcpy() today to move data. This is very slow (I don't quite need the CPU cycles as the ARM processor doesn't have much to do in my case right now). I want to increase throughput and use the hard DMA controller (not instantiate a new DMA if I can avoid it).
What do I have to do to replace the memcpy() with DMA calls? I have a 3.14.0-69913493 kernel with CONFIG_XILINX_DMA=y in the .config and HAS_DMA=y as well. This suggests that I don't need to build a loadable kernel module, correct?
As you can see, I am quite lost and something basic will help me a lot. After that I can perhaps use your (and others) useful posts more.
10-31-2014 11:24 AM
For the PL330, the .config has "CONFIG_PL330_DMA=y". I believe CONFIG_XILINX_DMA is for the Xilinx soft IPs. In the 2014.3/3.15 kernel, both PL330 and Xilinx DMA are compiled into the kernel. The devicetree will also need a definition to select which one is active.
All DMA operations occur in kernel space. Userspace needs a way to access the kernel space. There is no standard way for userspace to access DMA. That is where the loadable kernel module comes in. You have to write a kernel module to use the DMA from userspace.
An additional complication is virtual and physical memory. DMA requires physical addresses, which are usually only accessible in kernel space. Virtual memory also implies paging: pages may be swapped out at any time, but DMA requires the memory to stay resident for the duration of the transfer. For these reasons, DMA to userspace is difficult, non-portable and usually avoided. Most code double buffers to move data between kernel and userspace. I believe John Linn posted some presentations showing how to do this specifically for the Zynq.
I'd suggest reading the Linux Device Drivers book (3rd or 4th edition). I think it is available online for free.
10-18-2015 06:24 AM
10-23-2015 06:49 AM
Using the PL330 to do memcpy() of data is not going to help. Been there, done that.
The PL330 doesn't have access to the CPU caches (no coherency), so the caches must be flushed manually by the CPU. This operation is cumbersome and takes up a LOT of CPU cycles. So many, in fact, that it provides hardly any improvement over just copying the memory using the CPU, even if the memory blocks are larger than the L2 cache. The "flush cache" operation limited the throughput to about 200MB/s (mem-to-mem), occupying the CPU all that time.
To get some real-world benefit, you could put a DMA controller in logic (e.g. axi-central-dma) and connect that to the ACP port. Then it can truly take over some memcpy tasks from the CPU.
The PL330 is useful for tasks like feeding audio data to/from logic. Without logic, it's utterly useless, because none of the Zynq's peripherals have any events linked to the controller, so you cannot even use it for SPI or UART access, for example. All it can do without logic is mem-to-mem transfers without external events, and even that it doesn't do well.
What could also work is to use non-cacheable memory regions. These can be copied by the DMA controller without synchronizing the cache. The penalty on accessing this memory from the CPU is quite high though, a byte-by-byte access will run at only 1/20th of the normally cached speed. So this will only work well if you access the memory regions sequentially and using large word instructions (e.g. NEON code).
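To illustrate the "sequential, large word" access pattern, here is a portable C sketch that moves data 64 bits at a time instead of byte by byte. The function name copy_wide is hypothetical. Compilers typically turn the fixed-size memcpy() calls into single wide loads and stores, but whether that happens (and whether NEON is used) depends on the compiler and flags. The 1/20th figure above is not something this code measures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `n` bytes front to back, using 64-bit chunks
 * for the bulk of the data and single bytes only for the tail.
 * Buffers must not overlap. */
void copy_wide(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk: one 64-bit load and one 64-bit store per iteration.
     * memcpy() with a constant size avoids aliasing and alignment
     * pitfalls while still letting the compiler emit wide accesses. */
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }

    /* Tail: at most 7 remaining bytes. */
    while (n--)
        *d++ = *s++;
}
```

On the Zynq you would point dst or src at the non-cacheable region. How that region is obtained (for example through a driver's mmap) is outside this sketch.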