Observer ariels1
Registered: 07-22-2013

BRAM DMA transfer limitation

Hi everyone,

I am using the ZC702 with Xilinx Linux, and we have a BRAM in the PL.

The BRAM is larger than 8192 bytes (for testing we made it 256 KB).

Reads and writes to addresses below 8192, with either memcpy or the DMA driver (using the PL330), work fine.

But when I try to write to address 8192 and above:

1) Without DMA (using mmap):

      BRAMBaseAddr[8192] = j++;

I get:

Unhandled fault: external abort on non-linefetch (0x818) at 0xb6f10000
Bus error
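
For reference, here is a minimal user-space sketch of the mmap access described above. It assumes the BRAM is reached through /dev/mem, that its base address is 0x40000000 (an assumption, the post does not state it), and that the whole 256 KB window is mapped. The names bram_map and bram_write8 are made up for illustration. The sketch only guarantees that an offset lies inside the mapping; it cannot tell whether the address range assigned to the BRAM controller in the PL is really that large.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values, not taken from the post: adjust to your address map. */
#define BRAM_PHYS_BASE 0x40000000UL   /* hypothetical PL base address */
#define BRAM_SIZE      (256UL * 1024) /* 256 KB test BRAM from the post */

/*
 * Map `len` bytes starting at offset `base` of `path` (normally "/dev/mem")
 * and return a byte pointer, or NULL on failure.
 */
volatile uint8_t *bram_map(const char *path, off_t base, size_t len)
{
    assert(len > 0);
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint8_t *)p;
}

/* Write one byte, refusing offsets that fall outside the mapped window. */
int bram_write8(volatile uint8_t *bram, size_t len, size_t off, uint8_t v)
{
    if (bram == NULL || off >= len)
        return -1;
    bram[off] = v;
    return 0;
}
```

Typical use would be bram_map("/dev/mem", BRAM_PHYS_BASE, BRAM_SIZE) followed by bram_write8(bram, BRAM_SIZE, 8192, value). If the write still faults with the full window mapped, the mapping length is probably not the cause.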

 

2) With the DMA driver for the PL330 and a buffer size of 10000 bytes (the BRAM is much larger):

Using this code for the write:   write(bram_fd,(char*)w_bramBuf,MAX_BUF_LEN);

I get:

   dma-pl330 f8003000.ps7-dma: Reset Channel-0 CS-e FTC-20000

Using this code for the read:    ret = read(bram_fd,r_bramBuf,MAX_BUF_LEN);

I get:

   dma-pl330 f8003000.ps7-dma: Reset Channel-0 CS-e FTC-40000

So my questions are:

1. Is there a limit on the maximum buffer size the PL330 can transfer?

2. If it is a DMA issue, why do I also have the problem without using DMA?

I checked with the FPGA guy, and in standalone mode the BRAM works fine even at 256 KB, so I assume the problem isn't there.

Thanks in advance,

Ari.

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

The text "FTC-20000" means data_write_err.
The text "FTC-40000" means data_read_err.
For write err, the TRM says this:
"Indicates the AXI response that the DMAC
 receives on the BRESP bus, after the DMA channel
 thread performs a data write:
 0: OKAY response
 1: EXOKAY, SLVERR, or DECERR response. This
    fault is an imprecise abort."
The read error is much the same.
I would guess that the FPGA is not sending the right response on the AXI bus.
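
As a small illustration of the decoding above, here is a user-space sketch that pulls the FTC value out of a dma-pl330 log line and names the two bits discussed in this thread (bit 17, 0x20000, data_write_err; bit 18, 0x40000, data_read_err). The function names are hypothetical, and the other fault bits in the TRM are deliberately not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PL330_FTC_DATA_WRITE_ERR (1u << 17) /* 0x20000 */
#define PL330_FTC_DATA_READ_ERR  (1u << 18) /* 0x40000 */

/* Extract the hex value after "FTC-" from a log line; returns 0 on success. */
int pl330_parse_ftc(const char *line, uint32_t *ftc)
{
    assert(line != NULL && ftc != NULL);
    const char *p = strstr(line, "FTC-");
    if (p == NULL)
        return -1;
    p += 4; /* skip the "FTC-" prefix */
    char *end;
    unsigned long v = strtoul(p, &end, 16);
    if (end == p)
        return -1; /* no hex digits followed the prefix */
    *ftc = (uint32_t)v;
    return 0;
}

/* Name the fault bits mentioned in this thread; anything else is "other". */
const char *pl330_ftc_name(uint32_t ftc)
{
    if (ftc & PL330_FTC_DATA_WRITE_ERR)
        return "data_write_err";
    if (ftc & PL330_FTC_DATA_READ_ERR)
        return "data_read_err";
    return "other (see the PL330 TRM)";
}
```

Feeding it the two lines from the first post yields data_write_err for the write and data_read_err for the read.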

Visitor manigjack
Registered: 10-27-2014

Re: BRAM DMA transfer limitation

Hello,

I am getting the same message:

"dma-pl330 f8003000.ps7-dma: Reset Channel-0     CS-f FTC-40000"

But in my case I am just trying to use DMA to copy some data from one DDR location (e.g. 0x10030000) to another memory location that I got from the dma_alloc_coherent() API call.

I am wondering what went wrong. Is it the AXI bus, or an incorrect DMA configuration?

These are the steps that I followed:

1) Copied the data from user space to the kernel with copy_from_user().

2) Transferred the data from the kernel buffer to the DDR location (0x10030000) by translating it to a kernel virtual address:

      ddr_start = phys_to_virt(DDR_START);

    where DDR_START corresponds to 0x10030000.

3) Initialized the DMA with all necessary configuration and waited for the callback.

4) Finally, after the DMA operation, returned the data to user space with copy_to_user().

 

/** Pseudo code */

    g_dma_config.direction      = DMA_MEM_TO_MEM;
    g_dma_config.src_addr       = (unsigned long)DDR_START;
    g_dma_config.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
    g_dma_config.src_maxburst   = 1;

    gp_dma_rx = dma_request_channel(mask, NULL, NULL);
    if(!gp_dma_rx)
    {
        return ENOTSUPP; /* Just a fancy return value */
    }

    rc = dmaengine_slave_config(gp_dma_rx, &g_dma_config);
    if(0 != rc)
    {
        return ENOTSUPP; /* Just a fancy return value */
    }
    p_dmadev = gp_dma_rx->device->dev;

    gp_check = dma_alloc_coherent(p_dmadev, count, &addr, GFP_KERNEL);
    if(NULL == gp_check)
    {
        return ENOTSUPP;
    }

    p_dma_desc = dmaengine_prep_slave_single(gp_dma_rx, addr, count, DMA_MEM_TO_MEM, DMA_PREP_INTERRUPT);
    if(NULL == p_dma_desc)
    {
        return ENOTSUPP; /* Just a fancy return value */
    }

    /** Initializing callbacks */
    p_dma_desc->callback = &rxd_dma_callback;
    p_dma_desc->callback_param = NULL;

    dmaengine_submit(p_dma_desc);
    /** Final stage to issue pending signal */
    dma_async_issue_pending(gp_dma_rx);
    /** Busy-wait on dma_check (presumably set by rxd_dma_callback) */
    while(!dma_check)
    {
    }

Thanks.

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

I don't think you can use dmaengine_prep_slave_single() for memory-to-memory DMA. The pl330.c driver assumes DMA to/from a device and does not handle the direction flags very well. I have never tried using DMA memcpy. From what I can tell, these APIs will use the prep_dma_memcpy() hook:

dma_async_memcpy_pg_to_pg()
dma_async_memcpy_buf_to_buf()
dma_async_memcpy_buf_to_pg()

Visitor manigjack
Registered: 10-27-2014

Re: BRAM DMA transfer limitation

Hello Norman,

Thanks a lot for pointing it out.

The DMA now works fine after switching to the dma_async_memcpy_buf_to_buf() API.

I am able to transfer data between two different DDR locations using DMA.

My code roughly looks like this:

    gp_dma_rx = dma_request_channel(mask, NULL, NULL);
    if(!gp_dma_rx)
    {
        return ENOTSUPP; /* Just a fancy return value */
    }
    cookie = dma_async_memcpy_buf_to_buf(gp_dma_rx, d_base, c_base, count);

    while(dma_async_is_tx_complete(gp_dma_rx, cookie, NULL, NULL) == DMA_IN_PROGRESS)
    {
        dma_sync_wait(gp_dma_rx, cookie); /** just return success */
    }

where d_base is the destination address and c_base is the source address.

Thanks,

Mani.

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

Thanks for posting your results. I am sure it will help me or other people in the future.

Visitor manigjack
Registered: 10-27-2014

Re: BRAM DMA transfer limitation

Hello @norman_wong,

Sorry to bring this topic up again. I am trying to transfer data from the PL to DDR, as you and @jqhu have already done.

I followed the steps in this post:

[ http://forums.xilinx.com/t5/Embedded-Linux/PS-DMA-driver-in-Linux/td-p/481520 ],

but I get the following message when running:

"dma-pl330 f8003000.ps7-dma: Reset Channel-0     CS-f FTC-40000"

This is not what I want; I guess the DMA operation did not complete.

This is my code snippet:

#############################################################

    gp_dma_rx = dma_request_channel(mask, NULL, NULL);
    if(!gp_dma_rx)
    {
        return ENOTSUPP; /* Just a fancy return value */
    }
    memset(&g_dma_config, 0, sizeof(g_dma_config));
    /** config values */
    g_dma_config.direction      = DMA_DEV_TO_MEM;
    g_dma_config.src_addr       = (unsigned long)BRAM_START;   /** BRAM_START - 0x40000000 */
    g_dma_config.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
    g_dma_config.src_maxburst   = 1;

    dmaengine_slave_config(gp_dma_rx, &g_dma_config);
    p_dmadev = gp_dma_rx->device->dev;
    char *cpu_addr = dma_alloc_coherent(p_dmadev, tot_bytes, &addr, GFP_KERNEL); /** tot_bytes = 128 bytes */

    /** Third step - getting the descriptor */
    p_dma_desc = gp_dma_rx->device->device_prep_dma_memcpy(gp_dma_rx, addr,
                    g_dma_config.src_addr, tot_bytes, DMA_PREP_INTERRUPT);
    /** Initializing callbacks */
    p_dma_desc->callback = &rxd_dma_callback;
    p_dma_desc->callback_param = NULL;

    /** Fourth stage to submit the obtained descriptor */
    dmaengine_submit(p_dma_desc);
    /** Final stage to issue pending signal */
    dma_async_issue_pending(gp_dma_rx);

    /** Timeout handling */
    init_waitqueue_head(&wait);
    wait_event_timeout(wait, dma_check == 1, 3*HZ);
    if(!dma_check)
    {
        printk(KERN_INFO "Timeout happened !! \n\r");
        goto end;
    }
    copy_to_user(p_buf, cpu_addr, tot_bytes);

##################################################################

 

When I run this code, the driver times out every time and prints the DMA failure message on the console.

Can you suggest what else I need to take care of?

My second question: can PL memory be addressed like a normal memory region?

Put another way, can I do a DMA_MEM_TO_MEM copy between DDR and the PL? Since the PL registers are also part of the system address space, shouldn't it be possible to address them with normal memory read/write operations?

Thanks.

 

Visitor jqhu
Registered: 02-20-2009

Re: BRAM DMA transfer limitation

As I cross-checked with my code, it seemed I didn't call dmaengine_slave_config(); and I guess dmaengine_slave_config is the call-back function name?  I assigned the function directly to p_dma_desc->callback, so it is also possible that the callback function is not addressed correctly?

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

Your code seems okay, I guess, except that you mix a DEV_TO_MEM config with a memcpy prep. I am not sure the DMA engine will like that. The DMA scenarios so far:

1) Unmodified Peripheral DMA
- Config uses DEV_TO_MEM or MEM_TO_DEV
- Uses one of the prep_slave_sg() macros.
- Peripheral address is NOT incremented. FIFO register is assumed.
- Memory address is incremented.
- Unmodified pl330.c will use WFP, LDP and STP instructions. This means the PL330 will expect handshake signals from the PL. If the PL does not send the signals, the DMA times out.

2) Modified Peripheral DMA
- Config uses DEV_TO_MEM or MEM_TO_DEV
- Uses one of the prep_slave_sg() macros.
- Peripheral address is NOT incremented.  FIFO register is assumed.
- Memory address is incremented.
- Modified pl330.c will use LD and ST instructions. WFP calls removed. This means the PL330 will NOT expect handshake signals from the PL. If PL does not send the signals, the DMA continues anyways. Even if the PL is not ready.

3) Unmodified Memory DMA
- Config uses MEM_TO_MEM.
- Use one of the prep_dma_memcpy() macros.
- Source and Destination Memory address are incremented.
- Unmodified pl330.c will use LD and ST instructions. No PL330 handshaking is expected. Whatever the PL330 is talking to, must keep up.
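
To make the comparison concrete, the three scenarios can be written down as a small table and filtered for a given PL design. This is only a user-space model of the list above, not driver code. The struct, its field names, and the "plain BRAM" criterion (no FIFO register, no DMA request handshake) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pl330_scenario {
    const char *name;
    const char *prep_hook;   /* which prep family the scenario uses */
    bool periph_addr_incr;   /* does the PL-side address advance? */
    bool needs_pl_handshake; /* does the PL330 wait for PL request signals? */
};

static const struct pl330_scenario scenarios[] = {
    { "unmodified peripheral DMA", "prep_slave_sg",   false, true  },
    { "modified peripheral DMA",   "prep_slave_sg",   false, false },
    { "unmodified memory DMA",     "prep_dma_memcpy", true,  false },
};

/* Assumption: a plain BRAM needs incrementing addresses and sends no handshake. */
bool fits_plain_bram(const struct pl330_scenario *s)
{
    assert(s != NULL);
    return s->periph_addr_incr && !s->needs_pl_handshake;
}

/* Return the index of the first scenario usable with a plain BRAM, or -1. */
int pick_bram_scenario(void)
{
    for (size_t i = 0; i < sizeof scenarios / sizeof scenarios[0]; i++)
        if (fits_plain_bram(&scenarios[i]))
            return (int)i;
    return -1;
}
```

Under those assumptions only the third entry (memory DMA via prep_dma_memcpy) qualifies, because the first two leave the PL address fixed on a FIFO register.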


PL BRAM can be accessed with just a normal memory read/write operation like memcpy(). I think this is usually called Programmed IO (PIO) to differentiate it from DMA. The point of DMA is to offload the processor and allow it to do other things. In all the examples shown so far, there is a busy-wait polling loop. In a production driver, you would not busy-wait.

Visitor ghalady
Registered: 02-12-2014

Re: BRAM DMA transfer limitation

Hi Norman / jqhu,

Could you please share the DMA test client source for PL330 DMA from DDR to BRAM (@4000_0000), if this has finally worked for you? I also have a standard BRAM (8K to start with) in the PL and am trying to get DMA to work on PetaLinux. I was successful in getting the bare-metal test to work.

girish.haaldy@gmail.com

Thanks

Girish

Visitor ghalady
Registered: 02-12-2014

Re: BRAM DMA transfer limitation

correction : the e-mail is girish.halady@gmail.com

Visitor manigjack
Registered: 10-27-2014

Re: BRAM DMA transfer limitation

Hello @norman_wong,

Thanks for the detailed explanation. You are correct, I mixed up the API usage.

The first two cases you posted do not suit my situation, since I don't have a FIFO on the PL side.

I implemented the third case and it works without any issues.

I can now transfer data from DDR to the PL and vice versa using DMA.

Thanks a lot for your suggestions.

@jqhu, thanks for the response. I have removed the dmaengine_slave_config() call, which is unnecessary in this context, as you did. There is no problem with the callback function, since it is called every time I do a DMA operation.

I measured a bandwidth of around 40 MB/s, which I guess is about right.

But during testing I sometimes noticed that the transfer was incorrect: some data is not copied properly from the PL to DDR. Some values are copied as zero while the rest are correct. I don't know why it sometimes transfers wrong data.

My code roughly looks like this (for the DDR to PL case):

 

********************************************************************

    /** First stage to get request slave channel */
    gp_dma_rx = dma_request_channel(mask, NULL, NULL);
    if(!gp_dma_rx)
    {
        printk(KERN_INFO "slave request failed... \n\r");
        return ENOTSUPP; /* Just a fancy return value */
    }
    gp_dmadev = gp_dma_rx->device->dev;
    cpu_addr = dma_alloc_coherent(gp_dmadev, tot_bytes, &g_addr, GFP_KERNEL);
    memcpy(cpu_addr, p_buf, tot_bytes);  /** copying the user space data to cpu_addr */

    /** BRAM_START corresponds to 0x61C200000 */
    p_dma_desc = gp_dma_rx->device->device_prep_dma_memcpy(gp_dma_rx, BRAM_START,
                                             g_addr, tot_bytes, DMA_PREP_INTERRUPT);
    if(NULL == p_dma_desc)
    {
        printk(KERN_INFO "Third stage failed go to exit\n\r");
        return ENOTSUPP; /* Just a fancy return value */
    }
    /** Initializing callbacks */
    p_dma_desc->callback = &rxd_dma_callback;
    p_dma_desc->callback_param = NULL;
    /** Submit the obtained descriptor */
    dmaengine_submit(p_dma_desc);

    /** Final stage to issue pending signal */
    dma_async_issue_pending(gp_dma_rx);
    /** Wait for 3 seconds to complete or terminate */
    wait_event_timeout(wait, gdma_check == 1, 3*HZ);
    if(!gdma_check)
    {
        printk(KERN_INFO "Timeout happened !! \n\r");
        goto end;
    }

********************************************************************

Thanks,

Mani.

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

@ghalady
There are snippets of PL330 DMA code throughout this forum. For both older and newer kernels. Each person will tend to address their specific DMA needs. You will need to extrapolate from what they have done. I posted an example here:
  http://forums.xilinx.com/t5/Embedded-Linux/PL330-DMAC-Test-Code/td-p/491154

Search the forum for terms like DMA, PL330 and Zynq. There are quite a few posts.

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

@manigjack
I have seen corrupted data values during DMA. In my case, the PL timing was a bit off. The PL was removing the data before the PS had a chance to read it. I don't know the specifics. My FPGA guy did all that. I have also seen data problems if the burst length is too large. The combination of bus width and burst length has to match on both ends, DDR and PL. Typically DDR has a BL of 8 words. The PL is much deeper and wider. Check your pl330.c to see if brst_len is actually used. Last I checked, it was hard-coded to 16 or 1 depending on mode.
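
To put numbers on the bus width and burst length combination, here is a small arithmetic sketch. It assumes the 4-byte width used in the posted configs (DMA_SLAVE_BUSWIDTH_4_BYTES) and the burst lengths of 16 and 1 mentioned above. The struct and function names are hypothetical, and the sketch says nothing about how pl330.c actually schedules the tail.

```c
#include <assert.h>
#include <stddef.h>

struct burst_plan {
    size_t bytes_per_burst; /* bus width in bytes times beats per burst */
    size_t full_bursts;     /* number of complete bursts in the transfer */
    size_t tail_bytes;      /* leftover bytes that need shorter bursts */
};

/* Split a transfer of `total` bytes into full bursts plus a tail. */
struct burst_plan plan_bursts(size_t total, size_t width_bytes, size_t beats)
{
    assert(width_bytes > 0 && beats > 0);
    struct burst_plan p;
    p.bytes_per_burst = width_bytes * beats;
    p.full_bursts = total / p.bytes_per_burst;
    p.tail_bytes = total % p.bytes_per_burst;
    return p;
}
```

With a 4-byte width and 16 beats, each burst moves 64 bytes. The 128-byte transfer from the earlier snippet is exactly 2 bursts, while the 10000-byte buffer from the first post is 156 full bursts plus a 16-byte tail. With a burst length of 1, the same 10000 bytes become 2500 single-beat transfers.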

Visitor ghalady
Registered: 02-12-2014

Re: BRAM DMA transfer limitation

Hi Norman,

Thanks for the pointer. I was looking for example code that uses the new (Samsung) PL330 DMA driver from the PetaLinux / Vivado 2014.4 toolchain. Specifically, I was looking for an example application that does DMA from user space between DDR and the PL BRAM (@4000_0000 through 4001_FFFF). The driver provides a list of APIs, but I wasn't sure how to invoke them from user space, since apparently none of them are visible there. Going by John's presentation "Linux_DMA_from_User_Space-public.pdf", the mmap() system call is apparently not implemented by the Samsung pl330.c driver. A quick example of how to use this DMA driver would help greatly.

Best Regards

Girish

Scholar norman_wong
Registered: 05-28-2012

Re: BRAM DMA transfer limitation

To my knowledge, there is no user space DMA API built into the kernel. DMA is pretty well always done in kernel space.

 

I haven't looked at John's presentation in detail. I can't help you on that. Good luck.

 

Visitor ghalady
Registered: 02-12-2014

Re: BRAM DMA transfer limitation

Thanks Norman,

John, would you be able to help? As Norman aptly pointed out, I am trying to configure the DMA engine from a user-space application, much like a device driver would inside kernel space. More specifically, I am trying to do DMA between DDR (unused kernel space) and a BRAM implemented in the PL, from user space. Any help would be greatly appreciated.

Best Regards
Girish Halady

Visitor ghalady
Registered: 02-12-2014

Re: BRAM DMA transfer limitation

Hi Mani,

Could you please share your driver code example for BRAM access on Linux? I got BRAM access working with the mmap() call, but that only does one access at a time; I need to be able to do DMA from DDR to BRAM. If you have an example that has worked for you, please do share.

Thanks

Girish

Xilinx Employee
Registered: 09-10-2008

Re: BRAM DMA transfer limitation

Hi Girish,

 

Happy New Year (not quite, but almost for me).  

 

I attached the source files for the project I did when I wrote that presentation.  This example was used with an AXI DMA in the PL but all DMA engines that hook into the Linux DMA engine generally look the same when using the Linux DMA API (other than a few details unique to each).  

 

This example was simple in that it was not using any scatter gather so you need to understand it was a starting point only. User space DMA is not trivial as you are implementing a kernel driver that will facilitate the user space access since Linux does not provide user space DMA.

 

Thanks

John

Visitor ghalady
Registered: 02-12-2014

Re: BRAM DMA transfer limitation

Thanks John,

            Did see your other presentation and let me try this one too. HAPPY NEW YEAR to you too.

 

Best

Girish

Visitor manigjack
Registered: 10-27-2014

Re: BRAM DMA transfer limitation

Hello Girish,

The code snippets in this discussion form the important part of my driver, where the actual DMA operation is initialized and configured.

Right now my driver looks ugly, so I am in the process of improving the code. I am also still seeing data corruption during some tests, as I mentioned already. Once that is done I will try to post the entire code in the forum.

Thanks,

Mani.

Adventurer
Registered: 05-12-2012

Re: BRAM DMA transfer limitation

Hi. I am trying to use the dma_proxy kernel module, and it really works. But for read transactions I cannot receive more than 128 KB. Could anyone help me with this?

Observer bencai
Registered: 05-12-2014

Re: BRAM DMA transfer limitation

Hi John and All,

I've been working on a design that needs to transfer a large amount of data (>150 MB/s) from a device to memory over multiple DMA transmissions (S2MM only). So far the design, which is based on John's dma_proxy.c driver, works for a single transmission limited to s2mm_length. The challenge I face is developing a driver that accommodates multiple transmissions.

There is no requirement to process the data during transmission, as that will be done offline on a different PC. The Linux kernel is configured to allocate sufficient memory (CMA) for the worst-case requirement.

The hardware appears to be functional. Its arrangement is: data is fed into a deep FIFO, which pipes into an AXI stream. TLAST is generated every N bytes, consistent with the value of DATA_SIZE (s2mm_length). The state/count of the FIFO is monitored by a UIO driver, which can be used to support DMA transmissions.

It is not clear to me whether the dma_proxy driver can accommodate multiple transmissions without overwriting the contents in memory.

What are the methods to achieve this? I would much appreciate your advice.

Thanks,

Ben
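
One generic way to keep successive transfers from landing on top of each other, offered only as a sketch and not as something dma_proxy does, is to treat the reserved region as a row of fixed-size slots and give each transfer its own destination offset. The function names and the slot scheme below are assumptions. The slot size would correspond to DATA_SIZE and the region to the CMA allocation described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of whole slots that fit in the reserved region. */
size_t slot_count(size_t region_bytes, size_t slot_bytes)
{
    assert(slot_bytes > 0 && region_bytes >= slot_bytes);
    return region_bytes / slot_bytes;
}

/* Byte offset inside the region where transfer number n should land. */
size_t slot_offset(uint64_t n, size_t slot_bytes, size_t region_bytes)
{
    size_t nslots = slot_count(region_bytes, slot_bytes);
    return (size_t)(n % nslots) * slot_bytes;
}
```

Transfer n would then target the region's DMA base address plus slot_offset(n, ...). The offsets wrap after slot_count() transfers, so the oldest slot is overwritten unless the region is sized for the whole capture or the data is drained before the wrap.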

 

Explorer
Registered: 10-19-2017

Re: BRAM DMA transfer limitation

Please see my response

https://forums.xilinx.com/t5/Embedded-Linux/Using-a-DMA-Proxy-Driver/m-p/828236/highlight/true#M24067

 

This driver is not functional in 2017.3 unless I loaded it improperly. I attempted to load it with insmod. You will note that earlier I was successfully able to run the axi dma test on the same system

http://www.wiki.xilinx.com/DMA+Drivers+-+Soft+IPs

 

Run on a zc706 evaluation board. If you can run it successfully still having built Linux in 2017.3, the issue may be my hdf.

 

Thanks.

Visitor fguimond
Registered: 08-29-2018

Re: BRAM DMA transfer limitation

John, I think it would be useful to create an AR ( or similar official documentation, maybe wiki page? ) with your DMA proxy example instead of being buried in a forum thread...