FrKr
Visitor
243 Views
Registered: ‎03-18-2021

Slow copy from reserved memory. Problems getting data from Kernel to User space.

Hi everyone,

 

I am currently working with an Ultra96 board and evaluating the platform for our company. These are our (and my) first steps into embedded Linux programming, as we want to switch from QNX. We ran some tests to evaluate write performance to SATA SSDs connected via USB3 and want to set up a test chain that receives data from the AXI DMA into a buffer in a kernel module. This data shall then be read through the character device (either via read() or through a memory map) by a user-space application, which then writes the data to the SSD.

The current test setup excludes the DMA, because we set up a new kernel module after some mistakes in the first one. Currently we are seeing a data rate from kernel space to user space of only around 62 MB/s, which is a bit slow. The data is transferred from the kernel module to the user-space application via the read() function. The read() handler in the kernel module is implemented with copy_to_user() as shown below:

static ssize_t dev_read( struct file *p_file, char __user *p_buf, size_t len, loff_t *p_offset ) {
    int n_err_cnt = 0;   /* number of bytes copy_to_user() could not copy */
    int n_msg_size = 0;

    /* Clamp the request to the size of the DMA buffer */
    if ( len > p_dma->n_dma_mem_size )
        n_msg_size = p_dma->n_dma_mem_size;
    else
        n_msg_size = len;

    n_err_cnt = copy_to_user( p_buf, p_dma->p_dma_mem_addr, n_msg_size );

    if ( n_err_cnt == 0 ) {
        return n_msg_size;
    } else {
        printk( "%s: Failed to send characters to user.\n", DRIVER_NAME );
        return -EFAULT;
    }
}

 

p_dma->p_dma_mem_addr is the pointer from dma_alloc_coherent in the probe function. The allocation can be seen below:

    /* Get reserved memory information */
    rc = of_reserved_mem_device_init( dev );
    if ( rc ) {
        dev_err( dev, "Could not initialize reserved mem region.\n");
        return rc;
    }

    /* Set DMA coherent mask */
    dma_set_coherent_mask( dev, 0xFFFFFFFF );

    /* Set buffer size */
    p_dma->n_dma_mem_size = DMA_BUF_SIZE; 

    /* Allocate memory */
    p_dma->p_dma_mem_addr = dma_alloc_coherent( dev, p_dma->n_dma_mem_size, &(p_dma->dma_handle), GFP_KERNEL );
    dev_info( dev, "Allocated DMA coherent mem to vaddr: 0x%p.\n", p_dma->p_dma_mem_addr );

 

The system-user.dtsi has the following entries for the driver module:

/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;
  
        data_reserved: buffer@0 {
            compatible = "shared-dma-pool";
            no-map;
            reg = <0x0 0x40000000 0x0 0x4000000>;
            linux,cma-default;
        };
    };
};

&axi_dma_0 {
    compatible = "dma-drv";
    memory-region = <&data_reserved>; 
};

 

I also tried allocating a memory region in the kernel module via kmalloc and transferring that instead, which resulted in data rates of around 1800 MB/s. That is more in the region I was expecting.
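
For anyone who wants to reproduce such throughput numbers, here is a minimal user-space sketch of how the read() rate could be measured. It is not the test program used above; the function names measure_read_mbps and measure_path_mbps are made up for illustration, and the chunk size and total are free parameters.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Read up to `total` bytes from fd in `chunk`-sized read() calls and
 * return the throughput in MB/s (1 MB = 10^6 bytes), or -1.0 on error. */
double measure_read_mbps(int fd, size_t chunk, size_t total)
{
    if (chunk == 0)
        return -1.0;

    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1.0;

    struct timespec t0, t1;
    size_t done = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < total) {
        size_t want = (total - done < chunk) ? total - done : chunk;
        ssize_t n = read(fd, buf, want);
        if (n < 0) {            /* read error */
            free(buf);
            return -1.0;
        }
        if (n == 0)             /* end of data */
            break;
        done += (size_t)n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return (double)done / 1e6 / secs;
}

/* Convenience wrapper (hypothetical): open `path`, measure, close. */
double measure_path_mbps(const char *path, size_t chunk, size_t total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;
    double rate = measure_read_mbps(fd, chunk, total);
    close(fd);
    return rate;
}
```

Calling measure_path_mbps() on the driver's device node (whatever it is called on your system) with, say, 1 MiB chunks gives a figure comparable to the ones quoted here. Note that the dev_read() above ignores the file offset, so every read() returns data from the start of the buffer and never signals end of data.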

 

Has anyone had similar problems or knows a solution to this? Or should I reserve the memory differently?

 

Thanks in advance and best regards,

Fred

1 Reply
kawazome
Adventurer
197 Views
Registered: ‎04-02-2014

Perhaps the CPU's data cache is not enabled. To enable the data cache, use the "reusable" property instead of the "no-map" property.

/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;
  
        data_reserved: buffer@0 {
            compatible = "shared-dma-pool";
            reusable;
            reg = <0x0 0x40000000 0x0 0x4000000>;
        };
    };
};

&axi_dma_0 {
    compatible = "dma-drv";
    memory-region = <&data_reserved>; 
};

 

When a DMA device uses the Linux kernel's reserved-memory, there are two different memory pooling mechanisms.
The "no-map" or "reusable" property in the device tree selects which mechanism is used.

In case of "no-map", the DMA memory pool mechanism in kernel/dma/coherent.c is used.
In case of "reusable", the CMA memory pool mechanism in kernel/dma/contiguous.c is used.

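When using the DMA memory pool mechanism with the "no-map" property, dma_init_coherent_memory() in kernel/dma/coherent.c maps the region as non-cacheable, so CPU accesses to it bypass the data cache. That is why copy_to_user() from this buffer is so slow.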

When using the CMA memory pool mechanism with "reusable", attention should be paid to the cache coherency between the DMA device and the CPU.

For example, when the CPU reads the DMA buffer, the access must be sandwiched between dma_sync_single_for_cpu() and dma_sync_single_for_device() as follows:

static ssize_t dev_read( struct file *p_file, char __user *p_buf, size_t len, loff_t *p_offset ) {
    ssize_t       result = 0;       /* signed, so it can carry -EFAULT */
    unsigned long n_err_cnt = 0;    /* bytes copy_to_user() could not copy */
    size_t        n_msg_size = 0;
    dma_addr_t    dma_addr;
    void*         mem_addr;

    /* Reading at or past the end of the buffer returns 0 (end of file) */
    if ( *p_offset >= p_dma->n_dma_mem_size ) {
        result = 0;
        goto done;
    }

    /* Clamp the request to the bytes remaining after the offset */
    if ( *p_offset + len > p_dma->n_dma_mem_size )
        n_msg_size = p_dma->n_dma_mem_size - *p_offset;
    else
        n_msg_size = len;

    dma_addr = p_dma->dma_handle     + *p_offset;
    mem_addr = p_dma->p_dma_mem_addr + *p_offset;

    /* Hand the range to the CPU before reading it */
    dma_sync_single_for_cpu( p_dma->dev, dma_addr, n_msg_size, DMA_FROM_DEVICE);

    n_err_cnt = copy_to_user( p_buf, mem_addr, n_msg_size );

    /* Hand the range back to the device afterwards */
    dma_sync_single_for_device( p_dma->dev, dma_addr, n_msg_size, DMA_FROM_DEVICE);

    if ( n_err_cnt != 0 ) {
        printk( "%s: Failed to send characters to user.\n", DRIVER_NAME );
        result = -EFAULT;
        goto done;
    }

    *p_offset += n_msg_size;
    result     = n_msg_size;
  done:
    return result;
}
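
One detail in the length check above: `*p_offset + len` can wrap around if user space passes a very large len. As a small sketch (not part of the driver; clamp_read_len is a hypothetical helper name, and size_t is used for the offset only to keep the example simple), the same clamping can be written against the remaining bytes so that no addition is needed:

```c
#include <stddef.h>

/* Number of bytes a read at `offset` may return from a buffer of
 * `buf_size` bytes when the caller asks for `len` bytes.
 * Returns 0 when `offset` is at or past the end of the buffer. */
size_t clamp_read_len(size_t buf_size, size_t offset, size_t len)
{
    if (offset >= buf_size)
        return 0;

    /* Cannot underflow, because offset < buf_size here */
    size_t remaining = buf_size - offset;

    return (len < remaining) ? len : remaining;
}
```

In the driver, the result would take the place of n_msg_size, with a return value of 0 meaning end of file.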

Cache coherency is a bit complicated; please see other sources for more information.