Adventurer
Adventurer
4,618 Views
Registered: ‎07-08-2016

memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

tl;dr: Is there something preventing one from directly reading and writing to the axi4-lite slave memory space under linux, assuming it has been correctly mapped into virtual memory? When I write to the input registers of my hardware block, then read the input registers to ensure it was correct, the data differs from what I wrote to it.

 

I wrote an RSA encryption block in HLS for the Zedboard, and now I'm trying to talk to it in Linux (xilinx linux kernel). I've already tested that the RSA block works as expected in hardware on a bare-metal system.

 

Because of reasons, I'm not going with the Xilinx UIO framework to talk to the hardware under linux, but rather I'm writing my own custom device drivers. I've successfully done this for both an AES and SHA256 hardware block, and am using those working drivers as a template for RSA.

 

For both AES and SHA, I am able to write the input parameters to their respective AXI registers using standard file operations: the standard copy_from_user(kernel_buf, userspace_buf, ...) followed by memcpy_toio(virtual_address, kernel_buf, ...). However, when I attempt to do the exact same thing for RSA (and then examine virtual_address to ensure the writes were successful), I notice that only a few of the bytes remain in memory, and the rest are overwritten with zeros automatically.

 

For example, the below write function yields the following output (note the print statements).

 

typedef struct {
    char base[RSA_SIZE_BYTES];
    char exponent[RSA_SIZE_BYTES];
    char modulus[RSA_SIZE_BYTES];
} RSAPublic_t;


/* Stuff here */
.....
...

static ssize_t wsrsa_write(struct file *filep, const char *buffer, size_t len, loff_t *offset)
{
    RSAPublic_t PublicData; // Memory for bytes passed from userspace

    // copy base,exponent,modulus from userspace-->kmem struct
    copy_from_user(&PublicData, buffer, sizeof(RSAPublic_t));

    // copy base from kmem into AXI memory
    memcpy_toio(vbaseaddr+XWSRSA1024_AXILITES_ADDR_BASE_V_DATA, PublicData.base, RSA_SIZE_BYTES);
    print_hex_dump_bytes(".base    = ",0, PublicData.base, RSA_SIZE_BYTES);

    // copy exponent from kmem into AXI memory
    memcpy_toio(vbaseaddr+XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA, PublicData.exponent, RSA_SIZE_BYTES);
    print_hex_dump_bytes(".exp     = ",0,PublicData.exponent,RSA_SIZE_BYTES);

    // copy modulus from kmem into AXI memory
    memcpy_toio(vbaseaddr+XWSRSA1024_AXILITES_ADDR_MODULUS_V_DATA, PublicData.modulus, RSA_SIZE_BYTES);
    print_hex_dump_bytes(".modulus = ",0,PublicData.modulus,RSA_SIZE_BYTES);

    printk(KERN_DEFAULT "--------------------------------------------------------------");
    print_hex_dump_bytes("Kbase_dest = ",0,vbaseaddr+XWSRSA1024_AXILITES_ADDR_BASE_V_DATA, RSA_SIZE_BYTES);
    print_hex_dump_bytes("Kexp_dest  = ",0,vbaseaddr+XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA,RSA_SIZE_BYTES);
    print_hex_dump_bytes("Kmod_dest  = ",0,vbaseaddr+XWSRSA1024_AXILITES_ADDR_MODULUS_V_DATA,RSA_SIZE_BYTES);

    // start RSA block to encrypt/decrypt
    wsrsa_runonce_blocking();

    printk(KERN_INFO "wsrsa1024: Received message of length %zu bytes from userspace\n", len);
    return len;
}

 

yielding this output:

 

[160037.467344] .base    = 21 64 6c 72 6f 57 20 2c 6f 6c 6c 65 48 00 00 00  !dlroW ,olleH...
[160037.467356] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467366] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467376] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467386] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467435] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467452] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467473] .base    = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467496] .exp     = 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467516] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467525] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467534] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467543] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467553] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467562] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467570] .exp     = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.467580] .modulus = 49 f5 eb 73 5b 82 9c eb 4b c2 af 74 64 29 38 a8  I..s[...K..td)8.
[160037.467590] .modulus = af 7e a4 77 ba 9c 79 b6 9b 5e 65 bc ba 74 84 3e  .~.w..y..^e..t.>
[160037.467599] .modulus = 84 bf 5c d4 d1 f4 ec d4 83 3d c6 9b 7b 52 5c 2f  ..\......=..{R\/
[160037.467608] .modulus = 25 79 6d 21 79 b3 31 7a 0d ad b1 b9 dc 5f e5 3d  %ym!y.1z....._.=
[160037.467617] .modulus = 13 21 f6 fb 97 1a fb b9 7f 4d 26 0f 10 37 ea ea  .!.......M&..7..
[160037.467626] .modulus = ec 97 a4 79 37 fb 62 33 9e b3 28 c4 30 8a a6 94  ...y7.b3..(.0...
[160037.467635] .modulus = 9a 9f 0d df e2 f5 b4 1f 25 4f e1 6f 35 bf 82 bf  ........%O.o5...
[160037.467644] .modulus = e6 a2 a0 15 80 a1 69 97 d8 3d 85 88 9e 88 4d d9  ......i..=....M.
[160037.467652] --------------------------------------------------------------
[160037.474358] Kbase_dest = 21 00 00 00 6f 00 00 00 6f 00 00 00 48 00 00 00  !...o...o...H...
[160037.474536] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474551] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474566] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474581] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474596] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474610] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474625] Kbase_dest = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474640] Kexp_dest  = 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474654] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474669] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474683] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474697] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474712] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474726] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474741] Kexp_dest  = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[160037.474755] Kmod_dest  = 49 00 00 00 5b 00 00 00 4b 00 00 00 64 00 00 00  I...[...K...d...
[160037.474770] Kmod_dest  = af 00 00 00 ba 00 00 00 9b 00 00 00 ba 00 00 00  ................
[160037.474785] Kmod_dest  = 84 00 00 00 d1 00 00 00 83 00 00 00 7b 00 00 00  ............{...
[160037.474799] Kmod_dest  = 25 00 00 00 79 00 00 00 0d 00 00 00 dc 00 00 00  %...y...........
[160037.474813] Kmod_dest  = 13 00 00 00 97 00 00 00 7f 00 00 00 10 00 00 00  ................
[160037.474828] Kmod_dest  = ec 00 00 00 37 00 00 00 9e 00 00 00 30 00 00 00  ....7.......0...
[160037.474843] Kmod_dest  = 9a 00 00 00 e2 00 00 00 25 00 00 00 35 00 00 00  ........%...5...
[160037.474857] Kmod_dest  = e6 00 00 00 80 00 00 00 d8 00 00 00 9e 00 00 00  ................

I imagine that after the memcpy_toio() calls, vbaseaddr+XWSRSA1024_AXILITES_ADDR_XXX_V_DATA should exactly match the PublicData.XXX kernel buffers. But it looks like a bunch of the bytes turn into zeros.

 

Is there something going on with the memory space at the locations of the input data ports that I am missing? Why are they all zeros (except for a few random bytes) after I write to them?

 

I map it into kernel memory like this:

 

static int __init wsrsa_init(void)
{
    int ret = 0; 
    printk(KERN_INFO "wsrsa1024: Initializing the wsrsa LKM\n");

    // request physical memory for driver 
    if (!request_mem_region(WSRSABASEADDR, SZ_64K, "wsrsa")) {
        printk(KERN_ALERT "wsrsa failed to request memory region\n");
        return -EBUSY;
    }
    // map reserved physical memory into virtual memory TODO dtc support
    vbaseaddr = ioremap(WSRSABASEADDR, SZ_64K);
    if (! vbaseaddr) {
        printk(KERN_ALERT "wsrsa unable to map virtual memory\n");
        release_mem_region(WSRSABASEADDR, SZ_64K);
        return -EBUSY;
    }
    printk(KERN_INFO "wsrsa1024: Virtual Address = 0x%X\n", (unsigned int)vbaseaddr);

    // Try to statically allocate a major number for the device driver
    ret = register_chrdev(MAJOR_NUM, DEVICE_NAME, &fops);
    if (ret < 0) {
        printk(KERN_ALERT "wsrsa failed to register major number %d\n",MAJOR_NUM);
        return ret;
    }
    printk(KERN_INFO "wsrsa1024: registered correctly with major number %d\n", MAJOR_NUM);    

    // Register the device class with sysfs
    wsrsacharClass = class_create(THIS_MODULE, CLASS_NAME);
    if (IS_ERR(wsrsacharClass)) {              // Check for error and clean up if there is
        unregister_chrdev(MAJOR_NUM, DEVICE_NAME);
        printk(KERN_ALERT "wsrsa1024: Failed to register device class\n");
        return PTR_ERR(wsrsacharClass);          // Correct way to return an error on a pointer
    }
    printk(KERN_INFO "wsrsa1024: device class registered correctly\n");

    // Register the driver for the device class with sysfs
    wsrsacharDevice = device_create(wsrsacharClass, NULL, MKDEV(MAJOR_NUM, 0), NULL, DEVICE_NAME);
    if (IS_ERR(wsrsacharDevice)) {             // Clean up if there is an error
        class_destroy(wsrsacharClass);           // Repeated code but the alternative is goto statements
        unregister_chrdev(MAJOR_NUM, DEVICE_NAME);
        printk(KERN_ALERT "wsrsa1024: Failed to create the device\n");
        return PTR_ERR(wsrsacharDevice);
    }
    printk(KERN_INFO "wsrsa1024: device class created correctly\n"); // Made it! device was initialized

    // init hardware parameters 
    printk(KERN_INFO "wsrsa1024: initializing wsrsa block to mode ENCRYPT\n");
    mode = ENCRYPT;
    iowrite8(mode, vbaseaddr + XWSRSA1024_AXILITES_ADDR_MODE_DATA); // write new mode value to memory 

    return 0;
}

 

If you want to see the entire device driver file (not very long at all) it is posted HERE.

 

Thanks,

Brett

 

0 Kudos
41 Replies
Voyager
Voyager
4,596 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Hey @bigbrett,

 

To me it looks like the AXI slave is to blame here, but I didn't find your HLS code.

Anyway, try the following from Linux ...

 

devmem2 XWSRSA1024_AXILITES_ADDR_BASE_V_DATA w 0x12345678
devmem2 XWSRSA1024_AXILITES_ADDR_BASE_V_DATA b 0x55
devmem2 XWSRSA1024_AXILITES_ADDR_BASE_V_DATA w

... and let me know the result you get.

 

Thanks,

Herbert

-------------- Yes, I do this for fun!
0 Kudos
Adventurer
Adventurer
4,585 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Hi @hpoetzl Herbert,

 

Are you saying to run the following from the command line? If so, I just get

-sh: devmem2: command not found

Also, I have validated that the slave works in a bare-metal system, so I doubt it is the AXI slave.

 

You can find the HLS code HERE. The top-level function is in wsrsa2048.cpp (should be 1024, never bothered to rename it)

0 Kudos
Adventurer
Adventurer
4,574 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Also, I don't see how the slave would even affect this, as I'm just writing to PS memory space....

0 Kudos
Voyager
Voyager
4,565 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Regarding devmem2, you can also use devmem (typically available on older Linux distros); you just need to replace the 'w' with '32' and the 'b' with '8'. Other than that, it's almost the same.

 

What address (physical) are you writing to?

I.e. what is XWSRSA1024_AXILITES_ADDR_BASE_V_DATA?

 

Best,

Herbert

-------------- Yes, I do this for fun!
0 Kudos
Adventurer
Adventurer
4,561 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

I'm writing to virtual addresses, since I map them in, but the physical addresses to which they translate are those that were exported by Vivado:

 

 // mirrors XPAR_WSRSA1024_0_S_AXI_AXILITES_BASEADDR in xparameters.h
#define WSRSABASEADDR 0x43C00000

// from xwsrsa1024_hw.h
#define XWSRSA1024_AXILITES_ADDR_AP_CTRL         0x000
#define XWSRSA1024_AXILITES_ADDR_GIE             0x004
#define XWSRSA1024_AXILITES_ADDR_IER             0x008
#define XWSRSA1024_AXILITES_ADDR_ISR             0x00c
#define XWSRSA1024_AXILITES_ADDR_MODE_DATA       0x010
#define XWSRSA1024_AXILITES_BITS_MODE_DATA       2
#define XWSRSA1024_AXILITES_ADDR_BASE_V_DATA     0x018
#define XWSRSA1024_AXILITES_BITS_BASE_V_DATA     1024
#define XWSRSA1024_AXILITES_ADDR_BASE_V_DATA_    0x040
#define XWSRSA1024_AXILITES_BITS_BASE_V_DATA     1024
#define XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA  0x09c
#define XWSRSA1024_AXILITES_BITS_PUBLEXP_V_DATA  1024
#define XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA_ 0x0c4
#define XWSRSA1024_AXILITES_BITS_PUBLEXP_V_DATA  1024
#define XWSRSA1024_AXILITES_ADDR_MODULUS_V_DATA  0x120
#define XWSRSA1024_AXILITES_BITS_MODULUS_V_DATA  1024
#define XWSRSA1024_AXILITES_ADDR_MODULUS_V_DATA_ 0x148
#define XWSRSA1024_AXILITES_BITS_MODULUS_V_DATA  1024
#define XWSRSA1024_AXILITES_ADDR_RESULT_V_DATA   0x1a4
#define XWSRSA1024_AXILITES_BITS_RESULT_V_DATA   1024
#define XWSRSA1024_AXILITES_ADDR_RESULT_V_DATA_  0x1cc
#define XWSRSA1024_AXILITES_BITS_RESULT_V_DATA   1024
#define XWSRSA1024_AXILITES_ADDR_RESULT_V_CTRL 0x224
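
For reference, the physical address of any operand word is just the base address plus the register offset plus 4 times the word index. Since ioremap() maps the requested physical range linearly into kernel virtual addresses, the same offsets apply to vbaseaddr. Below is a minimal sketch of that arithmetic; reg_word_phys(), is_word_aligned() and RSA_OPERAND_WORDS are hypothetical names used only for illustration and are not part of the generated header.

```c
#include <stdint.h>

/* Values copied from the listing above. */
#define WSRSABASEADDR                           0x43C00000u
#define XWSRSA1024_AXILITES_ADDR_BASE_V_DATA    0x018u
#define XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA 0x09cu
#define XWSRSA1024_AXILITES_ADDR_MODULUS_V_DATA 0x120u
#define XWSRSA1024_AXILITES_ADDR_RESULT_V_DATA  0x1a4u

/* Hypothetical constant: a 1024-bit operand spans 32 words of 32 bits. */
#define RSA_OPERAND_WORDS (1024u / 32u)

/* Hypothetical helper: physical address of word `word_index` of a register. */
static uint32_t reg_word_phys(uint32_t reg_offset, uint32_t word_index)
{
    return WSRSABASEADDR + reg_offset + 4u * word_index;
}

/* Hypothetical helper: non-zero if `addr` sits on a 32-bit boundary. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}
```

For example, reg_word_phys(XWSRSA1024_AXILITES_ADDR_BASE_V_DATA, 0) evaluates to 0x43C00018, the address one would pass to a tool such as devmem2.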

They are mapped in using the following code snippet from the device driver's __init function

    // request physical memory for driver 
    if (!request_mem_region(WSRSABASEADDR, SZ_64K, "wsrsa")) {
        printk(KERN_ALERT "wsrsa failed to request memory region\n");
        return -EBUSY;
    }
    // map reserved physical memory into virtual memory
    vbaseaddr = ioremap(WSRSABASEADDR, SZ_64K);
    if (! vbaseaddr) {
        printk(KERN_ALERT "wsrsa unable to map virtual memory\n");
        release_mem_region(WSRSABASEADDR, SZ_64K);
        return -EBUSY;
    }
    printk(KERN_INFO "wsrsa1024: Virtual Address = 0x%X\n", (unsigned int)vbaseaddr);
0 Kudos
Adventurer
Adventurer
4,560 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hpoetzl I also do not have a devmem command.

 

root@zedboard-zynq7:~# devmem
-sh: devmem: command not found
root@zedboard-zynq7:~# devmem2
-sh: devmem2: command not found

It is a very minimal distribution built using Yocto. However, this method worked perfectly for my other two HLS crypto blocks

0 Kudos
Adventurer
Adventurer
4,523 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

I think I might have realized what is going on here, but please let me know if I'm wrong....

 

In my other HLS crypto designs (SHA256 and AES), the top-level I/O ports were arrays on the AXI interface. However, due to latency constraints when working with 1024-bit numbers, I had to use scalar arguments to my HLS function (when processing the big arguments word-by-word, latency was unacceptably large due to memory bottlenecks). I'm guessing that this made the AES and SHA array arguments visible over the AXI interface (and therefore to the PS in its memory space), whereas the RSA top-level ports are implemented as read-only registers?

 

This is just a hypothesis, so let me know your thoughts.

 

0 Kudos
Voyager
Voyager
4,518 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@bigbrett,

 

#define WSRSABASEADDR 0x43C00000

That's what I suspected. You are not just writing to some 'memory'; you are writing to M_AXI_GP0, so any write there is translated into an AXI transaction, which ends up being interpreted by your HLS IP.

 

If you get any devmem working, you will see that writing words and bytes give different results than you would expect with 'normal' memory.

 

Best,

Herbert

-------------- Yes, I do this for fun!
0 Kudos
Adventurer
Adventurer
4,514 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hpoetzl: "writing words and bytes give different results than you would expect with 'normal' memory"

 

Hmmmm, ok that makes sense.....is this why the auto-generated drivers are structured to write a word (4 bytes) at a time, like this?

 

void XWsrsa1024_Set_publexp_V(XWsrsa1024 *InstancePtr, XWsrsa1024_Publexp_v Data) {
    Xil_AssertVoid(InstancePtr != NULL);
    Xil_AssertVoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);

    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 0, Data.word_0);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 4, Data.word_1);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 8, Data.word_2);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 12, Data.word_3);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 16, Data.word_4);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 20, Data.word_5);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 24, Data.word_6);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 28, Data.word_7);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 32, Data.word_8);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 36, Data.word_9);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 40, Data.word_10);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 44, Data.word_11);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 48, Data.word_12);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 52, Data.word_13);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 56, Data.word_14);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 60, Data.word_15);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 64, Data.word_16);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 68, Data.word_17);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 72, Data.word_18);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 76, Data.word_19);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 80, Data.word_20);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 84, Data.word_21);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 88, Data.word_22);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 92, Data.word_23);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 96, Data.word_24);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 100, Data.word_25);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 104, Data.word_26);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 108, Data.word_27);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 112, Data.word_28);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 116, Data.word_29);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 120, Data.word_30);
    XWsrsa1024_WriteReg(InstancePtr->Axilites_BaseAddress, XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + 124, Data.word_31);
}

I remember looking at it and thinking "wow you could do this in one line of code with memcpy". But if the writes have to be word-by-word, then that makes sense!

I will try doing the writes word-by-word in my driver and see if that changes anything. Do you foresee any issues with just having something like this?

 

 

static ssize_t wsrsa_write(struct file *filep, const char *buffer, size_t len, loff_t *offset)
{  
    RSAPublic_t PublicData; // Memory for bytes passed from userspace

    // copy base,exponent,modulus from userspace-->kmem struct
    copy_from_user(&PublicData, buffer, sizeof(RSAPublic_t)); 

    // copy base WORD-BY-WORD from kmem into AXI memory
    int byte_offset;
    for (byte_offset=0; byte_offset<128; byte_offset+=4) {
        // 4 bytes (one 32-bit word) per call
        memcpy_toio(vbaseaddr+XWSRSA1024_AXILITES_ADDR_BASE_V_DATA + byte_offset, PublicData.base + byte_offset, 4);
    }

    // do the same for other 1024-bit input registers
    // .....
    // .....

    // start RSA block to encrypt/decrypt
    wsrsa_runonce_blocking();

    printk(KERN_INFO "wsrsa1024: Received message of length %zu bytes from userspace\n", len);
    return len;
}

 

Again, I can't work within the devmem or the uio framework. I have to manually do everything :/

 

0 Kudos
Voyager
Voyager
3,860 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

memcpy_toio() assumes that you have I/O memory which behaves like real memory, which is not the case for your AXI client. I suggest you stick to iowrite32() and ioread32(), assuming your IP uses 32-bit transactions.
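
A minimal sketch of the word-at-a-time approach, written so that it also compiles outside the kernel. reg_write32() and reg_read32() are stand-ins (an assumption for illustration): in the driver they would be iowrite32() and ioread32() on addresses derived from vbaseaddr. write_operand() and read_operand() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSA_SIZE_BYTES 128u                  /* one 1024-bit operand */
#define RSA_SIZE_WORDS (RSA_SIZE_BYTES / 4u) /* 32 words of 32 bits  */

/* Stand-ins for iowrite32()/ioread32(); here they touch plain memory. */
static void reg_write32(uint32_t value, volatile uint32_t *addr)
{
    *addr = value;
}

static uint32_t reg_read32(const volatile uint32_t *addr)
{
    return *addr;
}

/* Copy one operand into a register window, one 32-bit access per word. */
static void write_operand(volatile uint32_t *reg, const unsigned char *src)
{
    size_t i;

    for (i = 0; i < RSA_SIZE_WORDS; i++) {
        uint32_t word;

        /* memcpy into a local avoids casting a char buffer to uint32_t* */
        memcpy(&word, src + 4 * i, sizeof(word));
        reg_write32(word, reg + i);
    }
}

/* Read one operand back from a register window, again word by word. */
static void read_operand(unsigned char *dst, const volatile uint32_t *reg)
{
    size_t i;

    for (i = 0; i < RSA_SIZE_WORDS; i++) {
        uint32_t word = reg_read32(reg + i);

        memcpy(dst + 4 * i, &word, sizeof(word));
    }
}
```

The point of the design is that every bus access is a full, aligned 32-bit transaction, which is also what the generated XWsrsa1024_WriteReg() calls do.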

 

Best,

Herbert

-------------- Yes, I do this for fun!
Adventurer
Adventurer
3,854 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Wow I would never have come to this conclusion myself, I don't know enough about linux internals. Thank you for your help so far.

 

I tried using iowrite32 and things still aren't working, but I think I might be doing something funky with pointers. Is there any minimum amount of time that I need to wait before writing the next piece of data to IO memory? I don't want to be writing faster than the AXI bus can handle

0 Kudos
Voyager
Voyager
3,851 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Hey @bigbrett,

 

Wow I would never have come to this conclusion myself, I don't know enough about linux internals.

Thank you for your help so far.

You're welcome!

 

I tried using iowrite32 and things still aren't working, but I think I might be doing something funky with pointers.

I'd say let's do the same test as suggested with devmem, just with iowrite32, iowrite8 and ioread32
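
As a reference point for that test, here is a small sketch of what ordinary little-endian RAM returns for the word-write, byte-write, word-read sequence. The mem_* functions are stand-ins (an assumption for illustration) acting on plain memory; in the driver the same three steps would use iowrite32(), iowrite8() and ioread32() on vbaseaddr plus the register offset. word_then_byte_test() is a hypothetical name.

```c
#include <stdint.h>

/* Stand-ins for iowrite32()/iowrite8()/ioread32() acting on plain RAM. */
static void mem_write32(uint32_t value, volatile void *addr)
{
    *(volatile uint32_t *)addr = value;
}

static void mem_write8(uint8_t value, volatile void *addr)
{
    *(volatile uint8_t *)addr = value;
}

static uint32_t mem_read32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Write a word, overwrite its lowest-addressed byte, read the word back. */
static uint32_t word_then_byte_test(volatile void *reg)
{
    mem_write32(0x12345678u, reg);
    mem_write8(0x55u, reg);
    return mem_read32(reg);
}
```

On a little-endian CPU with normal memory this returns 0x12345655. If the same sequence against the IP returns something else, that suggests sub-word accesses are not handled like memory.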

 

Is there any minimum amount of time that I need to wait before writing the next piece of data to IO memory?

I don't want to be writing faster than the AXI bus can handle.

No worries there, the AXI bus will block your write (or read) as long as it takes for the IP to ack the transaction.

 

Best,

Herbert

-------------- Yes, I do this for fun!
Adventurer
Adventurer
3,837 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

I'd say let's do the same test as suggested with devmem, just with iowrite32, iowrite8 and ioread32

Sorry, which test do you mean? iowrite32 is not something I can invoke from the command line.

 

0 Kudos
Voyager
Voyager
3,826 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Hey @bigbrett,

 

Sorry, which test do you mean?

iowrite32 is not something I can invoke from the command line.

LOL, you really need to make up your mind here.

If you are on the Linux command line, you can run devmem (you probably need to install it)

If you are limited to the kernel driver, you can use iowrite32() and friends to test.

Whichever way you prefer to test 32-bit vs. 8-bit writes/reads, it doesn't really matter.

 

All the best,

Herbert

-------------- Yes, I do this for fun!
0 Kudos
Adventurer
Adventurer
3,825 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hpoetzl got it, I wasn't sure what you were asking me to do :). I'm not doing any devmem, and am limited to the kernel driver, so I'm currently writing up the test with iowrite/ioread's like you said. I'll report back once I get a result. Thanks again for the help!

0 Kudos
Voyager
Voyager
3,812 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

No problem!

Let me know how it goes ...

 

Best,

Herbert

-------------- Yes, I do this for fun!
0 Kudos
Scholar hbucher
Scholar
3,810 Views
Registered: ‎03-22-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@bigbrett

https://github.com/hackndev/tools/blob/master/devmem2.c

vitorian.com --- We do this for fun. Always give kudos. Accept as solution if your question was answered.
I will not answer to personal messages - use the forums instead.
Adventurer
Adventurer
3,801 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hpoetzl so unfortunately no luck, it still isn't working. I just verified again, that

  1. I am writing to the correct addresses (AXI registers)
  2. The values that I think are being written are actually being written
  3. I am reading from the correct registers

This puts me in an annoying position. I know the hardware works (bare metal tests all are correct), but for some reason I'm not getting the right answer through linux.

 

However, there is another clue: each time I read the result back from the hardware, I get a different answer. So it's not like the hardware block is returning the wrong answer, since if that were the case, it would consistently return the same wrong answer due to the nature of the algorithm. Does this give you any new ideas?

 

Does this narrow anything down? I've already chased the following potential issues:

 

Potential issue: Maybe the HLS block was constantly running, and each time I wrote a new partial word to an input register, it would attempt to rerun the algorithm.

Debug steps: I disabled auto-restart, and I do not start the block (set ap_start high) until each input is fully loaded. Didn't change anything. I also have sleep(1) peppered throughout my test code (not in the module) to ensure that I'm not reading before the result is finished (even though I check for this in my kernel write function)

 

Potential issue: Maybe I was writing to the wrong addresses

Debug steps: printed my pointer arithmetic to the kernel log, everything looked fine.
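
For the first item, the start-then-poll sequence can be sketched as below. The bit positions are an assumption: they follow the ap_ctrl_hs layout that the HLS-generated *_hw.h header usually documents for the AP_CTRL register (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, bit 7 auto_restart). run_once(), the callback types and the fake_ip model are hypothetical and exist only so the logic can be tried outside the kernel; in the driver the callbacks would wrap ioread32()/iowrite32() on vbaseaddr + XWSRSA1024_AXILITES_ADDR_AP_CTRL.

```c
#include <stdint.h>

/* Assumed AP_CTRL bits (ap_ctrl_hs block-level protocol). */
#define AP_START 0x01u
#define AP_DONE  0x02u
#define AP_IDLE  0x04u

/* Hypothetical accessors for the control register. */
typedef uint32_t (*ctrl_read_fn)(void *ctx);
typedef void (*ctrl_write_fn)(void *ctx, uint32_t value);

/* Start one run with auto-restart off, then poll for ap_done.
 * Returns 0 when done is seen, -1 if max_polls reads were not enough. */
static int run_once(ctrl_read_fn rd, ctrl_write_fn wr, void *ctx,
                    unsigned max_polls)
{
    unsigned i;

    wr(ctx, AP_START); /* bit 7 (auto_restart) stays 0 */
    for (i = 0; i < max_polls; i++) {
        if (rd(ctx) & AP_DONE)
            return 0;
    }
    return -1;
}

/* Toy device model: reports done after a fixed number of polls. */
struct fake_ip {
    unsigned polls_until_done;
    uint32_t ctrl;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_ip *ip = ctx;

    if (!(ip->ctrl & AP_START))
        return AP_IDLE;  /* never started */
    if (ip->polls_until_done > 0) {
        ip->polls_until_done--;
        return AP_START; /* still busy */
    }
    return AP_DONE;
}

static void fake_write(void *ctx, uint32_t value)
{
    struct fake_ip *ip = ctx;

    ip->ctrl = value;
}
```

Bounding the poll count keeps a wedged block from hanging the write() call forever; the result registers would only be read after run_once() returns 0.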

 

So now, I'm pretty stuck :(

 

One remaining thought: is there any guarantee that the virtual memory I allocate and assign to the HLS block's physical address range is contiguous? Because I am using the same register offsets from the base address that the physical address uses (exported from Vivado). Again, this WORKED PERFECTLY for AES and SHA256, however keep in mind that my top level interfaces were arrays for those, and not a bunch of 32-bit registers.

 

Again, below are my __init, read, and write functions. Kernel Module source code and HLS source code

static int __init wsrsa_init(void)
{
    int ret = 0; 
    printk(KERN_INFO "wsrsa1024: Initializing the wsrsa LKM\n");

    // request physical memory for driver 
    if (!request_mem_region(WSRSABASEADDR, SZ_64K, "wsrsa")) {
        printk(KERN_ALERT "wsrsa failed to request memory region\n");
        return -EBUSY;
    }
    // map reserved physical memory into virtual memory TODO dtc support
    vbaseaddr = ioremap(WSRSABASEADDR, SZ_64K);
    if (! vbaseaddr) {
        printk(KERN_ALERT "wsrsa unable to map virtual memory\n");
        release_mem_region(WSRSABASEADDR, SZ_64K);
        return -EBUSY;
    }
    printk(KERN_INFO "wsrsa1024: Virtual Address = 0x%X\n", (unsigned int)vbaseaddr);

    // Try to statically allocate a major number for the device driver
    ret = register_chrdev(MAJOR_NUM, DEVICE_NAME, &fops);
    if (ret < 0) {
        printk(KERN_ALERT "wsrsa failed to register major number %d\n",MAJOR_NUM);
        return ret;
    }
    printk(KERN_INFO "wsrsa1024: registered correctly with major number %d\n", MAJOR_NUM);    

    // Register the device class with sysfs
    wsrsacharClass = class_create(THIS_MODULE, CLASS_NAME);
    if (IS_ERR(wsrsacharClass)) {              // Check for error and clean up if there is
        unregister_chrdev(MAJOR_NUM, DEVICE_NAME);
        printk(KERN_ALERT "wsrsa1024: Failed to register device class\n");
        return PTR_ERR(wsrsacharClass);          // Correct way to return an error on a pointer
    }
    printk(KERN_INFO "wsrsa1024: device class registered correctly\n");

    // Register the driver for the device class with sysfs
    wsrsacharDevice = device_create(wsrsacharClass, NULL, MKDEV(MAJOR_NUM, 0), NULL, DEVICE_NAME);
    if (IS_ERR(wsrsacharDevice)) {             // Clean up if there is an error
        class_destroy(wsrsacharClass);           // Repeated code but the alternative is goto statements
        unregister_chrdev(MAJOR_NUM, DEVICE_NAME);
        printk(KERN_ALERT "wsrsa1024: Failed to create the device\n");
        return PTR_ERR(wsrsacharDevice);
    }
    printk(KERN_INFO "wsrsa1024: device class created correctly\n"); // Made it! device was initialized

    // init mode to ENCRYPT
    printk(KERN_INFO "wsrsa1024: initializing wsrsa block to mode ENCRYPT\n");
    mode = ENCRYPT;
    iowrite8(mode, vbaseaddr + XWSRSA1024_AXILITES_ADDR_MODE_DATA); // write new mode value to memory 

    // Disable autorestart
    iowrite8(0, vbaseaddr + XWSRSA1024_AXILITES_ADDR_AP_CTRL);

    return 0;
}


static ssize_t wsrsa_read(struct file *filep, char *buffer, size_t len, loff_t *offset)
{
    unsigned int data_out[32]; // Memory for bytes passed back to userspace

    // copy ciphertext/plaintext data from AXI memory to kmem
    //memcpy_fromio(data_out, vbaseaddr+XWSRSA1024_AXILITES_ADDR_RESULT_V_DATA, RSA_SIZE_BYTES);

    printk(KERN_INFO "RESULT = ");
    unsigned int *reg = vbaseaddr+XWSRSA1024_AXILITES_ADDR_RESULT_V_DATA;
    int i;
    for (i=0; i<32; i++)
    {
        data_out[i] = ioread32(reg++);
        printk(KERN_CONT "0x%08X, ",data_out[i]);
    }
    printk(KERN_CONT "\n");

    // Copy data_out from kmem into userspace (*to,*from,size)
    if (copy_to_user(buffer, data_out, RSA_SIZE_BYTES))
        return -EFAULT;

    printk(KERN_INFO "wsrsa1024: Copied data of length %d bytes back to userspace\n", RSA_SIZE_BYTES);
    return RSA_SIZE_BYTES;  
}



static ssize_t wsrsa_write(struct file *filep, const char *buffer, size_t len, loff_t *offset)
{  
    RSAPublic_t PublicData; // Memory for bytes passed from userspace

    // copy base,exponent,modulus from userspace-->kmem struct
    if (copy_from_user(&PublicData, buffer, sizeof(RSAPublic_t)))
        return -EFAULT;
    print_hex_dump_bytes(".base    = ",0, PublicData.base, RSA_SIZE_BYTES);
    print_hex_dump_bytes(".exp     = ",0,PublicData.exponent,RSA_SIZE_BYTES);
    print_hex_dump_bytes(".modulus = ",0,PublicData.modulus,RSA_SIZE_BYTES);

    // copy base from kmem into AXI memory WORD AT A TIME 
    int byte_offset;
    //printk(KERN_INFO "BASE WRITTEN = ");
    printk(KERN_INFO "BASE ADDRS = ");
    for (byte_offset=0; byte_offset<128; byte_offset+=4) 
    {
        iowrite32(*((unsigned int*)(PublicData.base + byte_offset)), vbaseaddr+XWSRSA1024_AXILITES_ADDR_BASE_V_DATA + byte_offset);     
        //printk(KERN_CONT "0x%08X, ",*((unsigned int*)(PublicData.base + byte_offset)));
        printk(KERN_CONT "0x%X ", vbaseaddr+XWSRSA1024_AXILITES_ADDR_BASE_V_DATA + byte_offset);
    }
    printk(KERN_CONT "\n");

    // copy exponent from kmem into AXI memory WORD AT A TIME 
    printk(KERN_INFO "EXP WRITTEN = ");
    for (byte_offset=0; byte_offset<128; byte_offset+=4) 
    {
        iowrite32(*((unsigned int*)(PublicData.exponent+ byte_offset)), vbaseaddr+XWSRSA1024_AXILITES_ADDR_PUBLEXP_V_DATA + byte_offset);     
        //printk(KERN_CONT "0x%08X, ",*((unsigned int*)(PublicData.exponent+ byte_offset)));
    }
    printk(KERN_CONT "\n");

    // copy modulus from kmem into AXI memory WORD AT A TIME 
    printk(KERN_INFO "MOD WRITTEN = ");
    for (byte_offset=0; byte_offset<128; byte_offset+=4) 
    {
        iowrite32(*((unsigned int*)(PublicData.modulus+ byte_offset)), vbaseaddr+XWSRSA1024_AXILITES_ADDR_MODULUS_V_DATA + byte_offset);     
        //printk(KERN_CONT "0x%08X, ",*((unsigned int*)(PublicData.modulus+ byte_offset)));
    }    
    printk(KERN_CONT "\n");

    // start RSA block to encrypt/decrypt
    wsrsa_runonce_blocking();

    printk(KERN_INFO "wsrsa1024: Received message of length %zu bytes from userspace\n", len);
    return len;
}

0 Kudos
Adventurer
Adventurer
3,801 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hbucher thanks, I'll see if I can integrate that into my Yocto build and poke around with it

0 Kudos
Voyager
Voyager
4,051 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Hey @bigbrett,

 

Yocto should already have recipes for devmem and devmem2.

 

Best,

Herbert

-------------- Yes, I do this for fun!
0 Kudos
Adventurer
Adventurer
4,033 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hpoetzl Ok Herbert, I am really at a loss for what is going on here.

 

I just checked yet again that the bare-metal hardware works. It does. I even printed out the EXACT values I was writing into the AXI registers, just in case I was messing up the byte endianness (since each 1024-bit input has 32 input registers, I figured maybe I was writing the words in backwards?).

 

Once I obtained the EXACT values I was writing to the input ports in my bare-metal drivers, I proceeded to write a script that manually writes to each register using devmem2. This was to be absolutely SURE that I am writing the correct values.
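
For reference, here is a minimal sketch of how a byte array maps onto 32-bit register words when the CPU is little-endian (assumed here for the Zynq's ARM cores). The helper name pack_le32 is made up for illustration and is not part of the Xilinx drivers. Fed the first modulus bytes 0x49,0xF5,0xEB,0x73, it produces 0x73EBF549, which matches the first modulus word in the devmem2 script further down in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a byte array into 32-bit words, least
 * significant byte first. This is what a little-endian CPU sees when the
 * same memory is read through a uint32_t pointer.
 * nbytes is expected to be a multiple of 4. */
static void pack_le32(const uint8_t *bytes, uint32_t *words, size_t nbytes)
{
    size_t i;

    for (i = 0; i < nbytes / 4; i++) {
        words[i] = (uint32_t)bytes[4 * i]
                 | ((uint32_t)bytes[4 * i + 1] << 8)
                 | ((uint32_t)bytes[4 * i + 2] << 16)
                 | ((uint32_t)bytes[4 * i + 3] << 24);
    }
}
```

Comparing the packed words against the values written by hand is a quick way to rule out a word-order or byte-order mistake.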

 

Once the input values (base, exponent, modulus, and operating mode) are loaded, I write 0x1 to the ap_ctrl register to start the block.

 

The answer is STILL different every time.

 

I have NO IDEA what is going on here. Again, the hardware WORKS when I write data to it using the Xilinx bare-metal drivers. But under Linux, even when directly writing to physical memory using devmem2, everything breaks. And I know it's most likely not the hardware, because it returns a DIFFERENT incorrect answer every time.

 

So I have absolutely no idea how to proceed with debugging. For reference, let's compare my bare-metal driver with the Linux devmem2 method:

 

Bare metal Encryption function:

 

uint8_t privexp_arr[] = {0xA1,0x11,0xAD,0xAD,0x48,0x88,0xF5,0x2D,0x35,0xF5,0x42,0x8E,0x39,0x39,0x68,0x06,0xBE,0x32,0x52,0x5C,0xDA,0x2B,0xF2,0x2A,0x27,0x58,0x1B,0xDE,0xEE,0x18,0x63,0x92,0xD8,0x9F,0x02,0x2C,0xFB,0xDF,0x77,0xE6,0x1F,0xDB,0xDC,0x84,0x6C,0x90,0x38,0xA0,0x8D,0x8A,0xEB,0x5C,0x2A,0xF7,0xCC,0x25,0x9D,0x62,0xBA,0xB5,0xB2,0xB8,0x7B,0xCD,0x66,0xD6,0x77,0xD5,0x32,0x9D,0xF1,0x98,0x9C,0xB1,0xAC,0x50,0x23,0x7C,0xCF,0x28,0x69,0x32,0xD9,0x3A,0x21,0x82,0x9D,0xE0,0xE1,0xBA,0x12,0x3C,0x79,0x95,0x10,0x7A,0x50,0x6E,0xA2,0x91,0x87,0x04,0x2B,0x6F,0xE4,0x8C,0x05,0x51,0x31,0x81,0x50,0xE9,0x52,0x69,0x09,0xCF,0x68,0x1D,0x74,0x88,0x6B,0x17,0x43,0xE8,0xFD,0x9C,0x7B,0x04};
uint32_t publexp_arr [] = {0x10001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
uint8_t modulus_arr[] = {0x49,0xF5,0xEB,0x73,0x5B,0x82,0x9C,0xEB,0x4B,0xC2,0xAF,0x74,0x64,0x29,0x38,0xA8,0xAF,0x7E,0xA4,0x77,0xBA,0x9C,0x79,0xB6,0x9B,0x5E,0x65,0xBC,0xBA,0x74,0x84,0x3E,0x84,0xBF,0x5C,0xD4,0xD1,0xF4,0xEC,0xD4,0x83,0x3D,0xC6,0x9B,0x7B,0x52,0x5C,0x2F,0x25,0x79,0x6D,0x21,0x79,0xB3,0x31,0x7A,0x0D,0xAD,0xB1,0xB9,0xDC,0x5F,0xE5,0x3D,0x13,0x21,0xF6,0xFB,0x97,0x1A,0xFB,0xB9,0x7F,0x4D,0x26,0x0F,0x10,0x37,0xEA,0xEA,0xEC,0x97,0xA4,0x79,0x37,0xFB,0x62,0x33,0x9E,0xB3,0x28,0xC4,0x30,0x8A,0xA6,0x94,0x9A,0x9F,0x0D,0xDF,0xE2,0xF5,0xB4,0x1F,0x25,0x4F,0xE1,0x6F,0x35,0xBF,0x82,0xBF,0xE6,0xA2,0xA0,0x15,0x80,0xA1,0x69,0x97,0xD8,0x3D,0x85,0x88,0x9E,0x88,0x4D,0xD9};
const uint8_t ciphertext_golden_ans[] = {0xF0,0xCA,0x37,0xC7,0xFA,0x38,0xB3,0xDF,0x00,0xA6,0xFA,0x10,0x14,0xEA,0xD7,0x36,0x83,0x61,0x5F,0x12,0x29,0x6C,0x19,0xC3,0x3A,0xC6,0x03,0xC9,0x74,0xF2,0x9E,0x57,0x68,0x2C,0xA8,0xAD,0xE6,0xAF,0x27,0x35,0xEF,0xD6,0x33,0x34,0xA8,0x0F,0x8E,0x2D,0x84,0xA5,0xA9,0xF3,0xC6,0x9A,0xF7,0xC9,0xB6,0x9B,0x12,0x0E,0xF3,0x40,0x6E,0x8E,0x2A,0x40,0x4B,0x6C,0x63,0x6B,0x42,0xEC,0xE6,0xB5,0x2E,0x1D,0x5A,0x95,0xFF,0x8E,0xAF,0xB3,0x24,0x8D,0x88,0x01,0x61,0x42,0x1D,0xA9,0x80,0x93,0xD2,0xE9,0x04,0x30,0x63,0x43,0x16,0xC1,0xD0,0xCC,0xFD,0xD1,0xA0,0xA8,0xC3,0xD0,0x73,0xF6,0x66,0x38,0x95,0x42,0xA1,0x75,0x77,0xD1,0xE2,0xBB,0xB8,0x49,0x7B,0x78,0x6F,0x66,0x44,0x93};
const uint32_t plaintext_golden_ans[] = {0x726C6421,0x2C20576F,0x656C6C6F,0x00000048,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};

int32_t wsrsa_encrypt(uint8_t* plaintext, uint8_t* publexp, uint8_t*modulus,  uint8_t* ciphertext)
{
	// Set Base
	XWsrsa1024_Base_v plaintext_st;
	memcpy(&plaintext_st, plaintext, sizeof(XWsrsa1024_Base_v));
	XWsrsa1024_Set_base_V(&xrsamodexp,plaintext_st);

	// Set public exponent
	XWsrsa1024_Publexp_v publexp_st;
	memcpy(&publexp_st, publexp, sizeof(XWsrsa1024_Publexp_v));
	XWsrsa1024_Set_publexp_V(&xrsamodexp, publexp_st);

	// Set Modulus
	XWsrsa1024_Modulus_v modulus_st;
	memcpy(&modulus_st, modulus, sizeof(modulus_st));
	XWsrsa1024_Set_modulus_V(&xrsamodexp, modulus_st);

	// Create empty result struct and initialize it to zero
	XWsrsa1024_Result_v ciphertext_st;
	memset(&ciphertext_st, 0, sizeof(XWsrsa1024_Result_v));

	// Print input data for debugging
	xil_printf("BASE DATA = \n");   printBaseData(&xrsamodexp,plaintext_st);
	xil_printf("EXPO DATA = \n");   printExpData(&xrsamodexp,publexp_st);
	xil_printf("MODU DATA = \n");   printModData(&xrsamodexp,modulus_st);

	// Set mode to encrypt
	XWsrsa1024_Set_mode(&xrsamodexp,ENCRYPT);

	// Start hardware block
	XWsrsa1024_Start(&xrsamodexp);
	// wait for result
	while( !XWsrsa1024_IsDone(&xrsamodexp));

	// read back data into local buffer
	ciphertext_st = XWsrsa1024_Get_result_V(&xrsamodexp);

	// copy local struct data into user buffer
	memcpy(ciphertext, &ciphertext_st, sizeof(XWsrsa1024_Result_v));

	// compare result against golden truth data, and fail if its wrong
	if (memcmp(ciphertext, ciphertext_golden_ans, sizeof(XWsrsa1024_Result_v)) )
	{
		printf("ERROR, CIPHERTEXT IS INCORRECT\n");
		return XST_FAILURE;
	}
	else
		return XST_SUCCESS;
}

int32_t rsa_test(void)
{
	uint8_t result[RSA_NUM_BYTES];
	// initialize RSA block
	uint32_t ret = rsa_init();
	if (ret != XST_SUCCESS)
	        xil_printf("RSA init error!\n");

	// Test public encryption on plaintext, comparing against known ciphertext
	ret = wsrsa_encrypt((uint8_t *)plaintext_golden_ans, (uint8_t *)publexp_arr, modulus_arr, result);
	xil_printf("Enc result = "); printHex(result,RSA_NUM_BYTES);
        return ret;
}

 

And with devmem2:

#!/bin/sh

# Set mode to 0
devmem2 0x43c00010 b 0

# WRITE BASE WORD-BY-WORD
devmem2 0x43C00018 w 0x726C6421
devmem2 0x43C0001C w 0x2C20576F
devmem2 0x43C00020 w 0x656C6C6F
devmem2 0x43C00024 w 0x00000048
devmem2 0x43C00028 w 0x00000000
devmem2 0x43C0002C w 0x00000000
devmem2 0x43C00084 w 0x00000000
# ....
#  zero writes through 0x43C00094 ....
# ....
devmem2 0x43C00094 w 0x00000000

# WRITE EXPONENT WORD-BY-WORD
devmem2 0x43C0009C w 0x00010001
devmem2 0x43C000A0 w 0x00000000
devmem2 0x43C000A4 w 0x00000000
# ....
# zero writes through 0x43C00118 ....
# ....
devmem2 0x43C00118 w 0x00000000

# WRITE MODULUS WORD-BY-WORD
devmem2 0x43C00120 w 0x73EBF549
devmem2 0x43C00124 w 0xEB9C825B
devmem2 0x43C00128 w 0x74AFC24B
devmem2 0x43C0012C w 0xA8382964
devmem2 0x43C00130 w 0x77A47EAF
devmem2 0x43C00134 w 0xB6799CBA
devmem2 0x43C00138 w 0xBC655E9B
devmem2 0x43C0013C w 0x3E8474BA
devmem2 0x43C00140 w 0xD45CBF84
devmem2 0x43C00144 w 0xD4ECF4D1
devmem2 0x43C00148 w 0x9BC63D83
devmem2 0x43C0014C w 0x2F5C527B
devmem2 0x43C00150 w 0x216D7925
devmem2 0x43C00154 w 0x7A31B379
devmem2 0x43C00158 w 0xB9B1AD0D
devmem2 0x43C0015C w 0x3DE55FDC
devmem2 0x43C00160 w 0xFBF62113
devmem2 0x43C00164 w 0xB9FB1A97
devmem2 0x43C00168 w 0x0F264D7F
devmem2 0x43C0016C w 0xEAEA3710
devmem2 0x43C00170 w 0x79A497EC
devmem2 0x43C00174 w 0x3362FB37
devmem2 0x43C00178 w 0xC428B39E
devmem2 0x43C0017C w 0x94A68A30
devmem2 0x43C00180 w 0xDF0D9F9A
devmem2 0x43C00184 w 0x1FB4F5E2
devmem2 0x43C00188 w 0x6FE14F25
devmem2 0x43C0018C w 0xBF82BF35
devmem2 0x43C00190 w 0x15A0A2E6
devmem2 0x43C00194 w 0x9769A180
devmem2 0x43C00198 w 0x88853DD8
devmem2 0x43C0019C w 0xD94D889E

# disable autorestart
devmem2 0x43C00000 b 0

# start block
devmem2 0x43C00000 b 1

# read back the results to the console
devmem2 0x43C001A4 w
devmem2 0x43C001A8 w
devmem2 0x43C001AC w
devmem2 0x43C001B0 w
devmem2 0x43C001B4 w
devmem2 0x43C001B8 w
devmem2 0x43C001BC w
devmem2 0x43C001C0 w
devmem2 0x43C001C4 w
devmem2 0x43C001C8 w
devmem2 0x43C001CC w
devmem2 0x43C001D0 w
devmem2 0x43C001D4 w
devmem2 0x43C001D8 w
devmem2 0x43C001DC w
devmem2 0x43C001E0 w
devmem2 0x43C001E4 w
devmem2 0x43C001E8 w
devmem2 0x43C001EC w
devmem2 0x43C001F0 w
devmem2 0x43C001F4 w
devmem2 0x43C001F8 w
devmem2 0x43C001FC w
devmem2 0x43C00200 w
devmem2 0x43C00204 w
devmem2 0x43C00208 w
devmem2 0x43C0020C w
devmem2 0x43C00210 w
devmem2 0x43C00214 w
devmem2 0x43C00218 w
devmem2 0x43C0021C w
devmem2 0x43C00220 w

 

Every time I call the devmem2 script, the results are different. But I can loop my bare-metal test program, constantly setting ap_start, and the result never changes (as it shouldn't). I have absolutely no idea how to debug this any further :(
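
One visible difference between the two listings: the bare-metal function waits on XWsrsa1024_IsDone() before reading the result, while the script reads the result registers straight after writing ap_start. Whether that explains the behaviour is not established in this thread, but a polled version is easy to sketch. The code below assumes the usual HLS ap_ctrl layout (bit 0 = ap_start, bit 1 = ap_done); the helper names are hypothetical, the offsets are the ones used in the script, and regs is assumed to point at the mapped register block (for example from an mmap() of /dev/mem at 0x43C00000).

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offsets relative to the block's base, as used in the script above. */
#define RSA_AP_CTRL 0x000u
#define RSA_RESULT  0x1A4u

/* Assumed ap_ctrl bits (typical for HLS-generated AXI4-Lite slaves). */
#define AP_START 0x1u
#define AP_DONE  0x2u

/* Set ap_start. */
static void rsa_start(volatile uint32_t *regs)
{
    regs[RSA_AP_CTRL / 4] = AP_START;
}

/* Poll ap_done. Returns 0 once it is set, -1 after max_polls failed reads. */
static int rsa_wait_done(volatile uint32_t *regs, unsigned long max_polls)
{
    while (!(regs[RSA_AP_CTRL / 4] & AP_DONE)) {
        if (max_polls-- == 0)
            return -1;
    }
    return 0;
}

/* Copy nwords 32-bit result words out of the block. */
static void rsa_read_result(volatile uint32_t *regs, uint32_t *out, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        out[i] = regs[RSA_RESULT / 4 + i];
}
```

The intended call order is rsa_start(), then rsa_wait_done(), then rsa_read_result() with nwords = 32, and only when the wait returned 0.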

 

@hbucher maybe you have some wisdom to drop here as well?

 

Any and all help appreciated,

Brett

0 Kudos
Scholar hbucher
Scholar
4,026 Views
Registered: ‎03-22-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@bigbrett Make sure your kernel is compiled with CONFIG_STRICT_DEVMEM.

https://github.com/torvalds/linux/blob/f986e31bb4d0dba0a10adc51308bf9de2d0e7e4a/drivers/char/mem.c#L62

Make sure address translation is correct because by default linux on x86 will allow only access to PCIe areas (not actual memory).

https://lwn.net/Articles/267427/

Make sure the translation from PCIe to AXI address spaces on the FPGA side is correct. 

https://www.xilinx.com/support/documentation/ip_documentation/xdma/v3_0/pg195-pcie-dma.pdf   (page 74)

If all else fails, try to set a debug ILA on the AXI channel coming out of the PCI subsystem.

vitorian.com --- We do this for fun. Always give kudos. Accept as solution if your question was answered.
I will not answer to personal messages - use the forums instead.
0 Kudos
Voyager
Voyager
4,011 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

Hey @bigbrett,

 

"I am really at a loss for what is going on here."

Sorry to hear, but I haven't given up yet :)

 

"But under linux, even when directly writing to physical memory using devmem2, everything breaks."

Hmm, can you show me an example of this?

 

"So I have absolutely no idea how to proceed with debugging ..."

For me, it still looks like an issue with the way you write and read the data, but the project is quite complex so it would be definitely a good idea to make a really simple (i.e. trivial) example which contains all the elements of your design without the complexity (reduce interface to one or two words, remove complex calculations, etc).
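
As a rough idea of what such a trivial test could look like on the software side, here is a minimal readback sketch: write a distinct pattern to a handful of registers, read them back, and count mismatches. The function name and the pattern are made up for illustration, and regs is assumed to point at the already-mapped register block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: write a distinct value to nwords consecutive 32-bit
 * registers starting at byte offset byte_off, read them back, and return
 * how many words differ from what was written. */
static size_t readback_mismatches(volatile uint32_t *regs, uint32_t byte_off,
                                  size_t nwords)
{
    volatile uint32_t *p = regs + byte_off / 4;
    size_t i, bad = 0;

    /* Write everything first so a write to one register that disturbs
     * another one shows up in the readback pass. */
    for (i = 0; i < nwords; i++)
        p[i] = 0xA5A50000u | (uint32_t)i;

    for (i = 0; i < nwords; i++) {
        if (p[i] != (0xA5A50000u | (uint32_t)i))
            bad++;
    }
    return bad;
}
```

Run against ordinary memory this returns 0, so a non-zero count on the real block would point at the register interface rather than at the calculation.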

 

Best,

Herbert

0 Kudos
Adventurer
Adventurer
4,009 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hbucher Thanks for the response.

 

Rather than muck around with the first two suggestions (I'm on the zynq by the way, which is an ARM not x86), I thought it would be prudent to jump right to the last suggestion, since it would provide the most information as to what is going on, and should reveal any errors.

 

As per your recommendation, I placed an ILA on the axi channel between the HLS block and the axi interconnect. I then ran the bare-metal test program to see how it should look. Everything looked fine (obviously) in bare metal.

 

However, when I booted into linux and ran the test program for my kernel module, the ILA showed that the writes to the input ports and  ap_start looked perfect, and yet the result was still wrong! I am literally watching the exact same sequence of writes to the input ports and to ap_ctrl regs, with correct data values AND correct addresses, for both cases (bare-metal and linux). However, the HLS block returns different values under Linux than in bare-metal!!! And the incorrect values on the ILA are the same incorrect values that my kernel module receives!! There is nothing in either the kernel module or the test program that touches the AXI bus in between the reads/writes based on what I can see on the ILA. It is write, start, poll ap_ctrl for done signal, then read for BOTH Linux and bare-metal. Yet one case returns the correct answer, and the other returns an incorrect answer that is different every time.

 

My mind is blown right now. I have absolutely no idea how to proceed. Happy to post waveform screenshots for proof, however I promise you that they look identical, except for the reads on the result.

0 Kudos
Adventurer
Adventurer
4,008 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hpoetzl just saw your comment. See my response to @hbucher above.

0 Kudos
Scholar hbucher
Scholar
4,003 Views
Registered: ‎03-22-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@bigbrett I was referring to the PCI-express host being x86 and the PCIe endpoint being ARM/Zynq/Linux, which is usually the case.

So your ARM/Linux board is working as PCIe root? 

0 Kudos
Adventurer
Adventurer
3,993 Views
Registered: ‎07-08-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@hbucher I'm not familiar with PCI express, and I don't believe I'm using it.

 

I'm building the kernel module and test on a host machine (x86) using Yocto, but am copying the binaries over to the Zynq (running Linux) via tftp and running it live on hardware with no emulation. I just have the Zynq connected to the host computer via a TTY console.

 

In this case, I think the answer is yes, my ARM/Linux board is PCIe root? Although I didn't know the zedboard had a PCIe slot. Pretty sure I'm using standard on-board DDR memory

0 Kudos
Voyager
Voyager
3,993 Views
Registered: ‎06-24-2013

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@bigbrett: Yeah, please upload all the data you gathered, ideally in some kind of wiki or packaged in an archive (tar/zip) with a readme to explain what is what.

 

@hbucher: As far as I understand, it's just a ZYNQ with Linux/Bare Metal on the PS (arm cores) ... no x86 or PCIe involved :)

 

Thanks,

Herbert

0 Kudos
Scholar hbucher
Scholar
3,969 Views
Registered: ‎03-22-2016

Re: memcpy_to_io() on axi4-lite slave address range behaving oddly in my device driver

@bigbrett I don't understand these lines.

What is the type of XWsrsa1024_Base_v and the prototype of XWsrsa1024_Set_base_V?

 

XWsrsa1024_Base_v plaintext_st;
memcpy(&plaintext_st, plaintext, sizeof(XWsrsa1024_Base_v));
XWsrsa1024_Set_base_V(&xrsamodexp,plaintext_st); 

0 Kudos