Visitor alondight
Registered: 12-28-2012

HLS: how to have byte-wise control on the address in memcpy

I am writing a module that uses memcpy to access a bus. However, I have no idea how to transfer data to a region on the bus whose starting address can be controlled at byte granularity. Here is an example:

 

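#include <string.h>

void data_move( volatile int* bus, volatile int* addr_fifo )
{
#pragma AP interface ap_bus port=bus
#pragma AP resource core = AXI4M variable=bus
#pragma AP interface ap_fifo port=addr_fifo

    int a[1000];

    // byte address supplied at run time, e.g. 0xC0000001
    // (unsigned, because such values exceed INT_MAX)
    unsigned int addr = (unsigned int)*addr_fifo;

    // bus is an int*, so the offset is counted in 4-byte words:
    // the two low-order bits of addr are discarded by this division
    memcpy( (int *)( bus + addr / sizeof(int) ), a, 1000 * sizeof(int) );
}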

 

Because the AXI4 adapter generated by Vivado multiplies the address offset on the bus by sizeof(int), I pre-divide "addr" in the code. However, "addr" comes from outside and can be a value such as 0xC0000001; the two low-order bits are lost in the division, so the address is misinterpreted as 0xC0000000. Could anyone tell me how to solve this problem? Thanks very much.
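A small plain-C model of the round trip described above, assuming the adapter simply computes base + word_offset * element_size with a base of 0 (effective_address and lost_bytes are hypothetical helper names, not part of Vivado HLS):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the address path, assuming the generated
   adapter computes word_offset * elem_size (base address taken as 0). */

/* Byte address that actually reaches the bus after the C code pre-divides
   byte_addr and the adapter multiplies the word offset back up. */
static uint32_t effective_address(uint32_t byte_addr, uint32_t elem_size)
{
    assert(elem_size != 0);
    uint32_t word_offset = byte_addr / elem_size;   /* what the C code passes */
    return word_offset * elem_size;                 /* what the adapter emits */
}

/* Low-order part of the address that the round trip silently drops. */
static uint32_t lost_bytes(uint32_t byte_addr, uint32_t elem_size)
{
    assert(elem_size != 0);
    return byte_addr % elem_size;
}
```

For 0xC0000001 with a 4-byte int, effective_address returns 0xC0000000 and lost_bytes returns 1, which is exactly the misinterpretation described.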

 

PS: Someone may suggest using the parameter C_M_AXI_BUS_TARGET_ADDR for byte-wise control; however, this parameter only sets the starting address statically and cannot be adjusted at run time.

Instructor
Registered: 08-14-2007

Re: HLS: how to have byte-wise control on the address in memcpy

It isn't clear that you can use memcpy when the source and destination addresses don't align to the same byte of a word. Normally, to transfer from an odd location to another odd location, you'd copy the loose bytes at the start and end in a program loop and use memcpy to transfer the bulk of the data, starting and ending on a word boundary.

In your case, it seems that only one of the addresses is at an odd location, so a lot of shifting would be needed to move the data into an aligned location at the other end. I'm pretty sure memcpy won't do that for you.
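A plain-C sketch of the head/bulk/tail split described above, assuming a 4-byte word and that source and destination share the same offset within a word (copy_head_bulk_tail is a hypothetical name; this is software-side C, not HLS code, and a CPU memcpy handles misalignment on its own, so it only illustrates how the work is divided):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_SIZE sizeof(uint32_t)   /* assumed 4-byte bus word */

/* Copy the loose leading bytes one at a time, move the word-aligned
   bulk with memcpy, then copy the loose trailing bytes. */
static void copy_head_bulk_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Only valid when both pointers share the same misalignment. */
    assert(((uintptr_t)dst % WORD_SIZE) == ((uintptr_t)src % WORD_SIZE));

    /* Loose bytes at the start, up to the first word boundary. */
    size_t head = (WORD_SIZE - (uintptr_t)dst % WORD_SIZE) % WORD_SIZE;
    if (head > n)
        head = n;
    for (size_t i = 0; i < head; i++)
        dst[i] = src[i];

    /* Whole words in the middle. */
    size_t bulk = (n - head) / WORD_SIZE * WORD_SIZE;
    memcpy(dst + head, src + head, bulk);

    /* Loose bytes at the end. */
    for (size_t i = head + bulk; i < n; i++)
        dst[i] = src[i];
}
```

When only one side is misaligned, as in the original question, the assertion fails: that case needs the extra shifting Gabor mentions, which this split alone does not provide.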

 

-- Gabor

Xilinx Employee
Registered: 08-17-2011

Re: HLS: how to have byte-wise control on the address in memcpy

I think Gabor is correct.

 

Unless I'm mistaken, a pointer to an integer (int *) is always 32-bit aligned, i.e., on a 4-byte boundary.

So to get what you want, I'd use a byte pointer, i.e., (char *).

Those are the C rules that apply here.
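A plain-C sketch of the difference between the two pointer types, assuming a 4-byte int (int_at and byte_at are hypothetical helper names): with int * the offset is counted in ints and must be rounded down, while char * reaches any byte.

```c
#include <stddef.h>

/* With int *, pointer arithmetic counts in ints, so a byte offset
   has to be rounded down to a multiple of sizeof(int). */
static int *int_at(int *base, size_t byte_offset)
{
    return base + byte_offset / sizeof(int);   /* rounds down to an int boundary */
}

/* With char *, pointer arithmetic counts in bytes, so any offset works. */
static char *byte_at(int *base, size_t byte_offset)
{
    return (char *)base + byte_offset;         /* exact byte */
}
```

Asking for byte offset 5 gives byte 4 through int_at but byte 5 through byte_at.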

 

Then I'm assuming the generated RTL will take care of this for you, i.e., generate the correct addresses.

I can't say what happens without looking into more details or actually trying it out.

 

I'm always assuming the address values I get are the ones generated by the hypothetical processor I would connect to.

HTH

- Hervé

Visitor alondight
Registered: 12-28-2012

Re: HLS: how to have byte-wise control on the address in memcpy

Thank you very much, Gabor. Here is the real issue I am encountering. I have a 256-bit AXI4 bus in the platform. I have computed an integer array in the Vivado_hls-synthesized pcore. I want to use memcpy to transfer the array to DRAM via the AXI4 bus at the maximum possible throughput, i.e., 8 integers per clock cycle. Here is how I try to implement it:

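#include <string.h>
#include "ap_int.h"

typedef ap_uint<256> uint256;   // one 256-bit bus word = 8 ints

// user-defined helpers (bodies not shown)
void compute_on( int a[1000] );
void reorder_data_layout_from_to( int a[1000], uint256 b[126], int slot_offset );

void pcore( volatile uint256* bus, volatile int* addr_fifo )
{
#pragma AP interface ap_bus port=bus
#pragma AP resource core = AXI4M variable=bus
#pragma AP interface ap_fifo port=addr_fifo

    int a[1000];
    compute_on( a );

    uint256 b[126];   // 125 words of data plus one spare word for the shifted layout

    unsigned int addr = (unsigned int)*addr_fifo;   // byte address, e.g. 0xC0000010

    // int slot (0..7) at which addr falls inside its 256-bit word
    int buf_offset = ( addr % sizeof(uint256) ) / sizeof(int);

    // pack a[] into b[] so that a[0] lands in slot buf_offset of b[0]
    reorder_data_layout_from_to( a, b, buf_offset );

    int length = ( 1000 + buf_offset ) * sizeof(int);

    // bus is a uint256*, so the offset is counted in 32-byte words and the
    // sub-word part of addr (0x10 in the example) is discarded here
    memcpy( (uint256 *)( bus + addr / sizeof(uint256) ), b, length );
}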

I aligned both addresses in memcpy so that no shifting is needed. However, when "addr" is something like 0xC0000010, which is not aligned on a uint256 (32-byte) boundary, the offset 0x10 still gets lost. There is useful data stored in the range 0xC0000000 ~ 0xC0000010, and I want my data to be placed right after that range without losing anything before the starting address. What should I do? Thank you very much!
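One common way to keep the data in front of the starting address intact, not something proposed in this thread, is a read-modify-write of the partially covered 256-bit words at the head and tail of the transfer: read the existing word, overwrite only the int slots being written, and write the whole word back. The plain-C model below sketches the idea under stated assumptions: word256 and write_ints_rmw are hypothetical names, memory is modelled as an array of 8-int words, and nothing else is assumed to write the same words concurrently. AXI4 also defines per-byte write strobes (WSTRB) that can mask bytes within a beat, but whether the memcpy-based interface lets you drive them would need checking against the Vivado HLS documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTS_PER_WORD 8   /* one 256-bit bus word = 8 x 32-bit ints */

/* Hypothetical stand-in for uint256: one bus word as 8 int lanes. */
typedef struct { uint32_t lane[INTS_PER_WORD]; } word256;

/* Write n ints from src into mem, starting at int index first
   (= byte address / 4), without disturbing any other int in mem.
   Words fully covered by the transfer are written whole; the partial
   words at either end are read, merged and written back. */
static void write_ints_rmw(word256 *mem, size_t first, const uint32_t *src, size_t n)
{
    assert(mem != NULL && (src != NULL || n == 0));
    size_t end = first + n;                          /* one past the last int index */
    for (size_t w = first / INTS_PER_WORD; w * INTS_PER_WORD < end; w++) {
        size_t lo = w * INTS_PER_WORD;               /* first int index in word w */
        int partial = lo < first || lo + INTS_PER_WORD > end;
        word256 word = {{0}};
        if (partial)
            word = mem[w];                           /* read: keep neighbouring data */
        for (size_t i = 0; i < INTS_PER_WORD; i++) {
            size_t idx = lo + i;
            if (idx >= first && idx < end)
                word.lane[i] = src[idx - first];     /* modify: only our slots */
        }
        mem[w] = word;                               /* write: one full-width word */
    }
}
```

Only the first and last words need the extra read; the fully covered middle words can still go out as full-width writes, so the cost over a plain copy is at most two additional read beats per transfer.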

 

 

Visitor alondight
Registered: 12-28-2012

Re: HLS: how to have byte-wise control on the address in memcpy

Thank you very much, Hervé. Using "char*" gives the correct behavior, but the data transfer throughput drops from 32 bits/cycle to 8 bits/cycle. In fact, I want to reach a throughput of 256 bits/cycle by using uint256 (more details are in my reply to Gabor). Is there any way to get Vivado HLS to generate the required RTL for me? Thank you very much!
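The throughput numbers above translate directly into bus beats. A minimal sketch, assuming one beat per clock cycle and ignoring burst setup, handshake stalls and memory latency (beats_needed is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Idealised number of bus beats (cycles, at one beat per cycle) needed to
   move n_bytes over a bus carrying bits_per_cycle bits per beat,
   rounded up to whole beats. */
static uint32_t beats_needed(uint32_t n_bytes, uint32_t bits_per_cycle)
{
    assert(bits_per_cycle > 0 && bits_per_cycle % 8 == 0);
    uint32_t bytes_per_beat = bits_per_cycle / 8;
    return (n_bytes + bytes_per_beat - 1) / bytes_per_beat;
}
```

For the 1000-int (4000-byte) array this gives 4000 beats with char, 1000 with int and 125 with uint256, which is why the byte-pointer workaround is unattractive here.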

Xilinx Employee
Registered: 11-11-2012

Re: HLS: how to have byte-wise control on the address in memcpy

Considering that the AXI bus is only 32 bits (or 64 bits) wide, there is no way to get a peak throughput of 256 bits per cycle in your system.

 

Here are some rules:

1. Under C syntax, the offset you add to "bus" is counted in units of sizeof(*bus), the size of the element "bus" points to. This is a rule of the C language.

    e.g., memcpy(bus + 1, ..., ...) copies data to BASE_ADDR + 1 * sizeof(*bus).

    So there is no way to make the step size of memcpy smaller than sizeof(*bus).

2. You may use the "BASE_ADDR" parameter for an unaligned starting address, but as you mentioned, it is not a run-time parameter. Using an unaligned byte address may also cause serious performance issues.

3. The peak throughput is limited by the bus bit width in your case. Using a 256-bit interface may bring some internal performance benefit, but probably not as much as the 256/64 ratio suggests.
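Rule 1 applied to the 256-bit case can be checked in plain C. This sketch assumes a 32-byte struct as a stand-in for uint256 (word256 and byte_offset_of are hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread's uint256: 8 x 32-bit lanes = 32 bytes. */
typedef struct { uint32_t lane[8]; } word256;

/* Byte offset that "bus + k" refers to. Pointer arithmetic scales k by
   sizeof(*bus), so the result is always a multiple of 32 here and a
   16-byte offset such as 0x10 cannot be expressed. */
static size_t byte_offset_of(const word256 *bus, size_t k)
{
    return (size_t)((const char *)(bus + k) - (const char *)bus);
}
```

Every reachable offset is 0, 32, 64, and so on, which is why the 0x10 in 0xC0000010 disappears once the address goes through a uint256 pointer.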
