Participant valato
1,150 Views
Registered: 09-29-2017

Memcpy timing


I am working with a Zynq 7020 in HLS and have run into a timing problem: a 'mul' operation takes 6.38 ns, which exceeds the effective budget (5.83 ns in my case, for a 150 MHz clock with a 6.67 ns period). Single-cycle multiplications that use a single DSP usually take 6.38 ns. For regular multiplications I can usually find a workaround (a higher latency, or wider operands that use 2 DSPs), but I am stuck when it comes to memcpy:

#include <string.h> //* memcpy
#include <ap_int.h> //* ap_uint

static ap_uint<32> fontBuf[20][(16*16*20)/32];
unsigned int top(ap_uint<32> * input)
{
#pragma HLS INTERFACE m_axi depth=1650 port=input offset=slave bundle=MASTER_AXI //* 10 instructions
#pragma HLS ARRAY_PARTITION variable=fontBuf complete dim=1
    for (ap_uint<5> i = 0; i < 20; i++, input += (16*16*20)/32) //* this line is marked as critical 6.38ns 'mul' operation after Synthesis
    {
        memcpy(fontBuf[i], input, 4*(16*16*20)/32);
    }
    //* following code is just to consume data somehow
    unsigned int count = 0;
    for (int i = 0; i < 20; i++)
    {
        for (int j = 0; j < (16*16*20)/32; j++)
        {
            if (fontBuf[i][j] != 0) count++;
        }
    }
    return count;
}

The only solution I have found so far to get rid of the multiplication issue is to completely unroll the first loop. But that consumes a lot of resources, because it creates 20 'gmem' instances of the memcpy pipeline. Another solution seems to be reading the data sequentially (without bursts), but that does not work in my overall design, since all those requests would overload the Zynq PS HP port, which is also used by other IPs.

Is there a way to persuade HLS to perform the memcpy multiplication with a larger latency? Or is there another way to solve it?
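
For reference, the budget figures quoted in this post can be reproduced with a little arithmetic. The sketch below assumes that the gap between the 6.67 ns period and the 5.83 ns budget is a 12.5% clock uncertainty margin. That value is consistent with the numbers in the post but is not stated in it, and the function names are made up for illustration.

```cpp
#include <cassert>

// Clock period in nanoseconds for a frequency given in MHz.
double period_ns(double freq_mhz) {
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

// Effective per-cycle budget after reserving part of the period as
// clock uncertainty. The 12.5% default is an assumption that matches
// the 6.67 ns -> 5.83 ns figures in the post.
double effective_budget_ns(double freq_mhz, double uncertainty_fraction = 0.125) {
    return period_ns(freq_mhz) * (1.0 - uncertainty_fraction);
}

// True when an operation with the given delay fits into one cycle.
bool fits_in_one_cycle(double delay_ns, double freq_mhz) {
    return delay_ns <= effective_budget_ns(freq_mhz);
}
```

With these assumptions, fits_in_one_cycle(6.38, 150.0) is false, which is the violation reported for the 'mul' operation.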

----- Please mark the post as an answer "Accept as solution" in case it helped to solve your problem. Give kudos in case the post guided you to the solution.

Accepted Solutions
Participant valato
1,506 Views
Registered: 09-29-2017

Re: Memcpy timing


User @u4223374 actually managed to solve my problem in a different thread, "multiplication timing".

Setting the global directive:
config_core {DSP48, Latency=3}
solved my problem. It is a pity that it cannot be applied to only a particular region or operation.
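
Because the setting is global, another option might be to avoid the address multiplication altogether by copying everything in one flat loop and tracking the row and column with counters. The following is an untested sketch, not a verified fix: plain uint32_t stands in for ap_uint<32> so that it compiles without the HLS headers, all function names are hypothetical, and whether HLS infers a single burst here and drops the 'mul' is an assumption that would need to be checked in the synthesis report.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Dimensions taken from the original post.
const int ROWS = 20;
const int WORDS_PER_ROW = (16 * 16 * 20) / 32;  // 160 words per row

// Reference: the per-row memcpy from the original post.
void copy_with_memcpy(uint32_t dst[ROWS][WORDS_PER_ROW], const uint32_t *input) {
    for (int i = 0; i < ROWS; i++, input += WORDS_PER_ROW) {
        memcpy(dst[i], input, sizeof(uint32_t) * WORDS_PER_ROW);
    }
}

// Sketch: one flat loop over all words. The row and column are advanced
// by counters, so no per-row offset has to be computed by multiplication.
void copy_flat(uint32_t dst[ROWS][WORDS_PER_ROW], const uint32_t *input) {
    int row = 0;
    int col = 0;
    for (int k = 0; k < ROWS * WORDS_PER_ROW; k++) {
#pragma HLS PIPELINE II=1
        dst[row][col] = input[k];
        if (++col == WORDS_PER_ROW) {
            col = 0;
            row++;
        }
    }
    // Every row has been filled exactly once.
    assert(row == ROWS && col == 0);
}

// Software check that both variants fill the buffer identically.
bool flat_copy_matches_memcpy() {
    static uint32_t input[ROWS * WORDS_PER_ROW];
    static uint32_t a[ROWS][WORDS_PER_ROW];
    static uint32_t b[ROWS][WORDS_PER_ROW];
    for (int k = 0; k < ROWS * WORDS_PER_ROW; k++) {
        input[k] = static_cast<uint32_t>(k) * 2654435761u + 1u;
    }
    copy_with_memcpy(a, input);
    copy_flat(b, input);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The check only confirms that the two copies are functionally equivalent in C simulation. It says nothing about timing, which still has to be confirmed after synthesis.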

2 Replies
Participant valato
1,132 Views
Registered: 09-29-2017

Re: Memcpy timing


Here is the complete project as a zip archive for easy simulation.
