Visitor
242 Views
Registered: ‎07-24-2018

## Effect of an accumulated variable in matrix multiplication

Hello,

I have two matrix multiplication functions, shown below. The first accumulates into a local variable Ci. The second has no Ci and accumulates directly into the output array C. The latency of the version with the accumulator variable is about 40% lower (7,680,009 vs. 12,800,007). Why does adding the variable Ci for accumulation improve the latency so much?

Thanks.

The function with the accumulator variable (latency 7,680,009):

```
void matmul(float A[1600], float B[1600][1600], float C[1600])
{
#pragma HLS ARRAY_RESHAPE variable=B complete dim=1
    width: for (int i = 0; i < 1600; i++)
    {
        float Ci = 0; // local accumulator
        product: for (int k = 0; k < 1600; k++)
        {
#pragma HLS PIPELINE II=1
            Ci += A[k] * B[k][i];
        }
        C[i] = Ci; // single write to the output array per element
    }
}
```

The function without the accumulator variable (latency 12,800,007):

```
void matmul(float A[1600], float B[1600][1600], float C[1600])
{
#pragma HLS ARRAY_RESHAPE variable=B complete dim=1
    width: for (int i = 0; i < 1600; i++)
    {
        product: for (int k = 0; k < 1600; k++)
        {
#pragma HLS PIPELINE II=1
            C[i] += A[k] * B[k][i]; // accumulates directly into the output array
        }
    }
}
```
1 Solution

Accepted Solutions
Voyager
219 Views
Registered: ‎03-28-2016

## Re: Effect of an accumulated variable in matrix multiplication

"Ci" is a register implemented with flip-flops. "C[1600]" is a RAM implemented with Block RAMs. Writing to a register takes less time than writing to a RAM, and with `C[i] += ...` every iteration has to read C[i] from the RAM, add to it, and write it back. That likely accounts for the difference in the latency.

Ted Booth - Tech. Lead FPGA Design Engineer
www.designlinxhs.com
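
The reported latencies can be checked against this explanation with a little arithmetic. The sketch below divides each latency by the 1600 × 1600 inner-loop iterations, assuming the figures are clock-cycle counts as HLS reports usually are. The helper names `cycles_per_iteration` and `print_estimates` are made up for illustration and are not part of any Xilinx tool.

```cpp
#include <cstdio>

// Average cycles spent per inner-loop iteration for a reported total latency.
// Pipeline fill/drain overhead is ignored, so this is only an estimate.
constexpr double cycles_per_iteration(long long latency_cycles,
                                      long long outer_trips,
                                      long long inner_trips)
{
    return static_cast<double>(latency_cycles) /
           static_cast<double>(outer_trips * inner_trips);
}

// Prints the estimate for both versions posted in the question.
void print_estimates()
{
    std::printf("with Ci:    %.2f cycles/iteration\n",
                cycles_per_iteration(7680009, 1600, 1600));
    std::printf("without Ci: %.2f cycles/iteration\n",
                cycles_per_iteration(12800007, 1600, 1600));
}
```

This works out to roughly 3 cycles per iteration with `Ci` and roughly 5 without. That suggests neither loop reaches the requested II=1, and that going through the RAM adds about two cycles to each accumulation. The synthesis report is the place to confirm the achieved II.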
3 Replies
Scholar
215 Views
Registered: ‎04-26-2015

## Re: Effect of an accumulated variable in matrix multiplication

In addition to what @tedbooth has said - what interface type are you defining for C? If it is, for example, an AXI Master, then that could take ages (in FPGA terms) to update.
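
To illustrate the interface point, here is a minimal sketch of the original poster's first function with an explicit interface on C. The `m_axi` choice and the `offset=slave bundle=gmem` options (including the bundle name `gmem`) are assumptions for illustration only, since the thread does not say which interface was actually used. The constant `N` is also introduced here for brevity.

```cpp
// Sketch only: the interface on C is assumed, not taken from the original post.
const int N = 1600;

void matmul(float A[N], float B[N][N], float C[N])
{
// Assumed: C sits in external memory behind an AXI master port.
// Replace with "#pragma HLS INTERFACE bram port=C" to model a block RAM port.
#pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem
#pragma HLS ARRAY_RESHAPE variable=B complete dim=1
    width: for (int i = 0; i < N; i++)
    {
        float Ci = 0; // running sum stays in a register
        product: for (int k = 0; k < N; k++)
        {
#pragma HLS PIPELINE II=1
            Ci += A[k] * B[k][i];
        }
        C[i] = Ci; // one write per output element, not N read-modify-writes
    }
}
```

With a local accumulator, the port on C is only written once per output element, so a slow interface costs far less than it would if every inner iteration read and wrote `C[i]`. A plain C++ compiler ignores the HLS pragmas, so the function can still be checked in a software testbench.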

Visitor
184 Views
Registered: ‎07-24-2018