
3008202060

Participant


11-07-2018 06:09 PM

Registered:
06-15-2016

HLS overlapping computation and memory transfer process?

Hi All,

I tried to use HLS for matrix multiplication. The design basically reads data from DDR into block RAM and then does the computation. I was wondering whether it is possible to ping-pong the data-transfer and computation stages in C++.

For example, right now I have matrices A, B, and C. Each time, I read A and B from DDR, store them in block RAM, do the matrix multiplication, and save the result to C.

Is it possible to have A1, B1, C1 and A2, B2, C2, so that when I have finished reading A1 and B1 and start computing C1, the design keeps reading data into A2 and B2? After C1 is done, it would go on to compute C2.

Thanks


2 Replies

nmoeller

Xilinx Employee


11-09-2018 12:43 PM

Registered:
09-05-2018

Hey @3008202060,

Vivado HLS has a linear algebra library that includes a matrix_multiply function, and the provided design examples show how to call it. I highly recommend checking out "matrix_multiply" and "matrix_multiply_alt" in the "linear algebra" folder under "design examples". You can open example projects from the Welcome page, which you can reach via "Welcome" under the "Help" menu.

The optimization you describe sounds like a pipelined loop. You can read about this under "Loops" in UG902. Below is an example I wrote up that I think does what you're looking for. If you synthesize the function, you should see that A_i and B_i are loaded in the first clock cycle, and the multiply and accumulate complete in the second.

```cpp
MATRIX_T A_i, B_i, prod, mult;
ITER_T r, c;

for (r = 0; r < C_ROWS; r++) {          // over rows of C
    for (c = 0; c < C_COLS; c++) {      // over cols of C
        prod = 0;
        for (int i = 0; i < B_ROWS; i++) {
#pragma HLS PIPELINE
            A_i  = A[r][i];
            B_i  = B[i][c];
            mult = A_i * B_i;
            prod += mult;
        }
        C[r][c] = prod;
    }
}
```

However, I would always recommend reusing library code rather than implementing your own. In the HLS library, the matrix_multiply function not only pipelines that inner for loop but also partitions the arrays and unrolls the loop appropriately, depending on the optimization factor you set.

Nicholas Moellers

Xilinx Worldwide Technical Support


u4223374

Advisor


11-11-2018 02:12 AM

Registered:
04-26-2015

@3008202060 You can certainly overlap computation and data transfer. The normal way to do this is to set up a loop in the code that calls two functions: one to perform processing and one to perform I/O. In each loop iteration they should not share any input/output arrays, although having both read the same scalar inputs (e.g. matrix size) is acceptable.

My basic layout is:

```cpp
int matrix_in_0A[1024];
int matrix_in_1A[1024];
int matrix_in_0B[1024];
int matrix_in_1B[1024];
int matrix_out_0[1024];
int matrix_out_1[1024];

for (int i = 0; i < 100; i++) {
    if ((i & 1) == 1) {
        process(matrix_in_0A, matrix_in_0B, matrix_out_0);
        dataIO (matrix_in_1A, matrix_in_1B, matrix_out_1);
    } else {
        process(matrix_in_1A, matrix_in_1B, matrix_out_1);
        dataIO (matrix_in_0A, matrix_in_0B, matrix_out_0);
    }
}
```

In each iteration, dataIO has to write out the previous result matrix and read in the two new input matrices. In the next iteration, those new inputs are fed to the processing function so it can produce a new output.
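To make that split concrete, here is a hedged sketch of what a dataIO stage might look like. The ddr_a/ddr_b/ddr_c pointer arguments and the memcpy-based transfers are my assumptions, not part of the post above; in Vivado HLS, a memcpy on a top-level pointer argument is a common way to infer an AXI burst transfer:

```cpp
#include <cstring>

const int MAT_SIZE = 1024;  // matches the 1024-element buffers above

// Hypothetical I/O stage: write back the result computed in the previous
// iteration, then read the next pair of input matrices from DDR.
// ddr_a/ddr_b/ddr_c stand in for AXI master pointers into external memory.
void dataIO(const int *ddr_a, const int *ddr_b, int *ddr_c,
            int matrix_in_A[MAT_SIZE], int matrix_in_B[MAT_SIZE],
            const int matrix_out[MAT_SIZE]) {
    std::memcpy(ddr_c, matrix_out, MAT_SIZE * sizeof(int));   // result out
    std::memcpy(matrix_in_A, ddr_a, MAT_SIZE * sizeof(int));  // next A in
    std::memcpy(matrix_in_B, ddr_b, MAT_SIZE * sizeof(int));  // next B in
}
```

Because process and dataIO touch disjoint buffers in any given iteration, the tool (e.g. with DATAFLOW, or simply because there are no dependencies) can schedule them to run concurrently, which is exactly the ping-pong overlap asked about.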
