01-07-2020 01:26 AM
In my program, the interface parameter of the hardware function is a large array, which is converted to a stream inside the hardware function. I want to keep the large array in PS DDR, but I get an error when I use the "zero_copy" pragma:
"Hardware function 'MatrixMultiplicationKernel' array argument 'a_member' has 262144 elements, which exceeds the maximum supported BRAM depth of 16384."
In addition, the accesses to the large array are not sequential.
Which SDSoC pragma should I use when a parameter of the hardware function is this large?
The hardware function declaration is:
#ifndef TOP_H
#define TOP_H
#include "MatrixMultiplication.h"

#pragma SDS data zero_copy(a[0:size_n*size_k/kMemoryWidthN], b[0:size_k*size_m/kMemoryWidthM], c[0:size_n*size_m/kMemoryWidthM])
void MatrixMultiplicationKernel(MemoryPackN_t const a[BLOCK_SIZE*BLOCK_SIZE/kMemoryWidthN],
                                MemoryPackM_t const b[BLOCK_SIZE*BLOCK_SIZE/kMemoryWidthM],
                                MemoryPackM_t c[BLOCK_SIZE*BLOCK_SIZE/kMemoryWidthM],
                                const unsigned size_n, const unsigned size_k,
                                const unsigned size_m);

#endif
The code that converts the large array to a stream is:
void ReadATransposed(MemoryPackN_t const memory[], Mem_N &pipe,
                     const unsigned size_n, const unsigned size_k,
                     const unsigned size_m) {
  // assert((static_cast<unsigned long>(OuterTilesN(size_n)) *
  //         OuterTilesM(size_m) * size_k * kOuterTileSizeNMemory *
  //         MemoryPackN_t::kWidth) == TotalReadsFromA(size_n, size_k, size_m));
ReadA_OuterTile_N:
  for (unsigned n0 = 0; n0 < OuterTilesN(size_n); ++n0) {
  ReadA_OuterTile_M:
    for (unsigned m0 = 0; m0 < OuterTilesM(size_m); ++m0) {
    ReadA_K:
      for (unsigned k = 0; k < size_k; ++k) {
      ReadA_BufferA_N1:
        for (unsigned n1m = 0; n1m < kOuterTileSizeNMemory; ++n1m) {
          #pragma HLS PIPELINE II=1
          #pragma HLS LOOP_FLATTEN
          pipe << memory[IndexATransposed(k, n0, n1m, size_n, size_k, size_m)];
        }
      }
    }
  }
}
01-07-2020 02:12 AM - edited 01-07-2020 02:13 AM
Hi @yanxiaopan
To use the ZERO_COPY pragma, the memory corresponding to the array must be physically contiguous, which means it must be allocated with sds_alloc().
sds_alloc() allocates physically contiguous memory. Please allocate the arrays with this API and see whether you still get the error.
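A minimal host-side sketch of that suggestion, allocating the zero_copy buffers with sds_alloc() and releasing them with sds_free(). The kernel scale_kernel, the helper run_scale_example, and the malloc/free fallback for non-SDSoC builds are hypothetical; the __SDSCC__ guard assumes the macro defined by the sds++ compiler, so the same file can also be compiled and checked on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef __SDSCC__
#include "sds_lib.h"  // real sds_alloc()/sds_free() under sds++
#else
// Assumed fallback for plain C++ builds: ordinary heap memory, NOT contiguous
// physical memory, so this path is only for testing the logic in software.
#define sds_alloc(size) std::malloc(size)
#define sds_free(ptr) std::free(ptr)
#endif

// Hypothetical hardware function. The zero_copy pragma immediately precedes the
// declaration and gives the number of elements accessed through each pointer.
#pragma SDS data zero_copy(in[0:n], out[0:n])
void scale_kernel(const float *in, float *out, unsigned n);

void scale_kernel(const float *in, float *out, unsigned n) {
  for (unsigned i = 0; i < n; ++i) out[i] = 2.0f * in[i];
}

// Allocates both buffers with sds_alloc(), calls the kernel, and returns the
// sum of the outputs so the result can be checked.
float run_scale_example(unsigned n) {
  float *in = static_cast<float *>(sds_alloc(n * sizeof(float)));
  float *out = static_cast<float *>(sds_alloc(n * sizeof(float)));
  assert(in != nullptr && out != nullptr);  // allocation must succeed
  for (unsigned i = 0; i < n; ++i) in[i] = static_cast<float>(i);
  scale_kernel(in, out, n);
  float sum = 0.0f;
  for (unsigned i = 0; i < n; ++i) sum += out[i];
  sds_free(in);
  sds_free(out);
  return sum;
}
```

In a real project the pragma and declaration would typically live in the header (as in top.h above) and the definition in the source file marked for hardware.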
01-13-2020 01:11 AM
I am already using sds_alloc(), but I still get the error.
01-13-2020 02:19 AM
Hi @yanxiaopan
Please share the full error details.
01-13-2020 02:34 AM
ERROR: [CF2XD 83-2235] Hardware function 'MatrixMultiplicationKernel' array argument 'a_member' has 262144 elements, which exceeds the maximum supported BRAM depth of 16384.
ERROR: [CF2XD 83-2235] You can use '#pragma SDS data access_pattern(a_member:SEQUENTIAL)' to map this argument to a FIFO interface.
ERROR: [CF2XD 83-2239] failed to create xd_adapter for accelerator comp MatrixMultiplicationKernel_1
ERROR: [CF2XD 83-2009] An error has occurred during generation of the system block diagram. For more information, please look for additional ERROR messages in the console and in log files.
01-13-2020 03:28 AM
Hi @yanxiaopan
As the error message suggests, please use #pragma SDS data access_pattern(a_member:SEQUENTIAL).
Place it immediately before the function declaration.
Here is an example.
#pragma SDS data copy(data[0:size], out[0:size])
#pragma SDS data access_pattern(data:SEQUENTIAL, out:SEQUENTIAL)
void accelerator(float *data, int *out, int size);
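A SEQUENTIAL access pattern maps the argument to a FIFO, so it only fits a function that reads each input element exactly once and writes each output element exactly once, in increasing index order. A sketch of such a body, assuming the `#pragma SDS data ...` spelling; the running-sum computation is a hypothetical placeholder:

```cpp
#include <cassert>

// Hypothetical accelerator whose arguments can be mapped to FIFO interfaces:
// data[i] is read once and out[i] is written once, both in index order.
#pragma SDS data copy(data[0:size], out[0:size])
#pragma SDS data access_pattern(data:SEQUENTIAL, out:SEQUENTIAL)
void accelerator(float *data, int *out, int size);

void accelerator(float *data, int *out, int size) {
  assert(size >= 0);
  int running = 0;
  for (int i = 0; i < size; ++i) {
    running += static_cast<int>(data[i]);  // each data[i] read exactly once
    out[i] = running;                      // each out[i] written exactly once
  }
}
```

A loop such as ReadATransposed, which jumps through memory via IndexATransposed() and revisits elements, does not satisfy this pattern.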
01-13-2020 04:36 PM
But what can I do if my data access is not sequential?
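One possible approach, sketched under the assumption that the access order is known before the call: reorder the data on the PS so that the hardware reads the buffer strictly sequentially. Here a transpose stands in for the real index function; transpose_on_host and read_sequential are hypothetical names, and std::vector stands in for the hls::stream used in ReadATransposed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (runs on the PS): copies a row-major
// rows x cols matrix into column-major (transposed) order, so a kernel that
// consumes A transposed can read the buffer front to back.
std::vector<float> transpose_on_host(const std::vector<float> &a,
                                     std::size_t rows, std::size_t cols) {
  assert(a.size() == rows * cols);
  std::vector<float> at(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      at[c * rows + r] = a[r * cols + c];
  return at;
}

// Stand-in for the kernel's read loop: because the data is already in the
// order the kernel needs, the read address is simply i (sequential access).
void read_sequential(const float *memory, unsigned n, std::vector<float> &pipe) {
  for (unsigned i = 0; i < n; ++i) pipe.push_back(memory[i]);
}
```

The trade-off is the extra CPU pass. Also note that IndexATransposed() in ReadATransposed does not depend on m0, so A appears to be read once per m0 tile; a buffer laid out in full read order would then hold OuterTilesM(size_m) copies of A, which may make this approach too costly compared with keeping zero_copy buffers.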