I am working with source code that runs computations on large sparse matrices. The matrix is addressed through a pointer to a struct whose data members support the computations.
The compute-intensive loop, which works on the same huge sparse matrix (12000x12000 to start with), has been moved into a separate function so that it can be marked for hardware acceleration.
The PS and the PL must share this sparse matrix at different points in the code.
How do I achieve the most suitable memory transfer: zero_copy or copy?
Which accelerator interface would achieve this: streaming or RAM?
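For reference, SDSoC expresses both choices as pragmas placed before the hardware function's declaration. The sketch below is a hypothetical sparse matrix-vector product (spmv_hw) on a matrix flattened into CSR arrays (row_ptr, col_idx, vals). This layout is an assumption, chosen because a struct holding pointers generally cannot be passed to a hardware function as-is. Arrays that are read or written exactly once, in order (col_idx, vals, y), are marked SEQUENTIAL so they can use a streaming interface. row_ptr and x are read out of order, so they are marked RANDOM and would be buffered in on-chip RAM. The zero_copy alternative appears in a comment. Check the exact pragma syntax against the SDSoC user guide for your tool version. A plain C compiler ignores these pragmas, so the same code also runs on the host.

```c
#include <assert.h>

/* Hypothetical hardware function: y = A * x, with the n x n sparse matrix A
 * flattened into CSR arrays (row_ptr has n+1 entries, col_idx/vals have nnz).
 *
 * Copy + access pattern: data movers transfer the arrays to the PL.
 * SEQUENTIAL arrays (touched once, in order) can be streamed;
 * RANDOM arrays are buffered in on-chip RAM. */
#pragma SDS data copy(row_ptr[0:n+1])
#pragma SDS data copy(col_idx[0:nnz])
#pragma SDS data copy(vals[0:nnz])
#pragma SDS data copy(x[0:n])
#pragma SDS data copy(y[0:n])
#pragma SDS data access_pattern(row_ptr:RANDOM, x:RANDOM)
#pragma SDS data access_pattern(col_idx:SEQUENTIAL, vals:SEQUENTIAL, y:SEQUENTIAL)
/* Zero-copy alternative: the PL reads DDR directly over an AXI master port,
 * with the buffers allocated physically contiguous (e.g. via sds_alloc):
 * #pragma SDS data zero_copy(col_idx[0:nnz], vals[0:nnz]) */
void spmv_hw(int n, int nnz, const int *row_ptr, const int *col_idx,
             const float *vals, const float *x, float *y)
{
    assert(row_ptr[0] == 0 && row_ptr[n] == nnz); /* host-side sanity check */
    int k = 0; /* walks col_idx/vals strictly in order */
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        int row_end = row_ptr[i + 1];
        for (; k < row_end; k++)
            sum += vals[k] * x[col_idx[k]];
        y[i] = sum; /* each y[i] is written once, in order */
    }
}
```

Keeping k as a single running index, instead of restarting it from row_ptr[i] for each row, makes it explicit that col_idx and vals are consumed in one in-order pass, which is what a streaming interface requires.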
Another question: the compute function offloaded to hardware calls various sub-functions.
How can these sub-functions be offloaded? Can I use inline functions?
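Vivado HLS synthesizes the hardware function in SDSoC, and it also synthesizes every function called from that top-level function as part of it. Sub-functions therefore do not need to be marked separately, but they must be synthesizable themselves (for example, no dynamic memory allocation and no system calls). The #pragma HLS INLINE directive controls whether a sub-function is merged into its caller or kept as a separate block. A minimal sketch with a hypothetical helper, row_dot, and the same assumed CSR layout:

```c
#include <assert.h>

/* Hypothetical helper called from the hardware function. HLS synthesizes
 * it together with the caller; the INLINE pragma asks HLS to merge it into
 * the caller instead of generating a separate RTL module. */
static float row_dot(int start, int end, const int *col_idx,
                     const float *vals, const float *x)
{
#pragma HLS INLINE
    assert(start <= end);
    float sum = 0.0f;
    for (int k = start; k < end; k++)
        sum += vals[k] * x[col_idx[k]];
    return sum;
}

/* Top-level function: only this one is selected for acceleration. */
void spmv_rows_hw(int n, const int *row_ptr, const int *col_idx,
                  const float *vals, const float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = row_dot(row_ptr[i], row_ptr[i + 1], col_idx, vals, x);
}
```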
The matrix struct is global to main() and to the accelerated function, and so are many other variables.
How do I handle global variables properly between the PS and the PL?
Can global functions be called from the hardware code, or should they be redefined in hardware.h?
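One common pattern, assumed here rather than taken from the original code, is to keep the globals on the PS side only and have a small software wrapper pass their contents to the hardware function as explicit arguments. The synthesized function generally cannot reach the processor's global variables directly. All names below (struct sparse_matrix, g_mat, g_scale, scaled_spmv_hw, run_accelerated) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matrix struct, global on the PS side as in the question. */
struct sparse_matrix {
    int n;        /* rows (= columns) */
    int nnz;      /* number of non-zeros */
    int *row_ptr; /* CSR row offsets, n + 1 entries */
    int *col_idx; /* column index of each non-zero */
    float *vals;  /* value of each non-zero */
};

struct sparse_matrix g_mat; /* global, used by software only */
float g_scale = 1.0f;       /* another hypothetical global */

/* Hardware function: everything it needs arrives as arguments. */
void scaled_spmv_hw(int n, float scale, const int *row_ptr,
                    const int *col_idx, const float *vals,
                    const float *x, float *y)
{
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += vals[k] * x[col_idx[k]];
        y[i] = scale * sum;
    }
}

/* Software wrapper: reads the globals and unpacks the struct's fields. */
void run_accelerated(const float *x, float *y)
{
    assert(g_mat.row_ptr != NULL);
    scaled_spmv_hw(g_mat.n, g_scale, g_mat.row_ptr, g_mat.col_idx,
                   g_mat.vals, x, y);
}
```

Because only the wrapper touches g_mat and g_scale, the rest of the software can keep using the globals unchanged, while the hardware boundary stays limited to scalars and flat arrays.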