Hi guys,
I need to transfer data in and out at the same time, so I've written two functions, one for load and one for store, as shown below:
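#pragma SDS data access_pattern(output:SEQUENTIAL, input:SEQUENTIAL)
void function(double_tf input[A][B], double_tf output[A][B]) {
    double_tf input_buf[B], output_buf[B];  // local row buffers
    for (int i = 0; i < max_val; ++i) {
        load(input[i], input_buf);     // read row i of input into input_buf
        store(output_buf, output[i]);  // write output_buf to row i of output
    }
}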
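For reference, below is a minimal host-side C model of the same loop that builds with an ordinary C compiler, so the data movement can be checked in software before timing it on the board. It is only a sketch under assumptions: the sizes A and B, the typedef of double_tf, the bodies of load() and store(), and the copy step between them are all hypothetical, and max_val is taken to be A. The SDS pragma is left out because a host compiler does not act on it, so this model says nothing about whether the two transfers overlap in hardware.

```c
/* Hypothetical sizes and element type: the post does not give them. */
#define A 4
#define B 8
typedef double double_tf;

/* Hypothetical load: copy one row of the input array into a local buffer. */
static void load(const double_tf src[B], double_tf dst[B]) {
    for (int j = 0; j < B; ++j) {
        dst[j] = src[j];
    }
}

/* Hypothetical store: copy a local buffer out to one row of the output array. */
static void store(const double_tf src[B], double_tf dst[B]) {
    for (int j = 0; j < B; ++j) {
        dst[j] = src[j];
    }
}

/* Software model of the loop in the post; max_val is assumed to equal A. */
void function(double_tf input[A][B], double_tf output[A][B]) {
    double_tf input_buf[B];
    double_tf output_buf[B];
    for (int i = 0; i < A; ++i) {
        load(input[i], input_buf);
        /* Hypothetical compute step: pass the row through unchanged. */
        for (int j = 0; j < B; ++j) {
            output_buf[j] = input_buf[j];
        }
        store(output_buf, output[i]);
    }
}
```

With the pass-through step, every row of output should match the corresponding row of input after the call.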
Both the synthesis result and the performance estimation report show that load and store run in parallel. However, when I test it on the board, the measured time shows that they run sequentially. How could that happen? Any help would be appreciated. Thanks.