
SDSoC Better execution time on ARM than on FPGA

Posts: 29
Registered: ‎04-19-2017




I am working on some image-processing functions that I want to accelerate on the FPGA. I use a ZedBoard. One function, computeMean, computes the mean pixel value of an image. I have two implementations of this function, one for the ARM and one for the FPGA. The image dimensions are 640 x 480.


The ARM function is:

// Mean pixel value of a single-channel 8-bit image (software version).
int computeMean(Mat image)
{
    int sum = 0;
    int ii, jj;
    for(ii = 0; ii < image.rows; ii++)
        for(jj = 0 ; jj < image.cols; jj++)
            sum = sum + image.at<uchar>(ii,jj);

    return sum / (image.rows * image.cols);
}


    mean_start = clock();
    int mean = computeMean(channels[2]);
    mean_end = clock();

    printf("The execution time for mean function is : %f\n", ((double)(mean_end  - mean_start)) / CLOCKS_PER_SEC);



The FPGA function is:


#pragma SDS data copy(vector_image_in[0:WIDTH*HEIGHT])
#pragma SDS data zero_copy(mean[0:1])
#pragma SDS data mem_attribute(vector_image_in:PHYSICAL_CONTIGUOUS, mean:PHYSICAL_CONTIGUOUS)
#pragma SDS data access_pattern(vector_image_in:SEQUENTIAL, mean:SEQUENTIAL)
#pragma SDS data data_mover(vector_image_in:AXIDMA_SIMPLE, mean:AXIDMA_SIMPLE)

// Mean pixel value of a flattened WIDTH x HEIGHT image (hardware version).
void computeMeanVector(unsigned short vector_image_in[WIDTH * HEIGHT], unsigned int mean[1])
{
#pragma HLS inline
    int sum = 0;
    int ii;
    for(ii = 0; ii < WIDTH * HEIGHT; ii++)
    {
#pragma HLS UNROLL factor=100
        sum = sum + vector_image_in[ii];
    }
    mean[0] = sum / (WIDTH * HEIGHT);
}


    unsigned short *vector_image_hue;

    unsigned int *meanV;

    vector_image_hue = (unsigned short *) sds_alloc_non_cacheable(HEIGHT * WIDTH * sizeof(unsigned short));

    meanV = (unsigned int *) sds_alloc_non_cacheable(1 * sizeof(unsigned int));

    mean_start = clock();
#pragma HLS array_partition variable=vector_image_hue cyclic factor=100 dim=1
    computeMeanVector(vector_image_hue, meanV);
    mean_end = clock();

    printf("The execution time for mean function is : %f\n", ((double)(mean_end  - mean_start)) / CLOCKS_PER_SEC);



The problem is that the execution time of the ARM function is 0.001620 s and the execution time of the FPGA function is 0.002304 s for the same image, so the function is not accelerated in hardware. Can you suggest any improvements to the FPGA function that would give a better execution time?
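In case it helps with reproducing the measurement: clock() reports processor time rather than wall-clock time, and a single measurement of a millisecond-scale call can be noisy. Below is a minimal sketch of how the timing could be averaged over several runs. The helper names meanOfBuffer and averageSeconds are hypothetical and not part of my project; meanOfBuffer is only a plain software reference over a flat buffer, so that both versions can be timed on the same data layout.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original project): integer mean over a
// flat pixel buffer, the same quantity computeMeanVector produces in hardware.
unsigned int meanOfBuffer(const std::vector<unsigned short>& pixels) {
    assert(!pixels.empty());
    unsigned long long sum = 0;
    for (unsigned short p : pixels) {
        sum += p;
    }
    return static_cast<unsigned int>(sum / pixels.size());
}

// Hypothetical helper: call f() `runs` times and return the average
// wall-clock time per call, in seconds.
template <typename F>
double averageSeconds(F&& f, int runs) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        f();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / runs;
}
```

On the board, the hardware call could be wrapped in the same way, for example averageSeconds([&] { computeMeanVector(vector_image_hue, meanV); }, 100), so that both versions are measured with the same clock and the same number of runs.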


Thank you!

Posts: 5,143
Registered: ‎03-31-2012

Re: SDSoC Better execution time on ARM than on FPGA

@bogdan.deac First, a disclaimer: I don't understand the SDSoC pragmas well yet, so I'm assuming you're setting up the memory copy properly, in a streaming way.


My main comment is that a simple sum of all pixels is probably not a good candidate for acceleration in any case. It doesn't have enough datapath density to do much better than the processor, which is probably running ~4x faster. Also, the frame buffer is declared as non-cached, so access to it through the PL will potentially be less efficient than access from the processor. Did you set the memory to non-cached for the processor test too? Another minor optimization would be to hard-wire the image size to get better PL efficiency, but I am not sure how much this would help.
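To put rough numbers on this, here is a small C++ sketch that turns the two timings from the question into throughput figures. The function and constant names are made up for illustration, and the only inputs are the image size and the two reported times. It assumes the reported hardware time covers the whole call, including the data transfer.

```cpp
#include <cassert>
#include <cstdio>

// Image size and timings as reported in the question.
constexpr int kWidth = 640;
constexpr int kHeight = 480;
constexpr double kArmSeconds = 0.001620;
constexpr double kFpgaSeconds = 0.002304;

// Pixels processed per second for a given run time.
constexpr double pixelsPerSecond(double seconds) {
    return (kWidth * kHeight) / seconds;
}

// Bytes moved per second when each pixel is an unsigned short,
// as in vector_image_in.
constexpr double bytesPerSecond(double seconds) {
    return (kWidth * kHeight * sizeof(unsigned short)) / seconds;
}

// Print both throughput figures in millions per second.
void printThroughput() {
    std::printf("ARM : %.1f Mpixel/s\n", pixelsPerSecond(kArmSeconds) / 1e6);
    std::printf("FPGA: %.1f Mpixel/s, %.1f MB/s\n",
                pixelsPerSecond(kFpgaSeconds) / 1e6,
                bytesPerSecond(kFpgaSeconds) / 1e6);
}
```

This works out to roughly 190 Mpixel/s on the ARM against roughly 133 Mpixel/s (about 267 MB/s of 16-bit input) through the PL. Since the work per pixel is a single addition, the total time is likely dominated by getting the data to the accumulator, and whatever the unroll factor, the loop cannot consume pixels faster than the data mover delivers them.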
