Observer
2,833 Views
Registered: ‎05-03-2017

## What does the estimated clock (hardware accelerated) consist of in the performance report?

Hi,

I'm new to SDSoC. I have designed a hardware function that takes inData and weight as inputs, performs a 3×3 2-D convolution (with zero padding), and saves the result to outData. Both the PS and PL clock frequencies are set to 166.7 MHz:

`void top(int inData[28*28], int weight[9], int outData[28*28]);`
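For readers following along, a plain-C software model of a function with this signature might look like the sketch below. The poster's actual code was not shown, so everything here is an assumption: the name `top_ref`, the row-major 28×28 layout, and the choice to apply the kernel without flipping it (the usual convention in CNN-style convolution).

```c
#define ROWS 28
#define COLS 28

/* Hypothetical reference model of top(): 3x3 convolution over a 28x28
 * image stored row-major. Pixels outside the image count as zero
 * (zero padding), so the output keeps the 28x28 size of the input. */
void top_ref(const int inData[ROWS * COLS], const int weight[9],
             int outData[ROWS * COLS])
{
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int acc = 0;
            for (int kr = -1; kr <= 1; kr++) {
                for (int kc = -1; kc <= 1; kc++) {
                    int rr = r + kr;
                    int cc = c + kc;
                    /* Skip taps that fall outside the image: they are zero. */
                    if (rr >= 0 && rr < ROWS && cc >= 0 && cc < COLS) {
                        acc += inData[rr * COLS + cc]
                             * weight[(kr + 1) * 3 + (kc + 1)];
                    }
                }
            }
            outData[r * COLS + c] = acc;
        }
    }
}
```

With an all-ones image and an all-ones kernel, corner outputs see 4 valid taps, edge outputs 6, and interior outputs 9, which is a quick way to check the padding logic.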

This function gives the expected result. The HLS report shows that the total cycle count in the PL is 2440; however, the estimated clock count in the performance estimation report is 23331, far more than 2440. Even if I take the data transfer cycles and the data mover setup cycles into account, 23331 is still too large:

`2440 + (1010 + 3815) * 2 + 1004 + 1057 = 14151`

So I want to know what exactly the estimated clock (hardware accelerated) in the performance estimation report consists of. Any help will be appreciated, thanks!

ivagli

1 Solution

Accepted Solutions
Teacher
4,689 Views
Registered: ‎03-31-2012

@ivagli you are making a units error in

`2440 + (1010 + 3815) * 2 + 1004 + 1057 = 14151`

2440 is in PL cycles, so to express it in CPU cycles like the other terms you need to multiply it by ~800/166.7 ≈ 4.8 =>

2440 * 4.8 + ... = 23423

Not sure why it's being reported as 23331.
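The conversion in this reply can be sketched in a few lines of C. This is only an illustration of the arithmetic, not how the SDSoC estimator actually computes its figure: the function names are made up, the transfer and setup terms are assumed to be in CPU cycles already, and the CPU frequency is left as a parameter because the thread does not confirm the board's real value.

```c
#include <assert.h>

/* Convert a cycle count from the PL clock domain into the equivalent
 * number of cycles of the (faster) CPU clock. */
double pl_to_cpu_cycles(double pl_cycles, double cpu_mhz, double pl_mhz)
{
    assert(pl_mhz > 0.0);
    return pl_cycles * (cpu_mhz / pl_mhz);
}

/* Hypothetical helper: rebuild the estimate from the numbers quoted in
 * the thread. Assumption: every term except the HLS cycle count is
 * already expressed in CPU cycles. */
double estimate_total_cpu_cycles(double cpu_mhz)
{
    const double pl_mhz = 166.7;      /* PL clock stated by the poster */
    const double hls_cycles = 2440.0; /* from the HLS report, PL cycles */
    /* Transfer and data mover setup terms from the original sum: 11711. */
    const double overhead = (1010.0 + 3815.0) * 2.0 + 1004.0 + 1057.0;

    return pl_to_cpu_cycles(hls_cycles, cpu_mhz, pl_mhz) + overhead;
}
```

Calling `estimate_total_cpu_cycles(800.0)` gives about 23421; the 23423 quoted above comes from rounding the ratio to 4.8 first. Neither matches the reported 23331 exactly.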

Give Kudos to a post which you think is helpful and reply oriented.
6 Replies
Teacher
2,815 Views
Registered: ‎03-31-2012

@ivagli you also have to realize how badly unbalanced this setup is. Assuming my interpretation of 2440 is correct, you are spending 11712 cycles on the computation and 11711 cycles on transferring data. You should be able to do much better: bundle both vectors together so the input time is roughly halved, use two HP ports to increase throughput, and run the data transfer at a faster clock.

Observer
2,775 Views
Registered: ‎05-26-2016

Check your platform. The PS clock should be 666.666 MHz.

Observer
2,733 Views
Registered: ‎05-03-2017

@muzaffer thanks for your reply! As you can see, the throughput of the design is not good. You mention that I can "bundle both vectors together so input time is roughly half, use two HP ports to increase throughput". I currently use separate HP ports to transfer inData and weight. Do you mean that I can bundle these two input vectors into a single 64-bit vector, transfer it over one HP port, and leave the other HP port for outData?

Teacher
2,725 Views
Registered: ‎03-31-2012

@ivagli the issue with using separate ports for inData and weight is that their sizes are too different, so you pay a large cost for DMA setup and get very little return in parallelism, as weight is tiny. It would be faster if you set up DMA once for both arrays and read them 16 bytes at a time (instead of 8 with a single HP port at 64 bits). There is no point in putting outData on a separate HP port, because reading and writing are spatially separate, so you can use the same port(s) for both.
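One way to read "set up DMA once for both arrays" is to pack weight and inData into a single contiguous buffer on the software side and pass that one array to the hardware function. The sketch below shows only the packing step in plain C. The names `pack_inputs` and `top_packed` and the weights-first layout are assumptions for illustration, and the SDSoC pragmas that select data movers and HP ports are deliberately left out.

```c
#include <assert.h>
#include <string.h>

#define IN_LEN   (28 * 28)
#define W_LEN    9
#define PACK_LEN (W_LEN + IN_LEN)

/* Copy weight and inData back to back into one buffer so that a single
 * transfer can carry both. Layout (an assumption): the 9 weights come
 * first, followed by the 784 input values. */
void pack_inputs(const int inData[IN_LEN], const int weight[W_LEN],
                 int packed[PACK_LEN])
{
    assert(inData != NULL && weight != NULL && packed != NULL);
    memcpy(packed, weight, W_LEN * sizeof(int));
    memcpy(packed + W_LEN, inData, IN_LEN * sizeof(int));
}

/* Hypothetical replacement for top(): one input array instead of two.
 * Declared only; the hardware function would read the first W_LEN
 * words as the kernel and the rest as the image. */
void top_packed(const int packed[PACK_LEN], int outData[IN_LEN]);
```

The extra `memcpy` on the CPU side has its own cost, so whether this is a net win would need to be checked against the performance estimate for the actual design.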