cerilet
Explorer
Registered: ‎08-26-2014

HLS resource usage and latency increase for no apparent reason

Hello,

 

I am writing a program that uses double-precision floating-point variables and is to be implemented on a Zynq. I am running out of resources and am trying to shrink the design so that it fits in the fabric (reducing the precision is not an option).

 

The algorithm calculates a Tustin discretization of a doubly-fed induction generator (DFIG), which includes among other operations one 4x4 matrix inversion and two 4x4 matrix multiplications.

 

I have tested two different 4x4 matrix multiplication implementations in a separate project. Their latency, initiation interval, and resource usage are as follows:

 

Implementation A:

Latency = 60

Initiation Interval = 61

BRAM = 0 (0% of resources)

DSP48E = 28 (12% of resources)

FF = 2664 (4% of resources)

LUT = 4261 (8% of resources)

 

Implementation B:

Latency = 80

Initiation Interval = 81

BRAM = 2 (1% of resources)

DSP48E = 14 (6% of resources)

FF = 1151 (1% of resources)

LUT = 1984 (4% of resources)

 

I then simply call these matrix multiplications as functions in the whole algorithm.
 

For the whole program, the resource usage with matrix multiplication version A and with version B is as follows:

 

Using matrix multiplication version A:

Latency = 382

Initiation Interval = 383

BRAM = 16 (5% of resources)

DSP48E = 300 (136% of resources)

FF = 27279 (25% of resources)

LUT = 43124 (81% of resources)

 

 

Using matrix multiplication version B:

Latency = 391

Initiation Interval = 392

BRAM = 20 (7% of resources)

DSP48E = 314 (142% of resources)

FF = 25100 (23% of resources)

LUT = 41491 (77% of resources)

 

It doesn't make much sense, does it? Can somebody tell me why this is happening, or how I can reduce the number of DSPs used?

 

Many thanks,

 

Cerilet

1 Reply
debrajr
Moderator
Registered: ‎04-17-2011

A few quick suggestions to try:
a. Use #pragma HLS INLINE on the matrix-multiplication function in your complete project.
b. Use the ALLOCATION directive on the matrix-multiplication function to limit the number of multiplication operations, in order to encourage resource sharing. These mul operations usually get mapped to the DSP blocks.
c. Use a higher binding effort by setting config_bind in the Solution Configurations.
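As a rough illustration of suggestions a and b, here is a minimal sketch. It is not taken from the original design: the names matmul4x4 and two_products are hypothetical, and the directive spelling follows the Vivado HLS style (instances=... limit=... operation), which may differ in other tool versions. It also assumes that, for double-precision data, the operators to limit are dmul and dadd rather than the integer mul, and that placing the ALLOCATION directives in the caller covers the multiplications once the function has been inlined. A normal C++ compiler simply ignores the pragmas, so the code can still be checked in C simulation.

```cpp
// Sketch only: hypothetical names, Vivado HLS-style pragmas.
const int N = 4;

// 4x4 double-precision matrix product: C = A * B.
void matmul4x4(double A[N][N], double B[N][N], double C[N][N]) {
#pragma HLS INLINE  // suggestion a: dissolve the function into its caller
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            double acc = 0.0;
            for (int k = 0; k < N; ++k) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}

// Hypothetical top level that needs two products: D = (A * B) * C.
void two_products(double A[N][N], double B[N][N],
                  double C[N][N], double D[N][N]) {
    // Suggestion b (assumed placement and operator names): allow at most
    // one double multiplier and one double adder in this scope, so both
    // inlined products have to share them.
#pragma HLS ALLOCATION instances=dmul limit=1 operation
#pragma HLS ALLOCATION instances=dadd limit=1 operation
    double T[N][N];  // intermediate result A * B
    matmul4x4(A, B, T);
    matmul4x4(T, C, D);
}
```

Limiting the operators this way typically trades latency for area, so it is worth comparing the synthesis report before and after to see whether the DSP48E count actually drops.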
Regards,
Debraj