Registered: ‎08-26-2014

HLS resources and latency increase for no apparent reason



I am writing a program that uses double-precision floating-point variables, to be implemented on a Zynq. I have resource problems and am trying to shrink the design so that it fits in the fabric (reducing precision is not an option).


The algorithm computes the Tustin discretization of a doubly-fed induction generator (DFIG) model, which involves, among other operations, one 4x4 matrix inversion and two 4x4 matrix multiplications.
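The post does not show the DFIG equations. Assuming the model is in continuous state-space form dx/dt = A x + B u with four states and sample period T (all assumptions, not stated above), the Tustin (trapezoidal) rule gives the following, which is where one 4x4 inversion and two matrix products would come from:

```latex
\begin{aligned}
x_{k+1} - x_k &= \tfrac{T}{2}\left[A\,(x_{k+1} + x_k) + B\,(u_{k+1} + u_k)\right] \\
\left(I - \tfrac{T}{2}A\right)x_{k+1} &= \left(I + \tfrac{T}{2}A\right)x_k + \tfrac{T}{2}B\,(u_{k+1} + u_k) \\
x_{k+1} &= \underbrace{\left(I - \tfrac{T}{2}A\right)^{-1}\left(I + \tfrac{T}{2}A\right)}_{A_d}x_k
  + \underbrace{\left(I - \tfrac{T}{2}A\right)^{-1}\tfrac{T}{2}B}_{B_d}\,(u_{k+1} + u_k)
\end{aligned}
```

Forming A_d and B_d needs the inverse of I - (T/2)A and two products with it; the second product is 4x4 only if B also has four columns.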


I have tested two different implementations of the 4x4 matrix multiplication in a separate project. Their latency, initiation interval and resource usage are as follows:


Implementation A:

Latency = 60 cycles
Initiation Interval = 61 cycles
BRAM = 0 (0% of resources)
DSP48E = 28 (12% of resources)
FF = 2664 (4% of resources)
LUT = 4261 (8% of resources)


Implementation B:

Latency = 80 cycles
Initiation Interval = 81 cycles
BRAM = 2 (1% of resources)
DSP48E = 14 (6% of resources)
FF = 1151 (1% of resources)
LUT = 1984 (4% of resources)


I then call these matrix multiplications as functions within the full algorithm.

For the whole program, the resource usage with matrix multiplication version A or version B is as follows:


Using matrix multiplication version A:

Latency = 382 cycles
Initiation Interval = 383 cycles
BRAM = 16 (5% of resources)
DSP48E = 300 (136% of resources)
FF = 27279 (25% of resources)
LUT = 43124 (81% of resources)



Using matrix multiplication version B:

Latency = 391 cycles
Initiation Interval = 392 cycles
BRAM = 20 (7% of resources)
DSP48E = 314 (142% of resources)
FF = 25100 (23% of resources)
LUT = 41491 (77% of resources)


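It doesn't make much sense, does it? On its own, version B uses half the DSP48Es of version A (14 vs 28), yet the complete design with version B uses more (314 vs 300), and both versions exceed the DSPs available. Can somebody tell me why this is happening, or how I can reduce the number of DSPs used?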


Many thanks,



Registered: ‎04-17-2011

A few quick suggestions to try:
a_ Use #pragma HLS INLINE on the matrix-multiplication function in your complete project.
b_ Use the ALLOCATION directive on the matrix-multiplication function to limit the number of multiplication operations, in order to encourage resource sharing. These multiplications are usually mapped to DSPs.
c_ Use a higher binding effort by setting config_bind in the solution settings.