10-04-2018 12:41 AM
I have a design with several HLS IPs.
Generally, in my designs, minimizing power consumption is a high priority.
Some of the HLS IPs perform their tasks only rarely.
From a power consumption perspective, what happens when "ap_reset" is active?
From a power consumption perspective, what happens when "ap_start" is low (inactive)?
Are there any HLS methodology guidelines for minimizing power consumption?
(Besides converting floating point to fixed point.)
10-09-2018 03:32 PM
This is a very good question.
First - generally speaking, if no task is executing (i.e., ap_reset=1 or ap_start=0) then most logic inside an HLS block is unlikely to be enabled, resulting in a low power state that effectively consumes only static power. You can examine static and dynamic power consumption using the Xilinx Power Estimator (XPE) spreadsheet, or the power report (report_power) in Vivado. If the external inputs are still transitioning, some signals inside the HLS block may transition as well, resulting in a very small dynamic power consumption for the block.
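To put rough numbers on the static vs. dynamic split, here is a minimal sketch of the arithmetic for a block that only runs part of the time. The struct, the function name and any figures you feed it are hypothetical and only for illustration; real values should come from a power report for your own design.

```cpp
#include <cassert>

// Hypothetical per-block figures (assumption: not taken from any real report).
struct BlockPower {
    double static_mw;   // leakage, drawn whenever the device is powered
    double dynamic_mw;  // switching power while a task is executing
};

// Average power of a block that executes for a fraction `duty` of the time
// and sits idle (roughly static power only) for the rest.
double average_power_mw(const BlockPower& p, double duty) {
    assert(duty >= 0.0 && duty <= 1.0);
    return p.static_mw + duty * p.dynamic_mw;
}
```

With this simple model, a block that is idle most of the time averages close to its static figure, which is why keeping a rarely used IP idle is worthwhile.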
As for general guidelines - small, efficient designs are almost always lowest power. It's difficult for HLS to optimize for power because these optimizations often make the implementation more difficult. As such, most power optimization is left to Vivado to do during the RTL implementation stage.
You make a good point that converting floating to fixed point is a good power reducing strategy; I would add that one should use arbitrary precision data types. Remember - a 64 bit adder uses roughly twice the resources of a 32 bit adder, and a plain C long is 64 bits wide on most 64 bit Linux systems. The largest number that can fit into an unsigned 32 bit integer is over 4 billion, so for most applications you could probably use even fewer than 32 bits, and Vivado is capable of implementing even, say, 9 bit data types inside the programmable logic.
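As a quick way to pick a width, here is a small sketch in standard C++ that computes how many bits a value range actually needs. The helper names bits_needed and max_for_bits are made up for this example. In an HLS design the result would typically become the W of an ap_uint<W> from ap_int.h, which is left out here so the sketch compiles with any standard compiler.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to hold every unsigned value in [0, max_value].
// (Hypothetical helper; the value 0 still occupies one bit.)
constexpr unsigned bits_needed(std::uint64_t max_value) {
    return max_value < 2 ? 1u : 1u + bits_needed(max_value >> 1);
}

// Largest unsigned value representable in `bits` bits (1..64).
std::uint64_t max_for_bits(unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~std::uint64_t{0}
                      : (std::uint64_t{1} << bits) - 1;
}
```

For example, bits_needed(511) evaluates to 9, so a counter that never exceeds 511 fits in a 9 bit type, and max_for_bits(32) is 4294967295, the "over 4 billion" limit mentioned above.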
In general, in HLS you should focus on the efficiency of a design, and that will be very dependent on your code. As an example, my intuition was to suggest that you not unroll or pipeline loops, because I expected that to use fewer resources. However, I checked with the development team first and they advised that this is not necessarily the case. In some cases, unrolling loops can greatly reduce the logic associated with muxing. Pipelining can also increase the efficiency of a design, reducing overall energy. So again - it's going to depend greatly on the code.
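For reference, this is roughly what a pipelined loop looks like in HLS C++. It is only a sketch: the dot function and the length N are made up for illustration, and whether PIPELINE or UNROLL gives the lower-energy result for a given loop has to be checked in the synthesis and power reports. A standard compiler ignores the HLS pragma (possibly with a warning), so the function can still be tested in plain C++.

```cpp
#include <cstdint>

const int N = 8;  // hypothetical vector length, for illustration only

// Dot product of two fixed-length vectors.
std::int32_t dot(const std::int16_t a[N], const std::int16_t b[N]) {
    std::int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
// Ask HLS to start a new loop iteration every clock cycle.
#pragma HLS PIPELINE II=1
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}
```

Replacing the pragma with "#pragma HLS UNROLL" would ask the tool to build parallel multipliers instead; comparing the two reports for the same loop is the practical way to see which is more efficient for your code.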
I hope that helps; let me know if there's any other information I can provide that might be of assistance.
10-11-2018 07:23 AM - edited 10-13-2018 10:37 PM
Thanks for the reply.
Your answer is very helpful.
Usually I tie the ap_start to a constant '1' (the main input/output stream data are managed by the axis protocol).
But now I understand that I have a reason to drive ap_start = '0' when the axis input is empty, in order to put the HLS IP into an idle mode (minimizing power consumption).
If I assert the reset (ap_rst_n) instead, do I get the same power savings as with ap_start = '0'?
Would gating the clock outside the HLS IP give much better results in terms of power consumption?
About your general notes:
**Our applications are very heavy, so we use only arbitrary precision data types (even for the "for loop" counter indices).
**Most of our applications are real time streaming data processing (data-flow without feedback) and must sustain an initiation interval of II = 1 (using pipelining, of course).
Our strategy is usually to divide the design into several IP cores, which reduces complexity and improves efficiency (and minimizes power consumption at the RTL level).
10-11-2018 08:56 AM
Yes - no matter how you disable the HLS block (reset, start, or gated clock), the resulting power savings should be similar. In any of those cases, no task is executing, the HLS block is unlikely to be enabled, and the block will effectively consume only static power.
Also - I've been asked to add - once you're in Vivado you can use power_opt_design to reduce logic power as well. This feature can analyze the design activity and create slice clock enables to suppress unnecessary toggling in a design.
If, for example, your design computes several multiplications but only for a fraction of the total time, power_opt_design can detect that and add a clock enable at the inputs of the multipliers so that they compute only when needed.