05-17-2017 06:49 AM
I'm new to power estimating with Vivado, using 2017.1, Kintex-7.
This question is about how to get the Total On-Chip Power estimate to update.
Right now, my incomplete design is showing 1.493W on the Power section of the Project Summary tab. The Confidence level is "Low". I clicked on "Implement Power Report", then on "Launch Power Constraint Advisor". Among the Low confidence nets, I found only 1 with fanout greater than 200 (it was 791). Coming from a system reset button, I set the static probability to 0 and the toggle rate to 0. I did some things I'll describe later, but the 1.493W estimate didn't change. Among the remaining low confidence nets, I found only 7 with fanout greater than 100 (they were all 148). Believing they were also reset related, I changed their static probability and toggle rate to 0 as well. The 1.493W estimate still didn't change.
I would expect at least a tiny change in the estimate. I believe it's not updating. In the "Power" tab is a refresh icon. After saving (OK) from the Power Constraints Advisor, I click this refresh icon. When I do so, I get the message "The report 'Top_power_routed' is up to date. Rerun anyway?". This already suggests things aren't updating. Why not? When I click OK to rerun, the estimate remains 1.493W. What do I need to do to make this update?
Those 7 with fanout greater than 100, as well as a number of others, appear to be the signal rd_rst_reg inside a number of FIFOs I'm using. While I created the FIFOs, I didn't create their innards. So I don't know for sure what this signal is. I suspect it's downstream from the rst input to the FIFO, but how can I be sure? If they are coming from the rst input, which only happens once at power up, I feel safe configuring the Static Probability and Toggle rate to zero. But what if they're not?
There are a bunch of other signals in the Power Constraints Advisor that have low or medium confidence that I know nothing about. How can I provide values for them if I don't know what they are?
My goal is to get a medium or high confidence in the power estimation. I figure this means I need to get rid of most of the low confidence lines in the Power Constraint Advisor, and perhaps also most of the medium ones. Of course, if I can get question #1 answered and start getting updates to the estimated wattage, I might be able to hedge my bets a little.
This is a general design question. Imagine I have 320M multiply-and-accumulates (MACs) that must be done every second and my system clock is 200 MHz. That means I must do 1.6 MACs per clock. I have a design choice. I might implement 4 MACs and feed them new data at a rate of 80 MHz. Or I might implement 32 MACs and feed them new data at a rate of 10 MHz.
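The throughput figures above can be checked with a short sketch. The function and variable names are illustrative only and do not come from any tool.

```python
# Sanity check of the throughput numbers quoted in the post.
REQUIRED_MACS_PER_S = 320e6   # 320M MACs every second (from the post)
SYSTEM_CLOCK_HZ = 200e6       # 200 MHz system clock (from the post)


def throughput(num_macs, data_rate_hz):
    """MAC operations per second for num_macs units, each fed at data_rate_hz."""
    return num_macs * data_rate_hz


# Average number of MACs that must complete per system clock cycle.
macs_per_clock = REQUIRED_MACS_PER_S / SYSTEM_CLOCK_HZ

option_a = throughput(4, 80e6)    # 4 MACs fed at 80 MHz
option_b = throughput(32, 10e6)   # 32 MACs fed at 10 MHz

print(macs_per_clock, option_a, option_b)
```

Both options deliver the same required rate, so the choice between them is about power and control complexity rather than throughput.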
I suspect that power "overhead" should be low, and because both of these solutions end up requiring the same number of signal transitions to perform the same amount of calculating, that the power consumed by these two strategies should be very similar. That is, whether I implement 4 MACs under medium load or 32 MACs under light load, the power consumption should be nearly the same.
Do you agree?
Note that I have enough resources to implement the 32 MACs, and control logic will be more simple. (I'm going to refrain from further hypothesizing about increased or reduced control logic power overhead, vs leakage that's there whether or not I use resources in the FPGA, vs whatever else.)
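One way to frame the question is the usual first-order CMOS model, in which dynamic power scales as activity x switched capacitance x V^2 x toggle frequency. The sketch below applies it to the two options. The capacitance, voltage, and activity values are placeholders chosen for illustration, not Kintex-7 data, and the model deliberately ignores static power, clock-tree power, glitching, and control logic.

```python
import math


def dynamic_power_w(num_macs, data_rate_hz, c_eff_f, vdd_v, activity=1.0):
    """First-order dynamic power: N * activity * C * V^2 * f.

    c_eff_f is a hypothetical effective switched capacitance per MAC.
    """
    return num_macs * activity * c_eff_f * vdd_v ** 2 * data_rate_hz


C_EFF_F = 1e-9   # assumed placeholder, farads per MAC
VDD_V = 1.0      # assumed core voltage for illustration

p_4_macs = dynamic_power_w(4, 80e6, C_EFF_F, VDD_V)
p_32_macs = dynamic_power_w(32, 10e6, C_EFF_F, VDD_V)

# Under this model the two options come out the same to first order.
print(p_4_macs, p_32_macs, math.isclose(p_4_macs, p_32_macs))
```

Under these assumptions the two come out equal, so any real difference has to come from the terms the model leaves out.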
You know, I looked in several places and couldn't find the max package power allowed for the Kintex-7 FFG676. I could start with that as a very top end unreasonable power estimate. But I couldn't find it. Where might I find that?
05-17-2017 08:00 AM
Use the power estimator spreadsheet to play around with heatsinks, ambient temperature, and airflow.
The maximum power to be dissipated depends on many things, and there is no specification for that. It is up to you to control the junction temperature such that it does not exceed the limit for your device: 85 C for commercial, 100 C for industrial, 125 C for military or automotive.
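As a rough illustration of why there is no single maximum-power number, the standard steady-state relation Tj = Ta + P x theta_JA can be rearranged to give an allowable power for a given cooling setup. The ambient temperature and theta_JA below are assumed placeholders; real values depend on the package, heatsink, and airflow.

```python
def max_power_w(tj_max_c, ambient_c, theta_ja_c_per_w):
    """Allowable dissipation from Tj = Ta + P * theta_JA, solved for P."""
    return (tj_max_c - ambient_c) / theta_ja_c_per_w


AMBIENT_C = 50.0   # assumed ambient temperature, deg C
THETA_JA = 10.0    # assumed junction-to-ambient resistance, deg C per watt

commercial_w = max_power_w(85.0, AMBIENT_C, THETA_JA)    # 85 C limit
industrial_w = max_power_w(100.0, AMBIENT_C, THETA_JA)   # 100 C limit

print(commercial_w, industrial_w)
```

Changing the heatsink or airflow changes theta_JA, which is why the allowable power moves with the cooling solution rather than being fixed by the package alone.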
Generally, a lower clock with more elements is lower power, for several reasons. One is that unused logic still dissipates static power, so you might as well use it. Higher clock rates also tend to increase the glitch power (signals arriving at different times rattling the LUT outputs). Lower power means lower internal temperature, which means less leakage. Less leakage means less power. Leakage increases exponentially with junction temperature.
The tools are only as good as the data you give them. From your post, it is clear you are not giving the tool anything useful. A captured simulation (long enough to reach your main functions, not just startup) allows the tools to do a much better job. Without that, the tool really has no idea what the signal toggle rates are.
A search for "power" on xilinx.com will reveal a wealth of resources and guides to do what I have described above.
05-17-2017 09:54 AM
Thanks for the info, but I need more info to move forward.
I looked at the power estimator spreadsheet, and it appears to be a LARGE AMOUNT of data entry to get an estimate. And where do I get that info from? I don't have a full time week in the budget allowed for this, and it looks like that much work or more. Why not at least *start* with the power estimate Vivado is giving me? (I've written a bunch of Verilog code, which puts one in a mindset of coding rather than wiring. I don't know how many signals there are or what the fanout is. Extracting that seems like a HUGE effort. It seems to me I could focus that same effort on finishing up the design faster, and then measure for real. So I don't see the value vs accuracy of the power estimator spreadsheet for a large project.)
Can you please answer my question about how to make Vivado update its estimate? That was "QUESTION ONE" in my original post. In turn, can you please give me insight into "QUESTION TWO"?
I think you answered my "QUESTION THREE", although not directly. I take it you're saying that 32 MACs clocked slower will use less power than 4 MACs clocked faster, especially because the other 28 unused MACs are still dissipating static power. (I did imagine that scenario already.) Now, I'm **not** talking about slowing down the clock to the 32 MACs. I'm just talking about changing the inputs more slowly. This should have a similar although not identical result. Would you still agree that 32 MACs given slower data should take less power? Thanks.
Thanks again. You've answered MANY of my questions in the past and I sincerely appreciate it. For real. Note I've been doing electronics for over 40 years. Three degrees in EE from Ga Tech, piled higher and deeper. I've backed into FPGA work in the past 2 years, never with the opportunity to do it the right way slowly from the front. That's why I find myself only now worrying about power in advance.
05-17-2017 10:10 AM - edited 05-17-2017 10:19 AM
As far as the Vivado power analyzer goes, I believe a simulation activity file (.saif) from the test bench is required.
Keeping the clock fast and slowing the data should still be an overall savings, but it will be less. I suggest slowing down the clock. Slowing down the clock is free (just one more clock output on an MMCM/PLL that is already there for you). Multiplexing/demultiplexing narrow fast to wide slow is not all that difficult.
And, at 1.5 watts, it looks like you are not doing anything at all. Check that the tools are keeping all your logic (it may be ripping out everything because things are not connected).
I believe you can write your design (export) to the data file needed by the spreadsheet (and vice-versa). Look in the TCL users guide ug835:
report_power, page 820, -xpe arg - (Optional) Output the results to an XML file for importing into XPower Estimator or XPower Analyzer
05-17-2017 11:51 AM
Thanks. Acknowledged. I'll follow your suggestions.
Note I'm aware of optimizer stripping stuff out, but in the Netlist I have checked in the past to make sure all major components are there, and even clicked on them to see highlight in the Device view. I am only using very little so far. It's mostly input/output with fifo flow through logic for now. I haven't yet confirmed my flow through in hardware, however, but as I said it seems to be placed. I haven't specified true input bandwidth yet because I haven't seen the 1.4?? number budge yet. But I do have internal traffic simulators/generators that should be speed self defined, and they're running all the time, never yet disabled.
QUESTION REMAINS: Any advice on question two? Related, is that list of nets in the Power Constraints Advisor only about reset logic and not actual data flowing?
05-17-2017 12:12 PM
Ignore fan-out. All interconnect is buffered, so it is a nit (hardly matters for any reason). Yes, a fan-out of 100,000 on a net might be a good indication to use a global clock buffer instead of tons of local routing.
Right now, you do not have enough logic or fast enough clock to get even the 5th least significant digit to change.
Think about it: hundreds of thousands of LUTs and DFFs at 300 MHz may only draw 5 to 10 W of dynamic power. So all you see is static power.
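To put rough numbers on that rule of thumb, the sketch below scales it linearly by element count and clock rate. Both the linear scaling and the specific counts (200,000 reference elements, a 1,000-element design) are assumptions for illustration only, not measurements or tool output.

```python
# Reference point taken loosely from the rule of thumb above.
REF_ELEMENTS = 200_000    # assumed reading of "hundreds of thousands"
REF_CLOCK_HZ = 300e6      # 300 MHz
REF_DYNAMIC_W = 5.0       # low end of the 5 to 10 W range

REPORTED_TOTAL_W = 1.493  # total on-chip power reported in the thread


def scaled_dynamic_w(elements, clock_hz):
    """Crude linear scaling of dynamic power with element count and clock."""
    return REF_DYNAMIC_W * (elements / REF_ELEMENTS) * (clock_hz / REF_CLOCK_HZ)


# Hypothetical small design: 1,000 active elements at 200 MHz.
small_design_w = scaled_dynamic_w(1_000, 200e6)
share_of_total = small_design_w / REPORTED_TOTAL_W

print(small_design_w, share_of_total)
```

With these assumed numbers the whole design's dynamic power is only a small fraction of the reported total, so changing the activity on a handful of nets would shift the estimate by far less still.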
You may wish to use a generate statement to create thousands (ten thousands) of copies to get some actual results.