UPGRADE YOUR BROWSER

We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

Adam Taylor’s MicroZed Chronicles, Part 201: SDK Performance Analysis and Profiling

by Xilinx Employee on ‎06-12-2017 11:18 AM (4,479 Views)

 

By Adam Taylor

 

We can create very responsive design solutions using Xilinx Zynq SoC or Zynq UltraScale+ MPSoC devices, which enble us to architect systems that exploit the advantages provided by both the PS (processor system) and the PL (programmable logic) in these devices. When we work with logic designs in the PL, we can optimize the performance of design techniques like pipelining and other UltraFast design methods. We can see the results of our optimization techniques using simulation and Vivado implementation results.

 

When it comes to optimizing the software, which runs on acceleration cores instantiated in the PS, things may appear a little more opaque. However, things are not what they might appear. We can gather statistics on our accelerated code with ease using the performance analysis capabilities built into XSDK. Using performance analysis, we can examine the performance of the software we have running on the acceleration cores and we can monitor AXI performance within the PL to ensure that the software design is optimized for the application at hand.

 

Using performance analysis, we can examine several aspects of our running code:

 

  • CPU Utilization – Percentage of non-idling CPU clock cycles
  • CPU Instructions Per Cycle – Estimated number of executed instructions per cycle
  • L1 Cache Data Miss Rate % – L1 data-cache miss rate
  • L1 Cache Access Per msec – Number of L1 data-cache accesses
  • CPU Write Instructions Stall per cycle – Estimated number of stall cycles per instruction
  • CPU Read Instructions Stall per cycle – Estimated number of stall cycles per instruction

 

For those who may not be familiar with the concept, a stall occurs when the cache does not contain the requested data, which must then be fetched from main memory. While the data is fetched, the core can continue to process different instructions using out-of-order (OOO) execution, however the processor will eventually run out of independent instructions. It will have to wait for the information it needs. This is called a stall.

 

We can gather these stall statistics thanks to the Performance Monitor Unit (PMU) contained within each of the Zynq UltraScale+ MPSoC’s CPUs. The PMU provides six profile counters, which are configured by and post processed by XSDK to generate the statistics above.

 

If we want to use the performance monitor within SDK, we need to work with a debug build and then open the Performance Monitor Perspective within XSDK. If we have not done so before, we can open the perspective as shown below:

 

 

Image1.jpg 

 

 

Image2.jpg

 

 

Opening the Performance Analysis Perspective

 

 

With the performance analysis perspective open, we can debug the application as normal. However, before we click on the run icon (the debugger should be set to stop at main, as default), we need to start the performance monitor. To do that, right click on the “System Debugger on Local” symbol within the performance monitor window and click start.

 

 

Image3.jpg

 

Starting the Performance Analysis

 

 

 

Then, once we execute the program, the statistics will be gathered and we can analyse them within XDSK to determine the best optimizations for our code.

 

To demonstrate how we can use this technique to deliver a more optimized system, I have created a design that runs on the ZedBoard and performs AES256 Encryption on 1024 packets of information. When this code was run the ZedBoard the following execution statistics were collected:

 

 

Image4.jpg

 

Performance Graphs

 

 

 

Image5.jpg

 

Performance Counters

 

 

 

So far, these performance statistics only look at code executing on the PS itself. Next time, we will look at how we can use the AXI Performance Monitor with XSDK. If we wish to do this, we need to first instrument the design in Vivado.

 

 

 

 

Code is available on Github as always.

 

If you want E book or hardback versions of previous MicroZed chronicle blogs, you can get them below.

 

 

 

  • First Year E Book here
  • First Year Hardback here.

 

 

MicroZed Chronicles hardcopy.jpg 

  

 

  • Second Year E Book here
  • Second Year Hardback here

 

 

MicroZed Chronicles Second Year.jpg 

 

 

Labels
About the Author
  • Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post! ******************** Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.