UPGRADE YOUR BROWSER

We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

Adam Taylor’s MicroZed Chronicles Part 98: SDSoC In depth Example Part 5

by Xilinx Employee ‎08-31-2015 11:15 AM - edited ‎01-06-2016 12:59 PM (53,533 Views)

 

By Adam Taylor

 

 

We last left this example with the AES encryption algorithm running on the PL (programmable logic) side of the device and taking 36010 processor clock cycles. That was the initial result after using SDSoC in a sort of “blind” way with no optimization, which gave a 2% performance improvement over the 36662 cycles needed to execute the AES algorithm in software running on the Zynq SoC’s ARM Cortex-A9 MPCore processor. In this blog, we will use SDSoC’s optimization commands and a few other tricks to significantly cut the number of clock cycles required to perform the encryption.

Accelerating the AES algorithm is slightly more complicated than the matrix multiplication algorithm we previously looked at. This is because the main loop of the AES algorithm is interdependent. That is, the result from one function must be computed before the next function can run.

 

The strategy I undertook for accelerating the AES algorithm is as follows:

 

 

  • Examine the loops to see where I could unroll them
  • Optimize the memory bandwidth
  • Select the correct frequency for the data motion clock frequency
  • Select the correct frequency for the hardware functions

 

I should mention that I am using the latest version of SDSoC 2015.2. This is slightly different than the version we were previously using and the new version introduces a configurable projects tab through which we can easily select the function we wish to accelerate and the clocks used for moving the data and for hardware functions.

 

 

Image1.jpg

 

As discussed in the previous blog post, the main loop of the AES encryption function consists of functions that perform each AES step. Consequently, the algorithm’s main loop consists of interdependent stages, unlike in the previous matrix multiplication example. Each function in the AES algorithm must be completed and the result computed before the next function can run. This is called interdependency and it requires us to use a different approach to acceleration in contrast to the previous pipelining example. To get the best performance for the AES algorithm, we must focus our efforts on the AES steps created as separate functions. There is plenty of potential for optimization within these steps. There is also some data-flow pipelining available for optimization, which we will look at in another blog.

 

Several AES functions—add round key, substitute bytes, and mix columns—can be pipelined for increased performance. Within these functions, we use the HLS Pipeline command by putting pragmas within the first loop. The inner loop should be unrolled. Several of these functions read from look up tables normally built from BRAM (Block RAM) and the memory bandwidth needs to be increased, so for this example I have specified the pragma parameter “complete” which implements the memory contents as discrete registers as opposed to a BRAM.

 

The ability to transfer the data between the PS (processor system) and the PL is also of key importance in boosting performance. My first step was to set the data motion clock network at its highest possible clock frequency: 200MHz. The second approach was to ensure that DMA was used for data transfer between the PS and PL. To do this, I had to re-write the interface slightly and use the sds_alloc function to ensure that the data was contiguous in memory, as required for a DMA transfer.

 

 

Image2.jpg

 

 

 

My third and final optimization step was to set the hardware functions lock rate at the highest frequency supported for this application it was 166.67MHz.

 

When I finally put these all together and build the example the code ran in 16544 processor clock cycles, which is 16544 / 36662 = 45% of the cycles needed when running the AES code in software alone. That’s a massive 55% reduction in execution time for a fairly complex and interdependent algorithm.

 

The code is available on Github as always.

 

 

 

 

 MicroZed Chronicles.jpg

 

 

 

 

 

 

 

Now, you can have convenient, low-cost Kindle access to the first year of Adam Taylor’s MicroZed Chronicles for a mere $7.50. Click here.

 

 

Please see the previous entries in this MicroZed Chronicles series by Adam Taylor:

 

Adam Taylor’s MicroZed Chronicles Part 97: SDSoC In depth Example Part 4

 

Adam Taylor’s MicroZed Chronicles Part 96: SDSoC In-Depth Example Part 3

 

Adam Taylor’s MicroZed Chronicles Part 95: SDSoC In-Depth Example Part 2

 

Adam Taylor’s MicroZed Chronicles Part 94: SDSoC In depth Example Part 1

 

Adam Taylor’s MicroZed Chronicles Part 93: SDSoC Debugging with Linux Part 9

 

Adam Taylor’s MicroZed Chronicles Part 92: SDSoC Verification & Build Issues Part 8

 

Adam Taylor’s MicroZed Chronicles Part 91: More on High-Level Synthesis and SDSoC, Part 7

 

Adam Taylor’s MicroZed Chronicles Part 90: Introduction to High-Level Synthesis and SDSoC, Part 6

 

Adam Taylor’s MicroZed Chronicles Part 89: SDSoC Optimization, Part 5

 

Adam Taylor’s MicroZed Chronicles Part 88: SDSoC Part 4—a look under the hood

 

Adam Taylor’s MicroZed Chronicles Part 87: Getting SDSoC up and running Part 3

 

Adam Taylor’s MicroZed Chronicles Part 86: Getting SDSoC up and running

 

Adam Taylor’s MicroZed Chronicles Part 85: SDSoC—the first instalment

 

Adam Taylor’s MicroZed(ish) Chronicles Part 84: Simple Communication Interfaces Part 4

 

Adam Taylor’s MicroZed(ish) Chronicles Part 83: Simple Communication Interfaces Part 3

 

Adam Taylor’s MicroZed(ish) Chronicles Part 82: Simple Communication Interfaces Part 2

 

Adam Taylor’s MicroZed(ish) Chronicles Part 81: Simple Communication Interfaces

 

Adam Taylor’s MicroZed Chronicles Part 80: LWIP Stack Configuration

 

Adam Taylor’s MicroZed Chronicles Chronicles Part 79: Zynq SoC Ethernet Part III

 

Adam Taylor’s MicroZed Chronicles Chronicles Part 78: Zynq SoC Ethernet Part II

 

Adam Taylor’s MicroZed Chronicles Microzed Chronicles Part 77 – Introducing the Zynq SoC’s Ethernet

 

Adam Taylor’s MicroZed Chronicles Part 76: Constraints for Relatively Placed Macros

 

Adam Taylor’s MicroZed Chronicles, Part 75: Placement Constraints – Pblocks

 

Adam Taylor’s MicroZed Chronicles, Part 73: Physical Constraints

 

Adam Taylor’s MicroZed Chronicles, Part 73: Working with other Zynq-Based Boards

 

Adam Taylor’s MicroZed Chronicles, Part 72: Multi-cycle Constraints

 

Adam Taylor’s MicroZed Chronicles, Part 70: Constraints—Clock Relationships and Avoiding Metastability

 

Adam Taylor’s MicroZed Chronicles, Part 70: Constraints—Introduction to timing and defining a clock

 

Adam Taylor’s MicroZed Chronicles Part 69: Zynq SoC Constraints Overview

 

Adam Taylor’s MicroZed Chronicles Part 68: AXI DMA Part 3, the Software

 

Adam Taylor’s MicroZed Chronicles Part 67: AXI DMA II

 

Adam Taylor’s MicroZed Chronicles Part 66: AXI DMA

 

Adam Taylor’s MicroZed Chronicles Part 65: Profiling Zynq Applications II

 

Adam Taylor’s MicroZed Chronicles Part 64: Profiling Zynq Applications

 

Adam Taylor’s MicroZed Chronicles Part 63: Debugging Zynq Applications

 

Adam Taylor’s MicroZed Chronicles Part 62: Answers to a question on the Zynq XADC

 

Adam Taylor’s MicroZed Chronicles Part 61: PicoBlaze Part Six

 

Adam Taylor’s MicroZed Chronicles Part 60: The Zynq and the PicoBlaze Part 5—controlling a CCD

 

Adam Taylor’s MicroZed Chronicles Part 59: The Zynq and the PicoBlaze Part 4

 

Adam Taylor’s MicroZed Chronicles Part 58: The Zynq and the PicoBlaze Part 3

 

Adam Taylor’s MicroZed Chronicles Part 57: The Zynq and the PicoBlaze Part Two

 

Adam Taylor’s MicroZed Chronicles Part 56: The Zynq and the PicoBlaze

 

Adam Taylor’s MicroZed Chronicles Part 55: Linux on the Zynq SoC

 

Adam Taylor’s MicroZed Chronicles Part 54: Peta Linux SDK for the Zynq SoC

 

Adam Taylor’s MicroZed Chronicles Part 53: Linux and SMP

 

Adam Taylor’s MicroZed Chronicles Part 52: One year and 151,000 views later. Big, Big Bonus PDF!

 

Adam Taylor’s MicroZed Chronicles Part 51: Interrupts and AMP

 

Adam Taylor’s MicroZed Chronicles Part 50: AMP and the Zynq SoC’s OCM (On-Chip Memory)

 

Adam Taylor’s MicroZed Chronicles Part 49: Using the Zynq SoC’s On-Chip Memory for AMP Communications

 

Adam Taylor’s MicroZed Chronicles Part 48: Bare-Metal AMP (Asymmetric Multiprocessing)

 

Adam Taylor’s MicroZed Chronicles Part 47: AMP—Asymmetric Multiprocessing on the Zynq SoC

 

Adam Taylor’s MicroZed Chronicles Part 46: Using both of the Zynq SoC’s ARM Cortex-A9 Cores

 

Adam Taylor’s MicroZed Chronicles Part 44: MicroZed Operating Systems—FreeRTOS

 

Adam Taylor’s MicroZed Chronicles Part 43: XADC Alarms and Interrupts 

 

Adam Taylor’s MicroZed Chronicles MicroZed Part 42: MicroZed Operating Systems Part 4

 

Adam Taylor’s MicroZed Chronicles MicroZed Part 41: MicroZed Operating Systems Part 3

 

Adam Taylor’s MicroZed Chronicles MicroZed Part 40: MicroZed Operating Systems Part Two

 

Adam Taylor’s MicroZed Chronicles MicroZed Part 39: MicroZed Operating Systems Part One

 

Adam Taylor’s MicroZed Chronicles MicroZed Part 38 – Answering a question on Interrupts

 

Adam Taylor’s MicroZed Chronicles Part 37: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 8

 

Adam Taylor’s MicroZed Chronicles Part 36: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 7

 

Adam Taylor’s MicroZed Chronicles Part 35: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 6

 

Adam Taylor’s MicroZed Chronicles Part 34: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 5

 

Adam Taylor’s MicroZed Chronicles Part 33: Driving Adafruit RGB NeoPixel LED arrays with the Zynq SoC

 

Adam Taylor’s MicroZed Chronicles Part 32: Driving Adafruit RGB NeoPixel LED arrays

 

Adam Taylor’s MicroZed Chronicles Part 31: Systems of Modules, Driving RGB NeoPixel LED arrays

 

 Adam Taylor’s MicroZed Chronicles Part 30: The MicroZed I/O Carrier Card

 

Zynq DMA Part Two – Adam Taylor’s MicroZed Chronicles Part 29

 

The Zynq PS/PL, Part Eight: Zynq DMA – Adam Taylor’s MicroZed Chronicles Part 28  

 

The Zynq PS/PL, Part Seven: Adam Taylor’s MicroZed Chronicles Part 27

 

The Zynq PS/PL, Part Six: Adam Taylor’s MicroZed Chronicles Part 26

 

The Zynq PS/PL, Part Five: Adam Taylor’s MicroZed Chronicles Part 25

 

The Zynq PS/PL, Part Four: Adam Taylor’s MicroZed Chronicles Part 24

 

The Zynq PS/PL, Part Three: Adam Taylor’s MicroZed Chronicles Part 23

 

The Zynq PS/PL, Part Two: Adam Taylor’s MicroZed Chronicles Part 22

 

The Zynq PS/PL, Part One: Adam Taylor’s MicroZed Chronicles Part 21

 

Introduction to the Zynq Triple Timer Counter Part Four: Adam Taylor’s MicroZed Chronicles Part 20

 

Introduction to the Zynq Triple Timer Counter Part Three: Adam Taylor’s MicroZed Chronicles Part 19

 

Introduction to the Zynq Triple Timer Counter Part Two: Adam Taylor’s MicroZed Chronicles Part 18

 

Introduction to the Zynq Triple Timer Counter Part One: Adam Taylor’s MicroZed Chronicles Part 17

 

The Zynq SoC’s Private Watchdog: Adam Taylor’s MicroZed Chronicles Part 16

 

Implementing the Zynq SoC’s Private Timer: Adam Taylor’s MicroZed Chronicles Part 15

 

MicroZed Timers, Clocks and Watchdogs: Adam Taylor’s MicroZed Chronicles Part 14

 

More About MicroZed Interrupts: Adam Taylor’s MicroZed Chronicles Part 13

 

MicroZed Interrupts: Adam Taylor’s MicroZed Chronicles Part 12

 

Using the MicroZed Button for Input: Adam Taylor’s MicroZed Chronicles Part 11

 

Driving the Zynq SoC's GPIO: Adam Taylor’s MicroZed Chronicles Part 10

 

Meet the Zynq MIO: Adam Taylor’s MicroZed Chronicles Part 9

 

MicroZed XADC Software: Adam Taylor’s MicroZed Chronicles Part 8

 

Getting the XADC Running on the MicroZed: Adam Taylor’s MicroZed Chronicles Part 7

 

A Boot Loader for MicroZed. Adam Taylor’s MicroZed Chronicles, Part 6 

 

Figuring out the MicroZed Boot Loader – Adam Taylor’s MicroZed Chronicles, Part 5

 

Running your programs on the MicroZed – Adam Taylor’s MicroZed Chronicles, Part 4

 

Zynq and MicroZed say “Hello World”-- Adam Taylor’s MicroZed Chronicles, Part 3

 

Adam Taylor’s MicroZed Chronicles: Setting the SW Scene

 

Bringing up the Avnet MicroZed with Vivado

 

 

 

Comments
by Visitor jaguarxk000
on ‎07-08-2016 02:42 AM
we trid the code from github.however,it took about 3800 cycles on ps only,and 7000 cycles on pl .
by Visitor jaguarxk000
on ‎07-08-2016 02:45 AM
platform is zc706,and os is standalone.
that really cofused us.
Labels
About the Author
  • Be sure to join the Xilinx LinkedIn group to get an update for every new Xcell Daily post! ******************** Steve Leibson is the Director of Strategic Marketing and Business Planning at Xilinx. He started as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He's served as Editor in Chief of EDN Magazine, Embedded Developers Journal, and Microprocessor Report. He has extensive experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.