# Adam Taylor’s MicroZed Chronicles Part 98: SDSoC In depth Example Part 5


We last left this example with the AES encryption algorithm running on the PL (programmable logic) side of the device and taking 36010 processor clock cycles. That was the initial result after using SDSoC in a sort of “blind” way with no optimization, which gave a 2% performance improvement over the 36662 cycles needed to execute the AES algorithm in software running on the Zynq SoC’s ARM Cortex-A9 MPCore processor. In this blog, we will use SDSoC’s optimization commands and a few other tricks to significantly cut the number of clock cycles required to perform the encryption.

Accelerating the AES algorithm is slightly more complicated than the matrix multiplication algorithm we previously looked at. This is because the main loop of the AES algorithm is interdependent. That is, the result from one function must be computed before the next function can run.

The strategy I undertook for accelerating the AES algorithm is as follows:

• Examine the loops to see where I could unroll them
• Optimize the memory bandwidth
• Select the correct frequency for the data motion clock
• Select the correct frequency for the hardware functions

I should mention that I am now using the latest version of SDSoC, 2015.2. It differs slightly from the version we were previously using: the new version introduces a configurable projects tab through which we can easily select the function we wish to accelerate and the clocks used for moving the data and for the hardware functions.

As discussed in the previous blog post, the main loop of the AES encryption function calls a separate function for each AES step. Unlike the matrix multiplication example, each of these functions must complete and return its result before the next one can run. This interdependency requires a different approach to acceleration than the pipelining we used in the previous example. To get the best performance for the AES algorithm, we must focus our efforts on the individual AES step functions, where there is plenty of potential for optimization. There is also some data-flow pipelining available for optimization, which we will look at in another blog.

Several AES functions (add round key, substitute bytes, and mix columns) can be pipelined for increased performance. Within these functions, we apply the HLS pipeline directive by placing its pragma inside the outer loop, and we unroll the inner loop. Several of these functions read from lookup tables that would normally be built from BRAM (Block RAM), which limits the memory bandwidth. To increase that bandwidth, I have specified the pragma parameter “complete” for this example, which implements the memory contents as discrete registers instead of a BRAM.
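The pragma placement described above might look something like the following sketch. It is not the code from the project: the function names, the `NB` constant, and the 16-entry `toy_table` (a placeholder standing in for the real 256-entry AES S-box) are assumptions made for illustration. The pragmas use the Vivado HLS spellings `PIPELINE`, `UNROLL`, and `ARRAY_PARTITION ... complete`; a normal C compiler simply ignores them, so the sketch can also be checked in software.

```c
#include <stdint.h>

#define NB 4  /* the AES state is a 4x4 matrix of bytes */

/* Placeholder 16-entry lookup table (NOT the real AES S-box),
 * used only to show how a table can be partitioned. */
static const uint8_t toy_table[16] = {
    0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
    0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2
};

/* XOR the round key into the state, one row per pipeline iteration. */
void add_round_key(uint8_t state[NB][NB], const uint8_t key[NB][NB])
{
    for (int r = 0; r < NB; r++) {
#pragma HLS PIPELINE II=1
        for (int c = 0; c < NB; c++) {
#pragma HLS UNROLL
            state[r][c] ^= key[r][c];
        }
    }
}

/* Table-based substitution: each nibble of each byte is replaced
 * through toy_table. Partitioning the table completely turns it into
 * discrete registers, so every lookup can happen in the same cycle. */
void substitute_nibbles(uint8_t state[NB][NB])
{
#pragma HLS ARRAY_PARTITION variable=toy_table complete dim=1
    for (int r = 0; r < NB; r++) {
#pragma HLS PIPELINE II=1
        for (int c = 0; c < NB; c++) {
#pragma HLS UNROLL
            uint8_t b = state[r][c];
            state[r][c] = (uint8_t)((toy_table[b >> 4] << 4) | toy_table[b & 0x0F]);
        }
    }
}
```

The trade-off is area: a completely partitioned table costs registers and multiplexers rather than a BRAM, which is usually acceptable for small tables but grows quickly with table size.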

The ability to transfer data quickly between the PS (processor system) and the PL is also of key importance in boosting performance. My first step was to set the data motion clock network to its highest possible clock frequency: 200MHz. My second step was to ensure that DMA was used for data transfer between the PS and PL. To do this, I had to rewrite the interface slightly and use the sds_alloc function to ensure that the data was contiguous in memory, as required for a DMA transfer.
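As a rough illustration of the sds_alloc change, the sketch below allocates the input and output blocks with `sds_alloc` before calling the hardware function. The names `aes_block_hw` and `encrypt_one_block` are hypothetical, and the body of `aes_block_hw` is a placeholder (a byte-wise inversion, not AES) so that the example stays short. I am also assuming that the `__SDSCC__` macro is defined when building with the SDSoC compilers, so that the same file falls back to `malloc`/`free` on an ordinary host compiler, and that the `access_pattern` data pragma is the appropriate way to mark the arguments as sequentially accessed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef __SDSCC__
#include "sds_lib.h"            /* SDSoC: sds_alloc(), sds_free() */
#else
/* Host-only fallback so the sketch builds without SDSoC (assumption). */
#define sds_alloc(n) malloc(n)
#define sds_free(p)  free(p)
#endif

#define BLOCK_BYTES 16          /* one AES block */

/* Hypothetical hardware function. The body is a placeholder, NOT AES.
 * Sequential access lets the tool stream the arrays over DMA. */
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
void aes_block_hw(const uint8_t in[BLOCK_BYTES], uint8_t out[BLOCK_BYTES])
{
    for (int i = 0; i < BLOCK_BYTES; i++) {
        out[i] = (uint8_t)(in[i] ^ 0xFF);
    }
}

/* Copy the caller's data into physically contiguous buffers,
 * run the hardware function, and copy the result back.
 * Returns 0 on success, -1 if an allocation fails. */
int encrypt_one_block(const uint8_t plain[BLOCK_BYTES], uint8_t cipher[BLOCK_BYTES])
{
    uint8_t *in  = sds_alloc(BLOCK_BYTES);
    uint8_t *out = sds_alloc(BLOCK_BYTES);
    int rc = -1;

    if (in != NULL && out != NULL) {
        memcpy(in, plain, BLOCK_BYTES);
        aes_block_hw(in, out);
        memcpy(cipher, out, BLOCK_BYTES);
        rc = 0;
    }
    if (in != NULL)  sds_free(in);
    if (out != NULL) sds_free(out);
    return rc;
}
```

In a real application the buffers would normally be allocated once and reused across many blocks, since allocating and copying for every call would eat into the cycles saved by the DMA transfer.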

My third and final optimization step was to set the hardware functions’ clock rate to the highest frequency supported. For this application, it was 166.67MHz.

When I finally put all of these together and built the example, the code ran in 16544 processor clock cycles, which is 16544 / 36662 = 45% of the cycles needed when running the AES code in software alone. That’s a massive 55% reduction in execution time for a fairly complex and interdependent algorithm.
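Expressed as ratios of the cycle counts quoted in this post (software only, unoptimized hardware, optimized hardware), the same results work out as:

$$
\frac{36010}{36662} \approx 0.982, \qquad
\frac{16544}{36662} \approx 0.451, \qquad
\frac{36662}{16544} \approx 2.22
$$

In other words, the unoptimized build saved roughly 2% of the cycles, while the optimized build runs the encryption about 2.2 times faster than the software-only version.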

The code is available on GitHub as always.

Please see the previous entries in this MicroZed Chronicles series by Adam Taylor:

Adam Taylor’s MicroZed(ish) Chronicles Part 83: Simple Communication Interfaces Part 3

Adam Taylor’s MicroZed Chronicles Microzed Chronicles Part 77 – Introducing the Zynq SoC’s Ethernet

Adam Taylor’s MicroZed Chronicles, Part 70: Constraints—Introduction to timing and defining a clock

Adam Taylor’s MicroZed Chronicles Part 61: PicoBlaze Part Six

Adam Taylor’s MicroZed Chronicles Part 59: The Zynq and the PicoBlaze Part 4

Adam Taylor’s MicroZed Chronicles Part 58: The Zynq and the PicoBlaze Part 3

Adam Taylor’s MicroZed Chronicles Part 56: The Zynq and the PicoBlaze

Adam Taylor’s MicroZed Chronicles Part 55: Linux on the Zynq SoC

Adam Taylor’s MicroZed Chronicles Part 52: One year and 151,000 views later. Big, Big Bonus PDF!

Adam Taylor’s MicroZed Chronicles Part 46: Using both of the Zynq SoC’s ARM Cortex-A9 Cores

Adam Taylor’s MicroZed Chronicles Part 44: MicroZed Operating Systems—FreeRTOS

Adam Taylor’s MicroZed Chronicles Part 43: XADC Alarms and Interrupts

Adam Taylor’s MicroZed Chronicles MicroZed Part 42: MicroZed Operating Systems Part 4

Adam Taylor’s MicroZed Chronicles MicroZed Part 41: MicroZed Operating Systems Part 3

Adam Taylor’s MicroZed Chronicles MicroZed Part 40: MicroZed Operating Systems Part Two

Adam Taylor’s MicroZed Chronicles MicroZed Part 39: MicroZed Operating Systems Part One

Adam Taylor’s MicroZed Chronicles MicroZed Part 38 – Answering a question on Interrupts

Adam Taylor’s MicroZed Chronicles Part 37: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 8

Adam Taylor’s MicroZed Chronicles Part 36: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 7

Adam Taylor’s MicroZed Chronicles Part 35: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 6

Adam Taylor’s MicroZed Chronicles Part 34: Driving Adafruit RGB NeoPixel LED arrays with MicroZed Part 5

Adam Taylor’s MicroZed Chronicles Part 33: Driving Adafruit RGB NeoPixel LED arrays with the Zynq SoC

Adam Taylor’s MicroZed Chronicles Part 32: Driving Adafruit RGB NeoPixel LED arrays

Adam Taylor’s MicroZed Chronicles Part 31: Systems of Modules, Driving RGB NeoPixel LED arrays

Adam Taylor’s MicroZed Chronicles Part 30: The MicroZed I/O Carrier Card

Zynq DMA Part Two – Adam Taylor’s MicroZed Chronicles Part 29

The Zynq PS/PL, Part Seven: Adam Taylor’s MicroZed Chronicles Part 27

The Zynq PS/PL, Part Six: Adam Taylor’s MicroZed Chronicles Part 26

The Zynq PS/PL, Part Five: Adam Taylor’s MicroZed Chronicles Part 25

The Zynq PS/PL, Part Four: Adam Taylor’s MicroZed Chronicles Part 24

The Zynq PS/PL, Part Three: Adam Taylor’s MicroZed Chronicles Part 23

The Zynq PS/PL, Part Two: Adam Taylor’s MicroZed Chronicles Part 22

The Zynq PS/PL, Part One: Adam Taylor’s MicroZed Chronicles Part 21

Introduction to the Zynq Triple Timer Counter Part Four: Adam Taylor’s MicroZed Chronicles Part 20

Introduction to the Zynq Triple Timer Counter Part Three: Adam Taylor’s MicroZed Chronicles Part 19

Introduction to the Zynq Triple Timer Counter Part Two: Adam Taylor’s MicroZed Chronicles Part 18

Introduction to the Zynq Triple Timer Counter Part One: Adam Taylor’s MicroZed Chronicles Part 17

The Zynq SoC’s Private Watchdog: Adam Taylor’s MicroZed Chronicles Part 16

Implementing the Zynq SoC’s Private Timer: Adam Taylor’s MicroZed Chronicles Part 15

MicroZed Timers, Clocks and Watchdogs: Adam Taylor’s MicroZed Chronicles Part 14

More About MicroZed Interrupts: Adam Taylor’s MicroZed Chronicles Part 13

MicroZed Interrupts: Adam Taylor’s MicroZed Chronicles Part 12

Using the MicroZed Button for Input: Adam Taylor’s MicroZed Chronicles Part 11

Driving the Zynq SoC's GPIO: Adam Taylor’s MicroZed Chronicles Part 10

Meet the Zynq MIO: Adam Taylor’s MicroZed Chronicles Part 9

MicroZed XADC Software: Adam Taylor’s MicroZed Chronicles Part 8

Getting the XADC Running on the MicroZed: Adam Taylor’s MicroZed Chronicles Part 7

A Boot Loader for MicroZed. Adam Taylor’s MicroZed Chronicles, Part 6

Figuring out the MicroZed Boot Loader – Adam Taylor’s MicroZed Chronicles, Part 5

Running your programs on the MicroZed – Adam Taylor’s MicroZed Chronicles, Part 4

Zynq and MicroZed say “Hello World”-- Adam Taylor’s MicroZed Chronicles, Part 3

Adam Taylor’s MicroZed Chronicles: Setting the SW Scene

Bringing up the Avnet MicroZed with Vivado
