This latest instalment of Adam Taylor's blog shows the result of a fixed-point math function implementation in the ARM-based Zynq SoC's programmable logic.
In previous posts in the MicroZed Chronicles series, we looked at how to implement fixed-point mathematics within the PL (programmable logic) side of the Zynq SoC. We now focus on implementing these functions within a system, and we will see the rather surprising results of doing so.
Before we get to cutting code, we need to determine the scaling factors (the location of the decimal point) that we will use in this specific implementation. In this example, the input signal will range between 0 and 10, so we can pack four integer bits and twelve fractional bits into a 16-bit input vector.
We are implementing the above equation, y = Ax² + Bx + C, which has three constants A, B, and C:
A = -0.0088, B = 1.7673, C = 131.29
We need to scale these constants in our implementation. The beauty of doing this in an FPGA is that we can scale each constant differently to optimize performance, as in this table:
As we implement the above equation, we will need to consider the expansion of the resultant vectors, which for the terms Ax² and Bx are defined below:
To perform the final addition with constant C, we need to have the decimal points aligned. Therefore, we need to divide the results of Ax² and Bx by a power of two to align their decimal points with that of C. The result will use the same format as C, which is 8,8.
Having calculated the above, we are ready to implement the design within the Vivado peripheral that we created in previous installments. The first implementation step is to open the block diagram view within Vivado, right click on the peripheral, and select “Edit in IP Packager”. Once the IP Packager opens, we can easily implement, within the top-level file, a simple process that performs the calculation over a number of clock cycles. (Five clocks in this example, although you could optimize this further.)
Now we can re-package and rebuild the project within Vivado (remember to update the version number) before exporting the updated hardware to SDK.
Once we are within SDK we can use the same approach as before with the exception of using a fixed-point number system now instead of the floating-point system used in the earlier example:
Although the numeric result is the same, the big difference is the time it takes to perform the calculation. The peripheral design needs only 5 clocks for the actual computation, yet generating the result consumes 140 clocks (420 ns), versus 25 CPU clocks when using the ARM Cortex-A9 processor on the PS side of the Zynq SoC.
Why the discrepancy? Shouldn’t the programmable logic be faster?
This is a lesson in peripheral I/O overhead. When using the PL side, we must take into account the latency of the AXI bus and the AXI bus frequency, which in this application is 142.8 MHz (150 MHz was requested). The AXI bus overhead accounts for the longer-than-expected computation time. However, all is not lost. We’re just doing it wrong. Offloading tasks to the Zynq SoC’s PL is not intended to be used in this manner, precisely because of this I/O overhead.
If we were to take a more reasonable approach, we would send a block of inputs requiring calculation to our peripheral using DMA, as I explained in part 1 of this blog series on PS/PL interfacing. This example establishes why DMA is so important, and in the next blog I will explore how we use this experimental result.
Please see the previous entries in this MicroZed series by Adam Taylor: