With the Vivado HLx 2016.1 release, which includes AXI SmartConnect IP, Xilinx has extended SmartConnect technology with optimization techniques including useful-skew optimization, time borrowing, retiming, and pipeline analysis. These techniques identify and mitigate system-performance bottlenecks without requiring heavy manual optimization, extra latency insertion, or costly architectural redesign. Ideally, you’d like to have a highly automated way of optimizing all of this, including interconnect structures.
For devices based on the UltraScale architecture, Xilinx calls that automated interconnect optimization technique “SmartConnect.” SmartConnect technology boosts performance per watt of AXI interconnect by optimizing interconnect networks for performance and area, within the specific interconnectivity requirements inherent to the overall design.
Consider clock skew. The way we go fast in logic design is through the age-old technique called pipelining. Seymour Cray used this logic design technique during the 1960s to build what were then the world’s fastest mainframe computers. We put controlled amounts of logic between registers to divide and pipeline the work to be done. If we get things just right, there’s exactly the same amount of logic, and therefore exactly the same amount of delay, between each pair of pipeline registers so that the entire pipeline runs at some maximum frequency. Only rarely do we get things just right. Increasingly, wire delay plays a large role in the overall delay of each pipeline stage, so the delays are never truly equal. There’s always one slowest stage in a pipeline, and it limits the overall pipeline clock frequency.
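To make that last point concrete, here is a minimal Python sketch. The stage delays are made-up numbers, and the model assumes the clock period only has to cover the slowest stage’s delay (it ignores register setup time, clock-to-output delay, and clock uncertainty).

```python
def max_clock_mhz(stage_delays_ns):
    """Return the highest clock frequency (MHz) a pipeline can run at.

    Simplifying assumption: the clock period must be at least as long as
    the slowest stage delay; register overheads are ignored.
    """
    slowest_ns = max(stage_delays_ns)
    return 1000.0 / slowest_ns  # 1000 / period in ns = frequency in MHz


# Two hypothetical pipelines with the same total delay (8.0 ns).
balanced = [2.0, 2.0, 2.0, 2.0]
unbalanced = [1.5, 2.5, 1.8, 2.2]

print(max_clock_mhz(balanced))    # 500.0
print(max_clock_mhz(unbalanced))  # 400.0
```

Both pipelines do the same total amount of work, but the unbalanced one loses 100 MHz to its single 2.5 ns stage.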
The sledgehammer approach to fixing this problem is to add more pipeline registers and to divide the logic between registers ever more finely to produce ever shorter logic delays and to reduce wire delays. Although this technique works, it adds physical registers and pipeline latency. Adding registers increases power and energy consumption. If you really wanted to brute-force this approach, you’d sprinkle pipeline registers all across your FPGA just in case you might need them. This approach adds die area, degrades pipeline latency, and increases static and dynamic power consumption, which explains why Xilinx did not take this approach with SmartConnect technology.
Instead, Xilinx designed several features into UltraScale+ devices including programmable delays in the leaf-clock buffers so that the Vivado design tools can adjust clock skew on a leaf-by-leaf basis to fully exploit useful clock skew in system designs. These leaf-clock buffers each have five discrete delay-tap settings that allow the Vivado router to automatically optimize clock delays. This feature is one aspect of the “ASIC-like clocking” available in All Programmable devices based on the Xilinx UltraScale architecture.
Here’s a diagram of this innovation from the SmartConnect White Paper:
These programmable leaf-clock buffers allow the Vivado Design Suite router to automatically fix setup and hold violations without designer intervention. The router employs timing analysis to determine the exact tap setting for each leaf-clock buffer, which helps achieve timing closure at high clock rates. You do not want to deal manually with all of these skew-delay problems at the leaf level, and with SmartConnect technology, you won’t.
The leaf-clock buffers in UltraScale+ devices and the ability of the latest Vivado Design Suite tools to exploit the benefits of these buffers are what Xilinx means when it says that UltraScale+ devices and the Vivado Design Suite are “co-optimized.” You can easily see the benefits of such co-optimization for pipelined function blocks and for interconnect. The White Paper discusses other related co-optimizations.
In addition to the SmartConnect tool optimizations, Xilinx has now introduced AXI SmartConnect IP to really automate the optimal design of large, IP-based systems. Here’s a diagram from the White Paper that illustrates the use of this IP:
As you can see from the diagram, the entire AXI SmartConnect IP appears as one IP block. It’s a simple exercise to draw the 14 wires needed to connect the IP blocks comprising a very complex system based on the Xilinx Zynq UltraScale+ MPSoC heterogeneous processor complex with a large number of DMA and memory controllers. SmartConnect optimizations are baked into the AXI SmartConnect IP block.