
Xilinx AXI SmartConnect IP now baked into Vivado 2016.1. How about a technical deep dive on this technology?

Xilinx Employee


More than a year ago, Xilinx announced SmartConnect interconnect automation for optimizing interconnects in complex systems built using Xilinx All Programmable devices based on the UltraScale architecture. Today, Xilinx announced that the recently released version 2016.1 of the Vivado Design Suite HLx Editions incorporates extensions to the SmartConnect technology, including new AXI SmartConnect IP, that give you an unprecedented performance boost for system designs that use 16nm UltraScale+ devices: 2x better than systems based on devices built with 28nm process technology. (Note: For the Vivado 2016.1 announcement, see “Vivado Design Suite—HLx Editions version 2016.1 now online, ready for download.”) Last year when I wrote about SmartConnect technology (see “SmartConnect: Interconnect design automation for UltraScale+ that cuts system area and power by 20% to 30%”), I could only write in general terms because the specifics were not public. With today’s announcement and the associated White Paper, I can now give you a lot more technical detail about this performance-boosting technology.


With this latest Vivado HLx 2016.1 release that includes AXI SmartConnect IP, Xilinx has extended SmartConnect technology with optimization techniques including useful skew optimization, time borrowing, retiming, and pipeline analysis that identify and mitigate system-performance bottlenecks without requiring heavy manual optimizations, extra latency insertion, or costly architecture redesign. Ideally, you’d like to have a highly automated way of optimizing all of this, including interconnect structures.


For devices based on the UltraScale architecture, Xilinx calls that automated interconnect optimization technique “SmartConnect.” SmartConnect technology boosts performance per watt of AXI interconnect by optimizing interconnect networks for performance and area, within the specific interconnectivity requirements inherent to the overall design.


Consider clock skew. The way we go fast in logic design is through the age-old technique called pipelining. Seymour Cray used this logic design technique during the 1960s to build what were then the world’s fastest mainframe computers. We put controlled amounts of logic between registers to divide and pipeline the work to be done. If we get things just right, there’s exactly the same amount of logic, and exactly the same amount of delay, between each pair of pipeline registers so that the entire pipeline runs at some maximum frequency. Only rarely do we get things just right. Increasingly, wire delay plays a large role in the overall delay of each pipeline stage, so the delays are never truly equal. There’s always one slowest stage in a pipeline, and its delay limits the overall pipeline clock frequency.
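To make the "slowest stage wins" point concrete, here is a minimal Python sketch. The stage delays are made-up numbers chosen for illustration, not figures from the White Paper, and the model ignores register setup time and clock uncertainty.

```python
def max_clock_mhz(stage_delays_ns):
    """Highest clock frequency (MHz) at which every stage still meets timing.

    The clock period must be at least as long as the slowest stage delay.
    Simplifying assumption: register setup time and clock uncertainty are zero.
    """
    slowest_ns = max(stage_delays_ns)
    return 1000.0 / slowest_ns


# Two hypothetical 4-stage pipelines with the same total logic delay (8 ns).
balanced = [2.0, 2.0, 2.0, 2.0]
unbalanced = [1.5, 2.5, 2.0, 2.0]

print(max_clock_mhz(balanced))    # 500.0
print(max_clock_mhz(unbalanced))  # 400.0
```

Both pipelines do the same amount of work, but in this toy model the unbalanced one runs 20% slower because a single 2.5 ns stage sets the clock period for every stage.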


The sledgehammer approach to fixing this problem is to add more pipeline registers and to divide the logic between registers ever more finely to produce ever shorter logic delays and to reduce wire delays. Although this technique works, it adds physical registers and pipeline latency. Adding registers increases power and energy consumption. If you really wanted to brute-force this approach, you’d sprinkle pipeline registers all across your FPGA just in case you might need them. This approach adds die area, degrades pipeline latency, and increases static and dynamic power consumption, which explains why Xilinx did not take this approach with SmartConnect technology.
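The cost of the sledgehammer approach can be sketched the same way. The Python below assumes an idealized split, where the slowest stage divides into two equal halves and the new register adds no delay of its own. Real designs rarely split that cleanly, and the delay values are hypothetical.

```python
def split_slowest_stage(stage_delays_ns):
    """Return a new list of stage delays with the slowest stage cut in half.

    Idealized assumption: the logic divides evenly and the extra pipeline
    register contributes no delay of its own.
    """
    delays = list(stage_delays_ns)
    i = delays.index(max(delays))
    half = delays[i] / 2
    delays[i:i + 1] = [half, half]
    return delays


def summarize(stage_delays_ns):
    """Clock period is set by the slowest stage; latency is one cycle per stage."""
    return {
        "period_ns": max(stage_delays_ns),
        "latency_cycles": len(stage_delays_ns),
    }


before = [1.5, 3.0, 2.0]              # hypothetical stage delays in ns
after = split_slowest_stage(before)   # [1.5, 1.5, 1.5, 2.0]

print(summarize(before))  # {'period_ns': 3.0, 'latency_cycles': 3}
print(summarize(after))   # {'period_ns': 2.0, 'latency_cycles': 4}
```

The clock period improves, but the design now carries one more rank of registers and one more cycle of latency, which is the trade-off described above.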


Instead, Xilinx designed several features into UltraScale+ devices including programmable delays in the leaf-clock buffers so that the Vivado design tools can adjust clock skew on a leaf-by-leaf basis to fully exploit useful clock skew in system designs. These leaf-clock buffers each have five discrete delay-tap settings that allow the Vivado router to automatically optimize clock delays. This feature is one aspect of the “ASIC-like clocking” available in All Programmable devices based on the Xilinx UltraScale architecture.


Here’s a diagram of this innovation from the SmartConnect White Paper:



[Figure: UltraScale leaf-clock buffer delay feature]


These programmable leaf-clock buffers allow the Vivado Design Suite router to automatically fix setup and hold violations without designer intervention. The router employs timing analysis to determine the exact tap setting for each leaf-clock buffer, which helps achieve timing closure at high clock rates. You do not want to deal manually with all of these skew-delay problems at the leaf level, and with SmartConnect technology you won’t.
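The idea behind useful skew can be illustrated with a toy model in Python. This is only a sketch of the general technique and not how the Vivado router works internally. The tap delays, path delays, and function names are all hypothetical; the only detail taken from the description above is that each leaf-clock buffer has five discrete tap settings. The sketch brute-forces every tap combination for a short register chain, pins the first and last registers at tap 0 (assuming they interface with the rest of the design), and treats register setup and hold times as zero.

```python
from itertools import product

# Hypothetical delays (ps) for the five taps; actual values are device-specific.
TAPS_PS = [0, 100, 200, 300, 400]


def min_period_ps(max_delays, min_delays, offsets):
    """Smallest clock period (ps) that meets setup for the given clock offsets.

    max_delays[i] / min_delays[i]: slowest / fastest path from register i to i+1.
    offsets[i]: extra clock delay applied at register i.
    Returns None if any stage has a hold violation.
    """
    period = 0
    for i, (dmax, dmin) in enumerate(zip(max_delays, min_delays)):
        skew = offsets[i + 1] - offsets[i]  # how much later the capture clock arrives
        if dmin < skew:                     # hold: fast data would beat the late clock
            return None
        period = max(period, dmax - skew)   # setup: a late capture clock buys time
    return period


def best_taps(max_delays, min_delays):
    """Try every tap combination for the interior registers; keep the best period."""
    best = None
    for interior in product(TAPS_PS, repeat=len(max_delays) - 1):
        offsets = (0, *interior, 0)  # assumption: boundary registers stay at tap 0
        period = min_period_ps(max_delays, min_delays, offsets)
        if period is not None and (best is None or period < best[0]):
            best = (period, offsets)
    return best


# Hypothetical three-register chain: a slow stage followed by a faster one.
max_delays = [2500, 2100]
min_delays = [800, 900]

print(min_period_ps(max_delays, min_delays, (0, 0, 0)))  # 2500
print(best_taps(max_delays, min_delays))                 # (2300, (0, 200, 0))
```

In this example, delaying the middle register's clock by 200 ps lends time from the faster stage to the slower one, so the achievable period drops from 2500 ps to 2300 ps without adding a register or a cycle of latency.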



The leaf-clock buffers in UltraScale+ devices and the ability of the latest Vivado Design Suite tools to exploit the benefits of these buffers are what Xilinx means when it says that UltraScale+ devices and the Vivado Design Suite are “co-optimized.” You can easily see the benefits of such co-optimization for pipelined function blocks and for interconnect. The White Paper discusses other related co-optimizations.


In addition to the SmartConnect tool optimizations, Xilinx has now introduced AXI SmartConnect IP to really automate the optimal design of large, IP-based systems. Here’s a diagram from the White Paper that illustrates the use of this IP:



[Figure: AXI SmartConnect IP]


As you can see from the diagram, the entire AXI SmartConnect IP appears as one IP block. It’s a simple exercise to draw the 14 wires needed to connect the IP blocks comprising a very complex system based on the Xilinx Zynq UltraScale+ MPSoC heterogeneous processor complex with a large number of DMA and memory controllers. SmartConnect optimizations are baked into the AXI SmartConnect IP block.



There’s a lot more technical detail in the White Paper “Breakthrough UltraScale+ Device Performance with SmartConnect Technology,” so I recommend that you download and read it.
