Robert Roe recently published an overview article titled “Will OpenCL open the gates for FPGAs?” on the Scientific Computing World Web site about the use of FPGAs in a wide variety of applications. However, his article covers a lot more ground than just OpenCL. Some of the quotes in the article can really set you thinking if you’re stuck with the 20th-century impression that FPGAs are mostly good for glue logic. Here’s a quote from the article by Larry Getman, VP of Strategic Marketing and Planning at Xilinx:
“When FPGAs first started they could do very basic things such as Boolean algebra and it was really used for glue logic. Over the years, FPGAs have really advanced and evolved with more hardened structures which are much more specialized.”
The maximum DSP count for a 16nm Virtex UltraScale+ All Programmable device is 11904 DSP48E2 slices. That’s a big, big number. It’s big enough to get me wondering about the evolution of DSP slices within Xilinx All Programmable devices. After all, 11904 DSP slices is huge, but how does that number compare with prior FPGA generations? I did a little research in the online data sheets and product tables going back to the days of Virtex-4 devices. The rapid rise in DSP slice count clearly indicates why FPGAs have pretty much taken over the world’s DSP heavy lifting. Conventional DSP processors simply cannot muster the MACs of the massively parallel armies of DSP slices in a Xilinx FPGA.
Here are the numbers for Xilinx high-end and mid-range devices going back to the Virtex-4 generation:
And here’s a plot of the data for the more visually inclined:
Note that the Virtex-4 device generation is fabricated with 90nm IC process technology and the UltraScale+ generation employs 16nm FinFET process technology—a 6-generation span. Also, this simple analysis doesn't account for the evolution and enhancements to the basic DSP48 slice over time. It's just a simple numerical count.
I’ve not included data for Xilinx low-end devices (Spartan-6 and Artix-7), although if you need a lot of DSP the 28nm Artix-7 family’s maximum count of 740 DSP slices represents a pretty formidable resource for DSP crunching and bests the maximum DSP count of the high-end Virtex-4 generation by nearly 50% (plus a convenient speed boost thanks to process-technology advances from 90nm to 28nm and a greatly enhanced DSP48 design with an expanded multiplier).
PCIe is a standard system interconnect, thanks in no small part to the interface’s huge success in the PC market. Xilinx has just introduced three new PCIe-based reference designs and documents that help you use the PCIe interface in your Kintex UltraScale FPGA designs:
UG918 KCU105 PCI Express Control Plane TRD User Guide: The PCI Express Control Plane TRD targets the Kintex UltraScale XCKU040-2FFVA1156E FPGA running on the KCU105 evaluation board. It demonstrates a control plane application using a PCI Express Endpoint block in a x1 Gen1 configuration. Simple base address register (BAR)-mapped read and write transactions are demonstrated using a kernel mode software driver controlled by the Control & Monitoring graphical user interface (GUI).
UG919 KCU105 PCI Express Memory-Mapped Data Plane TRD User Guide: The PCIe memory mapped data plane TRD targets the Kintex UltraScale XCKU040-2FFVA1156E FPGA running on the KCU105 evaluation board. It demonstrates an AXI memory mapped data plane application using a PCI Express (PCIe) Endpoint block in x8 Gen3 configuration through the use of high performance Expresso DMA from Northwest Logic. The AXI Bridge from Northwest Logic is used to demonstrate PCIe to AXI conversion of transactions. The downstream slaves include a power monitor module, user space registers, and an AXI performance monitor.
UG920 KCU105 PCI Express Streaming Data Plane TRD User Guide: The PCI Express Streaming Data Plane TRD targets the Kintex UltraScale XCKU040-2FFVA1156E FPGA running on the KCU105 evaluation board. It demonstrates an AXI streaming data plane application using a PCI Express Endpoint block in a x8 Gen2 configuration through use of the high performance Expresso DMA from Northwest Logic. Design option control is provided through the Control & Monitoring graphical user interface (GUI). The AXI Bridge from Northwest Logic is used to demonstrate PCIe-to-AXI conversion of transactions and vice versa.
By Adam Taylor
Having looked at how we constrain I/O in the previous blog, the next logical step is to look at how we constrain placement and routing of our design within the FPGA. You can use placement constraints for a number of reasons: to help achieve timing or maybe to provide isolation between sections of the design. Before we delve too deeply into this topic, there are a few terms we need to define:
A LOC allows you to define a slice or other location within the device. A BEL constraint allows you to target at a finer granularity than the LOC and identify the specific flip-flop to use within the slice. While PBlocks can be used to group logic together, they are also used for defining logical regions when we wish to perform partial reconfiguration.
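As a rough illustration of these three terms, here is a minimal Python sketch that prints the kind of XDC lines involved. The helper functions, the cell names (u_sync/meta_reg, u_filter), the site and BEL names, and the Pblock range are all hypothetical placeholders and do not come from the blog. BEL is emitted before LOC on the assumption that Vivado wants the BEL assigned first when both are applied to the same cell.

```python
def place_cell(cell, site, bel=None):
    """Return XDC lines that pin one cell to a site (LOC) and optionally a BEL."""
    lines = []
    if bel is not None:
        # Finer granularity than LOC: choose the element inside the site.
        lines.append(f"set_property BEL {bel} [get_cells {{{cell}}}]")
    lines.append(f"set_property LOC {site} [get_cells {{{cell}}}]")
    return lines


def make_pblock(name, cells, site_range):
    """Return XDC lines that group cells into a Pblock covering site_range."""
    lines = [f"create_pblock {name}"]
    for cell in cells:
        lines.append(
            f"add_cells_to_pblock [get_pblocks {name}] [get_cells {{{cell}}}]"
        )
    lines.append(f"resize_pblock [get_pblocks {name}] -add {{{site_range}}}")
    return lines


if __name__ == "__main__":
    # Hypothetical cell, site, and range names, for illustration only.
    xdc = place_cell("u_sync/meta_reg", "SLICE_X0Y0", bel="AFF")
    xdc += make_pblock("pblock_filter", ["u_filter"], "SLICE_X0Y0:SLICE_X15Y49")
    print("\n".join(xdc))
```

The printed lines could be pasted into a constraints file once the placeholder names are swapped for real cells and sites from the target device.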
Can you pass on your low-fat soy double mocha cappuccino with an extra two half-caf espresso shots today? Then you can pick up the new Kindle version of Adam Taylor’s “complete” (to date), 311-page MicroZed Chronicles truthfully titled “Zynq 101.” Take my word for it. The book will stay with you much longer than the empty calories.
There are some awesomely powerful system-analysis tools now under the hood of the Xilinx SDK. You’re going to want to know about these tools if you’re using Zynq SoCs or considering them for your next design. Here’s a 4.5-minute video with an overview and demo of these tools. Let me caution you, you’re going to need to let the first minute and a half worth of marketing just walk on by but then the video comes to the meat you’re looking for. Here’s the video:
For some hard technical facts, see Forrest Picket’s excellent, 36-page app note, XAPP1219, “System Performance Analysis of an All Programmable SoC.”
By Adam Taylor
Having looked at timing-related constraints, we would be remiss if we did not consider the physical constraints that we can apply to our design. The physical constraints an engineer uses most often are the placement of I/O pins and the definition of parameters associated with each I/O pin (standard, drive strength, etc.). However, there are other types of physical constraints:
As always, there are a few constraints that sit outside these groups. Vivado has three and they are predominantly used on the Netlist:
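The I/O pin placement and per-pin parameters mentioned at the start of this post map onto a handful of port properties in XDC. The following Python sketch only formats such lines; the helper name, the port name, the package pin, and the chosen standard and drive strength are hypothetical and would need to match a real board.

```python
def io_constraints(port, pin, iostandard, drive=None):
    """Return XDC lines that place one port on a package pin and set its I/O parameters."""
    target = f"[get_ports {{{port}}}]"
    lines = [
        f"set_property PACKAGE_PIN {pin} {target}",
        f"set_property IOSTANDARD {iostandard} {target}",
    ]
    if drive is not None:
        # Drive strength in mA; only meaningful for standards that support it.
        lines.append(f"set_property DRIVE {drive} {target}")
    return lines


if __name__ == "__main__":
    # Hypothetical port and pin, for illustration only.
    print("\n".join(io_constraints("led[0]", "T22", "LVCMOS33", drive=8)))
```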
The Coherent Accelerator Processor Interface (CAPI) on IBM POWER8 server systems allows solution architects to improve system-level performance by connecting custom acceleration engines to the coherent fabric of the POWER8 multicore processor chip, which results in a simple programming paradigm while delivering performance well beyond today's I/O-attached acceleration engines. Convey Computer announced an initial version of its CAPI Development Kit based on its Eagle PCIe coprocessor board at last week’s Open Power Summit held in Silicon Valley. The PCIe Eagle coprocessor/hardware accelerator combines a Xilinx Virtex-7 980T FPGA with a large amount of on-board memory (16 or 32 Gbytes, 16Gbytes included in the CAPI Development Kit). The board dissipates only 75W.
CAPI relies on a Power Service Layer (PSL) loaded in the coprocessor FPGAs to provide address translation and caching for the hardware accelerator. A Coherent Accelerator Processor Proxy (CAPP) in the POWER8 chip participates directly in the POWER8 coherency protocols on behalf of the coprocessor, ensuring a consistent view of memory within the virtual address space. To a program running on a POWER8 processor, the accelerator looks like another thread running on the host processor. Here’s a simple diagram showing the link between the IBM POWER8 multicore processor and the Convey Eagle Accelerator.
SP Devices has just announced a line of 2Gsamples/sec; 14-bit; 1-, 2-, or 4-channel digitizer boards called the ADQ14 series. There are 18 (!) boards in the series with various combinations of channels, coupling (ac, dc, dc with gain), and sample rate (500Msamples/sec, 1Gsamples/sec, 2Gsamples/sec).
All members of the ADQ14 product family have an on-board Xilinx Kintex-7 K325T FPGA for managing the data acquisition and channel sequencing. There’s also a user area in the FPGA to permit additional functions to be added. Currently the company is offering three additional functions as firmware add-ons:
In the run-up to next week’s OFC 2015 event in Los Angeles, Lightwaveonline.com has just published the 2015 Lightwave Innovation Award Elite Scores, which recognize both SDNet and SDAccel from Xilinx as innovative, groundbreaking design tools that set new standards. Both products earned a score of 4.5 (out of 5) from a panel of judges and they are the only design tools listed in this year’s awards.
That score of 4.5 means that SDNet and SDAccel were each considered a “superb product that sets new standards for performance,” “groundbreaking,” and as setting “new technical milestones.” Each product was evaluated by three judges knowledgeable in the relevant technology or application. The 10-member judging panel for the 2015 Lightwave Innovation Awards included:
The Xilinx SDx family of development environments allows developers with little or no FPGA expertise to obtain the performance benefits and reduced power consumption long associated with programmable logic using high-level languages and application-specific design environments instead of Verilog or VHDL.
SDNet enables the creation of “Softly” Defined Networks through high-level networking specifications that are processed by a set of integrated development tools to produce a complete, high-speed networking design. The SDNet user describes required packet processing functions in a natural way, without including any implementation details, and the SDNet tools automatically transform the specification into an optimized hardware implementation that delivers line-rate performance, based on Xilinx All Programmable devices.
The SDAccel Development Environment gives data-center application developers the complete, high-performance hardware/software design solution they want—based on FPGA technology—with specifications written in C, C++, or OpenCL (instead of Verilog or VHDL). SDAccel includes a software-development flow that presents a familiar CPU/GPU-like work environment to the developer with an Eclipse-based IDE for code development, profiling, and debugging. A fast, architecturally optimizing compiler that makes efficient use of on-chip FPGA resources operates under the hood. Completed designs produced by SDAccel include dynamic reconfigurable accelerators optimized for different data center applications that can be swapped in and out on the fly for a CPU/GPU-like run-time environment.
Note: Earlier this month, Xilinx announced a third design environment in the SDx family: SDSoC. (See “SDSoC development environment for Zynq SoCs/MPSoCs says ‘buh bye’ to Verilog, VHDL. Wait, what?”)
Three high-speed networking demos in the Xilinx booth (#729) at next week’s OFC 2015 in Los Angeles highlight the abilities of Xilinx UltraScale FPGAs and IP to implement 100G and 400G systems with one chip. The demos include:
If you’d like to hear an overview covering the use of advanced All Programmable devices to implement high-performance networking hardware including SDN equipment, you might want to attend Gordon Brebner’s two presentations at OFC. The first, presented on Monday, March 23 in Room 408B, is titled “Programmable Hardware in Software Defined Networking” and the second, on Thursday, March 26 in Room 410, is titled “Programmable Hardware for High Performance SDN.” Gordon is a Distinguished Engineer at Xilinx.
Luis Bielich’s new Xilinx Application Note XAPP1217 titled “Zero Latency Multiplexing I/O for ASIC Emulation” describes a technique for moving many parallel bits from one FPGA to another with zero latency—effectively teleporting the bits from one FPGA to another—using one high-speed serial link. Naturally, this technique only delivers zero latency as long as the FPGA system clock rate is significantly slower than the serial link’s bit rate. That’s normally true for ASIC emulation but it can also be true for other applications where two FPGAs are somewhat far removed, on different boards for example. Bielich’s technique can get your bits from here to there with a minimum number of I/O pins—like one.
The concept of multiplexing several bits over a high-speed serial link is pretty simple, as shown below in Figure 1, taken from the app note:
The trick to getting “zero latency”—effectively bit teleportation—from this technique is that the serial port’s bit rate must be significantly higher than the logic clock frequency.
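To make that condition concrete, here is a minimal Python sketch of the timing budget: a whole frame of multiplexed bits has to cross the serial link within one logic clock period. The function names, the overhead and fixed-delay parameters, and the example numbers are assumptions made for illustration; none of them comes from XAPP1217, and a real link adds encoding and synchronization details that are ignored here.

```python
def frame_time_s(n_bits, serial_bps, overhead_bits=0):
    """Time to shift one frame of n_bits (plus assumed overhead) over the serial link."""
    return (n_bits + overhead_bits) / serial_bps


def fits_in_one_cycle(n_bits, serial_bps, system_clock_hz,
                      overhead_bits=0, fixed_delay_s=0.0):
    """True if a full frame crosses the link within one logic clock period."""
    transfer = frame_time_s(n_bits, serial_bps, overhead_bits) + fixed_delay_s
    return transfer < 1.0 / system_clock_hz


def mux(samples):
    """Flatten per-cycle samples of the parallel bus into one serial bit stream."""
    return [bit for sample in samples for bit in sample]


def demux(stream, width):
    """Regroup the serial bit stream into per-cycle samples of the given width."""
    return [tuple(stream[i:i + width]) for i in range(0, len(stream), width)]


if __name__ == "__main__":
    # Hypothetical numbers: 32 signals over a 10 Gbps link.
    print(fits_in_one_cycle(32, 10e9, 10e6))    # slow emulation clock
    print(fits_in_one_cycle(32, 10e9, 400e6))   # fast logic clock
    samples = [(1, 0, 1, 1), (0, 0, 1, 0)]
    print(demux(mux(samples), 4) == samples)
```

With these made-up numbers the frame takes 3.2ns to send, so it fits easily inside the 100ns period of a 10MHz emulation clock but not inside the 2.5ns period of a 400MHz clock.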
With OFC 2015 in Los Angeles coming up next week, this is the week for a focus on high-speed optical communications. In addition to the 100G Reed-Solomon FEC IP block announcement from Xilinx that I covered earlier today, here’s a just-posted, 3.5-minute video demo of 100G optical Ethernet modules operating with a Xilinx Virtex UltraScale VU095 FPGA, which incorporates a hardened-core 100G Ethernet MAC. The demo, narrated by Technical Marketing Manager Martin Gilpatric with his usual crystal clarity and a minimal amount of marketing schmaltz, shows hot swapping of 10x10G and 4x25G optical modules while the 100G Ethernet IP implementation keeps pace.
FECs are one of those unique elements in the electronics world. Two truths about FECs:
1. If you don’t know what a FEC is, you probably don’t need one.
2. If you know what a FEC is, you probably need one.
(A FEC is a Forward Error Correction block)
Xilinx has just announced a low-latency 100G IEEE 802.3bj Reed-Solomon FEC (RS-FEC) as a LogiCORE IP block for high-speed, 100G Ethernet communications over optical media using standards including SR4, CWDM4, PSM4, or ER4f. There’s also a reference design with the FEC block integrated with other 100G Ethernet IP. Xilinx will be demonstrating this 100G RS-FEC implemented with a Xilinx Virtex UltraScale VU095 FPGA and operating with optics from Finisar (in the Ethernet Alliance booth #2531) and TE Connectivity (booth #1417) at OFC 2015 in Los Angeles later this month.
Kevin Morris at EEJournal has just published his impressions of the new SDSoC Design Environment in an article titled “Software Defines Everything: Xilinx Announces SDSoC.” Here are a few excerpts from his article:
“SDSoC creates a programming environment like we would expect for “conventional” SoCs and ASSPs, with the additional capabilities required to take full advantage of devices like the new Zynq UltraScale+. Of course, SDSoC includes the normal things we’d expect in an embedded software development environment, such as compilers and debuggers with the special features we need to debug embedded software in a parallel heterogeneous multi-processing environment.”
“Xilinx has worked to create what they call an ‘ASSP-like programming experience.’”
“While the profiling, debugging, and ASSP-like experience are certainly required steps for wooing design teams that might otherwise be using more conventional SoCs, Xilinx has taken SDSoC several more steps toward what we need for real productivity leaps.”
“One of the more impressive (and useful) bits of functionality is an automatic system-level connectivity generator.”
Click here to read the full article on EEJournal.com.
(Excerpted and adapted from the latest issue of Xcell Journal)
By Paul Dillien and Tom Kean, PhD
An obvious tactic for protecting information is to encrypt data as it transits the network and moves around the data center. Encryption ensures that, should the data be intercepted by an unauthorized party sniffing the link, it cannot be read. Ideally, too, the data should be authenticated to ensure its integrity. Message authentication is designed to detect when the original encrypted data has been altered, whether by a transmission error or by malicious tampering from an attacker seeking to gain an advantage.
The popularity of the Ethernet standard has driven down costs, making it even more attractive, and this virtuous circle ensures the continuance of Ethernet as the Layer 2 technology of choice. However, up until a few years ago, the specification did not include any encryption, leaving the job to technologies such as IPsec that operate in the upper layers of the communications protocol stack.
Now, a new extension to Ethernet adds a raft of security measures, under the specification IEEE 802.1AE. Specified a few years ago, this technology features an integrated security system that encrypts and authenticates messages while also detecting and defeating a range of attacks on the network. The specification is known as the Media Access Control Security standard, or more commonly as MACsec, and Algotronix set out several years ago to produce IP cores that provide hardware-accelerated encryption over a range of data rates. (Algotronix also supplies an intellectual-property core for IPsec that has a very similar interface to the MACsec product and would be a good choice in systems that need to support both standards.)
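The authentication idea described above, detecting that a message has been altered in transit, can be shown in a few lines of Python. This sketch covers authentication only, not encryption, and it uses HMAC-SHA-256 from the standard library as a stand-in: MACsec itself specifies AES-GCM cipher suites, and nothing here reflects how the Algotronix cores work. The function names and the sample payload are made up for the example.

```python
import hashlib
import hmac
import os


def tag_message(key, message):
    """Compute an authentication tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key, message, tag):
    """Return True only if the tag matches; the comparison is constant-time."""
    return hmac.compare_digest(tag_message(key, message), tag)


if __name__ == "__main__":
    key = os.urandom(32)                 # shared secret known to both ends
    frame = b"example payload"
    tag = tag_message(key, frame)

    tampered = b"Example payload"        # one altered byte
    print(verify_message(key, frame, tag))      # intact frame
    print(verify_message(key, tampered, tag))   # altered frame
```

Whether the change comes from a transmission error or from an attacker, the receiver sees the same thing: the recomputed tag no longer matches the one that arrived with the frame.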
Vadatech has just introduced the AMC502, a double-module, mid-size AMC FPGA carrier with two FMC (VITA-57) connectors driven by a Xilinx Kintex-7 XC7K420T FPGA. AMC Ports 4-7 and 8-11 are routed to the FPGA per AMC.1, AMC.2 and AMC.4 for programmable support of various I/O protocols including PCIe, SRIO, and XAUI. Here’s a block diagram of the board:
By Adam Taylor
Over the last week I have been approached by a number of different people who are using different Zynq-based development kits and they’re wondering how to apply the MicroZed Chronicles to their chosen hardware. In addition to the Avnet MicroZed, there are a number of other popular development kits based on versions of the Zynq. Here’s a list of popular Zynq-based development boards showing the Zynq SoC variant on each board:
Why should users of other dev kits not want to follow along with the internet’s preeminent Zynq blog? It’s actually pretty easy to do and takes us back over 12 months to the very first blog in this series where we defined the configuration of the hardware we would be working with.
(Excerpted and adapted from the latest issue of Xcell Journal)
By John Kilpatrick and Robbie Shergill (Analog Devices), and Manish Sinha (Xilinx)
The ever-increasing demand for data on the world’s cellular networks has operators searching for ways to increase the capacity 5,000-fold by 2030. Getting there will require a 5x increase in channel performance, a 20x increase in allocated spectrum and a 50x increase in the number of cell sites. Many of these new cells will be placed indoors, where the majority of traffic originates, and fiber is the top choice to funnel the traffic back into the networks. But there are many outdoor locations where fiber is not available or is too expensive to connect, and for these situations wireless backhaul is the most viable alternative.
Unlicensed spectrum at 5GHz is available and does not require a line-of-sight path. However, the bandwidth is limited and interference from other users of this spectrum is almost guaranteed due to heavy traffic and wide antenna patterns. Communication links at 60GHz are emerging as a leading contender to provide these backhaul links for the many thousands of outdoor cells that will be required to meet the capacity demands. This spectrum is also unlicensed, but unlike frequencies below 6GHz, it contains up to 9GHz of available bandwidth. Moreover, the high frequency allows for very narrow and focused antenna patterns that are somewhat immune to interference.
A complete 60-GHz two-way data communication link developed by Xilinx and Hittite Microwave (now part of Analog Devices) demonstrates superior performance and the flexibility to meet the requirements of the small-cell backhaul market (Figure 1). Xilinx developed the digital modem portion of the platform and Analog Devices, the millimeter-wave radio portion.
Figure 1 – High-level block diagram of the complete two-way communication link
There are low-end Zynq SoCs for high-volume applications and there are high-end Zynq SoCs when you need extra performance. What are the key differences?
Make that “didn’t” for those last two high/low-end Zynq differentiators, because the Z-7015 low-end Zynq SoC introduced as the sixth member of the Zynq SoC family late last year integrates four 6.25Gbps SerDes ports and one PCIe Gen2 x4 integrated block. It’s an enhanced, low-end I/O machine.
The Xilinx Spartan-6 family is a truly durable family of economical, low-end FPGAs and the Spartan-6 XC6SLX16 has proven particularly popular because it offers a handy mix of on-chip programmable resources and I/O. If you’ve used that device in past designs, it’s time you looked at the new XC7A15T, the latest member of the Xilinx Artix-7 family, for your next design. The XC7A15T has 14.1% more logic cells (16,640 versus 14,579), 40% more DSP48 slices (45 versus 32), 56% more block RAM (900Kbits versus 576Kbits), and is available with more I/O pins (a maximum of 250 versus 232) than the XC6SLX16. The Artix-7 XC7A15T has four 6.6Gbps SerDes ports while the Spartan-6 XC6SLX16 has none. All Artix-7 devices including the XC7A15T have an on-chip XADC block that provides you with analog inputs and muxes (including on-chip supply voltage inputs); two 12-bit, 1Msamples/sec ADCs; and an on-chip temperature sensor (see “Do you know about the mixed-signal processing block embedded in Xilinx 7 series FPGAs and Zynq SoCs?”).
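The percentage gains can be recomputed directly from the resource counts quoted in this post. The short Python sketch below does only that arithmetic; the function name and the dictionary layout are made up for the example, and the counts are the ones given in the paragraph above.

```python
def percent_more(new, old):
    """Percentage by which new exceeds old, relative to old."""
    return (new - old) / old * 100.0


# (XC7A15T, XC6SLX16) counts as quoted in the text.
SPECS = {
    "logic cells": (16640, 14579),
    "DSP48 slices": (45, 32),
    "block RAM (Kbits)": (900, 576),
    "max I/O pins": (250, 232),
}

if __name__ == "__main__":
    for name, (artix7, spartan6) in SPECS.items():
        print(f"{name}: {percent_more(artix7, spartan6):.1f}% more")
```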
Those are perhaps the most obvious advantages that you’ll see from the product tables—but the Artix-7 family has another key advantage for you to consider as well:
Mathworks put together a 4-part video series on using MATLAB and Simulink with the Xilinx Zynq SoC. Why do that? So that you can use model-based design techniques.
Dick Selwood just published a really interesting article on the EEJournal Web site called “Taking the FPGA Pulse.” It discusses the results of a 23-question online FPGA survey conducted last October in the UK by FPGA industry veteran Doug Amos, who conducted the study for NMI and presented it at last month’s Verification Futures Conference held in the UK. The study is called the “NMI FPGA Usage Survey 2014.” NMI is a UK membership organization that’s dedicated to increasing “the quality and quantity of electronic engineering and manufacturing in the UK.”
Doug Amos worked with Aldec, Altera, FirstEDA, Mentor Graphics, Synopsys, Xilinx, XMOS, and UK publication New Electronics to devise and publicize the questionnaire. The study seeks to “chart the particular usage, methods, and challenges of FPGA Users in UK and Ireland” and Amos notes that the study is “not intended as a market share survey.” One reason is that the number of responses is small: only 174 respondents (of which eight were disqualified), all in the UK and Ireland, with only about 150 responses to each question on average.
PRO Design recently announced the proFPGA quad VUS 440 Prototyping System, a modular FPGA prototyping system based on four Xilinx 20nm UltraScale VU440 3D FPGAs. The company claims that the prototyping system’s capacity is 120M ASIC gates, which is 2.5x larger than the company’s previous-generation FPGA prototyping system based on Xilinx Virtex-7 devices. You can connect five of these proFPGA quad VUS 440 Prototyping Systems together to increase the prototyping capacity to 600M (more than half a billion!) ASIC gates. Maximum system speed is said to be 500MHz.
The system will be available to early-adopter customers in April.
George T Haber, founder and Managing Partner of the Cresta Fund (“Startups baked to perfection!”) and tech startup CrestaTech, “unabashedly claims” Haber’s Law as his own: “If it can be done in software, it will!” Based on my 40 years (!) of embedded system design experience, there’s no doubt in my mind that Haber’s Law is right. That might seem to be a deadly admission coming from the guy who writes the Xilinx Xcell Daily blog, but let’s get real here. If you can implement an embedded system with nothing more than software running on a microcontroller or microprocessor, you should.
The prolific Embedded.com and EETimes writer Max Maxfield—aka Max the Magnificent—who also happens to be a friend, has just published his take on today’s SDSoC announcement by Xilinx. You really should go and read the full article, but here are a few choice quotes:
“It appears that Xilinx has discovered the ‘Holy Grail’ with regard to embedded systems development.”
“The thing that makes this so useful for system architects, platform architects, and software developers is that so much of the magic happens ‘under the hood.’”
“It's important to note that, at the time of this writing, SDSoC has been in Beta with select real-world users for over a year, which provides a high-level of confidence that this development environment is ready for ‘prime time’ usage.”
As I said, it’s worth your time to go and read Max’s full article.
By Adam Taylor
Over the last few blogs, we looked at the basics of timing constraints. We should now be able to define clocks within a design, establish and declare their relationships, and declare any imperfections in both the clocks and the system. As the system-design engineers, we must also focus on what happens within a defined clock group when an exception occurs. Before we declare exceptions, we must first understand what an exception is. Xilinx User Guide 903, “Vivado Design Suite User Guide: Using Constraints”, defines a timing exception as:
“A timing exception is needed when the logic behaves in a way that is not timed correctly by default.”
One common example of a timing exception is a result that’s captured only every other clock cycle. Another would be transferring data from a section of logic with a slow clock to a section with a faster clock (or vice versa), where both clocks are synchronous. In fact, both of these are examples of a timing exception commonly referred to as a multi-cycle path.
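To see what a multi-cycle path changes, it helps to work out where the setup and hold checks land. The Python sketch below models a path whose launch and capture registers share one clock, and it assumes the usual SDC-style convention in which the hold check follows the setup edge unless a separate hold multiplier pulls it back. The function name and the 10ns period are hypothetical.

```python
def multicycle_edges(period_ns, setup_mult=1, hold_mult=0):
    """Return (setup_edge_ns, hold_edge_ns) relative to a launch edge at 0 ns.

    setup_mult is the number of clock periods allowed for the data to arrive.
    hold_mult is the number of periods by which the hold check is moved back.
    """
    setup_edge = setup_mult * period_ns
    # By default the hold check sits one period before the setup capture edge.
    hold_edge = (setup_mult - 1 - hold_mult) * period_ns
    return setup_edge, hold_edge


if __name__ == "__main__":
    period = 10.0  # hypothetical 100 MHz clock
    print(multicycle_edges(period))                              # single-cycle default
    print(multicycle_edges(period, setup_mult=2))                # setup relaxed only
    print(multicycle_edges(period, setup_mult=2, hold_mult=1))   # hold restored
```

In the second case the data has two periods to arrive, but the hold check has also moved out by one period, which is usually not what is wanted for a result captured every other cycle. The third case pulls the hold check back to the launch edge. In XDC, these two adjustments correspond to the -setup and -hold options of set_multicycle_path.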
“Two words prevent embedded system developers and software engineers from using Zynq SoCs to boost system performance,” I told my colleagues at lunch last week. They waited for the two words:
“Verilog” and “VHDL”
The Dawn spacecraft entered orbit around the dwarf planet Ceres early this morning, marking the start of a new page in the robotic vehicle’s scientific mission. Of particular interest are two bright spots detected inside of a large crater on the dwarf planet last month.
Images beamed back to earth will be taken by the Dawn spacecraft’s two identical Framing Cameras, which are refractive telescopes coupled to 1024x1024-pixel CCD imagers. Each camera is controlled by its own FPGA-based DPU (data processing unit) developed at IDA (the Institut für Datentechnik und Kommunikationsnetze at TU Braunschweig) in Germany. The FPGAs are space-grade, radiation-tolerant 90nm Xilinx Virtex-4QV FPGAs.
For more information on the Dawn mission, see “Visit to a small planet: NASA’s Dawn spacecraft sends video postcard from Ceres in the asteroid belt.”
Image credit: NASA/JPL-Caltech/UCLA/MPS/DLR/IDA