Designers often ask, “What is the maximum frequency my design will operate at?” The maximum frequency, or ‘Fmax,’ really should not be a question; it should be a requirement of the architecture chosen for the design.
For example, if I need to process data at 1 Gb/s, I have a choice: I can carry that data on ten 100 Mb/s lines or on one hundred 10 Mb/s data paths. Because an FPGA device is programmable, this simple choice has benefits as well as costs. The sooner you make these kinds of architectural decisions, the easier things will be later.
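The arithmetic behind that choice can be sketched in a few lines; the helper name and figures below are illustrative, not from any vendor tool:

```python
# Back-of-the-envelope check: the same 1 Gb/s aggregate rate can be
# carried on a few fast paths or on many slow ones.
def per_path_rate_mbps(aggregate_gbps: float, num_paths: int) -> float:
    """Rate each path must sustain to hit the aggregate target."""
    return aggregate_gbps * 1000 / num_paths

fast_narrow = per_path_rate_mbps(1.0, 10)    # ten paths
slow_wide   = per_path_rate_mbps(1.0, 100)   # one hundred paths

print(fast_narrow)  # 100.0 Mb/s per path
print(slow_wide)    # 10.0 Mb/s per path

# Both architectures move the same total number of bits per second.
assert fast_narrow * 10 == slow_wide * 100 == 1000.0
```

Either way the aggregate is identical; what changes is how hard each individual path has to work.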
Fast and Narrow
Choosing a narrow bus width and clocking it fast, but below the ‘Fmax’ of the various FPGA device blocks, is a common approach. However, because the clock period is short, timing will be harder to meet if many levels of logic are required, so a fast design usually needs pipelining to separate those levels of logic. In terms of power, fast implies more power: dynamic power is the number of nets times capacitance times frequency times voltage squared (P = n·C·F·V²). With a lower frequency and more paths, n goes up as F goes down, and the total capacitance may or may not change, so one has to compare the two implementations to see whether power is better or worse for a given number of paths.
Slow and Wide
Choosing a wide data path and operating at a slower clock rate might require some additional Verilog or VHDL, but it is often the best solution. FPGA devices have a wealth of interconnect, and making use of it means that timing closure is often trivial to achieve. Resource use will be greater (a separate LUT and DFF for each path means more logic), but that may be a small price to pay for the resulting simplicity in placement and routing. Dynamic power may be lower due to the reduction in glitching (logic levels changing as LUT outputs toggle while the various inputs to the LUTs settle), since the data rate is slower and changes happen less often. Then again, with more LUTs on more paths, dynamic power may not change much versus the fast and narrow case.
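A toy throughput model shows why the slower clock costs nothing in wall-clock terms; the clock frequencies here are examples chosen for round numbers, not device specifications:

```python
# Toy model of trading clock rate for width: process the same byte stream
# one byte per cycle at a fast clock, or eight bytes per cycle at 1/8 the
# clock. The frequencies are illustrative, not from any datasheet.
def cycles_needed(total_bytes: int, bytes_per_cycle: int) -> int:
    """Clock cycles to move total_bytes through a path of the given width."""
    return -(-total_bytes // bytes_per_cycle)  # ceiling division

stream = 1024  # bytes to process

narrow_cycles = cycles_needed(stream, 1)  # 1024 cycles at, say, 400 MHz
wide_cycles   = cycles_needed(stream, 8)  # 128 cycles at 50 MHz

# Eight times the width needs one-eighth the cycles, so elapsed time
# (cycles / frequency) is identical for the two architectures.
assert narrow_cycles == 8 * wide_cycles
assert narrow_cycles / 400e6 == wide_cycles / 50e6
```

The wide version buys an 8x longer clock period for timing closure, paid for in LUTs and flip-flops rather than in throughput.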
What about the I/O?
Fast and narrow on the I/O is best accomplished with the multi-gigabit transceivers (MGTs). With rates from roughly 600 Mb/s to 28 Gb/s (depending on the device in the 7 series), that is a whole lot of bits carried very efficiently on only four pins (a differential receive pair and a differential transmit pair). The transmit and receive sides of the MGT appear to the device fabric as 16 or more bits in parallel, so the MGT immediately helps get you to a slower and wider solution that better fits the FPGA device.
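The serial-to-parallel step an MGT performs is simple division: the fabric clock is the line rate divided by the parallel interface width. The widths below are examples only, not the interface options of any particular transceiver:

```python
# Fabric-side clock implied by a serial line rate and a parallel interface
# width. The widths used here are illustrative examples.
def fabric_clock_mhz(line_rate_gbps: float, parallel_bits: int) -> float:
    """Fabric-side clock (MHz) for a given line rate and interface width."""
    return line_rate_gbps * 1000 / parallel_bits

print(fabric_clock_mhz(10.0, 16))  # 625.0 MHz -- likely too fast for fabric logic
print(fabric_clock_mhz(10.0, 64))  # 156.25 MHz -- a comfortable fabric rate
```

Widening the parallel side is the same slow-and-wide trade as before, applied at the chip boundary.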
Another common use case is memory. DDR memory uses the single-ended I/O pins, so for large bandwidths one is forced to go to wide data words. For example, White Paper 383, Achieving High Performance DDR3 Data Rates in Virtex-7 and Kintex-7 FPGAs, details how rates of up to 1.866 Gb/s per I/O pin on the DDR memory interface are possible.
Thus, with a 144-bit wide data path, using the error check and correct (ECC) blocks in the block RAMs (the block RAM bits go unused, as just the ECC logic is used), one is able to achieve tremendous bandwidth on the memory channel: with 128 of the 144 bits carrying data at 1.866 Gb/s each, that is roughly 240 Gb/s (30 gigabytes per second) of usable bandwidth. Similarly, even wider memory words can provide rates that allow processing 40 Gb/s, and even 100 Gb/s, Ethernet traffic today.
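The arithmetic checks out if the 144 bits are split as payload plus check bits in the standard SECDED pattern (8 check bits per 64 data bits); that split is an assumption here, not something the white paper excerpt above spells out:

```python
# Checking the memory-bandwidth arithmetic. The 128/144 data split assumes
# a standard SECDED layout (8 check bits per 64 data bits).
def usable_bandwidth_gbps(data_bits: int, rate_gbps_per_pin: float) -> float:
    """Aggregate usable bandwidth across the data bits of the interface."""
    return data_bits * rate_gbps_per_pin

total_bits, check_bits = 144, 16           # 2 x (64 data + 8 ECC)
data_bits = total_bits - check_bits        # 128 bits carry payload

gbps = usable_bandwidth_gbps(data_bits, 1.866)
print(round(gbps))      # ~239 Gb/s of usable bandwidth
print(round(gbps / 8))  # ~30 gigabytes per second
```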
So, the next time you wonder “how fast can I go,” understand that you may have asked yourself the wrong question. The real question is, “What architecture (data path width) should I choose that is best suited to solve my problem in a Xilinx FPGA device?” You can always ask “how fast will it go?” later, once you have at least given yourself a fighting chance to succeed.