Jan Gray is a master at fitting 32-bit CPUs into programmable-logic fabrics. He writes a blog called FPGA CPU News, subtitled “Exploring Parallel Computer Architecture with FPGAs.” One of his recent posts, titled “FPGAs, Then and Now,” compares fitting one of his J32 32-bit RISC CPUs into a Xilinx XC4010PC84-5 FPGA circa 1995 with implementing the same processor in a contemporary Xilinx Virtex-7 XC7VX690T FPGA. The J32 processor employs a classic RISC architecture with 3-operand instructions, a 4-stage pipeline (fetch, register read, execute, writeback), and a 32-entry register file.
In 1995, Gray’s J32 processor consumed essentially all 800 of the 4-input LUTs in an XC4010 FPGA. The layout looked like this:
Fast-forward 18 years. The same J32 processor core fits into a Virtex-7 XC7VX690T FPGA, which has more than 433,000 6-input LUTs, one thousand times over, with room left for 250 router cores to interconnect the 1000 processors. The layout for one J32 RISC processor looks like this:
“In other words, in the past 18 years Moore’s Law has taken us from 1K LUTs per FPGA to 1K 32-bit CPUs per FPGA,” writes Gray. (By the way, the largest Virtex UltraScale 3D FPGA has 4.4 million logic cells, so the logic capacity is sufficient for perhaps more than 10,000 of Gray’s J32 RISC CPUs with interconnect. But wait! See the Note below.)
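To put the Virtex-7 numbers in perspective, here is a back-of-the-envelope calculation in Python. It is only a rough sketch: it uses 433,000 as the LUT count (the post says "more than" that), it spreads those LUTs evenly across the 1000 cores, and it ignores the share taken by the 250 routers, so the real per-core figure is not known from the post. The names are made up for illustration.

```python
# Figures quoted in the post; V7_LUTS is the "more than 433,000" floor.
V7_LUTS = 433_000        # 6-input LUTs in the Virtex-7 XC7VX690T
V7_CORES = 1_000         # J32 cores that fit, per Gray
XC4010_LUTS = 800        # 4-input LUTs, essentially all used by one J32 in 1995


def lut_budget_per_core(total_luts: int, cores: int) -> int:
    """Average LUTs available per core if the fabric were split evenly.

    Assumption: routers and other overhead are ignored, so this is a
    rough ceiling on the per-core budget, not a measured core size.
    """
    return total_luts // cores


budget = lut_budget_per_core(V7_LUTS, V7_CORES)
print(budget)                    # 433
print(V7_LUTS // XC4010_LUTS)    # 541: the XC4010's whole LUT count fits 541 times
```

Under these assumptions each core gets at most roughly 433 six-input LUTs, compared with the roughly 800 four-input LUTs the same design occupied in 1995.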
If the interaction between processors and FPGAs plays a role in your system design, take a look at Jan Gray’s blog posts on FPGA CPU News.
Note: Per Jan Gray's comment, the Virtex UltraScale VU440 FPGA has 2520 BRAMs, so that becomes the limit for a straight port of Gray's design: "only" 2520 32-bit RISC processor cores. So perhaps the Virtex UltraScale XCVU160, with more BRAMs and fewer logic resources, might be a better choice in this case. I'll leave it to Jan to suss this out.
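The Note's arithmetic can be sketched in a few lines of Python. This is only a rough model: it assumes each J32 core needs exactly one BRAM (which is what the 2520-core figure implies, though the post does not say so outright), and it takes the "perhaps more than 10,000" logic-capacity guess at face value. The function and parameter names are hypothetical.

```python
def max_cores(logic_limited_cores: int, bram_count: int, brams_per_core: int = 1) -> int:
    """Cores that fit when both logic and BRAM must be available.

    Assumption: every core consumes `brams_per_core` BRAMs (one by default),
    so the scarcer of the two resources sets the limit.
    """
    bram_limited_cores = bram_count // brams_per_core
    return min(logic_limited_cores, bram_limited_cores)


# VU440 figures from the post: ~10,000 cores by logic, 2520 BRAMs.
print(max_cores(logic_limited_cores=10_000, bram_count=2520))  # 2520
```

With these inputs BRAM, not logic, is the binding constraint, which is why a part with a higher ratio of BRAMs to logic could be a better fit for a straight port.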