04-27-2020 04:41 PM
I am planning to learn Vitis in the near future and use it on an Alveo board.
In the meantime, I need to choose a workstation.
Which workstation is better suited for the Alveo U250?
One with a processor that has more cores but a lower clock rate,
for instance the 18-core 2.3 GHz
Intel Xeon W-2195 (2.3GHz, 4.3GHz Turbo, 18C, 24.75MB Cache, HT,
or one that has a processor with fewer cores but a faster clock rate,
for instance the 4-core 4.1 GHz
Intel Xeon Processor W-2225 4C(4.1GHz 4.6GHz Turbo HT 8.25MB (105W) DDR4-2933)
Any other recommendation for the system? (I'm using it for development, not as a server).
04-27-2020 07:06 PM
The list of Xilinx certified OEMs is as follows:
When choosing a server, though, you should really be looking at the minimum system requirements for working with Alveo.
For additional specifications and details on the acceptable environmental conditions, see Alveo
U200 and U250 Data Center Accelerator Cards Data Sheet (DS962) and Alveo U280 Data Center
Accelerator Cards Data Sheet (DS963).
Please note that this information is covered in detail in UG1301.
Hope this helps
04-28-2020 04:02 AM
I read the material in the links that you specified.
I can see general recommendations, for instance servers that have the dual-slot PCIe x16 slot necessary for the Alveo U250 active card.
But my question is not about that.
My question is more about the processing architecture. Let me try to elaborate. I am not yet familiar with Vitis or with writing kernels, but my plan (once I get the time to learn it) is to write kernels in RTL for the Alveo U250. Given that, as I see it, there are two main options for the CPU on the host side:
1. more cores (for instance, >=16), but lower speeds (approximately 2 - 3 GHz)
2. fewer cores (<16), but higher speeds (approximately 3 - 4 GHz, maybe higher).
From the standpoint of an engineer who plans to write kernels in RTL for the Alveo, and considering the speed of the PCIe link (8 gigatransfers per second), which of the two alternatives above is more suitable for fully utilizing the Alveo (namely, keeping it "busy")?
04-28-2020 10:18 AM
I'm not sure if I am getting the full picture of your question.
I do know that to saturate the PCIe bandwidth, you'll want to perform the largest possible transfers across PCIe. Once the data is on the Alveo card, you would want to use multiple kernels to process the data in parallel, then send the data back once as much processing as possible has been completed.
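To illustrate why large transfers matter, here is a minimal sketch of a simple cost model in which every transfer pays a fixed setup overhead plus a size-dependent transfer time. The peak rate (12 GB/s) and the per-transfer overhead (20 microseconds) are assumed values chosen for illustration only, not measurements from an Alveo card.

```python
def effective_throughput(size_bytes, peak_bytes_per_s, overhead_s):
    """Bytes per second achieved by one transfer that pays a fixed overhead."""
    return size_bytes / (overhead_s + size_bytes / peak_bytes_per_s)

PEAK = 12e9        # assumed usable peak rate in bytes/s (hypothetical)
OVERHEAD = 20e-6   # assumed fixed cost per transfer in seconds (hypothetical)

for size in (4 * 1024, 64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    rate = effective_throughput(size, PEAK, OVERHEAD)
    print(f"{size:>10} bytes: {rate / 1e9:6.2f} GB/s ({100 * rate / PEAK:5.1f}% of peak)")
```

Under these assumptions, small transfers are dominated by the fixed overhead, while transfers of tens of megabytes get close to the peak rate.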
The number of host cores is not part of this picture, though. There is one CPU core initiating the data transfers. Once the data is on the accelerator card, the kernels you designed and placed on the card take over the processing, somewhat independently of the host, until they have completed their compute operation.
The overall recommendation is to use Gen3x16 first; then, if PCIe really is the bottleneck, use the Vitis Analyzer tool to optimize your application.
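For reference, the theoretical Gen3x16 bandwidth can be worked out from the 8 GT/s per-lane rate and the 128b/130b line encoding that PCIe Gen3 uses. The sketch below gives the raw per-direction figure only; packet headers and other protocol overhead reduce what an application actually sees.

```python
def pcie_bandwidth_gb_per_s(gt_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Raw per-direction link bandwidth in GB/s, after line encoding only."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e9

gen3_x16 = pcie_bandwidth_gb_per_s(8.0, 16)
print(f"PCIe Gen3 x16: about {gen3_x16:.2f} GB/s per direction")
```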
Does that help?
04-28-2020 01:19 PM
Thanks for your answer.
Let's assume that the 8 GT/s of the PCIe link is not the bottleneck when communicating with the Alveo U250.
What I am asking is: can one CPU core keep up with serving multiple kernels (they could number in the tens or maybe hundreds), or does it make more sense to split the work on the host side across multiple cores (perhaps each one running at a lower speed, as I mentioned earlier) in order to keep as many kernels as possible busy on the Alveo?
Has there been any study on this? I would be surprised if this question had not come up. It seems to me to be one of the first major questions to ask before spending thousands of dollars on a workstation. That is, one should choose the right kind of CPU; otherwise, what use is an accelerator board if the host is not optimized to keep the accelerator busy?
05-05-2020 07:58 PM
For development, more cores will make synthesis faster but will not help with back-end place and route, unless you want to kick off multiple place and route jobs in parallel. For deployment, it really depends on your application.
Sequencing the kernels isn't generally very CPU intensive, but if the host has a lot of other processing to do off-card, or your application takes data from many other sources outside the CPU, more cores would be better. All of this is highly dependent on your target application. The cards are meant as production units to accelerate already existing jobs. If you are just starting with FPGAs, you might be more interested in an FPGA development card, which can help you use all the existing IP in the Xilinx catalog.
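As a rough way to reason about whether one core can keep many kernels busy, here is a back-of-the-envelope sketch. It assumes each kernel invocation costs the host a fixed amount of CPU time to dispatch and then runs on the card without further host involvement. The 50 microsecond dispatch cost and 5 millisecond kernel run time are hypothetical numbers; you would need to measure your own application to get real ones.

```python
def host_utilization(num_kernels, dispatch_us, kernel_us):
    """Fraction of one host core spent re-dispatching num_kernels kernels."""
    return num_kernels * dispatch_us / kernel_us

def cores_needed(num_kernels, dispatch_us, kernel_us):
    """Host cores needed for dispatch alone (integer ceiling division)."""
    return -(-(num_kernels * dispatch_us) // kernel_us)

DISPATCH_US = 50    # assumed host CPU time per kernel launch (hypothetical)
KERNEL_US = 5000    # assumed on-card run time per kernel call (hypothetical)

for n in (16, 100, 400):
    load = host_utilization(n, DISPATCH_US, KERNEL_US)
    cores = cores_needed(n, DISPATCH_US, KERNEL_US)
    print(f"{n:>4} kernels: {load:5.2f} core-equivalents, {cores} core(s) for dispatch")
```

With these assumed numbers a single core handles dispatch for up to 100 kernels. The picture changes if kernels are very short-lived or if the host also does pre- or post-processing per call.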
It typically doesn't make sense to split the work of a single application thread across multiple CPUs. However, if you were running an application server with multiple users starting their own jobs, and you wanted to accelerate those jobs, then yes, more cores would help your overall workload processing.
This isn't FPGA, acceleration, or PCIe dependent; it is all application dependent. The question is too generic to answer with an analysis.
Our typical workflow recommends taking an application you have in C or some other language, analyzing it with the Vitis profiling tools, seeing where bottlenecks are, and finally determining what you want to offload to the acceleration card. While the Alveo card is processing work, your cores can be off doing other things.
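One common way to estimate what offloading a profiled bottleneck can buy you is Amdahl's law, which is a general rule rather than anything specific to Vitis. The sketch below uses it with hypothetical profile numbers: 80% of the run time in the offloaded portion, and a 20x speedup of that portion on the card.

```python
def amdahl_speedup(offload_fraction, kernel_speedup):
    """Overall speedup when offload_fraction of the run time gets kernel_speedup faster."""
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / kernel_speedup)

# Hypothetical profile: 80% of run time is in the function being offloaded.
overall = amdahl_speedup(0.80, 20.0)
ceiling = 1.0 / (1.0 - 0.80)  # limit as the kernel speedup grows without bound
print(f"overall speedup: {overall:.2f}x (ceiling {ceiling:.1f}x)")
```

The part left on the host sets the ceiling, which is why profiling first and offloading the largest bottleneck matters more than the raw speed of the card.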
I recommend you check out some of our blog posts that talk about this flow. Here is a blog post that goes through that exact process - https://developer.xilinx.com/en/articles/part1-introduction-to-ethash.html