05-18-2011 10:15 PM
I am a newbie to FPGA and driver programming. I followed the instructions in XTP045 to generate a PCIe Gen1 integrated block with 1 MB of BAR space for the Virtex-6 ML605 board and wrote a Linux character driver to read from its BAR space. Everything seems to work at this point.
However, when comparing integrated block implementations at link widths x8 and x1, they seem to have the same bandwidth when reading/writing their BAR memory space, which bothers me a lot. Does anyone here know where the mistake might be? The main function is pasted for reference.
05-19-2011 07:42 AM
Unless you start DMAing packets with a large payload, you're not going to see much of a difference in performance with more lanes. This is because programmed I/O packets are limited to 1 DWORD of payload. You're going to be limited by your processor speed when you only send 1-DW packets.
If you set up a Bus Master DMA engine as described in XAPP 1052, then you will see the performance benefit.
I would also recommend reading through the following white paper on PCI Express Performance:
Hope this helps...
05-20-2011 11:28 AM
thanks for your response Luisb.
So are you saying that without DMA, the CPU by default constructs PCIe packets with only 1 DWORD of payload?
I thought the max payload size of TLPs varies from 128 to 4096 bytes... so why is it limited to 1 DWORD when the CPU issues the read/write?
05-23-2011 07:32 AM
That is correct: without DMA the default packet is 1 DWORD. This is a well-known restriction in the PCIe world, and the workaround is to have your own DMA engine. The max payload size only sets an upper bound on TLP size; the CPU's load/store instructions each move at most one DWORD, so without a bus master the link never sees larger packets. I don't know exactly why most systems are designed this way, but I would guess it's so that the processor isn't stalled waiting for large transfers to complete. If you offload this to another module, the processor can continue while the hardware reads or writes directly to memory.