03-25-2019 09:01 AM
I'm working with the AXI Bridge for PCI Express block and have had a lot of success with it. By setting AXIBAR2PCIBAR via the AXI_CTL interface, I have successfully passed data between the kernel and the FPGA. However, I recently tried to add a second DMA in the kernel with a second BAR mapping, so that one could be used for configuration and the other for data. This is where the problem appears.
My write to AXI2PCI BAR 0 correctly maps to the bus address the kernel wants, and the data appears in the kernel as expected.
The second write, to AXI2PCI BAR 1, does not seem to write anything at all. BRESP returns SLVERR for the entire data burst. Both writes use exactly the same method (in fact, exactly the same data while I test); only the AXI address differs. I checked the AXI address in the Address Editor, and the write should fall squarely within that BAR's range, but the problem persists. Any advice?
Note that the addresses the kernel passes for loading into AXIBAR2PCIBAR are all 4 KB aligned, which should be fine according to the spec.
The pictures below show what I am describing.
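For readers following along, here is a simplified software model of how an outbound AXI write is decoded and translated by the bridge. It assumes each AXI BAR is a power-of-two window starting at its Address Editor base, and that the bridge replaces the upper address bits with the AXIBAR2PCIBAR_n value while passing the lower bits through. The `AxiBar` class, the `translate` function, and all example addresses are hypothetical; this is a model for reasoning, not a diagnosis, so check the bridge's product guide for the exact register semantics.

```python
from dataclasses import dataclass


@dataclass
class AxiBar:
    base: int           # AXI-side base address of the BAR (from the Address Editor)
    size: int           # aperture size in bytes; assumed to be a power of two
    axibar2pcibar: int  # translation value written via AXI_CTL at run time


def translate(bars, axi_addr):
    """Return (bar_index, pcie_addr) for an outbound AXI access.

    Model only: raises if no BAR window decodes the address, standing in for
    the error response a real system would return on the write channel.
    """
    for index, bar in enumerate(bars):
        if bar.base <= axi_addr < bar.base + bar.size:
            mask = bar.size - 1
            # Upper bits come from AXIBAR2PCIBAR_n; lower bits pass through.
            # Translation bits below the aperture size are dropped in this model,
            # so the value must be aligned to the aperture, not merely to 4 KB.
            return index, (bar.axibar2pcibar & ~mask) | (axi_addr & mask)
    raise ValueError(f"AXI address 0x{axi_addr:x} is not inside any AXI BAR")
```

With two 4 KB windows at 0x4000_0000 and 0x4000_1000, a write to 0x4000_1020 lands in BAR 1 and comes out at AXIBAR2PCIBAR_1 + 0x20. Under the same model, a BAR with a 64 KB aperture given a translation value that is only 4 KB aligned would silently lose those low bits.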
03-25-2019 10:02 AM
If you're ready to move on to DMA, I recommend the XDMA block (PG195), as it has the basics you need to build a DMA-enabled system. It's popular, and lots of folks here use it. It has drivers for Windows and Linux, as well as open-source drivers.
03-25-2019 10:15 AM
I took a look at that block but was a bit confused by it. The AXI block seems to support both reads and writes to and from the PCIe device, but I don't see how that's possible with the DMA PCIe block. Could you expand a bit on how to use this block for those four transaction types from the block design? I can see connecting it to an interconnect and then to a DMA, but that still only seems to enable reads and writes initiated by the kernel, not the other way around.
As for the open-source kernel driver, are you referring to this one? https://www.xilinx.com/support/answers/65444.html If so, I have looked at it before, but a lot of people on this forum seem to have found it full of bugs and edge cases. Has that changed?
03-25-2019 01:49 PM - edited 03-25-2019 01:50 PM
The xdma block implements a basic DMA data mover to and from either AXI-MM or AXI-S. In most devices it supports up to 2 host-to-card and 2 card-to-host channels. It has scatter-gather capability, and if you're using a local CPU (e.g., Zynq) it has an AXI-Lite slave as well. It's a good fit for lower-end devices.
If you're looking for a higher-performance solution, however, the Northwest Logic DMA IP may be more suitable. It's what Xilinx uses in its own TRDs for the KC705, VC707/709, AC201, etc. But note that, depending on its configuration, this block is larger and may not fit in a smaller device. (You didn't specify which device you were using.) Nevertheless, for 10G Ethernet it's worth a look.
When xdma is used to communicate with AXI-MM, you need a switch (interconnect) to map the devices. When it is used to communicate with AXI-S, the ports are inferred directly on the block and use TDEST routing.
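To make the scatter-gather idea concrete, here is a sketch that packs a chain of 32-byte XDMA descriptors as they would sit in host memory. The field layout (0xAD4B magic in the upper half of the first word, Stop and Completed control bits in its low byte, a 28-bit length, then 64-bit source, destination, and next-descriptor addresses) is how I recall PG195 describing it; verify it against the guide for your core version. `pack_desc`, `build_chain`, and `ring_base` are hypothetical names, and the Nxt_adj field is simply left at zero.

```python
import struct

DESC_MAGIC = 0xAD4B             # upper 16 bits of descriptor word 0 (as recalled from PG195)
CTRL_STOP = 1 << 0              # stop fetching descriptors after this one
CTRL_COMPLETED = 1 << 1         # signal completion when this descriptor finishes
DESC = struct.Struct("<IIQQQ")  # control, length, src, dst, next: 32 bytes, little endian
MAX_LEN = (1 << 28) - 1         # length field is 28 bits wide


def pack_desc(length, src, dst, nxt, ctrl=0):
    """Pack one descriptor; Nxt_adj (bits 13:8 of word 0) is left at 0."""
    if not 0 < length <= MAX_LEN:
        raise ValueError("length does not fit the 28-bit field")
    word0 = (DESC_MAGIC << 16) | (ctrl & 0xFF)
    return DESC.pack(word0, length, src, dst, nxt)


def build_chain(segments, ring_base):
    """Pack (src, dst, length) segments into descriptors laid out back to back
    starting at bus address ring_base; each descriptor points at the next."""
    chain = bytearray()
    for i, (src, dst, length) in enumerate(segments):
        last = i == len(segments) - 1
        nxt = 0 if last else ring_base + (i + 1) * DESC.size
        ctrl = (CTRL_STOP | CTRL_COMPLETED) if last else 0
        chain += pack_desc(length, src, dst, nxt, ctrl)
    return bytes(chain)
```

Only the last descriptor carries Stop, so the engine walks the whole list from a single start.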
03-26-2019 05:48 AM
Thank you for the response. I'm taking a closer look at that IP, and once I set it to AXI Stream mode I see much more of what you are talking about.
That said, this block still confuses me on one point. I see BARs titled PCIe to DMA and PCIe to AXI Lite, but I don't see the reverse. Without an AXI-to-PCIe or DMA-to-PCIe BAR, how can I properly map my DMA data into the kernel's bus address? Am I thinking of this wrong, and these fields somehow work in both directions? You say that when AXI-S is used, TDEST is used for routing. Does that mean TDEST is essentially used as the BAR? Would the same be true of AWADDR if I used AXI-MM?
Sorry for what are likely simple questions, but this block is honestly confusing me. I found this user's design helpful, but he appears to have configured the IP in Bridge mode, which my IP will not allow (I'm guessing because of the selected chip). https://forums.xilinx.com/t5/PCI-Express/DMA-Bridge-Subsystem-for-PCIe/td-p/861231
As for the chip, I am using an xcku095-ffva1156-2-e, which is a Kintex UltraScale.
03-26-2019 08:49 AM - edited 03-26-2019 08:52 AM
Assume for the moment we're using xdma as an endpoint DMA, with its transfers managed by the PCIe host. Let's also assume AXI-MM operation for DMA (stream DMA would, by definition, have no address.)
Let's look at a host-to-card (h2c) transfer.
The DMA unit has two 64-bit addresses: the source address, which is mapped into PCIe (virtual) space, and the destination address, which is mapped into AXI DMA physical space.
On the PCIe side, the DMA driver knows the bus address the kernel handed it when it mapped the memory buffer for DMA, so it uses that address in forming the DMA source address.
On the AXI side, you're dealing with a physical hardware address that you define, which the DMA unit can reach with its full 32-bit address (or 64-bit, depending on whether you tick the '64 bit enable' box). It knows nothing of PCIe virtual addresses and has an implied offset of zero. It's up to you to form the address, but since you know your own address map, it can be used in the driver directly.
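The address roles described above can be summarized in a small sketch, assuming AXI-MM mode: for h2c the host bus address is the source and your own AXI address is the destination, and c2h simply swaps them. `dma_addrs` and its parameters are hypothetical names; `host_bus_addr` stands for whatever DMA address the kernel returned when the buffer was mapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaAddrs:
    src: int  # descriptor source address
    dst: int  # descriptor destination address


def dma_addrs(direction, host_bus_addr, axi_addr, offset=0):
    """Choose source/destination for an AXI-MM XDMA transfer.

    host_bus_addr: bus (DMA) address the kernel returned when the buffer was mapped.
    axi_addr: card-side address from your own AXI address map; no PCIe offset applies.
    """
    host = host_bus_addr + offset
    card = axi_addr + offset
    if direction == "h2c":
        return DmaAddrs(src=host, dst=card)
    if direction == "c2h":
        return DmaAddrs(src=card, dst=host)
    raise ValueError(f"unknown direction {direction!r}")
```

The point of the sketch is that nothing on the AXI side ever sees a PCIe address; only the host-side half of the descriptor depends on the kernel.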
The PCIe-to-DMA Bypass path option *does* have a BAR, however. This maps the AXI DMA space to the host, giving dual-port access to AXI DMA space that both the host and the DMA unit can use.
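Once a BAR like this maps AXI space into the host, one common way to poke at it from Linux user space is to mmap the device's sysfs `resourceN` file. The sketch below assumes that approach; the helper names are hypothetical, it needs root, and Python does not guarantee that a 4-byte `struct` access becomes a single 32-bit bus transaction, so treat it as a debugging aid rather than a driver.

```python
import mmap
import os
import struct
from contextlib import contextmanager


@contextmanager
def map_bar(resource_path):
    """Map a whole PCI BAR exposed as a sysfs resourceN file (Linux only)."""
    fd = os.open(resource_path, os.O_RDWR | os.O_SYNC)
    try:
        size = os.fstat(fd).st_size  # sysfs reports the BAR size as the file size
        with mmap.mmap(fd, size, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE) as bar:
            yield bar
    finally:
        os.close(fd)


def bar_read32(resource_path, offset):
    """Read a little-endian 32-bit word at `offset` within the BAR."""
    with map_bar(resource_path) as bar:
        return struct.unpack_from("<I", bar, offset)[0]


def bar_write32(resource_path, offset, value):
    """Write a little-endian 32-bit word at `offset` within the BAR."""
    with map_bar(resource_path) as bar:
        struct.pack_into("<I", bar, offset, value & 0xFFFF_FFFF)
```

A call would look like `bar_read32("/sys/bus/pci/devices/0000:01:00.0/resource2", 0x0)`, where the device address and BAR index are hypothetical and depend on your system and IP configuration.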
As for TDEST, that field doesn't really know about BARs. It's an AXI4-Stream signal that travels with the data and is used to route it across a switch fabric. If you decide to use this method, it's up to you to supply TDEST and interpret it.
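As a rough illustration of "it's up to you to supply TDEST and interpret it", here is a software model of a switch that routes AXI4-Stream beats by TDEST and uses TLAST to close each packet. It does not model any specific Xilinx switch IP; `route_by_tdest` and the `(tdest, tdata, tlast)` beat tuple are hypothetical.

```python
def route_by_tdest(beats, num_ports):
    """Group (tdest, tdata, tlast) beats into per-port packets.

    TDEST selects the output port; TLAST closes the packet on that port.
    """
    ports = {port: [] for port in range(num_ports)}
    open_packets = {}
    for tdest, tdata, tlast in beats:
        if not 0 <= tdest < num_ports:
            raise ValueError(f"TDEST {tdest} has no output port")
        open_packets.setdefault(tdest, []).append(tdata)
        if tlast:
            ports[tdest].append(open_packets.pop(tdest))
    return ports
```

Note that nothing here is an address: TDEST only picks a destination port, and any meaning beyond that is a convention between the producer and whatever sits on each port.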
03-27-2019 06:46 AM - edited 03-27-2019 06:48 AM
Ahhh, OK, I'm starting to get a good grip on how this block works (there are a lot more differences from the AXI one than I expected). That said, I've still got one nagging question. This block does support H2C and C2H, but not how I anticipated, specifically on the C2H side. C2H seems to be described more as a read from the kernel's perspective. This is most apparent in the C2H descriptor, where I'm supposed to specify the length of the transfer in bytes.
My intended application is passing network traffic from the FPGA to the kernel via PCIe. There is no practical way for the kernel to know the size of a packet before it sees the packet. In most designs I've seen, this problem is solved by giving the PCIe device two regions to write to: one holding things like the packet length, status, and checksum, and another where the data itself is placed (the kernel passes these addresses to the PCIe device ahead of time, and the device stores them until they are needed). That way I don't need to know the length ahead of time: I provide a PAGE of memory to write to, the PAGE is written with some amount of data, I check the status field, and then I read out whatever is valid. I see two mechanisms for supporting this behavior, but neither feels great:
1. I prepend this information for the kernel to my data and blindly make every DMA transfer exactly 1 PAGE in size.
2. I split this single data transfer into two transfers, where the first always has a fixed length and the length of the second is determined by the contents of the first.
Neither option feels particularly good, but I guess this is correct? (Is this where multiple DMA channels would be used?)
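Option 1 can be sketched like this: the card always DMAs exactly one page, with a small header in front of the packet that the host checks before trusting the data. The header layout (length, status, checksum), `STATUS_VALID`, and both function names are hypothetical; real NIC-style designs define their own formats.

```python
import struct

PAGE_SIZE = 4096
RX_HDR = struct.Struct("<IHH")  # hypothetical header: length, status, checksum
STATUS_VALID = 0x0001           # hypothetical "page holds a packet" flag


def build_rx_page(payload, checksum):
    """What the card would DMA into one page: header followed by the packet."""
    if RX_HDR.size + len(payload) > PAGE_SIZE:
        raise ValueError("packet does not fit in one page")
    page = bytearray(PAGE_SIZE)
    RX_HDR.pack_into(page, 0, len(payload), STATUS_VALID, checksum)
    page[RX_HDR.size:RX_HDR.size + len(payload)] = payload
    return bytes(page)


def parse_rx_page(page):
    """Host side: return (payload, checksum), or None if the page is not valid."""
    length, status, checksum = RX_HDR.unpack_from(page, 0)
    if not status & STATUS_VALID:
        return None
    if length > len(page) - RX_HDR.size:
        raise ValueError("header length runs past the end of the page")
    return bytes(page[RX_HDR.size:RX_HDR.size + length]), checksum
```

The cost of this choice is that short packets still consume a whole page per transfer; option 2 trades that for a second, dependent transfer.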
Along the same lines, how does the kernel know when to set the start flag at address 0x1004? Do I fire an MSI interrupt on the PCIe bus and use different interrupt numbers to distinguish between transferring data and simply telling the kernel I have data for it? That seems like a great way to flood the kernel's interrupt table (well, not really, since it's MSI, but sort of).
03-27-2019 02:26 PM - edited 03-27-2019 02:36 PM
As I said, xdma is a simple block that's only doing the data mover function.
For c2h, if there's a FIFO between your MAC and the AXI fabric, you could use that FIFO's fill level to generate an interrupt so the host can intervene and set up the transfer. It's not the most efficient approach, but it works; that's how I use it.
I encourage you to take a look at the Kintex TRDs that use the Northwest Logic DMA block. They can give you some ideas on how to beat the xdma block into a form more to your liking, or you may decide it's not enough. You'll find it here: https://www.xilinx.com/support/documentation/boards_and_kits/k7_conn/2014_3/ug927-K7-Connectivity-TRD.pdf
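The interrupt-driven flow described here might look roughly like this on the host. The 0x1004 control offset comes from the question above, with bit 0 assumed to be the run bit; the C2H SGDMA descriptor-address offsets are how I recall PG195's register map and should be checked; `USER_FIFO_LEVEL` is a hypothetical register you would expose through the AXI-Lite user BAR; and `FakeRegs` stands in for a mapped BAR so the logic can be exercised without hardware.

```python
C2H_CTRL = 0x1004           # C2H channel 0 control register; bit 0 assumed to be "run"
C2H_SGDMA_DESC_LO = 0x5080  # descriptor list address, low word (verify in PG195)
C2H_SGDMA_DESC_HI = 0x5084  # descriptor list address, high word (verify in PG195)
USER_FIFO_LEVEL = 0x0000    # hypothetical user register: bytes waiting in the FIFO
RUN = 1 << 0


class FakeRegs:
    """Dictionary-backed stand-in for a memory-mapped BAR."""

    def __init__(self):
        self.regs = {}

    def read32(self, offset):
        return self.regs.get(offset, 0)

    def write32(self, offset, value):
        self.regs[offset] = value & 0xFFFF_FFFF


def on_fifo_interrupt(dma_bar, user_bar, desc_bus_addr, threshold):
    """Start a C2H transfer if the card reports enough buffered data.

    desc_bus_addr is the bus address of a descriptor list already built in
    host memory. Returns True if the engine was started.
    """
    if user_bar.read32(USER_FIFO_LEVEL) < threshold:
        return False
    dma_bar.write32(C2H_SGDMA_DESC_LO, desc_bus_addr & 0xFFFF_FFFF)
    dma_bar.write32(C2H_SGDMA_DESC_HI, desc_bus_addr >> 32)
    # Read-modify-write so other control bits (e.g. interrupt enables) survive.
    dma_bar.write32(C2H_CTRL, dma_bar.read32(C2H_CTRL) | RUN)
    return True
```

Reading the fill level from a separate user register keeps the "data is waiting" signal independent of the DMA engine's own completion interrupts.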