Baselining NWL DMA or PIO for large data transfers over PCIe
I'm struggling with a PCIe input data problem and need some technical advice before I waste too much time paddling in the wrong direction. I have two example designs: the PIO that comes standard with the Vivado IP, and the TRD, which is built in PlanAhead and looks a bit antiquated, even though I have successfully gotten it up and running. I'm trying to modify one of these example designs so that incoming TLPs are filtered and transferred directly to DDR3 through the HP ports.
My first plan was to take the PIO and add some states to handle TLPs with payloads larger than 1 DW, then feed those packets through to my filter. This assumed I could easily send maximum-sized TLPs, or at least had some control over their size. I've now been informed that I really don't: that's handled at a level that I as a user (Linux or Windows) can't control or guarantee. So my simple solution doesn't work if I can't get the host to send larger TLPs through posted writes.
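For context on what those extra PIO states would have to inspect: the payload size of a TLP is carried in the 10-bit Length field of the first header DW, counted in DWs. Below is a minimal sketch of that decode, assuming the PCIe 2.x header layout (Fmt in bits 30:29, Type in bits 28:24, Length in bits 9:0). The function name and the returned keys are purely illustrative and are not taken from the PIO source.

```python
def decode_tlp_dw0(dw0):
    """Decode the first header DW of a TLP (PCIe 2.x layout assumed)."""
    fmt = (dw0 >> 29) & 0x3         # Fmt[1:0]: bit 1 = TLP carries data, bit 0 = 4DW header
    tlp_type = (dw0 >> 24) & 0x1F   # Type[4:0]; 0b00000 is a memory request
    length = dw0 & 0x3FF            # Length[9:0] in DWs; the value 0 encodes 1024 DWs
    has_data = bool(fmt & 0x2)
    return {
        "has_data": has_data,
        "header_dws": 4 if (fmt & 0x1) else 3,
        "is_mem_write": has_data and tlp_type == 0,
        # Requests without data (e.g. reads) carry no payload in the TLP itself.
        "payload_dws": (length or 1024) if has_data else 0,
    }


if __name__ == "__main__":
    # 32-bit-address memory write with a 1 DW payload, the case the stock PIO handles.
    print(decode_tlp_dw0(0x40000001))
    # 64-bit-address memory write with a 32 DW (128-byte) payload.
    print(decode_tlp_dw0(0x60000020))
```

A filter front end would use `payload_dws` to know how many data beats follow the header, and `header_dws` to know where the payload starts.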
Following this disappointment, I found that one can achieve larger TLP payloads (more DWs per packet) by using a DMA engine, either on the endpoint or from the host acting in a root port configuration. So I started up the ZC706 TRD for PCIe, which uses the NWL core (yes, I know this core stops working after 12 hours), since I figured I could achieve the larger packet sizes using this DMA. But I'm not really comfortable with this design, for several reasons.
1) I don't need a lot of the VDMA and DDR3 storage features of the design, so much of this TRD is vestigial space I could be using for my application, whereas the PIO was lightweight and generally an easier design to follow.
2) The design looks like it's being phased out, and it's built in PlanAhead, which I've used before, but a great deal of the documentation for the 7 Series Integrated PCIe core assumes you are using Vivado.
3) Is the NWL DMA actually getting larger DW packet sizes? It looks like it, and I would assume so based on the throughput it claims, but the performance monitoring doesn't appear to look at the actual DW data; instead it seems to count the total data, such as HW and DW.
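One rough way to sanity-check point 3 is to compare the claimed throughput against what a given payload size could deliver at best. The sketch below estimates the fraction of link bytes that are payload for a single TLP. It assumes Gen1/Gen2 framing (1-byte STP, 2-byte sequence number, 4-byte LCRC, 1-byte END) plus a 3DW or 4DW header, and it ignores DLLPs, flow control, and 8b/10b encoding, so treat the numbers as an upper bound. The function name is hypothetical.

```python
def tlp_efficiency(payload_bytes, header_bytes=12, ecrc=False):
    """Fraction of per-TLP link bytes that are payload (Gen1/Gen2 framing assumed)."""
    framing = 1 + 2 + 4 + 1                  # STP + sequence number + LCRC + END
    overhead = framing + header_bytes + (4 if ecrc else 0)
    return payload_bytes / (payload_bytes + overhead)


if __name__ == "__main__":
    # Compare a 1 DW PIO-style write against typical DMA payload sizes.
    for dws in (1, 32, 64):
        print(f"{dws:3d} DW payload: {tlp_efficiency(dws * 4):.3f}")
```

Under these assumptions a 1 DW write uses about 17% of the link for payload, while a 32 DW (128-byte) write uses about 86%. If the TRD's measured throughput is well above what 1 DW TLPs could achieve, that is indirect evidence the DMA is issuing larger TLPs.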