07-22-2020 05:56 PM
I am using Vivado 2017.4 and the UltraScale Devices Gen3 Integrated Block for PCI Express v4.4. The FPGA is a KU115 on a HiTech Global HTG-830 card. The HIP core is configured for Gen2 x4 with a 128-bit interface. This is the version of the PCIe HIP core where the user parses and generates TLPs on the four HIP interfaces (CQ, CC, RQ, RC). PG156 is the reference document (along with the substantial MindShare PCIe reference).
We have ultimately encountered corrupted data in two different testcases.
Testcase #1: Read and write request TLPs are generated by user software (single-dword payloads). One piece of software runs asynchronous background write and read operations, while the other is a user test application. The test application writes to, then reads back, user registers in the FPGA EP, in loops in the tens of thousands, as a stress test (a good ol' fashioned register test). When the background read operations are running, we see a few corrupted completion data values returned during the register stress test. When we turn off the background operations, the stress test runs error free.
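For reference, the write-then-read-back pattern looks like the sketch below. This is a minimal illustration only: the `FakeBar` class is a hypothetical stand-in for BAR-mapped register access (on real hardware it would be an mmap of the BAR region), and all names are mine, not from the actual test software.

```python
import random

class FakeBar:
    """Hypothetical stand-in for a BAR-mapped register file.
    On real hardware, write32/read32 would be 32-bit accesses
    into an mmap'd BAR window."""
    def __init__(self, size_dwords):
        self.regs = [0] * size_dwords

    def write32(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF

    def read32(self, offset):
        return self.regs[offset]

def register_stress_test(bar, num_regs, iterations, seed=0):
    """Write random dwords to every register, read them back,
    and count mismatches (corrupted completions)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(iterations):
        expected = [rng.getrandbits(32) for _ in range(num_regs)]
        for off, val in enumerate(expected):
            bar.write32(off, val)
        for off, val in enumerate(expected):
            if bar.read32(off) != val:
                errors += 1
    return errors
```

Against the fake register file this always returns zero errors; on hardware, the background traffic is what exposes the occasional mismatch.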
Testcase #2: The same asynchronous background software is running (reads and writes, single-dword payloads). However, the stress test is different in that the FPGA EP generates block write request TLPs (as bus master, with multi-dword payloads). We see very similar corrupted data. When we turn off the background operations, the stress test runs error free.
In testcase #1, all of the requests are handled by the EP as a completer device over the CQ and CC interfaces. We see the corruption whether the requests (test and async/background) all target the same BAR or different BARs.
In testcase #2, the EP is sending data over the RQ interface as bus master, but the corrupted data is asynchronous completion data returned over the CC interface (in response to requests received over the CQ). Juggling data between the CC and RQ interfaces is handled entirely within the HIP core.
Are there any known problems, or has anyone else encountered similar behavior, with the HIP core scrambling or mixing completion data with master request data over time or under high-speed/high-volume stress?
We still plan to investigate further on our end (software, TLP request parser/completer/router logic, ChipScope, simulation, a different host platform, a different FPGA/card), but wanted to reach out and see if there are any known, similar, or related issues. Nothing turned up in my search of the Xilinx web forums. The host platform is running CentOS 7.x. The system max payload is 256 bytes, as reported by lspci.
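As an aside, the negotiated max payload can be pulled out of `lspci -vv` output programmatically; the sketch below parses the DevCtl (negotiated) value rather than the DevCap (supported) value. The sample text is an illustrative fragment of typical `lspci -vv` output, not a capture from our system.

```python
import re

# Illustrative fragment of `lspci -vv` output for one function.
sample = """\
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
        DevCtl: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                MaxPayload 256 bytes, MaxReadReq 512 bytes
"""

def parse_max_payload(lspci_vv_text):
    """Return the negotiated MaxPayload (from the DevCtl block)
    in bytes, or None if not found. Searches forward from DevCtl:
    so the DevCap (supported) value is skipped."""
    m = re.search(r"DevCtl:.*?MaxPayload (\d+) bytes", lspci_vv_text, re.S)
    return int(m.group(1)) if m else None
```

The DevCtl value is what the host actually programmed, which is the one that matters for TLP sizing in the EP.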
I have a full PCIe EP (PIPE) simulation (using ModelSim PE), but obviously cannot realistically exercise the volume and speed of our hardware stress tests in simulation.
I have not yet tried a different version of the tools or HIP core, although it is likely an option in the future.
08-19-2020 03:26 PM
No, there are no known issues. Before delving into this further, could you please make sure the following requirement, as stated in PG156, is adhered to?
Make sure tvalid is not deasserted before the transfer is complete. You could add a suitable trigger in an ILA to see whether this requirement is ever violated.
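The rule being referred to is the base AXI4-Stream handshake requirement: once tvalid is asserted, it must stay asserted until tready is also high on the same cycle (PG156 adds further per-interface packet rules on top of this). As a minimal sketch, an exported signal trace could be checked offline like this; the trace format and function name are my own invention for illustration.

```python
def find_tvalid_violations(trace):
    """Check the AXI4-Stream valid/ready rule on a sampled trace.

    trace: list of (tvalid, tready) samples, one per clock cycle.
    Once tvalid is asserted, it must remain asserted until a cycle
    where tready is also high (the handshake). Returns the cycle
    indices where tvalid dropped before the handshake completed.
    """
    violations = []
    pending = False  # tvalid seen high, handshake not yet completed
    for cycle, (tvalid, tready) in enumerate(trace):
        if pending and not tvalid:
            violations.append(cycle)
        pending = bool(tvalid and not tready)
    return violations
```

An ILA trigger for the same condition would fire on `tvalid` falling while the previous cycle had `tvalid && !tready`.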
08-27-2020 07:56 AM
Yes, I am confident I am driving CC tvalid properly. I can verify with ChipScope, however.
I have reposted this topic under a new account based on my work email, as recommended by my FAE (and so I can submit an SR).
You can find it here: https://forums.xilinx.com/t5/PCIe-and-CPM/PCIe-Data-Corruption/td-p/1143944