03-29-2020 04:14 PM
This is for the 'pcie3u' core in Endpoint configuration at the X0Y4 position with Quad231+Quad232 (Gen3 x8), on xcku085-flva1517-2-i.
We are observing that the PCIe hard macro accepts more commands on its RQ interface when the PF0_DEV_CAP_MAX_PAYLOAD_SIZE parameter is set lower:
- For MAX_PAYLOAD_SIZE=128 bytes, it starts pushing back after 15 enqueued commands
- For MAX_PAYLOAD_SIZE=256 bytes, it starts pushing back after 12 enqueued commands
- For MAX_PAYLOAD_SIZE=512 bytes, it starts pushing back after 10 enqueued commands
- For MAX_PAYLOAD_SIZE=1024 bytes, it starts pushing back after 8 enqueued commands
That has direct implications for MEM_RD throughput, which is highest for MAX_PAYLOAD_SIZE=128. But such a minimal payload size lowers MEM_WR throughput.
Any suggestions on how to make the PCIe core accept more requests at a larger MAX_PAYLOAD_SIZE (512 is our ideal target), or insight into the interactions between the motherboard and the PCIe core that are at play in this case?
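For reference, the push-back point can be instrumented with a small monitor on the RQ handshake. The sketch below is just one way to do this (it assumes the standard s_axis_rq_* ports of the integrated block and counts one command per accepted TLAST beat); it counts how many full requests are accepted before the first cycle where TVALID is high and TREADY is low:

module rq_pushback_mon (
    input  wire       user_clk,
    input  wire       user_reset,
    input  wire       s_axis_rq_tvalid,
    input  wire       s_axis_rq_tready,
    input  wire       s_axis_rq_tlast,
    output reg  [7:0] cmds_before_pushback = 8'd0
);
    reg pushback_seen = 1'b0;

    always @(posedge user_clk) begin
        if (user_reset) begin
            cmds_before_pushback <= 8'd0;
            pushback_seen        <= 1'b0;
        end else if (!pushback_seen) begin
            if (s_axis_rq_tvalid && !s_axis_rq_tready)
                pushback_seen <= 1'b1;                                // first push-back observed
            else if (s_axis_rq_tvalid && s_axis_rq_tready && s_axis_rq_tlast)
                cmds_before_pushback <= cmds_before_pushback + 8'd1; // one full request accepted
        end
    end
endmodule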
04-28-2020 06:17 PM
04-29-2020 12:21 AM
04-29-2020 03:55 PM
There is no mixing of read and write requests in this particular test -- it is solely MEM_RD DMA requests from the Xilinx EP to the x86_64 motherboard RC.
Is the core's recovery from TREADY=0 slow, such that the designer should strive to avoid putting it into that state?
04-29-2020 04:08 PM
04-29-2020 04:31 PM
Got it, thanks; good to know this subtlety.
So, with my completion buffers set in the Xilinx IP Catalog to 'Extreme Performance', i.e. the maximum the PCIe core offers, is there anything I can do within the PCIe core config to also maximize the flow control credits, primarily for MEM_RD on RQ?
Or is that a system issue? (My empirical success after swapping in a newer motherboard seems to indicate so.)
04-29-2020 04:36 PM
12-30-2020 10:51 AM
Do you know if a lack of credits on the RQ interface could cause a lockup on CC? I am running into an issue with the PCIe core where the root is throwing a completion timeout error, presumably for a non-posted request sent to the FPGA. During this time a lot of data was being mastered by the FPGA on the RQ interface, and I might not have been respecting the flow control credits correctly. The documentation mentions head-of-line blocking, where a lack of non-posted credits can block posted requests behind it. But does the same apply to completions? Can a non-posted request on RQ that is blocked for lack of credits prevent a completion from being sent on the CC interface?
02-09-2021 11:34 PM - edited 02-10-2021 01:10 AM
IIRC completions cannot be blocked by anything but other completions. There are six types of credits, with separate header and data credits for posted requests, non-posted requests, and completions. So as long as you have completion credits, you can send completions. The main head-of-line blocking you need to worry about is on the RQ interface only, and this shouldn't cause any issues other than blocking that interface. I suppose that is one of the advantages of the RQ+RC/CQ+CC split in the UltraScale/UltraScale+ core: the core can exert backpressure against RQ without blocking CC. If you were to do this muxing outside of the core (as in older Xilinx parts and all the Intel parts I am familiar with, which have a single TX interface and a single RX interface), then this could be an issue.
Edit: I almost forgot about the PCIe ordering model. Completions cannot pass posted requests. However, I'm not sure whether the Xilinx core enforces this before the posted request is actually sent on the wire. I'm leaning towards: you may need to enforce it yourself in some way by looking at TX sequence numbers, if it's important for your design.
At any rate, what you can do is set cfg_fc_sel = 3'b100 and then monitor cfg_fc_nph, cfg_fc_ph, and cfg_fc_pd. If cfg_fc_nph falls below some small number (say, 4 or 8), don't issue any read requests. Similarly, if cfg_fc_ph or cfg_fc_pd falls too low, don't send any write requests. One unit of cfg_fc_pd is 4 DWORDs (16 aligned bytes). If you do that right, then RQ should never block.

Actually, that may not be sufficient: the core does some management of read requests internally, and if there isn't sufficient completion buffer space, read requests will not be released from the core; until they are released, no non-posted credits are consumed. Ostensibly that's what pcie_tfc_nph_av/pcie_tfc_npd_av are for, but these don't work, so the only option left is transmit sequence numbers.

So, what you need to do is mark read requests and limit the number of outstanding read requests. What I do is set the MSB of the transmit sequence number and then use a counter to keep track of in-flight operations, incremented when issuing the request and decremented based on pcie_rq_seq_num/pcie_rq_seq_num_vld. I have found that a limit of 16 in-flight operations is reasonable. Once there are 16 requests sitting in the hard IP core, no new requests are issued until the core releases some of them towards the host.
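To make that concrete, here is a rough sketch of the gating logic, assuming the UltraScale Gen3 integrated block signal names from PG156; the credit thresholds, the rq_is_read decode, and driving the tagged sequence number onto the seq_num field of s_axis_rq_tuser (not shown) are illustrative assumptions, not a drop-in implementation:

module rq_throttle (
    input  wire        user_clk,
    input  wire        user_reset,
    // RQ handshake as seen by the user logic
    input  wire        s_axis_rq_tvalid,
    input  wire        s_axis_rq_tready,
    input  wire        s_axis_rq_tlast,
    input  wire        rq_is_read,          // assumed decode: RQ descriptor carries a memory read
    // flow-control status from the core
    output wire [2:0]  cfg_fc_sel,
    input  wire [7:0]  cfg_fc_nph,
    input  wire [7:0]  cfg_fc_ph,
    input  wire [11:0] cfg_fc_pd,
    // sequence-number return path
    input  wire [3:0]  pcie_rq_seq_num,
    input  wire        pcie_rq_seq_num_vld,
    output wire        can_issue_read,
    output wire        can_issue_write
);
    // Select "transmit credits currently available" on the cfg_fc_* outputs.
    assign cfg_fc_sel = 3'b100;

    // Hold off when credits drop below a small margin (threshold values are guesses).
    wire rd_credits_ok = (cfg_fc_nph > 8'd8);
    wire wr_credits_ok = (cfg_fc_ph > 8'd8) && (cfg_fc_pd > 12'd64);  // 1 data credit = 16 bytes

    // Reads are tagged by setting the MSB of the RQ sequence number; track how
    // many tagged reads are still inside the hard IP.
    localparam MAX_INFLIGHT_RD = 16;
    reg [4:0] rd_inflight = 5'd0;

    wire rd_issue  = s_axis_rq_tvalid && s_axis_rq_tready && s_axis_rq_tlast && rq_is_read;
    wire rd_retire = pcie_rq_seq_num_vld && pcie_rq_seq_num[3];  // MSB set = one of our tagged reads

    always @(posedge user_clk) begin
        if (user_reset)
            rd_inflight <= 5'd0;
        else
            rd_inflight <= rd_inflight + rd_issue - rd_retire;
    end

    // Only start a new read when credits are available and fewer than 16 reads
    // are still sitting in the core; writes are gated on posted credits only.
    assign can_issue_read  = rd_credits_ok && (rd_inflight < MAX_INFLIGHT_RD);
    assign can_issue_write = wr_credits_ok;
endmodule

can_issue_read/can_issue_write would then gate the user-side DMA engine before it asserts s_axis_rq_tvalid for the corresponding request type.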