PCIe over Optical Cable Performance

Hello Experts,
 
We are using a VCU118 board connected to a host via a QSFP interface. The GT transceivers of the QSFP modules are routed to the XDMA core internally, so we are essentially running a PCIe Gen3 link over optical cables. The link is stable and the performance in both directions (C2H and H2C) is around 87% when we use a 10 m optical cable. However, the C2H performance drops to 40% when we use a 100 m cable, while the H2C performance stays around 82% in that case.
Looking at the "design_1_xdma_0_1_pcie4_ip_pcie4_uscale_core_top.v" file, we found the following parameters that are used by the core to calculate the transmit and receive flow control parameters, but I have no idea how these numbers are calculated or used:

image.png
 
So the question is: are we looking at the right parameters to solve our problem, and if so, how do we set them correctly?
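
As a rough sanity check on why cable length might matter at all, here is a back-of-the-envelope estimate (my own assumptions and round numbers, happy to be corrected). Light in fibre travels at roughly 5 ns per metre, and Gen3 x8 carries about 7.88 GB/s of raw bandwidth, so a 100 m cable adds roughly

$$
t_{\mathrm{RTT}} \approx 2 \times 100\,\mathrm{m} \times 5\,\mathrm{ns/m} = 1\,\mu\mathrm{s},
\qquad
7.88\,\mathrm{GB/s} \times 1\,\mu\mathrm{s} \approx 8\,\mathrm{kB}
$$

to the credit/ACK round trip, i.e. about 8 kB of extra data has to be in flight before any UpdateFC or ACK comes back (roughly 500 posted data credits at 16 bytes per credit, or about 32 TLPs at 256 bytes of payload each). If the link partner's advertised posted credits for the C2H (endpoint-to-host write) stream do not cover that, the writes stall waiting for credit returns.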
Thanks for your help.
 
Cheers,
Reza
Xilinx Employee

The image wasn't attached properly, so I am not sure what you were referring to.

The performance issue could be due to a link issue as well. Have you checked whether the LTSSM is dropping to Recovery during data transfer?

Thanks.

 


Thanks for your reply.

Yes, I have checked the LTSSM and PCIe link status; the link is stable with no disconnections, errors, or recovery transitions.

A snapshot of the credit-related parameters is attached again here.

Thanks.

credit_params.png
Xilinx Employee

Thanks for posting the image. Those parameters are factory-defined and need not be tuned. Regarding the issue you are having, it is difficult to pinpoint where it might be coming from. Here are a few pointers that might help debug the issue:

  • Check the Link Status in lspci to ensure that your link is coming up at the full speed and width (a minimal config-space check is also sketched after this list).
  • Check: “Getting the Best Performance with Xilinx’s DMA for PCI Express” https://www.youtube.com/watch?v=WcEvAvtXL94
  • Check XDMA Debug Guide – AR71435 https://www.xilinx.com/support/answers/71435.html
  • Check XDMA Performance Number answer record – AR68049 https://www.xilinx.com/support/answers/68049.html
  • Use the latest driver from: https://github.com/Xilinx/dma_ip_drivers/tree/master/XDMA/linux-kernel
  • Try with the latest version of Vivado
  • Check if the device used is production or ES.
  • Try using prefetchable BAR memory.
  • One of the main factors affecting data throughput is interrupt processing. Once a data transfer is completed, the DMA sends an interrupt to the host and waits for the ISR to process the status. However, this wait time is not predictable, so the overall data transfer time is slow and unpredictable. There are a couple of options you can try to work around this.
    • MSI-X interrupts: Users can try using MSI-X interrupts instead of MSI or legacy interrupts. With MSI-X, the data rate is better than with an MSI or legacy interrupt-based design.
    • Poll mode (see AR71435): Users can try using Poll mode, which gives the best data rate. With Poll mode, there are no interrupts to process.
    • Try with Descriptor Credit based transfer. (See: AR71435)
  • Have you checked the MPS and MRRS values? Systems with a larger MPS will give better performance; a typical system has a 128-byte MPS. (See WP350 for more details.)
  • Double check link stability. See AR71355 for more details.
  • Do you have a link analyzer to see if there are any NAKs being issued? You could also check this by looking at the PIPE interface using Gen3 descrambler module. See https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Demystifying-PIPE-interface-packets-using-the-in-built/ba-p/980246  for more details.
  • Try with Gen3x8
  • Try increasing the DMA transfer size (within the limits of the application). A minimal throughput-measurement sketch follows this list.
  • Try a higher number of channels. The performance should increase, but the trade-off is that it will consume more logic inside the device.
  • Check credit information. See if there is enough credit from the link partner for the XDMA IP to initiate data transfer.
  • Try using BRAM if DDR is being used.
  • If you have configured your design for Gen3, can you try configuring it as Gen2? Gen3 can be more error-prone if the signal integrity on the board is not robust enough. Configuring the IP for a lower speed reduces the possibility of signal-integrity issues kicking in; if that is what is happening, you should see an increase in performance.
  • What is the AXI side data width and frequency configured for? Have you tried higher values if the option is available?
  • Do you have AXI SmartConnect in your design? If so, try replacing it with the AXI Interconnect IP.
  • If you have AXI Interconnect, try using it in synchronous mode.
  • See if your AXI system uses the same data width throughout. This could provide some performance boost if the performance is limited by the hardware.
  • Try disabling Narrow Burst, if it is enabled.
  • Check in the XDMA log whether there are repeated ISR calls.
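
On the link-status and MPS/MRRS points, lspci -vvv on the endpoint's BDF already shows everything needed. The sketch below is only a programmatic version of the same check, reading sysfs config space; the BDF path is a placeholder for your endpoint and it must run as root, since unprivileged reads of config space are truncated to the first 64 bytes. It also assumes a little-endian host.

#define _POSIX_C_SOURCE 200809L
/* Walk the PCI capability list in sysfs config space and print the
 * negotiated link speed/width plus MPS/MRRS for one device.
 * The BDF below is a placeholder for the XDMA endpoint.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/config"; /* placeholder BDF */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return 1; }

    uint8_t cap = 0;
    pread(fd, &cap, 1, 0x34);                     /* capability list pointer */

    while (cap) {
        uint8_t hdr[2];
        if (pread(fd, hdr, 2, cap) != 2) break;
        if (hdr[0] == 0x10) {                     /* PCI Express capability  */
            uint16_t devctl = 0, lnksta = 0;
            pread(fd, &devctl, 2, cap + 0x08);    /* Device Control register */
            pread(fd, &lnksta, 2, cap + 0x12);    /* Link Status register    */
            unsigned speed = lnksta & 0xF;        /* 1=2.5, 2=5, 3=8 GT/s    */
            unsigned width = (lnksta >> 4) & 0x3F;
            printf("Link: speed code %u, width x%u\n", speed, width);
            printf("MPS  = %u bytes\n", 128u << ((devctl >> 5) & 0x7));
            printf("MRRS = %u bytes\n", 128u << ((devctl >> 12) & 0x7));
            break;
        }
        cap = hdr[1];                             /* next capability pointer */
    }
    close(fd);
    return 0;
}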
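
On the transfer-size and poll-mode points, a minimal C2H measurement sketch follows. It assumes the stock dma_ip_drivers character devices (e.g. /dev/xdma0_c2h_0) and an AXI-MM design reading from card address 0; the device path, chunk size, and iteration count are placeholders to adjust for your setup. The test tools shipped with that driver (dma_from_device / dma_to_device) do the same job with more options; the point is simply to sweep the transfer size and compare the 10 m and 100 m cables with everything else fixed.

#define _POSIX_C_SOURCE 200809L
/* Minimal C2H throughput probe using the XDMA character device.
 * Assumes the Xilinx dma_ip_drivers XDMA driver is loaded and exposes
 * /dev/xdma0_c2h_0 (adjust the path to your card/channel).
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char  *dev   = (argc > 1) ? argv[1] : "/dev/xdma0_c2h_0";
    const size_t chunk = 4UL << 20;      /* 4 MiB per read()  */
    const int    iters = 256;            /* 1 GiB in total    */

    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror(dev); return 1; }

    void *buf = NULL;
    if (posix_memalign(&buf, 4096, chunk)) { perror("posix_memalign"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t total = 0;
    for (int i = 0; i < iters; i++) {
        ssize_t n = read(fd, buf, chunk);     /* one C2H DMA transfer */
        if (n < 0) { perror("read"); break; }
        total += (size_t)n;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    printf("%zu bytes in %.3f s -> %.2f MB/s\n", total, secs, total / secs / 1e6);
    free(buf);
    close(fd);
    return 0;
}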

Thanks.


Thanks for your reply.

Here are my answers to your recommendations:

  • Try with the latest version of Vivado: The current version is 2019.2. I tried 2020.1 and the link could not even be established, so I went back to 2019.2.
  • Check if the device used is production or ES: The device is the Virtex UltraScale+ VCU118 Evaluation Platform (xcvu9p-flga2104-2L-e).
  • Try using prefetchable BAR memory: It did not change anything.
  • Interrupt processing (MSI-X instead of MSI/legacy, Poll mode, descriptor-credit based transfer; see AR71435): It behaves the same way with MSI-X or in polling mode.
  • Have you checked the MPS and MRRS values? (See WP350 for more details.): MPS and MRRS are both 256 bytes.
  • Double check link stability. See AR71355 for more details.
  • Do you have a link analyzer to see if there are any NAKs being issued? You could also check this by looking at the PIPE interface using Gen3 descrambler module. See https://forums.xilinx.com/t5/Design-and-Debug-Techniques-Blog/Demystifying-PIPE-interface-packets-using-the-in-built/ba-p/980246  for more details.
  • Try with Gen3 x8: This is the current link configuration.
  • Try increasing the DMA transfer size (within the limits of the application): It does not raise the performance above 40%.
  • Try a higher number of channels: We are using 4 channels in the current system.
  • Check credit information; see if there is enough credit from the link partner for the XDMA IP to initiate data transfers: Please have a look at the waveforms below.
  • Try using BRAM if DDR is being used: I have not tried that yet; currently I am using DDR4.
  • If you have configured your design for Gen3, can you try configuring it as Gen2? Configuring the IP for a lower speed reduces the possibility of signal-integrity issues: I have not tried that.
  • What is the AXI-side data width and frequency configured for? Have you tried higher values if the option is available?: 256 bits at 250 MHz (see the calculation after this list).
  • Do you have AXI SmartConnect in your design? If so, try replacing it with the AXI Interconnect IP: It is not an AXI SmartConnect.
  • If you have AXI Interconnect, try using it in synchronous mode: It is synchronous.
  • See if your AXI system uses the same data width throughout. This could provide some performance boost if the performance is limited by the hardware.
  • Try disabling Narrow Burst, if it is enabled.
  • Check in the XDMA log whether there are repeated ISR calls.
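
To put those numbers side by side (assuming the throughput percentages are relative to the raw Gen3 x8 line rate, which is my assumption):

$$
\mathrm{BW}_{\mathrm{AXI}} = 256\,\mathrm{bit} \times 250\,\mathrm{MHz} = 8\,\mathrm{GB/s},
\qquad
\mathrm{BW}_{\mathrm{Gen3\,x8}} = 8\,\mathrm{GT/s} \times 8 \times \tfrac{128}{130} \approx 7.88\,\mathrm{GB/s}
$$

so 88% corresponds to roughly 6.9 GB/s and 40% to roughly 3.2 GB/s. The same datapath sustains the ~88% figure with the 10 m cable, so the AXI side and DDR4 do not look like the bottleneck; only the cable length changes between the two cases.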

 

I would like to emphasize that the performance is around 88% (read and write) when we use a 10 m optical cable, and it only drops in the C2H direction when we use the 100 m optical cable.

Here are two snapshots of the Completer Request AXI interface plus the credit counts of the endpoint device for both scenarios (10 m and 100 m). I hope they shed some light on this issue:

sanp10M.png
snap100m.png
Teacher

The fact that it is dependent on link length / attenuation seems to indicate a power / equalisation setting, does it not?

Just a thought: have you tried more than one 100 m cable?
Have you tried reversing the cable direction? (I know it sounds crazy, but try it.)

You are using QSFP modules; I am assuming your QSFPs are active units with retimers/equalisers in them? If so, the link depends on them. QSFPs come in different categories of how long a cable they can drive, and it could be that your QSFP is at its limit. Can you try swapping the QSFPs out for other ones?

I can't remember: how do the QSFPs get controlled to set the Tx/Rx power and equalisation? Is it over I2C, or is it internal to the QSFP?
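
If the module's management bus is reachable from Linux as a plain i2c-dev device (a big "if": on the VCU118 the QSFP I2C normally sits behind a board-level mux, so treat the bus path below as a placeholder and this as a sketch of the idea only), the SFF-8636 management interface lives at I2C address 0x50, and byte 0 identifies the module type; the Tx/Rx power monitors and equaliser/CDR controls are further SFF-8636 registers.

/* Hypothetical probe of a QSFP module's SFF-8636 management page.
 * Assumes the module is reachable as a Linux i2c-dev bus; the bus
 * path is a placeholder and depends entirely on your board/design.
 */
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/i2c-1", O_RDWR);          /* placeholder I2C bus */
    if (fd < 0) { perror("open"); return 1; }

    if (ioctl(fd, I2C_SLAVE, 0x50) < 0) {         /* SFF-8636 mgmt address */
        perror("I2C_SLAVE"); return 1;
    }

    unsigned char reg = 0, id = 0;                /* byte 0 = identifier */
    if (write(fd, &reg, 1) != 1 || read(fd, &id, 1) != 1) {
        perror("i2c transfer"); return 1;
    }

    /* SFF-8024 identifier codes: 0x0D = QSFP+, 0x11 = QSFP28 */
    printf("identifier byte 0 = 0x%02x\n", (unsigned)id);
    close(fd);
    return 0;
}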
