Observer skoehler
Registered: ‎05-25-2018

Bug: TCP/IP data corruption

When receiving data via TCP/IP, the data often gets corrupted. I can reproduce the corruption with a few lines of code, which I can post here. I'm using lwIP 2.0.2 in a bare-metal app on the Zynq 7010 (Zybo board). The app is compiled with XSDK version 2018.2.1. I'm using the Ethernet controller in the Zynq's PS, not an AXI Ethernet in the PL.

 

I simply send 1 MB of random data preceded by an Adler-32 checksum. If I send the data slowly, almost no corruption occurs. However, if I send the data without any throttling, it almost always gets corrupted. The Zybo board is connected directly to a 100 Mbit/s USB Ethernet adapter.
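For anyone who wants to reproduce the test, here is a minimal sketch of the receiver-side check in C. It is not the code from my app. The names adler32_buf and payload_is_intact are made up for illustration, and the framing (the checksum sent as 4 big-endian bytes in front of the payload) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16, as specified for Adler-32 */

/* Plain, unoptimized Adler-32 over a buffer. */
uint32_t adler32_buf(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Hypothetical framing: 4-byte big-endian checksum, then the payload.
 * Returns 1 if the payload matches the checksum, 0 otherwise. */
int payload_is_intact(const uint8_t *frame, size_t frame_len)
{
    if (frame_len < 4) {
        return 0;
    }
    uint32_t expected = ((uint32_t)frame[0] << 24) |
                        ((uint32_t)frame[1] << 16) |
                        ((uint32_t)frame[2] << 8)  |
                         (uint32_t)frame[3];
    return adler32_buf(frame + 4, frame_len - 4) == expected;
}
```

On the target, payload_is_intact would be called once the full 1 MB plus 4 bytes has been collected from the receive callback. A return value of 0 means the data was corrupted somewhere between the sender and the application.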

 

I believe the issue might be related to the checksum offloading. Or maybe the handling of the DMA buffers has an error?

 

In lwipopts.h we have the following:

 

#define CHECKSUM_GEN_TCP           0
#define CHECKSUM_GEN_UDP           0
#define CHECKSUM_GEN_IP            0
#define CHECKSUM_CHECK_TCP         0
#define CHECKSUM_CHECK_UDP         0
#define CHECKSUM_CHECK_IP          0
#define LWIP_FULL_CSUM_OFFLOAD_RX  1
#define LWIP_FULL_CSUM_OFFLOAD_TX  1

 

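If I set CHECKSUM_CHECK_TCP to 1 by hand, the errors disappear completely, presumably because lwIP then verifies every TCP segment in software and drops the corrupted ones, which the sender retransmits. So this seems to be a fairly good workaround, even though it might cause significant CPU load.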

 

However, lwipopts.h is frequently overwritten by the XSDK (when cleaning the project or modifying the BSP settings). As long as the problem persists, I need a reliable way to enable the workaround. When I review the BSP's settings, all the checksum offloading settings are disabled, but those options seem to apply to AXI Ethernet only. I'm using the controller in the Zynq's PS, not an AXI controller in the PL. Unfortunately, as was pointed out here before, the editor for the lwIP options is quite limited, so I don't seem to have any control over the CHECKSUM_CHECK_* values above.
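Until there is a proper way to set the option, one stopgap is a compile-time guard in the application code, so that the build fails loudly whenever the XSDK has regenerated lwipopts.h and silently reverted the workaround. This is only a sketch: it assumes the BSP include path exposes lwIP's lwip/opt.h (which pulls in lwipopts.h), and it does not stop the file from being overwritten.

```c
/* Place in any application source file that is built against the BSP. */
#include "lwip/opt.h"   /* includes the generated lwipopts.h */

#if CHECKSUM_CHECK_TCP != 1
#error "lwipopts.h was regenerated: set CHECKSUM_CHECK_TCP to 1 again"
#endif
```

The guard costs nothing at run time. It only turns a silent regression into a build error.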

 

Is Xilinx aware of the problem? If not here, where can I file a proper bug report?

This is quite a severe problem, since it affects most users of lwIP+TCP/IP on the Zynq 7000.

 

 

6 Replies
Observer skoehler
Registered: ‎05-25-2018

Re: Bug: TCP/IP data corruption

I found this thread: https://forums.xilinx.com/t5/Embedded-Processor-System-Design/ZYNQ-platform-UDP-Packet-corruption-when-optimization-is-ON/td-p/778735

 

I made sure that I have -O0 in the compiler options of the BSP. There was no improvement. The data still gets corrupted.

 

Observer skoehler
Registered: ‎05-25-2018

Re: Bug: TCP/IP data corruption

BTW: I have not been able to reproduce data corruption with UDP packets. Only TCP seems to be affected.

Observer skoehler
Registered: ‎05-25-2018

Re: Bug: TCP/IP data corruption

Turns out that the problem is related to the size of the TCP segments.

 

With the default TCP settings, tcp_wnd is pretty small: 2048 bytes. As a result, TCP segments are never larger than 1024 bytes, because the host (a Linux PC in my case) won't send TCP segments larger than half the window size. Such a small TCP window hurts performance, so I had changed the setting to 0x8000.

 

Since I changed the tcp_wnd parameter to 0x8000 and the default tcp_mss setting is 1460, the TCP segments became larger than 1024 bytes and I saw lots and lots of checksum errors.
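The arithmetic behind the two cases above, as a small C sketch. The helper name max_segment_payload is made up, and the "half the window" rule is simply what I observed from the Linux sender, not something taken from the lwIP API.

```c
#include <stdint.h>

/* Hypothetical helper: largest payload the peer puts into one segment,
 * assuming it caps segments at min(tcp_mss, tcp_wnd / 2). */
uint32_t max_segment_payload(uint32_t tcp_mss, uint32_t tcp_wnd)
{
    uint32_t half_wnd = tcp_wnd / 2;
    return (tcp_mss < half_wnd) ? tcp_mss : half_wnd;
}
```

With the defaults (tcp_mss 1460, tcp_wnd 2048) this gives 1024 bytes. With tcp_wnd raised to 0x8000, the limit becomes the MSS itself, 1460 bytes, which is when the checksum errors started to show up.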

 

If I set tcp_mss to 1024 to artificially limit the size of TCP segments, then the checksum problems disappear - even if I use the larger tcp_wnd value. So setting the tcp_mss to something small (like 1024) seems to be a good workaround.

 

Update: The problem persists. Data corruption still occurs about once every 600 MB, even if the MSS is 1024.

Update 2: An even smaller MSS value (768) seems to reduce the likelihood of data corruption further. I was able to transfer gigabytes without data corruption. However, the bug is still there and this is only a workaround.

Adventurer
Registered: ‎09-19-2016

Re: Bug: TCP/IP data corruption

Observer skoehler
Registered: ‎05-25-2018

Re: Bug: TCP/IP data corruption

I will try that fix within a few hours. Thanks for pointing that thread out.

How can I keep the XSDK from overwriting the fixed version of the files?

Observer skoehler
Registered: ‎05-25-2018

Re: Bug: TCP/IP data corruption

The solution discussed in https://github.com/Xilinx/embeddedsw/issues/53 fixes the problem described above as well.

 

I edited the file SDK/2018.2/data/embeddedsw/ThirdParty/sw_services/lwip202_v1_1/src/contrib/ports/xilinx/netif/xemacpsif_dma.c of my Xilinx installation to permanently apply the fix.
