I am trying to solve a problem that I have so far tracked down into the LwIP stack. Here's some context: I am using a MicroBlaze in an Artix-7 with the Xilinx TEMAC and DMA in IP Integrator (Vivado 2017.1). The MicroBlaze runs FreeRTOS with LwIP (socket API). The application accepts a TCP connection from external software. Once the connection is established, the client software sends data requests, and the MicroBlaze responds with multiple sends, each consisting of several hundred bytes.
Most of the time, this works fine. But occasionally (and more frequently if the board has been powered on for a few hours before the socket is connected), an error occurs during sending. The original symptom was a Decode Error reported by the DMA. After some debugging, it was determined that the decode error was caused by an invalid (out-of-range) buffer address in a descriptor. Further debugging revealed that this was due to LwIP running past the end of the pbuf chain in ip_frag(), because ->tot_len in the first pbuf was greater than the sum of the ->len fields of all the pbufs in the chain. ip_frag() relies only on the length and does not check for ->next == NULL, so it walks past the last pbuf and continues processing an invalid pbuf located at 0x00000000. This is where the invalid buffer address originates.
For the last couple of occurrences that I trapped, the error occurred during a TCP retransmission. The first pbuf->tot_len was 1520, but the sum of ->len and ->next->tot_len was 1500. The MTU is configured for 1500. Interestingly, the 20-byte difference is exactly the length of the IP header. None of the application sends are longer than 1024 bytes (and most are shorter), so it appears the condition involves a combination of TCP nagling (coalescing of small sends), IP fragmentation, and possibly TCP retransmission.
I am unsure how best to continue tracing the problem to its source. It's possible that there is a problem with our use of LwIP and threading. I also wonder if there is an issue in the Xilinx driver/port. Has anyone experienced anything similar? Any suggestions on where to look for the cause of pbuf IP packet length errors?