Registered: ‎10-20-2014

LwIP library on Zynq broken


The lwIP library included with Xilinx SDK 2014.4 has a problem with the handling of received packets.
It seems to be related to handling of the memory cache. The problem manifests through
byte-swapped port numbers appearing in tcp_input(). Affected packets are skipped, and lwIP
sends out TCP RST packets for these invalid packets.
Of course the TCP protocol will recover from the data loss, but the data rate drops
to 20-50 Mbit/s - fairly low for a 1 Gbit/s Ethernet connection.
The problem can be easily reproduced on a ZC702 Board with the  
Echo Server template applications from SDK and a few patches that are described here.  
I have included the patched files for reference.  
Also included is a small Winsock PC application that connects to  
the echo server at port 7, continuously sends data and prints the achieved  
data rate on the console.
=== Steps to recreate the project from scratch ===
- run Xilinx SDK 2014.4, point it to an empty folder as workspace.
- create a new "lwIP Echo Server" Application for the ZC702 board.  
Apply the following patches:
In echo.c, the following patches are necessary:
    Add this function somewhere before recv_callback:
        void store_payload( const char* payload, unsigned int len )
        {
            static char buffer[100*1024];
            static int  index = 0;
            if( payload==NULL ) { while(1){} }  /* trap: NULL payload */
            if( len > 2*1024  ) { while(1){} }  /* trap: oversized chunk */
            if( sizeof(buffer)-index < len )
                index = 0;                      /* wrap around to buffer start */
            memcpy( buffer+index, payload, len );
            index += len;
        }
    This function is used to actually store the data from recv_callback
    into memory. A 100kB circular buffer is used to create a memory access  
    pattern that triggers the caching bug more often.
    in echo.c, in recv_callback: replace this line:
        err = tcp_write(tpcb, p->payload, p->len, 1);
    with this one:  
        store_payload( p->payload, p->len );
    This is used to store the data in memory instead of echoing it back.
    in echo.c, in recv_callback():        
    replace this line:
        tcp_recved(tpcb, p->len);
    with this:
        tcp_recved(tpcb, p->tot_len);
    This actually fixes another, unrelated bug in the echo server template
    that can cause TCP receive-window congestion.
    It is recommended to fix this bug with the line above; otherwise it
    makes tracing the real caching bug more difficult.
in system.mss, in the section on lwip:
    Remove the DHCP parameters, and add the following parameters:
        PARAMETER mem_size = 524288
        PARAMETER memp_n_pbuf = 2048
        PARAMETER memp_n_tcp_pcb = 1024
        PARAMETER memp_n_tcp_seg = 1024
        PARAMETER n_rx_descriptors = 256
        PARAMETER n_tx_descriptors = 256
        PARAMETER pbuf_pool_size = 4096
        PARAMETER tcp_debug = true
        PARAMETER tcp_snd_buf = 65535
        PARAMETER tcp_wnd = 65535
    These parameters are taken from XAPP1026 and are required for the
    caching bug to occur more often.
    Then click "Re-generate BSP Sources" to actually apply these parameters.
    The automatic rebuild is not sufficient!
    NOTE: While it is possible to tweak the settings such that the bug no longer  
    occurs (e.g. by setting tcp_wnd to 10000), the bug is still there and
    will occur in applications with more complex memory access patterns.
in bsp/ps7_cortexa9_0/libsrc/lwip140_v2_3/src/lwip-1.4.0/src/core/tcp_in.c:
    search for this loop (easily found by searching for the string "destined"):
        for(pcb = tcp_active_pcbs; pcb != NULL; pcb = pcb->next)  
    Right before this loop, add the following code:
        if( tcphdr->dest == 0x700 )
            xil_printf( "byte-flipped port address: expected:0x0007 got:0x0700\n");
    Do NOT "re-generate the BSP sources", otherwise this patch is lost.
    The printf you added here should never be triggered, but it will be,
    exposing the caching problem.
    It appears that a previously processed packet becomes visible again from the cache,
    since the port numbers are byte-flipped from big-endian to little-endian order in tcp_input().
Run the test:
Connect the ZC702 board via a Gbit Ethernet cable to a PC, and configure the PC's IP
address to be in the same subnet as the board.
Connect a UART console, and run the application.
Use the included PC application to connect to the board on TCP port 7
and continuously send data to it. Note: the included PC application is just an
example - any application that connects on TCP port 7 and
spams the server with data will do.
After the data transfer is started, you will get following lines  
in the UART console after about 2 seconds:
    Board IP:
    Netmask :
    Gateway :
    TCP echo server started @ port 7
    byte-flipped port address: expected:0x0007 got:0x0700
    byte-flipped port address: expected:0x0007 got:0x0700
    byte-flipped port address: expected:0x0007 got:0x0700
    byte-flipped port address: expected:0x0007 got:0x0700
The PC application console shows high fluctuations in data rate:
    net_init : 1
    net_init : 2
    net_init : 2.1
    net_init : 2.2
    net_init : 3
    connection established. sending...
        2 Data Rate MBit/sec : 155.7 mean=155.67
        4 Data Rate MBit/sec : 123.5 mean=137.75
        6 Data Rate MBit/sec : 88.4 mean=116.16
        8 Data Rate MBit/sec : 106.4 mean=113.56
       10 Data Rate MBit/sec : 233.2 mean=126.54
       12 Data Rate MBit/sec : 137.8 mean=128.29
       14 Data Rate MBit/sec : 270.8 mean=138.72
       16 Data Rate MBit/sec : 129.3 mean=137.46
       18 Data Rate MBit/sec : 129.3 mean=136.50
       20 Data Rate MBit/sec : 233.2 mean=142.41    
To fix this caching problem and prove that it is caching-related, add this to main.c:
        #include "xil_cache.h"
and at the beginning of main():
        Xil_DCacheDisable();
The performance drops considerably, but no more byte-swapping will occur.
UART console:
    Board IP:
    Netmask :
    Gateway :
    TCP echo server started @ port 7
PC Application Console:
    net_init : 1
    net_init : 2
    net_init : 2.1
    net_init : 2.2
    net_init : 3
    connection established. sending...
        2 Data Rate MBit/sec : 178.8 mean=178.82
        4 Data Rate MBit/sec : 168.0 mean=173.26
        6 Data Rate MBit/sec : 175.0 mean=173.84
        8 Data Rate MBit/sec : 186.7 mean=176.89
       10 Data Rate MBit/sec : 182.6 mean=177.99
       12 Data Rate MBit/sec : 182.8 mean=178.78
       14 Data Rate MBit/sec : 182.6 mean=179.31
       16 Data Rate MBit/sec : 175.0 mean=178.75
       18 Data Rate MBit/sec : 168.0 mean=177.50
       20 Data Rate MBit/sec : 171.6 mean=176.89
       22 Data Rate MBit/sec : 182.6 mean=177.39
       24 Data Rate MBit/sec : 171.3 mean=176.87
       26 Data Rate MBit/sec : 178.8 mean=177.01
       28 Data Rate MBit/sec : 178.8 mean=177.14


Can anyone check this and confirm this problem?

And will it be fixed any time soon, making lwIP more reliable on Zynq?


2 Replies
Registered: ‎06-01-2015

I saw no response to this post, which is disappointing as this must affect many users (although most may not be aware), and it IS STILL A PROBLEM in the lwIP stack supplied with Vivado 2016.1!!!


Below is a simpler way to replicate with the hope that Xilinx addresses this issue as it directly affects a Xilinx provided Example.


I saw another post on a different site noting a different issue caused by this same problem with the same ultimate but undesirable solution of disabling the Data cache.


This issue can basically be replicated by simply building the lwIP demo echo server and then pinging it. About every other packet will get dropped (or not understood) and await the retry a second later, making ping times jump from under a millisecond to a second.


If you add the same Xil_DCacheDisable() that you note in your post, the issue goes away - sort of proving that it is related to cache issues.


Hopefully this can get addressed quickly!!

Registered: ‎07-07-2017

This is still an issue in the 2017 SDK. We've seen this issue on other ARM cores using DCache RAM and DMA: the memory section needs to either be non-cached / write-through (depending on whether traffic is incoming or outgoing), or have the page flushed before access.