07-13-2017 01:52 PM
I am experiencing significant corruption of incoming UDP packets (about 40 of every 8,000 packets, each 528 bytes) on my custom Zynq board. It happens on an isolated network that connects only two devices back to back. I would expect no packet loss on such a simple network.
After I enabled the lwIP UDP checksum check option, these packets were discarded. It looks like the packets are corrupted inside the lwIP stack. However, when I tried to debug the lwIP stack by changing the compile flag to -O0 in the BSP build, the problem disappeared.
This behavior is reproducible. There is no packet corruption when I build with -O0, but there is corruption with the other optimization levels (-O1 to -O3).
Any suggestions on how to resolve this issue?
07-13-2017 07:54 PM
What makes you think that the Zynq device is fast enough to handle the traffic?
The master GP ports are 128 MB/s and the HP ports just 256 MB/s each. There are many choke points. Is this 10G or 1G? Is this Linux or baremetal?
Also, if this is being DMA'd through the slave HP port, beware that the HP port does not take the cache coherency of the PS into account, so you might end up with corrupted data. You have to invalidate the cache before reading your buffer.
This can be solved by flowing the data through the ACP slave port but that will potentially make the PS slower because of cache misses - as expected.
There's a ton of info here (for Linux)
07-14-2017 11:14 AM
There is no UDP corruption when I turn compiler optimization off (-O0) when building the BSP. If the board can handle the traffic with no code optimization, I would expect the optimized code (-O2) to have no problem handling it.
I am running baremetal with 1G network speed connection.
I do not explicitly call Xil_ICacheEnable() or Xil_DCacheEnable() in my code. Unless it is implicitly enabled somewhere, the cache is not enabled.
07-14-2017 11:22 AM
@derekyu Yes, the cache is implicitly enabled on Zynq. That is why the ACP port exists in the first place.
A non-optimized build might be allowing enough time for cache coherency, who knows.
When you are dealing with the HP ports, invalidating the cache is mandatory on Zynq. My advice: just do it and see what happens.
07-17-2017 08:35 AM
If there is a need to invalidate the cache, I would expect the Xilinx SDK lwIP contrib code to do this already (I see the call to Xil_DCacheInvalidateRange() in setup_rx_bds() of xemacpsif_dma.c).
I would have no idea how to find out which cache lines need to be invalidated.
07-17-2017 09:39 AM
I doubt lwIP is aware of where the buffers are being read from, since you can point them anywhere.
In fact, if you are using the ACP, the last thing you want to do is invalidate the cache.
Xil_DCacheInvalidateRange( ptr, size )
where (ptr,size) is the memory location you are about to read/write.
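As a rough sketch of how (ptr, size) could be chosen: cache maintenance works on whole cache lines (32 bytes on the Cortex-A9), so the range is normally widened to line boundaries. The helper name cache_align_range and the cache_range_t type below are made up for illustration and are not part of the Xilinx BSP.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u  /* Cortex-A9 data cache line size, in bytes */

/* Hypothetical helper type: a range widened to whole cache lines. */
typedef struct {
    uintptr_t start;  /* rounded down to a cache-line boundary */
    size_t    len;    /* multiple of CACHE_LINE_SIZE */
} cache_range_t;

/* Widen (addr, len) so it starts and ends on cache-line boundaries. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    cache_range_t r;
    uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;
    uintptr_t end  = (addr + len + mask) & ~mask;  /* round the end up */

    r.start = addr & ~mask;                        /* round the start down */
    r.len   = (size_t)(end - r.start);
    return r;
}

/* On the target (assumption: xil_cache.h is included), the result would be
 * passed on roughly like this:
 *
 *   cache_range_t r = cache_align_range((uintptr_t)buf, nbytes);
 *   Xil_DCacheInvalidateRange((INTPTR)r.start, (u32)r.len);
 */
```

Note that widening the range also invalidates whatever else shares the first and last cache line, so any dirty data there is lost. That is why DMA receive buffers are usually aligned to, and padded to a multiple of, the cache line size.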
07-17-2017 02:12 PM
The following description comes from emacps_v3_1 Documentation:
Alignment & Data Cache Restrictions
Both cache invalidate/flush are taken care of in driver code.
This is what I meant when I said that I expect the SDK to take care of invalidating the cache.
07-17-2017 02:43 PM
07-18-2017 03:23 PM
What is the nature of the corruption? If you compare the bytes you sent to the bytes you received, are there random errors or always certain bytes?
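One way to answer that is to keep a copy of what was sent and diff it against what arrived. The sketch below is only an illustration (diff_payload and diff_report_t are made-up names); it reports how many bytes differ and the first and last differing offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report type for a byte-wise comparison. */
typedef struct {
    size_t mismatches;  /* number of differing bytes */
    size_t first;       /* offset of the first differing byte (valid if mismatches > 0) */
    size_t last;        /* offset of the last differing byte (valid if mismatches > 0) */
} diff_report_t;

/* Compare the sent and received payloads byte by byte. */
static diff_report_t diff_payload(const uint8_t *sent, const uint8_t *recv, size_t len)
{
    diff_report_t r = {0, 0, 0};

    for (size_t i = 0; i < len; i++) {
        if (sent[i] != recv[i]) {
            if (r.mismatches == 0) {
                r.first = i;
            }
            r.last = i;
            r.mismatches++;
        }
    }
    return r;
}
```

If the damaged region, once the buffer's start address is added to the offsets, tends to start and end on 32-byte boundaries, that would point toward stale cache lines rather than errors on the wire.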
07-18-2017 04:45 PM
While developing our drivers for lwIP, I've many times seen -O0 work while the other levels fail.
Every time, lwIP was not the issue; the lower-level I/F and driver were.
Each case was unique.
If it helps, here's a few things you could check:
- On the Zynq, the DMA descriptors cannot be in cached memory, as they are contiguous blocks of 16 bytes (1/2 a cache line). At -O0 the problem may not show up even if they are in cached memory.
- Buffer copying: the GCC library memmove() / memcpy() behave differently between -O0 and the other levels (try using lwIP's own copy function).
- At -O1, the code is already faster than at -O0:
check the throughput and add packet pacing to bring it down, to see if it becomes OK.
- Running out of buffers while the low level carries on and doesn't wait for an available buffer (increase the number of DMA buffers / add blocking on no buffer).
I can't provide specific insight into the BSP EMAC driver. Looking at it a few years ago, I was a bit baffled by its complexity when it's quite a simple set of operations to perform.
07-27-2017 11:40 AM
Thanks for the comments. Here is some more information I gathered:
1. The packet corruption was somewhat random, with no pattern in when it happened. Sometimes there was no corruption. Sometimes the corruption ratio was about 50 packets per 10,000 packets. There was no particular pattern in the corrupted data either.
2. I realized that hardware checksum offload was enabled. However, checksum errors showed up when I enabled the lwIP software UDP checksum check. This implies the packet was received OK and was corrupted after it was received and before lwIP checked the checksum. So the likely candidate for corrupting the packet is the interaction between the DMA and the cache controller.
3. I also tried disabling the data cache, and there was no UDP packet corruption afterward. The performance, however, was even worse than with optimization turned off.
4. The software path from receiving the packet to checking the checksum in the lwIP stack is managed by the Xilinx SDK. I looked into files like xemacpsif_dma.c and other files in the Xilinx contributed netif folder. So far I have found no candidate place to insert a data cache invalidate.
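For reference on item 2, the core of a software checksum check is a 16-bit one's-complement sum as described in RFC 1071. The sketch below is a generic illustration, not lwIP's implementation, and the function name is made up. A real UDP check also covers a pseudo-header (source and destination IP addresses, protocol, UDP length), which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over a byte buffer (RFC 1071 style).
 * Bytes are combined as big-endian 16-bit words. */
static uint16_t ones_complement_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (i < len) {
        sum += (uint32_t)data[i] << 8;  /* odd trailing byte, padded with zero */
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);  /* fold carries back in */
    }
    return (uint16_t)~sum;
}
```

Running the same function over the data with its checksum appended gives 0 when the data is intact. Since the software check reads the packet through the CPU's data cache, a stale cache line would make it fail even though the hardware offload saw a good packet.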
07-27-2017 12:41 PM
07-27-2017 04:52 PM
In the lwIP port / driver directory, look for a function named low_level_input().
That's the standard function name used in lwIP to I/F between the stack and the EMAC driver.
There is certainly a copy being done from the driver buffer to lwIP's pbuf (the pbuf is supplied by the stack/caller).
This is where you would need to apply a cache invalidate (not a flush) to the driver buffer before it is copied.