whelm

LwIP configuration problems

I have found some possible problems in lwip141.tcl and lwip202.tcl that result in mistakes in the generated lwipopts.h

 

The biggest problem is line 543:         puts $lwipopts_fd "\#define MEM_ALIGNMENT 64"

There appear to be two problems with this. First, MEM_ALIGNMENT is supposed to be in bytes, not bits, and I don't think there is any point in aligning to 64 bytes. Second, even if we assume the author thought he was working in bits, I don't think there is a reason for aligning to even a 64-bit boundary. I believe everything should work fine on a 32-bit boundary, so the correct value for MEM_ALIGNMENT is 4.
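
For illustration, the generated lwipopts.h entry with the value I am proposing would simply be (a sketch of the suggested change, not what the script currently emits):

/* lwipopts.h (generated) -- MEM_ALIGNMENT is in bytes; 4 matches the
 * natural 32-bit alignment of the processor. */
#define MEM_ALIGNMENT 4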

 

In addition, the script should probably have an entry to allow setting MEM_USE_POOLS and possibly some of the related pools configuration options.

 

When 1 MB of DRAM is allocated to the protocol stack, neither of these is likely an issue. But it is quite possible to build a reasonable project using only OCM, in which case the memory allocated to the protocol stack might need to be closer to 32 kB. Suddenly 64-byte alignment sounds like a big deal. The use of dynamic memory allocation from the heap also becomes very risky because of the possibility of fragmentation, leading to exhaustion. Using a pool of fixed-size buffers, even though it can waste some memory, is a much more appropriate approach, and having two or three pools of different sizes mitigates the wasted-memory issue.

I recommend a 128-byte pool, a 256-byte pool and a 1700-byte pool. Depending on the use case, the 256 might be better as 512. I use mostly small UDP packets, so most would fit in a 128-byte buffer, as would most of the packets generated internally, such as ARP or ICMP. The 1700-byte buffers are necessary for incoming packets, because their size is unknown. I've worked for many years with a similar protocol stack. One product can have over 100 open UDP connections running on a 16-bit processor with 256 kB of memory (code and data). It has 64 kB dedicated to buffers, split between 256-byte and 1500-byte buffers.
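
As a sketch of what such a configuration could look like, using lwIP's standard custom-pools mechanism (the element counts below are placeholders that would be tuned to the OCM budget, not measured values):

/* lwipopts.h additions -- satisfy mem_malloc() from pools instead of the heap */
#define MEM_USE_POOLS           1
#define MEMP_USE_CUSTOM_POOLS   1

/* lwippools.h -- pool sizes as suggested above */
LWIP_MALLOC_MEMPOOL_START
LWIP_MALLOC_MEMPOOL(32, 128)    /* small UDP payloads, ARP, ICMP */
LWIP_MALLOC_MEMPOOL(16, 256)    /* or 512, depending on traffic */
LWIP_MALLOC_MEMPOOL(8, 1700)    /* full-size incoming frames */
LWIP_MALLOC_MEMPOOL_END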

 

Of course, it is possible to hand-create lwipopts.h, which is the way lwIP was intended to be used. But that is difficult with the Xilinx implementation, because the file lives in the BSP and is recreated by this Tcl script every time the BSP is regenerated. It would be much better if it worked the way the linker script lscript.ld works: lwipopts.h is created once and, once created, is never touched unless regeneration is explicitly requested. That way one gets the advantage of letting the tools create the initial version, while letting the user hand-edit it afterwards without fear that the customizations will get wiped out.

 

4 Replies
ericv (accepted solution)

@whelm

 

The A9 (Zynq 7xxx) L1 cache line size is 32 bytes and the A53 (UltraScale+) is 64 bytes.

Aligning buffers on cache-line boundaries eliminates spill-over effects when flushing and/or invalidating the cache.

The EMAC transfers data through DMA, so these cache operations have to be performed on the packet buffers.
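
To make that concrete, here is a minimal sketch of the cache maintenance a driver does around the DMA transfers, using the standalone BSP routines from xil_cache.h (the buffer names and sizes are placeholders, not the actual xemacpsif code):

#include "xil_cache.h"
#include <stdint.h>

#define FRAME_MAX 1700   /* placeholder frame buffer size */

/* Aligned to the 32-byte A9 cache line so a flush/invalidate never
 * touches a neighbouring variable sharing the same line. */
static uint8_t tx_buf[FRAME_MAX] __attribute__((aligned(32)));
static uint8_t rx_buf[FRAME_MAX] __attribute__((aligned(32)));

void send_frame(uint32_t len)
{
    /* Push CPU-written data out to memory before the DMA reads it. */
    Xil_DCacheFlushRange((INTPTR)tx_buf, len);
    /* ... hand tx_buf to the EMAC TX descriptor here ... */
}

void receive_frame(uint32_t len)
{
    /* Discard stale cache lines before the CPU reads DMA-written data. */
    Xil_DCacheInvalidateRange((INTPTR)rx_buf, len);
    /* ... pass rx_buf up to lwIP here ... */
}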

 

whelm

So it could be #ifdefed to be 32 for A9 processors?
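
Something like this, presumably (just a sketch; the compiler macros used to tell the two targets apart are a guess on my part):

#if defined(__aarch64__)        /* Cortex-A53 (Zynq UltraScale+): 64-byte cache lines */
#define MEM_ALIGNMENT 64
#else                           /* Cortex-A9 (Zynq-7000): 32-byte cache lines */
#define MEM_ALIGNMENT 32
#endif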

 

I'm working on adding Ethernet to a DDR-less system. A related problem arises for the RX and TX descriptors. I figure I need about 8 of each, for a total of 64 bytes each. Xilinx doesn't seem to think anyone builds DDR-less, so they are very careless about any sort of support for it. They set aside 1 MB for descriptors because that memory is supposed to be non-cacheable. Obviously that isn't going to work without DDR.

 

So I'm considering alternatives. One would be to point the OCM's MMU entry at a secondary table so that only a 4 kB page needs to be non-cacheable. Another would be to turn off the cache. A third would be to lock some cache lines to the descriptor addresses. A fourth might be to make OCM non-cacheable for the L1 D-cache but not the I-cache, if that is possible. Or I could create some memory from BRAM in an empty area of the address map and make that non-cacheable.

Then there's the reality that if there is no external memory, the L2 cache isn't doing anything anyway (because OCM sits beside it), so I could lock it down and use it as 512 kB of memory instead, making it cacheable as main memory and using OCM for the descriptors. While I'm at it, I could put the packet buffers in there too and eliminate the alignment problem you mention with caching, although forcing alignment within a dedicated memory region is trivial.
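
(For that last point, pinning the descriptors or buffers into a dedicated, aligned region really is just attributes; a sketch, where the ".bd_space" section name is made up and would need a matching entry in lscript.ld:)

#include <stdint.h>

#define NUM_BD   8
#define BD_SIZE  8   /* EMAC buffer descriptor: two 32-bit words */

/* Hypothetical output section, mapped by the linker script to whatever
 * region ends up configured as non-cacheable. */
static volatile uint8_t rx_bds[NUM_BD * BD_SIZE]
        __attribute__((section(".bd_space"), aligned(32)));
static volatile uint8_t tx_bds[NUM_BD * BD_SIZE]
        __attribute__((section(".bd_space"), aligned(32)));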

 

I'm wondering what your thoughts are on these various options. I'm relatively new to environments with caching, so there's a bit of a learning curve in setting all of this up. The tools take care of most of it for fairly normal cases, but once one goes outside those cases the whole thing suddenly becomes much larger, and what the tools do behind the scenes can become a hindrance rather than a help. I also want to use buffer pools rather than dynamic memory allocation, because the memory size I'm working with and the safety-critical requirements I have to meet aren't well suited to dealing with fragmentation, which can result in starvation.

 

ericv

@whelm

 

First, the descriptors cannot reside in cached memory: they are half an A9 cache line, so flushing/invalidating one descriptor also does the same to its neighbour.

 

For the data/packet buffers, since you are not familiar with caching, keep it simple by creating an area of uncached buffers for the EMAC to use.

In the code section between lwIP and the EMAC driver, modify it to copy the lwIP buffers into and out of the uncached ones.

From memory, the file where lwIP exchanges data with the driver is named ethernetif.c.
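
A rough sketch of that copy-through approach on the transmit side (the function and buffer names are placeholders, not the actual ethernetif.c code):

#include <stdint.h>
#include "lwip/pbuf.h"

#define FRAME_MAX 1700

/* Hypothetical DMA-visible buffer in an uncached region (e.g. a section
 * the MMU marks non-cacheable). */
static uint8_t dma_tx_buf[FRAME_MAX] __attribute__((section(".uncached"), aligned(32)));

/* Copy a (possibly chained) pbuf out of cached lwIP memory into the
 * uncached buffer before handing it to the EMAC DMA. */
static void copy_out_for_dma(struct pbuf *p)
{
    uint16_t copied = pbuf_copy_partial(p, dma_tx_buf, p->tot_len, 0);
    /* ... program the TX descriptor with dma_tx_buf and 'copied' bytes ... */
    (void)copied;
}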

 

whelm

I guess my first point of ignorance is why they need to be flushed.  Is it because the EMAC DMA doesn't support coherence?  I am aware of the concern, but don't fully understand it.

 

Then comes the issue of uncached memory. Since the MMU handles cacheability, and does so with 1 MB granularity, your suggestion is easier said than done in a DDR-less system. I presume the top 1 MB could point to a secondary TLB table and be broken down into 4 kB pages, but I have never seen anything from Xilinx that suggests doing that, so I'm wondering if there is some reason it isn't possible. If the only drawback is that it eats 1 kB for the secondary table, that seems pretty trivial.
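
(For reference, the whole-section case at least seems to be a one-liner with the standalone BSP; something like the following, where the base address and attribute value are placeholders I would still have to verify against translation_table.S and the TRM:)

#include "xil_mmu.h"

#define BD_SECTION_BASE   0xFFF00000U  /* placeholder: 1 MB section holding the descriptors */
#define A9_SECT_NONCACHE  0x0C02U      /* assumed strongly-ordered/non-cacheable
                                          section attributes -- verify for your BSP */

void make_bd_section_uncached(void)
{
    /* Rewrites the first-level translation table entry for that 1 MB section. */
    Xil_SetTlbAttributes((INTPTR)BD_SECTION_BASE, A9_SECT_NONCACHE);
}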

 

Finally, other DDR-less solutions lock down L2 and use it as RAM. If there is no external memory such as DRAM, L2 doesn't really do anything anyway. But it isn't clear to me whether it could be used for descriptors, or whether that creates a problem. If it could, that would be interesting, because it could be locked down in different pieces in different blank areas of the address map and, even without secondary TLB tables, could provide some relatively fine-grained control over L1 cacheability, just by how it is distributed in the address space. There are plenty of 1 MB chunks of address space that could be locked to a few L2 cache lines.

 

Also, a similar question arises with regard to BRAM. Would it make a suitable memory for descriptors or packet buffers, or would that create DMA access issues too?

 
