06-22-2018 03:35 PM
I need some help understanding some cache-related issues.
1. The GEM documentation indicates that it is not wise to put RX and TX descriptors in cached memory, because of problems related to flushing cache lines. It isn't clear to me when and why such flushing needs to be done. Is the problem that the GEM DMA does not participate in cache coherency, or something else?
1A. I've also heard that there could be a related issue with the packet buffers themselves. I'd appreciate extending the discussion to cover that.
2. I'm running systems without DRAM, primarily using OCM, which doesn't use the L2 cache. I know the L2 cache can be turned into RAM by locking down ways. Since a system without DRAM appears to have no other use for L2, that is attractive. But I'm wondering whether the GEM DMA might have problems using that memory, or whether, once it is locked down, it would appear to the system just like OCM or other normal memory. One interesting possibility is that it could conceivably be locked down in several chunks in different parts of the address space, so different parts could have different attributes, including L1 cacheability.
3. The MMU first-level table is segmented into 1 MB sections, which precludes doing useful things like giving different parts of OCM different attributes, such as cacheability. However, the documentation seems to indicate that each 1 MB section can be further divided by a secondary table into 4 kB chunks. The secondary table takes 1 kB of memory, which seems quite cheap. I'm wondering why this is never considered for things like RX and TX descriptors that don't need much memory. Are there some constraints or issues I'm not noticing?
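On point 3, this matches my reading of the ARMv7-A short-descriptor format: a first-level entry can point at a 1 kB second-level table of 256 small-page entries, covering one 1 MB section at 4 kB granularity. Here is a minimal sketch of the descriptor encoding. The bit fields follow the ARMv7-A Architecture Reference Manual; the attribute choices (AP full access, write-back cacheable vs. Normal non-cacheable) are example settings, not the only valid ones:

```c
#include <assert.h>
#include <stdint.h>

/* First-level "page table" descriptor: points at a 1 kB-aligned
 * second-level table; type bits [1:0] = 0b01. */
static uint32_t l1_table_entry(uint32_t table_pa)
{
    return (table_pa & 0xFFFFFC00u) | 0x1u;
}

/* Second-level small-page descriptor (4 kB page).
 * bit 1 = 1 (small page), AP[1:0] = 0b11 (full access, bits [5:4]).
 * Cacheable: C=1, B=1 (write-back). Non-cacheable: TEX=0b001, C=0, B=0
 * (Normal memory, non-cacheable), e.g. for DMA descriptor rings. */
static uint32_t l2_small_page(uint32_t page_pa, int cacheable)
{
    uint32_t e = (page_pa & 0xFFFFF000u) | (0x3u << 4) | 0x2u;
    if (cacheable)
        e |= (1u << 3) | (1u << 2);   /* C=1, B=1 */
    else
        e |= (1u << 6);               /* TEX=0b001 */
    return e;
}

/* Fill a 256-entry (1 kB) second-level table mapping one 1 MB section
 * flat, with the first `uncached_pages` pages non-cacheable
 * (descriptor/buffer area) and the rest write-back cacheable. */
static void fill_l2_table(uint32_t tbl[256], uint32_t section_pa,
                          unsigned uncached_pages)
{
    for (unsigned i = 0; i < 256; i++)
        tbl[i] = l2_small_page(section_pa + i * 0x1000u,
                               i >= uncached_pages);
}
```

To use it, the first-level entry for the affected section is replaced with `l1_table_entry()` of the table's address and the TLB is invalidated. On a Xilinx standalone BSP the existing first-level table is the `MMUTable` symbol, but that detail is port-specific and worth verifying against your BSP version.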
My take on it right now is that 99% of Zynq development is done on systems with many MB of DRAM, so there is little pressure to optimize any of this. This shows in the fact that the standard FSBL fails outright without DRAM, even though it is simple and useful to run an FSBL without DRAM and there are alternative FSBLs that can easily do it. It also shows in the fact that one can't use the LwIP library without DRAM, because it hard-codes 1 MB for descriptors when that could easily have been a configurable option. So I'm trying to sort out ways to take full advantage of the on-chip resources. There should be a lot of use cases for this.
To give an example, we have built a UHF direct digital software-defined radio, complete with LCD, touch screen, and Ethernet protocol stack, running entirely in OCM. With L2 locked down as RAM, the total available memory would be three times what we're using. Coming from the 16-bit world, with all previous projects requiring 256 kB or less of memory, that is a lot of potential, without adding one or two more ICs with lots of pins and lots of signal-integrity requirements.
06-25-2018 09:16 PM
I don't mind writing the table; that's a five-minute job. It's just that in all the talk about allocating 1 MB for a few kB of descriptor tables because the MMU granularity is 1 MB, I was afraid there was some reason why the secondary table wouldn't work. It was never mentioned as a possibility, and there is no provision in the Xilinx implementation of LwIP to work with a descriptor region smaller than 1 MB; we had to import the source directly into the project just to shrink it. I'd gladly allocate 32 kB or 64 kB of non-cached OCM for descriptors and packet buffers. The hardest part is having to import the whole library source rather than just use it. The secondary table is simpler by comparison.
06-26-2018 02:57 PM
OK. The main question I have left has to do with caching and the GEM. As far as I can tell, the whole issue is that the GEM's path to memory is not monitored by any coherency logic; therefore, if the GEM DMA alters memory, the cache won't pick up the changes, which is why the cache has to be invalidated before the CPU reads (and flushed after the CPU writes, in the other direction). Can someone tell me if that is correct?
And if that is the case, locking L2 cache ways and using them as memory would not work for packet buffers or RX/TX descriptors, because the L2 cache would never see any changes made by the GEM (and the GEM probably wouldn't see anything the CPU wrote there). Can someone verify that this thinking is correct?
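For what it's worth, here is a toy model of the coherency problem as I understand it. This is plain C, not Zynq code; `ram`, `line`, `cpu_read`, `cache_invalidate`, and `dma_write` are invented names for illustration. The point is that nothing watches the DMA master's writes, so the CPU keeps reading a stale cached copy until the cache is explicitly invalidated:

```c
#include <stdint.h>
#include <string.h>

/* Toy model: one "cache line" shadowing a region of "RAM".
 * No coherency hardware watches dma_write(), just as (on my reading)
 * the GEM DMA path on Zynq bypasses the snooping the CPUs enjoy. */
static uint8_t ram[64];
static uint8_t line[64];
static int line_valid;

static uint8_t cpu_read(unsigned addr)
{
    if (!line_valid) {                  /* miss: fill line from RAM */
        memcpy(line, ram, sizeof line);
        line_valid = 1;
    }
    return line[addr];                  /* hit: served from the cache */
}

static void cache_invalidate(void) { line_valid = 0; }

/* The DMA master writes straight to RAM; the cache never sees it. */
static void dma_write(unsigned addr, uint8_t v) { ram[addr] = v; }
```

After `dma_write(0, 0xAB)`, `cpu_read(0)` still returns the old value until `cache_invalidate()` is called. On the real hardware this corresponds to invalidating RX buffers/descriptors before reading them (the standalone BSP's `Xil_DCacheInvalidateRange()`), and the mirror-image problem, CPU writes sitting in the cache where the DMA can't see them, is why TX buffers get `Xil_DCacheFlushRange()` before the DMA starts. It also suggests why L2 locked down as RAM can't back DMA buffers: data living only in the L2 arrays never reaches an address the GEM can see.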