07-08-2019 02:45 PM
I have a design that exclusively uses BRAM for code storage, but which uses an external PSRAM for data storage of several megabytes. The full application is working and validated with no caching on the PSRAM.
To try to improve performance, I reduced the size of the DCache (no ICache, since all code is in BRAM), successfully raised the MicroBlaze frequency, and connected the System Cache IP. I followed the recommendation in the documentation to force the MicroBlaze DCache to use the cache port for all memory accesses. Even so, I have persistent problems with memory access in the application after enabling the DCache, regardless of whether I use the ACE or the plain AXI flavor of the interface to the System Cache IP.
I then stripped the design down to the absolute minimum so I could just run the sample memory test app. My finding is that the memory tests all pass if I never execute Xil_DCacheEnable(), but they all fail if I do. They fail every time if I alter the sample to test the entire 16 MB memory; if the test size is kept small enough, it passes consistently, and at some boundary size the result differs from one run to the next (one run will fail, and running it again immediately, it may pass). This leads me to believe that after the L1 cache is enabled there is a coherency problem, but I don't see why that would be the case with the ACE connection between processor and cache. Shouldn't the memory test sample "just pass" in all cases, regardless of whether the DCache is enabled or not?
A search of earlier threads on System Cache mostly turns up Zynq examples that are not applicable here, or old ISE/XPS material that is out of date. So I'm looking for anyone with an up-to-date understanding of how MicroBlaze and System Cache are expected to work together in a simple single-core design, because I've spent hours on this over several days and it seems far less straightforward than it should be.
07-16-2019 02:25 AM
When using System Cache and the memory test size is small, you are not actually accessing the PSRAM itself, because the data is cached in the System Cache (L2). That is the case regardless of whether the D-cache is enabled in MicroBlaze. The difference when you run the test with D-cache enabled is that data is also cached in the MicroBlaze L1 cache. That is probably why the memory test passes with small test sizes. It seems likely that the issues occur when the PSRAM is actually accessed, but I can't really explain why you would only see them with D-cache enabled.
Unless you have a multiprocessor design, you don't need to enable ACE. There can be no coherency issues with a single processor.
You might try to isolate the problem by connecting the System Cache master port to an on-chip memory using the AXI BRAM Controller IP, and make that as large as possible. If the full size memory test always passes with that configuration, the issue has to be related to the memory connection.
Would you be able to provide the block design of your stripped down design so I can take a closer look at the configuration? You can save it with "write_bd_tcl bd.tcl" from the Vivado Tcl console.
07-17-2019 02:28 AM
Thank you for the block design. Everything in the MicroBlaze and System Cache configuration looks good.
I don't have access to a Nexys 4 board (or another board with a PSRAM), so I couldn't run the memory test, but I tried replacing axi_emc_0 with the AXI BRAM Controller and successfully ran the memory test on a KC705 board, both with cache enabled and disabled.
I have attached a block design for the Nexys 4 with axi_emc_0 replaced by the AXI BRAM Controller. This reduces the memory size from 16 MB to 256 KB. You can create the design with "vivado -source bd_axi_bram.tcl", then Create HDL Wrapper and Generate Bitstream. You would also have to change the memory test to use only the size 262144 in memory_config_g.c. Could you try running this on your board?
If this works, I would suggest adding a System ILA to the System Cache M0_AXI port to be able to look at the actual AXI traffic from axi_emc_0.
07-17-2019 02:48 PM
I'm pretty sure the problem is in the memory controller IP rather than the cache itself. I agree that BRAM works. With PSRAM, there is clearly a problem in the first portion of the 32-bit memory test: after a small number of dwords are read back correctly, data starts coming back from memory with the high word corrupted (e.g. 0x0020 rather than the expected 0x0000), while the low word counts up monotonically as it should. The PSRAM is 16 bits wide, so two physical writes/reads are required per 32-bit AXI transaction. My suspicion at this point is that the 32-bit operations are not being broken up properly in the IP under some condition. I'm going to continue debugging with an ILA connected to both the AXI bus and the PSRAM interface to see whether I can catch incorrect data being written to memory, or whether some reads are being decoded incorrectly relative to the specified AXI address and pulling data from the wrong physical address.
07-17-2019 11:16 PM
Thanks for confirming that the issue is likely to be related to the memory controller IP. I have raised an internal change request on the memory controller IP to look at the issue.
07-18-2019 09:49 AM
Thanks for doing that; I hope they find and fix it. I've now written my own PSRAM controller based on the sample AXI IP that Vivado generates, and it seems to be working much better. :-) It just takes a little more work than talking to a block RAM.
08-19-2019 11:38 PM
The AXI EMC IP is in maintenance mode; hence, the bug that was found will not be fixed.