01-30-2019 06:25 AM
I have a Zynq-7000 design that uses both the PS and the PL. During operation, the PS performs AXI writes to and reads from the PL. I've verified with an ILA that the data is written correctly; however, in some cases reading the same registers returns corrupted data. Typically the two least significant bytes are correct, but the upper bytes contain garbage. The ILA shows that the PL registers being read hold the correct, unchanged data the whole time.
Since I've verified that the data is correct in the PL, it seems to me that the issue lies farther up the chain, between the PS and the PL. After doing some research it seemed like it could be cache related, so I tried using the ioremap() function, which should map the region uncached (see "Linux CPU to PL Access.pdf" from here). This did not fix the problem.
Could caching still be a potential issue even with the ioremap() function? If not, what else could be causing this issue?
01-30-2019 04:34 PM - edited 01-30-2019 04:36 PM
If the cache were involved, I suspect there would be no read access occurring in the PL--the PS just would be getting stale data directly from the cache.
So: you see the correct read data appear from the source register within the PL, but that data doesn't show up correctly at the PS.
What kind of accesses are you performing--simple or burst? How are you performing the writes and reads--software of some sort running on a processor, or via a debugger?
What happens when you perform simple accesses to the same location(s) using XSCT?
And where are the pictures? Engineers thrive on timing diagrams...
01-31-2019 04:33 AM
I know of two issues which might cause a problem like this, although the chances are that neither is what you are experiencing--especially since I don't have enough data to tell.
While I hope this helps, I'll also be the first to admit that these are only guesses.
01-31-2019 06:43 AM
@jg_bds I've verified that the data in the PL registers is correct. I am using an AXI4-Lite peripheral based on the automatically generated one in Vivado (Tools > Create and Package New IP > Create a new AXI4 peripheral). The only significant change to this peripheral is making it parameterizable for any number of registers. Since it is AXI4-Lite, all transactions are single, not burst.
Reads and writes are done via a standalone C application running on the ARM processor. I have not used XSCT before, and I'm not sure how it would help me in this case since the C application is making reads and writes already.
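For reference, a write/readback check of the kind described here might look like the sketch below. The function names are hypothetical, and the peripheral's base address is not given in this thread, so the routine takes it as a pointer argument. The pointer is declared volatile so the compiler performs every access rather than reusing a value it already holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test pattern: a different value for every register index. */
static uint32_t reg_pattern(uint32_t seed, size_t index)
{
    return seed ^ (uint32_t)(index * 0x01010101u);
}

/* Write a pattern to n_regs consecutive 32-bit registers, read each one
 * back, and return the number of registers whose readback did not match.
 * `base` is an assumption: on hardware it would be the AXI4-Lite
 * peripheral's base address cast to a volatile pointer. */
size_t reg_readback_errors(volatile uint32_t *base, size_t n_regs, uint32_t seed)
{
    size_t errors = 0;

    for (size_t i = 0; i < n_regs; i++) {
        base[i] = reg_pattern(seed, i);
    }
    for (size_t i = 0; i < n_regs; i++) {
        if (base[i] != reg_pattern(seed, i)) {
            errors++;
        }
    }
    return errors;
}
```

Because the corruption reported in this thread is intermittent, a single clean pass of a loop like this does not prove the interface is sound; it would need to run repeatedly and log the expected and actual values on any mismatch.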
I don't have any ILA screenshots at this moment, but if I am able to capture the invalid data in a read transaction in one I will post that.
@dgisselq The first article you referenced might potentially be related to this problem, since it discusses a bug in the Vivado generated AXI-Lite peripheral (which I am using). However, if this is the issue, I am confused as to why I have not seen this before. Up to this point, I've used this AXI peripheral in many designs without any issues. I will try to get some ILA captures of read/write transactions to verify this peripheral.
The second article you referenced doesn't seem as applicable since it was an issue with read data being reordered, which is not what I am seeing. Nevertheless, thanks for including it as an option.
I attached the AXI peripheral as a reference, but I am not sure it is the issue. I still believe it is something with the PL-PS interface.
01-31-2019 07:07 AM
Yes, your code looks like the more recent Vivado-generated demo IP, which still has the bug in it. My best guess is that Xilinx tried to fix things, and just made things worse. The article I cited above discusses a simple fix, as does this follow-on article, which presents a clean design of an AXI-lite slave. (If you are a Xilinx moderator, then let me invite you to repeat the tests shown in the first article. All the code is open source ...)
As to your question of why this hasn't been noted before, allow me to say that I share your curiosity. I'd love to hear why it hasn't been discovered before. Is it just because the return channels are always ready? Or is there another reason? I don't know. I think I would need some insight into how some of the closed source IP works. I'd love to find out though.
Let me ask one other question: Are you leaving the bus width at 32-bits, or overriding it at all? I'd understand seeing some corruption if you tried connecting to a 64-bit slave. (Xilinx's AXI-lite only ever supports a 32-bit width.)
02-04-2019 11:38 AM
I am using 32 bit data width throughout the design.
I was able to capture an invalid read transaction with the old AXI code on an ILA. The register being read contains a value of 0x0000009B, but 0x000002DB is read. If you compare the attached waveform (axi_invalid_read.png) and the previously attached code (axi4lite.vhd), it seems like it might be a timing issue rather than a logical issue. The output RDATA is assigned from reg_data_out, which is in turn assigned from slv_reg. reg_data_out seems to settle to the true value, but it is captured too soon. I hope I am wrong, because that would mean that updating to the new AXI code @dgisselq referenced wouldn't fix the issue.
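A quick way to see which bits are wrong in a failed read is to XOR the expected and actual values. The helper names below are hypothetical; this is only a small diagnostic sketch. For the values in this post, 0x0000009B XOR 0x000002DB is 0x00000240, meaning bits 6 and 9 were read as 1 when the register held 0.

```c
#include <stdint.h>

/* Bits that differ between the expected and the actual read value. */
uint32_t read_diff_mask(uint32_t expected, uint32_t actual)
{
    return expected ^ actual;
}

/* Number of set bits in x (clears the lowest set bit each iteration). */
unsigned count_set_bits(uint32_t x)
{
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}
```

Logging this mask for every failed read makes it easier to tell whether the same few bits fail each time, which would be consistent with a small number of slow paths rather than a logic error.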
Nevertheless, I tested a design with the new AXI code translated into VHDL (axi4lite_new.vhd) from this code. The good news is that I haven't been able to break it yet - all reads have returned the expected data. The bad news is that this was sometimes the case with the original design, and it would fail unexpectedly later. If I continue to see the same issues, I will update in a new post.
I attached a simple write/read transaction waveform of the new AXI code as well.
02-04-2019 03:38 PM - edited 02-04-2019 04:17 PM
It's not so much that reg_data_out is captured too soon. Rather, its value stabilizes too late to be captured by the following clock edge. RDATA must be clocked at that time so as to coincide with RVALID.
How fast are you running this AXI Lite interface? And how many readable registers exist inside the block?
02-05-2019 05:29 AM
I believe it is a 100 MHz clock, and there are currently 1024 registers (12 byte-address bits). I realize this implements a 1024:1 multiplexer, which could be causing a large delay. If possible, I will reduce it to 256 registers (10 address bits).
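The address-bit counts above follow from byte addressing: each 32-bit register occupies 4 bytes, so the address width is the base-2 logarithm of the total byte span. A small sketch of that arithmetic, with hypothetical function names:

```c
#include <stdint.h>

/* Smallest b such that 2^b >= n, for n >= 1. */
unsigned ceil_log2(uint64_t n)
{
    unsigned b = 0;

    while (((uint64_t)1 << b) < n) {
        b++;
    }
    return b;
}

/* Byte-address bits needed to span n_regs registers of bytes_per_reg each. */
unsigned addr_bits(uint32_t n_regs, uint32_t bytes_per_reg)
{
    return ceil_log2((uint64_t)n_regs * bytes_per_reg);
}
```

With 4-byte registers, `addr_bits(1024, 4)` gives 12 and `addr_bits(256, 4)` gives 10. The lower 2 bits select a byte within a word, so the read mux itself decodes 10 and 8 bits respectively.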
02-05-2019 05:47 AM - edited 02-05-2019 05:50 AM
Yeah... That's a lot.
If you're not doing byte writes, you can simplify things (and remove the lower 2 address bits) by forcing 32-bit writes. All reads are 32 bits wide.
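For context, this is what the byte strobes do on a write: each WSTRB bit enables one byte lane of WDATA, and lanes whose strobe is 0 keep the old register contents. Below is a software model of that merge, written as a sketch with a hypothetical function name (the peripheral itself implements this in HDL). Forcing full 32-bit writes amounts to always treating WSTRB as 0xF.

```c
#include <stdint.h>

/* Merge wdata into old_value, one byte lane per WSTRB bit
 * (bit 0 enables bits 7:0, bit 3 enables bits 31:24). */
uint32_t apply_wstrb(uint32_t old_value, uint32_t wdata, uint8_t wstrb)
{
    uint32_t mask = 0;

    for (unsigned lane = 0; lane < 4; lane++) {
        if (wstrb & (1u << lane)) {
            mask |= 0xFFu << (8 * lane);
        }
    }
    return (old_value & ~mask) | (wdata & mask);
}
```

For example, `apply_wstrb(0xAABBCCDD, 0x11223344, 0x3)` replaces only the two low bytes and yields 0xAABB3344.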
A single LUT can do a 4:1 mux, so your read system has at least 5 layers of LUTs just for the data muxing (4^5 = 2^10 = 1024). That doesn't include the other logic signals that are used to qualify the 'latch' in front of the RDATA register. With that many layers, a lot of the timing depends on the routing delays.
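The layer count can be checked with a few lines of arithmetic. The sketch below uses a hypothetical function name and counts mux levels only, assuming every level is built from muxes of the same fan-in. It does not model dedicated mux resources, the qualifying logic, or routing delay.

```c
#include <assert.h>
#include <stdint.h>

/* Levels of fan_in:1 muxes needed to select one of n_inputs signals. */
unsigned mux_levels(uint32_t n_inputs, uint32_t fan_in)
{
    unsigned levels = 0;
    uint64_t reachable = 1; /* inputs selectable with `levels` levels */

    assert(fan_in >= 2);
    while (reachable < n_inputs) {
        reachable *= fan_in;
        levels++;
    }
    return levels;
}
```

`mux_levels(1024, 4)` returns 5, matching the estimate above, and `mux_levels(256, 4)` returns 4, so cutting the register count to 256 would remove one level from the read path under these assumptions.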
The timing must be very close to the limit, which means a correct capture is not guaranteed. In the trace you shared, the signal reg_data_out had settled by the time the ILA sample (@2) was taken, but it had not settled soon enough for the proper value to be stored in the RDATA register, supposedly on the same clock edge.
So Vivado didn't report any timing problems with this?
02-05-2019 07:56 AM
I haven't noticed any issues with writes so far, so I'm fine with leaving the strobe bits enabled. If I notice any problems with writes, that will be the first thing to go.
The design met timing in Vivado, so I'm surprised it appears to be having timing issues. Hopefully the new axi4lite code along with reducing the number of registers to 256 eliminates the timing issue.