04-16-2017 03:37 AM
I am facing a problem while designing a system with a large BRAM (32 x 256K, WIDTH x DEPTH) on the Zynq-7000 series (ZC706 board).
I generated a single simple dual-port BRAM instance using the Block Memory Generator, 32 bits wide and about 256K deep, which utilizes 232 block RAMs (out of 545 available). I upload data from DRAM to BRAM using bursts of length 256 (the write operation works fine). When reading the data back from the BRAM, however, it comes out in a seemingly random order, and the data bus value even changes while the address bus is holding the same address.
So I suspected some sort of timing issue in the memory design. I lowered the operating frequency, going as low as 500 kHz, but the results are the same: the data is anomalous and bits get flipped in a non-uniform fashion.
Experiments performed to debug:
1. Lower frequency, 500 kHz, 32 x 256K: failing to read
2. Nominal frequency, 500 kHz, 32 x 256K: failing to read
Banking Structure implementation:
I also lowered the BRAM size to analyze the behaviour, and found that read operations are reliable at a size of 32 x 64K: the data comes back in order and works well even at 50 MHz.
So I banked the overall required memory (32 x 256K) into four banks of 32 x 64K, since I was getting correct data at the smaller size. But even after instantiating four small BRAM modules, the design showed the same anomalous behaviour and could not provide the correct data.
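The banking scheme described above can be sketched as follows. This is a minimal Python model, not the poster's RTL; all names (`decode`, `write_word`, `read_word`) are hypothetical, and it only illustrates the intended address split (top bits select the bank, low bits index within it):

```python
# Model of a 32 x 256K memory banked as four 32 x 64K blocks,
# selected by the top two bits of the 18-bit word address.

BANK_DEPTH = 64 * 1024                      # 64K words per bank
NUM_BANKS = 4                               # 4 banks -> 256K words total

banks = [dict() for _ in range(NUM_BANKS)]  # sparse model of each BRAM bank

def decode(addr):
    """Split a word address into (bank select, bank-local address)."""
    return addr // BANK_DEPTH, addr % BANK_DEPTH

def write_word(addr, data):
    bank, offset = decode(addr)
    banks[bank][offset] = data & 0xFFFFFFFF  # 32-bit data bus

def read_word(addr):
    bank, offset = decode(addr)
    return banks[bank].get(offset, 0)

# A correct banking scheme must return exactly what was written:
write_word(0x00000, 0xDEADBEEF)   # lands in bank 0
write_word(0x10000, 0xCAFEF00D)   # first word past 64K, lands in bank 1
assert read_word(0x00000) == 0xDEADBEEF
assert read_word(0x10000) == 0xCAFEF00D
```

If a behavioral model like this works but the hardware does not, the bank-select decode and the multiplexing of the four banks' read data (including any read-latency alignment between bank select and dout mux) are the first places to check.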
I suspect the problem is related to the overall BRAM utilization of the system, with data getting corrupted once some limit is reached, but the system uses less than 50% of the total BRAM available, and I couldn't figure out the root cause of the problem.
If anyone has encountered a similar problem or can help me with it, any help would be appreciated.
04-16-2017 02:44 PM
@parth.7012121 did you simulate your design? How reliable are your timing constraints, and does your design pass timing? If you have hold violations, lowering the clock speed doesn't help. Simulate and time your design properly.
04-19-2017 05:44 AM
@muzaffer While going through the post-synthesis and post-implementation timing reports, I found that the post-synthesis report shows a hold violation of 0.091 ns in the write data path, whereas the post-implementation report shows all timing constraints met without any violation.
I referred to one of your posts where you mentioned that small hold violations are taken care of by Vivado, so that after implementation there will be no violations (which is what happened in my case). But if no timing violations are left after implementation, the design should work properly. In my case, although there are no timing violations after implementation, reading data from the BRAM still produces anomalous data (which I attributed to mismatched timing).
Can you provide any help with this?
04-19-2017 11:45 AM
@parth.7012121 the questions are: do you have multiple clock domains in your design, or is it a single-clock synchronous design? Do you simulate the RTL? If the answer is a single clock domain with RTL simulations passing, then your only remaining choice is to simulate the post-implementation design with timing and see where the issue is coming from.
By the way, what do you mean by: "on reading data from BRAM, it is providing anomalous data (because of mis-matched timing)."
How do you know the result is because of "mis-matched timing"? And what does that mean exactly?
04-19-2017 11:54 AM
@muzaffer The design is a single-clock synchronous design. I have performed logic simulation on the RTL and get the correct output there. I have also checked the post-implementation timing report, looking at the exact nets involved, and according to the report there is no timing violation in the design.
Earlier I said "on reading data from BRAM, it is providing anomalous data (because of mis-matched timing)" because when I use the debugger to check functionality on the board, the data I get from the BRAM is flipped in different places. At times the data bus flips even though the address bus is carrying the same address. At a few positions I can also observe data bits being shifted as well as flipped. So I suspected a timing issue while interacting with the BRAM. Also, the post-synthesis report shows a hold violation in the write datapath.
Please let me know if there is any solution.
04-19-2017 03:01 PM
@parth.7012121 as you mention, what matters is the post-implementation timing, not post-synthesis. At this point the only suggestion I can make is to run post-implementation timing simulations to see if you can duplicate the issue, and if so, find the root cause. If the post-implementation timing simulation doesn't show the problem, you should open a case with Xilinx.
04-20-2017 03:40 AM
@muzaffer So far I have only performed behavioral simulation of the design.
Can you please point me to some material on performing post-implementation timing simulation for the Zynq (ZC706) board? I tried to run a post-implementation timing simulation, but my design uses the DRAM and a software core for transferring the data. Can you please let me know what simulation options are possible in that case?
04-23-2017 03:39 PM
@parth.7012121 post-implementation simulation on Zynq needs to be done in coordination with a BFM (bus functional model) to access the various AXI buses into and out of the PL, which should simulate the behavior of the PS and the DDR ports (through HPx, I assume). Alas, it's not a very easy thing to set up.
11-06-2017 02:18 PM - edited 11-06-2017 03:25 PM
Do you have the option of loading some test data?
When I create the BRAM IP in Vivado, it offers the option of supplying an initialization file for the BRAM.
So one path of debugging works as follows:
1. Use an initial file
2. Use the (initialized) BRAM to get the READ logic working correctly.
3. Once READ works you may then try the design with alternating initial patterns to debug the WRITE logic.
4. (Be sure the write logic does not change the timing of the READ logic.)
The initial patterns for testing writes can be things like 10101010, 01010101, or travelling 1's (00000001, 00000010, etc.); you find these patterns in tools like MEMTEST86+ et al.
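Those patterns are easy to generate programmatically for a coefficient/initialization file. A minimal Python sketch (function names are illustrative, not from any tool) for a 32-bit data bus:

```python
# Sketch of memtest-style data patterns for exercising the write path:
# alternating-bit checkerboards and a walking 1, for a 32-bit data bus.

WIDTH = 32

def alternating_patterns(width=WIDTH):
    """Checkerboard patterns: 0xAAAA... and its complement 0x5555..."""
    a = int("10" * (width // 2), 2)           # 1010...10
    return [a, a ^ ((1 << width) - 1)]        # and 0101...01

def walking_ones(width=WIDTH):
    """A single 1 bit travelling through every position: ...0001, ...0010, ..."""
    return [1 << i for i in range(width)]

print([hex(p) for p in alternating_patterns()])          # ['0xaaaaaaaa', '0x55555555']
print(hex(walking_ones()[0]), hex(walking_ones()[-1]))   # 0x1 0x80000000
```

A walking-1 pattern is particularly good at exposing the bit-shift and bit-flip symptoms described earlier, since any shifted or crossed data line maps a single written bit to the wrong output position.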
What I'd like to know is where I can find a BRAM timing diagram for proper read and write operations. The only thing I could find from Xilinx tells me about voltages and such, which is useful for putting the Zynq chip on one's own custom PCB, but isn't so helpful from an RTL perspective, since in RTL land it's the timing diagram that matters for getting the read and write state machines to do their thing correctly.
Edit: I found documentation with timing diagrams: https://www.xilinx.com/support/documentation/ip_documentation/blk_mem_gen/v8_4/pg058-blk-mem-gen.pdf
11-06-2017 02:31 PM
When I create the BRAM block in Vivado, the wizard's last page mentions a 2-clock latency for reading.
Does your design take this into account?
I read that as: clock in the address on the first clock, and on the next clock the data on dout is valid for reading.
But it could also mean: clock in the address, burn a clock, then read dout on the next clock (using a register to hold the address over all 2/3 clocks as per case).
Without an official BRAM timing diagram (seeing as I can't find one) I don't know which case is correct here.
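The second interpretation (address captured on one edge, output register adding one more stage, so dout is valid two edges after the address is clocked in) can be modeled like this. This is a hedged Python sketch of that interpretation only, not the documented primitive behavior; the class name and structure are invented for illustration:

```python
# Model of a BRAM read with 2-clock latency: a registered-address array
# read (stage 1) followed by an output register (stage 2). Data for an
# address presented at edge N appears on dout after edge N+2.

class Bram2Cycle:
    def __init__(self, depth):
        self.mem = [0] * depth
        self.dout_q = 0   # array output after the first clock (internal stage)
        self.dout = 0     # user-visible output, one clock later

    def clock(self, addr):
        """One rising edge; update output register before the array read."""
        self.dout = self.dout_q          # stage 2: output register
        self.dout_q = self.mem[addr]     # stage 1: registered-address array read

ram = Bram2Cycle(16)
ram.mem[5] = 0x1234
ram.clock(5)                  # edge 1: address 5 captured, array read internally
assert ram.dout == 0          # dout not yet valid after one edge
ram.clock(0)                  # edge 2: output register updates
assert ram.dout == 0x1234     # data for address 5 now on dout
```

If the read state machine samples dout one clock too early (the first interpretation, when the core is actually configured with the output register), every read returns the data for the previous address, which would look exactly like "anomalous" or shifted data on the bus.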