04-12-2011 05:36 AM - edited 04-12-2011 05:38 AM
I have been working on a bug in a graphics controller core for the last few days. The bug leads me to think there is a random problem with MPMC5.04.
My system is a V4FX20 with 32-bit DDR connected to MPMC5.04. I have the following ports:
Port 1 is a 32-bit NPI PIM for the TFT display
Ports 2 and 3 are PLBs for the PPC405
Port 4 is a 32-bit NPI PIM for my graphics controller.
The graphics controller reads and writes from the TFT's frame buffers stored in memory. The TFT is 16bit (2 bytes per pixel).
I am performing a specific operation: drawing a white vertical line in a buffer from pixel location (3,0) to (3,10). The video buffer is located at 0x01A00000 in DDR. I start the first write transaction at 0x01A00006 (byte offset 6 = 3 pixels). I mask the address to align it to a 4-byte boundary and write to that memory location, using WrFIFO_BE = 0xC to select the correct pixel (the upper two bits of the 4-bit byte enable).
I understand all about the MPMC special cases, safe write mode, address alignment, etc. I have been very careful to make sure I meet all the requirements.
(Note: the NPI_ADDX in the above image, is not actually the addx being fed to the NPI PIM. The actual addx has the lower two bits masked to zero for alignment)
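For readers following the address arithmetic, here is a minimal Python sketch of the alignment and byte-enable calculation described above. The base address and pixel size come from the post. The row stride is a hypothetical value (the post does not state the display width), and the mapping of byte-enable bit i to byte offset i is an assumption inferred only from the single example given (offset 2 giving 0xC).

```python
BYTES_PER_PIXEL = 2                    # 16-bit TFT, from the post
FRAME_BASE = 0x01A00000                # frame buffer base, from the post
STRIDE_BYTES = 640 * BYTES_PER_PIXEL   # ASSUMPTION: hypothetical 640-pixel row


def npi_pixel_write(x, y, base=FRAME_BASE, stride=STRIDE_BYTES):
    """Return (aligned_addr, byte_enable) for one 16-bit pixel write."""
    byte_addr = base + y * stride + x * BYTES_PER_PIXEL
    aligned_addr = byte_addr & ~0x3      # clear the two low address bits
    lane = byte_addr & 0x3               # 0 or 2 for 16-bit pixels
    # ASSUMPTION: byte-enable bit i corresponds to byte offset i in the word
    byte_enable = (0x3 << lane) & 0xF
    return aligned_addr, byte_enable


# First write of the vertical line, pixel (3, 0)
addr, be = npi_pixel_write(3, 0)
print(hex(addr), hex(be))
```

For pixel (3,0) this reproduces the values in the post: an aligned address of 0x01A00004 and a byte enable of 0xC.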
This process works fine in the simulator and generally works well on hardware. However, at very random times I will see adjacent pixels light up. This mainly happens when writing 0xFFFFFFFF to memory. I can run the project in ModelSim and not see this problem. I have verified that the stray pixels have been WRITTEN to memory, and that they are not display artifacts or issues caused by the TFT.
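One way to make that memory check repeatable is to dump the frame buffer and scan it offline. The sketch below is only an illustration, not the method used in the post: the function name, the dump format (raw bytes, 16-bit big-endian pixels), the background value, and the tiny 8-pixel-wide test buffer are all assumptions.

```python
def find_stray_pixels(dump, width, expected, background=0x0000):
    """Return (x, y, value) for every non-background pixel not in `expected`.

    dump     -- raw frame buffer bytes (ASSUMPTION: 16-bit big-endian pixels)
    width    -- pixels per row (hypothetical; not given in the post)
    expected -- set of (x, y) coordinates that were intentionally drawn
    """
    stray = []
    for i in range(0, len(dump) - 1, 2):
        pixel = int.from_bytes(dump[i:i + 2], "big")
        index = i // 2
        x, y = index % width, index // width
        if pixel != background and (x, y) not in expected:
            stray.append((x, y, pixel))
    return stray


# Build a small synthetic buffer: a white line at x=3 for y=0..10,
# plus one injected stray pixel next to it at (2, 5).
WIDTH, HEIGHT = 8, 11
buf = bytearray(WIDTH * HEIGHT * 2)
line = {(3, y) for y in range(HEIGHT)}
for (x, y) in line | {(2, 5)}:
    offset = (y * WIDTH + x) * 2
    buf[offset:offset + 2] = (0xFFFF).to_bytes(2, "big")

print(find_stray_pixels(bytes(buf), WIDTH, line))
```

Logging the coordinates of each stray pixel over many runs may show whether the errors always land in the same word as the intended pixel, which would point at the byte enables rather than the address.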
When I change from BRAM FIFO's on PIM4 to SRL FIFO the problem does not exist.
This leads me to think there is a bug in MPMC causing a write to memory at an address that is off by 1 byte at random times when using an NPI interface.
Can someone please point me in the direction of how to debug such a beast?
04-12-2011 07:34 AM
Odd or weird behavior is more usually the result of not enough timing slack. In the verbose timing report, what sort of slack is reported?
What happens when you warm up the device (just restrict the airflow with a box, or use a gentle added heat from a heat source -- watch out that you do not exceed the junction temperature limit, or else you will not be testing if it works or not at maximum temperature!).
If you (gently) cool the device, does the error disappear?
If heat/cold affects the problem, then first look for a timing problem.
Timing issues may result from excess system jitter, which may be related to the decoupling capacitors. Check against the user guides on the bypassing solution and components used. Check how much ground bounce you have. It may be that you just need to decrease the timing period constraint by 100 to 200 ps and re-place and route the design.
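As a rough illustration of what tightening the constraint means numerically, the Python sketch below subtracts a margin from the clock period. The 100 MHz clock is purely hypothetical, since the thread does not state the design's clock frequency, and the function name is made up for this example.

```python
def tightened_period_ps(freq_mhz, margin_ps):
    """Return the clock period in picoseconds after subtracting a margin."""
    period_ps = round(1_000_000 / freq_mhz)   # 1 MHz corresponds to 1,000,000 ps
    return period_ps - margin_ps


# ASSUMPTION: hypothetical 100 MHz clock, i.e. a 10,000 ps period
for margin in (100, 200):
    print(margin, "ps margin ->", tightened_period_ps(100, margin), "ps period")
```

The tools then have to meet the shorter period, so the extra 100 to 200 ps acts as headroom for jitter on the real board.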
04-22-2011 04:05 AM
Sorry for the belated reply.
I found the timing slack to be 3.048 ns in my design. As I heated the board up to about 50°C, the suspect pixels became less frequent (strange!).
I have performed unit testing with the IP cores on several different boards and different devices (V4FX12, V4FX20 and V5FX70T). All unit tests showed the same problem. (The problem is still classified as DDR writes occurring at addresses which are slightly wrong, and it's possible the NPI_WrBE signal is to blame.)
I conclude that there is a problem with the IP core, causing a marginal timing problem when connected to the MPMC. The IP core in question is a graphics processor, and I use about 26 DSP48's inside this module, running at high frequency. The V4FX20 gets very hot when all DSP48's are running, however if I replace their functionality with a pure logic implementation, I do not get as much heat, but I still get the same problem.
The only solution is to switch to SRL FIFO's for the MPMC's NPI pipeline, and avoid using BRAM FIFO's.
04-22-2011 07:27 AM
04-25-2011 10:41 PM
04-26-2011 01:54 PM - edited 04-26-2011 01:57 PM
I am using 32-bit NPI and 32-bit DDR, and I am using the special case for write byte mode.
All of the NPI signals are synchronous out of my core. The problem is clearly bytes written 'just off' from the target address in DDR, and this is caused by MPMC. The problem only occurs on an NPI port connected to my graphics controller; all other aspects of the system work well. The problem is the same on two different boards (V4FX12 and V4FX20).
Moreover, the results are random in nature, not easy to capture, and disappear when SRL FIFOs are used for the NPI pipeline. Like Austin said, it *must* be a timing problem. I agree too that the design is very mature, and a bug in MPMC5.04 is not really possible.
Having said that, I would like to try to find out why this happens, but I am finding it hard to make sense of the MPMC to the level where I can debug with certainty.
04-26-2011 02:05 PM