samwagner
Visitor
388 Views
Registered: ‎01-27-2021

Corrupted CDMA on BRAM to DDR transfer?


All, I have a problem that I have been trying to squash for a while, and at this point I would appreciate some help. (Background: I am an FPGA newbie, so excuse me if I misuse some terms.) The hardware is a ZedBoard (Zynq-7000).

As a proof of concept for another project, I have a VHDL module that generates sine waves (2048 samples each) and stores each sample in the lower 11 bits of a 32-bit word in a large block RAM (256 KB). After each wave is complete, an edge-triggered interrupt tells the PS to read the data. The module generates 4096 sine waves and then stops.

In the PS, I use an AXI CDMA connected to an AXI BRAM Controller to transfer the data into DDR using the following code snippet. I then use a modified lwIP echo server to send the data to a TCP server running on an external PC.

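// Initiate a blocking transfer (dest = DDR, src = BRAM, len = number of bytes)
int bram2ddr_xfer(int *dest, int *src, int len)
{
	u32 status;

	// argument order: instance, source, destination, length, callback, callback ref
	status = XAxiCdma_SimpleTransfer(&AxiCdmaInstance, (UINTPTR) src, (UINTPTR) dest, len, NULL, NULL);
	if (status != XST_SUCCESS) {
		return status; // transfer was not started, so there is nothing to wait for
	}

	while (XAxiCdma_IsBusy(&AxiCdmaInstance)) { } // poll until the engine is idle
	return status;
}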

For reference, here is how I initialize the CDMA in polling mode. (I don't know if this is the correct way to do it.)

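// Set up the AXI CDMA for polled operation and flush the DDR range from the data cache
int setup_axi_cdma(UINTPTR start_inval, UINTPTR end_inval)
{
	XAxiCdma_Config * CfgPtr;
	int status;

	// write back and invalidate any cached lines covering the DDR addresses
	Xil_DCacheFlushRange(start_inval, (end_inval - start_inval));

	// initialize CDMA
	CfgPtr = XAxiCdma_LookupConfig(DMA_CTRL_DEVICE_ID);
	if (CfgPtr == NULL) {
		xil_printf("AXI CDMA configuration not found \r\n");
		return XST_FAILURE;
	}
	status = XAxiCdma_CfgInitialize(&AxiCdmaInstance, CfgPtr,
			CfgPtr->BaseAddress);
	if (status != XST_SUCCESS) {
		xil_printf("AXI CDMA initialization failed \r\n");
		return status;
	}
	XAxiCdma_Reset(&AxiCdmaInstance);

	// polling mode: mask every CDMA interrupt source
	XAxiCdma_IntrDisable(&AxiCdmaInstance, XAXICDMA_XR_IRQ_ALL_MASK);

	xil_printf("AXI CDMA correctly initialized \r\n");
	return 0;
}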

And for reference, here is the VHDL block diagram.

samwagner_1-1616187310625.png

 

So, the problem: waveforms are transferred correctly until waveform #87, where there is a glitch (see image below).

samwagner_0-1616186465487.png

At first this looks like an off-by-one error, but that is unlikely: waveform 87 is stored in the 87th row of a 128x2048 u32 DDR variable and is read from the 22nd row of a 32x2048 32-bit BRAM block, and there doesn't seem to be anything special about either number. The behavior is periodic, though: a glitch occurs at waveforms 87, 100, 188, 201, 289, ... (87 + k*101 and 100 + k*101).
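
To make the pattern concrete, here is a minimal sketch in C that enumerates the glitch positions from the two progressions above and maps each one to a buffer row. It assumes waveforms are numbered from 1 and rows from 0, which is what puts waveform 87 in BRAM row 22. The function names are hypothetical and not part of the project code.

```c
#include <stdio.h>

#define BRAM_ROWS 32   /* BRAM holds 32 waveforms of 2048 words */
#define DDR_ROWS  128  /* DDR buffer holds 128 waveforms */
#define PERIOD    101  /* observed spacing between glitch pairs */

/* n-th glitch (n = 0, 1, 2, ...): 87, 100, 188, 201, 289, ... */
unsigned glitch_waveform(unsigned n)
{
    unsigned base = (n % 2 == 0) ? 87u : 100u;
    return base + (n / 2) * PERIOD;
}

/* 0-based row holding 1-based waveform w in a circular buffer of `rows` rows
 * (assumed numbering, see the lead-in). */
unsigned row_of(unsigned w, unsigned rows)
{
    return (w - 1u) % rows;
}

/* Print the first `count` glitch positions with their BRAM and DDR rows. */
void print_glitch_rows(unsigned count)
{
    for (unsigned n = 0; n < count; n++) {
        unsigned w = glitch_waveform(n);
        printf("waveform %u: BRAM row %u, DDR row %u\n",
               w, row_of(w, BRAM_ROWS), row_of(w, DDR_ROWS));
    }
}
```

Because 101 is not a multiple of 32 or 128, successive glitches land on different rows in both buffers (BRAM row 22 for waveform 87, row 27 for waveform 188), which would argue against a fault tied to one fixed row.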

I have tried almost everything I know of. It is not an Ethernet issue: the corrupted data is already present in DDR (I have summed and verified it).

Questions/Ideas

1) I'm sure some of you have seen something similar out there! What could be the cause?

2) Could timing closure be the cause? I had to slow down the AXI clock to meet timing for the huge BRAM block (65536 words deep). The design now meets timing with a couple of ns of slack.

3) CDMA: is the block diagram correct (the interrupt is not connected), and is my C code to initialize it and run the transfer correct?

4) Could the lwIP echo server (modified to transfer data) be interrupting and corrupting the CDMA transfer?

Thanks for any help you have.

 

 

1 Solution

Accepted Solutions
samwagner
Visitor
246 Views
Registered: ‎01-27-2021

I think problem 2 (the lwIP crash) occurred because the variable in which I stored all of the data was too large for the allocated stack. I increased the stack size in the linker script generator, and all of the data was then transferred correctly from BRAM to DDR to the PC via lwIP. Here is a screenshot of the linker script generator for reference; I just made the stack extremely large (~256 MB).

samwagner_0-1617062143628.png
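
For scale, a 128x2048 buffer of 32-bit words occupies 1 MiB, which can easily exceed the stack configured in the linker script. An alternative to enlarging the stack, sketched here under the assumption that the buffer was a local variable, is to give it static storage so the linker places it in .bss instead. The names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define N_WAVES   128   /* rows in the DDR buffer */
#define N_SAMPLES 2048  /* 32-bit words per waveform */

/* Static storage: lives in .bss, so it does not consume stack space. */
static uint32_t wave_buf[N_WAVES][N_SAMPLES];

/* Total size of the buffer in bytes (128 * 2048 * 4 = 1048576). */
size_t wave_buf_bytes(void)
{
    return sizeof wave_buf;
}

/* Pointer to the start of one waveform row, or NULL if out of range. */
uint32_t *wave_row(size_t row)
{
    return (row < N_WAVES) ? wave_buf[row] : NULL;
}
```

With this layout the stack size no longer depends on the buffer dimensions, although the DDR region in the linker script still has to be large enough to hold .bss.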

 

So, in summary, here is why the data was being read with glitches:

1) Cache coherency: stale data was being read. This was fixed by using the ACP port on the Zynq instead of the HP slave port. (I also fixed it with cache flushes/invalidations while using an HP slave port, but the ACP was the easier solution.)

2) (My theory) The DDR variable in which I stored the DMA'd data was too large for the stack, causing an error when I accessed its upper contents. I increased both the heap and stack sizes for good measure, which fixed the issue. Even if my theory is wrong, increasing the stack/heap size in the linker script fixed the underlying problem.
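
For item 1, the software alternative to the ACP port is to invalidate the destination range in the data cache once the DMA has finished, so that the CPU rereads DDR. Cache maintenance works on whole lines, so the range is usually widened to line boundaries first. The sketch below shows only that arithmetic, assuming the 32-byte line size of the Cortex-A9 L1 data cache. The type and helper names are hypothetical; on the target, the resulting range would be passed to Xil_DCacheInvalidateRange().

```c
#include <stdint.h>

#define CACHE_LINE 32u  /* assumed Cortex-A9 L1 data cache line size in bytes */

typedef struct {
    uint32_t start;  /* first byte, rounded down to a line boundary */
    uint32_t len;    /* length in bytes, a multiple of CACHE_LINE */
} cache_range;

/* Widen [addr, addr + len) so that it covers whole cache lines. */
cache_range align_to_cache_lines(uint32_t addr, uint32_t len)
{
    cache_range r;
    uint32_t end = addr + len;

    r.start = addr & ~(CACHE_LINE - 1u);                /* round start down */
    end = (end + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u); /* round end up */
    r.len = end - r.start;
    return r;
}
```

Keeping each DMA destination aligned to a line boundary in the first place avoids sharing a cache line with unrelated variables, which is one reason the widened range matters.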

 

 

2 Replies
samwagner
Visitor
271 Views
Registered: ‎01-27-2021

As an update, I have found that the glitch is actually two (or more) problems.

1. Cache coherency with the DMA. I switched to the ACP port on the Zynq to get cache coherency. Fixing this removed the majority of the "glitch".

2. Some issue with lwIP that I am still debugging, which still causes some "glitch" in the received waveform and crashes the program. Here is more detailed information about this second problem.

 

After the CDMA runs for some time, a call to xemacpsif_input() in xemacpsif.c from the lwIP main loop crashes the entire system because low_level_input() returns an invalid pbuf pointer. Here is a code snippet from xemacpsif_input().

 

 

struct eth_hdr *ethhdr;
struct pbuf *p;
SYS_ARCH_DECL_PROTECT(lev);
		
/* move received packet into a new pbuf */
SYS_ARCH_PROTECT(lev);
p = low_level_input(netif);
SYS_ARCH_UNPROTECT(lev);

/* no packet could be read, silently ignore this */
if (p == NULL) {
      return 0;
}

xil_printf("P addr: %X\r\n",p); // added printf statement

/* points to packet payload, which starts with an Ethernet header */
ethhdr = p->payload;

 

 

The pointer returned from low_level_input(), p, takes the following values (debug printout from the serial terminal):

samwagner_0-1617044567047.png

When xemacpsif_input() tries to dereference address 0x71D0_57E3, the program crashes (because there is no memory at 0x71D0_57E3).
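
A cheap sanity check makes this kind of failure easier to catch before the dereference. The sketch below is a hypothetical helper, not part of lwIP or the Xilinx adapter. It assumes the ZedBoard's 512 MB of DDR is visible to the CPU from 0x0010_0000 up to 0x1FFF_FFFF and that a valid struct pbuf * is at least 4-byte aligned. Under those assumptions, 0x71D0_57E3 fails on both counts.

```c
#include <stdbool.h>
#include <stdint.h>

#define DDR_BASE 0x00100000u  /* assumed start of DDR as seen by the CPU */
#define DDR_END  0x1FFFFFFFu  /* assumed last byte of 512 MB of DDR */

/* True if addr could hold a struct: inside the assumed DDR window
 * and aligned to 4 bytes. */
bool plausible_ddr_pointer(uint32_t addr)
{
    bool in_range = (addr >= DDR_BASE) && (addr <= DDR_END);
    bool aligned  = (addr & 0x3u) == 0u;
    return in_range && aligned;
}
```

On the target, one could call this on the value of p right after the NULL check and log the bad pointer instead of dereferencing it, which turns the crash into a traceable event.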

So, the new question is: why do the CDMA and lwIP interact to produce an invalid pbuf *?

 

 

 

 

Sorry for the information overload. Let me know if any of you have ideas about why this invalid pbuf * is being created.
