tkontogiorgis
Explorer
Registered: ‎09-10-2019

AXI DMA problem


Hello,

I am running FreeRTOS with lwIP on a Zynq UltraScale+ device. I am trying to pass data from lwIP to the PL via AXI DMA and then back again. For now I have only implemented a loopback through an AXI4-Stream FIFO that passes the data back. I send one big packet (691254 bytes).

1) If I understand correctly, I need to send one packet at a time (NUMBER_OF_PKTS_TO_TRANSFER = 1) and then wait for the next lwIP packet. So I should call XAxiDma_BdRingFree() and XAxiDma_BdRingAlloc() on the rings after sending the data to lwIP, and then wait for the next transfer. Is that right?

2) I see that the Tx and Rx BD head pointers increase with every new transaction. Shouldn't the pointer stay at the same address after I have freed and re-allocated the BDs? I also tried XAxiDma_BdRingPrev() to make the ring look at the same address, but that does not work. I think XAxiDma_BdRingFromHw() returns the advanced BD pointer on every new interrupt, but what I want (if I understand correctly) is the same address, because it is a new DMA transaction. Right?

3) I also think I am getting an error in the status register:

Dump registers A0000030:
Control REG: 00017002
Status REG: 00010019
Cur BD REG: 00D96B80
Tail BD REG: 010000C0

After receiving the interrupt, this code is executed:

 

			xil_printf("Before Tx FRee , TxDone = %d \r\n" , TxDone);
			Status =XAxiDma_BdRingFree(TxRingPtr , TxDone , TxBdPtr);
			if (Status != XST_SUCCESS) {
				xil_printf("Tx BD Ring FREE failed %d\r\n", Status);
			}
			TxBdPtr = (XAxiDma_Bd *)XAxiDma_BdRingPrev(TxRingPtr, TxBdPtr);

			xil_printf("Before Rx FRee , RxDone = %d \r\n" , RxDone);
			Status =XAxiDma_BdRingFree(RxRingPtr , RxDone , RxBdPtr);
			if (Status != XST_SUCCESS) {
				xil_printf("RX BD Ring FREE failed %d\r\n", Status);
			}
			RxBdPtr = (XAxiDma_Bd *)XAxiDma_BdRingPrev(RxRingPtr, RxBdPtr);

			Status = Rx_Retransfer(&AxiDma);
			if (Status != XST_SUCCESS) {
				xil_printf("Problem in Rx_Retransfer function %d\r\n", Status);
			}

 

where Rx_Retransfer is:

 

int Rx_Retransfer (XAxiDma * AxiDmaInstPtr )
{

	int Status;
	XAxiDma_Bd *BdCurPtr;
	UINTPTR RxBufferPtr;
	int Index;
	int Pkts;


	RxRingPtr = XAxiDma_GetRxRing(AxiDmaInstPtr);

	xil_printf("RxRingPtr->FreeHead = %x \t" , RxRingPtr->FreeHead);
	xil_printf("RxRingPtr->PreHead = %x \t" , RxRingPtr->PreHead);
	xil_printf("RxRingPtr->HwHead = %x \t" , RxRingPtr->HwHead);
	xil_printf("RxRingPtr->HwTail = %x \t" , RxRingPtr->HwTail);
	xil_printf("RxRingPtr->PostHead = %x \t\n" , RxRingPtr->PostHead);

	/* Disable all RX interrupts before RxBD space setup */
	//XAxiDma_BdRingIntDisable(RxRingPtr, XAXIDMA_IRQ_ALL_MASK);

	/* Attach buffers to RxBD ring so we are ready to receive packets */

	Status = XAxiDma_BdRingAlloc(RxRingPtr, NUMBER_OF_BDS_TO_TRANSFER, &RxBdPtr);
	if (Status != XST_SUCCESS) {
		xil_printf("Rx bd alloc failed with %d\r\n", Status);
		return XST_FAILURE;
	}

	BdCurPtr = RxBdPtr;
	RxBufferPtr = RX_BUFFER_BASE;
	xil_printf("Rx Bd Head Pointer %x \r\n" , RxBdPtr);

	for(Index = 0; Index < NUMBER_OF_PKTS_TO_TRANSFER; Index++) {

		for(Pkts = 0; Pkts < NUMBER_OF_BDS_PER_PKT; Pkts++) {

			Status = XAxiDma_BdSetBufAddr(BdCurPtr, RxBufferPtr);
			if (Status != XST_SUCCESS) {
				xil_printf("Rx set buffer addr %x on BD %x failed %d\r\n",
				(unsigned int)RxBufferPtr,
				(UINTPTR)BdCurPtr, Status);

				return XST_FAILURE;
			}

			Status = XAxiDma_BdSetLength(BdCurPtr, MAX_PKT_LEN,
						RxRingPtr->MaxTransferLen);
			if (Status != XST_SUCCESS) {
				xil_printf("Rx set length %d on BD %x failed %d\r\n",
				    MAX_PKT_LEN, (UINTPTR)BdCurPtr, Status);

				return XST_FAILURE;
			}

			/* Receive BDs do not need to set anything for the control
			 * The hardware will set the SOF/EOF bits per stream status
			 */
			XAxiDma_BdSetCtrl(BdCurPtr, 0);

			XAxiDma_BdSetId(BdCurPtr, RxBufferPtr);

			RxBufferPtr += MAX_PKT_LEN;
			BdCurPtr = (XAxiDma_Bd *)XAxiDma_BdRingNext(RxRingPtr, BdCurPtr);


		}
	}

	/*
	 * Set the coalescing threshold
	 *
	 * If you would like to have multiple interrupts to happen, change
	 * the COALESCING_COUNT to be a smaller value
	 */
	Status = XAxiDma_BdRingSetCoalesce(RxRingPtr, COALESCING_COUNT,
			//DELAY_TIMER_COUNT);
			0);
	if (Status != XST_SUCCESS) {
		xil_printf("Rx set coalesce failed with %d\r\n", Status);
		return XST_FAILURE;
	}

	Status = XAxiDma_BdRingToHw(RxRingPtr, NUMBER_OF_BDS_TO_TRANSFER, RxBdPtr);
	if (Status != XST_SUCCESS) {
		xil_printf("Rx ToHw failed with %d\r\n", Status);
		return XST_FAILURE;
	}

	/* Enable all RX interrupts */
	//XAxiDma_BdRingIntEnable(RxRingPtr, XAXIDMA_IRQ_ALL_MASK);
	/* Enable Cyclic DMA mode */

//	if (Test_type ==  Statistics_test){
//		/* Enable Cyclic DMA mode */
//		XAxiDma_BdRingEnableCyclicDMA(RxRingPtr);
//		XAxiDma_SelectCyclicMode(AxiDmaInstPtr, XAXIDMA_DEVICE_TO_DMA, 1);
//	}

	/* Start RX DMA channel */
	Status = XAxiDma_BdRingStart(RxRingPtr);
	if (Status != XST_SUCCESS) {
		xil_printf("Rx start BD ring failed with %d\r\n", Status);
		return XST_FAILURE;
	}

//	Status = SetupIntrSystem(&Intc, &AxiDma, TX_INTR_ID, RX_INTR_ID);
//		if (Status != XST_SUCCESS) {
//
//			xil_printf("Failed intr setup\r\n");
//			return XST_FAILURE;
//		}


		return XST_SUCCESS;
}

 

 

I also made RxRingPtr, TxRingPtr, TxBdPtr and RxBdPtr global. Finally, SendPacket is called for every new transaction (when the next data arrives from lwIP):

 

int SendPacket(XAxiDma * AxiDmaInstPtr , uint8_t  *recv_buf)
{

	int FreeBdCount;
	//XAxiDma_BdRing *TxRingPtr = XAxiDma_GetTxRing(AxiDmaInstPtr);
	TxRingPtr = XAxiDma_GetTxRing(AxiDmaInstPtr);

	xil_printf("TxRingPtr->FreeHead = %x \t" , TxRingPtr->FreeHead);
	xil_printf("TxRingPtr->PreHead = %x \t" , TxRingPtr->PreHead);
	xil_printf("TxRingPtr->HwHead = %x \t" , TxRingPtr->HwHead);
	xil_printf("TxRingPtr->HwTail = %x \t" , TxRingPtr->HwTail);
	xil_printf("TxRingPtr->PostHead = %x \t\n" , TxRingPtr->PostHead);

	/* Internal Counter, when it reaches ntdvbs2 block_size -1 (starting from zero counting)
	 * the last 9bit value is masked with zeros and only the xxxx Most Significant bits
	 * (where xxxx depending on the mode) are used for payload transmitted data.
	 * After block_size data, extra xxxx data data are padded depending on the ntdvbs2 mode
	 * to fulfill DMA requirements
	 */
//	u32 Value;
//	u32 cnt = 0;


	u32 *TxPacket ;
	//XAxiDma_Bd *BdPtr, *BdCurPtr;
	XAxiDma_Bd  *BdCurPtr;
	int Status;
	int Index, Pkts;
	UINTPTR BufferAddr;

	//printf("Maximum Packet length in bytes? = %d \r\n" , TxRingPtr->MaxTransferLen);

	/*
	 * Each packet is limited to TxRingPtr->MaxTransferLen
	 *
	 * This will not be the case if hardware has store and forward built in
	 */
	if (MAX_PKT_LEN * NUMBER_OF_BDS_PER_PKT >
			TxRingPtr->MaxTransferLen) {

		xil_printf("Invalid total per packet transfer length for the "
		    "packet %d/%d\r\n",
		    MAX_PKT_LEN * NUMBER_OF_BDS_PER_PKT,
		    TxRingPtr->MaxTransferLen);

		return XST_INVALID_PARAM;
	}

	/* Tx Packet is in our ARM (in ARM or cache) while Packet is in DDR3 memory.
	 * Thus we set the same adress (DDR's adress) to our TxPacket to write to DDR
	 * and after that we Flush the Tx Packet so to keep data cache coherency
	 *
	 */
	TxPacket = (u32 *) Packet;
	//Packet = ( u32 *) recv_buf;


	//Value = Data_Start_Value;


	/*** Turbo dvbrcs case for data sending ******/

		for(Index = 0 ; Index < (MAX_PKT_LEN * NUMBER_OF_BDS_TO_TRANSFER -1) / 4;Index ++){
			if (Index < total_sent_bytes) TxPacket[Index] = (u32)recv_buf[Index+HEADER_OFFS];
			else TxPacket[Index] = 0;
		}
//		{
//
//			TxPacket[Index] = Value & 0x03; // Masking with 0x03 because 2bit are considered as data from the Hardware in the current configuration.
//
//			/* counter from 0 to dma_packet length-1 size in 32bits*/
//			if (cnt == MAX_PKT_LEN/4 -1 ) cnt = 0;
//			else	cnt ++;
//
//			/* If counter reaches Block Size , Rest of the data are padding for AXI_DMA thus equal zero ***/
//			if ( cnt > BLK_SIZE - 1)Value = 0x00; // extra zero data padding as AXI_DMA requirement
//			else	Value = (Value + 1) & 0x03; // Masking with 0x03 because 2bit are considered as data from the Hardware in the current configuration.
//		}


	/* Flush the SrcBuffer before the DMA transfer, in case the Data Cache
	 * is enabled
	 */
	Xil_DCacheFlushRange((UINTPTR)TxPacket, MAX_PKT_LEN *NUMBER_OF_BDS_TO_TRANSFER  );
	#ifdef __aarch64__
	Xil_DCacheFlushRange((UINTPTR)RX_BUFFER_BASE, MAX_PKT_LEN *	(NUMBER_OF_BDS_TO_TRANSFER ) );
	#endif



	FreeBdCount = XAxiDma_BdRingGetFreeCnt(TxRingPtr);

	Status = XAxiDma_BdRingAlloc(TxRingPtr, NUMBER_OF_BDS_TO_TRANSFER,&TxBdPtr);
	if (Status != XST_SUCCESS) {

		xil_printf("Failed Tx bd alloc\r\n");
		return XST_FAILURE;
	}

	BufferAddr = (UINTPTR)Packet;
	BdCurPtr = TxBdPtr;
	xil_printf("Tx Bd Head Pointer %x \r\n" , TxBdPtr);

	/*
	 * Set up the BD using the information of the packet to transmit
	 * Each transfer has NUMBER_OF_BDS_PER_PKT BDs
	 */
	for(Index = 0; Index < NUMBER_OF_PKTS_TO_TRANSFER; Index++) {

		for(Pkts = 0; Pkts < NUMBER_OF_BDS_PER_PKT; Pkts++) {
			u32 CrBits = 0;

			Status = XAxiDma_BdSetBufAddr(BdCurPtr, BufferAddr);
			if (Status != XST_SUCCESS) {
				xil_printf("Tx set buffer addr %x on BD %x failed %d\r\n",
				(unsigned int)BufferAddr,
				(UINTPTR)BdCurPtr, Status);

				return XST_FAILURE;
			}

			Status = XAxiDma_BdSetLength(BdCurPtr, MAX_PKT_LEN,TxRingPtr->MaxTransferLen);
			if (Status != XST_SUCCESS) {
				xil_printf("Tx set length %d on BD %x failed %d\r\n",
				MAX_PKT_LEN, (UINTPTR)BdCurPtr, Status);

				return XST_FAILURE;
			}

			if (Pkts == 0) {
				/* The first BD has SOF set
				 */
				CrBits |= XAXIDMA_BD_CTRL_TXSOF_MASK;

#if (XPAR_AXIDMA_0_SG_INCLUDE_STSCNTRL_STRM == 1)
				/* The first BD has total transfer length set
				 * in the last APP word, this is for the
				 * loopback widget
				 */
				Status = XAxiDma_BdSetAppWord(BdCurPtr,
				    XAXIDMA_LAST_APPWORD,
				    MAX_PKT_LEN * NUMBER_OF_BDS_PER_PKT);

				if (Status != XST_SUCCESS) {
					xil_printf("Set app word failed with %d\r\n",
					Status);
				}
#endif
			}

			if(Pkts == (NUMBER_OF_BDS_PER_PKT - 1)) {
				/* The last BD should have EOF and IOC set
				 */
				CrBits |= XAXIDMA_BD_CTRL_TXEOF_MASK;
			}

			XAxiDma_BdSetCtrl(BdCurPtr, CrBits);
			XAxiDma_BdSetId(BdCurPtr, BufferAddr);

			BufferAddr += MAX_PKT_LEN;
			BdCurPtr = (XAxiDma_Bd *)XAxiDma_BdRingNext(TxRingPtr, BdCurPtr);
		}
	}

	/* Give the BD to hardware */
	Status = XAxiDma_BdRingToHw(TxRingPtr, NUMBER_OF_BDS_TO_TRANSFER,TxBdPtr);
	if (Status != XST_SUCCESS) {
		xil_printf("Failed to hw, length %d\r\n",
			(int)XAxiDma_BdGetLength(TxBdPtr,
					TxRingPtr->MaxTransferLen));
		return XST_FAILURE;
	}

	return XST_SUCCESS;
}

 

 

I guess my understanding of the DMA mechanism and the BD ring is wrong somewhere.

Thanks in Advance,

Theo

10 Replies
abommera
Xilinx Employee
Registered: ‎10-12-2018

Hi @tkontogiorgis,

>> XAxiDma_BdRingFromHw() returns the number of BDs that the hardware has processed and updates the pointer so that it points to the first of those buffer descriptors.

>> You are seeing a DMA internal error, which occurs if the buffer length of a fetched buffer descriptor is 0.

>> I would suggest trying the SG polled example with NUMBER_OF_PKTS_TO_TRANSFER = 1 to see whether it works, and then modifying your application using that example as a reference.

 

 

Thanks & Regards
Anil B
-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
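
For reference, the Status REG value from the first post (0x00010019) can be decoded by hand against the DMASR bit layout given in PG021. The sketch below is a minimal host-side illustration of that decoding. The macro and function names are made up for this example and are not the driver's own definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* DMASR bit positions as listed in PG021. The names below are local to
 * this sketch (hypothetical), not the macros from the xaxidma driver. */
#define SR_HALTED      (1u << 0)
#define SR_IDLE        (1u << 1)
#define SR_SG_INCLD    (1u << 3)
#define SR_DMA_INT_ERR (1u << 4)
#define SR_DMA_SLV_ERR (1u << 5)
#define SR_DMA_DEC_ERR (1u << 6)
#define SR_SG_INT_ERR  (1u << 8)
#define SR_SG_SLV_ERR  (1u << 9)
#define SR_SG_DEC_ERR  (1u << 10)
#define SR_ALL_ERRORS  (SR_DMA_INT_ERR | SR_DMA_SLV_ERR | SR_DMA_DEC_ERR | \
                        SR_SG_INT_ERR | SR_SG_SLV_ERR | SR_SG_DEC_ERR)

/* Returns only the error bits of a raw status value. */
uint32_t dmasr_errors(uint32_t sr)
{
    return sr & SR_ALL_ERRORS;
}

/* Returns the interrupt threshold status field (bits 23:16). */
uint32_t dmasr_irq_threshold(uint32_t sr)
{
    return (sr >> 16) & 0xFFu;
}

/* Prints a one-line summary of the flags that are set. */
void dmasr_print(uint32_t sr)
{
    printf("SR=%08lX:%s%s%s%s%s%s%s%s%s thr=%lu\n", (unsigned long)sr,
           (sr & SR_HALTED)      ? " Halted"    : "",
           (sr & SR_IDLE)        ? " Idle"      : "",
           (sr & SR_SG_INCLD)    ? " SGIncld"   : "",
           (sr & SR_DMA_INT_ERR) ? " DMAIntErr" : "",
           (sr & SR_DMA_SLV_ERR) ? " DMASlvErr" : "",
           (sr & SR_DMA_DEC_ERR) ? " DMADecErr" : "",
           (sr & SR_SG_INT_ERR)  ? " SGIntErr"  : "",
           (sr & SR_SG_SLV_ERR)  ? " SGSlvErr"  : "",
           (sr & SR_SG_DEC_ERR)  ? " SGDecErr"  : "",
           (unsigned long)dmasr_irq_threshold(sr));
}
```

Applied to 0x00010019, this reports Halted, SGIncld and DMAIntErr, which is consistent with the internal error described above: the channel has stopped and needs a reset before it can run again.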
tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hello @abommera and thanks for helping.

" I would suggest to try with this SG polled example and see if it works or not with NUMBER_OF_PKTS_TO_TRANSFER = 1, and then modify your application by taking this example as a reference." :

I modified my code to use SG polled mode and the problem persists. The S2MM (Rx) head pointer still seems to increase. I also tried NUMBER_OF_PKTS_TO_TRANSFER = 4 in the previous setup and nothing changed. Do you think polled mode should behave differently? I am trying to find out whether my understanding is wrong somewhere.

 

"The XAxiDma_BdRingFromHw api would return the number of BDs that have been processed and updated the pointer which points to the first buffer descriptor." :

If I free and re-allocate the BDs of RxRingPtr and TxRingPtr (S2MM and MM2S respectively), shouldn't the head pointers point to the same address? (Correct me if I am wrong.)
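
One way to look at this question: as far as the driver source (xaxidma_bdring.c) shows, XAxiDma_BdRingAlloc() hands out the BD at FreeHead and then advances FreeHead, while XAxiDma_BdRingFree() only returns BDs to the free count. Freeing therefore does not rewind the ring. The following is a simplified host-side model of that bookkeeping, not the driver itself. RingModel, ring_init, ring_alloc and ring_free are hypothetical names, the ring size of 4 is arbitrary, and the driver's Pre/Hw/Post states are collapsed into one busy count.

```c
#include <stdint.h>

#define RING_BDS 4u /* hypothetical ring size, chosen for illustration */

typedef struct {
    uint32_t free_head; /* index of the next BD an alloc hands out   */
    uint32_t post_head; /* index of the next BD a free gives back    */
    uint32_t free_cnt;  /* BDs currently available to alloc          */
    uint32_t busy_cnt;  /* BDs allocated and not yet freed           */
} RingModel;

void ring_init(RingModel *r)
{
    r->free_head = 0;
    r->post_head = 0;
    r->free_cnt = RING_BDS;
    r->busy_cnt = 0;
}

/* Hands out n consecutive BDs and returns the index of the first one,
 * or -1 if fewer than n BDs are free. Note that free_head only ever
 * moves forward (with wrap-around). */
int ring_alloc(RingModel *r, uint32_t n)
{
    int first;

    if (n == 0 || n > r->free_cnt) {
        return -1;
    }
    first = (int)r->free_head;
    r->free_head = (r->free_head + n) % RING_BDS;
    r->free_cnt -= n;
    r->busy_cnt += n;
    return first;
}

/* Gives n BDs back to the free pool. free_head is NOT moved back. */
int ring_free(RingModel *r, uint32_t n)
{
    if (n == 0 || n > r->busy_cnt) {
        return -1;
    }
    r->post_head = (r->post_head + n) % RING_BDS;
    r->busy_cnt -= n;
    r->free_cnt += n;
    return 0;
}
```

In this model, repeated alloc/free pairs of one BD return indices 0, 1, 2, 3 and then 0 again, so an advancing head pointer that eventually wraps is the expected behaviour rather than a sign of an error.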

"You are seeing DMA internal error which occurs if the buffer length of fetched buffer descriptor is '0'"  :

So my buffer descriptor probably indicates zero bytes, right? That would mean something goes wrong before the DMA runs. How should I handle the caches in the code? I call Xil_DCacheFlushRange() when sending the packet. Should I do something equivalent on the RX side?
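
On the cache question, the usual pattern on a system without hardware coherency is to flush the TX buffer before the MM2S transfer and to invalidate the RX buffer before the CPU reads what S2MM wrote. In the standalone BSP the real calls are Xil_DCacheFlushRange() and Xil_DCacheInvalidateRange() from xil_cache.h. The toy model below only illustrates why the invalidate matters. ToyMem and all function names in it are hypothetical stand-ins, and a whole buffer is treated as one cache line for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 8u

/* One buffer as seen from both sides: the DMA engine reads and writes
 * ddr[] directly, the CPU goes through cache[] while it is valid. */
typedef struct {
    uint8_t ddr[BUF_LEN];
    uint8_t cache[BUF_LEN];
    bool cached; /* cache[] holds a copy of the buffer   */
    bool dirty;  /* cache[] is newer than ddr[]          */
} ToyMem;

/* CPU write: lands in the cache only. */
void cpu_write(ToyMem *m, const uint8_t *src)
{
    memcpy(m->cache, src, BUF_LEN);
    m->cached = true;
    m->dirty = true;
}

/* CPU read: served from the cache if a copy exists, else from DDR. */
void cpu_read(ToyMem *m, uint8_t *dst)
{
    if (!m->cached) {
        memcpy(m->cache, m->ddr, BUF_LEN);
        m->cached = true;
        m->dirty = false;
    }
    memcpy(dst, m->cache, BUF_LEN);
}

/* Stand-in for a flush: write dirty data back, then drop the copy. */
void cache_flush(ToyMem *m)
{
    if (m->cached && m->dirty) {
        memcpy(m->ddr, m->cache, BUF_LEN);
    }
    m->cached = false;
    m->dirty = false;
}

/* Stand-in for an invalidate: drop the copy without writing back. */
void cache_invalidate(ToyMem *m)
{
    m->cached = false;
    m->dirty = false;
}

/* Stand-in for the loopback DMA: DDR to DDR, bypassing both caches. */
void dma_copy(ToyMem *dst, const ToyMem *src)
{
    memcpy(dst->ddr, src->ddr, BUF_LEN);
}
```

If the CPU has touched the RX buffer before, a read after the DMA completes returns the old cached contents until cache_invalidate() is called. Whether this applies to a given design depends on how the DMA is connected and whether the buffers are in cacheable memory.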

- Also, can I use cyclic mode? As far as I understand, this mode keeps sending the same data and does not let the user control when the DMA sends, right? Or is it also intended for other uses (i.e. send, wait, send, wait and so on, with the ability to control when to send)?

 

Thanks in Advance ,

Theo

tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hello @abommera ,

Still stuck here... Is it OK for the current BD pointer to keep increasing? The error happens after 2-3 transactions, not in the first one. Below, I call XAxiDma_BdRingDumpRegs() on the S2MM ring in every iteration:

1st transaction :

Dump registers A0000030
Control REG: 00FF0003
Status REG: 00FE0008
Cur BD REG: 01000080
Tail BD REG: 0100FFC0

2nd transaction :

Dump registers A0000030:
Control REG: 00FF0003
Status REG: 00FD0008
Cur BD REG: 010000C0
Tail BD REG: 01000000

 

3rd transaction:

 

Dump registers A0000030:
Control REG: 00FF0003
Status REG: 00FC0008
Cur BD REG: 01000100
Tail BD REG: 01000040

 

4th transaction (error):

 

Dump registers A0000030:
Control REG: 00FF0003
Status REG: 00FB0008
Cur BD REG: 00D96B40
Tail BD REG: 01000080

It seems something happened to the Cur BD REG? I am running in polled mode with interrupts disabled. I also tried removing all prints in case they cause an error, but nothing changed...

Also (not shown here), I observe that the updated pointer to the first buffer descriptor keeps increasing. Is this correct? Does MM2S (where this pointer is set to the same value in every new transaction) behave differently from S2MM?
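
The dumps above can be sanity-checked with a little address arithmetic. The sketch assumes, from the Tail BD values alone, that the RX ring starts at 0x01000000, that BDs are spaced 0x40 bytes apart, and that 0x0100FFC0 is the last BD before the wrap. These bounds are an inference from the dumps, not something stated in the thread, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BD_ALIGN     0x40u       /* BD spacing seen in the dumps           */
#define RX_RING_BASE 0x01000000u /* assumed first BD of the RX ring        */
#define RX_RING_LAST 0x0100FFC0u /* assumed last BD before the tail wraps  */

/* True if addr lies inside the assumed ring and on a BD boundary. */
bool bd_addr_valid(uint32_t addr)
{
    return addr >= RX_RING_BASE && addr <= RX_RING_LAST &&
           ((addr - RX_RING_BASE) % BD_ALIGN) == 0u;
}

/* Address of the BD that follows addr, wrapping at the end of the ring. */
uint32_t bd_next(uint32_t addr)
{
    return (addr == RX_RING_LAST) ? RX_RING_BASE : addr + BD_ALIGN;
}
```

Under these assumptions the Tail BD sequence 0x0100FFC0, 0x01000000, 0x01000040, 0x01000080 is a normal wrap followed by normal increments. The Cur BD value 0x00D96B40 in the fourth dump (and 0x00D96B80 in the first post) falls outside the ring, which would point towards the descriptor memory being overwritten rather than towards the ring advancing.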

Best Regards,

Theo

 

 

tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hello @abommera ,

Finally I figured out the problem. It had to do with memory allocation and conflicts in DDR, so it was not a DMA problem. However, the "problem" (or misunderstanding) of the increasing S2MM buffer descriptors still exists. The Rx (S2MM) buffer descriptor pointer advances by 64 bytes in each transaction (which is XAXIDMA_BD_MINIMUM_ALIGNMENT). The data are OK until the descriptor pointer rolls over to the initial Rx BD space address. After that, it seems the new data do not overwrite the old data in DDR memory, so I read the same old data again. I am trying to understand what I am doing wrong. Should I reset the DMA?

Is there any way to do DMA transfers with only one descriptor (or, later, with a specific set of descriptors) in every transaction?

I also read this post : https://forums.xilinx.com/t5/AXI-Infrastructure-Archive/How-to-reset-a-DMA-core-by-software/m-p/949927#M2678

Perhaps @calebd could help? 

Should I reset my DMA every time a number of descriptors equal to the Rx BD space has been used? Or am I doing something else wrong concerning DDR?

Thanks,

Theo

tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hello @abommera ,

Almost 3 weeks... Any updates please? Could you please help me?

What could be going wrong after the first roll-over of the descriptors? I have the feeling that my data are still written correctly to DDR... Should I reset the DMA completely after all BDs have been used?

Thanks,

Theo

abommera
Xilinx Employee
Registered: ‎10-12-2018

Hi @tkontogiorgis,

Apologies for the delayed response. What do you mean by "However, the "problem" of increasing S2MM block descriptors (or misunderstanding) still exists."?

>> Were all BDs processed successfully? Would you like to process the same BDs without intervention? If yes, you can use cyclic mode; the DMA then continues to fetch and process BDs until it is stopped or reset.

Thanks & Regards
Anil B
tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hello @abommera ,

I want to process one BD at a time (MM2S and S2MM) and then, after some software actions, do the same again indefinitely. So I don't think I should use cyclic mode because, as far as I understand, this mode cannot be "stopped". What I am trying to do is the following:

1) Receive one chunk of data  from lwip

2) Send these data to DMA ( assuming one packet and 1 bd at the moment)

3) Receive these data from DMA

4) Send these data back to LWIP

Repeat 1) to 4) continuously.

Currently I am running in loopback mode with a stream FIFO between the MM2S and S2MM buses. My observations are the following:

1) In every transaction the receive BD pointer increases by XAXIDMA_BD_MINIMUM_ALIGNMENT, unlike the transmit BD pointer, which is the same in every transaction. I assume this is correct, right? The RX BD pointers are controlled and returned by the hardware.

2) After the receive BD pointer reaches the last available address (given by RX_BD_SPACE_HIGH), the mechanism seems to roll over, so the descriptor pointer points to the first position again. But something seems to be wrong, because I read the same data as in the first cycle of transactions (one cycle being (RX_BD_SPACE_HIGH - RX_BD_SPACE_BASE) / XAXIDMA_BD_MINIMUM_ALIGNMENT transactions).

- Should I use cyclic mode? How should I configure it to wait for the lwIP receive?
- Is it correct for the RX BD pointer to increase and then roll over to the beginning?
- Should I perhaps reset the system after the RX BD pointer reaches RX_BD_SPACE_HIGH?

Thanks for responding,

Best Regards

 

abommera
Xilinx Employee
Registered: ‎10-12-2018

Hi @tkontogiorgis,

>> If you plan to use one BD, the current descriptor register and the tail descriptor register would point to the same buffer descriptor. Hence Xilinx recommends using more than one BD in scatter-gather mode.

>> In your case, I would suggest using direct register (simple) DMA mode instead of SG mode; this is similar to using one BD.

>> Please refer to PG021 for additional details about simple DMA mode and SG mode.

Thanks & Regards
Anil B

View solution in original post
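
When weighing simple mode against SG mode for the 691254-byte packet from the first post, the configured width of the buffer length register is worth checking, since the driver derives the maximum length of one BD (or one simple transfer) from it as 2^width - 1 bytes. The helpers below are a hypothetical sketch of that arithmetic, not driver code. The 14-bit width used in the usage note is an assumed typical default, so the actual value should be taken from the IP configuration.

```c
#include <stdint.h>

/* Largest byte count a single transfer can carry for a given width of
 * the buffer length register (valid for widths 1..31 in this sketch). */
uint32_t max_transfer_len(unsigned width_bits)
{
    return (1u << width_bits) - 1u;
}

/* Smallest register width whose maximum length covers 'bytes'. */
unsigned min_length_width(uint32_t bytes)
{
    unsigned w = 1;

    while (w < 31 && max_transfer_len(w) < bytes) {
        w++;
    }
    return w;
}

/* Number of BDs needed to carry 'bytes' when each BD is filled up to
 * the maximum length for the given width. */
uint32_t bds_needed(uint32_t bytes, unsigned width_bits)
{
    uint32_t max = max_transfer_len(width_bits);

    return (bytes + max - 1u) / max;
}
```

With a 14-bit length register, one transfer carries at most 16383 bytes, so 691254 bytes would need 43 BDs in SG mode and would not fit in a single simple-mode transfer. A width of at least 20 bits would be required to move the whole packet in one transfer.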

tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hi @abommera ,

Thanks for responding. Currently I am testing in loopback mode, but later, in the real design, my IP will receive multiple BDs because the lwIP chunk has a different size than the one my HW IP uses. Do you think that using one descriptor is the problem in the current situation? What about the other considerations in the post above?

Thanks again for helping,

Best Regards

tkontogiorgis
Explorer
Registered: ‎09-10-2019

Hi @abommera ,

I realized that my understanding was wrong. I tried using 4 BDs per packet and realized that I need to free the BDs every time but allocate them only when the Rx BD pointer reaches the last available free BD in hardware. So my understanding of the re-allocation of BDs was at fault.

Thanks very much,

Theo
