keffer
Visitor
7,037 Views
Registered: ‎03-05-2009

SDMA (MPMC) and partial BD completion

Hi,

We have a Linux build running on the ML405 dev board and are using it as a platform to test data transfer using SDMA (on MPMC) and LocalLink. We have a simple 'loopback' PIM design that routes data from the transmit DMA channel straight through to the receive DMA channel, and I have written a test kernel driver that initiates the transmit and waits for data on the receive.

So, the setup is along these lines:

1. Create BD rings for rx and tx channels
2. Load up 4096-byte target buffers for each of the rx channel BDs
3. Enable interrupts and start BD rings
.
. some time later
.
4. Load up a 512-byte buffer into a tx BD with SOP and EOP flags set


The tx BD completes as expected, but nothing happens on the rx side, although if I read the rx channel 'Current Buffer Length' register I can see it decrease. If I send further tx packets then the rx BD will eventually complete when all of its 4096 bytes have been filled.
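
To make the symptom concrete, here is a toy model of a single rx BD that behaves the way I am seeing it behave: the remaining-length counter drops as data arrives, and completion is only reported once the counter hits zero. This is just a sketch of the observation, not of documented SDMA behaviour, and the names `rx_bd_model`, `rx_bd_receive` and `packets_until_complete` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of ONE rx buffer descriptor as observed in this thread.
 * It mirrors the symptom only; it is not taken from the MPMC docs. */
typedef struct {
    uint32_t remaining;  /* models the 'Current Buffer Length' register */
    int      completed;  /* models the BD completed status bit */
} rx_bd_model;

void rx_bd_init(rx_bd_model *bd, uint32_t buf_len)
{
    assert(bd != 0 && buf_len > 0);
    bd->remaining = buf_len;
    bd->completed = 0;
}

/* Deliver one packet to the BD. Returns the number of bytes that did
 * not fit. Note that EOP plays no part here: completion depends on
 * the byte count alone, which is exactly the behaviour in question. */
uint32_t rx_bd_receive(rx_bd_model *bd, uint32_t pkt_len)
{
    uint32_t taken = pkt_len < bd->remaining ? pkt_len : bd->remaining;

    bd->remaining -= taken;
    if (bd->remaining == 0)
        bd->completed = 1;
    return pkt_len - taken;
}

/* Count how many equal-sized packets arrive before the BD completes. */
unsigned packets_until_complete(uint32_t buf_len, uint32_t pkt_len)
{
    rx_bd_model bd;
    unsigned n = 0;

    assert(pkt_len > 0);
    rx_bd_init(&bd, buf_len);
    while (!bd.completed) {
        rx_bd_receive(&bd, pkt_len);
        n++;
    }
    return n;
}
```

With the numbers from the setup above, `packets_until_complete(4096, 512)` gives 8, which matches the rx BD only completing after several tx packets have been sent.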

My question is: is it possible for the rx channel to complete when it has received less than the length specified in the buffer descriptor? Reading through the MPMC documentation, it seemed that the BD would complete on EOP, but maybe it's not quite as simple as that. Have we missed something by making our test loopback too dumb, or is it just the case that the User IP must always stuff enough data into the FIFO to ensure a BD is completed?

Note, if I perform the test with an XLlTemac driver attached to our loopback PIM I get the same result, no BD completion for small packets. So, what is it that the Temac IP does to force completion of small packets?

thanks in advance for any help!

 

Keith

 

4 Replies
morphiend
Explorer
7,014 Views
Registered: ‎08-14-2007

I ran into exactly the same problem with exactly the same setup. Using ChipScope I found some issues with the interface.

 

First, here are some words of warning:

 

1) In the TEMAC setup, the TEMAC ships the LocalLink footer across the LocalLink interface, and that footer contains the receive size. The SDMA does not. As such, when you receive a DMA transfer between SDMA controllers, you won't receive the size. This causes the Xilinx OS-agnostic driver to never "see" a completed RX BD.

 

I overcame this by writing my own XLlDma_BdRingFromHw() that does not insist on the receive size being set in the descriptor. I also had to add a mechanism for knowing how much data was actually sent via DMA. In my case, I added a mailbox in the logic where I wrote the size and then performed the transfer.

 

 

unsigned XLlDma_BdRingFromHw(XLlDma_BdRing *RingPtr, unsigned BdLimit,
                             XLlDma_Bd **BdSetPtr)
{
    XLlDma_Bd *CurBdPtr;
    unsigned BdCount;
    unsigned BdPartialCount;
    u32 BdStsCr;

    CurBdPtr = RingPtr->HwHead;
    BdCount = 0;
    BdPartialCount = 0;

    /* If no BDs in work group, then there's nothing to search */
    if (RingPtr->HwCnt == 0) {
        *BdSetPtr = NULL;
        return (0);
    }

    /* Starting at HwHead, keep moving forward in the list until:
     * - A BD is encountered with its completed bit clear in the status
     *   word which means hardware has not completed processing of that
     *   BD.
     * - RingPtr->HwTail is reached
     * - The number of requested BDs has been processed
     *
     * Unlike the stock driver, this version does NOT check the
     * XLLDMA_USERIP_APPWORD_OFFSET field against
     * XLLDMA_USERIP_APPWORD_INITVALUE, because an SDMA-to-SDMA
     * transfer never delivers the footer that would update it.
     */
    while (BdCount < BdLimit) {
        /* Read the status */
        XCACHE_INVALIDATE_DCACHE_RANGE((CurBdPtr), XLLDMA_BD_HW_NUM_BYTES);
        BdStsCr = XLlDma_mBdRead(CurBdPtr,
                                 XLLDMA_BD_STSCTRL_USR0_OFFSET);

        /* If the hardware still hasn't processed this BD then we are
         * done
         */
        if (!(BdStsCr & XLLDMA_BD_STSCTRL_COMPLETED_MASK)) {
            break;
        }

        BdCount++;

        /* Hardware has processed this BD so check the "last" bit. If
         * it is clear, then there are more BDs for the current packet.
         * Keep a count of these partial packet BDs.
         */
        if (BdStsCr & XLLDMA_BD_STSCTRL_EOP_MASK) {
            BdPartialCount = 0;
        }
        else {
            BdPartialCount++;
        }

        /* Reached the end of the work group */
        if (CurBdPtr == RingPtr->HwTail) {
            break;
        }

        /* Move on to next BD in work group */
        CurBdPtr = XLlDma_mBdRingNext(RingPtr, CurBdPtr);
    }

    /* Subtract off any partial packet BDs found */
    BdCount -= BdPartialCount;

    /* If BdCount is non-zero then BDs were found to return. Set return
     * parameters, update pointers and counters, return success
     */
    if (BdCount) {
        *BdSetPtr = RingPtr->HwHead;
        RingPtr->HwCnt -= BdCount;
        RingPtr->PostCnt += BdCount;
        XLLDMA_RING_SEEKAHEAD(RingPtr, RingPtr->HwHead, BdCount);
        return (BdCount);
    }
    else {
        *BdSetPtr = NULL;
        return (0);
    }
}
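
If you want to play with the counting logic away from the hardware, the scan above boils down to something like the following. This is a simplified sketch, not the driver: the ring is a plain array of status words, the bit values `BD_COMPLETED` and `BD_EOP` are illustrative placeholders rather than the real masks, and `count_returnable_bds` is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. The real masks are
 * the XLLDMA_BD_STSCTRL_* definitions in the driver headers. */
#define BD_COMPLETED 0x1u
#define BD_EOP       0x2u

/* Starting at index 'head' in a ring of 'ring_len' status words, count
 * the BDs that belong to fully received packets.
 *   hw_cnt : number of BDs currently handed to hardware
 *   limit  : maximum number of BDs the caller wants back
 * Trailing BDs that are completed but have not yet been followed by an
 * EOP BD are treated as a partial packet and left for a later call. */
unsigned count_returnable_bds(const uint32_t *status, size_t ring_len,
                              size_t head, unsigned hw_cnt, unsigned limit)
{
    unsigned count = 0;
    unsigned partial = 0;
    size_t i = head;

    assert(status != NULL && ring_len > 0 && head < ring_len);
    assert(hw_cnt <= ring_len);

    while (count < limit && count < hw_cnt) {
        uint32_t sts = status[i];

        /* Hardware has not finished this BD yet: stop scanning. */
        if (!(sts & BD_COMPLETED))
            break;

        count++;

        /* EOP closes a packet, so nothing before it is partial. */
        if (sts & BD_EOP)
            partial = 0;
        else
            partial++;

        i = (i + 1) % ring_len;  /* wrap around the ring */
    }

    return count - partial;
}
```

For a ring whose status words are { completed, completed+EOP, completed, not completed }, the scan returns 2: the first two BDs form a whole packet, and the third is held back until its EOP BD shows up.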

 


 

 

2) Because the TEMAC ships out the footer, it is also using the SOF and EOF indicators. This is what tells the DMA controller that a transfer has finished. To work around this, I had to ensure that my receive buffer was always the same size as the TX transfer. Theoretically you could scatter on the RX side, but I haven't tried that yet because it's not a necessary feature for my system.
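
Putting 1) and 2) together, the mailbox idea looks roughly like this in software terms. It is only a sketch under assumptions: the real mailbox is a register in the logic, whereas here it is modelled as a plain struct, and `size_mailbox`, `send_with_size`, `receive_with_size` and the 512-byte `XFER_BYTES` are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XFER_BYTES 512u  /* assumed fixed DMA transfer / rx buffer size */

/* Software stand-in for a size mailbox implemented in the logic. */
typedef struct {
    uint32_t size;  /* number of valid payload bytes in the transfer */
    int      full;  /* 1 while a posted size has not been consumed */
} size_mailbox;

/* Sender side: post the payload size first, then fill the fixed-size
 * transfer buffer. Returns 0 on success, -1 if the payload does not
 * fit or the previous size has not been consumed yet. */
int send_with_size(size_mailbox *mb, uint8_t xfer[XFER_BYTES],
                   const uint8_t *payload, uint32_t len)
{
    assert(mb != NULL && xfer != NULL && payload != NULL);
    if (len > XFER_BYTES || mb->full)
        return -1;

    mb->size = len;
    mb->full = 1;

    memset(xfer, 0, XFER_BYTES);  /* pad so the rx BD always fills */
    memcpy(xfer, payload, len);
    return 0;
}

/* Receiver side: once the rx BD has completed, read the mailbox to
 * find out how many of the XFER_BYTES are real data. Returns the
 * payload length, or -1 if no size was posted or 'out' is too small. */
int32_t receive_with_size(size_mailbox *mb, const uint8_t xfer[XFER_BYTES],
                          uint8_t *out, uint32_t out_cap)
{
    uint32_t len;

    assert(mb != NULL && xfer != NULL && out != NULL);
    if (!mb->full)
        return -1;

    len = mb->size;
    if (len > XFER_BYTES || len > out_cap)
        return -1;

    memcpy(out, xfer, len);
    mb->full = 0;  /* mailbox is free for the next transfer */
    return (int32_t)len;
}
```

The point of the design is that the DMA always moves a full buffer, so the rx BD completes every time, while the size travels out of band.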

 

HTH,

 - Mike

brianhill
Xilinx Employee
7,008 Views
Registered: ‎04-23-2008

0 Kudos
keffer
Visitor
6,977 Views
Registered: ‎03-05-2009

Thanks very much for the suggestions, really appreciated!

With our own 'real' design, rather than the loopback, we have implemented sending out the footer, but we had some timing / data alignment issues compared to pure streaming, so we'll have a look with ChipScope and work out what's going on (and check against the TEMAC while we're at it!).

 

In the meantime, your solution works well, Mike, thanks. Our packet sizes can vary, which is why we wanted to trigger DMA completion without the driver necessarily knowing the packet size in advance, but for now I've fixed the length and matched the buffer size as you've done.
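
For anyone in the same position with variable packet sizes, one way to live with a fixed transfer length is to carry the real length inside the buffer itself. The following is only a sketch of that idea, assuming a 4-byte big-endian length header at the start of a 4096-byte frame; the layout and the names `frame_pack` and `frame_unpack` are invented for this example and are not part of any Xilinx driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIXED_FRAME_BYTES 4096u  /* assumed fixed rx buffer size */
#define LEN_HEADER_BYTES  4u     /* assumed big-endian length prefix */
#define MAX_PAYLOAD_BYTES (FIXED_FRAME_BYTES - LEN_HEADER_BYTES)

/* Build a fixed-size frame: length header, payload, zero padding.
 * Returns 0 on success or -1 if the payload is too large. */
int frame_pack(uint8_t frame[FIXED_FRAME_BYTES],
               const uint8_t *payload, uint32_t len)
{
    assert(frame != NULL && payload != NULL);
    if (len > MAX_PAYLOAD_BYTES)
        return -1;

    frame[0] = (uint8_t)(len >> 24);
    frame[1] = (uint8_t)(len >> 16);
    frame[2] = (uint8_t)(len >> 8);
    frame[3] = (uint8_t)len;

    memcpy(frame + LEN_HEADER_BYTES, payload, len);
    memset(frame + LEN_HEADER_BYTES + len, 0, MAX_PAYLOAD_BYTES - len);
    return 0;
}

/* Recover the payload from a completed fixed-size frame. On success
 * *payload points into 'frame' and the payload length is returned;
 * -1 is returned if the header holds an impossible length. */
int32_t frame_unpack(const uint8_t frame[FIXED_FRAME_BYTES],
                     const uint8_t **payload)
{
    uint32_t len;

    assert(frame != NULL && payload != NULL);
    len = ((uint32_t)frame[0] << 24) | ((uint32_t)frame[1] << 16) |
          ((uint32_t)frame[2] << 8)  |  (uint32_t)frame[3];
    if (len > MAX_PAYLOAD_BYTES)
        return -1;

    *payload = frame + LEN_HEADER_BYTES;
    return (int32_t)len;
}
```

The cost is that every transfer moves the full 4096 bytes regardless of payload size, so this only makes sense as a stopgap until completion on EOP is working.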

 

regards,

Keith

 

wqiang_max
Observer
4,636 Views
Registered: ‎06-11-2009

I followed the XAPP1126 reference design, but rewrote the TX and RX FSMs so that the two channels work separately. I tested it on the standalone system and it works. Then I moved to Linux (using the ELDK tools and the method described on git.xilinx). When the Linux kernel starts, it stops after printing the following:

Linux/PowerPC load: root=/dev/nfs rw ip=on nfsroot=10.10.20.3:/home/embeddedlab/ml403_rootfs console=ttyUL0,9600 mem=64M mtdparts=physmap_flash.0:5M(kernel),43M(others),-(bitstream)
Finalizing device tree... flat tree at 0x5c0300

I searched the website and found a disappointing answer record saying that it is not possible to implement two SDMA channels in the same system (one for the LL_TEMAC, the other for a custom device with a LocalLink interface).

My device tree follows. Can somebody help?
/dts-v1/;
/ {
    #address-cells = <1>;
    #size-cells = <1>;
    compatible = "xlnx,virtex405", "xlnx,virtex";
    model = "testing";
    DDR2_SDRAM: memory@0 {
        device_type = "memory";
        reg = < 0x0 0x40000000 >;
    } ;
    aliases {
        ethernet0 = &TriMode_MAC_GMII;
        serial0 = &RS232_Uart;
    } ;
    chosen {
        bootargs = "root=/dev/nfs rw ip=on nfsroot=10.10.20.3:/home/embeddedlab/ml403_rootfs console=ttyUL0,9600 mem=64M mtdparts=physmap_flash.0:5M(kernel),43M(others),-(bitstream)";
        linux,stdout-path = "/plb@0/serial@84000000";
    } ;
    cpus {
        #address-cells = <1>;
        #cpus = <0x1>;
        #size-cells = <0>;
        ppc405_0: cpu@0 {
            clock-frequency = <300000000>;
            compatible = "PowerPC,405", "ibm,ppc405";
            d-cache-line-size = <0x20>;
            d-cache-size = <0x4000>;
            dcr-access-method = "native";
            dcr-controller ;
            device_type = "cpu";
            i-cache-line-size = <0x20>;
            i-cache-size = <0x4000>;
            model = "PowerPC,405";
            reg = <0>;
            timebase-frequency = <300000000>;
            xlnx,apu-control = <0xde00>;
            xlnx,apu-udi-1 = <0xa18983>;
            xlnx,apu-udi-2 = <0xa38983>;
            xlnx,apu-udi-3 = <0xa589c3>;
            xlnx,apu-udi-4 = <0xa789c3>;
            xlnx,apu-udi-5 = <0xa98c03>;
            xlnx,apu-udi-6 = <0xab8c03>;
            xlnx,apu-udi-7 = <0xad8c43>;
            xlnx,apu-udi-8 = <0xaf8c43>;
            xlnx,deterministic-mult = <0x0>;
            xlnx,disable-operand-forwarding = <0x1>;
            xlnx,fastest-plb-clock = "DPLB0";
            xlnx,generate-plb-timespecs = <0x1>;
            xlnx,mmu-enable = <0x1>;
            xlnx,pvr-high = <0x0>;
            xlnx,pvr-low = <0x0>;
        } ;
    } ;
    plb: plb@0 {
        #address-cells = <1>;
        #size-cells = <1>;
        compatible = "xlnx,plb-v46-1.04.a", "xlnx,plb-v46-1.00.a", "simple-bus";
        ranges ;
        FLASH: flash@8c000000 {
            bank-width = <4>;
            compatible = "xlnx,xps-mch-emc-3.01.a", "cfi-flash";
            reg = < 0x8c000000 0x4000000 >;
            xlnx,family = "virtex4";
            xlnx,include-datawidth-matching-0 = <0x1>;
            xlnx,include-datawidth-matching-1 = <0x0>;
            xlnx,include-datawidth-matching-2 = <0x0>;
            xlnx,include-datawidth-matching-3 = <0x0>;
            xlnx,include-negedge-ioregs = <0x0>;
            xlnx,include-plb-ipif = <0x1>;
            xlnx,include-wrbuf = <0x1>;
            xlnx,max-mem-width = <0x20>;
            xlnx,mch-native-dwidth = <0x20>;
            xlnx,mch-splb-awidth = <0x20>;
            xlnx,mch-splb-clk-period-ps = <0x2710>;
            xlnx,mch0-accessbuf-depth = <0x10>;
            xlnx,mch0-protocol = <0x0>;
            xlnx,mch0-rddatabuf-depth = <0x10>;
            xlnx,mch1-accessbuf-depth = <0x10>;
            xlnx,mch1-protocol = <0x0>;
            xlnx,mch1-rddatabuf-depth = <0x10>;
            xlnx,mch2-accessbuf-depth = <0x10>;
            xlnx,mch2-protocol = <0x0>;
            xlnx,mch2-rddatabuf-depth = <0x10>;
            xlnx,mch3-accessbuf-depth = <0x10>;
            xlnx,mch3-protocol = <0x0>;
            xlnx,mch3-rddatabuf-depth = <0x10>;
            xlnx,mem0-width = <0x20>;
            xlnx,mem1-width = <0x20>;
            xlnx,mem2-width = <0x20>;
            xlnx,mem3-width = <0x20>;
            xlnx,num-banks-mem = <0x1>;
            xlnx,num-channels = <0x0>;
            xlnx,pagemode-flash-0 = <0x0>;
            xlnx,pagemode-flash-1 = <0x0>;
            xlnx,pagemode-flash-2 = <0x0>;
            xlnx,pagemode-flash-3 = <0x0>;
            xlnx,priority-mode = <0x0>;
            xlnx,synch-mem-0 = <0x0>;
            xlnx,synch-mem-1 = <0x0>;
            xlnx,synch-mem-2 = <0x0>;
            xlnx,synch-mem-3 = <0x0>;
            xlnx,synch-pipedelay-0 = <0x2>;
            xlnx,synch-pipedelay-1 = <0x2>;
            xlnx,synch-pipedelay-2 = <0x2>;
            xlnx,synch-pipedelay-3 = <0x2>;
            xlnx,tavdv-ps-mem-0 = <0x1adb0>;
            xlnx,tavdv-ps-mem-1 = <0x3a98>;
            xlnx,tavdv-ps-mem-2 = <0x3a98>;
            xlnx,tavdv-ps-mem-3 = <0x3a98>;
            xlnx,tcedv-ps-mem-0 = <0x1adb0>;
            xlnx,tcedv-ps-mem-1 = <0x3a98>;
            xlnx,tcedv-ps-mem-2 = <0x3a98>;
            xlnx,tcedv-ps-mem-3 = <0x3a98>;
            xlnx,thzce-ps-mem-0 = <0x2710>;
            xlnx,thzce-ps-mem-1 = <0x1b58>;
            xlnx,thzce-ps-mem-2 = <0x1b58>;
            xlnx,thzce-ps-mem-3 = <0x1b58>;
            xlnx,thzoe-ps-mem-0 = <0x1b58>;
            xlnx,thzoe-ps-mem-1 = <0x1b58>;
            xlnx,thzoe-ps-mem-2 = <0x1b58>;
            xlnx,thzoe-ps-mem-3 = <0x1b58>;
            xlnx,tlzwe-ps-mem-0 = <0x88b8>;
            xlnx,tlzwe-ps-mem-1 = <0x0>;
            xlnx,tlzwe-ps-mem-2 = <0x0>;
            xlnx,tlzwe-ps-mem-3 = <0x0>;
            xlnx,tpacc-ps-flash-0 = <0x61a8>;
            xlnx,tpacc-ps-flash-1 = <0x61a8>;
            xlnx,tpacc-ps-flash-2 = <0x61a8>;
            xlnx,tpacc-ps-flash-3 = <0x61a8>;
            xlnx,twc-ps-mem-0 = <0xd6d8>;
            xlnx,twc-ps-mem-1 = <0x3a98>;
            xlnx,twc-ps-mem-2 = <0x3a98>;
            xlnx,twc-ps-mem-3 = <0x3a98>;
            xlnx,twp-ps-mem-0 = <0xd6d8>;
            xlnx,twp-ps-mem-1 = <0x2ee0>;
            xlnx,twp-ps-mem-2 = <0x2ee0>;
            xlnx,twp-ps-mem-3 = <0x2ee0>;
            xlnx,xcl0-linesize = <0x4>;
            xlnx,xcl0-writexfer = <0x1>;
            xlnx,xcl1-linesize = <0x4>;
            xlnx,xcl1-writexfer = <0x1>;
            xlnx,xcl2-linesize = <0x4>;
            xlnx,xcl2-writexfer = <0x1>;
            xlnx,xcl3-linesize = <0x4>;
            xlnx,xcl3-writexfer = <0x1>;
        } ;
        RS232_Uart: serial@84000000 {
            clock-frequency = <100000000>;
            compatible = "xlnx,xps-uartlite-1.01.a", "xlnx,xps-uartlite-1.00.a";
            current-speed = <9600>;
            device_type = "serial";
            interrupt-parent = <&xps_intc_0>;
            interrupts = < 5 2 >;
            port-number = <0>;
            reg = < 0x84000000 0x10000 >;
            xlnx,baudrate = <0x2580>;
            xlnx,data-bits = <0x8>;
            xlnx,family = "virtex4";
            xlnx,odd-parity = <0x1>;
            xlnx,use-parity = <0x0>;
        } ;
        TriMode_MAC_GMII: xps-ll-temac@81c00000 {
            #address-cells = <1>;
            #size-cells = <1>;
            compatible = "xlnx,compound";
            ethernet@81c00000 {
                compatible = "xlnx,xps-ll-temac-2.03.a", "xlnx,xps-ll-temac-1.00.a";
                device_type = "network";
                interrupt-parent = <&xps_intc_0>;
                interrupts = < 4 2 >;
                llink-connected = <&PIM2>;
                local-mac-address = [ 00 0a 35 6a 78 00 ];
                reg = < 0x81c00000 0x40 >;
                xlnx,avb = <0x0>;
                xlnx,bus2core-clk-ratio = <0x1>;
                xlnx,mcast-extend = <0x0>;
                xlnx,phy-type = <0x1>;
                xlnx,phyaddr = <0x1>;
                xlnx,rxcsum = <0x0>;
                xlnx,rxfifo = <0x4000>;
                xlnx,rxvlan-strp = <0x0>;
                xlnx,rxvlan-tag = <0x0>;
                xlnx,rxvlan-tran = <0x0>;
                xlnx,stats = <0x0>;
                xlnx,temac-type = <0x1>;
                xlnx,txcsum = <0x0>;
                xlnx,txfifo = <0x4000>;
                xlnx,txvlan-strp = <0x0>;
                xlnx,txvlan-tag = <0x0>;
                xlnx,txvlan-tran = <0x0>;
            } ;
        } ;
        xps_ll_example_0: xps-ll-example@81d00000 {
            compatible = "xlnx,xps-ll-example-1.00.a";
            reg = < 0x81d00000 0x10000 >;
            llink-connected = <&PIM3>;
            xlnx,family = "virtex4";
            xlnx,include-dphase-timer = <0x0>;
        } ;
        mpmc@0 {
            #address-cells = <1>;
            #size-cells = <1>;
            compatible = "xlnx,mpmc-5.04.a";
            PIM2: sdma@84600100 {
                compatible = "xlnx,ll-dma-1.00.a";
                interrupt-parent = <&xps_intc_0>;
                interrupts = < 3 2 2 2 >;
                reg = < 0x84600100 0x80 >;
            } ;
            PIM3: sdma@84600180 {
                compatible = "xlnx,ll-dma-1.00.a";
                interrupt-parent = <&xps_intc_0>;
                interrupts = < 1 2 0 2 >;
                reg = < 0x84600180 0x80 >;
            } ;
        } ;
        xps_bram_if_cntlr_1: xps-bram-if-cntlr@ffff0000 {
            compatible = "xlnx,xps-bram-if-cntlr-1.00.b", "xlnx,xps-bram-if-cntlr-1.00.a";
            reg = < 0xffff0000 0x10000 >;
            xlnx,family = "virtex4";
        } ;
        xps_intc_0: interrupt-controller@81800000 {
            #interrupt-cells = <0x2>;
            compatible = "xlnx,xps-intc-2.00.a", "xlnx,xps-intc-1.00.a";
            interrupt-controller ;
            reg = < 0x81800000 0x10000 >;
            xlnx,kind-of-intr = <0x20>;
            xlnx,num-intr-inputs = <0x6>;
        } ;
    } ;
    ppc405_0_dplb1: plb@1 {
        #address-cells = <1>;
        #size-cells = <1>;
        compatible = "xlnx,plb-v46-1.04.a", "xlnx,plb-v46-1.00.a", "simple-bus";
        ranges ;
    } ;
} ;