Visitor
Registered: 08-20-2018

Confusing PCIe XDMA AXI Lite transactions


I'm currently working on a project with an UltraScale+ FPGA. This FPGA contains an XDMA PCIe endpoint with some logic attached to the XDMA's AXI-Lite interface (via an interconnect). The XDMA is configured for Gen2 (5 GT/s), x4, with a 125 MHz AXI clock (which means I have to configure the data width as 128 bits). The PCIe root complex is a Zynq 7045 running Linux with the XDMA driver.

 

Everything is working fine, but I do have one issue with the AXI-Lite interface...

 

When I do a 32-bit read (or write) in Linux, I see up to 4 AXI-Lite transactions coming from the XDMA IP. The first transaction is at the specified address; the next three are 0x04, 0x08 and 0x0C higher than the first. This is something I was not expecting. It also causes issues with other (3rd-party) IP, as those IPs have key-hole and clear-on-read registers that get hit by the three unexpected additional reads/writes. If I connect a JTAG AXI master as the second interconnect master, everything works as expected.

 

I think I know why the XDMA IP issues 4 transactions in total: I think it is filling the internal 128-bit AXI data bus. However, I do not want that. I just want a single 32-bit AXI-Lite transaction for every 32-bit read on the host.
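To make the suspected mechanism concrete, here is a minimal C sketch of the address arithmetic. It assumes that a 32-bit access gets widened to the end of its 128-bit (16-byte) window, which would match the "up to 4" observation but is only a guess at what the hardware does. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIDE_BUS_BYTES 16u /* 128-bit data bus */
#define LITE_BYTES      4u /* 32-bit AXI-Lite data width */

/*
 * Hypothetical model (an assumption, not taken from the XDMA documentation):
 * list the 32-bit word addresses touched when an access to `addr` is widened
 * up to the end of its 128-bit window.
 * Fills out[] (room for 4 entries) and returns the number of accesses.
 */
static size_t widened_accesses(uint32_t addr, uint32_t out[4])
{
    assert(addr % LITE_BYTES == 0); /* the model only handles word-aligned addresses */

    size_t n = (WIDE_BUS_BYTES - (addr % WIDE_BUS_BYTES)) / LITE_BYTES;
    for (size_t i = 0; i < n; i++)
        out[i] = addr + (uint32_t)(i * LITE_BYTES);
    return n;
}
```

Under this assumption, an access at a 16-byte-aligned address yields four word accesses (the requested one plus offsets 0x04, 0x08 and 0x0C), while an access at offset 0x08 within the window yields only two.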

 

Could someone please explain if I overlooked something and what possible solutions are available. The current solution I'm thinking of is 'muting' the three additional AXI-Lite transactions.

 

Please help.... Thanks in advance!

11 Replies
Xilinx Employee
Registered: 12-10-2013

Do you have an AXI SmartConnect inserted in between?  We have seen the SmartConnect upsizing transactions and causing this issue. 

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------

Visitor

Thanks for your reply. I don't have an AXI SmartConnect in between. My simplified config is as follows (see attached .jpg for more info):

 

XDMA(AXI-Lite) -> AXI Interconnect -> Rest of system (and AXI ILA)

 

I also connected a JTAG AXI master to the same AXI Interconnect for testing purposes. With the XDMA, I get up to 4 consecutive AXI-Lite transactions. With the JTAG master, I only get one.

 

Any ideas? We really need to get rid of the additional 3 AXI Lite transactions and I'm hoping it is something I overlooked. Any help is appreciated.

xdma_axi.jpg

Xilinx Employee

Hi @spajas

 

I have been simulating today, and I am not seeing the same behavior. Could you provide the command you are running on the Linux system?

 

 


Contributor
Registered: 09-14-2017

Have you set the PCI BAR as non-prefetchable? Otherwise the CPU root or PCIe bridges/switches can do prefetching. Using "reset on read" registers is very dangerous in PCI-based systems; I learned this the hard way a long time ago in an ASIC project.

 

--Kim

Visitor

@bethe, @kenkovaa: Thank you for your replies. I can reproduce this issue with:

- Linux command-line: $> devmem2 <address>

- Via the standard Linux driver (kernel space) IOREAD/IOWRITE:

 

// Excerpt from the Linux kernel sources (ARM):
static inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
    asm volatile("str %1, %0"
             : : "Qo" (*(volatile u32 __force *)addr), "r" (val));
}

The BAR is memory mapped in a driver using ioremap_nocache.
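For comparison, a devmem2-style access from user space can be sketched as below. This is only an illustration, not the XDMA driver's code: `map_bar`, `reg_read32` and `reg_write32` are hypothetical helpers, and the file to map (for example `/dev/mem` or a sysfs `resourceN` file of the endpoint) is an assumption that the caller has to supply. The point is that each helper performs exactly one 32-bit load or store on the CPU side.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes from the start of a BAR-backed file chosen by the caller.
 * Returns NULL on failure. */
static volatile void *map_bar(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : (volatile void *)p;
}

/* Exactly one 32-bit load from base + offset (offset in bytes). */
static uint32_t reg_read32(volatile void *base, size_t offset)
{
    assert(offset % 4 == 0);
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

/* Exactly one 32-bit store to base + offset (offset in bytes). */
static void reg_write32(volatile void *base, size_t offset, uint32_t val)
{
    assert(offset % 4 == 0);
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = val;
}
```

Because the pointer is `volatile` and 32 bits wide, the compiler emits a single word access per call, so any extra AXI-Lite transactions would have to be introduced somewhere between the CPU and the endpoint logic.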

 

As for the clear-on-read: since I use some 3rd-party IP, I cannot change that. Xilinx also uses this concept in, for instance, the 25G MAC. Beyond that, a common way to read a FIFO is through a key-hole register. In other words, you read the same register over and over to retrieve the FIFO data. My issue also wreaks havoc on those types of registers.
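The damage can be shown with a toy model. The register map below is entirely hypothetical (it is not the 25G MAC or any real IP), and the widening rule is the assumed "read up to the end of the 16-byte window" behavior. It only illustrates how a harmless read of one register can clear a clear-on-read status and pop a FIFO word as a side effect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register map of a small peripheral (byte offsets). */
enum { REG_ID = 0x0, REG_IRQ_STATUS = 0x4, REG_FIFO_DATA = 0x8, REG_FIFO_COUNT = 0xC };

struct periph {
    uint32_t irq_status; /* clear-on-read */
    uint32_t fifo[4];    /* drained through the key-hole register */
    uint32_t fifo_rd;    /* index of the next FIFO word to hand out */
    uint32_t fifo_count; /* words left in the FIFO */
};

/* One AXI-Lite read, including the register's read side effect. */
static uint32_t periph_read(struct periph *p, uint32_t offset)
{
    uint32_t v = 0;
    switch (offset) {
    case REG_ID:
        v = 0xC0FFEE01u;
        break;
    case REG_IRQ_STATUS:
        v = p->irq_status;
        p->irq_status = 0; /* reading clears the pending bits */
        break;
    case REG_FIFO_DATA:
        if (p->fifo_count > 0) {
            v = p->fifo[p->fifo_rd++]; /* reading pops one word */
            p->fifo_count--;
        }
        break;
    case REG_FIFO_COUNT:
        v = p->fifo_count;
        break;
    default:
        assert(0 && "offset outside the modelled register map");
    }
    return v;
}

/* A host read that also triggers the unwanted reads up to the end of the
 * 16-byte window (assumed behavior). Only the requested word is returned. */
static uint32_t host_read_widened(struct periph *p, uint32_t offset)
{
    uint32_t wanted = periph_read(p, offset);
    for (uint32_t o = offset + 4; o < (offset & ~0xFu) + 16; o += 4)
        (void)periph_read(p, o);
    return wanted;
}
```

In this model, calling `host_read_widened(&p, REG_ID)` returns the ID as expected, but afterwards the pending interrupt bits are gone and the first FIFO word has been discarded without the host ever seeing it.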

 

Anyway, I really appreciate your time. Thanks again.

Contributor

Do you see the BAR memory in lspci -v as non-prefetchable (Memory at c0000000 (64-bit, non-prefetchable) [size=4M])? I think ioremap_nocache only affects root behaviour, but if you have switches, their prefetch behaviour is set during the enumeration phase.
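What lspci prints here comes from the low bits of the BAR register in configuration space: bit 0 selects I/O space, bits 2:1 give the memory type (0b10 means 64-bit), and bit 3 is the prefetchable flag. A small C sketch of that decoding follows; the struct and function names are hypothetical, and the raw values in the usage note are reconstructed from the address and type, not read from a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bar_info {
    bool is_io;        /* bit 0: I/O space instead of memory space */
    bool is_64bit;     /* bits 2:1 == 0b10: 64-bit memory BAR */
    bool prefetchable; /* bit 3: prefetchable memory */
};

/* Decode the low flag bits of a raw 32-bit BAR value into *out. */
static void decode_bar(uint32_t bar, struct bar_info *out)
{
    assert(out != NULL);

    out->is_io = (bar & 0x1u) != 0;
    if (out->is_io) {
        /* The type and prefetchable bits only exist for memory BARs. */
        out->is_64bit = false;
        out->prefetchable = false;
        return;
    }
    out->is_64bit = ((bar >> 1) & 0x3u) == 0x2u;
    out->prefetchable = (bar & 0x8u) != 0;
}
```

For example, a 64-bit non-prefetchable memory BAR at c0000000 would have the raw low dword 0xC0000004, while the same BAR marked prefetchable would read 0xC000000C.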

 

--Kim

Visitor

@kenkovaa Yes, all three BARs are non-prefetchable:

 

01:00.0 0580: 10ee:8012
        Subsystem: 10ee:0007
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin ? routed to IRQ 174
        Region 0: Memory at 40000000 (32-bit, non-prefetchable) [size=32M]
        Region 1: Memory at 42100000 (32-bit, non-prefetchable) [size=64K]
        Region 2: Memory at 42000000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=16/32 Maskable- 64bit+
                Address: 000000002eb2c000  Data: e000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: xdma

Xilinx Employee

Hi @spajas,

 

Do you have an AXI Bridge on the Root Port side?  Can you provide the setup for that?  I am wondering if maybe the transaction is getting upsized on that side of the bus.

 

 

Visitor

Hi @bethe,

 

Thanks for pointing this out! I looked at the design and there is a 32-bit*) to 128-bit SmartConnect. The PCIe root complex requires a 128-bit AXI interface and the Zynq only provides 32-bit*). As you pointed out before, you have seen that a SmartConnect might upsize an AXI transaction.

 

If this is the case in our situation, what can I do to resolve this issue? As far as I know, I need the SmartConnect to connect a 32-bit*) AXI to a 128-bit AXI.

 

*) I originally wrote 64-bit by mistake. It is actually 32-bit, see the SmartConnect detail.

zynq_pcie_root_complex.jpg
smart_interconnect.png

Xilinx Employee (accepted solution)

Hi @spajas

 

You can use an AXI Interconnect instead of an AXI SmartConnect. 

 

We describe the root cause and workaround here:  https://www.xilinx.com/support/answers/70838.html

 

If you put in the Interconnect instead, you should be good to go!

Visitor

Hi @bethe,

 

Thank you for your reply. I tested it this morning and it seems to work! :-D

 

I want to thank both you and @kenkovaa for your efforts in helping me with this issue. Greatly appreciated.
