sktpin
Adventurer
2,693 Views
Registered: ‎09-11-2018

PCI-e - FPGA registers via /dev/mem - inconsistent reads

In a setup involving an Artix-7 devboard, the FPGA exposes, on two different BARs:
1) register blocks for config & handshake
2) a memory block of e.g. 64KB for data transport.

The FPGA card is inserted in the PCI-e slot of a devboard with an ARM Cortex A CPU running Linux 3.10.
For 2), a DMA driver is used. This still requires 1) for handshaking: poll until the write index of the buffer exceeds a certain threshold, do a DMA transfer, then update the read index so that the FPGA logic can see it. Currently, no interrupts are being used.
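
The handshake described above could be sketched roughly as follows. This is only an illustration under assumptions: the register offsets REG_WRITE_INDEX and REG_READ_INDEX are hypothetical, both indices are assumed to be byte offsets into a ring buffer, and the read index register is assumed to be readable by the CPU. The DMA transfer itself is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register offsets; the real ones come from the FPGA design. */
#define REG_WRITE_INDEX 0x00u /* advanced by the FPGA */
#define REG_READ_INDEX  0x04u /* advanced by the CPU  */

static inline uint32_t reg_read32(volatile void *base, uint32_t offset)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

static inline void reg_write32(volatile void *base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Bytes currently held in a ring buffer of buf_size bytes. */
static uint32_t ring_fill(uint32_t wr, uint32_t rd, uint32_t buf_size)
{
    assert(buf_size != 0 && wr < buf_size && rd < buf_size);
    return (wr >= rd) ? (wr - rd) : (buf_size - rd + wr);
}

/* Returns the number of bytes ready for DMA, or 0 while below the threshold. */
static uint32_t handshake_bytes_ready(volatile void *regs, uint32_t buf_size,
                                      uint32_t threshold)
{
    uint32_t wr = reg_read32(regs, REG_WRITE_INDEX);
    uint32_t rd = reg_read32(regs, REG_READ_INDEX);
    uint32_t fill = ring_fill(wr, rd, buf_size);
    return (fill >= threshold) ? fill : 0;
}

/* After the DMA transfer of len bytes: publish the new read index to the FPGA. */
static void handshake_release(volatile void *regs, uint32_t buf_size, uint32_t len)
{
    uint32_t rd = reg_read32(regs, REG_READ_INDEX);
    reg_write32(regs, REG_READ_INDEX, (rd + len) % buf_size);
}
```

A polling loop would call handshake_bytes_ready() until it returns non-zero, run the DMA transfer for that many bytes, and then call handshake_release() with the same length.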

1) uses the Linux /dev/mem driver and mmap to get an address in user space.
While the /dev/mem access seemed to work at first glance, i.e. I saw valid-looking content in the registers, it has so far failed to work for the DMA handshaking scheme.
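
For reference, a minimal sketch of the kind of /dev/mem mapping meant here. The names phys_window_for() and map_phys() are made up for this example, and the physical address would be the BAR address reported by dmesg or lspci plus the register block offset. The one detail worth showing is that mmap() needs a page-aligned file offset, so the requested address is split into an aligned base and a remainder. O_SYNC is commonly used to ask for an uncached mapping; whether that is honoured on a particular kernel and architecture is an assumption that should be checked.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct phys_window {
    off_t  map_base; /* page-aligned physical address handed to mmap() */
    size_t map_len;  /* length rounded up to whole pages */
    size_t delta;    /* offset of the requested address inside the mapping */
};

/* Split a physical address range into a page-aligned mapping request. */
static struct phys_window phys_window_for(uint64_t phys_addr, size_t len,
                                          size_t page_size)
{
    struct phys_window w;
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    w.delta = (size_t)(phys_addr & (page_size - 1));
    w.map_base = (off_t)(phys_addr - w.delta);
    w.map_len = (len + w.delta + page_size - 1) & ~(page_size - 1);
    return w;
}

/* Map len bytes at phys_addr through /dev/mem. Returns NULL on failure.
 * On success *fd_out holds the open descriptor; unmapping is left to the
 * caller (munmap() needs the aligned base, i.e. the result minus delta). */
static volatile void *map_phys(uint64_t phys_addr, size_t len, int *fd_out)
{
    struct phys_window w =
        phys_window_for(phys_addr, len, (size_t)sysconf(_SC_PAGESIZE));
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, w.map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, w.map_base);
    if (p == MAP_FAILED) { /* mmap() signals failure with MAP_FAILED, not NULL */
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return (volatile uint8_t *)p + w.delta;
}
```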
Currently, for debugging, the FPGA does not write to the DMA buffer continuously. It writes a certain data packet each time it is triggered via a bit in one of the config registers, and I trigger it as many times as necessary to get over the write index threshold (currently just 1/2 of the buffer size). So the value of the write index should be rather predictable (static) once no more triggering is going on.

I stepped through the code with GDB and saw this interesting, repeatable effect: if I have a breakpoint at a certain place in the code where the write index should be over the threshold, its value is higher than when the breakpoint is not there, even if a delay of several seconds is put there instead.

What could explain this observation?
Does this have anything to do with cache? I.e. are /dev/mem mappings to PCI-e regions subject to caching? (and does GDB flush that, voila explanation?) I've found inconsistent accounts on that floating around the net.
Something else?

9 Replies
johnvivm
Voyager
2,687 Views
Registered: ‎08-16-2018

There are truths, lies and debug observations. Don't trust them blindly, especially when hardware with a mind of its own is around (DMA, interrupts, etc.). Better to check things on the go and print meaningful things out.

pedro_uno
Advisor
2,665 Views
Registered: ‎02-12-2013

I use the Xilinx PCIe/DMA bridge on cards running under Ubuntu Linux 16.04. I used to like /dev/mem for talking to Xilinx PCIe interfaces, but for the last couple of years I have been using the Xilinx XDMA driver for Linux. It is provided in full source code and makes it easy to get DMA transfers going. There is an additional Bypass AXI port on the core that lets you do regular memory-mapped register accesses. Separate driver devices are provided by XDMA for the Bypass and DMA access types.

 

Maybe it makes sense to debug the PCIe stuff in a more standard software environment like Linux on a PC before trying to do things with an embedded processor.

 

It should be possible to then cross compile the XDMA driver for your Arm processor and then move everything over there for deployment. I was able to do this once with a PPC processor running Linux.

 

Good luck

----------------------------------------
DSP in hardware and software
-----------------------------------------
sktpin
Adventurer
2,650 Views
Registered: ‎09-11-2018

Thanks, yes, I cross-compiled the xdma driver, I have a thread about that wondering why it's provided as "x86 only". Turns out it does not use ARM-specific cache invalidation opcodes which may be needed. If that is an issue on my ARM Cortex A (the CPU on the Tegra TK1), I guess I have to dig into the driver source and add those.

 

Anyway, I will have a look at the DMA-bypass driver to maybe access the config registers with, good hint! I saw it already I think, but it didn't "click" to tell me I could use that instead of my existing devmem code.
If that, on my ARM platform, does have cache issues, I'd still have a problem there, too, though. As I have not done Linux kernel driver development before, right now I wouldn't even know where to look to add those cache invalidation instructions to the driver.

sktpin
Adventurer
2,638 Views
Registered: ‎09-11-2018

@pedro_uno: you mention an "additional Bypass AXI port", I apparently misread that earlier. I tried to open the xdma0_bypass device file that's there after loading the Xilinx xdma driver and mmap FPGA register addresses on that, but it seems that's not what it does. I either get garbage data, or a crash (if I do *not* add the BAR address shown with dmesg to the FPGA register block address like I do with /dev/mem).

 

As for not trusting the debugger... the FPGA guy confirmed what I saw happening with GDB, from the FPGA end. Including the effect that a breakpoint vs. a long delay produce a slightly different write index value, i.e. the breakpoint really affects what's happening to the FPGA. (as described in the 1st post, using /dev/mem)
Even adding a delay of some 100 ms between the ~200 loop iterations that set the trigger bit (each of which should add data to the DMA buffer) does not help. Apparently some of those bit changes are being missed for some reason, so I get roughly (but not exactly) half of the expected write index value.
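
One thing that could be tried for the missed triggers, as a diagnostic rather than a known fix: read the control register back after every write. PCIe memory writes are posted, and a read from the device cannot complete before earlier posted writes to it have been delivered, so the read-back pins down when each bit change has actually reached the FPGA. The sketch below assumes a hypothetical register layout (REG_CTRL, REG_WRITE_INDEX, CTRL_TRIGGER_BIT) and assumes the FPGA triggers on the 0 to 1 edge of the bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
#define REG_CTRL         0x00u
#define REG_WRITE_INDEX  0x04u
#define CTRL_TRIGGER_BIT (1u << 0)

static inline uint32_t reg_read32(volatile void *base, uint32_t offset)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

static inline void reg_write32(volatile void *base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Pulse the trigger bit once, reading back after each write so the write
 * has reached the device before the next one is issued. */
static void trigger_pulse(volatile void *regs)
{
    uint32_t ctrl = reg_read32(regs, REG_CTRL);

    reg_write32(regs, REG_CTRL, ctrl | CTRL_TRIGGER_BIT);
    (void)reg_read32(regs, REG_CTRL); /* read-back after setting the bit */

    reg_write32(regs, REG_CTRL, ctrl & ~CTRL_TRIGGER_BIT);
    (void)reg_read32(regs, REG_CTRL); /* read-back after clearing the bit */
}

/* Pulse n times and return how far the write index moved, modulo buf_size.
 * Assumes both index values are smaller than buf_size. */
static uint32_t trigger_n(volatile void *regs, unsigned n, uint32_t buf_size)
{
    assert(buf_size != 0);
    uint32_t before = reg_read32(regs, REG_WRITE_INDEX);
    for (unsigned i = 0; i < n; i++)
        trigger_pulse(regs);
    uint32_t after = reg_read32(regs, REG_WRITE_INDEX);
    return (after + buf_size - before) % buf_size;
}
```

Comparing the returned delta with n times the packet size would show how many triggers were lost. If the FPGA needs time to write each packet, the final index read has to wait for that as well.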

pedro_uno
Advisor
2,631 Views
Registered: ‎02-12-2013

I have no idea what could be causing the inconsistency you mention, but here is a snippet of how I use the XDMA driver on 64-bit Ubuntu Linux.

 

The call to mmap() returns a pointer to the start of the virtual memory region that goes through to the BAR associated with the Bypass port on the PCIe bridge. Accessing a register is then just a matter of adding its offset, as assigned in Vivado IPI, to that pointer.

 

/////////////////////////

 

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

// BAR1_SIZE, FPGA_ID, FPGA_VERSION, FPGA_LED, read_reg() and write_reg()
// are defined elsewhere in the project.

int main()
{
    uint32_t read_data;

    void* pcie_addr;
    uint32_t pcie_bar0_size=BAR1_SIZE;

    int fd = open("/dev/xdma0_bypass",O_RDWR|O_SYNC);
    if (fd<0) {
        fprintf(stderr, "Can't open pcie driver for FPGA! You must be root?\n");
        return(1);
    }

    // mmap() reports failure with MAP_FAILED, not NULL.
    pcie_addr=mmap(0,pcie_bar0_size,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0);
    if (MAP_FAILED == pcie_addr) {
        fprintf(stderr, "Can't mmap()\n");
        close(fd);
        return(1);
    }

    read_data = read_reg(pcie_addr, FPGA_ID);
    printf("FPGA_ID      = 0x%08x\n", read_data);
    read_data = read_reg(pcie_addr, FPGA_VERSION);
    printf("FPGA_VERSION = 0x%08x\n\n", read_data);

    write_reg(pcie_addr, FPGA_LED, 0x55);

    // Do useful stuff here.

    munmap(pcie_addr,pcie_bar0_size);
    close(fd);

    return(0);
}

sktpin
Adventurer
2,583 Views
Registered: ‎09-11-2018

Hey, thanks for the example! I think I already spotted something I did wrong, need to try it out.
What do read_reg and write_reg do? Just offsetting to the base and pointer access? (64bit in your case?)

 

EDIT:
Assuming that those do only normal memory writes of word width, with address offset: that didn't work for me. I can open /dev/xdma0_bypass, I can mmap* it and seem to get a valid address back, but if I then attempt to read a 32-bit value from e.g. ((volatile uint32_t*)(mmappedAddr + offsetToRegister))[0], the CPU freezes. Still searching whether something else is wrong...

* although the BAR I was told should be associated with bypass was shown as way too small in dmesg, only 64KB, whereas some of the addresses of register blocks reach 1/2 MB or so - but there is a second BAR which has the same address range, and then more, so I mapped those 12 MB instead...
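
As a cross-check for the BAR sizes seen in dmesg, the sysfs file /sys/bus/pci/devices/<domain:bus:dev.fn>/resource lists one line per region with start address, end address and flags in hex (the first six lines are BAR0 to BAR5). A small sketch of reading it, where parse_resource_line() and print_bars() are names made up for this example:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct bar_info {
    uint64_t start; /* physical start address */
    uint64_t size;  /* size in bytes, 0 for an unused BAR */
    uint64_t flags; /* resource flags as reported by the kernel */
};

/* Parse one "start end flags" line of the sysfs resource file. */
static int parse_resource_line(const char *line, struct bar_info *out)
{
    uint64_t start, end, flags;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%" SCNx64 " %" SCNx64 " %" SCNx64,
               &start, &end, &flags) != 3)
        return -1;
    if (end < start)
        return -1;

    out->start = start;
    out->size = (start == 0 && end == 0) ? 0 : end - start + 1;
    out->flags = flags;
    return 0;
}

/* Print BAR0..BAR5 of one device; returns the number of lines parsed. */
static int print_bars(const char *resource_path)
{
    char line[128];
    int idx = 0;
    FILE *f = fopen(resource_path, "r");

    if (f == NULL)
        return -1;
    while (idx < 6 && fgets(line, sizeof line, f) != NULL) {
        struct bar_info bar;
        if (parse_resource_line(line, &bar) != 0)
            break;
        printf("BAR%d: start=0x%" PRIx64 " size=0x%" PRIx64 "\n",
               idx, bar.start, bar.size);
        idx++;
    }
    fclose(f);
    return idx;
}
```

Any register offset used with a mapping should stay below the size reported for the BAR that the mapping goes through.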

sktpin
Adventurer
2,550 Views
Registered: ‎09-11-2018

Ok, turns out the FPGA wasn't set up to access registers via xdma0_bypass. But in the Xilinx Answer 65444 Linux pdf, the /dev/xdma0_user device was mentioned to access the PCIe AXI Lite interface, which does get to the registers.


Using that, I see the same behavior as before with /dev/mem: I'm seeing a write index of ~1/2 of the expected value, and the number is slightly different depending on whether or not I have a breakpoint in the code on the CPU. That breakpoint sits between triggering the last addition of data to the buffer (by writing a trigger bit into a register) and reading + printing the write index register. A huge delay in that place instead of the breakpoint has no effect.

EDIT:
Now, it does look like there is also something wrong at the FPGA end, as a way was found to reproduce part of the observed effect without involvement of "my" CPU (before, the adding of data to the buffer was triggered via a HW button on the board, and that worked as expected at first...). While that still needs to be resolved, it remains weird that a breakpoint in the CPU changes the outcome, if only slightly (the change is also visible inside the FPGA IDE, i.e. it is not a GDB bad-connection artifact or some such). Would be interesting to hear whether anyone has any idea what could be the cause...

pedro_uno
Advisor
2,537 Views
Registered: ‎02-12-2013

It sounds like you are very close to having this work the way you want.

 

One thing I find helpful is to put a "System ILA" on the AXI bus in the IP Integrator block diagram.  With that you can observe the actual physical address that is going out on the AXI bus. 

 

There are things that can go wrong with dereferencing pointers in C code. If you access an address where there is no AXI device, the bus will hang and either cause a segmentation fault or crash the computer.

 

Here are some old versions of my write_reg() and read_reg() functions, nothing fancy.

 

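// volatile keeps the compiler from caching, merging or dropping the register
// accesses; the uint8_t* cast avoids arithmetic on void*, a GNU extension.
// Needs #include <stdint.h>.
void write_reg(void* vir_addr,uint32_t offset,uint32_t value)
{
   *((volatile uint32_t*)((uint8_t*)vir_addr+offset))=value;
}

uint32_t read_reg(void* vir_addr,uint32_t offset)
{
   return *((volatile uint32_t*)((uint8_t*)vir_addr+offset));
}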
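
Since an access outside the mapped window can hang the bus or crash the machine, a bounds-checked variant of such accessors might look like the sketch below. The names reg_window, read_reg_checked() and write_reg_checked() are made up for this example. It only guards against offsets outside the mapped window or misaligned offsets; it cannot detect holes inside the window where no AXI device answers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapped register window: base pointer from mmap() plus mapped size. */
struct reg_window {
    volatile uint8_t *base;
    size_t size;
};

/* True if a 32-bit access at offset is aligned and inside the window. */
static int reg_offset_ok(const struct reg_window *w, uint32_t offset)
{
    assert(w != NULL && w->base != NULL); /* programming error otherwise */
    return (offset % 4u) == 0 && w->size >= 4u && offset <= w->size - 4u;
}

/* Read a 32-bit register; returns 0 on success, -1 on a rejected offset. */
static int read_reg_checked(const struct reg_window *w, uint32_t offset,
                            uint32_t *value)
{
    if (value == NULL || !reg_offset_ok(w, offset))
        return -1;
    *value = *(volatile uint32_t *)(w->base + offset);
    return 0;
}

/* Write a 32-bit register; returns 0 on success, -1 on a rejected offset. */
static int write_reg_checked(const struct reg_window *w, uint32_t offset,
                             uint32_t value)
{
    if (!reg_offset_ok(w, offset))
        return -1;
    *(volatile uint32_t *)(w->base + offset) = value;
    return 0;
}
```

The window would be filled in right after mmap(), with size set to the length that was actually mapped.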

 

sktpin
Adventurer
2,466 Views
Registered: ‎09-11-2018

Thanks. Well, wouldn't that be nice, close to what I need.

While there was a bug at the FPGA end, and in principle, I get the correct register value, it still only works if I have a GDB breakpoint before the line of code where the value is expected, whereas a long delay in that place does not do the same.

That behavior I see, whether I use /dev/mem or /dev/xdma0_user, for register access.

 

 
