04-18-2012 11:06 AM
Using the example Verilog design that came with my SP605 kit, I have programmed the FPGA and can write to and read from various BAR0 memory locations using PCI Tree. I have studied how the design receives and processes Memory Write and Memory Read TLPs, and I don't see any interrupts being generated when a Completion TLP is processed. To pass a 32-bit data word from within the FPGA to the PC running Windows XP via PCI Express, I thought that the only thing I needed to do was implement the Memory Write TLP logic in the TX_ENGINE Verilog file, present the TLP to the PCIe Core, and let the Core transfer it to the PC application. I have been looking for PC applications that will allow me to verify my design, but the only ones I've found (PCI Tree, WinDriver, PCI Scope) only allow me to send data to the FPGA, not to receive data from the FPGA other than responses to Memory Reads. What do I need to do for the FPGA to send unsolicited data to the PC application? I've seen references to interrupts, but I don't understand how to use them or even whether I need to use them.
04-19-2012 08:47 AM
Thanks for your prompt response. I had looked at XAPP1052 but did not go any further because I could not get the implement.pl script to complete successfully (it stops at ngdbuild with "NGDBUILD completed with errors."). So I just started looking at the Verilog code provided, and it got really complicated. I'm using Windows XP, the Xilinx ISE Design Suite 13.3 and version 2.3 of the PCI Express Core. My implementation uses the PIO example provided by the Core Generator application, and the intent is to get one 32-bit data word from the FPGA (not from memory, just a local data word created by me) to the target host laptop via PCI Express. I started looking at the XAPP1052 BMD_32_TX_ENGINE.v and located where the TX Memory Write TLP header is created: signal mwr_start_i is used in an else-if statement to generate the TLP header word and to start the state machine. After I traced the signal to its source (BP_EP_MEM.v), it appears that memory data is being used to set and clear this and other signals. Since I'm not writing my data to the FPGA memory, I'm confused about how to proceed. Any other suggestions?
04-20-2012 06:49 AM
XAPP1052 is indeed not useful.
Here is a set of materials you should refer to:
1. There is a blog talking about PCIE implementation which is available at:
This blog will get you started.
2. You need to read the user guide for the PCIe v2.5 IP core,
as well as the PCIe 2.0 Base Specification.
Both of them are available online.
3. I recommend reading Linux Device Drivers,
with particular attention to PCI drivers and DMA mapping.
As for your questions:
1. When you want the host PC to access data on the FPGA, there are two ways to do it:
a. The PC sends a Memory Read Request TLP to the FPGA, and the FPGA responds with Read Completions. That is all there is to it.
b. The PC reserves some memory for the FPGA to control, then sends Memory Write TLPs to configure registers on the FPGA, telling the FPGA to issue a Memory Write TLP to the host PC. After the FPGA has finished the write, it should issue an interrupt to notify the host PC that the transfer is done. The host PC then regains control of the memory and gets the data.
Method "a" is known as memory I/O. Method "b" is what we call Direct Memory Access (DMA).
The example design generated automatically by CoreGen implements only memory I/O. XAPP1052 has both, but it is definitely more complicated.
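To make method "b" a bit more concrete, here is a rough C sketch of the three header doublewords of a Memory Write TLP that carries a single 32-bit word to a 32-bit address. The field positions follow the PCI Express Base Specification; the struct name, the function name and the requester ID used in the example call are made up for illustration, and a real TX engine would take the requester ID from the core rather than from a constant.

```c
#include <assert.h>
#include <stdint.h>

/* Three-doubleword header of a Memory Write TLP with a 32-bit address. */
typedef struct {
    uint32_t dw[3];
} mwr32_header_t;

/* Build the header for a Memory Write carrying exactly one 32-bit word.
 * requester_id: bus/device/function of the endpoint (hypothetical value
 *               in the examples; real hardware gets it from the core).
 * tag:          not used for matching on posted writes, any value works.
 * bus_addr:     DWORD-aligned destination address as seen on the bus. */
static mwr32_header_t build_mwr32_1dw(uint16_t requester_id, uint8_t tag,
                                      uint32_t bus_addr)
{
    mwr32_header_t h;

    assert((bus_addr & 0x3u) == 0);   /* address bits [1:0] must be zero */

    h.dw[0] = (0x2u << 29)            /* Fmt  = 10b: 3DW header, with data */
            | (0x00u << 24)           /* Type = 00000b: memory request     */
            | 0x001u;                 /* Length = 1 DWORD                  */
    h.dw[1] = ((uint32_t)requester_id << 16)
            | ((uint32_t)tag << 8)
            | (0x0u << 4)             /* Last DW BE = 0000b for 1-DW TLPs  */
            | 0xFu;                   /* First DW BE = 1111b               */
    h.dw[2] = bus_addr;               /* Address[31:2], low two bits zero  */
    return h;
}
```

Calling build_mwr32_1dw(0x0100, 0, 0x12345678) gives the header words 0x40000001, 0x0100000F and 0x12345678, and the payload word follows the header. The destination address has to be one the host has handed to the FPGA beforehand, which is exactly the part the CoreGen example does not do.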
Good luck with your design :P
04-20-2012 08:56 AM
Steve thanks for your reply.
I just programmed the FPGA with the routed.bit generated by the XAPP1052 as per the instructions but my PC does not recognize the board and it even refuses to install the driver provided (I get "The specified location does not contain information about your hardware"). And that's that.
Using the example PCI Express Verilog design that came with the SP605 kit and PCI Tree, I can see the board and can write to and read from specific memory locations. What I'm trying to accomplish is for the board to take some data, one 32-bit data word, and transfer it to the host PC using a Memory Write from the FPGA. I thought that implementing the Memory Write logic in the TX_ENGINE, without interrupts, would do the trick. I can see the Memory Write TLPs at the PCIe Core but have not found a way to verify whether these TLPs are being transmitted by the core to the host PC application. It seems like this is what your item b. addresses, although there the host PC controls the transfer initiation. In my case the FPGA gets some data from a different source and then routes the data to the PC host.
I've been looking at items 1 and 2 and this is where I get confused because even though the Memory Writes and Interrupts are addressed I can't seem to find (understand) how the two interact. Or even what interrupt lines are being used (seems like the cfg_interrupt* lines). Since I'm using Windows XP I believe I need to use the legacy interrupts. Looking at UG672 I see how these interrupts are generated but I don't see where the Memory Write TLPs fit in, before the interrupts or after. Assuming that interrupts are needed then here's what I think I need to do.
1. Data comes into the FPGA from an external source other than the PC host.
2. My FPGA logic generates a legacy interrupt to the Core prior to building the Memory Write TLP.
3. When the Core completes the interrupt transmission to the host PC and deasserts the interrupt, my logic will then generate the Memory Write TLP and present it to the Core for transmission to the PC host, which has been alerted by the interrupt that data is coming.
04-24-2012 01:51 PM
Luis, I second Steve that Linux and its large Open Source driver base, together with the well-written but slightly outdated LDD3, will get you going pretty quickly. You cannot take PCIe (with DMA) in one giant leap; you need small steps. Pick a PCIe device for which the data sheet/user guide and the driver are available and try to understand the various tasks a driver has to do. These range from claiming PCI device support, driver initialization, device initialization, OS-level device handling and handling race conditions to properly walking the full path backwards at teardown without leaking memory.
I would recommend changing your architecture a little bit regarding the DMA write and interrupt order. Try to adhere to a token processing architecture instead of oversampling everything. But remember: The token is precious, so neither lose nor duplicate the token. The token is some ‘virtual’ thing, encoded implicitly in some states and status bits, and is explained later.
1. Prepare some internal PIO registers, say, in BAR0 which is a non-prefetchable memory-mapped I/O block. You will need them inside the Interrupt Service Routine (ISR).
1a. You need an interrupt status register with, say, two status bits. When read, any set bit shall be reset in the register. This avoids an additional write operation for resetting a bit manually once serviced. The first bit will be used for indicating an “Address Request” as detailed below. A second bit will be used to indicate a shutdown acknowledge by the core. The register is read-only but, as mentioned, resets all '1' bits upon read. Take care not to lose a new event that arrives while the register is being read and the bit was still '0'.
1b. You need a 64 bit address register used by the driver for telling the device what is the address to write received data to. This should be write-only.
1c. You need an on/off switch register (write only). Explanation is below.
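As a rough illustration of 1a, here is a small C model of a read-to-clear status register, stepped one clock cycle at a time. The bit assignments and all names are made up, and the rule that an event arriving in the same cycle as a read stays pending for the next read is one possible way to avoid losing events, not the only one.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_ADDR_REQUEST  (1u << 0)   /* hypothetical bit assignment */
#define IRQ_SHUTDOWN_ACK  (1u << 1)   /* hypothetical bit assignment */

typedef struct {
    uint32_t pending;   /* latched events not yet seen by the driver */
} irq_status_reg_t;

/* Model one clock cycle of the status register.
 * new_events: bits set by the device logic in this cycle.
 * host_read:  non-zero if the host reads the register in this cycle.
 * Returns the value the host sees (0 when there is no read). */
static uint32_t irq_status_cycle(irq_status_reg_t *r, uint32_t new_events,
                                 int host_read)
{
    uint32_t seen = 0;

    assert(r != 0);

    if (host_read) {
        seen = r->pending;        /* host gets everything latched so far  */
        r->pending = new_events;  /* clear what was read, but keep events */
                                  /* that arrive in this very cycle       */
    } else {
        r->pending |= new_events; /* no read: just accumulate             */
    }
    return seen;
}
```

The case worth checking is an event that arrives in the same cycle as a read that returns '0' for that bit: in this model the bit is still set afterwards, so the next read reports it and the token is not lost.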
2. Don’t forget to properly initialize your device inside the driver: Enable Master operations (i.e. DMA), assign the BARs properly, request the interrupt and always check for return values.
3. I propose the following terminology for your given example: The device can be in one of four states: *frozen*, *cold*, *warm* and *hot*. At startup, the device is in the *frozen* state. Once general PCI setup is done, the device can be switched on by the driver by writing to the on/off switch register. Switching the device on will move it into the *cold* state. The device owns the token when in *cold* state.
4. Every time the device is in *cold* state, it desperately waits for a buffer address written by the driver. It needs this address to write newly received data words there, so in this situation, the device will set the “Address Request” interrupt status bit and trigger the interrupt. With this operation done, the device goes into *warm* state. The token is the interrupt status bit, so it is still owned by the device.
Note that legacy interrupts (INTA) are easier to implement as they guarantee interrupt propagation and handling under all conditions. The downside is that in the general case you have to share the interrupt with other resources, so your ISR might be called when another device needs attention. Additionally, the ISR might be called even if there is no work left to be done, so you always have to double-check by reading from the interrupt status register. MSI would be better from the token aspect, but sharing a single MSI between the token and other interrupt sources might take more effort to get right, so go for legacy interrupts first.
5. The driver will eventually enter the ISR and check the interrupt status register. If the “Address Request” bit was set, it will be auto-cleared by the device, so the driver now owns the token. Now the driver has to find a well-suited address for data exchange. Such a memory location must reside in kernel space, not user space. Remember to declare the memory as PCI coherent memory; other PCI memory types are of no use at this moment. You will then have to deal with two addresses: the address as seen from the CPU and the device-level address as seen on the PCI bus. Although they are the same on most systems, they may differ on any given computer, so keep them apart, say, by putting an imaginary red badge on a CPU address pointer and an imaginary blue badge on a PCI address pointer. You could use different variable names for telling them apart. The PCI address should always be 64 bits wide, and the CPU address should have the pointer width supported by the CPU/OS architecture.
Once you have the address in PCI address space (the blue one), write it to the address register (see 1b above). Try to find a good ‘commit’ operation if you split the two 32 bit parts into two transfers, i.e. write the lower 32 bits of the address first, then the upper 32 bits, which commits the whole address to the internal FPGA logic. At this time, the token has returned to the device, and the device enters the *hot* state.
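The ‘commit’ idea can be sketched in C like this, with the device side modelled as a plain struct. All names are made up, and the separate bus_addr_t type is just one way of wearing the blue badge so that a bus address is never mixed up with a CPU pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;   /* "blue badge": address as seen on the bus */

/* Device-side model of the 64-bit address register from 1b. */
typedef struct {
    uint32_t   lo_shadow;   /* lower half, held until the commit write  */
    bus_addr_t committed;   /* address the DMA logic is allowed to use  */
    int        valid;       /* set once a complete address is committed */
} addr_reg_t;

/* Writing the lower 32 bits only fills the shadow register. */
static void addr_write_lo(addr_reg_t *a, uint32_t value)
{
    a->lo_shadow = value;
}

/* Writing the upper 32 bits commits both halves in one step. */
static void addr_write_hi(addr_reg_t *a, uint32_t value)
{
    a->committed = ((bus_addr_t)value << 32) | a->lo_shadow;
    a->valid = 1;
}

/* Driver side: hand a DWORD-aligned bus address to the device,
 * lower half first, upper half last. */
static void driver_program_address(addr_reg_t *a, bus_addr_t bus_addr)
{
    assert((bus_addr & 0x3u) == 0);
    addr_write_lo(a, (uint32_t)(bus_addr & 0xFFFFFFFFu));
    addr_write_hi(a, (uint32_t)(bus_addr >> 32));
}
```

Because the device only looks at the committed value, it can never see a half-written address, no matter how long the host waits between the two 32-bit writes.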
6. In *hot* state, the device waits for data. The driver is idle and only waits for an interrupt. The device owns the token, and once data arrives, it will use the address to write the data to main memory. As soon as this is done, the loop returns to point 4 above, i.e. it will set the interrupt status and indicate an interrupt, again handing over the token to the driver. The bit now not only indicates “Address Request” but also “Data Available”. The loop runs from 4 to 6 until you want to terminate that action.
7. If you want to terminate the game, this should happen in-order. The usual way is that the driver tells the device that it wants to shut down by writing the ‘off’ command to the on/off register. Next the device terminates all pending operations and indicates completion by issuing a new interrupt with the shutdown acknowledge interrupt status bit set. More complex designs may in turn have more complex shutdown operations like waiting for some operations to finish, more cleanup to do by the driver, etc. The shutdown is a mostly asynchronous request and leads to a *frozen* state and an unconditional ownership of the token by the driver.
8. Don’t forget to deallocate all resources when stopping the driver.
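The state handling in steps 3 to 7 can be summarized in a small C model. This is only a sketch of the device side: the event names, the struct and the bit assignments are made up, the read-to-clear behaviour of the status register is left out, and *cold* is treated as a state the device passes through immediately on its way to *warm*.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_ADDR_REQUEST  (1u << 0)   /* hypothetical bit assignment */
#define IRQ_SHUTDOWN_ACK  (1u << 1)   /* hypothetical bit assignment */

typedef enum { DEV_FROZEN, DEV_COLD, DEV_WARM, DEV_HOT } dev_state_t;

typedef enum {
    EV_SWITCH_ON,          /* driver writes 'on' to the on/off register  */
    EV_ADDRESS_COMMITTED,  /* driver has written a complete bus address  */
    EV_DATA_WRITTEN,       /* device finished the Memory Write of a word */
    EV_SWITCH_OFF          /* driver writes 'off' to the on/off register */
} dev_event_t;

typedef struct {
    dev_state_t state;
    uint32_t    irq_status;  /* status bits set so far (never cleared here) */
    unsigned    irq_count;   /* number of interrupts raised                 */
} device_t;

static void raise_irq(device_t *d, uint32_t bit)
{
    d->irq_status |= bit;
    d->irq_count++;
}

/* Step 4: in cold state the device asks for an address and turns warm. */
static void enter_cold(device_t *d)
{
    d->state = DEV_COLD;
    raise_irq(d, IRQ_ADDR_REQUEST);
    d->state = DEV_WARM;
}

static void device_step(device_t *d, dev_event_t ev)
{
    assert(d != 0);
    switch (ev) {
    case EV_SWITCH_ON:
        if (d->state == DEV_FROZEN) enter_cold(d);
        break;
    case EV_ADDRESS_COMMITTED:
        if (d->state == DEV_WARM) d->state = DEV_HOT;
        break;
    case EV_DATA_WRITTEN:
        if (d->state == DEV_HOT) enter_cold(d);  /* loop back to step 4 */
        break;
    case EV_SWITCH_OFF:
        if (d->state != DEV_FROZEN) {
            d->state = DEV_FROZEN;
            raise_irq(d, IRQ_SHUTDOWN_ACK);
        }
        break;
    }
}
```

Events that do not fit the current state are ignored, so for example a data word that shows up while the device is *frozen* or *warm* never triggers a write to an address the device does not own. What to do with such data (drop it or buffer it) is a separate design decision.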
I hope that gets you going.