04-05-2018 09:53 AM - edited 04-05-2018 09:55 AM
I am using the AXI Memory Mapped to PCIe IP core interfacing with a Gen 1 PCIe slave. To get the best bandwidth out of the PCIe link I wish to have multiple outstanding writes. My custom AXI logic is designed to generate up to 2 outstanding writes and 8 outstanding reads, the maximum the IP core can handle.
The CPU slave I am talking to can accept 2 address handshakes with unique transaction IDs but cannot handle receiving data with two different transaction IDs. The slave has to send the write response before it can accept the 2nd write transaction data or it 'falls over'.
I assume this is because the IP block is generating non-posted PCIe writes, which means it must wait for the write response before the next transaction can start sending data. Is there a way to force the IP core to generate posted writes so I don't need to wait for a completion before sending the second transaction's write data?
04-05-2018 10:03 AM
Can you clarify if you are seeing this behavior:
"The CPU slave I am talking to can accept 2 address handshakes with unique transaction IDs but cannot handle receiving data with two different transaction IDs. The slave has to send the write response before it can accept the 2nd write transaction data or it 'falls over'."
On an AXI Slave attached to the AXI Bridge Master port, or a PCIe Slave which you are initiating a transaction to through the AXI Bridge Slave port?
04-05-2018 02:05 PM - edited 04-05-2018 02:07 PM
I am generating the write transaction on the FPGA with my custom AXI logic, which is sent via the IP core (to translate to PCIe packets) to the CPU across PCIe. This writes data to the CPU's DDR. The CPU is the PCIe root complex in the system, but the FPGA is initiating the read and write requests.
From PG055, I am relying on the following behaviour:
"The slave bridge provides termination of memory-mapped AXI4 transactions from an AXI master device (such as a processor). The slave bridge provides a way to translate addresses that are mapped within the AXI4 memory mapped address domain to the domain addresses for PCIe. When a remote AXI master initiates a write transaction to the slave bridge, the write address and qualifiers are captured and write data is queued in a first in first out (FIFO). These are then converted into one or more MemWr TLPs, depending on the configured Max Payload Size setting, which are passed to the integrated block for PCI Express. The Slave Bridge can support up to two active AXI4 memory mapped write transactions.
When a remote AXI master initiates a read transaction to the slave bridge, the read address and qualifiers are captured and a MemRd request TLP is passed to the core and a completion timeout timer is started. Completions received through the core are correlated with pending read requests and read data is returned to the AXI master. The slave bridge is capable of handling up to eight memory mapped AXI4 read requests with pending completions."
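The outstanding-transaction limits described in PG055 (two active writes, eight reads with pending completions) can be pictured as simple credit counters; a minimal sketch, where the class and method names are mine for illustration, not any Xilinx API:

```python
# Toy model of the slave bridge's outstanding-transaction limits from PG055:
# up to 2 active memory-mapped writes and up to 8 reads with pending
# completions. Illustrative only.
class SlaveBridgeModel:
    MAX_WRITES, MAX_READS = 2, 8

    def __init__(self):
        self.writes = 0  # writes accepted but not yet given a bresp
        self.reads = 0   # reads issued but completions still pending

    def accept_write(self):
        """AW handshake completes only while fewer than 2 writes are active."""
        if self.writes < self.MAX_WRITES:
            self.writes += 1
            return True
        return False  # AWREADY held low: a third address stalls

    def write_response(self):
        """A B-channel handshake frees a write slot."""
        assert self.writes > 0
        self.writes -= 1

b = SlaveBridgeModel()
print(b.accept_write(), b.accept_write(), b.accept_write())  # True True False
b.write_response()
print(b.accept_write())  # True: slot freed by the response
```

This matches the stall you describe below: the third address handshake cannot complete until a write response retires one of the two outstanding writes.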
04-05-2018 02:59 PM
@seb.mitchison your description of the problem is not really clear. Are you saying that when you use the FPGA logic to initiate two writes to a CPU over PCIe, the CPU will fail if the second request comes before the completion of the first?
How do you recognize this failed case? What goes wrong?
All normal writes (MemWr) in PCIe are posted; only IO or Cfg writes are non-posted. It's been a while since I looked at it, but I believe only the root complex is allowed to initiate IO or Cfg transactions. You may want to make sure that you are not somehow creating those accidentally.
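The posted/non-posted split comes straight from the request types in the PCIe base spec; a quick reference table (the helper function is just illustrative):

```python
# Posted vs non-posted request types per the PCIe base specification.
# MemWr and Messages are posted (no Completion TLP expected);
# all reads, IO writes, and Cfg writes are non-posted.
NON_POSTED = {"MemRd", "IORd", "IOWr", "CfgRd0", "CfgWr0", "CfgRd1", "CfgWr1"}
POSTED = {"MemWr", "Msg", "MsgD"}

def expects_completion(tlp_type: str) -> bool:
    """Non-posted requests require a Completion TLP; posted ones do not."""
    if tlp_type in NON_POSTED:
        return True
    if tlp_type in POSTED:
        return False
    raise ValueError(f"unknown TLP type {tlp_type!r}")

print(expects_completion("MemWr"))   # False: normal memory writes are posted
print(expects_completion("CfgWr0"))  # True: config writes are non-posted
```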
The slave bridge should be generating a write response to your internal master logic once the write data has been accepted by the bridge; it does not matter whether the target of the PCIe transaction has acked the data yet. Are you sure your master is not holding off the AXI write response?
If you issue only one write at a time, do multiple writes in sequence work normally?
04-05-2018 11:44 PM - edited 04-06-2018 12:00 AM
Case 1 (works):
I initiate up to 8 AXI read requests and up to 2 write requests in the FPGA logic, which are sent across the Slave Bridge in the IP core. The reads work as expected. The write logic issues write addresses until the write address channel stalls after 2 successful handshakes; the third handshake completes only once the write response for the first transaction has returned. In this case the FPGA sends the write data for the first transaction ID and waits for its write response. The stalled third address then completes, the write data for the 2nd transaction is sent, and the system waits for that write response. This continues until all the write transactions complete. The reads complete concurrently.
The problem with this is that each write response takes time to return after the write data is sent.
Case 2 (fails):
Works as above, except that once the first write data has been sent, rather than waiting for the first transaction's write response before sending the second write data, I send the second write data straight after and wait for the first write response concurrently. This saves the time spent waiting for write responses.
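The saving from case 2 can be seen with a toy latency model; the cycle counts below are made up purely for illustration:

```python
# Toy model: n writes, each taking d cycles to stream its data and r cycles
# of round-trip latency before the bresp returns. Numbers are illustrative.
def serialized(n, d, r):
    # Case 1: wait for each write response before sending the next data burst.
    return n * (d + r)

def pipelined(n, d, r):
    # Case 2: send the next data burst while the previous response is still
    # pending, so data bursts stream back to back and only the final
    # response latency is exposed.
    return n * d + r

n, d, r = 8, 32, 100
print(serialized(n, d, r))  # 1056 cycles
print(pipelined(n, d, r))   # 356 cycles
```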
One full DMA sequence completes in case 2, but after it completes another will not trigger, and looking at the DDR memory contents, it then reads all X's and that memory block is corrupted/unusable.
I believe from reading around that non-posted PCIe writes don't allow another transaction's data to be sent until the response is received?
Is this clearer now?
04-06-2018 12:15 AM
"Are you saying that when you use the FPGA logic to initiate two writes to a CPU over PCIe, the CPU will fail if the second request comes before the completion of the first?"
Yes. The system does not fail to send the first write response if I send the second write data before the first write response is returned, but after the whole DMA sequence completes, the next time the DMA sequence is triggered it does not execute. When I investigate the DDR contents, the area I have DMA'd to reads all X's.
If I wait for the first write response before sending the second write data then it works fine, but I lose time waiting for the write completion.
04-06-2018 06:47 PM
If it is configured as an endpoint, the IP can only generate posted memory writes. You are only seeing the delay of the IP converting your AXI transaction to a PCIe one, not the delay of the actual PCIe transaction completing.
The bresp to the remote (requesting) AXI4 master device for a write to a remote PCIe device is not issued until the MemWr TLP transmission is guaranteed to be sent on the PCIe link before any subsequent TX-transfers.
Have you tried just the writes without the simultaneous reads, to temporarily work around the ordering rules?
What size transactions are you doing and what addresses?
It is also not clear what you mean by "the DMA sequence doesn't execute". Is the AXI bus idle but some data was written into the IP that never got sent out? Is one of the channels of the AXI bus holding off a ready?