Adventurer
Registered: 03-30-2012

PCIe - Multi-Channel DMA - Completion Data handling

Hi all,

 

Can anyone suggest a way to handle PCIe completion data in a multi-channel DMA configuration?

 

In our DMA controller there are two DMA channels (see figure below). Each channel can issue a memory read request, and for each completion associated with a MemRd request we want to sort and dispatch the data to the right RX data channel. What is the best way to achieve this? Should we use:

 

1 - The Tag field in the TLP header: for example, if tag > 128 the data is for RX channel 1, and if tag <= 128 it is for RX channel 0 (see the sketch after this list).

2 - The concept of Traffic Classes and Virtual Channels. We don't really want to venture into this...

3 - Any other suggestion...
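
A minimal sketch of option 1, assuming the completion tag and payload have already been parsed out of the completion stream; all port names here are hypothetical, not taken from any specific Xilinx core. It tests the tag MSB (tag >= 128), which is the hardware-friendly version of the split the question describes:

// Route each completion beat to one of two RX channels using the
// tag MSB: tag >= 128 (bit 7 set) selects channel 1, else channel 0.
module cpl_demux (
  input  wire         clk,
  input  wire         cpl_valid,   // completion beat valid
  input  wire [7:0]   cpl_tag,     // tag echoed back from our MemRd
  input  wire [127:0] cpl_data,    // completion payload beat
  output reg          ch0_valid,
  output reg          ch1_valid,
  output reg  [127:0] ch_data      // shared data bus, qualified by *_valid
);
  always @(posedge clk) begin
    ch_data   <= cpl_data;
    ch0_valid <= cpl_valid & ~cpl_tag[7];
    ch1_valid <= cpl_valid &  cpl_tag[7];
  end
endmodule

Allocating tags so that each channel owns a fixed tag range also keeps outstanding-request tracking per channel.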

 

(Figure: PCIeDMA2Ch.png - block diagram of the two-channel DMA)

Xilinx Employee
Registered: 11-25-2015

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi Opal,

 

You can develop a scatter-gather PCIe DMA controller in which the DMA engine transfers data between the PCIe core and a port's data buffer or aggregation buffer on up to 2 independent channels.

 

A Descriptor Cache (DC) controller can be used to store buffer descriptors read from system memory, minimizing latency when servicing a channel. A 4K DC allows each channel to store up to 2 128-bit buffer descriptors. Access to each channel's buffer descriptors in the DC is maintained as a circular buffer with a read pointer (RPTR) and a write pointer (WPTR). When a channel is first enabled, the descriptor controller pre-fetches 2 buffer descriptors by issuing a 128-byte MRd request. The DC controller periodically refills the descriptors when the difference between the write and read pointers is less than or equal to the value held in a threshold register. Since the DC controller always fetches a block of 2 buffer descriptors, it must discard descriptors that would overwrite the location at the current RPTR.
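
A hedged sketch of the refill decision just described, with per-channel circular-buffer pointers and a threshold compare; the widths and threshold value are assumptions, not values from any Xilinx core:

// Request a new descriptor MRd when the cached level drops to the
// threshold, unless the channel is parked on a jump descriptor.
module dc_refill #(
  parameter PTR_W         = 4,  // pointer width for the per-channel depth
  parameter REFILL_THRESH = 1
) (
  input  wire           clk,
  input  wire [PTR_W:0] wptr,         // one extra bit to distinguish full/empty
  input  wire [PTR_W:0] rptr,
  input  wire           jump_pending, // set while a jump descriptor is held
  output reg            fetch_req     // request to the MRd issue logic
);
  wire [PTR_W:0] level = wptr - rptr; // descriptors currently cached
  always @(posedge clk)
    fetch_req <= (level <= REFILL_THRESH) && !jump_pending;
endmodule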

 

There can be two types of descriptors: data descriptors and jump descriptors. The descriptor controller writes only valid buffer descriptors to the cache. When the DC controller encounters a jump descriptor, it loads that descriptor into the DC and stops writing any further descriptors into the cache, even if the next one is valid. For that channel the DC controller enters a jump-pending state and no longer refills buffer descriptors based on the difference between the write and read pointers. Once the SGDMA has consumed all of the descriptors in the cache, the DC controller issues a descriptor read at the address pointed to by the jump descriptor.

 

If no descriptors are available, the descriptor block should be terminated with an invalid descriptor. When the DC controller encounters an invalid descriptor during a descriptor block read, it discards that descriptor and everything after it and enters a resume-pending state. Once all descriptors have been consumed, the DC controller stays pending until system memory is refilled with valid descriptors and the resume bit is set in the register maintained to monitor the status of each PCIe channel.
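
The jump-pending and resume-pending behaviour in the last two paragraphs amounts to a small per-channel state machine. This is an illustrative reconstruction with invented signal names, not the actual controller:

// Per-channel descriptor-cache controller states: normal refill,
// jump pending (stop refilling, re-fetch at the jump address once the
// cache drains), and resume pending (wait for the resume bit).
module dc_channel_fsm (
  input  wire       clk,
  input  wire       rst,
  input  wire       got_jump_desc,  // jump descriptor loaded into the DC
  input  wire       got_invalid,    // invalid descriptor seen in a block read
  input  wire       cache_empty,    // SGDMA has consumed all cached descriptors
  input  wire       resume_bit_set, // software refilled memory and set resume
  output reg  [1:0] state
);
  localparam ST_RUN         = 2'd0,
             ST_JUMP_PEND   = 2'd1,
             ST_RESUME_PEND = 2'd2;
  always @(posedge clk) begin
    if (rst) state <= ST_RUN;
    else case (state)
      ST_RUN:         if (got_jump_desc)    state <= ST_JUMP_PEND;
                      else if (got_invalid) state <= ST_RESUME_PEND;
      ST_JUMP_PEND:   if (cache_empty)      state <= ST_RUN; // fetch at jump address
      ST_RESUME_PEND: if (resume_bit_set)   state <= ST_RUN;
      default:        state <= ST_RUN;
    endcase
  end
endmodule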

 

You can maintain a socket-manager register that represents both channels. Hardware periodically takes a snapshot of this register to service the channels; a channel's direction is irrelevant to this operation. A sketch follows the list below.
• Channels are serviced in ascending order.
• One request is serviced for each pending channel.
• The next snapshot is taken after all pending channels have been serviced.
• Channels that do not have system-memory buffer descriptors available are skipped.
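
A sketch of the snapshot service loop in the list above. The pending and desc_avail inputs stand in for the socket-manager register and the per-channel descriptor status; everything here is illustrative:

// Capture a snapshot of pending channels (skipping those without
// system-memory descriptors), then retire them in ascending order,
// one request per channel; re-snapshot only when the round is done.
module socket_snapshot #(parameter NCH = 2) (
  input  wire           clk,
  input  wire           rst,
  input  wire [NCH-1:0] pending,    // socket-manager register bits
  input  wire [NCH-1:0] desc_avail, // channel has descriptors in memory
  input  wire           req_done,   // current channel's request issued
  output reg  [NCH-1:0] snapshot    // lowest set bit = channel to serve next
);
  always @(posedge clk) begin
    if (rst)                snapshot <= {NCH{1'b0}};
    else if (snapshot == 0) snapshot <= pending & desc_avail;    // new round
    else if (req_done)      snapshot <= snapshot & (snapshot-1); // clear lowest bit
  end
endmodule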

 

With this approach the channels are handled separately, and since out-of-order completions occur only within a single memory read, you can associate each memory read with its channel through the descriptor cache controller and steer the completions of each MemRd request to that channel.

 

Thanks,

Sethu 

Xilinx Employee
Registered: 11-25-2015

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi Opal,

 

I would rather suggest the most tested/standard approach per the PCIe protocol, which is to enable multiple Traffic Classes.

 

The Xilinx Gen3 PCIe core also supports multiple TCs.

 

By default only TC0 is enabled. The example design is TC0-only; it is not written to support multiple TCs.

 

There are a few things you need to do to enable multiple TCs:

 

1) Set the TC/VC capability in the customization GUI. There is an "Enable VC Capability" option under the Extd. Capabilities-2 tab in the PCIe customization GUI.


2) In the Configuration Space you now have the PCIe Extended Capability - VC (offset 'h3C0). Your root-port logic will need to program the Port VC Control and VC Resource Control Register 0 to enable the TC/VCs and their mappings (a hedged sketch follows this list).


3) The example design at the Endpoint side will still handle only TC0, so you will need to add new code if you are going to send TLPs with a non-zero TC. It should be a small modification to the existing example design (outside the IP) to put the 3-bit TC value in the TLP header. You will probably edit the pio_tx_engine module: it forms the TLP header there, and you will find the TC field hard-coded to 0. You can add a small piece of logic to send a different TC value as needed (see the sketch below).
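
For step 2, here is a hedged sketch of the root-port programming, written against the simulation root-port model from the Xilinx example testbench (its TSK_TX_TYPE0_CONFIGURATION_WRITE task). The byte offsets assume the VC capability sits at 'h3C0 as stated above, with Port VC Control at +'h0C and VC Resource Control 0 at +'h14 per the PCIe base-spec layout; verify them against your core before use.

// Testbench snippet: enable VC0 and map all eight TCs onto it.
// VC Resource Control 0 ('h3C0 + 'h14): bit 31 = VC Enable,
// bits [7:0] = TC/VC Map (the TC0 bit is hardwired to 1).
board.RP.tx_usrapp.TSK_TX_TYPE0_CONFIGURATION_WRITE(12'h3D4, 32'h8000_00FF, 4'hF);
// Port VC Control ('h3C0 + 'h0C) only needs programming if you use a
// VC arbitration table; the defaults are fine for a single-VC setup.

For step 3, this is an illustrative (not verbatim) view of the header dword that pio_tx_engine assembles. The field layout is the standard 3DW MemRd header; chan_sel, len_dw, and the two-channel TC mapping are invented for this sketch.

// First header dword of a 3DW MemRd TLP. The example design drives
// the TC field (bits [22:20]) with a hard-coded 3'b000; feeding it
// from a per-channel signal gives each channel its own traffic class.
wire        chan_sel;                            // assumed: channel select from the DMA
wire [9:0]  len_dw;                              // assumed: request length in dwords
wire [2:0]  tlp_tc = chan_sel ? 3'b001 : 3'b000; // assumed two-channel TC mapping
wire [31:0] tlp_hdr_dw0 = { 1'b0,      // R
                            2'b00,     // Fmt: 3DW header, no data
                            5'b00000,  // Type: memory request
                            1'b0,      // R
                            tlp_tc,    // TC  <- replaces the constant 0
                            4'b0000,   // R
                            1'b0,      // TD
                            1'b0,      // EP
                            2'b00,     // Attr
                            2'b00,     // AT
                            len_dw };  // Length[9:0]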

 

This approach is more feasible than the one mentioned above.

 

Thanks,

Sethu

Adventurer
Registered: 03-30-2012

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi Sethu,

 

Thanks for the reply. As of now, scatter/gather is out of scope for us, as we only want to transfer two large blocks of data from system memory; block streaming is more suitable.

 

Regarding the TC/VC capability, could you give more detail on the sequence needed to set it up? I guess this should be handled by the drivers, right?

Xilinx Employee
Registered: 11-25-2015

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi Opal,

 

Your RP model has to send TLPs with multiple traffic classes, so you will also need the related driver changes at the top level.

Once you have done that, your RP model can send multiple TCs to the Xilinx Endpoint.

 

On the endpoint side, as I mentioned above, you have to take care of the PCIe GUI option, the PCIe configuration-space programming, and the edit in the pio_tx_engine module to make it respond to multiple TCs.

 

Thanks,

Sethu

Xilinx Employee
Registered: 11-25-2015

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi @opal-rt_fpga_grp,

 

Are you moving ahead with multiple traffic classes or with scatter-gather DMA?

 

If so, were you able to enable the TC/VC capability for multiple traffic classes properly, and does it work with the Xilinx Endpoint?

 

Thanks,

Sethu

 

 

Adventurer
Registered: 03-30-2012

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi Sethu,

 

Thanks for your reply.

 

No, we have not tried the TC/VC feature yet. We used the Tag field instead for our two-channel DMA: the tag sorts the completion data so that it can be dispatched to the right RX channel.

 

Eventually we will increase the number of channels to 16; at that point we will need to enable the TC/VC feature.

 

We know how to enable TC/VC in the endpoint (see Fig 1 below). But what about the root-port side? Is this feature enabled by default, or do we have to do something in the BIOS or in the root port's configuration space?

 

Fig 1: Enable TC/VC in the Endpoint (screenshot: TC_VC.png)

Xilinx Employee
Registered: 11-25-2015

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi @opal-rt_fpga_grp,

 

Your RP model has to send TLPs with multiple traffic classes, so you will also need the related driver changes at the top level.

Once you have done that, your RP model can send multiple TCs to the Xilinx Endpoint.

 

On the endpoint side, as I mentioned above, you have to take care of the PCIe GUI option, the PCIe configuration-space programming, and the edit in the pio_tx_engine module to make it respond to multiple TCs.

 

 

Thanks,

Sethu

Xilinx Employee
Registered: 11-25-2015

Re: PCIe - Multi-Channel DMA - Completion Data handling

Hi @opal-rt_fpga_grp

 

Have you enabled TC/VC on the endpoint and RP sides? Did my suggestions help you with multiple Traffic Classes?

 

Thanks,

Sethu

Observer bdixon007
Registered: 08-20-2014

Re: PCIe - Multi-Channel DMA - Completion Data handling

I have an UltraScale PCIe EP and am having difficulty understanding the TCn/VC0 traffic-class filtering approach. I was hoping to implement VC0 and VC1 for PF0, or VC0 for PF0 and VC1 for PF1, but it seems only VC0 is available for PF0. I do get the VC Extended Capability structure, but I don't see how to take advantage of my Root Port, which implements VC0 and VC1, since only VC0 is present.

 

Any help would be appreciated with:

 

1. Understanding what TCn/VC0 traffic-class filtering achieves.

2. Understanding how to run with a root port supporting VC0 and VC1.

 

Thanks, Bob.
