Voyager
Registered: ‎05-14-2017

PCIe Gen3 TLP format in PG023 differs from PG054 and PCIe spec revision 3.0

The TLP format on page 106 of PG023 differs from the PCIe spec Revision 3.0.

Does anyone know if page 106 really represents something other than the PCIe TLP format?

 

The other question I have: I designed the PCIe block for a Kintex FPGA using the Gen2 IP, and I am porting the design to a Virtex-7. But the Virtex-7 PCIe block is Gen3, and the interface seems very different in terms of signaling; it actually has four AXI4 buses instead of two. This is a surprisingly big change, and now I notice the TLP format is also different.

 

Does anyone know why there is such a big difference?

 

 

Xilinx Employee
Registered: ‎12-10-2013

Hi @tchin123,

 

You are correct.  The Integrated Block was significantly redesigned between the Gen2 and Gen3 families, including a change to the user interface.  There were many reasons for the format and user-interface changes, but they were primarily driven by customer feedback.

 

In the Gen2 cores (PG054 - 7 Series Integrated Block), the user needed a full understanding of PCIe and had to keep track of tags, timeouts, Requester/Completer IDs, and more.   As we moved toward Gen3 and SR-IOV (among other more advanced capabilities), much more of the tracking and configuration logic was moved inside the block.   To minimize the amount of data a user had to present at the AXI-S interfaces, we went to a custom "Descriptor" format.   Many customers found these descriptors desirable at least partly because the address is the first thing decoded on the bus, which allowed them to do better routing before parsing further into the packet.  (You'll notice the descriptor format is more intuitive for backend routing than a traditional PCIe TLP.)
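To illustrate the point, here is a rough C sketch of address-first routing. The struct below only approximates the PG023 CQ descriptor; the field positions are illustrative, so consult PG023 for the actual bit layout.

```c
#include <stdint.h>

/* Simplified model of an address-first descriptor, loosely patterned on
 * the PG023 completer request (CQ) descriptor.  Field positions here are
 * illustrative only -- see PG023 for the real bit layout. */
typedef struct {
    uint64_t addr;  /* dwords 0-1: target address, first on the bus */
    uint32_t dw2;   /* dword 2: dword count, request type, ...      */
    uint32_t dw3;   /* dword 3: BAR ID, target function, ...        */
} cq_desc_t;

/* Because the address arrives first, a front-end router can steer the
 * request toward the right backend before parsing anything else. */
static int route_by_address(uint64_t addr)
{
    return (addr < 0x10000000ULL) ? 0   /* e.g. control registers */
                                  : 1;  /* e.g. DMA memory window */
}
```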

 

As for the four user interfaces: what most folks found on migration (circa 2013) was that they were already splitting the 7-series TX/RX streams into two types of traffic, and they were actually able to remove logic when hooking up to the four interfaces.   The interfaces are now presented as TX/RX pairs based on the initiator of the transaction, rather than as mixed TX/RX streams.

 

While these are described (with a migration guide) in PG023 and PG156, here are the basics:

 

CQ/CC - The CQ (Completer Request) interface is a receive interface where incoming PCIe requests from the link partner are presented.  For non-posted packets (reads and a few others), the user receives the request on CQ and responds with a completion on the CC (Completer Completion) interface, a transmit queue.   If the core is used as an endpoint, these are incoming requests from the root and outgoing completions.

 

RQ/RC - The RQ (Requester Request) interface is the transmit interface where the user initiates requests to the link partner.  The completions come back from the link on the RC (Requester Completion) interface.

 

The majority of folks were already splitting their traffic up this way, and were having to decode packets on the RX side and interleave them on the TX side.  Much of this is now handled internally by the core, and the core can help in areas like tracking completion timeouts, tag matching, PCIe ordering rules, and more.
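To make the pairing concrete, here is a minimal C model of the four streams. The CQ/CC/RQ/RC names match PG023, but every type and function below is a stand-in invented for illustration, not an actual API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the four Gen3 AXI4-Stream interfaces.  Only the pairing
 * and direction mirror the real core; the types are hypothetical. */
typedef struct { uint64_t addr; uint16_t dwords; bool is_read; } request_t;
typedef struct { uint16_t dwords; /* payload would follow */ } completion_t;

/* Stubs standing in for the actual stream transactors. */
static bool cq_recv(request_t *r)          { (void)r; return false; }
static void cc_send(const completion_t *c) { (void)c; }
static void rq_send(const request_t *r)    { (void)r; }
static bool rc_recv(completion_t *c)       { (void)c; return false; }

/* Completer side: the link partner initiates, we respond. */
static void completer_loop(void)
{
    request_t req;
    while (cq_recv(&req)) {                  /* CQ: incoming request    */
        if (req.is_read) {
            completion_t cpl = { .dwords = req.dwords };
            cc_send(&cpl);                   /* CC: outgoing completion */
        }
        /* Posted writes need no completion. */
    }
}

/* Requester side: we initiate, the link partner responds. */
static void requester_read(uint64_t addr, uint16_t dwords)
{
    request_t req = { .addr = addr, .dwords = dwords, .is_read = true };
    rq_send(&req);                           /* RQ: outgoing request    */
    completion_t cpl;
    (void)rc_recv(&cpl);                     /* RC: incoming completion */
}
```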

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Voyager
Registered: ‎05-14-2017

Thanks for the reply. It seems there are a lot of changes I need to make to the Gen2 design to make it compatible with the Virtex-7 Gen3 design.

 

But is Figure 4-8 in PG023 supposed to represent the TLP format? This is the data on the CC_TDATA bus.

It doesn't even resemble the TLP format from the PCIe spec revision 3.0.

My original Gen2.0 design keyed off the first 3 DW of the header to decode all the necessary fields and determine how to handle the various types of TLP, but now the first 3 DW are no longer in that field ordering, and some of the fields are missing. I also noticed that some of the header fields are presented on cq_tuser instead; is this correct?

 

Isn't it a lot more intuitive to decode the first 3-4 DW of the header as-is, per the PCIe spec, and then proceed to handle the payload? That is how I handle it in my Kintex Gen2.0 design. It seems it will now take a lot of massaging to port it to the Virtex FPGA.
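For reference, this is the kind of spec-aligned decode being described here: a small C sketch that pulls the common fields out of a classic 3DW/4DW memory-request header. The bit positions follow the PCIe Base Specification; the struct itself is just for illustration.

```c
#include <stdint.h>

/* Spec-aligned decode of a classic PCIe memory-request header, the way a
 * Gen2 (PG054-style) design sees it.  dw[n] holds header dword n, with
 * byte 0 of the TLP in bits [31:24] of dw[0]. */
typedef struct {
    uint8_t  fmt;     /* dw0[31:29]: 3DW/4DW header, with/without data */
    uint8_t  type;    /* dw0[28:24]: MRd, MWr, Cpl, ...                */
    uint16_t length;  /* dw0[9:0]:   payload length in dwords          */
    uint16_t req_id;  /* dw1[31:16]: requester ID                      */
    uint8_t  tag;     /* dw1[15:8]                                     */
    uint64_t addr;    /* dw2 (3DW) or dw2:dw3 (4DW)                    */
} tlp_hdr_t;

void decode_mem_tlp(const uint32_t dw[4], tlp_hdr_t *h)
{
    h->fmt    = (dw[0] >> 29) & 0x7;
    h->type   = (dw[0] >> 24) & 0x1f;
    h->length =  dw[0]        & 0x3ff;
    h->req_id = (dw[1] >> 16) & 0xffff;
    h->tag    = (dw[1] >> 8)  & 0xff;
    if (h->fmt & 0x1)  /* low fmt bit set: 4DW header, 64-bit address */
        h->addr = ((uint64_t)dw[2] << 32) | (dw[3] & ~0x3u);
    else               /* 3DW header, 32-bit address */
        h->addr = dw[2] & ~0x3u;
}
```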

 

 

Xilinx Employee
Registered: ‎08-02-2007

 

No, it does not follow the TLP format in the spec.

Yes, cq_tuser carries some of the information from the old format.

All new devices with the Gen3 IP use the new format.

 

 

------------------------------------------------------------------------------
Don't forget to reply, give kudos, and accept as solution
------------------------------------------------------------------------------
Voyager
Registered: ‎05-14-2017

I have just started learning about the differences between using the Gen2 and Gen3 PCIe cores for FPGA design, so I am not yet familiar with all of them.

 

Is it very messy to port the existing Gen2 PCIe design over to the Virtex-7, which uses the Gen3 IP, or is it better to start from scratch with a Gen3 IP design?

 

 

 

Xilinx Employee
Registered: ‎12-10-2013

We've had folks go both directions successfully.

 

If you were already assembling all the fields of the header and then concatenating them, you can simply re-assign the ones needed to their spots in tdata and tuser.  The community here might be able to provide more of a feel for user implementations.  In the Root Port model provided with the endpoint example design (in usrapps), you can see that much of the code is just a shim onto what was pre-existing.
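As a sketch of such a shim, the function below re-packs fields a Gen2-era design already computes into descriptor dwords. The destination bit positions are placeholders rather than the verified layout; the descriptor tables in PG023 are the authority.

```c
#include <stdint.h>

/* Hypothetical shim: re-pack header fields a Gen2-era design already has
 * into a Gen3-style descriptor.  The bit positions below are placeholders,
 * not the verified PG023 layout -- check the descriptor tables in PG023. */
void pack_descriptor(uint64_t addr, uint16_t dwords, uint8_t req_type,
                     uint32_t desc[4])
{
    desc[0] = (uint32_t)addr & ~0x3u;     /* dword 0: address low        */
    desc[1] = (uint32_t)(addr >> 32);     /* dword 1: address high       */
    desc[2] = ((uint32_t)req_type << 11)  /* request type (placeholder)  */
            | (dwords & 0x7ffu);          /* dword count (placeholder)   */
    desc[3] = 0;                          /* IDs, attributes, etc.       */
}
```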

 

Ultimately most folks end up finding the Gen3 interface easier to use once the initial port is done. 

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Visitor
Registered: ‎05-21-2012

I know this thread is old, but I am replying anyway, sharing in the frustration.

Count me in the camp that prefers the Gen2 TLP format, which matches the PCIe spec (and is compatible with other vendors).  It is way more intuitive, and if one is slinging TLPs, one should be intimate with the spec anyhow.  Splitting out the master and slave request/completion interfaces is neat, but not at the expense of mangling a well-defined interface.  Talk about forcing vendor lock-in and a lot of rework to adapt previous designs to the Gen3 HIP.

My biggest gripe with the Gen3 format right now is the lack of an equivalent field indicating the 3DW or 4DW request packet format (32-bit/64-bit addressing/BARs; PCIe spec, byte 0, bits 6:5).

Explorer
Registered: ‎08-14-2013

I have some gripes with the descriptor format as well.  Transferring the address first is incredibly annoying: with a 64-bit interface you have to store the first cycle before you can see any of the other fields, namely the function number and BAR fields, which makes a simple routing shim that splits traffic based on the BAR or function higher in complexity and latency.  These fields really should have been accessible in the first cycle.  But I suppose it's not that big a deal, because you get the whole descriptor in the same cycle once you move to a 128-bit or wider interface.
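A toy C model of that complaint, assuming hypothetical field positions (the real ones are in the PG023 descriptor tables): on a 64-bit interface the first beat carries only the address, so a BAR/function router must buffer it and can only decide on the second beat.

```c
#include <stdint.h>

/* Toy model of the buffering problem: on a 64-bit interface the routing
 * fields arrive on the *second* beat, so the first beat must be stored,
 * adding a cycle of latency to any BAR/function routing shim. */
typedef struct {
    uint64_t saved_addr;   /* beat 0, held until we can decide */
    int      have_beat0;
} router_t;

/* Returns -1 until a routing decision is possible. */
int router_push_beat(router_t *r, uint64_t beat)
{
    if (!r->have_beat0) {
        r->saved_addr = beat;  /* beat 0: address only, cannot route yet */
        r->have_beat0 = 1;
        return -1;
    }
    r->have_beat0 = 0;
    /* Beat 1 carries dword count, BAR ID, function, etc.  The shift and
     * mask below are placeholders, not the real PG023 bit positions. */
    return (int)((beat >> 40) & 0x7);  /* hypothetical BAR ID field */
}
```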

Why would you need to know 3DW vs. 4DW?  Can't you just check whether all of the high-order address bits are zero, if you really need to know?  Personally, that's one of the biggest advantages of the descriptor format: you don't have to shift the payload by different amounts depending on whether the address is 32 or 64 bits.
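The two checks being debated here and in the next reply, side by side in a short sketch (note that the PCIe spec requires requesters to use the 3DW format for addresses below 4 GB, so the inference is usually sound for compliant traffic):

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic TLP header: one bit of the fmt field answers 3DW vs. 4DW
 * directly (the low fmt bit set means a 4DW header). */
static bool is_4dw_from_fmt(uint8_t fmt)  { return (fmt & 0x1) != 0; }

/* Address-first descriptor: the format must be inferred from the address.
 * The PCIe spec mandates the 3DW format for addresses below 4 GB, so for
 * compliant requesters this inference usually matches. */
static bool looks_32bit(uint64_t addr)    { return (addr >> 32) == 0; }
```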

Visitor
Registered: ‎05-21-2012

I, too, am currently working with the 64-bit interface, as I am porting a fairly vendor-agnostic EP design over to a Xilinx device with the Gen3 HIP.  I basically just had to develop adapter modules to convert the CQ and CC descriptors to my more generic TLP format.

Given my investment in the standard TLP format, that is why the two missing bits (which, along with the type field, would tell me 3DW or 4DW) and any others annoy me.  The "new" descriptor format simplifies some things at the expense of knowing everything available in a standard TLP.  I'd rather check 7 bits (fmt/type) for 3DW/4DW than 32 bits of upper address.  As you mentioned, with the 64-bit descriptor you don't know anything other than the address until the second beat of the transfer.  Dword shifting based on 3DW or 4DW format is certainly annoying as well, but fairly easy to deal with.  A 64-bit address is pretty useless, IMHO, unless you actually need 64 bits of address wherever you are sending data to or fetching data from, which I imagine is still less common than 32-bit addresses/BARs.  So always including it adds an extra dword to every transfer, whether it is needed or not.

Clock rate and timing closure may force me to the 128-bit interface in the future, but that really only gives me the same incomplete info on the first beat of the transfer.

The biggest issue with the new descriptor format is that it affects design portability and reuse through vendor lock-in.  As engineers using good design practices, we should all be opposed to this.

 
