upside_down
Visitor
467 Views
Registered: ‎11-06-2019

Synchronize a counter on CPU register with another counter on FPGA register using PCIe

We are planning to synchronize a counter on the CPU with a register on the FPGA (as precisely as possible) to measure FPGA-to-CPU datapath latency. We have a Virtex UltraScale+ (VU9P) device with a PCIe hard block (called the PCIE4 block). We generated the IP in PCIe Gen3 x8 mode with a 500MHz core clock frequency. The reference clock of the core is the clock coming through the PCIe connector (i.e., the master device's reference clock).

To synchronize those registers, we first continuously write the current value of the CPU counter to the FPGA, measure the average interval between consecutive writes (on the FPGA), and update the value on the FPGA accordingly. We then stop sending the counter value from the CPU to the FPGA and start sending the estimated CPU counter value along with the data packets going from the FPGA to the CPU.

Let DELTA be the average time interval between two consecutive writes (from CPU to FPGA), Tcpu the counter value in the CPU register, Tfpga the counter value on the FPGA, Tcpu_est the estimated CPU counter value calculated on the FPGA side, and Fcpu and Ffpga the CPU and FPGA clock frequencies (as far as I know, both are generated from the same physical clock source).

We are assuming Tcpu_est = (Tfpga + DELTA) * (Fcpu/Ffpga) and adding it to each packet sent to the CPU. When we receive the packet on the CPU side we compute Tcpu - Tcpu_est (the estimate inside the packet, calculated just before sending) to get the latency.
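The estimation arithmetic above can be sketched in Python (the frequency values below are made-up illustrative numbers, not the actual system's; the function names are ours):

```python
# Illustrative frequencies only -- substitute your real counter clocks.
F_CPU = 3.0e9     # assumed CPU counter frequency (Hz)
F_FPGA = 500e6    # assumed FPGA counter frequency (Hz)

def estimate_cpu_counter(t_fpga_ticks, delta_ticks):
    """Tcpu_est = (Tfpga + DELTA) * (Fcpu / Ffpga), with Tfpga and
    DELTA both expressed in FPGA clock ticks."""
    return (t_fpga_ticks + delta_ticks) * (F_CPU / F_FPGA)

def measured_latency_ticks(t_cpu_now, t_cpu_est):
    """Latency in CPU ticks: the counter read at reception minus the
    estimate embedded in the packet just before it was sent."""
    return t_cpu_now - t_cpu_est
```

Note that any error in DELTA, or drift between Fcpu and Ffpga, shows up directly as a bias in the measured latency.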

  • Is it a reliable method?
  • Are there problems that should be avoided in this method?
  • How can we improve precision?
  • Are there better methods to do this task?

If the question is not clear, you are always welcome to ask for more details.

7 Replies
drjohnsmith
Teacher
450 Views
Registered: ‎07-09-2009

Maybe, with a lot of averaging, it might give you an indication of PCIe bus delay.

You're sending data from a CPU,

      so how do you know what the jitter on the time the data is sent is?

       How do you know if the data is being sent in packets by DMA, say, or as individual atomic writes?

        How are you removing the PCIe overhead on the first write?

             i.e. if PCIe writes 8 words with one overhead, or 64 words with one overhead, you will get very different per-word times.

Then, for data coming back to the CPU,

   how are you going to account for the jitter in the CPU timestamping?

      after all, it has to go through an interrupt whose latency is variable, depending on what's happening,

        and if the CPU is throttling its clock, interrupt times can vary even more.

 

And on top of this, you have the CPU doing background tasks, affecting latency, and PCIe bus contention to cope with.

 

Really not certain you can ever get a word-to-word latency / delay number.

 

The best that I can see is to time the transfer of a relatively large block of data to get a best-case data rate.

    Use the PCIe 100 MHz clock at the card to measure the period between the start and end of the transfer.

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
upside_down
Visitor
440 Views
Registered: ‎11-06-2019

Thanks for the comment!

Unfortunately, we can't predict the jitter, but we are sure that we are receiving the data as 2-word (64-bit) MWr TLPs from the CQ interface of the hard IP.

We are not using interrupts; we use something similar to a ring buffer. We send a MWr TLP to the ring buffer memory and, immediately after sending the packet, send the head value (as another MWr) to a dedicated field. A CPU core isolated from other tasks continuously monitors the head value; when it changes, the core reads the packet, compares its counter value to the one sent from the FPGA, records the difference, and checks the head again.
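A pure-software model of that polling scheme (no real PCIe; the names, packet layout, and ring size are all illustrative) would look roughly like this:

```python
# Model of the scheme: the "FPGA" appends a packet and then publishes
# the head index; the isolated CPU core spins on the head and records
# Tcpu - Tcpu_est when it changes. Sizes and names are made up.
RING_SIZE = 256
ring = [None] * RING_SIZE
head = 0  # written last, mirroring the packet-then-head MWr ordering

def fpga_write(packet):
    """FPGA side: MWr the packet into the ring, then MWr the new head."""
    global head
    ring[head % RING_SIZE] = packet
    head += 1  # publishing the head makes the packet visible

def cpu_poll(last_head, t_cpu_now):
    """Isolated-core side: on a head change, read the newest packet and
    return the counter difference; otherwise report nothing new."""
    if head == last_head:
        return last_head, None
    pkt = ring[(head - 1) % RING_SIZE]
    return head, t_cpu_now - pkt["t_cpu_est"]
```

The ordering matters: the head must only be updated after the packet write, otherwise the poller can read a stale slot.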

upside_down
Visitor
439 Views
Registered: ‎11-06-2019

I am getting that 100MHz clock from the PCIe bus and using it to generate the clock for the counter, but I'm still not sure whether that is the same clock the CPU uses to generate its own internal clocks. How can I be sure about it?

drjohnsmith
Teacher
373 Views
Registered: ‎07-09-2009

The CPU may or may not have its internal clock synchronised to the 100 MHz,

  but the CPU's PCIe interface has to be synchronised to the PCIe 100 MHz.

      That does not mean it's on the 100 MHz clock edge; just that all PCIe transactions must be synchronised to the PCIe clock, even if they run at, say, 33 MHz or whatever.

    Change the load on the CPU and the latency will change.

IMHO, there are just too many buffers / clock-crossing stages between the CPU getting the signal to send a lump and that lump getting onto the PCIe bus for this measurement to mean much.

With a scope on the PCIe bus signals, you might be able to see the burst of data,

    then you can measure the jitter on getting packets onto the bus.

     

 

markcurry
Scholar
355 Views
Registered: ‎09-16-2009

Sounds like an interesting project.  I'm sure you'll learn a lot.  What sort of accuracy are you looking for?  Also what is your PCIE host device?

I'll just add some notes. 

  • PCIE was designed with high-bandwidth as a design goal.  Fixed latency was not, perhaps, even a tertiary concern.  In my experience there's a LOT of variation in time-of-flight for various PCIE transactions.  And these also vary a lot given (even minor changes) in the PCIE hosts.
  • Cache architectures on the Host side get in the way too.  Even if the address space you're writing to is marked "non-cacheable", the latency of the transfer will vary because of "other" things happening within the host cache.
  • Also don't forget that the 100MHz PCIE reference clock is nominally spread spectrum.  This will have effects on your jitter as well.

To be fair, without knowing all your requirements, I'd tend to design all the time-critical stuff on the FPGA, and just have the CPU read the results.  You have much more control this way.  It's not clear from your requirements what "events" you're trying to sync - but it's easiest if both "events" are on the same device (FPGA or CPU), and do any sort of synchronous activity all in that one place.

Regards,

Mark

avrumw
Expert
302 Views
Registered: ‎01-23-2009

There are several protocols that are already doing things like this - trying to synchronize the timebase between two independent time counters that are separated by a connection mechanism of long and (at least somewhat) variable latency. Most of these are over Ethernet (rather than PCIe), but the techniques they use should be adaptable. For example:

  • The Precision Time Protocol (IEEE 1588)
  • The Ethernet SOAM (ITU-T Y.1731) ETH-DMM messages

I know that these generally work by exchanging messages back and forth, inserting local-timebase timestamps in the messages as close to the link as possible:

  • The initiator sends a message with a timestamp from its local clock inserted as the last step before transmitting it.
  • The receiver records the time of reception of the message with its local clock as soon as the message is received.
  • The receiver then creates a reply message which contains the above two timestamps, and inserts a third timestamp from its local clock into this message as the last step before transmitting it.
  • The initiator records its local clock as soon as the reply message has been received.

With these 4 timestamps, the initiator can determine the skew between the two local timebases as well as the message latency - I don't remember how this is communicated to the receiver... But all of this is documented in these specs.
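The standard calculation those specs use, with the four timestamps from the steps above (t1: initiator send, t2: receiver receive, t3: receiver send, t4: initiator receive), can be sketched as follows; it assumes equal latency in both directions, which a PCIe link may not strictly guarantee:

```python
def offset_and_delay(t1, t2, t3, t4):
    """IEEE 1588-style two-way exchange: returns (receiver clock offset
    relative to the initiator, one-way path delay), assuming the
    forward and return paths have symmetric latency."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay
```

Any asymmetry between the two directions goes straight into the offset estimate, which is why hardware timestamping as close to the link as possible matters so much.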

Avrum

drjohnsmith
Teacher
217 Views
Registered: ‎07-09-2009

If I remember, IEEE 1588 gets down to about 10 ns absolute accuracy, though the jitter is much less.

    
