06-08-2021 05:19 AM - edited 06-08-2021 05:23 AM
We are planning to synchronize a counter on the CPU with a register on the FPGA (as precisely as possible) to measure FPGA-to-CPU datapath latency. We have a Virtex UltraScale+ (VU9P) device with a PCIe hard block (the PCIE4 block). We generated the IP in PCIe Gen3 x8 mode with a 500 MHz core clock. The reference clock of the core is the clock coming in over the PCIe connector, i.e. the master device's reference clock.
To synchronize those registers, we first continuously write the current value of the CPU counter to the FPGA, measure the average interval between consecutive writes (on the FPGA), and update the FPGA's value accordingly. Then we stop sending the counter value from the CPU and start sending the estimated CPU counter value together with the data packets going from the FPGA to the CPU.
Let DELTA be the average time interval between two consecutive writes (from CPU to FPGA), Tcpu the counter value in the CPU register, Tfpga the counter value on the FPGA, Tcpu_est the estimated CPU counter value calculated on the FPGA side, and Fcpu and Ffpga the CPU and FPGA clock frequencies (as far as I know both are generated from the same physical clock source).
We compute Tcpu_est = (Tfpga + DELTA) * (Fcpu / Ffpga) and add it to the packet being sent to the CPU. When we receive the packet on the CPU side we compute Tcpu - Tcpu_est (the value inside the packet, calculated just before sending) to get the latency.
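In C-like pseudocode the arithmetic looks roughly like this (a sketch only; the names and the Q32.32 fixed-point scaling are mine for illustration, the actual estimate is of course computed in FPGA logic):

    #include <stdint.h>

    /* Precompute Fcpu/Ffpga once as a Q32.32 fixed-point ratio
     * (uses the GCC/Clang unsigned __int128 extension to avoid overflow). */
    static inline uint64_t ratio_q32(uint64_t f_cpu_hz, uint64_t f_fpga_hz)
    {
        return (uint64_t)(((unsigned __int128)f_cpu_hz << 32) / f_fpga_hz);
    }

    /* Tcpu_est = (Tfpga + DELTA) * (Fcpu / Ffpga) */
    static inline uint64_t estimate_cpu_counter(uint64_t t_fpga, uint64_t delta,
                                                uint64_t ratio)
    {
        return (uint64_t)(((unsigned __int128)(t_fpga + delta) * ratio) >> 32);
    }

    /* At the CPU: latency in CPU ticks = Tcpu at receive - Tcpu_est from the packet. */
    static inline int64_t latency_ticks(uint64_t t_cpu, uint64_t t_cpu_est)
    {
        return (int64_t)(t_cpu - t_cpu_est);
    }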
If the question is not clear, you are always welcome to ask for more details.
06-08-2021 05:34 AM
Maybe with a lot of averaging it might give you an indication of the PCIe bus delay.
You're sending data from a CPU,
so how do you know what the "jitter" on when the data is sent is?
How do you know if the data is being sent in "packets" by DMA, say, or as individual atomic writes?
How are you removing the PCIe overhead on the first write?
i.e. if PCIe writes 8 words with one overhead, or 64 words with one overhead, you will have very different word times.
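To put numbers on it, assuming roughly 24 bytes of TLP header/framing/LCRC overhead per memory write (an illustrative figure, not a measured one):

    #include <stdio.h>

    /* Rough illustration: bytes on the wire per payload byte, for different
     * numbers of 64-bit words sharing one TLP's overhead. */
    int main(void)
    {
        const double overhead_bytes = 24.0;          /* assumed per-TLP overhead */
        const int words_per_tlp[] = { 1, 8, 64 };    /* 64-bit words in one TLP */

        for (unsigned i = 0; i < sizeof words_per_tlp / sizeof words_per_tlp[0]; i++) {
            double payload = 8.0 * words_per_tlp[i];
            printf("%2d words/TLP -> %.2f wire bytes per payload byte\n",
                   words_per_tlp[i], (payload + overhead_bytes) / payload);
        }
        return 0;
    }

With those assumptions a single-word write costs roughly 4x its payload on the wire, while a 64-word burst costs only a few percent extra, so the effective per-word time is very different.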
Then, for data coming back to the CPU,
how are you going to account for the "jitter" in the CPU time stamping?
After all, it has to go through an interrupt whose latency is variable, depending on what's happening,
and if the CPU is throttling itself, interrupt times can vary even more.
And on top of this, you have the CPU doing background tasks, affecting latency, and PCIe bus contention to cope with.
I'm really not certain you can ever get a word-to-word latency / delay number.
The best I can see is to time sending a relatively large block of data to get a best-case data rate.
Use the PCIe 100 MHz clock at the card to measure the period between the start and end of the transfer.
06-08-2021 06:09 AM
Thanks for the comment!
Unfortunately, we can't predict the jitter, but we are sure that we are receiving the data as 2-word (64-bit) MWr TLPs on the CQ interface of the hard IP.
We are not using interrupts; we are using something similar to a ring buffer. We send an MWr TLP to the ring-buffer memory and send the head value (as another MWr) to a dedicated field immediately after sending the packet. A CPU core that is isolated from other tasks continuously monitors the head value; when it changes, the core reads the packet, compares its own counter value to the one sent from the FPGA, records the difference, and checks the head again.
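In rough C, the polling loop on the isolated core looks like this (a sketch only; the packet layout, names, and ring size are placeholders, the CPU counter is assumed to be the TSC, and volatile stands in for the memory barriers / DMA-coherent mapping the real code needs):

    #include <stddef.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc() */

    /* Hypothetical ring-buffer entry: the FPGA puts its Tcpu_est in each packet. */
    struct pkt {
        uint64_t tcpu_est;
        uint64_t payload[7];   /* pad to a 64-byte entry */
    };

    /* Busy-poll the head index the FPGA updates with its second MWr,
     * timestamp each new packet with the local counter, and record the
     * difference against the estimate the FPGA put in the packet. */
    void poll_ring(volatile uint32_t *head, volatile struct pkt *ring,
                   uint32_t ring_size, int64_t *diffs, size_t max_samples)
    {
        uint32_t tail = *head;
        size_t n = 0;

        while (n < max_samples) {
            uint32_t h = *head;                  /* index written by the FPGA */
            while (tail != h && n < max_samples) {
                uint64_t now = __rdtsc();        /* Tcpu at receive time */
                diffs[n++] = (int64_t)(now - ring[tail].tcpu_est);
                tail = (tail + 1) % ring_size;
            }
        }
    }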
06-08-2021 06:12 AM
I am getting that 100 MHz clock from the PCIe bus and using it to generate the clock for the counter. But I am still not sure whether that is the same clock the CPU uses to generate its own internal clocks. How can I be sure about it?
06-08-2021 08:01 AM - edited 06-08-2021 08:02 AM
The CPU may or may not have its internal clock synchronised to the 100 MHz.
But the CPU's PCIe interface has to be synchronised to the PCIe 100 MHz;
that does not mean it is on the 100 MHz clock edge, just that all PCIe transactions must be synchronised to the PCIe clock, even if they are at, say, 33 MHz or whatever.
Change the load on the CPU and the latency will change.
IMHO, there are just too many buffers / clock-crossing bits between the CPU getting the signal to send a lump and the lump getting onto the PCIe bus for this measurement to mean much.
With a scope on the PCIe bus signals you might be able to see the burst of data;
then you can measure the jitter on getting packets onto the bus.
06-08-2021 08:46 AM
Sounds like an interesting project. I'm sure you'll learn a lot. What sort of accuracy are you looking for? Also, what is your PCIe host device?
I'll just add some notes.
To be fair, without knowing all your requirements, I'd tend to design all the time-critical stuff on the FPGA, and just have the CPU read the results. You have much more control this way. It's not clear from your requirements what "events" you're trying to sync - but it's easiest if both "events" are on the same device (FPGA or CPU), and do any sort of synchronous activity all in that one place.
06-08-2021 01:30 PM - edited 06-08-2021 01:30 PM
There are several protocols that already do things like this - trying to synchronize the timebase between two independent time counters that are separated by a connection mechanism with long and (at least somewhat) variable latency. Most of these run over Ethernet (rather than PCIe), but the techniques they use should be adaptable. IEEE 1588 (PTP) and NTP are examples.
I know that these generally work by exchanging messages back and forth, inserting local timebase timestamps into the messages as close to the link as possible: the initiator timestamps the request when it is sent, the responder timestamps its arrival and timestamps the reply when it is sent, and the initiator timestamps the reply's arrival.
With these 4 timestamps, the initiator can determine the skew between the two local timebases as well as the message latency - I don't remember how this is communicated to the receiver... But all of this is documented in these specs.
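For reference, the usual arithmetic with those 4 timestamps is the standard two-way time-transfer calculation (sketch below; it assumes a symmetric path, and any asymmetry shows up directly as offset error):

    #include <stdint.h>

    /* t1 = request sent      (initiator clock)
     * t2 = request received  (responder clock)
     * t3 = response sent     (responder clock)
     * t4 = response received (initiator clock) */
    void offset_and_delay(int64_t t1, int64_t t2, int64_t t3, int64_t t4,
                          int64_t *offset, int64_t *one_way_delay)
    {
        *offset        = ((t2 - t1) + (t3 - t4)) / 2;  /* responder clock minus initiator clock */
        *one_way_delay = ((t2 - t1) + (t4 - t3)) / 2;  /* half of (round trip minus turnaround) */
    }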
06-09-2021 02:58 AM
If I remember, IEEE 1588 gets down to about 10 ns absolute accuracy, though the jitter is much less.