Minimizing pin to pin latency in a Xilinx device

Dear Forum,

Something of an academic inquiry, but can anyone advise on the following?

I need to capture data on a 16-bit LVDS data bus, XOR this new value with the previous value from the same bus (captured on the previous clock cycle), and then present the result of this XOR operation on a second LVDS bus as fast as possible.

What recommendations can anyone make as to how I minimize the pin-to-pin delay of this entire operation? For example, which I/O standards and settings will minimize the on-chip and off-chip I/O delays? Further, are there Xilinx families that, although older, have simpler I/O cells that actually offer faster basic I/O operation?

Regards,

DJE666

@dje666 

The signal on each bit of your 16-bit bus will go through the following circuit:

BIT_IN > IBUFDS > REG1 > REG2 > LUT1 > REG3 > OBUFDS > BIT_OUT

The three registers, (REG1, REG2, REG3) result from VHDL that looks like the following:

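    P0: process(CLK1)
    begin
        if rising_edge(CLK1) then
            REG3 <= REG1 XOR REG2;  -- XOR of the current and previous samples
            REG2 <= REG1;           -- previous sample
            REG1 <= BIT_IN;         -- capture the incoming bit (output of IBUFDS)
        end if;
    end process P0;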

So, latency between incoming bits and outgoing bits consists mostly of clocking bits through the three registers.

Am I understanding the problem correctly?

Mark

Hi Mark,

Thanks for the reply. Your understanding is correct, and thanks for the code snippet.

I coded the system differently to limit the register count to just one. It's not elegant, but the key here is to minimize the time taken on-boarding and off-boarding the data through the I/O buffers; what happens in the logic is a secondary consideration at the moment.

Thanks,

DJE666

 

[Attached image: xor-sch.png]

@dje666 

“...the key here is to minimize the time taken on-boarding and off-boarding of the data through the I/O buffers...”

We must first talk about your design, which will produce glitches on the outgoing data.  Specifically, capturing the incoming bit with a LUT and directly sending the output of the LUT to BIT_OUT will cause glitches.

I do not think your design can have less than the three registers that I showed in my first reply.

That is, you first need an architecture that will properly capture (in REG1) the incoming bit and pass timing analysis.  For you, this architecture is probably a source synchronous interface.  Then, you can send the captured bit to register, REG2, while you capture another bit in REG1.  Then, you can use the LUT to do the XOR and pass the XOR-result to a third register, REG3.  Finally, REG3 sends the bit out of the FPGA.  
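
As a rough illustration of what "pass timing analysis" involves for the capture in REG1 and the launch from REG3, the interface might be constrained along the following lines. This is only a sketch: the port names (CLK1_P, DATA_IN_P, DATA_OUT_P), the 250 MHz clock, and every delay value are placeholders rather than figures from this thread. The real numbers come from the datasheets of the devices on either side of the FPGA and from the board trace delays.

```tcl
# Hypothetical port names and timing values, for illustration only.

# Clock that arrives alongside the data (4.000 ns period = 250 MHz, assumed).
create_clock -name CLK1 -period 4.000 [get_ports CLK1_P]

# Latest (-max) and earliest (-min) arrival of DATA_IN after the CLK1 edge,
# as seen at the FPGA pins. These describe the upstream device and the board.
set_input_delay -clock CLK1 -max 1.500 [get_ports {DATA_IN_P[*]}]
set_input_delay -clock CLK1 -min 0.500 [get_ports {DATA_IN_P[*]}]

# Setup (-max) and negated hold (-min) requirement of the downstream device,
# including board delay, for the data launched from REG3.
set_output_delay -clock CLK1 -max 1.000  [get_ports {DATA_OUT_P[*]}]
set_output_delay -clock CLK1 -min -0.500 [get_ports {DATA_OUT_P[*]}]
```

With constraints of this kind in place, the timing report shows whether the input and output paths actually meet the requirements, instead of leaving it to chance.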

If you agree with my assessment and still want to minimize the routing delays between the registers and the IO buffers, then I can help you further. Just ask.

Mark

Hi Mark,

Please do show me how you can minimize the delays through your design.

 

Thanks,

DJE666

The following circuit (which I mentioned in my first reply),

BIT_IN > IBUFDS > REG1 > REG2 > LUT1 > REG3 > OBUFDS > BIT_OUT

has the following delays:

  1. BIT_IN > IBUFDS:  delay is fixed since the components are locked in place with dedicated routes between them
  2. IBUFDS > REG1:  delay can be controlled by you
  3. REG1 > REG2:  delay is fixed and equal to the period of CLK1
  4. REG2 > LUT1 > REG3:  delay is fixed and equal to the period of CLK1
  5. REG3 > OBUFDS:  delay can be controlled by you
  6. OBUFDS > BIT_OUT:  delay is fixed since the components are locked in place with dedicated routes between them

So, only the delays described in 2) and 5) can be controlled by you.  These two delays can be made as short as possible by doing what is called “locking a register into the IOB”. 

Each IO-pin of the FPGA has an IO-Block (IOB) and each IOB contains a small group of circuits, including a register.  Circuits in the IOB have short and dedicated routes to the IO-buffers and to the associated FPGA pin. 

So, “locking a register into the IOB” means writing a constraint that forces a register in your design to be placed in the IOB (instead of in the general fabric of the FPGA).  The constraints used to lock REG1 and REG3 into the IOB would look something like the following (ref UG912(v2019.2), page 244).

set_property IOB TRUE [get_ports BIT_IN];   # this will cause REG1 to be placed in the IOB near BIT_IN port
set_property IOB TRUE [get_ports BIT_OUT];  # this will cause REG3 to be placed in the IOB near BIT_OUT port


And that’s all you can do to minimize the delay of the signal going from BIT_IN to BIT_OUT.
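
For the full 16-bit bus, the same two constraints could be written once with a wildcard. The sketch below assumes the differential ports are named DATA_IN_P/DATA_OUT_P (hypothetical names, not from this thread) and that the IOB property is set on the P side of each pair. Note that the tools can only honor the request when the register connects directly to its IO buffer, as REG1 and REG3 do in the circuit above.

```tcl
# Hypothetical bus port names; adjust to match the top-level ports of your design.
set_property IOB TRUE [get_ports {DATA_IN_P[*]}]   ;# place each REG1 bit in the IOB of its input pin
set_property IOB TRUE [get_ports {DATA_OUT_P[*]}]  ;# place each REG3 bit in the IOB of its output pin
```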

Hi Mark,

Thanks for this clear explanation.

To confirm, it's not possible that different I/O standards will have better or worse on-chip or off-chip delays?

 

Regards,

DJE666

 

@dje666 

“To confirm, it's not possible that different I/O standards will have better or worse on-chip or off-chip delays?”

Different IO standards do have different switching speeds and delays, as specified in the datasheet for your FPGA.  For example, Tables 20 and 21 in DS182 (v2.18) show the IO specifications for a Kintex-7.  To help make sense of these specifications, see Figs 1-1 through 1-4 in UG471 (v1.10).  The speed and delay of an IO also depend on how you load it, as described on page 28 of DS182.  IBIS simulation tools can help you study the effect of loading on the speed and delay of FPGA IO.
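
The IO standard itself is chosen per port in the constraints file, so comparing two standards from the datasheet tables amounts to changing a property and re-running timing. A minimal sketch for a 7 series device follows; the port names are hypothetical, and the standard name shown assumes the pins sit in 2.5 V HR banks.

```tcl
# Hypothetical port names. LVDS_25 is the 7 series name for LVDS in 2.5 V HR banks;
# pins in 1.8 V HP banks would use LVDS instead.
set_property IOSTANDARD LVDS_25 [get_ports {DATA_IN_P[*] DATA_IN_N[*]}]
set_property IOSTANDARD LVDS_25 [get_ports {DATA_OUT_P[*] DATA_OUT_N[*]}]

# Optional: enable the on-chip differential termination on the input pairs
# instead of using external termination resistors.
set_property DIFF_TERM TRUE [get_ports {DATA_IN_P[*] DATA_IN_N[*]}]
```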

However...

In your design, you probably have a register (let’s call it REG0) that launches data into the BIT_IN pin of the FPGA, and another register (let’s call it REG4) that receives data from the BIT_OUT pin.  These two registers and the three inside the FPGA (REG1, REG2, REG3) are probably all clocked by CLK1.  So, assuming all the register-to-register paths pass timing analysis, the signal transfer time between any two consecutive registers is equal to the period of CLK1.  That is, the signal transfer time from REG0 > REG1 and from REG3 > REG4 is the period of CLK1, and does not depend on the speed/delay of the IO standard used at the FPGA pins.

So, maybe your original question should be rephrased as, “What is the minimum transmitter-to-receiver latency for my system?”  The answer is, “The time it takes to clock the data through the 5 registers (REG0 through REG4) that we discussed.”
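
To put a number on that, here is a small worked count. It assumes all five registers share CLK1 with period T_CLK1 and that every register-to-register path passes timing. Let x[n] be the bit that REG0 launches at clock edge n. Following the VHDL process from the first reply edge by edge:

```latex
\begin{align*}
\text{edge } n+1:&\quad \mathrm{REG1} = x[n], \quad \mathrm{REG2} = x[n-1] \\
\text{edge } n+2:&\quad \mathrm{REG3} = x[n] \oplus x[n-1] \\
\text{edge } n+3:&\quad \mathrm{REG4} = x[n] \oplus x[n-1]
\end{align*}
```

So the older sample x[n-1] takes 4 T_CLK1 to pass through all five registers, while the XOR result reaches REG4 3 T_CLK1 after the newer sample x[n] is launched, because REG1 feeds the LUT directly as well as REG2. With a 250 MHz clock (an assumed figure, not one from this thread), T_CLK1 = 4 ns, which gives 16 ns and 12 ns respectively.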
