Visitor masm
Visitor
1,341 Views
Registered: ‎08-02-2018

Performance of the device (design under test) depends on the placement of the design on a Virtex-7 board

Performance of the device (design under test) is good when the design is placed exactly at the center of the FPGA; performance varies if the design is not at the center (this can be clearly seen in the floorplanning layout in Vivado 2017.1).

My device (design under test) gets its data and system_clock from another board. The data is sampled on system_clock and then given to my device (design under test). My device has many clocks that are derived from system_clock. These derived clocks are used internally. Output data is sent on DCLK.

I'm using Verilog code to divide system_clock to get the derived clocks, because my device needs a dynamic clock based on register values.

If there is a small update to the design and I re-implement the bit file, the design may not be placed exactly at the center. It may go to either the left corner or the right corner, and then performance is bad. When the design is at the center, performance is good.

1) Is there any way to make the design always sit at the center, without affecting performance? Do I need to make any changes to the design?

2) If I use a pblock to lock the design at the center, does this have any impact on performance?

3) According to the MMCM/PLL data sheets, the MMCM/PLL cannot generate a dynamic clock below 4.76 MHz, but we need a dynamic clock of less than 1.5 MHz.

I'm new to FPGAs; all suggestions are welcome. Are there any documents or tutorials I can refer to?

 

design_not_at_center.PNG

Accepted Solutions
855 Views
Registered: ‎01-22-2015

@masm

Thanks for the answers.

Let’s focus on the (DCLK, Data_out) interface.  We can talk about other things later.

You said:
     Data_out[3:0]  will be send on DCLK. …  at different DCLK rate so it needs to be dynamic clock
     …dynamic clock vary between 90khz to 24Mhz will be changed based the 4bit register value

Here are two approaches to the (DCLK, Data_out) interface that will keep things simple for you.

Approach#1:  Fix the frequency of DCLK at 24MHz.  That is, are you sure that DCLK needs to be dynamic?  

Approach#2:  Use the toggle technique, which allows DCLK to be dynamic.  The toggle technique can be used because DCLK (90 kHz to 24 MHz) is considered a slow clock.  If DCLK were a fast clock, you could not use the toggle technique.

In order to use Approach#2, you need only ONE fast clock.  For example, in the VHDL that I sent you earlier, let’s use clk1=200MHz.  Next, set DIV1 so that tog1 will toggle at a frequency that is twice the frequency of DCLK.  For example, if DCLK=20MHz then use DIV1=5 to make tog1 toggle at a rate of 40MHz.  Then, write a process using the state-machine approach that looks something like the following:

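-- Assumed declarations (hypothetical names, not in the original post):
--   type state_t is (STATE_1, STATE_2);
--   signal state   : state_t := STATE_1;
--   signal TX_DATA : std_logic_vector(3 downto 0);  -- data waiting to be sent
P3: process(clk1)
begin
  if rising_edge(clk1) then
    if (reset1 = '1') then
      state <= STATE_1;
    elsif (tog1 = '1') then
      case state is
        when STATE_1 =>
          DCLK <= '0';
          if (TXRDY = '1') then        -- transmit trigger
            Data_out <= TX_DATA;       -- place data on Data_out[3:0]
            state <= STATE_2;
          else
            state <= STATE_1;
          end if;
        when STATE_2 =>
          DCLK <= '1';
          state <= STATE_1;
      end case;
    end if;
  end if;
end process P3;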

In process P3, TXRDY is a signal that your code will set to 1 when data is ready to be transmitted on Data_out[3:0].  Process P3 also assumes that the interface allows the data lines to change when DCLK=0.  Note that tog1 and DCLK are signals - and not true clocks.  That is, you do not need to generate tog1 or DCLK with an MMCM and you do not need to write constraints for them.  In fact, you do not need to write constraints for the (DCLK, Data_out) interface – because this interface can be made to pass timing analysis “by design”.
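
As a quick check of the divider arithmetic in Approach#2, here is a worked sketch. It assumes that process P1 emits one tog1 pulse every DIV1 cycles of clk1 and that P3 alternates DCLK on every tog1 pulse while data is being transmitted:

```latex
\begin{align*}
f_{\text{tog1}} &= \frac{f_{\text{clk1}}}{\text{DIV1}},
\qquad
f_{\text{DCLK}} = \frac{f_{\text{tog1}}}{2} = \frac{f_{\text{clk1}}}{2\,\text{DIV1}} \\
\text{DIV1} = 5 &: \quad f_{\text{DCLK}} = \frac{200\ \text{MHz}}{2 \cdot 5} = 20\ \text{MHz} \\
\text{DIV1} = 4 &: \quad f_{\text{DCLK}} = \frac{200\ \text{MHz}}{2 \cdot 4} = 25\ \text{MHz} \\
\text{DIV1} = 1111 &: \quad f_{\text{DCLK}} = \frac{200\ \text{MHz}}{2 \cdot 1111} \approx 90.009\ \text{kHz}
\end{align*}
```

With clk1=200MHz, an exact 24MHz would need DIV1 = 200/48, which is about 4.17 and not an integer, so the nearest settings give 25MHz or 20MHz. A clk1 that is a multiple of 48MHz (192MHz, for instance) would reach 24MHz exactly with DIV1=4. The low end of the range needs DIV1 values around 1111, so the integer ranges of DIV1 and cnt in P1 would have to be widened well beyond 8.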

Cheers,
Mark

 

9 Replies
Xilinx Employee
1,164 Views
Registered: ‎05-08-2012

Hi @masm.

 

For questions 1 and 2, a pblock would be suggested to constrain the logic toward the center of the device. However, depending on whether you are connecting to physical resources such as I/O ports, PCIe, etc., moving the logic away from the physical resources it needs to connect to would generally affect performance negatively.

Is there a reason why the logic needs to be in the center? Are there I/O ports placed in the X1Y1 clock region from the device view image?

I am not sure about the 3rd question, as the MMCM would need to adhere to its limitations.


-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------

1,156 Views
Registered: ‎09-17-2018

If it behaves as you describe, your timing constraints are incomplete (you have unconstrained paths that sometimes do not meet timing).

There is nothing magic going on here -- unconstrained paths will kill you later: fix them now!

l.e.o

 

1,125 Views
Registered: ‎01-22-2015

@masm

Welcome to the Xilinx Forum!

     1) is there any way where we can make the design to sit always at center only?
The condition of “being in the center” should not be affecting performance (I assume you mean timing analysis) of your small design. See my answer to 3) for the likely cause.

     2) if i use p-block to lock the design at particularly at the center , does this have any impact on the performance?
In a small design (like yours) use of p-blocks should not be necessary to make the design pass timing analysis.

     3) mmcm/pll does not generate dynamic clock below 4.76Mhz according to data sheets of the mmcm/pll. but we need dynamic clock less than 1.5Mhz clock.
         ….. im using verilog code to divide sytem_clock to get derived clocks
Aha! Creating clocks in the FPGA fabric like this will almost always cause a design to fail timing analysis. You must strive to keep clocks in the clock tree. There are at least two ways to do this and to create your slow 1.5MHz clock. One is to use the CLKOUT4_CASCADE feature of the MMCM that is described (briefly) in Table 3-7 of UG472. As described in the datasheet for the Virtex-7, the CLKOUT4_CASCADE feature allows the MMCM to produce a clock with frequency as low as 0.036MHz.  Another option for producing a slow clock is to create a counter (using Verilog) that toggles the enable pin on a BUFGCE. This method is described nicely by Avrum in <this> post.

So, please consider creating a proper slow clock using one of the two methods described, and then let's see where things stand.
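
The second option (a counter that drives the enable pin of a BUFGCE) might look roughly like the following in VHDL. This is only a sketch under assumptions, not the code from the referenced post: the entity name slow_clk_bufgce, the generic DIVIDE, and the port names are hypothetical, and how clk_in is buffered before it reaches this block is left to the surrounding design. BUFGCE itself is the Xilinx primitive from the UNISIM library, with ports I, CE and O.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;  -- Xilinx primitive library; declares BUFGCE

entity slow_clk_bufgce is
  generic (
    DIVIDE : positive := 16    -- hypothetical: pass one clk_in pulse every DIVIDE cycles
  );
  port (
    clk_in   : in  std_logic;  -- fast source clock
    slow_clk : out std_logic   -- gated clock, routed on the clock tree
  );
end entity slow_clk_bufgce;

architecture rtl of slow_clk_bufgce is
  signal cnt : natural range 0 to DIVIDE - 1 := 0;
  signal ce  : std_logic := '0';
begin
  -- Counter in the clk_in domain; ce is high for one cycle out of every DIVIDE.
  process (clk_in)
  begin
    if rising_edge(clk_in) then
      if cnt = DIVIDE - 1 then
        cnt <= 0;
        ce  <= '1';
      else
        cnt <= cnt + 1;
        ce  <= '0';
      end if;
    end if;
  end process;

  -- BUFGCE passes clk_in to its output only while CE is high.
  u_bufgce : BUFGCE
    port map (
      I  => clk_in,
      CE => ce,
      O  => slow_clk
    );
end architecture rtl;
```

Note that the output is a gated copy of clk_in, so each pulse is as narrow as a clk_in pulse rather than having a 50% duty cycle, and slow_clk would still need to be described to the timing tools as a clock.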

Cheers,
Mark

1,093 Views
Registered: ‎06-21-2017

In addition to the excellent advice from markg@prosensing.com and @lowearthorbit, ask yourself whether you really need a very slow clock, or whether you would be better off using a faster system clock and generating a clock enable for the registers that need to be clocked more slowly.  It's a lot easier for the timing analyzer to work with a constant clock and it's fairly easy to adjust the rate of a clock enable.

1,044 Views
Registered: ‎01-22-2015

@masm

I agree with the suggestion of @bruce_karaffa to use a toggle instead of a slow clock. This is an especially good idea since you want a “dynamic clock”, which I understand to mean a clock whose frequency you are changing dynamically.

The toggle technique is a truly beautiful thing and it looks like the following in VHDL (sorry - I don’t do Verilog):

signal clk1, tog1 : std_logic;
signal DIV1 : integer range 1 to 8 := 8; --divider for clk1 (must be at least 1)
.....
P1: process(clk1)
  variable cnt : integer range 0 to 8;
begin
  if rising_edge(clk1) then
    if (cnt = 0) then
      tog1 <= '1';          --one-cycle pulse every DIV1 cycles of clk1
      cnt := DIV1-1;
    else
      tog1 <= '0';
      cnt := cnt - 1;
    end if;
  end if;
end process P1;
--
P2: process(clk1)
begin
  if rising_edge(clk1) then
    if (tog1 = '1') then
      -- do something here....
    end if;
  end if;
end process P2;

Note how process, P1, creates a signal called tog1 (which is the toggle) by dividing down the clock called clk1.  It is important to note that tog1 is a signal and not a clock (ie. you don’t use constraints to define tog1 as a clock).  In process, P2, you can see how tog1 is used to make things happen at the rate of tog1 – even though the process is clocked at the rate of clk1.

There are other advantages of using a toggle:

  1. there are fewer constraints to write,
  2. it frees up the clock tree for other uses,
  3. it reduces the number of clock domains. That is, tog1 is a signal in the clk1 clock domain. So, you can pass signals from other processes clocked by clk1 into the process, P2, without using clock crossing circuits, and
  4. you can easily and dynamically change the frequency of the toggle (eg. by changing the value of DIV1).
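
The last advantage can be sketched as follows: the divider is reloaded from a small lookup table indexed by a 4-bit rate register, so the toggle rate changes at run time while everything stays in the clk1 domain. This is a minimal sketch, not code from this thread. The entity name dyn_toggle, the port rate_sel, and every value in DIV_TABLE are hypothetical placeholders, and rate_sel is assumed to be already synchronous to clk1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dyn_toggle is
  port (
    clk1     : in  std_logic;
    rate_sel : in  std_logic_vector(3 downto 0);  -- hypothetical 4-bit rate register
    tog1     : out std_logic                      -- one-cycle pulse, not a clock
  );
end entity dyn_toggle;

architecture rtl of dyn_toggle is
  constant DIV_MAX : positive := 2048;
  type div_table_t is array (0 to 15) of positive range 1 to DIV_MAX;
  -- Placeholder divider values, chosen only for illustration.
  constant DIV_TABLE : div_table_t :=
    (4, 5, 8, 10, 16, 20, 32, 40, 64, 100, 128, 200, 256, 512, 1024, 2048);
  signal cnt : natural range 0 to DIV_MAX - 1 := 0;
begin
  process (clk1)
  begin
    if rising_edge(clk1) then
      if cnt = 0 then
        tog1 <= '1';
        -- Reload from the table, so a new rate_sel takes effect at the next pulse.
        cnt  <= DIV_TABLE(to_integer(unsigned(rate_sel))) - 1;
      else
        tog1 <= '0';
        cnt  <= cnt - 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the divider is only sampled when the counter reaches zero, a change of rate_sel never cuts a toggle period short; the new rate simply starts with the next pulse.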

Mark

Visitor masm
Visitor
1,027 Views
Registered: ‎08-02-2018

Hi @marcb,

Thank you all: @marcb, @lowearthorbit, markg@prosensing.com, @bruce_karaffa.

markg@prosensing.com, I will implement the techniques you mentioned and get back with the results.

@marcb, here is my reply to your question.

Is there a reason why the logic needs to be in the center? Are there I/O ports placed in the X1Y1 clock region from the device view image?

I observed no performance degradation when the design is at the center. Small design updates cause the design to be placed differently, which causes performance variations, so I was trying to fix the design at the center.

Yes, I/O ports are placed in X1Y1, X1Y2 and X0Y6 (I am using the HPC-FMC1 connector).

I will explain my hardware setup (figure: block diagram of the design). The EVAL board generates 9-bit input data and system_clock, which travel through the interface board and then reach the FPGA board. The outputs, data_output[3:0] and DCLK, are captured using NI-PXI cards.

(RTL design diagram) In the RTL design there are two fixed derived clocks (derived_clk1, derived_clk2) and one dynamic_clk. The dynamic clock varies between 90 kHz and 24 MHz and is changed based on the 4-bit register value.

I constrained all the I/O ports and derived clocks.

1) How do I constrain the dynamic_clock (which varies based on the register value)?

2) Is it a good idea to use an MMCM/PLL to generate 18 different clocks instead of using a dynamic clock, so that I can constrain each derived clock?

3) One more problem is that an MMCM/PLL can generate only 8 different derived clocks. Do you have any idea how to expand the MMCM/PLL to generate 18 different clocks?

4) I observed that with fewer derived clocks in the design, performance is not affected; with more derived clocks, it is. Using different implementation techniques also improves performance. How do implementation techniques improve performance (for example, the balanced_ssl implementation and the spread SSL implementation)? How do I decide which implementation technique will improve performance?

 

thank you

regards

masm

design1.PNG
RTL_design.PNG
985 Views
Registered: ‎01-22-2015

@masm

Before I can answer your new questions, I need more information about dynamic_clk:

  1. what are you using it for inside the FPGA?  
  2. why does it need to be dynamic (ie. to have changing frequency)?
  3. is it used to create DCLK or to create any clock that is sent out of the FPGA?
  4. is it used to clock registers that send any data out of the FPGA (for example, Data_out[3:0])?
  5. is it used to clock registers that capture any data coming into the FPGA (for example, Input_data[8:0])?

Mark

Visitor masm
Visitor
910 Views
Registered: ‎08-02-2018

Hi markg@prosensing.com 

  • what are you using it for inside the FPGA?
  • why does it need to be dynamic (ie. to have changing frequency)?
  • is it used to create DCLK or to create any clock that is sent out of the FPGA?
  • is it used to clock registers that send any data out of the FPGA (for example, Data_out[3:0])?
  • is it used to clock registers that capture any data coming into the FPGA (for example, Input_data[8:0])?

Inside the RTL design there are 3 blocks, and one of them needs this dynamic clock, so I created it inside the FPGA. (1)

Data_out[3:0] will be sent on DCLK. Data_out[3:0] needs to be sent at different DCLK rates, so the clock needs to be dynamic, and it will be used to create/manipulate DCLK. (2, 3, 4)

The dynamic clock is not used to capture any data coming into the FPGA. (5)
