Observer
Observer
33,269 Views
Registered: ‎05-02-2013

Clock, DSP Slice Register, Cascaded signals


Hey community,

 

I have some questions about the VC707 (XC7VX485T).

 

First, I just found a differential clock in the UCF file. How do I get a regular single-ended system clock of, e.g., 100 MHz from it? Somehow this question sounds so simple to me, but I cannot figure out the answer.

 

Second, I want to use DSP slices for a filter design. One DSP slice should perform just a multiplication. In the user guide for the DSP slice it is suggested:

"For non-multiplier-based designs, a twostage pipeline should be used. If latency is important in the design and only one or two registers can be used within the DSP48E1 slice, always use the M register."

However, when I choose the P, M and A registers (or rather, in the wizard: Tier 4, 5, 6), the synthesis report says the maximum frequency is 984 MHz. When I choose just P and M (wizard: Tier 4, 5), i.e. right before and after the multiplier, the synthesis report says the maximum frequency is 496 MHz. I do not understand why, since the multiplier should be pipelined with the Tier 4 and 5 registers anyway. I attached 2 screenshots of the wizard to illustrate the problem. The next DSP slice, which is fed with this product, is fully pipelined.

 

Third, does using the cascaded signals have advantages beyond skipping the regular routing? Is it faster than taking the regular output and wiring it to the next DSP slice? I also assume that this signal is not accessible.

 

Thanks

 

C

DSP.JPG

Accepted Solutions
Mentor
Mentor
35,791 Views
Registered: ‎11-29-2007

"For non-multiplier-based designs, a twostage pipeline should be used. If latency is important in the design and only one or two registers can be used within the DSP48E1 slice, always use the M register."

However, when I choose the P, M and A registers (or rather, in the wizard: Tier 4, 5, 6), the synthesis report says the maximum frequency is 984 MHz. When I choose just P and M (wizard: Tier 4, 5), i.e. right before and after the multiplier, the synthesis report says the maximum frequency is 496 MHz. I do not understand why, since the multiplier should be pipelined with the Tier 4 and 5 registers anyway. I attached 2 screenshots of the wizard to illustrate the problem. The next DSP slice, which is fed with this product, is fully pipelined.


That's because the multiplier produces two partial products which have to be added in order to compute the final result. This addition happens in the three-input adder, which is not fully pipelined if the P register is not used.
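
For anyone who wants to see the register stages outside the wizard, here is a minimal sketch of a multiply with input, M and P registers written as plain VHDL. The entity name mult_pipe_sketch and all signal names are hypothetical, and the 25x18 operand widths are an assumption chosen to match the DSP48E1 multiplier. Whether the synthesizer actually packs a_reg/b_reg, m_reg and p_reg into the A/B, M and P registers of a single DSP48E1 depends on the tool and its settings, so check the synthesis report rather than taking this as the way the wizard builds it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: one multiplication with three register stages.
entity mult_pipe_sketch is
  port (
    clk : in  std_logic;
    a   : in  signed(24 downto 0);  -- assumed 25-bit operand
    b   : in  signed(17 downto 0);  -- assumed 18-bit operand
    p   : out signed(42 downto 0)   -- 25 + 18 = 43-bit product
  );
end entity mult_pipe_sketch;

architecture rtl of mult_pipe_sketch is
  signal a_reg : signed(24 downto 0) := (others => '0');  -- input stage (A)
  signal b_reg : signed(17 downto 0) := (others => '0');  -- input stage (B)
  signal m_reg : signed(42 downto 0) := (others => '0');  -- after the multiplier (M)
  signal p_reg : signed(42 downto 0) := (others => '0');  -- output stage (P)
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_reg <= a;
      b_reg <= b;
      m_reg <= a_reg * b_reg;  -- multiplier output is registered
      p_reg <= m_reg;          -- final result is registered again
    end if;
  end process;

  p <= p_reg;
end architecture rtl;
```

Dropping one of the stages (for example assigning m_reg directly from a * b) is an easy way to compare the timing estimates for the different register combinations.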



Please google your question before asking it.
If someone answers your question, mark the post with "Accept as solution". If you see a particularly good and informative post, consider giving it Kudos (the star on the left).

13 Replies
Scholar
Scholar
33,262 Views
Registered: ‎09-16-2009

 


e-gore wrote:

 

First, I just found a differential clock in the UCF file. How do I get a regular single-ended system clock of, e.g., 100 MHz from it? Somehow this question sounds so simple to me, but I cannot figure out the answer.



Instantiate an IBUFDS to convert the differential signal to single-ended.

Constrain the period on the positive-polarity pin in the UCF (or XDC).

 


@e-gore wrote:
 

Second, I want to use DSP slices for a filter design. One DSP slice should perform just a multiplication. In the user guide for the DSP slice it is suggested:

"For non-multiplier-based designs, a twostage pipeline should be used. If latency is important in the design and only one or two registers can be used within the DSP48E1 slice, always use the M register."

However, when I choose the P, M and A registers (or rather, in the wizard: Tier 4, 5, 6), the synthesis report says the maximum frequency is 984 MHz. When I choose just P and M (wizard: Tier 4, 5), i.e. right before and after the multiplier, the synthesis report says the maximum frequency is 496 MHz. I do not understand why, since the multiplier should be pipelined with the Tier 4 and 5 registers anyway. I attached 2 screenshots of the wizard to illustrate the problem. The next DSP slice, which is fed with this product, is fully pipelined.




Doesn't sound quite right, but really does this matter? 

1 - It's an estimate. I tend to pretty much ignore these early estimates and just trust the final P&R timing reports.

2 - The DSP48s are able to run much faster than anything around them. As an analogy: you're comparing which car can go faster. Car A has a speedometer that maxes out at 140. Car B's speedometer maxes out at 160. So Car B must be faster. In reality you're never going to get anywhere near either of those rates (at least not without involving heroic managing of the entire area of the design.)

 


@e-gore wrote:

Third, does using the cascaded signals have advantages beyond skipping the regular routing? Is it faster than taking the regular output and wiring it to the next DSP slice? I also assume that this signal is not accessible.

 

It will maximize the operating frequency of the datapath. But more importantly, using the cascaded carry logic will free routing resources around that logic for other things. It'll allow a more predictable QoR result as well.

 

Regards,

 

Mark

 

Observer
Observer
33,245 Views
Registered: ‎05-02-2013

Thanks for answering.

 

 

 

@markcurry wrote:

Instantiate an IBUFDS to convert the differential signal to single-ended.

Constrain the period on the positive-polarity pin in the UCF (or XDC).


Is that what you mean? I found it in a Virtex 6 Libraries Guide:
VHDL Instantiation Template
Unless they already exist, copy the following two statements and paste them before the entity declaration.

Library UNISIM;
use UNISIM.vcomponents.all;

-- IBUFDS: Differential Input Buffer
--         Virtex-6
-- Xilinx HDL Libraries Guide, version 12.3
IBUFDS_inst : IBUFDS
generic map (
    DIFF_TERM    => FALSE,     -- Differential Termination
    IBUF_LOW_PWR => TRUE,      -- Low power (TRUE) vs. performance (FALSE) setting for referenced I/O standards
    IOSTANDARD   => "DEFAULT")
port map (
    O  => O,   -- Buffer output
    I  => I,   -- Diff_p buffer input (connect directly to top-level port)
    IB => IB   -- Diff_n buffer input (connect directly to top-level port)
);
-- End of IBUFDS_inst instantiation

@markcurry wrote:
Doesn't sound quite right, but really does this matter? 

1 - It's an estimate. I tend to pretty much ignore these early estimates and just trust the final P&R timing reports.

2 - The DSP48s are able to run much faster than anything around them. As an analogy: you're comparing which car can go faster. Car A has a speedometer that maxes out at 140. Car B's speedometer maxes out at 160. So Car B must be faster. In reality you're never going to get anywhere near either of those rates (at least not without involving heroic managing of the entire area of the design.)



What do you mean by "In reality you're never going to get anywhere near either of those rates (at least not without involving heroic managing of the entire area of the design.)"? Why shouldn't it be possible to have a high-speed clock domain in the FPGA containing DSP slices?

@markcurry wrote: 

It will maximize the operating frequency of the datapath. But more importantly, using the cascaded carry logic will free routing resources around that logic for other things. It'll allow a more predictable QoR result as well.

Regards,

Mark



Does it also skip a register i.e. do I save a clock cycle?

 

Scholar
Scholar
33,242 Views
Registered: ‎09-16-2009

<snip IBUFDS template>

 

Yes use that IBUFDS.

 


@e-gore wrote:

What do you mean by "In reality you're never going to get anywhere near either of those rates (at least not without involving heroic managing of the entire area of the design.)"? Why shouldn't it be possible to have a high-speed clock domain in the FPGA containing DSP slices?



In reality, your design data needs to 1. come FROM somewhere and 2. go TO somewhere. Without great difficulty, you're not going to achieve anywhere near those max DSP48 frequencies for the "other" logic. Often, for ease of design, you're going to want to minimize the number of clocks in a design. So whatever clock you use for the DSP48 may also be clocking unrelated RAMs, state machines, and random logic too. That's not going to run anywhere near those DSP48 speeds.

ALU blocks like these are some of the more complex blocks that FPGA companies can design to achieve ASIC-like performance while still remaining generic enough to use in an FPGA. And it's not terribly difficult for the FPGA companies to do so; ALU design is a fairly well-defined field. So they can fairly easily design high-performance ALUs and drop them into the product. For most use cases you're not going to approach those limits in frequency; you'll have critical paths elsewhere that limit things. I'm oversimplifying, but you get the idea.

As an example: the Virtex-6 -2 speed grade max DSP48 frequency with everything pipelined is 540 MHz. My filters are designed fully pipelined. My nominal frequency is 165 MHz. We bumped it up to 200 MHz at one point but were having timing closure problems (not in the DSP48s but elsewhere). Could we have optimized? Sure. But it gets harder and harder for less and less return, and I wouldn't bet on achieving anything over 250 MHz without great difficulty.

That's less than 50% of the DSP48's full bandwidth.

 



@markcurry wrote: 

It will maximize the operating frequency of the datapath. But more importantly, using the cascaded carry logic will free routing resources around that logic for other things. It'll allow a more predictable QoR result as well.

Regards,

Mark



Does it also skip a register i.e. do I save a clock cycle?



I'm not sure what you mean here - what's your reference?  In any event, forget the wizard.  Look at the block diagrams in UG479.  Figure 2-1 shows where the Carry paths are sourced from and where they go to.

 

Regards,

 

Mark

Observer
Observer
33,219 Views
Registered: ‎05-02-2013

Thank you for your answers.

I created the IBUFDS component and instantiated it. However, I am getting an error. I think the reason is the MMCME I am using to create the clock for my DSP slices.

 

"

NgdBuild:770 - IBUFDS 'computing/IBUFDS_inst' and IBUFG
'computing/manage_clk/clkin1_buf' on net 'computing/slow_clk' are lined up in
series. Buffers of the same direction cannot be placed in series."

 

So am I right to now just feed the MMCME with the differential pair SYSCLK_P and SYSCLK_N, create both the regular clock and the DSP clock from it, and not instantiate the IBUFDS at all?

Observer
Observer
33,211 Views
Registered: ‎05-02-2013

Report: I created the MMCME with the wizard and fed it with the SYSCLK pair. In parallel, I fed the IBUFDS with the same pair to generate my main clk, and everything seems good so far. One issue came up, though.

 

When I create a testbench of the top level, it never assigns the clk process correctly. Is there a way to make it understand the differential clock, or do I have to create a fake clk with the same properties for testbenching the system?

 

 

 

 

Scholar
Scholar
33,200 Views
Registered: ‎09-16-2009

I don't use the wizards - so can't help much there.

 

In any case - the connections should be:

  input bidi pins -> IBUFDS -> MMCME.  

 

No "parallel" connections should be made to the differential side pins.

 

In your testbench, just create the clock as normal, and hook it up to the "P" pin.  Also in the testbench, invert the clock and hook that up to your "N" side pin.  Done.
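
The testbench hookup described above can be sketched roughly like this. It is only a sketch under a few assumptions: the testbench name toplevel_filter_tb is hypothetical, the 5 ns period assumes a 200 MHz board clock (check your UCF), the reset is assumed to be active-high, and the top level is assumed to be compiled into the work library with the three ports shown in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench for the toplevel_filter entity from this thread.
entity toplevel_filter_tb is
end entity toplevel_filter_tb;

architecture sim of toplevel_filter_tb is
  -- Assumption: 200 MHz differential system clock (5 ns period).
  constant CLK_PERIOD : time := 5 ns;

  signal sysclk_p : std_logic := '0';
  signal sysclk_n : std_logic;
  signal reset    : std_logic := '1';  -- assumption: active-high reset
begin
  -- Create the clock as normal and drive the "P" pin with it.
  sysclk_p <= not sysclk_p after CLK_PERIOD / 2;

  -- Invert the clock and drive the "N" pin with that.
  sysclk_n <= not sysclk_p;

  -- Release reset after a few clock cycles.
  reset <= '0' after 10 * CLK_PERIOD;

  dut : entity work.toplevel_filter
    port map (
      reset    => reset,
      SYSCLK_P => sysclk_p,
      SYSCLK_N => sysclk_n
    );
end architecture sim;
```

The simulation also needs the UNISIM library compiled so that the IBUFDS and the clocking primitives inside the design have models.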

 

Please post all followups here - I don't respond to requests via PM.

 

Regards,

 

Mark

 

Observer
Observer
33,191 Views
Registered: ‎05-02-2013
@markcurry wrote:

 

In any case - the connections should be:

  input bidi pins -> IBUFDS -> MMCME.  

 

No "parallel" connections should be made to the differential side pins.

 


I tie the SYSCLK to the IBUFDS and get my single ended output. I connect this one to the MMCME and it throws an error like posted before, that they cannot be put in series...

 


markcurry wrote:

 

Please post all followups here - I don't respond to requests via PM.

 

 


Gotcha

Xilinx Employee
Xilinx Employee
33,184 Views
Registered: ‎01-03-2008

> it throws an error like posted before, that they cannot be put in series...

 

What are you trying to put in series and what ERROR is reported by the tools?

 

In your top level VHDL entity the SYSCLK_P and SYSCLK_N pins should be connected directly to the I and IB input pins of the IBUFDS primitive.  In turn the O output pin of the IBUFDS primitive should be connected directly to the CLKIN of the MMCM.  

 

Anything else is a bad design.
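
A rough sketch of that chain with the primitives instantiated by hand instead of through the wizard, so that no extra input buffer ends up between the IBUFDS and the MMCM. Everything numeric here is an assumption for illustration: a 200 MHz input clock, a 1000 MHz VCO, and 100 MHz / 250 MHz outputs. The entity name clk_chain_sketch and the output names are hypothetical. Check the multiply/divide values against the data sheet limits for your speed grade before using anything like this.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical clocking block: differential pins -> IBUFDS -> MMCM -> BUFG.
entity clk_chain_sketch is
  port (
    SYSCLK_P : in  std_logic;
    SYSCLK_N : in  std_logic;
    slow_clk : out std_logic;
    fast_clk : out std_logic;
    locked   : out std_logic
  );
end entity clk_chain_sketch;

architecture rtl of clk_chain_sketch is
  signal sys_clk_ibufds : std_logic;
  signal clkfb          : std_logic;
  signal clkout0        : std_logic;
  signal clkout1        : std_logic;
begin
  -- The only input buffer in the path: top-level pins go straight to I and IB.
  ibufds_inst : IBUFDS
    generic map (
      DIFF_TERM    => FALSE,
      IBUF_LOW_PWR => FALSE,
      IOSTANDARD   => "DEFAULT")
    port map (
      O  => sys_clk_ibufds,
      I  => SYSCLK_P,
      IB => SYSCLK_N);

  -- IBUFDS output goes directly to CLKIN1; no IBUFG in between.
  mmcm_inst : MMCME2_BASE
    generic map (
      CLKIN1_PERIOD    => 5.0,   -- assumption: 200 MHz input clock
      DIVCLK_DIVIDE    => 1,
      CLKFBOUT_MULT_F  => 5.0,   -- assumed VCO: 200 MHz * 5 = 1000 MHz
      CLKOUT0_DIVIDE_F => 10.0,  -- 1000 MHz / 10 = 100 MHz
      CLKOUT1_DIVIDE   => 4)     -- 1000 MHz / 4 = 250 MHz
    port map (
      CLKIN1   => sys_clk_ibufds,
      CLKFBIN  => clkfb,         -- internal feedback, no phase alignment to the pin
      CLKFBOUT => clkfb,
      CLKOUT0  => clkout0,
      CLKOUT1  => clkout1,
      LOCKED   => locked,
      PWRDWN   => '0',
      RST      => '0');

  -- Global buffers on the MMCM outputs.
  bufg_slow : BUFG port map (I => clkout0, O => slow_clk);
  bufg_fast : BUFG port map (I => clkout1, O => fast_clk);
end architecture rtl;
```

If the wizard-generated module is kept instead, the same rule applies: only one input buffer may sit between the pins and CLKIN.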

------Have you tried typing your question into Google? If not you should before posting.
Too many results? Try adding site:www.xilinx.com
Observer
Observer
4,674 Views
Registered: ‎05-02-2013

@mcgett wrote:

> it throws an error like posted before, that they cannot be put in series...

 

What are you trying to put in series and what ERROR is reported by the tools?

 

In your top level VHDL entity the SYSCLK_P and SYSCLK_N pins should be connected directly to the I and IB input pins of the IBUFDS primitive.  In turn the O output pin of the IBUFDS primitive should be connected directly to the CLKIN of the MMCM.  

 

Anything else is a bad design.


My code looks like this:

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

Library UNISIM;
USE UNISIM.vcomponents.all;


ENTITY toplevel_filter IS

    PORT (
        reset    : IN STD_LOGIC;
        SYSCLK_N : IN STD_LOGIC;
        SYSCLK_P : IN STD_LOGIC  -- no semicolon after the last port
    );

END toplevel_filter;

ARCHITECTURE Behavioral OF toplevel_filter IS

COMPONENT IBUFDS
    GENERIC (
        DIFF_TERM    : BOOLEAN; -- Differential Termination
        IBUF_LOW_PWR : BOOLEAN; -- Low power (TRUE) vs. performance (FALSE) setting for referenced I/O standards
        IOSTANDARD   : STRING
    );
    PORT (
        O  : OUT STD_LOGIC; -- Buffer output
        I  : IN STD_LOGIC;  -- Diff_p buffer input (connect directly to top-level port)
        IB : IN STD_LOGIC   -- Diff_n buffer input (connect directly to top-level port)
    );
END COMPONENT;

-- Wizard-generated clocking module (MMCME inside)
COMPONENT DSP_clk
    PORT (
        -- Clock in ports
        CLK_IN1  : IN STD_LOGIC;
        -- Clock out ports
        CLK_OUT1 : OUT STD_LOGIC;
        CLK_OUT2 : OUT STD_LOGIC
    );
END COMPONENT;

SIGNAL fast_clk_MMCME : STD_LOGIC;
SIGNAL slow_clk_MMCME : STD_LOGIC;
SIGNAL sys_clk_IBUFDS : STD_LOGIC;

BEGIN

IBUFDS_inst : IBUFDS
    GENERIC MAP (
        DIFF_TERM    => FALSE, -- Differential Termination
        IBUF_LOW_PWR => FALSE, -- Low power (TRUE) vs. performance (FALSE) setting for referenced I/O standards
        IOSTANDARD   => "DEFAULT")
    PORT MAP (
        O  => sys_clk_IBUFDS, -- Buffer output
        I  => SYSCLK_P,       -- Diff_p buffer input (connect directly to top-level port)
        IB => SYSCLK_N        -- Diff_n buffer input (connect directly to top-level port)
    );

manage_clk: DSP_clk
    PORT MAP (
        -- Clock in ports
        CLK_IN1  => sys_clk_IBUFDS,
        -- Clock out ports
        CLK_OUT1 => fast_clk_MMCME,
        CLK_OUT2 => slow_clk_MMCME
    );
END Behavioral;

The error occurs while translating the code:

 

"ERROR:NgdBuild:770 - IBUFDS 'IBUFDS_inst' and IBUFG 'manage_clk/clkin1_buf' on
net 'sys_clk_IBUFDS' are lined up in series. Buffers of the same direction
cannot be placed in series."

 

I am trying to do what Mark suggested: connect the differential clk pins to the IBUFDS and then its output to the MMCME (in series).

Xilinx Employee
Xilinx Employee
4,668 Views
Registered: ‎01-03-2008

The ERROR message indicates that you have an IBUFG in the DSP_clk submodule.  You need to remove it.

Observer
Observer
4,662 Views
Registered: ‎05-02-2013

Attached are two screenshots. So I am supposed to change the Drives from BUFG to...? I thought the BUFG is adequate for a DSP clock.

clkwiz1.JPG
clkwiz2.JPG
Xilinx Employee
Xilinx Employee
4,654 Views
Registered: ‎01-03-2008
Selecting the single-ended clock-capable pin in the first dialog is what inserts the IBUFG. Either change it to differential and remove the IBUFDS that you added, or change it to an internal source, or simply remove the IBUFG from the source code that was generated.