Observer ronakbajaj
Observer
11,719 Views
Registered: ‎01-05-2012

DSP48E1 primitive latency


Hi all,

 

What is the latency of the DSP48E1 primitive when it is instantiated directly?

 

In my Verilog code, I am directly instantiating the DSP48E1 primitive, using all four pipeline stages. In functional simulation (ISim), instead of getting the output after four (4) clock cycles, the design takes 11 cycles to produce the first output.

 

What might be the reason for this extra delay of 7 clock cycles? Can it be avoided?

 

 

Thanks a lot!

 

 

RTL DSP48E1 Instantiation Code:

`timescale 1ns / 1ps

 

module addmuladd(
input CLK,
input RESET,
input signed [24:0] A,
input signed [17:0] B,
input signed [47:0] C,
input signed [24:0] D,
output signed [47:0] P
);

 

wire signed [47:0] out_dsp;

assign P = out_dsp;

 

DSP48E1 #(
.ADREG(1),
.ALUMODEREG(1),
.AREG(1),
.BREG(2),
.CARRYINREG(1),
.CARRYINSELREG(1),
.CREG(1),
.DREG(1),
.INMODEREG(1),
.MREG(1),
.OPMODEREG(1),
.PREG(1),
.A_INPUT("DIRECT"),
.B_INPUT("DIRECT"),
.USE_DPORT("TRUE"),
.USE_MULT("MULTIPLY"),
.USE_SIMD("ONE48")
)
dsp48e1_inst(
.P(out_dsp),
.CLK(CLK),
.A(A),
.ALUMODE(4'b0000),
.B(B),
.C(C),
.CARRYIN(1'b0),
.CARRYINSEL(3'b000),
.CEA1(1'b1),
.CEA2(1'b0),
.CEAD(1'b1),
.CEALUMODE(1'b1),
.CEB1(1'b1),
.CEB2(1'b1),
.CEC(1'b1),
.CECARRYIN(1'b1),
.CECTRL(1'b1),
.CED(1'b1),
.CEINMODE(1'b1),
.CEM(1'b1),
.CEP(1'b1),
.D(D),
.INMODE(5'b0_0101),
.OPMODE(7'b011_01_01),
.RSTA(RESET),
.RSTALLCARRYIN(RESET),
.RSTALUMODE(RESET),
.RSTB(RESET),
.RSTC(RESET),
.RSTCTRL(RESET),
.RSTD(RESET),
.RSTINMODE(RESET),
.RSTM(RESET),
.RSTP(RESET)
);

endmodule

 

 

Testbench code which 'should' give the correct output but doesn't:

`timescale 1ns / 1ps

 

module tb_addmuladd;

// Inputs
reg CLK;
reg RESET;
reg signed [24:0] A;
reg signed [17:0] B;
reg signed [47:0] C;
reg signed [24:0] D;

// Outputs
wire signed [47:0] P;

 

initial CLK = 0;
always #5 CLK = ~CLK;

 

addmuladd uut (
.CLK(CLK),
.RESET(RESET),
.A(A),
.B(B),
.C(C),
.D(D),
.P(P)
);

 

initial
begin
CLK = 0;
RESET = 0;
A = 0; B = 0; C = 0; D = 0;

#10 RESET = 1;
#10 RESET = 0;

#10 A = 25'd29; B = 18'd4; D = 25'd91;
#20 C = 48'd26;

#25 $finish;
end

 

always @ (negedge CLK)
begin
$display("out = %0d", P);
end

endmodule

 

Testbench code which gives the correct output (it only runs longer before $finish):

`timescale 1ns / 1ps

 

module tb_addmuladd;

// Inputs
reg CLK;
reg RESET;
reg signed [24:0] A;
reg signed [17:0] B;
reg signed [47:0] C;
reg signed [24:0] D;

// Outputs
wire signed [47:0] P;

 

initial CLK = 0;
always #5 CLK = ~CLK;

 

addmuladd uut (
.CLK(CLK),
.RESET(RESET),
.A(A),
.B(B),
.C(C),
.D(D),
.P(P)
);

 

initial
begin
CLK = 0;
RESET = 0;
A = 0; B = 0; C = 0; D = 0;

#10 RESET = 1;
#10 RESET = 0;

#10 A = 25'd29; B = 18'd4; D = 25'd91;
#20 C = 48'd26;

#95 $finish;
end

 

always @ (negedge CLK)
begin
$display("out = %0d", P);
end

endmodule

 


Accepted Solutions
Scholar markcurry
Scholar
17,998 Views
Registered: ‎09-16-2009

Re: DSP48E1 primitive latency


Ronak,

 

At a quick glance, your instantiation looks like 4 is the correct latency.  I don't think you can get more than 4 stages from the DSP48.

 

I suspect a GSR simulation problem.  See this thread:

 

http://forums.xilinx.com/t5/Digital-Signal-Processing-IP-and/Strange-simulation-of-multiplier-and-multiply-add-IP-core/m-p/395431/highlight/true#M3708

 

 

Regards,

 

Mark

6 Replies
Scholar dwisehart
Scholar
11,697 Views
Registered: ‎06-23-2013

Re: DSP48E1 primitive latency


Have you looked at the DSP manual for your device?  It is UG479 for the 7 Series devices.

 

You use a number of registers to get the work done, which are shown in some detail in the DSP manual.  The two BREGs take one clock each, and the ADREG sits in series with the DREG, for example, and that is just part of what leads up to the multiplier, which has its own register.  There are registers on the output as well.

 

I do not see why you would expect the output to arrive after four clock cycles.  Try starting with the simplest possible operation, check the latency and then layer on complexity to see where latency is being added.

 

Regards,

Daniel

 

Observer ronakbajaj
Observer
11,692 Views
Registered: ‎01-05-2012

Re: DSP48E1 primitive latency


The DSP48E1 user guide says that the DSP blocks have 4 pipeline stages, and each stage can optionally be enabled through parameters such as AREG, BREG, ADREG, etc.

 

As you rightly pointed out, each BREG takes one clock cycle, and DREG and ADREG take one cycle each. However, one BREG is in the same pipeline stage as DREG, and the other BREG is in the same stage as ADREG.

 

In the code in my question, I am using all four pipeline stages of the DSP block: 2 stages at the input (2 BREG; DREG + ADREG), 1 stage at the output of the multiplier (MREG), and the output register (PREG).

 

As the DSP block has four pipeline stages, I think the output should come 4 cycles after the inputs?
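
To make the four-stage reasoning concrete, here is a minimal behavioral sketch of a (D + A) * B + C pipeline with the same register arrangement. It is not the UNISIM DSP48E1 model; the module names pipe4_model and tb_pipe4_model and all internal signal names are made up for illustration, and the single C register is assumed to sit alongside the multiplier register, which is why C is applied two cycles after A, B and D.

```verilog
`timescale 1ns / 1ps

// Simplified behavioral sketch (NOT the UNISIM model) of a 4-stage
// (D + A) * B + C pipeline. All names here are hypothetical.
module pipe4_model (
  input                    clk,
  input  signed [24:0]     a,
  input  signed [17:0]     b,
  input  signed [47:0]     c,
  input  signed [24:0]     d,
  output reg signed [47:0] p
);
  reg signed [24:0] a_r = 0, d_r = 0, ad_r = 0;
  reg signed [17:0] b1_r = 0, b2_r = 0;
  reg signed [42:0] m_r = 0;
  reg signed [47:0] c_r = 0;

  initial p = 0;

  always @(posedge clk) begin
    // Stage 1: input registers (A, D, first B register)
    a_r  <= a;
    d_r  <= d;
    b1_r <= b;
    // Stage 2: pre-adder register and second B register
    ad_r <= d_r + a_r;
    b2_r <= b1_r;
    // Stage 3: multiplier register; C is registered in parallel
    m_r  <= ad_r * b2_r;
    c_r  <= c;
    // Stage 4: output register
    p    <= m_r + c_r;
  end
endmodule

module tb_pipe4_model;
  reg clk = 0;
  reg signed [24:0] a = 0, d = 0;
  reg signed [17:0] b = 0;
  reg signed [47:0] c = 0;
  wire signed [47:0] p;

  pipe4_model dut (.clk(clk), .a(a), .b(b), .c(c), .d(d), .p(p));

  always #5 clk = ~clk;

  // Print the output once per cycle, away from the active clock edge.
  always @(negedge clk)
    $display("t = %0t: p = %0d", $time, p);

  initial begin
    @(negedge clk);
    a = 25'sd29; b = 18'sd4; d = 25'sd91;
    // C has one register fewer than the A/B/D path, so it is
    // applied two cycles later to line up with the product.
    repeat (2) @(negedge clk);
    c = 48'sd26;
    repeat (4) @(negedge clk);
    #1 $finish;
  end
endmodule
```

In this sketch P takes the value (91 + 29) * 4 + 26 = 506 on the fourth rising clock edge after A, B and D are applied, which is the latency I would expect from the primitive as well.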

 

I did check with a basic configuration using only the multiplier, but got the same delay.

 

 

Regards,

Ronak

 

Scholar dwisehart
Scholar
11,684 Views
Registered: ‎06-23-2013

Re: DSP48E1 primitive latency


My understanding of pipelining is that it helps when the inputs change every clock cycle: you can have a continuous stream of inputs and outputs without a break.  But I do not think pipelining will help when you push a single set of inputs through: you still have to do all of the operations, which are serial in nature.

 

I was thinking of starting with something even simpler, such as a counter with almost nothing turned on:

 

DSP48E1 #
( .ACASCREG ( 0 ),
  .ADREG ( 0 ),
  .AREG ( 0 ),
  .AUTORESET_PATDET ( "RESET_MATCH" ),
  .DREG ( 0 ),
  .MASK ( 48'd0 ),
  .MREG ( 0 ),
  .PATTERN ( pattern ),
  .USE_MULT ( "NONE" ),
  .USE_PATTERN_DETECT( "PATDET" )

)
mDSP
( .CLK ( iClk ),
  .CEP ( iCount ),
  .PATTERNDETECT ( oTrig ),
  .P ( oCount ),

  .A ( 30'd0 ),
  .ACIN ( 30'd0 ),
  .ALUMODE ( 4'b0000 ),
  .B ( 18'd1 ),
  .BCIN ( 18'b0 ),
  .C ( 48'b0 ),
  .CARRYCASCIN ( 1'b0 ),
  .CARRYIN ( 1'b0 ),
  .CARRYINSEL ( 3'd0 ),
  .CEA1 ( 1'b0 ),
  .CEA2 ( 1'b0 ),
  .CEAD ( 1'b0 ),
  .CEALUMODE ( 1'b0 ),
  .CEB1 ( 1'b0 ),
  .CEB2 ( 1'b1 ),
  .CEC ( 1'b0 ),
  .CECARRYIN ( 1'b0 ),
  .CECTRL ( 1'b1 ),
  .CED ( 1'b0 ),
  .CEINMODE ( 1'b0 ),
  .CEM ( 1'b0 ),
  .D ( 25'd0 ),
  .INMODE ( 5'd0 ),
  .MULTSIGNIN ( 1'b0 ),
  .OPMODE ( 7'b0100011 ), // P reg, Y = 0, A:B 48-bits
  .PCIN ( 48'd0 ),
  .RSTA ( iReset ),
  .RSTALLCARRYIN ( iReset ),
  .RSTALUMODE ( iReset ),
  .RSTB ( iReset ),
  .RSTC ( iReset ),
  .RSTCTRL ( iReset ),
  .RSTD ( iReset ),
  .RSTINMODE ( iReset ),
  .RSTM ( iReset ),
  .RSTP ( iReset ),
  .ACOUT (),
  .BCOUT (),
  .CARRYCASCOUT (),
  .CARRYOUT (),
  .MULTSIGNOUT (),
  .OVERFLOW (),
  .PATTERNBDETECT(),
  .PCOUT (),
  .UNDERFLOW () );

 

Regards,

Daniel

 

Observer ronakbajaj
Observer
11,681 Views
Registered: ‎01-05-2012

Re: DSP48E1 primitive latency


Actually, in my application, I only have streaming inputs. I used a single-input test case just for the question.

 

Let me try your suggestion of building up from a simple configuration.

 

 

Thanks.


Observer ronakbajaj
Observer
11,661 Views
Registered: ‎01-05-2012

Re: DSP48E1 primitive latency


Thanks a TON, Mark. It was indeed a GSR simulation problem. :)
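
For anyone landing here later, a minimal sketch of one way to work around it in the testbench: hold off all stimulus until the global set/reset (GSR) pulse has ended. This assumes the addmuladd module from the question is compiled together with the Xilinx UNISIM library and glbl, and that glbl holds GSR for about 100 ns (its usual default, but worth checking in your own glbl.v). The module name tb_addmuladd_gsr and the GSR_WAIT_NS parameter are made up for illustration.

```verilog
`timescale 1ns / 1ps

// Sketch only: assumes the addmuladd module from the question and the
// Xilinx UNISIM library (with glbl) are compiled alongside this file.
module tb_addmuladd_gsr;

reg CLK = 0;
reg RESET = 0;
reg signed [24:0] A = 0;
reg signed [17:0] B = 0;
reg signed [47:0] C = 0;
reg signed [24:0] D = 0;
wire signed [47:0] P;

// Assumption: glbl keeps GSR asserted for roughly the first 100 ns.
// While GSR is high, the primitive's registers are held in reset,
// so any inputs applied before then do not propagate.
localparam GSR_WAIT_NS = 100;

addmuladd uut (
  .CLK(CLK),
  .RESET(RESET),
  .A(A),
  .B(B),
  .C(C),
  .D(D),
  .P(P)
);

always #5 CLK = ~CLK;

initial begin
  // Wait out the GSR pulse before touching reset or data.
  #(GSR_WAIT_NS);
  @(negedge CLK) RESET = 1;
  @(negedge CLK) RESET = 0;

  @(negedge CLK);
  A = 25'd29; B = 18'd4; D = 25'd91;
  // C passes through one register fewer, so apply it two cycles later.
  repeat (2) @(negedge CLK);
  C = 48'd26;

  repeat (6) @(negedge CLK);
  $finish;
end

always @(negedge CLK)
  $display("t = %0t, out = %0d", $time, P);

endmodule
```

This fits the numbers in my original testbench: the inputs went in at 30 ns, and the 7 extra cycles (70 ns) are the time that was left until the 100 ns mark.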

 

 

Regards,

Ronak
