15196604083
Observer
Observer
10,388 Views
Registered: ‎01-16-2013

no dsp in timing path


Hi,

   I wrote RTL that includes a multiplier with an asynchronous reset, and set the relevant XST properties to ensure that a DSP48E1 is used.
   The target device is xc7vx485t.
   The resource utilization report confirms that DSP48E1s are used.
   However, in the timing report, which contains all paths, I cannot find any "Location" or "Delay type" related to the DSP. I am confused about why there is nothing about the DSP in the timing report.
   The attachment is the complete timing report.


1.RTL:

module multtop(clk,
               rst,
               a,
               b,
               c);

input signed [31:0] a;
input signed [31:0] b;
input clk,rst;
output reg signed [63:0] c;
reg signed [63:0] r_c;
reg signed [63:0] rr_c;


always@(posedge clk or posedge rst)begin
  if(rst)
    begin
      r_c<=0;
      rr_c<=0;
      c<=0;
    end
  else
    begin
      r_c<=a*b;
      rr_c<=r_c;
      c<=rr_c;
    end
end

endmodule


2.The fragment of resource Utilization:
        Number of DSP48E1s:                            4 out of   2,800    1%

3.The fragment of timing report:

Paths for end point rr_c_13 (SLICE_X83Y255.CX), 1 path
--------------------------------------------------------------------------------
Slack (setup path):     1.239ns (requirement - (data path - clock path skew + uncertainty))
  Source:               r_c_13 (FF)
  Destination:          rr_c_13 (FF)
  Requirement:          4.000ns
  Data Path Delay:      2.699ns (Levels of Logic = 0)
  Clock Path Skew:      -0.027ns (0.996 - 1.023)
  Source Clock:         clk_BUFGP rising at 0.000ns
  Destination Clock:    clk_BUFGP rising at 4.000ns
  Clock Uncertainty:    0.035ns

  Clock Uncertainty:          0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: r_c_13 to rr_c_13
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X90Y246.DQ     Tcko                  0.216   r_c<13>
                                                       r_c_13
    SLICE_X83Y255.CX     net (fanout=1)        2.464   r_c<13>
    SLICE_X83Y255.CLK    Tdick                 0.019   rr_c<14>
                                                       rr_c_13
    -------------------------------------------------  ---------------------------
    Total                                      2.699ns (0.235ns logic, 2.464ns route)
                                                       (8.7% logic, 91.3% route)

0 Kudos
1 Solution

Accepted Solutions
avrumw
Guide
Guide
15,257 Views
Registered: ‎01-23-2009

The timing report is correct.


The paths that you are seeing are all the paths that you have constrained. The timing analysis will only show you constrained paths.

 

In your design, you multiply "a" and "b". In the RTL code you gave, these are primary inputs to your system. Thus, the tools are routing the a and b directly to the DSP48 for the multiplication.

 

As bwiec stated, the DSP48E's flip-flops only have synchronous resets. Since your code calls for async resets, the tools have no choice but to not use the DSP48 cell's flip-flops and instead use slice flip flops in the fabric. So (and you can look at this in FPGA Editor) the a and b inputs go directly to the inputs of the DSP48. The multiplication result from the DSP48 comes out combinatorially from the DSP48 and goes to the r_c FFs in the fabric. These are then FF'ed again into rr_c and again into c.

 

So, let's look at the paths. From your timing report it looks like you only specified a period constraint. Therefore, in this design, only the paths from r_c->rr_c and rr_c->c are constrained. The paths from a&b to r_c are not constrained (since you didn't put an OFFSET IN constraint on them) and hence are not reported.

 

So, let's say that you fix this. You change all your FFs to use synchronous resets, and you put an extra set of flip-flops on a and b (let's call them r_a and r_b) so that they are flopped once before the multiply. Even here, you will NOT get any report on the multiplication itself. When done this way, the r_a, r_b and r_c FFs will all be pulled into the FFs in the DSP48E slice. The timing engine does not show the paths internal to the block, so it will only show the paths from r_c to rr_c. If you also have an OFFSET IN, then it will show the path from the a&b inputs to r_a and r_b. The timing inside the DSP48 is "guaranteed by design" as long as the clock isn't too fast for the DSP48E in this configuration (where the M pipeline register is not used). Even if it is too fast, this will fail a "pulse width" check (saying that the DSP48 is running too fast), not a setup check.
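
The restructuring described above could look roughly like the following. This is only a sketch under stated assumptions: the module name multtop_sync is made up for illustration, r_a and r_b are the extra input registers suggested above, and whether the tools actually absorb these registers into the DSP48E1 slices depends on the synthesis settings. Timing constraints (PERIOD, OFFSET IN) belong in the constraints file and are not shown here.

```verilog
// Sketch only: the original multiplier rewritten with synchronous resets
// and an extra register stage on the inputs. Names other than clk, rst,
// a, b, c, r_c and rr_c are hypothetical.
module multtop_sync (
    input  wire               clk,
    input  wire               rst,   // synchronous, active-high reset
    input  wire signed [31:0] a,
    input  wire signed [31:0] b,
    output reg  signed [63:0] c
);

    reg signed [31:0] r_a;   // input register for a
    reg signed [31:0] r_b;   // input register for b
    reg signed [63:0] r_c;   // registered product
    reg signed [63:0] rr_c;  // extra pipeline stage

    // No reset in the sensitivity list: the reset is sampled on the
    // clock edge, which is the only reset style the DSP48E1 registers
    // support.
    always @(posedge clk) begin
        if (rst) begin
            r_a  <= 0;
            r_b  <= 0;
            r_c  <= 0;
            rr_c <= 0;
            c    <= 0;
        end else begin
            r_a  <= a;
            r_b  <= b;
            r_c  <= r_a * r_b;  // signed 32x32 -> 64-bit product
            rr_c <= r_c;
            c    <= rr_c;
        end
    end

endmodule
```

Compared with the original code, the output is delayed by one additional clock cycle because of the new input register stage.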

 

Avrum


0 Kudos
14 Replies
bwiec
Xilinx Employee
Xilinx Employee
10,376 Views
Registered: ‎08-02-2011

DSP48E1 Slices only have synchronous resets (not async)

www.xilinx.com
0 Kudos
15196604083
Observer
Observer
10,359 Views
Registered: ‎01-16-2013

Thank you very much for your reply. I have an additional question. In another project, after synthesis there is one path that contains a DSP48E1. The relevant fragment of the synthesis report is shown below.

 

Timing constraint: Default period analysis for Clock 'clk'
  Clock period: 2.850ns (frequency: 350.853MHz)
  Total number of paths / destination ports: 52461 / 2773
------------------------------------------------------------------------------------------------------
Delay:               2.850ns (Levels of Logic = 0)
  Source:            bx1/yi_11 (FF)
  Destination:       t_nextb/Maddsub_yiyix (DSP)
  Source Clock:      clk rising
  Destination Clock: clk rising

  Data Path: bx1/yi_11 to t_nextb/Maddsub_yiyix
                                             Gate       Net
    Cell:in->out      fanout   Delay     Delay      Logical Name (Net Name)
    ---------------------------------------- -------------------------------------------- ------------
     FDR:C->Q              5     0.232     0.298      bx1/yi_11 (bx1/yi_11)
     DSP48E1:A20              2.320                      t_nextb/Maddsub_yiyix
    --------------------------------------------------------------------------------------------------
    Total                               2.850ns (2.552ns logic, 0.298ns route)
                                                            (89.5% logic, 10.5% route)


Since the timing engine does not show the paths internal to the DSP48E block, what does the Gate Delay listed for DSP48E1:A20 mean? Thank you.

0 Kudos
avrumw
Guide
Guide
10,352 Views
Registered: ‎01-23-2009

In this case, you have a path that starts outside the DSP48 and ends at a FF inside the DSP48. This path is timed and reported. In my previous post what I meant is that the tools will not report paths entirely in the DSP48 cell.

 

In this case the DSP48E1:A20 shows the propagation delay through the DSP48 pin A20 to an internal FF in the DSP48 (I can't tell which one from this report). This is (effectively) the setup time requirement of this pin of the DSP48 cell.

 

Avrum

0 Kudos
15196604083
Observer
Observer
10,343 Views
Registered: ‎01-16-2013

Thank you for your reply. The attachment is the architecture of the DSP48E1 slice from the datasheet. So what you mean is that the delay through the 25x18 multiplier block will not be reported and is "guaranteed by design" as long as the clock isn't too fast for the DSP48E in this configuration. Is that right?

 

But I am confused about why the delay from pin A20 to the internal FF is so long, and why the 25x18 multiplier can run so fast, since it is generally accepted that a path containing a multiplication will be the critical path of an architecture.

Thank you again for your reply. I have learned so much from your reply. 

0 Kudos
15196604083
Observer
Observer
10,341 Views
Registered: ‎01-16-2013

Here is the attachment of the DSP48E1 slice, which I forgot in my last reply. Looking forward to your reply.

DSP48E.jpg
0 Kudos
avrumw
Guide
Guide
10,335 Views
Registered: ‎01-23-2009

Almost all timing paths in an FPGA start at one element (or outside the FPGA) and end in another element. For these paths, the tool enumerates all the cells the path traverses through and gives you detailed timing for each component. The ISE timing analyzer is slice based, so each component in a timing report will be the time through a slice (i.e. from the input of the slice through a LUT and to the output of the slice).

 

There are very few components that have complete paths within them. I can only think of a few:

  - the paths inside the IDDR and ODDR cells when the cell is in SAME_EDGE or SAME_EDGE_PIPELINED mode

  - the paths inside an ISERDES and OSERDES

  - the path inside the block RAM, particularly from the output latches to the output FF (in pipelined mode)

  - many paths within the DSP48 - all of these depend on whether the corresponding FFs are enabled or not

      - between the FFs in the dual A and B registers

      - from the A, B, and D FFs to the M FFs

      - from the M FFs to the P FFs

 

For these paths, the tool does not explicitly describe the paths; they are completely contained in a single slice, so they are not broken down. Instead of explicitly timing the cells, the tools know how fast the clock can be and still meet the internal timing of these paths. For example, if the slowest path is from the A FF to the M FF, and it is 3ns, then the tool knows that the cell can run at a clock period of 3ns or higher. So instead, it puts a pulse width check on the clock port of the DSP48 that requires the period to be at least 3ns.

 

For all other paths, it generates timing reports. This includes paths that start outside the DSP cell and end inside it, or vice versa.
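
To make the internal register stages listed above a bit more concrete, here is a minimal sketch of a 25x18 multiplier written with one register per stage. The module name mult_pipe and the register names r_a, r_b and r_m are hypothetical, and the mapping noted in the comments (A/B, M and P registers) is only what one would hope the synthesis tool infers; it is not guaranteed and depends on the tool and its settings.

```verilog
// Sketch only: three register stages around a signed 25x18 multiply.
// If all three stages are absorbed into one DSP48E1, the A/B->M and
// M->P paths are internal to the cell and would be covered by the
// clock period (pulse width) check rather than by setup path reports.
module mult_pipe (
    input  wire               clk,
    input  wire signed [24:0] a,
    input  wire signed [17:0] b,
    output reg  signed [42:0] p    // candidate for the P register
);

    reg signed [24:0] r_a;  // candidate for the A register
    reg signed [17:0] r_b;  // candidate for the B register
    reg signed [42:0] r_m;  // candidate for the M register

    always @(posedge clk) begin
        r_a <= a;
        r_b <= b;
        r_m <= r_a * r_b;   // signed 25x18 -> 43-bit product
        p   <= r_m;
    end

endmodule
```

With this structure, the paths one would still expect to see in a setup report are the ones that cross the cell boundary, for example from the a and b inputs to r_a and r_b (if an OFFSET IN constraint exists) and from p to whatever logic follows it.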

 

As for the relative speed of the multiplier vs. the other stuff, that is entirely the point of the DSP48 cell. Logic implemented in the FPGA fabric has inherent limitations due to the reprogrammability of the FPGA. Notably, the programmable interconnect between cells is "slow". Multipliers are relatively common, extremely complex combinatorial functions. If implemented in the fabric they would be VERY SLOW. As a result, Xilinx has included the DSP48 cell. This is a hard block (i.e. an ASIC block) that implements the multiplication. Since it doesn't use programmable logic for implementing the actual multiplication, it is much faster than logic implemented in the fabric (maybe by a factor of 10 or more).

 

Avrum

0 Kudos
15196604083
Observer
Observer
10,327 Views
Registered: ‎01-16-2013

Thank you for your reply. I don't understand the purpose of the Component Switching Limit Checks in the timing report. Do they have something to do with the pulse width check you mentioned?

Here's the fragment of the Component Switching Limit Checks in the timing report:

 

Component Switching Limit Checks: TS_clk = PERIOD TIMEGRP "clk" 4 ns HIGH 50%;
--------------------------------------------------------------------------------
Slack: 1.574ns (period - min period limit)
  Period: 4.000ns
  Min period limit: 2.426ns (412.201MHz) (Tdspper_AREG_PREG_MULT)
  Physical resource: Mmult_r_a[25]_r_b[18]_MuLt_6_OUT_submult_0/CLK
  Logical resource: Mmult_r_a[25]_r_b[18]_MuLt_6_OUT_submult_0/CLK
  Location pin: DSP48_X14Y81.CLK
  Clock network: clk_BUFGP
--------------------------------------------------------------------------------
Slack: 2.651ns (period - min period limit)
  Period: 4.000ns
  Min period limit: 1.349ns (741.290MHz) (Tbcper_I(Fmax))
  Physical resource: clk_BUFGP/BUFG/I0
  Logical resource: clk_BUFGP/BUFG/I0
  Location pin: BUFGCTRL_X0Y0.I0
  Clock network: clk_BUFGP/IBUFG
--------------------------------------------------------------------------------
Slack: 3.200ns (period - (min low pulse limit / (low pulse / period)))
  Period: 4.000ns
  Low pulse: 2.000ns
  Low pulse limit: 0.400ns (Tcl)
  Physical resource: r_b_17_7/CLK
  Logical resource: r_b_17_4/CK
  Location pin: SLICE_X129Y201.CLK
  Clock network: clk_BUFGP

0 Kudos
avrumw
Guide
Guide
7,474 Views
Registered: ‎01-23-2009

Yes, I meant component switching limit. You can see the requirement for the DSP48 in the first one - in this configuration, the DSP48 can run at 2.426ns, or 412.201MHz.

 

Avrum

0 Kudos
15196604083
Observer
Observer
7,468 Views
Registered: ‎01-16-2013

Thank you for your reply. But the Component Switching Limit Checks of another project of mine didn't show the path from AREG, BREG or DREG to MREG (the paths that contain the 25x18 multiplier cell). They only show the path from MREG to PREG and one other path.

 

Here's the RTL code:

 

module multtop(clk,
               rst,
               a,
               b,
               c
              );

input signed [24:0] a;
input signed [17:0] b;
input clk,rst;
output reg signed [42:0] c;
reg signed [42:0] r_c;
reg signed [24:0] r_a;
reg signed [17:0] r_b;

always@(posedge clk)begin
  if(rst)
    begin
      r_a<=0;
      r_b<=0;
    end
  else
    begin
      r_a<=a;
      r_b<=b;
    end
end

always@(posedge clk)begin
  if(rst)
    begin
      r_c<=0;
      c<=0;
    end
  else
    begin
      r_c<=r_a*r_b;
      c<=r_c;
    end
end

endmodule

 

Here's the fragment of timing report:

 

Component Switching Limit Checks: TS_clk = PERIOD TIMEGRP "clk" 4 ns HIGH 50%;
--------------------------------------------------------------------------------
Slack: 2.651ns (period - min period limit)
  Period: 4.000ns
  Min period limit: 1.349ns (741.290MHz) (Tbcper_I(Fmax))
  Physical resource: clk_BUFGP/BUFG/I0
  Logical resource: clk_BUFGP/BUFG/I0
  Location pin: BUFGCTRL_X0Y0.I0
  Clock network: clk_BUFGP/IBUFG
--------------------------------------------------------------------------------
Slack: 2.652ns (period - min period limit)
  Period: 4.000ns
  Min period limit: 1.348ns (741.840MHz) (Tdspper_MREG_PREG)
  Physical resource: Mmult_r_a[24]_r_b[17]_MuLt_6_OUT/CLK
  Logical resource: Mmult_r_a[24]_r_b[17]_MuLt_6_OUT/CLK
  Location pin: DSP48_X3Y83.CLK
  Clock network: clk_BUFGP
--------------------------------------------------------------------------------

 

 

0 Kudos
avrumw
Guide
Guide
7,458 Views
Registered: ‎01-23-2009

Basically, the tool does what it does. How it deals with the Component Switching Limit isn't exactly clearly described anywhere that I know of. What I suspect it does, is that for every cell it figures out what the limit is based on all internal paths (in a given configuration), and then uses that as the minimum requirement. In this case, it (correctly) identified that the longest used path in the DSP48 is the one through the multiplier, and therefore this sets the lower limit on the whole cell based on this.

 

I had hoped that speedprint would show you all of the Tdspper* parameters, but it doesn't appear to.

 

I think you pretty much have to accept that the tool does what it does. It may not be giving you explicit information through all paths of the DSP, but I am sure it is properly timing your design.

 

Avrum

 

 

0 Kudos
15196604083
Observer
Observer
7,444 Views
Registered: ‎01-16-2013

Thanks again for your help. I have gained a lot about fpga from you. Thank you.

0 Kudos
15196604083
Observer
Observer
7,409 Views
Registered: ‎01-16-2013

Hi avrumw.

   I have run into another problem, and I would like to ask for your help.

    I have a large project with a case statement in my source, and I found something strange after synthesis. If I keep the default branch in my case statement, the Maximum Frequency is 249.308MHz. But if I remove the default, the Maximum Frequency is 324.317MHz. As far as I know, the synthesizer will add an extra latch when it comes across a case statement without a default. How can this lead to an increase in the Maximum Frequency? Thank you.

My device is a Virtex-7 XC7VX485T, and the case implementation style option in the synthesis properties is set to none.

The source containing the case statement is shown below (since my project is a little large, I have not attached the whole project):

 

module tw512_stage1_part(d_in,
                                                    p1,
                                                    d1_out,
                                                    d2_out);
parameter wd=14;
parameter fd=13;

input signed [wd-1:0] d_in;
input [3:0] p1;
output signed [wd-1:0] d1_out,d2_out;

reg signed [24:0] d1_outx,d2_outx; //25_24
wire signed [16:0] d_in_p2;   //17_15
wire signed [15:0] d_in_m2;   //16_15
wire signed [17:0] d_in_p3;   //18_16
wire signed [16:0] d_in_m3;   //17_16
wire signed [17:0] d_in_m4;   //18_17


wire signed [23:0] add_1_d1;   //24_23
wire signed [24:0] add_1_d2;   //25_24

wire signed [21:0] add_2_d1_0; //22_21
wire signed [24:0] add_2_d1;   //25_24
wire signed [20:0] add_2_d2;   //21_20

wire signed [21:0] add_3_d1;   //22_21
wire signed [20:0] add_3_d2_0;   //21_20
wire signed [24:0] add_3_d2;   //25_24

wire signed [22:0] add_4_d1;   //23_22
wire signed [20:0] add_4_d2;   //21_20

wire signed [23:0] add_5_d1;   //24_23
wire signed [18:0] add_5_d2_0;   //19_18
wire signed [24:0] add_5_d2;   //25_24

wire signed [19:0] add_6_d1_0;   //20_19
wire signed [24:0] add_6_d1;   //25_24
wire signed [23:0] add_6_d2;   //24_23

wire signed [20:0] add_7_d1_0;   //21_20
wire signed [24:0] add_7_d1;   //25_24
wire signed [22:0] add_7_d2_0;   //23_22
wire signed [24:0] add_7_d2;   //25_24

wire signed [17:0] add_8_d1_0;   //18_17
wire signed [21:0] add_8_d1;   //22_21
wire signed [21:0] add_8_d2;   //22_21


assign d_in_p2={d_in[wd-1],d_in,{2{1'b0}}}+{{3{d_in[wd-1]}},d_in};
assign d_in_m2={d_in,{2{1'b0}}}-{{2{d_in[wd-1]}},d_in};
assign d_in_p3={d_in[wd-1],d_in,{3{1'b0}}}+{{4{d_in[wd-1]}},d_in};
assign d_in_m3={d_in,{3{1'b0}}}-{{3{d_in[wd-1]}},d_in};
assign d_in_m4={d_in,{4{1'b0}}}-{{4{d_in[wd-1]}},d_in};


assign  add_2_d1_0={d_in,{8{1'b0}}}-{{5{d_in_p2[16]}},d_in_p2};
assign  add_3_d2_0={{2{d_in[wd-1]}},d_in,{5{1'b0}}}+{{4{d_in_p2[16]}},d_in_p2};
assign  add_5_d2_0={d_in[wd-1],d_in,{4{1'b0}}}-{{5{d_in[wd-1]}},d_in};
assign  add_6_d1_0={d_in_m2,{4{1'b0}}}+{{3{d_in_p2[16]}},d_in_p2};
assign  add_7_d1_0={d_in_m2,{5{1'b0}}}+{{5{d_in_m2[15]}},d_in_m2};
assign  add_7_d2_0={d_in_p2,{6{1'b0}}}+{{6{d_in_p2[16]}},d_in_p2};
assign  add_8_d1_0={d_in_m2,{2{1'b0}}}-{{4{d_in[wd-1]}},d_in};

 


assign  add_1_d1={d_in,{10{1'b0}}}-{{7{d_in_p2[16]}},d_in_p2}; //csd code 1 0 0 0 0 0 0 0 -1 0 -1 0
assign  add_1_d2={{3{d_in_m2[15]}},d_in_m2,{6{1'b0}}}+{{7{d_in_p3[17]}},d_in_p3};//csd code 0 0 0 1 0 -1 0 0 1 0 0 1
assign  add_2_d1={add_2_d1_0,{3{1'b0}}}+{{11{d_in[wd-1]}},d_in};//csd code 1 0 0 0 0 0 -1 0 -1 0 0 1
assign  add_2_d2={{2{d_in_m2[15]}},d_in_m2,{3{1'b0}}}+{{7{d_in[wd-1]}},d_in};//csd code 0 0 1 0 -1 0 0 1 0 0 0 0
assign  add_3_d1={{5{d_in_p2[16]}},d_in_p2}+{d_in_m4,{4{1'b0}}}; //csd code 1 0 0 0 -1 0 1 0 1 0 0 0
assign  add_3_d2={add_3_d2_0,{4{1'b0}}}+{{9{d_in_m2[15]}},d_in_m2};//csd code 0 0 1 0 0 1 0 1 0 1 0 -1
assign  add_4_d1={d_in_m4,{5{1'b0}}}-{{6{d_in_m3[16]}},d_in_m3};//csd code 1 0 0 0 -1 0 -1 0 0 1 0 0
assign  add_4_d2={d_in_m2[15],d_in_m2,{4{1'b0}}}+{{7{d_in[wd-1]}},d_in};//csd code 0 1 0 -1 0 0 0 1 0 0 0 0
assign  add_5_d1={d_in_m3,{7{1'b0}}}+{{7{d_in_m3[16]}},d_in_m3};//csd code 1 0 0 -1 0 0 0 1 0 0 -1 0
assign  add_5_d2={add_5_d2_0,{6{1'b0}}}+{{8{d_in_p2[16]}},d_in_p2};//csd code 0 1 0 0 0 -1 0 0 0 1 0 1
assign  add_6_d1={add_6_d1_0,{5{1'b0}}}+{{8{d_in_m3[16]}},d_in_m3};//csd code 1 0 -1 0 1 0 1 0 1 0 0 -1
assign  add_6_d2={d_in_p3,{6{1'b0}}}-{{7{d_in_m3[16]}},d_in_m3};//csd code 0 1 0 0 1 0 0 -1 0 0 1 0
assign  add_7_d1={add_7_d1_0,{4{1'b0}}}-{{11{d_in[wd-1]}},d_in};//csd code 1 0 -1 0 0 1 0 -1 0 0 0 -1
assign  add_7_d2={add_7_d2_0,{2{1'b0}}}-{{11{d_in[wd-1]}},d_in};//csd code 0 1 0 1 0 0 0 1 0 1 0 -1
assign  add_8_d1={add_8_d1_0,{4{1'b0}}}+{{5{d_in_p2[16]}},d_in_p2};//csd code 1 0 -1 0 -1 0 1 0 1 0 0 0
assign  add_8_d2={add_8_d1_0,{4{1'b0}}}+{{5{d_in_p2[16]}},d_in_p2};    //csd code 1 0 -1 0 -1 0 1 0 1 0 0 0

 

always@*begin
  case(p1)
  4'b0000:
    begin
    d1_outx={d_in,{11{1'b0}}};
    d2_outx=0;
    end
  4'b0001:
    begin
    d1_outx={add_1_d1,1'b0};
    d2_outx=~add_1_d2+1;
    end
  4'b0010:
    begin
    d1_outx=add_2_d1;
    d2_outx=~{add_2_d2,{4{1'b0}}}+1;
    end
  4'b0011:
    begin
    d1_outx={add_3_d1,{3{1'b0}}};
    d2_outx=~add_3_d2+1;
    end
  4'b0100:
    begin
    d1_outx={add_4_d1,{2{1'b0}}};
    d2_outx=~{add_4_d2,{4{1'b0}}}+1;
    end
  4'b0101:
    begin
    d1_outx={add_5_d1,1'b0};
    d2_outx=~add_5_d2+1;
    end
  4'b0110:
    begin
    d1_outx=add_6_d1;
    d2_outx=~{add_6_d2,1'b0}+1;
    end
  4'b0111:
    begin
    d1_outx=add_7_d1;
    d2_outx=~add_7_d2+1;
    end
  4'b1000:
    begin
    d1_outx={add_8_d1,{3{1'b0}}};
    d2_outx=~{add_8_d2,{3{1'b0}}}+1;
    end
   default:
    begin
    d1_outx='bx;
    d2_outx='bx;
    end
  endcase
end

rns1 #(.W_x(25),.F_x(24),.W_y(wd),.F_y(fd)) rns1_1(.x_in(d1_outx),.y_out(d1_out));
rns1 #(.W_x(25),.F_x(24),.W_y(wd),.F_y(fd)) rns1_2(.x_in(d2_outx),.y_out(d2_out));

endmodule
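
For readers comparing the two variants, here is a stripped-down sketch of the structural difference between them. The modules case_no_default and case_x_default and their ports are hypothetical and much smaller than the design above. The sketch only shows what each coding style describes; it does not by itself explain the difference in reported Maximum Frequency.

```verilog
// Variant 1: incomplete case with no default. For sel = 2'b10 and
// 2'b11 nothing is assigned to y, so y has to keep its previous value.
// In a combinational always block this describes a latch.
module case_no_default (
    input  wire [1:0] sel,
    input  wire [7:0] d0,
    input  wire [7:0] d1,
    output reg  [7:0] y
);
    always @* begin
        case (sel)
            2'b00: y = d0;
            2'b01: y = d1;
        endcase
    end
endmodule

// Variant 2: the default branch assigns x. y is assigned on every
// path through the block, so the logic is purely combinational, and a
// synthesis tool is free to treat the x values as don't-cares.
module case_x_default (
    input  wire [1:0] sel,
    input  wire [7:0] d0,
    input  wire [7:0] d1,
    output reg  [7:0] y
);
    always @* begin
        case (sel)
            2'b00:   y = d0;
            2'b01:   y = d1;
            default: y = 8'bx;
        endcase
    end
endmodule
```

When comparing the two synthesis reports, it may be worth checking the synthesis log for latch inference warnings in the variant without the default, since the two netlists are then not structurally equivalent.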

 

And here's the fragment of the synthesis report with the default kept in the case statement:

 

Timing Summary:
---------------
Speed Grade: -3

   Minimum period: 4.011ns (Maximum Frequency: 249.308MHz)
   Minimum input arrival time before clock: 2.884ns
   Maximum output required time after clock: 0.511ns
   Maximum combinational path delay: No path found

Timing Details:
---------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default period analysis for Clock 'clk'
  Clock period: 4.011ns (frequency: 249.308MHz)
  Total number of paths / destination ports: 32098287 / 40874
-------------------------------------------------------------------------
Delay:               4.011ns (Levels of Logic = 20)
  Source:            path0/bf_ctrl/Mmult_n0083 (DSP)
  Destination:       path0/stage1_tw512/r_sg1_2_13_1 (FF)
  Source Clock:      clk rising
  Destination Clock: clk rising

  Data Path: path0/bf_ctrl/Mmult_n0083 to path0/stage1_tw512/r_sg1_2_13_1
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     DSP48E1:CLK->P0       4   0.324   0.293  path0/bf_ctrl/Mmult_n0083 (path0/p1_512<0>)
     BUF:I->O             22   0.053   0.375  path0/bf_ctrl/Mmult_n0083_98 (path0/bf_ctrl/Mmult_n0083_98)
     BUF:I->O             21   0.053   0.519  path0/bf_ctrl/Mmult_n0083_2 (path0/bf_ctrl/Mmult_n0083_2)
     LUT6:I3->O            5   0.043   0.561  path0/stage1_tw512/tw512_ctrl/p12<4>1 (path0/stage1_tw512/p1<1>)
     LUT6:I1->O            1   0.043   0.428  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/Mmux_d2_outx61 (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/Mmux_d2_outx6)
     LUT5:I2->O            1   0.043   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/Mmux_d2_outx63 (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/d2_outx<11>)
     MUXCY:S->O            1   0.230   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<11> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<11>)
     MUXCY:CI->O           1   0.013   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<12> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<12>)
     MUXCY:CI->O           1   0.013   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<13> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<13>)
     MUXCY:CI->O           1   0.013   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<14> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<14>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<15> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<15>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<16> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<16>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<17> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<17>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<18> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<18>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<19> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<19>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<20> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<20>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<21> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<21>)
     MUXCY:CI->O           1   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<22> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<22>)
     MUXCY:CI->O           0   0.012   0.000  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<23> (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_cy<23>)
     XORCY:CI->O           4   0.251   0.293  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_xor<24> (path0/stage1_tw512/sg1_2<13>)
     BUF:I->O              5   0.053   0.298  path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_xor<24>_1 (path0/stage1_tw512/tw512_stage1/tw512_stage1_part1/rns1_2/Madd_temp_round_xor<24>_1)
     FDR:D                    -0.001          path0/stage1_tw512/r_sg1_2_13_9
    ----------------------------------------
    Total                      4.011ns (1.243ns logic, 2.768ns route)
                                       (31.0% logic, 69.0% route)

0 Kudos