Visitor jaga_nitc
Registered: 05-31-2016

Timing failure due to too many CARRY4 components in VIVADO 2015.4.


I implemented a noise-removal system using spectral subtraction in VHDL. I am using a Kintex-7 evaluation board with Vivado 2015.4. The FFT operates on 16-bit samples with a frame length of 64, so I have created four 4096-bit variables, each storing 64 values of 64 bits. Some other variables are 2048 bits wide and store 64 values of 32 bits each (the FFT output).

I have packaged the noise removal system as an IP called Speech_Enhancement.

 

The system is working fine during simulation. The operating frequency of the IP is 50MHz.

.bit file generation is successful, but my design failed to meet timing constraints.

 

I checked the paths that violate timing and found that a very large number of CARRY4 components are created along the path, which causes a huge path delay. I don't understand the reason for so many CARRY4 components.

 

I have attached the related report files here. Please go through them and suggest some solutions.

 

Attachments: forum_1.png, forum_2.png


Accepted Solutions
Instructor
Registered: 08-14-2007

Re: Timing failure due to too many CARRY4 components in VIVADO 2015.4.

Slack (VIOLATED) :        -141.698ns  (required time - arrival time)
  Source:                 mb_subsystem_i/speech_enhancement_0/U0/speech_enhancement_ip_0/sample_index_reg[0]_rep__5_replica_1/C
                            (rising edge-triggered cell FDRE clocked by mmcm_clkout0  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            mb_subsystem_i/speech_enhancement_0/U0/speech_enhancement_ip_0/transfer_func_sq_reg[3968]/D
                            (rising edge-triggered cell FDRE clocked by mmcm_clkout0  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             mmcm_clkout0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            20.000ns  (mmcm_clkout0 rise@20.000ns - mmcm_clkout0 rise@0.000ns)
  Data Path Delay:        161.413ns  (logic 102.088ns (63.246%)  route 59.323ns (36.752%))
  Logic Levels:           1441  (CARRY4=1373 LUT2=7 LUT3=58 LUT4=1 LUT5=1 RAMS64E=1)
  Clock Path Skew:        -0.225ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    3.432ns = ( 23.432 - 20.000 )
    Source Clock Delay      (SCD):    3.914ns
    Clock Pessimism Removal (CPR):    0.256ns
  Clock Uncertainty:      0.094ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.174ns
    Phase Error              (PE):    0.000ns

 

The number of logic levels indicates that you're trying to do too much in one clock period.  Most likely you have multiple levels of arithmetic, like Y = A + B + C + D - (E + F + G + H).  In order to run at 50 MHz, you'd need to pipeline this operation over several clock periods.  Alternatively, if you don't really need a new result every cycle (input data doesn't change at the clock rate), you could set a multicycle path.
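
As a rough sketch of what pipelining such an expression might look like, the sum can be split into a three-stage adder tree so that each clock period contains only one level of addition. The entity name, port names and the 32-bit default width below are hypothetical and are not taken from the original design:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: Y = (A + B + C + D) - (E + F + G + H),
-- split into three register stages. Latency is 3 clocks, but a new
-- set of operands can still be accepted on every clock.
entity pipelined_sum is
  generic (W : positive := 32);
  port (
    clk : in  std_logic;
    a, b, c, d, e, f, g, h : in signed(W-1 downto 0);
    y   : out signed(W+2 downto 0)
  );
end entity pipelined_sum;

architecture rtl of pipelined_sum is
  signal ab, cd, ef, gh : signed(W downto 0);     -- stage 1 results
  signal abcd, efgh     : signed(W+1 downto 0);   -- stage 2 results
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: four independent W-bit additions in parallel.
      ab <= resize(a, W+1) + resize(b, W+1);
      cd <= resize(c, W+1) + resize(d, W+1);
      ef <= resize(e, W+1) + resize(f, W+1);
      gh <= resize(g, W+1) + resize(h, W+1);
      -- Stage 2: combine the partial sums from the previous clock.
      abcd <= resize(ab, W+2) + resize(cd, W+2);
      efgh <= resize(ef, W+2) + resize(gh, W+2);
      -- Stage 3: final subtraction.
      y <= resize(abcd, W+3) - resize(efgh, W+3);
    end if;
  end process;
end architecture rtl;
```

Each register-to-register path here contains a single carry chain of roughly W/4 CARRY4 cells instead of several chains in series. Any control or valid signals that travel with the data would need to be delayed by the same three clocks.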

-- Gabor
Visitor jaga_nitc
Registered: 05-31-2016

Re: Timing failure due to too many CARRY4 components in VIVADO 2015.4.

Dear Gabor,

 

Thank you very much for the quick reply.

 

Your guess about using multiple levels of arithmetic in one clock cycle seems to be true.

I will try to distribute the operations to different clock cycles and get back to you...

 

I don't understand why 1373 CARRY4 elements were created. If possible, could you please clarify that?

 

Thanks again for the help,

Jaga

Historian
Registered: 01-23-2009

Re: Timing failure due to too many CARRY4 components in VIVADO 2015.4.

Each CARRY4 (as its name implies) is responsible for the propagation of the carry for 4 bits during an addition/subtraction operation.

 

So, if you add two numbers that are each 1024 bits wide, you will end up with 256 CARRY4 elements. By definition (since this is carry propagation), these will be in series, and hence will contribute 256 CARRY4 elements to the critical path.

 

Having 1373 carry elements means you are trying to do something on the order of 5500 bits' worth of addition/subtraction in one clock period. Whether that is 10 cascaded additions with operands of 550 bits each, or 550 cascaded additions with operands of 10 bits each (or any combination), is what you need to determine and re-architect - this is simply too much logic to do in one clock cycle; it needs to be pipelined.
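
As a quick check of the arithmetic behind these figures, using the numbers from the timing report earlier in the thread. The stage count is only a rough lower bound: it assumes the path delay could be split evenly and ignores register setup and clock-to-output overhead.

```latex
\left\lceil \frac{1024}{4} \right\rceil = 256 \ \text{CARRY4 cells},
\qquad
1373 \times 4 = 5492 \approx 5500 \ \text{bits},
\qquad
\left\lceil \frac{161.413\ \text{ns}}{20\ \text{ns}} \right\rceil = 9 \ \text{pipeline stages}
```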

 

Remember - carry chains will be used for any addition-based operation. Clearly this includes addition and subtraction, but it also includes numerical comparison (<, <=, >, >=), since these operations are implemented using subtraction (unless one of the operands being compared is constant).

 

Avrum

Visitor jaga_nitc
Registered: 05-31-2016

Re: Timing failure due to too many CARRY4 components in VIVADO 2015.4.

Thank you Avrum for that clarification.

 

I think the operation responsible for the timing failure is a 64-bit division.

The statement looks something like this:

      transfer_func_sq(4095 downto 4032) := RESIZE(temp_tr_fun/sp_pow_spec(63 downto 0), 64);

The variables used here are of type unsigned, and I am using the ieee.numeric_std package.

 

Please suggest an efficient way of achieving 64-bit division. I am using a Kintex-7 evaluation board with a 50 MHz clock, so if possible I want to complete the division within 1 or 2 clock cycles. The utilization report suggests that I still have around 50% of the resources left to use.

 

Thanks,

Jaga

Instructor
Registered: 08-14-2007

Re: Timing failure due to too many CARRY4 components in VIVADO 2015.4.

Division typically requires a much longer pipeline than one or two cycles. It's certainly possible to have a divider that accepts a new value every clock cycle or two; however, the latency through the divider will be on the order of tens of cycles, depending on the length of the operands. If you are dividing by a constant, or by a number that doesn't change often, you can get a much lower latency by calculating the inverse of the divisor and then multiplying instead.
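
To illustrate the trade-off, here is a minimal sketch of a multi-cycle unsigned divider that uses the restoring (shift-and-subtract) method and produces one quotient bit per clock. It is not a pipelined divider: it accepts a new operand pair only after the previous division has finished. The entity name, ports and start/done handshake are hypothetical, and the divisor is assumed to be non-zero.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical multi-cycle unsigned divider (restoring, one quotient
-- bit per clock). A W-bit divide finishes W clocks after 'start'.
-- Assumes W >= 2 and a non-zero divisor.
entity seq_divider is
  generic (W : positive := 64);
  port (
    clk      : in  std_logic;
    start    : in  std_logic;               -- pulse for one clock to begin
    dividend : in  unsigned(W-1 downto 0);
    divisor  : in  unsigned(W-1 downto 0);
    quotient : out unsigned(W-1 downto 0);
    done     : out std_logic                -- high for one clock when valid
  );
end entity seq_divider;

architecture rtl of seq_divider is
  signal rem_r : unsigned(W downto 0)   := (others => '0'); -- partial remainder
  signal quo_r : unsigned(W-1 downto 0) := (others => '0'); -- dividend in, quotient out
  signal div_r : unsigned(W-1 downto 0) := (others => '0'); -- latched divisor
  signal count : integer range 0 to W   := 0;
  signal busy  : std_logic := '0';
begin
  process (clk)
    variable shifted : unsigned(W downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' and busy = '0' then
        -- Latch the operands and start a new division.
        rem_r <= (others => '0');
        quo_r <= dividend;
        div_r <= divisor;
        count <= W;
        busy  <= '1';
      elsif busy = '1' then
        -- Shift the next dividend bit (MSB first) into the remainder.
        shifted := rem_r(W-1 downto 0) & quo_r(W-1);
        if shifted >= ('0' & div_r) then
          rem_r <= shifted - ('0' & div_r);
          quo_r <= quo_r(W-2 downto 0) & '1';
        else
          rem_r <= shifted;
          quo_r <= quo_r(W-2 downto 0) & '0';
        end if;
        -- The last of the W iterations runs when count = 1.
        if count = 1 then
          busy <= '0';
          done <= '1';
        end if;
        count <= count - 1;
      end if;
    end if;
  end process;

  quotient <= quo_r;
end architecture rtl;
```

With W = 64 this takes 64 clocks per result (about 1.3 us at 50 MHz), but each clock period contains only one compare/subtract of W+1 bits rather than the whole division. Whether that throughput is acceptable depends on how often a new quotient is actually needed.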

-- Gabor