Left shift causes bit expansion in implementation

silverace99work

Adventurer

09-18-2019 11:12 AM

Registered: 03-04-2018

Hi, pretty green FPGA developer here. My environment is the following:

- Vivado 2018.2

- Virtex 7 board

I am designing a non-IEEE floating point adder. I have a piece of logic that has a left shift operation going into a DSP_48 adder. The code below is just a quick example, not the actual code:

    logic [12:0] num;
    logic [7:0]  shift;
    logic [25:0] adder_in1;
    logic [26:0] sum;
    ....
    adder_in1 <= num <<< shift;

    // pushed into DSP_48 using use_dsp48 attribute
    always @(posedge clk) begin
      if (!rstn)
        sum <= 0;
      else begin
        sum <= adder_in1 + adder_in2;
      end
    end

So my problem is this: the above left shift goes into a 26-bit register, adder_in1. However, the implementation schematic and conversations with my coworkers tell me that the shift causes the actual implemented size of adder_in1 to balloon to a much bigger size (in my case, 100+ bits). So in reality I end up with a 100-something-bit DSP_48 adder that cannot make timing.

How can I best deal with this? Is there a directive I can set to force implementation not to invisibly expand the bit width of my shift output?

Appreciate the help.

Accepted Solution

avrumw

Guide

09-18-2019 02:22 PM

Registered: 01-23-2009

As coded, you have asked for a one-clock-cycle addition. Regardless of how you create it (via inference, instantiation, or the wizard), if you really have only one clock cycle for the addition, you are going to get the same result; there are only two ways to implement this:

- 3 cascaded DSP48s, or
- a 104-bit carry chain implemented in the fabric (using the fast carry logic)

If the tool is choosing the DSP48s, that is probably the slower of the two options.
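As a rough sanity check on the "3 cascaded DSP48s" figure, assuming the DSP48E1 slices used in Virtex-7, whose ALU and P output are 48 bits wide:

$$
N_{\text{DSP48}} = \left\lceil \frac{104}{48} \right\rceil = \lceil 2.17 \rceil = 3
$$

Two slices cover only 96 bits, so a third is needed, and with no registers in between, the carry has to ripple through all three cascaded slices within the single clock cycle.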

If this doesn't meet your timing requirement, then you have to change the **architecture**. One way to do this is to pipeline the adder; doing this, of course, changes the design: you need to accommodate the extra latency in the rest of your design.
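A minimal sketch of what pipelining the adder can look like when it is done by hand rather than left to the tool. It assumes SystemVerilog and the 104-bit width from this discussion; the module name `pipelined_add` and all its signals are hypothetical. The carry chain is split into two halves with a register on the carry between them, so each clock only has to cover about half of the chain, at the cost of one extra cycle of latency compared with the single-cycle version. Whether Vivado maps each half to fabric carry logic or to DSP48s, and whether it then meets your clock, depends on the part and the constraints.

```systemverilog
// Hypothetical two-stage pipelined adder: the carry chain is cut in half
// and the carry between the halves is registered.
module pipelined_add #(
  parameter int W = 104            // ASSUMPTION: operand width from the discussion
) (
  input  logic         clk,
  input  logic         rstn,       // active-low synchronous reset, as in the thread
  input  logic [W-1:0] a,
  input  logic [W-1:0] b,
  output logic [W:0]   sum         // a + b including carry-out, valid 2 clocks later
);
  localparam int L = W / 2;        // width of the lower half
  localparam int H = W - L;        // width of the upper half

  // Stage 1 registers
  logic [L-1:0] lo_sum_q;          // lower-half sum
  logic         lo_carry_q;        // carry out of the lower half
  logic [H-1:0] a_hi_q, b_hi_q;    // upper operands, delayed to line up with the carry

  always_ff @(posedge clk) begin
    if (!rstn) begin
      lo_sum_q   <= '0;
      lo_carry_q <= 1'b0;
      a_hi_q     <= '0;
      b_hi_q     <= '0;
    end else begin
      // (L+1)-bit add: the MSB of the result is the carry into the upper half
      {lo_carry_q, lo_sum_q} <= {1'b0, a[L-1:0]} + {1'b0, b[L-1:0]};
      a_hi_q <= a[W-1:L];
      b_hi_q <= b[W-1:L];
    end
  end

  // Stage 2: upper half plus the registered carry; the lower half passes through.
  // Operands inside a concatenation are self-determined, so the upper add is
  // H+1 bits wide and keeps its own carry-out.
  always_ff @(posedge clk) begin
    if (!rstn)
      sum <= '0;
    else
      sum <= {({1'b0, a_hi_q} + {1'b0, b_hi_q} + lo_carry_q), lo_sum_q};
  end
endmodule
```

Splitting further follows the same pattern: each extra stage registers one more carry and delays the operand bits above it by one more cycle.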

If you can add pipeline stages, then you can certainly instantiate the DSP48s directly or use the wizard, but you may still be able to infer the addition. Try changing your code to simply add 2 pipeline registers after the addition:

    always @(posedge clk) begin
      if (!rstn) begin
        sum    <= 0;
        sum_t1 <= 0;
        sum_t2 <= 0;
      end else begin
        sum_t1 <= adder_in1 + adder_in2;
        sum_t2 <= sum_t1;
        sum    <= sum_t2;
      end
    end
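For trying the inference route, a self-contained version of the snippet above might look like the following. This is a sketch under assumptions: the module name `add_then_pipe`, the port list, the 104-bit width, and placing the `use_dsp48` attribute (the one mentioned in the question) on the module are illustrative choices, not part of the original reply. Whether Vivado actually pulls `sum_t2` and `sum` back into the DSP48 cascade is exactly what is uncertain, so check the post-synthesis utilization and timing reports.

```systemverilog
// Hypothetical wrapper: one addition followed by two plain pipeline
// registers, leaving synthesis free to move them into the DSP48 cascade.
// ASSUMPTION: use_dsp48 as named in the question; check UG901 for the
// attribute name and placement supported by your Vivado release.
(* use_dsp48 = "yes" *)
module add_then_pipe #(
  parameter int W = 104                 // ASSUMPTION: width taken from the discussion
) (
  input  logic         clk,
  input  logic         rstn,            // active-low synchronous reset
  input  logic [W-1:0] adder_in1,
  input  logic [W-1:0] adder_in2,
  output logic [W:0]   sum              // valid 3 clocks after the inputs
);
  logic [W:0] sum_t1, sum_t2;

  always_ff @(posedge clk) begin
    if (!rstn) begin
      sum    <= '0;
      sum_t1 <= '0;
      sum_t2 <= '0;
    end else begin
      sum_t1 <= adder_in1 + adder_in2;  // evaluated at W+1 bits, the width of sum_t1
      sum_t2 <= sum_t1;                 // extra pipeline stage 1
      sum    <= sum_t2;                 // extra pipeline stage 2
    end
  end
endmodule
```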

The tool **may** be able to move these pipeline flip-flops into the cascade paths; I am not sure, since I have never tried this. I do know that it can do this for the M register in the DSP48 (it can convert a multiply followed by an additional pipe stage into a DSP that uses the M register in the middle of the multiply), so it is worth a try...

Avrum

4 Replies

avrumw

Guide

09-18-2019 01:21 PM

Registered: 01-23-2009

The rules of Verilog are clear here, and the size of adder_in1 is 26 - it cannot be more. How have you come to the conclusion that this is not the case (and that adder_in1 is larger than 26 bits)?

Furthermore, even if adder_in1 were larger than 26 bits, sum is only 27 bits; any addition that results in something larger than this will simply discard the upper bits (and the associated portion of the adder).
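A minimal sketch of the sizing rules described above, assuming a SystemVerilog-2012 simulator. The module name `shift_sizing_demo` is hypothetical, the logic is combinational rather than registered for clarity, and `adder_in2` is assumed to be 26 bits wide because the question does not declare it. In an assignment, the shift is evaluated at the larger of the left-hand-side width and the width of `num` (here 26 bits); the shift amount is self-determined and does not widen the result, so bits shifted past bit 25 are simply lost.

```systemverilog
// Hypothetical module illustrating Verilog expression sizing for the
// shift-then-add pattern from the question.
module shift_sizing_demo (
  input  logic [12:0] num,        // 13-bit operand, as in the question
  input  logic [7:0]  shift,      // shift amount: self-determined, does not set the result width
  input  logic [25:0] adder_in2,  // ASSUMPTION: 26 bits wide for this sketch
  output logic [25:0] adder_in1,  // shift result is truncated to 26 bits
  output logic [26:0] sum         // 26 + 26 bits -> 27-bit sum, carry-out kept
);
  // num is zero-extended to 26 bits and then shifted; anything pushed past
  // bit 25 is discarded, so the result can never be wider than 26 bits.
  assign adder_in1 = num <<< shift;

  // The addition is evaluated at 27 bits (the width of sum), so the
  // adder itself is only 27 bits wide.
  assign sum = adder_in1 + adder_in2;
endmodule
```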

If what you have said is true (adder_in1 is larger than 26 bits, and the resulting addition is also larger than 27 bits), then this is a synthesis bug. While synthesis bugs can happen from time to time, they are pretty rare...

So, again, what have you observed that leads you to come to your conclusion (that the result of the shift is over 100 bits wide)?

Avrum

silverace99work

Adventurer

09-18-2019 02:01 PM

Registered: 03-04-2018

Perhaps I should open a separate post for this, but what would you recommend I do in this case? Would I be better off manually instantiating the DSP_48s (or using the wizard IP) to create the cascaded adder and adding pipeline flops to make timing, or is there a way to infer such a thing?

silverace99work

Adventurer

09-19-2019 09:17 AM

Registered: 03-04-2018

Thanks @avrumw. FYI, I attempted to infer the cascaded adder with pipeline registers as described, but it looks like it did not work. I will have to implement the adder at a lower level of coding.