Observer
10,379 Views
Registered: ‎07-04-2011

## Virtex6, 32-bit barrel shifter

Is there an optimal Virtex-6 implementation of a 32-bit barrel shifter? I mean source code that makes good use of the FPGA hardware resources.

6 Replies
Professor
10,371 Views
Registered: ‎08-14-2007

## Re: Virtex6, 32-bit barrel shifter

I figured that XST synthesis is pretty good so I coded up a simple barrel shifter as:

    `timescale 1ns / 1ps

    module barrel32
    (
      input wire  [31:0] Din,
      input wire   [4:0] Sft,
      output reg  [31:0] Dout
    );

    always @*
      Dout = (Din << Sft) | (Din >> (32 - Sft));

    endmodule

Strangely, XST did not recognize this as a barrel shifter but instead coded it as two shifters and a subtractor, using 121 LUTs for logic.

Recoding as:

    always @*
      case (Sft)
         0:  Dout = Din;
         1:  Dout = {Din[30:0],Din[31]};
         2:  Dout = {Din[29:0],Din[31:30]};
         3:  Dout = {Din[28:0],Din[31:29]};
         4:  Dout = {Din[27:0],Din[31:28]};
         5:  Dout = {Din[26:0],Din[31:27]};
         6:  Dout = {Din[25:0],Din[31:26]};
         7:  Dout = {Din[24:0],Din[31:25]};
         8:  Dout = {Din[23:0],Din[31:24]};
         9:  Dout = {Din[22:0],Din[31:23]};
        10:  Dout = {Din[21:0],Din[31:22]};
        11:  Dout = {Din[20:0],Din[31:21]};
        12:  Dout = {Din[19:0],Din[31:20]};
        13:  Dout = {Din[18:0],Din[31:19]};
        14:  Dout = {Din[17:0],Din[31:18]};
        15:  Dout = {Din[16:0],Din[31:17]};
        16:  Dout = {Din[15:0],Din[31:16]};
        17:  Dout = {Din[14:0],Din[31:15]};
        18:  Dout = {Din[13:0],Din[31:14]};
        19:  Dout = {Din[12:0],Din[31:13]};
        20:  Dout = {Din[11:0],Din[31:12]};
        21:  Dout = {Din[10:0],Din[31:11]};
        22:  Dout = {Din[9:0],Din[31:10]};
        23:  Dout = {Din[8:0],Din[31:9]};
        24:  Dout = {Din[7:0],Din[31:8]};
        25:  Dout = {Din[6:0],Din[31:7]};
        26:  Dout = {Din[5:0],Din[31:6]};
        27:  Dout = {Din[4:0],Din[31:5]};
        28:  Dout = {Din[3:0],Din[31:4]};
        29:  Dout = {Din[2:0],Din[31:3]};
        30:  Dout = {Din[1:0],Din[31:2]};
        31:  Dout = {Din[0],Din[31:1]};
      endcase

XST recognises this as a 32-bit 32:1 mux and uses only 96 LUTs for logic.
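
For comparison, here is a minimal sketch of the same rotate-left written as one indexed part-select on a doubled vector, followed by a small self-checking testbench. The names `barrel32_ps`, `DinDin` and `base` are made up for this example. It has not been run through XST, so whether it infers the same 32:1 mux or the same LUT count is an open question.

```verilog
`timescale 1ns / 1ps

// Sketch only: same rotate-left function as the case statement above.
// Module and signal names are hypothetical, not from the original post.
module barrel32_ps
(
  input  wire [31:0] Din,
  input  wire  [4:0] Sft,
  output wire [31:0] Dout
);

  wire [63:0] DinDin = {Din, Din};            // doubled input so the select wraps around
  wire  [5:0] base   = 6'd32 - {1'b0, Sft};   // ranges from 32 (Sft = 0) down to 1 (Sft = 31)

  // Sft = 0 picks DinDin[63:32] = Din
  // Sft = 1 picks DinDin[62:31] = {Din[30:0], Din[31]}
  assign Dout = DinDin[base +: 32];

endmodule

// Testbench: compares against the shift-and-OR expression for all 32 shift values.
module barrel32_ps_tb;

  reg  [31:0] Din;
  reg   [4:0] Sft;
  wire [31:0] Dout;
  integer     s, errors;

  barrel32_ps dut (.Din(Din), .Sft(Sft), .Dout(Dout));

  initial begin
    errors = 0;
    Din    = 32'h8000_0001;                   // one fixed pattern; add more as needed
    for (s = 0; s < 32; s = s + 1) begin
      Sft = s;
      #1;
      if (Dout !== ((Din << Sft) | (Din >> (32 - Sft))))
        errors = errors + 1;
    end
    if (errors == 0) $display("PASS");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end

endmodule
```

The part-select form is shorter to maintain than 32 case items, but it gives the tool less of a hint about the intended mux structure, so the resource report is worth checking.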

I haven't seen any other recommendations for coding barrel shifters. Maybe someone with a better synthesis tool like Synplify Pro could chime in with results for the above code snippets.

-- Gabor

Instructor
10,363 Views
Registered: ‎07-21-2009

## Re: Virtex6, 32-bit barrel shifter

> XST recognises this as a 32-bit 32:1 mux and uses only 96 LUTs for logic.

If looking for minimum LUT count, you can do a little bit better.

Level 1:  16 LUTs -- LUT6_2 used for 2-bit 2:1 MUX

Level 2:  32 LUTs -- LUT6 used for 1-bit 4:1 MUX

Level 3:  32 LUTs -- LUT6 used for 1-bit 4:1 MUX

I count 80 LUTs, assuming Spartan-6 or higher LUT capability.

Here's an example, confirmed 80 LUTs.

    module barrel32(Data_IN, Sel, Data_OUT);

      input   [31:0]    Data_IN;
      input   [4:0]     Sel;
      output  [31:0]    Data_OUT;

      reg     [31:0]    Lvl1, Lvl2, Lvl3;   // not really registers, Verilog compliance
      wire    [63:0]    Stage1, Stage2;     // wires to simplify FOR loops
      integer           i, j;               // loop variables

      // Blocking assignments throughout: these always blocks are purely combinational.
      // All three levels rotate right (output bit i is taken from bit i+n).
      always @(*) Lvl1 = Sel[4] ? {Data_IN[15:0], Data_IN[31:16]} : Data_IN; // rotate {0 | 16} bits

      assign Stage1 = {Lvl1, Lvl1};    // wraparound a la Verilog

      always @(*)    // rotate {0 | 4 | 8 | 12} bits
        case (Sel[3:2])
          2'b00:  Lvl2 = Stage1[31:0];       // rotate by 0
          2'b01:  for (i=0; i<=31; i=i+1)  Lvl2[i] = Stage1[i+4];  // rotate by 4
          2'b10:  for (i=0; i<=31; i=i+1)  Lvl2[i] = Stage1[i+8];  // rotate by 8
          2'b11:  for (i=0; i<=31; i=i+1)  Lvl2[i] = Stage1[i+12]; // rotate by 12
        endcase

      assign Stage2 = {Lvl2, Lvl2};    // wraparound a la Verilog

      always @(*)    // rotate {0 | 1 | 2 | 3} bits
        case (Sel[1:0])
          2'b00:  Lvl3 = Stage2[31:0];       // rotate by 0
          2'b01:  for (j=0; j<=31; j=j+1)  Lvl3[j] = Stage2[j+1];  // rotate by 1
          2'b10:  for (j=0; j<=31; j=j+1)  Lvl3[j] = Stage2[j+2];  // rotate by 2
          2'b11:  for (j=0; j<=31; j=j+1)  Lvl3[j] = Stage2[j+3];  // rotate by 3
        endcase

      assign Data_OUT = Lvl3;

    endmodule

-- Bob Elkind

SIGNATURE:

Summary:
1. Read the manual or user guide. Have you read the manual? Can you find the manual?
2. Search the forums (and search the web) for similar topics.
3. Do not post the same question on multiple forums.
4. Do not post a new topic or question on someone else's thread, start a new thread!
5. Students: Copying code is not the same as learning to design.
6. "It does not work" is not a question which can be answered. Provide useful details (with webpage, datasheet links, please).
7. You are not charged extra fees for comments in your code.
8. I am not paid for forum posts. If I write a good post, then I have been good for nothing.
Teacher
10,349 Views
Registered: ‎08-14-2007

## Re: Virtex6, 32-bit barrel shifter

Hi,

I recently browsed the user guide for the DSP48 Macros.

If I remember correctly there was a barrel shifter application mentioned.

This would cost only one or two DSP48 Macros, depending on the word length.
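
A common way to build a shifter from a multiplier is to multiply by the one-hot value 2^Sft. The sketch below assumes that is the technique the user guide has in mind, which this thread does not confirm. `barrel32_mul`, `onehot` and `prod` are hypothetical names, and whether synthesis actually maps the multiply onto DSP48E1 slices depends on the tool and its settings. Since the DSP48E1 multiplier is 25 x 18 bits, a 32-bit word would not fit in a single slice.

```verilog
// Sketch only: rotate-left by multiplying with a one-hot value.
// Names are made up for this example; DSP48 inference is not guaranteed.
module barrel32_mul
(
  input  wire [31:0] Din,
  input  wire  [4:0] Sft,
  output wire [31:0] Dout
);

  wire [31:0] onehot = 32'd1 << Sft;                     // 2**Sft
  wire [63:0] prod   = {32'd0, Din} * {32'd0, onehot};   // Din * 2**Sft as a full 64-bit product

  // Bits shifted out of the low word land in the high word; OR them back in.
  assign Dout = prod[31:0] | prod[63:32];

endmodule
```

Dropping the OR with `prod[63:32]` turns the rotate into a plain logical left shift.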

Have a nice synthesis

Eilert

Professor
10,341 Views
Registered: ‎08-14-2007

## Re: Virtex6, 32-bit barrel shifter

> If looking for minimum LUT count, you can do a little bit better.

I understood that 96 LUTs is not minimum. I was hoping that XST could do better without being spoon-fed the algorithm. It's quite possible that 80 LUTs is not minimal either. I'm generally amazed when I see the reductions possible when you spend some time with these problems. I'd be willing to bet that Synplify Pro would do at least as well as your 80-LUT solution without going to the bother of explicitly coding it. By the way, the 96-LUT solution also uses 32 MUXF7's and although it has 3 levels of logic, only two are LUTs, so it may be faster than your 80-LUT solution. Given the room for improvement in the synthesis tools, I'm also surprised that there is no CoreGen module for this.

-- Gabor

Scholar
10,331 Views
Registered: ‎09-16-2009

## Re: Virtex6, 32-bit barrel shifter

Here's how I do it - very straightforward - 96 LUTs. Not 80, but I think it is clearer than the 'spoon-fed' option.

    module barrel
    #(
      parameter
      WORD_SIZE = 32
    )
    (
      input wire  [ WORD_SIZE - 1 : 0 ] d_i,
      input wire  [ clogb2( WORD_SIZE ) - 1 : 0 ] shift_i,
      output wire [ WORD_SIZE - 1 : 0 ] d_o
    );
    `include "clogb2.v"

    assign d_o = ( { d_i, d_i } << shift_i ) >> WORD_SIZE;

    endmodule
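
The included file `clogb2.v` was not posted. A minimal sketch of what it might contain, assuming it is the usual ceiling-log2 constant function (this follows the well-known example from the Verilog-2001 standard, not the poster's actual file):

```verilog
// Hypothetical contents of clogb2.v; the real file is not shown in the thread.
// Returns the number of bits needed to index `value` items, e.g. 5 for 32.
function integer clogb2;
  input [31:0] value;
  begin
    value = value - 1;
    for (clogb2 = 0; value > 0; clogb2 = clogb2 + 1)
      value = value >> 1;
  end
endfunction
```

With `WORD_SIZE = 32` this makes `shift_i` five bits wide, matching the other examples in the thread.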

Instructor
10,329 Views
Registered: ‎07-21-2009

## Mine is smaller than yours

> I understood that 96 LUTs is not minimum.  I was hoping that XST could do better without being spoon-fed the algorithm.

My 'spoon-feeding' wasn't much worse than your 'spoon-feeding'.

> I'd be willing to bet that Synplify Pro would do at least as well as your 80-LUT solution without going to the bother of explicitly coding it.

As well, yes.  Better (fewer LUTs)? No, unless using eilert's solution.  I wouldn't characterise either the 96-LUT design or the 80-LUT design as 'explicit coding', in the sense that instantiating primitives is the model of explicit coding.

> By the way, the 96-LUT solution also uses 32 MUXF7's and although it has 3 levels of logic, only two are LUTs, so it may be faster than your 80-LUT solution.

My example is based on Spartan-6 (which is MUXF7-challenged) rather than Virtex-6.  Using Virtex-6, the number of LUTs doesn't change, but using MUXF7s allows a 2-stage (rather than 3-stage) design: a level of 8:1 and a level of 4:1.  MUXF7 doesn't help the LUT count, but it does help the delay stage count.
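
A minimal sketch of that two-level arrangement, an 8:1 level followed by a 4:1 level, rotating right like the 80-LUT example. The module name `barrel32_2lvl` and the signal `base1` are made up for this example, and the mapping onto LUT6 plus MUXF7 is left to the synthesis tool. Neither the mapping nor the resulting LUT count has been verified here.

```verilog
// Sketch only: two mux levels (8:1 then 4:1), rotate right by Sel.
// Names are hypothetical; resource usage has not been checked in any tool.
module barrel32_2lvl
(
  input  wire [31:0] Data_IN,
  input  wire  [4:0] Sel,
  output wire [31:0] Data_OUT
);

  wire [63:0] Stage1 = {Data_IN, Data_IN};   // wraparound copy of the input
  wire  [4:0] base1  = {Sel[4:2], 2'b00};    // 0, 4, 8, ... 28

  // Level 1: 8:1 mux per bit, rotate right by a multiple of 4
  wire [31:0] Lvl1   = Stage1[base1 +: 32];

  wire [63:0] Stage2 = {Lvl1, Lvl1};         // wraparound copy of level 1

  // Level 2: 4:1 mux per bit, rotate right by 0..3
  assign Data_OUT = Stage2[Sel[1:0] +: 32];

endmodule
```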

-- Bob Elkind
