UPGRADE YOUR BROWSER

We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Adventurer
Adventurer
475 Views
Registered: ‎11-29-2015

What is the most space-efficient count_ones function?

Given that the requirement is to count ones in one clock tick, for a std_logic_vector of any size, what function will use the least amount of LUTS?

I just know of the method below, but I'm reading that it performs poorly in terms of area vs other designs, but the other designs all have a fixed input bit width.

function count_ones(s : std_logic_vector) return integer is
      variable temp : natural := 0;
    begin
      for i in s'range loop
        if s(i) = '1' then temp := temp + 1; 
        end if;
      end loop;
      
      return temp;
    end function count_ones;
0 Kudos
23 Replies
Historian
Historian
456 Views
Registered: ‎01-23-2009

Re: What is the most space-efficient count_ones function?

While there are some exceptions, in general the RTL coding style you use to describe combinatorial logic does not affect the number of LUTs required for the function. Synthesis is very good at doing Boolean minimization, so regardless of what form the logic starts at (after elaboration) synthesis will likely find a similar result (in terms of usage) after it has completed its minimization.

Avrum

0 Kudos
Adventurer
Adventurer
439 Views
Registered: ‎11-29-2015

Re: What is the most space-efficient count_ones function?

I have a 300bit vector which somehow results in 4500 adders... Isn't there a more efficient way to count 300 bits?

Module census_vector 
Detailed RTL Component Info : 
+---Adders : 
	   2 Input      9 Bit       Adders := 4500  
+---XORs : 
	   2 Input    300 Bit         XORs := 30    
+---Registers : 
	               11 Bit    Registers := 2     
	                5 Bit    Registers := 1     
	                1 Bit    Registers := 2     
+---Muxes : 
	   2 Input     11 Bit        Muxes := 1     
	   2 Input      9 Bit        Muxes := 41    
	   2 Input      5 Bit        Muxes := 20    
	   2 Input      4 Bit        Muxes := 12    
	   2 Input      3 Bit        Muxes := 6     
	   2 Input      2 Bit        Muxes := 3     
	   2 Input      1 Bit        Muxes := 2 
0 Kudos
Adventurer
Adventurer
412 Views
Registered: ‎11-29-2015

Re: What is the most space-efficient count_ones function?

I found the code below which was reported to be very space efficient but it I don't quite follow the logic enough to expand it out to 300 bits without coping out and just adding 10 them together. How would you modify this function to accept a 300 bit input?

function count_ones_32_bit(s : std_logic_vector) return integer is
    
        variable temp  : unsigned (31 downto 0);
        variable temp1 : unsigned (31 downto 0);
        variable temp2 : unsigned (31 downto 0);
        variable temp3 : unsigned (31 downto 0);
        variable temp4 : unsigned (31 downto 0);
        variable temp5 : unsigned (31 downto 0);

    begin
        temp  := unsigned(s);
        temp1 := (temp  AND X"5555_5555") + ( ( temp  srl 1)  AND X"5555_5555");  -- 0..2 out  x16
        temp2 := (temp1 AND X"3333_3333") + ( ( temp1 srl 2)  AND X"3333_3333");  -- 0..4 out  x8
        temp3 := (temp2 AND X"0707_0707") + ( ( temp2 srl 4)  AND X"0707_0707");  -- 0..8 out  x4
        temp4 := (temp3 AND X"001f_001f") + ( ( temp3 srl 8)  AND X"001f_001f");  -- 0..16 out x2
        temp5 := (temp4 AND X"0000_003f") + ( ( temp4 srl 16) AND X"0000_003f");  -- 0..32 out

        return to_integer(temp5(5 downto 0));
    end count_ones_32_bit;
0 Kudos
Voyager
Voyager
380 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

@david12341234   "How would you modify this function to accept a 300 bit input?"

I have always liked the "shift, mask, and add" approach because it is simple yet produces near-optimal results with most synthesizers - that said, I haven't checked the results in Vivado.

Below is a version using an entity with unconstrained ports that works to 512 bits ( limited only by the internal mask table size, see FIXME note in code).

This code hasn't been simulated, so it may have stupid errors...

The input port lengths are sanity checked using assertions, so be sure you turn on synthesis assertions in Vivado.

--
-- <bit_count.vhd>
--

---------------------------------------------------------------
--
-- (C) COPYRIGHT 2018  Brian Davis
--
-- Code released under the terms of the MIT license
-- https://opensource.org/licenses/MIT
--
---------------------------------------------------------------

--
-- bit counting 
--
--   - uses classical software shift-and-mask addition technique to count bits
--     XST optimizes this into a cascade of small adders
--
--   - unconstrained input vector supported by using a loop with a signal array
--       - input is currently limited to 512 bits by the mask array size
--

library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;
  use ieee.math_real.all;


entity bit_count is
  port
    (   
      din    : in  std_logic_vector;
      count  : out std_logic_vector
    );

end bit_count;


architecture looped_array of bit_count is

  --
  -- FIXME: currently using a constant array of masks good to 512 bits
  --        should just create these on the fly with a function call in the loop
  --
  -- note mask indexing starts at one to match loop range
  --
  type t_mask_array is array(1 to 9) of unsigned(511 downto 0);

  constant mask : t_mask_array :=
    (
      X"5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555",
      X"3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333",
      X"0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707",
      X"000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f",
      X"0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f",
      X"0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f",
      X"0000_0000_0000_0000_0000_0000_0000_007f_0000_0000_0000_0000_0000_0000_0000_007f_0000_0000_0000_0000_0000_0000_0000_007f_0000_0000_0000_0000_0000_0000_0000_007f",
      X"0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_00ff_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_00ff",
      X"0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_01ff_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_01ff"
    );

  --
  -- number of adder stages needed to cover the input vector length
  --
  constant STAGES : natural := natural( ceil( log2( real(din'length) ) ));

  --
  -- note there are STAGES+1 entries in array
  --
  type t_sum_array is array(0 to STAGES) of unsigned(din'range);

  signal sums : t_sum_array;

begin

  --
  -- port size sanity check 
  -- enable synthesis assertions when using Vivado
  --
  assert din'length <= mask(1)'length
    report   "Input vector size exceeds internal mask array size"
    severity ERROR;

  assert count'length >= STAGES
    report   "Output count vector size is too small to hold result"
    severity ERROR;

  --
  -- count bits by masking and adding adjacent bit groups
  --
  process(din,sums)
    begin

      --
      -- initialize index zero of the sum array to the data input
      --
      sums(0) <= unsigned(din);

      --
      -- stage loop
      --   note that the loop index starts at 1
      --
      stage_loop: for i in 1 to STAGES loop

        sums(i) <=   (   sums(i-1)                AND mask(i)(din'length-1 downto 0) ) 
                   + ( ( sums(i-1) srl 2**(i-1) ) AND mask(i)(din'length-1 downto 0) );

      end loop;

      count <= std_logic_vector( sums(STAGES)( count'length-1 downto 0 ) );

    end process;

end looped_array;

-Brian

0 Kudos
Adventurer
Adventurer
366 Views
Registered: ‎11-29-2015

Re: What is the most space-efficient count_ones function?

@brimdavis

I created this function based on your entity. It is hacky in that it only works with a 512 bit input. It would be nice to improve this to accept 512 bits or less, or even to accept any vector length.

function count_512_ones(din : std_logic_vector) return integer is
        --
        -- FIXME: currently using a constant array of masks good to 512 bits
        --        should just create these on the fly with a function call in the loop
        --
        -- note mask indexing starts at one to match loop range
        --
        type t_mask_array is array(1 to 9) of unsigned(511 downto 0);

        constant mask : t_mask_array :=
            (
            X"5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555_5555",
            X"3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333_3333",
            X"0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707_0707",
            X"000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f_000f",
            X"0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f_0000_001f",
            X"0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f_0000_0000_0000_003f",
            X"0000_0000_0000_0000_0000_0000_0000_007f_0000_0000_0000_0000_0000_0000_0000_007f_0000_0000_0000_0000_0000_0000_0000_007f_0000_0000_0000_0000_0000_0000_0000_007f",
            X"0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_00ff_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_00ff",
            X"0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_01ff_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_01ff"
            );

        --
        -- number of adder stages needed to cover the input vector length
        --
        -- constant STAGES : natural := natural( ceil( log2( real(din'length) ) ));
        constant STAGES : natural := 9;

        --
        -- note there are STAGES+1 entries in array
        --
        type t_sum_array is array(0 to STAGES) of unsigned(511 downto 0);

        variable sums : t_sum_array;
    begin
        --
        -- initialize index zero of the sum array to the data input
        --
        sums(0) := unsigned(din);

        --
        -- stage loop
        --   note that the loop index starts at 1
        --
        stage_loop: for i in 1 to STAGES loop
            sums(i) := (sums(i-1) AND mask(i)(din'length-1 downto 0)) + ((sums(i-1) srl 2**(i-1)) AND mask(i)(din'length-1 downto 0) );
        end loop;

        return to_integer(sums(STAGES)(511 downto 0));
    end count_512_ones;
0 Kudos
Voyager
Voyager
362 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

@david12341234   "It is hacky in that it only works with a 512 bit input. It would be nice to improve this to accept 512 bits or less, or even to accept any vector length."

I haven't reviewed your code, but my original example sized itself to an unconstrained std_logic_vector input port, and would work with any vector length of 512 bits or less.

EDIT: if you restore the original log2 code for STAGES, and change your function's last line to something like the following, I think it should work for any input vector up to 512 bits:

return to_integer(sums(STAGES)(STAGES-1 downto 0));

( I would also note that in practice to meet timing constraints, deep adder trees benefit from a pipeline register in each stage; this is easy to do in my entity version, but adds STAGES clock cycles of latency. )

Also FYI, an elegant recursive bit counting function was posted by @josyb  on the late Programmable Planet forums:

-- countbits.vhd
-- 07/12/2012 jb
-- recursive counting of bits in a std_logic_vector (of arbitrary length)

    function count_bits( s : std_logic_vector ) return natural is
        begin
            if (s'length = 1) then
                if ( s = "1") then
                    return 1 ;
                else
                    return 0 ;
                end if ;
            else
                return    count_bits( s(s'high 
                                                 downto s'low + s'length / 2) ) 
                         + count_bits( s(s'low + s'length / 2 - 1 
                                                 downto s'low ) ) ;
            end if ;		
        end function ;

I haven't tried the recursive version in Vivado, but sometimes synthesis tools have problems with, or get excruciatingly slow, upon encoutering deep recursion.

 -Brian

0 Kudos
Observer josyb
Observer
352 Views
Registered: ‎05-13-2015

Re: What is the most space-efficient count_ones function?

Brian,

thanks for the mention. Do you have more of that 'All Programmable Planet' stuff? That site just disappeared from the web without a trace ...

You may want to look at this page: https://github.com/josyb/recurse

It is about recursion in MyHDL but most of it applies to VHDL too.

It also shows that the recursive implementation will have the least levels of logic. 

 

Regards,

Josy

 

Voyager
Voyager
343 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

@josyb "Do you have more of that 'All Programmable Planet' stuff?"

Sadly, I only have copies of stuff from the threads that I'd posted to.

"It also shows that the recursive implementation will have the least levels of logic. "

The looped shift-and-mask code that I posted above relies on the synthesis tool to optimize out masked-off bits of the carry chain, so it ends up generating the same tree of small-ish adders as does your recursive version.

It's not as elegant as the recursive version, but if I replace the hardcoded mask array with a function to create the masks on the fly, I think it would at least get an honorable mention :)

"You may want to look at this page: https://github.com/josyb/recurse"

Thanks for the link! I've been meaning to experiment with MyHDL.

-Brian

 

0 Kudos
Voyager
Voyager
332 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

Earlier, I wrote "if I replace the hardcoded mask array with a function to create the masks on the fly"

Here's a version of the shift-and-mask code (entity version) without the maximum-of-512-bit input vector length, also fixed a bug in the assertion check:

EDIT: bogus syntax check errors (red underlines) in the Vivado text editor regarding math.real functions can be safely ignored, as Vivado Synthesis supports elaboration time computations using the math.real functions.

--
-- <bit_count.vhd>
--

---------------------------------------------------------------
--
-- (C) COPYRIGHT 2018  Brian Davis
--
-- Code released under the terms of the MIT license
-- https://opensource.org/licenses/MIT
--
---------------------------------------------------------------

--
-- bit counting 
--
--   - uses classical software shift-and-mask addition technique to count bits
--     XST optimizes this into a tree of small adders
--

library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;
  use ieee.math_real.all;


entity bit_count is
  port
    (   
      din    : in  std_logic_vector;
      count  : out std_logic_vector
    );

end bit_count;


architecture looped_function of bit_count is

  constant SIZE : natural := din'LENGTH;
  
  --
  -- produce bit mask arrays to add consecutive groups of bits, e.g.
  --
  -- STAGE  MASK
  --   1    X"5555_5555"
  --   2    X"3333_3333"
  --   3    X"0707_0707"
  --   4    X"000f_000f"
  --
  function bit_mask( stage : natural ) return unsigned is
    variable mask : unsigned(SIZE-1 downto 0);

    constant STRIDE : natural := 2**(stage);
    variable i : natural := 0;

  begin

     mask := ( others => '0');

     while i < SIZE loop

       for j in 0 to stage-1 loop                        -- EDIT: fixed loop index 
         if i+j < SIZE then mask(i+j) := '1';  end if;   -- EDIT: fixed non-power-of-two bug
       end loop;

       i := i + STRIDE;

     end loop;

     return mask;
  end;


  --
  -- number of adder stages needed to cover the input vector length
  --
  constant STAGES : natural := natural( ceil( log2( real(SIZE) ) ));

  --
  -- note there are STAGES+1 entries in array
  --
  type t_sum_array is array(0 to STAGES) of unsigned(din'range);

  signal sums : t_sum_array;

begin

  --
  -- port size sanity check 
  -- enable synthesis assertions when using Vivado
  --
  assert count'LENGTH > STAGES
    report   "Output count vector size is too small to hold result"
    severity ERROR;

  --
  -- count bits by masking and adding adjacent bit groups
  --
  process(din,sums)
    begin

      --
      -- initialize index zero of the sum array to the data input
      --
      sums(0) <= unsigned(din);

      --
      -- stage loop
      --   note that the loop index starts at 1
      --
      stage_loop: for i in 1 to STAGES loop

        sums(i) <=   (   sums(i-1)                AND bit_mask(i) ) 
                   + ( ( sums(i-1) srl 2**(i-1) ) AND bit_mask(i) );

      end loop;

    end process;

  count <= std_logic_vector( sums(STAGES)( count'length-1 downto 0 ) );

end looped_function;

 

0 Kudos
Explorer
Explorer
300 Views
Registered: ‎10-23-2018

Re: What is the most space-efficient count_ones function?

@david12341234

That sounds like a fun challenge... What are the utilization numbers you are trying to beat?  What is the balance of resources you are looking for? (e.g. reduce adders, ...) In other words what resources are most precious OR is it just 'generally' uses a small amount of resources (and mix will do)? A 300 bit specific solution is acceptable? Any other criteria?

0 Kudos
Adventurer
Adventurer
295 Views
Registered: ‎11-29-2015

Re: What is the most space-efficient count_ones function?

@brimdavis

 

Value 507 is out of range 0 to 506 at the line marked below

function bit_mask( stage : natural ) return unsigned is
        variable mask : unsigned(SIZE-1 downto 0);
        constant STRIDE : natural := 2**(stage);
        variable i : natural := 0;
    begin
        mask := ( others => '0');
        while i < SIZE loop
            for j in 0 to STRIDE/2-1 loop
------->     mask(i+j) := '1';   <----------------
            end loop;
            i := i + STRIDE;
        end loop;
        return mask;
    end;
0 Kudos
Voyager
Voyager
279 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

@david12341234  "Value 507 is out of range 0 to 506 at the line marked below"

I tested that second entity version for inputs of 256/512/1024, but those were all power of two; probably need to add another if to limit the index of the inner loop, something like:

  for j in 0 to STRIDE/2-1 loop
     if i+j < SIZE then mask(i+j) := '1';  end if;
  end loop;

If that doesn't fix it, post your latest function code and caller and I'll look at it.

-Brian

Voyager
Voyager
273 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

For reference, here's a list of various bit counting techniques from comp.arch.fpga/comp.lang.vhdl/comp.lang.verilog, where this question pops up periodically:

  Counting the number of ones present:  https://groups.google.com/d/topic/comp.lang.verilog/_-xAzd8yfzI/discussion

  Counting ones:   https://groups.google.com/d/topic/comp.arch.fpga/2DhAH2VBhKg/discussion

 The following thread includes some synthesis result summaries:

  Loop and manually unrolled loop: https://groups.google.com/group/comp.lang.vhdl/msg/01e25cf637b280db

  A function with a loop: https://groups.google.com/group/comp.lang.vhdl/msg/fcb25e7e5613bffe

  Explicit coding of adder structure: https://groups.google.com/group/comp.lang.vhdl/msg/48d337742b3d37a9

  Hybrid, lookup tables + adder https://groups.google.com/group/comp.lang.vhdl/msg/410a3b0c2e7c2402

  My mask-and-add-vectors 32 bit version: https://groups.google.com/group/comp.lang.vhdl/msg/aec4118508169efb

  Results comparison in LUTs: https://groups.google.com/group/comp.lang.vhdl/msg/6924f9a1e0f04336

And for completeness, josyb's page on recursive bit counting that was mentioned up thread:  https://github.com/josyb/recurse

 -Brian

Adventurer
Adventurer
244 Views
Registered: ‎11-29-2015

Re: What is the most space-efficient count_ones function?

@brimdavis

I saw that thread you mentioned originally, before posting here. The consensus seems to be that the mask-and-add technique is the best (Fastest synthesis time. Lowest resource usage.) The recursive version took hours to synthesize.

Here is the updated function that I was originally looking for in my first post, based on your examples. Works for any size input, odd or even. I appreciate your advice.

function count_ones(din : std_logic_vector) return integer is
        constant SIZE : natural := din'LENGTH;

        function bit_mask( stage : natural ) return unsigned is
            variable mask : unsigned(SIZE-1 downto 0);
            constant STRIDE : natural := 2**(stage);
            variable i : natural := 0;
        begin
            mask := ( others => '0');
            while i < SIZE loop
                for j in 0 to STRIDE/2-1 loop
                    if i+j < SIZE then mask(i+j) := '1';  end if;
                end loop;
                i := i + STRIDE;
            end loop;
            return mask;
        end;
        -- number of adder stages needed to cover the input vector length
        constant STAGES : natural := natural( ceil( log2( real(SIZE) ) ));

        -- note there are STAGES+1 entries in array
        type t_sum_array is array(0 to STAGES) of unsigned(SIZE-1 downto 0);

        variable sums : t_sum_array;
    begin
        -- initialize index zero of the sum array to the data input
        sums(0) := unsigned(din);
        -- stage loop
        --   note that the loop index starts at 1
        stage_loop: for i in 1 to STAGES loop
            sums(i) := (sums(i-1) AND bit_mask(i)(SIZE-1 downto 0)) + ((sums(i-1) srl 2**(i-1)) AND bit_mask(i)(SIZE-1 downto 0));
        end loop;
        return to_integer(sums(STAGES)(STAGES-1 downto 0));
    end count_ones;
0 Kudos
Voyager
Voyager
207 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

@david12341234   "Here is the updated function that I was originally looking for in my first post, based on your examples."

One correction to the bit_mask() function, I made a mistake (mostly harmless) in the j loop index, the code should read:

    for j in 0 to stage-1 loop

The original code produced the correct results, but the bit masks were wider than they needed to be (which the synthesizer then optimized away).

In Vivado 2017.4 targeting an Artix7, I see the following LUT usage:

  •  256 bits  386 LUTs
  •  128 bits  189 LUTs
  •   32 bits   42 LUTs

-Brian

 

0 Kudos
Explorer
Explorer
201 Views
Registered: ‎10-23-2018

Re: What is the most space-efficient count_ones function?

@brimdavis

Well done Brian!

You have gone the extra mile on this one, with all the links and updates. May I ask one favor, I would love to try this out on the UltraScale, and to insure I don’t miss one of the ‘fixes’, (e.g. end of with your final solution) would it be possible to upload the Vivado project?

Thanks in advance

0 Kudos
Voyager
Voyager
181 Views
Registered: ‎04-26-2012

Re: What is the most space-efficient count_ones function?

@xilinxacct  "would it be possible to upload the Vivado project"

Code, testbench, simulation makefile (GHDL/GTKWave), and Vivado Artix7 project are checked in here:

  https://github.com/brimdavis/fpga_stuff/tree/master/bit_count

Note that the above repository contains my entity version of the shift-and-mask bit counting code.

If I have time someday, I'll add a package with a few of the bit counting variants as functions; as I said up thread, I usually avoid functions for potentially deep trees because they are more difficult to pipeline (i.e. you have to surround them with N stages of registers and turn on the synthesis retiming options.)

-Brian

Explorer
Explorer
174 Views
Registered: ‎10-23-2018

Re: What is the most space-efficient count_ones function?

@brimdavis

@david12341234

Thanks... for those who care... The UltraScale utilization come in at...

256 = 387 LUTs 126 carries

128 = 190 LUTs 62 carries

32 = 47 LUTs 15 carries

0 Kudos
Mentor hgleamon1
Mentor
142 Views
Registered: ‎11-14-2011

Re: What is the most space-efficient count_ones function?

Just to be a bit extreme (the original query was least amount of LUTs), you could use a BRAM as a look up table, thereby not using any LUTs at all.

Routing a 300+ bit address might be awkward, though and may depend, in a real system, on other BRAM usage.

 

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
Adventurer
Adventurer
118 Views
Registered: ‎11-29-2015

Re: What is the most space-efficient count_ones function?

@hgleamon1

Wouldn't you need 2^300 addresses though?! Is there a trick to somehow only use 300 addresses/elements?

0 Kudos
Mentor hgleamon1
Mentor
111 Views
Registered: ‎11-14-2011

Re: What is the most space-efficient count_ones function?

Yes, for a 300 bit address there would be a lot of data and probably impractically so (might work for simulation purposes but then you wouldn't care about LUT usage!)..

The memory will of course have the same output for a given number of addresses (e.g. every possible "one-hot" address will return 0b1) and could therefore itself be subject to some optimisation.

You could probably use the same function as described previously to populate the RAM/ROM at compile time if the size of vector is known at compile time.

As I implied in my original response, it's a bit extreme and probably more an academic exercise but for a smaller address vector you could use a BRAM to completely eliminate LUT usage.

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos
Historian
Historian
88 Views
Registered: ‎01-23-2009

Re: What is the most space-efficient count_ones function?

Just to put this in perspective - a RAM with 300 bits that can return the number from 0 to 300 would be 1.83x10^91 bits of memory.

For comparison, the total amount of computer memory on the planet is estimated to be 2x10^21 bits...

Avrum

0 Kudos
Mentor hgleamon1
Mentor
75 Views
Registered: ‎11-14-2011

Re: What is the most space-efficient count_ones function?

I apologise for derailing the thread slightly but my point was really about approaching the issue from a completely different perspective, rather than about identifying a physically implementable solution, i.e. in principle a solution that uses the least amount of LUTs is one that uses no LUTs at all.

 

I trust that was understood :)

----------
"That which we must learn to do, we learn by doing." - Aristotle
0 Kudos