cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
mvalvo
Adventurer
Adventurer
9,597 Views
Registered: ‎11-07-2007

find max, minimal hardware

Hi, I have 49, 18-bit numbers that I want to find the max value.  This is not a sorting problem, I just want to find the max of that group using the least amount of FPGA resources possible.  I currently use a comparator tree. thanks.
0 Kudos
11 Replies
woutersj
Explorer
Explorer
9,592 Views
Registered: ‎07-27-2009

You can use memories to reduce the number of comparators if latency if not directly the biggest issue. Maybe you can use shift registers. Hard to say what the most efficient solution would be without knowing how fast it needs to go or how fast these 49 numbers enter the system.

 

Cheers,

Johan

0 Kudos
mvalvo
Adventurer
Adventurer
9,590 Views
Registered: ‎11-07-2007

latency is not a constraint.  I'm interested in the least amount of hardware no matter how long the operation takes.
0 Kudos
woutersj
Explorer
Explorer
9,581 Views
Registered: ‎07-27-2009

I'd go for a sequential implementation with 1 comparison per step, so 48 comparisons spread over 48 clock cycles. If the numbers are all available at different clock cycles, you do a simple counter or FSM that updates the current maximum as soon as a new data item arrives. Further optimizations seem to make little sense, unless this is for some homework project.

 

Cheers,

Johan

0 Kudos
mvalvo
Adventurer
Adventurer
9,575 Views
Registered: ‎11-07-2007

the 49 numbers are not in memory and are ready at the same time.  should I put them in BRAM and then read them out 1-by-1 or just leave them in registers and compare sequentially?  would that create a giant mux? I'm doing this because I'm running out of room in the FPGA and I'm looking to optimize wherever I can.
0 Kudos
evgenis1
Advisor
Advisor
9,571 Views
Registered: ‎12-03-2007

If your FPGA has DSP48 you can use it in comparator mode.

 

 

OutputLogic 

Tags (1)
0 Kudos
woutersj
Explorer
Explorer
9,555 Views
Registered: ‎07-27-2009

I guess that if you put them in a BRAM, you'd still need a mux on the RAM data input, so the net effect would be very limited. Maybe some sort of shift register structure could at least get rid of the muxing logic but that would assume that the numbers won't change for at least 49 cycles.

Dedicated logic such as DSP slices can be used to reduce routing.

 

Good luck!

0 Kudos
bassman59
Historian
Historian
9,545 Views
Registered: ‎02-25-2008


mvalvo wrote:
the 49 numbers are not in memory and are ready at the same time.  should I put them in BRAM and then read them out 1-by-1 or just leave them in registers and compare sequentially?  would that create a giant mux? I'm doing this because I'm running out of room in the FPGA and I'm looking to optimize wherever I can.

Does the complexity you're trying to minimize include the mechanism which loads these 49 numbers?

----------------------------Yes, I do this for a living.
0 Kudos
mvalvo
Adventurer
Adventurer
9,543 Views
Registered: ‎11-07-2007

are you asking if I'm also interested in reducing the hardware to prepare the comparators? if so, then yes.  the 49 numbers are accumulators that take an indeterminate time to complete, but then remain constant for some time.  when they finish accumulating, then I try to find the max value.
0 Kudos
bassman59
Historian
Historian
9,541 Views
Registered: ‎02-25-2008


mvalvo wrote:
are you asking if I'm also interested in reducing the hardware to prepare the comparators? if so, then yes.  the 49 numbers are accumulators that take an indeterminate time to complete, but then remain constant for some time.  when they finish accumulating, then I try to find the max value.

Are all 49 accumulator outputs available simultaneously, and each is presented as a parallel word?

If so, then I think perhaps a max_value register, a simple counter from 1 to 49 and a mux selected by the counter will be the smallest implementation.

Obviously you must have some indicator telling you when the 49 values are valid. Use that to clear the maximum and start the counter. On each clock tick when the counter is not at terminal value (49) compare the max with the mux output, and update max as necessary.

----------------------------Yes, I do this for a living.
woutersj
Explorer
Explorer
4,651 Views
Registered: ‎07-27-2009

If these accumulators don't update all at once, you could get away with a BRAM structure where the accumulator value is stored in memory and 1,2...N adders update the accumulators. This requires that you know something about the statistics of the accumulator updates. A single port memory would support 1 update per 2 clocks, a dual port 1 update per clock, etc.

 

This architecture would probably show a significant decrease in resources and the memory can be reused to determine the max function or you can update the max on-the-gi whenever a (limited) number of accumulators update.

 

 

Cheers,

Johan

mvalvo
Adventurer
Adventurer
4,649 Views
Registered: ‎11-07-2007

unfortunately, the accumulators can update at the same time.  I think  I'll try comparing 1 at a time like bassman suggested.
0 Kudos