12-28-2009 08:47 AM
12-28-2009 10:45 AM
You can use memories to reduce the number of comparators if latency is not the biggest issue. Maybe you can use shift registers. It is hard to say what the most efficient solution would be without knowing how fast it needs to go or how fast these 49 numbers enter the system.
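To put rough numbers on the comparator-versus-latency trade-off, here is a small back-of-the-envelope sketch. It assumes the two extremes are a fully parallel comparator tree and a single comparator reused every clock; the function names are made up for illustration, and real resource usage depends on word width and the target device.

```python
def parallel_tree_cost(n: int) -> dict:
    """Fully parallel max tree: every pairwise comparison has its own comparator."""
    return {
        "comparators": n - 1,              # one comparator per eliminated value
        "levels": (n - 1).bit_length(),    # ceil(log2(n)) levels of logic
    }


def sequential_cost(n: int) -> dict:
    """One shared comparator, one comparison per clock cycle."""
    return {"comparators": 1, "cycles": n - 1}


print(parallel_tree_cost(49))
print(sequential_cost(49))
```

For 49 inputs this gives 48 comparators in 6 levels for the tree, against 1 comparator over 48 cycles for the sequential version.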
12-28-2009 01:29 PM
I'd go for a sequential implementation with 1 comparison per step, so 48 comparisons spread over 48 clock cycles. If the numbers are all available at different clock cycles, you do a simple counter or FSM that updates the current maximum as soon as a new data item arrives. Further optimizations seem to make little sense, unless this is for some homework project.
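As a behavioral sketch only (Python, not HDL), the "update the current maximum as soon as a new data item arrives" idea could look like this. The `(valid, value)` pair per clock cycle is an assumption about how the data shows up, and `running_max` is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple


def running_max(cycles: Iterable[Tuple[bool, int]]) -> Tuple[Optional[int], int]:
    """Model a single comparator shared across clock cycles.

    Each element of `cycles` is one clock: (valid, value).
    Returns (maximum seen, number of comparisons performed).
    """
    current_max = None      # models the max register; None means "not loaded yet"
    comparisons = 0
    for valid, value in cycles:
        if not valid:
            continue                 # no new item this cycle, register holds
        if current_max is None:
            current_max = value      # first item is loaded without a comparison
        else:
            comparisons += 1         # the one comparator is used this cycle
            if value > current_max:
                current_max = value
    return current_max, comparisons


# 49 distinct values (a permutation of 0..48) with an idle cycle after each one.
values = [(i * 17) % 49 for i in range(49)]
stream = []
for v in values:
    stream.append((True, v))
    stream.append((False, 0))
print(running_max(stream))
```

With 49 valid items the model performs 48 comparisons, matching the count above.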
12-28-2009 01:48 PM
12-28-2009 10:33 PM
I guess that if you put them in a BRAM, you'd still need a mux on the RAM data input, so the net effect would be very limited. Maybe some sort of shift register structure could at least get rid of the muxing logic but that would assume that the numbers won't change for at least 49 cycles.
Dedicated logic such as DSP slices can be used to reduce routing.
12-29-2009 08:42 AM
The 49 numbers are not in memory and are all ready at the same time. Should I put them in BRAM and then read them out one by one, or just leave them in registers and compare sequentially? Would that create a giant mux? I'm doing this because I'm running out of room in the FPGA and I'm looking to optimize wherever I can.
Does the complexity you're trying to minimize include the mechanism which loads these 49 numbers?
12-29-2009 08:51 AM
12-29-2009 08:55 AM
Are you asking if I'm also interested in reducing the hardware that feeds the comparators? If so, then yes. The 49 numbers are accumulators that take an indeterminate time to complete, but then remain constant for some time. When they finish accumulating, I try to find the max value.
Are all 49 accumulator outputs available simultaneously, and each is presented as a parallel word?
If so, then I think perhaps a max_value register, a simple counter from 1 to 49 and a mux selected by the counter will be the smallest implementation.
Obviously you must have some indicator telling you when the 49 values are valid. Use that to clear the maximum and start the counter. On each clock tick when the counter is not at terminal value (49) compare the max with the mux output, and update max as necessary.
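Here is a cycle-level behavioral sketch of that register + counter + mux arrangement, in Python rather than HDL. Assumptions: the values are unsigned (so clearing the max to 0 is safe), the counter runs 0 to 48 instead of 1 to 49, and `start` stands in for the "values are valid" indicator. The class and signal names are made up for illustration.

```python
class SequentialMax:
    """Model of a max register, a counter, and an N-to-1 mux driven by the counter."""

    def __init__(self, n: int = 49):
        self.n = n
        self.count = n        # counter sits at terminal value while idle
        self.max_value = 0    # the max register

    @property
    def done(self) -> bool:
        return self.count == self.n

    def clock(self, inputs, start: bool = False) -> None:
        """Advance one clock tick. `inputs` are the n parallel accumulator outputs."""
        if start:
            self.max_value = 0              # clear the maximum (assumes unsigned data)
            self.count = 0                  # start the counter
        elif not self.done:
            selected = inputs[self.count]   # mux output selected by the counter
            if selected > self.max_value:   # the single comparator
                self.max_value = selected
            self.count += 1


accumulators = [(i * 17) % 49 + 1 for i in range(49)]   # 49 stable example values
unit = SequentialMax(49)
unit.clock(accumulators, start=True)
ticks = 0
while not unit.done:
    unit.clock(accumulators)
    ticks += 1
print(unit.max_value, ticks)
```

Because the max register is cleared rather than preloaded with the first value, this version spends 49 compare cycles instead of 48. The inputs must stay constant for that long.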
12-29-2009 10:35 AM
If these accumulators don't update all at once, you could get away with a BRAM structure where the accumulator values are stored in memory and 1, 2, ... N adders update the accumulators. This requires that you know something about the statistics of the accumulator updates. A single-port memory would support 1 update per 2 clocks, a dual-port memory 1 update per clock, etc.
This architecture would probably show a significant decrease in resources, and the memory can be reused to compute the max, or you can update the max on the fly whenever a (limited) number of accumulators update.
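A rough behavioral sketch of the memory-based accumulators with an on-the-fly max follows (Python, not HDL). It assumes increments are never negative, so an accumulator can only grow and the running max never goes stale. The names are hypothetical, and the clock counts simply restate the single-port and dual-port figures above.

```python
class AccumulatorMemory:
    """Accumulators kept in one memory, updated by read-modify-write."""

    def __init__(self, n: int = 49):
        self.mem = [0] * n        # models the BRAM contents
        self.running_max = 0      # max register updated alongside each write

    def update(self, addr: int, increment: int) -> int:
        """Read mem[addr], add the increment, write back, and refresh the max."""
        if increment < 0:
            raise ValueError("sketch assumes non-negative increments")
        new_value = self.mem[addr] + increment   # the shared adder
        self.mem[addr] = new_value
        if new_value > self.running_max:         # one comparator on the write path
            self.running_max = new_value
        return new_value


def update_clocks(num_updates: int, dual_port: bool) -> int:
    """Clocks needed: 2 per update on a single port, 1 per update on a dual port."""
    return num_updates * (1 if dual_port else 2)


acc = AccumulatorMemory(49)
for addr, inc in [(0, 5), (7, 3), (7, 4), (48, 6)]:
    acc.update(addr, inc)
print(acc.running_max, update_clocks(4, dual_port=False), update_clocks(4, dual_port=True))
```

If increments could be negative, the running max would no longer be trustworthy and a scan over the memory would be needed once accumulation finishes.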