12-18-2014 08:59 PM
Can anyone please tell me the difference between the FIFO Generator and the RAM-based Shift Register Xilinx IP cores?
I need to pass my input values, each 16 bits wide, through a shift register with a depth of 256 (i.e. input width = 16 bits, 256 input values in total). By the time the first input is about to appear at the output (I mean the clock before the first output), I expect all 256 values to be in the buffer.
It seems that both the FIFO Generator and the RAM-based Shift Register do the same thing for this operation.
As far as I know, the RAM-based shift register has a latency of about 0 + depth clocks (256 clocks for depth = 256), but the FIFO takes only 6 clocks, independent of depth.
Can anyone please tell me the exact difference between these two, and which one is better if I would like to access all 256 values on every clock?
I look forward to hearing from you.
Thanks for your time and consideration.
12-18-2014 09:12 PM
A shift register just acts as a delay line and needs depth clock cycles to output the first data input, while a FIFO is used for memory storage or synchronisation, where a read can return data in the same cycle (FWFT configuration) or after a latency equal to the register delays, depending on the FIFO configuration.
In both cases you cannot access all 256 values in a single clock cycle.
Refer to PG057 for more details about FIFO operation.
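To make the distinction concrete, here is a minimal behavioral sketch in Python (not HDL, and not the Xilinx cores themselves). The class names `ShiftRegister` and `Fifo` and the function `demo` are made up for illustration, and the model ignores the fixed read latency of a real FIFO. It only shows that a delay line returns each sample exactly depth clocks after it went in, while a FIFO returns data whenever it is read.

```python
from collections import deque


class ShiftRegister:
    """Fixed-length delay line: one sample in, one sample out per clock."""

    def __init__(self, depth, fill=0):
        self.taps = deque([fill] * depth, maxlen=depth)

    def clock(self, sample):
        oldest = self.taps[0]      # value written `depth` clocks ago
        self.taps.append(sample)   # maxlen silently drops the oldest entry
        return oldest


class Fifo:
    """Queue with independent write and read; occupancy varies over time."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def write(self, sample):
        if len(self.items) >= self.depth:
            return False           # full: the write is rejected
        self.items.append(sample)
        return True

    def read(self):
        # Returns None when empty (a real core would flag "empty" instead).
        return self.items.popleft() if self.items else None


def demo(depth=4):
    sr = ShiftRegister(depth)
    sr_out = [sr.clock(x) for x in range(1, 9)]   # feed samples 1..8
    fifo = Fifo(depth)
    fifo.write(1)
    first = fifo.read()            # available without waiting `depth` clocks
    return sr_out, first
```

With `depth=4`, the delay line outputs four fill values before the first input comes back out, whereas the FIFO hands the first word back on the first read. Neither structure exposes all stored words at once, which matches the point above.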
12-19-2014 12:46 AM
Thanks for your answer.
Can you please suggest how I can access all 256 values from the memory?
I thought that once the buffer is filled with 256 input values, I would be able to access all the values from memory and do a multiply-and-accumulate operation.
Figure 1 below explains my exact problem: all 256 input values are only available after 256 clocks (the so-called latency), and after that I need to read all 256 input values and do the multiply-accumulate operation.
Figure 2 below shows the multiply and accumulation for the next clock.
Can you please suggest how I can make this work?
12-22-2014 03:41 PM
Looks like a standard FIR to me. FIR filters are in an FPGA's sweet spot - it's usually fairly straightforward to crank all kinds of DSP calcs like these through an FPGA.
A 256-element sum is rather BIG however, no matter how you do it. Do your data rates truly require that kind of ALU bandwidth?
I'd suggest looking at the Xilinx XtremeDSP guides (I like the Virtex-5 version best - UG193 - almost everything in there is relevant to future technologies too).
If your data rates are that high, then you'll most likely need the full 256 DSP48s. In that case, read the guides above, and use the internal storage of those cells in an "Adder Cascade" form instead of the adder tree. Both your coefficient and data sample storage will be within the DSP48 itself.
If your data rates are lower, then there are various other options for combining RAMs / shift registers and calculating your filter over multiple clock cycles in a stateful manner.
It all depends on your requirements.
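As a rough illustration of the two ends of that trade-off, here is a behavioral Python sketch (not HDL). The function names `fir_parallel` and `fir_serial` are hypothetical. The first models one multiplier per tap, producing one output per input sample. The second models a single multiplier that walks a circular sample buffer, taking one clock per tap for every output. Both compute the same sum of `coeffs[k] * x[n-k]`.

```python
def fir_parallel(samples, coeffs):
    """One output per input sample; every tap is multiplied in the same clock."""
    n = len(coeffs)
    window = [0] * n                        # newest sample at index 0
    out = []
    for x in samples:
        window = [x] + window[:-1]          # shift the delay line by one
        out.append(sum(c * w for c, w in zip(coeffs, window)))
    return out


def fir_serial(samples, coeffs):
    """Same filter with a single multiplier: len(coeffs) clocks per output."""
    n = len(coeffs)
    ram = [0] * n                           # circular sample buffer
    head = 0                                # where the newest sample is written
    out, clocks = [], 0
    for x in samples:
        ram[head] = x
        acc = 0                             # accumulator cleared per output
        for k in range(n):
            acc += coeffs[k] * ram[(head - k) % n]   # k-th older sample
            clocks += 1                     # one multiply-accumulate per clock
        head = (head + 1) % n
        out.append(acc)
    return out, clocks
```

For 256 taps the serial version needs 256 clocks per output sample, so it only fits if the clock rate is at least 256 times the sample rate. Anything in between (for example 16 multipliers doing 16 taps each) is the same idea split into blocks.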
01-04-2015 05:28 PM
Thanks for your suggestion, Mark.
As you mentioned, I have gone through the DSP48a macro IP core of Xilinx. I used the instruction P+A*B for multiply and accumulate.
For the 256 MAC operations I have defined 16 DSP48a macros, each computing 16 MAC operations. My implementation works well up to the MAC operation, but the MAC itself doesn't give any output.
My question is: is the instruction for the MAC operation (P+A*B) correct?
Please advise me. Thanks
01-05-2015 09:17 AM
Are you simulating? This will be the best way for you to debug. You may even descend into the behavioral model of the DSP48 - the names of the internal wires of the model match pretty closely with the figures in the documentation.
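When simulating, it can help to have a software reference to compare the expected P values against. Below is a minimal Python sketch of the arrangement described above (16 units, each doing 16 multiply-accumulates, with the partial sums added at the end). The names `mac_unit` and `mac_256` are hypothetical, and the model assumes each accumulator starts from zero for every block. How that is done in the actual core (a reset, or a different first instruction) is not modeled here.

```python
def mac_unit(a_vals, b_vals):
    """Behavioral P = P + A*B over one block, assuming P starts at zero."""
    p = 0
    for a, b in zip(a_vals, b_vals):
        p = p + a * b                # one multiply-accumulate per clock
    return p


def mac_256(samples, coeffs, units=16):
    """Split the dot product across `units` accumulators, then sum them."""
    if len(samples) != len(coeffs) or len(samples) % units != 0:
        raise ValueError("lengths must match and divide evenly across units")
    per_unit = len(samples) // units
    partials = [
        mac_unit(samples[i * per_unit:(i + 1) * per_unit],
                 coeffs[i * per_unit:(i + 1) * per_unit])
        for i in range(units)
    ]
    return sum(partials), partials
```

Comparing each entry of `partials` with the P output of the corresponding unit in the simulator should show quickly whether the accumulate path, the accumulator start value, or the final summation is where the result gets lost.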