gmvivek
Visitor
6,387 Views
Registered: ‎12-17-2014

Difference between FIFO and RAM based Shift Register of Xilinx IP CORE

Dear all,

 

Can anyone please tell me the difference between the FIFO Generator and the RAM-based Shift Register Xilinx IP cores?

 

I need to pass my input values, each 16 bits wide, through a shift register with a depth of 256 values (i.e. input width = 16 bits, 256 input values in total). Once the first input is ready to appear at the output (I mean the clock before the first output), I expect all 256 values to be in the buffer.

 

It seems that both the FIFO Generator and the RAM-based Shift Register do the same thing for this operation.

 


As far as I know, the RAM-based Shift Register has a latency of about 0 + depth clocks (256 clocks for depth = 256), but the FIFO takes only 6 clocks, independent of depth.

 

Can anyone please tell me the exact difference between these two, and which one is better if I would like to access all 256 values on each clock?

 

I look forward to hearing from you.

 

Thanks for your time and consideration.

0 Kudos
7 Replies
yenigal
Xilinx Employee
6,381 Views
Registered: ‎02-06-2013

Hi

 

 

A shift register simply acts as a delay line and needs depth clock cycles to output the first input word, while a FIFO is used for memory storage or synchronisation. With a FIFO, a read can return data in the same cycle (FWFT configuration) or after a latency equal to the register delays set by the FIFO configuration.

 

In both cases you cannot access all 256 values in a single clock cycle.
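The distinction described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the Xilinx IP: the class names `DelayLine` and `Fifo` are made up for illustration, and the FIFO model ignores any read latency that a real configuration would add. It shows that a delay line returns each word a fixed number of clocks after it went in, while a FIFO returns a word whenever it is read, and that either structure hands out only one word per access.

```python
from collections import deque


class DelayLine:
    """Hypothetical model of a fixed-depth shift register (one word in, one word out per clock)."""

    def __init__(self, depth, fill=0):
        self.regs = deque([fill] * depth, maxlen=depth)

    def clock(self, din):
        dout = self.regs[0]    # the oldest word leaves the far end
        self.regs.append(din)  # maxlen discards regs[0], so everything shifts by one
        return dout


class Fifo:
    """Hypothetical model of a queue with separate write and read operations."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = deque()

    def write(self, din):
        if len(self.mem) >= self.depth:
            return False       # full: the write is refused
        self.mem.append(din)
        return True

    def read(self):
        # None stands in for the "empty" flag of a real FIFO
        return self.mem.popleft() if self.mem else None


DEPTH = 256

# Push the samples 1, 2, 3, ... through the delay line and find where sample 1 re-emerges.
line = DelayLine(DEPTH)
outputs = [line.clock(x) for x in range(1, 2 * DEPTH + 1)]
first_valid = outputs.index(1)  # number of clocks between writing sample 1 and seeing it

# A FIFO can hand the same word back as soon as it is read (idealised, zero latency).
fifo = Fifo(DEPTH)
fifo.write(1)
fifo_out = fifo.read()
```

In this model `first_valid` equals the depth, whereas the FIFO word comes back on the first read. Neither model offers a way to see all 256 stored words at once.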

 

Refer to PG057 for more details about FIFO operation.

Regards,

Satish

0 Kudos
gmvivek
Visitor
6,370 Views
Registered: ‎12-17-2014

Hi Satish,

 

Thanks for your answer.

 

Can you please suggest how I can access all 256 values from the memory?

 

I thought that once the buffer is filled with 256 input values, I would be able to access all the values from memory and perform a multiply-and-accumulate operation.

 

Figure 1 below explains my exact problem: all 256 input values are only available after 256 clocks (the so-called latency), and after that I need to read all 256 input values and perform the multiply-accumulate operation.

 

Figure 2 below shows the multiply and accumulate for the next clock.

 

Can you please suggest how I can achieve this?

 

 

 

figure1.JPG
0 Kudos
gmvivek
Visitor
6,367 Views
Registered: ‎12-17-2014

Hi Satish,

 

figure 2 is here....

0 Kudos
gmvivek
Visitor
6,366 Views
Registered: ‎12-17-2014

I am sorry, figure 2 is here. It explains the MAC operation for the next clock.

figure2.JPG
0 Kudos
markcurry
Scholar
6,310 Views
Registered: ‎09-16-2009

Gmvivek,

 

Looks like a standard FIR to me.  FIR filters are in an FPGA's sweet spot - it's usually fairly straightforward to crank all kinds of DSP calculations like these through an FPGA.
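For reference, the operation in the two figures, read as a FIR, is y[n] = sum over k of h[k] * x[n-k]: each new sample shifts the window by one and all taps are multiplied and summed again. The following is a minimal software sketch of that definition in Python, useful mainly as a golden reference to compare simulation results against. The function name `fir_reference` is chosen here for illustration, and samples before the first input are assumed to be zero.

```python
def fir_reference(samples, coeffs):
    """Direct-form FIR: one output per input sample, zero initial history (assumed)."""
    n_taps = len(coeffs)
    window = [0] * n_taps              # window[k] holds x[n-k]
    out = []
    for x in samples:
        window = [x] + window[:-1]     # shift the new sample in, drop the oldest
        out.append(sum(c * w for c, w in zip(coeffs, window)))
    return out


# A unit impulse returns the coefficients themselves, which is a quick sanity check.
impulse_response = fir_reference([1, 0, 0, 0], [3, 2, 1])
```

With 256 coefficients the same function performs 256 multiply-accumulates per output, which is the workload the hardware has to match.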

 

A 256 Element sum is rather BIG however, no matter how you do it. Do your data rates truly require that kind of ALU bandwidth?

 

I'd suggest looking at the Xilinx XtremeDSP guides (I like the Virtex-5 version best - UG193 - almost everything in there is relevant to future technologies too).

 

If your data rates are that high, then you'll most likely need the full 256 DSP48s. In that case, read the guides above, and use the internal storage of those cells in an "Adder Cascade" form instead of an adder tree. Both your coefficient and data sample storage will be within the DSP48 itself.
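To make the cascade idea concrete, here is an idealised Python model of a systolic FIR, where every tap keeps its own coefficient and sample delay and passes a registered partial sum to the next tap. This is a sketch under assumptions, not the DSP48 configuration itself: the register arrangement (two sample delays and one sum register per tap) is the textbook systolic form, a real DSP48 pipeline has further register stages, and the name `fir_systolic` is invented for illustration.

```python
def fir_systolic(samples, coeffs):
    """Idealised systolic FIR: tap k sees x delayed by 2k and adds to tap k-1's registered sum."""
    n_taps = len(coeffs)
    data = [0] * (2 * n_taps)   # sample delay chain; tap k reads data[2 * k]
    acc = [0] * n_taps          # registered partial sum at the output of each tap
    out = []
    for x in samples:
        data = [x] + data[:-1]  # one clock: shift the new sample into the chain
        new_acc = []
        for k in range(n_taps):
            prev = acc[k - 1] if k > 0 else 0      # sum registered by the previous tap
            new_acc.append(prev + coeffs[k] * data[2 * k])
        acc = new_acc
        out.append(acc[-1])     # the last tap carries the filter output
    return out


# The impulse response appears after a pipeline delay of (n_taps - 1) clocks.
delayed_response = fir_systolic([1, 0, 0, 0, 0, 0], [3, 2, 1])
```

The point of the structure is that no adder has more than two inputs and no wide adder tree is needed. The cost is extra latency, which in this model is the number of taps minus one.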

 

If your data rates are lower, then there are various other options that combine RAMs / shift registers and calculate your filter over multiple clock cycles in a stateful manner.
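One such lower-rate arrangement can be sketched as a single multiplier-accumulator that walks through a sample RAM, spending one clock per tap. The Python model below is only an illustration of that idea. The circular-buffer addressing and the name `fir_time_multiplexed` are assumptions made here, not something taken from a Xilinx core.

```python
def fir_time_multiplexed(samples, coeffs):
    """One MAC unit reused for every tap: n_taps clocks per output sample."""
    n_taps = len(coeffs)
    ram = [0] * n_taps   # circular sample buffer, standing in for a block RAM
    wr = 0               # write pointer: address of the newest sample
    clocks = 0           # MAC clocks consumed so far
    out = []
    for x in samples:
        ram[wr] = x
        acc = 0          # accumulator is cleared before each new output
        for k in range(n_taps):
            acc += coeffs[k] * ram[(wr - k) % n_taps]  # read x[n-k], one tap per clock
            clocks += 1
        out.append(acc)
        wr = (wr + 1) % n_taps
    return out, clocks


result, clocks_used = fir_time_multiplexed([1, 2, 3, 4], [1, 1, 1])
```

For 256 taps this needs 256 clocks per output, so it only fits when the sample rate is at most 1/256 of the clock rate. Intermediate designs split the taps across several such units.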

 

It all depends on your requirements.

 

Regards,

 

Mark

 

0 Kudos
gmvivek
Visitor
6,182 Views
Registered: ‎12-17-2014

Thanks for your suggestion, Mark.

 

As you mentioned, I have gone through the DSP48A macro IP core from Xilinx. I used the instruction P+A*B for multiply and accumulate.

 

For the 256 MAC operations I have instantiated 16 DSP48A macros, each computing 16 MAC operations. My implementation works well up to the MAC operation, but the MAC stage doesn't give any output.

 

My question is: is the instruction for the MAC operation (P+A*B) correct?

 

Please advise. Thanks.
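The arithmetic of the partitioning described above (16 units, each accumulating 16 products with P = P + A*B) can be written down in a few lines of Python as a reference to check simulation values against. This sketch does not confirm whether P+A*B is the right instruction for the DSP48 macro. It only shows what the 16 partial results and the final sum should be, and it assumes that each accumulator starts from zero for every new output and that the 16 partial sums are added together afterwards. The function names are illustrative.

```python
def mac_group(a_vals, b_vals):
    """Accumulate P = P + A*B over one group of operand pairs."""
    p = 0  # assumption: the accumulator is cleared before each new group result
    for a, b in zip(a_vals, b_vals):
        p = p + a * b
    return p


def mac_partitioned(samples, coeffs, n_groups=16):
    """Split the operands into n_groups equal slices, MAC each slice, then add the partial sums."""
    assert len(samples) == len(coeffs)
    assert len(samples) % n_groups == 0
    size = len(samples) // n_groups
    partials = [
        mac_group(samples[g * size:(g + 1) * size], coeffs[g * size:(g + 1) * size])
        for g in range(n_groups)
    ]
    return partials, sum(partials)


# Example with 256 operand pairs: samples 0..255, all coefficients equal to 1.
partials, total = mac_partitioned(list(range(256)), [1] * 256)
```

Comparing the 16 values in `partials` with the P outputs of the individual macros in simulation narrows down which stage stops producing data.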

0 Kudos
markcurry
Scholar
6,164 Views
Registered: ‎09-16-2009

Gmvivek,

 

Are you simulating?  This will be the best way for you to debug.  You may even push into the behavioral model of the DSP48 - the names of the internal wires of the model match pretty closely with the Figures in the documentation.

 

Regards,

 

Mark

0 Kudos