
Block RAM not working at higher frequencies....

Hi,

Xilinx says Block RAM works at around 450 MHz on a Virtex-4 device. I have used a 128k RAM and a 1k RAM in my design, with the output of the 128k RAM connected to the input of the 1k RAM, and the achievable frequency drops all the way down to 118 MHz. My project needs to run at more than 350 MHz, and the critical path is reported between these two RAMs only. I don't understand why it behaves like this.

Please guide me if I am making a mistake.

 

Thanks in advance,

Regards

Krishna Kishore 

27 Replies

Re: Block RAM not working at higher frequencies....

Hi Krishna.

Use the FPGA-Editor tool to get an idea of the placement and routing.

Are the BRAMs as close together as possible?

What routing resources are used? (Resulting in how many ns of routing delay?)

Remember that routing delay is getting worse in sub-micron silicon.

Maybe it helps to put some pipeline registers between the RAMs.

Shorter routes between the registers lead to a higher clock frequency, at the cost of some latency.

 

Have a nice synthesis

  Eilert


Re: Block RAM not working at higher frequencies....

You mention 128k and 1K for rams.  What are their organizations?  Is this 128k bits or 128k entries?

 

Are you looking at synthesis timing results or the P&R output? I've pretty much always ignored the timing estimate from XST synthesis - it is usually completely worthless. Pay attention to the P&R timing results.

 

You may need a pipeline stage between the block RAMs. The BRAMs have worse timing than the slice registers, so that might help.
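
A minimal Verilog sketch of what such a pipeline stage could look like. Everything here is an assumption for illustration: the module name, the port names and the widths are hypothetical placeholders (the real organizations were not known at this point in the thread), and the write ports needed to load the RAMs are left out for brevity. Note also that the tools may absorb the extra register into the BRAM's optional output register instead of placing it in a slice flip-flop, depending on the synthesis settings.

```verilog
// Sketch only: module, port and parameter names are hypothetical.
// Two synchronous-read RAMs in series, with one extra register stage
// between the first RAM's data output and the second RAM's address input.
// Write ports for loading the RAMs are omitted for brevity.
module ram_chain #(
    parameter A1 = 17,  // address width of the first RAM (placeholder)
    parameter D1 = 10,  // data width of the first RAM, used as the
                        // address width of the second RAM (placeholder)
    parameter D2 = 6    // data width of the second RAM (placeholder)
) (
    input                clk,
    input      [A1-1:0]  addr1,
    output reg [D2-1:0]  dout
);
    (* ram_style = "block" *)
    reg [D1-1:0] ram1 [0:(1<<A1)-1];
    (* ram_style = "block" *)
    reg [D2-1:0] ram2 [0:(1<<D1)-1];

    reg [D1-1:0] ram1_q;  // synchronous read of ram1
    reg [D1-1:0] addr2;   // extra pipeline stage between the two RAMs

    always @(posedge clk) begin
        ram1_q <= ram1[addr1];  // stage 1: read the first RAM
        addr2  <= ram1_q;       // stage 2: re-register the data
        dout   <= ram2[addr2];  // stage 3: read the second RAM
    end
endmodule
```

The cost is latency: in this sketch the result appears three clocks after addr1 is applied, so the surrounding logic has to be able to tolerate that.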

 

John Providenza


Re: Block RAM not working at higher frequencies....

Krishna,

 

That speed specification provided in the datasheet is for 1 (one) block RAM.

 

My guess is that when you mention 128k, this is one _big_ chunk of memory consisting of multiple BRAMs plus a bunch of LUTs to multiplex the data buses and decode the control lines. The extra logic will slow down your virtual 128k memory compared to the raw speed of a single BRAM. Note also that the BRAM instances used to build your big memory will be distributed over the vertical RAM strips in the FPGA, and if the placer is not properly constrained, or you have loads of BRAM, this might look bad (you may want to have a look in the floorplanner post P&R to see what the tools have cooked up for you).

 

My advice to get this thing a lot faster is to add pipeline registers to your logic to absorb the 128k RAM latency and maybe fiddle a bit with the R,W,CE,OE,... control line logic to avoid combinatorial paths. Also make sure you have selected a fast mode when using the memories in dual port mode. Finally, you might need some extra pipeline registers on the data to give the placer and router enough timing room or you might want to manually split the 128k memory into a number of smaller instances that you manually combine to insert pipeline registers.

 

If you do incremental reads (like for reading frames),  it might be a good idea to use an alternative addressing on your memories such that you can do an N parallel read at speed of 1/N.

 

Regarding the fact that you need to run at over 350MHz, I'd suggest to do some architecture exploration to check if you can run at 175MHz with a double datapath. In my experience this is one of the best ways to avoid a lot of headache trying to meet timing.

 

Can you post some HDL code?


Re: Block RAM not working at higher frequencies....

Hi Eilert,

Thank you very much,

As per the critical path report: 3.228 ns is logic delay and 3.070 ns is routing delay.

Can I improve the performance with FPGA Editor? Can you point me to some links to such material / labs, as I am very much interested?

 

Thanks in advance.

 

regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi John,

Thank you very much for your reply.

 

The organizations of the RAMs are 128k x 4 and 1k x 6.

I am looking at the P&R output only.

I didn't understand the sentence "The BRAMs have worse timing than the slice registers, so that might help". Could you please explain it a bit more clearly?

 

Thanks in advance,

regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi Woutersj,

Thank you,

 

I will consider your points, but I didn't clearly understand the point "If you do incremental reads (like for reading frames),  it might be a good idea to use an alternative addressing on your memories such that you can do an N parallel read at speed of 1/N."

 

and

 

" Regarding the fact that you need to run at over 350MHz, I'd suggest to do some architecture exploration to check if you can run at 175MHz with a double datapath. In my experience this is one of the best ways to avoid a lot of headache trying to meet timing."

 

Could you please explain them?

And the code looks like this:

 

module ssram_xst #(parameter words = 16, addr = 4, data = 32) (clk, en, w, a, i, o);
   input  clk;
   input  en;
   input  w;
   input [0:(addr-1)] a;
   input [0:(data-1)] i;
   output [0:(data-1)] o;
   reg [0:(data-1)]    ram [0:(words-1)];
   reg [0:(addr-1)]    read_a;
   (*ram_style = "block"*)

   always @(posedge clk) begin
      if (en) begin
         if (w)
            ram[a] <= i;
         read_a <= a;
      end
   end
   assign o = ram[read_a];
endmodule

 

I have instantiated this RAM in different places with different parameters to build my design.

Please correct me if I am wrong.

 

regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi onkarkk1,

You could use FPGA Editor to improve the timing of your design, but it is not recommended. The better approach is to use FPGA Editor to understand the problem and then put some LOC constraints into your design. This is because FPGA Editor modifies the PAR output, and therefore every change in the design will overwrite your FPGA Editor modifications. LOC constraints, on the other hand, are attributes of the instance, and PAR will respect them in all subsequent runs.

I would recommend looking at programs such as PlanAhead or Floorplanner (depending on your tool version).

But 350 MHz is a very ambitious goal. If there is any other way to solve your problem (e.g. a lower frequency with a wider datapath), I would strongly recommend that approach instead.

 

Jan


Re: Block RAM not working at higher frequencies....

onkarkk1,

 

The point about incremental reads should be understood as follows:

  • suppose you have a memory of a certain logical width, like 8 bits, because your algorithm consumes 8 bits per clock
  • suppose that the memory contains something like an Ethernet frame that occupies many 8-bit addresses
  • suppose your algorithm does something on that block of data, like calculating a checksum
  • then you can also use a memory that is, say, 32 bits wide and fetch 4 bytes per clock while lowering the clock to 1/4 of the original speed; the penalty is roughly 4x more 'gates'


The second remark is along similar lines: it can be much easier to get timing closure running at half the clock but with roughly double the amount of logic. This can be understood from the fact that the relative overhead of fetching something from a register or memory increases when the clock goes faster.
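
To make the wide-read idea concrete, here is a minimal Verilog sketch. It assumes a plain byte-sum checksum (not a real Ethernet CRC), and the module, port and parameter names are hypothetical, chosen only for illustration. The frame is stored 32 bits wide, so each clock delivers four bytes that are added in parallel.

```verilog
// Sketch only: names and widths are illustrative assumptions.
// A frame stored 32 bits wide is read one word per clock, and the four
// bytes of each word are added to a running 16-bit sum in parallel.
module wide_checksum #(
    parameter AW = 8              // word address width (assumed)
) (
    input              clk,
    input              clr,       // clear the running sum
    input              rd_en,     // read the word at rd_addr this cycle
    input  [AW-1:0]    rd_addr,
    input              wr_en,
    input  [AW-1:0]    wr_addr,
    input  [31:0]      wr_data,
    output reg [15:0]  sum
);
    reg [31:0] mem [0:(1<<AW)-1];
    reg [31:0] word;
    reg        word_valid;

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        if (rd_en)
            word <= mem[rd_addr];   // synchronous read: 4 bytes per clock
        word_valid <= rd_en;        // the word is valid one clock after rd_en

        if (clr)
            sum <= 16'd0;
        else if (word_valid)
            sum <= sum + word[31:24] + word[23:16] + word[15:8] + word[7:0];
    end
endmodule
```

Compared with a byte-wide version, this needs three extra adders but only a quarter of the clock rate for the same throughput, which is the trade-off described above.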

 

To get a faster RAM, you could adapt the verilog to

 

 

module ssram_xst #(parameter words = 16, addr = 4, data = 32) (clk, en, w, a, i, o);
   input  clk;
   input  en;
   input  w;
   input [0:(addr-1)] a;
   input [0:(data-1)] i;
   output reg [0:(data-1)] o;
   reg [0:(data-1)]    ram [0:(words-1)];
   reg [0:(addr-1)]    read_a;
   (*ram_style = "block"*)

   always @(posedge clk) begin
      if (en) begin
         if (w)
            ram[a] <= i;
         read_a <= a;

         o <= ram[read_a];   // registered read instead of the continuous assign

      end
   end
   //assign o = ram[read_a];
endmodule

 

This will result in something like the following in the synthesis report (note the 'absorbing' part, which means this register becomes part of the BRAM!)

 

=========================================================================
*                       Advanced HDL Synthesis                          *
=========================================================================

Loading device for application Rf_Device from file '3s1400a.nph' in environment d:\Xilinx\10.1\ISE.

Synthesizing (advanced) Unit <ssram_xst>.
INFO:Xst - The RAM <Mram_ram> will be implemented as a BLOCK RAM, absorbing the following register(s): <o>
    -----------------------------------------------------------------------
    | ram_type           | Block                               |          |
    -----------------------------------------------------------------------
    | Port A                                                              |
    |     aspect ratio   | 1024-word x 32-bit                  |          |
    |     mode           | read-first                          |          |
    |     clkA           | connected to signal <clk>           | rise     |
    |     enA            | connected to signal <en>            | high     |
    |     weA            | connected to signal <w>             | high     |
    |     addrA          | connected to signal <a>             |          |
    |     diA            | connected to signal <i>             |          |
    -----------------------------------------------------------------------
    | optimization       | speed                               |          |
    -----------------------------------------------------------------------
    | Port B                                                              |
    |     aspect ratio   | 1024-word x 32-bit                  |          |
    |     mode           | write-first                         |          |
    |     clkB           | connected to signal <clk>           | rise     |
    |     addrB          | connected to signal <read_a>        |          |
    |     doB            | connected to signal <o>             |          |
    -----------------------------------------------------------------------
    | optimization       | speed                               |          |
    -----------------------------------------------------------------------
Unit <ssram_xst> synthesized (advanced).


Re: Block RAM not working at higher frequencies....

The clock-to-output delay is slower for BRAMs than for the flip-flops in a CLB. If you're right up against a timing problem, adding a CLB flip-flop pipeline stage can buy you some extra time and probably also make the P&R operation easier.

 

 

From the V4 data sheet:

 

For CLB flip-flop outputs...

TCKO FF Clock CLK to XQ/YQ outputs      0.28 0.31 0.36 ns, Max

 

 

For BRAM

Sequential Delays
TRCKO_DORA Clock CLK to DOUT output (without output register)(2) 1.65 1.83 2.10 ns, Max
TRCKO_DOA Clock CLK to DOUT output (with output register)(3) 0.72 0.80 0.92 ns, Min

 

 

ALSO - from one of your other posts, you had the following code:

 

always @(posedge clk) begin
    if (en)
        begin
        if (w)
            ram[a] <= i;
        read_a <= a;
        end
    end
assign o = ram[read_a];
 

 

If you change the code to be:

 

always @(posedge clk) begin
    if (en)
        begin
        if (w)
            ram[a] <= i;
        read_a <= a;
        end
    o <= ram[read_a];
    end



You might gain about 1 ns in timing since the output of the RAM will now be registered. Of course, this adds a pipeline stage.

 

John Providenza


Re: Block RAM not working at higher frequencies....

Ignore the 'assign' part in my previous post; it was a copy-paste mistake. The report is based on this piece of code:

 

 module ssram_xst #(parameter words = 16, addr = 4, data = 32) (clk, en, w, a, i, o);
   input  clk;
   input  en;
   input  w;
   input [0:(addr-1)] a;
   input [0:(data-1)] i;
   output [0:(data-1)] o;
   reg [0:(data-1)]    ram [0:(words-1)];
   reg [0:(addr-1)]    read_a;
   (*ram_style = "block"*)

   always @(posedge clk) begin
      if (en) begin
         if (w)
            ram[a] <= i;
         read_a <= a;
      end
   end
   assign o = ram[read_a];
endmodule


module top (
  input  clk,
  input  en,
  input  w,
  input [0:(4-1)] a,
  input [0:(4-1)] i,
  output [0:(32-1)] o
);

ssram_xst ssram_xst_0
(
   .clk(clk),
   .en(en),
   .w(w),
   .a(a),
   .i(i),
   .o(o)
);

endmodule


Re: Block RAM not working at higher frequencies....

Hi Woutersj,

I have tried the configuration as you suggested. The synthesis report shows exactly what you said, but there is no improvement in performance.

Please correct me if I am making a mistake.

 

Thanks in advance,

Regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi Krishna,

I agree with the things  Jan (Lordgalloth) wrote.

Fpga-editor is great for finding and understanding design problems, but a PITA when you try to change your designs with it.

It's like working on assembler level with code generated by a compiler.

 

Just some questions:

You wrote that your critical path has 3.2 ns of logic delay.

Does this path have more than one logic level? If so you should add registers to split that path.

 

Since you are using BRAMs, have you enabled all the registers (input, output and address)?

Use coregen to see what options are available.

 

 Have a nice synthesis

   Eilert

 

 


Re: Block RAM not working at higher frequencies....

Hi Eilert ,

Thank you for your continued support. As you all suggested, I have started working with FPGA Editor and tracing paths; it looks like a really good tool once you gain some expertise with it (going by my little experience). I have one doubt here: in the CLB there are some muxes, like DYMUX, DXMUX, CY0G etc., which don't have any select signals as far as I can see in FPGA Editor. Could you please tell me what the logic behind this is?

Please correct me if I am interpreting things the wrong way.

 

And your questions: 

***You wrote that your critical path has 3.2 ns of logic delay.

Does this path have more than one logic level? If so you should add registers to split that path.

Yes, it has 2 levels of logic.

***Since you are using BRAMs, have you enabled all the registers (input, output and address)?

Use coregen to see what options are available.

I have not used any such option; I just used simple write-first RAM coding. I will check this option.

 

 

Thanks in advance,

Regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi Krishna,

Since you are continuously working on your project, your "little experience" will surely grow quite soon. :-)

You can be proud of yourself for being able to ask questions in a detailed and understandable manner, so that we are at least able to help you.

Other guys are not even clever enough for that.

 

For your question about the CLB internal muxes:

This is simple...

Even in a CLB, signals have to find their way. For that purpose there are muxes that are programmed via the bitstream by an underlying SRAM cell.

They are not intended to switch while the device is operating, so they don't need accessible select lines.

The only unintended cause for such a mux to switch is a bit flip in the controlling SRAM cell, which may happen e.g. if the device is exposed to radiation.

 

 

 Have a nice synthesis

    Eilert


Re: Block RAM not working at higher frequencies....

Hi Eilert ,

Thank you for your reply and kind words, which have rejuvenated me a lot.

Thanks for your explanation regarding the internal muxes, but I am still not getting the exact picture.

As far as I know, a CLB does not contain an SRAM cell to program those muxes, which means they have to get their select control signals during programming. Does that mean it is a dynamic process? I ask because some muxes have 6 inputs, some have 4 inputs and some have 2 inputs, all without control signals.

** Now, if I want to manually edit a particular CLB, how should I control these muxes?

** Out of curiosity: how does the tool calculate this switching information for each and every CLB (is there any particular algorithm)?

** Is there a name for this type of mux?

 

Thanks in advance,

Regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Do I understand correctly that you have a 128k RAM whose output is driving the input of a 1k RAM?

Sorry, Verilog is not my language, so I could have that wrong.

You're only going to get maximum performance if you register the inputs and outputs of each RAM. This will give a fall-through latency of a number of clocks, but it will be the fastest clock design possible.

 

ADDRESS_1 -> [REG ] (128 k RAM ) -> data_1[REG]    => address_2[REG](1k RAM ) -> data[REG]

 

Is your ADDRESS_1 17 bits wide and data_1 10 bits wide, which makes address_2 10 bits wide?

Can I probe a little as to what function you are implementing in such a big RAM? I assume it's being used as some sort of ROM / LUT.

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>

Re: Block RAM not working at higher frequencies....

Hi John,

 

 

Do I understand that you have a 128k RAM, whose output is driving the input of a 1k RAM?

Yes, that is right.

Regarding the function, I can tell you it is part of a search-engine sort of design.

I am attaching an RTL view of it, which should be a bit helpful.

 

Thanks for your time..

regards

Krishna Kishore 

 

ta2.png

Re: Block RAM not working at higher frequencies....

Hi Krishna,

SRAM-based FPGAs like those from Xilinx have a lot of SRAM cells that just hold the information that comes from the bitfile.

This memory array is normally only programmed once, during configuration. (Exceptions: partial reconfiguration, and all storage elements of the FPGA like BRAMs etc.)

Each bit of it controls some element of the FPGA fabric on top of that SRAM memory.

So a CLB is internally connected to a bunch of these SRAM cells.

For example: in the CLB/slice diagram you find a mux that is fed with an input line, the inverse of that line, constant '0' and constant '1'.

The output of that multiplexer is connected to the reset input of one of the slice's flip-flops.

Now if you code something like :

 

    if MyReset = '1' then
        MyFlop <= '0';
    elsif rising_edge(Clock) then
        ...etc

 

Then the synthesis tool derives from this that a signal MyReset must be connected to the reset input of some flip-flop.

The slice-internal routing requires the mux to be set correctly, so the implementation tools calculate which SRAM cells have to be set to achieve this.

If you code a flip-flop without a reset signal, the mux would be set to one of the constant input values.

It's the same for the other muxes too, just for different functions, e.g. clock inversion, LUT output direct or via the FF, carry chain usage etc.
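
As a purely conceptual illustration of the reset mux described above, here is a small Verilog model. It is not a real Xilinx primitive: the module name, the port names and the encoding of the configuration bits are all made up for this sketch. The parameter stands in for the configuration SRAM cells, which is why the mux has no select pin.

```verilog
// Conceptual model only, not a real Xilinx primitive; names are hypothetical.
// CFG stands for the configuration SRAM bits written by the bitstream. It is
// constant after configuration, so the mux has no run-time select input.
module cfg_mux #(
    parameter [1:0] CFG = 2'b00   // encoding is an arbitrary assumption
) (
    input  wire sr_in,    // signal arriving from the routing
    output wire sr_out    // drives the flip-flop's reset input
);
    assign sr_out = (CFG == 2'b00) ? sr_in  :   // use the signal as is
                    (CFG == 2'b01) ? ~sr_in :   // use the inverted signal
                    (CFG == 2'b10) ? 1'b0   :   // tie to constant '0'
                                     1'b1;      // tie to constant '1'
endmodule
```

In this picture, a flip-flop coded without a reset corresponds to a constant setting of CFG, while one coded with an active-high reset corresponds to the setting that passes the routed signal through.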

 

If you want to change this information in FPGA Editor, well, it's a little tricky. First you have to enter the mode that makes your design editable. (It's write-protected for safety purposes.)

Then you can click on a mux's inputs and the routing changes. It's hard to explain here; you have to try it and see. Sometimes it's necessary to disconnect some signals from the CLB first in order to change the routing.

FPGA Editor is an old tool (dating back to the days of XACT) and its usability has never been improved since then.

 

As for the names of the slice-internal routing muxes... well, you already mentioned them in your previous post. :-)

 

Have a nice synthesis

  Eilert 

 

 

 


Re: Block RAM not working at higher frequencies....

Not an answer, but an aside:

Have you considered content-addressable memory for a search function? It is used a lot in network switches and the like.

 

http://en.wikipedia.org/wiki/Content-addressable_memory

 

http://www.xilinx.com/support/documentation/application_notes/xapp202.pdf

 

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>

Re: Block RAM not working at higher frequencies....

Hi John,

Thanks for your reply.

But the CAM concept doesn't match our requirement (this is only my assumption; I have to talk with my superiors and come to a conclusion on it). It is R&D kind of work, which is why I am hesitant to give details of the requirements.

 

Thanks and regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi Eilert,

With FPGA Editor and the STA reports, I could gain some frequency (around 30 MHz) so far by changing the placement of components. I think that with more experience I can improve the frequency further; there might be some more tricks...

 

Thanks a lot..

 

Regards 

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

If you're getting a 30 MHz improvement by moving the RAM around, it sounds like you have not constrained the design to what you really need.

What UCF timing constraints have you used?

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>

Re: Block RAM not working at higher frequencies....

Hi John, 

In the UCF I have given only a timing constraint, i.e. "3.5 ns". And I didn't move the RAM; I moved SLICE blocks.

If this is the wrong way of doing things, please correct me.

 

Regards

Krishna Kishore 

 


Re: Block RAM not working at higher frequencies....

Hi

 

I might have this wrong, so please excuse.

 

Did you say that your critical path is 3.2 ns, which is too slow, and you are constraining the timing to meet 3.5 ns?

 

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>

Re: Block RAM not working at higher frequencies....

Hi John,

I didn't understand what you meant, but I got a critical path delay of around 6.298 ns. After changing the placement I got a new critical path whose delay is different (smaller); that's what I meant to say.

If my approach to synthesis is wrong, please correct me.

regards

Krishna Kishore 


Re: Block RAM not working at higher frequencies....

Hi

 

Step back.

What UCF file do you have?

It should contain the timing constraints for the design; it sort of defines how hard the 'compiler' works.

For timing, as a minimum, you need a clock period. I'd be surprised if you do not also need setup and hold timings defined for each IO.

Try using the Constraints Editor if you're new to this.

Start here:

 

http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/manuals.pdf

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>

Re: Block RAM not working at higher frequencies....

Hi John,

I am aware of this document, but I had planned to use specific constraints only when I encounter conditions like false paths, multicycle paths etc. According to you, I have to specify a setup time / hold time constraint for each I/O pin, right? That is done through the OFFSET IN / OFFSET OUT constraints, but I could not work out what values I should specify for the setup time and hold time of each IOB.

My understanding was that if I specify a period constraint, the tool itself takes care of these things, and there is no need to specify them explicitly, as I don't know what the setup time / hold time of each IO should be.

Please correct me if I have understood this wrong.

 

Thanks and regards

Krishna Kishore 

Message Edited by onkarkk1 on 08-12-2009 04:49 AM