18,848 Views
Registered: ‎09-02-2007

DDR3 memory bandwidth


Just how big is the memory controller / DDR (1, 2, or 3) continuous bandwidth?

 

I see notes of around 800 megabits per second. Is this true?

That seems low for a DDRx running at a 120-plus MHz clock, double data rate, and 32-plus bits wide.

That's something like one clock in 12 carrying data.

 

 


16 Replies
Xilinx Employee
18,840 Views
Registered: ‎08-13-2007

When you see a reference to something like "800Mbps DDR2", they are saying that the clock is 400MHz and each bit is transferring data on both edges of the clock.

The bandwidth capability will be the per-pin data rate multiplied by the bus width.

The effective bandwidth can be much lower and is impacted by the controller algorithms (e.g. we use a Least Recently Used algorithm on Virtex-5 that can keep multiple banks open and improves the internal latency), access pattern, direction, burst length, refresh considerations, etc.

You'll see other features on the V6 controller to further increase effective bandwidth.

 

bt

18,834 Views
Registered: ‎09-02-2007

Hi

 

Agree with you that performance is situation related, and that it's double data rate, but quoting from the Spartan-6 product brief:

 

"Integrated Memory Controllers: Only low-cost FPGA with integrated memory controller blocks. DDR, DDR2, DDR3, and LPDDR support. Data rates up to 800Mbps (12.8Gbps peak bandwidth)"

 

So is that 800 megabits per second with a 128-bit interface, i.e. an average clock speed of around 3 MHz on each pin?

Is that not very slow for a memory that has a clock of 400-plus MHz and double-data-rate transfers?

 

 

Xilinx Employee
18,832 Views
Registered: ‎08-13-2007

3MHz on each pin would indeed be very slow. But there appears to be some confusion between the internal and external interfaces.

 

The S6 MCB offers an external interface of 4, 8, or 16-bits. That is where the 12.8Gbps peak bandwidth is coming from.

Internally, you have to run the fabric back-end at a slower and wider single-data rate interface to handle the data inside the FPGA. The backend of the MCB is a 32, 64, or 128-bit data bus inside the FPGA.


More details are in the MCB User Guide:

http://www.xilinx.com/support/documentation/spartan-6.htm (Spartan-6 Documentation)

 

bt

18,822 Views
Registered: ‎09-02-2007

thank you timpe,

 

Yep, the web link you quote is where I got the info from.

 

Yep, agree: inside the FPGA we run single data rate, not DDR, and we run wide to account for the slower data rate inside the FPGA compared to the outside.

 

But you're not answering the question here: is the guaranteed continuous data rate of the Xilinx interface, as stated, only around 3 MHz per pin?

This can't be, can it?

 

To simplify the question: specific, not general.

 

What sustained data rate can I get into or out of a DDR3 memory, using the fastest Spartan-6 built-in memory controller, dual channel, with the fastest DDR3 chips I could get?

 

Xilinx Employee
18,806 Views
Registered: ‎08-13-2007

It isn't clear to me where you keep getting this "3MHz per pin". The only pins are on the external interface, and they can transfer up to 800Mbps each. So for a 16-bit interface, that would be a peak bandwidth of 12.8Gbps for the entire MCB. The back-end interface will be running slower, wider, and SDR.

 

I don't have the final characterization numbers available. But the preliminary numbers I've seen show very good effective utilization when you consider sustained throughput. It obviously depends on the type of memory (e.g. DDR2, DDR3, or LPDDR), the type of transfers, direction, etc. For example, the efficiency drops with random addressing and shorter burst patterns. These numbers are not official, but for DDR3 I see a range of:

best case: high (95%+) for burst writes, length 32

down to: around 50% for random operations with a burst length between 4 and 16

There is obviously a range of operations in between.

Even at 50% efficiency, you still have 6.4Gbps of effective bandwidth.
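For a rough feel of what those percentages mean in absolute terms, here is a small Python sketch. The efficiency figures are the preliminary, unofficial ones quoted in this post, not datasheet numbers:

```python
# Effective bandwidth for a single x16 MCB at 800 Mbps per pin,
# derated by the (unofficial) efficiency figures quoted above.
PEAK_GBPS = 16 * 0.8  # 16 pins x 0.8 Gbps each = 12.8 Gbps peak

def effective_gbps(efficiency):
    """Peak bandwidth scaled by controller efficiency (refresh, turnaround, ...)."""
    return PEAK_GBPS * efficiency

best = effective_gbps(0.95)   # long sequential burst writes (length 32)
worst = effective_gbps(0.50)  # random access, burst length 4 to 16
print(best, worst)
```

Real traffic falls somewhere between the two values depending on access pattern.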

 

Where are you getting this 3MHz per pin?

 

bt

 

 

Message Edited by timpe on 06-30-2009 02:29 PM
18,799 Views
Registered: ‎09-02-2007

Hi

 

To quote the product brief:

 

"Integrated Memory Controllers: Only low-cost FPGA with integrated memory controller blocks. DDR, DDR2, DDR3, and LPDDR support. Data rates up to 800Mbps (12.8Gbps peak bandwidth). Multi-port bus structure with independent FIFO to reduce "

 

 

So to be clear, you are saying the 800 megabits per second quoted in the Spartan-6 product brief is the maximum peak data rate per external pin of the Spartan-6?

 

 

 

Xilinx Employee
18,794 Views
Registered: ‎08-13-2007

Yes. 800Mbps * 16 = 12.8Gbps

Obviously the narrower configurations of x8 or x4 will halve or quarter the respective maximum bandwidth.

And there are some considerations as I outlined above for real world performance.

 

I believe this convention (bit rate per pin) is fairly standard in the memory industry.

 

bt

 

BTW,

It is clear from the moderation of my previous posts that you didn't like something I said... I said "each bit" from the beginning. I apologize if this was not more obvious.

Xilinx Employee
18,790 Views
Registered: ‎08-13-2007
It may be worth clarifying that it is bitrate per data pin, since those are the only pins involved in moving data. The rest (address, control) are overhead to control the interface.
18,775 Views
Registered: ‎09-02-2007

We have deviated a long way from the original simple one-line question.

 

but to go back to my original question,

 

"Just how big is the memory controller / ddr ( 1 2 or 3 ) continuous bandwidth"

 

To clarify, I'll restate it slightly.

 

" The Spartan-6 has a dedicated memory controller. Using the fastest Spartan-6 and the fastest DDR3 memory the controller supports, does Xilinx have any numbers as to what continuous read or write performance can be expected?

 

A hypothetical example to clarify things: if I have a continuous data generator inside the Spartan-6, say 128 bits wide at 200 MHz, that cannot be stopped and must free-run at 200 MHz, can this data be constantly written to the DDR3 by the memory controller inside the Spartan-6? Does the controller have sufficient buffering to cover the times when it is performing its housekeeping? "

Xilinx Employee
11,646 Views
Registered: ‎08-13-2007

I would argue that trying to establish an understanding of the bottlenecks and associated throughput is not a deviation from the practical application of the MCB...

 

For example, 12.8Gbps (for a 16-bit interface at 800Mbps) equates to a peak bandwidth of 1.6GB/s.

A 128-bit back-end continually running at 200MHz equates to 16 x 200MB/s, or 3.2GB/s. Clearly this is not possible if you can't stop the data. It also fails to account for inefficiencies in the transfer, or for the problem of reading the data back out (which, if you can't stop the stream, would presumably be interleaved, or else the data capture eventually terminates like a logic analyzer).
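The mismatch is easy to quantify; a quick back-of-the-envelope check in Python, using the numbers from this post:

```python
# Producer rate vs. interface peak, using the figures from the post above.
producer_gb_s = (128 // 8) * 200e6 / 1e9   # 128-bit words at 200 MHz -> 3.2 GB/s
interface_gb_s = 16 * 800e6 / 8 / 1e9      # x16 at 800 Mbps per pin -> 1.6 GB/s

# The free-running producer generates twice what the interface can ever absorb,
# so buffering only delays the overflow; it cannot prevent it.
ratio = producer_gb_s / interface_gb_s
print(producer_gb_s, interface_gb_s, ratio)
```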

 

bt

11,621 Views
Registered: ‎09-02-2007

Hi

 

You seem to be implying that Xilinx has no information as to the maximum continuous data transfer rate the MCB can cope with.

If that is so and you can confirm it, then I'm happy that my original question is answered.

 

 

Contributor
17,316 Views
Registered: ‎09-10-2008

You need to understand that the bandwidth will depend on many parameters: what kind of memory you use, how wide the memory bus is, how many MCBs you have instantiated, whether your data access pattern is random or not, etc.

 

To answer your hypothetical question about maximum bandwidth, consider the following:

1) You have a chip with 4 MCBs working in x16 800Mbps mode

2) You write to contiguous blocks of memory, no random access whatsoever

3) You only write to memory

 

Then the best-case memory bandwidth (upper bound) will be 4 x 16 x 800 = 51,200 Mbps = 6,400 megabytes per second.
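As a sanity check, the arithmetic in this post can be reproduced in a few lines of Python. The 4-MCB, x16, 800 Mbps figures are the assumptions listed above, not fixed device limits:

```python
# Upper-bound bandwidth for the best case described above:
# 4 MCBs, each x16 at 800 Mbps per pin, sequential writes only.
def peak_bandwidth_mbps(num_mcbs, bus_width_bits, rate_mbps_per_pin):
    """Peak (never-exceeded) bandwidth in megabits per second."""
    return num_mcbs * bus_width_bits * rate_mbps_per_pin

peak_mbps = peak_bandwidth_mbps(4, 16, 800)
peak_mbytes = peak_mbps // 8   # 8 bits per byte
print(peak_mbps, peak_mbytes)
```

Any real access pattern (reads mixed with writes, random addressing, refresh) only derates this number downward.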

 

OK?

11,543 Views
Registered: ‎09-02-2007

Thank you Ivan,

that's the number and the bounds I am after.

A very good answer.

 

 

Adventurer
11,189 Views
Registered: ‎04-22-2008

Here's a related question for anyone who's got an answer. According to UG388, you need to provide the MCB with a clock at 2x the memory bus frequency, i.e. an 800 MHz clock to get a 400 MHz bus (800 Mb/s on each pin). On page 80, the recommendation is that this clock be driven from one of the main PLLs, then through a BUFPLL_MCB (which doesn't change the frequency), and finally from there into the MIG wrapper core.

 

The only problem in all of this is that according to the electrical specs document DS162, the PLL is unable to exceed 375 MHz, limiting the memory bus to 187.5 MHz. The Clocking Wizard in CoreGen seems to agree with this assessment, whether or not you output those lines into a BUFG.
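To spell out the apparent contradiction in numbers (a sketch using the figures cited in this post; as the reply below this notes, the 375 MHz limit may apply only to one clock-output path):

```python
# Clock chain implied by UG388 (MCB clock = 2x memory bus clock),
# versus the PLL ceiling the poster cites from DS162.
pin_rate_mbps = 800
bus_clk_mhz = pin_rate_mbps / 2      # DDR: two transfers per bus clock -> 400 MHz
mcb_clk_mhz = 2 * bus_clk_mhz        # required PLL output -> 800 MHz

pll_max_mhz = 375                    # cited DS162 figure
limited_bus_mhz = pll_max_mhz / 2    # -> 187.5 MHz bus, i.e. 375 Mbps per pin
print(mcb_clk_mhz, limited_bus_mhz)
```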

 

Anyone know what the story is there?

Xilinx Employee
11,168 Views
Registered: ‎10-23-2007
Are you looking at page 36 of DS162, which shows 375 MHz for the PLL with the BUFGMUX? I think you should be looking at the next page, which lists the Fout max with the BUFPLL (I think we need BUFPLL_MCB). Unfortunately this doesn't have a spec yet, but one would hope it will be sufficient to run the MCB at its supported rate. Perhaps the Clocking Wizard is using the only available number but will be updated for this case.
Observer
8,965 Views
Registered: ‎05-16-2012

Picking up and hopefully continuing that discussion, I found maximum frequencies of 1080, 1050, 950, and 500 MHz for the -3, -3N, -2, and -1L devices, which clarifies that the -1L cannot run a 400 MHz DDR (Table 51, pg 56).

 

Referring to the initial question:

 

Assuming a single-chip DDR3 attached to the MCB over 16 bits @ 400MHz x 2, and 40% maximum efficiency for balanced read and write (50% minus 10% overhead): this should be around 640 MB/s of continuous data transfer.
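Spelled out as arithmetic (the 40% efficiency is this post's assumption for balanced read/write traffic, not an official figure):

```python
# Sustained-rate estimate from this post: a single x16 DDR3 interface
# at 800 Mbps per pin, derated by an assumed efficiency.
peak_mb_s = 16 * 800 / 8      # 16 pins x 800 Mbps / 8 bits -> 1600 MB/s peak
efficiency = 0.40             # assumption from the post, not a Xilinx spec
sustained_mb_s = peak_mb_s * efficiency
print(sustained_mb_s)
```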

 

My question now is whether it is better to use a 4-port (2W + 2W) interface, or a dual-port (1R + 1W) interface with 128 bits and manage access manually, rather than adding 4 processes and letting them be managed by round robin?

 

 

 

 
