Super Contributor
bouvett
Posts: 162
Registered: ‎09-22-2010
0

How fast is reading and writing with MIG?

Hi All,

 

I am working on a project on an Atlys board in which I interface the Spartan-6 FPGA with the on-board DDR2 SDRAM (MT47H64M16-25E) via the Xilinx MIG IP core. I have set the MIG operating frequency to 333.333 MHz. The on-board oscillator is 100 MHz.

 

I implemented a small controller that further simplifies the interface to the MIG. I used it to write and read all 128 MB of available space successfully. The problem is the speed.

 

Important info: I am not making use of the memory's double-data-rate capability. The MIG has a data width of 32 bits, whereas the IC has a data width of 16 bits. I only write and read 16 bits at a time, so I always mask half the data width; my application requires this. I am only using a burst length of 1.

 

When I operate the controller at 100 MHz (MIG at 333.333 MHz), I get a data write speed of 133 MB/s, whereas the data read speed is 39 MB/s. When I change the controller frequency to 200 MHz (again with the MIG at 333.333 MHz), the data write speed goes up to 267 MB/s, whereas the data read speed goes up to 45.87 MB/s.

 

I know that I am not using the MIG to its full potential. I should use a longer burst length for reads, and I should be reading 32-bit words instead. But doesn't the data read speed seem a bit too low to you? Also, wouldn't you expect the data read speed to be equal to, or possibly faster than, the write speed (for electronic reasons)?
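
For scale, the figures above can be compared with the theoretical peak of the DDR2 interface. This is only a rough sketch: it assumes that 333.333 MHz is the memory clock, that the 16-bit device transfers data on both clock edges, and that 1 MB means 10^6 bytes. The function names are made up for illustration.

```python
def peak_bandwidth_mb_s(mem_clock_hz: float, bus_width_bits: int) -> float:
    """Theoretical peak of a DDR interface in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_second = mem_clock_hz * 2  # two transfers per clock (double data rate)
    return transfers_per_second * (bus_width_bits / 8) / 1e6


def efficiency(measured_mb_s: float, peak_mb_s: float) -> float:
    """Fraction of the theoretical peak that a measured rate achieves."""
    return measured_mb_s / peak_mb_s


# Assumed: 333.333 MHz memory clock, 16-bit data bus (MT47H64M16).
peak = peak_bandwidth_mb_s(333.333e6, 16)

# Rates reported in this post, in MB/s.
measured = {
    "write @ 100 MHz": 133.0,
    "read  @ 100 MHz": 39.0,
    "write @ 200 MHz": 267.0,
    "read  @ 200 MHz": 45.87,
}

for label, mb_s in measured.items():
    print(f"{label}: {mb_s} MB/s = {efficiency(mb_s, peak):.1%} of {peak:.0f} MB/s peak")
```

Under these assumptions the peak is roughly 1333 MB/s, so the reported read rates are around 3% of it.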

 

Thanks for your time. Best regards,

bouvett

 

By the way I attached the controller code below..

----------------------------------------------------------------------------------
-- Company: 
-- Engineer: 
-- 
-- Create Date:    03:24:50 06/12/2012 
-- Design Name: 
-- Module Name:    DDR2_SDRAM_CONTROLLER - Behavioral 
-- Project Name: 
-- Target Devices: 
-- Tool versions: 
-- Description: This module instantiates a simple interface between the host and the MIG module generated via the
--					 MIG wizard by Xilinx.
--
-- Dependencies: 
--
-- Revision: 
-- Revision 0.01 - File Created
-- Additional Comments: 
--
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.all;
library work;
use work.DDR2_SDRAM_PACKAGE.all;

entity DDR2_SDRAM_CONTROLLER is
	 generic(
			read_burst_count : integer := 1; --read command burst length
			read_cmd  : std_logic := '1';
			write_cmd : std_logic := '0'
	 );
    Port ( 
			  --Controller ports
			  clock_src			: in  STD_LOGIC;   --single ended clock source input
           mstr_rst 			: in  STD_LOGIC;  --master reset of the module
           cmd 				: in  STD_LOGIC;  --specifies the job that is required. '0' to write, '1' to read
           location 			: in  STD_LOGIC_VECTOR (29 downto 0); --specifies location where to write the item
           data_to_rd 		: out STD_LOGIC_VECTOR (15 downto 0);  --is the port holding the data read from the requested location in the memory ic
           data_to_wrt 		: in  STD_LOGIC_VECTOR (15 downto 0);  --is the port holding the data to be written to the requested location in the memory ic
           trigger_prcs 	: in  STD_LOGIC; --trigger the controller to start processing the new command
           prcs_ready 		: out STD_LOGIC; --normally low; this signal is set for one clock cycle to show that the process has been completed
			  error_flag 		: out	STD_LOGIC; --when this goes high an error has occurred
			  read_data_to_rd : out STD_LOGIC; --when this pin is asserted, the client should read the data_to_rd
														  --port. This is used only if the read_burst_count is greater than 1.
			  ack_trigger		: out STD_LOGIC; --This pin is used to acknowledge the client that the trigger has been received.
			  clock_locked		: in std_logic;  --When this pin is high it means that the clock is good to use
			  
           c3_sys_rst_i                            : out std_logic;
			  c3_calib_done                           : in 	std_logic;
			  c3_clk0                                 : in  std_logic;
			  c3_rst0                                 : in  std_logic;
			  c3_p0_cmd_en                            : out std_logic;
			  c3_p0_cmd_instr                         : out std_logic_vector(2 downto 0);
			  c3_p0_cmd_bl                            : out std_logic_vector(5 downto 0);
			  c3_p0_cmd_byte_addr                     : out std_logic_vector(29 downto 0);
			  c3_p0_cmd_empty                         : in 	std_logic;
			  c3_p0_cmd_full                          : in 	std_logic;
			  c3_p0_wr_en                             : out std_logic;
			  c3_p0_wr_mask                           : out std_logic_vector(C3_P0_MASK_SIZE-1 downto 0);
			  c3_p0_wr_data                           : out STD_LOGIC_VECTOR(C3_P0_DATA_PORT_SIZE-1 downto 0);
			  c3_p0_wr_full                           : in 	std_logic;
			  c3_p0_wr_empty                          : in 	std_logic;
			  c3_p0_wr_count                          : in 	std_logic_vector(6 downto 0);
			  c3_p0_wr_underrun                       : in 	std_logic;
			  c3_p0_wr_error                          : in 	std_logic;
			  c3_p0_rd_en                             : out std_logic;
			  c3_p0_rd_data                           : in 	std_logic_vector(C3_P0_DATA_PORT_SIZE - 1 downto 0);
			  c3_p0_rd_full                           : in 	std_logic;
			  c3_p0_rd_empty                          : in 	std_logic;
			  c3_p0_rd_count                          : in 	std_logic_vector(6 downto 0);
			  c3_p0_rd_overflow                       : in 	std_logic;
			  c3_p0_rd_error                          : in 	std_logic
	 );
end DDR2_SDRAM_CONTROLLER;

architecture Behavioral of DDR2_SDRAM_CONTROLLER is

type states is (calibration_check,trigger_check,write_start,write_data_buffer_check,command_buffer_check_wr,procedure_finished,read_start,command_buffer_check_rd,read_buffer_check,catch);
signal current_state 	: states;
signal location_2_lsb  	: std_logic_vector (1 downto 0);
signal hey					: std_logic := '0';
signal test_pin_signal	: std_logic := '0';

begin
	process(clock_src,mstr_rst)
		variable read_counter : unsigned (4 downto 0); --maximum value 16
		begin
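			--asynchronous reset
			if(mstr_rst = '1') then
				--reset settings
				current_state 	<= calibration_check;
				c3_p0_cmd_en  	<= '0';
				c3_p0_wr_en  	<= '0';
				c3_p0_rd_en  	<= '0';
				error_flag		<= '0';
				prcs_ready		<= '0';
				ack_trigger		<= '0'; --handshake outputs also need a defined value after reset
				read_data_to_rd	<= '0';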
			elsif(clock_src'event and clock_src='1') then
				
				case current_state is
					when calibration_check => --check whether calibration is ready
						
						if(c3_calib_done = '1' and clock_locked = '1') then
							current_state <= trigger_check;
						end if;
						
					when trigger_check => --next check whether a process has been triggered
												 --also check what command has been entered
												
						if(trigger_prcs = '1') then
							if(cmd = '1') then --read command?
								current_state <= read_start; --start read state
							else --write command
								current_state <= write_start; --start write state
							end if;
							ack_trigger <= '1';
						end if;
						
					when write_start => --This is the write start node
											 --set write and command parameters
						ack_trigger <= '0';
											
						--Take care of port alignment
						if(location_2_lsb = "00") then
							c3_p0_wr_data (31 downto 16) <= (others => '0');
							c3_p0_wr_data (15 downto 0)  <= data_to_wrt;
							c3_p0_wr_mask <= "1100";
						else --location_2_lsb = "10"
							c3_p0_wr_data (31 downto 16) <= data_to_wrt;
							c3_p0_wr_data (15 downto 0)  <= (others => '0');
							c3_p0_wr_mask <= "0011";
						end if;
						
						c3_p0_cmd_bl 			<= "000000";
						c3_p0_cmd_instr 		<= "000"; --set MIG to write
						c3_p0_cmd_byte_addr 	<= location(29 downto 2) & "00";
					
						c3_p0_wr_en <= '1'; --enable data write buffer
						current_state <= write_data_buffer_check;
												
					when write_data_buffer_check => --Disable the write buffer enable pin
															  --after 1 clock cycle.
															  --If command buffer is not full, enable 
															  --it.
															  
						c3_p0_wr_en <= '0'; --disable data write buffer
						
						if(c3_p0_cmd_full = '0') then
							c3_p0_cmd_en <= '1'; 
							current_state <= command_buffer_check_wr;
						end if;
											
					when command_buffer_check_wr	=> --Disable the command buffer enable pin
														  --after 1 clock cycle.
					
						c3_p0_cmd_en <= '0'; --disable command buffer
						
						--signal that procedure is ready
						prcs_ready 		<= '1';
						current_state 	<= procedure_finished;
																	
					when procedure_finished	=> --Procedure finished hence de-assert flag pin
								
						prcs_ready 		<= '0';
						current_state 	<= trigger_check;
								
					when read_start => --Start read procedure
						
						ack_trigger <= '0';
						
						c3_p0_cmd_instr 	<= "001"; --set MIG to read
						--Take care of address alignment
						--In this test we are assuming that the user supplies 2 byte data. Since the user
						--interface is *32bits, then according to table 4-2 of UG388, the 2 LSB must be set
						--to 0. Hence the 2 LSBs (location_2_lsb) can either be "00" or "10".
						--Therefore, what should be done is that the 2 LSBs of c3_p0_cmd_byte_addr are always 
						--set to 0 and then use the value of location_2_lsb to determine whether the higher or 
						--lower 2 bytes of c3_p0_rd_data should be output to data_to_rd.
						c3_p0_cmd_byte_addr	<= location(29 downto 2) & "00";
						
						--Set burst length
						--command fifo is 4 data words deep, whereas the data fifo is 64 data words deep
						--hence maximum burst length which can exactly fit the fifos is (64/4 = 16).
						-- !! Make sure burst length is not more than 16 !!
						c3_p0_cmd_bl <= std_logic_vector(to_unsigned(read_burst_count-1,6)); 
						  
						if(c3_p0_cmd_full = '0') then
							c3_p0_cmd_en  <= '1'; --pass command to MIG
							current_state <= command_buffer_check_rd;
							read_counter  := to_unsigned(read_burst_count,5);
						end if;
						
					when command_buffer_check_rd =>
												
						c3_p0_cmd_en 		<= '0';
						
						if(c3_p0_rd_empty = '0') then
							c3_p0_rd_en 		<= '1';
							read_data_to_rd 	<= '1';
										
							--Take care of port alignment
							if(location_2_lsb = "00") then
								data_to_rd <= c3_p0_rd_data(15 downto 0);
							else --location_2_lsb = "10"
								data_to_rd <= c3_p0_rd_data(31 downto 16);
							end if;
							
							current_state <= read_buffer_check;
						end if;
						
					when read_buffer_check =>
								
						c3_p0_rd_en 		<= '0';
						read_data_to_rd 	<= '0';
						read_counter := read_counter - 1;
						
						if(read_counter = 0) then
							prcs_ready 	  <= '1';
							current_state <= procedure_finished;
						else
							current_state <= command_buffer_check_rd;
						end if;	
							
					when catch	=>
						test_pin_signal <= '1';
													
					when others =>
						NULL;
				end case;
			end if;
					
	end process;
	
--test_pin <= test_pin_signal;
location_2_lsb <= location(1 downto 0);
						
end architecture Behavioral;
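
The alignment logic in the listing above can be summarised in a few lines. This is only an illustrative sketch of what the VHDL does for a 16-bit access on the 32-bit user port, assuming (as the VHDL implies) that a set mask bit suppresses the write of that byte. The function name is hypothetical, and unlike the VHDL it rejects odd addresses instead of treating them as the upper half-word.

```python
def align_16bit_access(byte_addr: int, data16: int = 0):
    """Map a 16-bit access onto a 32-bit port, mirroring the VHDL above.

    Returns (word_addr, wr_mask, wr_data):
      word_addr - byte address with the two LSBs forced to "00"
      wr_mask   - 4-bit byte mask, a set bit means "do not write this byte"
      wr_data   - 32-bit word with data16 placed in the selected half
    """
    if byte_addr % 2:
        # Assumption: the VHDL does not check this, it simply expects "00" or "10".
        raise ValueError("address must be 2-byte aligned")

    word_addr = byte_addr & ~0x3             # location(29 downto 2) & "00"
    upper_half = (byte_addr & 0x3) == 0x2    # location_2_lsb = "10"

    if upper_half:
        wr_data = (data16 & 0xFFFF) << 16    # data on bits 31..16
        wr_mask = 0b0011                     # lower two bytes masked
    else:
        wr_data = data16 & 0xFFFF            # data on bits 15..0
        wr_mask = 0b1100                     # upper two bytes masked
    return word_addr, wr_mask, wr_data


if __name__ == "__main__":
    for addr in (0x104, 0x106):
        word_addr, mask, data = align_16bit_access(addr, 0xBEEF)
        print(f"addr {addr:#x} -> word {word_addr:#x}, mask {mask:04b}, data {data:#010x}")
```

For a read, the same half-word selection decides whether bits 15..0 or bits 31..16 of c3_p0_rd_data are returned.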

 

Xilinx Employee
austin
Posts: 3,678
Registered: ‎02-27-2008
0

Re: How fast is reading and writing with MIG?

b,

 

Welcome to the world of DRAM memories.  Yes, it takes quite a bit of optimization to push such memories to even 50% efficiency (that is, reading or writing at even half the rate at which they are being clocked).

 

This is not new, and definitely not something that 'only' happens in FPGA implementations:  it happens whenever these devices are being used.  One has to enable a row, wait, and then access.  When accessing, it is best to access as many locations as possible (read in 256 bytes at a time, or write many bytes at a time, before you change the row address).


Significant software techniques and added hardware are often used to increase the efficiency:  re-ordering all operations in a queue so that row address changes are minimized, performing all the reads at once, etc.

 

http://www.xilinx.com/txpatches/pub/documentation/misc/improving%20ddr%20sdram%20efficiency.pdf

Austin Lesea
Principal Engineer
Xilinx San Jose
Expert Contributor
gszakacs
Posts: 5,264
Registered: ‎08-14-2007
0

Re: How fast is reading and writing with MIG?

Actually the Spartan 6 MCB does a pretty good job of merging multiple small read or write commands when you access the memory sequentially.  Without going through your code, my best guess is that when writing, your commands are issued while the command queue is not full - this gives the MCB a "lookahead" at the next write command and allows it to burst them together.

When reading, if you always wait for returned data or an empty command queue before issuing the next read, then you will never get close to the memory's maximum bandwidth.  However if you either use a longer burst (make the user interface burst as long or longer than the memory chip burst size) or queue up multiple read commands, then you can gain some speed.
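
To put rough numbers on the effect of queueing reads, here is a minimal latency-bound model. It is a sketch, not a description of the MCB internals: the 20-cycle round-trip latency is a made-up placeholder, the function name is hypothetical, and the limit of 4 outstanding commands is taken from the command FIFO depth mentioned in the code comments earlier in the thread.

```python
def read_throughput_mb_s(clock_hz, bytes_per_cmd, latency_cycles,
                         outstanding=1, issue_cycles=1):
    """Throughput in MB/s when `outstanding` reads are kept in flight.

    With k reads in flight, one read completes every latency/k cycles on
    average, but never faster than commands can be issued.
    """
    cycles_per_cmd = max(latency_cycles / outstanding, issue_cycles)
    return clock_hz / cycles_per_cmd * bytes_per_cmd / 1e6


if __name__ == "__main__":
    CLOCK_HZ = 100e6       # user-side clock used in the first measurement
    BYTES_PER_CMD = 2      # one 16-bit word per read command, as in the posted code
    LATENCY_CYCLES = 20    # hypothetical round-trip latency, not a measured value

    for k in (1, 2, 4):    # 4 = command FIFO depth stated in the code comments
        rate = read_throughput_mb_s(CLOCK_HZ, BYTES_PER_CMD, LATENCY_CYCLES, k)
        print(f"{k} read(s) in flight: {rate:.1f} MB/s")
```

In this model, waiting for each read to return before issuing the next one (k = 1) makes the throughput scale with 1/latency, which is why queueing commands or lengthening the burst helps.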

 

-- Gabor

Expert Contributor
eteam00
Posts: 7,505
Registered: ‎07-21-2009
0

Spartan-6 MCB performance

Actually the Spartan 6 MCB does a pretty good job of merging multiple small read or write commands when you access the memory sequentially...

 

... However if you either use a longer burst (make the user interface burst as long or longer than the memory chip burst size) or queue up multiple read commands, then you can gain some speed.

 

This recent thread includes some very interesting (and, to me, surprising) Spartan-6 MCB performance trial results.  In particular, read posts #7 and #9.

 

A comprehensive white paper on Spartan-6 MCB performance would be very interesting to Spartan-6 customers.  UG388 has no useful information for understanding how to maximise effective performance from the MCB.

 

Some examples:

 

  • For consecutive read (or write) operations, is there an optimal transaction burst length (cmd_BL)?  Does MCB "overhead" incur one or more "dead" cycles between back-to-back operations?
  • When user access patterns effectively refresh the DRAM (for example:  video buffer fills or fetches), does MCB skip refreshes for rows which have recently been accessed (and refreshed)?  When user access patterns are organised to also refresh the memory, can (redundant) MCB refresh activity be disabled?
  • Can refresh be performed entirely during opportune "dead" times -- such as video blanking intervals -- to ensure that refresh activity does not interfere with memory access during other times?  Does MCB simply schedule a refresh every tREFI?  Or does MCB understand that once all rows have been refreshed in a certain period (typically 64 ms for commercial DDR2), no further refresh cycles are needed in that period?

For some unknown reason this sort of useful applications information, which is known by design, was not deemed worthy of inclusion in UG388.

 

-- Bob Elkind

SIGNATURE:
README for newbies is here: http://forums.xilinx.com/t5/New-Users-Forum/README-first-Help-for-new-users/td-p/219369

Summary:
1. Read the manual or user guide. Have you read the manual? Can you find the manual?
2. Search the forums (and search the web) for similar topics.
3. Do not post the same question on multiple forums.
4. Do not post a new topic or question on someone else's thread, start a new thread!
5. Students: Copying code is not the same as learning to design.
6 "It does not work" is not a question which can be answered. Provide useful details (with webpage, datasheet links, please).
7. You are not charged extra fees for comments in your code.
8. I am not paid for forum posts. If I write a good post, then I have been good for nothing.
Super Contributor
bouvett
Posts: 162
Registered: ‎09-22-2010
0

Re: Spartan-6 MCB performance

Hi all,

 

Thanks for your replies. I will read them.

 

But just a quick query: does it make sense that the read data rate is so low compared to the write data rate? Doesn't it make more sense (electronically) that the read is faster?

 

Thanks for your time.

 

bouvett

Expert Contributor
rcingham
Posts: 2,010
Registered: ‎09-09-2010
0

Re: Spartan-6 MCB performance

The maximum achievable burst rate is the same.
But (if my experience with Virtex-4/5 controllers is anything to go by) there is a lot of latency with a read, which will reduce any realistic effective rate measurement.
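
A rough way to see how read latency eats into the effective rate, and how a longer burst amortises it, is the simple model below. It is a sketch under stated assumptions: each read command pays a fixed latency once and then returns one word per clock. The 20-cycle latency is a placeholder rather than a measured MCB figure, and the function name is hypothetical.

```python
def burst_read_rate_mb_s(clock_hz, word_bytes, burst_len, latency_cycles):
    """Effective read rate in MB/s for back-to-back, non-overlapping bursts.

    Each burst costs `latency_cycles` of waiting plus one cycle per word.
    """
    cycles = latency_cycles + burst_len
    return clock_hz * word_bytes * burst_len / cycles / 1e6


if __name__ == "__main__":
    CLOCK_HZ = 100e6      # user-port clock
    WORD_BYTES = 4        # full 32-bit user port word
    LATENCY_CYCLES = 20   # hypothetical read latency in user clock cycles

    for burst_len in (1, 4, 16, 64):   # cmd_bl is 6 bits wide, so 64 is the longest burst
        rate = burst_read_rate_mb_s(CLOCK_HZ, WORD_BYTES, burst_len, LATENCY_CYCLES)
        print(f"burst length {burst_len:2d}: {rate:6.1f} MB/s")
```

With these numbers the burst rate limit is 400 MB/s for both directions, but a burst length of 1 spends almost all of its time waiting on latency.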

------------------------------------------
"If it don't work in simulation, it won't work on the board."
Expert Contributor
gszakacs
Posts: 5,264
Registered: ‎08-14-2007
0

Re: Spartan-6 MCB performance

For video applications, which typically use very long sequential reads or writes, it is possible to get very close to the theoretical maximum bandwidth of the memory.

As for skipping refresh, this is not an option for DDR memories according to the JEDEC standard which guarantees a refresh at regular intervals in order for the memory to update the internal DLL while most of the interface pins are idle.  It is not clear that many DRAM chips require this (Micron explicitly states they don't), but I'd be surprised if the MCB designers would ignore the requirement.

 

-- Gabor

Expert Contributor
eteam00
Posts: 7,505
Registered: ‎07-21-2009
0

Re: Spartan-6 MCB performance

As for skipping refresh, this is not an option for DDR memories according to the JEDEC standard which guarantees a refresh at regular intervals in order for the memory to update the internal DLL while most of the interface pins are idle.

 

First, a semantic nit:  I believe you intended to write that the JEDEC standard requires (rather than guarantees) refresh at regular intervals...

 

The "skipping" to which I refer is directed at the MCB logic, and not the memory device.  Re-phrased:

 

Is the MCB design clever enough to realise when refreshes are redundant and unnecessary (and can be skipped), or will the MCB issue a refresh transaction every tREFI whether needed or not?

 

There are proven methods to explicitly schedule refresh activity when it will have the least impact on overall system performance.  Is the Spartan-6 MCB supportive of such practices?  To my understanding, this is a useful but unanswered question.

 

-- Bob Elkind

Expert Contributor
gszakacs
Posts: 5,264
Registered: ‎08-14-2007
0

Re: Spartan-6 MCB performance

The "skipping" to which I refer is directed at the MCB logic, and not the memory device.  Re-phrased:

 

Is the MCB design clever enough to realise when refreshes are redundant and unnecessary (and can be skipped), or will the MCB issue a refresh transaction every tREFI whether needed or not?

 

My point was that according to JEDEC, refreshes cannot be skipped - not because a row will lose its data, but because the memory device needs periodic updates to the DLL circuit.  This was not the case in the old single-data-rate SDRAM's which had no local DLL.  This requirement may have been removed for newer varieties of DDR (DDR2 DDR3) but it is certainly in the original DDR memory JEDEC spec.

 

-- Gabor

Expert Contributor
eteam00
Posts: 7,505
Registered: ‎07-21-2009
0

Re: Spartan-6 MCB performance

My point was that according to JEDEC, refreshes cannot be skipped - not because a row will lose its data, but because the memory device needs periodic updates to the DLL circuit.  This was not the case in the old single-data-rate SDRAM's which had no local DLL.  This requirement may have been removed for newer varieties of DDR (DDR2 DDR3) but it is certainly in the original DDR memory JEDEC spec.

 

From a recent Micron 1Gbit DDR2 device datasheet:

 

The refresh period is 64ms (commercial) or 32ms (industrial and automotive). This equates to an average refresh rate of 7.8125μs (commercial) or 3.906μs (industrial and automotive). To ensure all rows of all banks are properly refreshed, 8,192 REFRESH commands must be issued every 64ms (commercial) or 32ms (industrial and automotive).

 

In other words, the memory requires 8K refreshes every 64 ms, which is a much more flexible requirement than a single refresh every 7.8125 μs.  The unanswered question stands:  Does Spartan-6 MCB permit some or all of the flexibility allowed by the Micron memory device?
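
The arithmetic behind those figures, and a rough idea of how much bandwidth refresh costs in the first place, can be sketched as follows. The tRFC value of 127.5 ns is an assumption for a 1Gbit DDR2 part and should be checked against the actual datasheet. The function names are illustrative.

```python
def average_refresh_interval_us(refresh_period_ms, refresh_commands=8192):
    """Average spacing between REFRESH commands, in microseconds."""
    return refresh_period_ms * 1000.0 / refresh_commands


def refresh_overhead(t_rfc_ns, t_refi_us):
    """Fraction of time the device is busy refreshing (tRFC out of every tREFI)."""
    return t_rfc_ns / (t_refi_us * 1000.0)


if __name__ == "__main__":
    T_RFC_NS = 127.5  # assumed refresh cycle time for a 1Gbit DDR2 device

    for grade, period_ms in (("commercial", 64), ("industrial/automotive", 32)):
        t_refi_us = average_refresh_interval_us(period_ms)
        overhead = refresh_overhead(T_RFC_NS, t_refi_us)
        print(f"{grade}: tREFI = {t_refi_us} us, refresh overhead = {overhead:.2%}")
```

Under that assumption refresh occupies under 2% of the time for the commercial grade, so the scheduling question matters more for worst-case latency than for average bandwidth.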

 

From a several-years-old Micron 1Gbit DDR3 device datasheet (notice the highlighted differences!):

 

The refresh period is 64ms when TC is less than or equal to 85°C. This equates to an average refresh rate of 7.8125μs. However, nine REFRESH commands should be asserted at least once every 70.3μs. When TC is greater than +85°C, the refresh period is 32ms. Although JEDEC specifies tREFI as a MAX, Micron allows REFRESH commands to be burst provided that the maximum refresh period is not violated.

 

-- Bob Elkind
