icosmos72
Visitor
7,465 Views
Registered: ‎01-04-2010

SDRAM-BRAM DMA

Hello community --- thanks for your time ---
 
We have an application where data needs to flow BRAM->SDRAM, and then SDRAM->(different) BRAM, without the CPU spending all its time doing memory copies.  We are on an ML507 board.  No problem, we thought, there are these four DMA channels (minus one for the Ethernet controller), we'll let them do the chattering on the PLB while the CPU is off doing smart mathematical tasks.
 
Now I finally realize that the DMA is not what I thought it was (copy from this memory-mapped location to this other one), but is specific to LocalLink interfaces on one side.  Which is fine, because that should mean less traffic on the PLB itself.  But I'm embarrassed to say I'm stumped, because I can't find any reference on how to convince (presumably) the BRAM to be a LocalLink device that I can DMA to.
 
I wonder now if I'm totally barking up the wrong tree: that there must be a way to get a pile of bytes from SDRAM, to BRAM, without the processor itself constantly running "*ptr1++ = *ptr2++"; but maybe it's not called DMA, or maybe it's some other block within Base System Builder, or ... I just don't know where to look.  Any thoughts very much appreciated.
 
Regards ---
Scott McDermott 
 
 
9 Replies
dylan
Xilinx Employee
7,444 Views
Registered: ‎07-30-2007

Scott,

I think you're looking for the Central DMA core.  It reads data from one PLB port and writes it out to another PLB port. Each transfer requires a few registers to be set up from the CPU.
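To make "a few registers to be set up from the CPU" concrete, here is a minimal C sketch of what programming a memory-to-memory DMA engine tends to look like. The register layout (`dma_regs_t`), the `DMA_STATUS_DONE` bit, and the function names are all hypothetical and are not taken from the Central DMA data sheet; `dma_model_run` is only a host-side stand-in for the hardware so the sketch can be tried off-target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL register block for a memory-to-memory DMA engine.
 * Names and order are illustrative, not the real Central DMA map. */
typedef struct {
    volatile uint32_t src_addr;  /* where to read from */
    volatile uint32_t dst_addr;  /* where to write to */
    volatile uint32_t length;    /* byte count; assumed to start the transfer */
    volatile uint32_t status;    /* bit 0 = transfer done (assumed) */
} dma_regs_t;

#define DMA_STATUS_DONE 0x1u

/* Program one transfer: three register writes, then the CPU is free. */
void dma_start(dma_regs_t *regs, uint32_t src, uint32_t dst, uint32_t nbytes)
{
    assert(regs != NULL && nbytes > 0);
    regs->status   = 0;       /* clear the done flag from any earlier transfer */
    regs->src_addr = src;
    regs->dst_addr = dst;
    regs->length   = nbytes;  /* last write kicks off the copy */
}

/* Poll for completion (an interrupt would normally replace this). */
int dma_done(const dma_regs_t *regs)
{
    return (regs->status & DMA_STATUS_DONE) != 0;
}

/* Host-side stand-in for the hardware: addresses are offsets into `mem`. */
void dma_model_run(dma_regs_t *regs, uint8_t *mem, uint32_t mem_size)
{
    uint32_t n = regs->length;
    assert(regs->src_addr + n <= mem_size && regs->dst_addr + n <= mem_size);
    memmove(mem + regs->dst_addr, mem + regs->src_addr, n);
    regs->status |= DMA_STATUS_DONE;
}
```

On real hardware `regs` would point at the core's base address from the system address map, and the source and destination would be the BRAM and SDRAM addresses.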

It sounds like you were looking at the SDMA PIM of the MPMC, which provides scatter-gather functionality and is geared more toward streaming-type interfaces whose data is packaged up and placed in memory.

Dylan

icosmos72
Visitor
7,435 Views
Registered: ‎01-04-2010

Hi Dylan ---

 

Thanks so much for your extremely quick reply, on a weekend no less!  (My spouse thanks you too; hitting this issue was unexpected and made me quite suddenly quite un-fun to be around.)  Indeed the central DMA looks like a good-fit solution to the problem I described.

 

One question (to you or to anyone with a thought on this): how hard is it to write or adapt a PLB Master interface?  All of my BRAMs are FIFOs; if they can take the reins themselves, and push / pull a burst of data with the SDRAM when they have / need it, it removes a host of buffer-maintenance issues.  If, as with Dylan's recommendation for the Central DMA, the answer is as clear as "Oh, you want XPS_BRAM_Master_something", then great, we have another option to consider.  If it's "Hmm, ya, you might be able to make that work," it's probably more than we can take on right now.  Just looking for opinions from people who've made the PLB sing, when we've only managed to get it to hum a bit.

 

Thanks again.

 

--Scott McDermott

dylan
Xilinx Employee
7,433 Views
Registered: ‎07-30-2007

Scott,

The PLB IPIF abstracts the details of the bus connection for you.  The "Create or Import Peripheral" wizard will create an HDL template for a master which performs some bursts across the PLB.  You will have to modify that example.

 

It is not exactly what you need, but it should be reasonably easy HDL design work.

 

One thing to watch is cache coherency with the CPU when you have all these masters moving data around, but that's more of a standard embedded design concern.  The IPIF also has an interrupt service to help notify the CPU software of DMA activity.
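As a rough illustration of the cache coherency point, the usual pattern is: flush the source range before another master reads it, and invalidate the destination range before the CPU reads what that master wrote. The sketch below is plain C with hypothetical hooks (`platform_ops_t` and its three callbacks are made up for illustration and are not an EDK API); the `host_*` functions are stand-ins that only record call order so the sketch can be run on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL platform hooks; on a real board these would wrap the
 * BSP's cache routines and whatever DMA engine or bus master is used. */
typedef struct {
    void (*flush_dcache_range)(const void *addr, size_t len);
    void (*invalidate_dcache_range)(const void *addr, size_t len);
    void (*dma_copy_blocking)(void *dst, const void *src, size_t len);
} platform_ops_t;

/* Copy via another bus master while keeping the CPU's view consistent.
 * Assumes the CPU holds no dirty cache lines in the destination range. */
void coherent_dma_copy(const platform_ops_t *ops, void *dst,
                       const void *src, size_t len)
{
    assert(ops != NULL && dst != NULL && src != NULL);
    ops->flush_dcache_range(src, len);       /* memory now holds the CPU's latest writes */
    ops->dma_copy_blocking(dst, src, len);   /* the other master moves the data */
    ops->invalidate_dcache_range(dst, len);  /* CPU will re-read dst from memory */
}

/* Host-side stand-ins: record the order of operations in g_trace. */
char g_trace[8];

static void trace(char c)
{
    size_t n = strlen(g_trace);
    if (n + 1 < sizeof g_trace) g_trace[n] = c;
}
static void host_flush(const void *a, size_t l)      { (void)a; (void)l; trace('F'); }
static void host_invalidate(const void *a, size_t l) { (void)a; (void)l; trace('I'); }
static void host_dma(void *d, const void *s, size_t l) { memcpy(d, s, l); trace('D'); }

const platform_ops_t host_ops = { host_flush, host_invalidate, host_dma };
```

Placing the buffers in a non-cached region is the other common option and avoids the flush/invalidate calls altogether.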

Dylan

wting
Visitor
7,400 Views
Registered: ‎03-16-2010

Dylan,

 

My name is Wen-Chun. I am working with Scott on solving this problem. We have come up with a possible approach:

 

Build a custom IP that acts as a bus master, and design the state machine in the "user_logic.vhd" that controls the data transfer:

Processor <--> PLB <--> IP <--> dual port BRAM <--> other logics.
                  <--> MIC <--> SDRAM

The operation is: the "other logics" generates data and writes it to the BRAM (in the fabric).  The IP monitors the BRAM rd/wr pointers and reads data from it, then pushes the data onto the PLB, en route to the memory-mapped off-chip SDRAM, all without the intervention of the processor. The processor reads the data from the SDRAM when it needs to. (The BRAM rd/wr pointers are also memory-mapped and available to the processor.) The IP can also read data from the SDRAM and write it to the BRAM.
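The rd/wr pointer bookkeeping described above can be sketched in C as it might look from the processor side, or as a reference model for the state machine. It assumes the BRAM is used as a circular buffer where `wr == rd` means empty; the function names, the `depth` parameter, and the `max_burst` cap are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of words waiting in a circular buffer of `depth` words.
 * Convention assumed here: wr == rd means empty. */
uint32_t fifo_fill(uint32_t wr, uint32_t rd, uint32_t depth)
{
    assert(depth > 0 && wr < depth && rd < depth);
    return (wr >= rd) ? (wr - rd) : (depth - rd + wr);
}

/* Size of the next burst: limited by the data available, by the
 * distance to the end of the buffer (so one burst never wraps),
 * and by the largest burst the bus master is set up to issue. */
uint32_t next_burst(uint32_t wr, uint32_t rd, uint32_t depth,
                    uint32_t max_burst)
{
    uint32_t avail  = fifo_fill(wr, rd, depth);
    uint32_t to_end = depth - rd;
    uint32_t n = (avail < to_end) ? avail : to_end;
    return (n < max_burst) ? n : max_burst;
}

/* Advance a pointer by n words, wrapping at the end of the buffer. */
uint32_t fifo_advance(uint32_t ptr, uint32_t n, uint32_t depth)
{
    assert(depth > 0);
    return (ptr + n) % depth;
}
```

With a 512-word buffer, `rd = 508` and `wr = 3`, there are 7 words waiting, but the first burst is only 4 words long because it stops at the end of the buffer; the remaining 3 go out in a second burst starting at address 0.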

 

The questions for this approach are: (1) Is the SDRAM accessible this way? We thought it is memory-mapped and the crossbar should route the data. (2) Judging from the user_logic.vhd generated by EDK for a bus-master test custom IP, it seems three parts are needed to perform data transfer between the IP and the PLB. Is that correct?

 

(a) a state machine controlling signals such as:

 

  IP2Bus_MstRd_Req  <= mst_cmd_sm_rd_req;
  IP2Bus_MstWr_Req  <= mst_cmd_sm_wr_req;
  IP2Bus_Mst_Addr   <= mst_cmd_sm_ip2bus_addr;
  IP2Bus_Mst_BE     <= mst_cmd_sm_ip2bus_be;
  IP2Bus_Mst_Type   <= mst_cmd_sm_xfer_type;
  IP2Bus_Mst_Length <= mst_cmd_sm_xfer_length;
  IP2Bus_Mst_Lock   <= mst_cmd_sm_bus_lock;
  IP2Bus_Mst_Reset  <= mst_cmd_sm_reset;

 

(b) state machines for the LocalLink interfaces that control signals such as these:

 

  IP2Bus_MstWr_src_rdy_n <= not(mst_llwr_sm_src_rdy);
  IP2Bus_MstWr_src_dsc_n <= '1'; -- do not throttle data
  IP2Bus_MstWr_rem       <= (others => '0');
  IP2Bus_MstWr_sof_n     <= not(mst_llwr_sm_sof);
  IP2Bus_MstWr_eof_n     <= not(mst_llwr_sm_eof);

 

(c)  a data FIFO that holds the data:

 

 DATA_CAPTURE_FIFO_I : entity proc_common_v3_00_a.srl_fifo_f
    generic map
    (
      C_DWIDTH   => C_MST_DWIDTH,
      C_DEPTH    => 128
    )
    port map
    (
      Clk        => Bus2IP_Clk,
      Reset      => Bus2IP_Reset,
      FIFO_Write => mst_fifo_valid_write_xfer,
      Data_In    => Bus2IP_MstRd_d,
      FIFO_Read  => mst_fifo_valid_read_xfer,
      Data_Out   => IP2Bus_MstWr_d,
      FIFO_Full  => open,
      FIFO_Empty => open,
      Addr       => open
    );

 

 

Thanks.

Wen-Chun

dylan
Xilinx Employee
7,394 Views
Registered: ‎07-30-2007

Wen-Chun,

Since you mention "crossbar" I assume you are in Virtex-5FX. Yes, in that case the master SPLB0 and SPLB1 ports on the crossbar can be driven by the custom IP user_logic into SDRAM.

 

Yes, it seems like you are on the right track.

Message Edited by dylan on 03-16-2010 12:29 PM
wting
Visitor
7,390 Views
Registered: ‎03-16-2010

Dylan,

 

Thank you for the speedy reply. I have another question: suppose I instantiate a BRAM block in the EDK without connecting either its port A or B to a BRAM controller, and instead edit the .mhs file to add the following ports to it:

 

PORT BRAM_Dout_A
PORT BRAM_Addr_A
PORT BRAM_WEN_A
PORT BRAM_EN_A
PORT BRAM_Clk_A

PORT BRAM_Dout_B
PORT BRAM_Addr_B
PORT BRAM_WEN_B
PORT BRAM_EN_B
PORT BRAM_Clk_B

 

And then I connect these ports to my custom IP in the EDK. Would this work ?

 

Thanks,

Wen-Chun

dylan
Xilinx Employee
7,373 Views
Registered: ‎07-30-2007

I haven't done this myself, but it seems reasonable, and I would expect it to work.

Dylan

wting
Visitor
7,369 Views
Registered: ‎03-16-2010

Dylan,

 

On a related subject: I added two Central DMA controllers to the EDK project (to get a feel for their size, in case we have to use them as an option for our problem) and recompiled the project in ISE. Although both the "slice registers" and "slice LUTs" counts increase roughly as expected, the overall slice usage stays the same. Could it be that a lot of slices were not fully utilized by the mapper and place-and-route before, and the tool happened to be able to reorganize the resources and find places for everything without using more slices?

 

 

Thanks,

Wen-Chun

dylan
Xilinx Employee
7,367 Views
Registered: ‎07-30-2007

Correct. You saw the total number of LUTs/flops increase when you added the Central DMA core while the number of slices stayed the same, indicating that the extra logic was able to fit in existing underutilized slices.
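A quick back-of-the-envelope check shows why this is plausible. A Virtex-5 slice holds four LUTs and four flip-flops, so the LUT and register counts only give a lower bound on the number of slices needed; the mapper normally occupies more slices than that bound, which leaves room to absorb new logic. The C sketch below computes the bound. The function names are made up for illustration, and it ignores real packing constraints such as control sets and carry chains.

```c
#include <assert.h>
#include <stdint.h>

/* A Virtex-5 slice contains four 6-input LUTs and four flip-flops. */
enum { LUTS_PER_SLICE = 4, FFS_PER_SLICE = 4 };

uint32_t ceil_div(uint32_t a, uint32_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Fewest slices that could hold the design if packing were perfect.
 * Real packing is looser, so this is only a lower bound. */
uint32_t min_slices(uint32_t luts, uint32_t ffs)
{
    uint32_t by_lut = ceil_div(luts, LUTS_PER_SLICE);
    uint32_t by_ff  = ceil_div(ffs, FFS_PER_SLICE);
    return (by_lut > by_ff) ? by_lut : by_ff;
}

/* Necessary (not sufficient) condition for the design to fit in the
 * slices that are already occupied. */
int may_fit_in(uint32_t luts, uint32_t ffs, uint32_t occupied_slices)
{
    return min_slices(luts, ffs) <= occupied_slices;
}
```

With made-up numbers: a design using 20,000 LUTs and 18,000 registers spread over 9,000 occupied slices has a lower bound of 5,000 slices. Adding 1,500 LUTs and 1,200 registers raises the bound to 5,375, still well under 9,000, so the occupied-slice count need not change.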