New2Zyqn_Vivado
Visitor
Visitor
1,060 Views
Registered: ‎05-25-2021

Transfer high-speed digital input from ADC to DRAM directly without going through ARM

Hello,

I am new to Zynq and Vivado.  I searched the forum for a challenge similar to mine but was not able to find one.

I have an application that calls for the Zynq FPGA to receive LVDS data from a high-speed 300 Msps ADC and store it directly in a DRAM outside the Zynq, without going through the ARM.  Can someone please provide some pointers or point me to application notes that can guide me through this?

Thanks.

bkamen
Explorer
Explorer
953 Views
Registered: ‎07-17-2014

Are you not wanting to use the ARM (PS) at all or are you saying you don't want the PS to be moving the data -- you'd like it to be DMA based?

If you're not wanting to have the ARM move the data (either polled I/O or IRQ driven data transfer) -- you would use DMA and then the ARM would just handle doing something with it.

If you don't want the PS doing anything ever and this is a design you're rolling from scratch, I believe you can still use the MIG wizard to build a memory interface that would allow you to use ARM based or custom connected RAM. My memory is a little rusty on the details for sharing RAM with the PS... but either way, you'd start with the MIG (Memory Interface Generator) user guide to see if it's applicable to what you need.

I did a design where a 576Mb/s 4-lane LVDS imager could directly write to PS memory via DMA - but I was also using PetaLinux on the PS to make decisions and move the data off to storage. So not sure if that's what you'd want either.

Hope that helps,

 -Ben

drjohnsmith
Teacher
Teacher
937 Views
Registered: ‎07-09-2009

The basic answer is that you want to use DMA to move the data.

Assuming the memory is on the PS side, and it's going to be shared by the ARM:

The PL side will require a small buffer store (a FIFO) that the incoming data gets written into.

      When that FIFO has hit a set level, the DMA moves the FIFO content quickly to the memory.

You're going to need some sort of PL logic to control when to put data into the FIFO,

    and some sort of logic to tell the DMA when, how much, and where to transfer.

The DMA engine can be in the PS or the PL side,

    and how you program it can be in the PL or the PS side.

Typically, the PS side would use its DMA engine to move the data (scatter gather mode probably),

    and the DMA engine is programmed by the ARM.

         The ARM gets an interrupt when the FIFO has data to move.

Or you can write your own state machine controller in the PL to do the control,

    but that's not as easy, by a long way, as using the predefined routine on the PS side.

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
bruce_karaffa
Scholar
Scholar
919 Views
Registered: ‎06-21-2017

Do you have DDR RAM on the PL side of your board or only on the PS side?  If you will use the PS-side RAM, then as @bkamen and @drjohnsmith mentioned, you should use DMA.  You may want to look into the DMA controllers and find their maximum transfer rates.  The maximum S2MM rate listed in the AXI DMA guide (PG021) is about 300 MB/s.  You may also want to check how large a memory block you can allocate in the RAM.  Even if you don't plan to use the processor, you need to use software to keep this memory range out of the RAM the processor uses.  How much data do you plan to collect at one time, and what do you expect to do with it?

New2Zyqn_Vivado
Visitor
Visitor
891 Views
Registered: ‎05-25-2021

Hello bkamen!!!!

 

Thank you so much for responding!  I looked at the datasheet for Zynq for DMA speed.  At max transfer, the DMA can transfer 400Mbps.  The ADC is throwing more bits than that per second (input from ADC is 500Mbps).  So, it will have to bypass the DMA.  It is a design I am rolling from scratch.  I will look at the MIG user guide for pointers.  Thank you so much for even taking the time to respond!  Really truly appreciate it!

 

Best best regards,

Newbie

New2Zyqn_Vivado
Visitor
Visitor
881 Views
Registered: ‎05-25-2021

Hey bruce_karaffa, thank you for responding!  It is appreciated.  I looked at the datasheet for Zynq for DMA speed.  At max transfer, the DMA can transfer 400MBps.  The ADC is throwing more bits than that per second (input from ADC > 500MBps).  So, it will have to bypass the DMA.  The data will be collected continuously for less than 1 sec (less than 256MB, so it can fit in the DRAM) and then transferred out via gigabit Ethernet to a high-speed processor for DSP manipulations.

 

Best best regards,

Newbie

New2Zyqn_Vivado
Visitor
Visitor
872 Views
Registered: ‎05-25-2021

drjohnsmith,

 

Thank you so much for responding!  I will take your pointers and look into them.  Thank you so much for even taking the time to respond!  Really truly appreciate it!

 

Best best regards,

Newbie

bkamen
Explorer
Explorer
849 Views
Registered: ‎07-17-2014

Well - hang on there.

What is the word size of the LVDS stream from the ADC? That gets divided down by the FPGA's ISERDES unit.

So if you have an 8-bit word size, then the data rate becomes 500Mb/s / 8 = 62.5MB/s (easily under the 300MB/s rate Bruce stated) -- but the important part is your data clock is now 1/8th of what it was for the LVDS serial channel.

Don't feel like you have to stream the data to RAM bit by bit (no pun intended).

You can do it byte by byte (/8) or word by word (/16) -- what you should be doing is packing the data to the bus width, which for the ARM side can be 64 bits wide for the HP port.

A single 500Mb/s channel into a 64-bit DMA channel would end up with a transfer clock rate of only 7.8125 MHz - although I think the RAM is only 32 bits wide, so it would end up being 2 * 7.8125 MHz.

(/8 for the ISERDES and then another /8 stacking 8 bytes into a 64-bit channel to travel into RAM)
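
The division above can be checked with a few lines of Python. This is only a sketch of the arithmetic in this post; the function name is made up for illustration, and the 500 Mb/s single-lane figure is the one assumed at this point in the thread.

```python
def parallel_word_rate_hz(serial_bit_rate_bps: float, word_width_bits: int) -> float:
    """Rate at which full words appear once a serial stream is packed to a given width."""
    if word_width_bits <= 0:
        raise ValueError("word width must be positive")
    return serial_bit_rate_bps / word_width_bits


lane_rate_bps = 500e6  # single serial lane, as assumed in the post above

for width in (8, 16, 32, 64):
    rate = parallel_word_rate_hz(lane_rate_bps, width)
    print(f"{width:2d}-bit words: {rate / 1e6:.4f} MHz")
```

The wider the word, the slower the clock needed to carry the same number of bits per second; the total bit rate itself does not change.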

Would that let you accomplish what you need?

Cheers,

 -Ben




drjohnsmith
Teacher
Teacher
826 Views
Registered: ‎07-09-2009

Does the megabytes per second not stay the same into and out of the ISERDES?

 

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
New2Zyqn_Vivado
Visitor
Visitor
759 Views
Registered: ‎05-25-2021

Hey bkamen.  The ADC outputs 250Msps and each sample is 16 bits wide.  So that's 4,000,000,000b/s, which equates to 500MBps (sorry, I typed it wrong earlier; the DMA maxes out at 400MBps).  So, as I was thinking through it, it doesn't seem to matter whether the DMA pushes 8 bits or 64 bits.  It gets swamped in either case.  Am I wrong?  So, either I pause the ADC long enough for the DMA to transfer the accumulated data, or I bypass the DMA completely.  The latter seems hard, and I just learned Vivado and Zynq 2 weeks ago.
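
For reference, the payload arithmetic in this post works out as follows. This is a small sketch with hypothetical names; the 400 MB/s value is simply the figure quoted in this post, not a confirmed limit of the device.

```python
def adc_payload_bytes_per_s(sample_rate_sps: int, bits_per_sample: int) -> int:
    """Sustained payload rate of an ADC in bytes per second."""
    return sample_rate_sps * bits_per_sample // 8


adc_rate = adc_payload_bytes_per_s(250_000_000, 16)  # 250 Msps, 16-bit samples
quoted_dma_rate = 400_000_000                        # figure quoted in the post (assumption)

print(f"ADC payload: {adc_rate / 1e6:.0f} MB/s")
print(f"Exceeds the quoted DMA figure: {adc_rate > quoted_dma_rate}")
```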

 

But bkamen, I want to thank you for taking your time to respond and help.  Please keep providing tips and pointers that can guide me along.

My utmost respect and gratitude.

 

Regards,

newbie

bkamen
Explorer
Explorer
734 Views
Registered: ‎07-17-2014

Ok - I was going to ask about big B versus little b next.... the discussion so far had some moments that could use some clarity.

Would you mind sharing what ADC you are using?

I would be curious to know how many LVDS lanes it has. I also forgot to ask: is the data transfer SDR or DDR? DDR would also halve the clock rate since a bit is transferred on every clock transition.

If the transmit rate of the data is really 4Gb/s (gigabits) over a *single lane* -- then you can't even use a Zynq SoC.

Check out: https://www.xilinx.com/support/documentation/data_sheets/ds187-XC7Z010-XC7Z020-Data-Sheet.pdf
And look at page 33 -- specifically Table 50.

I have a feeling your ADC has multiple transmit lanes which allows transmitting the data at the rate you need but with 2 or more channels, that lowers the clock rate of each channel and allows parallelizing the data feed into FPGA.

Now -- there's still the question of: if the ADC is giving you 4Gb/s, how are you going to stream that over a 1Gb/s Ethernet link?
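
To put rough numbers on that question, here is a back-of-the-envelope sketch. It assumes the 256MB buffer mentioned earlier in the thread means 256 MiB, that the ADC delivers 500 MB/s, and that the Ethernet link achieves its raw 1 Gb/s line rate with no protocol overhead (real throughput would be lower). All names are illustrative.

```python
BUFFER_BYTES = 256 * 1024 * 1024   # assumed: "256MB" taken as 256 MiB
ADC_BYTES_PER_S = 500_000_000      # 250 Msps x 16 bits
LINK_BITS_PER_S = 1_000_000_000    # raw gigabit Ethernet line rate, overhead ignored


def capture_seconds(buffer_bytes: int, adc_bytes_per_s: int) -> float:
    """How long the ADC takes to fill the buffer."""
    return buffer_bytes / adc_bytes_per_s


def offload_seconds(buffer_bytes: int, link_bits_per_s: int) -> float:
    """How long the link takes to drain a full buffer."""
    return buffer_bytes * 8 / link_bits_per_s


print(f"fill time:  {capture_seconds(BUFFER_BYTES, ADC_BYTES_PER_S):.3f} s")
print(f"drain time: {offload_seconds(BUFFER_BYTES, LINK_BITS_PER_S):.3f} s")
```

Under these assumptions the buffer fills roughly four times faster than the link can drain it, so the capture has to be a burst into DRAM followed by a slower offload rather than a continuous stream.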




avrumw
Expert
Expert
705 Views
Registered: ‎01-23-2009

@New2Zyqn_Vivado , where did you read that the DMA (and which DMA) can transfer no more than 400MB/s? If you are looking only at the DMA in the PS, that is not the way to go; there are many ways of doing DMA in a Zynq system.

Regardless of which DDRx-SDRAM controller you are using (in the PS or in the PL) they are accessed via AXI interfaces - either through the Zynq "High-Performance" ports (which are slaves) or the AXI port of the MIG generated in the PL (which is also a slave). These ports are capable of far more than 400MB/s; the PS ports are 64 bits wide and can run easily at more than 250MHz (and probably much higher) allowing more than 2GB/s raw throughput. The MIG AXI port can be designed to be even wider.

So one way to do this would be to write RTL code to capture the ADC data and convert it to an AXI Stream - the AXI Stream protocol is very easy to implement in hardware. The AXI-Stream interface would need to be wide enough to run at a reasonable speed, but even 16 bits at 250MHz would be enough to handle your data (you can easily go far wider and even a fair bit higher clock rate and not start having trouble).
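
As a rough illustration of why the AXI-Stream protocol is simple, here is a cycle-based Python model of its core handshake: a data beat transfers only on a clock cycle where TVALID (from the master) and TREADY (from the slave) are both high. This is a behavioral sketch, not RTL; the function name and the way the slave's readiness is supplied are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def simulate_axis(samples: Sequence[int], tready_per_cycle: Sequence[int]) -> Tuple[List[int], int]:
    """Model an AXI-Stream master feeding a slave.

    Returns the beats the slave accepted and the number of cycles in which
    the master had valid data but the slave was not ready (stall cycles).
    """
    received: List[int] = []
    stalls = 0
    index = 0
    for tready in tready_per_cycle:
        tvalid = index < len(samples)      # master asserts TVALID while it has data
        if tvalid and tready:
            received.append(samples[index])  # handshake: beat is transferred
            index += 1
        elif tvalid:
            stalls += 1                    # master must hold TDATA and wait
    return received, stalls


beats, stalls = simulate_axis([0x0001, 0x0002, 0x0003], [1, 0, 1, 1, 0])
print(beats, stalls)
```

An ADC cannot be paused while the slave is not ready, which is one reason a FIFO between the capture logic and the stream master is commonly used to absorb stall cycles.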

Then you can use the "AXI Datamover" in S2MM mode (stream to memory mapped) which will turn this stream into a set of memory mapped writes to specific addresses. If the addresses used (which are programmed by the processor using AXI slave registers in the S2MM) target an address that is in the address space of the DRAM (either the PS or PL DRAM) then the stream will be written to the DRAM. 

This solution requires some RTL (to write the ADC capture and the AXI-Stream master), the use of the IP integrator to connect all the components (your RTL module, some AXI infrastructure, the AXI Datamover and the Zynq itself) and some software running on the PS to control the S2MM (along with all the other stuff the PS has to do - probably an operating system, etc...). This is a "moderate" complexity project, but is definitely implementable. And it will go WAY faster than 500MB/s.

Avrum

drjohnsmith
Teacher
Teacher
652 Views
Registered: ‎07-09-2009

One thought:

  I have used this sort of ADC for RF systems.

      The first thing we did was decimate the data, i.e. filter out the band we wanted,

          which lowered the data rate.

No point storing what is not needed.

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
New2Zyqn_Vivado
Visitor
Visitor
623 Views
Registered: ‎05-25-2021

Hello bkamen!  Thanks for the link.  I will look into it. 

Yes, the ADC is a TI part and the part number is ADS42LB69IRGCT.  It has 2 channels but we only need 1.  Each channel has 8 LVDS lines, and the data outputs are captured on opposite edges of each clock (so DDR).  We will not need continuous streaming, but the ADC must sample at 250Msps or more (sorry, I can't disclose why at this time).  It is easiest to think of our product as a sequence of procedures (each procedure will require collecting data from the ADC), with each procedure up to a few seconds long before the next procedure starts.

Thanks for the tips so far.  Much appreciated.

Regards,

Newbie

New2Zyqn_Vivado
Visitor
Visitor
622 Views
Registered: ‎05-25-2021

Hello avrumw!  Thanks for sharing your thoughts.  Below is where I got the 400MBps limitation.  I need to study your comments more.  If they are correct, then it would be what I need to do.

 

Best regards,

Newbie

DMA Throughput.jpg

New2Zyqn_Vivado
Visitor
Visitor
618 Views
Registered: ‎05-25-2021

Good tip drjohnsmith.  Thanks.

New2Zyqn_Vivado
Visitor
Visitor
590 Views
Registered: ‎05-25-2021

Hi Avrumw, I looked into Vivado and looked at AXI Datamover.  It looks applicable.  I want to share my hardware architecture with you and, along the way, ask you some questions on terminology, as I am not clear on what some terms mean.  We bought a custom SOM board with a Zynq 7000 on it.  On this SOM board is a 256MB DRAM that is external to the Zynq 7000 (that is, it is connected to the Zynq via the Zynq's IO pins, and therefore the Zynq can read from and write to it).

1st Question - I was asked if the DRAM is on the PS or PL side.  What is meant by that?  The DRAM is external to the Zynq and so is connected to the Zynq through the Zynq's IO pins.  How do I tell if it is on the PL or PS side?

Continuing, the SOM board has an expansion connector to which a daughter card can be attached.  We are designing a daughter card with a high-speed ADC (ADS42LB69IRGCT) with LVDS outputs and a differential clock output, and these go to the Zynq.

2nd Question - Looking at AXI Datamover, there is MM2S control signals and S2MM control signals.  Is the external DRAM considered the "MM" side and the ADC considered the "S" side?  How do I know what is on the "MM" side and what is on the "S" side?

Thanks in advance.

Regards,

Newbie

New2Zyqn_Vivado
Visitor
Visitor
581 Views
Registered: ‎05-25-2021

Sorry Avrumw, 2 more questions,

 

1.  Given my description in an earlier post, is the external DRAM considered "system memory"?  I read the AXI Datamover doc and it mentioned the following:

"Applications - The AXI DataMover provides high-speed data movement between system memory and an AXI4-Stream-based target. This core is intended to be a standalone core for a custom design."

and I don't understand what is meant by system memory.  Embedded in the Zynq 7000 are RAMs, and perhaps the AXI Datamover doc is referring to those RAMs as "system memory".

 

2. When a doc in Vivado mentions a "stream-based target", what do they mean by that?

 

Regards,

Newbie

bkamen
Explorer
Explorer
572 Views
Registered: ‎07-17-2014

I'm looking at the datasheet now.

This is the one I'm looking at https://www.ti.com/lit/ds/symlink/ads42lb69.pdf

There's a LVDS QDR mode or the LVDS DDR mode -- you've stated you're using "DDR" --

For a single ADC, there's an LVDS clock signal and 8 LVDS data pin pairs (for DDR mode)...  Yeah, I see how for the D0 pair, data result bits D0, D1 are sent and then for the D2 pair, it's D2 then D3.
If I'm reading this right, the data clock would still be 250MHz, which should be fine. Keep your traces short -- you can't really use the ISERDES, but make sure you follow all the other recommendations because you'll probably want to use the IDELAY anyway if trace lengths are an issue. (Also, make sure all the pins come in on the same I/O bank.)
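
The lane-to-bit mapping described above (each of the 8 pairs carries two bits of the 16-bit result per clock) can be modelled in Python as below. This is only a sketch of the reassembly logic, not RTL. It assumes the even-numbered bit of each pair arrives on the rising edge and the odd-numbered bit on the falling edge; check the ADC datasheet timing diagram before relying on that ordering. The function names are made up.

```python
from typing import List, Sequence, Tuple

LANES = 8  # 8 LVDS data pairs per ADC channel in DDR mode


def assemble_sample(rising_bits: Sequence[int], falling_bits: Sequence[int]) -> int:
    """Rebuild one 16-bit sample from the bits captured on each clock edge.

    Lane k is assumed to carry bit 2k on the rising edge and bit 2k+1 on
    the falling edge.
    """
    sample = 0
    for lane in range(LANES):
        sample |= (rising_bits[lane] & 1) << (2 * lane)
        sample |= (falling_bits[lane] & 1) << (2 * lane + 1)
    return sample


def split_sample(sample: int) -> Tuple[List[int], List[int]]:
    """Inverse of assemble_sample, useful for building test vectors."""
    rising = [(sample >> (2 * lane)) & 1 for lane in range(LANES)]
    falling = [(sample >> (2 * lane + 1)) & 1 for lane in range(LANES)]
    return rising, falling


rising, falling = split_sample(0xA5C3)
print(hex(assemble_sample(rising, falling)))
```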

The Xilinx FIFO IP will allow configuring a FIFO that's 16bits in and 32bits out. This effectively cuts the transfer rate to RAM in half. And if the RAM is 32bits wide, that might be good enough. But as others have mentioned, there are ways to scale the AXI bus up even higher to transfer more bits at a time at a lower clock/transfer rate. (this makes pushing the data around the FPGA a little easier now that timing isn't so tight.)

I would have to go look up how wide the PS memory interface is. But I can tell you in my CMOS imager project, I used the 64-bit HP interface for DMA on the PS. In your case, this would cut your data transfer rate to 1/4 the speed of its ingress into the FPGA --- so far, that sounds pretty doable.
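
The width conversion described above, four 16-bit samples stacked into one 64-bit word, looks like this as a Python model. It is a sketch of the packing only; placing the oldest sample in the least-significant bits is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def pack_samples(samples: Sequence[int], per_word: int = 4, sample_bits: int = 16) -> List[int]:
    """Pack consecutive samples into wider words, oldest sample in the low bits."""
    if len(samples) % per_word != 0:
        raise ValueError("sample count must be a multiple of per_word")
    mask = (1 << sample_bits) - 1
    words: List[int] = []
    for start in range(0, len(samples), per_word):
        word = 0
        for position, sample in enumerate(samples[start:start + per_word]):
            word |= (sample & mask) << (position * sample_bits)
        words.append(word)
    return words


words = pack_samples([0x1111, 0x2222, 0x3333, 0x4444])
print([hex(w) for w in words])  # ['0x4444333322221111']
```

Each output word is produced once per four input samples, which is where the 1/4 transfer rate comes from.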

Are you buying a pre-made board like the Avnet MicroZed-020 or are you going to spin your own design from scratch?

Lastly, if you haven't looked at these user guides and app notes yet -- make sure to have them handy.

DS187 - Zynq-7000 All Programmable SoC (I assume you have this one already)
UG471 - 7 Series FPGAs SelectIO Resources
XAPP855 - LVDS Deserializer (you may not need any of it - but it has bits and pieces you might find useful.)
XAPP860 - 16 Channel DDR LVDS Interface with Real-Time Window Monitoring (same. You probably won't need it but I'd bet it has some useful bits for you)
XAPP866 - An interface for TI ADC with Serial LVDS Outputs (not exactly the ADC you're using - but it will probably give you lots of ideas)

Cheers,

 -Ben



New2Zyqn_Vivado
Visitor
Visitor
535 Views
Registered: ‎05-25-2021

Hello Ben!!!! Thank you for taking your time on this.  It is appreciated. So, noted: for sure the traces will be short and neat and their electrical lengths the same.  Good tip on the same bank.  I already assigned the IO bank.  For the board we are using, I could not get the LVDS inputs to be on the same bank (I assumed the same bank meant all IO from bank B (or A), for example).  I used the SelectIO interface wizard in Vivado to create 8 two-input LVDS blocks.  I believe Vivado will automatically insert buffers as needed on the inputs and outputs (I never used Vivado before now, so this is strictly an assumption on my end).  I designed my own RTL to deserialize the inputs and designed my own demux to concatenate 4 sets of 16-bit inputs to form a 64-bit-wide bus for the DMA to push through.  I think I have implemented it according to some of the suggestions by you and others.  The board is not a MicroZed.  It is from another company (sorry, I can't disclose who it is at this time, but I will do so as soon as I am allowed to), and the company does not have a board file for Vivado, which made it harder to use Vivado.

Thanks for all the recommended docs.  I read through a lot of them before I decided to seek advice here.  Please take a look at some of the questions I posted to avrumw.  If you have answers to them, I would appreciate seeing them.

And again, much respect and gratitude for taking your time for a stranger like me.  Really really appreciate it.

Regards,

Newbie

bkamen
Explorer
Explorer
512 Views
Registered: ‎07-17-2014

Where you said:

1. given my description in an earlier post, is the external DRAM considered "system memory"? I read the AXI Datamover doc and it mentioned the following

"Applications - The AXI DataMover provides high-speed data movement between system memory and an AXI4-Stream-based target. This core is intended to be a standalone core for a custom design."

and I don't understand what is meant by system memory. Embedded in the Zynq 7000 are RAMs, and perhaps the AXI Datamover doc is referring to those RAMs as "system memory".


I would have to see the document you're looking at (though the PDF I mention below also discusses, more generally, the DataMover IP and what it does). Based on all the ways I've used AXI IP to move data around, I'm going to say for now (unless one of the others would like to add to or subtract from what I'm saying) that "system memory" can be any memory on the FPGA (like block RAM or CLBs) or off-chip memory like the external RAMs (like the DDR RAM connected to the dedicated PS, or even other RAM connected to PL I/O with custom RTL or the MIG tool).


2. When a doc in Vivado mentions a "stream-based target", what do they mean by that?


"stream-based target"
means any code/device/RTL that's the receiver of an AXI-stream.

For this - you'll want to read (I know - more reading - but you'll have a better understanding than any oversimplified answer I would give you here)


UG761 - AXI Reference Guide -- And then look at Chp. 1 (on my PDF, it's PDF page 12) and read the descriptions that explain the difference between the AXI memory-mapped and stream protocols.

Let us know if that helps.

As for "whose SoM you're using" -- I was just wondering if you were rolling your own or using something off the shelf.

If it's off-the-shelf, then make sure to double check PCB tolerances like track lengths and things. (which you've done. So we've covered that.)

Cheers,

 -Ben

New2Zyqn_Vivado
Visitor
Visitor
504 Views
Registered: ‎05-25-2021

Hey Ben, 

It is off-the-shelf.  We may choose to roll our own in the future.  I have not read UG761.  I do need to read that.  The document where I read about system memory is straight from the Vivado design suite (PG022).  Thanks for all your input so far.  There is a lot of info for me to dive into and chew on.

There has been a great outpouring of guidance from a lot of people here, especially yourself.

Thanks so much.

Newbie

avrumw
Expert
Expert
392 Views
Registered: ‎01-23-2009

Below is where I got the 400MBps limitation. 

(Which appears to be from PG021.)

I am pretty sure you are misinterpreting that table. First, the table above it says that Fmax for AXI4 and AXI4-Stream is 150MHz (Artix-7 in a -1 speedgrade) to 280MHz (Virtex-7/Kintex-7 in a -3 speedgrade). Furthermore, since the AXI can be 64 bits wide for the PS/PL links, this means you have a raw throughput of 1.2GB/s to 2.24GB/s.

The table you showed (table 2-3) is showing a calculation of efficiency, not maximum throughput. The maximum throughput is transferring one word of data every single clock. The S2MM and MM2S have to do some other stuff, though (managing the pointers), and this takes additional clock cycles. This table is showing the effect of these extra operations on efficiency.

For this table they took an example design. In this case their 100MHz design had a throughput of 399.03MB/s, which is 99.76% of the theoretical maximum - meaning the theoretical maximum was 400MB/s. This corresponds to a 32-bit AXI4 running at 100MHz - neither of which (the width or the frequency) is as fast as the AXI can go.
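
The numbers in this explanation can be reproduced with a short Python sketch. The function names are hypothetical; the inputs are the figures quoted above (399.03MB/s at 99.76% efficiency, and 64-bit AXI at 150MHz to 280MHz).

```python
def theoretical_max_mb_s(measured_mb_s: float, efficiency_percent: float) -> float:
    """Back out the theoretical maximum from a measured rate and its efficiency."""
    return measured_mb_s / (efficiency_percent / 100.0)


def raw_axi_bytes_per_s(width_bits: int, clock_hz: int) -> int:
    """Raw AXI throughput: one full-width word transferred every clock cycle."""
    return (width_bits // 8) * clock_hz


print(f"table's theoretical max: {theoretical_max_mb_s(399.03, 99.76):.2f} MB/s")
print(f"32-bit @ 100 MHz: {raw_axi_bytes_per_s(32, 100_000_000) / 1e6:.0f} MB/s")
print(f"64-bit @ 150 MHz: {raw_axi_bytes_per_s(64, 150_000_000) / 1e9:.2f} GB/s")
print(f"64-bit @ 280 MHz: {raw_axi_bytes_per_s(64, 280_000_000) / 1e9:.2f} GB/s")
```

The first two results agree at about 400MB/s, which is consistent with the table describing a 32-bit, 100MHz example design rather than a ceiling for the interface.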

and I don't understand what is meant by system memory

The term "system memory" is used (intentionally) vaguely. In an AXI system you have a number of interconnected AXI components - masters, slaves, and a variety of interconnects and converters. There are different kinds of slaves that can act as generic read/write memory:

  • The PS DDRx-DRAM controller
  • A PL DDRx-DRAM controller generated by the MIG
  • The "High Bandwidth Memory (HBM)" controller on UltraScale+ devices with HBM
  • BlockRAMs
  • UltraRAMs

All of these can act as "system memory".

When a doc in Vivado mentions a "stream-based target", what do they mean by that?

They mean a slave (target is another word for a slave) that has an input that is an AXI4-Stream. Take a look at this Xilinx blog which gives an Introduction to AXI.

Avrum

New2Zyqn_Vivado
Visitor
Visitor
276 Views
Registered: ‎05-25-2021

Thanks Avrum. I'll re-read the 400MBps limitation. It is possible that I misunderstood it.  Your responses are appreciated!

Thanks.

Newbie
