Scholar ronnywebers
Scholar
21,722 Views
Registered: ‎10-10-2014

AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hello,

 

I designed a simple IP block containing a 32-bit counter that never stops counting and just wraps around. It produces an AXI4-Stream of count values, one value on each clock cycle. Later it will be replaced by data from an AD converter.

 

Upstream I connect it to an AXI4-Stream Data FIFO, and then to an AXI DMA.

 

I just want to stream the values to DDR memory to a circular buffer continuously, and then transfer the data over ethernet to a PC.

 

For the purpose of checking any missing count values, I just let the counter 'count' always.

 

I fixed TVALID to one, and TLAST to zero, as I want to generate a continuous stream of ADC data (like an oscilloscope)

 

Q : Is this 'allowed' for an AXI4 stream device?

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
31 Replies
Xilinx Employee
Xilinx Employee
21,704 Views
Registered: ‎08-02-2011

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hello,

 

You need to be careful about this. The DMA requires that you assert tlast occasionally or it will hang (it is discussed a lot in these forums... just search for it).

 

You might also find these designs helpful.

www.xilinx.com
Scholar ronnywebers
Scholar
21,690 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Thank you very much for these examples - wish I would have found these quicker - looks like these are not part of the Vivado examples ?

 

You're right about the DMA hanging, I got DMA errors during my tests (with TLAST fixed to '0')

 

I took a look at the example 'dma_ex_polled_v1_0'. My main goal at the moment is to build a counter that simulates an external parallel ADC that keeps on spitting out samples each clock. So there's no way to 'pause' the ADC; I need to get every sample into the DDR memory. I can see more people struggling with this seemingly easy application, and I didn't find a single example or tutorial on interfacing parallel ADCs in an easy way.

 

If I look at the counter in the file dma_ex_polled_v1_0_top.v (I'm only familiar with VHDL), is my understanding correct :

 

- the 32-bit counter is on hold as long as bit 0 of the 8-bit gpio port is low. In that case TDATA is just '0' and TVALID is '0'

- if gpio bit0 goes high, the counter starts incrementing, and tvalid goes high. TREADY does not influence the counter, so it's up to the upstream devices to keep up with the counter in order not to lose any count values. This would be the case for an external ADC too

- TLAST goes high on every multiple of 128 ? (so every 512 bytes) - not sure about this one, I'm not familiar with the Verilog constructs

- the 4 TKEEP signals (one for each byte) are fixed to '1'
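
The behaviour described in the list above can be sketched as a small cycle-by-cycle model. This is only an illustrative Python model, not the Verilog from the example: the function name `counter_source` and the `packet_words` parameter are made up here, TLAST is assumed to mark the last word of each packet, and the counter is modelled as pausing while TREADY is low (which the follow-up posts confirm the real design does).

```python
def counter_source(enable, tready, packet_words=128):
    """Cycle-by-cycle model of a counter-based AXI4-Stream source.

    enable and tready are per-cycle sequences of 0/1 values.
    Returns the (tdata, tlast, tkeep) beats that were actually transferred.
    """
    count = 0
    beats = []
    for en, rdy in zip(enable, tready):
        tvalid = 1 if en else 0
        if tvalid and rdy:
            # A beat is transferred only when TVALID and TREADY are both high.
            tlast = 1 if (count + 1) % packet_words == 0 else 0
            beats.append((count, tlast, 0xF))  # all 4 TKEEP bits fixed to '1'
            count = (count + 1) & 0xFFFFFFFF   # 32-bit wrap-around
        # Otherwise the counter simply holds its value.
    return beats


# Two idle cycles, then enabled; TREADY drops for two cycles in the middle.
enable = [0, 0] + [1] * 10
tready = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
beats = counter_source(enable, tready, packet_words=4)
for tdata, tlast, tkeep in beats:
    print(tdata, tlast, hex(tkeep))
```

With a pausable counter no values are lost while TREADY is low; a real ADC cannot pause, which is the problem the rest of the thread deals with.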

 

I'll try to recreate the project on my Zedboard

 

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Scholar ronnywebers
Scholar
21,683 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

I made an error: TREADY does hold the counter. However, what about a real-world ADC, if you cannot afford to lose any sample?
** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Xilinx Employee
Xilinx Employee
35,997 Views
Registered: ‎08-02-2011

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hello,

 

 

Thank you very much for these examples - wish I would have found these quicker - looks like these are not part of the Vivado examples ?

You're right; they are not part of the Vivado examples. I created them a while back to help with all the questions on the DMA :).

 

Your description of the design is correct. It does pause when tready goes low.

 

You can still use something very similar for your ADC if you are careful. Here are some specific things you'll need to consider:

1) Drive tlast in hardware. In the ADC case, it's somewhat arbitrary because there's not really a notion of a 'packet.' So you can drive it by a counter similar to my design. The sample on which you assert it should fall at or before the end of the transfer, i.e. the packet size should be less than or equal to the value you program into the DMA's 'Length' register. This value will affect your DMA throughput (see item 2).

2) DMA throughput will be affected by two things that you should be aware of. First is memory bandwidth. If you run out of available memory bandwidth, you will eventually see the DMA tready go low, so make sure the AXI4 side and memory controller are designed/configured appropriately to support enough bandwidth to avoid this. Second is your DMA transfer length. In simple mode, you need to write a minimum of 3 registers to kick off a new transfer. Tready will go low during this time. So the larger your transfer length, the less downtime you will have.

3) At startup, the DMA will deassert tready until it is kicked off by the software.
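
The trade-off in item 2 can be illustrated with a rough model. This is a sketch under stated assumptions, not a measurement: `setup_cycles` is a hypothetical placeholder for however many stream clock cycles tready stays low while software re-arms the DMA, and one beat per clock is assumed while the transfer runs.

```python
def stream_duty_cycle(transfer_bytes, bytes_per_beat=4, setup_cycles=100):
    """Fraction of clock cycles in which the DMA can accept data.

    Assumes one beat per clock during a transfer and a fixed number of
    stalled cycles (setup_cycles, a made-up figure) between transfers.
    """
    beats = transfer_bytes // bytes_per_beat
    return beats / (beats + setup_cycles)


for length in (512, 4096, 65536):
    print(length, round(stream_duty_cycle(length), 3))
```

The larger the transfer length, the closer the duty cycle gets to 1, but with any non-zero re-arm time it never reaches 1 in simple mode.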

 

So how to solve it if you cannot tolerate any tready deassertion? I can think of two solutions:

1) Use an ASYNC FIFO (AXI Stream Data FIFO IP would work well for this) to cross from your ADC's nominal clock to a faster clock domain used for the AXI Stream interface. This way, your ADC can continue to write to the FIFO when DMA deasserts tready during reconfiguration. Since the DMA is running faster than the ADC, it will eventually catch up and the FIFO shouldn't overflow if designed with proper depth and clock rates to support necessary bandwidth/downtime.

2) The AXI DMA has Cyclic BD mode which will eliminate all downtime for reconfiguration. The DMA will basically work in a circular buffer mode forever. This has the disadvantage, though, of requiring you to use scatter gather. This costs area (and possibly memory bandwidth, depending on where you store your BDs) and is more complex for your software initialization.
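
For solution 1, the "proper depth and clock rates" can be estimated with simple arithmetic. The sketch below is an assumption-laden back-of-the-envelope model: the function names are invented, the stall duration is a hypothetical figure you would have to measure (for example with an ILA), and the read side is assumed to move one word per clock once tready returns.

```python
def min_fifo_depth(adc_rate_hz, stall_ns):
    """Words the ADC produces while the DMA side is stalled (rounded up)."""
    return (adc_rate_hz * stall_ns + 10**9 - 1) // 10**9


def catch_up_time_s(backlog_words, adc_rate_hz, drain_rate_hz):
    """Time for the read side to empty a backlog while the ADC keeps writing."""
    if drain_rate_hz <= adc_rate_hz:
        raise ValueError("read side must be faster than the ADC to catch up")
    return backlog_words / (drain_rate_hz - adc_rate_hz)


# Example figures only: 10 MHz ADC, 100 MHz stream clock, 20 us stall.
depth = min_fifo_depth(10_000_000, 20_000)
print(depth)
print(catch_up_time_s(depth, 10_000_000, 100_000_000))
```

In practice you would add margin on top of the computed depth and round up to a depth the FIFO IP actually offers.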

 

www.xilinx.com
Scholar ronnywebers
Scholar
21,590 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

I managed to rebuild the entire 'dma_ex_interrupt_v1_0' example in Vivado 2014.4. First I tried using the Tcl script, but that resulted in too many strange errors and behaviour, IP upgrades, ... so I decided to start with a clean project and build the block diagram from scratch, which did the job!

 

Then I imported the C code and needed to make some minor corrections to the xparameters.h references, after which everything works fine on my Zedboard, so thank you for this great example!

 

Question regarding the firmware :

 

(if I am correct) 

 

step 1 :

the application first performs an S2MM DMA transfer (device to DMA): 128 32-bit counter values (generated by the counter in the top-level Verilog file). So the DMA transfer size = 128 x 4 = 512 bytes. This data ends up in DDR memory (cached).

 

step 2 :

then the DMA is kicked off to transfer this data from the DDR memory to the axi stream fifo (MM2S transfer, DMA to device)

 

step 3 : 

then we wait until both irq's have set their done flags

 

step 4 :

read back the data from the fifo through the AXI interface -> so if I understand it correctly the fifo not only has a stream in and stream out port, but the data can also be read through the AXI interface? 

 

step 5 : 

this data read back from the fifo is compared to the data in DDR memory (which originated from the 32-bit counter at the top level).

 

-> if the above is correct: then why does the application not have to wait between the S2MM and MM2S transfers? I would assume that the MM2S transfer could only be launched once the S2MM transfer has completed?

 

 

 

 

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Xilinx Employee
Xilinx Employee
21,580 Views
Registered: ‎08-02-2011

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Great! I'm glad it's working well for you now.

 

step 4 :
read back the data from the fifo through the AXI interface -> so if I understand it correctly the fifo not only has a stream in and stream out port, but the data can also be read through the AXI interface? 

Right, it has a stream interface on one side and an AXI interface on the other so you can write the data (from the DMA) using the stream interface and then read it back with the processor using the AXI interface.

 

why does the application not have to wait between the S2MM and MM2S transfers? I would assume that the MM2S transfer could only be launched once the S2MM transfer has completed?

Hmm I think you're right; it should probably wait for S2MM to complete before kicking off MM2S. It still works in this case because the S2MM is kicked off first so it gets a first few samples before the MM2S gets going. So this way, the MM2S is always behind S2MM and thus is transferring the right data. It's not a very robust way to do it, though...
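
A more robust ordering would wait for S2MM to finish before starting MM2S. The sketch below only illustrates that control flow with a fake status source; it is not the example's actual C code. The `IDLE_BIT` position (bit 1 of the status register) is how I recall the AXI DMA register map and should be checked against the product guide, and `wait_until_idle` is a made-up helper.

```python
IDLE_BIT = 1 << 1  # assumed position of the Idle flag in the S2MM status register


def wait_until_idle(read_status, max_polls=1000):
    """Poll a status-reading callable until the Idle bit is set.

    Returns the number of polls needed, or raises TimeoutError.
    """
    for polls in range(1, max_polls + 1):
        if read_status() & IDLE_BIT:
            return polls
    raise TimeoutError("S2MM did not go idle")


# Fake hardware: the channel reports busy twice, then idle.
fake_status = iter([0x0, 0x0, 0x2])
order = ["start S2MM"]
polls = wait_until_idle(lambda: next(fake_status))
order.append("start MM2S")  # only reached after S2MM has completed
print(polls, order)
```

In the interrupt-driven example, waiting on the S2MM 'done' flag set by its interrupt handler would play the same role as the polling loop here.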

www.xilinx.com
Scholar ronnywebers
Scholar
21,569 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

ok thanks! All clear now!
** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Scholar ronnywebers
Scholar
21,513 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hello,

 

I'm still extending the example - I can now launch multiple DMA transfers in a loop. I did these further modifications :

 

* add an AXIS_DATA_FIFO (which I later need to give 2 async clock domains, one of which will be my external board oscillator)

* created an 'enable_data_source' and 'disable_data_source' (instead of the 'reset_data_source'), so I can start & stop the data source

 

Question: when I stop the data source, there are still some data words in the pipeline: 1024 in the axis_data_fifo, and about 5 somewhere in pipeline regs (I can see this on an ILA)

 

Now how can I completely flush this pipeline, up to the last word, before I re-enable the data source? I tried to perform further DMA's with size 512, but when there's no longer exactly 512 bytes left in the pipeline, the dma never ends... 

 

I was thinking of adding some timeout detection, but that looks so inefficient ...?

 

I can see the axis_data_fifo has a data_count, wr_data_count and rd_data_count output, but how can I read these out?

 

Or is there another way to 'flush' all data in the pipeline? Seems like the axis_data_fifo has no AXI4 reg interface to further control it?

 

 

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Scholar ronnywebers
Scholar
19,858 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hello @bwiec

could you please explain : 'Second is your DMA transfer length. In simple mode, you need to write a minimum of 3 registers to kick off a new transfer'

-> do you mean that any s2mm IP would need to send a minimum of 3 words to a dma before the dma starts transferring the data upstream?
** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Xilinx Employee
Xilinx Employee
15,114 Views
Registered: ‎08-02-2011

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hi,

I'm a little confused by the terminology. There are 2 different interfaces we're talking about here:
s_axis_s2mm (input data) and s_axi_lite (control).

You need to write 3 registers (via s_axi_lite control interface) before the DMA will send data from s_axis_s2mm to memory. The only qualifier for when you can send data to the s_axis_s2mm interface is its tready signal.
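
To make the distinction concrete, here is a sketch of the control-side sequence against a fake register bus. The thread does not say which three registers are meant; this assumes control, destination address and length. The offsets are as I recall them from the AXI DMA product guide (PG021) and should be verified for your IP version, and `FakeAxiLite` and `start_s2mm_simple` are invented for illustration.

```python
# Assumed S2MM register offsets in direct register (simple) mode.
S2MM_DMACR = 0x30   # control register, bit 0 = Run/Stop
S2MM_DA = 0x48      # destination address
S2MM_LENGTH = 0x58  # transfer length in bytes


class FakeAxiLite:
    """Stand-in for the s_axi_lite control interface; just records writes."""

    def __init__(self):
        self.regs = {}
        self.log = []

    def write(self, offset, value):
        self.regs[offset] = value
        self.log.append((offset, value))


def start_s2mm_simple(bus, dest_addr, length_bytes):
    bus.write(S2MM_DMACR, 0x1)            # set Run/Stop
    bus.write(S2MM_DA, dest_addr)         # where the data should land
    bus.write(S2MM_LENGTH, length_bytes)  # written last; this kicks off the transfer


bus = FakeAxiLite()
start_s2mm_simple(bus, 0x10000000, 512)
for offset, value in bus.log:
    print(hex(offset), hex(value))
```

None of these writes touch s_axis_s2mm; on that side the only thing the data source sees is tready going high once the transfer has been armed.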

www.xilinx.com
0 Kudos
Scholar ronnywebers
Scholar
15,105 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hi @bwiec, ok I misunderstood your explanation - it's clear now what you meant with writing 3 registers to kick off the DMA transfer.

 

I'll rephrase my question if I may :

 

* is it possible to just transfer a single 32-bit word using S2MM DMA? 

* in that case, if my DMA sizes on the AXIS channel are always 4 bytes (1 32-bit word), could I just fix TLAST to '1' in my custom IP that sends data upstream? I know it's not the most efficient way of transferring a single 32-bit word, but it would do the job in my particular case.

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Xilinx Employee
Xilinx Employee
15,097 Views
Registered: ‎08-02-2011

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hey Ronny,

 

Oh okay, sorry I misunderstood

 

Yes, it's possible to transfer a single 32-bit word using DMA, assuming its TDATA is 32 bits wide.

 

In this case, yeah, tying tlast to 1 should do the trick, I think.

www.xilinx.com
Scholar ronnywebers
Scholar
15,035 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

still continuing on this one ... again I find the TLAST use annoying in this application :

 

I have a downstream device that needs to read 32 words from a fifo, whenever there are 32 words in the fifo - it reads them in a single burst at axi_clk speed, so the 32 words should be there completely. I planned to do this using a fifo with a flag output that indicates >= 32 words present (this triggers an interrupt in my downstream device). This should be fine for MM2S, as the TLAST will be generated by the DMA controller when it writes the 32nd word.

 

however upstream, I would like to avoid a modification to the IP, and use an interrupt from the fifo, whenever >= 32 words are available in the upstream fifo (S2MM). Upon this interrupt, the S2MM DMA should read 32 words. 

 

Is this possible with the current AXI stream fifo's and dma IP's? So it would mean no TLAST generation by the IP towards the upstream axi-stream fifo, just a continuous flow of data

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Xilinx Employee
Xilinx Employee
15,028 Views
Registered: ‎08-02-2011

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

I'm not sure I understand the problem. Bottom line is you have to assert tlast on at least the 32nd sample from FIFO to S2MM, or the DMA will hang.

Have a look at this design:
http://www.xilinx.com/support/answers/57561.html

In the lib folder, there's an IP called tlast_gen (it's just a little counter and a comparator, nothing significant in area). It takes a signal called pkt_length and will generate tlast for you based on that. Put it between your FIFO and S2MM and then write it to a value of 32 from a GPIO. If you ever need to transfer any other packet size, all you need to do is change a bit of software to re-write that GPIO with the new value.

Even in use case where you just want to stream data with no notion of packets, just pick some arbitrary number. 256. 512. whatever. In any case, it should work, so long as the value written to tlast_gen matches the 'length' register you write to the dma.
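
The "little counter and a comparator" idea can be modelled in a few lines. This is a behavioural sketch only, not the tlast_gen IP's source: it assumes pkt_length counts stream words and that TDATA is 32 bits wide, and `dma_length_bytes` is a hypothetical helper showing the matching value for the DMA length register.

```python
def tlast_gen(words, pkt_length):
    """Yield (word, tlast) pairs, asserting tlast on every pkt_length-th word."""
    if pkt_length < 1:
        raise ValueError("pkt_length must be at least 1")
    count = 0
    for word in words:
        count += 1
        tlast = 1 if count == pkt_length else 0  # comparator
        if tlast:
            count = 0                            # restart the counter per packet
        yield word, tlast


def dma_length_bytes(pkt_length, bytes_per_word=4):
    """Byte count to program into the DMA so that it matches pkt_length words."""
    return pkt_length * bytes_per_word


beats = list(tlast_gen(range(96), pkt_length=32))
tlast_positions = [i for i, (_, tlast) in enumerate(beats) if tlast]
print(tlast_positions)        # [31, 63, 95]
print(dma_length_bytes(32))   # 128
```

Changing the packet size then only means rewriting pkt_length (via the GPIO) and the DMA length together so the two stay consistent.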

www.xilinx.com
Contributor
Contributor
6,412 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

@ronnywebers

Hi Ronny,

I have a task to receive and store data from an ADC to a ZCU102 DRAM. It is quite similar to the task you were dealing with.

So I have decided to use your approach, that is, to create a prototype design using a simple counter which generates the data and sends it through the AXI4-Stream interface to the DRAM. I have used an AXI FIFO as well as the AXI DMA IP. In addition, I have created a simple counter through Vivado HLS. Unfortunately, my design is not working, meaning the counter is not generating the data. So, since I am a newbie in this field, I was wondering if you could share your latest HW design so that I can adapt it to the ZCU102.

Any help is greatly appreciated. Thank you in advance. 

0 Kudos
Scholar ronnywebers
Scholar
6,408 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

hello @amatsaka,

 

I don't have a HLS solution for you, but I've attached my VHDL code. It's actually the full code of a custom AXI4-Stream master IP. (I started with the generated template, and then modified it). You can just package it and include it in your block diagram. 

 

the 2 control inputs come from an AXI GPIO IP :

 

ENABLE_STREAM : in std_logic; -- if '1' then enable stream
ENABLE_ADC : in std_logic; -- if '1' then ADC data comes through, otherwise internal counter comes through

 

Using this IP you can select whether you want ADC data or counter values. Note that in my design I used a dual-channel, 14-bit ADC, but you can easily replace that with your ADC I guess.

 

I recommend to put an ILA on a few positions in your block design, it will help you understand the small but important details of AXI4-Stream, especially the TLAST signal (as discussed in this thread).

 

When you switch to ADC data, start with for example a sawtooth or triangle wave, and display the sampled data  on your pc. You'll quickly see if there are gaps / issues with your sampling.

 

I'm not familiar with HLS, and don't know if it's easy to implement such 'bit stuffing' things in HLS, but I'm curious if it can be done easily. So if someday you implement this block in HLS, feel free to post it here!

** kudo if the answer was helpful. Accept as solution if your question is answered **
Contributor
Contributor
6,377 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

@ronnywebers

Hi Ronny,

while I was packaging the IP I got the warning

[IP_Flow 19-3153] Bus Interface 'm00_axis_aclk': ASSOCIATED_BUSIF bus parameter is missing.

I think you were facing the same warning back then. Is it safe to ignore it?

 

Thanx

 

 

0 Kudos
Scholar ronnywebers
Scholar
6,370 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

I remember something like that. Since it's an AXI interface, it's better to fix this I guess (don't think it will harm if you don't, but I don't like this kind of warnings :-). It was one of the first things I wrote so, don't remember the exact details :-)

 

Check my answer on this forum post for an explanation of the issue .

 

In IP packager, you need to look under the packaging step 'Ports and interfaces', and then right-click on the signal and make the correct associations. I believe it is right-click -> add bus interface -> then find the interface definition for AXI Stream.

** kudo if the answer was helpful. Accept as solution if your question is answered **
0 Kudos
Contributor
Contributor
5,858 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hi @ronnywebers,

I have moved a bit further with the design. So now your ADC_to_AXIS IP is connected via an AXIS Data FIFO to the AXI DMA (see the attached block design, Block_design.png). Then I have created a bare-metal application which transfers the counter values to the DDR. I am using the simple mode transfer and only the write channel (i.e. PL to PS) (see attached source code).

I have set the ADC_to_AXIS IP to generate the TLAST every 32 words (NUMBER_OF_OUTPUT_WORDS : integer := 32; ) So I am trying to read packets that are >= 32 bytes. I only managed one transfer. Then the status register of the DMA indicates halted.

Next step was to add ILA cores to debug the HW.

My question is: how can I program the PS in order to generate the correct triggers for the TLAST and TREADY signals? Up to now I have:

  • used the Program Device command from Vivado, and I get the error that no debug core was detected.
  • executed the default Hello World program from SDK, which has no effect because it does not initiate the DMA.
  • tried to initialize the DMA manually from xsdb. So I manually ran (after sourcing the generated psu_init.tcl) psu_init and psu_ps_pl_reset_config from xsdb, but I get the following error:
    Memory read error at 0xFF5E0020. EDITR overrun

    Should I execute the bare-metal application I have created in order to program the PS to generate the clock signals? Is this the right way to generate the triggers for the hardware debug? Or is there a standard way to program the PS in order to debug the hardware?

 

Furthermore should the ADC_to_AXIS be enabled after the DMA is initialized?  Which means in the above design the ENABLE_STREAM signal should be driven by a GPIO controlled by the PS and not always asserted.

 

Thanks once more in advance

 

 

0 Kudos
Scholar ronnywebers
Scholar
5,139 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

@amatsaka, a lot of stuff to tackle :-) I'll sum up some stuff here, I'd suggest you go through them one at a time, it may look like a few steps back, but you won't regret it afterwards

 

1) axi4 stream

I'd start by downloading the AXI4-Stream protocol spec from ARM, and read through it. It's only 42 pages, but it's important you understand the important signals. Not all of them are used by Xilinx. FYI, the Xilinx AXI reference guide is UG1037. I never read it completely, but it might be useful to look things up. You could also watch this great tutorial from Mohammad Sadri, something that Xilinx should take an example of. (Just watch all the tutorials from Sadri, it will be your best investment ever! He uses an older Vivado version, but you should be able to follow most of them (Petalinux has changed a bit though). Xilinx should give this guy a medal!)

 

One important thing to know is that the ARM spec says that TLAST is not obligatory. But for the Xilinx DMA it is (see all the previous answers in this forum thread): if you don't assert TLAST at 'regular intervals' (e.g. a packet size of 32, but it doesn't need to be fixed in fact), then the DMA controllers won't work (they just remain pending).

 

It's important to understand 'when' a dataword is effectively transferred on AXI Streams : it's only when TREADY and TVALID are both high. Only then. And on top of that, TLAST indicates a 'packet end'.

 

So to answer one of your questions: TVALID, TLAST & TREADY create a 'throttled' stream of data. TVALID & TLAST are generated by your data source (from downstream IP to upstream IP, i.e. from ADC IP to AXI-S fifo, and from AXI-S fifo to DMA). TREADY is generated from upstream IP to downstream IP (i.e. from DMA to AXI-S fifo, and from AXI-S fifo to ADC). It is there to 'throttle' data transfers. TLAST cuts your stream into pieces, so that the DMA knows when a packet ends.

 

2) enabling/controlling the stream

 

Furthermore should the ADC_to_AXIS be enabled after the DMA is initialized?  Which means in the above design the ENABLE_STREAM signal should be driven by a GPIO controlled by the PS and not always asserted.

 

yes I'd recommend to do that: first you set up your DMA and start it (it will then be pending/waiting for data), and only after that you enable your data source (using AXI GPIO is a possible solution, you could also use an EMIO pin, or even an AXI4-Lite register, but keep that for later :-). That's the easiest way of working. If you enable your source first, you'll probably have a fifo overrun long before you have enabled the DMA, and that could complicate stuff. But in some applications you might need to start - stop - clear - start again your data stream. In that case you also need to be able to 'flush' your data path. But also, keep that for later :-) For now, just enable your stream only after you have initialized the DMA.
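
Why the order matters can be seen in a toy simulation. This is a deliberately simplified sketch: it assumes a source that cannot pause, one word per cycle on both sides of the FIFO, a single clock, and made-up cycle numbers; `dropped_words` is a hypothetical helper, not part of any Xilinx driver.

```python
def dropped_words(fifo_depth, source_start, dma_start, cycles):
    """Count words lost when a non-pausable source feeds a FIFO drained by a DMA.

    The source pushes one word per cycle from source_start onwards;
    the DMA pops one word per cycle from dma_start onwards.
    """
    level = 0
    dropped = 0
    for cycle in range(cycles):
        if cycle >= dma_start and level > 0:
            level -= 1            # DMA accepts a word
        if cycle >= source_start:
            if level < fifo_depth:
                level += 1        # word stored in the FIFO
            else:
                dropped += 1      # FIFO full: sample is lost
    return dropped


# Source enabled first, DMA started 1000 cycles later: overrun.
print(dropped_words(256, source_start=0, dma_start=1000, cycles=2000))
# DMA armed first, source enabled afterwards: nothing lost.
print(dropped_words(256, source_start=1000, dma_start=0, cycles=2000))
```

With a source that honours TREADY nothing would be dropped, but the FIFO would still sit full of stale data from before the DMA was armed.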

 

3) ila

getting the ILA's to work has always been a bit of an adventure.... so the message you get : 'no debug core was detected' is a typical one. It means that you tried to setup the ILA, but it didn't end up in the bitstream for some reason. Try searching the forum for this message, you'll find many posts. Check UG936 (tutorial) and UG908, it's worth walking through the tutorial.

 

about your triggering : you could trigger for example when you set your AXI GPIO pin high, and enable the stream. you'll need to 'cascade' your ILA's in case you want to trigger them simultaneously (use the trigger in/out, ... all explained in the UG's). It's also interesting to add an ILA after your DMA, so you can see how it transfers data to memory (or even how you configure it). Then you can see the complete chain.

 

Triggering the ILA is a matter of setting the right signals & levels, you can create very complex triggering. Check the UGs for how to do this.

 

4) dma handling / firmware

check for example this AR, you'll learn a lot from it. Also this page gives an overview of possible DMA modes & firmware for each mode. Usually one starts in polled mode, then when that works, you know that your PL works ok, you can move to interrupt mode. Also, an easy tutorial is this one from fpgadeveloper.com, maybe that is the best place to start.

 

a few tips :

* make sure your destination buffer is larger than the max packet size you expect

* don't be fooled by cache memory if you're dumping packets to the serial port for example - turning off cache in the beginning is a good idea. If you leave cache on, make sure to properly 'flush' & 'invalidate' cache where needed. But just turn it off in the beginning!

* if your DMA stops after a single transfer, you might have a TLAST issue, or some firmware issue.

 

use google a lot to search for things like 'vivado tutorial axi dma', or 'vivado tutorial  custom IP', and try the same on youtube. Plenty of stuff out there, some very good, some useless. But most of the time more at beginner level than the Xilinx tutorials, which are usually not going into much detail, they are more like 'reference tutorials'. 

 

sorry for pointing you to many tutorials, but it's worth the investment, you'll gain that time back later, by doing it right from the first time.

 

 

** kudo's are the only salary that forum volunteers get from Xilinx, don't forget to throw some from time to time :-) **

 

 

** kudo if the answer was helpful. Accept as solution if your question is answered **
Contributor
Contributor
4,937 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Hi  @ronnywebers

Firstly let me thank you for your detailed response. Really appreciated!!

 

I had already gone through the tutorials you've suggested (Sadri and Fpgadeveloper), and it is obvious that my design is based on them.

 

I've managed to use the ILAs in the cascade mode along the pipeline which is very handy (see attached design). In order to initiate triggers I just pause the code execution (I execute the firmware code from the debugger). So now I am able to see what is happening along the pipeline.

 

I've also modified the code to (see attached source code):

  • disable the caches right after the init_platform()
  • initiate the first DMA transfer before enabling the ADC_To_AXIS IP.  Here I would like to point out that I am not using interrupts. Also, for the transfer the receive buffer is 64 bytes, which follows the rule >= expected packet (32 words are generated by the ADC_to_AXIS IP)

Still I cannot initiate the second transfer because the DMA raises the halted bit in its Status register right after the first transfer. As I understood the transfer consists of two actions provided the interrupts are not enabled:

  • Write the destination addr to the S2MM_DA register (0xA000000 in my case)
  • write the length in bytes in the S2MM_LENGTH register (64 in my case)
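
One thing that may be worth ruling out is a units mismatch between words and bytes. The quick check below assumes 32-bit TDATA (consistent with the 4-byte beats mentioned just below); `packet_bytes` is a made-up helper. Earlier advice in this thread is that TLAST should arrive at or before the programmed length, so the packet size in bytes should not exceed the value written to S2MM_LENGTH.

```python
def packet_bytes(words_per_packet, tdata_bits=32):
    """Size of one TLAST-delimited packet in bytes."""
    return words_per_packet * tdata_bits // 8


packet = packet_bytes(32)   # TLAST every 32 words
length_register = 64        # value written to S2MM_LENGTH, in bytes
print(packet, length_register, packet <= length_register)   # 128 64 False
```

Under these assumptions the packet (128 bytes) is larger than the programmed length (64 bytes); whether that explains the halt would need to be confirmed from the error bits in the DMA status register.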

So as you can see in the FIFO_Output_ila.png picture, TVALID is asserted for each packet (4 bytes) that is sent to the DMA. The DMA deasserts TREADY after 20 packets, which I presume is when it gets halted. What worries me is that the TLAST from the FIFO is not set (I think this is because the FIFO is not set to packet mode).

 

The ADC_to_AXIS IP works OK (see the ADC_Output_ila.png picture). It asserts TLAST after each packet (32 words) is sent, and TVALID is asserted after it becomes enabled. Here TREADY is deasserted by the FIFO after 279 words. I presume it is because it gets full. But when the FIFO depth is set to 256, I would expect it to become full after 260 words. Right?

 

What is your suggestion on the above?

 

As a next step I will remove the FIFO. The only reason that I have put it there is because Sadri suggested it when we have two clock domains, which I have in my case. The ADC clock (external) is 10 MHz, which is much slower than the default clocks provided by the PS (100 MHz).

 

 

 

0 Kudos
Contributor
Contributor
5,072 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

Hi @ronnywebers

Firstly let me thank you for your detailed response. Really appreciated!!
I had already gone through the tutorials you suggested (Sadri and Fpgadeveloper), and it is obvious that my design is based on them.
I've managed to use the ILAs in the cascade mode along the pipeline which is very handy (see attached design). In order to initiate triggers I just pause the code execution (I execute the firmware code from the debugger). So now I am able to see what is happening along the pipeline.
I've also modified the code to (see attached source code):

  • disable the caches right after the init_platform()
  • initiate the first DMA transfer before enabling the ADC_To_AXIS IP. Note that I am not using interrupts. For the transfer, the receive buffer is 64 bytes, which follows the rule >= expected packet (32 words are generated by the ADC_to_AXIS IP)

I still cannot initiate the second transfer because the DMA sets the Halted bit in its status register right after the first transfer. I cannot even set the RS bit of the DMACR register again after the first transfer. As I understand it, a transfer consists of two actions, provided interrupts are not enabled:

  • Write the destination address to the S2MM_DA register (0xA000000 in my case)
  • Write the length in bytes to the S2MM_LENGTH register (64 in my case)
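
For reference, the two-step sequence above can be sketched in C against a mock register array, so it also runs on a host PC. The register offsets are the direct register mode offsets as I understand them from PG021; the mock array and the helper names (`reg_write`, `reg_read`, `s2mm_start`) are hypothetical and are not the Xilinx driver API.

```c
#include <stdint.h>

/* S2MM register offsets in direct register (simple) mode, per PG021. */
#define S2MM_DMACR   0x30u
#define S2MM_DMASR   0x34u
#define S2MM_DA      0x48u
#define S2MM_LENGTH  0x58u
#define DMACR_RS     0x1u   /* Run/Stop bit */

/* Hypothetical host-side stand-in for the DMA register space. */
static uint32_t mock_regs[0x60 / 4];

void reg_write(uint32_t off, uint32_t val) { mock_regs[off / 4] = val; }
uint32_t reg_read(uint32_t off) { return mock_regs[off / 4]; }

/* Arm one S2MM transfer: set RS, write the destination address,
 * and write LENGTH last, since the LENGTH write starts the transfer. */
void s2mm_start(uint32_t dest_addr, uint32_t len_bytes)
{
    reg_write(S2MM_DMACR, reg_read(S2MM_DMACR) | DMACR_RS);
    reg_write(S2MM_DA, dest_addr);
    reg_write(S2MM_LENGTH, len_bytes);
}
```

On real hardware `reg_write`/`reg_read` would be replaced by accesses to the DMA base address; the point of the sketch is only the ordering, with `len_bytes` given in bytes.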

As you can see in the FIFO_Output_ila.png picture, TVALID is asserted for each packet (4 bytes) that is sent to the DMA. The DMA deasserts TREADY after 20 packets, which I presume is when it gets halted. What worries me is that TLAST from the FIFO is not set (I think that is because the FIFO is not set to packet mode).


The ADC_to_AXIS IP works OK (see the ADC_Output_ila.png picture). It asserts TLAST after each packet (32 words) is sent, and TVALID is asserted once the IP is enabled. Here TREADY is deasserted by the FIFO after 279 words. I presume that is because it is full. But with the FIFO depth set to 256, I would expect it to become full after about 260 words, right?


What is your suggestion on the above?

As a next step I will remove the FIFO. The only reason I added it is that Sadri suggested one when there are two clock domains, which is my case. The ADC clock (external) is 10 MHz, which is much slower than the default clocks provided by the PS (100 MHz).

 

0 Kudos
Scholar ronnywebers
Scholar
5,064 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

 

Also for the transfer the receive buffer is 64 bytes which follows the rule >= expected packet (32 words are generated by the ADC_to_AXIS IP)

 

1) just a quick thought: should your rx buffer not be 32 words x 4 bytes = 128 bytes in size if you transmit 32 (0x20) words? Also, make sure your rx buffer is always >= the max packet size. Btw, you wrote that it 'halts' after 20 words; is that 20 or 0x20? (0x20 = 32 decimal)

 

2) did you check the contents of S2MM_DMASR? Especially whether bit 4 (DMAIntErr) is set. Check PG021 for the complete description of this bit/register.

 

note : S2MM_DMACR bit 0 (RS) -> This bit is cleared by AXI DMA hardware when an error occurs.

 

-> if there is an error, DMA stops, you have to clear the error / fix the cause.

 

3) also check the S2MM_DA description -> you might need proper data alignment of your destination buffer in memory (depending on how you configured the DMA IP)

 

4) your FIFO should indeed 'propagate' TLAST, check its configuration. Not sure if 'enable packet mode' helps (not sure exactly what it means, check the user guide), but it sounds like it. Also check that TLAST is enabled.

 

5) if you have 2 asynchronous clock domains, you MUST use a FIFO to do proper clock domain crossing. If you don't, your system will eventually go wrong (read about metastability & clock domain crossing on the forum/web). Even if the clocks are mesochronous or even fully synchronous, it's good to have a FIFO, just for elasticity / efficiency in DMA transfers. There are plenty of FIFOs in your device.

 

6) the fact that it's not 256 but something like 270 is because of the in/out pipelining in each IP: every IP adds some extra pipeline registers. Not sure how many, but it sounds realistic.
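
Points 1) and 2) can be sketched in C as below. The bit positions are the S2MM_DMASR ones as I read them in PG021; the helper names are made up for illustration, and the status word would come from your own register read.

```c
#include <stdint.h>
#include <stdbool.h>

/* S2MM_DMASR bit positions, per PG021. */
#define DMASR_HALTED     (1u << 0)
#define DMASR_IDLE       (1u << 1)
#define DMASR_DMAINTERR  (1u << 4)
#define DMASR_DMASLVERR  (1u << 5)
#define DMASR_DMADECERR  (1u << 6)
#define DMASR_ERR_MASK   (DMASR_DMAINTERR | DMASR_DMASLVERR | DMASR_DMADECERR)

/* True if any of the three DMA error bits is set in the status word. */
bool dmasr_has_error(uint32_t dmasr) { return (dmasr & DMASR_ERR_MASK) != 0u; }

/* True if the channel reports Halted. */
bool dmasr_is_halted(uint32_t dmasr) { return (dmasr & DMASR_HALTED) != 0u; }

/* Bytes needed for a packet of n_words stream beats of data_width_bits each. */
uint32_t packet_bytes(uint32_t n_words, uint32_t data_width_bits)
{
    return n_words * (data_width_bits / 8u);
}
```

For a 32-word packet on a 32-bit stream, `packet_bytes(32, 32)` gives 128, which is the minimum rx buffer / LENGTH value for that packet.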

 

let me know if this helps

 

 

 

 

0 Kudos
Contributor
Contributor
5,044 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

hi @ronnywebers

Yes, you are absolutely correct! After setting the packet length to 128 bytes, I managed to perform multiple transfers without the DMA halting.

So everything is "ok". But.......

  • Checking the transferred data, I observed that samples are missing (counter values in this case). On the 10th transfer there were 21133 samples missing, on the 11th transfer there were 21418 samples missing, and so on.
  • Also, in the HW debug the ILAs showed strange behavior. The DMA output ILA showed that only one transfer was performed (see DMA_Output_ila.png). Same for the FIFO output, i.e. TREADY was set to 0 during the 2nd transfer, which makes sense if the DMA stopped transferring data (see FIFO_Output_ila.png)

It does not make any sense!! Do you have any suggestion?

 

Just to answer to your other points you mentioned:

  • I have not enabled any interrupts and after the correction the err bit in S2MM_DMASR is not set
  • All my data are aligned. The DMA has Address width 32 bits and max burst size 16
  • The FIFO indeed propagates the TLAST as is shown in the attached pic

Thanks once again!!!

 

DMA_Output_ila.png
FIFO_Output_ila.png
0 Kudos
Scholar ronnywebers
Scholar
5,032 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

quick question - I'll check the rest of your post later :

 

how large is your FIFO? Can you try doubling it or making it way larger, and see if you are missing fewer samples? If you're polling the DMA, you might miss some samples and have gaps (?). Or are you 'stopping' your counter when the FIFO is full? (I would recommend not doing this, so you can see whether your system is fast enough not to miss any samples. You can't 'stop' an ADC either, or tell the input signal to wait a bit if you have a DMA hiccup :-)

 

0 Kudos
Scholar ronnywebers
Scholar
5,025 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

also, you should experiment with making your DMA transfer sizes (and hence packet sizes / TLAST) larger: you have a certain software overhead in polling & handling DMA transfers, so very small DMA transfers cost a lot more overhead than larger ones. You have to account for all of this: packet size / transfer size / software response time / ...

 

Also, if you are dumping to the serial port while data is filling up your FIFO, you might be missing samples (FIFO overruns). You can try to trigger on such an event with your ILA: ADC data has TVALID asserted while your AXI4-Stream FIFO input has TREADY de-asserted. This should never occur, otherwise your sample will be 'gone'.
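
To illustrate the overrun condition, here is a toy C model (not the behavior of the actual AXI4-Stream Data FIFO IP): the source never waits, so any push into a full FIFO counts as a lost sample. The function names and the numbers used with it are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t depth;    /* FIFO capacity in samples */
    uint32_t level;    /* current occupancy */
    uint32_t dropped;  /* samples lost: TVALID high while TREADY low */
} fifo_model_t;

/* One producer beat: the source cannot stall, so a full FIFO loses the sample. */
void fifo_push(fifo_model_t *f)
{
    if (f->level < f->depth) {
        f->level++;
    } else {
        f->dropped++;
    }
}

/* DMA side removes up to n samples. */
void fifo_drain(fifo_model_t *f, uint32_t n)
{
    f->level = (n > f->level) ? 0u : f->level - n;
}

/* Each period: `burst` samples arrive while the DMA is not accepting data
 * (software busy), then the DMA drains `drained` samples. Returns the
 * total number of dropped samples over all periods. */
uint32_t simulate_drops(uint32_t depth, uint32_t burst,
                        uint32_t drained, uint32_t periods)
{
    fifo_model_t f = { depth, 0u, 0u };
    for (uint32_t p = 0; p < periods; p++) {
        for (uint32_t i = 0; i < burst; i++) {
            fifo_push(&f);
        }
        fifo_drain(&f, drained);
    }
    return f.dropped;
}
```

With made-up numbers, a software stall worth 600 samples per period loses data with a depth of 256 but not with a depth of 1024, which is the kind of effect a deeper FIFO is meant to absorb.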

0 Kudos
Contributor
Contributor
4,975 Views
Registered: ‎06-06-2017

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

Hi @ronnywebers

I am replying to your last two posts.

 

The FIFO initially had a depth of 256. I've increased it to 1024 and now all the samples are transferred! I have no intention of stopping the ADC IP when the FIFO is full. Our external ADC will generate bursts that last 12.8 us with a 37.2 us interval between bursts, so there is time to empty the FIFO.
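
A quick sanity check on these burst numbers, as a small C sketch. It assumes one sample per 10 MHz ADC clock (100 ns period) during a burst, which is an assumption and not something measured here; the function names are only for illustration.

```c
#include <stdint.h>

/* Samples produced in one burst, assuming one sample per ADC clock period. */
uint32_t samples_per_burst(uint32_t burst_ns, uint32_t adc_clk_period_ns)
{
    return burst_ns / adc_clk_period_ns;
}

/* Average sample rate over one burst-plus-gap cycle, in samples per second. */
uint64_t avg_samples_per_sec(uint32_t samples, uint32_t burst_ns, uint32_t gap_ns)
{
    return (uint64_t)samples * 1000000000ull / ((uint64_t)burst_ns + gap_ns);
}
```

Under that assumption a 12.8 us burst is 128 samples, and averaged over the 50 us cycle the DMA has to sustain 2.56 MS/s, well below the peak 10 MS/s the FIFO has to absorb during the burst.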

 

So now the packet size specified during the DMA transfer is equal to the burst size (generated by the "sample generator") and it works smoothly.

 

What worries me though is the picture I get from the HW debug. The ILA connected to the DMA output indicates only the 1st transfer took place (see attached DMA_Output_ila). In addition the ILA connected to the FIFO output shows that only the 1st transfer is completed (see attached FIFO_Output_ila) and then TREADY is deasserted by the DMA. 

I would expect the subsequent transfers are captured as well.

Could this mean that the subsequent transfers happen later in time? The ILAs have a 4096-sample buffer. The ILAs on the FIFO and DMA outputs are clocked with the 200 MHz clock, whereas the ILA on the "sample generator" output is clocked at 10 MHz.

I have a bare-metal application that works as it is supposed to, but the HW debug shows otherwise.

Any suggestion?

 

 

FIFO_Output_ila.png
DMA_Output_ila.png
0 Kudos
Scholar ronnywebers
Scholar
4,949 Views
Registered: ‎10-10-2014

Re: AXI4 Stream - can I fix TLAST to zero and TVALID to one

Jump to solution

that's good news! If your firmware correctly receives all data, then it's probably working.

 

The ILAs are powerful, but have very limited depth: e.g. a 4096-sample buffer means 4096 x clk_200MHz cycles -> 4096 * 5 ns = 20.48 us. So sometimes you need to be 'creative' to trigger on the right event.
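
The same window calculation as a small C helper, in case you want to compare the different ILA clocks (the function name is just for illustration):

```c
#include <stdint.h>

/* Capture window of an ILA in nanoseconds: `depth` samples taken at clk_hz. */
uint64_t ila_window_ns(uint32_t depth, uint32_t clk_hz)
{
    return (uint64_t)depth * 1000000000ull / clk_hz;
}
```

For a depth of 4096 this gives 20480 ns at 200 MHz but 409600 ns at 10 MHz, so the ILA on the 10 MHz side sees a window 20 times longer than the ILAs on the 200 MHz side.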

 

You can e.g. trigger on data values or address values (equal, greater/smaller than), use AND/OR to combine conditions, ... You could even add your own (HDL) data counter and trigger e.g. on the 3456th generated sample.

 

So probably, as you say, it doesn't fit in the ILA depth, or you're not looking at the right moment in time :-) You could try making it 16k deep, for example, if your FPGA allows it.

 

Also, successive PS read/write accesses to the PL are 'rather slow' compared to HDL logic, so DMA in polled mode vs DMA under interrupt, or using scatter/gather lists, will make a huge difference. To get an idea, capture on an ILA a simple write command followed by a read command to, for example, a BRAM.

 

And don't forget, if you put a printf somewhere in your code, you're away for many milliseconds :-)
