Registered: ‎07-14-2017

Using Burst on Software Side

Hello Everyone,

 

   I have created my own memory-mapped peripheral that is compatible with the AXI4 Full protocol, with the help of the template provided by Vivado. I have tested the peripheral with a BFM and I am able to use bursts successfully. My peripheral is connected to AXI_GP_0.

   My problem is on the Xilinx SDK side: is there any Xilinx driver in C that can generate bursts? My data is stored in RAM and I want to pass it to the PL peripheral. So far I am using Xil_Out32, but that function performs single transfers.

  From some research I have found that I can use a DMA IP in the PL, but I would like to know whether the ARM processor itself can generate those bursts or whether a DMA of some kind is always necessary.

 

Thanks in advance
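For reference, a minimal sketch of what a CPU-driven block write looks like when it is expressed as a plain loop instead of repeated Xil_Out32 calls. The function name and the `dst` window pointer are hypothetical, not part of any Xilinx driver. Each iteration is still one CPU store; as the replies in this thread discuss, whether the interconnect sees single beats or merged bursts appears to depend on the MMU attributes of the destination range, not on the loop itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `words` 32-bit values from a RAM buffer to consecutive word
 * addresses of a memory-mapped window.
 * `dst` is a hypothetical pointer to the peripheral's address range.
 * The addresses increment by 4 bytes per store, which is the access
 * pattern that can be merged into bursts when the range is mapped
 * with suitable attributes (an assumption taken from this thread). */
void copy_words_to_window(volatile uint32_t *dst,
                          const uint32_t *src, size_t words)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}
```

On a host machine this can be exercised against an ordinary array; on the target, `dst` would be the base address of the peripheral cast to `volatile uint32_t *`.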

17 Replies
Teacher
2,090 Views
Registered: ‎03-31-2012

@kgkougkoulias  yes, apparently you need to mark your PL range as memory for the mmu to generate burst transactions. There was a thread a couple months ago which showed how to do this. If you can't find it I can look for it again.

 

- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.
Scholar
2,073 Views
Registered: ‎03-22-2016

@muzaffer @kgkougkoulias NEVER MIND

 

vitorian.com --- We do this for fun. Always give kudos. Accept as solution if your question was answered.
I will not answer to personal messages - use the forums instead.
Observer
1,536 Views
Registered: ‎12-28-2016

Hi, Guys,

 

I have been going around this topic for a while now, but I am not sure I've got it right. When I define the PL memory as DEVICE_MEMORY, using

 

Xil_SetTlbAttributes(XPAR_AXI_ETHERNETLITE_0_BASEADDR,DEVICE_MEMORY);

 

I get 6x better performance: the number of clocks between bvalid assertions falls from 18 to 3. But when I look at the AXI signals, this looks to me like coalescing and not a real burst. Please see the attached screenshot.
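One detail worth keeping in mind with the call above: as far as I know, in the Zynq-7000 standalone BSP the translation table uses 1 MB sections, so the attribute applies to the whole 1 MB section containing the address, not just to the peripheral. Below is a small sketch of the address arithmetic, assuming that 1 MB granularity. The helper names and `SECTION_SIZE` are illustrative, not BSP symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granularity of one translation-table entry (1 MB section). */
#define SECTION_SIZE 0x100000u

/* Base address of the 1 MB section that contains `addr`. */
uint32_t section_base(uint32_t addr)
{
    return addr & ~(SECTION_SIZE - 1u);
}

/* Number of 1 MB sections touched by the range [base, base + size).
 * Assumes size > 0 and that base + size - 1 does not wrap around. */
uint32_t sections_covering(uint32_t base, uint32_t size)
{
    assert(size > 0u);
    uint32_t first = section_base(base);
    uint32_t last  = section_base(base + size - 1u);
    return (last - first) / SECTION_SIZE + 1u;
}
```

On the target one would then loop over `section_base(base) + i * SECTION_SIZE` and call Xil_SetTlbAttributes for each section. Any other peripheral mapped into the same 1 MB section picks up the same attributes, which can matter if its registers need strongly-ordered accesses.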

 

I obtain the same result when using DMA... 

 

My question is: am I getting a burst or not?

 

Thank you,

Tomas

 

 

2018-03-08 11_38_46-RingSlave - [C__Users_tomaspcorrea_Drive_Xilinx_RingSlave_vivado_RingSlave.xpr] .png
Scholar
1,526 Views
Registered: ‎03-22-2016

@tpcorrea 

Observer
1,518 Views
Registered: ‎12-28-2016

@hbucher According to Zynq TRM, one of Device Memory access rules is "both read and write accesses can have side effects on the system. Accesses are never cached. Speculative accesses are never be performed." So I understood it wasn't cached, but in the same document one reads "A write to device memory is permitted to complete before it reaches the peripheral or memory component accessed by the write". So this is the issue you refer to, right? Would the correct choice be Strongly Ordered?

 

Back to my question: are the accesses shown in the picture attached to my previous post a burst or not? In a burst I would expect the awaddr to remain constant...

 

KR,

Tomas

Scholar
1,515 Views
Registered: ‎03-22-2016

@tpcorrea 

Observer
1,512 Views
Registered: ‎12-28-2016

@hbucher thanks for the quick reply. I have exactly the same opinion. I am a bit puzzled, because even using DMA I could not get bursts. Any idea here? Maybe @johnmcd could help me on that too?

 

Cheers,

Tomas

Scholar
1,508 Views
Registered: ‎03-22-2016

@tpcorrea NEVER MIND

Xilinx Employee
1,488 Views
Registered: ‎02-01-2008

On zynq, there are basically 3 types of accesses that can be configured in the mmu.

Strongly-ordered: a new transaction will not occur until the previous completes

Shareable Device: a new transaction can start before the response of the previous one has been received (can give up to 10x improvement over strongly-ordered if writing continuously to a peripheral)

memory: uses the cache controller.

 

Cache can get in the way, depending on what you want to do. If you set the address range as memory, and configure both inner and outer as non-cached, then the cache controller will coalesce multiple writes to incrementing addresses into burst transactions.
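To make the three access types concrete, here is a sketch of how they map onto the memory-type bits (TEX, C, B) of an ARMv7-A short-descriptor section entry. It assumes TEX remap is disabled, and it only builds the memory-type bits, not a complete descriptor (no access permissions, domain or shareable bit). The enum and function names are made up for illustration. The BSP header xil_mmu.h provides ready-made attribute macros such as STRONG_ORDERED and DEVICE_MEMORY, so check that header for the exact names and values rather than relying on this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type bits of an ARMv7-A short-descriptor section entry. */
#define DESC_B      (1u << 2)                          /* bufferable */
#define DESC_C      (1u << 3)                          /* cacheable  */
#define DESC_TEX(x) (((uint32_t)(x) & 0x7u) << 12)     /* type extension */

/* Hypothetical names for the three access types described above. */
typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_SHAREABLE_DEVICE,
    MEM_NORMAL_NONCACHED
} mem_type_t;

/* Return only the TEX/C/B bits for the requested type
 * (encodings valid when TEX remap is disabled). */
uint32_t mem_type_bits(mem_type_t t)
{
    assert(t <= MEM_NORMAL_NONCACHED);
    switch (t) {
    case MEM_STRONGLY_ORDERED: return DESC_TEX(0);           /* TEX=000 C=0 B=0 */
    case MEM_SHAREABLE_DEVICE: return DESC_TEX(0) | DESC_B;  /* TEX=000 C=0 B=1 */
    case MEM_NORMAL_NONCACHED: return DESC_TEX(1);           /* TEX=001 C=0 B=0 */
    }
    return 0u;
}
```

The third case is the "memory, inner and outer non-cached" setting mentioned above, the one under which writes to incrementing addresses may be merged into bursts.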

 

You are correct that the waveform you captured is not a burst transaction. And, I do not see any axi4 signals in the waveform capture which suggests to me that these may be axiLite transactions. AxiLite does not support bursting. I would have expected to see signals such as AWLEN, AWSIZE, etc.
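When checking a capture for bursts, AWLEN and AWSIZE are the signals to decode. In AXI4 a burst has AWLEN + 1 beats of 2^AWSIZE bytes each, and the address is presented once per burst, followed by several data beats. A small sketch of that decoding (the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Number of data beats in a burst: AWLEN + 1 (AXI4, 8-bit AWLEN). */
uint32_t axi_burst_beats(uint8_t awlen)
{
    return (uint32_t)awlen + 1u;
}

/* Bytes per beat: 2^AWSIZE (AWSIZE is a 3-bit field). */
uint32_t axi_beat_bytes(uint8_t awsize)
{
    assert(awsize <= 7u);
    return 1u << awsize;
}

/* Total bytes moved by one burst. */
uint32_t axi_burst_bytes(uint8_t awlen, uint8_t awsize)
{
    return axi_burst_beats(awlen) * axi_beat_bytes(awsize);
}

/* A transaction with AWLEN == 0 is a single transfer, not a burst. */
int axi_is_burst(uint8_t awlen)
{
    return awlen != 0u;
}
```

For example, AWLEN = 7 with AWSIZE = 2 is an 8-beat burst of 32-bit words, 32 bytes in total. A capture that shows one address handshake per data beat with AWLEN = 0 is a sequence of single transfers, however closely they are spaced.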

Observer
1,421 Views
Registered: ‎12-28-2016

@johnmcd thank you for your reply. First of all, yes I have AXI4 interface, I captured the handshake signals and just some others to not clutter the screen.

 

Now I understand why I get such an improvement in performance (6x faster in my case) when I change from STRONGLY_ORDERED to DEVICE_MEMORY.

 

I didn't give any background on what I am trying to do. I want to move data from AXI Ethernet Lite to the DDR and the other way around, for incoming and outgoing Ethernet packets. I use lwip and am making some changes to make it faster, for example avoiding unnecessary copies and speeding up the AXI4 interface.

 

So to get bursts I need to set the memory as normal, correct? So far my echo server hangs after I get the first packet when I configure the memory as normal.

 

KR,

Tomas 

Xilinx Employee
1,411 Views
Registered: ‎02-01-2008

It's been a long time since I've looked at the ethernetLite. It is a slow interface since there is basically a ping/pong bram buffer within the core. As the ping block is written by the IP, you can read the pong block. So your cpu will be very busy.

 

So thinking aloud: if you set up the ethernetLite address space as memory, non-cached, and if the ethernetLite tx fifo has an address range instead of a single address (so that you can do incrementing-address writes to the core and therefore get coalescing), you may need to use a data barrier instruction before trying to access other addresses, just to make sure the writes complete.

 

So the CPU copies data to/from DDR/enetLite. DDR could be cacheable since you are dealing with DMA accessing the same DDR space, but the enetLite address space must not have cache enabled. If your enetLite address range is set as 'memory' non-cached, you may run into issues when reading/writing register locations. Use chipscope to verify. If you really want to move ahead with enetLite, and you want fast data movements using the CPU, then it might help to set up two virtual addresses for the same enetLite physical address. Virtual address #1 is memory-non-cached, and virtual address #2 is device strongly ordered. Use #2 for register reads/writes and #1 for data fifo access (as long as the fifo has an address range and not a single keyhole address).
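A rough sketch of the two-alias idea together with the data barrier mentioned above. Everything here is hypothetical: `data_alias` stands for virtual address #1 (memory, non-cached), `reg_alias` for a register reached through virtual address #2 (strongly ordered), and `go_value` for whatever value starts the transfer. No real enetLite register offsets are assumed. On ARM the barrier is a DSB instruction; the fallback is only a stand-in so that the sketch also builds on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wait for outstanding writes to complete before continuing. */
static inline void data_barrier(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __sync_synchronize();   /* host stand-in, not equivalent to DSB */
#endif
}

/* Write a payload through the non-cached alias using incrementing
 * addresses (so the writes can be coalesced), then make sure they have
 * completed before touching a register through the strongly-ordered
 * alias. All parameters are hypothetical placeholders. */
void send_frame(volatile uint32_t *data_alias, volatile uint32_t *reg_alias,
                const uint32_t *payload, size_t words, uint32_t go_value)
{
    assert(data_alias != NULL && reg_alias != NULL && payload != NULL);
    for (size_t i = 0; i < words; ++i) {
        data_alias[i] = payload[i];
    }
    data_barrier();          /* data must land before the register write */
    *reg_alias = go_value;
}
```

The point of the split is that register accesses never go through the coalescing path, while the bulk data still can.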

 

I do question why you don't use either CDMA with ethernetLite or axi_dma with a hardcore enet block. CDMA is for axi to axi transactions, and axi_dma is for axiStream to/from axi transactions.

 

LWIP should have the necessary stuff to manage cache as you are probably well aware if you are digging around to do zero copy or equivalent. I'm not sure if LWIP includes enetLite with CDMA. You could probably tell me that.

 

So regarding a hanging echo server, I'm wondering if it has to do with register accesses to the enetLite while coalescing is occurring. Easy way to check is chipscope, and use xsdb to issue single read and write accesses to fifo space vs register space.

 

 

 

 

Observer
1,399 Views
Registered: ‎12-28-2016

Hi, John,

 

thank you a lot for this detailed answer. My application is all about low latency, not much throughput, and you might be surprised that Enet Lite has lower latencies than GEM for any size of packet. The cost is CPU occupation, sure.

 

You gave me a good insight and now I know how to continue my development. I will post here again whenever I am able to accomplish what I want or get stuck. 

 

KR,

Tomas

Observer
1,395 Views
Registered: ‎12-28-2016

@johnmcd I forgot to say, but I tried to use the PS DMA without success...

Scholar
1,392 Views
Registered: ‎03-22-2016

@tpcorrea NEVER MIND.

Observer
1,359 Views
Registered: ‎12-28-2016

@hbucher That's off-topic, but thanks for the comment. 200 ns is very good, certainly. PHYs using Cat 5 cables need at least 350 ns, not counting the higher layers, but I am still investigating more standard, lower-cost interfaces using cables or at most plastic optical fibre. In industrial environments, with hundreds of nodes, we need to keep it simple and (relatively) cheap.

Scholar
1,357 Views
Registered: ‎03-22-2016

@tpcorrea NEVER MIND

Observer
1,348 Views
Registered: ‎12-28-2016

@hbucher you have to know all the details of the application before you say whether a certain solution is adequate or not, so this "no-no" is arrogant on your part. I never said I need minimum latency regardless of the (hw, sw, development, etc.) cost. The topic is "Using burst on Software Side", so your previous comment is definitely off-topic.

Tomas