
[SOLVED] Linux DMA Cleanup after transaction timeout



I'm using dma_alloc_coherent for DMA operations, and it works great as long as the transaction doesn't time out (a condition that's expected to occur).

After the initial timeout, DMA stops working for every subsequent attempt. (see below)

dmaengine_terminate_all() looks like what I should use, but the first time I call it I get this in dmesg:

[   35.146414] xilinx-dma 40400000.dma: Cannot stop channel eeaff410: 0
[   36.151393] xilinx-dma 40400000.dma: Cannot stop channel eea30350: 0


(DMA doesn't clean up and doesn't go back to working, and no more messages like the above appear on subsequent timeouts.)

Is there some cleanup step I'm missing? (this is Petalinux 2015.4)

Thanks,

 -Ben

 



I originally used kzalloc with small buffers combined with the sequence of:  (get/release dma channel is on open/close of driver)

dma_map_single ()
dmaengine_prep_slave_single ()
init_completion()
dma_async_issue_pending()
wait_for_completion_timeout()
ucpi_dma_ch_unmap()



it worked fine.

I switched to dma_mmap_coherent() and the sequence is now: (get/release dma channel is on open/close of driver)

dmaengine_prep_slave_single ()
init_completion()
dma_async_issue_pending()
wait_for_completion_timeout()
(tried with dmaengine_terminate_all() and without)
ucpi_dma_ch_unmap()
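
The ordering in the second sequence can be sketched as a small userland simulation. This is not kernel code: `fake_chan`, `step`, `run_transfer` and the result codes are hypothetical names made up for illustration, and each letter in the trace merely stands in for one of the calls listed above. The one point it tries to capture is that after a timeout the channel should be stopped before the buffer is released, because the engine may still be writing into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userland stand-in for a DMA channel; nothing here is a kernel API. */
struct fake_chan {
    bool will_complete;   /* does the transfer finish before the timeout? */
    bool terminate_works; /* does the channel actually stop when asked? */
    char trace[8];        /* one letter per step, in call order */
    size_t len;
};

/* Record one step of the sequence in the trace. */
static void step(struct fake_chan *c, char tag)
{
    assert(c != NULL && c->len + 1 < sizeof(c->trace));
    c->trace[c->len++] = tag;
    c->trace[c->len] = '\0';
}

enum xfer_result { XFER_OK, XFER_TIMED_OUT, XFER_STUCK };

static enum xfer_result run_transfer(struct fake_chan *c)
{
    step(c, 'P');              /* prepare and submit the descriptor */
    step(c, 'I');              /* issue pending work to the engine */
    step(c, 'W');              /* wait for completion, with a timeout */
    if (c->will_complete) {
        step(c, 'U');          /* release the buffer (the unmap step) */
        return XFER_OK;
    }
    step(c, 'T');              /* timed out: terminate before releasing */
    if (!c->terminate_works)
        return XFER_STUCK;     /* engine may still own the buffer: keep it */
    step(c, 'U');
    return XFER_TIMED_OUT;
}
```

With `will_complete` set the trace reads `PIWU`; on a timeout with a working terminate it reads `PIWTU`; if the channel refuses to stop, the sketch returns `XFER_STUCK` without releasing the buffer.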




 


Re: [SOLVED] Linux DMA Cleanup after transaction timeout


 

Welp, I found the problem(s)....

 

dmaengine_terminate_all() looks to be the way to go when stopping failed DMA transactions.

It would have worked, too, if I hadn't run into:

(2) bugs in the FPGA DMA IP module.

(2) bugs (maybe 3) in the Xilinx DMA Driver.

yay.




Scholar ronnywebers

Re: [SOLVED] Linux DMA Cleanup after transaction timeout


Hello @bkamen, I stumbled across your post. I remember my software colleague having issues with DMA timeouts too ... you talk about bugs you found in the Xilinx DMA IP?

Can you tell more about this? If so, you should bring this to Xilinx's attention.


Re: [SOLVED] Linux DMA Cleanup after transaction timeout


Hi Ronny,

 
How does anyone bring anything to Xilinx's attention? (Their support channel is frustrating and problematic at best. I have no one to call except Avnet, and we won't get started on that, it being a Friday and me wanting to keep my good mood.)

Anyway -- I don't know if it still exists in the latest versions of Xilinx IP for Vivado or Petalinux.

I don't update willy-nilly as doing so has burned me badly in the past -- so I'm still on Petalinux 2015.4.

With that:

bugs found: (and they play off each other)

FPGA:

1. A DMA channel told to stop doesn't. (You can clear DMACR[0], the Run/Stop bit, but DMASR[0], the Halted bit, never goes to 1.)

2. A reset issued to one DMA channel seems to reset "the rest". (I'm only using two, S2MM and MM2S. Resetting one resets the other. This causes problems for the driver, which resets a channel only on DMA submission. See below.)
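
FPGA bug 1 can be pictured as a bounded stop-and-poll against a simulated register pair. This is a userland sketch, not driver code: the bit positions are the ones named above (DMACR[0], DMASR[0]), while `sim_chan`, `sim_read_dmasr`, `chan_stop` and the `ignores_stop` flag are hypothetical names invented for the illustration. A real driver would perform MMIO accesses where this sketch touches plain struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMACR_RS     (1u << 0)  /* DMACR[0]: Run/Stop */
#define DMASR_HALTED (1u << 0)  /* DMASR[0]: Halted */

/* Simulated channel registers (assumption: stands in for real MMIO). */
struct sim_chan {
    uint32_t dmacr;
    uint32_t dmasr;
    bool ignores_stop;  /* models the reported fault: Halted never asserts */
};

static uint32_t sim_read_dmasr(struct sim_chan *c)
{
    /* A healthy engine reports Halted once Run/Stop has been cleared. */
    if (!(c->dmacr & DMACR_RS) && !c->ignores_stop)
        c->dmasr |= DMASR_HALTED;
    return c->dmasr;
}

/* Clear Run/Stop, then poll Halted a bounded number of times. */
static bool chan_stop(struct sim_chan *c, unsigned max_polls)
{
    assert(c != NULL);
    c->dmacr &= ~DMACR_RS;
    while (max_polls-- > 0) {
        if (sim_read_dmasr(c) & DMASR_HALTED)
            return true;
    }
    return false;  /* channel never halted: the caller has to escalate */
}
```

On the simulated healthy channel `chan_stop()` returns true on the first poll; with `ignores_stop` set it exhausts its polls and returns false, which is the situation in which a driver would report that it cannot stop the channel.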

Driver:

1. The DMA channel reset, even when successful, doesn't also update the driver's chan->idle flag.
 This causes a problem in channel_start, which bails out of the function without any kernel message saying "Channel not idle".

2. The DMA channel is reset in the submit function only if chan->err is set, which usually happens when dmaengine_terminate_all() is called. But since the driver tracks channels separately and a channel reset bonks both channels, one channel can still end up NOT being reset in terms of its chan->err and chan->idle flags in the driver.

I plan on fixing this by making xilinx_dma_chan_reset properly set/clear these flags so the submit routine won't bother doing the reset at the start of a transaction.

3. I haven't hunted this one down yet, but it seems that the xilinx_dma driver re-starts a channel (DMACR[0]=1) when my code hasn't asked to do so. When my userland program quits and releases the channels, the FPGA DMA IP is still waiting for data to process a transaction. (and the cyclic bit isn't set.)
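
How driver bugs 1 and 2 combine can be sketched with a toy model of the two flags. This is not code from xilinx_dma.c: the struct layout, function names, and the `fixed` switch are all hypothetical, and resyncing the flags of every channel in the reset helper is an assumption about what the planned fix might look like, given that the hardware reset reportedly hits both channels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MM2S, S2MM, NUM_CH };

/* Toy copy of the two per-channel flags discussed above. */
struct model_chan { bool err; bool idle; };
struct model_dev  { struct model_chan ch[NUM_CH]; };

static void dev_init(struct model_dev *d)
{
    assert(d != NULL);
    for (int i = 0; i < NUM_CH; i++) {
        d->ch[i].err = false;
        d->ch[i].idle = true;
    }
}

/* Start bails out silently unless the channel is error-free and idle. */
static bool chan_start(struct model_chan *c)
{
    if (c->err || !c->idle)
        return false;
    c->idle = false;            /* a transfer is now in flight */
    return true;
}

/* Terminate after a timeout: the error flag is set, idle is left alone. */
static void chan_terminate(struct model_chan *c) { c->err = true; }

/* Reset as described: only this channel's err flag is cleared. */
static void reset_as_described(struct model_dev *d, int i) { d->ch[i].err = false; }

/* Assumed fix: the reset affects every channel, so resync every flag. */
static void reset_with_flag_sync(struct model_dev *d)
{
    for (int i = 0; i < NUM_CH; i++) {
        d->ch[i].err = false;
        d->ch[i].idle = true;
    }
}

/* Submit resets only when err is set, then tries to start the channel. */
static bool chan_submit(struct model_dev *d, int i, bool fixed)
{
    if (d->ch[i].err) {
        if (fixed)
            reset_with_flag_sync(d);
        else
            reset_as_described(d, i);
    }
    return chan_start(&d->ch[i]);
}
```

In the unfixed path, the first submit after a timeout clears err but leaves idle false, so `chan_start()` returns without starting anything and every later submit fails the same way. That matches the "works until the first timeout" symptom from the original question.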

Scholar ronnywebers

Re: [SOLVED] Linux DMA Cleanup after transaction timeout


@bkamen, thanks for the details. I did use the DMA on bare metal, but didn't really suffer from timeouts. But I remember our Linux guy having issues with this too. We're currently migrating to 2017.2, and someday I should take the time to dive into the DMA stuff on Linux too, so I'll keep this post as a reference.

 

The forum is a 'volunteer' place, so even if there are Xilinx employees, I believe most of them, if not all, do this in their spare time.

 

To bring it to their attention, there are a few things I can suggest:

 

1) post it here on the forum, and hope that the right person picks this up. This can be someone from Xilinx, but there are some really good non-Xilinx people on the forum, helping people as much as they can. Sometimes an answer comes quickly, sometimes no one answers at all. Also, once you mark the answer as 'solved', the chances that anyone will look further into your issue become low.

 

2) be as precise / exact as possible, add source code or extracts, add your tool version, .... the more to the point the question is, the more chance that someone will answer it. If necessary/possible, split your question into a few smaller posts.

 

3) try to help other people too if you can / see an opportunity. It will pay back.

 

4) if you can't get any help here, ask your representative (Avnet in your case, mine too) to escalate this to (official) Xilinx support, and point to your forum post. But sometimes you get your answer quicker on the forum than through regular support.

 

 


Re: [SOLVED] Linux DMA Cleanup after transaction timeout

See below,


@ronnywebers wrote:

@bkamen, thanks for the details. I did use the DMA on bare metal, but didn't really suffer from timeouts. But I remember our Linux guy having issues with this too. We're currently migrating to 2017.2, and someday I should take the time to dive into the DMA stuff on Linux too, so I'll keep this post as a reference.

 

No worries. The "fix" or workaround is easy. It's all contained in xilinx_dma.c -- and I'd share if you need it.

 

 

The forum is a 'volunteer' place, so even if there are Xilinx employees, I believe most of them, if not all, do this in their spare time.

 

I understand that. I don't expect anything from the forums, but it's better than nothing. And it's not uncommon to find people in the forums who know more about the product than the company's official tech support people.

 

To bring it to their attention, there are a few things I can suggest:

 

1) post it here on the forum, and hope that the right person picks this up. This can be someone from Xilinx, but there are some really good non-Xilinx people on the forum, helping people as much as they can. Sometimes an answer comes quickly, sometimes no one answers at all. Also, once you mark the answer as 'solved', the chances that anyone will look further into your issue become low.

I don't mark items solved unless they're solved and further research is no longer needed.

 

2) be as precise / exact as possible, add source code or extracts, add your tool version, .... the more to the point the question is, the more chance that someone will answer it. If necessary/possible, split your question into a few smaller posts.

 
That's a really mixed bag. Sometimes I've gone through the effort to fully detail a problem and it seems to intimidate readers. I've noticed sometimes it's better to ping with something short but descriptive and see who bites. There seems to be no magic formula.

3) try to help other people too if you can / see an opportunity. It will pay back.


Indeed -- I do when I can. (it's one of the ways I get new clients... although now I'm swamped and can't take on any more work than currently projected.)
 

4) if you can't get any help here, ask your representative (Avnet in your case, mine too) to escalate this to (official) Xilinx support, and point to your forum post. But sometimes you get your answer quicker on the forum than through regular support.

 

 My Avnet FAE is not very technical... and also not very responsive. I get the same feeling from him that I've received from other vendors dealing with a consultant. (We're not taken seriously because we're at the wrong end of securing big contracts.) Luckily at one company I had a friend who was able to talk to the VP of tech support and explain that dissing consultants can be bad karma. The project at the time that brought this issue of tech support treatment up involved a company/project I was writing code for that was ready to buy reels and reels of that company's microcontroller. 

Unless things have changed recently, I've called Xilinx in the past and have been shut down right at the front door. There's no one to talk to except my local Avnet rep. That's been my only escalation path.

We used to have Xilinx FAEs assigned to an area whom we could talk to directly, and that was great. Loved it. It bought some real loyalty after we had horrid (paid) support problems with Lattice, which caused us to move the project to Xilinx.

Things have changed since then. (and seemingly not for the better.)

I would also note that it's no industry secret that companies lay off a whole bunch of support people after putting up a forum (like this one) with the attitude of "let the users support each other". (I've literally heard execs say that in a meeting.)

Just mentioning my observations/experience.