Explorer
12,895 Views
Registered: ‎02-22-2012

OpenAMP "chicken or egg" problem

I'm JTAG debugging an OpenAMP BM (bare-metal) remote application on CPU1. My OpenAMP application is basically echo_test from UG1186, with some differences:
The HW is a 7Z010 with 512MB DDR, the BM/LX (bare-metal/Linux) memory is partitioned 128MB/384MB, the LX start address is 0x08000000, there is no petalinux, LX is from the linux-xlnx git tag 2015.4.01, and the tools are Vivado and XSDK 2015.4.

The symptom of the problem I found is that /dev/rpmsg0 never shows up.

After reading and analysing the BM RPMSG code (echo_test BM example) and the LX kernel modules virtio_rpmsg_bus, remoteproc, zynq_remoteproc, virtio and virtio_ring, I found that during the zynq_remoteproc driver probe, the virtio_rpmsg_bus driver sends the first notification kick (an SGI) to BM CPU1 from its rpmsg_probe() function. This first SGI kick is fundamental, because it triggers BM RPMSG on CPU1 to send the NS (Name Service) message back to CPU0. The virtio_rpmsg_bus RX callback is coded so that, on receipt of the NS message, it triggers matching and probing of the registered RPMSG client device drivers. If a match is found, they are probed and the device "rpmsg%d" is created. Now (!) if the remote BM application on CPU1 misses this first kick (e.g. it has not yet initialized its RPMSG infrastructure), it will not get another notification kick from CPU0. BM on CPU1 will wait in its env_acquire_sync_lock(OpenAMPInstPtr.lock); and it will never send the NS message to the CPU0 master.

The result of this endless waiting on both sides is that /dev/rpmsg0 never shows up.
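
As a rough illustration of the race described above, here is a toy timing model in C. It is not OpenAMP code: the struct, the function name and the microsecond figures are all hypothetical, and the only thing it encodes is the observation that the first kick is useful only if the remote has already finished its RPMSG initialization.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy timing model; every name and number here is hypothetical. */
struct boot_timing {
    uint32_t master_first_kick_us;  /* when rpmsg_probe() on CPU0 sends the kick   */
    uint32_t remote_init_us;        /* time CPU1 needs to finish its RPMSG init    */
    uint32_t remote_extra_delay_us; /* xil_printf, JTAG wait loop, other startup   */
};

/* The first kick only helps if the remote is already listening when it arrives. */
static bool remote_catches_first_kick(const struct boot_timing *t)
{
    uint32_t remote_ready_us = t->remote_extra_delay_us + t->remote_init_us;
    return remote_ready_us <= t->master_first_kick_us;
}
```

With a kick at 1000 us and a 400 us init, the remote catches the kick; add 700 us of extra startup work on CPU1 and the same kick is missed, which is the "no /dev/rpmsg0" case.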
 
Note that in the virtio_rpmsg_bus function rpmsg_probe(), right before the notification kick is sent, there is a call to virtio_device_ready() (whose name misled me for some time). This call only marks the device driver as configured; it does not ensure that the remote BM is ready to receive kicks.
This is a short description, but I hope it is clear enough. All the sources to look at are at hand (linux-xlnx git and the BM examples).

The problem described above depends on "who came first" (chicken or egg): how fast CPU0 is compared to CPU1, and whether CPU1 finishes its RPMSG initialization before CPU0 reaches the first notification kick in virtio_rpmsg_bus. I'm sure the problem will pop up in other scenarios too (e.g. a very fast 4-core CPU with a slightly slower remote processor, or an added xil_printf or some other code in front of the remote BM RPMSG initialization).

I ran into this problem because I use a "JTAG CPU rendezvous" trick to JTAG debug a BM application whose start (reset) I can't control with JTAG (Digilent HS2 JTAG compatible HW, no JTAG driving of PS_SRST_B).
I can only attach to an already running CPU1. This is my remote CPU1 BM application JTAG debug case.
In the BM application, usually somewhere in the main function, I put an endless loop like this:

int dummy = 0; 
while (g_DebugStop == 0x1234) { 
  dummy += 1;    
} 

 

Note that g_DebugStop is my global variable initialized to 0x1234 (int g_DebugStop = 0x1234;). With this endless loop I know that CPU1 will be waiting for me in the loop when I attach to it with JTAG. After I break (pause) CPU1, I change the value of g_DebugStop to something different and set breakpoints at my points of interest. When I resume CPU1, the application leaves the endless loop with my breakpoints in place.

Be aware that nowadays C/C++ compilers are very smart, and with optimization enabled they will automatically remove such useless code.
As written, the loop works only in debug builds without optimization.
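
A variant that should also survive optimized builds is to declare the flag volatile, which forces the compiler to re-read it on every iteration. The sketch below is my own illustration, not part of the echo_test example; wait_for_debugger() and its max_spins escape hatch are hypothetical names added so the loop can be exercised without a debugger attached.

```c
#include <stdint.h>

/* volatile forces a fresh read of the flag on every iteration, so the
 * loop is not optimized away and a debugger write is actually observed. */
static volatile uint32_t g_DebugStop = 0x1234;

/* Spin until a debugger changes g_DebugStop.
 * max_spins is a hypothetical escape hatch: 0 means wait forever.
 * Returns the number of spins performed. */
static uint32_t wait_for_debugger(uint32_t max_spins)
{
    uint32_t spins = 0;

    while (g_DebugStop == 0x1234) {
        if (max_spins != 0 && spins >= max_spins)
            break;
        spins++;
    }
    return spins;
}
```

On the target you would call wait_for_debugger(0) early in main(), attach with JTAG, overwrite g_DebugStop and resume.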

Once I knew the reason for "no rpmsg0 device", I worked around the problem and continued my JTAG debugging of the BM application on CPU1 (e.g. figuring out how to change platform_vatopa() to match my BM/LX memory partitioning).
I added an irq kernel driver attribute to the zynq_remoteproc driver. Google for Henry Choi "Zynq inter-process interrupts" for a how-to example.
This attribute allows me to send the kicking SGI from CPU0 to CPU1 after I have made the "JTAG CPU1 rendezvous" and I know that BM on CPU1 has finished its RPMSG initialization.
I do this with: echo '?' > /sys/devices/soc0/amba/0.remoteproc/remoteproc0/irq

Writing LKMs is not my profession. Debugging LKMs is more of a necessity, because in the end everything on the ZYNQ SOC must work (who knows better than an FPGA engineer how things should work ;-) ). I will not speculate about what the best solution or workaround for this problem would be, but I'm convinced it is good for other OpenAMP users to know about it.

WBR Primoz

4 Replies
Xilinx Employee
12,470 Views
Registered: ‎09-10-2008

Re: OpenAMP "chicken or egg" problem

Hi,

It seems like this was only an issue, no rpmsg0 created, because you were trying to debug the BM CPU application such that it could not run as normal.  Is that true?

 

If so, were you only debugging to learn more or was there a need for debug?

 

I've also seen that debugging OpenAMP is a bit tricky because of the handshaking that occurs.  I would think there are some timeouts that can be tuned to allow for debug, such that it could wait forever (or a long time) for the handshake.

Thanks
John

Explorer
12,416 Views
Registered: ‎02-22-2012

Re: OpenAMP "chicken or egg" problem

Hi,
Yes, I found this issue because I had to debug the BM remote echo_test example. It was a learning session, because I have different 512MB 7Z010 HW (not Zed or ZC702), a different LX (no petalinux), and I also tried a different DDR memory partitioning between BM and LX (128MB/384MB). The only way to debug the OpenAMP remoteproc_resource_init() call in BM via JTAG is to attach to the running target before it is called.
During this session I saw that adding "some" delay before the remoteproc_resource_init() call causes the BM remote to miss the LX master's first SGI KICK. For example, putting xil_printf("Started BM Main\n\r") in the BM main() takes long enough to miss the LX master's first SGI KICK, and then no NS message is ever sent.

This is how I found that it depends on how fast or slow the LX master init is compared to the BM remote init: the BM remoteproc_resource_init() must be done and ready to catch the LX master's first SGI KICK and send back the NS message.
I also saw in the BM remote rpmsg_rx_callback() function code that the BM remote sends the NS message only(!) on the first SGI KICK it catches. It then changes its state to RPMSG_CHNL_STATE_ACTIVE, and even if it gets a new SGI KICK it will not(!) send NS again. This is also a potential issue if, for any reason, the LX master does not receive or process the NS message sent by the BM remote (e.g. during its remoteproc init).
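
The one-shot behaviour I describe can be pictured with a small state machine. This is only a sketch of what I observed, not the actual rpmsg_rx_callback() code: the enum, struct and function names below are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the IDLE/ACTIVE channel states. */
enum chnl_state { CHNL_STATE_IDLE, CHNL_STATE_ACTIVE };

struct remote_chnl {
    enum chnl_state state;
    unsigned ns_sent;       /* number of NS announcements sent so far */
};

/* Called for every kick the remote catches.
 * Returns true if this kick caused an NS announcement. */
static bool remote_on_kick(struct remote_chnl *c)
{
    if (c->state != CHNL_STATE_IDLE)
        return false;           /* already ACTIVE: NS is never repeated */

    c->ns_sent++;               /* announce the channel once */
    c->state = CHNL_STATE_ACTIVE;
    return true;
}
```

Starting from CHNL_STATE_IDLE, the first call announces the channel and every later call returns false, so an NS message lost on the master side is not recovered by further kicks.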

The solution that works every time (for me) is:
Usually I have some startup initialization in the BM startup that can take a long or a short time.
To get rid of the question "Is my BM OpenAMP remote initialization fast enough?", I wait in the BM remote startup long enough for the LX OpenAMP master startup to finish (insmod zynq_remoteproc.ko firmware=image_echo_openamp is done). This guarantees that the BM remote misses the LX master's first SGI KICK (and does not change state to RPMSG_CHNL_STATE_ACTIVE). Only then does the BM remote call remoteproc_resource_init(), which guarantees that it is ready to catch the first SGI KICK it sees (and that it is in the RPMSG_CHNL_STATE_IDLE state).
Later, whenever I want, I can fire an SGI KICK from LX via echo '?' > /sys/devices/soc0/amba/0.remoteproc/remoteproc0/irq (the irq_store function of this kernel attribute calls gic_raise_softirq(cpumask_of(1), irqnum)). I know my BM remote will catch it and send back the required NS message (it is in the RPMSG_CHNL_STATE_IDLE state), which triggers all the rest in virtio_rpmsg_bus.
The only modification I made was to add the irq kernel attribute (a trivial and simple code addition to the LKM driver) that allows me to trigger the SGI from user space when I want to. All the rest of the OpenAMP infrastructure (LX and BM) is unchanged.
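
If you would rather fire the kick from a small user-space program than from the shell, the echo above amounts to a plain file write. The helper below is a minimal sketch under that assumption; write_attr() is a hypothetical name, and the value to write depends on how your own irq_store function parses its input.

```c
#include <stdio.h>

/* Write a string to a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;

    ok = (fputs(value, f) >= 0);
    if (fclose(f) != 0)     /* sysfs store errors may only surface on flush/close */
        ok = 0;

    return ok ? 0 : -1;
}
```

On my setup the path would be /sys/devices/soc0/amba/0.remoteproc/remoteproc0/irq, and the call should be made only after the BM remote has finished remoteproc_resource_init().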

This way it works for me every time, and the BM remote does not depend on what it does during its startup (xil_printf, initialization, waiting for JTAG attach, etc.).

By the way, I see OpenAMP as a very useful technology, given all the CPUs available in today's and future SOCs. I will stick with it.
WBR Primoz

Visitor agent147
Visitor
3,385 Views
Registered: ‎08-18-2017

Re: OpenAMP "chicken or egg" problem

Could you please explain how you made that kernel attribute? If possible, with some screenshots!

thank you!

Explorer
3,371 Views
Registered: ‎02-22-2012

Re: OpenAMP "chicken or egg" problem

The top post says to Google for Henry Choi "Zynq inter-process interrupts" for a how-to example. Just Google for it.

LX kernel attributes are a standard kernel driver feature (an interface from user space), and there are several tutorials and how-tos.
