05-21-2021 05:40 AM - edited 05-21-2021 05:44 AM
First off, a quick disclaimer: I am not massively familiar with multithreading support under PetaLinux or what affects it.
I am running an application on an RFSoC (so a quad-core ARM APU) which has five threads: one for I/O-type activities (primarily over Ethernet), three "worker" threads and one "coordinator" thread. This is a very soft-RT type design - more detail below.
The workers are all set up the same way: each sits in a pthread_cond_wait and, once released, calls a function (each worker runs a different one), then waits again. They do have a deadline (~500us), but the actual code they run is pretty small and only takes ~100us, so I don't really care about the non-RT wakeup latency of 30us+ (although I have changed the kernel preemption model to low-latency desktop - more on that later).
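Roughly, each worker follows the standard condition-variable pattern below. This is a minimal sketch, not my actual code - the names (worker_mtx, run_requested, do_work) are placeholders, and the real job is the ~100us algorithm:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t worker_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  worker_cv  = PTHREAD_COND_INITIALIZER;
static bool run_requested = false;  /* predicate, guarded by worker_mtx */
static _Atomic int work_count = 0;  /* stand-in for the real ~100us job */

static void do_work(void) { work_count++; }

static void *worker_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&worker_mtx);
        /* Wait on the predicate, not just the condvar, so spurious
           wakeups and signal-before-wait races are handled. */
        while (!run_requested)
            pthread_cond_wait(&worker_cv, &worker_mtx);
        run_requested = false;
        pthread_mutex_unlock(&worker_mtx);

        do_work();  /* runs outside the lock, like my real algorithm */
    }
    return NULL;
}
```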
I have an interrupt that the coordinator thread waits for (a blocking read on a generic-uio IRQ device); it then reads some PL registers to determine which of the three workers need to run and calls the appropriate pthread_cond_signal (there are three separate condition variables and three separate mutexes). Sometimes more than one thing needs to run at once, so several of the workers get kicked into action at the same time.
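The coordinator side looks roughly like this - again a sketch, not my real code. The /dev/uio0 path, the register layout and the bit assignments are all placeholders; the write-a-1-to-unmask behaviour is how the stock uio_pdrv_genirq driver works:

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static pthread_mutex_t mtx[3] = { PTHREAD_MUTEX_INITIALIZER,
                                  PTHREAD_MUTEX_INITIALIZER,
                                  PTHREAD_MUTEX_INITIALIZER };
static pthread_cond_t  cv[3]  = { PTHREAD_COND_INITIALIZER,
                                  PTHREAD_COND_INITIALIZER,
                                  PTHREAD_COND_INITIALIZER };
static bool pending[3];  /* per-worker predicate, guarded by mtx[i] */

/* Set worker i's predicate under its mutex, then signal its condvar. */
static void kick_worker(int i)
{
    pthread_mutex_lock(&mtx[i]);
    pending[i] = true;
    pthread_cond_signal(&cv[i]);
    pthread_mutex_unlock(&mtx[i]);
}

void coordinator_loop(void)
{
    int fd = open("/dev/uio0", O_RDWR);
    /* Map the PL register block; offset 0 and the bit layout are made up. */
    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    for (;;) {
        uint32_t unmask = 1, irq_count;
        write(fd, &unmask, sizeof unmask);       /* re-enable the IRQ */
        read(fd, &irq_count, sizeof irq_count);  /* block until it fires */
        uint32_t status = regs[0];               /* which workers to run */
        for (int i = 0; i < 3; i++)
            if (status & (1u << i))
                kick_worker(i);
    }
}
```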
As far as I am concerned, this should work well: I have one CPU for the OS, I/O etc. and three more cores, one for each worker thread (the workers being the majority of the system load). When I actually ran it the first time, however, the three worker threads had very variable run times (they should be nearly the same every go-round - it is a fairly deterministic algorithm). For example, one mostly ran in around 80us but sometimes spiked to 130us, 180us or 230us, and occasionally to very high numbers (~950us); the jumps appeared to be multiples of approximately 50us. I concluded that the workers were being interrupted by other processes, so I changed the coordinator and worker threads from SCHED_OTHER to SCHED_FIFO and set the priority to 99 (no errors in the function calls, and read-back of the values confirmed they were set correctly). This resulted in near-identical behaviour.
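For anyone following along, the scheduling change is the usual attribute dance - a sketch of one way to do it rather than my exact code. One classic trap worth flagging: without PTHREAD_EXPLICIT_SCHED, the policy and priority set on the attribute are silently ignored and the new thread just inherits its creator's scheduling:

```c
#include <pthread.h>
#include <sched.h>

/* Build a thread attribute requesting SCHED_FIFO at the given priority.
   Returns 0 on success, an errno-style value otherwise. */
int make_fifo_attr(pthread_attr_t *attr, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    pthread_attr_init(attr);
    /* Required, or the policy/priority below are ignored at create time. */
    pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    return pthread_attr_setschedparam(attr, &sp);
}
```

Creating the thread with this attribute still needs the right privileges (root or CAP_SYS_NICE / an appropriate rlimit), or pthread_create fails with EPERM.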
I then wasted a bit of time trying to improve matters (I patched the kernel with PREEMPT_RT and tried various other things), but these were all pointless asides. I eventually realised that the three workers were interrupting each other because they were all running on the same CPU (i.e. I had one very busy CPU and three nearly idle ones). As a result, I constrained the OS and everything else to CPU 0 using isolcpus=1,2,3 as a bootarg (and confirmed it worked using ps -e -m -o psr,command), then used pthread_attr_setaffinity_np prior to thread creation to tie the three worker threads one each to CPUs 1, 2 and 3 (again confirmed using ps). So now they are all sitting on their own CPU, but the worker threads never run.
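The pinning step looks roughly like this (a sketch under the same caveats as above; spawn_pinned is a made-up helper name, and the CPU numbers match the isolcpus=1,2,3 bootarg):

```c
#define _GNU_SOURCE  /* for cpu_set_t and pthread_attr_setaffinity_np */
#include <pthread.h>
#include <sched.h>

/* Create a thread whose affinity mask contains only the given CPU. */
int spawn_pinned(pthread_t *tid, int cpu, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    cpu_set_t set;
    pthread_attr_init(&attr);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);  /* one CPU only, e.g. 1, 2 or 3 for the workers */
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    return pthread_create(tid, &attr, fn, arg);
}
```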
Looking into this, it seems that once they are on different CPUs, the pthread_cond_signal never gets picked up by the waiting thread (i.e. I can see that the worker thread never starts, but that the coordinator thread does call signal). I presume this is because each CPU is looking at its (local) L1 cache and never flushing the results to/from the (shared) L2 cache.
So my questions:
Many thanks in advance for your help.
05-21-2021 07:39 AM
Ignore this - it was just me doing something very stupid in the code (nothing to do with the mutexes, though; those were already implemented properly). It now works just fine.