03-08-2018 02:58 PM
I have written a set of test applications targeting the Zynq (on a PicoZed card).
The first application, running on Core0, implements a network benchmark over lwIP (similar to iperf).
The second application, running on Core1, is a hello-world app: after printing "hello world" it spins forever in an empty busy loop, doing nothing.
I am running under the SDK, downloading and debugging over a JTAG cable.
The network benchmark reports about 95 MB/s when both cores are running programs. If I suspend Core1 via JTAG, the benchmark stays at 95 MB/s.
However, if I load and run an app on Core0 only (single core in use), the benchmark sustains about 107 MB/s.
What is going on? Why is the benchmark degraded when the 2nd core has a program loaded, whether running or suspended? The busy loop on Core1 should execute entirely out of L1 cache and should not impact Core0's performance, especially while it is suspended.
03-09-2018 07:00 AM
I would like to add a little more detail. I see 3 tiers of performance degradation:
* CASE 1: When the 2nd core is doing almost nothing in a tight loop, the network benchmark is about 75 MB/s.
* CASE 2: When the 2nd core is doing even less in a tight loop, the network benchmark is about 95 MB/s (this is also the performance if the 2nd core exits). The loop body in this case is just:
__asm__ volatile ("yield");
* CASE 3: When the 2nd core is never enabled (not running AMP), the network benchmark is about 108 MB/s.
Is there something that lowers the clock rate (e.g. due to thermal load) when both cores are active?
01-28-2019 06:20 AM
Does anyone have suggestions on how to make lwIP performance acceptable with both cores active? I would like one core dedicated to lwIP and the other to other processing - and I would expect that other processing to run almost entirely out of cache, with limited memory contention.