LWIP Bare Metal AMP Performance Worse Than Single Core Only

Visitor
Posts: 18
Registered: 08-14-2015

I have written a set of test applications targeting the Zynq (PicoZed card).

The first application, running on Core0, implements a network benchmark via LWIP (similar to IPERF4).

The second application, running on Core1, is a hello world app: after printing "hello world", it spins in a busy loop doing nothing (an empty loop forever).

I am running under the SDK, using a JTAG cable to download and debug.

The network benchmark reports about 95 MB/s when both cores have programs running. If I suspend Core1 over JTAG, the benchmark remains at 95 MB/s.

However, if I load and run an app only on Core0 (single core in use), the benchmark sustains about 107 MB/s.

What is going on? Why is the benchmark degraded when the second core has a program loaded (whether running or suspended)? The busy loop on Core1 should be running out of L1 cache and should not affect Core0 performance, especially when Core1 is suspended.

Visitor
Posts: 18
Registered: 08-14-2015

Re: LWIP Bare Metal AMP Performance Worse Than Single Core Only

I would like to add a little more detail. I see three tiers of performance, depending on what the second core is doing:

* CASE 1: When the second core does almost nothing (a loop that prints and then sleeps), the network benchmark is about 75 MB/s:

  while (1)
  {
    printf("Hello\n");
    sleep(2);
  }

* CASE 2: When the second core does even less (a tight loop that only yields), the network benchmark is about 95 MB/s (this is also the performance if the second core's program exits):

  while (1)
  {
    __asm ("Yield \n");  /* ARM YIELD hint instruction */
  }

* CASE 3: When the second core is never enabled (not running AMP), the network benchmark is about 108 MB/s.

 

Is there something that lowers the clock rate (e.g. due to thermal load) when both cores are active?