Scholar milosoftware
Registered: ‎10-26-2012

QSPI performance under Linux is very disappointing

"High performance - QSPI is the fastest configuration solution"

http://www.xilinx.com/support/answers/50991.htm

 

Well, maybe under some very particular conditions. In the real world, however, the QSPI throughput is much, much lower than I'd been expecting: less than 10% of the theoretically attainable rate.

 

Just a quick example:

 

root@zedboard:~# echo 1 > /proc/sys/vm/drop_caches
root@zedboard:~# dd if=/dev/mtd4 of=/dev/null bs=128k count=200
200+0 records in
200+0 records out
26214400 bytes (25.0MB) copied, 5.971374 seconds, 4.2MB/s

root@zedboard:~# dd if=/dev/mmcblk0 of=/dev/null bs=128k count=200
200+0 records in
200+0 records out
26214400 bytes (25.0MB) copied, 1.337640 seconds, 18.7MB/s
root@zedboard:~#

Here you can see that reading the QSPI memory device delivers about 4.2MB per second. The SD card delivers about 18MB/s, much much better, so I can't blame Linux for being the bottleneck here.
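A quick cross-check of those numbers: 26214400 bytes is exactly 25 × 1048576, and busybox dd prints it as "25.0MB", so its "MB" appears to mean MiB (2^20 bytes). The small shell sketch below (the function name `rate_mib_s` is just illustrative) recomputes the reported rates from the byte count and elapsed time:

```sh
#!/bin/sh
# Recompute a dd transfer rate in MiB/s (2^20 bytes), the unit busybox dd
# appears to use when it prints "MB" (26214400 bytes is shown as "25.0MB").
rate_mib_s() {
    # $1 = bytes copied, $2 = elapsed seconds
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1048576 }'
}

rate_mib_s 26214400 5.971374   # QSPI via /dev/mtd4    -> 4.2
rate_mib_s 26214400 1.337640   # SD card via /dev/mmcblk0 -> 18.7
```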

 

In theory, the QSPI memory can transfer 4 bits per clock at a clock rate of 104MHz (according to the Spansion datasheet for the s25fl256s, which is the flash chip on the Zedboard). That would amount to a whopping 416 Mbps, or about 52MB/s (416 / 8) in raw data throughput.
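The peak figure is simple arithmetic: clock rate times data lines gives bits per second, and dividing by 8 gives bytes. A sketch under the assumption that all four data lines carry data on every clock (it ignores the command, address and dummy cycles of each read); `qspi_peak` is a hypothetical helper, and the MiB/s column is there so it can be compared directly with the dd output above:

```sh
#!/bin/sh
# Peak raw QSPI read bandwidth: clock (MHz) x data lines = Mbit/s.
# Ignores command, address and dummy-cycle overhead, so real reads are lower.
qspi_peak() {
    # $1 = SPI clock in MHz, $2 = number of data lines (4 for quad read)
    awk -v mhz="$1" -v lines="$2" 'BEGIN {
        mbit = mhz * lines
        printf "%.0f Mbit/s = %.0f MB/s = %.1f MiB/s\n", mbit, mbit / 8, mbit * 1e6 / 8 / 1048576
    }'
}

qspi_peak 104 4   # 416 Mbit/s = 52 MB/s = 49.6 MiB/s
qspi_peak 50 4    # the 50MHz device tree setting
```

By that measure, the 4.2 MiB/s that dd reported is roughly 8.5% of the 49.6 MiB/s peak.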

 

Now I'm willing to accept that some extra overhead is needed, so I'd been expecting to see about 70% of that bandwidth in real performance.

 

While evaluating, I found that the device tree has set the maximum speed to 50MHz. This resulted in a measured transfer of 3.8 MB/s. Changing the devicetree for the qspi to 104MHz increased that to the 4.2 MB/s I measured above. I also checked the m25p80.c driver code, added some extra debugging statements, and verified that the chip is indeed being used in quad-read mode.
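To confirm which SPI clock limit the running kernel actually received, the flattened device tree can be read back from /proc/device-tree. A minimal sketch, assuming the kernel exposes /proc/device-tree (on newer kernels it links to /sys/firmware/devicetree/base) and that the flash node uses the standard `spi-max-frequency` property; the helper names `dt_u32` and `show_spi_max_freq` are hypothetical:

```sh
#!/bin/sh
# Decode a single 32-bit device tree cell. Cells are stored big-endian,
# and od -tx1 prints the bytes in file order, so they can be joined directly.
dt_u32() {
    printf '%d\n' "0x$(od -An -tx1 "$1" | tr -d ' \n')"
}

# List every spi-max-frequency property in the live device tree.
show_spi_max_freq() {
    find /proc/device-tree/ -name spi-max-frequency 2>/dev/null |
    while read -r prop; do
        echo "$prop: $(dt_u32 "$prop") Hz"
    done
}
```

Running `show_spi_max_freq` on the target should print one line per SPI device node; the flash node's value is the limit the driver was given (50000000 with the stock tree described above).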

 

Is there an explanation for this disappointing QSPI flash performance?

Visitor jhelder
Registered: ‎02-27-2013

Re: QSPI performance under Linux is very disappointing

There is no DMA for the QSPI peripheral, so everything is dependent on polling. If the device is programmed in Linear Addressing Mode (effectively memory mapped), then the PL330 can be configured to do memory-to-memory copies (read-only) at high speed. Extending the QSPI driver in Linux to use DMA is not possible (if you were so inclined) because the PL330 DMA control signals are not routed to any of the on-board peripherals. [The only peripherals that can take advantage of the PL330 are those in the fabric. The on-board peripherals that do have DMA modes have dedicated units.]

 

That is where I expect the SD card's performance advantage in Linux comes from: the SDIO interfaces have dedicated DMA units. The effective BootROM performance is different because of how the peripherals are configured at that time.

 

The FIFOs for the QSPI are only 63 words deep, which is not that much when the interface is operating at 104MHz.  If you view the signal lines, I suspect you will see (long) dwell time between bursts.  [In other words, your interface is idle most of the time because Linux cannot keep the FIFO full.]
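The FIFO argument can be put in numbers. Assuming 32-bit FIFO words and 4 data bits per clock in quad mode, each word takes 8 clocks on the wire, so a full 63-word FIFO lasts only a few microseconds. A sketch (the function name `fifo_drain_us` is hypothetical):

```sh
#!/bin/sh
# How long a full QSPI FIFO lasts on the wire: each 32-bit word needs
# 32 / lines clock cycles; cycles divided by MHz gives microseconds.
fifo_drain_us() {
    # $1 = FIFO depth in 32-bit words, $2 = SPI clock in MHz, $3 = data lines
    awk -v words="$1" -v mhz="$2" -v lines="$3" \
        'BEGIN { printf "%.2f\n", words * (32 / lines) / mhz }'
}

fifo_drain_us 63 104 4   # 4.85 us at 104MHz
fifo_drain_us 63 50 4    # 10.08 us at 50MHz
```

So at 104MHz the CPU has under 5 µs to service the FIFO each time; any polling or interrupt latency longer than that shows up directly as idle time on the bus, which would also explain why doubling the clock gained so little.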

 

I don't have much experience with QSPI and do not know which kernel version you are using, but maybe someone else can chime in if there have been any performance enhancements in a kernel newer than yours.

 

Visitor mstubley
Registered: ‎06-27-2013

Re: QSPI performance under Linux is very disappointing


 

Looking at linux-xlnx on GitHub yesterday, there appear to have been several performance improvements in the QSPI driver, though I don't think there is an official release yet. Have a look at the history for spi-zynq-qspi.c.

 


 

Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

I'll cherry-pick the QSPI changes and see what their impact is. Expect new benchmarks today...

 

As for the DMA, is it technically impossible to use DMA for the QSPI transfers (to RAM)? Sounds very unlikely to me.

 

How about the "regular" SPI controller? If that does support DMA, we're better off hooking up the memory device to a normal SPI controller. That would at least offload the CPU.

 

Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

root@zedboard:~# echo 1 > /proc/sys/vm/drop_caches
root@zedboard:~# dd if=/dev/mtd4 of=/dev/null bs=128k count=200
200+0 records in
200+0 records out
26214400 bytes (25.0MB) copied, 1.080401 seconds, 23.1MB/s

 

The effect on boot time with UBI in flash is also noticeable, shaving yet another second off...
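Because both runs copied the same 26214400 bytes from /dev/mtd4, the speedup is just the ratio of the elapsed times. A short sketch (the function name `speedup` is illustrative):

```sh
#!/bin/sh
# Compare two dd runs that copied the same amount of data.
speedup() {
    # $1 = old elapsed seconds, $2 = new elapsed seconds
    awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.1fx\n", t_old / t_new }'
}

speedup 5.971374 1.080401   # 5.5x
```

That is about 5.5x, although 23.1MB/s is still under half of the 416 Mbps (52MB/s) quad-read peak.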

Visitor mstubley
Registered: ‎06-27-2013

Re: QSPI performance under Linux is very disappointing

Just to clarify, is your new benchmark with the changes from GitHub? Almost 6x faster?

 

Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

Yep, I cherry-picked all QSPI related commits into the kernel, and then built it.

 

This is the kernel I'm using:

https://github.com/milosoftware/linux

Observer schleifer
Registered: ‎11-16-2007

Re: QSPI performance under Linux is very disappointing

Hello Milo,

 

your repository is not accessible anymore.

Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

The repository has been moved to here:

https://github.com/topic-embedded-products/linux

Adventurer
Registered: ‎05-01-2012

Re: QSPI performance under Linux is very disappointing

Could you please explain the steps by which you changed the clock frequency to 104MHz?

Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

Adventurer
Registered: ‎05-01-2012

Re: QSPI performance under Linux is very disappointing

I tried dd if=/dev/mtd4 of=/dev/null bs=128k count=200 for benchmarking, but it does not print the rate as it does in your case.

root@Xilinx-ZC702-14_7:/dev# dd if=/dev/mtd3 of=/dev/null bs=128k count=200
3+0 records in
3+0 records out

Then nothing. Why is it like that? Just FYI, I am booting from the SD card when I type these commands.
Scholar rfs613
Registered: ‎05-28-2013

Re: QSPI performance under Linux is very disappointing

Some versions of "dd" command do not display the rates.

You can prefix the dd command with "time", this will report how long the program took to execute. From that you can calculate the rate. Note that for very short times, or very small transfers, this may be rather inaccurate.
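Along the same lines, a small wrapper can do the timing and the division in one go. Note that "3+0 records in" in the output above means /dev/mtd3 ended after 3 blocks (384 KiB), which is far too little to time meaningfully, so this sketch counts the bytes actually read with `wc -c` instead of assuming bs × count. It takes the elapsed time from /proc/uptime, which has only 10 ms resolution, and the extra pipe adds a little overhead; `bench_read` is a hypothetical helper:

```sh
#!/bin/sh
# Time a raw read when dd does not print a rate. Elapsed time comes from
# /proc/uptime (10 ms resolution); the byte count is measured with wc -c,
# because dd stops early if the device is shorter than bs * count.
bench_read() {
    # $1 = device or file, $2 = block size in KiB, $3 = block count
    t0=$(cut -d' ' -f1 /proc/uptime)
    bytes=$(dd if="$1" bs="$2"k count="$3" 2>/dev/null | wc -c)
    t1=$(cut -d' ' -f1 /proc/uptime)
    awk -v b="$bytes" -v t0="$t0" -v t1="$t1" 'BEGIN {
        t = t1 - t0
        if (t <= 0) {
            printf "%d bytes in under 0.01 s; use a larger count\n", b
        } else {
            printf "%d bytes in %.2f s, %.1f MiB/s\n", b, t, b / t / 1048576
        }
    }'
}
```

Usage on the target, as root: `echo 1 > /proc/sys/vm/drop_caches; bench_read /dev/mtd4 128 200`.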
Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

I configured BusyBox to use a more verbose dd command. If you use Yocto and/or OpenEmbedded, running "bitbake -c menuconfig busybox" is the key to doing that.
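Before rebuilding, it is easy to check whether the dd on the target already prints the summary line: run a tiny copy and look for the word "copied". As far as I know, the BusyBox option that controls that line is FEATURE_DD_THIRD_STATUS_LINE, found among the dd options in menuconfig; treat that name as an assumption and confirm it against your BusyBox version. The function name below is hypothetical:

```sh
#!/bin/sh
# Report whether this dd prints its "N bytes ... copied, T seconds, R/s" line.
dd_reports_rate() {
    if dd if=/dev/zero of=/dev/null bs=1k count=1 2>&1 | grep -q copied; then
        echo yes
    else
        echo no
    fi
}

dd_reports_rate
```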
Adventurer
Registered: ‎05-01-2012

Re: QSPI performance under Linux is very disappointing

Thanks for the info. I will try this and post my results here. But I still have a question about how to increase the QSPI flash clock frequency. You posted a link: https://github.com/topic-embedded-products/linux/commit/08f4a2f4ad2454ab8b371b487a3190e0802479c4

But I don't know what I have to do. Sorry if I am asking trivial things, but I tried to find the file zynq-zed.dtsi in the workspace directory where I installed PetaLinux, and failed to locate it.
Scholar milosoftware
Registered: ‎10-26-2012

Re: QSPI performance under Linux is very disappointing

You'll have to change the frequency parameter in the devicetree that you are actually using.
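If you can't find the source .dtsi, one option is to decompile the DTB that is actually being booted, edit it, and recompile it with dtc. A minimal sketch, assuming `dtc` is installed, that the boot partition holds the DTB as `devicetree.dtb` (file and helper names are assumptions), and that the flash node carries the standard `spi-max-frequency` property. The sed expression rewrites every `spi-max-frequency` in the file, so restrict it by hand if other SPI devices are present; `sed -i` as used here is GNU/BusyBox syntax.

```sh
#!/bin/sh
# Rewrite every spi-max-frequency value in a .dts file to $2 Hz.
# dtc-decompiled trees print the value in hex (e.g. <0x2faf080>), hand-written
# ones in decimal; the pattern accepts either.
set_spi_max_freq() {
    # $1 = .dts file, $2 = new frequency in Hz
    sed -i "s/\(spi-max-frequency[[:space:]]*=[[:space:]]*<\)[^>]*>/\1$2>/" "$1"
}

# Round trip for a boot DTB: decompile, edit, recompile.
retune_dtb() {
    # $1 = input .dtb, $2 = output .dtb, $3 = frequency in Hz
    tmp_dts=$(mktemp) || return 1
    dtc -I dtb -O dts -o "$tmp_dts" "$1" &&
    set_spi_max_freq "$tmp_dts" "$3" &&
    dtc -I dts -O dtb -o "$2" "$tmp_dts"
    status=$?
    rm -f "$tmp_dts"
    return $status
}
```

For example, `retune_dtb devicetree.dtb devicetree-104.dtb 104000000`, then replace the DTB on the boot partition with the new file and reboot.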
