gsanthar
Newbie
Registered: 07-06-2016

Hard-float app in PetaLinux 2016.1 shows only a small improvement in benchmark performance

One of the main features of PetaLinux 2016.1 is support for the hard-float compiler option.

I wanted to measure the difference between the new hard-float compiler and the old softfp one. To do this, I ran classic floating-point benchmarks such as Whetstone and Linpack, built with both the hard-float and the softfp compilers.

However, I saw only a marginal improvement in benchmark performance after moving to the hard-float compiler, which is hard to understand.

I was expecting at least a 50% performance increase with the hard-float compiler.

 

 

The results are below.

Soft Floating Point Compiler:

Whetstone Tests Results:
##############################################

  Assembler CPUID and RDTSC
  CPU Cortex A8, Features Code 00000000, Model Code 00000000

  Measured - Minimum 1000 MHz, Maximum 1000 MHz
  Linux Functions
  get_nprocs() - CPUs 2, Configured CPUs 2
  get_phys_pages() and size - RAM Size  0.98 GB, Page Size 4096 Bytes
  uname() - Linux, OP4200_SD, 4.0.0-xilinx
  #1 SMP PREEMPT Wed Jun 22 15:51:12 EDT 2016, armv7l

##############################################

Whetstone Single Precision C Benchmark  Opt 3 64 Bit, Sat Jan  3 15:39:31 1970

Loop content                   Result              MFLOPS      MOPS   Seconds

N1 floating point      -1.12475013732910156       186.039               0.052
N2 floating point      -1.12274742126464844       187.355               0.361
N3 if then else         1.00000000000000000               19465.812     0.003
N4 fixed point         12.00000000000000000                 829.160     0.191
N5 sin,cos etc.         0.49911010265350342                   9.595     4.361
N6 floating point       0.99999982118606567       147.482               1.840
N7 assignments          3.00000000000000000                5972.144     0.016
N8 exp,sqrt etc.        0.75110864639282227                   5.919     3.161

MWIPS                                             503.795               9.984

Results  to  load  to  spreadsheet        MWIPS   Mflops1   Mflops2   Mflops3   Cosmops   Expmops  Fixpmops    Ifmops    Eqmops
Results  to  load  to  spreadsheet      503.795   186.039   187.355   147.482     9.595     5.919   829.160 19465.812  5972.144

Whetstone Tests Results:
##############################################

  Assembler CPUID and RDTSC
  CPU Cortex A8, Features Code 00000000, Model Code 00000000

  Measured - Minimum 1000 MHz, Maximum 1000 MHz
  Linux Functions
  get_nprocs() - CPUs 2, Configured CPUs 2
  get_phys_pages() and size - RAM Size  0.98 GB, Page Size 4096 Bytes
  uname() - Linux, petalinux_new, 4.4.0-xilinx
  #6 SMP PREEMPT Tue Jun 21 15:53:06 EDT 2016, armv7l

##############################################

Whetstone Single Precision C Benchmark  Opt 3 64 Bit, Thu Jan  1 00:03:25 1970

Loop content                   Result              MFLOPS      MOPS   Seconds

N1 floating point      -1.12475013732910156       186.290               0.053
N2 floating point      -1.12274742126464844       187.359               0.368
N3 if then else         1.00000000000000000                9918.454     0.005
N4 fixed point         12.00000000000000000                 819.753     0.197
N5 sin,cos etc.         0.49911010265350342                   9.994     4.271
N6 floating point       0.99999982118606567       147.493               1.876
N7 assignments          3.00000000000000000                5968.970     0.016
N8 exp,sqrt etc.        0.75110864639282227                   5.941     3.212

MWIPS                                             513.071               9.999

Results  to  load  to  spreadsheet        MWIPS   Mflops1   Mflops2   Mflops3   Cosmops   Expmops  Fixpmops    Ifmops    Eqmops
Results  to  load  to  spreadsheet      513.071   186.290   187.359   147.493     9.994     5.941   819.753  9918.454  5968.970

 

 

Are benchmark performance results available for the Xilinx ZC702 Mercury board? If so, could you please share them?

guillaumebres
Scholar
Registered: 03-27-2014

I suppose the second set of results is the one using the FPU, right?

I increased my MFLOPS by 20 using the FPU on the ZC706 (I don't use PetaLinux).

Can you show us the command line (especially the flags) you used to compile the program?

Here's what I used:

 

gcc -o main main.c -Ofast -ffast-math -mcpu=cortex-a9 -mfpu=neon

 

gw.
Embedded Systems, DSP, cyber
muzaffer
Teacher
Registered: 03-31-2012

Does "by 20" mean that MFLOPS went up by roughly 10%?
- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.
guillaumebres
Scholar
Registered: 03-27-2014

@muzaffer,

I meant by a factor of 20. This was between an embedded OS without any FPU support and a NEON-optimized system.

Download the attached image to see it properly.

gw.
mflops_fftw.png
muzaffer
Teacher
Registered: 03-31-2012

>> without any support for the FPU and a Neon optimized system.

This is not a very interesting test. I'm looking for numbers where the difference is hard float vs. not. Any idea what that difference would be?

guillaumebres
Scholar
Registered: 03-27-2014


muzaffer wrote:

This is not a very interesting test. I'm looking for numbers where the difference is hard float vs. not. Any idea what that difference would be?


I have never compared virtual floating point against hardware floating point.

gw.
milosoftware
Scholar
Registered: 10-26-2012

Results like this usually come from changing the float ABI, not from the FPU itself.

 

All build environments for the Zynq use the VFP and NEON floating-point instructions by default. Many use the "softfp" ABI, which still uses the FPU but passes floating-point values to and from library functions in the CPU's core registers instead of the FPU registers. This makes the libraries compatible with both hard-float and soft-float machines. It is marginally slower than the hard-float ABI.

 

As for NEON and VFP: since having NEON implies having VFP, the compiler decides which to use and already makes the right choice, so there is no need to disable one or the other.

 

As for benchmarking, "hard" float implementations usually run at least 10x faster than their software-only counterparts.
