mamanakis
Newbie
8,490 Views
Registered: ‎07-01-2013

What "flavor" Vector Floating Point Unit does the ZYNQ APU have: Full VFPv3 or VFPv3-d16

The ARM Cortex-A9 Floating-Point Unit Technical Reference Manual states "The FPU is a VFPv3-D16 implementation of the ARMv7 floating-point architecture" (section 1.1, page 1-2)

 

UG585-ZYNQ-7000-TRM states "Dual Arm Cortex-A9 MPcore CPUs with ARM v7 ... NEON™ 128b SIMD coprocessor and VFPv3 per MPCore" (section 1.2.1, page 31)

 

Which is it: the full VFPv3 (in which case it is not a stock Cortex-A9), or the standard VFPv3-D16?

 

It matters because the D16 has only 16 64-bit FPU registers whereas the full VFPv3 has 64 64-bit FPU registers.

 

gcc takes the -mfpu argument as -mfpu=vfpv3 or -mfpu=vfpv3-d16, and I need to know which one to give it.

 

d16 seems to be the safe bet, as code will never be generated for the upper 16 registers (d16 through d31), but I don't want to give up on resources that actually exist.
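If the board runs Linux, one way to check what the hardware reports is the "Features" line of /proc/cpuinfo. The sketch below is only an illustration: it assumes the kernel's usual ARM feature tokens (vfpv3, vfpv3d16, vfpd32), and the function names has_feature and suggest_mfpu are made up for this example. Older kernels may not report vfpd32 at all, so the sketch falls back to the conservative choice in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-separated token
 * in `features`, else 0. Whole-token matching matters here because
 * "vfpv3" is a prefix of "vfpv3d16". */
int has_feature(const char *features, const char *name)
{
    assert(features != NULL && name != NULL);

    size_t n = strlen(name);
    const char *p = features;

    if (n == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == features) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* keep searching after a partial match */
    }
    return 0;
}

/* Suggest a value for gcc's -mfpu option from a cpuinfo Features line.
 * Returns NULL when no VFPv3 token is present. This mapping is an
 * assumption of this sketch, not something the kernel or gcc provides. */
const char *suggest_mfpu(const char *features)
{
    int v3 = has_feature(features, "vfpv3");

    if (v3 && has_feature(features, "vfpd32"))
        return "vfpv3";      /* 32 D registers reported */
    if (has_feature(features, "vfpv3d16"))
        return "vfpv3-d16";  /* only 16 D registers reported */
    if (v3)
        return "vfpv3-d16";  /* register count unknown: stay conservative */
    return NULL;
}
```

Passing the text after "Features :" to suggest_mfpu gives a starting point; the TRM for the part remains the authority.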

 

7 Replies
milosoftware
Scholar
8,472 Views
Registered: ‎10-26-2012

There's also the option "-mfpu=neon" (yes, the neon is also a ...), which is what I guess most people are using for the Cortex-A9, so I suspect that yields the best performance.

milosoftware
Scholar
8,469 Views
Registered: ‎10-26-2012

Hmm, I didn't mean to post it like that.

 

Anyway, I've been using "-mfpu=neon" for ages now on the Zynq, and from the documentation I gathered that this option implies "-mfpu=vfpv3", so the simple answer to your question appears to be that the ARM on the Zynq has a "full" VFPv3 implementation.

mamanakis
Newbie
8,463 Views
Registered: ‎07-01-2013

NOTE: I got carried away with my "64"s in the first post. The full VFPv3 has 32 64-bit FPU registers.

mamanakis
Newbie
8,462 Views
Registered: ‎07-01-2013

(mfpu=neon) != (mfpu=vfp|vfpv3|...). The NEON engine and the VFP unit are different hardware units with different microcode. The NEON engine is only capable of single-precision floating-point calculations, while the VFP can do doubles. There is a flag, -mfpu=neon-vfpv3 (or something like that), that is supposed to enable both, though I am not sure if gcc really does much with that at the moment.

mamanakis
Newbie
8,459 Views
Registered: ‎07-01-2013

NEON does generally work better, but I am restricted to the VFP by some middleware and by the fact that I have a whole lot of doubles to process, which the VFP supports but NEON does not.
milosoftware
Scholar
8,447 Views
Registered: ‎10-26-2012

With "-mfpu=neon" the GCC compiler will also generate VFPv3 floating point code. It won't actually use NEON instructions for floating point unless you also compile with -O3 (or with the vectorize option, -ftree-vectorize) and -funsafe-math-optimizations, because the NEON FPU does not fully comply with the IEEE 754 standard. The NEON is not related to the VFP at all, but apparently you cannot buy an ARM with NEON but without the VFP, or something to that effect.

 

If you don't have any FP calculations that are single precision and can be vectorized, there is no difference in output between -mfpu=neon and -mfpu=vfpv3.

 

If you want to be sure, compile with the -S option to see the assembly output. The VFP instructions usually start with "F" (or with "V" in the newer unified assembler syntax).
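A small file like the one below could serve as a probe for that comparison. It is only a sketch: the file name fpu_probe.c, the function names, and the arm-linux-gnueabihf- toolchain prefix in the comments are assumptions, and the exact instructions emitted depend on the GCC version.

```c
#include <stddef.h>

/* Hypothetical build commands for comparing the generated assembly
 * (toolchain prefix and file name are assumptions):
 *
 *   arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a9 -mfpu=vfpv3 -S fpu_probe.c
 *   arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a9 -mfpu=neon \
 *       -funsafe-math-optimizations -S fpu_probe.c
 *
 * Then diff the two resulting fpu_probe.s files. */

/* Single-precision loop: a candidate for NEON auto-vectorization
 * when built with -mfpu=neon and -funsafe-math-optimizations. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Double-precision loop: ARMv7 NEON has no double-precision
 * arithmetic, so this is expected to compile to VFP instructions
 * with either -mfpu setting. */
double ddot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```

If the two assembly listings differ only in saxpy, that matches the description above: the double-precision code is the same for both settings.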

michelcharette
Observer
7,208 Views
Registered: ‎06-07-2012

It does look like the full VFPv3 is implemented. In section 3.2.7, the TRM mentions...

Large, shared register file, addressable as:

° Thirty-two 32-bit S (single) registers

° Thirty-two 64-bit D (double) registers

 

... which implies that it's the full VFPv3 implementation.
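For reference, the "shared" part means the S and D views overlap: in the ARMv7 VFP register model, S(2n) and S(2n+1) are the low and high halves of Dn, so the 32 S registers cover only D0 to D15, and D16 to D31 exist only in the 32-register variant. A minimal sketch of that mapping (the helper names are invented for illustration):

```c
#include <assert.h>

/* Index of the D register that contains single register s<s_index>. */
int d_index_of_s(int s_index)
{
    assert(s_index >= 0 && s_index <= 31);
    return s_index / 2;
}

/* 0 if s<s_index> is the low 32 bits of its D register,
 * 1 if it is the high 32 bits. */
int half_of_s(int s_index)
{
    assert(s_index >= 0 && s_index <= 31);
    return s_index % 2;
}

/* 1 if d<d_index> also exists on a VFPv3-D16 implementation,
 * 0 if it needs the full 32-register file. */
int available_on_d16(int d_index)
{
    assert(d_index >= 0 && d_index <= 31);
    return d_index < 16;
}
```

So code built with -mfpu=vfpv3 may touch d16 to d31, which is exactly what a D16-only part would lack.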

 
