07-01-2013 06:04 PM
The ARM Cortex-A9 Floating-Point Unit Technical Reference Manual states "The FPU is a VFPv3-D16 implementation of the ARMv7 floating-point architecture" (section 1.1, page 1-2)
UG585-ZYNQ-7000-TRM states "Dual Arm Cortex-A9 MPcore CPUs with ARM v7 ... NEON™ 128b SIMD coprocessor and VFPv3 per MPCore" (section 1.2.1, page 31).
Which is it? The full VFPv3 (in which case it is not a stock Cortex-A9), or the standard VFPv3-D16?
It matters because the D16 has only 16 64-bit FPU registers whereas the full VFPv3 has 32 64-bit FPU registers.
gcc takes the mfpu argument as -mfpu=vfpv3 or -mfpu=vfpv3-d16 and I need to know which one to give it.
d16 seems to be the safe bet as code will never be generated for registers D16 through D31, but I don't want to give up on resources that actually exist.
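One way to settle the question on a running Linux system is to look at what the kernel reports about the FPU. The sketch below decodes a hardware-capability word into a D register count and a matching -mfpu value. The bit positions are what I understand the Linux ARM header <asm/hwcap.h> to define, and the helper names are made up for illustration, so treat both as assumptions and check them against your own headers.

```c
/* Bit positions assumed from the Linux ARM header <asm/hwcap.h>;
   verify them against your kernel headers before relying on them. */
#define HWCAP_ARM_NEON      (1UL << 12)
#define HWCAP_ARM_VFPV3     (1UL << 13)
#define HWCAP_ARM_VFPV3D16  (1UL << 14)

/* Hypothetical helper: number of 64-bit D registers implied by a hwcap
   word. Returns 0 if VFPv3 is not reported at all. */
static int vfpv3_d_registers(unsigned long hwcap)
{
    if (!(hwcap & HWCAP_ARM_VFPV3))
        return 0;
    return (hwcap & HWCAP_ARM_VFPV3D16) ? 16 : 32;
}

/* Hypothetical helper: the -mfpu value that matches the reported features. */
static const char *suggest_mfpu(unsigned long hwcap)
{
    if (hwcap & HWCAP_ARM_NEON)
        return "neon";
    switch (vfpv3_d_registers(hwcap)) {
    case 32: return "vfpv3";
    case 16: return "vfpv3-d16";
    default: return "unknown";
    }
}
```

On the target, the hwcap word should be available from getauxval(AT_HWCAP) in <sys/auxv.h> (glibc 2.16 or later), and the same information appears as text on the "Features" line of /proc/cpuinfo.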
07-03-2013 04:55 AM
There's also the option "-mfpu=neon" (yes, the neon is also a ...), which is what I guess most people are using for the Cortex-A9, so I suspect that yields the best performance.
07-03-2013 05:15 AM
Hmm, I didn't mean to post it like that.
Anyway, I've been using "-mfpu=neon" for ages now on the Zynq, and from the documentation I gathered that this option implies "-mfpu=vfpv3", so the simple answer to your question appears to be that the ARM on the Zynq has a "full" VFPv3 implementation.
07-03-2013 06:34 AM
(-mfpu=neon) != (-mfpu=vfp|vfpv3|...). The NEON engine and the VFP unit are different hardware units with different microcode. The NEON engine is only capable of single-precision floating point calculations, while the VFP can do doubles. There is a flag -mfpu=neon-vfpv3 (or something like that) that is supposed to enable both, though I am not sure if gcc really does much with that at the moment.
07-03-2013 06:37 AM
07-04-2013 05:32 AM
With "-mfpu=neon" the GCC compiler will also generate VFPv3 floating point code. It won't actually use NEON instructions for floating point unless you also compile with -O3 (or with -ftree-vectorize) and -funsafe-math-optimizations, because NEON floating point does not fully comply with IEEE 754 (denormals are flushed to zero). The NEON is not related to the VFP at all, but apparently you cannot buy an ARM with NEON but without the VFP, or something to that effect.
If you don't have any FP calculations that are single precision and can be vectorized, there is no difference in output between -mfpu=neon and -mfpu=vfpv3.
If you want to be sure, compile with the -S option to see the assembly output. The VFP instructions usually start with "F" (or with "V" in the newer unified assembler syntax).
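A small test file makes the comparison concrete. This is only a sketch: the function names are made up, and the arm-linux-gnueabihf- toolchain prefix and the file name fp.c in the comments are assumptions, so substitute whatever your setup uses.

```c
#include <stddef.h>

/* Example commands (toolchain prefix and file name are assumptions):
 *   arm-linux-gnueabihf-gcc -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon \
 *       -O3 -funsafe-math-optimizations -S fp.c
 *   arm-linux-gnueabihf-gcc -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfpv3-d16 \
 *       -O3 -S fp.c
 * Then compare the two fp.s files.
 */

/* Single-precision multiply-add over arrays: a loop the auto-vectorizer
   can turn into NEON code when the flags allow it. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Double-precision dot product: ARMv7 NEON has no double-precision
   floating point, so this should stay on the VFP unit. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

In the generated assembly, vectorized NEON code should be recognizable by its use of the 128-bit q registers, while scalar VFP code works on s and d registers. A reference to d16 or above can only show up when the compiler was not restricted to vfpv3-d16.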
08-11-2014 10:08 AM
It does look like the full VFPv3 is implemented. In section 3.2.7, the TRM mentions...
Large, shared register file, addressable as:
° Thirty-two 32-bit S (single) registers
° Thirty-two 64-bit D (double) registers
... which implies that it's the full VFPv3 implementation.
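The two bullet points describe overlapping views of the same register file, not 64 separate registers. Under the standard VFP aliasing, the thirty-two S registers are the halves of D0 through D15, and D16 through D31 exist only in the 32-register variant. A minimal sketch of that mapping (the helper names are made up for illustration):

```c
/* Index of the D register that contains S register s (valid s: 0..31),
   using the standard VFP aliasing where S(2n) and S(2n+1) are the low
   and high halves of D(n). Returns -1 for an out-of-range index. */
static int d_reg_for_s(int s)
{
    return (s >= 0 && s < 32) ? s / 2 : -1;
}

/* Nonzero if D register d also has a single-precision (S) view.
   Only D0..D15 do; D16..D31 are reachable as doubles only. */
static int d_has_s_view(int d)
{
    return d >= 0 && d < 16;
}
```

So code built with -mfpu=vfpv3-d16 sees the same S registers as code built for the full VFPv3. What it loses is D16 through D31.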