Scholar pedro_uno
Registered: 02-12-2013

4x4 matrix inversion?

Hello Guys,

I need to do a 4x4 matrix inversion for MIMO-style antenna combining. Matrix inversion can be very sensitive to underflow and loss of significance, so I thought I would try a floating-point implementation via HLS. I have code that compiles in Vivado HLS and gives correct simulation results, but the logic really blows up. It seems to be implementing all 4x4 = 16 entries in parallel.

My code is written that way, 16 equations for the 16 entries of the inverted matrix, so I am not surprised, but I would really like to get logic that implements one adder and one multiplier and then sequences the data through them. It is OK if the calculation requires hundreds of clock cycles. These inversions are done at a low rate.

Can anyone recommend compiler directives that will cause HLS to serialize the calculations?

I attach my code in case you are interested.

Thanks in advance.

  Pete Dudley

----------------------------------------
DSP in hardware and software
-----------------------------------------

Accepted Solutions
Xilinx Employee
Registered: 08-17-2011

Re: 4x4 matrix inversion?

Hello Pete,

If you had a loop somewhere, it might be easier for the tool.

Anyway, here you need an allocation directive. Check UG902 or "help set_directive_allocation" to pick the right core; you may need to look at the instance names to see which cores are used.

What (I reckon) you will end up with is an FSM that sequences N sets of data through your FP cores via MUXes. The FSM will have N+something states.
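
As a rough illustration of the allocation directive, here is a minimal sketch on a reduced example (a 4-element dot product rather than the full inverter). The function name dot4 is hypothetical, and the operation names fmul and fadd and the pragma spelling are written from memory of UG902 for Vivado HLS, so treat them as assumptions and check them against your tool version (later tools order the pragma arguments differently).

```cpp
// Hypothetical reduced example, not code from this thread.
float dot4(const float a[4], const float b[4])
{
    // Ask the scheduler to use at most one floating-point multiplier and one
    // floating-point adder inside this function. The operation names
    // (fmul, fadd) are assumed from UG902; a plain C++ compiler ignores
    // these pragmas.
    // Assumed Tcl equivalent:
    //   set_directive_allocation -limit 1 -type operation dot4 fmul
    #pragma HLS ALLOCATION instances=fmul limit=1 operation
    #pragma HLS ALLOCATION instances=fadd limit=1 operation
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

If the limit is honoured, the four multiplies would be time-multiplexed through one core, which is the FSM-plus-MUX structure described above. Whether that happens is something to confirm in the synthesis report.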

 

Let us know which directives you end up using...

Good luck!

- Hervé

SIGNATURE:
* New Dedicated Vivado HLS forums* http://forums.xilinx.com/t5/High-Level-Synthesis-HLS/bd-p/hls
* Readme/Guidance* http://forums.xilinx.com/t5/New-Users-Forum/README-first-Help-for-new-users/td-p/219369

6 Replies
Teacher muzaffer
Registered: 03-31-2012

Re: 4x4 matrix inversion?

Yes, seeing the code would help.
Moderator
Registered: 04-17-2011

Re: 4x4 matrix inversion?

Try the UNROLL directive on the loop to see if it helps.
Regards,
Debraj
Scholar pedro_uno
Registered: 02-12-2013

Re: 4x4 matrix inversion?

Sorry,

Here is the code for the 4x4 matrix inverter.

bool mat_inv_4x4(float m[16], float inv[16])
{
    float det;
    int i;

    // Each inv[] entry is a signed 3x3 minor of m (six three-factor
    // products); the entries are scaled by 1/det at the end.
    inv[0] = m[5]  * m[10] * m[15] -
             m[5]  * m[11] * m[14] -
             m[9]  * m[6]  * m[15] +
             m[9]  * m[7]  * m[14] +
             m[13] * m[6]  * m[11] -
             m[13] * m[7]  * m[10];

    inv[4] = -m[4] * m[10] * m[15] +
              m[4]  * m[11] * m[14] +
              m[8]  * m[6]  * m[15] -
              m[8]  * m[7]  * m[14] -
              m[12] * m[6]  * m[11] +
              m[12] * m[7]  * m[10];

    inv[8] = m[4]  * m[9] * m[15] -
             m[4]  * m[11] * m[13] -
             m[8]  * m[5] * m[15] +
             m[8]  * m[7] * m[13] +
             m[12] * m[5] * m[11] -
             m[12] * m[7] * m[9];

    inv[12] = -m[4]  * m[9] * m[14] +
               m[4]  * m[10] * m[13] +
               m[8]  * m[5] * m[14] -
               m[8]  * m[6] * m[13] -
               m[12] * m[5] * m[10] +
               m[12] * m[6] * m[9];

    inv[1] = -m[1]  * m[10] * m[15] +
              m[1]  * m[11] * m[14] +
              m[9]  * m[2] * m[15] -
              m[9]  * m[3] * m[14] -
              m[13] * m[2] * m[11] +
              m[13] * m[3] * m[10];

    inv[5] = m[0]  * m[10] * m[15] -
             m[0]  * m[11] * m[14] -
             m[8]  * m[2] * m[15] +
             m[8]  * m[3] * m[14] +
             m[12] * m[2] * m[11] -
             m[12] * m[3] * m[10];

    inv[9] = -m[0]  * m[9] * m[15] +
              m[0]  * m[11] * m[13] +
              m[8]  * m[1] * m[15] -
              m[8]  * m[3] * m[13] -
              m[12] * m[1] * m[11] +
              m[12] * m[3] * m[9];

    inv[13] = m[0]  * m[9] * m[14] -
              m[0]  * m[10] * m[13] -
              m[8]  * m[1] * m[14] +
              m[8]  * m[2] * m[13] +
              m[12] * m[1] * m[10] -
              m[12] * m[2] * m[9];

    inv[2] = m[1]  * m[6] * m[15] -
             m[1]  * m[7] * m[14] -
             m[5]  * m[2] * m[15] +
             m[5]  * m[3] * m[14] +
             m[13] * m[2] * m[7] -
             m[13] * m[3] * m[6];

    inv[6] = -m[0]  * m[6] * m[15] +
              m[0]  * m[7] * m[14] +
              m[4]  * m[2] * m[15] -
              m[4]  * m[3] * m[14] -
              m[12] * m[2] * m[7] +
              m[12] * m[3] * m[6];

    inv[10] = m[0]  * m[5] * m[15] -
              m[0]  * m[7] * m[13] -
              m[4]  * m[1] * m[15] +
              m[4]  * m[3] * m[13] +
              m[12] * m[1] * m[7] -
              m[12] * m[3] * m[5];

    inv[14] = -m[0]  * m[5] * m[14] +
               m[0]  * m[6] * m[13] +
               m[4]  * m[1] * m[14] -
               m[4]  * m[2] * m[13] -
               m[12] * m[1] * m[6] +
               m[12] * m[2] * m[5];

    inv[3] = -m[1] * m[6] * m[11] +
              m[1] * m[7] * m[10] +
              m[5] * m[2] * m[11] -
              m[5] * m[3] * m[10] -
              m[9] * m[2] * m[7] +
              m[9] * m[3] * m[6];

    inv[7] = m[0] * m[6] * m[11] -
             m[0] * m[7] * m[10] -
             m[4] * m[2] * m[11] +
             m[4] * m[3] * m[10] +
             m[8] * m[2] * m[7] -
             m[8] * m[3] * m[6];

    inv[11] = -m[0] * m[5] * m[11] +
               m[0] * m[7] * m[9] +
               m[4] * m[1] * m[11] -
               m[4] * m[3] * m[9] -
               m[8] * m[1] * m[7] +
               m[8] * m[3] * m[5];

    inv[15] = m[0] * m[5] * m[10] -
              m[0] * m[6] * m[9] -
              m[4] * m[1] * m[10] +
              m[4] * m[2] * m[9] +
              m[8] * m[1] * m[6] -
              m[8] * m[2] * m[5];

    // Determinant by expansion along the first row of m.
    det = m[0] * inv[0] + m[1] * inv[4] + m[2] * inv[8] + m[3] * inv[12];

    // A singular matrix has no inverse.
    if (det == 0)
        return false;

    det = 1.0 / det;

    // Scale every entry by 1/det.
    det_for:
    for (i = 0; i < 16; i++)
        inv[i] = inv[i] * det;

    return true;
}

Scholar pedro_uno
Registered: 02-12-2013

Re: 4x4 matrix inversion?

Hervé,

I decided that HLS can unroll loops but it cannot take un-looped code and deduce hardware re-use.

Looking at the code above, there is a lot of regularity. Each of the 16 equations sums 6 three-factor products, so I created a table of pointers to the input data and put the multiplies inside several loops. This resulted in very compact logic that is still written in C++.

My conclusion is that, even with HLS, you need to keep in mind what logic you want implemented and organize your code that way.
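
The restructured code itself was not posted, so the following is only a guess at the general shape of such a rewrite, not Pete's implementation. Instead of a full table of pointers, this sketch uses a small permutation table plus index arithmetic to walk the same cofactor products inside rolled loops. The names mat_inv_4x4_loop and PERM are hypothetical, and the assumption that HLS will share one multiplier and one adder holds only if the loops are left rolled (no UNROLL or PIPELINE applied to them).

```cpp
// Hypothetical loop-structured version of the 4x4 inverse.
// The six permutations of {0,1,2}: the first three are even (sign +1),
// the last three are odd (sign -1). Together they give a 3x3 determinant.
static const int PERM[6][3] = {
    {0, 1, 2}, {1, 2, 0}, {2, 0, 1},
    {0, 2, 1}, {1, 0, 2}, {2, 1, 0}
};

bool mat_inv_4x4_loop(const float m[16], float inv[16])
{
    // Adjugate: entry (i,j) is (-1)^(i+j) times the determinant of m
    // with row j and column i removed.
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float acc = 0.0f;
            for (int p = 0; p < 6; p++) {
                float prod = (p < 3) ? 1.0f : -1.0f;  // permutation sign
                for (int k = 0; k < 3; k++) {
                    int r = k + (k >= j);             // skip row j
                    int c = PERM[p][k];
                    c = c + (c >= i);                 // skip column i
                    prod *= m[r * 4 + c];
                }
                acc += prod;
            }
            inv[i * 4 + j] = ((i + j) & 1) ? -acc : acc;
        }
    }

    // Determinant by expansion along the first row, as in the original code.
    float det = 0.0f;
    for (int k = 0; k < 4; k++)
        det += m[k] * inv[k * 4];

    if (det == 0.0f)
        return false;  // singular matrix

    const float inv_det = 1.0f / det;
    for (int k = 0; k < 16; k++)
        inv[k] *= inv_det;

    return true;
}
```

The sketch spends one extra multiply per product on the sign, which keeps the inner loop body uniform. A sign table and a two-multiply inner loop would be an equally plausible choice.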

 

Thanks for the help and encouragement.

  Pete

Xilinx Employee
Registered: 08-17-2011

Re: 4x4 matrix inversion?

Thanks Pete for your feedback!

I guess that's a fair comment, and useful feedback for other users.

Thinking more about it, the original code has 12 multiplies for each of the 16 inv[] entries, which probably looks like a "complex" spaghetti problem, so the tool may try to schedule all of the multiplies first, followed by the adders, etc.
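
To put numbers on that, here is a small tally of the floating-point operations in the straight-line mat_inv_4x4 posted earlier in the thread. The constant names are hypothetical, and unary negations are assumed to cost nothing.

```cpp
// Operation counts for the straight-line mat_inv_4x4 code in this thread.
constexpr int kEntries       = 16;  // inv[0] .. inv[15]
constexpr int kTermsPerEntry = 6;   // three-factor products per entry
constexpr int kMulsPerTerm   = 2;   // a * b * c takes two multiplies

constexpr int kCofactorMuls = kEntries * kTermsPerEntry * kMulsPerTerm;  // 192
constexpr int kCofactorAdds = kEntries * (kTermsPerEntry - 1);           // 80
constexpr int kDetMuls      = 4;    // det = m[0]*inv[0] + ... (4 products)
constexpr int kDetAdds      = 3;
constexpr int kScaleMuls    = 16;   // inv[i] * det in the det_for loop

constexpr int kTotalMuls = kCofactorMuls + kDetMuls + kScaleMuls;
constexpr int kTotalAdds = kCofactorAdds + kDetAdds;

static_assert(kTotalMuls == 212, "192 + 4 + 16 multiplies");
static_assert(kTotalAdds == 83, "80 + 3 additions or subtractions");
```

On top of these there is a single divide for 1/det. Sequenced through one multiplier and one adder, this is a few hundred operations, which is consistent with the "hundreds of clock cycles" budget in the original question.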

 

The structured loop code gives strong hints about what to do, which makes for a much easier problem.

- Hervé
