Visitor wa-fpgaking
Registered: ‎07-02-2019

Optimization Problem

Hi! I created a project to solve linear equations with the SOR method, but the Synthesis Report shows a huge latency.

I hope someone can help me optimize this project. Directives or code changes, anything that helps reduce latency is welcome.

This is the source of an II violation:

if (j != i) {
    tmpm = (A[i][j] * x0[j]);
    tmp += tmpm;
}

How can I solve this II violation?

4 Replies
Voyager
Registered: ‎03-28-2016

Re: Optimization Problem

You will need to add a PIPELINE directive to one of your 'for' loops (SOR_label1 or SOR_label3).

Search for 'pipeline' in UG902 for more details. Also check out UG1270.
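
As a rough sketch of what that directive could look like in source form: the original code is not shown in this thread, so the function name row_sum, the size N = 128, and the assumption that SOR_label1 is the inner loop over j are all hypothetical. The conditional is rewritten as a select so the accumulate runs on every iteration. With double accumulation the tool may still report an II greater than 1 because of the loop-carried dependency on tmp.

```cpp
#include <cstddef>

constexpr std::size_t N = 128;  // assumed matrix size, not taken from the original code

// Hypothetical helper: sum of A[i][j] * x0[j] over all j != i for one row.
double row_sum(const double A[N][N], const double x0[N], std::size_t i) {
    double tmp = 0.0;
SOR_label1:
    for (std::size_t j = 0; j < N; ++j) {
#pragma HLS PIPELINE II=1
        // Select 0 for the diagonal term instead of branching around the accumulate.
        double term = (j != i) ? A[i][j] * x0[j] : 0.0;
        tmp += term;
    }
    return tmp;
}
```

A regular C++ compiler ignores the HLS pragma, so the same function can be checked in C simulation before synthesis.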

Ted Booth - Tech. Lead FPGA Design Engineer
www.designlinxhs.com
Scholar u4223374
Registered: ‎04-26-2015

Re: Optimization Problem

By far the best way to make this run faster (and also use far fewer resources) is to convert it from floating-point to fixed-point. Most floating-point operations take at least a couple of cycles to complete, which puts a hard limit on how tightly the loops can be pipelined. Double-precision floating-point is even slower. In fixed-point, by contrast, each operation takes a fraction of a clock cycle, so many fixed-point operations can complete in the same cycle.
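
To make the fixed-point idea concrete, here is a minimal sketch in plain C++ using a 32-bit integer with 16 fractional bits. It is only a stand-in that roughly mimics what a type like ap_fixed<32,16> does. The names fix_t, to_fix, to_double and fix_mul are hypothetical, and the 16/16 split is an assumption that would have to be checked against the real value range of A, B and x.

```cpp
#include <cstdint>

// Q16.16 format: 16 integer bits (including sign) and 16 fractional bits.
// Hypothetical stand-in for an ap_fixed type, usable with any C++ compiler.
using fix_t = std::int32_t;
constexpr int FRAC_BITS = 16;

// Convert a double to fixed-point by scaling with 2^FRAC_BITS.
constexpr fix_t to_fix(double v) {
    return static_cast<fix_t>(v * (1 << FRAC_BITS));
}

// Convert back to double for checking results in simulation.
constexpr double to_double(fix_t v) {
    return static_cast<double>(v) / (1 << FRAC_BITS);
}

// Multiply in 64 bits so the intermediate product cannot overflow,
// then shift right to drop the extra fractional bits.
constexpr fix_t fix_mul(fix_t a, fix_t b) {
    return static_cast<fix_t>((static_cast<std::int64_t>(a) * b) >> FRAC_BITS);
}
```

With this split, values outside roughly [-32768, 32768) do not fit and wrap around, so the number of integer bits has to cover the largest intermediate value in the algorithm.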

Depending on what sort of resources you can throw at it, pipelining SOR_label3 (which implies unrolling SOR_label1) might be possible in fixed-point. It'll be expensive in hardware (at least 128 DSP slices), but if it yields a 100x+ speed gain then that's worthwhile.
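
A sketch of that structure, under several assumptions: SOR_label3 is taken to be the loop over rows and SOR_label1 the loop over columns, N = 128, and values are stored as 32-bit integers with 16 fractional bits. The function name row_sums and the type fix_t are hypothetical. Pipelining the outer loop already unrolls the loop nested inside it, so the explicit UNROLL pragma only makes that visible. The ARRAY_PARTITION pragmas are there so that all 128 operands of a row can be read in the same cycle.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 128;   // assumed matrix size
using fix_t = std::int32_t;      // assumed Q16.16 fixed-point storage
constexpr int FRAC_BITS = 16;

// Hypothetical helper: off-diagonal row sums for every row of A.
void row_sums(const fix_t A[N][N], const fix_t x0[N], fix_t sums[N]) {
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=x0 complete dim=1
SOR_label3:
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        std::int64_t acc = 0;  // wide accumulator for the 128 products
SOR_label1:
        for (std::size_t j = 0; j < N; ++j) {
#pragma HLS UNROLL
            if (j != i) {
                acc += static_cast<std::int64_t>(A[i][j]) * x0[j];
            }
        }
        // Drop the extra fractional bits produced by the multiplications.
        sums[i] = static_cast<fix_t>(acc >> FRAC_BITS);
    }
}
```

Whether II=1 is actually reached depends on the target device and clock, so the synthesis report remains the reference.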

Apart from that, FPGAs and division don't tend to mix very well. The norm /= NL line is fine (dividing by 128 is just a right-shift by seven places, effectively free), but the earlier divide by A[i][i] will be nasty. Instead of dividing (B[i]-tmp) by A[i][i], could you just multiply everything else by A[i][i]? Even if you then have to do the division later, doing it once at the end instead of 38400 times (once in each SOR_label3 iteration) is a worthwhile saving.
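
One related way to take the division out of the hot loop, which is a variation on the suggestion above rather than the same thing, is to compute 1/A[i][i] once per row before the iterations start and multiply by it afterwards. This assumes A does not change between iterations. The names precompute_inv_diag, sor_update and the relaxation factor w are hypothetical, and double is used only to keep the sketch short.

```cpp
#include <cstddef>

constexpr std::size_t N = 128;  // assumed matrix size

// Hypothetical setup step: one division per row, done once before iterating.
void precompute_inv_diag(const double A[N][N], double inv_diag[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        inv_diag[i] = 1.0 / A[i][i];
    }
}

// Standard SOR update for one unknown, with the divide by A[i][i]
// replaced by a multiply with the precomputed reciprocal.
double sor_update(double x_old, double b_i, double row_sum,
                  double inv_diag_i, double w) {
    return (1.0 - w) * x_old + w * (b_i - row_sum) * inv_diag_i;
}
```

This leaves N divisions in total instead of one per row per iteration, at the cost of one extra array of N values.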

Visitor wa-fpgaking
Registered: ‎07-02-2019

Re: Optimization Problem

Thanks! 

I changed the double type to ap_fixed, but C simulation failed: the results became very large.

How could I modify the code to implement the algorithm with a fixed-point type?

Visitor wa-fpgaking
Registered: ‎07-02-2019

Re: Optimization Problem

The Analysis Report shows that the SOR_lab3 loop costs too many cycles. How could I optimize it? BTW, I have optimized the division, and this led to a large reduction in latency. Thanks a lot for your advice!
