Visitor
Registered: 07-02-2019

Optimization Problem

Hi! I created a project that solves linear equations with the SOR method, but the Synthesis Report shows a huge latency.

I hope someone can help me optimize this project: directives, code changes, anything that helps reduce the latency.

This is the source of an II violation:

if (j != i) {
    tmpm = A[i][j] * x0[j];
    tmp += tmpm;
}

How can I solve this II violation?

Voyager
Registered: 03-28-2016

Re: Optimization Problem

You will need to add a PIPELINE directive to one of your 'for' loops (SOR_label1 or SOR_label3).

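Search for 'pipeline' in UG902 (Vivado Design Suite User Guide: High-Level Synthesis) for more details. Also check out UG1270 (Vivado HLS Optimization Methodology Guide).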
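A minimal sketch of where such a directive goes, assuming a loop nest like the one in the question. The function name `sor_sweep`, the labels `row_loop`/`col_loop`, the in-place update of `x`, and the tiny size `N = 4` are all hypothetical; the real project's code is not shown in the thread. A standard C++ compiler ignores the `#pragma HLS` line, so the same file also runs as a C-simulation model.

```cpp
#include <cassert>  // assert(), used by C-simulation test benches

// Hypothetical problem size; the replies suggest the real one is 128.
const int N = 4;

// One in-place SOR sweep. Names and structure are assumptions that mirror
// the snippet in the question, not the poster's actual code.
void sor_sweep(const double A[N][N], const double B[N], double x[N], double omega) {
row_loop:
    for (int i = 0; i < N; i++) {
        double tmp = 0.0;
    col_loop:
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
            // Request one new iteration per clock cycle. With a double
            // accumulator the tool may still report II > 1, because each
            // "tmp +=" must wait for the previous floating-point add.
            if (j != i) {
                tmp += A[i][j] * x[j];
            }
        }
        // Blend the Gauss-Seidel estimate with the old value using omega.
        x[i] = (1.0 - omega) * x[i] + omega * (B[i] - tmp) / A[i][i];
    }
}
```

Pipelining `col_loop` keeps the hardware small (one multiplier and one adder); the alternative of pipelining the outer loop trades area for throughput.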

Ted Booth - Tech. Lead FPGA Design Engineer
www.designlinxhs.com
Scholar
Registered: 04-26-2015

Re: Optimization Problem

By far the best way to make this run faster (and also use far fewer resources) is to convert it from floating-point to fixed-point. Most floating-point operations take at least a couple of clock cycles to complete, which puts a hard limit on how tightly the loops can be pipelined: an accumulation such as tmp += tmpm cannot start its next iteration until the previous addition has finished. Double-precision floating-point is even slower. In fixed-point, by contrast, an addition takes only a fraction of a clock cycle, so several operations can be chained within a single cycle.
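If the design has to stay in floating-point for now, one common HLS workaround for this accumulation limit (a different technique from the fixed-point conversion suggested above) is to interleave several partial sums. The sketch below assumes a row length of 128 and `LANES = 8`; `dot_skip`, `LANES`, and the labels are hypothetical, and `LANES` would need to cover the adder latency on the target device. Whether the tool recognizes that `partial[j % LANES]` only depends on itself every `LANES` iterations is not guaranteed; a DEPENDENCE directive may be needed, so check the achieved II in the report.

```cpp
#include <cassert>

const int N = 128;   // hypothetical row length (the replies imply 128)
const int LANES = 8; // assumed to cover the adder latency; tune per device

// Sum of a[j] * x[j] over all j except j == skip (the diagonal).
// Spreading the sum over LANES interleaved accumulators means each one is
// updated only every LANES iterations, which gives a multi-cycle adder time
// to finish before its result is needed again.
double dot_skip(const double a[N], const double x[N], int skip) {
    assert(skip >= 0 && skip < N);
    double partial[LANES] = {0.0};
#pragma HLS ARRAY_PARTITION variable=partial complete
acc_loop:
    for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
        double term = (j != skip) ? a[j] * x[j] : 0.0;
        partial[j % LANES] += term;
    }
    // Combine the partial sums once at the end.
    double sum = 0.0;
reduce_loop:
    for (int k = 0; k < LANES; k++) {
        sum += partial[k];
    }
    return sum;
}
```

Because the additions happen in a different order, results can differ from a sequential sum in the last bits, which matters if the C simulation compares against a golden output exactly.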

Depending on how many resources you can throw at it, pipelining SOR_label3 might be possible in fixed-point. That implies fully unrolling SOR_label1, because HLS unrolls every loop nested inside a pipelined loop. It will be expensive in hardware (at least 128 DSP slices), but if it yields a 100x+ speed gain, that's worthwhile.
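A minimal sketch of that arrangement, assuming the outer loop walks the rows and the inner loop walks the columns. `sor_sweep_rows`, `data_t`, the labels, and `N = 4` are hypothetical; `data_t` is left as `double` so the sketch runs anywhere, where the reply suggests an `ap_fixed` type instead. The arrays are partitioned so that the unrolled inner loop can read all of its operands in the same cycle. With `double`, the unrolled sum typically remains a sequential chain of adds, since HLS does not reassociate floating-point arithmetic by default.

```cpp
#include <cassert>  // assert(), used by C-simulation test benches

const int N = 4;  // hypothetical size; the replies imply 128 rows

typedef double data_t;  // the reply suggests swapping in an ap_fixed type here

// SOR sweep with PIPELINE on the row loop. Vivado HLS then fully unrolls the
// nested col_loop, building N multipliers that work in parallel; A (along
// each row) and x are partitioned so all N operands are available at once.
void sor_sweep_rows(const data_t A[N][N], const data_t B[N], data_t x[N], data_t omega) {
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=x complete
row_loop:
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        data_t tmp = 0;
    col_loop:
        for (int j = 0; j < N; j++) {  // unrolled because row_loop is pipelined
            if (j != i) {
                tmp += A[i][j] * x[j];
            }
        }
        // Row i+1 reads the x[i] written here, so the achieved II is still
        // bounded by this dependency chain; check the synthesis report.
        x[i] = (1 - omega) * x[i] + omega * (B[i] - tmp) / A[i][i];
    }
}
```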

Apart from that, FPGAs and division don't mix very well. The norm /= NL line is fine (in fixed-point, dividing by 128 is just a right-shift by seven places, effectively free), but the earlier division by A[i][i] will be expensive. Instead of dividing (B[i]-tmp) by A[i][i], could you multiply everything else by A[i][i]? Even if you then have to do the division later, doing it once at the end instead of 38400 times (once in each SOR_label3 iteration) is a worthwhile saving.
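A closely related way to get the division out of the loop (a swapped-in variant, not the exact rescaling proposed above) is to store the reciprocal of each diagonal element once and multiply by it in every sweep. This assumes A does not change between sweeps; the names `compute_inv_diag`, `sor_sweep_recip`, and `inv_diag`, and the size `N = 4`, are hypothetical. With N = 128, this would mean 128 divisions in total instead of one per row per sweep.

```cpp
#include <cassert>

const int N = 4;  // hypothetical size

// Run once per matrix, outside the sweep loop: N divisions in total.
void compute_inv_diag(const double A[N][N], double inv_diag[N]) {
    for (int i = 0; i < N; i++) {
        assert(A[i][i] != 0.0);  // SOR needs a non-zero diagonal
        inv_diag[i] = 1.0 / A[i][i];
    }
}

// SOR sweep that multiplies by the stored reciprocal instead of dividing,
// so the per-row datapath contains a multiplier rather than a divider.
void sor_sweep_recip(const double A[N][N], const double B[N], double x[N],
                     const double inv_diag[N], double omega) {
    for (int i = 0; i < N; i++) {
        double tmp = 0.0;
        for (int j = 0; j < N; j++) {
            if (j != i) {
                tmp += A[i][j] * x[j];
            }
        }
        x[i] = (1.0 - omega) * x[i] + omega * (B[i] - tmp) * inv_diag[i];
    }
}
```

Multiplying by a rounded reciprocal is not bit-identical to a true division, so the final digits may differ slightly from the original code.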

Visitor
Registered: 07-02-2019

Re: Optimization Problem

Thanks!

I changed the double type to ap_fixed, but C simulation failed: the results became huge.

How should I modify the code to implement the algorithm with a fixed-point type?
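One plausible cause (an assumption, since the modified code is not shown) is overflow: `ap_fixed<W, I>` has only I integer bits including the sign, and its default overflow mode is AP_WRAP, so any intermediate value that exceeds that range (for example `tmp` after summing many products) silently wraps around. The sketch below emulates a hypothetical Q16.16 format, similar in range to `ap_fixed<32,16>`, with plain integers so the effect can be seen in standard C++. All names are hypothetical; in HLS the fix would be choosing enough integer bits for the largest intermediate, or selecting the AP_SAT overflow mode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q16.16 format, standing in for something like ap_fixed<32,16>:
// 16 integer bits (including the sign) and 16 fractional bits.
const int FRAC_BITS = 16;
const double SCALE = 65536.0;  // 2^FRAC_BITS

int32_t to_fixed(double v) { return (int32_t)(v * SCALE); }  // truncates toward zero
double to_double(int32_t f) { return f / SCALE; }

// The raw product of two Q16.16 values has 32 fractional bits; shifting
// right by FRAC_BITS brings it back to 16. The result is kept in 64 bits.
int64_t mul_wide(int32_t a, int32_t b) { return ((int64_t)a * b) >> FRAC_BITS; }

// Keep only the low 32 bits, like an overflow mode that wraps around.
int32_t wrap32(int64_t v) { return (int32_t)(uint32_t)v; }

// Clamp to the representable range instead, like a saturating overflow mode.
int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}
```

With these helpers, 1.5 * 2.25 comes back as exactly 3.375, while 200 * 200 does not fit in 16 integer bits: wrapping turns it into -25536, and saturating pins it at the largest representable value (just under 32768).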

Visitor
Registered: 07-02-2019

Re: Optimization Problem

The Analysis Report shows that the SOR_lab3 loop takes too many cycles. How can I optimize it? By the way, I have optimized the division, and that greatly reduced the latency. Thanks a lot for your advice!