zathras129 (Visitor, registered 06-07-2018)

SDAccel adding calls to the same routine causes them to stop being pipelined

Greetings,

I'm baffled by this. I have a routine for processing data that I want to instantiate multiple times within a single kernel (not by adding kernels). Once I've reached good throughput, I plan to replicate the kernel as well to max out the FPGA.

When I have 4 compute routines, the HLS report from the System build shows they are pipelined, and my data throughput is commensurate. When I try to go to 8 compute routines, the compiler stops pipelining the routines and I lose throughput. The compute routine always takes the same number of cycles regardless of the input data. I'm out of ideas on how to add more compute routines. The last time I tried to add kernels, the design didn't fit because of resources (this routine is very LUT-heavy and uses no DSP48s).

This example IS pipelined according to the HLS report and resulting throughput results:

__attribute__((xcl_pipeline_workitems))
{
  computeRoutine0(inputdata0, result0);
  computeRoutine0(inputdata1, result1);
  computeRoutine0(inputdata2, result2);
  computeRoutine0(inputdata3, result3);
}

Alternatively, the same holds if I copy and paste to replicate the routine by hand, in case the compiler isn't replicating it for pipelining:

__attribute__((xcl_pipeline_workitems))
{
  computeRoutine0(inputdata0, result0);
  computeRoutine1(inputdata1, result1);
  computeRoutine2(inputdata2, result2);
  computeRoutine3(inputdata3, result3);
}

But as soon as I increase it to 8 routines, it's no longer pipelined according to the HLS report, and the throughput is terrible:

__attribute__((xcl_pipeline_workitems))
{
  computeRoutine0(inputdata0, result0);
  computeRoutine0(inputdata1, result1);
  computeRoutine0(inputdata2, result2);
  computeRoutine0(inputdata3, result3);
  computeRoutine0(inputdata4, result4);
  computeRoutine0(inputdata5, result5);
  computeRoutine0(inputdata6, result6);
  computeRoutine0(inputdata7, result7);
}

And of course, the manual copy-and-paste method produces the same non-pipelined result and terrible throughput:

__attribute__((xcl_pipeline_workitems))
{
  computeRoutine0(inputdata0, result0);
  computeRoutine1(inputdata1, result1);
  computeRoutine2(inputdata2, result2);
  computeRoutine3(inputdata3, result3);
  computeRoutine4(inputdata4, result4);
  computeRoutine5(inputdata5, result5);
  computeRoutine6(inputdata6, result6);
  computeRoutine7(inputdata7, result7);
}

Right after this block I simply store the results in an array, so there are no conditional statements within this loop. After the loop I process the data. Interestingly, the HW Emulation build reports the system as over-mapped, but once the System build starts optimizing the logic, it gets to less than 50% mapped with 4 routines. Going to 8 increases usage, but it does not double, presumably because some logic/overhead is shared rather than replicated.

I'm baffled why adding more calls to the exact same routine makes it stop pipelining. Could it be that the compiler can't fit the design, so it automatically shrinks it by switching to sequential execution? I'm on SDAccel 2017.2 with an Alpha Data 7V3 card.

Thoughts are very appreciated.
brucey (Xilinx Employee, registered 03-24-2010)

Please provide a test case for us to investigate, and please include a test bench with it. Thanks!

Regards,
brucey
zathras129 (Visitor, registered 06-07-2018)

So I reached out to another engineer and got help rewriting my computeRoutine code so it uses notably fewer resources (using a lookup table rather than logic had the biggest impact). This has allowed me to add more computeRoutine calls. I'm still curious how I could more easily tell when the compiler switches to sequential rather than pipelined execution.
