Scholar jprice
Registered: ‎01-28-2014

Problems with unrolling loops containing inner loops

I've been out of the Vivado HLS game for a while, but am looking at it for an application. However, I've run into a fairly large roadblock almost immediately. I've got a loop I want to unroll (constant bound) that contains an inner loop (also constant bound). Inside that inner loop is a function call that does some math on some inputs. I'd like the outer loop unrolled, creating N copies of the inner loop logic and N instances of the math function. Try as I might, I cannot get the N instances of the math function. If I remove the inner loop, I am able to get the N instances, but that doesn't meet my needs.

Code Below:

#include <ap_int.h>

#define NUM_A 8
#define NUM_B 8
#define NUM_C 4

typedef ap_uint<16> uint16;
typedef ap_uint<32> uint32;

struct TestStruct
{
  uint16 numbers_a[NUM_A];
  uint16 numbers_b[NUM_B];
  uint16 special_num;
};

uint32 do_math(TestStruct input, uint32 input2)
{
 #pragma HLS INLINE off
 #pragma HLS ARRAY_PARTITION variable=input.numbers_a complete dim=1
 #pragma HLS ARRAY_PARTITION variable=input.numbers_b complete dim=1

 uint32 value = input.special_num*input2;
 Outer_Loop: for(uint16 i = 0; i < NUM_A; i++)
 {
    #pragma HLS PIPELINE II=1
    Inner_Loop: for(uint16 j = 0; j < NUM_B; j++)
    {
      value = value + input.numbers_a[i]*input.numbers_b[j];
    }
 }

 return value;
}

uint32 test(const TestStruct input[NUM_C])
{
  #pragma HLS ARRAY_PARTITION variable=input complete dim=1
  #pragma HLS ARRAY_PARTITION variable=input.numbers_a complete dim=1
  #pragma HLS ARRAY_PARTITION variable=input.numbers_b complete dim=1

  uint32 largest_value = 0xFFFFFFFF;
  uint32 values[NUM_C] = {largest_value};

  #pragma HLS ARRAY_PARTITION variable=values complete dim=1
  Loop_A: for(uint16 i = 0; i < NUM_C; i++)
  {
    #pragma HLS UNROLL
    Loop_B: for(uint16 j = 0; i < 8; i++)
    {
      uint32 value = do_math(input[i], i+j);
      if(value < values[i])
        values[i] = value;
    }
  }

  for(uint16 i = 0; i < NUM_C; i++)
  {
    if(values[i] < largest_value)
      largest_value = values[i];
  }
  return largest_value;
}
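
A side note on the listing above: in C++, an array initializer with fewer values than elements sets only the listed ones, so `{largest_value}` puts 0xFFFFFFFF into `values[0]` alone. The sketch below illustrates this with plain `uint32_t` standing in for `ap_uint<32>` (an assumption, made so it compiles without the HLS headers). `LARGEST` and `fill_all` are hypothetical names, not part of the posted code.

```cpp
#include <cassert>
#include <cstdint>

constexpr int NUM_C = 4;
constexpr uint32_t LARGEST = 0xFFFFFFFFu;  // hypothetical name for the sentinel

// Hypothetical helper: writes v into every element explicitly.
// In an HLS design this small loop could presumably be unrolled.
inline void fill_all(uint32_t (&values)[NUM_C], uint32_t v) {
    for (int i = 0; i < NUM_C; i++) {
        values[i] = v;
    }
}
```

With plain integers, `uint32_t partial[NUM_C] = {LARGEST};` leaves elements 1 to 3 at zero, so a later "keep the smaller value" comparison never updates them. Filling every element, as `fill_all` does, avoids that.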

Results:

results.png

In this example I expect 4 instances of do_math, but I only get 1. Hopefully this is a simple mistake that someone can help me correct.

Thanks!

8 Replies
Scholar u4223374
Registered: ‎04-26-2015

Re: Problems with unrolling loops containing inner loops

@jprice Have you tried the FUNCTION_INSTANTIATE pragma? That might do it. Or you could turn inline ON for do_math, which would have a similar effect.

Apart from that, your loops look wrong. The inner loop bounds depend on "i" (not "j"), which seems like an error.
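
To make the loop-header remark concrete, here is a small software model of the nest, written in plain C++ with `int` counters instead of `ap_uint` (an assumption, so it builds without the HLS headers). The function names are hypothetical; each function only counts how often the loop body runs and does not call do_math.

```cpp
#include <cassert>

constexpr int NUM_C = 4;

// Loop nest with the header as posted: the inner loop tests and increments i.
// max_i reports the largest i the body saw.
int calls_as_posted(int &max_i) {
    int calls = 0;
    max_i = 0;
    for (int i = 0; i < NUM_C; i++) {
        for (int j = 0; i < 8; i++) {  // as posted: i, not j
            (void)j;                   // j never changes in this version
            if (i > max_i) max_i = i;
            calls++;
        }
    }
    return calls;
}

// Same nest with the inner header using j throughout.
int calls_corrected() {
    int calls = 0;
    for (int i = 0; i < NUM_C; i++) {
        for (int j = 0; j < 8; j++) {
            calls++;
        }
    }
    return calls;
}
```

In this model the posted header runs the body 8 times with `i` reaching 7, which is past the end of a 4-element `input` array, while the corrected header runs it 4 x 8 = 32 times.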

Scholar jprice
Registered: ‎01-28-2014

Re: Problems with unrolling loops containing inner loops

@u4223374 

Thanks for the response. Sorry, the inner loop variable was indeed a mistake I made when creating the example. Oddly enough, fixing it has no effect; the problem remains. Turning inline on doesn't appear to produce the desired behavior either. The design does go from 1 to 4 DSPs, indicating it might have worked, but the overall latency only decreased by about 10%, where I'd expect close to a factor of 4 in this case. The Analysis perspective suggests each unrolled loop is executed sequentially. This is sometimes indicative of a memory being used, but no memories are being used in this case. Also, in my real problem, when I inline, HLS decides the result will be too big and stops, even though my real problem isn't all that big. Multiplying it by a factor of 16 would still be rather small (maybe 40k LUTs/flops and a few dozen DSP48s).

Scholar jprice
Registered: ‎01-28-2014

Re: Problems with unrolling loops containing inner loops

As a workaround for anyone struggling with the same issue: if I merge loop A and loop B by hand, rederive the indices, and use those, I get the expected behavior. Note the loop unroll factor should be the old outer loop's bound. Inlining of do_math must also be turned off. This feels like a bug in HLS to me, but perhaps I am misunderstanding a detail.

Example Code Below:

#include <ap_int.h>

#define NUM_A 8
#define NUM_B 8
#define NUM_C 4

typedef ap_uint<16> uint16;
typedef ap_uint<32> uint32;

struct TestStruct
{
  uint16 numbers_a[NUM_A];
  uint16 numbers_b[NUM_B];
  uint16 special_num;
};

uint32 do_math(TestStruct input, uint32 input2)
{
 #pragma HLS INLINE off
 #pragma HLS ARRAY_PARTITION variable=input.numbers_a complete dim=1
 #pragma HLS ARRAY_PARTITION variable=input.numbers_b complete dim=1

 uint32 value = input.special_num*input2;
 Outer_Loop: for(uint16 i = 0; i < NUM_A; i++)
 {
    #pragma HLS PIPELINE II=1
    Inner_Loop: for(uint16 j = 0; j < NUM_B; j++)
    {
      value = value + input.numbers_a[i]*input.numbers_b[j];
    }
 }

 return value;
}

uint32 test(const TestStruct input[NUM_C])
{
  #pragma HLS ARRAY_PARTITION variable=input complete dim=1
  #pragma HLS ARRAY_PARTITION variable=input.numbers_a complete dim=1
  #pragma HLS ARRAY_PARTITION variable=input.numbers_b complete dim=1

  uint32 largest_value = 0xFFFFFFFF;
  uint32 values[NUM_C] = {largest_value};

  #pragma HLS ARRAY_PARTITION variable=values complete dim=1
  Loop_A: for(uint16 test = 0; test < NUM_C*8; test++)
  {
    #pragma HLS UNROLL factor=4
    uint16 i = test(4, 3);
    uint16 j = test(2, 0);
    uint32 value = do_math(input[i], i+j);
    if(value < values[i])
      values[i] = value;
  }

  for(uint16 i = 0; i < NUM_C; i++)
  {
    if(values[i] < largest_value)
      largest_value = values[i];
  }
  return largest_value;
}
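
The index derivation in the merged loop above takes bits 4..3 of the flat counter as `i` and bits 2..0 as `j`. A minimal sketch of the same split in plain C++, using shifts and masks on `uint32_t` in place of the `ap_uint` range selection (an assumption, so it compiles without the HLS headers); `Index`, `split_index` and `NUM_J` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t NUM_C = 4;  // outer trip count, as in the post
constexpr uint32_t NUM_J = 8;  // trip count of the former inner loop

struct Index {
    uint32_t i;  // former outer-loop index
    uint32_t j;  // former inner-loop index
};

// Splits the flat counter k (0..31) into (i, j):
// i = bits 4..3 of k, j = bits 2..0 of k.
inline Index split_index(uint32_t k) {
    return Index{(k >> 3) & 0x3u, k & 0x7u};
}
```

This split only works as a plain bit slice because the inner trip count is a power of two; for other trip counts a division and remainder would be needed. Under this mapping, consecutive values of the flat counter differ in `j` and share the same `i`.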
Xilinx Employee
Registered: ‎09-04-2017

Re: Problems with unrolling loops containing inner loops

Hi,

When we unroll the outer loop, HLS essentially creates 4 copies of Loop_B. The function called is the same in all of the unrolled copies, so HLS tries to reuse a single instance across them. That is why we don't see parallel execution.

One option is what you have already suggested. There are a few more alternatives:

1. Move the inner loop into a function. The unroll should then give parallel execution of the 4 copies of that function.

2. Swap the loops and unroll the inner loop.

Thanks,

Nithin
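
A minimal sketch of alternative 1, assuming plain `uint32_t` data and a stand-in arithmetic function so that it compiles without the HLS headers. `do_math_model`, `min_over_j` and `test_model` are hypothetical names, and whether the tool really builds four parallel instances from this structure would still need to be checked in the synthesis report. A standard C++ compiler ignores the HLS pragmas, possibly with a warning.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t NUM_C = 4;  // outer trip count, as in the post
constexpr uint32_t NUM_J = 8;  // trip count of the former Loop_B

// Stand-in for do_math (hypothetical arithmetic, not the original).
static uint32_t do_math_model(uint32_t seed, uint32_t input2) {
    return seed * 31u + input2;
}

// The former Loop_B moved into its own function, kept as a separate unit.
static uint32_t min_over_j(uint32_t seed, uint32_t i) {
#pragma HLS INLINE off
    uint32_t best = 0xFFFFFFFFu;
    for (uint32_t j = 0; j < NUM_J; j++) {
        uint32_t v = do_math_model(seed, i + j);
        if (v < best) best = v;
    }
    return best;
}

// The outer loop now contains a single function call and is fully unrolled.
uint32_t test_model(const uint32_t seeds[NUM_C]) {
    uint32_t best = 0xFFFFFFFFu;
    for (uint32_t i = 0; i < NUM_C; i++) {
#pragma HLS UNROLL
        uint32_t v = min_over_j(seeds[i], i);
        if (v < best) best = v;
    }
    return best;
}
```

The software behavior is unchanged by the restructuring: calling `test_model` on four seeds returns the smallest value any (i, j) pair produces, just as the nested version would.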

Scholar jprice
Registered: ‎01-28-2014

Re: Problems with unrolling loops containing inner loops

@nithink,

Thanks for the response. I understood that unrolling creates copies; that's very much my intention. However, if it's creating copies, why would it try to reuse hardware if not absolutely necessary? I find that a bit of a puzzling response. The whole point of unrolling, as I understand it, is parallel execution via copies. Without that I might as well just pipeline, right?

Thanks!

Xilinx Employee
Registered: ‎09-04-2017

Re: Problems with unrolling loops containing inner loops

Let me check if this can be enhanced. What's happening here is that Loop_A is getting unrolled, and the tool now sees 4 copies of Loop_B, each containing the function call. Let's say you have code like this:

funcA()
{
 funcB();
 funcB();
}

HLS will try to use one instance of funcB and reuse it. That's what is happening in this case.

Thanks,

Nithin

Scholar jprice
Registered: ‎01-28-2014

Re: Problems with unrolling loops containing inner loops

@nithink,

Thanks for looking into it and for the suggested workarounds; I do appreciate it. For your reference, I'm not necessarily looking for the behavior to change so much as to understand why this is intended (if it is intended) in the first place. Such understanding always helps me make better use of the tools.

Side Note: I tend to be a power user of such tools. This particular example is one where I'd like to be able to tune the behavior with a directive, but I haven't found a way to do so yet.

Moderator
Registered: ‎11-21-2018

Re: Problems with unrolling loops containing inner loops

Hi @jprice,

Do you have any update on this?

Regards,

Aoife
Product Application Engineer - Xilinx Technical Support EMEA