Observer
Registered: ‎11-02-2017

Synthesis can't finish

I want to transpose an RGB picture, so I wrote a function. Simulation succeeds, but when I synthesize the code, it still has not finished after an hour. How can I solve this? The picture and the window are both 256*256.

Here is my function.

void Transpose (RGB_IMAGE& src, RGB_IMAGE& dst, int rows, int cols)
{
    WD win0, win1, win2;
    rgb_data pix;

    for (int row = 0; row < rows; row++)
        for (int col = 0; col < cols; col++)
        {
            src >> pix;
            win0.insert(pix.val[0], col, row);
            win1.insert(pix.val[1], col, row);
            win2.insert(pix.val[2], col, row);
        }

    for (int row = 0; row < rows; row++)
        for (int col = 0; col < cols; col++)
        {
            pix.val[0] = win0.getval(row, col);
            pix.val[1] = win1.getval(row, col);
            pix.val[2] = win2.getval(row, col);
            dst << pix;
        }
}

Thanks


Accepted Solutions
Advisor
Registered: ‎04-26-2015

The problem is that you're asking HLS for a set of 196608 (256*256*3) 8-bit registers, with three 65536-to-1 multiplexers and three 1-to-65536 demultiplexers.

 

For reference, I have previously had HLS build a 1024-to-1 mux and 1-to-1024 demux simply as a way of wasting space on the FPGA (wanted to verify performance when it was almost full). If I remember correctly, that block took something like 20% of a Zynq 7045. What you're asking for is vastly more difficult.

 

I expect that HLS is getting bogged down building huge multiplexers. As @rosa_bpc has said, hls::Window is designed for small spaces; 3x3 or 5x5 would be common. A 9-to-1 or 25-to-1 mux is not a huge piece of hardware.
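To make the scale difference concrete, here is a small sketch of the arithmetic above in plain C++. The names `WindowCost` and `window_cost` are made up for illustration and are not part of any HLS library; the model simply assumes one 8-bit register per window element per channel, and one multiplexer per channel that can select any element.

```cpp
#include <cstdint>

// Rough cost of a register-based window: every element is a register, and an
// indexed read must be able to select any one of them.
struct WindowCost {
    std::int64_t registers;   // 8-bit registers summed over all channels
    std::int64_t mux_inputs;  // inputs of each per-channel read multiplexer
};

// Hypothetical helper: rows x cols elements per channel.
constexpr WindowCost window_cost(std::int64_t rows, std::int64_t cols,
                                 std::int64_t channels) {
    return WindowCost{rows * cols * channels, rows * cols};
}

// The 256x256 RGB window from the question.
static_assert(window_cost(256, 256, 3).registers == 196608,
              "three 256x256 windows need 196608 registers");
static_assert(window_cost(256, 256, 3).mux_inputs == 65536,
              "each channel needs a 65536-to-1 mux");

// Typical window sizes, for comparison.
static_assert(window_cost(3, 3, 3).mux_inputs == 9, "3x3 gives a 9-to-1 mux");
static_assert(window_cost(5, 5, 3).mux_inputs == 25, "5x5 gives a 25-to-1 mux");
```

The write side mirrors this with a 1-to-65536 demultiplexer per channel, which is why the full-image window is so much harder to build than a 3x3 or 5x5 one.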

 

You have two basic options:

 

(1) Get rid of the hls::Windows and use a simple block RAM buffer instead. 256*256*8-bit*3 will still need 96 BRAM_18K blocks, which is pretty substantial - but it's definitely something that HLS will be able to achieve.

 

(2) Do a block-wise transpose. This requires semi-random access to either the input or the output image (or both), so an AXI Master is generally used. You then read in small (e.g. 32x32) blocks, transpose those (32*32*8-bit*3 is only going to need three block RAMs) and write them out, rather than doing the whole image at once.

 

Contributor
Registered: ‎03-13-2017

Hello. What is WD? I suggest this solution:

 

void Transpose (RGB_IMAGE& src, RGB_IMAGE& dst, int rows, int cols)
{
#pragma HLS INTERFACE axis port=src
#pragma HLS INTERFACE axis port=dst
#pragma HLS INTERFACE ap_none port=rows
#pragma HLS INTERFACE ap_none port=cols

    WD win0, win1, win2;
    rgb_data pix;

    for (int row = 0; row < rows; row++)
    {
#pragma HLS loop_flatten off
#pragma HLS PIPELINE II=1
        for (int col = 0; col < cols; col++)
        {
#pragma HLS loop_flatten off
#pragma HLS PIPELINE II=1
            // Read one pixel from the input stream.
            pix = src.read();

            // Store each colour channel in its window.
            win0.insert(pix.val[0], col, row);
            win1.insert(pix.val[1], col, row);
            win2.insert(pix.val[2], col, row);

            // Forward the pixel to the output stream.
            dst.write(pix);
        }
    }
}

 

Observer
Registered: ‎11-02-2017
WD is defined as an hls::Window of size 256*256, so I think this may be the problem.
Contributor
Registered: ‎03-13-2017
Why do you need WD? The usual size of an hls::Window is 3x3.

Observer
Registered: ‎11-02-2017
I want to transpose an RGB picture, so the whole picture needs to be stored. That is why I used a window the same size as the picture.