Explorer
Registered: 06-17-2012

How to read long sequential char stream efficiently?


Hi, all,

I have a dataflow design that consists of six pipeline stages. Most of the stages have gmem ports accessing DDR.

In my first pipeline stage, I need to read a very long char stream, as shown below. The second stage can only consume depth_stream one char at a time. However, the overall memory bandwidth utilization is low (less than 10%).

static void stage0(
        const char *depth,
        hls::stream<char> &depth_stream,
        int vertex_num)
{
    // One 8-bit read from DDR per iteration: each memory beat carries a single char
    read_depth: for (int i = 0; i < vertex_num; i++){
    #pragma HLS pipeline II=1
        depth_stream << depth[i];
    }
}

I understand that reading one char at a time results in low memory utilization, so I tried accessing memory with a 512-bit data width. The downstream stage will not be fast enough to consume 512 bits per cycle at II=1, but I expected this to improve memory bandwidth utilization anyway. Here is my modified code with a cache structure:

static void bfs_stage0(
        const int512_dt *depth,
        hls::stream<char> &depth_stream,
        int vertex_num)
{
    // Direct-mapped cache of 16 x 512-bit words.
    // Split of i: 6-bit byte offset, 4-bit word index, 22-bit tag
    int512_dt cache_data[16];
    uint22_dt cache_tag[1] = {0x3FFFFF};  // invalid tag forces a miss on the first access

    stage0_main:
    for (uint32_dt i = 0; i < vertex_num; i++){
        uint6_dt offset = i.range(5, 0);
        uint4_dt index = i.range(9, 6);
        uint22_dt tag = i.range(31, 10);
        uint22_dt _tag = cache_tag[0];
        uint26_dt addr = i.range(31, 6);
        int512_dt word = cache_data[index];

        if(tag == _tag){
            // Hit: emit byte 'offset' of the cached 512-bit word
            depth_stream << word.range((offset + 1) * 8 - 1, offset * 8);
        }
        else{
            // Miss: refill all 16 words with a burst read.
            // With sequential access a miss only happens when i is a multiple
            // of 1024, so the requested char is byte 0 of word 0.
            for(int j = 0; j < 16; j++){
            #pragma HLS pipeline
                cache_data[j] = depth[addr + j];
            }
            cache_tag[0] = tag;
            depth_stream << cache_data[0].range(7, 0);
        }
    }
}
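For reference, the address split used in bfs_stage0 can be checked in plain C++. The sketch below is a software model only, under the assumption of a sequential scan starting at i = 0; `CacheFields`, `split` and `count_misses` are hypothetical names, and no HLS types are used. It decodes the char index i into the 6-bit offset, 4-bit index and 22-bit tag, and counts how often the refill (else) branch would run.

```cpp
#include <cstdint>

// Software model of the index split in bfs_stage0 (plain C++, no ap_uint).
struct CacheFields {
    std::uint32_t offset;  // bits 5..0:   byte within a 64-byte (512-bit) word
    std::uint32_t index;   // bits 9..6:   which of the 16 cached words
    std::uint32_t tag;     // bits 31..10: which 1024-byte block is cached
};

CacheFields split(std::uint32_t i) {
    return CacheFields{i & 0x3Fu, (i >> 6) & 0xFu, i >> 10};
}

// Counts how often the else (refill) branch runs for a sequential scan of
// n chars, starting from the invalid tag 0x3FFFFF used in bfs_stage0.
std::uint32_t count_misses(std::uint32_t n) {
    std::uint32_t cached_tag = 0x3FFFFFu;
    std::uint32_t misses = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        std::uint32_t tag = split(i).tag;
        if (tag != cached_tag) {
            ++misses;          // refill 16 words, then serve byte 0 of word 0
            cached_tag = tag;
        }
    }
    return misses;
}
```

For example, i = 1500 decodes to offset 28, index 7 and tag 1, and a sequential scan of 4096 chars triggers 4 refills, one per 1024 chars. The slow branch is rare, but in this loop structure every iteration has to wait for it when it does run.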
                             

However, bandwidth utilization is even lower and performance gets worse.

I don't quite understand the reason for the performance drop.

Any suggestions would be appreciated.

Regards,

Cheng Liu

Accepted Solution
Xilinx Employee
Registered: 07-18-2014

Re: How to read long sequential char stream efficiently?


Hi @liucheng,

I came across the same requirement and handled it by splitting the read into separate modules connected with dataflow. Please see the code below; I hope it helps:

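// Burst-reads 512-bit words from DDR and forwards them through a FIFO
void gmem_read(const int512_dt* in, hls::stream<int512_dt> &outStream, int input_size)
{
    int512_dt buffer[16];
    // Number of 64-byte (512-bit) words covering input_size chars, rounded up
    int sizein512 = (input_size - 1)/64 + 1;
    for (int i = 0; i < sizein512; i += 16){
        // The last chunk may be shorter than 16 words
        int chunk_size = 16;
        if (i + 16 > sizein512) chunk_size = sizein512 - i;
        // Sequential, pipelined reads allow HLS to infer a burst
        mrd1: for (int j = 0; j < chunk_size; j++){
        #pragma HLS PIPELINE
            buffer[j] = in[i + j];
        }
        // Push the chunk into the stream for the consumer stage
        mrd2: for (int j = 0; j < chunk_size; j++){
        #pragma HLS PIPELINE
            outStream << buffer[j];
        }
    }
}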

static void stage0(
    hls::stream<int512_dt> &inStream512,  // streams must be passed by reference
    hls::stream<char> &depth_stream,
    int vertex_num)
{
    int512_dt tmpValue;
    read_depth: for (int i = 0; i < vertex_num; i++){
    #pragma HLS pipeline II=1
        int idx = i % 64;
        // Fetch a new 512-bit word every 64 chars
        if (idx == 0) tmpValue = inStream512.read();
        // Emit byte idx of the current word
        depth_stream << tmpValue.range((idx + 1)*8 - 1, idx*8);
    }
}

static void stage0_modified(
    const int512_dt *depth,
    hls::stream<char> &depth_stream,
    int vertex_num)
{
    // FIFO between the burst reader and the char emitter
    hls::stream<int512_dt> inStream512;
    #pragma HLS STREAM variable=inStream512 depth=32

    // Both functions run concurrently, so memory reads overlap with char output
    #pragma HLS dataflow
    gmem_read(depth, inStream512, vertex_num);
    stage0(inStream512, depth_stream, vertex_num);
}
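To check the data movement of this solution in ordinary C++ (without the HLS headers), here is a software model of the two dataflow functions. It assumes `std::queue` stands in for `hls::stream` and a 64-byte `std::array` stands in for `int512_dt`; `Word512`, `gmem_read_model`, `stage0_model` and `pack` are hypothetical names. The model runs the stages one after the other, so it only checks that the chars come out in order, not the concurrency that `#pragma HLS dataflow` provides.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Software stand-ins (assumptions, not HLS types):
//   Word512 models int512_dt as 64 bytes; byte k = bits (k+1)*8-1 .. k*8
//   std::queue models hls::stream
using Word512 = std::array<std::uint8_t, 64>;

// Model of gmem_read: forward ceil(input_size / 64) words in chunks of up to 16.
void gmem_read_model(const std::vector<Word512>& in,
                     std::queue<Word512>& out, int input_size) {
    int sizein512 = (input_size - 1) / 64 + 1;
    Word512 buffer[16];
    for (int i = 0; i < sizein512; i += 16) {
        int chunk_size = std::min(16, sizein512 - i);  // last chunk may be short
        for (int j = 0; j < chunk_size; ++j) buffer[j] = in[i + j];  // "burst" read
        for (int j = 0; j < chunk_size; ++j) out.push(buffer[j]);    // into the FIFO
    }
}

// Model of stage0: pop a new word every 64 chars and emit its bytes in order.
void stage0_model(std::queue<Word512>& in, std::vector<char>& out, int vertex_num) {
    Word512 tmp{};
    for (int i = 0; i < vertex_num; ++i) {
        int idx = i % 64;
        if (idx == 0) {
            tmp = in.front();
            in.pop();
        }
        out.push_back(static_cast<char>(tmp[idx]));
    }
}

// Packs a char array into 64-byte words, i.e. the layout the DDR buffer holds.
std::vector<Word512> pack(const std::vector<char>& chars) {
    std::vector<Word512> words((chars.size() + 63) / 64, Word512{});
    for (std::size_t k = 0; k < chars.size(); ++k)
        words[k / 64][k % 64] = static_cast<std::uint8_t>(chars[k]);
    return words;
}
```

With 1,100 chars, `gmem_read_model` pushes 18 words (a chunk of 16 followed by a chunk of 2), and `stage0_model` reproduces the original char sequence. In hardware, the benefit comes from the split itself: the burst reads can proceed while stage0 is still emitting chars from earlier words, and the FIFO depth of 32 words leaves room for two 16-word chunks.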

Moderator
Registered: 03-27-2012

Re: How to read long sequential char stream efficiently?


Hi Cheng,

Have you tried partitioning cache_data?

Since it is an array, it will likely be mapped to a single block RAM and become a performance bottleneck.

Regards,

Sean

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Explorer
Registered: 06-17-2012

Re: How to read long sequential char stream efficiently?


Hi, Sean,

Thanks for the reply.

I have already set the data width of the 'cache' array to 512 bits. Partitioning the cache would produce an even wider data path, and it would not reduce the memory access latency; instead, timing gets worse.

What I actually want is a prefetch buffer. Since I know the data will be accessed sequentially, I can do perfect prefetching in logic. I therefore chose to prefetch at a granularity of 16 x 512-bit bursts, which uses the memory bandwidth efficiently. However, the major problem is that I can't run the prefetch logic in parallel with the regular cache-access logic. Take the code above as an example: the if-branch can be very fast, but the else-branch is slow because of the 16 x 512-bit memory access.

The for loop will not proceed until the else-branch has completed. As a result, the II of the overall design is large, which explains why this code performs worse than the simple char stream. So the problem is essentially how to implement good prefetch logic in HLS. Currently, I notice that setting the cache size to 1 x 512 bits does the trick, but bandwidth utilization remains low: after all, it issues 512-bit memory requests one at a time.
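The reasoning above can be illustrated with a back-of-envelope cycle model. This is an idealized sketch, not HLS scheduling output: the function names are hypothetical, `miss_cycles` (memory latency plus 16 beats per refill) is an assumed parameter, and `hit_cycles` accounts for the hit path possibly not reaching II=1 when the loop body contains another loop.

```cpp
#include <algorithm>
#include <cstdint>

// Refills needed for a sequential scan of n chars with a 16 x 64-byte cache
// (one refill per 1024 chars, rounded up).
std::uint64_t refills_for(std::uint64_t n) { return (n + 1023) / 1024; }

// Serialized loop (like bfs_stage0): every refill stalls char output.
//   hit_cycles  - cycles per hit iteration (1 only if the loop reaches II=1)
//   miss_cycles - cycles per refill, a hypothetical parameter
std::uint64_t serialized_cycles(std::uint64_t n, std::uint64_t hit_cycles,
                                std::uint64_t miss_cycles) {
    std::uint64_t refills = refills_for(n);
    return (n - refills) * hit_cycles + refills * miss_cycles;
}

// Decoupled producer/consumer: refills overlap with char output, so only the
// first refill is exposed unless the producer is the slower side.
std::uint64_t decoupled_cycles(std::uint64_t n, std::uint64_t miss_cycles) {
    std::uint64_t producer = refills_for(n) * miss_cycles;
    std::uint64_t consumer = miss_cycles + n;  // wait for first burst, then II=1
    return std::max(producer, consumer);
}
```

With a hypothetical 200-cycle refill and 4,096 chars, the serialized loop needs 4,892 cycles even if every hit took one cycle (13,076 at three cycles per hit), whereas the decoupled version needs about 4,296 because only the first refill is exposed.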

 

Regards,

Cheng Liu

Moderator
Registered: 03-27-2012

Re: How to read long sequential char stream efficiently?


Hi Cheng,

Can you attach the log file from HLS C synthesis (csynth)?

It is normally named "solution_OCL_REGION_0.log" and can be found at:

_xocc_compile_xxxx\impl\kernels\<kernel>\<kernel>\solution_OCL_REGION_0\

Regards,

Sean

Explorer
Registered: 06-17-2012

Re: How to read long sequential char stream efficiently?


Hi, @heeran,

Thanks for the suggestions.

I have tried your strategy in my design, and it works great.

Regards,

Cheng Liu
