Registered: ‎02-18-2018

SDSoC Unable to Pipeline the Loop

Hello everyone,

I am trying to accelerate a rectangular matrix-vector multiplication in SDSoC, starting from the matrix-matrix multiplication example that ships with the tool. This is the mmult function that I want to accelerate in hardware:

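#include <stdio.h>
#include <stdlib.h>
#include "mmultadd.h"   // N and M are expected to come from this header

void mmult (float A[N*M], float B[M], float C[N])
{
     // Local copy of B. It is indexed by k < M, so it needs M elements
     // (the original N only worked because N == M in my tests).
     float Bbuf[M];
#pragma HLS array_partition variable=Bbuf complete dim=1

     BBUF_LOOP: for(int i=0; i<M; i++) {
#pragma HLS PIPELINE
          Bbuf[i] = B[i];
     }

     // PIPELINE on the outer loop: the tool has to unroll L2 completely.
     L1: for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE
          float result = 0;
          L2: for (int k = 0; k < M; k++) {
               float term = A[i*M + k] * Bbuf[k];
               result += term;
          }
          C[i] = result;
     }
}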

When I compile the code with N=M=256 (or 128), everything works as expected: the L1 loop is pipelined with an Initiation Interval (II) of M=256. However, with the same code and only M and N changed to 512, the tool can no longer pipeline the second loop (L1). The only improvement it makes is to perform a multiply and an add concurrently, whereas I want the L2 loop fully pipelined with II=1. Here is part of the HLS report for the mmult function:

 

INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'mmult'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'BBUF_LOOP'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 1, Depth = 2.
INFO: [SCHED 204-61] Pipelining loop 'L1'.
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 1, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 2, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 3, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 4, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 130, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 193, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 225, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 241, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 249, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 253, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 255, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 256, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
INFO: [SCHED 204-61] Unable to satisfy pipeline directive: Unable to pipeline the region.
WARNING: [SCHED 204-71] Latency directive discarded for region mmult since it contains subloops.

The loop implementation results look like this:

[Screenshot: loop implementation]

I also tried another version of the code, with the PIPELINE directive moved to the L2 loop. The modified code is:

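#include <stdio.h>
#include <stdlib.h>
#include "mmultadd.h"   // N and M are expected to come from this header

void mmult (float A[N*M], float B[M], float C[N])
{
     // Local copy of B. It is indexed by k < M, so it needs M elements
     // (the original N only worked because N == M in my tests).
     float Bbuf[M];
#pragma HLS array_partition variable=Bbuf complete dim=1

     BBUF_LOOP: for(int i=0; i<M; i++) {
#pragma HLS PIPELINE
          Bbuf[i] = B[i];
     }

     L1: for (int i = 0; i < N; i++) {
          float result = 0;
          // PIPELINE on the inner loop only: L2 is kept as a loop.
          L2: for (int k = 0; k < M; k++) {
#pragma HLS PIPELINE
               float term = A[i*M + k] * Bbuf[k];
               result += term;   // each add depends on the previous one
          }
          C[i] = result;
     }
}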

Because the L2 loop accesses A and Bbuf sequentially, I expected it to be pipelined with II=1. After implementation, however, the achieved Initiation Interval was 5, which matches the latency of a floating-point add operation: each addition into result has to wait for the previous addition to finish. Part of the HLS compilation log is:

INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'mmult'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'BBUF_LOOP'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 1, Depth = 2.
INFO: [SCHED 204-61] Pipelining loop 'L2'.
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 1)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 2)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 3)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 4)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 5, Depth = 11.
WARNING: [SCHED 204-71] Latency directive discarded for region mmult since it contains subloops.
INFO: [SCHED 204-11] Finished scheduling.
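
To put the reported II=5 and Depth=11 in perspective, here is a rough cycle estimate using the usual first-order formula for a pipelined loop, cycles = (trip count - 1) * II + depth. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it ignores loop entry/exit overhead, BBUF_LOOP, and any stalls on the A port.

```cpp
#include <cstdint>

// First-order estimate for one pipelined loop:
// the first iteration takes `depth` cycles, every further one starts `ii` cycles later.
constexpr std::int64_t pipelined_loop_cycles(std::int64_t trip_count,
                                             std::int64_t ii,
                                             std::int64_t depth)
{
    return (trip_count - 1) * ii + depth;
}

// Outer loop not pipelined: the inner pipeline is drained and restarted per row.
// (Hypothetical helper; overhead between rows is ignored.)
constexpr std::int64_t nest_cycles(std::int64_t rows,
                                   std::int64_t cols,
                                   std::int64_t ii,
                                   std::int64_t depth)
{
    return rows * pipelined_loop_cycles(cols, ii, depth);
}

// Values taken from the log above for N = M = 512, II = 5, Depth = 11:
//   pipelined_loop_cycles(512, 5, 11) == 2566      cycles per row
//   nest_cycles(512, 512, 5, 11)      == 1313792   cycles for the whole nest
```

Under the same formula, an II of 1 would cut the per-row cost to roughly one fifth, assuming the depth stayed similar, which is why I care about the inner-loop II.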

The loop implementation result is:

[Screenshot: loop implementation]

So what should I do to reach II=1 for the inner loop for large matrices?
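
One direction I am considering, shown here only as an untested sketch and not as a confirmed fix, is to break the fadd dependency by accumulating into several independent partial sums and adding them up after the loop. The names mmult_partial and ACC and the small N and M below are hypothetical, chosen so that the snippet compiles on its own. ACC is assumed to be at least the 5-cycle adder latency. I have not verified that the scheduler actually reaches II=1 with this; it may need an additional dependence directive on acc, depending on the tool version.

```cpp
// Hypothetical sizes for illustration; in the real project N and M come from "mmultadd.h".
constexpr int N = 8;
constexpr int M = 16;
// Number of independent partial sums (assumption: >= fadd latency of 5).
constexpr int ACC = 8;

void mmult_partial(const float A[N * M], const float B[M], float C[N])
{
    float Bbuf[M];
#pragma HLS array_partition variable=Bbuf complete dim=1

    BBUF_LOOP: for (int i = 0; i < M; i++) {
#pragma HLS PIPELINE
        Bbuf[i] = B[i];
    }

    L1: for (int i = 0; i < N; i++) {
        float acc[ACC];
#pragma HLS array_partition variable=acc complete dim=1

        INIT: for (int j = 0; j < ACC; j++) {
#pragma HLS UNROLL
            acc[j] = 0.0f;
        }

        // Consecutive iterations write different accumulators, so an add
        // only depends on the add issued ACC iterations earlier.
        L2: for (int k = 0; k < M; k++) {
#pragma HLS PIPELINE II=1
            acc[k % ACC] += A[i * M + k] * Bbuf[k];
        }

        // Final reduction of the partial sums (short, runs once per row).
        float result = 0.0f;
        SUM: for (int j = 0; j < ACC; j++) {
            result += acc[j];
        }
        C[i] = result;
    }
}
```

Note that this changes the order of the floating-point additions, so the result can differ from the original loop in the last bits.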

 

 

 
