Observer
873 Views
Registered: ‎02-18-2018

SDSoC Unable to Pipeline the Loop

Hello everyone,

I am trying to accelerate a rectangular matrix-vector multiplication application using SDSoC based on the matrix-matrix multiplication example in the tool. The mmult function that I want to accelerate on hardware is this:

 

 

#include <stdio.h>
#include <stdlib.h>
#include "mmultadd.h"

void mmult (float A[N*M], float B[M], float C[N])
{
     // Local copy of B; sized M because B has M elements.
     float Bbuf[M];
#pragma HLS array_partition variable=Bbuf complete dim=1

     // Copy B into the local buffer.
     BBUF_LOOP: for(int i=0; i<M; i++) {
#pragma HLS PIPELINE
          Bbuf[i] = B[i];
     }

     // One dot product per row of A; PIPELINE on L1 asks the tool to unroll L2 completely.
     L1: for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE
          float result = 0;
          L2: for (int k = 0; k < M; k++) {
               float term = A[i*M + k] * Bbuf[k];
               result += term;
          }
          C[i] = result;
     }
}

When I compile the code with N=M=256 (or 128), everything works as expected: the L1 loop is pipelined with an initiation interval (II) of M=256. However, with the same code and only M and N changed to 512, the tool can no longer pipeline the L1 loop. The only improvement it makes is to perform a multiply and an add concurrently, whereas I want the L2 loop fully pipelined with II=1. Here is part of the HLS report for the function mmult:

 

INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'mmult'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'BBUF_LOOP'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 1, Depth = 2.
INFO: [SCHED 204-61] Pipelining loop 'L1'.
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 1, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 2, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 3, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 4, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 130, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 193, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 225, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 241, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 249, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 253, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 255, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried dependence constraint (II = 256, distance = 1, offset = 1)
   between fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and fifo read on port 'A' (/home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
INFO: [SCHED 204-61] Unable to satisfy pipeline directive: Unable to pipeline the region.
WARNING: [SCHED 204-71] Latency directive discarded for region mmult since it contains subloops.

The loop implementation results are as follows:

[Image: loop implementation]

I also tried another version of the code, with the PIPELINE directive placed on the L2 loop instead. The modified code is:

#include <stdio.h>
#include <stdlib.h>
#include "mmultadd.h"

void mmult (float A[N*M], float B[M], float C[N])
{
     // Local copy of B; sized M because B has M elements.
     float Bbuf[M];
#pragma HLS array_partition variable=Bbuf complete dim=1

     // Copy B into the local buffer.
     BBUF_LOOP: for(int i=0; i<M; i++) {
#pragma HLS PIPELINE
          Bbuf[i] = B[i];
     }

     // One dot product per row of A; only the inner loop L2 is pipelined here.
     L1: for (int i = 0; i < N; i++) {
          float result = 0;
          L2: for (int k = 0; k < M; k++) {
#pragma HLS PIPELINE
               float term = A[i*M + k] * Bbuf[k];
               result += term;
          }
          C[i] = result;
     }
}

Because the L2 loop accesses A and Bbuf sequentially, I expected it to be pipelined with II=1. However, after implementation the achieved initiation interval was 5 (the latency of a floating-point add operation). Part of the HLS compilation log is:

INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'mmult'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'BBUF_LOOP'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 1, Depth = 2.
INFO: [SCHED 204-61] Pipelining loop 'L2'.
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 1)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 2)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 3)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
WARNING: [SCHED 204-68] Unable to enforce a carried constraint (II = 4)
   between 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81) and 'fadd' operation ('result', /home/mhg/Desktop/FPGA_MM/SDx/MM/mmultadd/src/mmult.cpp:81).
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 5, Depth = 11.
WARNING: [SCHED 204-71] Latency directive discarded for region mmult since it contains subloops.
INFO: [SCHED 204-11] Finished scheduling.
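
For a sense of scale, the figures in this log can be turned into a rough cycle count. The sketch below uses the usual approximation for a pipelined loop, latency = (trip count - 1) * II + depth, and assumes N = M = 512 and that L1 simply runs L2 once per row, with loop entry and exit overhead ignored. The function names are made up for illustration, and the depth of 11 is assumed to stay the same in the II=1 case, which the tool may not reproduce.

```cpp
#include <cstdint>

// Approximate latency, in cycles, of a pipelined loop: the first iteration
// takes `depth` cycles and every later iteration starts `ii` cycles after
// the previous one.
constexpr std::uint64_t pipelined_loop_cycles(std::uint64_t trip_count,
                                              std::uint64_t ii,
                                              std::uint64_t depth)
{
    return trip_count == 0 ? 0 : (trip_count - 1) * ii + depth;
}

// Rough total for an outer loop that is NOT pipelined and runs the pipelined
// inner loop to completion once per row (outer-loop overhead is ignored).
constexpr std::uint64_t nested_cycles(std::uint64_t rows,
                                      std::uint64_t cols,
                                      std::uint64_t ii,
                                      std::uint64_t depth)
{
    return rows * pipelined_loop_cycles(cols, ii, depth);
}
```

With these assumptions, nested_cycles(512, 512, 5, 11) comes to 1,313,792 cycles, while nested_cycles(512, 512, 1, 11) comes to 267,264, so the gap between II=5 and II=1 is a bit under a factor of five.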

The loop implementation result is:

[Image: loop implementation]

So what should I do to reach II=1 for the inner loop for large matrices?
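
One direction that is commonly suggested for this kind of accumulator dependence is to spread the sum over several independent partial accumulators, so that consecutive iterations of L2 no longer wait on the same 'fadd'. The following is only a sketch of that idea, not a verified fix: the name mmult_partial and the constant LANES are hypothetical, N and M are assumed to be 512 here (in the post they come from mmultadd.h), and whether the scheduler actually reports II=1 may depend on the tool version and could require an additional dependence directive on the partial array. Note also that reordering floating-point additions can change the result in the last bits.

```cpp
// Sizes assumed for illustration; in the post they come from mmultadd.h.
static const int N = 512;
static const int M = 512;
// Hypothetical number of independent accumulators; chosen to be at least
// the floating-point add latency (5 in the log above).
static const int LANES = 8;

void mmult_partial(const float A[N*M], const float B[M], float C[N])
{
    // Local copy of B, as in the original code.
    float Bbuf[M];
    for (int i = 0; i < M; i++) {
#pragma HLS PIPELINE
        Bbuf[i] = B[i];
    }

    for (int i = 0; i < N; i++) {
        // One running sum per lane; element k of the row goes to lane k % LANES,
        // so the same accumulator is only revisited every LANES iterations.
        float partial[LANES];
#pragma HLS array_partition variable=partial complete dim=1
        for (int j = 0; j < LANES; j++) {
#pragma HLS UNROLL
            partial[j] = 0.0f;
        }

        for (int k = 0; k < M; k++) {
#pragma HLS PIPELINE II=1
            partial[k % LANES] += A[i*M + k] * Bbuf[k];
        }

        // Combine the lane sums into the final dot product for this row.
        float result = 0.0f;
        for (int j = 0; j < LANES; j++) {
            result += partial[j];
        }
        C[i] = result;
    }
}
```

A regular C++ compiler ignores the HLS pragmas, so the function can be checked in software against the plain loop before trying it in the tool. A still reads one element per L2 iteration in row order, which keeps the sequential access pattern of the original.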

 

 

 
