Visitor
518 Views
Registered: 07-23-2021

## Reading and Writing to Register in one clock

Hello,

I know that reading from and writing to an array cannot be done in one clock cycle unless the code is restructured, as in some cases such as histogram calculation: Hist[i] = Hist[i] + 1;
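For reference, the kind of histogram restructuring meant here could look roughly like the sketch below. The function name `histogram`, the `BINS` constant and the argument list are only illustrative assumptions. The idea is to keep the count of the current bin in a register and to touch the array only when the bin changes, so one iteration never reads and writes the same array element.

```c
#define BINS 256   /* assumed: one bin per 8-bit value */

/* Illustrative helper, not from any library.
   hist[] is expected to be zeroed by the caller. */
void histogram(const unsigned char in[], int n, unsigned int hist[BINS])
{
    if (n <= 0) return;

    unsigned char old = in[0];      /* bin currently held in the register */
    unsigned int  acc = hist[old];  /* running count for that bin */

    for (int i = 0; i < n; i++) {
        unsigned char val = in[i];
        if (val == old) {
            acc++;                  /* same bin: update the register only */
        } else {
            hist[old] = acc;        /* write back the finished bin */
            acc = hist[val] + 1;    /* read a different bin and count this sample */
        }
        old = val;
    }
    hist[old] = acc;                /* flush the last bin */
}
```

Whether the tool then reaches II=1 still depends on how it treats the remaining array dependence, so take this as a sketch only.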

But what about registers? Unfortunately, the second loop in the code below cannot be pipelined with II=1.

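```c
void myF(float *ave)
{
    unsigned char my_array[256] = {0};

    /* Fill the array with 0..255. */
    loop1: for (int i = 0; i < 256; i++)
    {
        my_array[i] = i;
    }

    /* Accumulate in float: this is the loop that does not reach II=1. */
    float sum = 0;
    loop2: for (int i = 0; i < 256; i++)
    {
        sum += my_array[i];
    }

    *ave = sum / 256;
}
```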

1 Solution

Accepted Solutions
Xilinx Employee
448 Views
Registered: 09-04-2017

@pouya_hwsw The reason II=1 is not achievable in this case is the floating-point operation. A floating-point addition takes a few cycles to produce its result, so the next iteration has to wait for that result before it can start its own addition.

If you change the data type to integer, it should pipeline with II=1.
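As a minimal sketch of that change (the name `myF_int` is only illustrative), the accumulator becomes an unsigned integer and the single float conversion and division happen after the loop. The largest possible sum is 256 * 255 = 65280, so an `unsigned int` is more than wide enough.

```c
/* Illustrative variant of myF with an integer accumulator. */
void myF_int(float *ave)
{
    unsigned char my_array[256] = {0};
    loop1: for (int i = 0; i < 256; i++)
    {
        my_array[i] = (unsigned char)i;
    }

    unsigned int sum = 0;          /* at most 256 * 255 = 65280 */
    loop2: for (int i = 0; i < 256; i++)
    {
        sum += my_array[i];        /* integer add, no multi-cycle float adder in the loop */
    }

    *ave = (float)sum / 256;       /* one conversion and divide, after the loop */
}
```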

Thanks,

Nithin

4 Replies
Mentor
500 Views
Registered: 06-20-2017

I am not sure why... that function looks like it would just return a constant.

Teacher
478 Views
Registered: 05-11-2015

Yep, for an efficient pipeline (minimum II) you need to divide your task into chainable sub-tasks (outputs from one are inputs to the next).

- The addition of array into sum

- The calculation of average from sum

Because addition is a binary operation, you add array[0] to sum, then array[1], and so on. Calculating sum therefore takes about 256 addition times, and that could be the problem. Unrolling the loop may not be possible because of the dependency on sum.

A faster approach is to add the 256 elements of the array in pairs to make array_2[128], then add those pair-wise again into array_3[64], and so on.

That will take only 8 addition times, but will use up to 128 adders in parallel. It's always "do you want it fast or cheap (small)?"
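The pairwise scheme above can be sketched in plain C as follows. The name `tree_sum` and the single reused buffer `level` are assumptions for illustration; the description above uses a separate array per level. The sketch only shows the order of the additions (8 passes for 256 elements). Getting the tool to actually build the adders of one pass in parallel depends on unroll and array-partition directives that are not shown here.

```c
#define N 256

/* Illustrative pairwise (tree) reduction: 256 -> 128 -> ... -> 1. */
float tree_sum(const unsigned char in[N])
{
    float level[N];

    for (int i = 0; i < N; i++)
        level[i] = in[i];

    /* Each pass halves the number of live values; log2(256) = 8 passes. */
    for (int width = N / 2; width >= 1; width /= 2) {
        for (int i = 0; i < width; i++) {
            /* Entry i is written only after entries 2i and 2i+1 are read,
               and later pairs in this pass sit at higher indices. */
            level[i] = level[2 * i] + level[2 * i + 1];
        }
    }
    return level[0];
}
```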

If that extreme solution is too big, you can combine shorter loops with a smaller number of adders until you find a combination you are happy with.
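One possible middle ground, sketched under assumptions: keep a few partial sums and rotate through them, so each accumulator is revisited only every `LANES` iterations, then add the partial sums in a short final loop. The names `lane_sum` and `LANES`, and the value 8, are illustrative. Whether II=1 is really reached depends on the float adder latency of the target and on how the tool maps the `part` array.

```c
#define N     256
#define LANES 8   /* assumed: at least the float adder latency in cycles */

/* Illustrative interleaved accumulation with LANES partial sums. */
float lane_sum(const unsigned char in[N])
{
    float part[LANES];

    for (int k = 0; k < LANES; k++)
        part[k] = 0.0f;

    /* Consecutive iterations update different accumulators. */
    for (int i = 0; i < N; i++) {
        part[i % LANES] += in[i];
    }

    /* Short final reduction over the partial sums. */
    float sum = 0.0f;
    for (int k = 0; k < LANES; k++)
        sum += part[k];

    return sum;
}
```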

That will improve the II. Obviously not the latency.

Teacher
477 Views
Registered: 05-11-2015

Yes, filling an array with values and reckoning its average is not of much use...

Let's assume it's for test purposes.

But another problem here is that @pouya_hwsw seems to be interested in the II, and loading the array also takes some time, so the reported figure will not measure the summing loop alone.

It would be better to pass my_array in as one of the function's inputs. The actual values are not needed for latency calculations.
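A minimal sketch of that interface, assuming the caller supplies the data (the name `myF_in` and the `N` constant are illustrative): only the summing loop is left inside the function, so the report covers that loop alone.

```c
#define N 256

/* Illustrative signature: my_array is now an input, not a local. */
void myF_in(const unsigned char my_array[N], float *ave)
{
    float sum = 0;
    loop2: for (int i = 0; i < N; i++)
    {
        sum += my_array[i];   /* same float accumulation as the original loop2 */
    }

    *ave = sum / N;
}
```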
