Visitor
Registered: 04-25-2010

## Please help me reduce my program size - so I can get more bss ( stack & heap )

Hi,

Since I'm on a (university) deadline, I would REALLY appreciate some help with reducing my program size and making it faster. I'm doing some complicated matrix computations, such as singular value decomposition, on matrices with several hundred float values on a MicroBlaze (XUPV5-LX110T board). It took me quite some time to produce working code in C, so I am under pressure to get it running on the MicroBlaze (the next person will then build some accelerators in VHDL to speed things up).

The program looks somewhat like this:

```c
#include "xparameters.h"
#include <stdlib.h>
#include <stdio.h>
#include "xutil.h"

/* Print a float matrix with xil_printf (which has no %f).
 * limit != 0 restricts the output to at most limit rows and columns.
 * target is currently unused. */
void mbprintfloatMat(float* Mat, int mRow, int nCol, const char* titel, int target, int limit) {
    int Rowlim = mRow;
    int Collim = nCol;
    int i, j, prepoint, postpoint;
    float floatvalue;

    (void)target;

    if (limit != 0) {            /* limit == 0: no limit */
        if (limit < mRow)
            Rowlim = limit;
        if (limit < nCol)
            Collim = limit;
    }

    xil_printf("\r\nPrint: %s\r\n", titel);
    for (i = 0; i < Rowlim; i++) {
        for (j = 0; j < Collim; j++) {
            floatvalue = Mat[i*nCol + j];
            /* print the sign separately so that e.g. -0.5 is not shown as 0.5 */
            if (floatvalue < 0.0f) {
                xil_printf("-");
                floatvalue = -floatvalue;
            } else {
                xil_printf(" ");
            }
            prepoint = (int)floatvalue;
            postpoint = (int)((floatvalue - prepoint) * 1000000.0f);
            xil_printf("%d.%06d ", prepoint, postpoint);   /* %06d keeps leading zeros */
        }
        xil_printf("\r\n");
    }
}

/* ResMat (m1 x n2) = Mat1 (m1 x n1) * Mat2 (n1 x n2), all row-major.
 * The inputs are copied first so that ResMat may be the same buffer as Mat1. */
void mbMatMul(float* Mat1, float* Mat2, float* ResMat, int m1, int n1, int n2) {
    int m2 = n1;
    float* temp_Mat1 = (float*)malloc(m1 * n1 * sizeof(float));
    float* temp_Mat2 = (float*)malloc(m2 * n2 * sizeof(float));
    float akku;
    int i, j, n, m, index;

    if (temp_Mat1 == NULL || temp_Mat2 == NULL) {
        xil_printf("mbMatMul: out of heap\r\n");
        free(temp_Mat1);   /* free(NULL) is a no-op */
        free(temp_Mat2);
        return;
    }

    for (i = 0; i < m1*n1; i++)
        temp_Mat1[i] = Mat1[i];
    for (j = 0; j < m2*n2; j++)
        temp_Mat2[j] = Mat2[j];

    for (n = 0; n < n2; n++) {
        for (m = 0; m < m1; m++) {
            akku = 0.0f;
            for (index = 0; index < n1; index++)
                akku = akku + temp_Mat1[m*n1 + index] * temp_Mat2[index*n2 + n];
            ResMat[m*n2 + n] = akku;
        }
    }

    free(temp_Mat1);
    free(temp_Mat2);
}

int main(void) {
    int m1 = 12;
    int n1 = 3;
    int m2 = n1;
    int n2 = 4;
    int i;
    float *Mat1, *Mat2, *ResMat;

    print("-- Entering Main --\r\n");

    Mat1 = (float*)malloc(m1 * n1 * sizeof(float));
    Mat2 = (float*)malloc(m2 * n2 * sizeof(float));
    ResMat = (float*)malloc(m1 * n2 * sizeof(float));
    if (Mat1 == NULL || Mat2 == NULL || ResMat == NULL) {
        print("malloc failed\r\n");
        free(Mat1);
        free(Mat2);
        free(ResMat);
        return 1;
    }

    for (i = 0; i < m1*n1; i++)
        Mat1[i] = i / 5.0f;
    for (i = 0; i < m2*n2; i++)    /* m2*n2, not m1*n1: Mat2 has only 12 entries */
        Mat2[i] = i * 1.4f;

    mbprintfloatMat(Mat1, m1, n1, "Mat1", 0, 0);
    mbprintfloatMat(Mat2, m2, n2, "Mat2", 0, 0);

    mbMatMul(Mat1, Mat2, ResMat, m1, n1, n2);

    mbprintfloatMat(ResMat, m1, n2, "ResMat", 0, 0);

    print("-- Leaving Main --\r\n");
    free(Mat1);
    free(Mat2);
    free(ResMat);

    return 0;
}
```

There is a lot more where this came from :-)

My memory usage looks like this:

| text | data | bss | dec | hex | filename |
|---|---|---|---|---|---|
| 49260 | 1340 | 12388 | 62988 | f60c | TestApp_Memory/executable.elf |

That is with compiler optimisation for size (-Os).

There is almost no space left in my 64K of BRAM, but the matrix calculations need a lot more stack and heap (about 7K of stack and 27K of heap). I suspect the .text section is so large because I do everything in C instead of using more of the MicroBlaze's built-in features.

I need urgent help getting my program to run on this MicroBlaze. Some possibly viable options:

1) Increasing BRAM. I saw something about adding an extra controller and BRAM. How do I do that, and is it possible on this board?

2) Reducing the .text size. Examples please!

3) Anything else I could do better.

4) In my real code I use the math.h functions fabs, powf, logf and sqrtf. I know that's bad, and maybe I will use e.g. lookup tables instead. Could this make a crucial difference to the .text size?

Visitor
Registered: 04-25-2010

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

OK, thanks to the HOWTO on increasing BRAM, my program is now running :-) But even 64K of heap seems to be too little to run at full specs, although my raw data should never exceed 30K. Also, I can run my program only once. Strange!

Is it possible that free() doesn't work properly? Do I have to set back the heap pointer in a different way?

And yes, help regarding how to improve my code is still very welcome!

Message Edited by chriskit on 04-25-2010 08:04 PM
Explorer
Registered: 07-27-2009

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

Just a few quickies:

• Have you used -O2 or something equivalent? This can considerably reduce code size and increase speed.
• Your matrix multiply function starts with a malloc plus a memory copy. There is no need for that: the source matrices remain untouched, and the target is better malloc'ed outside the function.
• Try to help the compiler by keeping the loop-invariant parts of the matrix index calculations outside the loops.

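For example, instead of

```c
for (i = 0; i < rows; i++)
    for (j = 0; j < c; j++)
        res = input[i*c + j];
```

write

```c
tmp = 0;
for (i = 0; i < rows; i++) {
    tmp2 = tmp;          /* index of input[i][0] */
    for (j = 0; j < c; j++) {
        res = input[tmp2];
        tmp2++;
    }
    tmp += c;            /* advance to the next row */
}
```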
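A minimal sketch combining the second and third suggestions: the multiply below allocates nothing itself and walks rows and columns with running pointers instead of recomputing `m*n1 + index` in the inner loop. The name `mbMatMulNoCopy` is hypothetical; the caller must allocate the result, and the result must not overlap either input (unlike the original `mbMatMul`, which copies its inputs to allow that).

```c
/* Hypothetical: C = A * B, with A m1 x n1, B n1 x n2, C m1 x n2, all
 * row-major. No malloc, no copies. Assumes C does NOT overlap A or B. */
void mbMatMulNoCopy(const float *A, const float *B, float *C,
                    int m1, int n1, int n2)
{
    int m, n, k;
    const float *rowA = A;   /* always points at A[m*n1] */
    float *rowC = C;         /* always points at C[m*n2] */

    for (m = 0; m < m1; m++) {
        for (n = 0; n < n2; n++) {
            const float *pB = B + n;   /* walks down column n of B */
            float akku = 0.0f;
            for (k = 0; k < n1; k++) {
                akku += rowA[k] * *pB;
                pB += n2;              /* next row of B, same column */
            }
            rowC[n] = akku;
        }
        rowA += n1;                    /* hoisted row offsets */
        rowC += n2;
    }
}
```

With the buffers from the original `main`, `mbMatMulNoCopy(Mat1, Mat2, ResMat, m1, n1, n2);` should give the same `ResMat` as `mbMatMul`, without the two temporary heap blocks.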

printf and float are both very good at increasing code size. Maybe you can dump the results as binary, or as hex of the raw memory contents? Assuming your output goes to a UART or something similar, use code on the host PC to visualize the results.
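To illustrate the raw-dump idea, here is a sketch that formats one matrix row as the hex bit patterns of its floats, so no float formatting code needs to be linked in. The function name `format_row_hex` and the output format are assumptions; the resulting string could then be sent with `print()`, and a small script on the host would convert each 8-digit hex word back into an IEEE-754 float.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: formats one row of floats as their raw IEEE-754
 * bit patterns in hex ("3F800000 ...") followed by "\r\n".
 * out must hold at least cols * 9 + 3 chars. Returns the string length. */
int format_row_hex(const float *row, int cols, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    char *p = out;
    int j, shift;

    for (j = 0; j < cols; j++) {
        uint32_t bits;
        memcpy(&bits, &row[j], sizeof bits);   /* reinterpret float bits */
        for (shift = 28; shift >= 0; shift -= 4)
            *p++ = hex[(bits >> shift) & 0xFu];
        *p++ = ' ';
    }
    *p++ = '\r';
    *p++ = '\n';
    *p = '\0';
    return (int)(p - out);
}
```

For a matrix, call it once per row (e.g. on `&ResMat[i*n2]` with `n2` columns) and send each buffer to the UART.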

Cheers,

Johan

Contributor
Registered: 02-12-2009

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

The quickest way is to add more BRAM (of course). I assign code and data to two different BRAM blocks. Just add another BRAM block to your hardware design in EDK and use Generate custom linker script in SDK; then you can assign all code to one BRAM block and all data to the other.

Have you enabled the hardware floating point unit on the MicroBlaze? That will both make your code faster and save code space, because the software floating point library is used much less. As far as I remember, if you set it to extended, it also includes fsqrt and similar instructions.

You could try the compiler option -ffast-math. It relaxes the IEEE floating point rules a bit, but makes the code faster and smaller.

Depending on how much of the hardware you use, you could rewrite some of the Xilinx drivers to include only the features you need. The Xilinx drivers were too bloated for my application, so I rewrote them.

Last, you could recompile the standard C library (newlib) that Xilinx uses, but only do that if you really need the space.

Visitor
Registered: 04-25-2010

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

Thanks a lot for the help woutersj and nfogh!

- I tried different compiler optimisations, and I get the smallest code size with -Os (no surprise).

- I copy the matrices at the beginning of MatMult because I actually have cases where the result overwrites an input, e.g. matMult(Mat1, Mat2, Mat1, 3,3,3); but of course I can create a temporary matrix outside the function for these cases.

- Thanks for the example. Keeping the matrix index calculations outside the loop could speed things up :-) I will do that.

- The xil_printf() calls are only for debugging. The results of the calculations will be written to DDR.

- I attached another BRAM block and added 2 controllers. I increased ilmb and dlmb to the full range. Would a different configuration (e.g. only one extra controller, for data) make more sense speed-wise?

- I enabled the FPU, but I have to check whether the sqrt engine is also on. Do you know how many resources the FPU takes in terms of BRAM and multipliers, with and without the sqrt extensions?

- I will try the compiler option -ffast-math and see whether the precision stays acceptable.

- Rewriting the libraries sounds like a good idea, but it probably takes time that I don't have!
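For the in-place case mentioned above (`matMult(Mat1, Mat2, Mat1, 3,3,3)`), a full copy of both inputs is not strictly necessary. Row i of the result depends only on row i of `Mat1` and on all of `Mat2`, so when the result replaces `Mat1`, a scratch buffer of a single row is enough. The sketch below assumes `Mat2` is square (n x n, so the result has the shape of `Mat1`) and does not overlap `Mat1`; the function name and the caller-supplied `rowScratch` buffer are hypothetical.

```c
#include <string.h>

/* Hypothetical: Mat1 = Mat1 * Mat2 in place. Mat1 is m1 x n, Mat2 is n x n,
 * row-major. rowScratch must hold n floats. Assumes Mat2 does not overlap
 * Mat1. Only one row of scratch is needed instead of copies of both inputs. */
void mbMatMulInPlace(float *Mat1, const float *Mat2, int m1, int n,
                     float *rowScratch)
{
    int i, j, k;
    for (i = 0; i < m1; i++) {
        float *row = Mat1 + i * n;
        for (j = 0; j < n; j++) {
            float akku = 0.0f;
            for (k = 0; k < n; k++)
                akku += row[k] * Mat2[k * n + j];
            rowScratch[j] = akku;
        }
        /* row i is no longer needed as input, so overwrite it now */
        memcpy(row, rowScratch, n * sizeof(float));
    }
}
```

The scratch row can be a small static array sized for the largest `n` you use, which avoids malloc entirely.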

My biggest problem right now is the huge memory needs of the system. I calculated the memory needs by hand: e.g. for a 300x9 float matrix I get 300 x 9 x sizeof(float) = 10,800 bytes (10.8 KB), and with the current specs I should need less than 20 KB of heap. But in reality it is close to 60 KB, and if I try to run more than once (putting the function call in a for loop), the system crashes. I strongly suspect I'm doing something wrong when using free(). Or do you think that shouldn't be a problem?

If I don't find a solution soon I will resort to using arrays of predefined sizes and see how far I can go with that.

I'm also looking for a way to approximate log x / log y.

Thank you,

Chris
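On approximating log x / log y: the base cancels in the ratio, so any cheap log2 approximation works. One common trick (not necessarily the approximation used later in this thread) splits the float into exponent and mantissa via its bit pattern and approximates log2 of the mantissa with a small polynomial. The sketch below uses the quadratic p(m) = (-m/3 + 2)m - 5/3, which is exact at m = 1 and m = 2; its absolute error in log2 is roughly 0.01 at most. The function names are hypothetical, and the code assumes positive, normal (non-denormal) inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cheap log2 for positive, normal floats:
 * x = 2^e * m with m in [1, 2); log2(x) = e + log2(m), and log2(m) is
 * approximated by a quadratic. No checks for x <= 0, denormals, inf, NaN. */
float approx_log2f(float x)
{
    uint32_t bits;
    int e;
    float m;

    memcpy(&bits, &x, sizeof bits);
    e = (int)((bits >> 23) & 0xFFu) - 127;      /* unbiased exponent */
    bits = (bits & 0x007FFFFFu) | 0x3F800000u;  /* exponent 0 -> m in [1,2) */
    memcpy(&m, &bits, sizeof m);

    return (float)e + ((-1.0f / 3.0f) * m + 2.0f) * m - 5.0f / 3.0f;
}

/* log(x) / log(y) in any base equals log2(x) / log2(y).
 * Caller must ensure y != 1. */
float approx_log_ratio(float x, float y)
{
    return approx_log2f(x) / approx_log2f(y);
}
```

Note that the relative error of the ratio grows as y approaches 1, because log2(y) approaches 0 there.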

Contributor
Registered: 02-12-2009

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

I'm not sure what your application is, but if you are afraid of malloc and free, you could just declare the variables in the code, like

```c
#define M1 12   /* at file scope the sizes must be compile-time constants */
#define N1 3
#define M2 N1
#define N2 4

float Mat1[M1][N1];
float Mat2[M2][N2];

int main(void)
{ ...
```

and the same for the function:

```c
#define MAX_ENTRIES 1024   /* example size; must cover the largest matrix you pass in */

float temp_mat[MAX_ENTRIES];

void mbMatMul(...
```

Just ensure that temp_mat is big enough to hold what you throw at the function. It's not pretty, but it works :)

The reason I declared them outside the functions is that arrays declared inside a function are allocated on the stack, and for a large MAX_ENTRIES that would probably cause stack overflow problems. At file scope they are placed in .bss instead.

Many of the libraries are pretty easy to rewrite. But if you don't use many peripherals, it is probably not worth the effort. I have some code for uart16550, mbox, spi and timers if you are using any of them.

I have a version of the GNU toolchain, which has been compiled with newlib with the '-Os' optimization setting. It scraped a few KBs off my code size. It is for Ubuntu Linux though. And you cannot debug with it at all (gdb doesn't work).

I think the best solution would be to make a separate BRAM block for your heap, and then generate a linker script that puts your heap into that BRAM block. If you have enough memory for it, it is by far the easiest solution.

Visitor
Registered: 04-25-2010

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

Of course my memory usage will be even larger if I go for arrays of pre-defined sizes. My real concern is that I cannot call my main function twice, because it runs out of memory the second time. Since all the malloc()s happen inside the main function, the reason could be either leaks (I checked for those) or that free is not working properly or some other problem I haven't thought of yet.

If someone could tell me: "I use malloc on MicroBlaze the same way you do:

```c
float* Mat = (float*)malloc(m * n * sizeof(float));
// do something
free(Mat);
```

and it works for me!", then I could rule out one possible reason for running out of memory.
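One way to settle the "is free() working?" question is to count allocations. The sketch below wraps malloc() and free() in hypothetical `dbg_malloc()` / `dbg_free()` helpers that track how many blocks are live; if the count is not back to zero after one complete run of the computation, some free() is missing or never executed. It counts blocks rather than bytes to stay small, and assumes single-threaded standalone code.

```c
#include <stdlib.h>

/* Hypothetical debug wrappers: count currently allocated blocks.
 * Not thread-safe; intended for single-threaded standalone code. */
static int g_live_blocks = 0;

void *dbg_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        g_live_blocks++;       /* only count successful allocations */
    return p;
}

void dbg_free(void *p)
{
    if (p != NULL) {           /* free(NULL) is legal but must not be counted */
        g_live_blocks--;
        free(p);
    }
}

int dbg_live_blocks(void)
{
    return g_live_blocks;
}
```

After each run, something like `xil_printf("live blocks: %d\r\n", dbg_live_blocks());` should print 0; a free() that sits in a branch or loop that never executes shows up as a count that keeps growing from run to run.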

Explorer
Registered: 07-27-2009

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

Did you have a look at the linker scripts? Maybe the heap/stack settings are incorrect.

I guess you can be pretty sure that malloc/free work if you have set them up correctly.

Cheers,

Johan

Explorer
Registered: 07-27-2009

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

Forgot to mention: use objdump to inspect your object for large data objects and big functions.

Visitor
Registered: 04-25-2010

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

I don't even get that far!

I rewrote the whole program using non-dynamic fixed-size arrays and now I get:

`region ilmb_cntlr_dlmb_cntlr is full (TestApp_Memory/executable.elf section .stack)`

So I reduced the size of my arrays like this:

```c
#define MAXPOINTS 100

int main()
{
    float array[4*MAXPOINTS]; // size 4*100*4 bytes = 1600 bytes
    // do something

    return 0;
}
```

Calculated this way, I should be using about 20 KB of memory now. I have increased the stack size to 64 KB (0xFFFF bytes):

```
_STACK_SIZE = DEFINED(_STACK_SIZE) ? _STACK_SIZE : 0xFFFF;
_HEAP_SIZE = DEFINED(_HEAP_SIZE) ? _HEAP_SIZE : 0x1000;

/* Define Memories in the system */

MEMORY
{
   ilmb_cntlr_dlmb_cntlr : ORIGIN = 0x00000050, LENGTH = 0x0001FFB0
}
```

but it still refuses to build my .elf

`tools/xilinx/ISE_EDK/10.1/EDK/gnu/microblaze/lin64/bin/../lib/gcc/microblaze-xilinx-elf/4.1.1/../../../../microblaze-xilinx-elf/bin/ld.real: region ilmb_cntlr_dlmb_cntlr is full (TestApp_Memory/executable.elf section .stack)`

What am I doing wrong?

Contributor
Registered: 02-12-2009

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

The linker is complaining that there is not enough BRAM left to place the stack. You are asking for 0xFFFF bytes (about 64 KB) of stack, which is a whole BRAM block on its own. If you decrease your stack size, it might fit.

Could you try the following:

1. Attach 2 BRAM blocks to your microblaze, 64KB each

2. Go to SDK, synchronize with hardware and click Generate linker script

3. Under "assign all code sections to", select the first BRAM block

4. Under "assign all data sections to", select the second BRAM block

5. Set the stack size and heap size to a suitable amount (enough to hold your data)

This should give you 64K for just data, and 64K for code (.text). Ensure that you have enough heap/stack space for your variables if you allocate them dynamically.
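To choose a "suitable amount" of stack in step 5 from measurement rather than guesswork, one option is stack painting: fill the unused stack with a known pattern at startup, run the computation, then count how many pattern words at the far end were never overwritten. The sketch below shows the two helpers on a generic word range; the pattern value and function names are hypothetical. On the board, the range would come from the stack symbols in the generated linker script (for example `_stack_end` for the low end; check your lscript.ld for the actual names), and you would paint only up to a little below the current stack pointer so the frames in use are not overwritten.

```c
#include <stdint.h>

/* Hypothetical stack-painting helpers. [lo, hi) is the stack region in
 * 32-bit words; the stack is assumed to grow downward from hi toward lo. */
#define STACK_PAINT 0xDEADBEEFu

/* Fill [lo, hi) with the pattern. Must not cover frames currently in use. */
void stack_paint(uint32_t *lo, uint32_t *hi)
{
    uint32_t *p;
    for (p = lo; p < hi; p++)
        *p = STACK_PAINT;
}

/* Bytes at the bottom of [lo, hi) that still hold the pattern, i.e. the
 * headroom left at the deepest point the stack reached. */
unsigned long stack_unused_bytes(const uint32_t *lo, const uint32_t *hi)
{
    const uint32_t *p = lo;
    while (p < hi && *p == STACK_PAINT)
        p++;
    return (unsigned long)(p - lo) * sizeof(uint32_t);
}
```

Words that a function reserves but never writes still look unused, so treat the result as an estimate and keep a safety margin when setting `_STACK_SIZE`.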

Visitor
Registered: 04-25-2010

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

Mea maxima culpa!

3 things I did wrong:

- One of my free() calls was in a loop that was never executed :-( After putting it where it belongs, malloc() and free() work like a charm.

- I automatically replaced all printf calls with xil_printf, but I missed one. It bloated my code size enormously.

- In one place I was calling logf(); this too bloated the code and slowed my program to a crawl. I replaced it with an approximation.

Now that these 3 things are fixed, my program runs repeatedly without running out of memory, and much faster. Thanks to all of you who helped!

Next questions:

- How fast is rand()?

- How do I set up profiling? I just want to record cycles and save them to an array. Is there a way to do it on the MicroBlaze without adding a hardware timer (XPS timer)?

- If not, is there a complete guide somewhere on how to set this up?

Visitor
Registered: 04-25-2010

## Re: Please help me reduce my program size - so I can get more bss ( stack & heap )

After spending way too much time trying to figure out all the steps needed to add an xps_timer for profiling, here is a mini-tutorial (I'm using EDK 10.1.03). I don't know if all of it is necessary, but it works:

- Add the xps_timer by double-clicking XPS Timer/Counter under IP Catalog > DMA and Timer

- System Assembly View > Bus Interfaces > xps_timer_0 > SPLB = mb_plb

- System Assembly View > Addresses > xps_timer_0 > Size = 64K (on mb_plb)

- System Assembly View > Addresses > lock all the addresses that must not change

- Open Software Platform Settings > OS and Libs > set enable_software_intrusive_profiling to true and profile_timer to xps_timer_0

- Open Software Platform Settings > Drivers > set the driver for xps_timer to tmrctr

- Project > double-click the MHS file and add the missing parameters to xps_timer (Interrupt, C_COUNT_WIDTH, etc.), and connect the timer interrupt to the MicroBlaze INTERRUPT port. Make it look like this:

```
BEGIN xps_timer
 PARAMETER INSTANCE = xps_timer_0
 PARAMETER C_FAMILY = virtex5
 PARAMETER C_COUNT_WIDTH = 32
 PARAMETER HW_VER = 1.00.a
 BUS_INTERFACE SPLB = mb_plb
 PORT Interrupt = xps_timer_0_Interrupt
END

BEGIN microblaze
 PARAMETER INSTANCE = microblaze_0
 PARAMETER HW_VER = 7.10.d
 PARAMETER C_USE_FPU = 2
 PARAMETER C_DEBUG_ENABLED = 1
 PARAMETER C_FAMILY = virtex5
 PARAMETER C_INSTANCE = microblaze_0
 BUS_INTERFACE DPLB = mb_plb
 BUS_INTERFACE IPLB = mb_plb
 BUS_INTERFACE DEBUG = microblaze_0_dbg
 BUS_INTERFACE DLMB = dlmb
 BUS_INTERFACE ILMB = ilmb
 PORT MB_RESET = mb_reset
 PORT INTERRUPT = xps_timer_0_Interrupt
END
```

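- Applications > Set Compiler Options > Paths and Options > Other Compiler Options to Append: add -pg

- Add to main.c: #include "xtmrctr_l.h"

- Add to the function you would like to profile:

```c
unsigned long tic1, tic2, dur;

// Measure: read the timer counter into tic1 here
// do something
// read the timer counter into tic2 here

// Result in ms; assumes tic1 is read first and the timer counts up.
// Note: (tic2 - tic1) * 1000 overflows 32 bits after about 4.3 million ticks.
dur = (tic2 - tic1) * 1000 / XPAR_CPU_CORE_CLOCK_FREQ_HZ;
```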

- Synthesize (BRAM INIT)

Actually, I don't know whether you have to divide by XPAR_CPU_CORE_CLOCK_FREQ_HZ or XPAR_MICROBLAZE_CORE_CLOCK_FREQ_HZ.

Maybe one of the gurus can explain which one applies.
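The measurement part of the recipe above can be sketched as a few helpers: unsigned subtraction for the elapsed ticks (which stays correct across one counter wrap-around), a 64-bit conversion to microseconds that avoids the 32-bit overflow of `ticks * 1000`, and a fixed array of samples. Names and the array size are hypothetical. It assumes the timer has been started as a free-running up-counter; on the board, the two tick values would be read from the counter register via the accessor macro in xtmrctr_l.h (`XTmrCtr_mGetTimerCounterReg()` in older driver versions, `XTmrCtr_GetTimerCounterReg()` in newer ones; check your header). As far as I know, the xps_timer counts PLB bus clock cycles, so the frequency to divide by is the PLB clock; in many base designs this equals the MicroBlaze clock.

```c
#include <stdint.h>

/* Hypothetical profiling helpers for a free-running 32-bit up-counter. */
#define MAX_SAMPLES 32

static uint32_t g_samples[MAX_SAMPLES];
static int g_nsamples = 0;

/* Unsigned subtraction is correct even if the counter wrapped once. */
uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks to microseconds in 64 bits, so the intermediate
 * product cannot overflow as a 32-bit (ticks * 1000) would. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t clk_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / clk_hz);
}

/* Store one measurement; silently drops samples once the array is full. */
void record_sample(uint32_t ticks)
{
    if (g_nsamples < MAX_SAMPLES)
        g_samples[g_nsamples++] = ticks;
}
```

Usage would be along the lines of `t0 = read_ticks(); mbMatMul(...); t1 = read_ticks(); record_sample(ticks_elapsed(t0, t1));`, where `read_ticks()` is a hypothetical wrapper around that macro. Storing raw ticks and converting later keeps the timed path short; note that the 64-bit division in `ticks_to_us()` pulls a libgcc helper into .text.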