01-06-2017 08:10 AM
I am using Spartan 6 FPGA and ISE/EDK 14.7 on a Windows 7 64 bit machine.
In my software application, the .text section of my code has grown too big to fit in a single BRAM. I read the forums and found someone with a similar problem; however, their solution did not work for me.
I created 3 BRAMs of 64 KB each and made them contiguous in the address space. I am attaching the MHS, BMM and the linker script.
In the linker script I combined two BRAMs and then assigned the .text section to the combined space. However, this did not work for me. If the code is small enough to fit into a single BRAM, it works. It's only when it grows beyond 64 KB that it doesn't. However, I do not see any ELF size errors during compilation when I target the .text section at the combined memory space. Do I need to make changes to the BMM?
01-06-2017 08:45 AM
In your linker script, microblaze_1_i_bram_ctrl_microblaze_1_d_bram_ctrl is never used? In any case, I don't think you need to call out the different 64K segments as separate regions. To the linker they can all be treated as one 192K contiguous memory space, which will simplify your life.
A trick that can save some BRAM is to change how Xilinx drivers are compiled. Adding a directive like this to the PROCESSOR block in your .mss can help:
PARAMETER extra_compiler_flags = -DNDEBUG -g -ffunction-sections -fdata-sections
...then modify your program's link options to enable garbage collection (-Wl,--gc-sections). You may also want to enable garbage collection in your program's gcc and/or g++ options.
Including a barrel shifter in the Microblaze can also make a significant reduction in code size.
When I've built a 192K program I've found it necessary to lock down BRAMs via LOC constraints in order to ensure that the BMM maps correctly to BRAMs.
One way to debug this is to use XMD to see how memory is being mapped. If you 'dow' and run your program that way, does it work? If you download a bitstream in which 'data2mem' has been used to burn the program and then connect with XMD, do you see the data you expect in the 0x10000-0x1FFFF and 0x20000-0x2FFFF regions? Or does the data look scrambled or blank?
01-08-2017 07:59 PM
You should not need to modify the BMM by hand. Would it be possible to try a different optimization level when compiling the application?
You can also try using xil_printf instead of printf statements.
01-09-2017 07:17 AM
Thanks for the info, Steve. Much appreciated. Can you elaborate on the LOC comment? Do you mean I need to change this in the BMM file, or in the constraints file? Can you give me an example?
As you noticed, I was not using microblaze_1_bram. This was on purpose. My initial thought was to concatenate 128 KB of BRAM and keep the other 64 KB for other sections of the code. However, when I did this, the ELF compiled successfully but I saw weird behaviour from bitgen: the tool printed a memory dump, as shown below, and the bitstream did not work.
125D1770: B0 F6 9D 12 00 00 00 00 00 00 00 00 00 00 00 00 ................
125D1780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
125D1790: 01 00 00 00 00 00 00 00 ........
I will try the dow method as you suggest; however, I do need the bitstream to have the code loaded eventually.
01-09-2017 08:00 AM
The last example I have of this was with the 8.2 tools, because they weren't capable of generating a BMM that properly reflected the BRAMs selected during place & route.
At that time, to get an "odd" (non power-of-two) sized memory region to work, I would build the FPGA, use FPGA Editor to see which BRAMs the tools had selected (and to determine how the constraints needed to be named), and then update two files as follows: first the UCF, with one LOC constraint per BRAM, and then the BMM, with a matching PLACED clause per BRAM (bram_block instance names lmb_bram0, lmb_bram1, lmb_bram2, for 48 KiB total memory):
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_0" LOC=RAMB16_X1Y3;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_1" LOC=RAMB16_X1Y1;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_2" LOC=RAMB16_X0Y3;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_3" LOC=RAMB16_X0Y1;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_4" LOC=RAMB16_X1Y10;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_5" LOC=RAMB16_X0Y10;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_6" LOC=RAMB16_X0Y8;
INST "lmb_bram0/lmb_bram0/ramb16_s4_s4_7" LOC=RAMB16_X1Y8;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_0" LOC=RAMB16_X1Y4;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_1" LOC=RAMB16_X1Y0;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_2" LOC=RAMB16_X0Y5;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_3" LOC=RAMB16_X0Y2;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_4" LOC=RAMB16_X1Y11;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_5" LOC=RAMB16_X0Y11;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_6" LOC=RAMB16_X0Y7;
INST "lmb_bram1/lmb_bram1/ramb16_s4_s4_7" LOC=RAMB16_X1Y7;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_0" LOC=RAMB16_X1Y5;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_1" LOC=RAMB16_X1Y2;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_2" LOC=RAMB16_X0Y4;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_3" LOC=RAMB16_X0Y0;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_4" LOC=RAMB16_X1Y9;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_5" LOC=RAMB16_X0Y9;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_6" LOC=RAMB16_X0Y6;
INST "lmb_bram2/lmb_bram2/ramb16_s4_s4_7" LOC=RAMB16_X1Y6;
ADDRESS_MAP microblaze_0 MICROBLAZE 100
// Processor 'microblaze_0' address space 'lmb_bram0_combined' 0x00000000:0x0000BFFF (48 KB).
ADDRESS_SPACE lmb_bram0_combined COMBINED [0x00000000:0x0000BFFF]
// Address range 0x00000000:0x00003FFF (16 KB).
lmb_bram0/lmb_bram0/ramb16_s4_s4_0 [31:28] PLACED = X1Y3;
lmb_bram0/lmb_bram0/ramb16_s4_s4_1 [27:24] PLACED = X1Y1;
lmb_bram0/lmb_bram0/ramb16_s4_s4_2 [23:20] PLACED = X0Y3;
lmb_bram0/lmb_bram0/ramb16_s4_s4_3 [19:16] PLACED = X0Y1;
lmb_bram0/lmb_bram0/ramb16_s4_s4_4 [15:12] PLACED = X1Y10;
lmb_bram0/lmb_bram0/ramb16_s4_s4_5 [11:8] PLACED = X0Y10;
lmb_bram0/lmb_bram0/ramb16_s4_s4_6 [7:4] PLACED = X0Y8;
lmb_bram0/lmb_bram0/ramb16_s4_s4_7 [3:0] PLACED = X1Y8;
// Address range 0x00004000:0x00007FFF (16 KB).
lmb_bram1/lmb_bram1/ramb16_s4_s4_0 [31:28] PLACED = X1Y4;
lmb_bram1/lmb_bram1/ramb16_s4_s4_1 [27:24] PLACED = X1Y0;
lmb_bram1/lmb_bram1/ramb16_s4_s4_2 [23:20] PLACED = X0Y5;
lmb_bram1/lmb_bram1/ramb16_s4_s4_3 [19:16] PLACED = X0Y2;
lmb_bram1/lmb_bram1/ramb16_s4_s4_4 [15:12] PLACED = X1Y11;
lmb_bram1/lmb_bram1/ramb16_s4_s4_5 [11:8] PLACED = X0Y11;
lmb_bram1/lmb_bram1/ramb16_s4_s4_6 [7:4] PLACED = X0Y7;
lmb_bram1/lmb_bram1/ramb16_s4_s4_7 [3:0] PLACED = X1Y7;
// Address range 0x00008000:0x0000BFFF (16 KB).
lmb_bram2/lmb_bram2/ramb16_s4_s4_0 [31:28] PLACED = X1Y5;
lmb_bram2/lmb_bram2/ramb16_s4_s4_1 [27:24] PLACED = X1Y2;
lmb_bram2/lmb_bram2/ramb16_s4_s4_2 [23:20] PLACED = X0Y4;
lmb_bram2/lmb_bram2/ramb16_s4_s4_3 [19:16] PLACED = X0Y0;
lmb_bram2/lmb_bram2/ramb16_s4_s4_4 [15:12] PLACED = X1Y9;
lmb_bram2/lmb_bram2/ramb16_s4_s4_5 [11:8] PLACED = X0Y9;
lmb_bram2/lmb_bram2/ramb16_s4_s4_6 [7:4] PLACED = X0Y6;
lmb_bram2/lmb_bram2/ramb16_s4_s4_7 [3:0] PLACED = X1Y6;
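Keeping the UCF and the BMM in step by hand is error-prone, so here is a minimal sketch of deriving the LOC lines from the PLACED clauses. It is only an illustration: the function name bmm_to_ucf and the site_prefix parameter are hypothetical, and it assumes the BMM lines look exactly like the ones above and that your device uses the same RAMB16_XnYn site naming.

```python
import re

# Matches a BMM bit-lane line such as:
#   lmb_bram0/lmb_bram0/ramb16_s4_s4_0 [31:28] PLACED = X1Y3;
PLACED_RE = re.compile(
    r"^\s*(\S+)\s+\[\d+:\d+\]\s+PLACED\s*=\s*(X\d+Y\d+)\s*;"
)


def bmm_to_ucf(bmm_text, site_prefix="RAMB16"):
    """Return one UCF LOC constraint per PLACED line found in bmm_text.

    site_prefix is an assumption: check the site names your device
    uses in FPGA Editor before relying on the default.
    """
    constraints = []
    for line in bmm_text.splitlines():
        match = PLACED_RE.match(line)
        if match:
            inst, site = match.groups()
            constraints.append(f'INST "{inst}" LOC={site_prefix}_{site};')
    return constraints


# Two lines taken from the BMM above, plus a comment line that must be skipped.
sample = """\
lmb_bram0/lmb_bram0/ramb16_s4_s4_0 [31:28] PLACED = X1Y3;
// Address range 0x00004000:0x00007FFF (16 KB).
lmb_bram1/lmb_bram1/ramb16_s4_s4_7 [3:0] PLACED = X1Y7;
"""
ucf_lines = bmm_to_ucf(sample)
print("\n".join(ucf_lines))
```

Paste the printed lines into the UCF and rebuild; the point is simply that every PLACED clause in the BMM has a LOC constraint naming the same site, so place & route cannot move a BRAM out from under the BMM.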
01-09-2017 10:57 AM
Thanks, Steve. Using the -DNDEBUG switch reduced the size enough that I could fit the .text section into a single 64 KB BRAM. You had stated that I could treat the RAM as one contiguous 192 KB space. I am guessing that is done through the linker script?
On your UCF comment, I am wondering whether this is needed, as the BMM file already has these entries, albeit without location constraints. I don't see why the LOC constraints are needed for the memory to be seen as contiguous. Maybe I missed something?
01-09-2017 11:07 AM
Sorry - apples and oranges.
The UCF LOC constraints and BMM "PLACED" clauses are needed if the tools don't automatically generate a BMM that can be used by data2mem to load data into the proper BRAMs (so that you see what you expect if you use XMD 'mrd' / 'dis' commands to examine memory in a system that has booted the merged bitstream). Post back when you determine whether this is the case - for future reference, I'd like to know whether the tools have evolved enough in the last 10 years to make working with odd-sized LMB memory less painful.
The memory segments always form a contiguous memory space as far as the processor and the linker script are concerned. So you could edit your linker script to make microblaze_0_i_bram_ctrl_microblaze_0_d_bram_ctrl span the entire 192 KiB (and maybe have a snappier name :), eliminate the microblaze_1_... and microblaze_2_... definitions, and target all the sections to the 192K segment.
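To make the arithmetic concrete, here is a rough sketch that merges back-to-back segments into the single region you would declare in the linker script. The helper name merge_contiguous and the region name bram are made up for illustration, and the base addresses assume the three 64 KB BRAMs sit at 0x00000, 0x10000 and 0x20000.

```python
def merge_contiguous(segments):
    """Merge (base, size) segments that sit back to back into single regions."""
    merged = []
    for base, size in sorted(segments):
        if merged and merged[-1][0] + merged[-1][1] == base:
            # This segment starts exactly where the previous one ends.
            prev_base, prev_size = merged[-1]
            merged[-1] = (prev_base, prev_size + size)
        else:
            merged.append((base, size))
    return merged


KB = 1024
# Assumed layout: three 64 KB BRAM segments, one after another.
segments = [(0x00000, 64 * KB), (0x10000, 64 * KB), (0x20000, 64 * KB)]
regions = merge_contiguous(segments)

for base, size in regions:
    # "bram" is a hypothetical region name for the linker script MEMORY block.
    print(f"bram : ORIGIN = 0x{base:08X}, LENGTH = 0x{size:08X}")
```

If your existing script starts the first region a little above address 0 to leave room for the vectors, keep that origin and shrink the length by the same amount rather than copying the printed values verbatim.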