vytautas
Explorer
Explorer
8,654 Views
Registered: ‎10-01-2007

FPlibrary and MicroBlaze


Hello.

For several days I have been working on a MicroBlaze + FPlibrary project. My goal is to create an FSL pcore that executes arithmetic operations for MicroBlaze. I am trying the log operation first.

I have downloaded FPlibrary. All functions in this library use its own FP format (exn[2] + S[1] + E[8] + F[23] = 34 bits). A traditional PC or MicroBlaze uses the IEEE 754 standard (without the exn[2] bits). FPlibrary includes two conversion functions, IEEE754_To_FP and FP_To_IEEE754, so my project now looks like this: FSL -> IEEE754_To_FP -> LOG -> FP_To_IEEE754 -> FSL. Unfortunately, the value I read back on MicroBlaze is always -inf. I don't know why, but in one test the value was correct, and in a second test without any changes it was -inf again. I think this is a synchronization problem, because the FPlibrary functions do not have signals such as "new data" and "ready output".
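
For readers unfamiliar with the 34-bit layout, here is a minimal VHDL sketch of how an IEEE 754 single could be tagged with the two exn bits. It is an illustration only and not the library's IEEE754_To_FP code: the package and function names are made up, the exn encoding (00 zero, 01 normal, 10 infinity, 11 NaN) is what the FPLibrary documentation appears to describe and should be checked against the downloaded sources, and flushing subnormals to zero is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: shows the exn(2) & S(1) & E(8) & F(23) layout only.
package fp_format_sketch is
  function ieee_to_fp34(x : std_logic_vector(31 downto 0))
    return std_logic_vector;
end package;

package body fp_format_sketch is
  function ieee_to_fp34(x : std_logic_vector(31 downto 0))
    return std_logic_vector is
    constant ZERO_E : std_logic_vector(7 downto 0)  := (others => '0');
    constant ONES_E : std_logic_vector(7 downto 0)  := (others => '1');
    constant ZERO_F : std_logic_vector(22 downto 0) := (others => '0');
    variable exn    : std_logic_vector(1 downto 0);
  begin
    if x(30 downto 23) = ZERO_E then
      exn := "00";              -- zero (subnormals flushed to zero: assumption)
    elsif x(30 downto 23) = ONES_E then
      if x(22 downto 0) = ZERO_F then
        exn := "10";            -- infinity
      else
        exn := "11";            -- NaN
      end if;
    else
      exn := "01";              -- normal number
    end if;
    return exn & x;             -- 34 bits: exn, then S, E, F unchanged
  end function;
end package body;
```

With this layout a zero operand is a legal input whose logarithm is -inf, so in simulation it may be worth checking whether the LOG block is sampling a zero or not-yet-captured word.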

 

I attach all the VHDL files of this project. The top file is top_log.vhd. The FSM is very simple, and the Calculate state is used just for delay...

pkd_ieee754_log.vhd includes the 3 required components.

What would you advise? How can I synchronize the FSL and this pcore?

Thank you!

Best Regards,
Vytautas
0 Kudos
1 Solution

Accepted Solutions
vytautas
Explorer
Explorer
5,993 Views
Registered: ‎10-01-2007

The irregular resource utilization of my project is solved. The main problem was signal arrays of size 32x256. So my suggestion is: do not use arrays, use a FIFO instead.
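
The array code is not shown in the thread, so the following is only a guess at what helps. 32 x 256 is 8192 bits, which is close to the roughly 8000 unexplained flip flops discussed in the replies, so it seems plausible that the arrays were built from registers. A FIFO core avoids that. Another option, assuming one write and one read per clock are enough, is to describe the array so that synthesis can map it to block RAM. The entity and port names below are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 256 x 32 buffer written in the usual RAM-inference style:
-- one write port, one registered read port, no reset on the array.
entity sample_buffer is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  unsigned(7 downto 0);
    wdata : in  std_logic_vector(31 downto 0);
    raddr : in  unsigned(7 downto 0);
    rdata : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of sample_buffer is
  type ram_t is array (0 to 255) of std_logic_vector(31 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(waddr)) <= wdata;   -- single indexed write per clock
      end if;
      rdata <= ram(to_integer(raddr));     -- synchronous (registered) read
    end if;
  end process;
end architecture;
```

Resetting the whole array, or reading or writing many elements in the same clock cycle, typically forces the tools to fall back to flip flops, which is one way an array of this size can end up costing thousands of registers.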

 

Best Regards,
Vytautas


0 Kudos
13 Replies
goran
Xilinx Employee
Xilinx Employee
8,639 Views
Registered: ‎08-06-2007

Hi,

 

It will be fairly easy to find the issue in simulation.

Have you tried to simulate? 

I would create a small assembler program that executes some FSL instructions.

 

Göran

0 Kudos
vytautas
Explorer
Explorer
8,636 Views
Registered: ‎10-01-2007

Hello, Göran.

Thank you for your answer. Actually, I have simulated only with the ISE simulator, but that is a pretty simple and not very visual solution. Should I write a ModelSim script that runs the simulation? What exactly do you mean by a "small assembler program"?

I couldn't find information on the internet about how the MicroBlaze fsl_read/write functions actually interact with the FSL bus signals (e.g. if I write a character to the FSL bus from a C application, which of FSL_S_EXISTS, FSL_S_Data and so on are activated...).

What would you do?

Thank you

Best Regards,
Vytautas
0 Kudos
goran
Xilinx Employee
Xilinx Employee
8,629 Views
Registered: ‎08-06-2007

Hi,

 

Look at the FSL signals so you can see what MicroBlaze writes to the FSL and what MicroBlaze reads from the FSL.
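
As a rough picture of the pcore side: a MicroBlaze put writes a word into the FSL FIFO, the FIFO then raises FSL_S_Exists with the word on FSL_S_Data, and the pcore pops it by holding FSL_S_Read high on a clock edge. A minimal sketch, with port names following the usual FSL pcore template and operand / operand_valid as made-up names for the internal side:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of the slave (input) side of an FSL pcore.
-- operand and operand_valid are hypothetical internal signals.
entity fsl_slave_capture is
  port (
    FSL_Clk       : in  std_logic;
    FSL_Rst       : in  std_logic;
    FSL_S_Exists  : in  std_logic;                  -- FIFO holds a word
    FSL_S_Data    : in  std_logic_vector(0 to 31);  -- word at the FIFO head
    FSL_S_Read    : out std_logic;                  -- pop the word
    operand       : out std_logic_vector(0 to 31);
    operand_valid : out std_logic                   -- one-cycle strobe
  );
end entity;

architecture rtl of fsl_slave_capture is
begin
  -- Acknowledge every word as soon as it is available.
  FSL_S_Read <= FSL_S_Exists;

  process (FSL_Clk)
  begin
    if rising_edge(FSL_Clk) then
      if FSL_Rst = '1' then
        operand_valid <= '0';
      else
        operand_valid <= FSL_S_Exists;   -- high for one cycle per word
        if FSL_S_Exists = '1' then
          operand <= FSL_S_Data;         -- capture in the same cycle as the pop
        end if;
      end if;
    end if;
  end process;
end architecture;
```

In the waveform you would then expect one operand_valid pulse per put executed by the program; a result word is returned the same way on the master side with FSL_M_Write and FSL_M_Data while FSL_M_Full is low.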

 

I usually create a small software application in XPS which is a pure assembler file and then set this application to "Mark to Initialize BRAMs".

I set "other compiler options" for that project to "-nostartfiles", this will remove the whole C initialization code which is unnecessary when executing pure assembler code.

I also set the program start to 0x0 since that is the address that MicroBlaze will start to execute after reset.

 

My assembler file usually starts with this:

 

.global _start
_start: 
 bri main

 .org 0x50
main:
 addik r19,r0, 0x80000000

 

 

If you look at MicroBlaze trace signals, you will see the execution of your assembler program.

Trace_Valid_Instr:           An instruction is executed

Trace_PC:                      The PC address for the executed instruction

Trace_Instruction:           The opcode
Trace_Reg_Write:          The instruction wrote to the register file

Trace_Reg_Addr:           The register that was written

Trace_New_Reg_Value: The value that was written to the register file

 

There are plenty more trace signals and you can find more information about all these trace signals in the microblaze reference guide.

 

Göran

vytautas
Explorer
Explorer
8,568 Views
Registered: ‎10-01-2007

Hello.

Thank you, Göran, for your detailed answer. I'll use your advice in future projects. About the project in question: yesterday I actually implemented the designed system. The FPGA resource situation is very strange. I use an ML402 board with an SX35 FPGA. My project consists of MicroBlaze, SDRAM, sysACE, 3 timers, 3 GPIO IPs, uartlite and custom_model. This model consists of 5 Coregen modules (floating point FFT, FPO adder, 2 x FPO multiplier, FPO sqrt) and FPlibrary (IEEE754_2_FP, Log, FP_2_IEEE754). These components are synchronized with each other and with the FSL by a 25-state FSM. The code is pretty simple, but I don't understand why it takes almost all the resources of the FPGA! Compiling the system takes about 5-6 hours...

Number of Slices used: 13487 (87%);

Number of Slice Flip Flops used: 15821 (51%).

Does my FSM take so many resources? I created one simple project which calculates the logarithm of the FSL input value. The components are from FPlibrary. Logic utilization is as follows:

Slices: 1152,

Slice Flip Flops: 1364.

That is a lot, but the total project takes much more space. It looks like the FSM takes many resources...

Should I maybe share top.vhd code?

Thank you!

Message Edited by vytautas on 02-11-2010 01:47 PM
Best Regards,
Vytautas
0 Kudos
vytautas
Explorer
Explorer
8,561 Views
Registered: ‎10-01-2007

I have generated the same code without the FPlibrary components and the resource utilization is not more than 40%... Below I have pasted the VHDL code I wrote (it connects converter -> log -> converter). Maybe there is a mistake here that causes this failure:

 

begin

  nA_1 <= log_fsl_in;

  --converter from IEEE754 to Floating Point format
  ieee754_2_fp_a : IEEE754_To_FP
    generic map ( wE => wE,
                  wF => wF)
    port map ( nA => nA_1,
               nd => log_fsl_nd,
               nR => nA2_1);

  wreg1 : if reg = true generate
    process(clk)
    begin
      if clk'event and clk='1' then
        nA2_2 <= nA2_1;
      end if;
    end process;
  end generate;

  --Logarithm function's component
  log_clk_1 : Log_Clk
    generic map ( wE    => wE,
                  wF    => wF,
                  reg   => reg,
                  SzSlk => 0 )
    port map ( nA  => nA2_2,
               nR  => nR1_1,
               clk => clk);

  wreg2 : if reg = true generate
    process(clk)
    begin
      if clk'event and clk='1' then
        nR2_1 <= nR1_1;
      end if;
    end process;
  end generate;

  --converter from Floating Point to IEEE754 format
  fp_2_ieee754_a : FP_To_IEEE754
    generic map ( wE => wE,
                  wF => wF)
    port map ( nA  => nR2_1,
               nR  => nR_3,
               rdy => log_fsl_rdy);

  log_fsl_out <= nR_3;

end Behavioral;

 

 

 

Best Regards,
Vytautas
0 Kudos
vytautas
Explorer
Explorer
8,543 Views
Registered: ‎10-01-2007

Hello.

Actually, I took the logarithm function and the converters from the libhdlfltp open source library:

http://www.newsag.com/software/open-source/id_54533/.

Source can be downloaded from http://libhdlfltp.sourceforge.net 

Best Regards,
Vytautas
0 Kudos
ajjc@optngn.com
Visitor
Visitor
8,493 Views
Registered: ‎02-22-2010
Vytautas,
Pertaining to your questions about the libhdlfltp library.

2/19/2010
I tried to send you an email answer to your personal email, but your server kept bouncing my email as "dynamic IP".
So, I'll answer you here.

At first glance, there might be a problem in some aspect of usage, but I can't be sure until I do more work.

Looking at Section 4 of:
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=1FBC5AE872371D6698710D7984655F82?doi=10.1.1.96.4736&rep=rep1&type=pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.4736

Seems as if the (8,23) Logarithm, using 18×18 multipliers, as reported in the paper, should take ~830 slices on a Virtex II.
(Not sure if that is pipelined or not, but it should be less than ~1000, even pipelined, according to their tables.)

I'll give it a second look over the weekend by actually synthesizing the log_clk (8,23) component
and sharing the results with you. I'm assuming that the (8,23) log is the one you are synthesizing.

2/22/2010
Addendum:
I modified norm.vhd to compute log(a**2 + b**2)
and added it to the compile order list.

See the table below for the results.

I got far fewer resources used than you did.
This design took 3 BRAMs, which are needed
for the table part of the log.
Did your synthesis end up with BRAMs ?
If not, that could easily account for the difference.

I've enclosed the mods I made by zipping up the project directory FYI.

alan

--
Alan Coppola mailto:ajjc@optngn.com phone: 503-781-0083
OptNgn Software http://www.optngn.com fax: 866-448-6575


------------ Resource usage table for log(a**2 + b**2) -----------------------------

Device utilization summary:
---------------------------

Selected Device : 3s1500fg456-4

Number of Slices: 1971 out of 13312 14%
Number of Slice Flip Flops: 2012 out of 26624 7%
Number of 4 input LUTs: 3027 out of 26624 11%
Number used as logic: 2845
Number used as Shift registers: 182
Number of IOs: 103
Number of bonded IOBs: 101 out of 333 30%
Number of BRAMs: 3 out of 32 9%
Number of MULT18X18s: 24 out of 32 75%
Number of GCLKs: 1 out of 8 12%
0 Kudos
vytautas
Explorer
Explorer
8,480 Views
Registered: ‎10-01-2007

Hello Alan.

Thank You for very detailed answer.

I'll try to show the situation from my side. You can find the top VHDL file of my design and the module-level utilization generated by ISE 11.4. The top file is the FSL bus wrapper and a very simple FSM.

Below are summary tables of 3 ISE projects with their resource utilization:

1 - full, 2 - without LOG, 3 - only LOG

So, as far as I understand, the 3rd project gives almost the same results as you showed. The map report shows 3 BRAMs used, but the summary report doesn't include any BRAM:

Module BRAM/FIFO

t_0        1/1

t_1        1/1

t_1        1/1

 

The biggest problem is the final project (FFT IP core, 4 x FPO modules and fplib/log):

 

Number of Slices: 13136 out of 15360 85%
Number of Slice Flip Flops: 15254 out of 30720 49%
Number of 4 input LUTs: 21320 out of 30720 69%
Number used as logic: 19468
Number used as Shift registers: 828
Number used as RAMs: 1024
Number of IOs: 74
Number of bonded IOBs: 73 out of 448 16%
Number of FIFO16/RAMB16s: 12 out of 192 6%
Number used as RAMB16s: 12
Number of GCLKs: 1 out of 32 3%
Number of DSP48s: 28 out of 192 14%

 

Utilization of FFT with FPO modules, without LOG:

 

Number of Slices: 3707 out of 15360 24%
Number of Slice Flip Flops: 5711 out of 30720 18%
Number of 4 input LUTs: 4852 out of 30720 15%
Number used as logic: 4120
Number used as Shift registers: 732
Number of IOs: 74
Number of bonded IOBs: 73 out of 448 16%
Number of FIFO16/RAMB16s: 9 out of 192 4%
Number used as RAMB16s: 9
Number of GCLKs: 1 out of 32 3%
Number of DSP48s: 28 out of 192 14%

 

 Utilization of resources only with LOG (fplibrary)

Number of Slices: 1152 out of 15360 7%
Number of Slice Flip Flops: 1364 out of 30720 4%
Number of 4 input LUTs: 2097 out of 30720 6%
Number used as logic: 2005
Number used as Shift registers: 92
Number of IOs: 74
Number of bonded IOBs: 73 out of 448 16%
Number of FIFO16/RAMB16s: 3 out of 192 1%
Number used as RAMB16s: 3
Number of GCLKs: 1 out of 32 3%

 

So the 2nd and 3rd designs compiled very fast, but "connected together" they take a huge amount of resources. Maybe the FSM has some mistakes that cause this. Please compare the results in the 3 tables.

The result is not LOG + FFT_FPO == FULL_PROJECT...

You can find some screenshots in the attached file.

Message Edited by vytautas on 02-23-2010 01:34 PM
Best Regards,
Vytautas
0 Kudos
ajjc@optngn.com
Visitor
Visitor
8,456 Views
Registered: ‎02-22-2010

 Vytautas,

Sorry for the delay in response. Next week will be better for me.

The biggest thing I can see is that the FFT takes ~5700 flip flops and the log takes ~1300 flip flops, yet the whole design takes ~15000 flip flops.

Synthesis tools don't create flip flops without a reason. It could be a memory expansion or some other expansion that is done automatically. Find where those 8000 flip flops come from, and you've got something to configure and address!

I'll look at your files as soon as I can. Your project is a good one, so keep at it!

 alan

 

0 Kudos
vytautas
Explorer
Explorer
5,188 Views
Registered: ‎10-01-2007

Hello Alan.

You are right: the FFT takes ~5700 flip flops and the log takes ~1300 flip flops, so together they should take ~7000 flip flops, but the top design, as you say, is huge because of some strange issue. It is probably some loopback generation or some kind of automatic expansion that I don't understand yet; I am trying to solve it.

Regards,

Vytautas

Best Regards,
Vytautas
0 Kudos
ajjc@optngn.com
Visitor
Visitor
5,182 Views
Registered: ‎02-22-2010

 

Vytautas,

 

Ok, I typically go into the RTL Viewer and look at where the FF's are. 

I noticed that the left-hand-side signals assigned in your FSM are not fully specified in every branch of each "if-then-else" or "case-others" statement. This can result in extra latches.
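
A common way to avoid this is to give every output of the combinational process a default value at the top, so that no branch can leave a signal unassigned. A minimal sketch with a made-up three-state FSM (the entity, state and signal names are hypothetical, not taken from the attached top file):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_defaults is
  port (
    clk, rst : in  std_logic;
    start    : in  std_logic;
    done     : in  std_logic;
    load     : out std_logic;
    busy     : out std_logic
  );
end entity;

architecture rtl of fsm_defaults is
  type state_t is (Idle, Calculate, Write_Out);
  signal state, next_state : state_t;
begin
  -- State register: the only clocked storage in this FSM.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= Idle;
      else
        state <= next_state;
      end if;
    end if;
  end process;

  -- Next-state and output logic.
  process (state, start, done)
  begin
    -- Defaults first: every signal is assigned on every path, so no latch.
    next_state <= state;
    load       <= '0';
    busy       <= '0';
    case state is
      when Idle =>
        if start = '1' then
          load       <= '1';
          next_state <= Calculate;
        end if;
      when Calculate =>
        busy <= '1';
        if done = '1' then
          next_state <= Write_Out;
        end if;
      when Write_Out =>
        busy       <= '1';
        next_state <= Idle;
    end case;
  end process;
end architecture;
```

The branches then only list what differs from the defaults, which also makes a 25-state FSM easier to read.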

Also, if the fpu_{mul,add,sqrt} are not configured correctly, you'll see too many FFs/latches in the RTL Viewer.

If you still have trouble, I can get you our Periodogram demo code, which does a similar thing without the tie-in to the FSL bus.

alan 

vytautas
Explorer
Explorer
5,176 Views
Registered: ‎10-01-2007

Hello Alan.

Thank you very much for your advice. The left-hand side of the FFT IP core (the input signals) could indeed be the cause of the resource utilization (FFs, LUTs, slices).

I'm going to investigate my code very carefully.

Also, about synchronization signals for the FPlibrary elements: how do you actually know when a module is ready for new input data and when the data have been processed? For example, the Xilinx FPO IP core has a new_data input and a ready output. When rdy is high I read the result, and I assert new_data when I apply an input. To synchronize the FFT and LOG modules I need the same information, but LOG has no such signals. So I just inserted an additional case branch with an "if" condition that implements a delay of 10 cycles. This is only a guess...
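
One common way to replace a guessed delay, sketched here under assumptions: if the LOG path is a fixed-latency pipeline, a one-bit shift register clocked alongside it can carry a "new data" flag and produce a "ready" flag exactly LATENCY clock edges later. The entity name and the default of 10 are placeholders; the real latency (the Log_Clk stages plus any wrapper registers) has to be read from the library sources or measured in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: delays a valid flag by LATENCY clock edges so it
-- lines up with the output of a fixed-latency pipeline.
entity valid_delay is
  generic (
    LATENCY : integer range 2 to 256 := 10  -- assumed value, must match the pipeline
  );
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;
    valid_in  : in  std_logic;   -- high in the cycle an operand is applied
    valid_out : out std_logic    -- high in the cycle its result is available
  );
end entity;

architecture rtl of valid_delay is
  signal sr : std_logic_vector(LATENCY-1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        sr <= (others => '0');
      else
        sr <= sr(LATENCY-2 downto 0) & valid_in;  -- shift one stage per clock
      end if;
    end if;
  end process;

  valid_out <= sr(LATENCY-1);
end architecture;
```

The FSM can then wait for valid_out instead of counting cycles, and the pipeline can accept a new operand every clock rather than one per delay period.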

Thank you very much

Best regards

Vytautas

Best Regards,
Vytautas
0 Kudos