12-09-2010 09:12 AM
I am designing a system that needs a very fast loop speed. I am using a Spartan-3 XC3S400 FPGA and EDK for my project. My clock speed is 80 MHz, and in a very simple test project I set the MicroBlaze clock to 80 MHz. I am not using any external RAM; all the code is in BRAM.
I tried to measure the loop speed on MicroBlaze and got around 2.5 MHz. I cannot understand why it is so slow. I am just setting a pin high and low as below:
XIo_Out32(XPAR_XPS_GPIO_0_BASEADDR + XGPIO_DATA_OFFSET, 0xFFFF);
When I run the following code, where I just add a 16-bit shift, the loop speed drops to 2.05 MHz. This operation should take 1 cycle, so how can the loop speed decrease by almost 0.5 MHz?
XIo_Out32(XPAR_XPS_GPIO_0_BASEADDR + XGPIO_DATA_OFFSET,LDAC);
XIo_Out32(XPAR_XPS_GPIO_0_BASEADDR + XGPIO_DATA_OFFSET,ALL);
By the way, when I generate the bitstream in EDK, it reports an error that this design can only run at 72 MHz, but my clock is 80 MHz. Is that normal?
Please help me ...
12-09-2010 09:26 AM
MicroBlaze(tm) uses one or two clock cycles to execute one assembly language instruction (I believe; you can verify this in the MicroBlaze documentation).
For example, a typical C program loop might have just three statements, but these might get compiled into 20 assembly language instructions (go look at your raw assembly language code to get an actual count).
Then, at 2 clocks per instruction, you have 40 clocks per loop, so the loop speed is 2.5 MHz for a 100 MHz clock: perfectly normal!
Look at the raw code, read how many clocks each type of instruction takes, and prove to yourself that you understand.
12-09-2010 09:59 AM
Thank you very much for your reply, but I still think there is some problem.
My clock is 80 MHz and my loop speed is 2.5 MHz, so the simple I/O loop must take 32 cycles. When I add a one-cycle shift operation to the loop, the speed decreases to 2.05 MHz, so the loop must now take 39 cycles. That seems impossible, because according to the MicroBlaze documents a bitwise shift takes only one cycle.
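The cycle counts above come from dividing the CPU clock by the measured loop frequency. A tiny C sketch of that arithmetic (the function name is illustrative, not from any Xilinx library):

```c
/* Clock cycles spent per loop iteration, given the CPU clock and the
 * measured loop frequency (both in Hz), rounded to the nearest integer. */
static unsigned cycles_per_loop(double f_clk_hz, double f_loop_hz)
{
    return (unsigned)(f_clk_hz / f_loop_hz + 0.5);
}
```

At 80 MHz this gives 32 cycles for the plain loop and 39 with the shift, so about 7 extra cycles. That hints the compiler emitted more than a single shift instruction; only the disassembly can confirm what it actually generated.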
By the way, how can I see the assembly instructions generated from my C code? I am new to EDK.
Thank you very much.
12-09-2010 10:13 AM
So how many assembly language instructions are in each loop you are testing?
12-09-2010 10:37 AM
It is a strange question, but how can I see how many assembly language instructions are in each loop? I am new to EDK. I am using version 12.3. Do I need to use XMD?
12-09-2010 05:09 PM
This might be helpful:
http://forums.xilinx.com/xlnx/board/message?board.id=EDK&message.id=2276 (Object file that microblaze generates)
By the way, it is not good if your design doesn't meet timing. You may be getting lucky with PVT (process/voltage/temperature) margin and not see any issues yet, but I wouldn't bet on it long-term or for a production design. A soft processor is different from a hardened processor in this respect.
You may want to consider changing your MicroBlaze configuration, using a faster part, changing your implementation options (fast_runtime.opt in XPS), reducing your clock rate, floorplanning, etc.
http://www.xilinx.com/support/training/rel/timing-closure-flow.htm (Recorded Lecture: Timing Closure Flow)
http://www.xilinx.com/support/documentation/white_papers/wp331.pdf (Timing Closure 6.1i)
12-09-2010 05:12 PM
XMD also has a "dis" option.
You may find something useful here:
http://www.xilinx.com/support/documentation/application_notes/xapp1037.pdf (Introduction to Software Debugging on Xilinx MicroBlaze Embedded Platforms)
12-10-2010 01:30 AM - edited 12-10-2010 01:37 AM
I was planning to respond to this thread, but then I found three instances of the same question on the forums.
One here in "Embedded Processing", another in "Spartan Family" and a third in "EDK and Platform Studio".
I frankly don't know which one to reply to.
Why do you create 3 different threads with the same questions?
Why not only create one in the obvious place called "Embedded Processing"?
At the top of Embedded Processing you can read "A board to discuss topics on Embedded IP cores including MicroBlaze, PPC, PLB, OPB, GPIO, UART, Ethernet, EMC controller, SDR/DDR Controller etc"
How do you want people to reply to your threads when they are at three places?
One random or all three at the same time?
PLEASE DON'T CREATE MULTIPLE THREADS ON THE SAME TOPIC!
If the latency to write to GPIO is 5 clock cycles (which is the ballpark if you are using PLB), then MicroBlaze can do this loop in 11 clock cycles when written in assembler.
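To see where a figure like 11 cycles could come from, here is a rough cycle budget in C. It assumes two GPIO writes per iteration (high and low) at the quoted ~5 cycles each, plus 1 cycle of loop overhead; that overhead value is my assumption chosen to match the quoted total, not a measured MicroBlaze figure.

```c
/* Estimated cycles for one high/low toggle iteration:
 * two GPIO writes plus an assumed fixed loop overhead (e.g. the branch). */
static unsigned toggle_loop_cycles(unsigned write_latency, unsigned overhead)
{
    return 2u * write_latency + overhead;
}

/* Resulting loop frequency in Hz for a given CPU clock. */
static double loop_freq_hz(double f_clk_hz, unsigned cycles)
{
    return f_clk_hz / (double)cycles;
}
```

With an 80 MHz clock, 11 cycles per iteration works out to roughly 7.3 MHz.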
PS. Not sure whether I should reply only in this thread or copy it to the other two threads, sigh!
12-10-2010 02:35 AM
Thank you very much for your reply. I am very sorry for my multiple threads. I am new to the Xilinx forum, so I could not find the right section at first. When I got no answers, I looked at the forum again and found a more suitable section. It was due to my inattention, and I apologize for it.
My clock is around 80 MHz. If my code section takes 11 cycles, the loop should run at around 7 MHz, shouldn't it? My code is running at 2.5 MHz.
I need to capture analog data at a 2 MHz sample rate. As you know, many code blocks are needed to drive ADCs. So what do I need to achieve a high data rate? It seems obvious that MicroBlaze cannot reach such a high rate, doesn't it?
Do I need to write the whole block in VHDL?
12-10-2010 05:38 AM
you asked: "Do I need to write all block with VHDL?"
In this case I think "Yes" is the answer.
That's what FPGAs are all about: pushing fast, time-critical work into hardware and leaving the boring standard interface and protocol work to the processor.
Maybe you can tell us more about the ADC and what you intend to do with the data once it is in the processor?
A hint: if you have an 8-bit ADC and need to sample data at 2 MSPS, you can build a little FSM that triggers the sampling and packs four consecutive values into a 32-bit register, which the processor can read in a single access at a quarter of the sampling rate. Of course, reading is controlled by interrupts coming from the FSM, so you won't miss any new data.
You see how this relaxes the time critical parts of the software?
It can be done in a similar way for multiple ADCs or ADCs with other word lengths.
Other helpful approaches may include the use of FIFOs and FSL interfaces. There may be many more solutions unknown to me.
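How much the packing relaxes the software can be put in numbers. A sketch assuming the 80 MHz clock from the original post and a 2 MSPS sample rate: reading every sample leaves 40 CPU cycles per read, while four 8-bit samples per 32-bit word leave 160. The function names are illustrative only.

```c
/* Number of ADC samples that fit into one 32-bit word. */
static unsigned samples_per_word(unsigned adc_bits)
{
    return 32u / adc_bits;
}

/* Rate (Hz) at which the processor must read packed words. */
static double word_rate_hz(double sample_rate_hz, unsigned adc_bits)
{
    return sample_rate_hz / (double)samples_per_word(adc_bits);
}

/* CPU clock cycles available between two consecutive word reads. */
static unsigned cycles_between_reads(double f_clk_hz, double word_rate)
{
    return (unsigned)(f_clk_hz / word_rate);
}
```

For a 16-bit ADC only two samples fit per word, so at 2 MSPS the budget drops to 80 cycles per read.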
About the assembler question: it's not really an EDK issue. The C compiler is a GNU compiler.
You can set options so that it keeps (or explicitly writes out) the assembly source files that are fed into the assembler (-S, if I remember correctly; note that lowercase -s strips symbols instead, and -save-temps also keeps the intermediate files). There you can count the assembler instructions and calculate run times.
Also, check your optimisation level (-O option). C compilers like to waste time pushing and popping data to and from the stack.
When it comes to hardware control, a programmer is often forced to look at the assembly level.
Better get used to it, and learn how it works.
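As an illustration of what to compile and inspect, here is a minimal toggle loop written with a volatile pointer. On the target, reg would point at the GPIO data register (the XPAR_XPS_GPIO_0_BASEADDR + XGPIO_DATA_OFFSET address from the first post); here it is a plain variable so the sketch also runs on a PC. The function name is hypothetical. Compiling it with -O2 -S (with mb-gcc, assuming the EDK MicroBlaze toolchain) and counting the instructions inside the loop is the exercise described above.

```c
#include <stdint.h>

/* Toggle a memory-mapped GPIO data register n times (high, then low).
 * `volatile` forces every store to be emitted even with optimisation on,
 * so the loop body should reduce to two stores plus the loop control. */
static void toggle_gpio(volatile uint32_t *reg, unsigned n)
{
    while (n--) {
        *reg = 0xFFFFu;  /* pin(s) high */
        *reg = 0x0000u;  /* pin(s) low  */
    }
}
```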
Have a nice synthesis
12-10-2010 05:54 AM
I would put in a simple state machine that samples 4 bytes and puts them into an FSL FIFO.
MicroBlaze can read or check for new data at any time using FSL instructions.
The overhead would be minimal.
If you have an 80 MHz clock and need to sample at 2 MHz, you first create a counter that counts from 39 down to 0.
This counter is clocked by the 80 MHz clock, and every time the counter is 0, you set a signal to '1' and reload the counter with 39.
This signal is now the sample signal for your ADC.
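The counter described above can be checked with a cycle-level C model before writing the VHDL. This is only a software model, not synthesizable hardware; the type and function names are hypothetical. Each call to strobe_tick represents one 80 MHz clock edge.

```c
#include <stdbool.h>

/* Sample-strobe divider: counts reload..0 and emits a one-clock strobe
 * whenever the count is 0, then reloads. reload = 39 divides 80 MHz by 40. */
typedef struct {
    unsigned count;
    unsigned reload;
} strobe_div;

static void strobe_init(strobe_div *d, unsigned reload)
{
    d->reload = reload;
    d->count = reload;
}

/* Advance one clock; returns true on the clock where the strobe is '1'. */
static bool strobe_tick(strobe_div *d)
{
    if (d->count == 0u) {
        d->count = d->reload;  /* reload to 39 */
        return true;           /* sample signal for the ADC */
    }
    d->count--;
    return false;
}
```

Over 4000 clocks (50 us at 80 MHz) the model produces exactly 100 strobes, i.e. 2 MHz.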
You would then sample the 8 bits and write them into one byte position of a 32-bit register.
The byte position is selected by a two-bit counter that is updated on every new sample.
This builds up a 32-bit register from four 8-bit samples.
When the register holds all four bytes, you drive the value onto the FSL bus and set FSL_Write to '1'.
The FSL data bus can always carry the register value; you just have to set FSL_Write to '1' every fourth byte.
This is all you need from your hardware.
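The packing logic can likewise be modelled in C to check the byte ordering. This sketch assumes byte position 0 is the least significant byte of the word (the post does not specify an order), and it models the FSL write as returning the word; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packing register plus 2-bit byte-position counter. */
typedef struct {
    uint32_t reg;
    unsigned lane;  /* byte position, 0..3 */
} packer;

/* Store one 8-bit sample; returns true (FSL_Write = '1') after the fourth
 * byte, with the completed word placed in *out (the FSL data). */
static bool packer_push(packer *p, uint8_t sample, uint32_t *out)
{
    unsigned shift = 8u * p->lane;
    p->reg = (p->reg & ~((uint32_t)0xFFu << shift)) |
             ((uint32_t)sample << shift);
    p->lane = (p->lane + 1u) & 3u;  /* wraps like a 2-bit counter */
    if (p->lane == 0u) {
        *out = p->reg;
        return true;
    }
    return false;
}
```

Pushing 0x11, 0x22, 0x33, 0x44 yields the word 0x44332211 on the fourth push.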
MicroBlaze can check for the availability of new data or just block until new data is available.
The FSL FIFO can have a depth from 16 to 512, so you can let your software either consume a new word (four samples) at a 500 kHz rate or consume many words at a much lower rate. It all depends on what you want to do.
By letting the hardware handle the sampling, you will get very precise sampling.
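The FIFO depth translates directly into how long the software may leave the FIFO unread before data is lost. A sketch assuming a 500 kHz word rate (2 MHz sampling, four 8-bit samples per word); the function name is illustrative.

```c
#include <stdint.h>

/* Time in microseconds (truncated) that a FIFO of `depth` words can absorb
 * when words arrive at `word_rate_hz` and the software reads nothing. */
static uint64_t fifo_slack_us(uint64_t depth, uint64_t word_rate_hz)
{
    return depth * 1000000u / word_rate_hz;
}
```

At 500 kHz a 16-word FIFO covers 32 us and a 512-word FIFO about 1 ms.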
12-10-2010 06:12 AM
Agreed with Goran.
Let the hardware do what it is good at (e.g. deterministic sampling on an exact clock cycle, which is very hard to control through processor pointer operations) and leverage the processor for its strengths.
This might be helpful. It is a bit dated but the concepts are still applicable:
http://www.xilinx.com/support/documentation/application_notes/xapp529.pdf (Connecting Customized IP to the MicroBlaze Soft Processor Using the Fast Simplex Link (FSL) Channel)
12-10-2010 11:35 AM
Hi, thank you very much for all your nice advice.
I am using SPI-controlled ADCs and DACs, all with 16-bit resolution. I am trying to build a phase-locked loop.
I have implemented a simple DDS with a 32-bit phase accumulator. I need to run my lock-in loop at around 300 kHz, so I need my loop to run faster than 2 MHz to achieve that. Do you have any ideas for my hardware design?
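For reference, a 32-bit phase-accumulator DDS produces f_out = FTW * f_clk / 2^32, where FTW is the frequency tuning word added to the phase every clock. A C sketch of that relation and of one accumulator step follows; the function names are hypothetical, and the clock rates used with it are only examples, since the post does not say which clock drives the DDS.

```c
#include <stdint.h>

/* Tuning word for a 32-bit accumulator: round(f_out * 2^32 / f_clk).
 * Requires f_out_hz < f_clk_hz. */
static uint32_t dds_ftw(uint64_t f_out_hz, uint64_t f_clk_hz)
{
    return (uint32_t)(((f_out_hz << 32) + f_clk_hz / 2u) / f_clk_hz);
}

/* One accumulator step: returns the top `lut_bits` bits of the phase
 * (the sine-table index), then advances the phase modulo 2^32.
 * lut_bits must be in 1..32. */
static uint32_t dds_step(uint32_t *phase, uint32_t ftw, unsigned lut_bits)
{
    uint32_t idx = *phase >> (32u - lut_bits);
    *phase += ftw;  /* unsigned overflow wraps modulo 2^32 */
    return idx;
}
```

For example, f_out = f_clk / 4 gives FTW = 0x40000000, and a 10-bit table index then steps 0, 256, 512, 768, 0.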
My algorithm for one step:
generate a frequency with the DDS
capture this signal with a 16-bit ADC
multiply this signal by a sine and a cosine
apply an IIR-based filter
that's all :)))
With best regards.
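One step of that algorithm can be sketched in C as follows. Assumptions: the DDS supplies the sine and cosine reference values for the current sample, and the "IIR-based filter" is taken to be a simple first-order low-pass y += alpha * (x - y) on each channel; the filter actually used in the post is not specified, and the names are hypothetical. The phase error could then be estimated from atan2(q, i).

```c
/* Lock-in (I/Q) demodulator state with a first-order IIR low-pass. */
typedef struct {
    float i;      /* low-passed sample * sin reference */
    float q;      /* low-passed sample * cos reference */
    float alpha;  /* IIR coefficient, 0 < alpha <= 1   */
} lockin;

/* Mix one ADC sample with the DDS references, then filter both channels. */
static void lockin_step(lockin *s, float sample, float ref_sin, float ref_cos)
{
    float xi = sample * ref_sin;
    float xq = sample * ref_cos;
    s->i += s->alpha * (xi - s->i);
    s->q += s->alpha * (xq - s->q);
}
```

Every step is a handful of multiply-adds, which is why this inner loop is a natural candidate for hardware if the processor cannot sustain the required rate.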