aminfar1

Vector processing using Microblaze

I am going to add a coprocessor to MicroBlaze over FSL to do some vector processing. The coprocessor receives a stream of data, performs some operations on it, and returns a stream of results. I have seen articles saying that vector processing works on MicroBlaze. However, before writing any HDL code, I would like to know the caveats and pitfalls of vector processing with a coprocessor attached to the MicroBlaze. For example, what should and shouldn't I expect from such a coprocessor?

Since performance is really important in my application, I don't want to end up with a slow coprocessor. If anyone has experience with this and could share some insight, I would be grateful.

Thanks,

Message Edited by aminfar1 on 09-30-2009 04:04 PM
martinthompson

One thought for you:

 

Your data has to pass through the MicroBlaze, so you have to spend (at least) one clock cycle writing each vector value and one reading each return value (if it's just a vector summation, there might be only one return value).
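To make that cost concrete, here is a host-side C model of the loop the processor would run. The FSL link is modelled as two small FIFOs; fsl_put, fsl_get and the element-wise-multiply "accelerator" are hypothetical stand-ins (on real hardware these would be the MicroBlaze FSL put/get instructions and your HDL). It counts one cycle per put or get, the assumption from the post above, and ignores loads, stores and loop overhead.

```c
#include <stdint.h>

/* Hypothetical host-side model of one FSL link pair: a small FIFO per
 * direction. On a real MicroBlaze these would be put/get instructions. */
#define FIFO_DEPTH 16

typedef struct {
    uint32_t data[FIFO_DEPTH];
    int head, tail, count;
} fifo_t;

static fifo_t to_accel, from_accel;
static long cpu_transfer_cycles = 0; /* assumed: 1 cycle per put or get */

static void fifo_push(fifo_t *f, uint32_t v) {
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
}

static uint32_t fifo_pop(fifo_t *f) {
    uint32_t v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* Hypothetical accelerator: consumes operand pairs, produces a * b. */
static void accel_step(void) {
    while (to_accel.count >= 2) {
        uint32_t a = fifo_pop(&to_accel);
        uint32_t b = fifo_pop(&to_accel);
        fifo_push(&from_accel, a * b);
    }
}

/* Processor side: every word moved costs one (modelled) cycle. */
static void fsl_put(uint32_t v) {
    fifo_push(&to_accel, v);
    cpu_transfer_cycles++;
    accel_step();
}

/* Caller must only read when a result is available (true in the loop below). */
static uint32_t fsl_get(void) {
    cpu_transfer_cycles++;
    return fifo_pop(&from_accel);
}

/* Element-wise multiply streamed through the processor. */
static void vec_mul_via_fsl(const uint32_t *a, const uint32_t *b,
                            uint32_t *c, int n) {
    for (int i = 0; i < n; i++) {
        fsl_put(a[i]);
        fsl_put(b[i]);
        c[i] = fsl_get();
    }
}
```

Calling vec_mul_via_fsl on n-element vectors leaves cpu_transfer_cycles at 3n (two puts and one get per element), and all of that is processor time.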

 

If you can interface your vector accelerator directly to the data source, things will improve, because the MicroBlaze won't spend its whole time pushing data to and from the FSL ports.
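A rough comparison of processor time spent on data movement in the two arrangements, under the same one-cycle-per-word assumption. The two-word command/status handshake used for the direct case is an invented protocol for illustration, not part of any Xilinx core.

```c
#include <stdint.h>

/* Assumed cost model: processor cycles spent only on moving 32-bit words
 * over FSL, one cycle per word. Load/store and loop overhead are ignored. */

/* Accelerator fed by the processor: every input and output word costs
 * the processor one FSL instruction. */
static uint64_t cycles_via_processor(uint64_t in_words, uint64_t out_words) {
    return in_words + out_words;
}

/* Accelerator wired directly to the data source: the processor only sends
 * a hypothetical start command and reads back a hypothetical status word. */
static uint64_t cycles_direct(uint64_t in_words, uint64_t out_words) {
    (void)in_words;
    (void)out_words;
    return 2; /* one command word + one status word (assumed protocol) */
}
```

For 1024 input and 1024 output words the first model gives 2048 cycles and the second a constant 2: with a direct connection, the processor's cost no longer scales with the amount of data.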

 

Cheers,
Martin

Martin Thompson
martin.j.thompson@trw.com
http://www.conekt.co.uk/capabilities/electronic-hardware
aminfar1

Martin,

 

Thanks for the good point. Can you elaborate on it? How can I interface the vector accelerator directly to the data source? Do you mean something like DMA?

 

Thanks,

 

Amin Far

martinthompson

Ideally, the accelerator would get its input data directly from the source, without going through the memory hierarchy. This is not always possible, in which case a DMA might be appropriate.
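With a DMA, the processor's job shrinks to programming a transfer and waiting for it to finish. The register block below (dma_regs_t, its field order and its bit meanings) is entirely hypothetical; a real DMA core defines its own register map and driver.

```c
#include <stdint.h>

/* Hypothetical register block for a simple DMA engine feeding the
 * accelerator. Layout invented for illustration. On hardware this struct
 * would be accessed through a volatile pointer at the core's base address. */
typedef struct {
    uint32_t src_addr; /* bus address of the first input word   */
    uint32_t dst_addr; /* bus address where results are written */
    uint32_t length;   /* number of 32-bit words to transfer    */
    uint32_t control;  /* bit 0: start                          */
    uint32_t status;   /* bit 0: done (set by hardware)         */
} dma_regs_t;

#define DMA_CTRL_START  0x1u
#define DMA_STATUS_DONE 0x1u

/* Program one transfer: a few register writes, after which the processor
 * is free until the done bit is set. */
static void dma_start(volatile dma_regs_t *dma, uint32_t src, uint32_t dst,
                      uint32_t words) {
    dma->src_addr = src;
    dma->dst_addr = dst;
    dma->length = words;
    dma->control = DMA_CTRL_START; /* written last so the job starts fully configured */
}

static int dma_done(const volatile dma_regs_t *dma) {
    return (dma->status & DMA_STATUS_DONE) != 0;
}
```

In this sketch the processor pays four register writes per transfer regardless of its length, instead of one FSL instruction per word.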

 

Can you tell more of the specifics of your application?

 

Cheers,

 

Martin

aminfar1

Martin,

 

OK, I need to do a lot of matrix multiplications. The matrices are usually large, and their elements are float (32-bit), double (64-bit) or even 54-bit values. I want a coprocessor that gets the array elements from the processor and returns the results: it receives the elements of A and B and returns C, where C = A * B.
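A software reference model and a few word counts can help size such a design before writing HDL. matmul_ref is a plain triple-loop C = A * B in row-major order, usable as a golden model for checking the coprocessor's results (it uses double, so a custom 54-bit format would need its own bit-accurate model). The word counts assume a 32-bit FSL data path, so a 54-bit or 64-bit value takes two words; the helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Software reference for C = A * B (row-major, n x n). Uses double, so it
 * is only an approximate golden model for a custom 54-bit format. */
static void matmul_ref(const double *a, const double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

/* 32-bit FSL words per element (assumes a 32-bit FSL data width):
 * 32 bits -> 1, 54 bits -> 2, 64 bits -> 2. */
static unsigned words_per_element(unsigned bits) {
    return (bits + 31u) / 32u;
}

/* Words moved if A and B are each sent once and the coprocessor keeps them
 * in local storage (assumed large enough), then C is read back once. */
static uint64_t fsl_words_buffered(uint64_t n, unsigned bits) {
    return 3u * n * n * words_per_element(bits);
}

/* Words moved if a row of A and a column of B are streamed for every
 * output element, i.e. no operand reuse inside the coprocessor. */
static uint64_t fsl_words_unbuffered(uint64_t n, unsigned bits) {
    return (2u * n * n * n + n * n) * words_per_element(bits);
}
```

For n = 100 with 54-bit values, sending A and B once needs 60,000 words, whereas streaming a row and a column per output element needs 4,020,000. Under these assumptions, operand reuse inside the coprocessor matters far more than raw multiplier speed.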

Since I don't want to use the Xilinx FPU (I might have, for example, 54-bit values, which the FPU cannot handle), I want to design a custom matrix-multiplier coprocessor that can multiply matrices quickly (there is plenty of parallelism I can take advantage of).
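Exploiting that parallelism only pays off while the design stays compute-bound. This back-of-envelope estimate assumes `macs` fully pipelined multiply-accumulate units, each retiring one MAC per cycle, and one cycle per 32-bit FSL word, with A and B sent once and C returned once. None of these figures are measured.

```c
#include <stdint.h>

/* Assumed: each MAC unit is fully pipelined and retires one MAC per cycle.
 * An n x n matrix multiply needs n^3 multiply-accumulates. */
static uint64_t compute_cycles(uint64_t n, uint64_t macs) {
    uint64_t total = n * n * n;
    return (total + macs - 1) / macs; /* ceiling division */
}

/* Assumed: one cycle per 32-bit FSL word; A and B in, C out, once each. */
static uint64_t transfer_cycles(uint64_t n, unsigned bits) {
    uint64_t words = (bits + 31u) / 32u; /* FSL words per value */
    return 3u * n * n * words;
}

/* With perfect overlap of transfer and compute, run time is bounded
 * below by whichever of the two is larger. */
static uint64_t lower_bound_cycles(uint64_t n, uint64_t macs, unsigned bits) {
    uint64_t c = compute_cycles(n, macs);
    uint64_t t = transfer_cycles(n, bits);
    return c > t ? c : t;
}
```

For n = 100 and 54-bit values, 16 units give max(62,500, 60,000) = 62,500 cycles, while 64 units give 60,000: from about 17 units upward, the FSL transfers rather than the multipliers set the speed, which is why feeding the accelerator directly or by DMA matters.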

 

I hope this helps explain my problem. Thank you for your contribution; I look forward to hearing from you.

martinthompson

Funnily enough, I have just such a matrix multiply core (currently single precision, but it could be tweaked), for a standalone system that I am working on.  I'm sure it could be integrated into XPS with a bit of effort.

 

Contact me off list if you are interested...

 

Cheers,

 

Martin
