09-30-2009 03:59 PM - edited 09-30-2009 04:04 PM
I am going to add a coprocessor to MicroBlaze over FSL to do some vector processing. The coprocessor gets a stream of data, does some operations, and returns a stream of results. I have seen some articles that say vector processing works on MicroBlaze. However, before writing any HDL code, I would like to know the caveats and pitfalls of vector processing with a coprocessor on the MicroBlaze processor. For example, what should I expect, and what should I not expect, from this approach?
Since performance is really important in my application, I don't want to end up with a slow coprocessor. So, if anyone has experience with this that might enlighten me, I would be grateful to hear it.
10-09-2009 02:41 AM
One thought for you:
Your data has to pass through the MicroBlaze. Therefore, you have to use (at least) one clock cycle to write each vector value and one to read each return value (if it's just a vector summation, then there might only be one of those).
If you can make your "vector" accelerator interface directly to the data source, then you will improve matters, so that the MicroBlaze doesn't spend its whole time pushing data to and from FSL ports.
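To put a rough number on that, here is a minimal sketch of the lower bound, assuming 32-bit FSL words, one cycle per word written and one per word read, no stalls, and no loop or load/store overhead. The function name and parameters are made up for illustration; real code will be slower than this.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound on processor cycles spent on FSL traffic.
   Assumption: one cycle per 32-bit word written and one per word
   read back, with no stalls and no loop or load/store overhead. */
uint64_t fsl_min_cycles(uint64_t elems_in, uint64_t elems_out,
                        unsigned words_per_elem)
{
    assert(words_per_elem > 0);  /* every element is at least one FSL word */
    return (elems_in + elems_out) * words_per_elem;
}
```

For example, an element-wise operation on 1024 single-precision values gives fsl_min_cycles(1024, 1024, 1) = 2048 cycles of pure transfer, while a summation of the same vector gives fsl_min_cycles(1024, 1, 1) = 1025. The accelerator only pays off if the work it saves is larger than that.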
10-11-2009 10:15 PM
Thanks, that's a good point. Could you elaborate on it? How can I make the "vector" accelerator interface directly to the data source? Do you mean by something like DMA?
10-12-2009 03:32 AM
Ideally, the accelerator would get its input data directly from the source, without going through the memory hierarchy. This is not always possible, in which case DMA might be appropriate.
Can you tell us more about the specifics of your application?
10-15-2009 04:03 PM
OK, I need to do a lot of matrix multiplications. The matrices are usually large and are composed of float (32-bit), double (64-bit), or even 54-bit values. I wanted a coprocessor that gets array elements from the processor and returns the results: it receives the elements of A and B and returns C, where C = A * B.
I don't want to use the Xilinx FPU because I might have, for example, 54-bit values, which the FPU cannot handle. So I wanted to design a custom matrix multiplier coprocessor that can multiply matrices quickly (there is plenty of parallelism that I can take advantage of).
I hope this helps explain my problem. Thank you for your contribution; I look forward to hearing from you.
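For reference, the operation itself can be sketched in plain C as below. This is only a software model that could be used to check a coprocessor's output, not a description of the hardware. Single precision, row-major storage, and the name matmul_ref are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C = A * B for row-major matrices (assumed layout).
   A is n x k, B is k x m, C is n x m. */
void matmul_ref(const float *a, const float *b, float *c,
                size_t n, size_t k, size_t m)
{
    assert(a && b && c);  /* caller must supply all three buffers */
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * m + j];
            c[i * m + j] = acc;
        }
    }
}
```

The loop nest performs n * k * m multiply-adds but touches only n * k + k * m + n * m distinct elements, so each input element is reused many times. The inner dot products are independent of each other, which is where the parallelism comes from.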
10-16-2009 12:33 AM
Funnily enough, I have just such a matrix multiply core (currently single precision, but it could be tweaked), for a standalone system that I am working on. I'm sure it could be integrated into XPS with a bit of effort.
Contact me off list if you are interested...