Observer hchen213
Registered: 04-08-2016

Large Matrix Transpose

Hello All,

I have a matrix of about 10k by 10k elements that is transferred from the computer to the PS-side DDR. What would be the most efficient way to transpose it?

I believe DMA would not help here, because we cannot take advantage of the burst feature. Are there any other tools that allow fast data transfer between different DDR memory locations?

Thanks

Scholar hbucher
Registered: 03-22-2016

Re: Large Matrix Transpose

@hchen213

Do you actually need to transpose it? Transposition is a frequent but very costly operation, so most matrix libraries just set a flag indicating that the matrix is transposed relative to its layout in memory.

When it comes to using the matrix, say in a matrix-vector multiplication, two algorithms are used: one for the normal matrix and one for the transposed one. The same goes for element access, etc.

This way you avoid a costly operation entirely, at the cost of a few more bytes of code.
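To make the flag idea concrete, here is a minimal C sketch, assuming a row-major float matrix. The names (matrix_view, mv_get, mv_transpose, mv_matvec) are hypothetical and not taken from any particular library. Transposing only flips a flag, element access swaps the indices, and the matrix-vector product picks one of two loop orders so that the stored data is always walked row by row.

```c
#include <stddef.h>

/* Hypothetical row-major matrix view with a lazy "transposed" flag. */
typedef struct {
    const float *data;  /* row-major storage, rows x cols as stored */
    size_t rows;        /* rows of the stored layout */
    size_t cols;        /* cols of the stored layout */
    int transposed;     /* nonzero: the logical matrix is the transpose */
} matrix_view;

/* Logical dimensions as seen by the caller. */
size_t mv_rows(const matrix_view *m) { return m->transposed ? m->cols : m->rows; }
size_t mv_cols(const matrix_view *m) { return m->transposed ? m->rows : m->cols; }

/* Logical element (i, j); no data is moved. */
float mv_get(const matrix_view *m, size_t i, size_t j)
{
    return m->transposed ? m->data[j * m->cols + i]
                         : m->data[i * m->cols + j];
}

/* "Transpose" by flipping the flag only. */
void mv_transpose(matrix_view *m) { m->transposed = !m->transposed; }

/* y = M * x, with a separate algorithm for each layout. */
void mv_matvec(const matrix_view *m, const float *x, float *y)
{
    if (!m->transposed) {
        /* y[i] = sum_j A[i][j] * x[j]: inner loop walks one stored row. */
        for (size_t i = 0; i < m->rows; i++) {
            float acc = 0.0f;
            for (size_t j = 0; j < m->cols; j++)
                acc += m->data[i * m->cols + j] * x[j];
            y[i] = acc;
        }
    } else {
        /* y = A^T x, i.e. y[j] = sum_i A[i][j] * x[i]:
         * accumulate while still walking stored rows of A. */
        for (size_t j = 0; j < m->cols; j++)
            y[j] = 0.0f;
        for (size_t i = 0; i < m->rows; i++)
            for (size_t j = 0; j < m->cols; j++)
                y[j] += m->data[i * m->cols + j] * x[i];
    }
}
```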

That said, you could create a component in HLS that performs the transpose, give it a full AXI interface, and connect it to a slave HP port on the Zynq.
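A minimal sketch of the loop structure such an HLS component could use, written as plain C so it also compiles on a PC: the matrix is processed in TILE x TILE tiles, each tile is read as runs of consecutive elements into a local buffer (BRAM in hardware), and then written out as runs of consecutive elements of the transposed matrix, so both reads and writes can be issued as bursts. The function name, the float element type and TILE = 64 are assumptions, and the AXI interface pragmas a real HLS kernel would need are left out.

```c
#include <stddef.h>

#define TILE 64  /* hypothetical tile edge; in hardware this sizes the local buffer */

/* Out-of-place transpose: src is rows x cols (row-major), dst becomes
 * cols x rows (row-major), i.e. dst[j][i] = src[i][j]. */
void transpose_tiled(const float *src, float *dst, size_t rows, size_t cols)
{
    static float buf[TILE][TILE];  /* local tile buffer (BRAM in an HLS design) */

    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        size_t h = (rows - r0 < TILE) ? rows - r0 : TILE;      /* rows in this tile */
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            size_t w = (cols - c0 < TILE) ? cols - c0 : TILE;  /* cols in this tile */

            /* Read phase: h runs of w consecutive source elements. */
            for (size_t i = 0; i < h; i++)
                for (size_t j = 0; j < w; j++)
                    buf[i][j] = src[(r0 + i) * cols + c0 + j];

            /* Write phase: column j of the tile becomes a run of h
             * consecutive elements in row (c0 + j) of dst. */
            for (size_t j = 0; j < w; j++)
                for (size_t i = 0; i < h; i++)
                    dst[(c0 + j) * rows + r0 + i] = buf[i][j];
        }
    }
}
```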

vitorian.com --- We do this for fun. Always give kudos. Accept as solution if your question was answered.
I will not answer to personal messages - use the forums instead.
Observer hchen213
Registered: 04-08-2016

Re: Large Matrix Transpose

@hbucher

Right now, I am using the Data Mover IP on the PL side to read from and write to the PS DDR memory through the High Performance (HP) AXI ports. I get a decent data transfer rate by using a 128-bit bus width with a large burst size. However, if I have to read out the transposed matrix, I can only read one element at a time, then increment the DDR address and read the next one, which slows the data transfer rate down dramatically.
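A rough way to see why the element-by-element read hurts: the simplified model below (an assumption, not the actual behaviour of the HP ports or the Data Mover) counts one transaction per run of consecutive addresses, capped at max_len elements per burst. Reading a rows x cols row-major matrix row by row needs about rows*cols/max_len transactions, while reading it column by column needs one transaction per element, i.e. 10^8 for a 10k by 10k matrix. The helper names are hypothetical.

```c
#include <stddef.h>

/* Simplified model (assumption): consecutive element indices merge into
 * one burst of at most max_len elements; any jump starts a new transaction. */
size_t count_bursts(const size_t *idx, size_t n, size_t max_len)
{
    size_t bursts = 0, run = 0;
    for (size_t k = 0; k < n; k++) {
        if (k > 0 && idx[k] == idx[k - 1] + 1 && run < max_len) {
            run++;      /* extend the current burst */
        } else {
            bursts++;   /* start a new transaction */
            run = 1;
        }
    }
    return bursts;
}

/* Address order for reading a rows x cols row-major matrix row by row. */
void row_order(size_t *idx, size_t rows, size_t cols)
{
    for (size_t k = 0; k < rows * cols; k++)
        idx[k] = k;
}

/* Address order for reading it column by column (the transposed order). */
void column_order(size_t *idx, size_t rows, size_t cols)
{
    size_t k = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            idx[k++] = i * cols + j;
}
```

With max_len = 4 and an 8 x 8 matrix, row order gives 16 transactions and column order gives 64.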

I am wondering whether there are better tools, which I am not aware of, for this kind of operation?

Scholar hbucher
Registered: 03-22-2016

Re: Large Matrix Transpose

The HP ports are very slow compared with the DDR speed, some 10x slower.
I wonder whether transposing with the ARM would be faster.
Or perhaps move your heavy data/matrices to the PL-side DDR.
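For the ARM route, a cache-blocked in-place transpose is a common approach for a square matrix, and it avoids a second buffer of about 400 MB (assuming 4-byte float elements: 10k x 10k x 4 bytes). This is a sketch with a hypothetical function name and block size, not tuned for the Cortex-A53 caches on the ZCU102.

```c
#include <stddef.h>

#define BLK 32  /* hypothetical block size; tune so two blocks fit in L1 */

/* In-place transpose of an n x n row-major matrix. Blocks (bi, bj) and
 * (bj, bi) are swapped together so both stay cache-resident. */
void transpose_inplace_blocked(float *a, size_t n)
{
    for (size_t bi = 0; bi < n; bi += BLK) {
        for (size_t bj = bi; bj < n; bj += BLK) {
            size_t imax = (bi + BLK < n) ? bi + BLK : n;
            size_t jmax = (bj + BLK < n) ? bj + BLK : n;
            for (size_t i = bi; i < imax; i++) {
                /* On a diagonal block, only swap above the diagonal
                 * so each pair (i, j) is swapped exactly once. */
                size_t jstart = (bi == bj) ? i + 1 : bj;
                for (size_t j = jstart; j < jmax; j++) {
                    float t = a[i * n + j];
                    a[i * n + j] = a[j * n + i];
                    a[j * n + i] = t;
                }
            }
        }
    }
}
```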

Observer hchen213
Registered: 04-08-2016

Re: Large Matrix Transpose

@hbucher

Thanks for your input.

I am using the ZCU102 evaluation board; the PL-side DDR is not large enough, and the MIG also takes quite a bit of space in the PL.

I did test using the ARM for the transpose, and it was faster than I expected, but I am not sure whether it is fast enough for my application.
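One way to judge "fast enough" is to express the ARM transpose as an effective bandwidth: every element is read once and written once, so a transpose moves 2 x rows x cols x element_size bytes. A minimal sketch, assuming the code runs under Linux on the PS (POSIX clock_gettime); the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Effective bandwidth of a transpose in MB/s: each element is read
 * once and written once, so 2 * rows * cols * elem_size bytes move. */
double transpose_bandwidth_mbps(size_t rows, size_t cols,
                                size_t elem_size, double seconds)
{
    double bytes = 2.0 * (double)rows * (double)cols * (double)elem_size;
    return bytes / seconds / 1e6;
}

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Usage: take t0 = now_seconds() before the transpose and t1 after it, then compare transpose_bandwidth_mbps(10000, 10000, 4, t1 - t0) with the rate the application needs. For example, a 10k x 10k float transpose finishing in one second corresponds to 800 MB/s.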

There are 4 HP ports; would using all four of them give me a higher data rate?
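If the transfer is spread over several HP ports, one simple way to organize it is to give each port's data mover its own contiguous slice of rows. The sketch below only computes that split; the struct and function names are hypothetical, and whether four ports actually scale depends on how the DDR controller handles the concurrent traffic.

```c
#include <stddef.h>

/* Hypothetical description of the rows one HP port is responsible for. */
typedef struct {
    size_t first_row;  /* first row handled by this port */
    size_t num_rows;   /* number of consecutive rows */
} port_slice;

/* Split rows as evenly as possible over ports; the first
 * (rows % ports) ports get one extra row each. */
void split_rows(size_t rows, size_t ports, port_slice *out)
{
    size_t base = rows / ports, extra = rows % ports, next = 0;
    for (size_t p = 0; p < ports; p++) {
        size_t n = base + (p < extra ? 1 : 0);
        out[p].first_row = next;
        out[p].num_rows = n;
        next += n;
    }
}
```

Each slice's byte offset in DDR would then be first_row x cols x element_size from the matrix base address.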

Scholar hbucher
Registered: 03-22-2016

Re: Large Matrix Transpose

@hchen213

Yes, I think so. 

Have a look at this SDK system performance guide (UG1145):

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug1145-sdk-system-performance.pdf

Chapter 7, "Evaluating High-Performance Ports", shows many statistics about using all ports simultaneously.

This video is also a good overview

https://www.xilinx.com/video/soc/accelerating-system-performance-zynq.html
