Mentor
300 Views
Registered: ‎10-07-2011

Image transpose

Hi folks,

I have data coming in as 2350 rows of 160 pixels each, and I receive the rows (160 pixels) one at a time. I process each row and need to store the result until all 2350 rows have been processed. I then need to process the stored image column-wise, i.e. as 160 columns of 2350 pixels each.

Is there a way to transpose the image? Or is there a way to perform the column-wise part of the processing efficiently?

I was thinking of maybe using a VDMA with a stride of 160: write one pixel, jump 160 locations, write the next pixel, and so on. The second row would start one location to the right of the first, and so on. But that would be quite inefficient on the AXI bus...
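To make the access pattern concrete, here is a minimal Python sketch of the scattered-write idea. It assumes the transposed image is stored in memory as 160 lines of 2350 pixels; under that assumption the jump between consecutive pixels of one incoming row comes out as 2350 locations (the line length of the transposed image). `transposed_addr` is a hypothetical helper for illustration, not part of any VDMA API, and addresses are counted in pixels.

```python
ROWS, COLS = 2350, 160  # incoming image: 2350 rows of 160 pixels


def transposed_addr(row: int, col: int, rows: int = ROWS) -> int:
    """Linear address (in pixels) of input pixel (row, col) in the transposed image.

    Assumption: the transposed image is stored as COLS lines of `rows` pixels.
    """
    return col * rows + row


# Addresses touched by the first pixels of incoming rows 0 and 1.
first_row = [transposed_addr(0, c) for c in range(4)]
second_row = [transposed_addr(1, c) for c in range(4)]

# Distance between consecutive writes within one incoming row.
jumps = {b - a for a, b in zip(first_row, first_row[1:])}
```

Every pixel lands at a non-contiguous address, so each write would be a single-beat transfer, which is what makes this approach costly on the bus.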

Any ideas?

Claude

Accepted Solutions
Mentor
114 Views
Registered: ‎10-07-2011

@u4223374

Thanks for your comment. That was the missing link! I was considering adding a large off-chip SRAM, but you made me realize I can process the data in chunks. A 160x50 block is a great idea, as it is small enough to reside in on-chip BRAM, and 50 pixels per burst, as opposed to 1, is enough to bring the overall AXI bus efficiency back to an acceptable level.

I'm planning on using a pair of dual-port memory arrays in a ping-pong arrangement. On the input side, the address will be computed so that the incoming data is written to the "transposed" memory location. On the output side, the VDMA will be used to store each 160x50 block into DDR (with a stride parameter of 2350) and to retrieve the whole 160x2350 image once it is available.
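For anyone following along, the plan above can be sketched as a small behavioral model in Python. This is only a software sketch under assumptions: the function name `transpose_in_blocks`, the buffer layout (160 lines of 50 pixels per buffer), and the use of plain lists for BRAM and DDR are all illustrative, and the drain to DDR happens sequentially here, whereas in hardware it would overlap with filling the other buffer.

```python
ROWS, COLS, BLOCK = 2350, 160, 50  # 2350 = 47 * 50, so the blocks divide evenly


def transpose_in_blocks(rows_in, rows=ROWS, cols=COLS, block=BLOCK):
    """Model transposed BRAM writes followed by strided block writes to DDR."""
    assert rows % block == 0, "sketch assumes the block height divides the row count"
    ddr = [None] * (rows * cols)  # transposed image: `cols` lines of `rows` pixels
    bufs = [[None] * (cols * block) for _ in range(2)]  # ping-pong buffers
    active = 0
    for r, row in enumerate(rows_in):
        local = r % block  # row index inside the current block
        for c, pixel in enumerate(row):
            # Input side: write each pixel to its "transposed" buffer location.
            bufs[active][c * block + local] = pixel
        if local == block - 1:  # buffer full: hand it to the output side
            base = r - local  # first image row covered by this block
            full = bufs[active]
            active ^= 1  # subsequent input rows go to the other buffer
            for line in range(cols):
                # Output side: one contiguous burst of `block` pixels per line,
                # with a line-to-line stride of `rows` (2350 for the real image).
                dst = line * rows + base
                ddr[dst:dst + block] = full[line * block:(line + 1) * block]
    return ddr


# Small usage example: a 6x4 image processed in blocks of 3 rows.
image = [[10 * r + c for c in range(4)] for r in range(6)]
out = transpose_in_blocks(image, rows=6, cols=4, block=3)
expected = [image[r][c] for c in range(4) for r in range(6)]
```

In the small example, `out` matches `expected`, i.e. the DDR contents are the column-major (transposed) image, and each column can then be read back as one linear run.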

All in all, this is perfectly clear and no longer an issue.

Thanks again!


3 Replies
Moderator
190 Views
Registered: ‎11-09-2015

Hi @chevalier 

This was already discussed on the forums a few years ago:

https://forums.xilinx.com/t5/Processor-System-Design-and-AXI/Video-rotation-using-axi-vdma/m-p/447788/highlight/true#M11676

The AXI VDMA does not support this, so yes, I think your idea is a good way to go.


Florent
Product Application Engineer - Xilinx Technical Support EMEA
Advisor
Advisor
180 Views
Registered: ‎04-26-2015

I suggest reading in a section of maybe 160x50 pixels (stored in block RAM) and then writing that out to DDR as a 50x160 section. Block RAM doesn't care about access order; it'll be just as fast either way. That way the DDR only has to deal with linear writes and linear reads.

While the first block is being written to DDR, you can be filling a second block RAM from the input.

Once all the blocks have been written, you can do linear 2350-pixel reads from DDR, which will remain very efficient.
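As a rough back-of-the-envelope check of why the block size matters, the following Python sketch counts the DDR write bursts for a given block height. It counts in pixels rather than bytes (the pixel width isn't stated in this thread), and `write_bursts` is a hypothetical helper, not a measurement of real AXI throughput.

```python
ROWS, COLS = 2350, 160


def write_bursts(block_rows: int, rows: int = ROWS, cols: int = COLS):
    """Return (DDR write bursts, pixels per burst, pixels per block buffer)."""
    assert rows % block_rows == 0, "sketch assumes the block height divides the row count"
    blocks = rows // block_rows
    # Each block is written out as `cols` contiguous runs of `block_rows` pixels.
    return blocks * cols, block_rows, cols * block_rows


pixel_at_a_time = write_bursts(1)  # (376000, 1, 160)
blocked = write_bursts(50)         # (7520, 50, 8000)
```

With 50-row blocks the same 376,000 pixels go out in 7,520 bursts of 50 pixels instead of 376,000 single-pixel writes, at the cost of an 8,000-pixel buffer (two of them for double buffering).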
