02-08-2021 04:03 AM
The AXI4-Stream interconnect is 32 bits wide. Suppose I want more bandwidth per stream than 32 Gbit/s, for example 64 Gbit/s. Can I drive a 256-bit-wide bus from the PL at 250 MHz and have it converted in the AIE domain into 2 parallel 32-bit streams at 1 GHz, for an aggregate of 64 Gbit/s? Will the tile know to take both streams and combine them into one 64-bit-wide word? Otherwise, is it true to say the input sample rate is limited to 32 Gbit/s?
02-10-2021 07:05 AM
Each AI Engine tile can read two 32-bit AXI stream direct inputs. Given a typical 1 GHz AIE clock, the maximum direct stream input rate you can achieve is 64 Gbit/s.
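As a quick sanity check of those numbers, here is a minimal sketch of the arithmetic. The helper name `stream_bandwidth_gbps` is made up for illustration, and the model assumes one full-width word per stream on every clock cycle with no stalls or back-pressure.

```python
def stream_bandwidth_gbps(width_bits, clock_hz, num_streams=1):
    """Peak bandwidth in Gbit/s, assuming one word per stream per cycle."""
    return width_bits * clock_hz * num_streams / 1e9

# One 32-bit AIE stream at 1 GHz.
aie_one_stream = stream_bandwidth_gbps(32, 1_000_000_000)
# Two 32-bit AIE streams at 1 GHz (both direct inputs of one tile).
aie_two_streams = stream_bandwidth_gbps(32, 1_000_000_000, num_streams=2)
# The 256-bit PL bus at 250 MHz proposed in the question.
pl_bus = stream_bandwidth_gbps(256, 250_000_000)

print(aie_one_stream, aie_two_streams, pl_bus)
```

Under these assumptions the 256-bit PL bus at 250 MHz and the two 32-bit streams at 1 GHz carry the same peak rate, so the rates match on paper; real throughput depends on the kernel keeping up with both ports.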
Note that with direct streams it takes 4 clock cycles to convert from the native 32-bit AXIS width to the 128-bit register/memory alignment. You can either interleave the two stream inputs to update a 128-bit register, effectively refreshing it every second clock cycle, or concatenate both stream ports every 4th clock cycle to get a 256-bit update.
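The cycle counts above follow from simple word counting. The sketch below is only a back-of-the-envelope model, not AIE code: `cycles_per_update` is a hypothetical helper, and it assumes each stream delivers exactly one 32-bit word per clock cycle.

```python
STREAM_WIDTH_BITS = 32  # native AXIS width per stream port

def cycles_per_update(update_bits, num_streams):
    """Cycles needed to collect `update_bits`, one 32-bit word per stream per cycle."""
    words = update_bits // STREAM_WIDTH_BITS
    return -(-words // num_streams)  # ceiling division

single_128 = cycles_per_update(128, 1)  # one stream filling 128 bits
dual_128 = cycles_per_update(128, 2)    # two streams interleaved into 128 bits
dual_256 = cycles_per_update(256, 2)    # two streams concatenated into 256 bits

print(single_128, dual_128, dual_256)
```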
Also note that if you use dual AXIS, you will need to arrange the input data so that groups of 4 consecutive samples alternate between AXIS port 0 and port 1.
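To illustrate that data arrangement, here is a small host-side sketch in plain Python. It is not AIE kernel code; the function names are invented for the example, and it assumes each sample is one 32-bit word, so a group of 4 samples is 128 bits.

```python
def split_for_dual_stream(samples, group=4):
    """Deal consecutive groups of `group` samples alternately to port 0 and port 1."""
    port0, port1 = [], []
    for start in range(0, len(samples), group):
        chunk = samples[start:start + group]
        target = port0 if (start // group) % 2 == 0 else port1
        target.extend(chunk)
    return port0, port1

def merge_dual_stream(port0, port1, group=4):
    """Rebuild the original order by taking `group` samples from each port in turn."""
    merged = []
    for start in range(0, max(len(port0), len(port1)), group):
        merged.extend(port0[start:start + group])
        merged.extend(port1[start:start + group])
    return merged

samples = list(range(16))
port0, port1 = split_for_dual_stream(samples)
restored = merge_dual_stream(port0, port1)
print(port0, port1)
```

With 16 samples, port 0 carries samples 0-3 and 8-11 and port 1 carries samples 4-7 and 12-15; merging them group by group restores the original order.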
This is explained in this guide:
02-08-2021 04:10 AM
Hi @hezi
I would recommend reading the very detailed answer from my colleague @ludovica on the following topic:
https://forums.xilinx.com/t5/Versal-and-UltraScale/Versal-FPGA-AIE-to-PL-logic-bit-width-rate-matching-and/m-p/1169826#M15351
Let me know if there are still things unclear after reading it.
02-08-2021 11:05 PM
Thanks, this doesn't exactly address my question since I want a virtual stream of 64G bits/sec mapped on 2 physical 32G bits/sec streams in AIE.
This means I need a 256-bit PL interface running at 250 MHz, using 4 native 64-bit PL-AIE stream interfaces connected to 2 internal 32-bit interconnect streams, which are then concatenated in the AIE tile into one 64-bit stream.
The AIE is capable of performing 2 loads of 128 bits per clock. If the above is not possible, this means the AIE tile can utilize its full bandwidth only as a consumer of neighboring tiles (which produce 128/256-bit results), but not as a consumer from the PL (since streams are only 32 bits wide).
Is my understanding correct? Or, in other words, how can I have a single input to the AIE array with a bandwidth greater than 4 GB/s?
02-10-2021 04:56 AM