08-20-2019 10:25 AM
I'm using Xilinx's XDMA on an Ultrascale device.
Suppose I'm configuring the core to have 2 C2H and 2 H2C engines. I also select it to use AXI Stream (not MM).
Suppose the Host CPU configures a descriptor and sends it to one of the C2H interfaces.
What should happen next ?
How will the user's FPGA logic even know that the CPU issued a descriptor and is now waiting for data to be sent ?
08-21-2019 07:01 PM
The driver writes the control register to start the DMA engine, and the DMA engine fetches the descriptor from host memory. Based on the descriptor it receives, the DMA then transfers the data to the host.
The example C2H flow sequence is as follows:
1. Open the C2H device and initialize the DMA.
2. The user program allocates a buffer (based on size) and passes the pointer to the read
function along with the specific device (C2H) and the data size.
3. The driver creates descriptors based on the size and initializes the DMA with the
descriptor start address, along with the count of any adjacent descriptors.
4. The driver writes the control register to start the DMA transfer.
5. The DMA reads the descriptors from host memory and starts processing each one.
6. The DMA fetches data from the card and sends it to the host. After all data is
transferred, depending on the settings, the DMA generates an interrupt to the host.
7. The driver's ISR processes the interrupt to find out which engine raised it,
checks the status for any errors, and checks how many descriptors were processed.
8. If the status is good, the driver returns the transferred byte length to the user side
so it can verify it.
08-22-2019 01:23 AM
Thanks for helping, Liy.
Let's focus on step 6:
6. The DMA fetches data from Card and sends data to host.
How will this "fetching" action occur ?
All the FPGA logic sees is the C2H AXI Stream bus.
How is it supposed to know that it's now supposed to start sending data through this channel ??
08-22-2019 08:30 AM
Once the descriptor is set up, the DMA core will open up the c2h_tready - basically that is the indicator that the data can flow in. The user logic would then use the c2h_tvalid to indicate that data is going in, using the tkeep and tlast as described in the documentation (please review carefully).
For C2H Stream, it is less of a "fetch" than an "open bucket" for data to flow in. But the concept is still the same, once the descriptor is processed the DMA engine is ready to accept data, and will complete the transactions until tlast is raised, indicating the end of the data flow.
08-22-2019 08:54 AM - edited 08-22-2019 11:14 AM
For C2H Stream, it is less of a "fetch" than an "open bucket" for data to flow in.
This best conveys it...
But for the bucket to open - the Host FIRST has to issue a descriptor with specific length parameters - correct ?
What if it doesn't know the exact size of the packet it's about to get ?
08-23-2019 09:19 AM
The host has to open up space in host memory (via descriptors) equal to or greater than the anticipated incoming data size. For example, if you had 12k of incoming data, the host could open three 4k buckets via 3 descriptors. Then the DMA engine would be ready to accept the data.
Let's say 10k of data came in. You would put all of it on the stream bus and assert tlast at the end. The first 2 descriptors would be completely filled, and the 3rd descriptor would be closed with only half its data.
Per PG195, you can go short on the data for C2H stream. So back to your question:
What if it doesn't know the exact size of the packet it's about to get?
The recommended strategy is to provide several descriptors to the descriptor chain that will exceed the size of the expected data. This way you always have enough space + some extra for the incoming data. C2H stream by default is actually a circular buffer of descriptors, so once the last descriptor in the chain is fulfilled, data will start flowing back into the first. (Essentially once the descriptor(s) are loaded, tready will never go low again).
For example, let's say you expect 10k, followed some time later by another 10k, then well after that by a third 10k. You would immediately open six 4k descriptors. The first 10k would take the first 3, and you would assert tlast. The DMA would then send an interrupt (or set register bits for poll mode), and your host application would begin fetching the 1st 10k of data. While that is happening, the 2nd 10k comes in, flowing into descriptors 4-6. An interrupt is issued and the data is fetched by the application. Finally, some time later, the last 10k comes in, and it wraps back into descriptors 1-3.
In the above scenario, if you know that the last 10k is going to come in ahead of the time when the application would have retrieved the first, then you would need to open 9 (or more) descriptors.
08-23-2019 10:00 AM - edited 08-23-2019 10:08 AM
So the strategy is to always "oversubscribe" to data ?
Are there any exceptions to this ?
Can't the system freeze if it doesn't get data for a prolonged period of time (after sending a descriptor) ?