12-04-2018 08:52 AM - edited 12-04-2018 09:52 AM
I am designing a filter core that uses the Xilinx FFT v9.0 IP. When configuring the core in the GUI, I set the Target Clock Frequency and the Target Data Throughput identically, at 400MHz. On the Implementation Details tab, the GUI automatically selected "Pipelined, Streaming I/O".
When I simulate it with QuestaSim, everything seems to work fine until I start tinkering with the input-side slave AXI-Stream interface. I changed my testbed to feed a sample only once every 4 clocks, so S_AXIS_TVALID pulses every 4 clocks and TDATA changes every 4 clocks. Everything still looks right while data is being input, but the problem appears when I stall the input: in my testbed, after X frames of samples, I stop feeding data.
This causes S_AXIS_TVALID to go low and stay low forever. When I do this, the output side stops as well: M_AXIS_TVALID goes low about 3 clocks later and stays low, even though at least 2 frames' worth of data are already loaded into the core and it is in the middle of outputting a frame. The last state of M_AXIS is: TVALID = Low, TREADY = High, TUSER(7 downto 0) = k = 150, TDATA(15 downto 0) = Xk. None of the event signals indicate anything amiss.
It's almost like the S_AXIS_TVALID signal is internally connected to a clock enable pin. Is this the expected behavior?
12-05-2018 08:31 PM
In pipelined mode, once you have loaded one frame of data, the IP core should process that frame and output it. But you need to make sure the whole frame is sent to the core. Also make sure m_axis_data_tready is asserted so that you can receive the IP's output data.
12-06-2018 05:48 AM
Agreed, but I don't think that is what I'm seeing in simulation. It's possible that there are steps I still need to take. I've changed my FFT core's configuration in the GUI several times, but as far as I can tell, all of those values are passed to the actual core in the Vivado-generated xfft_0.vhd, which instantiates the xfft_v9_1_0 core.
I could isolate the FFT into a testbed to be certain of what I'm seeing, and when/if I get the time, that's the next thing I'm going to do.
12-06-2018 06:14 AM
First, your question is not about FFT v9.0 but about AXI4 Stream communication.
I would suggest you get the AMBA AXI4-Stream Protocol Specification from ARM and refer to section 2.2.1 (Handshake protocol).
I feel your problem is that you don't check TREADY. You need to. Once you pull TVALID high, you must keep TVALID and TDATA stable until the slave pulls TREADY high. Data is fetched into the slave only when both TVALID and TREADY are high, and the wait for that can span more than one cycle.
You don't need to guess or wonder what's inside an AXI-Stream slave; just communicate with the correct grammar.
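The handshake rule above can be captured as a testbench trace checker. This is a minimal Python sketch, not Xilinx or ARM code: a trace is assumed to be a list of per-cycle (TVALID, TREADY, TDATA) samples, and the function names are hypothetical. It extracts the beats that actually transfer and flags a master that drops TVALID or changes TDATA before its beat was accepted.

```python
from typing import List, Tuple

# One entry per clock cycle: (tvalid, tready, tdata). Hypothetical trace format.
Cycle = Tuple[bool, bool, int]


def transfers(trace: List[Cycle]) -> List[int]:
    """Return the TDATA values actually transferred: a beat moves only
    on a cycle where TVALID and TREADY are both high."""
    return [data for valid, ready, data in trace if valid and ready]


def check_master_rules(trace: List[Cycle]) -> List[str]:
    """Flag AXI4-Stream master violations: once TVALID is asserted it must
    stay asserted, with TDATA unchanged, until the handshake completes."""
    errors = []
    for i in range(1, len(trace)):
        prev_valid, prev_ready, prev_data = trace[i - 1]
        valid, _, data = trace[i]
        if prev_valid and not prev_ready:  # beat was offered but not taken
            if not valid:
                errors.append(f"cycle {i}: TVALID dropped before handshake")
            elif data != prev_data:
                errors.append(f"cycle {i}: TDATA changed before handshake")
    return errors
```

For example, a master that offers 10, waits one cycle for TREADY, then sends 10 and 11 back to back passes the check, while one that swaps 10 for 12 during the wait is flagged.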
12-06-2018 11:23 AM - edited 12-06-2018 11:30 AM
I have spent a good deal of time looking at the AXIS interface. I've written what I believe to be AXIS-compliant interfaces for several IPs that talk to the FFT core, as well as others that talk to AXI DMA cores.
Also, my problem is on the output side of things.
In general, on the input side, S_AXIS_TREADY goes high; I then put data on TDATA and raise TVALID, and on every subsequent clock period in which TREADY is high, I drive new data, like Figure 2.2 in the AXIS spec. I'm not holding off TVALID until TREADY goes high, because that would violate the spec; it's just that, as implemented in this design, the FFT core asserts S_AXIS_TREADY before I'm able to provide data.
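A compliant master of the kind described here can be sketched as a cycle-based model. This is a hypothetical Python sketch, assuming a source that produces one sample every `period` clocks and an arbitrary TREADY pattern from the slave; the deque stands in for whatever register or buffer holds samples the slave has not yet taken. It illustrates that TVALID depends only on whether the master has data, never on TREADY, and that a sample is consumed only on a cycle with both signals high.

```python
from collections import deque
from typing import List


def simulate(samples: List[int], period: int, tready: List[bool]) -> List[int]:
    """Cycle-based sketch of an AXI4-Stream master fed by a slow source.

    - The source produces one new sample every `period` clocks.
    - The master asserts TVALID whenever it holds a sample; it never waits
      for TREADY first.
    - A sample is consumed only on a cycle where TVALID and TREADY are both high.
    Returns the samples accepted by the slave, in order.
    """
    pending = deque()      # samples offered to the slave but not yet accepted
    src = iter(samples)
    accepted = []
    for cycle, ready in enumerate(tready):
        if cycle % period == 0:  # slow source: one sample per `period` clocks
            nxt = next(src, None)
            if nxt is not None:
                pending.append(nxt)
        tvalid = bool(pending)   # TVALID depends only on having data
        if tvalid and ready:     # handshake: transfer one beat
            accepted.append(pending.popleft())
    return accepted
```

With TREADY dropping every third cycle, the eight samples from `simulate(list(range(8)), 4, [c % 3 != 0 for c in range(40)])` still arrive in order; only their timing changes.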
On the output side, I drive M_AXIS_TREADY high, as I indicated in my original post.
M_AXIS_TVALID, which is driven by the FFT core, drops low and stays low.
I'm not asking about the AXI-Stream interface protocol. What I'm seeing is that the output data channel of the FFT stops mid-frame when I stop the input data channel. For debug purposes, I currently have M_AXIS_TREADY held high. I feed data to the FFT when S_AXIS_TREADY is high. I've done the work of ensuring that all the event signals are right: no unexpected_tlast, no missing_tlast, and event_frame_started every N input samples. I have tested various sorts of interface scenarios: the upstream running slower (controlled by TVALID), the downstream backing data up from the output (via TREADY), and combinations of both.
What an unhelpful comment.
12-07-2018 12:43 AM - edited 12-07-2018 01:54 AM
OK, sorry for not having understood your problem well. So what happens is that you expect the FFT to output data in pipelined mode while you are not feeding it input data. Of course not. Pipelined means no data buffer: data in = data out, irrespective of the clock. Data out without data in would imply that something was buffered, and that's not how pipelined mode works. It's not explicitly stated in the datasheet, but I think this says it implicitly (page 41):
The core has the ability to simultaneously perform transform calculations on the current frame of data, load input data for the next frame of data, and unload the results of the previous frame of data
Here, "simultaneously" means to me that if it can't load data, it won't calculate or unload anything. Not quite an "ability", I'd say.
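The "data out only as data in" reading of pipelined mode can be expressed as a toy model. This is a hypothetical Python sketch of that interpretation, not of the core's actual RTL: `LockstepPipeline` and its `latency` parameter are illustrative names, and the model simply refuses to advance on cycles without an input beat. It reproduces the symptom from the original post: once input stops, output stops with samples still inside.

```python
from collections import deque
from typing import Optional


class LockstepPipeline:
    """Toy model of a pipeline that advances only when an input beat is
    accepted, so every output beat is paid for by an input beat.
    `latency` (in samples) is illustrative, not a figure from PG109."""

    def __init__(self, latency: int):
        self.stages = deque([None] * latency)

    def clock(self, tvalid: bool, tdata: int = 0) -> Optional[int]:
        """One clock cycle. Returns the output sample (M_AXIS_TVALID high)
        or None (M_AXIS_TVALID low)."""
        if not tvalid:
            return None  # no input beat: the whole pipeline holds
        self.stages.append(tdata)
        return self.stages.popleft()  # None while the pipeline is still filling
```

With `latency=4`, feeding samples 0 to 5 and then idling yields outputs 0 and 1 only; samples 2 to 5 stay stuck in the stages until more input arrives.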
If you change to any Radix Burst mode, this is how it says it works:
It loads and/or unloads data separately from calculating the transform
The data loading and unloading processes can be overlapped if the data is unloaded in digit reversed order
From the above, I understand that you can unload without loading, as these processes can, but do not need to, overlap. In pipelined mode, they must overlap.
Hope this is more helpful and insightful.
12-07-2018 05:48 AM
OK, that all makes sense as a possible implementation. But is that the actual implementation? If so, how do you get the data out at the end? Suppose I need to run at 1 sample/clock for 64 frames. The right answer cannot be that I have to feed it 3 frames' worth of garbage data on the input to get my last frame out.
In my experience, pipelining doesn't imply that you can't flush data out. Also, consider this text from PG109:
The core has the ability to simultaneously perform transform calculations on the current frame of data, load input data for the next frame of data, and unload the results of the previous frame of data. You can continuously stream in data and, after the calculation latency, can continuously unload the results. If preferred, this design can also calculate one frame by itself or frames with gaps in between.
Which makes the most sense of all, because with an otherwise unconstrained, but fully compliant AXI-S interface, that's what you'd need to support.
12-07-2018 06:22 AM
Isn't your case Pipelined Streaming I/O with Cyclic Prefix Insertion, where you need some extra input data?
12-07-2018 07:00 AM
You mention that you pulse TVALID once while holding your data for 4 cycles. This will feed incorrect data in Realtime mode. Check that you are not in that mode, and watch TREADY (if the FFT pulls it low, it won't read data even if your TVALID is high).
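Why that stimulus breaks Realtime mode can be sketched by comparing what each mode would capture. This is a simplified Python model based only on the PG109 text quoted in this thread (TVALID on the Data Input channel is ignored once loading of a frame has begun); TREADY is assumed high throughout, and `capture` is a hypothetical name, not a model of the core's RTL.

```python
from typing import List, Tuple


def capture(trace: List[Tuple[bool, int]], realtime: bool, frame_len: int) -> List[int]:
    """Samples a slave would take from (TVALID, TDATA) pairs, TREADY high.

    Non-Realtime: a sample is taken only on cycles where TVALID is high.
    Realtime (per the quoted PG109 text): once a frame has started loading,
    TVALID is ignored, so a sample is taken every cycle until the frame is full.
    """
    taken = []
    loading = False  # True while a Realtime frame is partially loaded
    for valid, data in trace:
        if valid or (realtime and loading):
            taken.append(data)
            loading = len(taken) % frame_len != 0  # frame still incomplete
    return taken
```

With a 4-point frame and two samples (10 and 11), each held for 4 clocks with a single TVALID pulse, Non-Realtime mode takes [10, 11], while the Realtime model fills each frame with four copies of the same sample.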
12-07-2018 07:05 AM
I'm not using Realtime mode. It's too much of a hassle to have to obey S_AXIS_TREADY but violate the AXI-S interface in every other situation.
12-07-2018 07:25 AM
Are you interfacing the FFT with a standard or custom IP?
Even if not in Realtime mode, you still have to watch TREADY; you can't send data if the FFT is busy with something else.
12-07-2018 08:04 AM
"The right answer cannot be that I have to feed it 3 frames worth of garbage data on the input, to get my last frame out"
Let's look at it from a different point of view. To get the data out, you need to clock it. You don't put anything special on the data port; call it garbage, default value, all-zeros, or 'don't care'. You get your previous result. When you feed your next frame, you will get the transform of your 'garbage', but because of the latency, your next frame's transform is going to come out when the following frame is sent.
I still think the mode you want is Burst, not Pipelined. I've used the FFT core a number of times, always with DMA and never doing such funny things. Why would you send data once every four clocks? Couldn't you FIFO it with a separate clock?
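The flushing argument can be sketched at frame granularity. This is a hypothetical Python model, assuming the output lags the input by a fixed whole number of frames (`latency_frames`, an illustrative figure, not one from PG109) and using an identity placeholder instead of an actual transform: all-zero frames appended after the real data push the last real results out, while the results of the padding frames themselves are never read.

```python
from collections import deque
from typing import List


def transform_stream(frames: List[List[int]], latency_frames: int) -> List[List[int]]:
    """Flush a lockstep pipeline whose output lags the input by
    `latency_frames` whole frames. Results move only when a new frame is
    clocked in, so all-zero 'don't care' frames are appended to push the
    last real frames out. The per-frame transform is an identity
    placeholder, not an FFT."""
    frame_len = len(frames[0])
    padding = [[0] * frame_len for _ in range(latency_frames)]
    in_flight = deque()
    results = []
    for frame in frames + padding:
        in_flight.append(frame)  # load the next frame...
        if len(in_flight) > latency_frames:
            results.append(in_flight.popleft())  # ...which pushes out an older result
    # The padding frames' own results remain in `in_flight` and are discarded.
    return results
```

For example, three two-sample frames with `latency_frames=2` need two zero frames, and all three real frames come back out in order.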
12-09-2018 07:07 AM
" why would you send data once every four clocks?"
If the FFT IP doesn't support this, it's not AXI-S compliant.
Per the user guide:
Figure 3-40 shows the loading of the sample data for an 8 point FFT.
The upstream master drives TVALID and the core drives TREADY.
In this case, both the master and the core insert waitstates.
So, master waitstates are supported. The user guide also says:
The previous description only applies when the core is configured to use Non-Realtime mode. The situation is different in Realtime mode, which is used to create a smaller and faster design at the expense of flexibility in loading and unloading data. When the core is configured to use Realtime mode, the following occurs:
1. The TREADY signal on the Data Output channel (m_axis_data_tready) is removed
2. The TREADY signal on the Status channel (m_axis_status_tready) is removed
3. The TVALID signal on the Data Input channel is ignored when the loading of a frame has begun
TVALID being ignored in Realtime mode implies that in Non-Realtime mode it is not ignored. And if it's not ignored, then the core "MUST" support 4, 8, or any other number of clocks between TVALID = "1" pulses.
02-22-2019 06:12 AM
I just got back to this, and it looks like the problem I was seeing is this: if part of a frame is ingested into the core, it seems to stall processing and output of the previous frames. The first image below shows me sending exactly 2 frames and then getting exactly 2 frames out; testin is the input stream and testout is the output stream. Note the 2 pulses of fft_event_frame_started. The second image shows me sending 2 frames + 1 sample and getting nothing out. Note the 3 pulses of fft_event_frame_started, and also note fft_event_data_in_channel_halted going high.
I'm pretty sure that I can create a workaround, but I have trouble believing this is the appropriate or desired behavior. To create a workaround, I'm going to have to do something like putting an AXI-Stream data FIFO in packet mode directly upstream, to act as a packet buffer. But isn't that what the input stage of the FFT core is supposed to be doing?
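The packet-buffer workaround can be modeled in a few lines. This is a hypothetical Python sketch of packet-mode behavior in general, not of the AXI4-Stream Data FIFO IP itself: beats are buffered, and nothing is offered downstream until a beat with TLAST has arrived, so the FFT would only ever see whole frames. Capacity limits and downstream backpressure are ignored.

```python
from collections import deque
from typing import Optional, Tuple


class PacketFifo:
    """Packet-mode FIFO sketch: the head packet is released downstream
    only after its TLAST beat has been written."""

    def __init__(self) -> None:
        self.beats = deque()       # (tdata, tlast) pairs received so far
        self.complete_packets = 0  # packets whose TLAST has arrived

    def push(self, tdata: int, tlast: bool) -> None:
        """Upstream write of one accepted beat."""
        self.beats.append((tdata, tlast))
        if tlast:
            self.complete_packets += 1

    def pop(self) -> Optional[Tuple[int, bool]]:
        """Downstream read; None means TVALID is held low."""
        if self.complete_packets == 0:
            return None  # head packet incomplete: offer nothing
        tdata, tlast = self.beats.popleft()
        if tlast:
            self.complete_packets -= 1
        return tdata, tlast
```

Pushing 2 frames + 1 sample with a 4-point frame (the failing case above) releases exactly 8 beats; the stray ninth sample waits until its own frame completes.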
02-27-2019 06:29 PM
@mckinjo4, to answer your question: the Pipelined Streaming architecture of the FFT IP allows you to continuously send frames of data into the core, and the core is able to continuously output data in frames. However, you need to make sure to send whole frames into the core; if you send a partial frame, you will run into this issue. For example, if you have configured the FFT for 1024 points, you need to send 1024 valid samples into the core (a sample is valid when s_axis_data_tvalid and s_axis_data_tready are both asserted in the same clock cycle), with tlast indicating the last sample. Also, m_axis_data_tready must be held high all the time; that is the downstream module's indication to the FFT IP that it is able to accept data.
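The rule of N accepted samples per frame with TLAST on the last one can be checked in a testbench. This is a hypothetical Python sketch working on the TLAST flags of beats that actually transferred; it reports problems similar in spirit to the core's tlast-missing and tlast-unexpected events, plus the trailing partial frame that caused the stall discussed here. It is not a model of how the core itself counts samples.

```python
from typing import List


def check_frames(tlast_flags: List[bool], n_points: int) -> List[str]:
    """Check accepted input beats against an N-point frame size.

    `tlast_flags` holds TLAST for each beat that actually transferred
    (TVALID and TREADY both high); idle cycles must not be included.
    """
    issues = []
    for i, tlast in enumerate(tlast_flags):
        is_last_sample = (i % n_points) == n_points - 1
        if is_last_sample and not tlast:
            issues.append(f"beat {i}: TLAST missing on last sample of frame")
        elif tlast and not is_last_sample:
            issues.append(f"beat {i}: TLAST unexpected mid-frame")
    leftover = len(tlast_flags) % n_points  # samples of an unfinished frame
    if leftover:
        issues.append(f"partial frame: {leftover} of {n_points} samples loaded")
    return issues
```

Run on 2 frames + 1 sample of a 4-point FFT, it reports only the partial frame, which matches the case where the event signals looked clean but the output stalled.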
03-14-2019 09:48 AM
I still wonder about the FFT 9.0 behaviour:
It seems unavoidable that tready goes low for one clock cycle after reading the first word of a frame.
The old FFT 7.0 core did not do that.
I know that it is documented that way, but this is most inconvenient when using real-time data coming from an ADC.
Does Xilinx really expect me to insert a FIFO between ADC and FFT? Is there no way to get the core to continuously read a whole frame without any wait cycles?
This would be a severe degradation of FFT 9.0 vs. 7.0!
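Why a FIFO becomes necessary can be estimated with a small cycle model. This is a hypothetical Python sketch, assuming the behavior described in the post above (TREADY low for one cycle after the first word of every frame) and an ADC that cannot be stalled; `adc_period` is the number of clocks per ADC sample, and `fifo_peak` is an illustrative name. With one sample every clock, the consumer loses one cycle per frame that it can never win back, so the FIFO occupancy keeps climbing with the number of frames; if the ADC delivers a sample only every other clock, the lost cycle falls on an idle slot and the occupancy stays at a single sample.

```python
def fifo_peak(adc_period: int, frame_len: int, n_frames: int) -> int:
    """Peak FIFO occupancy between a free-running ADC (no backpressure)
    and a consumer that reads one sample per clock but holds TREADY low
    for one cycle right after taking the first sample of every frame."""
    fifo = 0
    peak = 0
    taken_in_frame = 0  # samples of the current frame read so far
    stall = False       # True on the one wait cycle after a frame's first sample
    produced = 0
    total = frame_len * n_frames
    cycle = 0
    while produced < total or fifo > 0:
        if produced < total and cycle % adc_period == 0:
            fifo += 1   # ADC writes a sample; it cannot be held off
            produced += 1
        peak = max(peak, fifo)
        if stall:
            stall = False  # TREADY low this cycle: nothing read
        elif fifo > 0:
            fifo -= 1      # handshake: one sample read
            taken_in_frame += 1
            if taken_in_frame == 1:
                stall = True  # consumer drops TREADY on the next cycle
            if taken_in_frame == frame_len:
                taken_in_frame = 0
        cycle += 1
    return peak
```

The design choice this illustrates is that the depth needed is set by how many frames run back to back at full rate, which is why a FIFO alone only postpones the problem unless the consumer side can run faster than the ADC or the stream has gaps.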
03-14-2019 10:14 AM
Yeah, it is quite the PITA if you aren't using AXI-S-compliant interfaces. But once I got my AXI-S interfaces right, the problem went away.
But absolutely, if you aren't using backpressure-capable AXI-S, it makes the FFT borderline unusable.