Custom IP works functionally, but fails when integrated with the processor and AXI DMA.

I am working on a small image processing application (image blurring) and I am trying to send my image to the PL (designed, simulated, and functionally verified to perform the blurring) through AXI DMA. I have packaged the PL design as a custom IP. I recently posted on this forum with the same design, asking how to check whether a clock is present. I am new to FPGA design and have almost no experience building complete designs.

Using ILA cores, I have verified that data enters my IP and that the clock signal is present. Img_Blur is my custom IP, coded in VHDL. I have tried debugging internal signals in my IP from this block design, to no avail. I have counters to keep track of input pixels, output pixels, and columns. From what I can see with the ILA, these counters never update at all.

In short, even though my IP works correctly at the same clock rate in behavioral simulation and produces the right results, it does not behave the same way when integrated in the block design: data enters my IP, but the output stream stays inactive.

If any further information is required to help me solve this, please let me know. I am willing to share my entire PL and PS code if anyone has the time to help me understand the problem.

I know this might be asking for a lot, but I have been stuck on this for quite some time and have made no progress. Thank you.

 

[Attached image: IP Integrator block design]


@SriramGangadhar,

Go ahead and share your custom IP.  I'll take a peek at it and see what I can find.

Dan


@dgisselq 

I have attached the files to this reply. Thank you so much.


@SriramGangadhar,

How are you handling AXIS_TLAST?  I don't see any references to it within your design.  The S2MM core will halt when it sees the first incoming value with TLAST set, so you need to faithfully copy this value from your input to your output and keep it aligned with your processing chain.
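
Just to make the idea concrete, here is one way to carry TLAST alongside the data. This is only a minimal sketch: the entity name, port names, and the PIPELINE_DEPTH value are placeholders I made up, not anything from your code. The point is that TLAST advances in lock-step with the pixels, so the last output beat still carries it:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: delays TLAST by the latency of the processing
-- pipeline so the last output beat still carries TLAST for the S2MM DMA.
entity tlast_delay is
    generic (
        PIPELINE_DEPTH : positive := 4   -- assumed latency of the blur pipeline, in beats
    );
    port (
        clk       : in  std_logic;
        advance   : in  std_logic;       -- '1' whenever the pipeline moves a beat forward
        tlast_in  : in  std_logic;       -- from s_axis_tlast
        tlast_out : out std_logic        -- drives m_axis_tlast
    );
end entity tlast_delay;

architecture rtl of tlast_delay is
    signal pipe : std_logic_vector(PIPELINE_DEPTH-1 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if advance = '1' then
                -- shift TLAST along at the same rate as the pixel data
                pipe <= pipe(PIPELINE_DEPTH-2 downto 0) & tlast_in;
            end if;
        end if;
    end process;

    tlast_out <= pipe(PIPELINE_DEPTH-1);
end architecture rtl;
```

You'd tie advance to whatever enable actually moves your blur pipeline forward (for example, the valid-and-ready handshake at the input).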

I've also noticed that you've simply copied m_axi_ready to s_axi_ready, but then used s_axi_valid (ungated) within your design.  This is broken on many levels.  1) if !m_axi_ready, your design may accept several s_axi_valid's even though the incoming stream might think it's only sending the first value and getting stalled.  2) If you fix that by setting en_in to s_axi_valid and s_axi_ready (a good approach, except ...) you'll then hang your IP if the slave ever waits for m_axi_valid before setting m_axi_ready.  That means that the right solution for s_axi_ready is to set it to (m_axi_ready or not m_axi_valid).  This, however, is a protocol error because AXI outputs are not allowed to depend combinatorially on their inputs.

The proper way to deal with s_axi_ready is to use a skid buffer.  That'll allow you to deal with it within your design as a combinatorial value, but still make certain it's registered externally.  Indeed, you might wish to take a look at some of the more recent AXI stream articles on ZipCPU.com to get some ideas regarding how to handle streams properly.  For example, there's a recent article on run-length encoding, and another older one on debugging AXI streams--both of which you might find valuable here.
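
To illustrate, here is a bare-bones AXI-stream skid buffer in VHDL. It's a sketch under my own naming and port assumptions (32-bit TDATA plus TLAST only), not a drop-in for your IP, so simulate and verify it before trusting it:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity axis_skid_buffer is
    generic (
        DATA_WIDTH : positive := 32
    );
    port (
        clk           : in  std_logic;
        resetn        : in  std_logic;
        -- input (slave) stream
        s_axis_tvalid : in  std_logic;
        s_axis_tready : out std_logic;
        s_axis_tdata  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        s_axis_tlast  : in  std_logic;
        -- output (master) stream
        m_axis_tvalid : out std_logic;
        m_axis_tready : in  std_logic;
        m_axis_tdata  : out std_logic_vector(DATA_WIDTH-1 downto 0);
        m_axis_tlast  : out std_logic
    );
end entity axis_skid_buffer;

architecture rtl of axis_skid_buffer is
    -- output register (what the master side sees)
    signal o_valid : std_logic := '0';
    signal o_data  : std_logic_vector(DATA_WIDTH-1 downto 0) := (others => '0');
    signal o_last  : std_logic := '0';
    -- skid register (holds the one beat accepted while the output was stalled)
    signal r_data  : std_logic_vector(DATA_WIDTH-1 downto 0) := (others => '0');
    signal r_last  : std_logic := '0';
    -- registered ready: '0' exactly when the skid register is occupied
    signal ready_i : std_logic := '1';
begin
    s_axis_tready <= ready_i;
    m_axis_tvalid <= o_valid;
    m_axis_tdata  <= o_data;
    m_axis_tlast  <= o_last;

    process(clk)
    begin
        if rising_edge(clk) then
            if resetn = '0' then
                o_valid <= '0';
                ready_i <= '1';
            elsif ready_i = '1' then
                -- we are accepting beats this cycle
                if o_valid = '0' or m_axis_tready = '1' then
                    -- output register is (or becomes) free: load it directly
                    if s_axis_tvalid = '1' then
                        o_valid <= '1';
                        o_data  <= s_axis_tdata;
                        o_last  <= s_axis_tlast;
                    elsif m_axis_tready = '1' then
                        o_valid <= '0';
                    end if;
                elsif s_axis_tvalid = '1' then
                    -- output is stalled: park the accepted beat in the skid register
                    r_data  <= s_axis_tdata;
                    r_last  <= s_axis_tlast;
                    ready_i <= '0';
                end if;
            elsif m_axis_tready = '1' then
                -- skid register occupied: drain it as soon as the output moves
                o_valid <= '1';
                o_data  <= r_data;
                o_last  <= r_last;
                ready_i <= '1';
            end if;
        end if;
    end process;
end architecture rtl;
```

The key point is that s_axis_tready comes straight from a register, while the parked beat guarantees nothing gets dropped when the downstream side stalls.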

Regarding the DFF.vhd core, let me counsel you against using it and any other low-level cores at that level.  1) They make things a real pain if you ever need to switch tools or vendors.  2) They are easily inferred from your design alone.  3) It's just unnecessary indirection.
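
To be clear about what I mean by inference: a plain clocked process does the job. This fragment (signal names are purely illustrative) goes in your architecture body and synthesizes to exactly the flip-flop DFF.vhd would have given you:

```vhdl
-- Fragment (illustrative names): a plain clocked process; synthesis infers
-- a D flip-flop with clock enable, no DFF.vhd component needed.
process(clk)
begin
    if rising_edge(clk) then
        if en_in = '1' then
            pixel_reg <= pixel_in;
        end if;
    end if;
end process;
```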

Now, to the question of why this might work in simulation ... My guess is that in simulation you left m_axi_ready high, and that your simulation did nothing to check TLAST.  Am I right?  If not, my second guess would be that you either didn't simulate the memory, or simulated the design with no processor also attempting to access the memory--and so again m_axi_ready would get left high.  (This doesn't explain the TLAST issue ...)  A simple formal verification check would've found the READY bug quickly independent of any other components in the system.  Finding the TLAST bug?  That'd depend on how good your formal properties were.
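
If you do revisit the behavioral sim, one cheap improvement is to stop tying the ready input high and instead drive it from a process that periodically stalls the stream. Something along these lines (a testbench fragment with assumed names, adjust to your bench):

```vhdl
-- Testbench fragment (assumed names): periodically deassert the DUT's
-- m_axis_tready so output-side stalls actually get exercised in simulation.
backpressure : process
begin
    m_axis_tready <= '1';          -- accept output beats for three cycles
    wait until rising_edge(clk);
    wait until rising_edge(clk);
    wait until rising_edge(clk);
    m_axis_tready <= '0';          -- then stall the DUT for two cycles
    wait until rising_edge(clk);
    wait until rising_edge(clk);
end process backpressure;
```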

Hope that gets you unstuck!

Dan