lmaxeniro
Explorer
676 Views
Registered: ‎09-09-2019

Firewall Trip protection Error report


Dear supporter,

I have recently been running into the issue below quite often: when running an acceleration test on a U200, the FPGA appears to hang, and xbutil query reports the following error.

Firewall Last Error Status
Level 3 : 0x80000(RECS_WRITE_TO_BVALID_MAX_WAIT)
Error occurred on: Wed 2020-08-15 15:59:34 CST

I searched around, but very little documentation is available. There is one AXI IP document (pg293-axi-firewall), but it only tells me that this is likely a protection mechanism triggered by the Firewall IP.

One clue that may be useful:

The error is much more likely when there are multiple iterations. I use a for loop to send the test data to the FPGA and read back the result. With only a few iterations (for example, fewer than 3) it is fine; with more iterations (for example, 10) the issue is much easier to reproduce.

What I really need is to find the root cause, which I don't know how to do. Can anyone give some suggestions?

 

 

0 Kudos
1 Solution

Accepted Solutions
mcertosi
Xilinx Employee
588 Views
Registered: ‎10-19-2015

Hi @lmaxeniro 

Are you running a Xilinx example acceleration project or your own? 

A firewall trip usually means something in the FPGA is misbehaving, most often the kernel. In a lot of cases, if you've written your own AXI protocol handlers, you should start there. Place a lightweight protocol checker on the master interfaces of your kernels, and use xbutil status to see the status of the lightweight protocol checkers. Divide and conquer: find which subsystem is seeing a violation and narrow your sights there. Often it is necessary to place AXI protocol checkers internal to your kernel to validate its behavior.

Vitis Debugging flow 

Once you've isolated the system that is failing, you might want to change your host code so that it can loop on the failing commands, or add a breakpoint so you can arm an ILA internal to the FPGA prior to hitting the error condition.
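
One way to set up such a loop is a small host-side harness that repeats a single iteration and stops at the first failure, which gives a known place to set a breakpoint or arm the ILA. This is only a sketch: run_until_failure and the iteration callback are hypothetical names, and the callback stands in for whatever your host code does per iteration (write the buffers, start the kernel, read back, compare).

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <optional>

// Runs `iteration` up to `max_iters` times and returns the index of the first
// failing iteration, or std::nullopt if every iteration passed.
// `iteration` is a hypothetical stand-in for one write/run/read-back/compare pass.
std::optional<std::size_t> run_until_failure(
    std::size_t max_iters,
    const std::function<bool(std::size_t)>& iteration) {
    for (std::size_t i = 0; i < max_iters; ++i) {
        // Set a breakpoint here (or pause for input) to arm the ILA
        // before the iteration that is expected to fail.
        if (!iteration(i)) {
            std::cerr << "iteration " << i << " failed\n";
            return i;
        }
    }
    return std::nullopt;
}
```

Knowing the first failing iteration also helps answer the questions below, for example whether the failure always occurs at the same point or moves when the inputs change.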

Have you run SW and HW simulations/emulations?

Does this bitstream have other methods of validation? 

Can you reproduce the error on a smaller test case?

Does the design fail the same way running faster/slower? 

Does the design fail the same way running cooler/hotter?

Are there any inputs you can change that change the failure mode?  

Let me know if you'd like me to elaborate or if you have any other questions.

Regards,

M


0 Kudos
3 Replies
lmaxeniro
Explorer
612 Views
Registered: ‎09-09-2019

I am not sure whether this is a rare case, since nobody has been able to offer a hint...

0 Kudos
lmaxeniro
Explorer
479 Views
Registered: ‎09-09-2019

@mcertosi 

Thanks for your suggestion.

I eventually resolved the issue, but I don't have a clear picture of which change actually took effect; it could have been one change or some combination. In any case, I am posting some of my experience (possible debug points) in the hope that it helps someone.

1. The kernel arguments, particularly the size argument (the one computed with sizeof): it is the memory size in bytes, but the argument itself may be of a different type. Make sure you assign the correct size.

2. If you have multiple arguments, double-check the relationship between them. This usually requires examining the kernel implementation closely (I was trapped here once, as I am not familiar with RTL design).

3. The kernel buffer read/write flags: there are many traps and subtleties here as well. I found one article that may be helpful: https://streamhpc.com/blog/2013-02-03/opencl-basics-flags-for-the-creating-memory-objects/
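
For points 1 and 2, here is a minimal host-side sketch of the kind of check that can catch a wrong size argument before it reaches the kernel. The helper names byte_size and to_u32_arg are hypothetical, and the sketch assumes the kernel takes its length argument as a 32-bit unsigned value, which you would need to confirm against your own kernel's interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Size in bytes of a host buffer: element count times element size.
// Passing the element count where a byte size is expected (or the reverse)
// makes the kernel read or write the wrong amount of memory.
template <typename T>
std::size_t byte_size(const std::vector<T>& buf) {
    return buf.size() * sizeof(T);
}

// Narrows a host-side size_t to the 32-bit value the kernel is ASSUMED to
// expect, and throws instead of silently truncating.
std::uint32_t to_u32_arg(std::size_t value) {
    if (value > std::numeric_limits<std::uint32_t>::max()) {
        throw std::out_of_range("kernel argument does not fit in 32 bits");
    }
    return static_cast<std::uint32_t>(value);
}
```

For a std::vector<float> with 1024 elements, byte_size returns 1024 * sizeof(float) while the element count is 1024; which of the two the kernel wants depends on its implementation. The size given when setting the argument should then be the size of the narrowed value, not sizeof(size_t).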