This blog entry is not a technical article, but it is definitely for technical people. Once you have done a few projects, you know that a lot of luck is involved in a project working on the first try. You might face human errors, things you had not thought about beforehand, or issues outside of your control (for example, issues with source or sink equipment).
But the role of an engineer is to solve problems. With the right methodology, we can identify the root cause quickly and fix it without impacting the project schedule (hopefully your project planning has allowed some time for this).
I have been working for Xilinx Worldwide Technical Support (WTS) for a few years now, so I have tried to pull together what I have learned about how to debug Video Applications on Xilinx devices.
Step 1 – Get a clear view of your system
Before starting any debug, you need to be clear about how your system is intended to work. Start by writing up a concise description of what the system is expected to do, including the expected system capabilities. For example, note whether the system only supports 1080p video, or whether it supports a range of resolutions from SD to UHD.
If you have not done so already, it is also useful to draw a block-level diagram of your system. The main point of the block diagram should be to show the data flow through the system. You can also add labels to the diagram that correspond to the description you have written. If you have a software application, it might be helpful to have a basic software flow chart that describes the application.
Doing this will not only help you to get a better view of your system and identify potential failure points, but it can help anybody who would be willing to help you with debugging your system.
Step 2 – Make it fail
The second step is to find a set of steps that reliably reproduces the issue. This will help you to understand which variables are making your system fail and which have no effect on the issue.
It will also help when you begin to focus on the root cause and provide a way to verify your fix once it is implemented.
This step often requires that you capture data about the system behavior and interaction.
For example, in your video system, you might want to see if different inputs (video sources such as GPUs) or different outputs (video sinks such as monitors or test equipment) produce different behavior.
While doing this testing you should capture detailed information about the system inputs and outputs, including a description of the video source and/or video sink for your system and the configurations during the testing.
If the failure is repeatable, record the exact setup that fails, and then check whether it also fails when you change a variable, such as trying a different input, or the same input running a different OS.
You should minimize the number of parameters you vary with each test, ideally adjusting only one parameter at a time.
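As a minimal sketch of the one-parameter-at-a-time idea, the snippet below builds a test plan in which every configuration differs from the failing baseline in exactly one setting. The parameter names and values (sources, sinks, resolutions) are hypothetical placeholders; substitute whatever variables apply to your own system.

```python
from typing import Dict, List


def one_factor_at_a_time(
    baseline: Dict[str, str], alternatives: Dict[str, List[str]]
) -> List[Dict[str, str]]:
    """Return test configurations that differ from the baseline in at most one parameter."""
    plan = [dict(baseline)]  # always re-run the known-failing baseline first
    for name, values in alternatives.items():
        for value in values:
            if value == baseline[name]:
                continue  # identical to the baseline, nothing new to learn
            config = dict(baseline)
            config[name] = value  # change exactly one parameter
            plan.append(config)
    return plan


# Hypothetical setup: these names are illustrative only.
baseline = {"source": "GPU A", "sink": "Monitor A", "resolution": "1080p60"}
alternatives = {
    "source": ["Pattern generator"],
    "sink": ["Monitor B"],
    "resolution": ["720p60", "2160p30"],
}

plan = one_factor_at_a_time(baseline, alternatives)
for config in plan:
    changed = [key for key in config if config[key] != baseline[key]]
    print(changed or ["baseline"], config)
```

Recording a pass or fail result next to each configuration gives you a simple table showing which variables affect the issue and which do not.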
Step 3 – Identify the source of the issue
Once you have a repeatable failure, the next goal is to identify the source of the issue.
At this point I like to start with what I call the divide-and-conquer approach.
That is, I remove parts of the system one at a time and check whether the issue persists.
The best way to figure out where to divide up the design is to refer to the block diagram created in Step 1. For example, if the failure happens on a pass-through video design, I would look at the block diagram and identify where I can separate the capture and display portions of the design. Then I might test what happens if I remove the source and use a test pattern generator to send out a constant pattern.
If the problem continues, I know that I no longer need to look at the input and can focus my additional debug steps on the display or output portion of my system. Note that this might require modifying the design, but systems are often software controlled, and a test pattern can then be turned on and off through software rather than by modifying the design itself.
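The divide-and-conquer idea can be sketched as a binary search over the stages of the data path. This is only an illustration under two assumptions: the pipeline is linear, and everything downstream of the faulty stage also shows bad data. The stage names and the `output_is_good` check are hypothetical; in practice the check is whatever observation you can make at that point in the design.

```python
from typing import Callable, Sequence


def first_bad_stage(
    stages: Sequence[str], output_is_good: Callable[[str], bool]
) -> str:
    """Return the first stage whose output is bad.

    Assumes the last stage is known to be bad and that every stage
    downstream of the faulty one is also bad.
    """
    low, high = 0, len(stages) - 1  # invariant: stages[high] is bad
    while low < high:
        mid = (low + high) // 2
        if output_is_good(stages[mid]):
            low = mid + 1  # fault is further downstream
        else:
            high = mid  # fault is here or upstream
    return stages[low]


# Hypothetical pass-through design, listed in data-flow order.
stages = ["Video input", "Frame buffer write", "Frame buffer read", "Mixer", "Video output"]

# Hypothetical observations: data looks correct only at these points.
known_good = {"Video input", "Frame buffer write"}

suspect = first_bad_stage(stages, lambda stage: stage in known_good)
print("Start looking at:", suspect)
```

Each check halves the part of the design that remains under suspicion, so even a long data path needs only a handful of observations.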
Once you have localized the issue as much as possible, you should look at the available tools for debugging the problem.
For a Video System, you might want to add an Integrated Logic Analyzer (ILA) as shown in Video Series 31 – Debugging a Video System using an ILA. Using the block diagram from Step 1, you can find the best place to put your probes. This might help you to see if an individual IP in the data path is not behaving as expected, for example if it is not consuming or producing any data or if it is producing incorrect data.
If your system is using Linux or your software application can output to a UART, use the output that you are getting to collect debug data. It might tell you at which stage the application is failing, or why it is failing. Also, do not be afraid to add more prints or logging capabilities to the application while debugging; you can always remove them once you have solved your issue.
Note: when using a UART it can sometimes be better to capture the data and print it later, as printing too much information can cause additional interrupts and change the overall system behavior.
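One way to capture now and print later is a fixed-size in-memory buffer that is only written out once the time-critical part has finished. The sketch below shows the idea in Python; the `DeferredLog` class and its capacity are hypothetical, and on a bare-metal target the same pattern would typically be a ring buffer in C.

```python
import sys
import time
from collections import deque


class DeferredLog:
    """Keep the most recent messages in memory and print them on demand."""

    def __init__(self, capacity: int = 1024) -> None:
        # A bounded deque silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def record(self, message: str) -> None:
        # Cheap operation: no I/O happens here.
        self._entries.append((time.monotonic(), message))

    def flush(self, stream=sys.stdout) -> int:
        """Write all buffered entries to the stream and return how many were written."""
        count = len(self._entries)
        while self._entries:
            timestamp, message = self._entries.popleft()
            stream.write(f"{timestamp:.6f} {message}\n")
        return count


log = DeferredLog(capacity=4)
for frame in range(6):
    log.record(f"frame {frame} done")  # inside the time-critical loop

flushed = log.flush()  # after the loop, when timing no longer matters
```

With a capacity of 4 and six recorded messages, only the last four are kept, which is usually what you want when the interesting events are the ones just before the failure.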
Also, many applications or drivers have built-in debug capabilities, and debug applications might already be provided, such as the media-ctl or modetest commands that are available on Linux. You can check this by looking in the IP, driver, or application documentation.
If your application is hanging, you might also want to run the application in Debug mode, going step by step through the application to find out where it is hanging.
Step 4 – Describe your issue with keywords
This step is not really mentioned in debug methodologies, but we are in the era of the Internet and search engines.
Just as you described your system in step 1, reducing your issue description into keywords will not only help you to be clear about what the issue is, it will also help you to find related issues (and solutions) on the web which might help you to solve your issue.
If you have identified that the issue was with a specific IP, then your first keyword will be the name of the IP.
Then, try to find a single keyword describing the issue. If you got an error in the output console, it might be easy to find this keyword.
If the keyword you chose does not return any useful results, it is sometimes worth looking for synonyms or other ways of describing the same issue. For example, “Why do I see noise on my screen?” or “Why does my screen look like snow?”
Step 5 – Look for related known issues
You might not be the first person to run into the problem you are seeing, so it is always useful to see if the problem you have encountered has been documented previously. There are many resources you can use to find known issues or more detailed debugging information.
You might want to do some directed searching, where you look in predefined places for the information you are trying to find. By narrowing the scope of your search, you are more likely to find a related issue.
Some places you might want to look at for Xilinx Video Systems:
The IP Product Guide (PG):
Most IP Product Guides have a debugging section (usually in the appendices). Make sure you have followed all of the steps provided. These steps were written based on issues other people faced using the IP, so there is a good chance they can help you as well.
You can also try to do a search in the product guide with the keywords you have defined from step 4.
The IP Master Answer Record (AR): Did you know that every Xilinx IP has a master Answer Record which should list all of the issues known to Xilinx for that core? If not, this is probably where you want to look to ensure that your issue is not already documented (and possibly fixed).
One way to find the master Answer Record for a specific IP is to go to the Video Design Hub. Under the IP category you can find the master answer records and the product guides.
In an era when so much information is available on the web, your question might already have been asked somewhere, particularly on the Xilinx forums. Use your favorite search engine and the keywords from Step 4.
If you do not find any related topics, this is your chance to create a new topic on the Xilinx community forums.
For Linux systems, Xilinx also provides Wiki pages with information specific to using IP in a Linux based system. The Xilinx Wiki contains pages for all of the Linux drivers. These driver pages include information on what hardware features are supported by the Linux drivers, as well as testing and debug information that can help you to reproduce and identify the system failure.