10-22-2018 10:06 AM
I am having some problems with Linux hanging on startup with some custom firmware on the ZCU102 board.
I have an FMC card with a Camera Link port on it. My co-worker put together an FPGA build that implements the pipeline into a framewriter, and he validated with a camera source and JTAG debugging of the PL that it works as expected. I am now trying to boot Linux with this firmware, and it hangs after earlyprintk.
I have attached the full device tree as well as the DTSI that contains all of the Camera Link related nodes (renamed to txt files because of forum upload issues). The source of the problem is the implementation of the Camera Link video pipeline. I know this because if I disable the inclusion of vcap-cl1.dtsi from my device tree build I can boot the board without any issues.
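Disabling vcap-cl1.dtsi as a whole isolates the problem to the Camera Link nodes. The same idea can be narrowed down to a single node by bisection. Below is a minimal Python sketch, assuming exactly one node triggers the hang. The `hangs(subset)` callback is hypothetical: it stands for a manual rebuild and boot with only `subset` enabled (for example, by setting `status = "disabled"` on the other nodes). The node names are placeholders, not the ones in the attached DTSI.

```python
from typing import Callable, Sequence


def find_culprit(nodes: Sequence[str], hangs: Callable[[Sequence[str]], bool]) -> str:
    """Binary-search for the single device-tree node that causes a boot hang.

    Assumes exactly one node in `nodes` triggers the hang, and that
    `hangs(subset)` returns True when booting with only `subset` enabled hangs.
    """
    candidates = list(nodes)
    if not hangs(candidates):
        raise ValueError("boot does not hang with all candidate nodes enabled")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # With a single culprit, if the first half boots cleanly the
        # culprit must be in the second half.
        candidates = first_half if hangs(first_half) else candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical node names; in practice each call to `hangs` is a real boot.
    nodes = ["clk_wiz", "cl_rx", "vid_in_axi4s", "v_frmbuf_wr", "mt9p031"]
    print(find_culprit(nodes, lambda subset: "v_frmbuf_wr" in subset))
```

Each step halves the candidate set, so five nodes take at most four boots after the initial check. Nodes in a video pipeline typically reference one another, so some subsets may not be meaningful on their own; in practice the halves may need to follow those dependencies.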
You will see from the device tree there are no custom drivers in this pipeline. The software support is limited to Xilinx deployed drivers: xlnx,video and xlnx,v-frmbuf-wr-2.1. I have confirmed all necessary drivers are built into my kernel.
I am aware of AR# 69587 (https://www.xilinx.com/support/answers/69587.html), but we are using the 2018.2 release of tools and software. In addition, you will see from my kernel startup that clk_ignore_unused is part of my kernel command line arguments.
I'm not sure how to troubleshoot this with Linux hanging on startup. This appears to be the same/similar behavior to AR# 69587...
Here is my kernel startup:
Starting kernel ...
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.14.0 (njozwiak@MHT-CCKD1D2) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11-rc1)) #1 SMP Mon Oct 15 01:57:54 EDT 2018
[ 0.000000] Boot CPU: AArch64 Processor [410fd034]
[ 0.000000] Machine model: ZynqMP ZCU102 Rev1.0
[ 0.000000] earlycon: cdns0 at MMIO 0x00000000ff000000 (options '115200n8')
[ 0.000000] bootconsole [cdns0] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 256 MiB at 0x0000000070000000
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.1 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: MIGRATE_INFO_TYPE not supported.
[ 0.000000] percpu: Embedded 21 pages/cpu @ffffffc87ff5b000 s46488 r8192 d31336 u86016
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: enabling workaround for ARM erratum 845719
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 1034240
[ 0.000000] Kernel command line: earlycon clk_ignore_unused root=/dev/mmcblk0p2 rw rootwait
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.000000] software IO TLB [mem 0x6bfff000-0x6ffff000] (64MB) mapped at [ffffffc06bfff000-ffffffc06fffefff]
[ 0.000000] Memory: 3784896K/4194304K available (10172K kernel code, 812K rwdata, 3364K rodata, 512K init, 2169K bss, 147264K reserved, 262144K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffffff8000000000 - 0xffffff8008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffffff8008000000 - 0xffffffbebfff0000 ( 250 GB)
[ 0.000000] .text : 0xffffff8008080000 - 0xffffff8008a70000 ( 10176 KB)
[ 0.000000] .rodata : 0xffffff8008a70000 - 0xffffff8008dc0000 ( 3392 KB)
[ 0.000000] .init : 0xffffff8008dc0000 - 0xffffff8008e40000 ( 512 KB)
[ 0.000000] .data : 0xffffff8008e40000 - 0xffffff8008f0b200 ( 813 KB)
[ 0.000000] .bss : 0xffffff8008f0b200 - 0xffffff8009129638 ( 2170 KB)
[ 0.000000] fixed : 0xffffffbefe7fd000 - 0xffffffbefec00000 ( 4108 KB)
[ 0.000000] PCI I/O : 0xffffffbefee00000 - 0xffffffbeffe00000 ( 16 MB)
[ 0.000000] vmemmap : 0xffffffbf00000000 - 0xffffffc000000000 ( 4 GB maximum)
[ 0.000000] 0xffffffbf00000000 - 0xffffffbf1dc00000 ( 476 MB actual)
[ 0.000000] memory : 0xffffffc000000000 - 0xffffffc880000000 ( 34816 MB)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU event tracing is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f902f000
[ 0.000000] GIC: Using split EOI/Deactivate mode
[ 0.000000] irq-xilinx: /amba_pl@0/interrupt-controller@a0002000: num_irq=8, edge=0xf8
[ 0.000000] arch_timer: cp15 timer(s) running at 99.99MHz (phys).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x170f8dc196, max_idle_ns: 440795203664 ns
[ 0.000003] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511099ns
[ 0.008314] Console: colour dummy device 80x25
[ 0.012568] console [tty0] enabled
[ 0.015935] bootconsole [cdns0] disabled
10-23-2018 01:03 AM
10-23-2018 10:29 AM
@trigger I haven't looked at the details of that print statement, but it exists for a normal boot as well:
...
[ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f902f000
[ 0.000000] GIC: Using split EOI/Deactivate mode
[ 0.000000] irq-xilinx: /amba/interrupt-controller@a0010000: num_irq=10, edge=0x1
[ 0.000000] arch_timer: cp15 timer(s) running at 99.99MHz (phys).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x170f8dc196, max_idle_ns: 440795203664 ns
[ 0.000003] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511099ns
[ 0.008386] Console: colour dummy device 80x25
[ 0.012646] console [tty0] enabled
[ 0.016013] bootconsole [cdns0] disabled
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.14.0-xilinx-v2018.2 (oe-user@oe-host) (gcc version 7.2.0 (GCC)) #2 SMP Thu Jul 26 13:59:38 EDT 2018
[ 0.000000] Boot CPU: AArch64 Processor [410fd034]
[ 0.000000] Machine model: ZynqMP ZCU102 Rev1.0
[ 0.000000] earlycon: cdns0 at MMIO 0x00000000ff000000 (options '115200n8')
[ 0.000000] bootconsole [cdns0] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 1024 MiB at 0x000000000e000000
[ 0.000000] psci: probing for conduit method from DT.
...
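Comparing the hanging boot against a known-good one helps narrow down where the stall happens. Below is a minimal Python sketch, assuming both logs are available as text (flattened or one entry per line). It strips the `[ seconds]` timestamps and reports the last message the hanging boot printed, together with the message the good boot printed next. The function names are illustrative and are not part of any Xilinx or kernel tool.

```python
import re
from typing import List, Optional, Tuple

# Matches a kernel log timestamp prefix such as "[ 0.008314] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def messages(log: str) -> List[str]:
    """Return the log's messages with timestamps stripped, in order."""
    # Split before every "[ <seconds>]" prefix so flattened logs also work
    # (zero-width splits need Python 3.7+). Text without a timestamp,
    # such as "Starting kernel ...", is dropped.
    entries = re.split(r"(?=\[\s*\d+\.\d+\])", log)
    return [_TIMESTAMP.sub("", e).strip() for e in entries if _TIMESTAMP.match(e)]


def next_expected(good: str, bad: str) -> Optional[Tuple[str, str]]:
    """Return (last message of the hanging boot, what the good boot printed next).

    Returns None if the hanging log is empty, or if its last message is
    absent from the good log or is the good log's final message.
    """
    good_msgs, bad_msgs = messages(good), messages(bad)
    if not bad_msgs:
        return None
    last = bad_msgs[-1]
    for i, msg in enumerate(good_msgs):
        if msg == last and i + 1 < len(good_msgs):
            return last, good_msgs[i + 1]
    return None
```

Applied to the two logs above, the sketch pairs `bootconsole [cdns0] disabled` with the good boot's replayed `Booting Linux on physical CPU 0x0`. That is where output moves from the early console to the real serial console, so anything printed in between is not visible on the UART. The kernel's `keep_bootcon` parameter, which keeps the boot console registered, may help expose those messages.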
I think we have isolated the cause, but we do not have a full resolution yet. We did some debugging over the FPGA JTAG using Vivado and discovered that every register read on the framebuffer IP core in our Camera Link pipeline returned 0xdec0dee7. That looks suspiciously like a hard-coded debug value, but it is not documented. My co-worker did a full clean and rebuild of the FPGA design, and that fixed those reads, so it appears Vivado had a glitch with the build files. Once that was resolved, Linux booted without issue.
During troubleshooting, though, we encountered another cause of the boot stall, and it appears to point to the root cause. As you can see from the device tree I uploaded, we have an Aptina MT9P031 connected to the ZCU102. If that hardware is missing (disconnected) but its nodes still exist in the DT, Linux fails to boot past earlyprintk in exactly the same way. This suggests that the root cause is a failed bus transaction of some form.
I would expect missing or malfunctioning hardware to cause a failure during the driver probes, but that is clearly not happening: Linux stalls completely on boot. Has anyone else experienced this? This is a critical issue for us. Any thoughts on additional troubleshooting or workarounds?
10-23-2018 04:00 PM
@njozwiak's co-worker here.
Part of the issue we are seeing is that the AXI fabric is not decoding addresses correctly. AXI SmartConnect started having connection problems past 9 slaves. At first, IPI showed the devices as connected but would not assign addresses to them, claiming that there was no path. After some manual intervention, including creating a cascaded set of SmartConnect fabrics, the Address Editor finally allowed addresses to be assigned. IPI Validate Design passed, and the design synthesized and routed with no errors. However, the last slaves added responded with 0xdec0dee3 when read via the JTAG to AXI master. I assume this is an error value from the JTAG to AXI master block indicating that no slave is at that address. I also assume that nothing terminates the bus cycle if the A53 accesses that address.
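Both values reported in this thread (0xdec0dee7 on the framebuffer, 0xdec0dee3 here) share the 0xdec0de prefix. If register reads from the JTAG to AXI master are collected into an address-to-value map, a small filter like the Python sketch below can flag the suspect addresses. Treating every 0xdec0deXX value as "no slave responded" is an assumption drawn from the two observations in this thread, not documented Xilinx behavior, and the example addresses are hypothetical.

```python
from typing import Dict, List

# Assumption: any 32-bit read of the form 0xDEC0DExx means the access never
# reached a working slave (based only on 0xDEC0DEE7/0xDEC0DEE3 seen here).
SUSPECT_PREFIX = 0xDEC0DE00
SUSPECT_MASK = 0xFFFFFF00


def suspect_reads(dump: Dict[int, int]) -> List[int]:
    """Return, in ascending order, the addresses whose read value matches 0xDEC0DExx."""
    return sorted(addr for addr, value in dump.items()
                  if (value & SUSPECT_MASK) == SUSPECT_PREFIX)


if __name__ == "__main__":
    # Hypothetical register dump: address -> value read over JTAG.
    dump = {0xA0010000: 0x00000081, 0xA0020000: 0xDEC0DEE3}
    print([hex(a) for a in suspect_reads(dump)])
```

A legitimate register could in principle hold such a value, so a flagged address is a lead to check against the Address Editor rather than proof of a missing slave.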
By deleting all intermediate files and forcing Vivado to regenerate all of the IP in the block design and in the chip, I was able to get the slaves to respond at their assigned addresses. The problem was not a one-time corruption of the design state, however: the very next slave I added had the same problem and did not respond at the address assigned in the Address Editor. I did change that slave's address after it was auto-assigned, so changed slave addresses may not be propagating through IPI correctly even though Validate Design passes. Attached is the simple video input pipeline. You can see the SmartConnect that is part of this hierarchical block; it is attached to another SmartConnect fabric at the top level.
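One way to check whether an address change has propagated is to compare the Address Editor's assignments with the `reg` entries in the generated device tree. The Python sketch below assumes both have been transcribed into name-to-(base, size) maps; how they are exported is left open, and the slave names are hypothetical. It also flags ranges that overlap or are not size-aligned, assuming the usual IPI convention that each range is a power of two and its base is a multiple of that size.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (base address, size in bytes)


def check_address_maps(editor: Dict[str, Region], dt: Dict[str, Region]) -> List[str]:
    """Compare an Address Editor map with device-tree reg entries; return problems."""
    problems: List[str] = []
    for name, (base, size) in sorted(editor.items()):
        # Assumed IPI rule: size is a power of two and base is aligned to it.
        if size <= 0 or size & (size - 1) or base % size:
            problems.append(f"{name}: range 0x{size:x} at 0x{base:x} is not size-aligned")
        if name not in dt:
            problems.append(f"{name}: missing from device tree")
        elif dt[name] != (base, size):
            dt_base, dt_size = dt[name]
            problems.append(f"{name}: editor 0x{base:x}/0x{size:x} != DT 0x{dt_base:x}/0x{dt_size:x}")
    for name in sorted(dt.keys() - editor.keys()):
        problems.append(f"{name}: in device tree but not in Address Editor")
    # Overlap check: with regions sorted by base, stop scanning once a later
    # region starts at or beyond the current region's end.
    regions = sorted((base, base + size, name) for name, (base, size) in editor.items())
    for i, (_, end0, name0) in enumerate(regions):
        for base1, _, name1 in regions[i + 1:]:
            if base1 >= end0:
                break
            problems.append(f"{name0} overlaps {name1}")
    return problems


if __name__ == "__main__":
    # Hypothetical maps: the framebuffer was moved in the editor but not in the DT.
    editor = {"v_frmbuf_wr_0": (0xA0050000, 0x10000), "axi_iic_0": (0xA0060000, 0x1000)}
    dt = {"v_frmbuf_wr_0": (0xA0040000, 0x10000), "axi_iic_0": (0xA0060000, 0x1000)}
    for problem in check_address_maps(editor, dt):
        print(problem)
```

A mismatch between the two maps would explain a slave that Validate Design accepts but that never answers at the address the driver uses.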
02-16-2020 02:20 AM
I'm facing exactly the same problem. I have the frmbuf read and write IPs. If I comment them out of the device tree, the boot completes successfully; otherwise, it hangs. Was your solution to rebuild the design from scratch?
Xilinx needs to look at this issue closely and solve it.