Adventurer
Registered: 10-19-2017

Why does OpenSSH delay Linux boot 3 minutes?

I am working on the ZCU106 and have a boot problem when I change the rootfs to use OpenSSH instead of Dropbear SSH. I am running PetaLinux 2019.1 (upgrading to 2019.2 is not an option).

...
INIT: version 2.88 booting
[ 5.714563] FAT-fs (mmcblk1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 5.784341] EXT4-fs (mmcblk1p3): mounted filesystem with ordered data mode. Opts: (null)
[ 5.854201] EXT4-fs (mmcblk1p4): mounted filesystem with ordered data mode. Opts: (null)
Starting udev
[ 6.022765] udevd[1904]: starting version 3.2.5
[ 6.042190] random: udevd: uninitialized urandom read (16 bytes read)
[ 6.051985] random: udevd: uninitialized urandom read (16 bytes read)
[ 6.059004] random: udevd: uninitialized urandom read (16 bytes read)
[ 6.108461] udevd[1905]: starting eudev-3.2.5
[ 6.290556] mali: loading out-of-tree module taints kernel.
[ 6.337555] xilinx-dp-snd-card fd4a0000.zynqmp-display:zynqmp_dp_snd_card: ASoC: CPU DAI (null) not registered
[ 6.830320] EXT4-fs (mmcblk1p2): re-mounted. Opts: (null)
INIT: Entering runlevel: 5
Configuring network interfaces...
[ 7.315466] pps pps0: new PPS source ptp0
[ 7.319572] macb ff0b0000.ethernet: gem-ptp-timer ptp clock registered.
[ 7.326470] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
done.
Starting system message bus: dbus.
Starting OpenBSD Secure Shell server: sshd
[ 8.324685] macb ff0b0000.ethernet eth0: link up (1000/Full)
[ 8.330444] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 194.013634] random: crng init done
[ 194.017084] random: 7 urandom warning(s) missed due to ratelimiting
done.
Starting internet superserver: inetd.
Starting syslogd/klogd: done
Starting tcf-agent: OK
root@zcu106:~#

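As the log shows, the boot stalls while starting sshd. The last message before the stall is at ~8.3 s, and the boot only continues after "random: crng init done" at ~194 s. It does eventually complete, but the stall lasts ~186 seconds.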
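The ~186 s figure is the gap between two kernel timestamps in the log above. The last message before the stall is at 8.330444 s and "random: crng init done" is at 194.013634 s, so 194.013634 - 8.330444 ≈ 185.7 s. To measure the same thing on other boots, here is a minimal sketch that finds the longest silent stretch in a saved console or dmesg log. The function name `largest_gap` is hypothetical, and the sketch assumes the usual `[ seconds.microseconds ]` printk timestamp format.

```shell
#!/bin/sh
# largest_gap LOGFILE
# Hypothetical helper: report the longest stretch with no kernel messages in a
# saved console/dmesg log. Assumes printk timestamps of the form "[ 8.330444]".
largest_gap() {
    # Extract every timestamp (works even if several share one line), one per line.
    grep -o '\[ *[0-9][0-9]*\.[0-9][0-9]*]' "$1" |
    awk 'BEGIN { gap = 0 }
         { gsub(/[^0-9.]/, ""); t = $0 + 0 }      # "[ 8.330444]" -> 8.330444
         NR > 1 && t - prev > gap { gap = t - prev; start = prev }
         { prev = t }
         END { if (NR > 1) printf "%.1f s with no kernel messages after t=%.6f\n", gap, start }'
}
```

If you save the log above to a file and run `largest_gap` on it, it should report `185.7 s with no kernel messages after t=8.330444`.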

Has anyone else encountered this behavior? My rootfs_config is attached.

Accepted Solution

Adventurer
Registered: 10-19-2017

Re: Why does OpenSSH delay Linux boot 3 minutes?

I opened a service request (SR) with Xilinx a couple of weeks ago and have not received a response. I have found a workaround, although I have not explored all options. Why the jitterentropy-rng kernel driver doesn't work is still an open question.

In the meantime, I was able to build haveged with PetaLinux. It isn't an advertised package, but a recipe for it does exist in the base Yocto layers:

/opt/Xilinx/PetaLinux/2019.1/components/yocto/source/aarch64/layers/meta-openembedded/meta-oe/recipes-extended/haveged/haveged_1.9.2.bb

To build and deploy haveged:

$ petalinux-build -c haveged -x do_package
$ cd build/tmp/work/aarch64-xilinx-linux/haveged/1.9.2-r0/package/
$ tar zcvf haveged-1.9.2.tgz *

Then unpack the haveged tarball on your rootfs (SD card, git repo rootfs, etc.) and update your startup to run haveged before sshd:

$ ls -l /etc/init.d/rc5.d/
total 8
drwxr-xr-x  2 root root 4096 Feb 10 17:24 ./
drwxr-xr-x 36 root root 4096 Feb 10 17:24 ../
...
lrwxrwxrwx 1 root root 26 Feb 10 17:24 S08havged -> ../init.d/haveged-setup.sh*
lrwxrwxrwx 1 root root 14 Feb 10 17:24 S09sshd -> ../init.d/sshd*
...
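The unpack-and-order step could look roughly like the sketch below. It is untested and rests on three assumptions: the target rootfs is mounted on the build host, the sysvinit runlevel-5 links live in `etc/rc5.d` inside that rootfs (set `RCDIR` if yours are elsewhere), and the init script is `haveged-setup.sh` as in the listing above. `install_haveged` and the `/mnt/rootfs` path in the usage line are hypothetical.

```shell
#!/bin/sh
# install_haveged ROOTFS TARBALL
# Hypothetical helper: unpack the haveged package tree into a rootfs and link
# its init script into runlevel 5 ahead of sshd (S08 sorts before S09sshd).
RCDIR=${RCDIR:-etc/rc5.d}               # assumption: sysvinit runlevel-5 directory

install_haveged() {
    rootfs=$1
    tarball=$2
    [ -d "$rootfs" ] || { echo "no such rootfs: $rootfs" >&2; return 1; }
    # The tarball was created inside .../package/, so its paths are rootfs-relative.
    tar xzf "$tarball" -C "$rootfs" || return 1
    mkdir -p "$rootfs/$RCDIR"
    # Relative link, resolved on the target as etc/init.d/haveged-setup.sh.
    ln -sf ../init.d/haveged-setup.sh "$rootfs/$RCDIR/S08haveged"
    # Print the resulting start order so haveged can be checked against sshd.
    ls "$rootfs/$RCDIR" | grep -E '^S[0-9]+(haveged|sshd)'
}
```

Example: `install_haveged /mnt/rootfs haveged-1.9.2.tgz`. sysvinit runs the `S` links in lexical order, so `S08haveged` starts before `S09sshd`. haveged should therefore already be adding entropy by the time sshd asks for random data.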

9 Replies

Scholar
Registered: 05-28-2013

Re: Why does OpenSSH delay Linux boot 3 minutes?

This is a fairly common issue that affects Linux users on many platforms. Not just OpenSSH but other software (e.g. wpa_supplicant) is affected too.

The issue is that insufficient "truly random" numbers are available at boot time, and this in turn is caused by a change to how getrandom() works. There was (and continues to be) quite a bit of "discussion" about this. Search for "linux getrandom blocking" if you want more details.

Arguably the most correct fix is to provide a better entropy source early in the boot. External tools such as jitterentropy or haveged can be used; just make sure they start before OpenSSH and similar services. There is also an in-kernel jitterentropy driver, available since approximately kernel v4.4.

Adventurer
Registered: 10-19-2017

Re: Why does OpenSSH delay Linux boot 3 minutes?

@rfs613 

Thanks for the info. I actually found something similar (although less detailed) at https://unix.stackexchange.com/questions/461425/debian-testing-takes-a-long-time-to-load-crng-init-done.

What's curious to me is that this only showed up when I switched from Dropbear to OpenSSH. Also, I checked my kernel config and verified CONFIG_CRYPTO_JITTERENTROPY=y.

I also don't see any PetaLinux support for building jitterentropy or haveged into the rootfs. Any thoughts on corrective action? Did Xilinx just not test OpenSSH before including it in the PetaLinux distro?

Adventurer
Registered: 10-19-2017

Re: Why does OpenSSH delay Linux boot 3 minutes?

I've attached the rootfs_config that includes dropbear-ssh instead of openssh as a point of comparison. If I build the rootfs with this configuration, I have no issues with boot time stalling for several minutes...

Scholar
Registered: 05-28-2013

Re: Why does OpenSSH delay Linux boot 3 minutes?

I had a similar experience a few years ago and ended up installing the userspace jitterentropy daemon as a workaround. The kernel driver should work, though; perhaps the issue is that it also needs a high-resolution time stamp.
Dropbear does not seem to be affected by the getrandom() changes. Some quick googling suggests it reads /dev/urandom, which does not block: it returns pseudorandom data even before the kernel has gathered enough entropy, instead of waiting for "true" randomness.

Scholar
Registered: 05-28-2013

Re: Why does OpenSSH delay Linux boot 3 minutes?

When using dropbear, each time a connection is made (or service starts), the kernel prints:

random: dropbear: uninitialized urandom read (32 bytes read)

This happens until the system has been running long enough to generate sufficient entropy, as indicated by the message

random: crng init done

After that point, there are no more complaints about uninitialized urandom reads.

OpenSSH, by contrast, insists on truly random values, even if that means blocking during startup.
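One way to see this difference in your own boot log is to count the kernel's "uninitialized urandom read" warnings per program. Each warning marks a read that returned data before "crng init done" instead of waiting, which is the Dropbear behaviour. A program blocked in getrandom() should not produce such a line. Below is a minimal sketch; the name `early_urandom_readers` is hypothetical. The kernel rate-limits these warnings (note "7 urandom warning(s) missed due to ratelimiting" in the first log), so treat the counts as a lower bound.

```shell
#!/bin/sh
# early_urandom_readers LOGFILE
# Hypothetical helper: summarise "random: NAME: uninitialized urandom read
# (N bytes read)" kernel warnings per program, printed as "NAME COUNT BYTES".
early_urandom_readers() {
    grep -o 'random: [^:]*: uninitialized urandom read ([0-9]* bytes read)' "$1" |
    awk -F': ' '{
            name = $2
            n = $3; gsub(/[^0-9]/, "", n)          # "... (16 bytes read)" -> 16
            count[name]++; bytes[name] += n
        }
        END { for (p in count) print p, count[p], bytes[p] }' |
    sort
}
```

Run against the two logs in this thread, it reports `udevd 3 48` and `dbus-daemon 2 24`. On a Dropbear system, you would presumably also see a `dropbear` entry.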

Adventurer
Registered: 10-19-2017

Re: Why does OpenSSH delay Linux boot 3 minutes?

Interesting. Do you see the same boot delay when using OpenSSH?

I've only had a few minutes over the last few days to look into jitterentropy-rng. I definitely don't see that library readily available for cross-compilation through PetaLinux, which would mean some manual building and deploying. I was looking at this site (which points to GitHub) for the current source: https://www.chronox.de/jent.html.

I'm still scratching my head over the jitterentropy driver, though. Do you have any thoughts on troubleshooting why it might not be working? You are clearly experienced in this area, and it's new to me. I didn't see any specific module requirements listed in the 2015 commit (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bb5530e4082446aac3a3d69780cd4dbfa4520013).

I guess, in summary:

1. How do I know if the jitterentropy module is working?

2. How do I troubleshoot what OpenSSH is stalling on at boot?

3. Am I missing something in my configuration or did Xilinx just not test OpenSSH in their builds to encounter this before?

Scholar
Registered: 05-28-2013

Re: Why does OpenSSH delay Linux boot 3 minutes?

Regrettably, I can't pinpoint this directly, as my situation is a bit different. The system where I encountered this problem is an x86 machine running a vanilla kernel along with the OpenSSH server. I was able to get the userspace jitterentropy daemon working there, and it's been happy ever since. I also have various embedded systems where I tend to use Dropbear (mostly due to its smaller size), so I haven't run into the boot delay on them.

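1. You can test by reading bytes manually, for example with:

dd if=/dev/urandom bs=4 count=1 | hexdump -C

This reads 4 bytes, i.e. 32 bits, just like openssh does on startup. Note, though, that /dev/urandom itself never blocks. Before "random: crng init done" it still returns data immediately, and the kernel logs an "uninitialized urandom read" warning instead (the same thing dropbear triggers). The call that waits is getrandom() in blocking mode, which is what openssh uses. When jitterentropy (or another source) is working, it supplies entropy to the kernel and "crng init done" appears early in boot, so neither call has to wait.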

As for why the kernel driver isn't working, I am just guessing here. The commit log says "The RNG only needs a high-resolution time stamp." Looking at the code in that same commit, the function jent_get_nstime() provides the time stamp. My guess is that this is failing and returning a time stamp of 0 each time.

2. OpenSSH calls getrandom() in blocking mode (without the GRND_NONBLOCK flag), so the call waits until the kernel considers enough truly random data to be available. Early in boot that is certainly not the case. On an embedded system with no user input (keyboard/mouse), the traditional entropy sources are weak, so it takes a while (minutes) to build up. That's why OpenSSH blocks the boot, and so would other programs, such as wpa_supplicant, that make the same getrandom() call. You could patch OpenSSH not to call it this way and do what Dropbear does, accepting less-than-random values. Of course, this may compromise security, because the initial key could be more easily guessed.

3. My guess would be that Xilinx didn't test this configuration (especially if Dropbear is the default). As I'm not using PetaLinux these days, I don't even know what the default is...
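For question 1, you can inspect the RNG state without blocking. The sketch below checks three things: the kernel's entropy estimate, whether the in-kernel jitter RNG is registered with the crypto API, and whether "crng init done" appears in a saved log. It assumes the standard `/proc/sys/kernel/random/entropy_avail` and `/proc/crypto` files. `rng_report` and the `PROC` override are hypothetical.

As far as I understand, jitterentropy_rng only registers if its start-up self-test passes. A missing entry would therefore fit the time-stamp theory above. A present entry only shows that the driver loaded, not that it is crediting entropy to the pool that getrandom() waits on. The log check is only as good as the saved log, since the kernel ring buffer may have wrapped.

```shell
#!/bin/sh
# rng_report [KERNEL_LOG]
# Hypothetical helper: non-blocking look at the kernel RNG state.
# PROC defaults to /proc and is overridable only so the function can be tested.
rng_report() {
    proc=${PROC:-/proc}
    # Current entropy estimate in bits; reading this file never blocks.
    echo "entropy_avail: $(cat "$proc/sys/kernel/random/entropy_avail")"
    # Present only if the driver is built in or loaded (and, as far as I know,
    # its self-test passed).
    if grep -q 'jitterentropy_rng' "$proc/crypto" 2>/dev/null; then
        echo "jitterentropy_rng: registered"
    else
        echo "jitterentropy_rng: not registered"
    fi
    # Optional saved log (e.g. from "dmesg > boot.log"): has the CRNG come up?
    if [ -n "$1" ]; then
        if grep -q 'random: crng init done' "$1"; then
            echo "crng: initialized"
        else
            echo "crng: not initialized (getrandom() without GRND_NONBLOCK would block)"
        fi
    fi
}
```

Example on the target: `dmesg > /tmp/boot.log; rng_report /tmp/boot.log`.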

Adventurer
Registered: 10-19-2017

Re: Why does OpenSSH delay Linux boot 3 minutes?

I think you're dead on in your assessment of the root cause here.

Unfortunately, I can't run dd before the "random: crng init done" message on startup, because that event is what unblocks OpenSSH. But when I do run it afterwards, I immediately get 4 bytes back:

INIT: Entering runlevel: 5
Configuring network interfaces...
[ 10.349033] pps pps0: new PPS source ptp0
[ 10.357622] macb ff0b0000.ethernet: gem-ptp-timer ptp clock registered.
[ 10.368866] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
done.
Starting system message bus:
[ 10.472656] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[ 10.733547] random: dbus-daemon: uninitialized urandom read (12 bytes read)
dbus.
Starting OpenBSD Secure Shell server: sshd
[ 11.365719] macb ff0b0000.ethernet eth0: link up (1000/Full)
[ 11.375982] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 191.752363] random: crng init done
...
root@zcu106_vcu_trd:~# dd if=/dev/urandom bs=4 count=1 | hexdump -C
1+0 records in
1+0 records out
4 bytes copied, 0.000118471 s, 33.8 kB/s
00000000 76 9e fe 29 |v..)|
00000004
root@zcu106_vcu_trd:~#

I am going to chase the jent_get_nstime() code to see if I can determine what timer it is trying to use and what clock is required to drive that.

I'm also going to open up an SR with Xilinx to get a response since they haven't chimed in here yet. I'll report back with any update I have.

Thanks for your help, @rfs613.
