Pantheon: crash loop on start inside vmware guest #140513

Closed
oceanlewis opened this issue Oct 4, 2021 · 19 comments · Fixed by #156646
Labels
0.kind: bug Something is broken 6.topic: pantheon The Pantheon desktop environment

Comments

@oceanlewis
Contributor

Describe the bug

I've been running NixOS inside VMware for the past month and recently moved to the unstable branch of nixpkgs to start using the latest Pantheon. Since doing that, I'll sometimes run into a crash loop on booting the VM that will usually fix itself after restarting the VM several times. I believe this issue is a relatively new one and limited to just Pantheon as I haven't run into this behavior when running Gnome on the same unstable branch, nor using Pantheon on the stable channel.

I'm attaching what I believe are relevant logs from journalctl. I'm not sure what other info I can provide, but I'm happy to do so.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Start up the VM and boot into NixOS
  2. Observe the display manager crash and restart three times

Expected behavior

Pantheon should start without crashing.

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

From a look at journalctl output:

Oct 04 11:49:31 eos systemd[1]: Starting X11 Server...
Oct 04 11:49:31 eos systemd[1]: Started X11 Server.
Oct 04 11:49:31 eos systemd-logind[846]: New session c3 of user lightdm.
Oct 04 11:49:31 eos systemd[1]: Started Session c3 of User lightdm.
Oct 04 11:49:32 eos systemd[1104]: Starting Sound Service...
Oct 04 11:49:32 eos rtkit-daemon[1496]: Successfully made thread 1966 of process 1966 owned by 'lightdm' high priority at nice level -11.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Supervising 1 threads of 1 processes of 1 users.
Oct 04 11:49:32 eos pulseaudio[1966]: Disabling timer-based scheduling because running inside a VM.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Supervising 1 threads of 1 processes of 1 users.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Successfully made thread 1976 of process 1966 owned by 'lightdm' RT at priority 5.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Supervising 2 threads of 1 processes of 1 users.
Oct 04 11:49:32 eos pulseaudio[1966]: ALSA woke us up to write new data to the device, but there was actually nothing to write.
Oct 04 11:49:32 eos pulseaudio[1966]: Most likely this is a bug in the ALSA driver 'snd_ens1371'. Please report this issue to the ALSA developers.
Oct 04 11:49:32 eos pulseaudio[1966]: We were woken up with POLLOUT set -- however a subsequent snd_pcm_avail() returned 0 or another value < min_avail.
Oct 04 11:49:32 eos pulseaudio[1966]: Disabling timer-based scheduling because running inside a VM.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Supervising 2 threads of 1 processes of 1 users.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Successfully made thread 1977 of process 1966 owned by 'lightdm' RT at priority 5.
Oct 04 11:49:32 eos rtkit-daemon[1496]: Supervising 3 threads of 1 processes of 1 users.
Oct 04 11:49:32 eos systemd[1104]: Started Sound Service.
Oct 04 11:49:32 eos bluetoothd[800]: Endpoint registered: sender=:1.64 path=/MediaEndpoint/A2DPSink/sbc
Oct 04 11:49:32 eos bluetoothd[800]: Endpoint registered: sender=:1.64 path=/MediaEndpoint/A2DPSource/sbc
Oct 04 11:49:32 eos kernel: traps: .io.elementary.[1888] general protection fault ip:7f9d00cc54d1 sp:7ffd3cc0b810 error:0 in libgtk-3.so.0.2404.26[7f9d00ab1000+379000]
Oct 04 11:49:32 eos systemd[1]: Started Process Core Dump (PID 1997/UID 0).
Oct 04 11:49:32 eos systemd-coredump[1998]: Process 1888 (.io.elementary.) of user 78 dumped core.
Oct 04 11:49:32 eos systemd[1]: [email protected]: Deactivated successfully.
Oct 04 11:49:32 eos systemd-logind[846]: Session c3 logged out. Waiting for processes to exit.
Oct 04 11:49:32 eos bluetoothd[800]: Endpoint unregistered: sender=:1.64 path=/MediaEndpoint/A2DPSink/sbc
Oct 04 11:49:32 eos bluetoothd[800]: Endpoint unregistered: sender=:1.64 path=/MediaEndpoint/A2DPSource/sbc
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Main process exited, code=exited, status=1/FAILURE
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Failed with result 'exit-code'.
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Scheduled restart job, restart counter is at 3.
Oct 04 11:49:33 eos systemd[1]: Stopped X11 Server.
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Start request repeated too quickly.
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Failed with result 'exit-code'.
Oct 04 11:49:33 eos systemd[1]: Failed to start X11 Server.
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Triggering OnFailure= dependencies.
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Failed to enqueue OnFailure= job, ignoring: Unit plymouth-quit.service not found.
Oct 04 11:49:34 eos systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
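Most of the excerpt above is unrelated audio and Bluetooth noise; the crash loop itself shows up in only a few lines. A minimal shell sketch for pulling those lines out of saved journal text follows. The function name `crash_lines` and the pattern list are assumptions chosen from the messages in this excerpt, not part of any tool.

```shell
# Hypothetical helper: keep only the journal lines (stdin) that mark the crash loop.
# The patterns are taken from the excerpt above; adjust them for other logs.
crash_lines() {
  grep -E 'general protection fault|dumped core|Main process exited|Start request repeated too quickly'
}

# Sample input: four lines copied from the excerpt above.
sample_log=$(cat <<'EOF'
Oct 04 11:49:32 eos systemd[1104]: Started Sound Service.
Oct 04 11:49:32 eos kernel: traps: .io.elementary.[1888] general protection fault ip:7f9d00cc54d1 sp:7ffd3cc0b810 error:0 in libgtk-3.so.0.2404.26[7f9d00ab1000+379000]
Oct 04 11:49:32 eos systemd-coredump[1998]: Process 1888 (.io.elementary.) of user 78 dumped core.
Oct 04 11:49:33 eos systemd[1]: display-manager.service: Start request repeated too quickly.
EOF
)

# Prints the three crash-related lines and drops the Sound Service line.
printf '%s\n' "$sample_log" | crash_lines
```

On a live system the same filter could be fed from the current boot's journal, for example `journalctl -b | crash_lines`.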

Notify maintainers

@davidak
@bobby285271

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.14.8, NixOS, 21.11 (Porcupine)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.15`
 - channels(root): `"nixos-21.11pre320334.82155ff501c"`
 - channels(davidlewis): `"home-manager"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
@oceanlewis oceanlewis added the 0.kind: bug Something is broken label Oct 4, 2021
@oceanlewis oceanlewis changed the title Pantheon: crash loop on start inside vmware Pantheon: crash loop on start inside vmware guest Oct 4, 2021
@veprbl veprbl added the 6.topic: pantheon The Pantheon desktop environment label Oct 6, 2021
@yellowgh0st
Contributor

This happens on my hardware too. I guess it's kind of a blocker.

@oceanlewis
Contributor Author

This happens on my hardware too. I guess it's kind of a blocker.

Are you also running inside a VM?

@bobby285271
Member

bobby285271 commented Oct 7, 2021

Sorry, I am relatively new to Pantheon packaging and have not been able to find the real issue so far. Sadly, although I use Pantheon daily, I cannot reproduce this issue (either on my physical machine or in QEMU via nixos-rebuild build-vm). I also think the related NixOS test would fail if this were 100% reproducible, since it includes a check that logs in.

I currently have no ideas besides trying to reproduce the issue in elementary OS 6 (to make sure this is not an upstream issue) and trying a different version of mutter and gnome-settings-daemon (to make sure this is not caused by the 3.38 binding generation, as in #139404; to be honest, I am not sure that is really related in this case). I apologize for that.

@yellowgh0st
Contributor

yellowgh0st commented Oct 7, 2021

@davidarmstronglewis No, it's real hardware. The first boot is always OK; then, after messing with the UI, something goes wrong and the greeter perhaps crashes. There's a blinking wingpanel and TTY for some time. IMHO it might be related to one of the helper services?

In journalctl there's just a lightdm greeter core dump :/

@yellowgh0st
Contributor

yellowgh0st commented Oct 7, 2021

I currently have no ideas besides trying to reproduce the issue in elementary OS 6 (to make sure this is not an upstream issue) and trying a different version of mutter and gnome-settings-daemon (to make sure this is not caused by the 3.38 binding generation, as in #139404; to be honest, I am not sure that is really related in this case). I apologize for that.

As you suggest, I'm pretty convinced it's one of these. It works for me until some changes handled by g-s-d are made.
Unfortunately, I've had no time to do any real debugging :(

@oceanlewis
Contributor Author

oceanlewis commented Oct 7, 2021

Sorry, I am relatively new to Pantheon packaging and have not been able to find the real issue so far. Sadly, although I use Pantheon daily, I cannot reproduce this issue (either on my physical machine or in QEMU via nixos-rebuild build-vm). I also think the related NixOS test would fail if this were 100% reproducible, since it includes a check that logs in.

I currently have no ideas besides trying to reproduce the issue in elementary OS 6 (to make sure this is not an upstream issue) and trying a different version of mutter and gnome-settings-daemon (to make sure this is not caused by the 3.38 binding generation, as in #139404; to be honest, I am not sure that is really related in this case). I apologize for that.

I've set up a vmware guest using the latest elementary image and haven't been able to reproduce the issue on my end from that image. I'd like to offer to pair on this if that would be helpful. I don't have the experience with desktop managers (much less the pantheon stack) to be able to debug this. I haven't been able to get anything more helpful out of my efforts than the output from sudo lightdm --debug which emits:

[+0.00s] DEBUG: Logging to /var/log/lightdm/lightdm.log
[+0.00s] DEBUG: Starting Light Display Manager 1.30.0, UID=0 PID=2858
[+0.00s] DEBUG: Loading configuration dirs from /run/current-system/sw/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /nix/var/nix/profiles/default/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /etc/profiles/per-user/root/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /root/.nix-profile/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /var/lib/flatpak/exports/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /root/.local/share/flatpak/exports/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /nix/store/46d2jykf65h0160zj8xs3bqb9fl368n8-desktops/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /run/current-system/sw/etc/xdg/lightdm/lightdm.conf.d
[+0.01s] DEBUG: Loading configuration dirs from /nix/var/nix/profiles/default/etc/xdg/lightdm/lightdm.conf.d
[+0.01s] DEBUG: Loading configuration dirs from /etc/profiles/per-user/root/etc/xdg/lightdm/lightdm.conf.d
[+0.01s] DEBUG: Loading configuration dirs from /root/.nix-profile/etc/xdg/lightdm/lightdm.conf.d
[+0.01s] DEBUG: Loading configuration dirs from /var/lib/flatpak/exports/etc/xdg/lightdm/lightdm.conf.d
[+0.01s] DEBUG: Loading configuration dirs from /root/.local/share/flatpak/exports/etc/xdg/lightdm/lightdm.conf.d
[+0.01s] DEBUG: Loading configuration from /etc/lightdm/lightdm.conf
[+0.01s] DEBUG: Registered seat module local
[+0.01s] DEBUG: Registered seat module xremote
[+0.01s] DEBUG: Using D-Bus name org.freedesktop.DisplayManager
[+0.01s] DEBUG: _g_io_module_get_default: Found default implementation local (GLocalVfs) for ?gio-vfs?
[+0.01s] DEBUG: Monitoring logind for seats
[+0.01s] DEBUG: New seat added from logind: seat0
[+0.02s] DEBUG: Seat seat0: Loading properties from config section Seat:*
[+0.02s] DEBUG: Seat seat0: Starting
[+0.02s] DEBUG: Seat seat0: Creating greeter session
[+0.02s] DEBUG: Seat seat0: Creating display server of type x
[+0.02s] DEBUG: Using VT 7
[+0.02s] DEBUG: Seat seat0: Starting local X display on VT 7
[+0.02s] DEBUG: XServer 0: Logging to /var/log/lightdm/x-0.log
[+0.02s] DEBUG: XServer 0: Writing X server authority to /var/run/lightdm/root/:0
[+0.02s] DEBUG: XServer 0: Launching X Server
[+0.02s] DEBUG: Launching process 2864: /nix/store/kgpkj2smry8vbqh0958h2gjndc9dmbrz-xserver-wrapper :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
[+0.02s] DEBUG: XServer 0: Waiting for ready signal from X server :0
[+0.02s] DEBUG: Acquired bus name org.freedesktop.DisplayManager
[+0.02s] DEBUG: Registering seat with bus path /org/freedesktop/DisplayManager/Seat0
[+0.03s] DEBUG: Loading users from org.freedesktop.Accounts
[+0.03s] DEBUG: User /org/freedesktop/Accounts/User1000 added
[+0.10s] DEBUG: Seat seat0 changes active session to c3
[+0.30s] DEBUG: Got signal 10 from process 2864
[+0.30s] DEBUG: XServer 0: Got signal from X server :0
[+0.30s] DEBUG: XServer 0: Connecting to XServer :0
[+0.31s] DEBUG: Seat seat0: Display server ready, starting session authentication
[+0.31s] DEBUG: Session pid=2886: Started with service 'lightdm-greeter', username 'lightdm'
[+0.33s] DEBUG: Session pid=2886: Authentication complete with return value 0: Success
[+0.33s] DEBUG: Seat seat0: Session authenticated, running command
[+0.33s] DEBUG: Session pid=2886: Running command /nix/store/f0czljq917ym7zl347w17qmghpd1ka2j-elementary-greeter-6.0.1/bin/io.elementary.greeter
[+0.33s] DEBUG: Creating shared data directory /var/lib/lightdm-data/lightdm
[+0.33s] DEBUG: Session pid=2886: Logging to /var/log/lightdm/seat0-greeter.log
[+0.34s] DEBUG: Activating VT 7
[+0.61s] DEBUG: Greeter connected version=1.30.0 api=1 resettable=false
[+1.04s] DEBUG: Greeter start authentication for davidlewis
[+1.04s] DEBUG: Session pid=2988: Started with service 'lightdm', username 'davidlewis'
[+1.07s] DEBUG: Session pid=2988: Got 1 message(s) from PAM
[+1.07s] DEBUG: Prompt greeter with 1 message(s)
[+2.17s] DEBUG: Greeter closed communication channel
[+2.17s] DEBUG: Session pid=2886: Exited with return value 1
[+2.17s] DEBUG: Seat seat0: Session stopped
[+2.17s] DEBUG: Seat seat0: Stopping; failed to start a greeter
[+2.17s] DEBUG: Seat seat0: Stopping
[+2.17s] DEBUG: Seat seat0: Stopping display server
[+2.17s] DEBUG: Sending signal 15 to process 2864
[+2.17s] DEBUG: Seat seat0: Stopping session
[+2.17s] DEBUG: Session pid=2988: Sending SIGTERM
[+2.17s] DEBUG: Session pid=2988: Terminated with signal 15
[+2.18s] DEBUG: Session: Failed during authentication
[+2.18s] DEBUG: Seat seat0: Session stopped
[+2.21s] DEBUG: Process 2864 exited with return value 0
[+2.21s] DEBUG: XServer 0: X server stopped
[+2.21s] DEBUG: Releasing VT 7
[+2.21s] DEBUG: XServer 0: Removing X server authority /var/run/lightdm/root/:0
[+2.21s] DEBUG: Seat seat0: Display server stopped
[+2.21s] DEBUG: Seat seat0: Stopped
[+2.21s] DEBUG: Required seat has stopped
[+2.21s] DEBUG: Stopping display manager
[+2.21s] DEBUG: Display manager stopped
[+2.21s] DEBUG: Stopping daemon
[+2.21s] DEBUG: Exiting with return value 1

These two lines seem significant, but I'm not sure how to continue investigating:

[+1.07s] DEBUG: Session pid=2988: Got 1 message(s) from PAM

[+2.18s] DEBUG: Session: Failed during authentication
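Reading the timestamps, the greeter session (pid 2886) exits with return value 1 at +2.17s, and only afterwards is the user session (pid 2988) sent SIGTERM and reported as failed during authentication. The authentication failure therefore looks like a consequence of the greeter exiting rather than its cause. A small shell sketch for following one session through a saved lightdm debug log is shown here; the function name `session_lines` is hypothetical, and the sample lines are copied from the output above.

```shell
# Hypothetical helper: print only the log lines (stdin) for one lightdm session pid.
session_lines() {
  grep -F "Session pid=$1:"
}

# Sample input: five lines copied from the debug output above.
sample_log=$(cat <<'EOF'
[+0.31s] DEBUG: Session pid=2886: Started with service 'lightdm-greeter', username 'lightdm'
[+1.04s] DEBUG: Session pid=2988: Started with service 'lightdm', username 'davidlewis'
[+2.17s] DEBUG: Greeter closed communication channel
[+2.17s] DEBUG: Session pid=2886: Exited with return value 1
[+2.17s] DEBUG: Session pid=2988: Terminated with signal 15
EOF
)

# Follow the greeter session only: it starts at +0.31s and exits at +2.17s.
printf '%s\n' "$sample_log" | session_lines 2886
```

According to the debug output, the full log is written to /var/log/lightdm/lightdm.log and the greeter's own output to /var/log/lightdm/seat0-greeter.log, so the latter may be the more useful file to read next.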

@bobby285271
Member

bobby285271 commented Oct 8, 2021

I will try to prepare a mutter & g-s-d 3.36 downgrade (3.36 is what upstream is using; it may not help with this issue, but I want to have it backed up before elementary OS 7 is out so we can switch back at any time) and a mutter & g-s-d 40/41 upgrade.

Update: 3.36 downgrade https://gist.github.com/bobby285271/d5f5a4d50cbb9bd35e16157756a975b9
Update: 41 upgrade https://gist.github.com/bobby285271/b12ef3743a779efc55dbd0f8b4803236

It works for me before some changes...

To help me reproduce the issue, could you describe what changes you made?

@oceanlewis
Contributor Author

@bobby285271 I'm not sure how to help on this issue. Can you help me understand what next steps I can take?

@bobby285271
Member

Can you help me understand what next steps I can take?

It would be great if you could help me reproduce the issue repeatedly. You can also try applying the above patches to nixpkgs, building your system with them, and seeing whether the issue is still there.

@oceanlewis
Contributor Author

Can you help me understand what next steps I can take?

It would be great if you could help me reproduce the issue repeatedly. You can also try applying the above patches to nixpkgs, building your system with them, and seeing whether the issue is still there.

Absolutely, I can try applying these patches. This isn't something I have done before, so I'm not sure how to accomplish it. If there's a resource you can share on applying a patch, I can probably figure out how to apply the ones you've linked. I'm guessing there's a way to do this via an overlay or similar?

I also keep my Nix config here in case it's helpful:

@bobby285271
Member

Absolutely, I can try applying these patches. This isn't something I have done before, so I'm not sure how to accomplish it. If there's a resource you can share on applying a patch, I can probably figure out how to apply the ones you've linked. I'm guessing there's a way to do this via an overlay or similar?

You can clone the Nixpkgs repository, apply the patch with git apply then use env NIX_PATH=nixpkgs=/path/to/nixpkgs:nixos-config=/path/to/configuration.nix nixos-rebuild boot (do not forget to adjust the path).
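The steps above could be sketched as a small shell helper. The function names `nix_path_for` and `rebuild_with_patch` and all three path arguments are placeholders for illustration, not names from this thread. The `git apply --check` step is an extra safety check added here, and `nixos-rebuild boot` is assumed to be run as root.

```shell
# Hypothetical helper: build the NIX_PATH value used by the command above.
nix_path_for() {
  # $1: path to the nixpkgs checkout, $2: path to configuration.nix
  printf 'nixpkgs=%s:nixos-config=%s\n' "$1" "$2"
}

# Hypothetical helper: apply a patch to a nixpkgs checkout, then rebuild.
# Pass absolute paths, because git -C changes directory before reading the patch.
rebuild_with_patch() {
  nixpkgs_dir=$1   # local clone of the Nixpkgs repository
  patch_file=$2    # patch file saved from one of the gists
  nixos_config=$3  # path to your configuration.nix

  # Stop early if the patch does not apply cleanly to this checkout.
  git -C "$nixpkgs_dir" apply --check "$patch_file" || return 1
  git -C "$nixpkgs_dir" apply "$patch_file" || return 1

  # Build the new generation; it becomes the default on the next boot.
  env NIX_PATH="$(nix_path_for "$nixpkgs_dir" "$nixos_config")" nixos-rebuild boot
}
```

Nothing runs until the function is called, for example `rebuild_with_patch /path/to/nixpkgs /path/to/saved.patch /path/to/configuration.nix`.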

@oceanlewis
Contributor Author

I'll make the attempt. I'm also not sure how to help you reproduce the issue on your end if it's stemming from my use of NixOS as a vmware guest running on macOS, but you should be able to replicate my setup using the configuration in the repo I linked.

@oceanlewis
Contributor Author

@bobby285271 Neither of those patches fixes the issue on my end. Both exhibit the same behavior.

[Screenshot: Screen Shot 2021-10-17 at 7 26 38 PM]

@bobby285271
Member

Sorry again, I don't have any new ideas currently. If you still prefer Pantheon, maybe just don't use its greeter for now.

services.xserver.displayManager.lightdm.greeters.pantheon.enable = false;

Looking through elementary/greeter@5.0.4...6.0.1, I have not yet been able to find anything that could break the greeter.

@oceanlewis
Contributor Author

Sorry again, I don't have any new ideas currently. If you still prefer Pantheon, maybe just don't use its greeter for now.

services.xserver.displayManager.lightdm.greeters.pantheon.enable = false;

That definitely bypasses the issue for me and does point to something being funky with the greeter. Why, though, I have no idea. Thanks for your time and the workaround.

Again, I'm happy to spend time debugging if I can be helped out with how to go about that (haven't the most experience debugging programs written in C/Vala, but more than happy to learn). Or if this is outside your experience, I'm happy to reach out to one of the maintainers of https://github.com/elementary/greeter/ if that's an appropriate thing to do.
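One possible starting point for that kind of debugging: the journal earlier in this thread shows systemd-coredump capturing the greeter crash, so the stored core dump can be inspected with coredumpctl. A minimal sketch follows. The function names are hypothetical, gdb is assumed to be installed, and without debug symbols for GTK the backtrace may show little more than addresses.

```shell
# Hypothetical helper: extract the PID from "dumped core" journal lines (stdin).
# If several crashes are present, the most recent one is printed.
crashed_pid() {
  sed -n 's/.*Process \([0-9][0-9]*\) (.*dumped core.*/\1/p' | tail -n 1
}

# Hypothetical helper: show metadata for the core dump of the given PID,
# then open it in gdb (type "bt" at the gdb prompt for a backtrace).
inspect_core() {
  coredumpctl info "$1" || return 1
  coredumpctl debug "$1"
}
```

A possible use would be `inspect_core "$(journalctl -b | crashed_pid)"`, run in the same boot in which the greeter crashed.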

@oceanlewis
Contributor Author

I'm going to close this as I think this is an issue with my particular installation. After creating a new VM from scratch I'm not running into any of the issues I've described above. I don't have much of an idea why I was running into the issues I was due to a lack of technical know-how on my end with digging into the issue, but if I had to guess there was something stateful on my system that was corrupted during my experimentation with the VM.

@yellowgh0st
Contributor

The issue is present on my real hardware. It works after a fresh install, but whenever you start making changes within the DE, it breaks.

@davidak
Member

davidak commented Oct 30, 2021

@yellowgh0st What do you mean by changes? Changing settings using the settings program?

Which settings exactly? Is it reproducible?

@yellowgh0st
Contributor

@davidak Yes, I just set up my preferences in System Settings, add some shortcuts to the dock, and boom, LightDM is broken.
It should be. I will try to come up with straightforward steps when I get a little time.
