"failed to connect to local tailscaled" on self-hosted runners #140

jasonbecker-os opened this issue Oct 21, 2024 · 5 comments
jasonbecker-os commented Oct 21, 2024

Running my workflow on GitHub's hosted runners works fine, but when I switch to self-hosted runners (using a DinD approach, ultimately running in a nimmis/ubuntu:latest container), the action fails (see the GitHub workflow output):

  sudo -E tailscaled --state=mem: ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  # And check that tailscaled came up. The CLI will block for a bit waiting
  # for it. And --json will make it exit with status 0 even if we're logged
  # out (as we will be). Without --json it returns an error if we're not up.
  sudo -E tailscale status --json >/dev/null
  shell: bash --noprofile --norc -e -o pipefail {0}
  env:
    ADDITIONAL_DAEMON_ARGS: 
  
failed to connect to local tailscaled; it doesn't appear to be running (sudo systemctl start tailscaled ?)
Error: Process completed with exit code 1.

I think the issue may have something to do with the CLI not blocking as expected: as the timestamps in the screenshot below show, the entire step completes in seconds.

[screenshot: workflow step timestamps]

The only change I made to the nimmis/ubuntu:latest image was to run:

apt-get update
apt-get install -y sudo --fix-missing

since sudo is not installed by default (the only user is root).

Maybe some other error is being thrown by sudo -E tailscale status --json, but it's being swallowed by the >/dev/null redirect?

EDIT: No, it's not redirecting stderr, only stdout... 🤔
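
To spell out the redirect behavior (a minimal sketch, not lines from the action itself):

  # >/dev/null discards stdout only; stderr still reaches the step log
  tailscale status --json >/dev/null
  # silencing both streams would require redirecting stderr as well
  tailscale status --json >/dev/null 2>&1

So anything the CLI writes to stderr should already be showing up in the workflow output.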

jasonbecker-os commented Oct 21, 2024

I tried copying the action into my repo so I could fiddle with it, and found that even after removing the sudos (since it's run by root anyway) and adding a retry loop around the tailscale status call, it still fails to find the tailscaled process (see the GitHub workflow output):

  set -xv
  if [ "$STATEDIR" == "" ]; then
    STATE_ARGS="--state=mem:"
  else
    STATE_ARGS="--statedir=${STATEDIR}"
    mkdir -p "$STATEDIR"
  fi
  tailscaled ${STATE_ARGS} ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  # And check that tailscaled came up. The CLI will block for a bit waiting
  # for it. And --json will make it exit with status 0 even if we're logged
  # out (as we will be). Without --json it returns an error if we're not up.
  
  # Retry mechanism for tailscale status
  for i in {1..10}; do
    tailscale status --json >/dev/null && break || sleep 5
  done
  shell: bash --noprofile --norc -e -o pipefail {0}
  env:
    ADDITIONAL_DAEMON_ARGS: 
    STATEDIR: 
if [ "$STATEDIR" == "" ]; then
  STATE_ARGS="--state=mem:"
else
  STATE_ARGS="--statedir=${STATEDIR}"
  mkdir -p "$STATEDIR"
fi
+ '[' '' == '' ']'
+ STATE_ARGS=--state=mem:
tailscaled ${STATE_ARGS} ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
# And check that tailscaled came up. The CLI will block for a bit waiting
# for it. And --json will make it exit with status 0 even if we're logged
# out (as we will be). Without --json it returns an error if we're not up.
# Retry mechanism for tailscale status
for i in {1..10}; do
  tailscale status --json >/dev/null && break || sleep 5
done
+ for i in {1..10}
+ tailscale status --json
+ tailscaled --state=mem:
failed to connect to local tailscaled; it doesn't appear to be running (sudo systemctl start tailscaled ?)
+ sleep 5

(I removed the repeated retries for brevity)
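
In hindsight, a retry loop like this can't succeed if the daemon has already exited. A hypothetical variant (not what the action does; TAILSCALED_PID is my own name) that checks the background PID and dumps the log on failure would have surfaced the real error immediately:

  tailscaled ${STATE_ARGS} ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  TAILSCALED_PID=$!
  for i in {1..10}; do
    # if the backgrounded daemon has already died, fail fast and show why
    if ! kill -0 "$TAILSCALED_PID" 2>/dev/null; then
      echo "tailscaled exited early; log follows:" >&2
      cat ~/tailscaled.log >&2
      exit 1
    fi
    tailscale status --json >/dev/null && break || sleep 5
  done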

@jasonbecker-os

Ah, here's something useful! I stopped redirecting stderr to a file by removing 2>~/tailscaled.log and got this additional output:

2024/10/21 20:56:59 logtail started
2024/10/21 20:56:59 Program starting: v1.72.1-tc02a15244-g5c00d019b, Go 1.22.5: []string{"tailscaled", "--state=mem:"}
2024/10/21 20:56:59 LogID: 6c0a127189e4fc2f4a4bc59fbb8ed0b5597478fcce428b747951758e5da99be1
2024/10/21 20:56:59 logpolicy: using system state directory "/var/lib/tailscale"
logpolicy.ConfigFromFile /var/lib/tailscale/tailscaled.log.conf: open /var/lib/tailscale/tailscaled.log.conf: no such file or directory
logpolicy.Config.Validate for /var/lib/tailscale/tailscaled.log.conf: config is nil
2024/10/21 20:56:59 dns: [rc=unknown ret=direct]
2024/10/21 20:56:59 dns: using "direct" mode
2024/10/21 20:56:59 dns: using *dns.directManager
2024/10/21 20:56:59 linuxfw: clear iptables: exec: "iptables": executable file not found in $PATH
2024/10/21 20:56:59 linuxfw: clear ip6tables: exec: "ip6tables": executable file not found in $PATH
2024/10/21 20:56:59 cleanup: list tables: netlink receive: operation not permitted
2024/10/21 20:56:59 wgengine.NewUserspaceEngine(tun "tailscale0") ...
2024/10/21 20:56:59 Linux kernel version: 6.1.109
2024/10/21 20:56:59 is CONFIG_TUN enabled in your kernel? `modprobe tun` failed with: 
2024/10/21 20:56:59 tun module not loaded nor found on disk
2024/10/21 20:56:59 wgengine.NewUserspaceEngine(tun "tailscale0") error: tstun.New("tailscale0"): CreateTUN("tailscale0") failed; /dev/net/tun does not exist
2024/10/21 20:56:59 flushing log.
2024/10/21 20:56:59 logger closing down
2024/10/21 20:56:59 getLocalBackend error: createEngine: tstun.New("tailscale0"): CreateTUN("tailscale0") failed; /dev/net/tun does not exist

So the problem appears to be that tailscaled expects /dev/net/tun to exist, and the action swallows the error when it doesn't.
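
For anyone hitting something similar, a quick sanity check from inside the runner container (ordinary shell commands, not part of the action):

  # tailscaled's userspace engine needs the TUN character device (major 10, minor 200)
  ls -l /dev/net/tun
  # and the container needs NET_ADMIN; CapEff is the effective capability mask
  grep CapEff /proc/self/status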

@jasonbecker-os

For posterity: I was able to get around this by adding the TUN device to my container in the workflow, like so:

container:
  image: <image>
  options: --cap-add=NET_ADMIN --device=/dev/net/tun

If there's an action item to come out of this, though, it's that the tailscaled command should probably share its stderr with the console, either by piping through something like tee (which writes to both a file and the console) or by adding a step that prints the contents of tailscaled.log when it's non-empty.
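
Something along these lines would do it (just a sketch of the idea, not a tested change to the action):

  # keep writing the daemon log to a file, but echo it back if startup fails
  tailscaled --state=mem: ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  if ! tailscale status --json >/dev/null; then
    echo "tailscaled did not come up; daemon log follows:" >&2
    cat ~/tailscaled.log >&2
    exit 1
  fi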

@bryan-rhm

I'm having the same issue; is there any other workaround?


jasonbecker-os commented Nov 27, 2024

@bryan-rhm

I'm having the same issue; is there any other workaround?

See my previous comment. In my case, I just had to add those options to the workflow file.
