"failed to connect to local tailscaled" on self-hosted runners #140

jasonbecker-os opened this issue Oct 21, 2024 · 5 comments
jasonbecker-os commented Oct 21, 2024

Running my workflow on GitHub's hosted runners works fine, but when I switch to self-hosted runners (using a DinD approach, ultimately running in a nimmis/ubuntu:latest container), the action fails (see the GitHub workflow output):

  sudo -E tailscaled --state=mem: ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  # And check that tailscaled came up. The CLI will block for a bit waiting
  # for it. And --json will make it exit with status 0 even if we're logged
  # out (as we will be). Without --json it returns an error if we're not up.
  sudo -E tailscale status --json >/dev/null
  shell: bash --noprofile --norc -e -o pipefail {0}
  env:
    ADDITIONAL_DAEMON_ARGS: 
  
failed to connect to local tailscaled; it doesn't appear to be running (sudo systemctl start tailscaled ?)
Error: Process completed with exit code 1.

I think the issue may have something to do with the CLI not blocking as expected: as the timestamps in the screenshot below show, the entire step completes in seconds.

[screenshot: workflow step timestamps]

The only change I made to the nimmis/ubuntu:latest image was to run:

apt-get update
apt-get install -y sudo --fix-missing

since sudo is not installed by default (the only user is root).

Maybe some other error is being thrown by sudo -E tailscale status --json, but it's being swallowed by the >/dev/null redirect?

EDIT: No, it's not redirecting stderr, only stdout... 🤔
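
To spell out the redirect behavior (a minimal sketch, not lines from the action itself):

  # >/dev/null discards stdout only; stderr still reaches the step log
  tailscale status --json >/dev/null
  # silencing both streams would require redirecting stderr as well
  tailscale status --json >/dev/null 2>&1

So anything the CLI writes to stderr should already be showing up in the workflow output.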

jasonbecker-os commented Oct 21, 2024

I tried copying the action into my repo so I could fiddle with it, and found that even after removing the sudos (since it's run by root anyway) and adding a retry loop around the tailscale status call, it still fails to find the tailscaled process (see the GitHub workflow output):

  set -xv
  if [ "$STATEDIR" == "" ]; then
    STATE_ARGS="--state=mem:"
  else
    STATE_ARGS="--statedir=${STATEDIR}"
    mkdir -p "$STATEDIR"
  fi
  tailscaled ${STATE_ARGS} ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  # And check that tailscaled came up. The CLI will block for a bit waiting
  # for it. And --json will make it exit with status 0 even if we're logged
  # out (as we will be). Without --json it returns an error if we're not up.
  
  # Retry mechanism for tailscale status
  for i in {1..10}; do
    tailscale status --json >/dev/null && break || sleep 5
  done
  shell: bash --noprofile --norc -e -o pipefail {0}
  env:
    ADDITIONAL_DAEMON_ARGS: 
    STATEDIR: 
if [ "$STATEDIR" == "" ]; then
  STATE_ARGS="--state=mem:"
else
  STATE_ARGS="--statedir=${STATEDIR}"
  mkdir -p "$STATEDIR"
fi
+ '[' '' == '' ']'
+ STATE_ARGS=--state=mem:
tailscaled ${STATE_ARGS} ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
# And check that tailscaled came up. The CLI will block for a bit waiting
# for it. And --json will make it exit with status 0 even if we're logged
# out (as we will be). Without --json it returns an error if we're not up.
# Retry mechanism for tailscale status
for i in {1..10}; do
  tailscale status --json >/dev/null && break || sleep 5
done
+ for i in {1..10}
+ tailscale status --json
+ tailscaled --state=mem:
failed to connect to local tailscaled; it doesn't appear to be running (sudo systemctl start tailscaled ?)
+ sleep 5

(I removed the repeated retries for brevity)
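
In hindsight, a retry loop like this can't succeed if the daemon has already exited. A hypothetical variant (not what the action does; TAILSCALED_PID is my own name) that checks the background PID and dumps the log on failure would have surfaced the real error immediately:

  tailscaled ${STATE_ARGS} ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  TAILSCALED_PID=$!
  for i in {1..10}; do
    # if the backgrounded daemon has already died, fail fast and show why
    if ! kill -0 "$TAILSCALED_PID" 2>/dev/null; then
      echo "tailscaled exited early; log follows:" >&2
      cat ~/tailscaled.log >&2
      exit 1
    fi
    tailscale status --json >/dev/null && break || sleep 5
  done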

@jasonbecker-os

Ah, here's something useful! I stopped redirecting stderr to a file by removing 2>~/tailscaled.log and got this additional output:

2024/10/21 20:56:59 logtail started
2024/10/21 20:56:59 Program starting: v1.72.1-tc02a15244-g5c00d019b, Go 1.22.5: []string{"tailscaled", "--state=mem:"}
2024/10/21 20:56:59 LogID: 6c0a127189e4fc2f4a4bc59fbb8ed0b5597478fcce428b747951758e5da99be1
2024/10/21 20:56:59 logpolicy: using system state directory "/var/lib/tailscale"
logpolicy.ConfigFromFile /var/lib/tailscale/tailscaled.log.conf: open /var/lib/tailscale/tailscaled.log.conf: no such file or directory
logpolicy.Config.Validate for /var/lib/tailscale/tailscaled.log.conf: config is nil
2024/10/21 20:56:59 dns: [rc=unknown ret=direct]
2024/10/21 20:56:59 dns: using "direct" mode
2024/10/21 20:56:59 dns: using *dns.directManager
2024/10/21 20:56:59 linuxfw: clear iptables: exec: "iptables": executable file not found in $PATH
2024/10/21 20:56:59 linuxfw: clear ip6tables: exec: "ip6tables": executable file not found in $PATH
2024/10/21 20:56:59 cleanup: list tables: netlink receive: operation not permitted
2024/10/21 20:56:59 wgengine.NewUserspaceEngine(tun "tailscale0") ...
2024/10/21 20:56:59 Linux kernel version: 6.1.109
2024/10/21 20:56:59 is CONFIG_TUN enabled in your kernel? `modprobe tun` failed with: 
2024/10/21 20:56:59 tun module not loaded nor found on disk
2024/10/21 20:56:59 wgengine.NewUserspaceEngine(tun "tailscale0") error: tstun.New("tailscale0"): CreateTUN("tailscale0") failed; /dev/net/tun does not exist
2024/10/21 20:56:59 flushing log.
2024/10/21 20:56:59 logger closing down
2024/10/21 20:56:59 getLocalBackend error: createEngine: tstun.New("tailscale0"): CreateTUN("tailscale0") failed; /dev/net/tun does not exist

So the problem appears to be that tailscaled expects /dev/net/tun to exist, and the action swallows the error when it doesn't.
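
For anyone hitting something similar, a quick sanity check from inside the runner container (ordinary shell commands, not part of the action):

  # tailscaled's userspace engine needs the TUN character device (major 10, minor 200)
  ls -l /dev/net/tun
  # and the container needs NET_ADMIN; CapEff is the effective capability mask
  grep CapEff /proc/self/status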

@jasonbecker-os

For posterity: I was able to get around this by adding the TUN device to my container in the workflow, like so:

container:
  image: <image>
  options: --cap-add=NET_ADMIN --device=/dev/net/tun

If there's an action item to come out of this, though, it's that the tailscaled command should probably share its stderr with the console, either by piping through something like tee (which writes to both a file and the console) or by adding a step that prints the contents of tailscaled.log when it's non-empty.
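
Something along these lines would do it (just a sketch of the idea, not a tested change to the action):

  # keep writing the daemon log to a file, but echo it back if startup fails
  tailscaled --state=mem: ${ADDITIONAL_DAEMON_ARGS} 2>~/tailscaled.log &
  if ! tailscale status --json >/dev/null; then
    echo "tailscaled did not come up; daemon log follows:" >&2
    cat ~/tailscaled.log >&2
    exit 1
  fi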

@bryan-rhm

I'm having the same issue; is there any other workaround?


jasonbecker-os commented Nov 27, 2024

@bryan-rhm

I'm having the same issue; is there any other workaround?

See my previous comment. In my case, I just had to add those options to the workflow file.
