
Tailscale step runs successfully but subsequent steps to connect to DB fail #130

Open
khernandezrt opened this issue Jun 4, 2024 · 11 comments


@khernandezrt

We created the correct tags and set the scope to device.
The Tailscale step runs (I don't see any confirmation that we are connected), but the step that runs my tests fails with:

ERROR tests/mycode/code/test_my_code.py - sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'mysqlserver.us-east-1.rds.amazonaws.com' (timed out)")

We also see the node being created in the Tailscale UI, but I keep getting a timeout when I run pytest.

```yaml
name: Python application

on:
  push:
    branches: [ "feature/github-actions" ]
  pull_request:
    branches: [ "feature/github-actions" ]

env:
  AWS_CONFIG_FILE: .github/workflows/aws_config
  DB_NAME: "mydbname"
  DB_READ_SERVER: "mysqlserver.us-east-1.rds.amazonaws.com"
  DB_USERNAME: "root"
  DB_PASSWORD: ${{secrets.DB_PASSWORD}}

  AWS_PROFILE: "dev"
  API_VERSION: "v1"
  FRONT_END_KEY: ${{secrets.FRONT_END_KEY}}

  LOG_LEVEL: "INFO"
  DB_USER_ID: 32
  SENTRY_SAMPLE_RATE: 1
  NUMEXPR_MAX_THREADS: "8"

  LOG_LEVEL_CONSOLE: True
  LOG_LEVEL_ALGORITHM: "INFO"
  LOG_LEVEL_DB: "WARNING"

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Tailscale
        uses: tailscale/github-action@v2
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:cicd
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v3
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: |
          pip install -r requirements-dev.txt
      - name: Test with pytest
        env: 
          PYTHONPATH: ${{github.workspace}}/src
        run: |
          pytest
```
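
A minimal diagnostic sketch (not part of the original workflow) that could sit between the Tailscale step and the pytest step, assuming the tailscale CLI installed by the action is on the runner's PATH; it only shows whether the node actually joined the tailnet and whether the DB hostname resolves:

```yaml
      # Hypothetical diagnostic step: confirm the runner joined the tailnet and
      # that the DB hostname resolves before the tests run.
      - name: Check Tailscale connectivity
        run: |
          tailscale status    # lists tailnet peers visible to this node
          tailscale netcheck  # reports UDP reachability and the nearest DERP relay
          getent hosts "$DB_READ_SERVER" || echo "DNS lookup failed for $DB_READ_SERVER"
```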
@khernandezrt
Author

Switching the URL to a direct IP did the trick. Looks like a DNS issue.
I will leave this issue open, as I'd prefer not to use a direct IP.
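
For staying on the hostname instead of a direct IP, one route (which a later comment in this thread ends up using) is Split DNS for the RDS domain in the Tailscale admin console, plus making sure the runner accepts tailnet DNS. A sketch, under the assumption that the action's `args` input is how extra `tailscale up` flags get passed:

```yaml
      - name: Tailscale
        uses: tailscale/github-action@v2
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:cicd
          # Assumption: `args` is forwarded to `tailscale up`. --accept-dns is
          # normally already the default on Linux; it is spelled out here so the
          # runner uses the tailnet's Split DNS entry for the RDS domain.
          args: --accept-dns=true
```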

@henworth

I'm encountering a similar timeout error, although it doesn't seem to be DNS in my case, as the IP is resolved properly:

Error: Error connecting to PostgreSQL server database.us-east-1.rds.amazonaws.com (scheme: awspostgres): dial tcp correct.ip.address:5432: connect: connection timed out

@khernandezrt
Author

@henworth Have you set up your security policies correctly for your Tailscale instance?

@henworth

henworth commented Jun 10, 2024

Yep, I've done all this. It was working fine and now I'm not sure what's wrong.

Connectivity to this DB works fine from other non-GitHub nodes, using either the hostname or the IP.

@talha5389-teraception

I also started having issues two weeks ago. I have also verified that things work fine outside of GitHub Actions using the same configuration.

@ebarriosjr

I am having the same issue. It had been working perfectly so far, but today I am getting random i/o timeouts.

@ericpollmann

ericpollmann commented Jul 3, 2024

Same here! I had random failures, especially on the first connection to our RDS instance (running in AWS) from a GitHub Actions worker (running in Azure). Subsequent connections after the first failure would succeed. I did some debugging and found that the connection was going through DERP despite the inbound WireGuard port being open for IPv4/v6 on the AWS side.

I changed our setup to first run a single ping to the subnet router's DNS hostname after bringing up Tailscale, and that seemed to dramatically improve reliability, though it still failed 1 time in 10 (that time it was the ping itself that failed).

I then set up Split DNS and haven't had a failure since, though I've only had 10 or so runs since then.
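
A sketch of that warm-up ordering, with subnet-router-1.example.ts.net standing in as a placeholder for the subnet router's DNS hostname:

```yaml
      # Placeholder hostname: replace subnet-router-1.example.ts.net with the
      # subnet router's actual DNS name.
      - name: Warm up the Tailscale path
        run: |
          # One ping right after Tailscale comes up, so the first real connection
          # (from pytest or the app) is not the one paying the setup cost.
          # "|| true" keeps a flaky probe from failing the whole job.
          ping -c 1 subnet-router-1.example.ts.net || true
```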

@henworth

henworth commented Jul 4, 2024

My issue turned out to be related to the stateful filtering added in v1.66.0. Once I disabled that on my subnet routers, the problem disappeared.

@aaomidi

aaomidi commented Nov 25, 2024

I wonder if there's a propagation delay here? E.g. a new node comes up but doesn't propagate fast enough. I wonder if adding a wait of 5 seconds or so would help. Maybe that's why pinging may have helped?

The stateful filtering is interesting, but it seems to be disabled by default.
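
If the propagation-delay guess above is right, the workaround would be as small as this (the 5 seconds is the figure suggested above, not a measured value):

```yaml
      # Hypothetical settle delay between the Tailscale step and anything that
      # dials through the subnet router.
      - name: Wait for the tailnet to settle
        run: sleep 5
```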

@aaomidi

aaomidi commented Dec 5, 2024

@henworth can you describe what flags you changed? I think I'm seeing something similar to this, but in the Helm world this time.

Update:

> --stateful-filtering: Enable stateful filtering for [subnet routers](https://tailscale.com/kb/1019/subnets) and [exit nodes](https://tailscale.com/kb/1103/exit-nodes). When enabled, inbound packets with another node's destination IP are dropped, unless they are a part of a tracked outbound connection from that node. Defaults to disabled.

Seems like the default is false?

@henworth

henworth commented Dec 5, 2024

At the time I wrote that comment the default was true; it has since been changed to false in a subsequent release.
