Panic when adding healthcheck to gateway in standalone mode #4858

Open
hoyon opened this issue Dec 6, 2024 · 6 comments
Labels
area/standalone Issues related to the standalone mode cherrypick/release-v1.2.4 kind/bug Something isn't working

hoyon commented Dec 6, 2024

Description:

In standalone mode, Envoy Gateway panics when it loads a config that includes a BackendTrafficPolicy with a health check attached.

I've also tested this on macOS and there it causes a segfault rather than recovering from a panic.

I haven't been able to reproduce this issue when using Envoy Gateway in a local minikube cluster.

Repro steps:

envoy-gateway server -c standalone.yaml

standalone.yaml

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
provider:
  type: Custom
  custom:
    resource:
      type: File
      file:
        paths: ["/tmp/config.yaml"]
    infrastructure:
      type: Host
      host: {}
logging:
  level:
    default: info
extensionApis:
  enableBackend: true

config.yaml

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 8888
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: "gateway.envoyproxy.io"
          kind: Backend
          name: backend
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend
spec:
  endpoints:
    - ip:
        address: 0.0.0.0
        port: 4000
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-policy
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: eg
# vvvvv everything works if the healthCheck block below is removed
  healthCheck:
    active:
      type: HTTP
      http:
        path: "/"
        method: GET

Environment:

I have tested this on a pre-release build from the latest commit on Linux and on 1.2.3 on macOS.

./bin/linux/amd64/envoy-gateway version
ENVOY_GATEWAY_VERSION: v1.2.3
ENVOY_PROXY_VERSION: distroless-dev
GATEWAYAPI_VERSION: v1.2.1
GIT_COMMIT_ID: 8cba9584ab08869c7381828b9d87a5aff507e7a9
GOLANG_VERSION: go1.23.3

Logs:

Linux panic logs:

2024-12-06T16:09:49.683Z	INFO	config-loader	loader/configloader.go:105	running hook
2024-12-06T16:09:49.683Z	INFO	config-loader	loader/configloader.go:47	watching for changes to the EnvoyGateway configuration	{"path": "standalone.yaml"}
2024-12-06T16:09:49.683Z	INFO	cmd/server.go:59	Setup runners
2024-12-06T16:09:49.683Z	INFO	provider	runner/runner.go:62	Running provider	{"runner": "provider", "type": "Custom"}
2024-12-06T16:09:49.683Z	INFO	admin	admin/server.go:34	starting admin server	{"address": "127.0.0.1:19000", "enablePprof": false}
2024-12-06T16:09:49.683Z	INFO	gateway-api	runner/runner.go:86	started	{"runner": "gateway-api"}
2024-12-06T16:09:49.683Z	INFO	xds-translator	runner/runner.go:47	started	{"runner": "xds-translator"}
2024-12-06T16:09:49.683Z	INFO	metrics	metrics/register.go:171	initialized metrics pull endpoint	{"address": "0.0.0.0:19001", "endpoint": "/metrics"}
2024-12-06T16:09:49.683Z	INFO	infrastructure	runner/runner.go:67	started	{"runner": "infrastructure"}
2024-12-06T16:09:49.683Z	ERROR	infrastructure	runner/runner.go:63	failed to delete ratelimit infra	{"runner": "infrastructure", "error": "delete ratelimit infrastructure is not supported yet for host infrastructure"}
2024-12-06T16:09:49.683Z	INFO	provider	file/file.go:185	starting health probe server	{"runner": "provider", "address": ":8081"}
2024-12-06T16:09:49.683Z	INFO	xds-server	runner/runner.go:89	loaded TLS certificate and key	{"runner": "xds-server"}
2024-12-06T16:09:49.683Z	ERROR	gateway-api	runner/runner.go:102	Failed to create Wasm cache directory	{"runner": "gateway-api", "error": "mkdir /var/lib/eg: permission denied"}
2024-12-06T16:09:49.683Z	INFO	metrics	metrics/register.go:60	starting metrics server	{"address": "0.0.0.0:19001"}
2024-12-06T16:09:49.683Z	INFO	xds-server	runner/runner.go:104	started	{"runner": "xds-server"}
2024-12-06T16:09:49.818Z	INFO	provider	file/store.go:70	loaded and stored resources successfully	{"runner": "provider"}
2024-12-06T16:09:49.818Z	INFO	gateway-api	runner/runner.go:121	received an update	{"runner": "gateway-api"}
2024-12-06T16:09:49.818Z	INFO	provider	file/file.go:75	Watching path added	{"runner": "provider", "path": "/home/hoyon/code/sandbox/envoy-segfault/config.yaml"}
2024-12-06T16:09:49.820Z	INFO	infrastructure	runner/runner.go:92	received an update	{"runner": "infrastructure"}
looking up the latest patch for Envoy version 1.32
2024-12-06T16:09:49.821Z	INFO	xds-translator	runner/runner.go:55	received an update	{"runner": "xds-translator"}
2024-12-06T16:09:49.822Z	ERROR	watchable	message/watchutil.go:51	observed a panic	{"runner": "xds-translator", "stackTrace": "goroutine 54 [running]:\nruntime/debug.Stack()\n\t/opt/hostedtoolcache/go/1.23.3/x64/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/envoyproxy/gateway/internal/message.handleWithCrashRecovery[...].func1()\n\t/home/runner/work/gateway/gateway/internal/message/watchutil.go:52 +0x1c5\npanic({0x2c52ec0?, 0x993f970?})\n\t/opt/hostedtoolcache/go/1.23.3/x64/src/runtime/panic.go:785 +0x132\ngithub.com/envoyproxy/gateway/internal/xds/translator.buildXdsHealthCheck(0xc002195a40)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/cluster.go:253 +0x25\ngithub.com/envoyproxy/gateway/internal/xds/translator.buildXdsCluster(0xc0027be6e0)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/cluster.go:236 +0xe72\ngithub.com/envoyproxy/gateway/internal/xds/translator.addXdsCluster(0xc0021844c0, 0xc0027be6e0)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:827 +0x175\ngithub.com/envoyproxy/gateway/internal/xds/translator.processXdsCluster(0xc0021844c0, {0x788a260?, 0xc0025da880?}, 0xf?)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:783 +0x36\ngithub.com/envoyproxy/gateway/internal/xds/translator.(*Translator).addRouteToRouteConfig(0xc0016c9920, 0xc0021844c0, 0xc0021c28c0, 0xc0021b8200, 0x0, 0x0)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:474 +0xa6b\ngithub.com/envoyproxy/gateway/internal/xds/translator.(*Translator).processHTTPListenerXdsTranslation(0xc0016c9920, 0xc0021844c0, {0xc0025da078, 0x1, 0xc00121d610?}, 0xc0022fa6c0, 0x0, 0x0)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:365 +0x7aa\ngithub.com/envoyproxy/gateway/internal/xds/translator.(*Translator).Translate(0xc0016c9920, 0xc0025c3950)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:90 +0x7f\ngithub.com/envoyproxy/gateway/internal/xds/translator/runner.(*Runner).subscribeAndTranslate.func1({{0xc000e26288?, 0x2bc2a20?}, 0x70?, 0xc0025c3950?}, 0xc001490000)\n\t/home/runner/work/gateway/gateway/internal/xds/translator/runner/runner.go:83 +0x294\ngithub.com/envoyproxy/gateway/internal/message.handleWithCrashRecovery[...](0xc00121df98?, {{0xc000e26288, 0xc001229d38?}, 0x8?, 0xc0025c3950?}, {{0x326b512, 0xe}, {0x3257b17, 0x6}}, 0xc001490000?)\n\t/home/runner/work/gateway/gateway/internal/message/watchutil.go:58 +0x137\ngithub.com/envoyproxy/gateway/internal/message.HandleSubscription[...]({{0x326b512, 0x78cb870?}, {0x3257b17?, 0xa?}}, 0xc001cc80e0?, 0xc0016c9f98)\n\t/home/runner/work/gateway/gateway/internal/message/watchutil.go:97 +0x7b0\ngithub.com/envoyproxy/gateway/internal/xds/translator/runner.(*Runner).subscribeAndTranslate(0xc000123380, {0x78cb870?, 0xc000f3a000?})\n\t/home/runner/work/gateway/gateway/internal/xds/translator/runner/runner.go:53 +0x7e\ncreated by github.com/envoyproxy/gateway/internal/xds/translator/runner.(*Runner).Start in goroutine 121\n\t/home/runner/work/gateway/gateway/internal/xds/translator/runner/runner.go:46 +0x235\n", "error": "runtime error: invalid memory address or nil pointer dereference"}
1.32.1 is already downloaded

macOS crash logs

2024-12-06T15:43:23.826Z	INFO	config-loader	loader/configloader.go:105	running hook
2024-12-06T15:43:23.826Z	INFO	config-loader	loader/configloader.go:47	watching for changes to the EnvoyGateway configuration	{"path": "standalone.yaml"}
2024-12-06T15:43:23.826Z	INFO	cmd/server.go:59	Setup runners
2024-12-06T15:43:23.826Z	INFO	provider	runner/runner.go:62	Running provider	{"runner": "provider", "type": "Custom"}
2024-12-06T15:43:23.826Z	INFO	gateway-api	runner/runner.go:86	started	{"runner": "gateway-api"}
2024-12-06T15:43:23.826Z	INFO	xds-translator	runner/runner.go:47	started	{"runner": "xds-translator"}
2024-12-06T15:43:23.826Z	INFO	admin	admin/server.go:34	starting admin server	{"address": "127.0.0.1:19000", "enablePprof": false}
2024-12-06T15:43:23.826Z	INFO	metrics	metrics/register.go:171	initialized metrics pull endpoint	{"address": "0.0.0.0:19001", "endpoint": "/metrics"}
2024-12-06T15:43:23.826Z	INFO	provider	file/file.go:127	starting health probe server	{"runner": "provider", "address": ":8081"}
2024-12-06T15:43:23.826Z	INFO	metrics	metrics/register.go:60	starting metrics server	{"address": "0.0.0.0:19001"}
2024-12-06T15:43:23.826Z	INFO	infrastructure	runner/runner.go:67	started	{"runner": "infrastructure"}
2024-12-06T15:43:23.827Z	ERROR	infrastructure	runner/runner.go:63	failed to delete ratelimit infra	{"runner": "infrastructure", "error": "delete ratelimit infrastructure is not supported yet for host infrastructure"}
2024-12-06T15:43:23.827Z	ERROR	gateway-api	runner/runner.go:102	Failed to create Wasm cache directory	{"runner": "gateway-api", "error": "mkdir /var/lib/eg: permission denied"}
2024-12-06T15:43:23.827Z	INFO	xds-server	runner/runner.go:89	loaded TLS certificate and key	{"runner": "xds-server"}
2024-12-06T15:43:23.827Z	INFO	xds-server	runner/runner.go:104	started	{"runner": "xds-server"}
2024-12-06T15:43:23.909Z	INFO	provider	file/store.go:75	loaded and stored resources successfully	{"runner": "provider"}
2024-12-06T15:43:23.909Z	INFO	gateway-api	runner/runner.go:121	received an update	{"runner": "gateway-api"}
2024-12-06T15:43:23.910Z	INFO	infrastructure	runner/runner.go:92	received an update	{"runner": "infrastructure"}
looking up the latest patch for Envoy version 1.29
2024-12-06T15:43:23.911Z	INFO	xds-translator	runner/runner.go:55	received an update	{"runner": "xds-translator"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x102ddfc04]

goroutine 150 [running]:
github.com/envoyproxy/gateway/internal/xds/translator.buildXdsHealthCheck(0x140022cfac0)
	/home/runner/work/gateway/gateway/internal/xds/translator/cluster.go:253 +0x24
github.com/envoyproxy/gateway/internal/xds/translator.buildXdsCluster(0x14000556be0)
	/home/runner/work/gateway/gateway/internal/xds/translator/cluster.go:236 +0xc68
github.com/envoyproxy/gateway/internal/xds/translator.addXdsCluster(0x1400168c080, 0x14000556be0)
	/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:826 +0x158
github.com/envoyproxy/gateway/internal/xds/translator.processXdsCluster(0x1400168c080, {0x107e46d00?, 0x14001066018?}, 0xf?)
	/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:782 +0x40
github.com/envoyproxy/gateway/internal/xds/translator.(*Translator).addRouteToRouteConfig(0x14001803908, 0x1400168c080, 0x140019343c0, 0x14001ff2400, 0x0, 0x0)
	/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:473 +0x818
github.com/envoyproxy/gateway/internal/xds/translator.(*Translator).processHTTPListenerXdsTranslation(0x14001803908, 0x1400168c080, {0x14000da9908, 0x1, 0x140018e95f8?}, 0x14000fc27e0, 0x0, 0x0)
	/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:365 +0x554
github.com/envoyproxy/gateway/internal/xds/translator.(*Translator).Translate(0x14001803908, 0x14001821950)
	/home/runner/work/gateway/gateway/internal/xds/translator/translator.go:90 +0x68
github.com/envoyproxy/gateway/internal/xds/translator/runner.(*Runner).subscribeAndTranslate.func1({{0x140024a2060?, 0x0?}, 0x2?, 0x14001821950?}, 0x140021de460)
	/home/runner/work/gateway/gateway/internal/xds/translator/runner/runner.go:83 +0x210
github.com/envoyproxy/gateway/internal/message.HandleSubscription[...]({{0x102e976de, 0xe?}, {0x102e83d5d?, 0x6?}}, 0x140004d6c40?, 0x14001803f88)
	/home/runner/work/gateway/gateway/internal/message/watchutil.go:76 +0xab8
github.com/envoyproxy/gateway/internal/xds/translator/runner.(*Runner).subscribeAndTranslate(0x14002208780, {0x107e88390?, 0x14000a84050?})
	/home/runner/work/gateway/gateway/internal/xds/translator/runner/runner.go:53 +0x84
created by github.com/envoyproxy/gateway/internal/xds/translator/runner.(*Runner).Start in goroutine 134
	/home/runner/work/gateway/gateway/internal/xds/translator/runner/runner.go:46 +0x1d4
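
The Linux trace passes through handleWithCrashRecovery, while the macOS trace does not, which may be why one build logs the panic and the other crashes. A minimal Go sketch of that difference, assuming a handler wrapped in defer/recover; the type and function names here are hypothetical and are not taken from the Envoy Gateway source:

```go
package main

import "fmt"

// healthCheck is a hypothetical stand-in for a translated policy.
// TimeoutSeconds is a pointer because the field is optional.
type healthCheck struct {
	TimeoutSeconds *int
}

// translate dereferences TimeoutSeconds without a nil check, so it panics
// with a nil pointer dereference when no value was ever set.
func translate(hc *healthCheck) int {
	return *hc.TimeoutSeconds
}

// withCrashRecovery runs fn and converts a panic into an error instead of
// letting it terminate the process.
func withCrashRecovery(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	// With the wrapper, the nil dereference is reported as an error.
	err := withCrashRecovery(func() { translate(&healthCheck{}) })
	fmt.Println(err)

	// Without the wrapper, the same call would crash the whole program:
	// translate(&healthCheck{})
}
```

Recovery only keeps the process alive; the underlying nil field still has to be fixed for the health check to be translated.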
@hoyon hoyon added the triage label Dec 6, 2024
@arkodg arkodg added kind/bug Something isn't working cherrypick/release-v1.2.4 and removed triage labels Dec 6, 2024
Contributor

arkodg commented Dec 6, 2024

It looks like the kubebuilder defaults set by markers such as

// +kubebuilder:default="1s"

don't kick in for standalone mode.

@shawnh2 do we need to call scheme.Default() after unmarshalling the resources in

func loadKubernetesYAMLToResources(input []byte, addMissingResources bool) (*Resources, error) {

?
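
For context: a `+kubebuilder:default` marker is written into the generated CRD schema, and the Kubernetes API server applies it when the object is stored. Standalone mode reads the YAML files directly, so no API server is involved and optional pointer fields stay nil. A minimal Go sketch of filling such a default in code, assuming a hypothetical trimmed-down type (`activeHealthCheck`, `setDefaults`, and `defaultTimeout` are illustrative names, not the project's API):

```go
package main

import (
	"fmt"
	"time"
)

// activeHealthCheck is a hypothetical, trimmed-down policy type. Timeout is
// optional, so it is a pointer that stays nil unless something fills it in.
type activeHealthCheck struct {
	Path    string
	Timeout *time.Duration
}

// defaultTimeout mirrors the `+kubebuilder:default="1s"` marker quoted above.
const defaultTimeout = time.Second

// setDefaults fills in the value that the API server would otherwise apply
// from the CRD schema. User-provided values are left untouched.
func setDefaults(hc *activeHealthCheck) {
	if hc.Timeout == nil {
		t := defaultTimeout
		hc.Timeout = &t
	}
}

func main() {
	// As loaded from a file in standalone mode: Timeout was never set.
	hc := &activeHealthCheck{Path: "/"}
	fmt.Println(hc.Timeout == nil)

	setDefaults(hc)
	fmt.Println(*hc.Timeout)
}
```

Calling a function like this right after unmarshalling would make later dereferences safe; whether the real fix should live there or in the translator is a design decision for the maintainers.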

@arkodg arkodg added this to the v1.3.0-rc.1 milestone Dec 6, 2024
@shawnh2 shawnh2 added the area/standalone Issues related to the standalone mode label Dec 7, 2024
Contributor

shawnh2 commented Dec 7, 2024

Sure, we need to set default values just like kubebuilder while in standalone mode.

@shawnh2 shawnh2 added the help wanted Extra attention is needed label Dec 7, 2024
@shawnh2 shawnh2 self-assigned this Dec 8, 2024
@shawnh2 shawnh2 removed the help wanted Extra attention is needed label Dec 8, 2024
Contributor

shawnh2 commented Dec 8, 2024

I ran some tests; it seems scheme.Default is not able to set default values for these fields.

Contributor

arkodg commented Dec 10, 2024

@shawnh2 did you call combinedScheme.Default(..) right after unmarshalling the resource?

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label Jan 10, 2025
Contributor

shawnh2 commented Jan 10, 2025

> @shawnh2 did you call combinedScheme.Default(..) right after unmarshalling the resource?

Yes, unfortunately it won't work...
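
One possible explanation, offered as an assumption rather than a confirmed diagnosis for this issue: in k8s.io/apimachinery, `runtime.Scheme.Default` only invokes defaulting functions that were registered for the object's type (for example via `AddTypeDefaultingFunc`); it does not read kubebuilder markers or the CRD schema. The stdlib-only Go sketch below is a toy model of that lookup, not the apimachinery implementation, and every name in it is hypothetical:

```go
package main

import (
	"fmt"
	"reflect"
)

// scheme is a toy registry of per-type defaulting functions.
type scheme struct {
	defaulters map[reflect.Type]func(any)
}

func newScheme() *scheme {
	return &scheme{defaulters: map[reflect.Type]func(any){}}
}

// addDefaulter registers fn for the dynamic type of obj.
func (s *scheme) addDefaulter(obj any, fn func(any)) {
	s.defaulters[reflect.TypeOf(obj)] = fn
}

// Default runs the registered function for obj's type, if any.
// With nothing registered it is a silent no-op.
func (s *scheme) Default(obj any) {
	if fn, ok := s.defaulters[reflect.TypeOf(obj)]; ok {
		fn(obj)
	}
}

// policy is a hypothetical resource with one optional field.
type policy struct {
	Timeout *string
}

func main() {
	s := newScheme()
	p := &policy{}

	// No defaulter registered: the field stays nil.
	s.Default(p)
	fmt.Println(p.Timeout == nil)

	// After registering a defaulter, the same call fills the field.
	s.addDefaulter(&policy{}, func(obj any) {
		q := obj.(*policy)
		if q.Timeout == nil {
			t := "1s"
			q.Timeout = &t
		}
	})
	s.Default(p)
	fmt.Println(*p.Timeout)
}
```

If this is what is happening here, the defaults would have to come from registered defaulting functions or from applying the CRD schema defaults explicitly.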


3 participants