gRPC via nginx ingress #11882

Open
menvol3 opened this issue Feb 5, 2025 · 6 comments

menvol3 commented Feb 5, 2025

Hi,

I'm using a tool that utilizes gRPC-Java for communication between the client and server. The server is located in an AWS EKS cluster and is accessible externally via NGINX ingress.
To configure it, I followed this guide: https://kubernetes.github.io/ingress-nginx/examples/grpc/

After deploying all components, I tested it with grpcurl and received a successful response.

Then I configured communication between the agent and the server, and that worked as well. However, I ran into an issue: the agent repeatedly loses its connection to the server for a period of time during communication.
The agent log is below.

Agent log
Feb 03 14:24:11 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:11.811 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [Agent test-agent-1]
Feb 03 14:24:11 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:11.812 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:24:11 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:11.813 INFO  [Agent.kt:153] - Resetting agentId [Agent test-agent-1]
Feb 03 14:24:11 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:11.813 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
Feb 03 14:24:12 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:12.534 INFO  [AgentClientInterceptor.kt:58] - Assigned agentId: 2488 to Agent{agentId=2488, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [grpc-default-executor-5]
Feb 03 14:24:12 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:12.535 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:24:12 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:12.900 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
Feb 03 14:24:12 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:12.901 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 5s of inactivity [DefaultDispatcher-worker-9]
Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.865 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]
Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.866 INFO  [Agent.kt:216] - Waited 0s to reconnect [Agent test-agent-1]
Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.866 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [Agent test-agent-1]
Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.867 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.868 INFO  [Agent.kt:153] - Resetting agentId [Agent test-agent-1]
Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.868 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
Feb 03 14:25:17 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:17.699 INFO  [AgentClientInterceptor.kt:58] - Assigned agentId: 2492 to Agent{agentId=2492, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [grpc-default-executor-6]
Feb 03 14:25:17 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:17.700 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:25:18 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:18.039 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
Feb 03 14:25:18 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:18.040 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 5s of inactivity [DefaultDispatcher-worker-6]
Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.936 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]
Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.937 INFO  [Agent.kt:216] - Waited 0s to reconnect [Agent test-agent-1]
Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.937 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [Agent test-agent-1]
Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.938 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.939 INFO  [Agent.kt:153] - Resetting agentId [Agent test-agent-1]
Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.940 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
Feb 03 14:26:22 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:22.664 INFO  [AgentClientInterceptor.kt:58] - Assigned agentId: 2495 to Agent{agentId=2495, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [grpc-default-executor-7]
Feb 03 14:26:22 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:22.665 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:26:23 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:23.107 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
Feb 03 14:26:23 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:23.108 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 5s of inactivity [DefaultDispatcher-worker-12]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.801 WARN  [ScrapeResults.kt:120] - fetchScrapeUrl() java.util.concurrent.CancellationException: Parent job is Cancelling - http://127.0.0.1:9273/metrics [DefaultDispatcher-worker-11]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: java.util.concurrent.CancellationException: Parent job is Cancelling
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.UtilsKt$attachToUserJob$cleanupHandler$1.invoke(Utils.kt:99)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.UtilsKt$attachToUserJob$cleanupHandler$1.invoke(Utils.kt:97)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.InvokeOnCancelling.invoke(JobSupport.kt:1571)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.JobSupport.invokeOnCompletionInternal$kotlinx_coroutines_core(JobSupport.kt:500)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.JobSupport.invokeOnCompletion(JobSupport.kt:452)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.Job$DefaultImpls.invokeOnCompletion$default(Job.kt:313)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.HttpClientEngineKt.createCallContext(HttpClientEngine.kt:166)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.HttpClientEngine$DefaultImpls.executeWithinCallContext(HttpClientEngine.kt:91)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.HttpClientEngine$DefaultImpls.access$executeWithinCallContext(HttpClientEngine.kt:24)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.HttpClientEngine$install$1.invokeSuspend(HttpClientEngine.kt:70)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.HttpClientEngine$install$1.invoke(HttpClientEngine.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.engine.HttpClientEngine$install$1.invoke(HttpClientEngine.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.execute$ktor_utils(DebugPipelineContext.kt:63)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.Pipeline.execute(Pipeline.kt:86)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$DefaultSender.execute(HttpSend.kt:118)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.auth.AuthKt$Auth$2$2.invokeSuspend(Auth.kt:130)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.auth.AuthKt$Auth$2$2.invoke(Auth.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.auth.AuthKt$Auth$2$2.invoke(Auth.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRequestRetryKt$HttpRequestRetry$2$1.invokeSuspend(HttpRequestRetry.kt:296)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRequestRetryKt$HttpRequestRetry$2$1.invoke(HttpRequestRetry.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRequestRetryKt$HttpRequestRetry$2$1.invoke(HttpRequestRetry.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpTimeoutKt$HttpTimeout$2$1.invokeSuspend(HttpTimeout.kt:175)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpTimeoutKt$HttpTimeout$2$1.invoke(HttpTimeout.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpTimeoutKt$HttpTimeout$2$1.invoke(HttpTimeout.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRedirectKt$HttpRedirect$2$1.invokeSuspend(HttpRedirect.kt:97)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRedirectKt$HttpRedirect$2$1.invoke(HttpRedirect.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRedirectKt$HttpRedirect$2$1.invoke(HttpRedirect.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpCallValidatorKt$HttpCallValidator$2$2.invokeSuspend(HttpCallValidator.kt:112)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpCallValidatorKt$HttpCallValidator$2$2.invoke(HttpCallValidator.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpCallValidatorKt$HttpCallValidator$2$2.invoke(HttpCallValidator.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$Plugin$install$1.invokeSuspend(HttpSend.kt:84)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$Plugin$install$1.invoke(HttpSend.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpSend$Plugin$install$1.invoke(HttpSend.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.RequestError$install$1.invokeSuspend(HttpCallValidator.kt:134)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.RequestError$install$1.invoke(HttpCallValidator.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.RequestError$install$1.invoke(HttpCallValidator.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1.invokeSuspend$proceed(HttpRequestLifecycle.kt:40)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1.access$invokeSuspend$proceed(HttpRequestLifecycle.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1$1.invoke(HttpRequestLifecycle.kt:40)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1$1.invoke(HttpRequestLifecycle.kt:40)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRequestLifecycleKt$HttpRequestLifecycle$1$1.invokeSuspend(HttpRequestLifecycle.kt:27)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRequestLifecycleKt$HttpRequestLifecycle$1$1.invoke(HttpRequestLifecycle.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.HttpRequestLifecycleKt$HttpRequestLifecycle$1$1.invoke(HttpRequestLifecycle.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1.invokeSuspend(HttpRequestLifecycle.kt:40)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1.invoke(HttpRequestLifecycle.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.plugins.SetupRequestContext$install$1.invoke(HttpRequestLifecycle.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.DebugPipelineContext.execute$ktor_utils(DebugPipelineContext.kt:63)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.util.pipeline.Pipeline.execute(Pipeline.kt:86)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.HttpClient.execute$ktor_client_core(HttpClient.kt:1393)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.statement.HttpStatement.fetchResponse(HttpStatement.kt:147)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.ktor.client.statement.HttpStatement.execute(HttpStatement.kt:68)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at com.github.pambrose.common.dsl.KtorDsl.get(KtorDsl.kt:85)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.agent.AgentHttpService.fetchContent(AgentHttpService.kt:90)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.agent.AgentHttpService.fetchContentFromUrl(AgentHttpService.kt:76)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.agent.AgentHttpService.fetchScrapeUrl(AgentHttpService.kt:61)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.agent.AgentGrpcService$readRequestsFromProxy$2$1$2.invokeSuspend(AgentGrpcService.kt:298)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.agent.AgentGrpcService$readRequestsFromProxy$2$1$2.invoke(AgentGrpcService.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.agent.AgentGrpcService$readRequestsFromProxy$2$1$2.invoke(AgentGrpcService.kt)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at io.prometheus.Agent$run$connectToProxy$3$4.invokeSuspend(Agent.kt:186)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:101)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:589)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:832)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:720)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]:         at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:707)
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.808 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.808 INFO  [Agent.kt:216] - Waited 0s to reconnect [Agent test-agent-1]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.808 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [Agent test-agent-1]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.809 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.810 INFO  [Agent.kt:153] - Resetting agentId [Agent test-agent-1]
Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.810 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
Feb 03 14:27:26 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:26.526 INFO  [AgentClientInterceptor.kt:58] - Assigned agentId: 2499 to Agent{agentId=2499, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [grpc-default-executor-8]
Feb 03 14:27:26 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:26.527 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
Feb 03 14:27:26 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:26.892 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
Feb 03 14:27:26 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:26.893 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 5s of inactivity [DefaultDispatcher-worker-1]

To rule out a bug in the app itself, I also tested a simpler setup in which the server was deployed on a standard EC2 instance and exposed to the internet. I didn't encounter any problems there; everything worked as expected. So the issue seems to lie either in the configuration of the NLB used by NGINX ingress or in NGINX ingress itself.

Below is a visualization of how often the connection is dropped:
Image
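
The drop frequency can also be read straight from the agent log above. The following is a minimal sketch, not part of the tool: it pairs each "Connected to proxy" line with the next RST_STREAM warning and prints how long the connection lasted. The class and method names are made up for illustration, and the sample lines are copied from the agent log in this post.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the agent log; not part of prometheus-proxy.
class ConnectionLifetimes {
    // The application timestamp (HH:mm:ss.SSS) directly before the log level.
    // The syslog timestamp has no milliseconds, so it does not match.
    private static final Pattern APP_TIME =
            Pattern.compile("(\\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (?:INFO|WARN)");

    // Returns the time from each "Connected to proxy" line to the next
    // RST_STREAM warning. Assumes the log does not cross midnight.
    public static List<Duration> lifetimes(List<String> lines) {
        List<Duration> result = new ArrayList<>();
        LocalTime connectedAt = null;
        for (String line : lines) {
            Matcher m = APP_TIME.matcher(line);
            if (!m.find()) {
                continue; // stack-trace lines carry no level and are skipped
            }
            LocalTime t = LocalTime.parse(m.group(1));
            if (line.contains("Connected to proxy")) {
                connectedAt = t;
            } else if (line.contains("RST_STREAM") && connectedAt != null) {
                result.add(Duration.between(connectedAt, t));
                connectedAt = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Feb 03 14:24:12 test-agent-1 prometheus_proxy_agent[1042983]: 14:24:12.535 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]",
            "Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.865 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]",
            "Feb 03 14:25:17 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:17.700 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]",
            "Feb 03 14:26:21 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:21.936 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]",
            "Feb 03 14:26:22 test-agent-1 prometheus_proxy_agent[1042983]: 14:26:22.665 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]",
            "Feb 03 14:27:25 test-agent-1 prometheus_proxy_agent[1042983]: 14:27:25.808 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]"
        );
        for (Duration d : lifetimes(sample)) {
            System.out.println(d.toMillis() + " ms");
        }
        // prints:
        // 64330 ms
        // 64236 ms
        // 63143 ms
    }
}
```

In this excerpt every connection lasts between 63 and 65 seconds before the reset, which looks more like a fixed timeout somewhere on the path than a random network failure, though the log alone does not show which component enforces it.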

Contributor

kannanjgithub commented Feb 6, 2025

Feb 03 14:25:16 test-agent-1 prometheus_proxy_agent[1042983]: 14:25:16.865 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]

The AgentService gRPC client receives RST_STREAM from the server, and the client doesn't know any other details. You should check the Prometheus proxy server-side logs to see what happened.
A similar issue, grpc/grpc-node#1747, was reported for grpc-node and AWS; a reply there summarizes the diagnosis. That customer's issue was apparently fixed by increasing the heartbeat timeout to 10s, and your logs show 5s as the time between pings. If the cause of the error is the server getting overwhelmed by the heartbeat pings (along with some bug in the Prometheus server's gRPC service code), you could try increasing the timeout as well.
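
As background on why the client log only says `INTERNAL`: as far as I understand the gRPC over HTTP/2 specification, a gRPC client translates the HTTP/2 error code carried by an RST_STREAM frame into a gRPC status, and most codes, including PROTOCOL_ERROR, collapse into INTERNAL. The sketch below writes that table down as plain Java. The class and method names are illustrative only and are not grpc-java API.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative lookup table; grpc-java has its own internal implementation.
class Http2ToGrpcStatus {
    // HTTP/2 RST_STREAM error code name -> gRPC status code name,
    // following the mapping table in the gRPC over HTTP/2 spec.
    static final Map<String, String> RST_STREAM_TO_STATUS = Map.ofEntries(
        Map.entry("NO_ERROR", "INTERNAL"),
        Map.entry("PROTOCOL_ERROR", "INTERNAL"),
        Map.entry("INTERNAL_ERROR", "INTERNAL"),
        Map.entry("FLOW_CONTROL_ERROR", "INTERNAL"),
        Map.entry("SETTINGS_TIMEOUT", "INTERNAL"),
        Map.entry("FRAME_SIZE_ERROR", "INTERNAL"),
        Map.entry("REFUSED_STREAM", "UNAVAILABLE"),
        Map.entry("CANCEL", "CANCELLED"),
        Map.entry("COMPRESSION_ERROR", "INTERNAL"),
        Map.entry("CONNECT_ERROR", "INTERNAL"),
        Map.entry("ENHANCE_YOUR_CALM", "RESOURCE_EXHAUSTED"),
        Map.entry("INADEQUATE_SECURITY", "PERMISSION_DENIED")
    );

    // Returns the gRPC status name for an HTTP/2 error code name,
    // or empty if the code is not in the table above.
    public static Optional<String> statusFor(String http2ErrorCode) {
        return Optional.ofNullable(RST_STREAM_TO_STATUS.get(http2ErrorCode));
    }

    public static void main(String[] args) {
        String code = "PROTOCOL_ERROR"; // the code shown in the agent log
        System.out.println(code + " -> " + statusFor(code).orElse("(no mapping)"));
        // prints: PROTOCOL_ERROR -> INTERNAL
    }
}
```

Because so many different HTTP/2 failures end up as INTERNAL, the client-side status says little about the cause, which is why the logs of whatever sent the RST_STREAM are the place to look.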

Author

menvol3 commented Feb 6, 2025

Thanks for the reply @kannanjgithub

I've attached the server log

Server log
14:24:06.400 WARN  [ProxyPathManager.kt:139] - Missing agent context for agentId: 2483 (Termination) [grpc-nio-worker-ELG-3-3]
14:24:06.401 WARN  [AgentContextManager.kt:53] - Missing AgentContext for agentId: 2483 (Termination) [grpc-nio-worker-ELG-3-3]
14:24:06.401 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected with invalid agentId: 2483 [grpc-nio-worker-ELG-3-3]
14:24:08.070 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2486 (Termination) [grpc-nio-worker-ELG-3-1]
14:24:08.070 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2486, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.998507193s} for agentId: 2486 (Termination) [grpc-nio-worker-ELG-3-1]
14:24:08.070 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2486, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.998573739s} [grpc-nio-worker-ELG-3-1]
14:24:11.897 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2485 (Termination) [grpc-nio-worker-ELG-3-2]
14:24:11.897 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2485, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.825870393s} for agentId: 2485 (Termination) [grpc-nio-worker-ELG-3-2]
14:24:11.897 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2485, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.825934448s} [grpc-nio-worker-ELG-3-2]
14:24:11.898 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2484 (Termination) [grpc-nio-worker-ELG-3-4]
14:24:11.898 INFO  [ProxyPathManager.kt:150] - Removed path /test-agent-1 for AgentContextInfo(consolidated=false, labels={},agentContexts=[AgentContext{agentId=2484, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=15.745416063s}]) [grpc-nio-worker-ELG-3-4]
14:24:11.898 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2484, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=15.745473072s} for agentId: 2484 (Termination) [grpc-nio-worker-ELG-3-4]
14:24:11.898 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2484, launchId=CRudPRd53VSWxFS, consolidated=false, valid=false, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=15.745532372s} [grpc-nio-worker-ELG-3-4]
14:24:12.453 INFO  [AgentContextManager.kt:40] - Registering agentId: 2488 [grpc-nio-worker-ELG-3-4]
14:24:12.651 INFO  [ProxyServiceImpl.kt:96] - Connected to AgentContext{agentId=2488, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=197.759281ms} [DefaultDispatcher-worker-5]
14:24:12.819 INFO  [ProxyPathManager.kt:82] - Added path /test-agent-1 for AgentContext{agentId=2488, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=366.072715ms} [DefaultDispatcher-worker-6]
14:24:13.043 INFO  [AgentContextManager.kt:40] - Registering agentId: 2490 [grpc-nio-worker-ELG-3-2]
14:24:13.043 INFO  [AgentContextManager.kt:40] - Registering agentId: 2489 [grpc-nio-worker-ELG-3-1]
14:24:16.738 INFO  [AgentContextCleanupService.kt:50] - Evicting agentId 2487 after 1m 3.664075204s (max 1m) of inactivity: AgentContext{agentId=2487, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.664095634s} [AgentContextCleanupService]
14:24:16.738 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2487 (Eviction) [AgentContextCleanupService]
14:24:16.738 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2487, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.664272109s} for agentId: 2487 (Eviction) [AgentContextCleanupService]
14:24:18.003 INFO  [AgentContextManager.kt:40] - Registering agentId: 2491 [grpc-nio-worker-ELG-3-3]
14:24:26.111 INFO  [CallLogging.kt:45] - 200 OK: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-4]
14:24:56.140 INFO  [CallLogging.kt:45] - 200 OK: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-6]
14:25:11.935 WARN  [ProxyPathManager.kt:139] - Missing agent context for agentId: 2487 (Termination) [grpc-nio-worker-ELG-3-3]
14:25:11.935 WARN  [AgentContextManager.kt:53] - Missing AgentContext for agentId: 2487 (Termination) [grpc-nio-worker-ELG-3-3]
14:25:11.935 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected with invalid agentId: 2487 [grpc-nio-worker-ELG-3-3]
14:25:13.042 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2489 (Termination) [grpc-nio-worker-ELG-3-1]
14:25:13.042 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2489, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.998934897s} for agentId: 2489 (Termination) [grpc-nio-worker-ELG-3-1]
14:25:13.042 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2489, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.999030482s} [grpc-nio-worker-ELG-3-1]
14:25:16.739 INFO  [AgentContextCleanupService.kt:50] - Evicting agentId 2490 after 1m 3.696316872s (max 1m) of inactivity: AgentContext{agentId=2490, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.696337839s} [AgentContextCleanupService]
14:25:16.739 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2490 (Eviction) [AgentContextCleanupService]
14:25:16.739 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2490, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.696527758s} for agentId: 2490 (Eviction) [AgentContextCleanupService]
14:25:17.007 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2488 (Termination) [grpc-nio-worker-ELG-3-4]
14:25:17.007 WARN  [ProxyPathManager.kt:139] - Missing agent context for agentId: 2490 (Termination) [grpc-nio-worker-ELG-3-2]
14:25:17.007 INFO  [ProxyPathManager.kt:150] - Removed path /test-agent-1 for AgentContextInfo(consolidated=false, labels={},agentContexts=[AgentContext{agentId=2488, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=20.878498248s}]) [grpc-nio-worker-ELG-3-4]
14:25:17.007 WARN  [AgentContextManager.kt:53] - Missing AgentContext for agentId: 2490 (Termination) [grpc-nio-worker-ELG-3-2]
14:25:17.008 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected with invalid agentId: 2490 [grpc-nio-worker-ELG-3-2]
14:25:17.008 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2488, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=20.878568563s} for agentId: 2488 (Termination) [grpc-nio-worker-ELG-3-4]
14:25:17.008 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2488, launchId=CRudPRd53VSWxFS, consolidated=false, valid=false, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=20.878878457s} [grpc-nio-worker-ELG-3-4]
14:25:17.617 INFO  [AgentContextManager.kt:40] - Registering agentId: 2492 [grpc-nio-worker-ELG-3-4]
14:25:17.786 INFO  [ProxyServiceImpl.kt:96] - Connected to AgentContext{agentId=2492, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=169.595291ms} [DefaultDispatcher-worker-4]
14:25:17.956 INFO  [ProxyPathManager.kt:82] - Added path /test-agent-1 for AgentContext{agentId=2492, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=339.060823ms} [DefaultDispatcher-worker-4]
14:25:18.127 INFO  [AgentContextManager.kt:40] - Registering agentId: 2493 [grpc-nio-worker-ELG-3-1]
14:25:18.127 INFO  [AgentContextManager.kt:40] - Registering agentId: 2494 [grpc-nio-worker-ELG-3-2]
14:25:23.133 INFO  [AgentContextManager.kt:40] - Registering agentId: 2495 [grpc-nio-worker-ELG-3-3]
14:25:26.269 INFO  [CallLogging.kt:45] - 200 OK: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-8]
14:25:26.740 INFO  [AgentContextCleanupService.kt:50] - Evicting agentId 2491 after 1m 8.736333250s (max 1m) of inactivity: AgentContext{agentId=2491, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 8.736353560s} [AgentContextCleanupService]
14:25:26.740 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2491 (Eviction) [AgentContextCleanupService]
14:25:26.740 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2491, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 8.736528364s} for agentId: 2491 (Eviction) [AgentContextCleanupService]
14:25:56.071 INFO  [CallLogging.kt:45] - 200 OK: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-2]
14:26:16.778 WARN  [ProxyPathManager.kt:139] - Missing agent context for agentId: 2491 (Termination) [grpc-nio-worker-ELG-3-3]
14:26:16.779 WARN  [AgentContextManager.kt:53] - Missing AgentContext for agentId: 2491 (Termination) [grpc-nio-worker-ELG-3-3]
14:26:16.779 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected with invalid agentId: 2491 [grpc-nio-worker-ELG-3-3]
14:26:18.126 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2494 (Termination) [grpc-nio-worker-ELG-3-2]
14:26:18.126 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2494, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.999069096s} for agentId: 2494 (Termination) [grpc-nio-worker-ELG-3-2]
14:26:18.126 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2494, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.999183830s} [grpc-nio-worker-ELG-3-2]
14:26:22.017 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2493 (Termination) [grpc-nio-worker-ELG-3-1]
14:26:22.017 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2493, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.889918337s} for agentId: 2493 (Termination) [grpc-nio-worker-ELG-3-1]
14:26:22.017 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2493, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 3.890007082s} [grpc-nio-worker-ELG-3-1]
14:26:22.018 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2492 (Termination) [grpc-nio-worker-ELG-3-4]
14:26:22.018 INFO  [ProxyPathManager.kt:150] - Removed path /test-agent-1 for AgentContextInfo(consolidated=false, labels={},agentContexts=[AgentContext{agentId=2492, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=25.957533718s}]) [grpc-nio-worker-ELG-3-4]
14:26:22.018 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2492, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=25.957604075s} for agentId: 2492 (Termination) [grpc-nio-worker-ELG-3-4]
14:26:22.018 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2492, launchId=CRudPRd53VSWxFS, consolidated=false, valid=false, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=25.957664659s} [grpc-nio-worker-ELG-3-4]
14:26:22.844 INFO  [ProxyServiceImpl.kt:96] - Connected to AgentContext{agentId=2495, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=59.711243346s} [DefaultDispatcher-worker-2]
14:26:23.022 INFO  [ProxyPathManager.kt:82] - Added path /test-agent-1 for AgentContext{agentId=2495, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=59.888860388s} [DefaultDispatcher-worker-2]
14:26:23.331 INFO  [AgentContextManager.kt:40] - Registering agentId: 2496 [grpc-nio-worker-ELG-3-1]
14:26:23.331 INFO  [AgentContextManager.kt:40] - Registering agentId: 2497 [grpc-nio-worker-ELG-3-4]
14:26:26.114 INFO  [CallLogging.kt:45] - 200 OK: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-5]
14:26:31.262 INFO  [AgentContextManager.kt:40] - Registering agentId: 2498 [grpc-nio-worker-ELG-3-2]
14:26:56.102 INFO  [CallLogging.kt:45] - 200 OK: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-9]
14:27:23.285 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2497 (Termination) [grpc-nio-worker-ELG-3-4]
14:27:23.285 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2497, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.954055439s} for agentId: 2497 (Termination) [grpc-nio-worker-ELG-3-4]
14:27:23.285 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2497, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=59.954180971s} [grpc-nio-worker-ELG-3-4]
14:27:25.922 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2496 (Termination) [grpc-nio-worker-ELG-3-1]
14:27:25.922 INFO  [ProxyPathManager.kt:141] - Removing paths for agentId: 2495 (Termination) [grpc-nio-worker-ELG-3-3]
14:27:25.923 INFO  [ProxyPathManager.kt:150] - Removed path /test-agent-1 for AgentContextInfo(consolidated=false, labels={},agentContexts=[AgentContext{agentId=2495, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=29.831453486s}]) [grpc-nio-worker-ELG-3-3]
14:27:25.923 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2496, launchId=Unassigned, consolidated=false, valid=true, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 2.592003977s} for agentId: 2496 (Termination) [grpc-nio-worker-ELG-3-1]
14:27:25.923 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2496, launchId=Unassigned, consolidated=false, valid=false, agentName=Unassigned, hostName=Unassigned, remoteAddr=Unknown, lastRequestDuration=1m 2.592294197s} [grpc-nio-worker-ELG-3-1]
14:27:25.923 INFO  [AgentContextManager.kt:56] - Removed AgentContext{agentId=2495, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=29.831954303s} for agentId: 2495 (Termination) [grpc-nio-worker-ELG-3-3]
14:27:25.923 INFO  [ProxyServerTransportFilter.kt:46] - Disconnected from AgentContext{agentId=2495, launchId=CRudPRd53VSWxFS, consolidated=false, valid=false, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=29.832124716s} [grpc-nio-worker-ELG-3-3]
14:27:26.204 INFO  [CallLogging.kt:45] - 503 Service Unavailable: GET - /test-agent-1 - prometheus-kube-prometheus-stack-prometheus-0.prometheus-operated.monitoring.svc.cluster.local [DefaultDispatcher-worker-1]
14:27:26.441 INFO  [AgentContextManager.kt:40] - Registering agentId: 2499 [grpc-nio-worker-ELG-3-3]
14:27:26.639 INFO  [ProxyServiceImpl.kt:96] - Connected to AgentContext{agentId=2499, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=198.444320ms} [DefaultDispatcher-worker-2]
14:27:26.807 INFO  [ProxyPathManager.kt:82] - Added path /test-agent-1 for AgentContext{agentId=2499, launchId=CRudPRd53VSWxFS, consolidated=false, valid=true, agentName=test-agent-1, hostName=prometheus-proxy.service.net, remoteAddr=Unknown, lastRequestDuration=366.150346ms} [DefaultDispatcher-worker-6]
14:27:26.977 INFO  [AgentContextManager.kt:40] - Registering agentId: 2500 [grpc-nio-worker-ELG-3-4]
14:27:26.977 INFO  [AgentContextManager.kt:40] - Registering agentId: 2501 [grpc-nio-worker-ELG-3-1]
14:27:32.034 INFO  [AgentContextManager.kt:40] - Registering agentId: 2502 [grpc-nio-worker-ELG-3-2]

@kannanjgithub
Contributor

Those are logs from your Prometheus proxy server that forwards to the gRPC backend service application. We would need the logs from the latter.

@menvol3
Author

menvol3 commented Feb 10, 2025

These are all the logs available from the backend service in my first post.

Also, I've tried increasing the heartbeat timeout to 10 seconds and even 30 seconds; unfortunately, it didn't help.

Agent log
12:47:06.581 INFO  [Agent.kt:307] - Version: unknown Release Date: unknown [main]
12:47:06.644 INFO  [AgentOptions.kt:98] - proxyHostname: https://prometheus-proxy.service.net:443 [main]
12:47:06.644 INFO  [AgentOptions.kt:102] - agentName: test-agent-1 [main]
12:47:06.646 INFO  [AgentOptions.kt:106] - consolidated: false [main]
12:47:06.649 INFO  [AgentOptions.kt:110] - scrapeTimeoutSecs: 15s [main]
12:47:06.650 INFO  [AgentOptions.kt:114] - scrapeMaxRetries: 0 [main]
12:47:06.650 INFO  [AgentOptions.kt:120] - chunkContentSizeKbs: 32768 [main]
12:47:06.650 INFO  [AgentOptions.kt:124] - minGzipSizeBytes: 512 [main]
12:47:06.651 INFO  [AgentOptions.kt:128] - overrideAuthority:  [main]
12:47:06.651 INFO  [AgentOptions.kt:133] - trustAllX509Certificates: false [main]
12:47:06.651 INFO  [BaseOptions.kt:146] - adminEnabled: false [main]
12:47:06.652 INFO  [BaseOptions.kt:152] - adminPort: 8093 [main]
12:47:06.652 INFO  [BaseOptions.kt:158] - metricsEnabled: false [main]
12:47:06.652 INFO  [BaseOptions.kt:170] - metricsPort: 8083 [main]
12:47:06.653 INFO  [BaseOptions.kt:176] - transportFilterDisabled: true [main]
12:47:06.653 INFO  [BaseOptions.kt:164] - debugEnabled: false [main]
12:47:06.653 INFO  [BaseOptions.kt:182] - certChainFilePath:  [main]
12:47:06.653 INFO  [BaseOptions.kt:188] - privateKeyFilePath:  [main]
12:47:06.654 INFO  [BaseOptions.kt:194] - trustCertCollectionFilePath: dev_cert.pem [main]
12:47:06.654 INFO  [AgentOptions.kt:146] - agent.scrapeTimeoutSecs: 15s [main]
12:47:06.654 INFO  [AgentOptions.kt:147] - agent.internal.cioTimeoutSecs: 1m 30s [main]
12:47:06.654 INFO  [AgentOptions.kt:148] - agent.internal.heartbeatCheckPauseMillis: 500 [main]
12:47:06.655 INFO  [AgentOptions.kt:151] - agent.internal.heartbeatMaxInactivitySecs: 30 [main]
12:47:06.664 INFO  [AgentPathManager.kt:52] - Proxy path /test-agent-1 will be assigned to http://127.0.0.1:9273/metrics with labels {} [main]
12:47:06.785 INFO  [TlsUtils.kt:83] - Reading trustCertCollectionFilePath: "dev_cert.pem" [main]
12:47:06.820 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [main]
12:47:06.825 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [main]
12:47:06.886 INFO  [Agent.kt:125] - Agent name: test-agent-1 [main]
12:47:06.886 INFO  [Agent.kt:126] - Proxy reconnect pause time: 3s [main]
12:47:06.886 INFO  [Agent.kt:127] - Scrape timeout time: 15s [main]
12:47:06.887 INFO  [GenericService.kt:121] - Metrics service disabled [main]
12:47:06.887 INFO  [GenericService.kt:129] - Zipkin reporter service disabled [main]
12:47:06.892 INFO  [GenericService.kt:188] - Adding service Agent{agentId=, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [main]
12:47:06.912 INFO  [GenericServiceListener.kt:29] - Starting Agent{agentId=, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [main]
12:47:06.913 INFO  [GenericServiceListener.kt:30] - Running Agent{agentId=, agentName=test-agent-1, proxyHost=prometheus-proxy.service.net:443, adminService=Disabled, metricsService=Disabled} [main]
12:47:06.914 INFO  [GenericService.kt:141] - All Agent services healthy [Agent test-agent-1]
12:47:06.929 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
12:47:07.934 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
12:47:08.307 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
12:47:08.323 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 30s of inactivity [DefaultDispatcher-worker-2]
12:48:12.701 WARN  [ScrapeResults.kt:120] - fetchScrapeUrl() io.grpc.StatusException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR - http://127.0.0.1:9273/metrics [DefaultDispatcher-worker-9]
io.grpc.StatusException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR
	at io.grpc.Status.asException(Status.java:547)
	at io.grpc.kotlin.ClientCalls$rpcImpl$1$1$1.onClose(ClientCalls.kt:300)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:564)
	at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:72)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:729)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:710)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
12:48:12.708 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]
12:48:12.711 INFO  [Agent.kt:216] - Waited 0s to reconnect [Agent test-agent-1]
12:48:12.712 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [Agent test-agent-1]
12:48:12.717 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
12:48:12.719 INFO  [Agent.kt:153] - Resetting agentId [Agent test-agent-1]
12:48:12.719 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
12:48:13.483 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
12:48:13.833 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
12:48:13.833 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 30s of inactivity [DefaultDispatcher-worker-3]
12:49:42.688 WARN  [ScrapeResults.kt:120] - fetchScrapeUrl() java.util.concurrent.CancellationException: Parent job is Cancelling - http://127.0.0.1:9273/metrics [DefaultDispatcher-worker-6]
java.util.concurrent.CancellationException: Parent job is Cancelling
	at io.ktor.client.engine.UtilsKt$attachToUserJob$cleanupHandler$1.invoke(Utils.kt:99)
	at io.ktor.client.engine.UtilsKt$attachToUserJob$cleanupHandler$1.invoke(Utils.kt:97)
	at kotlinx.coroutines.InvokeOnCancelling.invoke(JobSupport.kt:1571)
	at kotlinx.coroutines.JobSupport.invokeOnCompletionInternal$kotlinx_coroutines_core(JobSupport.kt:500)
	at kotlinx.coroutines.JobSupport.invokeOnCompletion(JobSupport.kt:452)
	at kotlinx.coroutines.Job$DefaultImpls.invokeOnCompletion$default(Job.kt:313)
	at io.ktor.client.engine.HttpClientEngineKt.createCallContext(HttpClientEngine.kt:166)
	at io.ktor.client.engine.HttpClientEngine$DefaultImpls.executeWithinCallContext(HttpClientEngine.kt:91)
	at io.ktor.client.engine.HttpClientEngine$DefaultImpls.access$executeWithinCallContext(HttpClientEngine.kt:24)
	at io.ktor.client.engine.HttpClientEngine$install$1.invokeSuspend(HttpClientEngine.kt:70)
	at io.ktor.client.engine.HttpClientEngine$install$1.invoke(HttpClientEngine.kt)
	at io.ktor.client.engine.HttpClientEngine$install$1.invoke(HttpClientEngine.kt)
	at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
	at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
	at io.ktor.util.pipeline.DebugPipelineContext.execute$ktor_utils(DebugPipelineContext.kt:63)
	at io.ktor.util.pipeline.Pipeline.execute(Pipeline.kt:86)
	at io.ktor.client.plugins.HttpSend$DefaultSender.execute(HttpSend.kt:118)
	at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
	at io.ktor.client.plugins.auth.AuthKt$Auth$2$2.invokeSuspend(Auth.kt:130)
	at io.ktor.client.plugins.auth.AuthKt$Auth$2$2.invoke(Auth.kt)
	at io.ktor.client.plugins.auth.AuthKt$Auth$2$2.invoke(Auth.kt)
	at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
	at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
	at io.ktor.client.plugins.HttpRequestRetryKt$HttpRequestRetry$2$1.invokeSuspend(HttpRequestRetry.kt:296)
	at io.ktor.client.plugins.HttpRequestRetryKt$HttpRequestRetry$2$1.invoke(HttpRequestRetry.kt)
	at io.ktor.client.plugins.HttpRequestRetryKt$HttpRequestRetry$2$1.invoke(HttpRequestRetry.kt)
	at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
	at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
	at io.ktor.client.plugins.HttpTimeoutKt$HttpTimeout$2$1.invokeSuspend(HttpTimeout.kt:175)
	at io.ktor.client.plugins.HttpTimeoutKt$HttpTimeout$2$1.invoke(HttpTimeout.kt)
	at io.ktor.client.plugins.HttpTimeoutKt$HttpTimeout$2$1.invoke(HttpTimeout.kt)
	at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
	at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
	at io.ktor.client.plugins.HttpRedirectKt$HttpRedirect$2$1.invokeSuspend(HttpRedirect.kt:97)
	at io.ktor.client.plugins.HttpRedirectKt$HttpRedirect$2$1.invoke(HttpRedirect.kt)
	at io.ktor.client.plugins.HttpRedirectKt$HttpRedirect$2$1.invoke(HttpRedirect.kt)
	at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
	at io.ktor.client.plugins.api.Send$Sender.proceed(CommonHooks.kt:41)
	at io.ktor.client.plugins.HttpCallValidatorKt$HttpCallValidator$2$2.invokeSuspend(HttpCallValidator.kt:112)
	at io.ktor.client.plugins.HttpCallValidatorKt$HttpCallValidator$2$2.invoke(HttpCallValidator.kt)
	at io.ktor.client.plugins.HttpCallValidatorKt$HttpCallValidator$2$2.invoke(HttpCallValidator.kt)
	at io.ktor.client.plugins.api.Send$install$1.invokeSuspend(CommonHooks.kt:46)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.api.Send$install$1.invoke(CommonHooks.kt)
	at io.ktor.client.plugins.HttpSend$InterceptedSender.execute(HttpSend.kt:96)
	at io.ktor.client.plugins.HttpSend$Plugin$install$1.invokeSuspend(HttpSend.kt:84)
	at io.ktor.client.plugins.HttpSend$Plugin$install$1.invoke(HttpSend.kt)
	at io.ktor.client.plugins.HttpSend$Plugin$install$1.invoke(HttpSend.kt)
	at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
	at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
	at io.ktor.client.plugins.RequestError$install$1.invokeSuspend(HttpCallValidator.kt:134)
	at io.ktor.client.plugins.RequestError$install$1.invoke(HttpCallValidator.kt)
	at io.ktor.client.plugins.RequestError$install$1.invoke(HttpCallValidator.kt)
	at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
	at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
	at io.ktor.client.plugins.SetupRequestContext$install$1.invokeSuspend$proceed(HttpRequestLifecycle.kt:40)
	at io.ktor.client.plugins.SetupRequestContext$install$1.access$invokeSuspend$proceed(HttpRequestLifecycle.kt)
	at io.ktor.client.plugins.SetupRequestContext$install$1$1.invoke(HttpRequestLifecycle.kt:40)
	at io.ktor.client.plugins.SetupRequestContext$install$1$1.invoke(HttpRequestLifecycle.kt:40)
	at io.ktor.client.plugins.HttpRequestLifecycleKt$HttpRequestLifecycle$1$1.invokeSuspend(HttpRequestLifecycle.kt:27)
	at io.ktor.client.plugins.HttpRequestLifecycleKt$HttpRequestLifecycle$1$1.invoke(HttpRequestLifecycle.kt)
	at io.ktor.client.plugins.HttpRequestLifecycleKt$HttpRequestLifecycle$1$1.invoke(HttpRequestLifecycle.kt)
	at io.ktor.client.plugins.SetupRequestContext$install$1.invokeSuspend(HttpRequestLifecycle.kt:40)
	at io.ktor.client.plugins.SetupRequestContext$install$1.invoke(HttpRequestLifecycle.kt)
	at io.ktor.client.plugins.SetupRequestContext$install$1.invoke(HttpRequestLifecycle.kt)
	at io.ktor.util.pipeline.DebugPipelineContext.proceedLoop(DebugPipelineContext.kt:79)
	at io.ktor.util.pipeline.DebugPipelineContext.proceed(DebugPipelineContext.kt:57)
	at io.ktor.util.pipeline.DebugPipelineContext.execute$ktor_utils(DebugPipelineContext.kt:63)
	at io.ktor.util.pipeline.Pipeline.execute(Pipeline.kt:86)
	at io.ktor.client.HttpClient.execute$ktor_client_core(HttpClient.kt:1393)
	at io.ktor.client.statement.HttpStatement.fetchResponse(HttpStatement.kt:147)
	at io.ktor.client.statement.HttpStatement.execute(HttpStatement.kt:68)
	at com.github.pambrose.common.dsl.KtorDsl.get(KtorDsl.kt:85)
	at io.prometheus.agent.AgentHttpService.fetchContent(AgentHttpService.kt:90)
	at io.prometheus.agent.AgentHttpService.fetchContentFromUrl(AgentHttpService.kt:76)
	at io.prometheus.agent.AgentHttpService.fetchScrapeUrl(AgentHttpService.kt:61)
	at io.prometheus.agent.AgentGrpcService$readRequestsFromProxy$2$1$2.invokeSuspend(AgentGrpcService.kt:298)
	at io.prometheus.agent.AgentGrpcService$readRequestsFromProxy$2$1$2.invoke(AgentGrpcService.kt)
	at io.prometheus.agent.AgentGrpcService$readRequestsFromProxy$2$1$2.invoke(AgentGrpcService.kt)
	at io.prometheus.Agent$run$connectToProxy$3$4.invokeSuspend(Agent.kt:186)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:101)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:589)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:832)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:720)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:707)
12:49:42.690 WARN  [Agent.kt:209] - Cannot connect to proxy at prometheus-proxy.service.net:443 StatusException INTERNAL: RST_STREAM closed stream. HTTP/2 error code: PROTOCOL_ERROR [Agent test-agent-1]
12:49:42.691 INFO  [Agent.kt:216] - Waited 0s to reconnect [Agent test-agent-1]
12:49:42.691 INFO  [AgentGrpcService.kt:132] - Creating gRPC stubs [Agent test-agent-1]
12:49:42.692 INFO  [GrpcDsl.kt:75] - Creating connection for gRPC server at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
12:49:42.694 INFO  [Agent.kt:153] - Resetting agentId [Agent test-agent-1]
12:49:42.695 INFO  [AgentGrpcService.kt:163] - Connecting to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth)... [Agent test-agent-1]
12:49:43.439 INFO  [AgentGrpcService.kt:169] - Connected to proxy at prometheus-proxy.service.net:443 using TLS (no mutual auth) [Agent test-agent-1]
12:49:43.790 INFO  [AgentPathManager.kt:78] - Registered http://127.0.0.1:9273/metrics as /test-agent-1 with labels {} [Agent test-agent-1]
12:49:43.791 INFO  [Agent.kt:244] - Heartbeat scheduled to fire after 30s of inactivity [DefaultDispatcher-worker-7]

@kannanjgithub
Contributor

kannanjgithub commented Feb 12, 2025

> Those are logs from your Prometheus proxy server that forwards to the gRPC backend service application. We would need the logs from the latter.

A correction to my previous message: after reading the link you gave (https://github.com/pambrose/prometheus-proxy), it is clear that the setup is

backend application providing metrics behind firewall <----- Prometheus Agent behind firewall - - - uses gRPC - - -> Prometheus Proxy using NGINX Ingress (outside firewall) <--- Prometheus server (not our worry here)

(Whenever you said server you referred to the Prometheus proxy and not the Prometheus server).

You have provided the logs of the Agent, which is the gRPC client. You can see in its logs:

Connecting to proxy at prometheus-proxy.service.net:443 using TLS

Does the Prometheus proxy service that is using nginx ingress not have logs? It runs a gRPC server so it should have them.

@panchenko
Contributor

nginx can close the connection after the time specified by grpc_read_timeout; please try increasing that.
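
For context, the proxy logs above show idle streams being terminated after roughly 60 seconds (for example lastRequestDuration=59.998934897s), which is consistent with nginx's default grpc_read_timeout of 60s. A minimal sketch of how the timeout might be raised with ingress-nginx follows. It assumes the Ingress uses the GRPC backend protocol, in which case the proxy-read-timeout and proxy-send-timeout annotations are applied as grpc_read_timeout and grpc_send_timeout. The resource name, Secret name, Service name, port, and the 3600-second value are placeholders, not taken from this thread.

```yaml
# Hypothetical Ingress for the Prometheus proxy's gRPC endpoint.
# Names, port, and timeout values are assumptions for illustration only.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-proxy-grpc              # assumed name
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # With a GRPC backend these are used as grpc_read_timeout and
    # grpc_send_timeout (values are in seconds).
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - prometheus-proxy.service.net
      secretName: prometheus-proxy-tls     # assumed Secret name
  rules:
    - host: prometheus-proxy.service.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-proxy     # assumed Service name
                port:
                  number: 50051            # assumed gRPC port
```

If the agent also keeps request streams open for longer than the timeout, client_body_timeout may need raising as well; the ingress-nginx gRPC example notes mention all three timeouts for long-lived bidirectional streams.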
