VihasMakwana
Contributor

What does this PR do?

Adds a retry mechanism when creating a named pipe listener on Windows.
Instead of failing immediately if listener creation fails, it now retries for up to 5 seconds with a short delay between attempts.
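
The retry described above can be sketched roughly as follows. This is not the code from the PR: `listenFunc`, `retryListen`, `fakeListener`, and `flakyCreator` are hypothetical names, and the 10 ms delay is an assumed value for "a short delay". A fake listener stands in for the Windows named pipe so the sketch runs on any platform.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// listenFunc is a hypothetical stand-in for the platform-specific call
// that creates the named pipe listener.
type listenFunc func() (net.Listener, error)

// retryListen calls create until it succeeds or until another attempt
// would exceed timeout, sleeping for delay between attempts.
func retryListen(create listenFunc, timeout, delay time.Duration) (net.Listener, error) {
	deadline := time.Now().Add(timeout)
	for {
		ln, err := create()
		if err == nil {
			return ln, nil
		}
		if time.Now().Add(delay).After(deadline) {
			return nil, fmt.Errorf("listener not created within %s: %w", timeout, err)
		}
		time.Sleep(delay)
	}
}

// fakeListener is a do-nothing net.Listener used only for illustration.
type fakeListener struct{}

func (fakeListener) Accept() (net.Conn, error) { return nil, net.ErrClosed }
func (fakeListener) Close() error              { return nil }
func (fakeListener) Addr() net.Addr            { return nil }

// flakyCreator fails the first `failures` calls, then succeeds.
func flakyCreator(failures int, attempts *int) listenFunc {
	return func() (net.Listener, error) {
		*attempts++
		if *attempts <= failures {
			return nil, errors.New("access is denied")
		}
		return fakeListener{}, nil
	}
}

// demoRetry reports how many attempts were made and whether a listener
// was eventually obtained.
func demoRetry(failures int) (int, bool) {
	attempts := 0
	ln, err := retryListen(flakyCreator(failures, &attempts), 5*time.Second, 10*time.Millisecond)
	if err != nil {
		return attempts, false
	}
	ln.Close()
	return attempts, true
}

func main() {
	attempts, ok := demoRetry(2)
	fmt.Printf("attempts=%d ok=%v\n", attempts, ok)
}
```

Note that this version blocks in `time.Sleep` and cannot be interrupted, which is the concern raised later in the review.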

Why is it important?

Sometimes the named pipe might not be immediately available. This was observed when using monitoring with beat receivers.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

@VihasMakwana VihasMakwana requested a review from a team as a code owner September 29, 2025 17:06
@VihasMakwana VihasMakwana self-assigned this Sep 29, 2025
@VihasMakwana VihasMakwana added the Team:Elastic-Agent-Control-Plane label (Label for the Agent Control Plane team) Sep 29, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

mergify bot commented Sep 29, 2025

This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@VihasMakwana VihasMakwana changed the title from "[ipc/;listener][windows] Implement retry mechanism for listener creation" to "[ipc/listener][windows] Implement retry mechanism for listener creation" Sep 29, 2025
@VihasMakwana VihasMakwana added the backport-8.19 (Automated backport to the 8.19 branch), backport-9.1 (Automated backport to the 9.1 branch), and skip-changelog labels Sep 29, 2025
@leehinman
Contributor

I don't think this will fix https://github.com/elastic/ingest-dev/issues/6133

That error is:

filebeatreceiver:could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock: open \\.\pipe\xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock: Access is denied.

That error is coming from filebeatreceiver, not elastic-agent. I think we need to do the retry in the libbeat api server:

https://github.com/elastic/beats/blob/e55e0bedd6176c7c6c88dfdf012c14620cb85a29/libbeat/api/server.go#L73-L79

@VihasMakwana
Contributor Author

VihasMakwana commented Sep 29, 2025

I don't think this will fix elastic/ingest-dev#6133

That error is:

filebeatreceiver:could not start the HTTP server for the API: failed to listen on the named pipe \\.\pipe\xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock: open \\.\pipe\xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock: Access is denied.

That error is coming from filebeatreceiver, not elastic-agent. I think we need to do the retry in the libbeat api server:

https://github.com/elastic/beats/blob/e55e0bedd6176c7c6c88dfdf012c14620cb85a29/libbeat/api/server.go#L73-L79

Oh!
I hit the same problem with elastic-agent and beat receivers, and this is what fixed it for me.
Anyway, I'll apply the same fix in libbeat.

@VihasMakwana
Contributor Author

Thank you for letting me know, Lee!

@elasticmachine
Collaborator

💚 Build Succeeded

cc @VihasMakwana


// CreateListener creates net listener from address string
// Shared for control and beats comms sockets
func CreateListener(log *logger.Logger, address string) (net.Listener, error) {
Contributor

For this change, CreateListener really needs to take a context. I know that is not great, because changing the function signature means every caller has to be updated. But if this can take up to 5 seconds, context cancellation needs to be handled. If the user presses Ctrl-C, elastic-agent should not be stuck in this section for up to 5 seconds before it returns.
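
A minimal sketch of a context-aware retry loop, under the assumption that the caller can pass a `context.Context` down. `retryWithContext` and `demoCancel` are hypothetical names, not functions from elastic-agent, and the 10 ms delay is an assumed value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithContext calls op until it succeeds, the parent context is
// cancelled, or timeout elapses. Waiting between attempts happens in a
// select, so cancellation is observed without sleeping out the delay.
func retryWithContext(ctx context.Context, timeout, delay time.Duration, op func() error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(delay)
	defer ticker.Stop()

	for {
		err := op()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("giving up: %w (last error: %v)", ctx.Err(), err)
		case <-ticker.C:
			// Delay elapsed; try again.
		}
	}
}

// demoCancel cancels the parent context after cancelAfterMs milliseconds
// while op keeps failing. It reports whether the loop stopped with
// context.Canceled well before the 5 second timeout.
func demoCancel(cancelAfterMs int) bool {
	ctx, cancel := context.WithCancel(context.Background())
	time.AfterFunc(time.Duration(cancelAfterMs)*time.Millisecond, cancel)

	start := time.Now()
	err := retryWithContext(ctx, 5*time.Second, 10*time.Millisecond, func() error {
		return errors.New("access is denied")
	})
	return errors.Is(err, context.Canceled) && time.Since(start) < time.Second
}

func main() {
	fmt.Println("cancelled promptly:", demoCancel(50))
}
```

With this shape, a Ctrl-C that cancels the parent context ends the loop at the next `select` rather than after the full timeout.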

Contributor Author

@blakerouse what if we call signal.Notify(...) here and exit on SIGINT or SIGTERM, instead of passing a context all the way down from the command?
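
One way the signal-based alternative might look, using the standard library's `signal.NotifyContext` (a context-returning wrapper around `signal.Notify`). This is only a sketch: `retryUntilSignal` and `demoSignalRetry` are hypothetical names, and the delay is an assumed value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// retryUntilSignal retries op until it succeeds, maxWait elapses, or the
// process receives SIGINT or SIGTERM. The signal registration is local to
// this call and is released by stop() on return.
func retryUntilSignal(maxWait, delay time.Duration, op func() error) error {
	sigCtx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	ctx, cancel := context.WithTimeout(sigCtx, maxWait)
	defer cancel()

	for {
		err := op()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("stopped retrying: %w (last error: %v)", ctx.Err(), err)
		case <-time.After(delay):
			// Delay elapsed; try again.
		}
	}
}

// demoSignalRetry runs the loop with an op that fails `failures` times,
// returning the number of attempts made and the final error.
func demoSignalRetry(failures int) (int, error) {
	attempts := 0
	err := retryUntilSignal(5*time.Second, 10*time.Millisecond, func() error {
		attempts++
		if attempts <= failures {
			return errors.New("access is denied")
		}
		return nil
	})
	return attempts, err
}

func main() {
	attempts, err := demoSignalRetry(3)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

The trade-off is that a helper function now registers its own signal handling, which only covers signals and not other reasons the caller might want to abort; passing a context covers both.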
