[Agentless] healthy agent without cloudbeat logs #2887

Closed
orouz opened this issue Jan 6, 2025 · 8 comments

@orouz
Collaborator

orouz commented Jan 6, 2025

Describe the bug
On ESS, an agentless elastic-agent 8.16.1 reports a healthy status, but its Cloudbeat logs have been empty for an extended period (weeks).

To Reproduce

  1. Visit the agent page on a long-running ESS environment and check the Cloudbeat logs

Expected behavior

  • Cloudbeat logs what it is currently doing or has failed to do.

Additional context

This agent uses a policy that has assets inventory installed and is misconfigured: all 3 cloud vendors are enabled at once, which is not supported. This might not be related, though.

@orouz orouz added bug Something isn't working Team:Cloud Security Cloud Security team related labels Jan 6, 2025
@kubasobon kubasobon self-assigned this Jan 7, 2025
@kubasobon
Member

There's definitely an interval issue. Running locally with a 5m period:
Image
No results after 10 minutes besides the initial run.

@kubasobon
Member

This checks out. Look at this piece of code.

The workflow looks like:

  1. Start fetcher.Fetch() in a goroutine for each fetcher
  2. Run a loop that forever checks for new events unless the context is cancelled

The issue is fetcher.Fetch() will only ever run once. That's it.

@kubasobon
Member

With the fix:
Image

Still does not explain missing logs, but it's a start. We should see "Interval reached without events" every 10 seconds outside of fetch windows.

@kubasobon
Member

Fix for intervals in #2902

@kubasobon
Member

As for the missing logs:
This is all the data we see for one of the Cloudbeat instances: 258 log entries at info and warn levels.
Image

If there truly is only 1 fetcher run, it would check out. Since "Interval reached without events" is a debug-level log, it would be missing if the logger is set to info level.

macbook:elastic-agent-diagnostics-2025-01-05T13-09-03Z-00 cat components-actual.yaml | grep 'logging.level'
                    - logging.level=info
                    - logging.level=info
                    - logging.level=info
                    - logging.level=info
                    - logging.level=info
                    - logging.level=info

Yup, diagnostics confirm it's all set to info. So no debug logs would appear. All good.

@kubasobon
Member

PR merged. 8.x backport can be found here: #2905

@kubasobon
Member

Ok, backported to 8.x, 8.16 and 8.17 branches.

@kubasobon kubasobon modified the milestones: 8.18, 9.0 Jan 8, 2025
@kubasobon
Member

Going to go ahead and close this one as resolved.
