raise exception after exiting watcher.each #214
Merged
Changes from 1 commit

Commits (4)
f4a4639 raise exception after exiting watcher.each (mszabo)
d3475fc raise exception when ERROR received while watching (mszabo)
2b6de68 reset watch_retry_count after resourceVersion retrieved from API server (mszabo)
5019676 remove watch_retry_count resets, it was handled in another PR which i… (mszabo)
lib/fluent/plugin/kubernetes_metadata_watch_pods.rb

```diff
@@ -125,6 +125,7 @@ def process_pod_watcher_notices(watcher)
           @stats.bump(:pod_cache_watch_ignored)
         end
       end
+      raise
     end
   end
 end
```
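For context, a hedged sketch of the method shape implied by the hunk header and the discussion below; the case branches are simplified placeholders and `@stats` comes from the plugin, so this is not the file's exact code:

```ruby
# Sketch only: where the added `raise` sits relative to the watch loop.
def process_pod_watcher_notices(watcher)
  watcher.each do |notice|
    case notice.type
    when 'MODIFIED'
      # update the cached metadata for the pod
    when 'DELETED'
      # let the cached entry age out
    else
      @stats.bump(:pod_cache_watch_ignored)
    end
  end
  # watcher.each returned, i.e. the long-running watch connection was closed
  # without an exception, so raise to force the caller's rescue/backoff path.
  raise
end
```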
Won't this simply fall through all the time and raise an exception on every notification? Should we be evaluating for a specific `notice.type`, or at least have `continue` statements above?
As I suspected:
It's probably something more like:
I was thinking about doing it with `when 'ERROR'`, but then it can still exit the `watcher.each` block without an exception, which can cause an endless loop without backoff.

I don't know how this testing framework works, but in reality it stays inside the `watcher.each` block as long as the connection is open. Watching opens long-running connections; the API server usually kills the connection after 40-50 minutes.
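A minimal sketch of the `when 'ERROR'` variant being discussed, assuming the error text lives at `notice.object['message']` (an assumption about the watch notice payload, not code from this PR):

```ruby
# Sketch only: raise on an ERROR notice such as "too old resource version".
def process_pod_watcher_notices(watcher)
  watcher.each do |notice|
    case notice.type
    when 'ERROR'
      message = notice.object && notice.object['message'] # assumed field
      raise "Error while watching pods: #{message}"
    else
      # ADDED / MODIFIED / DELETED handling stays as in the existing code
    end
  end
  # If the API server closes the stream without sending an ERROR notice,
  # each simply returns here and no exception is raised; that is the gap
  # described in the comment above.
end
```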
Hmm... I thought the `notice.type` value can only be one of these: `ADDED`, `MODIFIED`, `DELETED`. Reference: https://github.com/abonas/kubeclient/blame/28dfc4d538d72127015dc9b90e6b776c4b0fb986/test/test_watch.rb#L8

@jcantrill's suggestion in https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/214/files#r384729998 makes sense to me.

Looking at the issue reported at #213 (comment), it's not an unexpected exception. Besides, if an exception is thrown, https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/9b48b09fa56fcb32c0bcc8e7547bb7589a309125/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb#L42 should have caught it.
There is no exception thrown when the connection is closed.

When you open the HTTP connection for the watcher you get a 200 OK reply from the server, but the connection stays open and the events are streamed over that same connection. Kubernetes will close it after around 1 hour (at least on our cluster).

Once it's closed there is no exception; the block finishes normally.

So the thread loop will execute again, but because there was no exception the rescue block did not run and nothing set `pod_watcher` to `nil`. It will reuse the `resource_version` from the first run.

The Kubernetes API remembers old resources for a while, but after some time it returns ERROR with a "too old resource version" message and closes the connection (again, no exception).

When this happens the thread loop starts the watcher again with the old resource version, gets the error again, and this continues in a loop. It results in high load on the API server and there is no way to recover except restarting fluentd.

I can rework this PR to use the `when 'ERROR'` case, but then the issue can still happen if the connection is closed by the API server without returning ERROR or throwing an exception.

I think an exception should be raised whenever the `watcher.each` block is exited, because it means the long-running connection was closed gracefully.

Unfortunately I am not sure how I can make it work with the tests.
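To make the failure mode concrete, here is a hedged sketch of the outer watch-thread loop; names like `get_pods_and_start_watcher`, `log`, and `backoff_interval` are illustrative and follow the plugin's style rather than quoting its source:

```ruby
# Illustrative only. The loop can only recover (re-list pods and refresh
# resourceVersion) when an exception reaches the rescue clause.
def set_up_pod_thread
  pod_watcher = nil
  loop do
    begin
      # If process_pod_watcher_notices returns without raising, pod_watcher
      # is never reset to nil, so the stale resourceVersion is reused and the
      # API server keeps answering with "too old resource version" errors.
      pod_watcher ||= get_pods_and_start_watcher
      process_pod_watcher_notices(pod_watcher)
    rescue StandardError => e
      log.info("watch loop failed, resetting the watcher: #{e}")
      pod_watcher = nil       # force a fresh list + watch on the next pass
      sleep(backoff_interval) # back off instead of hammering the API server
    end
  end
end
```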
Re #214 (comment): Ah I see. That sounds good to me. Instead of a simple `raise`, can we give it some message to help with debugging / troubleshooting when the issue happens?

For the fact that it returned an `ERROR` notice, it seems like there is some discussion at kubeclient/pull/275. Sounds like it might be fixed in