-
Notifications
You must be signed in to change notification settings - Fork 993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
app_chanspy+cel: Release channel iterator before chanspying #114
base: master
Are you sure you want to change the base?
app_chanspy+cel: Release channel iterator before chanspying #114
Conversation
Because of upcoming planned changes/refactoring to app_chanspy, it is convenient to pass some of the many arguments as a struct. This changeset adds a channel_spy_context struct to pass around. Related: asterisk#68
…tion This moves the guts of common_exec into channel_spy_consume_iterator. This makes refactoring/changing the code easier because there are fewer function local variables to consider. Related: asterisk#68
Refactor channel spying so it never holds on to a channel iterator. Instead, we recreate the iterator when needed and skip channels that we've seen already, creating the illusion of using an iterator. This change was needed because the iterator caused the yet-unseen channels in the iterator to be referenced by the iterator. This reference ensured that the channel does not get destroyed. (Which is good, because the iterator needs valid channels to work on.) But, hanging on to a channel reference for longer than a short while conflicts with CEL logging. The CEL hangup logging is activated by the destruction of the channel. During chanspy activity, a bunch of channels would stay in limbo. First when the chanspying was done would those channels get their CEL hangup event logged. The fix here is to hang on to channel iterators for only a short while. An alternative fix that makes CEL hangup logging independent of channel destruction was deemed more invasive. This patch makes chanspy channel selection slightly more resource intensive. But that's a small price to pay for correct CEL hangup logging. Fixes: asterisk#68
cherry-pick-to: 22 |
|
So I was pondering this some. ChanSpy is from a time when the only way to know about a channel was to get the channel, or list of channels, and go through them. Since then we've added channel snapshots and have a cache of them accessible using ast_channel_cache_by_name() which doesn't require accessing the channels or channel list at all. I think this should be examined as an alternative instead, so that holding references to channels is kept to a minimum - specifically when a channel is being spied on. Anyone else have any thoughts on that idea? |
1b894e6
to
32fd0fb
Compare
@jcolp: Do you have a quick example of how to use those snapshots? Some bigger refactoring might be worth it. We've been running this on prod now, but we did run into 2 similar deadlocks. I cannot prove that this patch caused it, but cannot disprove it either. The two deadlocks occurred with two threads in: If this changeset is to blame, I would put my money on some ast_autochan behaviour (swapping (locked?) channels?), but I don't really see how I changed anything in this respect... especially since there was only one ChanSpy invocation in one of these Asterisk runs, and it was 13 hours prior to the deadlock. |
I'm not sure what example would be applicable for this exactly, but "core show channels" uses the channel snapshot cache and snapshots. It doesn't pull from the channels container. |
b15287c
to
1862a36
Compare
(This PR contains 4 commits, where the first three are really just refactoring. Only the last should change behaviour.)
Refactor channel spying so it never holds on to a channel iterator. Instead, we recreate the iterator when needed and skip channels that we've seen already, creating the illusion of using an iterator.
This change was needed because the iterator caused the yet-unseen channels in the iterator to be referenced by the iterator. This reference ensured that the channel does not get destroyed. (Which is good, because the iterator needs valid channels to work on.)
But, hanging on to a channel reference for longer than a short while conflicts with CEL logging. The CEL hangup logging is activated by the destruction of the channel. During chanspy activity, a bunch of channels would stay in limbo. First when the chanspying was done would those channels get their CEL hangup event logged.
The fix here is to hang on to channel iterators for only a short while. An alternative fix that makes CEL hangup logging independent of channel destruction was deemed more invasive.
This patch makes chanspy channel selection slightly more resource intensive. But that's a small price to pay for correct CEL hangup logging.
Fixes: #68