
Replace unbounded channels with bounded ones. #1191


Merged
merged 10 commits into libp2p:master from bounded-task-channels on Jul 9, 2019

Conversation

@twittner (Contributor) commented Jul 2, 2019

Context: #973

To remove the remaining unbounded channels, which are used for communicating with node tasks, an API similar to `futures::Sink` is used, i.e. sending is split into a start and a complete phase. The start phase returns `StartSend` and first attempts to complete any pending send operations. Completing the send means polling until `Async::Ready(())` is returned.
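As an illustration (not the PR's actual code), a minimal sketch of such a split send over a bounded futures 0.1 channel could look like this; the `NodeSender` wrapper is hypothetical and `ToTaskMessage` is just a placeholder for the message type sent to a task:

```rust
use futures::{Poll, Sink, StartSend};
use futures::sync::mpsc;

/// Placeholder for the message type sent to a node task.
struct ToTaskMessage;

struct NodeSender {
    /// Bounded channel towards the background task.
    to_task: mpsc::Sender<ToTaskMessage>,
}

impl NodeSender {
    /// Start phase: hand the message to the channel. If the channel is full,
    /// the message comes back as `AsyncSink::NotReady(_)` and must be retried.
    fn start_send_event(&mut self, msg: ToTaskMessage) -> StartSend<ToTaskMessage, ()> {
        self.to_task.start_send(msg).map_err(|_| ())
    }

    /// Complete phase: poll until `Async::Ready(())` is returned, i.e. until
    /// the pending send has been driven to completion.
    fn complete_send_event(&mut self) -> Poll<(), ()> {
        self.to_task.poll_complete().map_err(|_| ())
    }
}
```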

In addition, this PR has split the `handled_node_tasks` module into several smaller ones (cf. `nodes::tasks`) and renamed some types:

- `nodes::handled_node_tasks::NodeTask` -> `nodes::tasks::task::Task`
- `nodes::handled_node_tasks::NodeTaskInner` -> `nodes::tasks::task::State`
- `nodes::handled_node_tasks::NodeTasks` -> `nodes::tasks::Manager`
- `nodes::handled_node_tasks::TaskClosedEvent` -> `nodes::tasks::Error`
- `nodes::handled_node_tasks::HandledNodesEvent` -> `nodes::tasks::Event`
- `nodes::handled_node_tasks::Task` -> `nodes::tasks::TaskEntry`
- `nodes::handled_node_tasks::ExtToInMessage` -> `nodes::tasks::task::ToTaskMessage`
- `nodes::handled_node_tasks::InToExtMessage` -> `nodes::tasks::task::FromTaskMessage`

To review this PR it is probably best to use the diff against `swarm.rs`, `raw_swarm.rs` and `collection.rs`, and to review `nodes::tasks` directly (keeping in mind that its contents used to form `handled_node_tasks`).

twittner added 3 commits July 2, 2019 15:43
@tomaka (Member) commented Jul 8, 2019

So #973 was only about events from the handler towards the main task, and not the other way around. In your PR you end up replacing the unbounded channel with a bounded one, but then you add other unbounded containers (`take_over_to_complete` and `send_events_to_complete`) as buffers, which doesn't make sense to me.

twittner added 2 commits July 8, 2019 10:47
Since it is always holding just a single pending message.
@twittner (Contributor, Author) commented Jul 8, 2019

So #973 was only about events from the handler towards the main task, and not the other way around.

Could you please point out where in #973 this is stated? I cannot find it.

[...] you add another unbounded container (take_over_to_complete) as a buffer, which doesn't make sense to me.

I take this as a cryptic hint that `RawSwarm::take_over_to_complete` and `Swarm::send_events_to_complete` do not need to be `VecDeque`s, but could be `Option`s, because the poll methods are only ever going to put a single element into these collections.
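As an illustration of that invariant (hypothetical names, not the PR's actual fields): a message is only parked when the slot is empty, so a single `Option` can never hold more than one element:

```rust
use futures::AsyncSink;

/// At most one take-over or event message waiting to be retried.
struct PendingSlot<T> {
    pending: Option<T>,
}

impl<T> PendingSlot<T> {
    /// Only called after any previously parked message has been sent, i.e.
    /// when `self.pending` is `None`, so nothing can accumulate here.
    fn park(&mut self, outcome: AsyncSink<T>) {
        debug_assert!(self.pending.is_none());
        if let AsyncSink::NotReady(msg) = outcome {
            // The sink was full; keep the message and retry on the next poll.
            self.pending = Some(msg);
        }
    }
}
```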

@tomaka (Member) commented Jul 8, 2019

The question is: what happens if a background task is stuck (for example in an infinite loop), and the channel towards that task is full, and the user continues sending events?

With this PR, this will fill send_events_to_complete, which is exactly the same as filling an unbounded channel, except that the code is now way more complicated.

@twittner (Contributor, Author) commented Jul 8, 2019

With this PR, this will fill send_events_to_complete, which is exactly the same as filling an unbounded channel, except that the code is now way more complicated.

If the consumer (`Task`) does not consume events, then the swarm cannot make progress. The previously used `VecDeque`s would not grow, because the code that adds an element can only be reached when they are empty (cf. [1], [2]), which is why they can be `Option`s instead.

So whereas unbounded channels would consume all available memory if items are not consumed, with this change the program would instead stop making progress. This is why I added the following note to `Task::poll`:

NOTE: It is imperative to always consume all incoming event messages first in order to not prevent the outside from making progress because they are blocked on the channel capacity. [3]
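To make that note concrete, here is a minimal, hypothetical sketch (futures 0.1, not the PR's actual `Task`) of the polling order it asks for:

```rust
use futures::{Async, Poll, Stream};
use futures::sync::mpsc;

/// Placeholder for the messages a task receives from the swarm.
struct ToTaskMessage;

struct Task {
    /// Bounded receiver for messages coming from the swarm.
    from_swarm: mpsc::Receiver<ToTaskMessage>,
    /// Messages taken off the channel but not yet acted upon.
    pending: Vec<ToTaskMessage>,
}

impl Task {
    fn poll(&mut self) -> Poll<(), ()> {
        // Always drain the channel first: this frees capacity so the swarm's
        // pending `start_send` can complete and the outside keeps progressing.
        loop {
            match self.from_swarm.poll() {
                Ok(Async::Ready(Some(msg))) => self.pending.push(msg),
                Ok(Async::Ready(None)) => return Ok(Async::Ready(())), // sender dropped
                Ok(Async::NotReady) => break,
                Err(()) => return Err(())
            }
        }
        // ... process `self.pending` and drive the connection here ...
        Ok(Async::NotReady)
    }
}
```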

@tomaka (Member) commented Jul 8, 2019

Oh I see, you're blocking the swarm. That looks more reasonable. Let's go with that.

The other thing I don't like is that when the "pending item" is AsyncSink::Ready, that means "must call flush/poll_complete". It's not documented, not obvious, and really not great in terms of semantics.

@tomaka (Member) left a comment

I didn't spot any logic mistake, but I feel like the code is complicated enough that I wouldn't be able to spot mistakes unfortunately.

twittner and others added 2 commits July 8, 2019 12:02
@twittner (Contributor, Author) commented Jul 8, 2019

The other thing I don't like is that when the "pending item" is AsyncSink::Ready, that means "must call flush/poll_complete". It's not documented, not obvious, and really not great in terms of semantics.

Is this not standard usage with sinks? Fwiw https://docs.rs/futures/0.1.28/futures/enum.AsyncSink.html documents AsyncSink::Ready as:

The start_send attempt succeeded, so the sending process has started; you must use Sink::poll_complete to drive the send to completion.

@tomaka (Member) commented Jul 8, 2019

AsyncSink::Ready, to me, means "the sink was ready, therefore it accepted the item". It doesn't imply anything about flushing.

Instead of having a field pending: Option<AsyncSink<T>>, I would go for two fields pending: Option<T> and needs_flush: bool.
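For comparison, a sketch of the two representations under discussion; the type names are illustrative, not from the PR:

```rust
use futures::AsyncSink;

/// Representation used in the PR: `Some(AsyncSink::NotReady(item))` means the
/// item still has to be sent; `Some(AsyncSink::Ready)` means `start_send`
/// accepted it but `poll_complete` has not yet returned `Ready`.
struct PendingCombined<T> {
    pending: Option<AsyncSink<T>>,
}

/// Suggested alternative: the same two pieces of state made explicit.
struct PendingSplit<T> {
    /// An item that still has to be handed to `start_send`.
    pending: Option<T>,
    /// Whether `poll_complete` still has to be driven to completion.
    needs_flush: bool,
}
```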

@twittner (Contributor, Author) commented Jul 8, 2019

AsyncSink::Ready, to me, means "the sink was ready, therefore it accepted the item". It doesn't imply anything about flushing.

But it does. The API contract of Sink::start_send explains this in no uncertain terms (emphasis not mine):

Since sinks are designed to work with asynchronous I/O, the process of actually writing out the data to an underlying object takes place asynchronously. You must use poll_complete in order to drive completion of a send. In particular, start_send does not begin the flushing process [...]. This method returns AsyncSink::Ready if the sink was able to start sending item. In that case, you must ensure that you call poll_complete to process the sent item to completion.

And as I quoted above, AsyncSink::Ready's documentation reads the same.

@tomaka (Member) commented Jul 8, 2019

Imagine this piece of code:

```rust
/// If this method returns `Some`, you have to call `bar` with the value.
#[must_use]
fn foo() -> Option<Bar> {
    ...
}
```

That does not imply that Some means "you have to call bar".
If I have a struct with a field of type Option<Bar>, it does not automatically imply that I have to call bar() if it contains Some.

Calling bar if foo returns Some is a property of foo. Similarly, having to call poll_complete is a property of start_send, not of AsyncSink.
Having a struct with a field of type AsyncSink does not automatically imply that I have to call poll_complete() if it contains Ready.

@romanb (Contributor) commented Jul 8, 2019

Calling bar if foo returns Some is a property of foo. Similarly, having to call poll_complete is a property of start_send, not of AsyncSink.
Having a struct with a field of type AsyncSink does not automatically imply that I have to call poll_complete() if it contains Ready.

But it seems that the property is indeed explicitly tied to the value `AsyncSink::Ready` in this case, by design, not just to the function `start_send`. Or put differently: since, as per the documentation, `AsyncSink` is intentionally tied to `start_send`, the property is explicitly attached (or extended) to the data type, not just the operation. If I saw an `Option<AsyncSink>` in the wild as a return value, that type would indeed tell me "if this is `Some(AsyncSink::Ready)`, then `start_send` was called for some value and I must call `poll_complete` if I want to be sure that whatever was passed to `start_send` is actually sent".

Just my 2 superficial cents on this aspect - I don't really have much insight into the details and larger context of this PR (yet).

of take-over and event messages delivered over Sinks.
@twittner (Contributor, Author) commented Jul 8, 2019

[...] having to call poll_complete is a property of start_send, not of AsyncSink.

AsyncSink, "the result of an asynchronous attempt to send a value to a sink" [1], is specific to start_send, which is why the documentation of AsyncSink explicitly refers to it (as well as to Sink::poll_complete).

Having a struct with a field of type AsyncSink does not automatically imply that I have to call poll_complete() if it contains Ready.

I have added some documentation to the struct fields to make this explicit.

@tomaka (Member) left a comment

I really don't understand the argument and I personally find that a very bad practice, but I also don't want to block this PR on that.

@twittner (Contributor, Author) commented Jul 8, 2019

The only argument is that AsyncSink is used as advertised and not in some undocumented way. But regardless, I am not married to this particular implementation approach. IIUC you would prefer this:

```rust
pub struct RawSwarm {
// ...
    /// A take over message needs to be sent.
    take_over_to_send: Option<(TPeerId, InterruptedReachAttempt<TInEvent, (TConnInfo, ConnectedPoint), ()>)>,

    /// A take over message has been sent and needs to be flushed.
    take_over_to_flush: Option<TPeerId>
}
// ...
pub fn poll(&mut self) -> Async<RawSwarmEvent /* ... */> {
// ...
        // Attempt to send a take over message.
        if let Some((id, interrupted)) = self.take_over_to_send.take() {
            if let Some(mut peer) = self.active_nodes.peer_mut(&id) {
                if let StartTakeOver::NotReady(i) = peer.start_take_over(interrupted) {
                    self.take_over_to_send = Some((id, i));
                    return Async::NotReady
                }
                self.take_over_to_flush = Some(id)
            }
        }

        // Attempt to flush a take over message.
        if let Some(id) = self.take_over_to_flush.take() {
            if let Some(mut peer) = self.active_nodes.peer_mut(&id) {
                if let Ok(Async::NotReady) = peer.complete_take_over() {
                    self.take_over_to_flush = Some(id);
                    return Async::NotReady
                }
            }
        }
// ...
}
```

over the current implementation, right?

@tomaka (Member) commented Jul 9, 2019

Let's merge this? 🙌

@twittner (Contributor, Author) commented Jul 9, 2019

No objections.

@tomaka merged commit 6aba796 into libp2p:master on Jul 9, 2019
@twittner deleted the bounded-task-channels branch on July 10, 2019 08:11