
Replace unbounded channels with bounded ones. #1191


Merged
merged 10 commits into libp2p:master from bounded-task-channels on Jul 9, 2019

Conversation

@twittner (Contributor) commented Jul 2, 2019

Context: #973

To remove the remaining unbounded channels, which are used for communicating with node tasks, an API similar to `futures::Sink` is used, i.e. sending is split into a start and a complete phase. The start phase returns `StartSend` and first attempts to complete any pending send operations. Completing the send means polling until `Async::Ready(())` is returned.
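As an illustration (not the PR's actual code), a minimal sketch of such a split send over a bounded futures 0.1 channel could look like this; the `NodeSender` wrapper is hypothetical and `ToTaskMessage` is just a placeholder for the message type sent to a task:

```rust
use futures::{Poll, Sink, StartSend};
use futures::sync::mpsc;

/// Placeholder for the message type sent to a node task.
struct ToTaskMessage;

struct NodeSender {
    /// Bounded channel towards the background task.
    to_task: mpsc::Sender<ToTaskMessage>,
}

impl NodeSender {
    /// Start phase: hand the message to the channel. If the channel is full,
    /// the message comes back as `AsyncSink::NotReady(_)` and must be retried.
    fn start_send_event(&mut self, msg: ToTaskMessage) -> StartSend<ToTaskMessage, ()> {
        self.to_task.start_send(msg).map_err(|_| ())
    }

    /// Complete phase: poll until `Async::Ready(())` is returned, i.e. until
    /// the pending send has been driven to completion.
    fn complete_send_event(&mut self) -> Poll<(), ()> {
        self.to_task.poll_complete().map_err(|_| ())
    }
}
```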

In addition, this PR has split the `handled_node_tasks` module into several smaller ones (cf. `nodes::tasks`) and renamed some types:

- `nodes::handled_node_tasks::NodeTask` -> `nodes::tasks::task::Task`
- `nodes::handled_node_tasks::NodeTaskInner` -> `nodes::tasks::task::State`
- `nodes::handled_node_tasks::NodeTasks` -> `nodes::tasks::Manager`
- `nodes::handled_node_tasks::TaskClosedEvent` -> `nodes::tasks::Error`
- `nodes::handled_node_tasks::HandledNodesEvent` -> `nodes::tasks::Event`
- `nodes::handled_node_tasks::Task` -> `nodes::tasks::TaskEntry`
- `nodes::handled_node_tasks::ExtToInMessage` -> `nodes::tasks::task::ToTaskMessage`
- `nodes::handled_node_tasks::InToExtMessage` -> `nodes::tasks::task::FromTaskMessage`

To review this PR it is probably best to use the diff against `swarm.rs`, `raw_swarm.rs` and `collection.rs`, and to review `nodes::tasks` directly (keeping in mind that its contents used to form `handled_node_tasks`).

twittner added 3 commits July 2, 2019 15:43
@tomaka (Member) commented Jul 8, 2019

So #973 was only about events from the handler towards the main task, and not the other way around. In your PR you end up replacing the unbounded channel with a bounded one, but then you add other unbounded containers (`take_over_to_complete` and `send_events_to_complete`) as buffers, which doesn't make sense to me.

twittner added 2 commits July 8, 2019 10:47
Since it is always holding just a single pending message.
@twittner (Contributor, Author) commented Jul 8, 2019

So #973 was only about events from the handler towards the main task, and not the other way around.

Could you please point out where in #973 this is stated? I cannot find it.

[...] you add another unbounded container (take_over_to_complete) as a buffer, which doesn't make sense to me.

I take this as a cryptic hint that `RawSwarm::take_over_to_complete` and `Swarm::send_events_to_complete` do not need to be `VecDeque`s, but could be `Option`s, because the poll methods are only ever going to put a single element into these collections.
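As an illustration of that invariant (hypothetical names, not the PR's actual fields): a message is only parked when the slot is empty, so a single `Option` can never hold more than one element:

```rust
use futures::AsyncSink;

/// At most one take-over or event message waiting to be retried.
struct PendingSlot<T> {
    pending: Option<T>,
}

impl<T> PendingSlot<T> {
    /// Only called after any previously parked message has been sent, i.e.
    /// when `self.pending` is `None`, so nothing can accumulate here.
    fn park(&mut self, outcome: AsyncSink<T>) {
        debug_assert!(self.pending.is_none());
        if let AsyncSink::NotReady(msg) = outcome {
            // The sink was full; keep the message and retry on the next poll.
            self.pending = Some(msg);
        }
    }
}
```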

@tomaka (Member) commented Jul 8, 2019

The question is: what happens if a background task is stuck (for example in an infinite loop), and the channel towards that task is full, and the user continues sending events?

With this PR, this will fill send_events_to_complete, which is exactly the same as filling an unbounded channel, except that the code is now way more complicated.

@twittner (Contributor, Author) commented Jul 8, 2019

With this PR, this will fill send_events_to_complete, which is exactly the same as filling an unbounded channel, except that the code is now way more complicated.

If the consumer (`Task`) does not consume events, then the swarm cannot make progress. The previously used `VecDeque`s would not grow, because the code that adds an element can only be reached when they are empty (cf. [1], [2]), which is why they can be `Option`s instead.

So whereas unbounded channels would consume all available memory if items are not consumed, with this change the program would instead stop making progress. This is why I added the following note to `Task::poll`:

NOTE: It is imperative to always consume all incoming event messages first in order to not prevent the outside from making progress because they are blocked on the channel capacity. [3]
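To make that note concrete, here is a minimal, hypothetical sketch (futures 0.1, not the PR's actual `Task`) of the polling order it asks for:

```rust
use futures::{Async, Poll, Stream};
use futures::sync::mpsc;

/// Placeholder for the messages a task receives from the swarm.
struct ToTaskMessage;

struct Task {
    /// Bounded receiver for messages coming from the swarm.
    from_swarm: mpsc::Receiver<ToTaskMessage>,
    /// Messages taken off the channel but not yet acted upon.
    pending: Vec<ToTaskMessage>,
}

impl Task {
    fn poll(&mut self) -> Poll<(), ()> {
        // Always drain the channel first: this frees capacity so the swarm's
        // pending `start_send` can complete and the outside keeps progressing.
        loop {
            match self.from_swarm.poll() {
                Ok(Async::Ready(Some(msg))) => self.pending.push(msg),
                Ok(Async::Ready(None)) => return Ok(Async::Ready(())), // sender dropped
                Ok(Async::NotReady) => break,
                Err(()) => return Err(())
            }
        }
        // ... process `self.pending` and drive the connection here ...
        Ok(Async::NotReady)
    }
}
```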

@tomaka (Member) commented Jul 8, 2019

Oh I see, you're blocking the swarm. That looks more reasonable. Let's go with that.

The other thing I don't like is that when the "pending item" is AsyncSink::Ready, that means "must call flush/poll_complete". It's not documented, not obvious, and really not great in terms of semantics.

@tomaka (Member) left a comment

I didn't spot any logic mistake, but I feel like the code is complicated enough that I wouldn't be able to spot mistakes unfortunately.

twittner and others added 2 commits July 8, 2019 12:02
@twittner (Contributor, Author) commented Jul 8, 2019

The other thing I don't like is that when the "pending item" is AsyncSink::Ready, that means "must call flush/poll_complete". It's not documented, not obvious, and really not great in terms of semantics.

Is this not standard usage with sinks? Fwiw https://docs.rs/futures/0.1.28/futures/enum.AsyncSink.html documents AsyncSink::Ready as:

The start_send attempt succeeded, so the sending process has started; you must use Sink::poll_complete to drive the send to completion.

@tomaka (Member) commented Jul 8, 2019

AsyncSink::Ready, to me, means "the sink was ready, therefore it accepted the item". It doesn't imply anything about flushing.

Instead of having a field pending: Option<AsyncSink<T>>, I would go for two fields pending: Option<T> and needs_flush: bool.
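For comparison, a sketch of the two representations under discussion; the type names are illustrative, not from the PR:

```rust
use futures::AsyncSink;

/// Representation used in the PR: `Some(AsyncSink::NotReady(item))` means the
/// item still has to be sent; `Some(AsyncSink::Ready)` means `start_send`
/// accepted it but `poll_complete` has not yet returned `Ready`.
struct PendingCombined<T> {
    pending: Option<AsyncSink<T>>,
}

/// Suggested alternative: the same two pieces of state made explicit.
struct PendingSplit<T> {
    /// An item that still has to be handed to `start_send`.
    pending: Option<T>,
    /// Whether `poll_complete` still has to be driven to completion.
    needs_flush: bool,
}
```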

@twittner (Contributor, Author) commented Jul 8, 2019

AsyncSink::Ready, to me, means "the sink was ready, therefore it accepted the item". It doesn't imply anything about flushing.

But it does. The API contract of Sink::start_send explains this in no uncertain terms (emphasis not mine):

Since sinks are designed to work with asynchronous I/O, the process of actually writing out the data to an underlying object takes place asynchronously. You must use poll_complete in order to drive completion of a send. In particular, start_send does not begin the flushing process [...]. This method returns AsyncSink::Ready if the sink was able to start sending item. In that case, you must ensure that you call poll_complete to process the sent item to completion.

And as I quoted above, AsyncSink::Ready's documentation reads the same.

@tomaka (Member) commented Jul 8, 2019

Imagine this piece of code:

```rust
/// If this method returns `Some`, you have to call `bar` with the value.
#[must_use]
fn foo() -> Option<Bar> {
    ...
}
```

That does not imply that Some means "you have to call bar".
If I have a struct with a field of type Option<Bar>, it does not automatically imply that I have to call bar() if it contains Some.

Calling bar if foo returns Some is a property of foo. Similarly, having to call poll_complete is a property of start_send, not of AsyncSink.
Having a struct with a field of type AsyncSink does not automatically imply that I have to call poll_complete() if it contains Ready.

@romanb (Contributor) commented Jul 8, 2019

Calling bar if foo returns Some is a property of foo. Similarly, having to call poll_complete is a property of start_send, not of AsyncSink.
Having a struct with a field of type AsyncSink does not automatically imply that I have to call poll_complete() if it contains Ready.

But it seems that the property is indeed explicitly tied to the value `AsyncSink::Ready` in this case, by design, not just to the function `start_send`. Or put differently: since, as per the documentation, `AsyncSink` is intentionally tied to `start_send`, the property is explicitly attached (or extended) to the data type, not just the operation. If I saw an `Option<AsyncSink>` in the wild as a return value, that type would indeed tell me "if this is `Some(AsyncSink::Ready)`, then `start_send` was called for some value and I must call `poll_complete` if I want to be sure that whatever was passed to `start_send` is actually sent".

Just my 2 superficial cents on this aspect - I don't really have much insight into the details and larger context of this PR (yet).

of take-over and event messages delivered over Sinks.
@twittner (Contributor, Author) commented Jul 8, 2019

[...] having to call poll_complete is a property of start_send, not of AsyncSink.

AsyncSink, "the result of an asynchronous attempt to send a value to a sink" [1], is specific to start_send, which is why the documentation of AsyncSink explicitly refers to it (as well as to Sink::poll_complete).

Having a struct with a field of type AsyncSink does not automatically imply that I have to call poll_complete() if it contains Ready.

I have added some documentation to the struct fields to make this explicit.

@tomaka (Member) left a comment

I really don't understand the argument and I personally find that a very bad practice, but I also don't want to block this PR on that.

@twittner (Contributor, Author) commented Jul 8, 2019

The only argument is that AsyncSink is used as advertised and not in some undocumented way. But regardless, I am not married to this particular implementation approach. IIUC you would prefer this:

```rust
pub struct RawSwarm {
// ...
    /// A take over message needs to be sent.
    take_over_to_send: Option<(TPeerId, InterruptedReachAttempt<TInEvent, (TConnInfo, ConnectedPoint), ()>)>,

    /// A take over message has been sent and needs to be flushed.
    take_over_to_flush: Option<TPeerId>
}
// ...
pub fn poll(&mut self) -> Async<RawSwarmEvent /* ... */> {
// ...
        // Attempt to send a take over message.
        if let Some((id, interrupted)) = self.take_over_to_send.take() {
            if let Some(mut peer) = self.active_nodes.peer_mut(&id) {
                if let StartTakeOver::NotReady(i) = peer.start_take_over(interrupted) {
                    self.take_over_to_send = Some((id, i));
                    return Async::NotReady
                }
                self.take_over_to_flush = Some(id)
            }
        }

        // Attempt to flush a take over message.
        if let Some(id) = self.take_over_to_flush.take() {
            if let Some(mut peer) = self.active_nodes.peer_mut(&id) {
                if let Ok(Async::NotReady) = peer.complete_take_over() {
                    self.take_over_to_flush = Some(id);
                    return Async::NotReady
                }
            }
        }
// ...
}
```

over the current implementation, right?

@tomaka (Member) commented Jul 9, 2019

Let's merge this? 🙌

@twittner (Contributor, Author) commented Jul 9, 2019

No objections.

@tomaka merged commit 6aba796 into libp2p:master on Jul 9, 2019
@twittner deleted the bounded-task-channels branch on July 10, 2019 08:11