Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent safe writes #980

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

Ale-Cas
Copy link

@Ale-Cas Ale-Cas commented Mar 9, 2025

What type of PR is this? (check all applicable)

  • Feature

Description

Implements concurrency-safe write functions, using a mutex, as suggested in #826 and #828

Related Tickets & Documents

Added/updated tests?

  • Yes
  • No, and this is why: please replace this line with details on why tests
    have not been included
  • I need help with writing tests

Run verifications and test

  • make verify is passing
  • make test is passing

@hulkingshtick
Copy link

The PR does not enable concurrent writes.

This PR replaces the best-effort race detector with a mutex. The scope of the state that must be protected by a mutex is greater than the scope of the race detector.

The function TestConcurrentWrites does not call WriteMessage concurrently. The tests will fail under the race detector when the test is modified to allow concurrent calls.

There is discussion in issues on why the Conn methods are not not amenable to allowing concurrent writes. This PR does not address any of those problems.

There are other problems with the PR, but I will not discuss them here because the points above are fatal.

@Ale-Cas
Copy link
Author

Ale-Cas commented Mar 10, 2025

Hi @hulkingshtick, you're right this PR is not aiming to enable concurrent writes on the connection, the purpose was just to enable thread-safety in write functions using a mutex, and to remove the concurrent write to websocket connection panic.
I can update the title and improve the description so that the purpose it's clearer.
To me it still looks like an improvement over the current isWriting boolean flag, and it removes the burden to write some boiler plate code on the application side, like the one we see in #828 (comment).

Maybe I misunderstood your comment here:

Adding a mutex to protect those two calls will probably eliminate most of the reported concurrent write to websocket connection panics and mostly do the right thing for the application.

Isn't this what you had in mind?

There are other problems with the PR, but I will not discuss them here because the points above are fatal.

If there are other problems, I'm open to feedback and happy to improve the PR. Otherwise if you still think it does not bring any value I'll close it up.

@hulkingshtick
Copy link

the purpose was just to enable thread-safety in write functions using a mutex

The concurrency issue is higher in the call stack. Any fix for the problem will not require a mutex in Conn.write.

and to remove the concurrent write to websocket connection panic

This panic is useful because the panic message explicitly tells the programmer what the problem is. The alternative is to panic in less obvious ways or to corrupt data.

and it removes the burden to write some boiler plate code on the application side, like the one we see in #828 (comment).

This PR does not relax the concurrency requirements because the PR does not fix the concurrent write problem.

Update the your test to call WriteMessage concurrently and go test -race to see what happens.

Isn't this what you had in mind?

I said that mutual exclusion lock in Conn.WriteMessage and Conn.WriteMessage will reduce the number of issues reported against the package. You added a lock lower in the call stack.

If there are other problems, I'm open to feedback and happy to improve the PR.

  • Don't include unrelated changes in a PR.
  • Update comments when the referenced code changes.

@hulkingshtick
Copy link

hulkingshtick commented Mar 11, 2025

This test below fails when run with go test -race.

func TestConcurrentWrites(t *testing.T) {
	var connBuf bytes.Buffer
	c := newTestConn(nil, &connBuf, false)
	nGoroutines := 5
	done := make(chan error, nGoroutines)
	for i := 0; i < nGoroutines; i++ {
		go func() {
			err := c.WriteMessage(TextMessage, []byte{})
			done <- err
		}()
    }
    for i := 0; i < nGoroutines; i++ {
		err := <-done
		if err != nil {
			t.Fatal(err)
		}
	}
}

This test starts all goroutines before waiting. The test in the PR waits on the previously started goroutine before starting the next goroutine.

@Ale-Cas
Copy link
Author

Ale-Cas commented Mar 11, 2025

I understand now, thank you very much for clarifying, I'm working on it and will follow your suggestions

@Ale-Cas
Copy link
Author

Ale-Cas commented Mar 11, 2025

Don't include unrelated changes in a PR.

Btw if you're referring to the changes in the files other than conn/conn_test, they've been included here just because I run gofmt -w . so those are just formatting changes, but if you want I can revert them manually.
Unless you were talking about something else?

@frankjkelly
Copy link

Very interested in this . . . . . following. . . .

@hulkingshtick
Copy link

The following test fails when run with the race detector.

func TestConcurrencyNextWriter(t *testing.T) {
	loop := 10
	workers := 10
	for i := 0; i < loop; i++ {
		var connBuf bytes.Buffer

		wg := sync.WaitGroup{}
		wc := newTestConn(nil, &connBuf, true)

		for i := 0; i < workers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				w, err := wc.NextWriter(TextMessage)
				if err != nil {
					t.Errorf("concurrently wc.NextWriter() returned %v", err)
				}
				w.Write([]byte("fail"))
			}()
		}

		wg.Wait()
		wc.Close()
	}
}

There is discussion in issues on why the Conn methods are not not amenable to allowing concurrent writes. This PR does not address any of those problems.

@Ale-Cas
Copy link
Author

Ale-Cas commented Mar 15, 2025

@hulkingshtick Thank you very much for the review and for providing that failing test!
I looked again at the issues and I noticed only now your comment here.

Looking again at both your comment and the code, I agree that there's no good way to make NextWriter thread-safe.

Still, I'm wondering if making the WriteMessage and WriteJSON methods thread-safe would reduce issues related to concurrent writes.
Additionally, it would make sense to improve the current detector with a counter, as you suggested, in the conn.write method to ensure it panics in the first concurrent call to NextWriter or writer.Write.

An alternative, as you suggested would be to add a new package that encapsulates this common use case, which would provide developers with a robust higher-level API.

Based on the outcome of this discussion, I will either update or close this PR.

@hulkingshtick
Copy link

Still, I'm wondering if making the WriteMessage and WriteJSON methods thread-safe would reduce issues related to concurrent writes.

Those methods are often on the stack in issues reporting the concurrent write panic, so yes, adding a mutex will cut down on reported issues.

The goal of my suggestion is to reduce the support burden for the maintainers, not to make those applications correct. I would hold off on any changes until the maintainers show that they are still interested in maintaining this package.

@liggitt
Copy link

liggitt commented Mar 19, 2025

If the underlying Conn won't support concurrent writes well, is it a good idea to allow concurrency in calls to higher level methods? That seems like it will invite concurrent use that will be non-deterministic and cause more problems.

@frankjkelly
Copy link

frankjkelly commented Mar 19, 2025

If the underlying Conn won't support concurrent writes well, is it a good idea to allow concurrency in calls to higher level methods? That seems like it will invite concurrent use that will be non-deterministic and cause more problems.

It would be great to have concurrency but even if we can't if there were examples that showed recommended "templates" of how to achieve concurrency (with locks? channels?) that would be great.

We have a use-case where we have multiple writers and we can figure out the locks but I'm sure whatever we will figure out will likely be suboptimal given our (current) shallow understanding of concurrency / goroutines / locking versus what the great folks here on in the open source world have. Either way, appreciative of the effort here by all sides.

@hulkingshtick
Copy link

If the underlying Conn won't support concurrent writes well, is it a good idea to allow concurrency in calls to higher level methods

The current code ensures that there are no concurrent writes to the underlying net.Conn.

if there were examples that showed recommended "templates" of how to achieve concurrency (with locks? channels?) that would be great.

https://github.com/gorilla/websocket/tree/main/examples

@frankjkelly
Copy link

if there were examples that showed recommended "templates" of how to achieve concurrency (with locks? channels?) that would be great.

https://github.com/gorilla/websocket/tree/main/examples

Thanks @hulkingshtick but maybe I am missing something?

$ find . -name "*.go" | xargs grep -i Lock
$ 

I see some <- instances but not sure which one is specific what I am looking for. Can you advise?

@hulkingshtick
Copy link

@frankjkelly The examples use goroutines and channels to ensure a single concurrent writer. This approach is the simplest way to make an application robust against slow or dead peers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants