bug: sync stuck due to networking error #240

Open
hunjixin opened this issue Jan 9, 2025 · 5 comments

@hunjixin

hunjixin commented Jan 9, 2025

Reproduced on the Mocha network:

  1. Set up a new full node and start syncing.
  2. Cut the network (turn off Wi-Fi, unplug the cable, etc.).
  3. Wait for all peers to fail.
  4. Restore the network.
  5. The sync goroutine never recovers.

The sync goroutine is stuck:
image

If the network is down, the number of peers in peerQueue keeps decreasing, because requests fail with errors other than ErrNotFound (network connection failures, etc.), and eventually the havePeer channel runs dry. At that point GetRangeByHeight waits on the havePeer channel forever, while getRangeByHeight waits on the result channel.

A candidate fix is to push the peer state back on an errEmptyResponse error: #238

Another fix is to add a timeout to the GetRangeByHeight call:

```go
func (s *Syncer[H]) requestHeaders(
	ctx context.Context,
	fromHead H,
	to uint64,
) error {
	amount := to - fromHead.Height()
	// start requesting headers until amount remaining will be 0
	for amount > 0 {
		size := header.MaxRangeRequestSize
		if amount < size {
			size = amount
		}

		to := fromHead.Height() + size + 1
		s.metrics.rangeRequestStart()
		// to fix: add a timeout to this context
		headers, err := s.getter.GetRangeByHeight(ctx, fromHead, to)
		s.metrics.updateGetRangeRequestInfo(s.ctx, int(size)/100, err != nil)
		s.metrics.rangeRequestStop()
		if err != nil {
			return err
		}

		if err := s.storeHeaders(ctx, headers...); err != nil {
			return err
		}

		amount -= size // size == len(headers)
		fromHead = headers[len(headers)-1]
	}
	return nil
}
```
@hunjixin hunjixin changed the title from "bug: sync" to "bug: sync stuck due to networking error" on Jan 9, 2025
@Wondertan
Member

Hey @hunjixin, thanks for the detailed bug report. I am gonna take a look soon, unless @vgonkivs wants to beat me to it and look faster.

@vgonkivs
Member

vgonkivs commented Jan 9, 2025

Thanks for opening an issue @hunjixin. I will take a closer look and report back to you asap.

@hunjixin
Author

> Thanks for opening an issue @hunjixin. I will take a closer look and report back to you asap.

There are about 70 peers when the session is created on the Mocha testnet. You can add a log to confirm that the havePeer channel was drained.

@hunjixin
Author

any progress?

@vgonkivs
Member

vgonkivs commented Jan 17, 2025

Hey @hunjixin. I'm working on a proposal. The PR will be opened soon.

@vgonkivs vgonkivs self-assigned this Jan 17, 2025