Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot determine if a given line is complete or not #25

Open
kokes opened this issue Mar 29, 2021 · 17 comments
Open

Cannot determine if a given line is complete or not #25

kokes opened this issue Mar 29, 2021 · 17 comments
Labels
enhancement New feature or request

Comments

@kokes
Copy link
Contributor

kokes commented Mar 29, 2021

Describe the bug

There seems to be race condition that leads to partial line reads. That is, if a line write is not atomic, I can read a partial line, but since tail strips all newlines, I'll never know this happened. (My code will fail further on as it finds out said line is incomplete.)

Expected behaviour

There are two ways this can be addressed:

  1. Newlines are returned in line.Text, this would be a breaking change, not really a good solution
  2. Adding a boolean flag NewlineEnding to Line, which would tell the reader if the line.Text returned actually contained a newline at the end or not. This should be fully backwards compatible.

Here's my locally tested patch (it's just a draft): kokes@519502e

To Reproduce

Write a simple tailer:

A cli wrapper for this library
package main

import (
"fmt"
"log"
"os"
"strings"

"github.com/nxadm/tail"

)

func main() {
if err := run(); err != nil {
log.Fatal(err)
}
}

func run() error {
fmt.Printf("PID: %v\n", os.Getpid())
logfile := os.Args[1]
t, err := tail.TailFile(logfile, tail.Config{Follow: true, ReOpen: true})
if err != nil {
return err
}

for line := range t.Lines {
	fmt.Printf("line: %v; newline ending: %v\n", line.Text, strings.HasSuffix(line.Text, "\n"))
}
return nil

}


And run this against a random text file, go run repro.go tmp.log and then write to it:

echo -n "foo" >> tmp.log
echo "foo" >> tmp.log
echo "foofoo" >> tmp.log

This creates two lines, both containing foofoo, but we can't tell from the reader's point of view:

$ go run repro.go tmp.log
PID: 57959
2021/03/29 11:17:07 Waiting for tmp.log to appear...
line: foo; newline ending: false
line: foo; newline ending: false
line: foofoo; newline ending: false

System information

  • tail version v1.4.8
  • OS: macOS
  • Arch: amd64

Additional context

This was reported back in 2014 in the original repo #35 and "fixed" without tests in #40, which deadlocked the process if the tailed file didn't end with a newline. It was then reverted in this repo by merging in #126.

My proposal doesn't affect the library's behaviour, just adds a flag to inform the user of partial rows. I wasn't super sure how to handle splitting or large lines, so please review my code.

@kokes kokes added the bug Something isn't working label Mar 29, 2021
@nxadm
Copy link
Owner

nxadm commented Mar 29, 2021

Thank you @kokes! I will review the proposed changes as soon as possible and ask you if needed for a PRs.

If people have options about this issue, feel free to comment here.

@kokes
Copy link
Contributor Author

kokes commented Mar 30, 2021

@nxadm thanks for the prompt reply, I have three bits of news:

  1. I pushed a new branch to house it all, I can submit a PR from that if needed master...kokes:eol_markers
  2. I added tests for both "plain" tails and the MaxLineSize branch (since it handles lines a bit differently, so I wanted to cover it)
  3. I thought of a different API, in case you don't like a new struct member in Line - we could add a new Config field instead, one that would say something like "yield only complete lines" or something, so we'd have to have a strings.Builder or bytes.Buffer or something that would buffer incomplete chunks. If we go with my existing implementation, I'll build this buffering on top of it anyway, that's how I thought of it... but I do like my implementation better than this config thingy, because nobody has to change their configs in any way (everyone has to handle the incomplete lines though). (One thing I haven't quite figured out is if I have to reset the buffer if there's a file rotation or if it's handled cleanly.)

@kokes
Copy link
Contributor Author

kokes commented Mar 31, 2021

So I looked into it a little more and got stuck. Even with my newline implementation, I get issues with properly buffering partial lines and it's getting a bit yucky. Also, I'm not sure if the reader should be handling all this.

I got to the following:

buf := new(strings.Builder)
bufline := -1
for line := range t.Lines {
	if bufline != line.Num {
		buf.Reset()
	}
	bufline = line.Num
	if _, err := buf.WriteString(line.Text); err != nil {
		log.Fatal("buffer issues: %v", err)
	}
	if !line.NewlineEnding {
		continue
	}

The line number there is to keep the buffer relevant only for that line and nothing else (to prevent issues during reopens). But it turns out I can't use this, because the lineNum field does NOT describe the line number in the file, but rather just a counter of lines fed through the channel already.

I tried fiddling with the lineNum counter, but it just made the code messier and messier.

I'm starting to thing this partial line business will be a bit more complex than anticipated.

@kokes
Copy link
Contributor Author

kokes commented Mar 31, 2021

Upon further reflections (and failing to implement a working solution on top of the library), I decided to rework it in a different branch, this time giving the user an extra knob to enable this new functionality.

This should be fully backwards compatible - all the tests pass, the new functionality is behind a boolean check, though I did simplify the err handling a bit, specifically this chunk seemed repetitive - we can trim the newline in case of error, it won't be there anyway.

Everything else stays the same, there's a new private struct member to buffer incomplete lines and new tests pass. Last but not least - there's a bit of data loss if a given file never ends with a new line. This shouldn't hang in case of truncations, but I'll check the PRs I linked above to see if I introduced a regression.

master...kokes:complete_lines

Edit: Oh and there's something I discovered looking through the past PRs - even if we set CompleteLines: true, perhaps we should return the last line in a file, even if it's without a newline, but we have Follow: false... so in a way it's a complete line. af593d5

This might be a bit controversial, so feel free to suggest reverting this.

@kokes
Copy link
Contributor Author

kokes commented Apr 11, 2021

@nxadm having run this in our pre-production setting for the past 10 days or so (works well), I decided to submit a PR. I only used the latter approach, I found the earlier suggestion (the one with EOL markers) to be pretty hard to reason about from a user's perspective.

@nxadm
Copy link
Owner

nxadm commented Apr 12, 2021

Great to hear. Just a message to let you know I am not ignoring the issue, just a little busy at the moment. Your effort is appreciated!

@nxadm
Copy link
Owner

nxadm commented Nov 2, 2021

Now that I preparing the release of 1.4.9/1.5.0 (in case we include the PR), I had a look at the PR. Good job. I had few remarks/questions does:

  • Is the non-new line logging a non-atomic write something very common? I suspect for most use cases it's OK to wait until the new line is sent. If it's a corner case we need to ponder if we need to change the interface for that (very slightly).
  • Is the previous answer is positive, I think it may be less involved to change the logic slightly. See https://github.com/nxadm/tail/pull/26/files#r740978639
    So instead of fulling the buffer everywhere, we only enable the functionality after checking t.lineBuf is not nil.

But again, the main question is the first one. Any ideas on the balance between the added configuration and generality of the use case? ping @dolmen, @42wim

@nxadm nxadm added enhancement New feature or request and removed bug Something isn't working labels Nov 2, 2021
@kokes
Copy link
Contributor Author

kokes commented Nov 2, 2021

@nxadm Note that the buffer is NOT filled in the default case (CompleteLines: false).

https://github.com/kokes/tail/blob/224135092b774b93c11480ca8f503f171ea90d85/tail.go#L247-L252

The buffer is only used when CompleteLines is set to true - I accepted your change request and will only allocate the buffer in that case... but it shouldn't really make much of a difference. See kokes@2241350

@nxadm
Copy link
Owner

nxadm commented Dec 16, 2021

I merged the PR, thank you. I am waiting on fsnotify for creating a new release. In the meantime, please keep testing it :)

@kokes
Copy link
Contributor Author

kokes commented Jan 16, 2022

@nxadm not sure if that's coming - fsnotify hasn't had a code/dependency change commited since the last release, so I wouldn't expect it anytime soon.

@liukun
Copy link

liukun commented Mar 17, 2022

Cool, thanks! Need this feature in a project.

@sadbox
Copy link

sadbox commented May 25, 2022

@nxadm Hey! Thanks for all the work you've done on this, do you think there's any chance a new version could be tagged? fsnotify released 1.5.4 about a month ago

@nxadm
Copy link
Owner

nxadm commented May 25, 2022

I'll try to look this week for what's needing for making a release. At the latest, I'll release next week. Thank you for your patience.

@nxadm
Copy link
Owner

nxadm commented Jun 10, 2022

Working on the new release now and bumped fsnotiy. Checking what old bugs it fixes.

@benwh
Copy link

benwh commented Jul 8, 2022

Hey @nxadm - any news on the tagged release? 😄

I think we've started running into this as an occasional edge case within our infra, so it'd be great to pull the fix in. If you need any contributions to make it happen, then please do shout.

Thanks again for all of your work on this super useful library!

@shufps
Copy link

shufps commented Aug 20, 2022

I also noticed that tailing occasionally gives incomplete lines 🤔

Probably the same problem. Is there any progress on this? 🤔

@apang1992
Copy link

I also noticed that tailing occasionally gives incomplete lines 🤔

Probably the same problem. Is there any progress on this? 🤔

go to https://github.com/hpcloud/tail . I face the same problem. I find this repo meets your requirement .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants