V1.2 Plans (input welcome!) #11

Open
kalafut opened this issue Jan 28, 2024 · 6 comments

@kalafut
Owner

kalafut commented Jan 28, 2024

I'm scoping a V1.1 release that will add the first new features since the initial release in 2015. The intent is to maintain backward compatibility, so this will not need to be a major (V2) release from a Go module perspective.

Planned Changes

  • User-defined sample count via a new SampleCount parameter. This will let users override the current fixed number of sample chunks (3). The samples will continue to be evenly spaced across the file. This could improve conflict detection in large files whose changes are better caught by incorporating data from all parts of the file. This feature was prompted by a discussion in the py-imohash project.

    • SampleCount==3 will retain the current behavior for backward compatibility. (This will be a special case because the general rule for spacing n samples differs slightly from the current behavior.)
    • SampleCount==2 will sample at the beginning and end.
    • SampleCount==1 will sample at the beginning.
  • The core hashing algorithm will be configurable and (probably) xxhash will be added as an alternative to murmur3. xxhash is faster and may (?) have fewer weaknesses. That said, for most uses the default murmur3 hash is still fine.

  • Optional size mixing. The current behavior encodes the size into the hash by prefixing it. Having the size recoverable and/or many hashes with a similar prefix may not be desirable. A new option to mix size information into the hash will be added.

  • Adopt functional options with the New() function. This will allow for these and future enhancements without changing the default New() signature. NewCustom() will be deprecated.

  • (internal) remove the now-defunct testing library being used. (completed in 1.0.3)
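
To make the functional-options and sample-spacing ideas above concrete, here is a minimal sketch in Go. It is not the planned API: `Config`, `Option`, `WithSampleCount`, `WithSizeMixing`, and `sampleOffsets` are hypothetical names, and `New` here returns a plain config struct rather than a real hasher. The spacing rule for the general case and the layout used for SampleCount==3 (start, size/2, end) are assumptions for illustration only.

```go
package main

import "fmt"

// Config collects the tunables discussed in this issue.
// Hypothetical type; the released API may look different.
type Config struct {
	SampleCount int  // number of sample chunks; 3 is today's fixed count
	MixSize     bool // mix the size into the hash instead of prefixing it
}

// Option mutates a Config. New applies options in the order given.
type Option func(*Config)

// WithSampleCount overrides the number of sample chunks (hypothetical name).
func WithSampleCount(n int) Option {
	return func(c *Config) { c.SampleCount = n }
}

// WithSizeMixing turns on size mixing (hypothetical name).
func WithSizeMixing() Option {
	return func(c *Config) { c.MixSize = true }
}

// New starts from the defaults, so New() with no arguments keeps its
// existing signature and behavior.
func New(opts ...Option) Config {
	c := Config{SampleCount: 3}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

// sampleOffsets returns the start offset of each sample chunk.
// Assumption: the file is large enough that the samples fit without overlap.
func sampleOffsets(fileSize, sampleSize int64, n int) []int64 {
	if n < 1 || sampleSize < 1 || fileSize < sampleSize {
		return nil
	}
	last := fileSize - sampleSize // offset of a chunk that ends at EOF
	switch n {
	case 1:
		return []int64{0} // beginning only
	case 3:
		// Special case kept for backward compatibility. The layout
		// (start, size/2, end) is an assumption about current behavior.
		return []int64{0, fileSize / 2, last}
	}
	// Assumed general rule: first chunk at 0, last chunk ending at EOF,
	// the rest evenly spaced between them.
	offsets := make([]int64, n)
	for i := range offsets {
		offsets[i] = int64(i) * last / int64(n-1)
	}
	return offsets
}

func main() {
	cfg := New(WithSampleCount(4), WithSizeMixing())
	fmt.Println(cfg.SampleCount, cfg.MixSize)
	fmt.Println(sampleOffsets(1000, 100, cfg.SampleCount))
}
```

A hash-algorithm option (murmur3 vs. xxhash) would follow the same pattern as `WithSampleCount`; it is left out here to keep the sketch free of external dependencies.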

@matDOTviguier

I'd again suggest that SampleCount==0 force the full file to be loaded into the hasher.
I agree with xxhash because of the small data case imohash will trigger.

@guilherme-puida

  • (internal) remove the now-defunct testing library being used.

Would you consider doing this before releasing v1.1? I'm willing to submit a pull request if that's ok with you.

I'm working on packaging croc on Debian and would need to do some workarounds to avoid is.v1.

@kalafut
Owner Author

kalafut commented Feb 23, 2024

@guilherme-puida Yes, thanks for letting me know and I'll prioritize that soon. I think I already eliminated it in my dev branch and will release an update with those changes.

@guilherme-puida

Nice! Thanks for the quick response.

Even if you don't make a new tagged release, I could still just import the patch and remove it later down the road when you release v1.1. But tagging a new version would certainly be nice as well :^)

@kalafut
Owner Author

kalafut commented Feb 23, 2024

@guilherme-puida v1.0.3 has now been pushed. lmk if you run into any packaging issues.

@guilherme-puida

Wow! That was quick. Thanks!

Sure, I'll ping you if I run into any trouble, but I don't expect to have to.

Cheers!

kalafut changed the title from "V1.1 Plans (input welcome!)" to "V1.2 Plans (input welcome!)" on Aug 27, 2024