Sparsehash (friendly fork of imohash) #7
Comments
Updates:
So it is now really different (and much smaller) - but the tests are still passing! (with some adaptation to re-include the input size)
@oliverpool Cool! Glad you're getting some use out of the library. Were you hoping for a merge back into imohash, or do you think things have diverged, or may do so? I wouldn't mind cutting a v2 at some point with some of the enhancements mentioned in #6 or your fork, though I'm not sure when I'll be able to get to that. This may be fortuitous, though, as perhaps you'll discover some other things as you work on your project.
Now that I have refactored it further and changed a lot of bits, I am not sure if merging back would be worth it. (your package and its license are included there, since most of the code comes from here)
Sounds good. I'll keep an eye on your project, especially should I get around to some of the v2 mods. For now I'm going to close the issue. Thanks!
@oliverpool Would you turn on Github Issues for SparseHash? I wanted to have some discussion with you, as you've done many of the things that I was considering doing, and there are a few more options I'm wondering if you'd be willing to consider - such as making the following variable:
16k is probably enough for all common formats (jpeg, hevc, mpeg, mp4, etc), but I'd like to have some sort of reference to reasonably prove that or to have it adjustable. Also, I'm curious as to what the two of you are working on.
@oliverpool @coolaj86 It has been a while since we chatted about this, and I'm curious how the SparseHash updates worked out for you. I'm definitely willing to consider pulling some of the options back into imohash, perhaps as a v2 (or maybe accomplished just with functional options).
@coolaj86 I opened the issues there. As you can see in the documentation, I am currently not considering allowing the
@kalafut I am quite happy with my fork: it doesn't have any external dependency (which allowed me to use a fork of murmur3). I also added error-handling everywhere. I think functional options are very cool (I will not use them, because the 3 parameters of my It should be feasible to refactor your current version to use a v2 implementation in the background (I am not sure if it is worth it).
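For readers unfamiliar with the functional options pattern mentioned above, here is a minimal Go sketch of how such a configuration surface could look. Everything in it is hypothetical: `Hasher`, `Option`, `WithSampleSize`, `WithSampleThreshold`, and `New` are illustrative names, not the API of imohash or sparsehash, and the default values are placeholders chosen only for the example.

```go
package main

import "fmt"

// Hasher holds the tunable parameters of a sampling hasher.
// Hypothetical type: not the imohash or sparsehash API.
type Hasher struct {
	sampleSize      int64 // bytes read per sampled window
	sampleThreshold int64 // inputs at or below this size are hashed in full
}

// Option mutates a Hasher during construction.
type Option func(*Hasher)

// WithSampleSize overrides the per-window sample size.
func WithSampleSize(n int64) Option {
	return func(h *Hasher) { h.sampleSize = n }
}

// WithSampleThreshold overrides the size below which the whole input is hashed.
func WithSampleThreshold(n int64) Option {
	return func(h *Hasher) { h.sampleThreshold = n }
}

// New builds a Hasher from illustrative defaults, then applies the options in order.
func New(opts ...Option) *Hasher {
	h := &Hasher{
		sampleSize:      16 * 1024,  // placeholder default
		sampleThreshold: 128 * 1024, // placeholder default
	}
	for _, opt := range opts {
		opt(h)
	}
	return h
}

func main() {
	def := New()
	custom := New(WithSampleSize(4096))
	fmt.Println(def.sampleSize, def.sampleThreshold)
	fmt.Println(custom.sampleSize, custom.sampleThreshold)
}
```

The appeal of this shape is that new knobs can be added later without changing the signature of `New`, which is why it is often suggested for backwards compatible updates.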
@oliverpool Nice. At a minimum I might survey the mmh3 landscape to see if that dep should be upgraded.
@oliverpool @coolaj86 In case either of you are still interested in this space, I'm scoping out a few backwards compatible updates, including user-configurable chunk count. See #11
@kalafut thanks for this great package! (I am especially grateful for the tests, which allowed me to refactor without breaking the algo!)
I really like the idea of hashing only a small part of the file and plan to use it for my next project.
I took a look at the code and did some refactoring (which ended up as a fork: https://github.com/oliverpool/sparsehash - documentation).
Main differences:
cmd
and the example show how easy it is to plug murmur3
SampleSize = 0
)
I also added error handling (in case reading the file triggers an error).
Feel free to share your thoughts.
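To make the idea of hashing only a small part of the file concrete, here is a minimal Go sketch under stated assumptions. It is not the imohash or sparsehash algorithm: it uses the standard library's FNV-1a (`hash/fnv`) in place of murmur3, samples three windows (start, middle, end), mixes in the input size, and treats a sample size of 0 as "hash everything". The names `sampleHash` and `hashBytes` are hypothetical. Read errors are returned to the caller rather than ignored.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash/fnv"
	"io"
)

// sampleHash hashes the input size plus three sampleSize-byte windows
// (start, middle, end) of r. If sampleSize <= 0, or the input is too small
// for three separate windows, the whole input is hashed instead.
// Assumption: FNV-1a stands in for murmur3 to stay within the standard library.
func sampleHash(r io.ReaderAt, size, sampleSize int64) (string, error) {
	h := fnv.New128a()

	// Mix in the total size so inputs of different lengths hash differently.
	var sz [8]byte
	binary.BigEndian.PutUint64(sz[:], uint64(size))
	h.Write(sz[:])

	if sampleSize <= 0 || size <= 3*sampleSize {
		if _, err := io.Copy(h, io.NewSectionReader(r, 0, size)); err != nil {
			return "", err
		}
		return hex.EncodeToString(h.Sum(nil)), nil
	}

	offsets := []int64{0, size/2 - sampleSize/2, size - sampleSize}
	for _, off := range offsets {
		// Any read error is propagated to the caller.
		if _, err := io.Copy(h, io.NewSectionReader(r, off, sampleSize)); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(data []byte, sampleSize int64) (string, error) {
	return sampleHash(bytes.NewReader(data), int64(len(data)), sampleSize)
}

func main() {
	a := make([]byte, 1<<20)
	b := make([]byte, 1<<20)
	b[100000] = 1 // lies outside all three sampled windows

	ha, err := hashBytes(a, 1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	hb, err := hashBytes(b, 1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sampled hashes equal:", ha == hb)
}
```

The demo in `main` shows the trade-off of the approach: a change outside the sampled windows goes undetected, which is the price paid for reading only a few kilobytes of a large file.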