-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add C-minHash variant #203
Labels
Comments
This is interesting! From my quick read of the abstract it seems improvement is in reducing the number of permutation functions, but not really reducing the number of hash values stored? Am I missing something? |
Given that the hash values can be reproduced deterministically, they don't ned to be stored.
The hash values can be recomputed lazily - trading compute for space.
On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***> wrote:
This is interesting! From my quick read of the abstract it seems improvement is in reducing the number of permutation functions, but not really reducing the number of hash values stored? Am I missing something?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
* need to be stored
Apologies for the typo.
On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***> wrote:
Given that the hash values can be reproduced deterministically, they don't ned to be stored.
The hash values can be recomputed lazily - trading compute for space.
On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***> wrote:
This is interesting! From my quick read of the abstract it seems improvement is in reducing the number of permutation functions, but not really reducing the number of hash values stored? Am I missing something?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Hmm. I am not sure I am getting this correctly. Reading Algorithm
3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
from the first permutation. So let's say we have a set { a, b, c, d, e }
and after hashing and first permutation function, it becomes { 12, 32, 43,
231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
{32, 34, 43} -- I just made these numbers up. So if my understanding is
correct, then you either store the minimum hash values in the end (same as
the classic minhash) or you store the original hash values, which is what
we want to avoid in the first place.
…On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
* need to be stored
Apologies for the typo.
On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
wrote:
Given that the hash values can be reproduced deterministically, they don't
ned to be stored.
The hash values can be recomputed lazily - trading compute for space.
On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***> wrote:
This is interesting! From my quick read of the abstract it seems
improvement is in reducing the number of permutation functions, but not
really reducing the number of hash values stored? Am I missing something?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub
<#203 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
The "Circulant" part means that you can use a random permutation then rotate the permutation by k (of K) items to create "new" permutations.
One permutation gets used K times under rotation instead of K separate random permutations.
The rotated permutations can be computed "on the fly" and discarded. They do not need to be stored.
On Sunday, March 19, 2023, 09:54:02 p.m. PDT, Eric Zhu ***@***.***> wrote:
Hmm. I am not sure I am getting this correctly. Reading Algorithm
3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
from the first permutation. So let's say we have a set { a, b, c, d, e }
and after hashing and first permutation function, it becomes { 12, 32, 43,
231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
{32, 34, 43} -- I just made these numbers up. So if my understanding is
correct, then you either store the minimum hash values in the end (same as
the classic minhash) or you store the original hash values, which is what
we want to avoid in the first place.
On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
* need to be stored
Apologies for the typo.
On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
wrote:
Given that the hash values can be reproduced deterministically, they don't
ned to be stored.
The hash values can be recomputed lazily - trading compute for space.
On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***> wrote:
This is interesting! From my quick read of the abstract it seems
improvement is in reducing the number of permutation functions, but not
really reducing the number of hash values stored? Am I missing something?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub
<#203 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA>
.
You are receiving this because you commented.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
But how do you compute the minimum value for each permutation function
without having access to the input hash values?
…On Sun, Mar 19, 2023 at 10:04 PM icsa ***@***.***> wrote:
The "Circulant" part means that you can use a random permutation then
rotate the permutation by k (of K) items to create "new" permutations.
One permutation gets used K times under rotation instead of K separate
random permutations.
The rotated permutations can be computed "on the fly" and discarded. They
do not need to be stored.
On Sunday, March 19, 2023, 09:54:02 p.m. PDT, Eric Zhu ***@***.***> wrote:
Hmm. I am not sure I am getting this correctly. Reading Algorithm
3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
from the first permutation. So let's say we have a set { a, b, c, d, e }
and after hashing and first permutation function, it becomes { 12, 32, 43,
231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
{32, 34, 43} -- I just made these numbers up. So if my understanding is
correct, then you either store the minimum hash values in the end (same as
the classic minhash) or you store the original hash values, which is what
we want to avoid in the first place.
On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
> * need to be stored
>
> Apologies for the typo.
> On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
> wrote:
>
> Given that the hash values can be reproduced deterministically, they
don't
> ned to be stored.
> The hash values can be recomputed lazily - trading compute for space.
> On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***>
wrote:
>
>
>
>
> This is interesting! From my quick read of the abstract it seems
> improvement is in reducing the number of permutation functions, but not
> really reducing the number of hash values stored? Am I missing something?
>
> —
> Reply to this email directly, view it on GitHub, or unsubscribe.
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
> —
> Reply to this email directly, view it on GitHub
> <#203 (comment)
>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA
>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub
<#203 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACOGLTNUDXGUNJ5WJVCRRLW47QO7ANCNFSM6AAAAAAWAHDJKA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
I mean "having access to the values after 1st permutation"
…On Sun, Mar 19, 2023 at 10:09 PM Eric Zhu ***@***.***> wrote:
But how do you compute the minimum value for each permutation function
without having access to the input hash values?
On Sun, Mar 19, 2023 at 10:04 PM icsa ***@***.***> wrote:
> The "Circulant" part means that you can use a random permutation then
> rotate the permutation by k (of K) items to create "new" permutations.
>
> One permutation gets used K times under rotation instead of K separate
> random permutations.
> The rotated permutations can be computed "on the fly" and discarded. They
> do not need to be stored.
> On Sunday, March 19, 2023, 09:54:02 p.m. PDT, Eric Zhu ***@***.***>
> wrote:
>
>
>
> Hmm. I am not sure I am getting this correctly. Reading Algorithm
> 3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
> from the first permutation. So let's say we have a set { a, b, c, d, e }
> and after hashing and first permutation function, it becomes { 12, 32, 43,
> 231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
> {32, 34, 43} -- I just made these numbers up. So if my understanding is
> correct, then you either store the minimum hash values in the end (same as
> the classic minhash) or you store the original hash values, which is what
> we want to avoid in the first place.
>
> On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
>
> > * need to be stored
> >
> > Apologies for the typo.
> > On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
> > wrote:
> >
> > Given that the hash values can be reproduced deterministically, they
> don't
> > ned to be stored.
> > The hash values can be recomputed lazily - trading compute for space.
> > On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***>
> wrote:
> >
> >
> >
> >
> > This is interesting! From my quick read of the abstract it seems
> > improvement is in reducing the number of permutation functions, but not
> > really reducing the number of hash values stored? Am I missing
> something?
> >
> > —
> > Reply to this email directly, view it on GitHub, or unsubscribe.
> > You are receiving this because you authored the thread.Message ID:
> > ***@***.***>
> >
> > —
> > Reply to this email directly, view it on GitHub
> > <#203 (comment)
> >,
> > or unsubscribe
> > <
> https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA
> >
> > .
> > You are receiving this because you commented.Message ID:
> > ***@***.***>
> >
>
> —
> Reply to this email directly, view it on GitHub, or unsubscribe.
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
> —
> Reply to this email directly, view it on GitHub
> <#203 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AACOGLTNUDXGUNJ5WJVCRRLW47QO7ANCNFSM6AAAAAAWAHDJKA>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
|
But how do you compute the minimum value for each permutation function
without having access to the input hash values?
Lazily, perhaps using a Python generator.The hash values would be computed dynamically from the dynamically generated permutation.
From the paper:
Algorithm 2 C-MinHash-(0, π)
Input: Binary data vector v ∈ {0, 1}D , Permutation vector π: [D] → [D]Output: Hash values h1(v), ..., hK (v)
For k = 1 to K
Shift π circulantly rightwards by k units: πk = π→k
hk(v) ← mini:vi̸=0 π→k(i) // hash(k) <- apply the "virtual" permutation to the data and find the first bit set
End For
On Sunday, March 19, 2023, 10:10:50 p.m. PDT, Eric Zhu ***@***.***> wrote:
I mean "having access to the values after 1st permutation"
On Sun, Mar 19, 2023 at 10:09 PM Eric Zhu ***@***.***> wrote:
But how do you compute the minimum value for each permutation function
without having access to the input hash values?
On Sun, Mar 19, 2023 at 10:04 PM icsa ***@***.***> wrote:
> The "Circulant" part means that you can use a random permutation then
> rotate the permutation by k (of K) items to create "new" permutations.
>
> One permutation gets used K times under rotation instead of K separate
> random permutations.
> The rotated permutations can be computed "on the fly" and discarded. They
> do not need to be stored.
> On Sunday, March 19, 2023, 09:54:02 p.m. PDT, Eric Zhu ***@***.***>
> wrote:
>
>
>
> Hmm. I am not sure I am getting this correctly. Reading Algorithm
> 3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
> from the first permutation. So let's say we have a set { a, b, c, d, e }
> and after hashing and first permutation function, it becomes { 12, 32, 43,
> 231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
> {32, 34, 43} -- I just made these numbers up. So if my understanding is
> correct, then you either store the minimum hash values in the end (same as
> the classic minhash) or you store the original hash values, which is what
> we want to avoid in the first place.
>
> On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
>
> > * need to be stored
> >
> > Apologies for the typo.
> > On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
> > wrote:
> >
> > Given that the hash values can be reproduced deterministically, they
> don't
> > ned to be stored.
> > The hash values can be recomputed lazily - trading compute for space.
> > On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***>
> wrote:
> >
> >
> >
> >
> > This is interesting! From my quick read of the abstract it seems
> > improvement is in reducing the number of permutation functions, but not
> > really reducing the number of hash values stored? Am I missing
> something?
> >
> > —
> > Reply to this email directly, view it on GitHub, or unsubscribe.
> > You are receiving this because you authored the thread.Message ID:
> > ***@***.***>
> >
> > —
> > Reply to this email directly, view it on GitHub
> > <#203 (comment)
> >,
> > or unsubscribe
> > <
> https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA
> >
> > .
> > You are receiving this because you commented.Message ID:
> > ***@***.***>
> >
>
> —
> Reply to this email directly, view it on GitHub, or unsubscribe.
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
> —
> Reply to this email directly, view it on GitHub
> <#203 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AACOGLTNUDXGUNJ5WJVCRRLW47QO7ANCNFSM6AAAAAAWAHDJKA>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Sorry I am still not getting this. Can you show me a Python example of lazy
computation?
…On Sun, Mar 19, 2023 at 10:23 PM icsa ***@***.***> wrote:
> But how do you compute the minimum value for each permutation function
> without having access to the input hash values?
Lazily, perhaps using a Python generator.The hash values would be computed
dynamically from the dynamically generated permutation.
From the paper:
Algorithm 2 C-MinHash-(0, π)
Input: Binary data vector v ∈ {0, 1}D , Permutation vector π: [D] →
[D]Output: Hash values h1(v), ..., hK (v)
For k = 1 to K
Shift π circulantly rightwards by k units: πk = π→k
hk(v) ← mini:vi̸=0 π→k(i) // hash(k) <- apply the "virtual" permutation
to the data and find the first bit set
End For
On Sunday, March 19, 2023, 10:10:50 p.m. PDT, Eric Zhu ***@***.***> wrote:
I mean "having access to the values after 1st permutation"
On Sun, Mar 19, 2023 at 10:09 PM Eric Zhu ***@***.***> wrote:
> But how do you compute the minimum value for each permutation function
> without having access to the input hash values?
>
> On Sun, Mar 19, 2023 at 10:04 PM icsa ***@***.***> wrote:
>
>> The "Circulant" part means that you can use a random permutation then
>> rotate the permutation by k (of K) items to create "new" permutations.
>>
>> One permutation gets used K times under rotation instead of K separate
>> random permutations.
>> The rotated permutations can be computed "on the fly" and discarded.
They
>> do not need to be stored.
>> On Sunday, March 19, 2023, 09:54:02 p.m. PDT, Eric Zhu ***@***.***>
>> wrote:
>>
>>
>>
>> Hmm. I am not sure I am getting this correctly. Reading Algorithm
>> 3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
>> from the first permutation. So let's say we have a set { a, b, c, d, e }
>> and after hashing and first permutation function, it becomes { 12, 32,
43,
>> 231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
>> {32, 34, 43} -- I just made these numbers up. So if my understanding is
>> correct, then you either store the minimum hash values in the end (same
as
>> the classic minhash) or you store the original hash values, which is
what
>> we want to avoid in the first place.
>>
>> On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
>>
>> > * need to be stored
>> >
>> > Apologies for the typo.
>> > On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
>> > wrote:
>> >
>> > Given that the hash values can be reproduced deterministically, they
>> don't
>> > ned to be stored.
>> > The hash values can be recomputed lazily - trading compute for space.
>> > On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***>
>> wrote:
>> >
>> >
>> >
>> >
>> > This is interesting! From my quick read of the abstract it seems
>> > improvement is in reducing the number of permutation functions, but
not
>> > really reducing the number of hash values stored? Am I missing
>> something?
>> >
>> > —
>> > Reply to this email directly, view it on GitHub, or unsubscribe.
>> > You are receiving this because you authored the thread.Message ID:
>> > ***@***.***>
>> >
>> > —
>> > Reply to this email directly, view it on GitHub
>> > <
#203 (comment)
>> >,
>> > or unsubscribe
>> > <
>>
https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA
>> >
>> > .
>> > You are receiving this because you commented.Message ID:
>> > ***@***.***>
>> >
>>
>> —
>> Reply to this email directly, view it on GitHub, or unsubscribe.
>> You are receiving this because you authored the thread.Message ID:
>> ***@***.***>
>>
>> —
>> Reply to this email directly, view it on GitHub
>> <#203 (comment)
>,
>> or unsubscribe
>> <
https://github.com/notifications/unsubscribe-auth/AACOGLTNUDXGUNJ5WJVCRRLW47QO7ANCNFSM6AAAAAAWAHDJKA
>
>> .
>> You are receiving this because you commented.Message ID:
>> ***@***.***>
>>
>
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub
<#203 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACOGLRKJTPLE5MCPPBQMPTW47STTANCNFSM6AAAAAAWAHDJKA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Let me work on a python implementation of the C-minHash algorithm to demonstrate the concept in context.
I'll get you the outline of the code in the next few days.
I'm quite busy, atm, using your library to great effect :)
In the mean time, here is a example of lazy computation in python:
def fibonacci():
a, b = 0, 1
while True:
yield a
a, b = b, a + bThe fibonacci values are computed on demand, one at a time, as needed.
Each call to fibonacci() produces one subsequent value.
source: Lazy evaluation in Python
|
|
| |
Lazy evaluation in Python
Jakub Konka
Jakub Konka's weblog about programming, game theory, maths, and more.
|
|
|
On Sunday, March 19, 2023, 11:00:25 p.m. PDT, Eric Zhu ***@***.***> wrote:
Sorry I am still not getting this. Can you show me a Python example of lazy
computation?
On Sun, Mar 19, 2023 at 10:23 PM icsa ***@***.***> wrote:
> But how do you compute the minimum value for each permutation function
> without having access to the input hash values?
Lazily, perhaps using a Python generator.The hash values would be computed
dynamically from the dynamically generated permutation.
From the paper:
Algorithm 2 C-MinHash-(0, π)
Input: Binary data vector v ∈ {0, 1}D , Permutation vector π: [D] →
[D]Output: Hash values h1(v), ..., hK (v)
For k = 1 to K
Shift π circulantly rightwards by k units: πk = π→k
hk(v) ← mini:vi̸=0 π→k(i) // hash(k) <- apply the "virtual" permutation
to the data and find the first bit set
End For
On Sunday, March 19, 2023, 10:10:50 p.m. PDT, Eric Zhu ***@***.***> wrote:
I mean "having access to the values after 1st permutation"
On Sun, Mar 19, 2023 at 10:09 PM Eric Zhu ***@***.***> wrote:
> But how do you compute the minimum value for each permutation function
> without having access to the input hash values?
>
> On Sun, Mar 19, 2023 at 10:04 PM icsa ***@***.***> wrote:
>
>> The "Circulant" part means that you can use a random permutation then
>> rotate the permutation by k (of K) items to create "new" permutations.
>>
>> One permutation gets used K times under rotation instead of K separate
>> random permutations.
>> The rotated permutations can be computed "on the fly" and discarded.
They
>> do not need to be stored.
>> On Sunday, March 19, 2023, 09:54:02 p.m. PDT, Eric Zhu ***@***.***>
>> wrote:
>>
>>
>>
>> Hmm. I am not sure I am getting this correctly. Reading Algorithm
>> 3 C-MinHash-(σ, π), it seems to me that the hash values are still needed
>> from the first permutation. So let's say we have a set { a, b, c, d, e }
>> and after hashing and first permutation function, it becomes { 12, 32,
43,
>> 231, 4 }. After you applying k (e.g., k = 3) circular minhash it becomes
>> {32, 34, 43} -- I just made these numbers up. So if my understanding is
>> correct, then you either store the minimum hash values in the end (same
as
>> the classic minhash) or you store the original hash values, which is
what
>> we want to avoid in the first place.
>>
>> On Sun, Mar 19, 2023 at 11:10 AM icsa ***@***.***> wrote:
>>
>> > * need to be stored
>> >
>> > Apologies for the typo.
>> > On Sunday, March 19, 2023, 10:59:30 a.m. PDT, Kevin Jones ***@***.***>
>> > wrote:
>> >
>> > Given that the hash values can be reproduced deterministically, they
>> don't
>> > ned to be stored.
>> > The hash values can be recomputed lazily - trading compute for space.
>> > On Sunday, March 19, 2023, 10:49:25 a.m. PDT, Eric Zhu ***@***.***>
>> wrote:
>> >
>> >
>> >
>> >
>> > This is interesting! From my quick read of the abstract it seems
>> > improvement is in reducing the number of permutation functions, but
not
>> > really reducing the number of hash values stored? Am I missing
>> something?
>> >
>> > —
>> > Reply to this email directly, view it on GitHub, or unsubscribe.
>> > You are receiving this because you authored the thread.Message ID:
>> > ***@***.***>
>> >
>> > —
>> > Reply to this email directly, view it on GitHub
>> > <
#203 (comment)
>> >,
>> > or unsubscribe
>> > <
>>
https://github.com/notifications/unsubscribe-auth/AACOGLWSK3TTCH2RXFR2QPLW45DZHANCNFSM6AAAAAAWAHDJKA
>> >
>> > .
>> > You are receiving this because you commented.Message ID:
>> > ***@***.***>
>> >
>>
>> —
>> Reply to this email directly, view it on GitHub, or unsubscribe.
>> You are receiving this because you authored the thread.Message ID:
>> ***@***.***>
>>
>> —
>> Reply to this email directly, view it on GitHub
>> <#203 (comment)
>,
>> or unsubscribe
>> <
https://github.com/notifications/unsubscribe-auth/AACOGLTNUDXGUNJ5WJVCRRLW47QO7ANCNFSM6AAAAAAWAHDJKA
>
>> .
>> You are receiving this because you commented.Message ID:
>> ***@***.***>
>>
>
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub
<#203 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACOGLRKJTPLE5MCPPBQMPTW47STTANCNFSM6AAAAAAWAHDJKA>
.
You are receiving this because you commented.Message ID:
***@***.***>
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Looking forward to see what you build! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
C-minHash (or Circulant minHash) needs only two permutations versus K permutations, in practice.
This minHash variant could save significant storage space for large-scale data.
source: https://arxiv.org/abs/2109.03337
The text was updated successfully, but these errors were encountered: