Dear @ekzhu
We have started using v1.6.5 and plan to use Redis as the storage layer in production. Thank you for this feature: "MinHash LSH supports using Redis as the storage layer for handling large index and providing optional persistence as part of a production environment."
In real usage, we would also like to use Redis or similar middleware to store the MinHash objects themselves.
As the MinHashLSH documentation example below shows, after the LSH query we still need the MinHash objects to estimate Jaccard similarity and sort the results.
So we hope to store the MinHash objects in Redis or another NoSQL database as well.
However, the MinHash documentation says that since version 1.1.1, MinHash only supports serialization using pickle. Does this mean we can only store a MinHash with Python pickle and cannot use Redis? If so, could datasketch let MinHash support Redis or another database as storage?
Thanks very much in advance!!
from datasketch import MinHash, MinHashLSH
import numpy as np
# Generate 100 random MinHashes.
minhashes = MinHash.bulk(
    np.random.randint(low=0, high=30, size=(100, 10)),
    num_perm=128
)
# Create LSH index.
lsh = MinHashLSH(threshold=0.5, num_perm=128)
for i, m in enumerate(minhashes):
    lsh.insert(i, m)
# Get the initial results from LSH.
query = minhashes[0]
results = lsh.query(query)
# Rank results using Jaccard similarity estimated by MinHash.
results = [(query.jaccard(minhashes[key]), key) for key in results]
results.sort(reverse=True)
print(results)
rocke2020 changed the title from "Could MinHash supports using Redis as the storage?" to "Could MinHash supports using Redis or other database as the storage?" on Dec 11, 2024
I believe you don't need to use pickle if your MinHash is already stored inside Redis. You can create a new LSH index with the same basename in the storage_config for Redis.
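As a rough sketch of what reusing a basename could look like: the configuration below follows the Redis `storage_config` layout from the datasketch documentation. The basename `b"my_lsh_index"`, the host and port, and the helper `open_index` are placeholders made up for illustration, not values from this thread.

```python
# Assumed connection details; replace them with your own Redis host and port.
STORAGE_CONFIG = {
    "type": "redis",
    # Hypothetical basename. Reusing the same value later should point the
    # new MinHashLSH object at the keys written by the earlier one.
    "basename": b"my_lsh_index",
    "redis": {"host": "localhost", "port": 6379},
}


def open_index(threshold=0.5, num_perm=128):
    """Create a MinHashLSH object backed by the Redis storage above."""
    # Imported lazily so the configuration can be inspected without
    # datasketch or a running Redis server.
    from datasketch import MinHashLSH

    return MinHashLSH(
        threshold=threshold,
        num_perm=num_perm,
        storage_config=STORAGE_CONFIG,
    )
```

Calling `open_index()` in a second process with the same `threshold` and `num_perm` should then query the existing index instead of rebuilding it, assuming the Redis data is still there.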
https://ekzhu.com/datasketch/documentation.html#datasketch.MinHash
https://ekzhu.com/datasketch/documentation.html#datasketch.MinHashLSH
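One possible way to keep the signatures themselves in a key-value store without pickle is to store only the seed and the hash values as bytes. The sketch below is a standard-library illustration, not a datasketch feature: the byte layout and the helper names `pack_minhash`, `unpack_minhash`, and `estimate_jaccard` are assumptions, and a plain dict stands in for Redis.

```python
import struct

HEADER = "<qI"  # assumed layout: signed 64-bit seed, unsigned 32-bit count


def pack_minhash(seed, hashvalues):
    """Encode a seed and unsigned 64-bit hash values as little-endian bytes."""
    values = [int(v) for v in hashvalues]
    return struct.pack(f"{HEADER}{len(values)}Q", seed, len(values), *values)


def unpack_minhash(blob):
    """Decode bytes written by pack_minhash back into (seed, values)."""
    seed, count = struct.unpack_from(HEADER, blob, 0)
    values = struct.unpack_from(f"<{count}Q", blob, struct.calcsize(HEADER))
    return seed, list(values)


def estimate_jaccard(a, b):
    """Fraction of positions where two equal-length signatures agree."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# A dict stands in for the store here; with redis-py the equivalent calls
# would be r.set(key, blob) and r.get(key).
store = {}
store["minhash:0"] = pack_minhash(1, [3, 7, 11, 2**64 - 1])
seed, values = unpack_minhash(store["minhash:0"])
```

With datasketch, the inputs would come from `m.seed` and `m.hashvalues`, and a sketch could be rebuilt with `MinHash(num_perm=len(values), seed=seed, hashvalues=values)`, after which `jaccard` can be used to rank the LSH results as in the documentation example.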