Open
Description
I have a very large list of sequences for which a MinHash created with the scaled parameter returns no hashes at all, even when scaled < len(sequence).
Example sequences:
ATGGTTGGGATCGACGGACCGTAAATATCGGCATCGAGAACCCCGACCTTTGCGCCTTCAGCCGCTAACGCCAGCGCCAGGTTTACCGCCGTGGACGATTTCCCCACCCC
ATTAGTAAAAAACATGAGCATGGCCTGGCAAAATGTACTGTATATCGTGGCCGCGATATTAGTAATCATGCTGTGCGTCTTTACGCTGATCATTCGCGGTAAAGCCAAAAGCGA
Minimal code to reproduce:
from sourmash import MinHash

# `sequence` is one of the example sequences above
mh = MinHash(n=0, ksize=40, scaled=100)
mh.add_sequence(sequence, force=False)
print(mh.hashes.keys())
This prints:
KeysView({})
Whereas
mh = MinHash(n=100, ksize=40)
works just fine.
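For context, here is a rough back-of-the-envelope sketch of how many hashes a scaled sketch might be expected to keep for a sequence this short. It rests on an assumption not confirmed in this report: that scaled=N retains each distinct k-mer hash with probability of about 1/N, independently of the others. The helper names below are hypothetical and are not part of sourmash.

```python
def num_kmers(seq_len: int, ksize: int) -> int:
    """Number of overlapping k-mers in a sequence of length seq_len."""
    return max(seq_len - ksize + 1, 0)


def expected_retained(seq_len: int, ksize: int, scaled: int) -> float:
    """Expected number of hashes kept.

    ASSUMPTION: each k-mer hash is kept with probability 1/scaled.
    """
    return num_kmers(seq_len, ksize) / scaled


def prob_empty(seq_len: int, ksize: int, scaled: int) -> float:
    """Probability that no hash is kept, under the same assumption
    and treating the k-mers as distinct and independent."""
    return (1 - 1 / scaled) ** num_kmers(seq_len, ksize)


if __name__ == "__main__":
    # The first example sequence is 110 bases long; ksize and scaled
    # are the values from the report.
    seq_len, ksize, scaled = 110, 40, 100
    print("k-mers:", num_kmers(seq_len, ksize))
    print("expected hashes kept:", expected_retained(seq_len, ksize, scaled))
    print("chance of an empty sketch:", prob_empty(seq_len, ksize, scaled))
```

If that assumption holds, the relevant comparison is between scaled and the number of k-mers (len(sequence) - ksize + 1), not len(sequence) itself, so an empty result for a single short sequence would not be surprising.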