
[Question] What's the corresponding add_items in this project compared to nmslib/hnswlib #13

Open
LizzyMiao opened this issue Jul 4, 2023 · 0 comments


LizzyMiao commented Jul 4, 2023

Hi, I am working on a project that needs millions, sometimes even billions, of vectors inserted to build a graph. I followed the example.py in https://github.com/nmslib/hnswlib/tree/master with about 4000K vectors, using the code below:

p = hnswlib.Index('l2', dim)
print("before build ", datetime.datetime.now())
p.init_index(max_elements = num_elements, ef_construction = 128, M = 16)
p.add_items(vectorNP, ids)
p.save_index("/Users/XXX/Projects/builder/hnsw-embedding-test/python_test/combined.bin")

It took around 2 minutes to finish.

But when I do the same with libhnswlib-jna-x86-64 on 16 cores, using:

      val hnswIndex = new ConcurrentIndex(SpaceName.L2, dimension)
      hnswIndex.initialize(3890521, 16, 128, 42)
      val embeddingRecordsPar = parquet4sReader.toList.par
      embeddingRecordsPar.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(16))
      embeddingRecordsPar.foreach{ eb =>
        val ba = eb.vectors.head
        if (ba.length > 0) {
          val vector = RawEmbedding.toVector(RichByteArray(ba).asByteBuffer, dimension, "float16")
          hnswIndex.addNormalizedItem(vector, i)
          i = i + 1
        }
      }

it takes around 15-16 minutes (the time is the same if I change ConcurrentIndex to Index or use Index.synchronizedIndex). Both snippets were run on my local machine. Is there an equivalent of add_items in hnswlib-jna, or any other way to speed up building the graph?
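
A side note on the Scala snippet: `i = i + 1` runs inside a parallel `foreach`, so `i` is a shared `var` and the increment is not atomic, which means two threads may end up passing the same id. Whether that has anything to do with the slowdown is unclear. Below is a minimal Python sketch of the alternative: number the non-empty vectors up front, then hand batches to a thread pool. `build_parallel`, `chunks` and the `add_batch(ids, vecs)` callback are hypothetical names for illustration, not part of hnswlib or hnswlib-jna. Whether threads actually help depends on how the underlying insert call behaves.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(pairs, size):
    """Yield lists of at most `size` (id, vector) pairs."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_parallel(vectors, add_batch, workers=16, batch_size=1000):
    """Assign ids before going parallel, then insert batch by batch.

    `add_batch(ids, vecs)` is a hypothetical callback standing in for
    whatever insert call the index exposes.
    """
    # Drop empty vectors first, then number the rest, so ids are unique
    # and gap-free without any shared mutable counter.
    numbered = enumerate(v for v in vectors if len(v) > 0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(add_batch, [i for i, _ in chunk], [v for _, v in chunk])
            for chunk in chunks(numbered, batch_size)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
```

The same idea in Scala would be to pair each record with its index before calling `.par` (or to use an atomic counter), so no thread mutates a shared `var`.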
