
[Issue]: ZeroDivisionError: Weights sum to zero, can't be normalized #619

Closed
prasantpoudel opened this issue Jul 19, 2024 · 9 comments
Labels
community_support Issue handled by community members

Comments

@prasantpoudel

Describe the issue

When I run a query with the local method, I get the error ZeroDivisionError: Weights sum to zero, can't be normalized. The global method works correctly. If anyone has an idea, please share the solution.

python3 -m graphrag.query \
--root ./ragtest \
--method local \
"Who is Scrooge, and what are his main relationships?"


INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=19', 'type': "openai_chat", 'model': 'mistral:7b', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 10}
creating embedding llm client with {'api_key': 'REDACTED,len=19', 'type': "openai_embedding", 'model': 'nomic-embed-text', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 10}
Error embedding chunk {'OpenAIEmbedding': "Error code: 400 - {'error': {'message': 'invalid input type', 'type': 'api_error', 'param': None, 'code': None}}"}
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/__main__.py", line 75, in <module>
    run_local_search(
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/structured_search/local_search/search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/structured_search/local_search/mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/context_builder/entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/vector_stores/lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/context_builder/entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
                            ^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/graphrag/query/llm/oai/embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
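The last two frames show why the error surfaces here: the embedding call fails with the 400 "invalid input type" error above, so the per-chunk weights passed to `np.average` end up summing to zero. A minimal sketch of that numpy behavior, using hypothetical values standing in for graphrag's `chunk_embeddings` and `chunk_lens`:

```python
import numpy as np

# Hypothetical stand-ins for graphrag's chunk_embeddings / chunk_lens:
# every chunk failed to embed, so every weight is zero.
chunk_embeddings = np.zeros((2, 3))
chunk_lens = [0, 0]

try:
    np.average(chunk_embeddings, axis=0, weights=chunk_lens)
except ZeroDivisionError as exc:
    print(exc)  # Weights sum to zero, can't be normalized
```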

Steps to reproduce

No response

GraphRAG Config Used

No response

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
@prasantpoudel added the triage label Jul 19, 2024
@Nuclear6

Something likely went wrong during your indexing phase. Check the logs from the indexing run.

@sebnapi

sebnapi commented Jul 19, 2024

Yes, this is due to your locally run embedding model not returning the embeddings in the correct format. OpenAI internally uses base64-encoded floats, while most other models return floats as plain numbers.

I've hacked the encoding_format into this piece of code to make local search work:

def map_query_to_entities(
    query: str,
    text_embedding_vectorstore: BaseVectorStore,
    text_embedder: BaseTextEmbedding,
    all_entities: list[Entity],
    embedding_vectorstore_key: str = EntityVectorStoreKey.ID,
    include_entity_names: list[str] | None = None,
    exclude_entity_names: list[str] | None = None,
    k: int = 10,
    oversample_scaler: int = 2,
) -> list[Entity]:
    """Extract entities that match a given query using semantic similarity of text embeddings of query and entity descriptions."""
    if include_entity_names is None:
        include_entity_names = []
    if exclude_entity_names is None:
        exclude_entity_names = []
    matched_entities = []
    if query != "":
        # get entities with highest semantic similarity to query
        # oversample to account for excluded entities
        search_results = text_embedding_vectorstore.similarity_search_by_text(
            text=query,
            text_embedder=lambda t: text_embedder.embed(t, encoding_format="float"), # added to make embedding api work, openai uses base64 by default
            k=k * oversample_scaler,
        )
        for result in search_results:
            matched = get_entity_by_key(
                entities=all_entities,
                key=embedding_vectorstore_key,
                value=result.document.id,
            )
            if matched:
                matched_entities.append(matched)
    else:
        all_entities.sort(key=lambda x: x.rank if x.rank else 0, reverse=True)
        matched_entities = all_entities[:k]

    # filter out excluded entities
    if exclude_entity_names:
        matched_entities = [
            entity
            for entity in matched_entities
            if entity.title not in exclude_entity_names
        ]

    # add entities in the include_entity list
    included_entities = []
    for entity_name in include_entity_names:
        included_entities.extend(get_entity_by_name(all_entities, entity_name))
    return included_entities + matched_entities
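If the backend still returns nothing usable, the crash itself can also be avoided with a defensive guard around the averaging step in `embedding.py`. A minimal sketch, not graphrag's actual code; the helper name `average_chunk_embeddings` and the 1536-dimension default are assumptions:

```python
import numpy as np

def average_chunk_embeddings(chunk_embeddings, chunk_lens, dim=1536):
    # Guard: if every chunk failed to embed, chunk_lens sums to zero and
    # np.average would raise ZeroDivisionError; return a zero vector instead.
    if not chunk_embeddings or sum(chunk_lens) == 0:
        return [0.0] * dim
    combined = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
    combined = combined / np.linalg.norm(combined)  # unit-normalize the average
    return combined.tolist()
```

A zero vector makes the failure visible as "no similar entities found" rather than a traceback; the underlying embedding misconfiguration still needs fixing.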

@Anthonyfhd

Yes, this is due to your locally run embedding model not returning the embeddings in the correct format. OpenAI internally uses base64-encoded floats, while most other models return floats as plain numbers.

I applied the encoding_format change from the code above to make local search work, but it still does not seem to work for me after the change.

@johmyzhang

johmyzhang commented Jul 20, 2024

It's because you're using a local model.
If you are using Ollama (just like me), you might find this answer helpful:
#345 (comment)
Works perfectly for me...

@natoverse
Collaborator

Consolidating alternate model issues here: #657

@natoverse closed this as not planned Jul 22, 2024
@natoverse added the community_support label and removed the triage label Jul 22, 2024
@AymaneHan1

Hello, I'm currently running into the same problem. I am using an Azure OpenAI instance:

text_embedder = OpenAIEmbedding(
    api_key=api_key,
    deployment_name="ada-small-emb-graphrag",
    model="text-embedding-ada-002",
    api_base="https://xxx-oai.openai.azure.com/",
)

text_embedder.embed("hello world")

This returns the error:
ZeroDivisionError: Weights sum to zero, can't be normalized
I have added the float encoding in the source code, but it still does not work:

text_embedder=lambda t: text_embedder.embed(t, encoding_format="float")

Any ideas why it's still not working? Thanks

@dantenull

dantenull commented Oct 13, 2024

I also encountered this. Since I am not connecting to OpenAI, I checked the api_base and api_key, and there was no problem with them.

@ArwaALyahyai


Where should I place this code?

@lawyinking

This may be caused by an invalid api_key.
