KeyError #14

Open
Xmy0416 opened this issue Sep 3, 2024 · 2 comments

Comments

@Xmy0416

Xmy0416 commented Sep 3, 2024

I'm trying to generate a KG from my dataset using the Self Canonicalization mode. Partway through the run, the canonicalize function always raises this error:
Traceback (most recent call last):
  File "/edc/run.py", line 118, in
    output_kg = edc.extract_kg(
  File "/edc/edc_framework.py", line 471, in extract_kg
  File "/edc/edc/edc_framework.py", line 261, in schema_canonicalization
    canonicalized_triplet, canon_candidate_dict = schema_canonicalizer.canonicalize(
  File "/edc/edc/schema_canonicalization.py", line 158, in canonicalize
    self.schema_dict[open_relation] = open_relation_definition_dict[open_relation]
KeyError: 'to'
I think this may be because the construction of sd_dict is incomplete? Thank you very much if you can look into it!

@bzhangj13zzz
Collaborator

Thank you for raising the issue and sorry for the late reply.

This may be a bug in the current codebase. I am quite occupied with other projects at the moment, but I will try to set aside some time for this and get back to you ASAP.

@edcuba

edcuba commented Dec 4, 2024

Same problem here. I think this mostly happens when running with smaller models and/or without good few-shot examples:

In self.schema_dict[open_relation] = open_relation_definition_dict[open_relation], you are trying to add a new relation to your schema, as the schema does not contain the relation yet.

The problem is that the relation definition is missing in your schema definition dictionary.

Very likely, the generated schema definition was not correctly formatted or did not contain the definition, and thus did not get parsed into the schema definition dictionary in parse_relation_definition. For example, mistral-7b often separates the relation name from relation description using - instead of :, or adds numbers to the beginning of the line.
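
To illustrate the "improve the parsing logic" option, here is a minimal sketch of a more lenient line parser. It is not the project's parse_relation_definition; the function name parse_definitions_lenient and the assumed input shape (one "name: definition" pair per line) are my own assumptions. It only handles the two failure modes mentioned above: a leading list number or bullet, and " - " used in place of ":".

```python
import re

# Hypothetical helper, not part of the EDC codebase.
# Leading list markers such as "1. ", "2) ", "- " or "* ".
_PREFIX = re.compile(r"^\s*(?:[-*]\s+|\d+[.)]\s*)")
# Either a colon, or a hyphen surrounded by whitespace (so "co-author" stays intact).
_SEPARATOR = re.compile(r"\s*:\s*|\s+-\s+")


def parse_definitions_lenient(text):
    """Parse 'relation: definition' lines, tolerating numbering and ' - ' separators."""
    definitions = {}
    for raw_line in text.splitlines():
        line = _PREFIX.sub("", raw_line).strip()
        if not line:
            continue
        # Split only at the first separator so the definition may contain ':' or ' - '.
        parts = _SEPARATOR.split(line, maxsplit=1)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue  # Not a recognizable definition line; skip it.
        name, definition = parts
        definitions[name.strip()] = definition.strip()
    return definitions


sample_output = (
    "1. born in - The subject was born in the place given by the object.\n"
    "to: The subject is directed toward the object.\n"
    "- co-author: The subject wrote a work together with the object.\n"
    "not a definition line\n"
)
parsed_definitions = parse_definitions_lenient(sample_output)
```

Lines that still cannot be split are silently dropped here, so a relation can still end up without a definition; the parser reduces the problem but does not remove the need for a guard at the lookup site.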

The proper fix would be to ensure that the schema definition step produces correct and complete output: add more few-shot examples (making sure they follow the required format), use a bigger model, and/or improve the parsing logic.

For a hot-fix, add a check around line 155 of edc/schema_canonicalization.py:

if canonicalized_triplet is None:
    # Cannot be canonicalized
    if enrich and open_relation in open_relation_definition_dict:  # <-- check that the definition exists
        self.schema_dict[open_relation] = open_relation_definition_dict[open_relation]
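
The same guard can be tried outside the framework. The sketch below uses plain dictionaries as stand-ins for self.schema_dict and open_relation_definition_dict; the function enrich_schema and the missing list are hypothetical names added for illustration, not part of the codebase. Besides skipping relations that have no parsed definition, it records them so you can see how often the schema definition step is failing.

```python
def enrich_schema(schema_dict, open_relation, open_relation_definition_dict, missing):
    """Add open_relation to schema_dict only if a definition was parsed for it.

    Returns True if the schema was enriched, False if the relation was skipped.
    """
    definition = open_relation_definition_dict.get(open_relation)
    if definition is None:
        # An unguarded lookup would raise KeyError here, as in the traceback above.
        missing.append(open_relation)
        return False
    schema_dict[open_relation] = definition
    return True


# Stand-in data: 'to' was extracted as a relation but never got a parsed definition.
schema = {"born in": "The subject was born in the place given by the object."}
parsed = {"works at": "The subject is employed by the object."}
missing = []

for relation in ["works at", "to"]:
    enrich_schema(schema, relation, parsed, missing)
```

After the loop, missing holds the relations that were skipped, which makes it easier to tell whether the hot-fix is hiding an occasional glitch or a systematic formatting problem in the model output.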
