GitHub - Jekelrary/RelationPrompt: This repository implements our ACL Findings 2022 research paper RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction. The goal of Zero-Shot Relation Triplet Extraction (ZeroRTE) is to extract relation triplets of the format (head entity, tail entity, relation), despite not having annotated data for the test relation labels.

RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction

This repository implements our ACL Findings 2022 research paper RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction. The goal of Zero-Shot Relation Triplet Extraction (ZeroRTE) is to extract relation triplets of the format (head entity, tail entity, relation), despite not having annotated data for the test relation labels.

Installation

Python 3.7
If your GPU uses CUDA 11, first install the specific PyTorch: pip install torch==1.10.0 --extra-index-url https://download.pytorch.org/whl/cu113
Install requirements: pip install -r requirements.txt or conda env create --file environment.yml
Download and extract the datasets here to outputs/data/splits/zero_rte
FewRel Pretrained Model (unseen=10, seed=0)
Wiki-ZSL Pretrained Model (unseen=10, seed=0)

Data Exploration |

from wrapper import Dataset

data = Dataset.load(path)
for s in data.sents:
    print(s.tokens)
    for t in s.triplets:
        print(t.head, t.tail, t.label)

Generate with Pretrained Model |

from wrapper import Generator

model = Generator(load_dir="gpt2", save_dir="outputs/wrapper/fewrel/unseen_10_seed_0/generator")
model.generate(labels=["location", "religion"], path_out="synthetic.jsonl")

Extract with Pretrained Model |

from wrapper import Extractor

model = Extractor(load_dir="facebook/bart-base", save_dir="outputs/wrapper/fewrel/unseen_10_seed_0/extractor_final")
model.predict(path_in=path_test, path_out="pred.jsonl")

Model Training |

Train the Generator and Extractor models:

from pathlib import Path
from wrapper import Generator, Extractor

generator = Generator(
    load_dir="gpt2",
    save_dir=str(Path(save_dir) / "generator"),
)
extractor = Extractor(
    load_dir="facebook/bart-base",
    save_dir=str(Path(save_dir) / "extractor"),
)
generator.fit(path_train, path_dev)
extractor.fit(path_train, path_dev)

Generate synthetic data with relation triplets for test labels:

generator.generate(labels_test, path_out=path_synthetic)

Train the final Extractor model using the synthetic data and predict on test sentences:

extractor_final = Extractor(
    load_dir=str(Path(save_dir) / "extractor" / "model"),
    save_dir=str(Path(save_dir) / "extractor_final"),
)
extractor_final.fit(path_synthetic, path_dev)
extractor_final.predict(path_in=path_test, path_out=path_pred)

Experiment Scripts

Run training in wrapper.py (You can change "fewrel" to "wiki" or unseen to 5/10/15 or seed to 0/1/2/3/4):

python wrapper.py main \
--path_train outputs/data/splits/zero_rte/fewrel/unseen_10_seed_0/train.jsonl \                                       
--path_dev outputs/data/splits/zero_rte/fewrel/unseen_10_seed_0/dev.jsonl \                                           
--path_test outputs/data/splits/zero_rte/fewrel/unseen_10_seed_0/test.jsonl \                                         
--save_dir outputs/wrapper/fewrel/unseen_10_seed_0

Run evaluation (Single-triplet setting)

python wrapper.py run_eval \                                                                                               
--path_model outputs/wrapper/fewrel/unseen_10_seed_0/extractor_final \                                                  
--path_test outputs/data/splits/zero_rte/fewrel/unseen_10_seed_0/test.jsonl \
--mode single

Run evaluation (Multi-triplet setting)

python wrapper.py run_eval \                                                                                               
--path_model outputs/wrapper/fewrel/unseen_10_seed_0/extractor_final \                                                  
--path_test outputs/data/splits/zero_rte/fewrel/unseen_10_seed_0/test.jsonl \
--mode multi

Research Citation

If the code is useful for your research project, we appreciate if you cite the following paper:

@inproceedings{chia-etal-2022-relationprompt,
    title = "RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction",
    author = "Chia, Yew Ken and Bing, Lidong and Poria, Soujanya and Si, Luo",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    year = "2022",
    url = "https://arxiv.org/abs/2203.09101",
    doi = "https://doi.org/10.48550/arXiv.2203.09101",
}

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
transformer_base		transformer_base
LICENSE		LICENSE
README.md		README.md
demo.ipynb		demo.ipynb
encoding.py		encoding.py
environment.yml		environment.yml
generation.py		generation.py
modeling.py		modeling.py
requirements.txt		requirements.txt
utils.py		utils.py
wrapper.py		wrapper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction

Installation

Data Exploration |

Generate with Pretrained Model |

Extract with Pretrained Model |

Model Training |

Experiment Scripts

Research Citation

About

Releases

Packages

Languages

License

Jekelrary/RelationPrompt

Folders and files

Latest commit

History

Repository files navigation

RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction

Installation

Data Exploration |

Generate with Pretrained Model |

Extract with Pretrained Model |

Model Training |

Experiment Scripts

Research Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages