This folder contains scripts that are a mix of: (a) documentation, (b) internal Semantic Scholar scripts that won't run for anyone outside of AI2, (c) experimental scripts for the S2AND paper, and (d) continuous integration scripts.

If you're not internal to AI2, here are the scripts you will care about:

- `paper_experiments.sh`: A complete list of command line commands to reproduce all of the paper's results
- `sota.py`: Computes the state-of-the-art results table in the paper
- `transfer_experiment_seed_paper.py`: The main script used to run the experiments presented in the paper
- `tutorial.ipynb`: A guide to the S2AND pipeline that's easier to follow than the above two scripts.

Important notes about `transfer_experiment_seed_paper.py`:

- It assumes that the S2AND data is in `<code root path>/data/`. If that's not the case, you'll have to modify the `"main_data_dir"` entry in `data/path_config.json` (see the sketch after this list).
- If you have a small to medium amount of RAM, don't use the `--use_cache` flag. Without the cache, the script will be slower, but it won't try to fit all of the feature data into memory.
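
For reference, here is a minimal sketch of that `path_config.json` edit, assuming it is run from the repository root; the data directory below is only a placeholder for wherever you actually put the S2AND data.

```python
"""Minimal sketch: point S2AND's "main_data_dir" at a custom data location.

Assumes this runs from the repository root; the directory below is a
placeholder, not a real path.
"""
import json

CONFIG_PATH = "data/path_config.json"
CUSTOM_DATA_DIR = "/absolute/path/to/your/s2and_data"  # placeholder

# Load the existing config, change only the "main_data_dir" entry, and write it back.
with open(CONFIG_PATH) as f:
    path_config = json.load(f)

path_config["main_data_dir"] = CUSTOM_DATA_DIR

with open(CONFIG_PATH, "w") as f:
    json.dump(path_config, f, indent=2)
```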

Other scripts in this folder (mostly have `use_cache=True`):

- `blog_post_eval.py`: Computes min edit distance performance numbers that appear only in the blog post.
- `claims_cluster_eval.py`: Evaluates a model on the Semantic Scholar corrections data (data not released)
- `full_model_dump.py`: Trains and dumps to disk a full model trained on all of the datasets (including orcid and augmented, which are not released)
- `get_name_counts.py`: Present as documentation for how the name counts metadata was collected (not runnable because it relies on internal Semantic Scholar data)
- `make_augmentation_dataset_a.py`: First step of creating the augmentation dataset (data not released)
- `make_augmentation_dataset_b.py`: Second step of creating the augmentation dataset (data not released)
- `make_claims_dataset.py`: Creates datasets for evaluating a model on Semantic Scholar corrections data (not runnable because it relies on internal Semantic Scholar data)
- `make_s2and_name_tuples.py`: Creates the name tuples file of known aliases (included as documentation)
- `make_s2and_mini_dataset.py`: S2AND is huge and takes a long time to run. If you want to make a smaller dataset, this script will do it. It skips medline.
- `transfer_experiment_internal.py`: A version of `transfer_experiment_seed_paper.py` for internal S2 use (has two unreleased datasets)
- `transform_all_datasets.py`: Transforms an old format of the datasets into the final one (probably not relevant to you)

Continuous integration scripts:

- `mypy.sh`: Just runs the mypy part of the continuous integration
- `run_ci_locally.sh`: Runs the CI for the repo locally