Original Repo: https://github.com/INK-USC/NExT
Installation Instructions
1. `sh create_data_dirs.sh`
2. `pip install -r rqs.txt`
   - If on Linux, you must download pytorch with CUDA 10.1 compatibility
   - For more instructions, check here
3. `python -m spacy download en_core_web_sm`
4. modify the nltk source code: in `nltk/parse/chart.py`, line 680, in the function `parse`, change `for edge in self.select(start=0, end=self._num_leaves,lhs=root):` to `for edge in self.select(start=0, end=self._num_leaves):` (a patch sketch is shown after this list)
5. place TACRED's train.json, dev.json, and test.json into the `data` folder
6. run through `Prepare Tacred Data.ipynb` to prepare the TACRED data
7. `cd training`
8. `python pre_train_find_module.py --build_data` (defaults achieve 90% f1)
9. `python train_next_classifier.py --build_data --experiment_name="insertAnyTextHere"` (defaults achieve 42.4% f1)
   - there are several available params you can set from the command line
   - builds data for both strict and soft labeling, but only uses the strict data
   - data pre-processing will take some time, due to tuning of the parser
   - Note: match_batch_size == batch_size in the bilstm case
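The nltk edit in step 4 can be made by hand, or with a small patch script like the sketch below. This script is not part of the repo; it assumes the installed nltk still contains the original line (up to whitespace) and that the file is writable.

```python
"""Convenience sketch for README step 4: rewrite the affected line in
nltk/parse/chart.py in place. Not part of the repo; adjust as needed."""
import nltk.parse.chart as chart_module

OLD = "for edge in self.select(start=0, end=self._num_leaves,lhs=root):"
NEW = "for edge in self.select(start=0, end=self._num_leaves):"

path = chart_module.__file__
with open(path) as f:
    lines = f.readlines()

patched = False
for i, line in enumerate(lines):
    # Compare with whitespace stripped, since spacing differs across nltk versions.
    if "".join(line.split()) == "".join(OLD.split()):
        indent = line[: len(line) - len(line.lstrip())]
        lines[i] = indent + NEW + "\n"
        patched = True
        break

if patched:
    with open(path, "w") as f:
        f.writelines(lines)
    print(f"Patched {path}")
else:
    print(f"Line not found in {path}; apply the edit from step 4 by hand.")
```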
For both steps 8 and 9, subsequent runs on the same dataset do not need the `--build_data` flag: data that has already been computed is stored to disk and does not need to be rebuilt.
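As a rough illustration of this behavior (not the repo's actual code; the cache path and `preprocess` stub below are hypothetical), the pattern is: build and pickle the pre-processed data when `--build_data` is passed or no cached copy exists, otherwise load the cached copy from disk.

```python
# Illustrative sketch of the --build_data pattern described above.
import argparse
import os
import pickle


def preprocess():
    """Stand-in for the repo's expensive data pre-processing step."""
    return {"examples": []}


def load_or_build(build_data: bool, cache_path: str = "data/preprocessed.pkl"):
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    if build_data or not os.path.exists(cache_path):
        dataset = preprocess()
        with open(cache_path, "wb") as f:
            pickle.dump(dataset, f)   # store to disk for later runs
    else:
        with open(cache_path, "rb") as f:
            dataset = pickle.load(f)  # reuse previously built data
    return dataset


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--build_data", action="store_true")
    args = parser.parse_args()
    load_or_build(args.build_data)
```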
Directory Descriptions:
CCG_new : everything to do with parsing of explanations and creation of strict and soft labeling functions
- main file: CCG_new/parser.py
models : model files
tests : test files; the tests are a good place to understand many of the functions, though train_util_functions.py doesn't have tests around it yet. To run a test, run `pytest test_file_name.py`, e.g. `pytest ccg_util_test.py`. Some tests might not pass because the data they need does not exist; read the comments in those tests to understand how to build the needed data.
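If you would rather have data-dependent tests report as skips instead of failures, one common pytest pattern is to guard the test with `pytest.mark.skipif` on the data path. This is only a sketch of that pattern; the repo's tests may simply fail, and the data path and test name below are hypothetical.

```python
# Illustrative only: skip a test when its pre-built data is missing.
import os
import pytest

DATA_FILE = "data/preprocessed.pkl"  # hypothetical pre-built artifact

@pytest.mark.skipif(
    not os.path.exists(DATA_FILE),
    reason="Pre-built data missing; see the build steps in the test's comments.",
)
def test_prebuilt_data_exists():
    # Replace with real assertions once the data has been built.
    assert os.path.getsize(DATA_FILE) > 0
```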
training : all code to do with training of models