All dataset operations performed before fine-tuning were done in a Jupyter notebook, pre_finetune_jupyter.ipynb. In the notebook you can choose the dataset on which the evaluation is run and the threshold range to test.
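The notebook's own threshold code is not reproduced here. As an illustration only, assuming the threshold is a cutoff on a score that decides whether a prediction counts as positive, testing a threshold range could look like the minimal sketch below. `frange`, `f1_at_threshold`, `sweep_thresholds`, and the example data are hypothetical and not taken from pre_finetune_jupyter.ipynb; F1 is used as the selection criterion purely for illustration.

```python
from typing import Dict, List, Sequence


def frange(start: float, stop: float, step: float) -> List[float]:
    """Inclusive float range, rounded to avoid floating-point drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 score when every score >= threshold is predicted positive.

    Hypothetical: assumes a binary label (1 = positive) per score.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep_thresholds(scores: Sequence[float], labels: Sequence[int],
                     start: float, stop: float, step: float) -> Dict[float, float]:
    """Map each threshold in [start, stop] (inclusive) to its F1 score."""
    return {t: f1_at_threshold(scores, labels, t) for t in frange(start, stop, step)}
```

For example, `results = sweep_thresholds([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1], 0.2, 0.8, 0.2)` evaluates thresholds 0.2, 0.4, 0.6 and 0.8, and `max(results, key=results.get)` picks the best-scoring one.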

Fine-tuning is run with the following command:

python train.py --languages nl --output_dir output/1024 --do_train \
  --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
  --distributed_softmax --max_steps 224 \
  --evaluation_strategy no --eval_steps 0 \
  --max_seq_len 264 --warmup_steps 0 --label_names page_id --logging_steps 1 \
  --metric_for_best_model eval_global_mrr --load_best_model_at_end False \
  --save_total_limit 3 --report_to tensorboard --dataloader_num_workers 1 \
  --single_domain --hidden_dropout_prob 0 \
  --learning_rate 0.00001 --weight_decay 0.01 --alpha 1 \
  --gradient_checkpointing False

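--max_steps should equal the number of samples in your fine-tuning dataset. With --per_device_train_batch_size 1 on a single device, each step processes one sample, so this corresponds to a single pass (one epoch) over the data.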
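The arithmetic behind this setting can be sketched with a hypothetical helper (`compute_max_steps` is not part of the repository). It assumes no gradient accumulation and counts a partial final batch as a full step, and shows how the value would change with a larger batch size or more devices.

```python
import math


def compute_max_steps(num_samples: int, per_device_batch_size: int = 1,
                      num_devices: int = 1, epochs: int = 1) -> int:
    """Optimizer steps needed to see every sample `epochs` times.

    Hypothetical helper: assumes no gradient accumulation, and a partial
    final batch still counts as one step (hence the ceiling).
    """
    samples_per_step = per_device_batch_size * num_devices
    return math.ceil(num_samples / samples_per_step) * epochs
```

With the defaults (batch size 1, one device, one epoch), `compute_max_steps(n)` is simply `n`, which matches the rule above.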

In train.py, the variable train_datasets holds the dataset(s) used for fine-tuning.

To fine-tune on the testing/training datasets, use:
train_datasets = [scenario_existing_dataset_creator.create_existing_scenario_dataset()]

To fine-tune on the scenario-specific dataset, use:
train_datasets = [scenario_dataset_creator_xml.create_scenario_dataset_xml()]
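If you switch between these two often, one option is to select the dataset creator by name instead of editing the assignment by hand. This is a minimal sketch; `DATASET_CREATORS` and `build_train_datasets` are hypothetical names, not part of the repository. Only the module and function names come from the assignments listed here, and the selected module is imported lazily with `importlib.import_module`.

```python
import importlib

# Hypothetical registry: short name -> (module, factory function) in this repository.
DATASET_CREATORS = {
    "existing": ("scenario_existing_dataset_creator", "create_existing_scenario_dataset"),
    "scenario_xml": ("scenario_dataset_creator_xml", "create_scenario_dataset_xml"),
}


def build_train_datasets(choice: str) -> list:
    """Return a train_datasets list for the chosen dataset source."""
    if choice not in DATASET_CREATORS:
        raise ValueError(
            f"unknown dataset choice {choice!r}; expected one of {sorted(DATASET_CREATORS)}"
        )
    module_name, factory_name = DATASET_CREATORS[choice]
    module = importlib.import_module(module_name)  # imported only when selected
    return [getattr(module, factory_name)()]
```

Usage inside train.py would then be, for example, `train_datasets = build_train_datasets("scenario_xml")`.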

Validation is performed by scenario_validation_creator.validate_scenario(model, tokenizer). Inside scenario_validation_creator.py you can choose the dataset to validate on.
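The training command selects models by eval_global_mrr. Assuming "mrr" there refers to the standard mean reciprocal rank (the average over queries of 1/rank of the first correct result), the metric can be sketched as follows. `mean_reciprocal_rank` and its inputs (ranked lists of page ids and one gold page id per query) are illustrative assumptions, not the code in validate_scenario.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         gold: Sequence[Hashable]) -> float:
    """Average of 1/rank of the gold item per query (0 if it is not retrieved).

    Illustrative sketch: each ranking is a list of ids ordered best-first,
    and gold holds exactly one correct id per query.
    """
    if len(rankings) != len(gold):
        raise ValueError("need exactly one gold item per ranking")
    if not gold:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, gold):
        for rank, item in enumerate(ranked, start=1):
            if item == target:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(gold)
```

For three queries whose gold item appears at rank 1, rank 2, and not at all, the score is (1 + 0.5 + 0) / 3 = 0.5.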

Repository files:
datasets
LICENSE
README.md
SamenwerkenOT_correct.xml
dataloader.py
format_csv.py
last_chance.py
no_finetune_validation.py
plotter.py
pre_finetune_jupyter.ipynb
requirements.txt
scenario_dataset_creator.py
scenario_dataset_creator_xml.py
scenario_existing_dataset_creator.py
scenario_validation_creator.py
test_scenario.csv
train.py
upload_to_hub.py
xla_spawn.py

Repository: Alias939/master_thesis_code