Alias939/master_thesis_code

All dataset operations before the finetuning process are done in a Jupyter notebook, pre_finetune_jupyter.ipynb. Inside the notebook you can choose the dataset the evaluation is run on and the threshold range to test.
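
A minimal sketch of what such a configuration cell might look like; the variable names DATASET and THRESHOLDS are hypothetical, and the actual names in pre_finetune_jupyter.ipynb may differ:

```python
import numpy as np

# Hypothetical configuration cell; adjust names to match the notebook.
DATASET = "scenario_specific"            # which dataset to evaluate on
THRESHOLDS = np.arange(0.5, 0.95, 0.05)  # threshold range to test

for threshold in THRESHOLDS:
    # run the notebook's evaluation for this dataset/threshold pair
    ...
```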

The finetuning process is run with the following command:

```
train.py --languages nl --output_dir output/1024 --do_train \
  --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
  --distributed_softmax --max_steps 224 --evaluation_strategy no --eval_steps 0 \
  --max_seq_len 264 --warmup_steps 0 --label_names page_id --logging_steps 1 \
  --metric_for_best_model eval_global_mrr --load_best_model_at_end False \
  --save_total_limit 3 --report_to tensorboard --dataloader_num_workers 1 \
  --single_domain --hidden_dropout_prob 0 --learning_rate 0.00001 \
  --weight_decay 0.01 --alpha 1 --gradient_checkpointing False
```

--max_steps should equal the number of samples in your finetuning dataset (the command above uses a per-device batch size of 1, so each training step consumes one sample).
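
Since train.py builds its samples from the dataset creators, the right value can be read off programmatically. A minimal sketch, assuming the dataset objects returned by the creator support len() (the creator call is the repository's own; the printout is illustrative):

```python
import scenario_existing_dataset_creator  # repository module used by train.py

# With --per_device_train_batch_size 1, one step consumes one sample,
# so --max_steps should be the total number of samples.
train_datasets = [scenario_existing_dataset_creator.create_existing_scenario_dataset()]
num_samples = sum(len(dataset) for dataset in train_datasets)
print(f"use --max_steps {num_samples}")
```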

In train.py, the variable train_datasets selects the data used for finetuning. Use scenario_existing_dataset_creator.create_existing_scenario_dataset() to finetune on the testing/training datasets, or scenario_dataset_creator_xml.create_scenario_dataset_xml() to finetune on the scenario-specific dataset, as in the sketch below.
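
The two assignments as they appear in train.py; comment in the one that matches your run (only these two assignments are from the repository, the surrounding file provides the imports):

```python
# Inside train.py: pick exactly one of the two dataset sources.

# Finetune on the testing/training datasets:
train_datasets = [scenario_existing_dataset_creator.create_existing_scenario_dataset()]

# ...or finetune on the scenario-specific dataset:
# train_datasets = [scenario_dataset_creator_xml.create_scenario_dataset_xml()]
```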

Validation is done by calling scenario_validation_creator.validate_scenario(model, tokenizer). In that file you can choose the dataset you wish to validate on.
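
A minimal sketch of a validation run, assuming the finetuned checkpoint was saved to output/1024 (the --output_dir from the command above) and can be loaded with the Hugging Face Auto classes; the exact model class used by train.py may differ:

```python
from transformers import AutoModel, AutoTokenizer

import scenario_validation_creator  # repository module

# Load the finetuned checkpoint written by train.py (--output_dir output/1024).
model = AutoModel.from_pretrained("output/1024")
tokenizer = AutoTokenizer.from_pretrained("output/1024")

# Run validation; the dataset to validate on is chosen inside
# scenario_validation_creator.
scenario_validation_creator.validate_scenario(model, tokenizer)
```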
