Question answering using pretrained models based on Xapian and XLNet
For anyone who wants to try it, I suggest using Docker.
For the CPU version, pull the image:
docker pull zhupengjia/simple-qa:cpu
For the GPU version, pull the image:
docker pull zhupengjia/simple-qa:gpu
Download the pretrained model from the following link to your directory: !AnzH-f0hZoPctxAyoLAyA-b0ab6A?e=F2ks1i
Then decompress it:
tar xzvf squad2_xlnet.tar
Make sure your data is in one of the following formats: .pdf, .txt, .gzip, .bzip2
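As a quick sanity check before mounting the directory, a small Python sketch like the one below can list which files match the extensions named above. The function name `split_by_support` is hypothetical, and matching on file suffix is an assumption; how the tool itself detects file types is not described here.

```python
from pathlib import Path

# Extensions copied from the format list above. Matching on the file suffix
# is an assumption, not necessarily how the tool detects formats.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".gzip", ".bzip2"}


def split_by_support(directory):
    """Return (supported, unsupported) lists of file names found in `directory`."""
    supported, unsupported = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            # Skip subdirectories such as the decompressed model checkpoint.
            continue
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

Calling `split_by_support("YOURDIRECTORY")` returns two lists, so any file that ends up in the second list can be converted or removed before starting the container.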
Then try running it manually. This assumes your directory contains both the model and the data:
docker run -d -v YOURDIRECTORY:/opt/chatbot/data --name simple_qa zhupengjia/simple-qa:cpu tail -f /dev/null
For the GPU version, make sure nvidia-container-runtime is installed and that your /etc/docker/daemon.json file contains the following entry:
"runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } }
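Note that the entry above is a fragment: it has to sit inside the top-level JSON object of daemon.json. The following Python sketch checks whether a daemon.json text declares an "nvidia" runtime with a path. The function name `has_nvidia_runtime` is hypothetical, and the check only inspects the JSON; it does not verify that the runtime binary exists.

```python
import json


def has_nvidia_runtime(daemon_json_text):
    """Return True if the daemon.json text declares an "nvidia" runtime with a path."""
    config = json.loads(daemon_json_text)
    if not isinstance(config, dict):
        # A valid daemon.json is a JSON object at the top level.
        return False
    runtime = config.get("runtimes", {}).get("nvidia")
    return isinstance(runtime, dict) and bool(runtime.get("path"))
```

For example, `has_nvidia_runtime(open("/etc/docker/daemon.json").read())` can be run on the host before creating the GPU container. Docker has to be restarted after editing daemon.json for the change to take effect.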
Then create the container:
docker run -d --runtime nvidia -v YOURDIRECTORY:/opt/chatbot/data --name simple_qa zhupengjia/simple-qa:gpu tail -f /dev/null
Then open a shell in the container and run the model manually:
docker exec -it simple_qa bash
python3 -m data/checkpoint-7900 --returnrelate --backend shell data/sample.txt
Have fun
If you want to run a RESTful API:
cd docker
Modify docker-compose.yml (CPU) or docker-compose-gpu.yml (GPU) to set the image, environment, and volumes.
Then start the CPU version:
docker-compose up -d
Or start the GPU version:
docker-compose -f docker-compose-gpu.yml up -d
If you want to train the model yourself:
cd ANYDIRECTORY
git clone
cd transformers/examples
python --do_lower_case --version_2_with_negative \
  --model_type xlnet --model_name_or_path xlnet-large-cased \
  --do_train --do_eval \
  --train_file data/train-v2.0.json --predict_file data/dev-v2.0.json \
  --learning_rate 3e-6 --num_train_epochs 12 \
  --max_seq_length 384 --doc_stride 128 \
  --output_dir ./finetuned_squad_xlnet \
  --per_gpu_eval_batch_size 2 --per_gpu_train_batch_size 2 \
  --save_steps 100 --fp16 --gradient_accumulation_steps 100 \
  --overwrite_output_dir
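The small per-GPU batch size in the command above is offset by gradient accumulation. As a rough illustration, assuming the training script accumulates gradients in the usual way (one optimizer update after every `gradient_accumulation_steps` batches), the number of examples per update can be estimated as follows. The function name `effective_batch_size` and the `n_gpus` parameter are hypothetical.

```python
def effective_batch_size(per_gpu_batch_size, accumulation_steps, n_gpus=1):
    """Estimate how many examples contribute to one optimizer update."""
    # Each GPU processes `per_gpu_batch_size` examples per forward/backward pass,
    # and gradients are summed over `accumulation_steps` passes before an update.
    return per_gpu_batch_size * max(1, n_gpus) * accumulation_steps


# Values from the training command: batch size 2, 100 accumulation steps, one GPU.
print(effective_batch_size(2, 100))  # 200
```

If you lower `--gradient_accumulation_steps` to speed up training, consider raising `--per_gpu_train_batch_size` (memory permitting) to keep this product roughly constant.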
If you want to run on a local machine:
Make sure python-xapian is installed. If you want to parse PDF files, also make sure poppler-utils is installed.
python -m MODELPATH --returnrelated --scorelimit 0.2 textfile_path
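The prerequisites for a local run can be checked with a short Python sketch like the one below. It assumes that python-xapian exposes the `xapian` module and that the `pdftotext` binary shipped with poppler-utils is what PDF parsing needs; the latter is an assumption about this project. The function name `check_local_dependencies` is hypothetical.

```python
import importlib.util
import shutil


def check_local_dependencies():
    """Return a dict mapping each prerequisite to whether it was found."""
    return {
        # python-xapian installs a top-level module named `xapian`.
        "python-xapian": importlib.util.find_spec("xapian") is not None,
        # poppler-utils ships `pdftotext`; assumed here to be the PDF parser used.
        "poppler-utils (pdftotext)": shutil.which("pdftotext") is not None,
    }
```

Printing the result of `check_local_dependencies()` shows at a glance which package still needs to be installed. A missing poppler-utils only matters if your data includes PDF files.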