conda env create --file env.yml
- Galaxy help forum
- Biostars Q&A
- Navigate to the `llama2` directory and then execute `python qlora-train.py`
- Utilizes HuggingFace's Transformers package to download pre-trained LLMs
- QLoRA to drastically reduce the number of trainable parameters (from 2B to 6 million)
- SFT (supervised fine-tuning) to set up the training process (see the training sketch below)
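The actual configuration lives in `qlora-train.py`; the sketch below only illustrates how the Transformers + QLoRA + SFT pieces listed above typically fit together. The base model id, the dataset file `galaxy_qa.json`, the `text` field name, and the LoRA/quantisation hyperparameters are assumptions, and it targets a 2023-era PEFT/TRL API, so argument names may differ between versions.

```python
# Minimal QLoRA + SFT sketch (illustrative, not the repository's qlora-train.py)
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"   # assumed base model id

# Load the frozen base weights in 4-bit so they fit in GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)

# LoRA adapters: only a few million parameters are actually trained
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Assumed JSON dataset with a "text" field holding the formatted Q&A examples
dataset = load_dataset("json", data_files="galaxy_qa.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="qlora-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
)
trainer.train()
trainer.save_model("qlora-out")   # saves the trained LoRA adapters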
- Dataset collection from Galaxy's training material and GitHub pull requests
- https://github.com/uwwint/discourse-scraper/blob/master/extract_documents/create_RAG_docs.ipynb
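The document-collection details are in the `create_RAG_docs.ipynb` notebook linked above. As a rough, hedged illustration of that step, the snippet below walks a local checkout of the Galaxy training material, chunks each markdown file, and writes the chunks to a JSON file; the input path, chunk size, and output file name `rag_docs.json` are all assumptions.

```python
import json
from pathlib import Path

# Walk a local checkout of the training material (path assumed) and turn each
# markdown tutorial into retrievable chunks.
records = []
for md_file in Path("training-material/topics").rglob("*.md"):
    text = md_file.read_text(encoding="utf-8", errors="ignore")
    # Split long tutorials into ~1000-character chunks so retrieval stays focused
    for i in range(0, len(text), 1000):
        records.append({"text": text[i:i + 1000], "source": str(md_file)})

Path("rag_docs.json").write_text(json.dumps(records, indent=2))
print(f"Wrote {len(records)} document chunks")
```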
- RAG pipeline built with Haystack that uses the fine-tuned Llama2 model (see the sketch below)
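The sketch below shows one way to wire such a pipeline with Haystack 2.x components: index the prepared chunks, retrieve with BM25, build a prompt, and generate an answer with the fine-tuned model. The component choices (in-memory store, BM25 retriever, `HuggingFaceLocalGenerator`), the prompt template, and the `rag_docs.json` input are assumptions; the repository's pipeline may use a different Haystack version and wiring.

```python
import json
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

# Index the document chunks prepared in the previous step (assumed file name)
with open("rag_docs.json") as fh:
    records = json.load(fh)
store = InMemoryDocumentStore()
store.write_documents(
    [Document(content=r["text"], meta={"source": r["source"]}) for r in records]
)

template = """Answer the question using only the context below.
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store, top_k=3))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", HuggingFaceLocalGenerator(
    model="anuprulez/fine-tuned-gllm", task="text-generation"))
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

question = "How do I upload a dataset to Galaxy?"
result = pipe.run({"retriever": {"query": question},
                   "prompt_builder": {"question": question}})
print(result["llm"]["replies"][0])
```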
- Save model to HuggingFace Hub: https://github.com/uwwint/discourse-scraper/blob/master/llama2/save_to_hub.ipynb
- Model name: anuprulez/fine-tuned-gllm
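The linked `save_to_hub.ipynb` notebook covers the upload step; the lines below are only a hedged sketch of one common way to do it: load the trained LoRA adapters, merge them into the base weights, and push the merged model and tokenizer to the Hub. The local checkpoint directory `qlora-out` is an assumption, and pushing requires a prior `huggingface-cli login`.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load base model + LoRA adapters from the local checkpoint, then merge the
# adapters into the base weights so the Hub repo holds a standalone model.
model = AutoPeftModelForCausalLM.from_pretrained("qlora-out")   # assumed checkpoint dir
merged = model.merge_and_unload()
merged.push_to_hub("anuprulez/fine-tuned-gllm")

tokenizer = AutoTokenizer.from_pretrained("qlora-out")
tokenizer.push_to_hub("anuprulez/fine-tuned-gllm")
```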