`experimental-code`: Draft code; not actually used.
`rl-starter-files`: Cloned from the original repository and modified to include a GPT interface. I also wrote a GPT-based reward-shaping function that asks GPT for a shaped reward.
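As a rough illustration only (not the exact interface in this repo), a GPT reward-shaping query can look like the sketch below. It assumes the official `openai` Python client and an `OPENAI_API_KEY` in the environment; the model name, prompt wording, and parsing are placeholders.

```python
# Hypothetical sketch of a GPT-based reward-shaping query; the actual
# interface in rl-starter-files may differ (model, prompt, parsing).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gpt_shaped_reward(mission: str, observation_text: str, action: str) -> float:
    """Ask GPT to score how helpful `action` is for completing `mission`."""
    prompt = (
        f"Mission: {mission}\n"
        f"Observation: {observation_text}\n"
        f"Proposed action: {action}\n"
        "On a scale from -1 to 1, how helpful is this action? "
        "Reply with a single number."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; swap for whatever the repo uses
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    try:
        return float(response.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # fall back to no shaping if the reply is not a number
```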
Setting up the repository: After creating your conda environment:
cd rl-starter-files
pip install -r requirements.txt
Right now, my training command is:
cd rl-starter-files/
python -m scripts.train --algo ppo --env BabyAI-GoToImpUnlock-v0 --model GoToImpUnlock0.0005Ask --text --save-interval 10 --frames 250000 --gpt
The problem is that even an ask probability of 0.0005 is still very slow; training takes a really long time.
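For context, here is a minimal sketch of how an ask probability can gate the GPT calls so that only a small fraction of steps pay the cost of a query. This is purely illustrative; the names are not the repo's actual API, and the real logic lives in the modified `train.py` and `utils` code described below.

```python
# Hypothetical sketch: with probability ASK_PROB, add a GPT-suggested bonus to
# the environment reward; otherwise pass the plain reward through unchanged.
import random
from typing import Callable

ASK_PROB = 0.0005  # matches the 0.0005 ask probability used above

def shaped_step_reward(env_reward: float, prompt: str,
                       ask_gpt: Callable[[str], float]) -> float:
    """Occasionally query GPT for a shaping bonus; most steps skip the call."""
    if random.random() < ASK_PROB:
        return env_reward + ask_gpt(prompt)
    return env_reward
```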
Basic:
- PPO, A2C only

Exploration(?):
- RND: https://opendilab.github.io/DI-engine/12_policies/rnd.html
- BeBold, NovelD: https://github.com/tianjunz/NovelD
- DEIR
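For reference, a minimal PyTorch sketch of the RND bonus linked above: the intrinsic reward is the prediction error of a trained network against a fixed random target network. The observation dimension and network sizes are placeholders; the DI-engine, NovelD, and DEIR implementations are more involved.

```python
# Minimal RND (Random Network Distillation) sketch: novel states produce large
# predictor error against a frozen random target, which is used as a bonus.
import torch
import torch.nn as nn

def make_net(obs_dim: int, out_dim: int = 64) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

obs_dim = 147                  # e.g. a flattened 7x7x3 MiniGrid observation
target = make_net(obs_dim)     # fixed, randomly initialised network
predictor = make_net(obs_dim)  # trained to imitate the target
for p in target.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Per-state novelty bonus; also takes one gradient step on the predictor."""
    error = (predictor(obs) - target(obs)).pow(2).mean(dim=-1)
    optimizer.zero_grad()
    error.mean().backward()
    optimizer.step()
    return error.detach()  # large error = novel state = bigger exploration bonus
```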
- Bash scripts for experiments on different BabyAI and MiniGrid environments can be found in `babyai.sh` and `minigrid.sh`.
- The reshaped reward, with GPT predicting for a single action and for the next few actions (currently hardcoded as 10), is implemented and merged into `train.py` and the `utils` folder.
- Added `eval2excel.py` to run evaluation and convert the results to Excel files.
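A minimal sketch of what such an eval-to-Excel conversion can look like with pandas; the actual `eval2excel.py` may structure its results differently, and the rows below are placeholders, not real results.

```python
# Hypothetical sketch of dumping evaluation results to an Excel file.
import pandas as pd

results = [
    {"env": "BabyAI-GoToImpUnlock-v0", "return_mean": 0.0, "return_std": 0.0, "frames": 250000},
    {"env": "MiniGrid-DoorKey-8x8-v0", "return_mean": 0.0, "return_std": 0.0, "frames": 250000},
]  # placeholder rows, not real results

df = pd.DataFrame(results)
df.to_excel("evaluation_results.xlsx", index=False)  # requires openpyxl
```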
To run `/data1/lzengaf/cs285/proj/minigrid/experimental-code/llm-interface/llama2_interface.py`, first run:
pip install langchain cmake
export CMAKE_ARGS="-DLLAMA_METAL=on"
FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
curl https://ollama.ai/install.sh | sh
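Once those dependencies are installed, a minimal `llama-cpp-python` usage sketch looks like the following; the model path is a placeholder, and `llama2_interface.py` may wrap the model differently.

```python
# Hypothetical local-LLM query via llama-cpp-python, analogous to the GPT query above.
from llama_cpp import Llama

llm = Llama(model_path="/path/to/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder weights
            n_ctx=2048)
out = llm("On a scale from -1 to 1, how helpful is the action 'go forward'? "
          "Reply with a single number.",
          max_tokens=16, temperature=0.0)
print(out["choices"][0]["text"].strip())
```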