colinc9/minigrid

Directory structure:

  • experimental-code: draft code, largely unused.

  • rl-starter-files: cloned from the original rl-starter-files repository and modified to include a GPT interface. I also wrote a GPT-based reward-shaping function that asks GPT for a shaped reward (a minimal sketch follows).
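For illustration, the single-action query presumably looks something like the sketch below. This is a minimal sketch, not the repo's actual code: the OpenAI client usage, the model name, the prompt wording, and the function name `gpt_shaped_reward` are all assumptions.

```python
# Minimal sketch of a GPT-based reward-shaping query (hypothetical; the
# real implementation lives in rl-starter-files). Assumes the openai v1
# client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def gpt_shaped_reward(mission: str, obs_description: str, action: str) -> float:
    """Ask GPT how helpful `action` is for completing `mission`."""
    prompt = (
        f"Mission: {mission}\n"
        f"Observation: {obs_description}\n"
        f"Action taken: {action}\n"
        "On a scale from -1 to 1, how much does this action help complete "
        "the mission? Reply with a single number."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # no shaping if the reply is not parseable as a number
```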

Running the code:

Setting up the repository: after creating your conda environment, run:

```bash
cd rl-starter-files
pip install -r requirements.txt
```

My current training script:

```bash
cd rl-starter-files/
python -m scripts.train --algo ppo --env BabyAI-GoToImpUnlock-v0 --model GoToImpUnlock0.0005Ask --text --save-interval 10 --frames 250000 --gpt
```

The problem is that even with an ask probability of 0.0005, training still takes a very long time.
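For context, the ask probability presumably gates how often a step pauses to query GPT, along these lines (a hypothetical sketch building on `gpt_shaped_reward` above, not the repo's actual train.py logic):

```python
import random

ASK_PROB = 0.0005  # fraction of steps on which GPT is queried

def shaped_step_reward(env_reward, mission, obs_description, action):
    # Each GPT query is a slow, blocking API call, so even a tiny ask
    # probability adds noticeable wall-clock time over millions of frames.
    if random.random() < ASK_PROB:
        return env_reward + gpt_shaped_reward(mission, obs_description, action)
    return env_reward
```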

TODO

Baselines

Basic:

PPO, A2C only

Exploration(?):

  • RND: https://opendilab.github.io/DI-engine/12_policies/rnd.html (a minimal sketch follows this list)
  • BeBold / NovelD: https://github.com/tianjunz/NovelD
  • DEIR
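For reference, RND's intrinsic bonus boils down to the prediction error of a trained network against a frozen random one. A minimal PyTorch sketch (generic, not tied to this repo or to DI-engine's implementation; shapes and hyperparameters are placeholders):

```python
# RND sketch: a predictor is trained to match a frozen, randomly
# initialized target network; the per-state prediction error serves as
# the exploration bonus (high for novel states, low for familiar ones).
import torch
import torch.nn as nn

obs_dim, feat_dim = 64, 32  # placeholder sizes

target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target stays fixed forever

opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Return per-sample prediction error and train the predictor on it."""
    with torch.no_grad():
        tgt = target(obs)
    err = (predictor(obs) - tgt).pow(2).mean(dim=-1)
    opt.zero_grad()
    err.mean().backward()  # familiar states stop paying out over time
    opt.step()
    return err.detach()
```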

Update

  • Bash scripts for experiments on different BabyAI and MiniGrid environments can be found in babyai.sh and minigrid.sh.

  • The reshaped reward, with GPT predicting for a single action and for the next few actions (the horizon is currently hardcoded as 10), is implemented and merged into train.py and the utils folder; see the sketch after this list.

  • Added eval2excel.py to run evaluation and convert the results to Excel files.
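The n-step variant presumably extends the single-action prompt to a short window of upcoming actions. A hypothetical sketch (reuses the `client` from the earlier sketch; the prompt wording and function name are assumptions):

```python
# Hypothetical n-step variant: GPT scores a short sequence of planned
# actions instead of a single one. The repo hardcodes the horizon to 10.
HORIZON = 10

def gpt_shaped_reward_nstep(mission: str, obs_description: str, actions: list) -> float:
    prompt = (
        f"Mission: {mission}\n"
        f"Observation: {obs_description}\n"
        f"Next {len(actions)} planned actions: {', '.join(actions)}\n"
        "On a scale from -1 to 1, how much does this action sequence help "
        "complete the mission? Reply with a single number."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0

# e.g. gpt_shaped_reward_nstep(mission, obs_desc, planned_actions[:HORIZON])
```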

To run /data1/lzengaf/cs285/proj/minigrid/experimental-code/llm-interface/llama2_interface.py, first run:

```bash
pip install langchain cmake
export CMAKE_ARGS="-DLLAMA_METAL=on"
FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
curl https://ollama.ai/install.sh | sh
```
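After that, a local Llama 2 model can be exercised directly through the stock llama-cpp-python API; llama2_interface.py presumably wraps something similar. A minimal standalone example (the model path is a placeholder):

```python
# Minimal llama-cpp-python usage (model path is a placeholder; this is
# not the repo's llama2_interface.py, just the underlying library API).
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf")
out = llm(
    "Q: In MiniGrid, what should an agent do first to open a locked door? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```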
