Imitation learning is supervised learning where the data comes from expert demonstrations. The expert can be a human or another agent. The input is referred to as the "state" and the output as the "action." With a discrete action space the problem resembles classification; with a continuous action space it is regression.
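The classification/regression analogy shows up directly in the loss function. A minimal PyTorch sketch (the network sizes are illustrative only, chosen to match the MountainCar-v0 and Pendulum-v1 dimensions in the table below):

```python
import torch.nn as nn

# Illustrative dimensions only (MountainCar-v0: 2-D state, 3 discrete actions;
# Pendulum-v1: 3-D state, 1-D continuous action).
discrete_policy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
classification_loss = nn.CrossEntropyLoss()   # logits over actions, like a classifier

continuous_policy = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
regression_loss = nn.MSELoss()                # regress the expert action directly
```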
Policy
Behavioral Cloning (BC) is offline imitation learning: it uses only the collected demonstrations and does not interact with a simulator during learning.
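A BC update is therefore just a supervised training loop over the stored (state, action) pairs; no environment steps are taken. A minimal sketch, assuming a continuous-action task and placeholder tensors `states` and `actions` standing in for the collected demonstrations:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder demonstration data: states (N, state_dim), actions (N, action_dim).
states = torch.randn(1000, 3)
actions = torch.randn(1000, 1)
loader = DataLoader(TensorDataset(states, actions), batch_size=64, shuffle=True)

policy = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # use CrossEntropyLoss for discrete actions

for epoch in range(10):
    for s, a in loader:
        loss = loss_fn(policy(s), a)  # imitate the expert action
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()              # no simulator calls anywhere
```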
- This tutorial is for educational purposes, so the code is not optimized for production but is written to be easy to understand.
- Each policy is trained in a single Jupyter notebook.
- Each directory contains a README file.
| Video | Task | State Space | Action Space | Expert | Colab |
|---|---|---|---|---|---|
| ![]() | MountainCar-v0 | Continuous(2) | Discrete(3) | Human | Open In Colab |
| ![]() | Pendulum-v1 | Continuous(3) | Continuous(1) | RL | Open In Colab |
| ![]() | CarRacing-v2 | Image(96x96x3) | Continuous(3) | Human | Open In Colab |
| ![]() | Ant-v3 | Continuous(111) | Continuous(8) | RL | todo |
| ![]() | Lift | Continuous(multi-modal) | Continuous(7) | Human | Open In Colab |
- use the "Open In Colab" links above to run the code in colab.
- please see the readme file in each directory for installation and data collection instructions.
- We use hdf5 file for robomimic (see the 'readme.md' in robomimic directory to understand the data format) and real robot.
- For rest of the environment we store as *.pkl file.
- Please see the respective folders (e.g. robomimic_tasks) for data collection instructions.
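For reference, demonstrations in the two formats can be loaded as shown below. The file names and HDF5 keys are placeholders; the per-directory READMEs (and the robomimic readme.md) document the actual layout.

```python
import pickle
import h5py

# Pickle-based environments (MountainCar, Pendulum, CarRacing, ...).
# The file name and the structure of `demos` are placeholders.
with open("demos.pkl", "rb") as f:
    demos = pickle.load(f)

# HDF5-based datasets (robomimic Lift, real robot). The group and key names
# below are illustrative; check the robomimic README for the real schema.
with h5py.File("demos.hdf5", "r") as f:
    for demo_name in f["data"]:
        obs = f["data"][demo_name]["obs"]
        acts = f["data"][demo_name]["actions"][()]
```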