DRIFT - Release Notes
Overview
This release corresponds to the version of the code used in the research paper titled "DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories". The problem formulation, simulation details, training procedure, and benchmarks discussed in the paper are based on this version.
Problem Formulation
The problem is formulated as a sequential decision-making task: controlling a floating platform's maneuvers within a 2D space. The state space, action space, and task-specific observations are defined by the equations and tables provided in the paper.
Figure: 3DoF thrusters configuration and 6DoF thrusters configuration (diagrams available in the paper).
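As a rough sketch of this formulation (the exact state layout, goal encoding, and thruster count are defined in the paper's equations and tables; the names and dimensions below are illustrative assumptions), the observation can be thought of as the platform's pose and velocities combined with a task-specific goal error, and the action as one binary on/off command per thruster:

```python
import numpy as np

# Hypothetical dimensions for illustration only; the paper defines the exact layout.
NUM_THRUSTERS = 8  # assumed thruster count for the 2D floating platform

rng = np.random.default_rng()

def build_observation(position, heading, lin_vel, ang_vel, goal, task_flag):
    """Assemble a flat observation vector: platform state plus a task-specific goal error.

    position, goal: (x, y) in the world frame; heading: yaw angle in radians;
    lin_vel: (vx, vy); ang_vel: scalar yaw rate; task_flag: integer task id.
    """
    goal_error = np.asarray(goal) - np.asarray(position)  # task-specific error term
    return np.concatenate([
        [np.cos(heading), np.sin(heading)],  # heading encoded without angle wrap-around
        lin_vel,
        [ang_vel],
        goal_error,
        [task_flag],
    ]).astype(np.float32)

def random_action():
    """One binary on/off command per thruster, e.g. [1, 0, 0, 1, ...]."""
    return rng.integers(0, 2, size=NUM_THRUSTERS)
```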
Reward Functions
Three reward functions, one for each task (Go to position, Go to pose, Track velocity), are defined as exponential terms, as described in the paper. These reward functions were used to train the agents in this version.
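For intuition, an exponential reward term maps an error (distance, heading offset, or velocity mismatch) into a bounded signal that decays smoothly as the error grows. The sketch below illustrates this shape only; the exact terms, weights, and scales are those given in the paper, and the scale values here are placeholders:

```python
import numpy as np

def go_to_position_reward(position, goal, scale=0.25):
    """Exponential reward on the distance to the goal position (placeholder scale)."""
    d = np.linalg.norm(np.asarray(goal) - np.asarray(position))
    return np.exp(-d / scale)

def go_to_pose_reward(position, heading, goal_position, goal_heading,
                      pos_scale=0.25, head_scale=0.25):
    """Combines an exponential position term with an exponential heading term."""
    pos_term = go_to_position_reward(position, goal_position, pos_scale)
    heading_error = np.abs(np.arctan2(np.sin(heading - goal_heading),
                                      np.cos(heading - goal_heading)))
    return pos_term + np.exp(-heading_error / head_scale)

def track_velocity_reward(lin_vel, target_vel, scale=0.25):
    """Exponential reward on the velocity-tracking error (placeholder scale)."""
    err = np.linalg.norm(np.asarray(target_vel) - np.asarray(lin_vel))
    return np.exp(-err / scale)
```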
Simulation
Simulation enhancements based on the RANS framework (RANS v2.0) have been integrated to support more complex tasks. These include parameterized rewards and penalties, disturbance generators, and the injection of action and state noise.
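The snippet below gives a schematic picture of what action/state noise injection and a simple disturbance generator can look like; the distributions and parameter names are assumptions for illustration, not the framework's actual API:

```python
import numpy as np

rng = np.random.default_rng()

def inject_action_noise(action, flip_prob=0.05):
    """Randomly flip each binary thruster command with a small probability (assumed model)."""
    flips = rng.random(action.shape) < flip_prob
    return np.where(flips, 1 - action, action)

def inject_state_noise(observation, sigma=0.01):
    """Add zero-mean Gaussian noise to the observation vector (assumed model)."""
    return observation + rng.normal(0.0, sigma, size=observation.shape)

def sample_disturbance(max_force=0.5, max_torque=0.1):
    """Sample a random planar force and torque disturbance applied to the platform."""
    angle = rng.uniform(0.0, 2.0 * np.pi)
    magnitude = rng.uniform(0.0, max_force)
    force = magnitude * np.array([np.cos(angle), np.sin(angle)])
    torque = rng.uniform(-max_torque, max_torque)
    return force, torque
```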
Training Procedure
The training procedure is based on the PPO (Proximal Policy Optimization) algorithm, with network configurations specified in the training configuration files:
├── cfg
│ ├── task # Task configurations
│ │ └── virtual_floating_platform # Virtual floating platform task configurations
│ └── train # Training configurations
│ └── virtual_floating_platform # Virtual floating platform training configurations
The agents undergo training for a total of 2000 epochs or approximately 130M steps.
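As a back-of-the-envelope check on the epoch-to-step conversion (the horizon length and environment count below are illustrative assumptions, not the released configuration):

```python
# Illustrative values only; the actual horizon and environment count are set in the train configs.
epochs = 2000
horizon_length = 32      # assumed on-policy rollout length per epoch
num_envs = 2048          # assumed number of parallel simulated environments

total_steps = epochs * horizon_length * num_envs
print(f"{total_steps:,} environment steps")  # 131,072,000 -> roughly 130M
```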