ICRA 24

Released by @elharirymatteo on 06 Oct at 15:01

DRIFT - Release Notes

⚠️ For the most up-to-date information and code, please refer to the main repository.

Overview

This release corresponds to the version of the code used in the research paper titled "DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories". The problem formulation, simulation details, training procedure, and benchmarks discussed in the paper are based on this version.

Problem Formulation

The problem is formulated as a sequential decision-making task to control a floating platform's maneuvers within a 2D space. The state space, actions, and task-specific observations are defined as per the equations and tables provided in the paper.

Figures: 3DoF thruster configuration and 6DoF thruster configuration.
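
For intuition, a minimal gym-style sketch of the decision-making setup is shown below; the thruster count, observation layout, and dimensions are illustrative assumptions for this sketch, not the released task definition.

```python
import numpy as np
from gymnasium import spaces

# Illustrative only: the 8 binary thrusters and the 10-dimensional observation
# layout below are assumptions, not the exact released spaces.
num_thrusters = 8
# Observation: planar position error, heading (cos/sin), linear and angular
# velocities, plus task-specific terms appended per task.
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
# Action: an on/off command per thruster.
action_space = spaces.MultiBinary(num_thrusters)
```
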
Reward Functions

Three reward functions, one per task (Go to position, Go to pose, Track velocity), are defined as exponential terms, as described in the paper. These reward functions were used to train the agents in this version.
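
As a rough illustration of that exponential shaping, the snippet below sketches one possible form of the three rewards; the scales and weights are placeholder values, not the coefficients used in the paper.

```python
import numpy as np

# Placeholder exponential reward terms; scales and weights are assumptions.
def go_to_position_reward(position_error, scale=0.25):
    # Decays exponentially with distance to the target position.
    return np.exp(-position_error / scale)

def go_to_pose_reward(position_error, heading_error,
                      pos_scale=0.25, head_scale=0.25, w_pos=0.7, w_head=0.3):
    # Weighted sum of exponential position and heading terms.
    return (w_pos * np.exp(-position_error / pos_scale)
            + w_head * np.exp(-heading_error / head_scale))

def track_velocity_reward(velocity_error, scale=0.25):
    # Decays exponentially with the velocity-tracking error.
    return np.exp(-velocity_error / scale)
```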

Simulation

The simulation enhancements based on the RANS framework (RANS v2.0) have been integrated to perform more complex tasks. The framework includes parameterized rewards, penalties, and disturbance generators, and allows action and state noise to be injected.
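
The sketch below shows one way such noise and disturbance injection can be wired into a step loop; the class name, parameters, and magnitudes are assumptions for illustration and do not mirror the RANS v2.0 API.

```python
import numpy as np

class DisturbanceGenerator:
    """Illustrative noise/disturbance injection; parameter names and defaults are assumed."""

    def __init__(self, action_noise_std=0.01, state_noise_std=0.005, force_scale=0.1):
        self.action_noise_std = action_noise_std
        self.state_noise_std = state_noise_std
        self.force_scale = force_scale

    def perturb_actions(self, actions):
        # Additive Gaussian noise on the commanded thruster actions.
        return actions + np.random.normal(0.0, self.action_noise_std, size=actions.shape)

    def perturb_observations(self, obs):
        # Additive Gaussian noise on the observed state.
        return obs + np.random.normal(0.0, self.state_noise_std, size=obs.shape)

    def random_force(self, dim=2):
        # Constant random disturbance force resampled at each episode reset.
        direction = np.random.uniform(-1.0, 1.0, size=dim)
        return self.force_scale * direction / (np.linalg.norm(direction) + 1e-8)
```
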
Training Procedure

The training procedure is based on the PPO (Proximal Policy Optimization) algorithm, with the specific network configurations available in the training configuration files:

.
├── cfg
│   ├── task                   # Task configurations
│   │   └── virtual_floating_platform  # Virtual floating platform task configurations
│   └── train                  # Training configurations
│       └── virtual_floating_platform  # Virtual floating platform training configurations

The agents undergo training for a total of 2000 epochs or approximately 130M steps.
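
As a back-of-the-envelope check of that budget, assuming 4096 parallel environments and a 16-step rollout per epoch (both assumed values, not taken from the release), 2000 epochs come out to roughly 131M environment steps:

```python
num_envs = 4096   # assumed number of parallel simulated platforms
horizon = 16      # assumed rollout length per epoch
epochs = 2000     # reported above
total_steps = num_envs * horizon * epochs
print(f"{total_steps:,}")  # 131,072,000, i.e. ~130M steps
```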

Benchmark Comparison

This version includes a benchmark comparison between deep reinforcement learning (DRL) and optimal control approaches (LQR) for controlling the floating platform. The comparison aims to provide insights into the strengths and weaknesses of each approach.
Optimal Controller

An infinite-horizon, discrete-time LQR controller is implemented to compare against the DRL algorithm for controlling the floating platform. The state variables and the corresponding state matrices for the LQR controller are calculated using finite differencing.
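
A minimal sketch of such a controller is given below, assuming a black-box dynamics function step(x, u); the cost weights, dimensions, and helper names are placeholders rather than the paper's values.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def linearize(step, x0, u0, eps=1e-5):
    """Estimate A = df/dx and B = df/du around (x0, u0) by finite differences."""
    n, m = x0.size, u0.size
    f0 = step(x0, u0)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (step(x0 + dx, u0) - f0) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (step(x0, u0 + du) - f0) / eps
    return A, B

def lqr_gain(A, B, Q, R):
    """Solve the discrete-time algebraic Riccati equation and return the feedback gain K."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Usage sketch: K = lqr_gain(*linearize(step, x_eq, u_eq), Q, R); u = -K @ (x - x_ref)
```
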
Laboratory Experiment Setup

Real-world validation experiments were conducted using the physical air-bearing platform located in the ZeroG Laboratory at the University of Luxembourg. Details about the laboratory setup and experimental procedures can be found in the paper.