Releases: elharirymatteo/RANS
ICRA 24
DRIFT - Release Notes
Overview
This release corresponds to the version of the code used in the research paper titled "DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories". The problem formulation, simulation details, training procedure, and benchmarks discussed in the paper are based on this version.
Problem Formulation
The problem is formulated as a sequential decision-making task to control a floating platform's maneuvers within a 2D space. The state space, actions, and task-specific observations are defined as per the equations and tables provided in the paper.
[Figure: 3DoF thruster configuration (left) and 6DoF thruster configuration (right)]
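As a rough illustration of the formulation above, the sketch below assembles a 3DoF observation from the platform state plus a task-specific goal term. The function name, component ordering, and heading encoding are assumptions for illustration, not the exact layout used in the repository or paper.

```python
import numpy as np

def make_observation(position, heading, lin_vel, ang_vel, goal_position, task_id=0):
    """Illustrative 3DoF observation: platform state plus a task-specific goal term.

    position, lin_vel, goal_position are 2D vectors; heading (rad) and ang_vel are
    scalars. The ordering and encoding are assumptions, not the repository's layout.
    """
    cos_h, sin_h = np.cos(heading), np.sin(heading)
    error = goal_position - position           # task observation for "Go to position"
    return np.concatenate([
        [cos_h, sin_h],                        # heading encoded to avoid angle wrap-around
        lin_vel, [ang_vel],                    # body velocities
        [task_id],                             # which task the agent is solving
        error,                                 # task-specific observation
    ]).astype(np.float32)
```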
Reward Functions
Three reward functions, one per task (Go to position, Go to pose, Track velocity), are defined as exponential terms, as described in the paper. These reward functions were used to train the agents in this version.
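The sketch below shows the general shape of such exponential reward terms: the reward is 1 when the error is zero and decays smoothly with distance. The scale values and the weighting of the pose terms are illustrative assumptions; the actual coefficients are given in the paper.

```python
import numpy as np

def go_to_position_reward(position_error, scale=0.25):
    """Exponential shaping on distance to the goal (scale is illustrative)."""
    return float(np.exp(-np.linalg.norm(position_error) / scale))

def go_to_pose_reward(position_error, heading_error, pos_scale=0.25, head_scale=0.5):
    """Weighted sum of exponential position and heading terms (weights are assumptions)."""
    return 0.5 * float(np.exp(-np.linalg.norm(position_error) / pos_scale)) \
         + 0.5 * float(np.exp(-abs(heading_error) / head_scale))

def track_velocity_reward(velocity_error, scale=0.5):
    """Exponential term on the velocity-tracking error."""
    return float(np.exp(-np.linalg.norm(velocity_error) / scale))
```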
Simulation
The simulation builds on the RANS framework (RANS v2.0), extended to support more complex tasks. It includes parameterized rewards, penalties, and disturbance generators, and allows action and state noise to be injected.
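A minimal sketch of what such noise injection can look like is shown below: Gaussian perturbations on the applied thrust and on the observation returned to the agent. The function names and noise magnitudes are assumptions; the real generators are configured through the task configs.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_thrust(commanded_thrust, std=0.05):
    """Perturb the force actually applied by each thruster (std is illustrative)."""
    return commanded_thrust + rng.normal(0.0, std, size=np.shape(commanded_thrust))

def noisy_observation(observation, std=0.01):
    """Perturb the state returned to the agent, emulating imperfect sensing."""
    return observation + rng.normal(0.0, std, size=np.shape(observation))
```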
Training Procedure
The training procedure is based on the PPO (Proximal Policy Optimization) algorithm, with the network configurations specified in the training config files:
├── cfg
│ ├── task # Task configurations
│ │ └── virtual_floating_platform # Virtual floating platform task configurations
│ └── train # Training configurations
│ └── virtual_floating_platform # Virtual floating platform training configurations
The agents undergo training for a total of 2000 epochs or approximately 130M steps.
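As a back-of-envelope check on how 2000 epochs maps to roughly 130M environment steps, one epoch corresponds to one rollout collected across all parallel environments. The environment count and horizon below are assumptions for illustration; the actual values are set in the train config files.

```python
# Illustrative step count (num_envs and horizon are assumptions):
num_envs = 2048        # parallel environments
horizon = 32           # steps collected per environment per epoch
epochs = 2000
total_steps = num_envs * horizon * epochs
print(total_steps)     # 131_072_000, i.e. about 130M
```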
Benchmark Comparison
This version includes a benchmark comparison between deep reinforcement learning (DRL) and an optimal control approach (LQR) for controlling the floating platform. The comparison aims to provide insights into the strengths and weaknesses of each approach.
Optimal Controller
An infinite-horizon, discrete-time LQR controller is implemented as a baseline against the DRL agents for controlling the floating platform. The state matrices used by the LQR controller are computed from the system dynamics via finite differencing.
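The sketch below illustrates this recipe under stated assumptions: finite-difference a discrete step function to obtain the state matrices A and B, then solve the infinite-horizon discrete-time Riccati equation for the feedback gain. The function names (`step`, `linearize`, `lqr_gain`) and the epsilon value are illustrative, not the repository's API.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def linearize(step, x0, u0, eps=1e-4):
    """Finite-difference the discrete dynamics x' = step(x, u) around (x0, u0)
    to obtain A = df/dx and B = df/du."""
    n, m = len(x0), len(u0)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    f0 = step(x0, u0)
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (step(x0 + dx, u0) - f0) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (step(x0, u0 + du) - f0) / eps
    return A, B

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR: solve the Riccati equation and
    return K such that u = -K @ x."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```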
Laboratory Experiment Setup
Real-world validation experiments were conducted on the physical air-bearing platform in the ZeroG Laboratory at the University of Luxembourg. Details about the laboratory setup and experimental procedures can be found in the paper.
Astra 2023
ASTRA - Release Notes
Overview
This release corresponds to the version of the code used in the research paper titled "RANS: Highly-Parallelised Simulator for Reinforcement learning based Autonomous Navigating Spacecrafts." The repository contains the implementation and simulation environment (RANS) discussed in the paper. It serves as a valuable resource for reproducing the experiments and understanding the methodologies employed in the research.
Introduction
RANS (Reinforcement learning-based Autonomous Navigation for Spacecrafts) is designed to address the specific needs of RL-based spacecraft navigation. The primary aim of RANS is to bridge the gap between available simulation tools and the specialized requirements of spacecraft navigation using reinforcement learning. RANS offers a new alternative for designing autonomous trajectories in 2D and 3D space.
Architecture and Objectives
RANS is structured to replicate realistic orbital operations as well as air-bearing platforms, providing a fast, stable, and precise simulation environment. It consists of two main scenarios: a 3 Degree of Freedom (DoF) "Floating Platform" (FP) robot and a 6 DoF navigating scenario. These scenarios allow users to specify or randomize initial conditions and goals for spacecraft control tasks.
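A minimal sketch of how initial conditions and goals might be specified or randomized for the 3 DoF scenario is given below. The `EpisodeSpec` container, field names, and arena size are hypothetical, used only to make the idea concrete.

```python
import numpy as np
from dataclasses import dataclass

rng = np.random.default_rng()

@dataclass
class EpisodeSpec:
    """Hypothetical container for one episode's initial condition and goal (3 DoF case)."""
    start_position: np.ndarray   # [x, y] in metres
    start_heading: float         # radians
    goal_position: np.ndarray    # [x, y] in metres

def sample_episode(arena_half_size=2.5):
    """Randomize the initial pose and goal inside a square arena (size is an assumption)."""
    return EpisodeSpec(
        start_position=rng.uniform(-arena_half_size, arena_half_size, size=2),
        start_heading=rng.uniform(-np.pi, np.pi),
        goal_position=rng.uniform(-arena_half_size, arena_half_size, size=2),
    )
```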
Simulation Engine
RANS utilizes the PhysX engine within IsaacSim, a GPU-based physics engine renowned for its capacity to rapidly simulate numerous parallel systems. A sub-stepping strategy is employed to maintain simulation stability, which is especially useful for reinforcement learning tasks characterized by short time intervals.
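The idea behind sub-stepping is that the agent acts on a coarse control period while the physics is integrated with a much smaller time step, keeping the solver stable. The sketch below illustrates this; the function names and dt values are assumptions, not the released configuration.

```python
def rl_step(physics_step, apply_action, action, control_dt=0.1, physics_dt=0.02):
    """Advance the simulation by one RL step using several smaller physics sub-steps.

    The action is held constant over the control period while the physics engine
    integrates with physics_dt. The dt values here are illustrative only.
    """
    num_substeps = int(round(control_dt / physics_dt))
    apply_action(action)              # thruster command held for the whole control period
    for _ in range(num_substeps):
        physics_step(physics_dt)      # one small, stable integration step
```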
Environment and Tasks
In both 3 DoF and 6 DoF scenarios, RANS provides a default system configuration with varying thruster setups to accommodate different control tasks. The observation and action spaces are appropriately defined for each scenario and task, allowing for precise control and movement in the specified environment.
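To make the space definitions concrete, the sketch below writes them as Gymnasium spaces: a continuous observation vector and a multi-discrete (on/off per thruster) action space. The dimensions and thruster counts are assumptions for the default layouts; the exact sizes come from the task configs.

```python
from gymnasium import spaces
import numpy as np

# Illustrative space definitions; dimensions and thruster counts are assumptions.
obs_3dof = spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
act_3dof = spaces.MultiDiscrete([2] * 8)    # binary on/off thrusters on the floating platform

obs_6dof = spaces.Box(low=-np.inf, high=np.inf, shape=(19,), dtype=np.float32)
act_6dof = spaces.MultiDiscrete([2] * 16)   # a larger binary thruster bank for the 6 DoF case
```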
DRL Agents
The evaluation of RANS involves leveraging PPO (Proximal Policy Optimization) policies with a multi-discrete action space to solve various tasks in both 3 DoF and 6 DoF scenarios. The agents are modeled as actor-critic networks and are trained for a task-specific number of epochs, with network architectures sized to the task complexity.
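A minimal sketch of such an actor-critic with a multi-discrete policy head is shown below: one categorical distribution per thruster, sampled independently, plus a scalar value head. Layer sizes and the class name are illustrative assumptions, not the networks defined in the training configs.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class MultiDiscreteActorCritic(nn.Module):
    """Minimal actor-critic with one categorical head per thruster (sizes are illustrative)."""
    def __init__(self, obs_dim, num_thrusters, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, hidden), nn.Tanh())
        # one independent on/off decision per thruster -> multi-discrete action space
        self.heads = nn.ModuleList([nn.Linear(hidden, 2) for _ in range(num_thrusters)])
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        dists = [Categorical(logits=head(h)) for head in self.heads]
        actions = torch.stack([d.sample() for d in dists], dim=-1)
        log_prob = torch.stack([d.log_prob(a) for d, a in zip(dists, actions.unbind(-1))],
                               dim=-1).sum(-1)
        return actions, log_prob, self.value(h).squeeze(-1)
```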