Releases: elharirymatteo/RANS
ICRA 24
DRIFT - Release Notes
Overview
This release corresponds to the version of the code used in the research paper titled "DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories". The problem formulation, simulation details, training procedure, and benchmarks discussed in the paper are based on this version.
Problem Formulation
The problem is formulated as a sequential decision-making task to control a floating platform's maneuvers within a 2D space. The state space, actions, and task-specific observations are defined as per the equations and tables provided in the paper.
[Figure: 3DoF thruster configuration (left) and 6DoF thruster configuration (right)]
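As a rough illustration of the formulation above, the sketch below assembles a 3DoF observation from the platform state plus a task-specific goal term. The function name, component ordering, and heading encoding are assumptions for illustration, not the exact layout used in the repository or paper.

```python
import numpy as np

def make_observation(position, heading, lin_vel, ang_vel, goal_position, task_id=0):
    """Illustrative 3DoF observation: platform state plus a task-specific goal term.

    position, lin_vel, goal_position are 2D vectors; heading (rad) and ang_vel are
    scalars. The ordering and encoding are assumptions, not the repository's layout.
    """
    cos_h, sin_h = np.cos(heading), np.sin(heading)
    error = goal_position - position           # task observation for "Go to position"
    return np.concatenate([
        [cos_h, sin_h],                        # heading encoded to avoid angle wrap-around
        lin_vel, [ang_vel],                    # body velocities
        [task_id],                             # which task the agent is solving
        error,                                 # task-specific observation
    ]).astype(np.float32)
```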
Reward Functions
Three reward functions, one per task (Go to position, Go to pose, Track velocity), are defined as exponential terms, as described in the paper. These reward functions were used to train the agents in this version.
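The sketch below shows the general shape of such exponential reward terms: the reward is 1 when the error is zero and decays smoothly with distance. The scale values and the weighting of the pose terms are illustrative assumptions; the actual coefficients are given in the paper.

```python
import numpy as np

def go_to_position_reward(position_error, scale=0.25):
    """Exponential shaping on distance to the goal (scale is illustrative)."""
    return float(np.exp(-np.linalg.norm(position_error) / scale))

def go_to_pose_reward(position_error, heading_error, pos_scale=0.25, head_scale=0.5):
    """Weighted sum of exponential position and heading terms (weights are assumptions)."""
    return 0.5 * float(np.exp(-np.linalg.norm(position_error) / pos_scale)) \
         + 0.5 * float(np.exp(-abs(heading_error) / head_scale))

def track_velocity_reward(velocity_error, scale=0.5):
    """Exponential term on the velocity-tracking error."""
    return float(np.exp(-np.linalg.norm(velocity_error) / scale))
```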
Simulation
The simulation builds on the RANS framework (RANS v2.0), extended to support more complex tasks. It includes parameterized rewards, penalties, and disturbance generators, and allows action and state noise to be injected.
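A minimal sketch of what such noise injection can look like is shown below: Gaussian perturbations on the applied thrust and on the observation returned to the agent. The function names and noise magnitudes are assumptions; the real generators are configured through the task configs.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_thrust(commanded_thrust, std=0.05):
    """Perturb the force actually applied by each thruster (std is illustrative)."""
    return commanded_thrust + rng.normal(0.0, std, size=np.shape(commanded_thrust))

def noisy_observation(observation, std=0.01):
    """Perturb the state returned to the agent, emulating imperfect sensing."""
    return observation + rng.normal(0.0, std, size=np.shape(observation))
```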
Training Procedure
The training procedure is based on the PPO (Proximal Policy Optimization) algorithm, with the network configurations specified in the training config files:
├── cfg
│ ├── task # Task configurations
│ │ └── virtual_floating_platform # Virtual floating platform task configurations
│ └── train # Training configurations
│ └── virtual_floating_platform # Virtual floating platform training configurations
The agents undergo training for a total of 2000 epochs or approximately 130M steps.
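As a back-of-envelope check on how 2000 epochs maps to roughly 130M environment steps, one epoch corresponds to one rollout collected across all parallel environments. The environment count and horizon below are assumptions for illustration; the actual values are set in the train config files.

```python
# Illustrative step count (num_envs and horizon are assumptions):
num_envs = 2048        # parallel environments
horizon = 32           # steps collected per environment per epoch
epochs = 2000
total_steps = num_envs * horizon * epochs
print(total_steps)     # 131_072_000, i.e. about 130M
```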
Benchmark Comparison
This version includes a benchmark comparison between deep reinforcement learning (DRL) and an optimal control approach (LQR) for controlling the floating platform. The comparison aims to provide insights into the strengths and weaknesses of each approach.
Optimal Controller
An infinite-horizon, discrete-time LQR controller is implemented as a baseline against the DRL agents for controlling the floating platform. The state matrices used by the LQR controller are computed from the system dynamics via finite differencing.
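The sketch below illustrates this recipe under stated assumptions: finite-difference a discrete step function to obtain the state matrices A and B, then solve the infinite-horizon discrete-time Riccati equation for the feedback gain. The function names (`step`, `linearize`, `lqr_gain`) and the epsilon value are illustrative, not the repository's API.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def linearize(step, x0, u0, eps=1e-4):
    """Finite-difference the discrete dynamics x' = step(x, u) around (x0, u0)
    to obtain A = df/dx and B = df/du."""
    n, m = len(x0), len(u0)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    f0 = step(x0, u0)
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (step(x0 + dx, u0) - f0) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (step(x0, u0 + du) - f0) / eps
    return A, B

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR: solve the Riccati equation and
    return K such that u = -K @ x."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```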
Laboratory Experiment Setup
Real-world validation experiments were conducted on the physical air-bearing platform in the ZeroG Laboratory at the University of Luxembourg. Details about the laboratory setup and experimental procedures can be found in the paper.
Astra 2023
ASTRA - Release Notes
Overview
This release corresponds to the version of the code used in the research paper titled "RANS: Highly-Parallelised Simulator for Reinforcement learning based Autonomous Navigating Spacecrafts." The repository contains the implementation and simulation environment (RANS) discussed in the paper. It serves as a valuable resource for reproducing the experiments and understanding the methodologies employed in the research.
Introduction
RANS (Reinforcement learning-based Autonomous Navigation for Spacecrafts) is designed to address the specific needs of RL-based spacecraft navigation. The primary aim of RANS is to bridge the gap between available simulation tools and the specialized requirements of spacecraft navigation using reinforcement learning. RANS offers a new alternative for designing autonomous trajectories in 2D and 3D space.
Architecture and Objectives
RANS is structured to replicate realistic orbital operations as well as air-bearing platforms, providing a fast, stable, and precise simulation environment. It consists of two main scenarios: a 3 Degree of Freedom (DoF) "Floating Platform" (FP) robot and a 6 DoF navigating scenario. These scenarios allow users to specify or randomize initial conditions and goals for spacecraft control tasks.
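A minimal sketch of how initial conditions and goals might be specified or randomized for the 3 DoF scenario is given below. The `EpisodeSpec` container, field names, and arena size are hypothetical, used only to make the idea concrete.

```python
import numpy as np
from dataclasses import dataclass

rng = np.random.default_rng()

@dataclass
class EpisodeSpec:
    """Hypothetical container for one episode's initial condition and goal (3 DoF case)."""
    start_position: np.ndarray   # [x, y] in metres
    start_heading: float         # radians
    goal_position: np.ndarray    # [x, y] in metres

def sample_episode(arena_half_size=2.5):
    """Randomize the initial pose and goal inside a square arena (size is an assumption)."""
    return EpisodeSpec(
        start_position=rng.uniform(-arena_half_size, arena_half_size, size=2),
        start_heading=rng.uniform(-np.pi, np.pi),
        goal_position=rng.uniform(-arena_half_size, arena_half_size, size=2),
    )
```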
Simulation Engine
RANS utilizes the PhysX engine within IsaacSim, a GPU-based physics engine renowned for its capacity to rapidly simulate numerous parallel systems. A sub-stepping strategy is employed to maintain simulation stability, which is especially useful for reinforcement learning tasks characterized by short time intervals.
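The idea behind sub-stepping is that the agent acts on a coarse control period while the physics is integrated with a much smaller time step, keeping the solver stable. The sketch below illustrates this; the function names and dt values are assumptions, not the released configuration.

```python
def rl_step(physics_step, apply_action, action, control_dt=0.1, physics_dt=0.02):
    """Advance the simulation by one RL step using several smaller physics sub-steps.

    The action is held constant over the control period while the physics engine
    integrates with physics_dt. The dt values here are illustrative only.
    """
    num_substeps = int(round(control_dt / physics_dt))
    apply_action(action)              # thruster command held for the whole control period
    for _ in range(num_substeps):
        physics_step(physics_dt)      # one small, stable integration step
```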
Environment and Tasks
In both 3 DoF and 6 DoF scenarios, RANS provides a default system configuration with varying thruster setups to accommodate different control tasks. The observation and action spaces are appropriately defined for each scenario and task, allowing for precise control and movement in the specified environment.
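To make the space definitions concrete, the sketch below writes them as Gymnasium spaces: a continuous observation vector and a multi-discrete (on/off per thruster) action space. The dimensions and thruster counts are assumptions for the default layouts; the exact sizes come from the task configs.

```python
from gymnasium import spaces
import numpy as np

# Illustrative space definitions; dimensions and thruster counts are assumptions.
obs_3dof = spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
act_3dof = spaces.MultiDiscrete([2] * 8)    # binary on/off thrusters on the floating platform

obs_6dof = spaces.Box(low=-np.inf, high=np.inf, shape=(19,), dtype=np.float32)
act_6dof = spaces.MultiDiscrete([2] * 16)   # a larger binary thruster bank for the 6 DoF case
```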
DRL Agents
The evaluation of RANS involves leveraging PPO (Proximal Policy Optimization) policies with a multi-discrete action space to solve various tasks in both 3 DoF and 6 DoF scenarios. The agents are modeled as actor-critic networks and are trained for a task-specific number of epochs, with network architectures sized to the task complexity.
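A minimal sketch of such an actor-critic with a multi-discrete policy head is shown below: one categorical distribution per thruster, sampled independently, plus a scalar value head. Layer sizes and the class name are illustrative assumptions, not the networks defined in the training configs.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class MultiDiscreteActorCritic(nn.Module):
    """Minimal actor-critic with one categorical head per thruster (sizes are illustrative)."""
    def __init__(self, obs_dim, num_thrusters, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, hidden), nn.Tanh())
        # one independent on/off decision per thruster -> multi-discrete action space
        self.heads = nn.ModuleList([nn.Linear(hidden, 2) for _ in range(num_thrusters)])
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        dists = [Categorical(logits=head(h)) for head in self.heads]
        actions = torch.stack([d.sample() for d in dists], dim=-1)
        log_prob = torch.stack([d.log_prob(a) for d, a in zip(dists, actions.unbind(-1))],
                               dim=-1).sum(-1)
        return actions, log_prob, self.value(h).squeeze(-1)
```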