This project is an educational exploration and demonstration of Reinforcement Learning (RL) concepts through a simple but meaningful interactive simulation.
The main objective is to build a flexible Android app in Kotlin/Compose where a virtual agent learns how to throw a bouncing ball into a cup through trial and error, improving its policy over many iterations using different RL algorithms.
Beyond the bouncing ball, this project aims to:
- Provide a hands-on, visual playground for understanding core RL principles
- Compare multiple RL strategies side by side on the same problem
- Show real-time learning dynamics with detailed metrics and visualizations
- Serve as a solid foundation for building RL skills that can later be extended to more complex problems
The field of Reinforcement Learning is both fascinating and challenging, bridging the gap between theory and practical autonomous decision-making.
As an Android developer keen on expanding my skills in machine learning and AI, I wanted a project that:
- Is hands-on and visual, helping me truly grasp RL dynamics
- Covers multiple fundamental RL algorithms to understand their strengths and weaknesses
- Offers real-time feedback on learning progress through intuitive visualizations
- Can serve as a playground to experiment and grow, with potential extensions beyond simple environments
- Is fully implemented in Kotlin/Compose, showcasing modern Android tech with a deep learning twist
This project is my stepping stone towards Reinforcement Learning, combining software craftsmanship with AI exploration.
- Environment: A simple 2D physics simulation with gravity, bounces, obstacles, and a target cup
- Agent: Learns a throwing policy (angle, power) to maximize success
- Rewards: Designed to encourage landing the ball in the cup and penalize misses
- Learning: Agent updates its policy based on feedback from environment interactions
- Visualization: Real-time trajectories, heatmaps, and learning metrics to track progress
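At its core, each episode is one throw: the agent proposes an angle and power, the environment simulates the physics, and the resulting reward drives the next policy update. The sketch below illustrates that loop; the interface names are assumptions for illustration, not the project's final API.

```kotlin
// Minimal sketch of the training loop. Names below are illustrative assumptions.
data class ThrowAction(val angleDeg: Double, val power: Double)
data class Outcome(val reward: Double, val landedInCup: Boolean)

interface BallEnvironment {
    // Runs the physics simulation for one throw until the ball settles.
    fun simulateThrow(action: ThrowAction): Outcome
}

interface LearningAgent {
    // Picks the next throw, e.g. ε-greedy over a discretized (angle, power) grid.
    fun selectAction(): ThrowAction
    // Updates the policy from the observed reward.
    fun learn(action: ThrowAction, outcome: Outcome)
}

fun train(env: BallEnvironment, agent: LearningAgent, episodes: Int) {
    repeat(episodes) {
        val action = agent.selectAction()
        val outcome = env.simulateThrow(action)
        agent.learn(action, outcome)
    }
}
```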
Concept:
Q-Learning is an off-policy, value-based RL method that learns a table of Q-values Q(s, a), representing the expected cumulative reward of taking action a in state s.
The agent updates Q-values using the Bellman equation by observing rewards and next states, learning an optimal policy by greedily selecting actions with the highest Q-values.
How it works:
Q(s, a) = Q(s, a) + α [r + γ * max(Q(s', a')) - Q(s, a)]
where α is the learning rate, γ is the discount factor, r is the reward, and s' is the next state.
Limitations:
Requires discrete, manageable state and action spaces. For continuous domains, discretization or function approximation is needed, which can reduce precision or increase complexity.
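As a concrete illustration, here is a minimal Kotlin sketch of the tabular update above, assuming states and actions have already been discretized to integer indices (class and parameter names are illustrative, not the project's actual code):

```kotlin
import kotlin.random.Random

// Tabular Q-Learning with ε-greedy exploration (illustrative sketch).
class QLearningAgent(
    numStates: Int,
    private val numActions: Int,
    private val alpha: Double = 0.1,   // learning rate α
    private val gamma: Double = 0.99,  // discount factor γ
    private val epsilon: Double = 0.2  // exploration rate ε
) {
    private val q = Array(numStates) { DoubleArray(numActions) }

    fun selectAction(state: Int): Int =
        if (Random.nextDouble() < epsilon) Random.nextInt(numActions)
        else q[state].indices.maxByOrNull { q[state][it] }!!

    // Q(s, a) ← Q(s, a) + α [r + γ * max(Q(s', a')) - Q(s, a)]
    fun update(s: Int, a: Int, r: Double, sNext: Int?) {
        val nextMax = sNext?.let { q[it].maxOrNull() } ?: 0.0  // 0 at episode end
        q[s][a] += alpha * (r + gamma * nextMax - q[s][a])
    }
}
```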
Concept:
SARSA is an on-policy, value-based RL method similar to Q-Learning, but updates are based on the actual next action taken by the current policy, not the max action.
This makes it more conservative and often safer in some environments.
Update rule:
Q(s, a) = Q(s, a) + α [r + γ * Q(s', a') - Q(s, a)]
where a' is the action taken in state s'.
Limitations:
Same as Q-Learning: requires discrete states and actions, and convergence depends on the exploration policy.
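The SARSA target differs from Q-Learning only in using the action a' that the policy actually chose in s' rather than the greedy max. A minimal sketch with the same Q-table layout as above (names illustrative):

```kotlin
// SARSA update: the target uses Q(s', a') for the action actually taken,
// instead of the greedy max used by Q-Learning (illustrative sketch).
fun sarsaUpdate(
    q: Array<DoubleArray>,
    s: Int, a: Int, r: Double,
    sNext: Int?, aNext: Int?,            // both null when the episode ends
    alpha: Double = 0.1, gamma: Double = 0.99
) {
    val nextQ = if (sNext != null && aNext != null) q[sNext][aNext] else 0.0
    q[s][a] += alpha * (r + gamma * nextQ - q[s][a])
}
```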
| Parameter | Default Value | Description |
|---|---|---|
| Gravity (g) | 9.8 px/s² | Constant vertical acceleration |
| Bounce coefficient (e) | 0.7 | Fraction of velocity conserved after a bounce (0 < e < 1) |
| Friction (optional) | 0.02 | Horizontal speed decay over time |
| Delta time | ~16 ms (60 FPS) | Physics update interval |
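To make the parameters concrete, a single physics step could look roughly like this, using screen coordinates with y increasing downward (function and field names are illustrative, not the actual engine code):

```kotlin
// One simulation step using the defaults from the table above (illustrative sketch).
data class Ball(var x: Double, var y: Double, var vx: Double, var vy: Double)

fun stepPhysics(
    ball: Ball,
    floorY: Double,
    g: Double = 9.8,         // gravity in px/s²
    bounce: Double = 0.7,    // fraction of vertical speed kept after a bounce
    friction: Double = 0.02, // horizontal speed decay per step (optional)
    dt: Double = 0.016       // ~16 ms per frame at 60 FPS
) {
    ball.vy += g * dt                       // gravity accelerates the ball toward the floor
    ball.x += ball.vx * dt
    ball.y += ball.vy * dt
    ball.vx *= (1 - friction)               // slow horizontal motion over time
    if (ball.y >= floorY && ball.vy > 0) {  // floor collision: reflect and damp
        ball.y = floorY
        ball.vy = -ball.vy * bounce
    }
}
```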
- Ball: 2D position, velocity, and acceleration vectors
- Obstacles: Static rectangular shapes with collision detection
- Cup (target): Fixed zone that defines success if ball lands inside
- Field boundaries: Walls and floor with bounce logic or reset conditions
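Obstacles and the cup zone can both be modeled as axis-aligned rectangles; a sketch of the success and collision checks (type and function names are assumptions):

```kotlin
// Axis-aligned rectangle used for the cup zone, obstacles, and field boundaries.
data class Zone(val left: Double, val top: Double, val right: Double, val bottom: Double) {
    fun contains(x: Double, y: Double) = x in left..right && y in top..bottom
}

// Success condition: the ball comes to rest inside the cup's opening zone.
fun landedInCup(ballX: Double, ballY: Double, cup: Zone) = cup.contains(ballX, ballY)

// Simple collision test against a static rectangular obstacle.
fun hitsObstacle(ballX: Double, ballY: Double, obstacle: Zone) = obstacle.contains(ballX, ballY)
```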
- Live ball trajectory path with fading trail
- Throw history visualization with color-coded success/failure (ghost throws)
- Impact heatmap showing frequently landed zones
- Reward over time graph showing learning progress
- Iteration/episode count
- Running average and variance of recent rewards (see the metrics sketch after this list)
- Exploration rate (epsilon) for ε-greedy policies
- Policy visualization (action probability distributions) for policy gradient and actor-critic methods
- Q-value heatmaps or tables for discrete methods
- Controls: start, pause, reset simulation
- Strategy selector dropdown
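For example, the running average, variance, and ε decay listed above can be tracked with a small helper like this (hypothetical helper, not a specific charting API):

```kotlin
// Sliding-window reward statistics for the metrics panel (illustrative sketch).
class RewardStats(private val window: Int = 100) {
    private val recent = ArrayDeque<Double>()

    fun record(reward: Double) {
        recent.addLast(reward)
        if (recent.size > window) recent.removeFirst()
    }

    val average: Double
        get() = if (recent.isEmpty()) 0.0 else recent.average()

    val variance: Double
        get() {
            if (recent.isEmpty()) return 0.0
            val mean = average
            return recent.sumOf { (it - mean) * (it - mean) } / recent.size
        }
}

// Typical ε-greedy schedule: decay exploration toward a floor after each episode.
fun decayEpsilon(epsilon: Double, decay: Double = 0.995, floor: Double = 0.05): Double =
    maxOf(floor, epsilon * decay)
```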
- Environment module: physics simulation, collision, and reward calculation
- Agent module: RL algorithms implementing the decision and learning steps
- SimulationManager: orchestrates interaction loops and episode handling
- UI (Jetpack Compose): physics visualization, metrics, and controls
- DataVisualization: real-time charts, heatmaps, and logs
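A rough sketch of how these modules could fit together (interface and class names are assumptions, not the project's final API):

```kotlin
// Agent module: any RL strategy exposed behind one interface so the UI's
// strategy selector can swap implementations at runtime.
interface RlStrategy {
    fun selectAction(state: Int): Int
    fun learn(state: Int, action: Int, reward: Double, nextState: Int?)
}

// Environment module: returns reward, next state (null = episode over), and success flag.
interface BallCupEnvironment {
    fun reset(): Int
    fun step(action: Int): Triple<Double, Int?, Boolean>
}

// SimulationManager: owns the interaction loop and reports the episode return to the UI.
class SimulationManager(
    private val env: BallCupEnvironment,
    var strategy: RlStrategy
) {
    fun runEpisode(): Double {
        var state: Int? = env.reset()
        var episodeReturn = 0.0
        while (state != null) {
            val action = strategy.selectAction(state)
            val (reward, next, _) = env.step(action)
            strategy.learn(state, action, reward, next)
            episodeReturn += reward
            state = next
        }
        return episodeReturn
    }
}
```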
- Build the physics environment with gravity, bouncing ball, and cup target
- Implement random agent baseline for testing environment
- Implement Q-Learning and SARSA with discrete state/action representation
- Add throw visualization, heatmaps, and reward graphs
- Add modular RL strategy selector and UI enhancements
- Optimize performance and polish UI/UX
- Write documentation and prepare portfolio demo
- Implement environment physics and playground UI in Compose
- Code baseline Random and Q-Learning agents
- Add UI for monitoring RL metrics in real-time
- Explore extensions with obstacles or more complex tasks
*This project is a continuous journey into Reinforcement Learning, blending practical Android development with AI concepts.*