Implementation of the MDP Order Dispatch Policy

This repository contains a Python implementation of the paper Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach. Specifically, it builds a synthetic environment that simulates the ridesharing marketplace described in Section 6.1 of the paper, then applies the MDP order dispatch policy developed in the paper to that environment. See Demonstration.ipynb for the detailed implementation.

Summary of the Algorithm

The algorithm consists of two steps:

Policy Evaluation: Apply temporal difference learning to the historical data to learn the value function
Order Dispatch: Implement the order dispatch policy by maximizing the value function
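
As a rough illustration of the policy evaluation step, the sketch below runs tabular TD(0) over a small set of logged transitions. It is not the code from Demonstration.ipynb: the function name td0_evaluate, the (time_step, grid_cell) state encoding, the transition format, and all parameter values are assumptions made for this example.

```python
from collections import defaultdict


def td0_evaluate(transitions, gamma=0.9, alpha=0.1, sweeps=50):
    """Tabular TD(0) over logged (state, reward, next_state) tuples.

    Hypothetical helper: a next_state of None marks the end of an episode,
    so nothing is bootstrapped from it.
    """
    values = defaultdict(float)
    for _ in range(sweeps):
        for state, reward, next_state in transitions:
            bootstrap = 0.0 if next_state is None else values[next_state]
            td_error = reward + gamma * bootstrap - values[state]
            values[state] += alpha * td_error
    return dict(values)


# Toy history (assumed format): states are (time_step, grid_cell) pairs.
history = [
    ((0, "A"), 1.0, (1, "B")),  # trip from cell A to cell B, earning 1.0
    ((1, "B"), 2.0, None),      # final trip of the episode, earning 2.0
]
values = td0_evaluate(history, gamma=1.0, alpha=0.5, sweeps=200)
```

With no discounting, the learned value of (1, "B") approaches the final reward of 2.0, and the value of (0, "A") approaches the total of 3.0 earned from that state onward.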

Illustration of the policy evaluation step:

Pseudocode:

The order dispatch step:
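
One way to picture this step is as an assignment problem: score each driver-order pair by immediate reward plus the discounted value of the destination minus the value of the driver's current state, then pick the assignment with the largest total. The sketch below does this by brute force on a toy instance. The function dispatch, the net_reward matrix, and the state labels are hypothetical and are not taken from the repository; a realistic implementation would likely swap the brute-force search for a matching solver such as scipy.optimize.linear_sum_assignment.

```python
from itertools import permutations


def dispatch(driver_states, order_dests, net_reward, values, gamma=0.9):
    """Pick the driver-to-order assignment with the largest total score.

    net_reward[d][o] is an assumed immediate reward for driver d serving
    order o (for example fare minus pickup cost). Brute force, so only
    suitable for tiny examples; assumes len(order_dests) <= len(driver_states).
    """
    def score(d, o):
        return (net_reward[d][o]
                + gamma * values.get(order_dests[o], 0.0)
                - values.get(driver_states[d], 0.0))

    best_total, best_drivers = float("-inf"), ()
    n_orders = len(order_dests)
    # Each permutation lists which driver takes order 0, order 1, ...
    for drivers in permutations(range(len(driver_states)), n_orders):
        total = sum(score(d, o) for o, d in enumerate(drivers))
        if total > best_total:
            best_total, best_drivers = total, drivers
    return [(d, o) for o, d in enumerate(best_drivers)], best_total


# Toy instance: three drivers, two orders, values from a prior evaluation step.
values = {"A": 1.0, "B": 0.0, "C": 4.0, "X": 2.0, "Y": 6.0}
driver_states = ["A", "B", "C"]
order_dests = ["X", "Y"]
net_reward = [[3.0, 1.0], [1.0, 2.0], [2.0, 5.0]]
match, total = dispatch(driver_states, order_dests, net_reward, values, gamma=1.0)
```

In this toy instance the driver in the high-value state C is left unassigned: taking either order would forfeit more future value than the order returns.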

Simulation results and comparison against other baseline policies:
