autonomousvision
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 5 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 9 additions & 7 deletions
diff --git a/docs/install.md b/docs/install.md
Lines changed: 11 additions & 5 deletions
diff --git a/docs/metrics.md b/docs/metrics.md
Lines changed: 7 additions & 7 deletions
diff --git a/docs/splits.md b/docs/splits.md
Lines changed: 19 additions & 9 deletions
diff --git a/download/download_navhard_two_stage.sh b/download/download_navhard_two_stage.sh
Lines changed: 10 additions & 0 deletions
@@ -44,3 +44,8 @@ repos:
     hooks:
     - id: flake8
       language_version: python3.10
+
+-   repo: https://github.com/kynan/nbstripout
+    rev: 0.6.1
+    hooks:
+    - id: nbstripout
@@ -52,12 +52,14 @@
 <p align="right">(<a href="#top">back to top</a>)</p>
 
 ## Changelog <a name="changelog"></a>
-- **`[2025/04/13]`** NAVSIM v2.1.1 release (official devkit version for 2025 warm-up phase)
-  - Updated dataset for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup) with minor fixes
-
-    - ⚠️ **IMPORTANT**: To submit to the updated leaderboard, you need to re-download the synthetic dataset v2.1.1 (see [download](docs/install.md))
-
-- **`[2025/04/08]`** NAVSIM v2.1 release 
+- **`[2025/04/24]`** NAVSIM v2.1.2 release
+  - Release of `navhard_two_stage` dataset (see [splits](docs/splits.md))
+  - Updated Extended Predictive Driver Model Score (EPDMS) for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup). See [metrics](docs/metrics.md) for details regarding the implementation.
+    - ⚠️ **IMPORTANT**: All entries to the warmup leaderboard have been removed. Please resubmit to obtain your score with the updated metric.
+    - The test leaderboard (coming this week) will use the same metric as this warmup leaderboard.
+- **`[2025/04/13]`** NAVSIM v2.1.1 release
+  - Updated dataset for the warmup leaderboard with minor fixes
+- **`[2025/04/08]`** NAVSIM v2.1 release
   - Added new dataset for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup) (see [submission](docs/submission.md))
   - Introduced support for two-stage reactive traffic agents (see [traffic simulation](docs/metrics.md))
 - **`[2025/02/28]`** NAVSIM v2.0 release
@@ -77,7 +79,7 @@
   - Support for test phase frames of competition
   - Download script for trainval
   - Egostatus MLP Agent and training pipeline
-- **`[2024/03/25]`** NAVSIM v0.3 release (official devkit version for warm-up phase)
+- **`[2024/03/25]`** NAVSIM v0.3 release
   - Adds code for Leaderboard submission
 - **`[2024/03/11]`** NAVSIM v0.2 release
   - Easier installation and download
 
@@ -26,15 +26,15 @@ cd download && ./download_maps
 Next download the data splits you want to use.
 Note that the dataset splits do not exactly map to the recommended standardized training / test splits-
 Please refer to [splits](splits.md) for an overview on the standardized training and test splits including their size and check which dataset splits you need to download in order to be able to run them.
-
-You can download the mini, trainval, test, private_test_e2e and warmup_synthetic_scenes dataset split with the following scripts
+You can download these splits with the following scripts.
 
 ```bash
 ./download_mini
 ./download_trainval
 ./download_test
 ./download_private_test_e2e
 ./download_warmup_two_stage
+./download_navhard_two_stage
 ```
 
 Also, the script `./download_navtrain` can be used to download a small portion of the  `trainval` dataset split which is needed for the `navtrain` training split.
@@ -57,16 +57,18 @@ This will download the splits into the download directory. From there, move it t
     |    ├── trainval
     |    ├── private_test_e2e
     |    └── mini
+    ├── navhard_two_stage
+    |    ├── openscene_meta_datas
+    |    ├── sensor_blobs
+    |    ├── synthetic_scene_pickles
+    |    └── synthetic_scenes_attributes.csv
     └── warmup_two_stage
          ├── openscene_meta_datas
 	 ├── sensor_blobs
 	 ├── synthetic_scene_pickles
          └── synthetic_scenes_attributes.csv
 
 ```
-
-⚠️ **IMPORTANT:** If you have already downloaded the data for Navsim V2.0.1 and tried the Hugging Face Leaderboard, please replace the old `"synthetic_scenes"` folder with the new `"warmup_two_stage"` folder. In Navsim V2.1, the traffic agents' policy has been updated, and the old data is no longer compatible.
-
 Set the required environment variables, by adding the following to your `~/.bashrc` file
 Based on the structure above, the environment variables need to be defined as:
 
@@ -78,6 +80,10 @@ export NAVSIM_DEVKIT_ROOT="$HOME/navsim_workspace/navsim"
 export OPENSCENE_DATA_ROOT="$HOME/navsim_workspace/dataset"
 ```
 
+⏰ **Note:** The `navhard_two_stage` split is used for local testing of your model's performance in a two-stage pseudo closed-loop setup.
+In contrast, `warmup_two_stage` is a smaller dataset designed for validating and testing submissions to the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup).
+In other words, the results you obtain locally on `warmup_two_stage` should match the results you see after submitting to Hugging Face.
+
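Before running an evaluation, it can help to confirm that the downloaded `navhard_two_stage` folder matches the directory tree shown above. The following is a minimal sketch, not part of the devkit: the helper name `missing_entries` is hypothetical, and it assumes the split folder sits directly under `OPENSCENE_DATA_ROOT`, as the tree suggests.

```python
import os
from pathlib import Path

# Entries listed under navhard_two_stage in the directory tree above.
EXPECTED_ENTRIES = (
    "openscene_meta_datas",
    "sensor_blobs",
    "synthetic_scene_pickles",
    "synthetic_scenes_attributes.csv",
)


def missing_entries(data_root: Path, split: str = "navhard_two_stage") -> list[str]:
    """Return the expected entries that are absent from <data_root>/<split>."""
    split_dir = Path(data_root) / split
    return [name for name in EXPECTED_ENTRIES if not (split_dir / name).exists()]


if __name__ == "__main__":
    # Assumption: fall back to the workspace path used in the exports above.
    default_root = Path.home() / "navsim_workspace" / "dataset"
    root = Path(os.environ.get("OPENSCENE_DATA_ROOT", default_root))
    missing = missing_entries(root)
    print("layout OK" if not missing else f"missing under {root}: {missing}")
```

The same helper can be pointed at `warmup_two_stage` by passing `split="warmup_two_stage"`, since the tree lists the same four entries for both folders.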
 ### 3. Install the navsim-devkit
 
 Finally, install navsim.
 
@@ -7,7 +7,7 @@ In NAVSIM v2, the EPDMS extends the PDMS, introducing:
 - 2 new multiplier metrics (DDC and TLC)
 - False-positive penalty filtering
 
-The Lane Keeping subscore (LK) penalizes driving too far from the centerline for an extended time. It is disabled on intersections where the centerline annotations often don't match the actual lane markings perceived by the sensors. Besides, NAVSIM v2 puts stronger emphasis on comfortable driving. The existing Comfort (C) subscore was slightly improved to also evaluate how the planned trajectory matches the vehicle's motion history. Moreover, the new extended comfort (EC) subscore compares the trajectory outputs of subsequent frames and their resulting dynamic states. A discrepancy in acceleration, jerk etc. between subsequent frames results in uncomfortable behavior and thus a lower score.
+The Lane Keeping subscore (LK) penalizes driving too far from the centerline for an extended time. It is disabled on intersections where the centerline annotations often don't match the actual lane markings perceived by the sensors. Besides, NAVSIM v2 puts stronger emphasis on comfortable driving. The existing Comfort (C) subscore from NAVSIM v1 was slightly improved to also evaluate how the planned trajectory matches the vehicle's motion history, giving History Comfort (HC). Moreover, the new extended comfort (EC) subscore compares the trajectory outputs of subsequent frames and their resulting dynamic states. A discrepancy in acceleration, jerk etc. between subsequent frames results in uncomfortable behavior and thus a lower score.
 
 The new Driving Direction Compliance (DDC) and Traffic Light Compliance (TLC) subscores extend the inadmissible behaviors detected and penalized in evaluation. Further, to reduce false positive penalties, we disable penalties when the human agent is also responsible for a violation. This ensures that the planner is not unfairly penalized in situations where breaking a rule is necessary to achieve a valid driving goal. For example, if the agent must briefly enter the oncoming lane to overtake a static obstacle.
 
@@ -19,18 +19,18 @@ No at-fault Collisions (NC) | multiplier | {0, 1/2, 1} |
 Drivable Area Compliance (DAC) | multiplier | {0, 1} |
 **Driving Direction Compliance** (DDC) | multiplier | {0, 1/2, 1} |
 **Traffic Light Compliance** (TLC) | multiplier | {0, 1} |
-Time to Collision (TTC) within bound | 5 | {0, 1} |
 Ego Progress (EP) | 5 | [0, 1] |
-Comfort (C) | 2 | {0, 1} |
+Time to Collision (TTC) within bound | 5 | {0, 1} |
 **Lane Keeping** (LK)  | 2 | {0, 1} |
+**History Comfort** (HC) | 2 | {0, 1} |
 **Extended Comfort** (EC) | 2 | {0, 1} |
 
 The full EPDMS is defined as:
 
 <br>
 
 
-$$\text{EPDMS} = \left(\prod_{m\in\\{NC, DAC, DDC, TLC\\}} \text{filter}\_m(\text{agent}, \text{human})\right) \cdot  \left( \frac{\sum_{m \in \\{TTC, EP, C, LK, EC\\}} w_m \cdot \text{filter}\_m(\text{agent}, \text{human}) }{\sum_{m\in \\{TTC, EP, C, LK, EC\\}} w_m}\right)$$
+$$\text{EPDMS} = \left(\prod_{m\in\\{NC, DAC, DDC, TLC\\}} \text{filter}\_m(\text{agent}, \text{human})\right) \cdot  \left( \frac{\sum_{m \in \\{TTC, EP, HC, LK, EC\\}} w_m \cdot \text{filter}\_m(\text{agent}, \text{human}) }{\sum_{m\in \\{TTC, EP, HC, LK, EC\\}} w_m}\right)$$
 
 $$\text{with}\quad \text{filter}_m(\text{agent}, \text{human}) = \begin{cases}
 1.0 & \text{if } m(\text{human}) = 0 \\
@@ -41,7 +41,7 @@ m(\text{agent}) & \text{otherwise.}
 <!-- $$\text{with}\quad \text{filter}_m(\text{agent}, \text{human}) = \mathbf{1}_{m(\text{human})\neq 0} \cdot m(\text{agent}) + 1.0 \cdot m(\text{human}).$$ -->
 <br>
 
-For reference, the PDMS used in NAVSIM v1 was:
+For reference, the PDMS used in NAVSIM v1, which used a slightly different version of HC called Comfort (C), was defined as:
 
 <br>
 
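As a rough illustration of the EPDMS definition in `docs/metrics.md`, the sketch below combines per-metric scores using the weights from the table (EP 5, TTC 5, LK 2, HC 2, EC 2) and the human-reference filter. It is a sketch under stated assumptions, not the devkit implementation: the names `filter_metric` and `epdms` are hypothetical, and scores are passed in as plain dictionaries keyed by metric abbreviation.

```python
from math import prod

# Weights of the averaged subscores, taken from the table in docs/metrics.md.
WEIGHTS = {"EP": 5.0, "TTC": 5.0, "LK": 2.0, "HC": 2.0, "EC": 2.0}
# Multiplier metrics from the same table.
MULTIPLIERS = ("NC", "DAC", "DDC", "TLC")


def filter_metric(agent: float, human: float) -> float:
    """Return 1.0 when the human reference scores 0, otherwise the agent's score."""
    return 1.0 if human == 0 else agent


def epdms(agent: dict[str, float], human: dict[str, float]) -> float:
    """Product of filtered multipliers times the weighted mean of filtered subscores."""
    multiplier = prod(filter_metric(agent[m], human[m]) for m in MULTIPLIERS)
    weighted = sum(w * filter_metric(agent[m], human[m]) for m, w in WEIGHTS.items())
    return multiplier * weighted / sum(WEIGHTS.values())
```

For example, an agent with NC = 1/2 and EP = 0.5 and all other scores equal to 1, evaluated against a human reference that scores 1 everywhere, obtains 0.5 · 13.5 / 16 ≈ 0.42.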
@@ -68,8 +68,8 @@ The new NAVSIM v2 evaluation uses a two-stage aggregation process to approximate
 
 3. **Weighting and Aggregation:**
    - To emulate the effects of closed-loop simulation, the relevance of each follow-up scene to the overall score depends on how close its starting position is to where the submitted planner actually ended in the first stage.
-   - We assign higher weights to follow-up scenes that start closer to the submitted planner's end position.
-   - We first compute a weighted aggregation if all second-stage scores. Finally, we aggregate the scores of the first and second stage to produce the aggregated metric.
+   - We assign higher weights to follow-up scenes that start closer to the submitted planner's end position, using a Gaussian kernel.
+   - We first compute a weighted aggregation of all second-stage scores. Finally, we multiply the first-stage and aggregated second-stage scores to produce the final metric.
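
The weighting and aggregation step above can be sketched as follows. This is only an illustration: the kernel bandwidth `sigma`, the use of a scalar distance between the planner's end position and each follow-up scene's start position, and the normalization of the weights to sum to one are all assumptions, and the function names are hypothetical rather than devkit API.

```python
import math


def gaussian_weights(distances: list[float], sigma: float = 1.0) -> list[float]:
    """Normalized Gaussian kernel weights; closer start positions weigh more."""
    if not distances:
        raise ValueError("at least one follow-up scene is required")
    # Shift by the smallest squared distance so the closest scene gets raw
    # weight 1.0; this avoids a zero denominator and cancels on normalization.
    d_min_sq = min(d * d for d in distances)
    raw = [math.exp(-(d * d - d_min_sq) / (2.0 * sigma**2)) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def two_stage_score(
    first_stage_score: float,
    second_stage_scores: list[float],
    distances: list[float],
    sigma: float = 1.0,
) -> float:
    """Multiply the first-stage score by the weighted second-stage aggregate."""
    weights = gaussian_weights(distances, sigma)
    second_stage = sum(w * s for w, s in zip(weights, second_stage_scores))
    return first_stage_score * second_stage
```

With two follow-up scenes at equal distance, the second-stage aggregate reduces to their plain mean, so a first-stage score of 0.8 with second-stage scores 1.0 and 0.5 gives 0.8 · 0.75 = 0.6.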
 
 # Run an evaluation
 To evaluate the PDM score for an agent you can run:
 
@@ -1,15 +1,15 @@
 # Dataset splits vs. filtered training / test splits
 
 The NAVSIM framework utilizes several dataset splits for standardized training and evaluating agents.
-All of them use the OpenScene dataset that is divided into the dataset splits `mini`, `trainval`,` test`, `warmup_two_stage`, `private_test_e2e`, which can all be downloaded separately.
+All of them use the OpenScene dataset that is divided into the dataset splits `mini`, `trainval`, and `test`, which can all be downloaded separately.
 
-It is possible to run trainings and evaluations directly on these sets (see `Standard` in table below).
+It is possible to run trainings and evaluations directly on these sets (see `OpenScene` in table below).
 Alternatively, you can run trainings and evaluations on training and validation splits that were filtered for challenging scenarios (see `NAVSIM` in table below), which is the recommended option for producing comparable and competitive results efficiently.
 In contrast to the dataset splits which refer to a downloadable set of logs, the training / test splits are implemented as scene filters, which define how scenes are extracted from these logs.
 
 The NAVSIM training / test splits subsample the OpenScene dataset splits.
 Moreover, the NAVSIM splits include overlapping scenes, while the Standard splits are non-overlapping.
-Specifically, `navtrain` is based on the `trainval` data and `navtest` on the `test` data.
+Specifically, `navtrain` is based on the `trainval` data, while `navtest` and `navhard_two_stage` are based on the `test` data.
 
 As the `trainval` sensor data is very large, we provide a separate download link, which loads only the frames needed for `navtrain`.
 This eases access for users that only want to run the `navtrain` split and not the `trainval` split. If you already downloaded the full `trainval` sensor data, it is **not necessary** to download the `navtrain` frames as well.
@@ -18,7 +18,6 @@ The logs are always the complete dataset split.
 ## Overview
 
 The Table belows offers an overview on the training and test splits supported by NAVSIM.
-In Navsim-v1.1, the training/test split can bet set with a single config parameter given in the table.
 
 <table border="0">
     <tr>
@@ -30,7 +29,7 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
         <th>Config parameters</th>
     </tr>
     <tr>
-        <td rowspan="3">Standard</td>
+        <td rowspan="3">OpenScene</td>
         <td>trainval</td>
         <td>Large split for training and validating agents with regular driving recordings. Corresponds to nuPlan and downsampled to 2HZ.</td>
         <td>14GB</td>
@@ -58,7 +57,7 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
         </td>
     </tr>
     <tr>
-        <td rowspan="2">NAVSIM</td>
+        <td rowspan="3">NAVSIM</td>
         <td>navtrain</td>
         <td>Standard split for training agents in NAVSIM with non-trivial driving scenes. Sensors available separately in <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh">download_navtrain.sh</a>.</td>
         <td>-</td>
@@ -76,10 +75,19 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
         train_test_split=navtest
         </td>
     </tr>
+    <tr>
+        <td>navhard_two_stage</td>
+        <td>Standard split for testing agents in NAVSIM v2 with real and synthetic driving scenes. Synthetic frames downloadable via <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_navhard_two_stage.sh">download_navhard_two_stage.sh</a>.</td>
+        <td>892M</td>
+        <td>31G</td>
+        <td>
+        train_test_split=navhard_two_stage
+        </td>
+    </tr>
     <tr>
         <td rowspan="2">Competition</td>
         <td>warmup_two_stage</td>
-        <td>Warmup test split to validate submission on hugging face. Available as a filter for test split.</td>
+        <td>Warmup test split to validate submission on hugging face. Synthetic frames downloadable via <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_warmup_two_stage.sh">download_warmup_two_stage.sh</a>.</td>
         <td>27M</td>
         <td>1.2G</td>
         <td>
@@ -103,9 +111,11 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
 
 The standard splits `trainval`, `test`, and `mini` are from the OpenScene dataset. Note that the data corresponds to the nuPlan dataset with a lower frequency of 2Hz. You can download all standard splits over Hugging Face with the bash scripts in [download](../download)
 
-NAVSIM provides a subset and filter of the `trainval` split, called `navtrain`. The `navtrain` split facilitates a standardized training scheme and requires significantly less sensor data storage than `travel` (445GB vs. 2100GB). If your agents don't need historical sensor inputs, you can download `navtrain` without history, which requires 300GB of storage. Note that `navtrain` can be downloaded separately via [download_navtrain.sh](https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh) but still requires access to the `trainval` logs. Similarly, the `navtest` split enables a standardized set for testing agents with a provided scene filter. Both `navtrain` and `navtest` are filtered to increase interesting samples in the sets.
+NAVSIM provides a subset and filter of the `trainval` split, called `navtrain`. The `navtrain` split facilitates a standardized training scheme and requires significantly less sensor data storage than `trainval` (445GB vs. 2100GB). If your agents don't need historical sensor inputs, you can download `navtrain` without history, which requires 300GB of storage. Note that the sensor data for `navtrain` can be downloaded separately via [download_navtrain.sh](https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh) but it still requires access to the `trainval` logs.
+
+The `navtest` split enables a standardized set for testing agents in NAVSIM v1 with a provided scene filter. Similarly, the `navhard_two_stage` split facilitates pseudo closed-loop simulation for evaluation in NAVSIM v2. `navtrain`, `navtest`, and `navhard_two_stage` are filtered to increase the proportion of interesting samples in the sets.
 
-For the challenge on Hugging Face, we provide the `warmup_two_stage` and `private_test_e2e` for the warm-up and challenge track, respectively. Note that `private_test_e2e` requires you to download the data, while `warmup_two_stage` is a scene filter for the `test` split.
+For the challenge on Hugging Face, we provide the `warmup_two_stage` and `private_test_e2e` for the warm-up and challenge track, respectively.
 
 ## Troubleshooting
 
 
@@ -0,0 +1,10 @@
+wget https://huggingface.co/datasets/OpenDriveLab/OpenScene/resolve/main/navsim-v2/navsim_v2.1.1_navhard_two_stage_curr_sensors.tar.gz
+wget https://huggingface.co/datasets/OpenDriveLab/OpenScene/resolve/main/navsim-v2/navsim_v2.1.1_navhard_two_stage_hist_sensors.tar.gz
+wget https://huggingface.co/datasets/OpenDriveLab/OpenScene/resolve/main/navsim-v2/navsim_v2.1.1_navhard_two_stage_scene_pickles.tar.gz
+
+tar -xzvf navsim_v2.1.1_navhard_two_stage_curr_sensors.tar.gz
+tar -xzvf navsim_v2.1.1_navhard_two_stage_hist_sensors.tar.gz
+tar -xzvf navsim_v2.1.1_navhard_two_stage_scene_pickles.tar.gz
+rm navsim_v2.1.1_navhard_two_stage_curr_sensors.tar.gz
+rm navsim_v2.1.1_navhard_two_stage_hist_sensors.tar.gz
+rm navsim_v2.1.1_navhard_two_stage_scene_pickles.tar.gz