
Commit 937cefc

Merge pull request #24 from mh0797/main
version2.1.2
2 parents 74602a8 + 3ba58a0 commit 937cefc

File tree

20 files changed

+12263
-348
lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -44,3 +44,8 @@ repos:
4444
hooks:
4545
- id: flake8
4646
language_version: python3.10
47+
48+
- repo: https://github.com/kynan/nbstripout
49+
rev: 0.6.1
50+
hooks:
51+
- id: nbstripout

README.md

Lines changed: 9 additions & 7 deletions
@@ -52,12 +52,14 @@
5252
<p align="right">(<a href="#top">back to top</a>)</p>
5353

5454
## Changelog <a name="changelog"></a>
55-
- **`[2025/04/13]`** NAVSIM v2.1.1 release (official devkit version for 2025 warm-up phase)
56-
- Updated dataset for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup) with minor fixes
57-
58-
- ⚠️ **IMPORTANT**: To submit to the updated leaderboard, you need to re-download the synthetic dataset v2.1.1 (see [download](docs/install.md))
59-
60-
- **`[2025/04/08]`** NAVSIM v2.1 release
55+
- **`[2025/04/24]`** NAVSIM v2.1.2 release
56+
- Release of `navhard_two_stage` dataset (see [splits](docs/splits.md))
57+
- Updated Extended Predictive Driver Model Score (EPDMS) for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup). See [metrics](docs/metrics.md) for implementation details.
58+
- ⚠️ **IMPORTANT**: All entries to the warmup leaderboard have been removed. Please resubmit to obtain your score with the updated metric.
59+
- The test leaderboard (coming this week) will use the same metric as this warmup leaderboard.
60+
- **`[2025/04/13]`** NAVSIM v2.1.1 release
61+
- Updated dataset for the warmup leaderboard with minor fixes
62+
- **`[2025/04/08]`** NAVSIM v2.1 release
6163
- Added new dataset for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup) (see [submission](docs/submission.md))
6264
- Introduced support for two-stage reactive traffic agents (see [traffic simulation](docs/metrics.md))
6365
- **`[2025/02/28]`** NAVSIM v2.0 release
@@ -77,7 +79,7 @@
7779
- Support for test phase frames of competition
7880
- Download script for trainval
7981
- Egostatus MLP Agent and training pipeline
80-
- **`[2024/03/25]`** NAVSIM v0.3 release (official devkit version for warm-up phase)
82+
- **`[2024/03/25]`** NAVSIM v0.3 release
8183
- Adds code for Leaderboard submission
8284
- **`[2024/03/11]`** NAVSIM v0.2 release
8385
- Easier installation and download

docs/install.md

Lines changed: 11 additions & 5 deletions
@@ -26,15 +26,15 @@ cd download && ./download_maps
2626
Next download the data splits you want to use.
2727
Note that the dataset splits do not exactly map to the recommended standardized training / test splits.
2828
Please refer to [splits](splits.md) for an overview on the standardized training and test splits including their size and check which dataset splits you need to download in order to be able to run them.
29-
30-
You can download the mini, trainval, test, private_test_e2e and warmup_synthetic_scenes dataset split with the following scripts
29+
You can download these splits with the following scripts.
3130

3231
```bash
3332
./download_mini
3433
./download_trainval
3534
./download_test
3635
./download_private_test_e2e
3736
./download_warmup_two_stage
37+
./download_navhard_two_stage
3838
```
3939

4040
Also, the script `./download_navtrain` can be used to download a small portion of the `trainval` dataset split which is needed for the `navtrain` training split.
@@ -57,16 +57,18 @@ This will download the splits into the download directory. From there, move it t
5757
| ├── trainval
5858
| ├── private_test_e2e
5959
   | └── mini
60+
   └── navhard_two_stage
61+
| ├── openscene_meta_datas
62+
| ├── sensor_blobs
63+
| ├── synthetic_scene_pickles
64+
   | └── synthetic_scenes_attributes.csv
6065
   └── warmup_two_stage
6166
├── openscene_meta_datas
6267
├── sensor_blobs
6368
├── synthetic_scene_pickles
6469
   └── synthetic_scenes_attributes.csv
6570
6671
```
67-
68-
⚠️ **IMPORTANT:** If you have already downloaded the data for Navsim V2.0.1 and tried the Hugging Face Leaderboard, please replace the old `"synthetic_scenes"` folder with the new `"warmup_two_stage"` folder. In Navsim V2.1, the traffic agents' policy has been updated, and the old data is no longer compatible.
69-
7072
Set the required environment variables, by adding the following to your `~/.bashrc` file
7173
Based on the structure above, the environment variables need to be defined as:
7274

@@ -78,6 +80,10 @@ export NAVSIM_DEVKIT_ROOT="$HOME/navsim_workspace/navsim"
7880
export OPENSCENE_DATA_ROOT="$HOME/navsim_workspace/dataset"
7981
```
8082

83+
**Note:** The `navhard_two_stage` split is used for local testing of your model's performance in a two-stage pseudo closed-loop setup.
84+
In contrast, `warmup_two_stage` is a smaller dataset designed for validating and testing submissions to the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup).
85+
In other words, the results you obtain locally on `warmup_two_stage` should match the results you see after submitting to Hugging Face.
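Before running a local evaluation, it can help to confirm that the extracted split matches the directory layout shown above. The following is a minimal sketch, not part of the devkit: the helper name `missing_entries` is hypothetical, and it assumes that `navhard_two_stage` sits directly under `OPENSCENE_DATA_ROOT`.

```python
import os
from pathlib import Path
from typing import List

# Entries listed under navhard_two_stage in the directory tree above.
EXPECTED_ENTRIES = (
    "openscene_meta_datas",
    "sensor_blobs",
    "synthetic_scene_pickles",
    "synthetic_scenes_attributes.csv",
)


def default_data_root() -> Path:
    """Read OPENSCENE_DATA_ROOT, falling back to the path used in the docs."""
    fallback = Path.home() / "navsim_workspace" / "dataset"
    return Path(os.environ.get("OPENSCENE_DATA_ROOT", str(fallback)))


def missing_entries(data_root: Path, split: str = "navhard_two_stage") -> List[str]:
    """Return the expected entries that are absent from <data_root>/<split>.

    Assumption: the split folder is a direct child of the data root.
    """
    split_dir = data_root / split
    return [name for name in EXPECTED_ENTRIES if not (split_dir / name).exists()]
```

Calling `missing_entries(default_data_root())` returns an empty list when all four entries are present. The same helper can be pointed at `warmup_two_stage` through the `split` argument, since both folders share the same structure in the tree above.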
86+
8187
### 3. Install the navsim-devkit
8288

8389
Finally, install navsim.

docs/metrics.md

Lines changed: 7 additions & 7 deletions
@@ -7,7 +7,7 @@ In NAVSIM v2, the EPDMS extends the PDMS, introducing:
77
- 2 new multiplier metrics (DDC and TLC)
88
- False-positive penalty filtering
99

10-
The Lane Keeping subscore (LK) penalizes driving too far from the centerline for an extended time. It is disabled on intersections where the centerline annotations often don't match the actual lane markings perceived by the sensors. Besides, NAVSIM v2 puts stronger emphasis on comfortable driving. The existing Comfort (C) subscore was slightly improved to also evaluate how the planned trajectory matches the vehicle's motion history. Moreover, the new extended comfort (EC) subscore compares the trajectory outputs of subsequent frames and their resulting dynamic states. A discrepancy in acceleration, jerk etc. between subsequent frames results in uncomfortable behavior and thus a lower score.
10+
The Lane Keeping subscore (LK) penalizes driving too far from the centerline for an extended time. It is disabled on intersections where the centerline annotations often don't match the actual lane markings perceived by the sensors. Besides, NAVSIM v2 puts stronger emphasis on comfortable driving. The existing Comfort (C) subscore from NAVSIM v1 was slightly improved to also evaluate how the planned trajectory matches the vehicle's motion history, giving History Comfort (HC). Moreover, the new extended comfort (EC) subscore compares the trajectory outputs of subsequent frames and their resulting dynamic states. A discrepancy in acceleration, jerk etc. between subsequent frames results in uncomfortable behavior and thus a lower score.
1111

1212
The new Driving Direction Compliance (DDC) and Traffic Light Compliance (TLC) subscores extend the inadmissible behaviors detected and penalized in evaluation. Further, to reduce false positive penalties, we disable penalties when the human agent is also responsible for a violation. This ensures that the planner is not unfairly penalized in situations where breaking a rule is necessary to achieve a valid driving goal, for example when the agent must briefly enter the oncoming lane to overtake a static obstacle.
1313

@@ -19,18 +19,18 @@ No at-fault Collisions (NC) | multiplier | {0, 1/2, 1} |
1919
Drivable Area Compliance (DAC) | multiplier | {0, 1} |
2020
**Driving Direction Compliance** (DDC) | multiplier | {0, 1/2, 1} |
2121
**Traffic Light Compliance** (TLC) | multiplier | {0, 1} |
22-
Time to Collision (TTC) within bound | 5 | {0, 1} |
2322
Ego Progress (EP) | 5 | [0, 1] |
24-
Comfort (C) | 2 | {0, 1} |
23+
Time to Collision (TTC) within bound | 5 | {0, 1} |
2524
**Lane Keeping** (LK) | 2 | {0, 1} |
25+
**History Comfort** (HC) | 2 | {0, 1} |
2626
**Extended Comfort** (EC) | 2 | {0, 1} |
2727

2828
The full EPDMS is defined as:
2929

3030
<br>
3131

3232

33-
$$\text{EPDMS} = \left(\prod_{m\in\\{NC, DAC, DDC, TLC\\}} \text{filter}\_m(\text{agent}, \text{human})\right) \cdot \left( \frac{\sum_{m \in \\{TTC, EP, C, LK, EC\\}} w_m \cdot \text{filter}\_m(\text{agent}, \text{human}) }{\sum_{m\in \\{TTC, EP, C, LK, EC\\}} w_m}\right)$$
33+
$$\text{EPDMS} = \left(\prod_{m\in\\{NC, DAC, DDC, TLC\\}} \text{filter}\_m(\text{agent}, \text{human})\right) \cdot \left( \frac{\sum_{m \in \\{TTC, EP, HC, LK, EC\\}} w_m \cdot \text{filter}\_m(\text{agent}, \text{human}) }{\sum_{m\in \\{TTC, EP, HC, LK, EC\\}} w_m}\right)$$
3434

3535
$$\text{with}\quad \text{filter}_m(\text{agent}, \text{human}) = \begin{cases}
3636
1.0 & \text{if } m(\text{human}) = 0 \\
@@ -41,7 +41,7 @@ m(\text{agent}) & \text{otherwise.}
4141
<!-- $$\text{with}\quad \text{filter}_m(\text{agent}, \text{human}) = \mathbf{1}_{m(\text{human})\neq 0} \cdot m(\text{agent}) + 1.0 \cdot m(\text{human}).$$ -->
4242
<br>
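To make the formula concrete, here is a minimal sketch of the EPDMS computation for a single scene. It is an illustration written from the equation and the weight table above, not the devkit's implementation; the function names and the dictionary-based interface are assumptions.

```python
from typing import Dict

# Multiplier metrics from the table above.
MULTIPLIER_METRICS = ("NC", "DAC", "DDC", "TLC")
# Weighted metrics with the weights listed in the table above.
WEIGHTED_METRICS = {"TTC": 5.0, "EP": 5.0, "LK": 2.0, "HC": 2.0, "EC": 2.0}


def filter_score(agent: float, human: float) -> float:
    """Return 1.0 if the human reference scores 0, otherwise the agent's score."""
    return 1.0 if human == 0 else agent


def epdms(agent: Dict[str, float], human: Dict[str, float]) -> float:
    """Combine per-metric scores into the EPDMS for one scene.

    `agent` and `human` map metric abbreviations to subscores (hypothetical interface).
    """
    multiplier = 1.0
    for name in MULTIPLIER_METRICS:
        multiplier *= filter_score(agent[name], human[name])
    weighted_sum = sum(
        weight * filter_score(agent[name], human[name])
        for name, weight in WEIGHTED_METRICS.items()
    )
    return multiplier * weighted_sum / sum(WEIGHTED_METRICS.values())
```

For example, an agent that scores 1 everywhere except EP = 0.5, evaluated against a human reference with no violations, obtains (5 + 2.5 + 2 + 2 + 2) / 16 = 0.84375. If the agent violates DDC in a scene where the human reference also does, the filter replaces that factor with 1.0 and the score is unchanged.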
4343

44-
For reference, the PDMS used in NAVSIM v1 was:
44+
For reference, the PDMS used in NAVSIM v1, which used a slightly different version of HC called Comfort (C), was defined as:
4545

4646
<br>
4747

@@ -68,8 +68,8 @@ The new NAVSIM v2 evaluation uses a two-stage aggregation process to approximate
6868

6969
3. **Weighting and Aggregation:**
7070
- To emulate the effects of closed-loop simulation, the relevance of each follow-up scene to the overall score depends on how close its starting position is to where the submitted planner actually ended in the first stage.
71-
- We assign higher weights to follow-up scenes that start closer to the submitted planner's end position.
72-
- We first compute a weighted aggregation if all second-stage scores. Finally, we aggregate the scores of the first and second stage to produce the aggregated metric.
71+
- We assign higher weights to follow-up scenes that start closer to the submitted planner's end position, using a Gaussian kernel.
72+
- We first compute a weighted aggregation of all second-stage scores. Finally, we multiply the first-stage score by the aggregated second-stage score to produce the final metric.
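The weighting and aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the devkit's code: the distance is taken as the 2D Euclidean distance between the planner's end position and each follow-up scene's start position, the kernel width `sigma` is a hypothetical parameter, and the weights are normalized by their sum.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gaussian_weight(distance: float, sigma: float) -> float:
    """Unnormalized Gaussian kernel: 1.0 at distance 0, decaying with distance."""
    return math.exp(-0.5 * (distance / sigma) ** 2)


def two_stage_score(
    first_stage_score: float,
    planner_end: Point,
    follow_ups: Sequence[Tuple[Point, float]],
    sigma: float = 1.0,  # hypothetical kernel width, not taken from the docs
) -> float:
    """Multiply the first-stage score by the weighted mean of second-stage scores.

    Each follow-up is a (start_position, score) pair.
    """
    weights = [
        gaussian_weight(math.dist(planner_end, start), sigma)
        for start, _ in follow_ups
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all follow-up weights are zero; check sigma and positions")
    second_stage = sum(w * score for w, (_, score) in zip(weights, follow_ups)) / total
    return first_stage_score * second_stage
```

With two follow-up scenes that start at the same distance from the planner's end position, both receive equal weight, so the second-stage score is their plain average.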
7373

7474
# Run an evaluation
7575
To evaluate the PDM score for an agent you can run:

docs/splits.md

Lines changed: 19 additions & 9 deletions
@@ -1,15 +1,15 @@
11
# Dataset splits vs. filtered training / test splits
22

33
The NAVSIM framework utilizes several dataset splits for standardized training and evaluating agents.
4-
All of them use the OpenScene dataset that is divided into the dataset splits `mini`, `trainval`,` test`, `warmup_two_stage`, `private_test_e2e`, which can all be downloaded separately.
4+
All of them use the OpenScene dataset that is divided into the dataset splits `mini`, `trainval`, and `test`, which can all be downloaded separately.
55

6-
It is possible to run trainings and evaluations directly on these sets (see `Standard` in table below).
6+
It is possible to run trainings and evaluations directly on these sets (see `OpenScene` in table below).
77
Alternatively, you can run trainings and evaluations on training and validation splits that were filtered for challenging scenarios (see `NAVSIM` in table below), which is the recommended option for producing comparable and competitive results efficiently.
88
In contrast to the dataset splits which refer to a downloadable set of logs, the training / test splits are implemented as scene filters, which define how scenes are extracted from these logs.
99

1010
The NAVSIM training / test splits subsample the OpenScene dataset splits.
1111
Moreover, the NAVSIM splits include overlapping scenes, while the Standard splits are non-overlapping.
12-
Specifically, `navtrain` is based on the `trainval` data and `navtest` on the `test` data.
12+
Specifically, `navtrain` is based on the `trainval` data and `navtest` and `navhard_two_stage` on the `test` data.
1313

1414
As the `trainval` sensor data is very large, we provide a separate download link, which loads only the frames needed for `navtrain`.
1515
This eases access for users that only want to run the `navtrain` split and not the `trainval` split. If you already downloaded the full `trainval` sensor data, it is **not necessary** to download the `navtrain` frames as well.
@@ -18,7 +18,6 @@ The logs are always the complete dataset split.
1818
## Overview
1919

2020
The table below offers an overview of the training and test splits supported by NAVSIM.
21-
In Navsim-v1.1, the training/test split can bet set with a single config parameter given in the table.
2221

2322
<table border="0">
2423
<tr>
@@ -30,7 +29,7 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
3029
<th>Config parameters</th>
3130
</tr>
3231
<tr>
33-
<td rowspan="3">Standard</td>
32+
<td rowspan="3">OpenScene</td>
3433
<td>trainval</td>
3534
<td>Large split for training and validating agents with regular driving recordings. Corresponds to nuPlan and downsampled to 2HZ.</td>
3635
<td>14GB</td>
@@ -58,7 +57,7 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
5857
</td>
5958
</tr>
6059
<tr>
61-
<td rowspan="2">NAVSIM</td>
60+
<td rowspan="3">NAVSIM</td>
6261
<td>navtrain</td>
6362
<td>Standard split for training agents in NAVSIM with non-trivial driving scenes. Sensors available separately in <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh">download_navtrain.sh</a>.</td>
6463
<td>-</td>
@@ -76,10 +75,19 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
7675
train_test_split=navtest
7776
</td>
7877
</tr>
78+
<tr>
79+
<td>navhard_two_stage</td>
80+
<td>Standard split for testing agents in NAVSIM v2 with real and synthetic driving scenes. Synthetic frames downloadable via <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_navhard_two_stage.sh">download_navhard_two_stage.sh</a>.</td>
81+
<td>892M</td>
82+
<td>31G</td>
83+
<td>
84+
train_test_split=navhard_two_stage
85+
</td>
86+
</tr>
7987
<tr>
8088
<td rowspan="2">Competition</td>
8189
<td>warmup_two_stage</td>
82-
<td>Warmup test split to validate submission on hugging face. Available as a filter for test split.</td>
90+
<td>Warmup test split to validate submissions on Hugging Face. Synthetic frames downloadable via <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_warmup_two_stage.sh">download_warmup_two_stage.sh</a>.</td>
8391
<td>27M</td>
8492
<td>1.2G</td>
8593
<td>
@@ -103,9 +111,11 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
103111

104112
The standard splits `trainval`, `test`, and `mini` are from the OpenScene dataset. Note that the data corresponds to the nuPlan dataset with a lower frequency of 2Hz. You can download all standard splits from Hugging Face with the bash scripts in [download](../download).
105113

106-
NAVSIM provides a subset and filter of the `trainval` split, called `navtrain`. The `navtrain` split facilitates a standardized training scheme and requires significantly less sensor data storage than `travel` (445GB vs. 2100GB). If your agents don't need historical sensor inputs, you can download `navtrain` without history, which requires 300GB of storage. Note that `navtrain` can be downloaded separately via [download_navtrain.sh](https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh) but still requires access to the `trainval` logs. Similarly, the `navtest` split enables a standardized set for testing agents with a provided scene filter. Both `navtrain` and `navtest` are filtered to increase interesting samples in the sets.
114+
NAVSIM provides a subset and filter of the `trainval` split, called `navtrain`. The `navtrain` split facilitates a standardized training scheme and requires significantly less sensor data storage than `trainval` (445GB vs. 2100GB). If your agents don't need historical sensor inputs, you can download `navtrain` without history, which requires 300GB of storage. Note that the sensor data for `navtrain` can be downloaded separately via [download_navtrain.sh](https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh), but it still requires access to the `trainval` logs.
115+
116+
The `navtest` split enables a standardized set for testing agents in NAVSIM v1 with a provided scene filter. Similarly, the `navhard_two_stage` split facilitates pseudo closed-loop simulation for evaluation in NAVSIM v2. `navtrain`, `navtest` and `navhard_two_stage` are filtered to increase the share of interesting samples in the sets.
107117

108-
For the challenge on Hugging Face, we provide the `warmup_two_stage` and `private_test_e2e` for the warm-up and challenge track, respectively. Note that `private_test_e2e` requires you to download the data, while `warmup_two_stage` is a scene filter for the `test` split.
118+
For the challenge on Hugging Face, we provide the `warmup_two_stage` and `private_test_e2e` for the warm-up and challenge track, respectively.
109119

110120
## Troubleshooting
111121

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
wget https://huggingface.co/datasets/OpenDriveLab/OpenScene/resolve/main/navsim-v2/navsim_v2.1.1_navhard_two_stage_curr_sensors.tar.gz
2+
wget https://huggingface.co/datasets/OpenDriveLab/OpenScene/resolve/main/navsim-v2/navsim_v2.1.1_navhard_two_stage_hist_sensors.tar.gz
3+
wget https://huggingface.co/datasets/OpenDriveLab/OpenScene/resolve/main/navsim-v2/navsim_v2.1.1_navhard_two_stage_scene_pickles.tar.gz
4+
5+
tar -xzvf navsim_v2.1.1_navhard_two_stage_curr_sensors.tar.gz
6+
tar -xzvf navsim_v2.1.1_navhard_two_stage_hist_sensors.tar.gz
7+
tar -xzvf navsim_v2.1.1_navhard_two_stage_scene_pickles.tar.gz
8+
rm navsim_v2.1.1_navhard_two_stage_curr_sensors.tar.gz
9+
rm navsim_v2.1.1_navhard_two_stage_hist_sensors.tar.gz
10+
rm navsim_v2.1.1_navhard_two_stage_scene_pickles.tar.gz
