README.md
Lines changed: 9 additions & 7 deletions
@@ -52,12 +52,14 @@
52
52
<p align="right">(<a href="#top">back to top</a>)</p>
53
53
54
54
## Changelog <a name="changelog"></a>
55
-
- **`[2025/04/13]`** NAVSIM v2.1.1 release (official devkit version for 2025 warm-up phase)
56
-
- Updated dataset for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup) with minor fixes
57
-
58
-
- ⚠️ **IMPORTANT**: To submit to the updated leaderboard, you need to re-download the synthetic dataset v2.1.1 (see [download](docs/install.md))
59
-
60
-
- **`[2025/04/08]`** NAVSIM v2.1 release
55
+
- **`[2025/04/24]`** NAVSIM v2.1.2 release
56
+
- Release of `navhard_two_stage` dataset (see [splits](docs/splits.md))
57
+
- Updated Extended Predictive Driver Model Score (EPDMS) for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup). See [metrics](docs/metrics.md) for details regarding the implementation.
58
+
- ⚠️ **IMPORTANT**: All entries to the warmup leaderboard have been removed. Please resubmit to obtain your score with the updated metric.
59
+
- The test leaderboard (coming this week) will use the same metric as this warmup leaderboard.
60
+
- **`[2025/04/13]`** NAVSIM v2.1.1 release
61
+
- Updated dataset for the warmup leaderboard with minor fixes
62
+
- **`[2025/04/08]`** NAVSIM v2.1 release
61
63
- Added new dataset for the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup) (see [submission](docs/submission.md))
62
64
- Introduced support for two-stage reactive traffic agents (see [traffic simulation](docs/metrics.md))
63
65
- **`[2025/02/28]`** NAVSIM v2.0 release
@@ -77,7 +79,7 @@
77
79
- Support for test phase frames of competition
78
80
- Download script for trainval
79
81
- Egostatus MLP Agent and training pipeline
80
-
- **`[2024/03/25]`** NAVSIM v0.3 release (official devkit version for warm-up phase)
docs/install.md
Lines changed: 11 additions & 5 deletions
@@ -26,15 +26,15 @@ cd download && ./download_maps
26
26
Next download the data splits you want to use.
27
27
Note that the dataset splits do not exactly map to the recommended standardized training / test splits.
28
28
Please refer to [splits](splits.md) for an overview on the standardized training and test splits including their size and check which dataset splits you need to download in order to be able to run them.
29
-
30
-
You can download the mini, trainval, test, private_test_e2e and warmup_synthetic_scenes dataset split with the following scripts
29
+
You can download these splits with the following scripts.
31
30
32
31
```bash
33
32
./download_mini
34
33
./download_trainval
35
34
./download_test
36
35
./download_private_test_e2e
37
36
./download_warmup_two_stage
37
+
./download_navhard_two_stage
38
38
```
39
39
40
40
Also, the script `./download_navtrain` can be used to download a small portion of the `trainval` dataset split which is needed for the `navtrain` training split.
@@ -57,16 +57,18 @@ This will download the splits into the download directory. From there, move it t
57
57
| ├── trainval
58
58
| ├── private_test_e2e
59
59
| └── mini
60
+
└── navhard_two_stage
61
+
| ├── openscene_meta_datas
62
+
| ├── sensor_blobs
63
+
| ├── synthetic_scene_pickles
64
+
| └── synthetic_scenes_attributes.csv
60
65
└── warmup_two_stage
61
66
├── openscene_meta_datas
62
67
├── sensor_blobs
63
68
├── synthetic_scene_pickles
64
69
└── synthetic_scenes_attributes.csv
65
70
66
71
```
67
-
68
-
⚠️ **IMPORTANT:** If you have already downloaded the data for Navsim V2.0.1 and tried the Hugging Face Leaderboard, please replace the old `"synthetic_scenes"` folder with the new `"warmup_two_stage"` folder. In Navsim V2.1, the traffic agents' policy has been updated, and the old data is no longer compatible.
69
-
70
72
Set the required environment variables, by adding the following to your `~/.bashrc` file
71
73
Based on the structure above, the environment variables need to be defined as:
⏰ **Note:** The `navhard_two_stage` split is used for local testing of your model's performance in a two-stage pseudo closed-loop setup.
84
+
In contrast, `warmup_two_stage` is a smaller dataset designed for validating and testing submissions to the [Hugging Face Warmup leaderboard](https://huggingface.co/spaces/AGC2025/e2e-driving-warmup).
85
+
In other words, the results you obtain locally on `warmup_two_stage` should match the results you see after submitting to Hugging Face.
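The directory layout above can be checked before running an evaluation. The following is a minimal sketch, not part of the devkit: the helper name `missing_entries` is hypothetical, and the only thing taken from the documentation is the list of entries expected inside `navhard_two_stage` and `warmup_two_stage`.

```python
from pathlib import Path

# Entries that the layout above lists inside each two-stage split folder.
EXPECTED_ENTRIES = (
    "openscene_meta_datas",
    "sensor_blobs",
    "synthetic_scene_pickles",
    "synthetic_scenes_attributes.csv",
)


def missing_entries(split_dir):
    """Return the expected entries that are absent from split_dir.

    Hypothetical helper: split_dir is the path to a folder such as
    navhard_two_stage or warmup_two_stage on your machine.
    """
    split_dir = Path(split_dir)
    return [name for name in EXPECTED_ENTRIES if not (split_dir / name).exists()]
```

An empty list means the folder contains everything the layout shows. Pass the location where you moved the downloaded split; the parent path depends on your own setup.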
docs/metrics.md
Lines changed: 7 additions & 7 deletions
@@ -7,7 +7,7 @@ In NAVSIM v2, the EPDMS extends the PDMS, introducing:
7
7
- 2 new multiplier metrics (DDC and TLC)
8
8
- False-positive penalty filtering
9
9
10
-
The Lane Keeping subscore (LK) penalizes driving too far from the centerline for an extended time. It is disabled on intersections where the centerline annotations often don't match the actual lane markings perceived by the sensors. Besides, NAVSIM v2 puts stronger emphasis on comfortable driving. The existing Comfort (C) subscore was slightly improved to also evaluate how the planned trajectory matches the vehicle's motion history. Moreover, the new extended comfort (EC) subscore compares the trajectory outputs of subsequent frames and their resulting dynamic states. A discrepancy in acceleration, jerk etc. between subsequent frames results in uncomfortable behavior and thus a lower score.
10
+
The Lane Keeping subscore (LK) penalizes driving too far from the centerline for an extended time. It is disabled on intersections where the centerline annotations often don't match the actual lane markings perceived by the sensors. Besides, NAVSIM v2 puts stronger emphasis on comfortable driving. The existing Comfort (C) subscore from NAVSIM v1 was slightly improved to also evaluate how the planned trajectory matches the vehicle's motion history, giving History Comfort (HC). Moreover, the new extended comfort (EC) subscore compares the trajectory outputs of subsequent frames and their resulting dynamic states. A discrepancy in acceleration, jerk etc. between subsequent frames results in uncomfortable behavior and thus a lower score.
11
11
12
12
The new Driving Direction Compliance (DDC) and Traffic Light Compliance (TLC) subscores extend the inadmissible behaviors detected and penalized in evaluation. Further, to reduce false positive penalties, we disable penalties when the human agent is also responsible for a violation. This ensures that the planner is not unfairly penalized in situations where breaking a rule is necessary to achieve a valid driving goal. For example, the agent may need to briefly enter the oncoming lane to overtake a static obstacle.
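The false-positive filtering described above can be illustrated with a small sketch. This is not the devkit implementation: the function name is hypothetical, and treating any human subscore below 1.0 as a violation is an assumption made only for illustration.

```python
def filter_false_positive(planner_subscore, human_subscore):
    """Drop the planner's penalty when the human agent also violated the rule.

    Assumption: subscores lie in [0, 1], and a human subscore below 1.0
    means the human agent committed the same kind of violation.
    """
    human_violated = human_subscore < 1.0
    # If the human also broke the rule, the planner receives full credit.
    return 1.0 if human_violated else planner_subscore
```

With this rule, a planner that enters the oncoming lane where the human driver did the same keeps a subscore of 1.0, while the same maneuver is penalized in a scene where the human stayed compliant.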
For reference, the PDMS used in NAVSIM v1, which used a slightly different version of HC called Comfort (C), was defined as:
45
45
46
46
<br>
47
47
@@ -68,8 +68,8 @@ The new NAVSIM v2 evaluation uses a two-stage aggregation process to approximate
68
68
69
69
3. **Weighting and Aggregation:**
70
70
- To emulate the effects of closed-loop simulation, the relevance of each follow-up scene to the overall score depends on how close its starting position is to where the submitted planner actually ended in the first stage.
71
-
- We assign higher weights to follow-up scenes that start closer to the submitted planner's end position.
72
-
- We first compute a weighted aggregation of all second-stage scores. Finally, we aggregate the scores of the first and second stage to produce the aggregated metric.
71
+
- We assign higher weights to follow-up scenes that start closer to the submitted planner's end position, using a Gaussian kernel.
72
+
- We first compute a weighted aggregation of all second-stage scores. Finally, we aggregate the scores of the first and second stage via a simple multiplication to produce the aggregated metric.
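The weighting and aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the devkit code: the function names, the kernel width `sigma`, the normalization of the weights, and the use of a scalar distance between start and end positions are all hypothetical. Only the Gaussian kernel, the weighted second-stage aggregation, and the final multiplication come from the description above.

```python
import math


def gaussian_weights(distances, sigma=1.0):
    """Normalized Gaussian-kernel weights: a smaller distance gives a larger weight.

    Assumption: distances are non-negative scalars measuring how far each
    follow-up scene starts from the planner's first-stage end position.
    """
    raw = [math.exp(-(d ** 2) / (2.0 * sigma ** 2)) for d in distances]
    total = sum(raw)
    if total == 0.0:
        # All kernel values underflowed; fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def two_stage_score(first_stage_score, second_stage_scores, distances, sigma=1.0):
    """Weighted mean of the second-stage scores, multiplied by the first-stage score."""
    weights = gaussian_weights(distances, sigma)
    second_stage = sum(w * s for w, s in zip(weights, second_stage_scores))
    return first_stage_score * second_stage
```

Because the two stages are multiplied, a low score in either stage pulls the aggregated metric down, and follow-up scenes that start far from the planner's end position contribute little to the second-stage term.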
73
73
74
74
# Run an evaluation
75
75
To evaluate the PDM score for an agent you can run:
docs/splits.md
Lines changed: 19 additions & 9 deletions
@@ -1,15 +1,15 @@
1
1
# Dataset splits vs. filtered training / test splits
2
2
3
3
The NAVSIM framework utilizes several dataset splits for standardized training and evaluating agents.
4
-
All of them use the OpenScene dataset that is divided into the dataset splits `mini`, `trainval`, `test`, `warmup_two_stage`, `private_test_e2e`, which can all be downloaded separately.
4
+
All of them use the OpenScene dataset that is divided into the dataset splits `mini`, `trainval`, `test`, which can all be downloaded separately.
5
5
6
-
It is possible to run trainings and evaluations directly on these sets (see `Standard` in table below).
6
+
It is possible to run trainings and evaluations directly on these sets (see `OpenScene` in table below).
7
7
Alternatively, you can run trainings and evaluations on training and validation splits that were filtered for challenging scenarios (see `NAVSIM` in table below), which is the recommended option for producing comparable and competitive results efficiently.
8
8
In contrast to the dataset splits which refer to a downloadable set of logs, the training / test splits are implemented as scene filters, which define how scenes are extracted from these logs.
9
9
10
10
The NAVSIM training / test splits subsample the OpenScene dataset splits.
11
11
Moreover, the NAVSIM splits include overlapping scenes, while the Standard splits are non-overlapping.
12
-
Specifically, `navtrain` is based on the `trainval` data and `navtest` on the `test` data.
12
+
Specifically, `navtrain` is based on the `trainval` data and `navtest` and `navhard_two_stage` on the `test` data.
13
13
14
14
As the `trainval` sensor data is very large, we provide a separate download link, which loads only the frames needed for `navtrain`.
15
15
This eases access for users that only want to run the `navtrain` split and not the `trainval` split. If you already downloaded the full `trainval` sensor data, it is **not necessary** to download the `navtrain` frames as well.
@@ -18,7 +18,6 @@ The logs are always the complete dataset split.
18
18
## Overview
19
19
20
20
The table below offers an overview of the training and test splits supported by NAVSIM.
21
-
In Navsim-v1.1, the training/test split can be set with a single config parameter given in the table.
22
21
23
22
<table border="0">
24
23
<tr>
@@ -30,7 +29,7 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
30
29
<th>Config parameters</th>
31
30
</tr>
32
31
<tr>
33
-
<td rowspan="3">Standard</td>
32
+
<td rowspan="3">OpenScene</td>
34
33
<td>trainval</td>
35
34
<td>Large split for training and validating agents with regular driving recordings. Corresponds to nuPlan, downsampled to 2 Hz.</td>
36
35
<td>14GB</td>
@@ -58,7 +57,7 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
58
57
</td>
59
58
</tr>
60
59
<tr>
61
-
<td rowspan="2">NAVSIM</td>
60
+
<td rowspan="3">NAVSIM</td>
62
61
<td>navtrain</td>
63
62
<td>Standard split for training agents in NAVSIM with non-trivial driving scenes. Sensors available separately in <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh">download_navtrain.sh</a>.</td>
64
63
<td>-</td>
@@ -76,10 +75,19 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
76
75
train_test_split=navtest
77
76
</td>
78
77
</tr>
78
+
<tr>
79
+
<td>navhard_two_stage</td>
80
+
<td>Standard split for testing agents in NAVSIM v2 with real and synthetic driving scenes. Synthetic frames downloadable via <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_navhard_two_stage.sh">download_navhard_two_stage.sh</a>.</td>
81
+
<td>892M</td>
82
+
<td>31G</td>
83
+
<td>
84
+
train_test_split=navhard_two_stage
85
+
</td>
86
+
</tr>
79
87
<tr>
80
88
<td rowspan="2">Competition</td>
81
89
<td>warmup_two_stage</td>
82
-
<td>Warmup test split to validate submission on hugging face. Available as a filter for test split.</td>
90
+
<td>Warmup test split to validate submissions on Hugging Face. Synthetic frames downloadable via <a href="https://github.com/autonomousvision/navsim/blob/main/download/download_warmup_two_stage.sh">download_warmup_two_stage.sh</a>.</td>
83
91
<td>27M</td>
84
92
<td>1.2G</td>
85
93
<td>
@@ -103,9 +111,11 @@ In Navsim-v1.1, the training/test split can bet set with a single config paramet
103
111
104
112
The standard splits `trainval`, `test`, and `mini` are from the OpenScene dataset. Note that the data corresponds to the nuPlan dataset with a lower frequency of 2Hz. You can download all standard splits from Hugging Face with the bash scripts in [download](../download).
105
113
106
-
NAVSIM provides a subset and filter of the `trainval` split, called `navtrain`. The `navtrain` split facilitates a standardized training scheme and requires significantly less sensor data storage than `trainval` (445GB vs. 2100GB). If your agents don't need historical sensor inputs, you can download `navtrain` without history, which requires 300GB of storage. Note that `navtrain` can be downloaded separately via [download_navtrain.sh](https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh) but still requires access to the `trainval` logs. Similarly, the `navtest` split enables a standardized set for testing agents with a provided scene filter. Both `navtrain` and `navtest` are filtered to increase interesting samples in the sets.
114
+
NAVSIM provides a subset and filter of the `trainval` split, called `navtrain`. The `navtrain` split facilitates a standardized training scheme and requires significantly less sensor data storage than `trainval` (445GB vs. 2100GB). If your agents don't need historical sensor inputs, you can download `navtrain` without history, which requires 300GB of storage. Note that the sensor data for `navtrain` can be downloaded separately via [download_navtrain.sh](https://github.com/autonomousvision/navsim/blob/main/download/download_navtrain.sh), but it still requires access to the `trainval` logs.
115
+
116
+
The `navtest` split enables a standardized set for testing agents in NAVSIM v1 with a provided scene filter. Similarly, the `navhard_two_stage` split facilitates pseudo closed-loop simulation for evaluation in NAVSIM v2. `navtrain`, `navtest` and `navhard_two_stage` are filtered to increase interesting samples in the sets.
107
117
108
-
For the challenge on Hugging Face, we provide the `warmup_two_stage` and `private_test_e2e` for the warm-up and challenge track, respectively. Note that `private_test_e2e` requires you to download the data, while `warmup_two_stage` is a scene filter for the `test` split.
118
+
For the challenge on Hugging Face, we provide the `warmup_two_stage` and `private_test_e2e` for the warm-up and challenge track, respectively.
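The relationship between the filtered NAVSIM splits and the dataset splits they are built on can be written down as a small lookup. This is a sketch for orientation only: the dictionary and the helper `split_override` are hypothetical and not part of the devkit. The mapping and the `train_test_split=<name>` form are taken from the text and table above.

```python
# Filtered NAVSIM split -> dataset split whose logs it is based on (per the text above).
BASE_DATASET_SPLIT = {
    "navtrain": "trainval",
    "navtest": "test",
    "navhard_two_stage": "test",
}


def split_override(name):
    """Build the config override string for a filtered split (hypothetical helper)."""
    if name not in BASE_DATASET_SPLIT:
        raise KeyError(f"unknown filtered split: {name}")
    return f"train_test_split={name}"
```

For example, `split_override("navhard_two_stage")` yields the config parameter listed in the table, and `BASE_DATASET_SPLIT["navhard_two_stage"]` shows that the `test` logs need to be downloaded for it.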