I trained a GA-DDPG model using the code from the GitHub repository (https://github.com/liruiw/GA-DDPG), and it achieves an 87% success rate on the YCB objects. However, when I test the same trained model in the "handover_sim" environment, the success rate is only 6.25%, which is far below the results reported in the paper. Could there be a mismatch between the environment or settings in the GA-DDPG source code and the "handover_sim" testing environment? Are additional adjustments to the environment settings needed to improve the model's performance in "handover_sim"?
The first result below is from my own retrained GA-DDPG model, and the second result is from loading that trained model and testing it in the "handover_sim" environment.
Training command, following the settings in the GitHub repository (https://github.com/liruiw/GA-DDPG):
python -m core.train_online --save_model --config_file td3_critic_aux_policy_aux.yaml --policy DDPG --log --fix_output_time ddpg_model_233_1000000_GADDPG --seed 233
Testing command, following the GitHub settings (testing on YCB objects):
bash ./experiments/scripts/test_ycb.sh demo_model
Test Time: 08_08_2023_11:17:17 Data Root: data/scenes/data_5w.npz Model: demo_model
Script: td3_critic_aux_policy_aux.yaml Index: ycb_large.json
Num of Objs: 9 Num of Runs: 3
Policy: DDPG Model Path: output/demo_model Step: 300000
Test Episodes: 270.0 Avg. Length: 25.815 Index: scene_0-scene_164
Avg. Performance: (Return: 0.870 +- 0.02778) (Success: 0.870 +- 0.02778)
+---------------------+---------+-----------+
| object name | count | success |
|---------------------+---------+-----------|
| 003_cracker_box | 30 | 26 |
| 004_sugar_box | 30 | 23 |
| 005_tomato_soup_can | 30 | 28 |
| 006_mustard_bottle | 30 | 30 |
| 010_potted_meat_can | 30 | 23 |
| 021_bleach_cleanser | 30 | 26 |
| 024_bowl | 30 | 30 |
| 025_mug | 30 | 23 |
| 061_foam_brick | 30 | 26 |
+---------------------+---------+-----------+
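As a sanity check that the reported 0.870 matches the per-object table, here is a minimal sketch that re-tallies the counts. The dictionary is simply the table above transcribed by hand, and the variable names are my own, not anything from the GA-DDPG code.

```python
# Per-object (count, success) pairs transcribed from the table above.
# This is a hand-made tally, not output from the GA-DDPG scripts.
results = {
    "003_cracker_box": (30, 26),
    "004_sugar_box": (30, 23),
    "005_tomato_soup_can": (30, 28),
    "006_mustard_bottle": (30, 30),
    "010_potted_meat_can": (30, 23),
    "021_bleach_cleanser": (30, 26),
    "024_bowl": (30, 30),
    "025_mug": (30, 23),
    "061_foam_brick": (30, 26),
}

total = sum(count for count, _ in results.values())
succ = sum(success for _, success in results.values())
rate = succ / total

print(f"{succ}/{total} = {rate:.3f}")  # prints "235/270 = 0.870"
```

The aggregate agrees with the "Success: 0.870" line, so the YCB result is internally consistent.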
Run for "GA-DDPG hold" on the test split of s0 with:
GADDPG_DIR=GA-DDPG CUDA_VISIBLE_DEVICES=0 python examples/run_benchmark_gaddpg_hold.py \
  SIM.RENDER True \
  ENV.ID HandoverHandCameraPointStateEnv-v1 \
  BENCHMARK.SETUP s0
pybullet build time: May 20 2022 19:44:17
2023-08-07 15:30:16: Running evaluation for results/2023-08-07_13-51-23_ga-ddpg-hold_s0_test
2023-08-07 15:30:16: Evaluation results:
| success rate  | mean accum time (s)   | failure (%)                                       |
| (%)           | exec  | plan  | total | hand contact  | object drop     | timeout         |
|---------------+-------+-------+-------+---------------+-----------------+-----------------|
| 6.25 ( 9/144) | 7.390 | 0.261 | 7.651 | 0.69 ( 1/144) | 13.19 ( 19/144) | 79.86 (115/144) |
2023-08-07 15:30:16: Printing scene ids
2023-08-07 15:30:16: Success (9 scenes):
5 8 16 20 25 30 37 55 109
2023-08-07 15:30:16: Failure - hand contact (1 scenes):
11
2023-08-07 15:30:16: Failure - object drop (19 scenes):
9 12 21 23 24 27 33 36 45 54 60 66 67 68 108 117 121 135 136
2023-08-07 15:30:17: Failure - timeout (115 scenes):
0 1 2 3 4 6 7 10 13 14 15 17 18 19 22 26 28 29 31 32
34 35 38 39 40 41 42 43 44 46 47 48 49 50 51 52 53 56 57 58
59 61 62 63 64 65 69 70 71 72 73 74 75 76 77 78 79 80 81 82
83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
103 104 105 106 107 110 111 112 113 114 115 116 118 119 120 122 123 124 125 126
127 128 129 130 131 132 133 134 137 138 139 140 141 142 143
2023-08-07 15:30:17: Evaluation complete.
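To make the failure breakdown easier to read, here is a small sketch that recomputes the percentages from the scene counts in the log above. The counts are copied by hand from the evaluation output; the names `outcomes`, `shares`, and `dominant` are my own and not part of handover_sim.

```python
# Outcome counts copied from the handover_sim evaluation log above.
outcomes = {
    "success": 9,
    "hand contact": 1,
    "object drop": 19,
    "timeout": 115,
}

total = sum(outcomes.values())
shares = {name: 100.0 * n / total for name, n in outcomes.items()}

# Most frequent failure mode (everything except "success").
failures = {k: v for k, v in outcomes.items() if k != "success"}
dominant = max(failures, key=failures.get)

for name, pct in shares.items():
    print(f"{name:>12}: {pct:5.2f}% ({outcomes[name]}/{total})")
print("dominant failure:", dominant)
```

Timeouts account for about 80% of the 144 scenes, so most episodes end without the robot completing a grasp at all rather than grasping and dropping the object.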