How to improve the GA-DDPG model performance on the "handover_sim" testing environment

A GA-DDPG model trained with the code from the GitHub repository (https://github.com/liruiw/GA-DDPG) achieves an 87% success rate on the YCB objects. However, when I test the same trained model in the "handover_sim" environment, the success rate is only 6.25%, far below the results reported in the paper. Could there be a mismatch between the environment or settings in the GA-DDPG source code and the "handover_sim" testing environment? Are additional adjustments to the environment settings needed to improve the model's performance in "handover_sim"?

The first result below is from testing my own retrained GA-DDPG model on the YCB objects, and the second is from loading the trained model and testing it in the "handover_sim" environment.

Training command, following the settings in the GitHub repository (https://github.com/liruiw/GA-DDPG):
python -m core.train_online --save_model --config_file td3_critic_aux_policy_aux.yaml --policy DDPG --log --fix_output_time ddpg_model_233_1000000_GADDPG --seed 233

Testing command, following the GitHub settings (testing on YCB objects):
bash ./experiments/scripts/test_ycb.sh demo_model

------------------------------------------------------------------
Test Time: 08_08_2023_11:17:17 Data Root: data/scenes/data_5w.npz Model: demo_model
Script: td3_critic_aux_policy_aux.yaml Index: ycb_large.json
Num of Objs: 9 Num of Runs: 3 
Policy: DDPG Model Path: output/demo_model Step: 300000
Test Episodes: 270.0 Avg. Length: 25.815 Index: scene_0-scene_164 
Avg. Performance: (Return: 0.870 +- 0.02778) (Success: 0.870 +- 0.02778)
+---------------------+---------+-----------+
| object name         |   count |   success |
|---------------------+---------+-----------|
| 003_cracker_box     |      30 |        26 |
| 004_sugar_box       |      30 |        23 |
| 005_tomato_soup_can |      30 |        28 |
| 006_mustard_bottle  |      30 |        30 |
| 010_potted_meat_can |      30 |        23 |
| 021_bleach_cleanser |      30 |        26 |
| 024_bowl            |      30 |        30 |
| 025_mug             |      30 |        23 |
| 061_foam_brick      |      30 |        26 |
+---------------------+---------+-----------+
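
As a sanity check on the table above, here is a minimal sketch that recomputes the per-object success rates and the aggregate. The counts are copied by hand from the table; the variable names are my own and are not part of the GA-DDPG code.

```python
# (count, success) per object, copied by hand from the table above.
ycb_results = {
    "003_cracker_box": (30, 26),
    "004_sugar_box": (30, 23),
    "005_tomato_soup_can": (30, 28),
    "006_mustard_bottle": (30, 30),
    "010_potted_meat_can": (30, 23),
    "021_bleach_cleanser": (30, 26),
    "024_bowl": (30, 30),
    "025_mug": (30, 23),
    "061_foam_brick": (30, 26),
}

total_runs = sum(count for count, _ in ycb_results.values())
total_success = sum(success for _, success in ycb_results.values())
overall_rate = total_success / total_runs

# List objects from the lowest to the highest success rate.
by_rate = sorted(ycb_results.items(), key=lambda kv: kv[1][1] / kv[1][0])
for name, (count, success) in by_rate:
    print(f"{name:<22} {success}/{count} = {success / count:.3f}")

print(f"overall: {total_success}/{total_runs} = {overall_rate:.3f}")
```

The aggregate comes out to 235/270, which agrees with the reported success of 0.870.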


The "GA-DDPG hold" benchmark on the test split of s0 was run with:

GADDPG_DIR=GA-DDPG CUDA_VISIBLE_DEVICES=0 python examples/run_benchmark_gaddpg_hold.py \
  SIM.RENDER True \
  ENV.ID HandoverHandCameraPointStateEnv-v1 \
  BENCHMARK.SETUP s0

pybullet build time: May 20 2022 19:44:17
2023-08-07 15:30:16: Running evaluation for results/2023-08-07_13-51-23_ga-ddpg-hold_s0_test
2023-08-07 15:30:16: Evaluation results: 
|  success rate  |    mean accum time (s)    |                    failure (%)                     |
|      (%)       |  exec  |  plan  |  total  |  hand contact  |   object drop   |     timeout     |
|:--------------:|:------:|:------:|:-------:|:--------------:|:---------------:|:---------------:|
| 6.25 (  9/144) | 7.390  | 0.261  |  7.651  | 0.69 (  1/144) | 13.19 ( 19/144) | 79.86 (115/144) |
2023-08-07 15:30:16: Printing scene ids
2023-08-07 15:30:16: Success (9 scenes): 
---  ---  ---  ---  ---  ---  ---  ---  ---
  5    8   16   20   25   30   37   55  109
---  ---  ---  ---  ---  ---  ---  ---  ---
2023-08-07 15:30:16: Failure - hand contact (1 scenes): 
---
 11
---
2023-08-07 15:30:16: Failure - object drop (19 scenes): 
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
  9   12   21   23   24   27   33   36   45   54   60   66   67   68  108  117  121  135  136
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
2023-08-07 15:30:17: Failure - timeout (115 scenes): 
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
  0    1    2    3    4    6    7   10   13   14   15   17   18   19   22   26   28   29   31   32
 34   35   38   39   40   41   42   43   44   46   47   48   49   50   51   52   53   56   57   58
 59   61   62   63   64   65   69   70   71   72   73   74   75   76   77   78   79   80   81   82
 83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99  100  101  102
103  104  105  106  107  110  111  112  113  114  115  116  118  119  120  122  123  124  125  126
127  128  129  130  131  132  133  134  137  138  139  140  141  142  143
---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
2023-08-07 15:30:17: Evaluation complete.
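
To make the failure breakdown easier to read, here is a small sketch that recomputes the percentages from the scene counts in the log above. The counts are copied by hand from the evaluation table; the dictionary and variable names are my own and are not part of handover_sim.

```python
# Scene counts copied by hand from the evaluation table above.
outcomes = {
    "success": 9,
    "hand contact": 1,
    "object drop": 19,
    "timeout": 115,
}

total_scenes = sum(outcomes.values())
percentages = {name: 100.0 * n / total_scenes for name, n in outcomes.items()}

# Most frequent failure mode, ignoring the successful scenes.
failure_modes = [name for name in outcomes if name != "success"]
dominant_failure = max(failure_modes, key=outcomes.get)

for name, n in outcomes.items():
    print(f"{name:<13} {n:>3}/{total_scenes} = {percentages[name]:.2f}%")
print(f"dominant failure mode: {dominant_failure}")
```

The counts add up to the 144 test scenes, and timeouts make up roughly 80% of them, so most episodes end at the time limit rather than through hand contact or a dropped object.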
