Describe the bug
In the current master version, the script ./replay/gather_results.py is used to gather experiment results and produce a CSV table. Recently, when I tried to reproduce the results of your paper, Decoupling feature extraction from policy learning ..., I found that my reward results are much better than yours (especially in Table 3 on page 7). That table compares the mean reward performance of RL (using PPO) in the robotic arm environment with a random target (aka KukaButton with random target). I dug into the code and found the bug in robotics-rl-srl/replay/gather_results.py, lines 136 to 140 (commit 1ab1bd3). @kalifou has already confirmed this problem.
Explanation
It's a rounding problem. Line 138 actually performs "floor" (integer) division rather than float division (at least for KukaButton, and maybe for the other environments too), since run_acc is an array of dtype int64.
Code example
The problem actually comes from numpy. The following code reproduces this phenomenon:
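A minimal sketch of the behaviour, assuming an int64 array like run_acc (the values here are made up; only the dtype matters):

```python
import numpy as np

# Stand-in for run_acc: episode rewards accumulated as integers
A = np.array([100, 250, 999], dtype=np.int64)

# Rebinding the name works as expected: "/" returns a new float64 array
B = A / 10
print(B.dtype, B)   # float64 [ 10.   25.   99.9]

# In-place assignment does not: the float result is cast back into
# the existing int64 buffer, so the division is silently truncated
A[:] = A[:] / 10
print(A.dtype, A)   # int64 [10 25 99]
```

One way to avoid the truncation in such code would be to cast the array to float before dividing (e.g. with astype(np.float64)), though the solution proposed below replaces the script entirely.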
One funny thing is that A = A / 10 works (true float division, not a "floor"), but A[:] = A[:] / 10 does not, because the float result is cast back into the existing int64 array.
Solution
Use my code instead: ./replay/postprocessing_logs.py (temporary name). It can directly produce the LaTeX table and handles most situations (heterogeneous data: different numbers of experiments, different lengths, scalable "checkpoints" (timesteps), different SRL models).
The following is a demo of my code:
- when there is only one experiment: no confidence interval is shown
- when there are several experiments: a 95% confidence interval is estimated (see the sketch after this list)
- when the RL training of an SRL model was stopped accidentally, a "-" is put in the table
- no need to specify the SRL models: the folders are searched automatically
- the "checkpoints" [1e6, 2e6, 3e6, 4e6, 5e6] can be changed by the user (use M for million, K for thousand)
- the result is saved to a .tex file (LaTeX table)
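For reference, here is a minimal sketch of how the 95% confidence interval mentioned above could be estimated with the usual Student-t formula (the function name and reward values are made up; this is not the actual postprocessing_logs.py):

```python
import numpy as np
from scipy import stats

def mean_and_ci(rewards, confidence=0.95):
    """Mean reward and the half-width of a Student-t confidence interval.

    `rewards` holds one value per experiment (run) at a given checkpoint;
    with a single run no interval can be estimated, so None is returned.
    """
    rewards = np.asarray(rewards, dtype=np.float64)  # avoid the int64 trap
    mean = rewards.mean()
    if rewards.size < 2:
        return mean, None
    sem = stats.sem(rewards)  # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2, df=rewards.size - 1)
    return mean, half_width

# e.g. three runs of one SRL model at the 5M-timestep checkpoint
print(mean_and_ci([212.0, 198.5, 205.3]))
```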
Question
Are there similar problems elsewhere in the toolbox?