Release Discrete SAC benchmark update · kengz/SLM-Lab

Discrete SAC benchmark update

Upload PR #429
Dropbox data


Env. \ Alg.	DQN	DDQN+PER	A2C (GAE)	A2C (n-step)	PPO	SAC
Breakout graph	80.88	182	377	398	443	3.51*
Pong graph	18.48	20.5	19.31	19.56	20.58	19.87*
Seaquest graph	1185	4405	1070	1684	1715	171*
Qbert graph	5494	11426	12405	13590	13460	923*
LunarLander graph	192	233	25.21	68.23	214	276
UnityHallway graph	-0.32	0.27	0.08	-0.96	0.73	0.01
UnityPushBlock graph	4.88	4.93	4.68	4.93	4.97	-0.70

Episode scores at the end of training attained by SLM Lab implementations on discrete-action control problems. Each reported score is the average over the last 100 checkpoints of a Session, then averaged over 4 Sessions. A Random baseline with score averaged over 100 episodes is included. Results marked with * were trained using the hybrid synchronous/asynchronous version of SAC, which parallelizes training to reduce training time. For SAC, Breakout, Pong and Seaquest were trained for 2M frames instead of 10M frames.
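
The two-stage averaging described above (last 100 checkpoints, then 4 Sessions) can be illustrated with a minimal Python sketch. This is not SLM Lab's actual code: the function name `final_score`, the `last_n` parameter, and the toy data are assumptions made for illustration only.

```python
from statistics import mean

def final_score(session_scores, last_n=100):
    """Hypothetical helper: average the last `last_n` checkpoint scores
    within each session, then average those per-session means."""
    # Slicing with [-last_n:] keeps the whole list if it is shorter than last_n.
    per_session = [mean(scores[-last_n:]) for scores in session_scores]
    return mean(per_session)

# Toy data (made up): 4 sessions, each with 150 checkpoint scores.
toy_sessions = [[float(s + i) for i in range(150)] for s in range(4)]
print(final_score(toy_sessions))
```

Averaging within each session first gives every session equal weight in the final number, even if sessions logged different numbers of checkpoints.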

For the full Atari benchmark, see Atari Benchmark
