You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
To set policy_stable variable, provided code checks whether the policy is changed. If there are multiple optimal policies, the policy may change infinitely even though optimal policy is already found.
See Exercise 4.4 of the 2018 edition in Sutton & Barto book, it explicitly points out this bug in the pseudocode.
The text was updated successfully, but these errors were encountered:
To set
policy_stable
variable, provided code checks whether the policy is changed. If there are multiple optimal policies, the policy may change infinitely even though optimal policy is already found.See Exercise 4.4 of the 2018 edition in Sutton & Barto book, it explicitly points out this bug in the pseudocode.
The text was updated successfully, but these errors were encountered: