Self-play (SP) is a Reinforcement Learning (RL) method in which an agent learns by playing against itself until its policy and value functions converge. SP-based methods have achieved state-of-the-art results in games such as Chess, Go and Othello. In this paper, we show how the RNA sequence design problem, in which a sequence is designed to fold into a given target structure, can be modelled as self-play, with state values evaluated by a deep value network. Our model, dubbed RNASP, achieved the best or highly competitive results on benchmark RNA design datasets. This work also motivates applying self-play to other Computational Biology problems.
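The design problem above starts from a target secondary structure, commonly written in dot-bracket notation, where `(` and `)` mark paired bases and `.` marks unpaired ones. The Python below is a minimal illustrative sketch, not RNASP's code: it checks only a necessary condition, namely that every paired position in a candidate sequence holds a canonical Watson-Crick or G-U wobble pair. A real evaluation would also fold the candidate with a structure-prediction tool (for example ViennaRNA's RNAfold) and compare the predicted structure to the target. The names `parse_dot_bracket` and `is_pair_compatible` are hypothetical.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> dict[int, int]:
    """Hypothetical helper: map each paired index to its partner index."""
    stack = []  # open-bracket positions waiting for a partner
    pairs = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_pair_compatible(sequence: str, structure: str) -> bool:
    """Hypothetical check: every base pair in the target is a canonical pair.

    Necessary but not sufficient for the sequence to fold into the target.
    """
    if len(sequence) != len(structure):
        return False
    pairs = parse_dot_bracket(structure)
    # Visit each pair once (i < j) and test its bases.
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in pairs.items()
        if i < j
    )
```

For example, `is_pair_compatible("GGGAAACCC", "(((...)))")` returns `True` because every paired position holds a G-C pair, while `"GGGAAAGGG"` fails on its G-G pairs. A design agent could use a check like this to restrict which bases it places at paired positions, leaving the harder question of whether the sequence actually folds correctly to the folding step and the learned value estimate.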
Install RNA using this link