I noticed that "expand_node" sets the policy to noise, but "update_policy", which is called during inference (in "process_mini_batch"), overwrites the policy directly with the network output. As a result the noise is discarded, and there is no randomness at all except in self-play games.
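To make the concern concrete, here is a minimal sketch of the behaviour as I understand it. The `Node` class, the function bodies, and the `epsilon` parameter are all my own assumptions for illustration; only the names "expand_node" and "update_policy" come from the code in question, and the real implementations may differ.

```python
import random


class Node:
    """Hypothetical stand-in for a search-tree node (not the project's class)."""

    def __init__(self):
        self.policy = []


def expand_node(node, num_actions, rng):
    # Assumed behaviour: fill the prior with random noise, normalised to sum to 1.
    raw = [rng.random() for _ in range(num_actions)]
    total = sum(raw)
    node.policy = [x / total for x in raw]


def update_policy(node, network_policy):
    # Assumed behaviour: plain overwrite, so the noise stored above is lost.
    node.policy = list(network_policy)


def update_policy_mixed(node, network_policy, epsilon=0.25):
    # Hypothetical alternative: blend the network output with the stored noise.
    # epsilon is an illustrative weight, not a value taken from the project.
    node.policy = [
        (1 - epsilon) * p + epsilon * n
        for p, n in zip(network_policy, node.policy)
    ]


rng = random.Random(0)
net_out = [0.7, 0.2, 0.1]

node = Node()
expand_node(node, 3, rng)
update_policy(node, net_out)
print(node.policy)  # identical to net_out: the noise is gone

node_mixed = Node()
expand_node(node_mixed, 3, rng)
update_policy_mixed(node_mixed, net_out)
print(node_mixed.policy)  # still a distribution, but perturbed by the noise
```

If the overwrite is intentional, the noise set in "expand_node" would only matter until the network result arrives. If it is not, something like the blended update might preserve it, though I am not sure that is what the authors intended.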