How to generate outputs from the PPOTrainer of chatgpt? #2906

Unanswered

huliangbing asked this question in Community | Q&A
            Replies: 2 comments 1 reply
-
Thanks for your feedback. We already support actor inference in our newly updated PR.

1 reply
-
How do I run inference with a reward model (RM) checkpoint such as rm_checkpoint.pt?

0 replies
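For the RM-inference question above, the general pattern is: a language-model backbone topped with a scalar value head, with the trained weights loaded from the checkpoint file. A minimal, self-contained sketch (this is not the project's actual API; the toy backbone, head, and sizes here are illustrative stand-ins):

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy stand-in for an RLHF reward model: an LM backbone
    plus a linear value head that emits one scalar per sequence."""
    def __init__(self, vocab_size=100, hidden_size=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.backbone = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids):
        hidden, _ = self.backbone(self.embed(input_ids))
        # Score the sequence from the last token's hidden state.
        return self.value_head(hidden[:, -1, :]).squeeze(-1)

model = RewardModel()
# With a real checkpoint you would load the trained weights, e.g.:
# model.load_state_dict(torch.load("rm_checkpoint.pt", map_location="cpu"))
model.eval()

# Stand-in for the tokenized prompt + candidate response.
input_ids = torch.randint(0, 100, (1, 16))
with torch.no_grad():
    reward = model(input_ids)
print(reward.shape)  # one scalar reward per sequence
```

The key point is that the RM is scored with a plain forward pass (no `generate` call): you tokenize the prompt concatenated with a candidate response and read off a single scalar.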
                  
                
            
  
-
How do we generate outputs from chatgpt's PPOTrainer? Can we also generate outputs from the reward_model or the initial_model?
Could you show me code like this:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "..."  # path to the trained checkpoint
prompt = "..."      # input text
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Thanks!
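For context on what the `generate` call asked about above actually does: for a causal actor it is autoregressive decoding, repeatedly picking a next token and appending it to the input. A minimal self-contained sketch of greedy decoding with a toy PyTorch model (all names here are illustrative, not the project's API):

```python
import torch
import torch.nn as nn

class ToyActor(nn.Module):
    """Toy causal LM standing in for the PPO actor."""
    def __init__(self, vocab_size=50, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids):
        h, _ = self.rnn(self.embed(input_ids))
        return self.lm_head(h)  # logits of shape [batch, seq, vocab]

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=5):
    for _ in range(max_new_tokens):
        logits = model(input_ids)[:, -1, :]           # next-token logits
        next_token = logits.argmax(-1, keepdim=True)  # greedy pick
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids

actor = ToyActor().eval()
prompt_ids = torch.randint(0, 50, (1, 4))
out = greedy_generate(actor, prompt_ids)
print(out.shape)  # prompt length 4 + 5 new tokens
```

A Hugging Face `model.generate(...)` call wraps this same loop (with beam search, sampling, and stopping criteria on top), which is why generation works the same way whether the checkpoint is the actor or the initial_model.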