This is the official repository for the paper:
"Dialogue Systems for Emotional Support via Value Reinforcement"
The framework consists of three key components:
- Target Value Detector: identifies which human values to reinforce at each turn.
- Reference Generator: generates reference utterances intended to draw out these values from the seeker.
- Supporter Model: determines the appropriate strategy and response based on the target values and the generated references.
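- Target Value Detector training command:
  bash tvd_sft.sh
- Reference Generator training commands (run in this order):
  bash rg_sft.sh
  bash rg_dpo.sh
- Supporter Model training commands (run in this order):
  bash sptr_sft.sh
  bash sptr_dpo.sh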
After training all three components (target value detector, reference generator, and supporter model), you can simulate conversations with the seeker simulator, using the seeker personas from the test dataset.
- Simulation command:
  bash simulation.sh
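The training stages and the simulation can be chained in one helper. The sketch below is not part of the repository: the function name `run_all`, the `--dry-run` flag, and the file name `run_all.sh` are hypothetical. It assumes every script is launched from the repository root and that the stages must run in the order listed in this README (each SFT script before its DPO counterpart, simulation last).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository): runs every stage
# in the order this README lists them. Assumes execution from the repo root.

# Stage order as listed above: target value detector, reference generator
# (SFT, then DPO), supporter model (SFT, then DPO), then the simulation.
STAGES=(
  tvd_sft.sh
  rg_sft.sh
  rg_dpo.sh
  sptr_sft.sh
  sptr_dpo.sh
  simulation.sh
)

# run_all [--dry-run]
# The body runs in a subshell, so `set -euo pipefail` does not leak into the
# caller's shell; the first failing stage aborts the remaining ones.
run_all() (
  set -euo pipefail
  dry_run=0
  if [[ "${1:-}" == "--dry-run" ]]; then
    dry_run=1
  fi
  for stage in "${STAGES[@]}"; do
    if (( dry_run )); then
      # Only print the command that would be executed.
      echo "bash ${stage}"
    else
      echo ">>> bash ${stage}" >&2
      bash "${stage}"
    fi
  done
)
```

After `source run_all.sh`, calling `run_all --dry-run` prints the six commands without running them, and `run_all` executes them, stopping at the first stage that exits with a non-zero status so the simulation never starts on a partially trained pipeline.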
