Language models have demonstrated remarkable versatility across a wide range of fields, and their extensive knowledge and reasoning capabilities show promise for enhancing recommendation tasks. Fine-tuning language models for downstream tasks can further boost their effectiveness; however, applying this to large-scale recommendation data is often prohibitively time-consuming due to the volume of user-item interactions and the complexity of language models. This repo therefore provides an efficient fine-tuning method that quickly enhances the pre-trained language model's (PLM's) compatibility with recommendation data while leveraging texts to improve CTR prediction performance.
For more details, please refer to our paper.
Clone this repo and set `DATA_MOUNT_DIR=[DOWNLOAD_PATH]/data` in your environment.
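A minimal setup sketch, where the repo URL, its directory name, and the download path are placeholders you fill in:

```bash
# Minimal setup sketch: [REPO_URL], [REPO_DIR], and [DOWNLOAD_PATH] are placeholders.
git clone [REPO_URL]
cd [REPO_DIR]
export DATA_MOUNT_DIR=[DOWNLOAD_PATH]/data
```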
Download the Amazon Sports dataset from here and then process the data:

```bash
python build_dataset.py amazon-sports
```
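As a rough sketch, the downloaded files can be placed under `$DATA_MOUNT_DIR` before running the build script; the sub-directory and raw file names expected by `build_dataset.py` are assumptions:

```bash
# Sketch only: the sub-directory and raw file names expected by
# build_dataset.py are assumptions, not confirmed by this README.
mkdir -p "$DATA_MOUNT_DIR/amazon-sports"
cp [DOWNLOADED_SPORTS_FILES] "$DATA_MOUNT_DIR/amazon-sports/"
python build_dataset.py amazon-sports
```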
## ID-Based Model

```bash
python run_ctr.py amazon-sports
```
## Pre-training LM

Set `model_name_or_path` in `config/mlm.yaml`, then run:

```bash
python script/run_mlm.py amazon-sports
```
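For example, the config entry might look roughly like the comment below; the checkpoint value is an assumption and could be any Hugging Face model id or local path:

```bash
# Sketch: in config/mlm.yaml, point model_name_or_path at the PLM to pre-train
# (the value below is an assumption, e.g. a Hugging Face model id or a local path):
#   model_name_or_path: bert-base-uncased
python script/run_mlm.py amazon-sports
```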
## Fine-tuning LM

Set `ctr_model/pretrained_dir` in `config/align.yaml`, then run:

```bash
python script/run_align.py amazon-sports [PRE_TRAINED_LM_PATH]
```
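A rough sketch, assuming `pretrained_dir` points at the checkpoint saved by the ID-Based Model step and `[PRE_TRAINED_LM_PATH]` is the output of the pre-training step:

```bash
# Sketch: in config/align.yaml, ctr_model/pretrained_dir is assumed to point at
# the checkpoint saved by the ID-Based Model step:
#   ctr_model:
#     pretrained_dir: [CTR_CHECKPOINT_DIR]
python script/run_align.py amazon-sports [PRE_TRAINED_LM_PATH]
```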
## Training Recommendation Backbone
```bash
python script/run_cotrain.py amazon-sports [OUTPUT_PATH] [FT_LM_PATH] [TOKENIZER_PATH]
```
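How the positional arguments map to the earlier steps is sketched below as an assumption; all paths are placeholders:

```bash
# Hypothetical mapping of the positional arguments (all paths are placeholders):
#   [OUTPUT_PATH]    - where co-training results are written
#   [FT_LM_PATH]     - the fine-tuned LM from the previous step (assumption)
#   [TOKENIZER_PATH] - the tokenizer matching that LM (assumption)
python script/run_cotrain.py amazon-sports [OUTPUT_PATH] [FT_LM_PATH] [TOKENIZER_PATH]
```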
If you find this project useful in your research, please cite our paper:

```bibtex
@article{wang2024cela,
  title={CELA: Cost-Efficient Language Model Alignment for CTR Prediction},
  author={Wang, Xingmei and Liu, Weiwen and Chen, Xiaolong and Liu, Qi and Huang, Xu and Wang, Yichao and Li, Xiangyang and Wang, Yasheng and Dong, Zhenhua and Lian, Defu and Tang, Ruiming},
  journal={arXiv preprint arXiv:2405.10596},
  year={2024}
}
```