
GRE: General Recommendation-Oriented Text Embedding

Introduction

Text embedding models have demonstrated immense potential in semantic understanding, which recommendation systems can leverage to discern subtle differences between items and thereby improve performance. Yet while general-purpose text embedding models have achieved broad success, there is currently no embedding model designed specifically for recommendation that excels across diverse recommendation scenarios, rather than being developed for a single downstream task or dataset. To bridge this gap, this repo introduces the General Recommendation-Oriented Text Embedding (GRE) and a comprehensive benchmark, GRE-B. Figure 1 illustrates the overview of our work.

[Figure 1: Overview of GRE and GRE-B]

GRE

We pre-trained GRE on a wide array of data curated from various recommendation domains, covering e-commerce, catering, fashion, books, games, videos, and more. To ensure data quality and balance, we employed a coreset selection method, keeping high-quality texts while maintaining balance across domains. GRE is then further refined by extracting high-quality item pairs from collaborative signals and injecting those signals directly into the model through contrastive learning.
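
The contrastive step can be pictured as follows. This is a minimal sketch, assuming an in-batch InfoNCE objective over co-interacted item pairs; the actual loss, temperature, and pair-mining details may differ from the paper's.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """In-batch InfoNCE: each anchor's positive is its co-interacted item;
    every other item in the batch acts as a negative."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temperature      # (B, B) cosine similarities
    labels = torch.arange(a.size(0))    # diagonal entries are the matching pairs
    return F.cross_entropy(logits, labels)

# Random stand-ins for embeddings of item pairs mined from collaborative signals.
anchor, positive = torch.randn(32, 768), torch.randn(32, 768)
print(info_nce(anchor, positive))
```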

After fine-tuning, you can use the model to generate textual item embeddings for recommendation tasks.

The trained GRE models can be downloaded here: small, base, and large.
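
As a minimal sketch of generating item embeddings, assuming the released checkpoints load as standard Hugging Face encoders (the model path and mean-pooling scheme below are assumptions, not confirmed by this repo):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical local path; point this at whichever GRE checkpoint you downloaded.
MODEL_PATH = "path/to/gre-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModel.from_pretrained(MODEL_PATH).eval()

item_texts = [
    "Wireless noise-cancelling headphones with 30-hour battery life",
    "Cozy Italian restaurant serving handmade pasta",
]

with torch.no_grad():
    batch = tokenizer(item_texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    # Mean-pool over non-padding tokens (the pooling scheme is an assumption).
    mask = batch["attention_mask"].unsqueeze(-1).float()
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    embeddings = torch.nn.functional.normalize(embeddings, dim=-1)

print(embeddings.shape)  # (num_items, hidden_size)
```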

GRE-B

To comprehensively assess our general recommendation-oriented embedding, we have established a benchmark on diverse recommendation datasets that are distinct from the training data, guaranteeing a fair evaluation. The benchmark comprises 26 datasets in total, categorized into six recommendation scenarios: e-commerce, fashion, books, games, video, and catering.

We utilize SASRec and DSSM for the retrieval task, evaluating the results with NDCG and Recall; for the ranking task, DIN and DeepFM are employed, with AUC and LogLoss as the evaluation metrics.
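
For reference, here is a minimal sketch of the retrieval metrics under their standard binary-relevance definitions (not necessarily the benchmark's exact implementation):

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k):
    """Fraction of relevant items that appear in the top-k ranking."""
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / np.log2(i + 2) for i, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg

ranked = [3, 1, 7, 5, 2]   # item ids sorted by predicted score
relevant = {1, 5}          # ground-truth positives
print(recall_at_k(ranked, relevant, 5), ndcg_at_k(ranked, relevant, 5))
```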

The statistics of the datasets:

[Table: statistics of the benchmark datasets]

Dataset

For each test dataset, run process.py followed by filter.py.

Note: for Goodreads, GoogleLocalData, and Yelp, set low_rating_thres to ~ (YAML null) in the corresponding config *.yaml when preparing data for retrieval.

Processed data can be downloaded here.

Usage

Environment

conda env create -f rec.yaml

Set DATA_MOUNT_DIR in your environment:

export DATA_MOUNT_DIR=[DOWNLOAD_PATH]/data
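
The scripts resolve dataset paths from this variable; a minimal sketch of that convention (the dataset_dir helper is hypothetical, not the repo's actual code):

```python
import os
from pathlib import Path

def dataset_dir(name: str) -> Path:
    """Resolve a dataset folder under DATA_MOUNT_DIR (hypothetical convention)."""
    root = os.environ.get("DATA_MOUNT_DIR")
    if root is None:
        raise RuntimeError("DATA_MOUNT_DIR is not set; export it first.")
    return Path(root) / name

print(dataset_dir("Yelp"))
```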

Ranking

For the naive (no text embedding) model:

bash ctr.sh [CUDA_ID_0] [CUDA_ID_1] [DATASET_NAME]

For the text-embedding-enhanced model:

bash ctrwlm.sh [CUDA_ID_0] [CUDA_ID_1] [DATASET_NAME] [TEXT_EMBEDDING_PATH] [SAVE_PREFIX]

Retrieval

For the naive (no text embedding) model:

cd RecStudio
bash gre.sh SASRec [DATASET_PKL_PATH] [TEXT_EMBEDDING_PATH]

For the text-embedding-enhanced model:

cd RecStudio
bash gre.sh TE_ID_SASRec [DATASET_PKL_PATH] [TEXT_EMBEDDING_PATH]

Configuration

You can tune the configurations in RecStudio/recstudio/model/seq/config/din.yaml, RecStudio/recstudio/model/seq/config/sasrec.yaml, and RecStudio/recstudio/model/seq/config/te_id_sasrec.yaml to search for the best results.
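
A rough sketch of such a search, assuming hypothetical hyperparameter keys (learning_rate, embed_dim) and placeholder script arguments; check the yaml files for the real keys:

```python
import itertools
import subprocess

import yaml

CONFIG = "RecStudio/recstudio/model/seq/config/sasrec.yaml"

# Hypothetical grid; the real keys and ranges live in the yaml file itself.
grid = {"learning_rate": [1e-3, 1e-4], "embed_dim": [64, 128]}

with open(CONFIG) as f:
    base = yaml.safe_load(f)

for values in itertools.product(*grid.values()):
    trial = dict(zip(grid, values))
    base.update(trial)                 # overwrites the config in place
    with open(CONFIG, "w") as f:
        yaml.safe_dump(base, f)
    print("running", trial)
    # "data.pkl" and "emb.pt" are placeholder arguments for gre.sh.
    subprocess.run(["bash", "gre.sh", "SASRec", "data.pkl", "emb.pt"], cwd="RecStudio")
```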
