This is the official repository of the paper Augment or Not? A Comparative Study of Pure and Augmented Large Language Model Recommenders.
Authors: Wei-Hsiang Huang, Chen-Wei Ke, Wei-Ning Chiu, Yu-Xuan Su, Chun-Chun Yang, Chieh-Yuan Cheng, Yun-Nung Chen, Pu-Jen Cheng.
- Overview
- Pure LLM Recommenders
- Augmented LLM Recommenders
- Experiment
- The Challenge of LLM Recommenders
- Future Direction
LLM Recommenders use LLMs to perform recommendation. In this survey, we further concentrate on settings where the LLM acts as the final decision maker: given a user's interaction history (and, optionally, a candidate set), the LLM directly produces the recommended items.

With the growing interest and parallel development in both pure LLM-based approaches and those augmented with non-LLM techniques, it is crucial to systematically understand both scenarios. Therefore, we categorize LLM Recommenders into Pure and Augmented approaches based on whether a non-LLM augmentation step is applied before the LLM makes its final decision.
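Schematically, this decision process can be written as follows (a sketch in our own notation; the paper's formal definition may differ):

```latex
\underbrace{\hat{i} = f_{\mathrm{LLM}}\big(\mathcal{H}_u,\ \mathcal{C},\ p\big)}_{\text{Pure}}
\qquad
\underbrace{\hat{i} = f_{\mathrm{LLM}}\big(g(\mathcal{H}_u, \mathcal{C}),\ p\big)}_{\text{Augmented}}
```

where `\mathcal{H}_u` is user `u`'s interaction history, `\mathcal{C}` an optional candidate set, `p` the prompt template, and `g` a non-LLM augmentation map (e.g. semantic ID encoding or collaborative embedding projection).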
Pure LLM Recommenders refer to methods that rely solely on the capabilities of LLMs to perform recommendation tasks. These methods can be further categorized into classes such as Naive Embedding Utilization, Naive Pretrained LM Finetuning, Instruction Tuning, Model Architectural Adaptations, Reflect-and-Rethink, and Others.
Naive Embedding Utilization refers to methods that directly leverage the final hidden state or aggregated embeddings produced by LLMs for recommendation tasks.
Naive Pretrained LM Finetuning refers to approaches that formulate recommendation as a natural language task and directly fine-tune pretrained language models.
Instruction tuning adapts LLMs to recommendation tasks by expressing them as instructional prompts.
In addition to standard applications of LLMs, numerous studies have proposed novel architectural adaptations of LLM backbones, specifically designed for recommendation systems.
| Venue | Code | Paper |
|---|---|---|
| Arxiv'24 | None | Rethinking Large Language Model Architectures for Sequential Recommendations |
| Inf. Process. Manag.'25 | Code | Sequential recommendation by reprogramming pretrained transformer |
| Arxiv'25 | None | MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation |
Reflect-and-Rethink methods go beyond standard supervised learning by reflecting on outputs, refining prompts, or interpreting user intent to guide prompt design.
Others focus on designing suitable training objectives, metadata summarization, data essence extraction, among others.
Augmented LLM Recommenders refer to methods that enhance LLM Recommenders by incorporating non-LLM techniques. These methods can be further categorized into Semantic Identifiers Augmentation, Collaborative Modality Augmentation, Prompts Augmentation, and Retrieve-and-Rerank.
Semantic Identifiers (or Semantic IDs) augmentation methods represent user or item IDs as implicit semantic sequences with the help of auxiliary coding techniques.
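One common coding technique in this family is residual quantization (as used by RQ-VAE-style tokenizers): an item embedding is encoded level by level, quantizing the residual against a codebook at each level. The sketch below illustrates the idea with random NumPy codebooks; the codebook sizes, dimensions, and names are our own illustrative choices, not the setup of any particular paper.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Encode an item embedding as a sequence of discrete codes,
    picking the nearest centroid at each level and quantizing the
    residual that remains."""
    codes = []
    residual = embedding.astype(float)
    for codebook in codebooks:  # each codebook: (K, d) array of centroids
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        codes.append(idx)
        residual = residual - codebook[idx]
    return codes

# Toy setup: 3 levels, 256 centroids each, 32-dim embeddings.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 32)) for _ in range(3)]
item_emb = rng.normal(size=32)
semantic_id = residual_quantize(item_emb, codebooks)  # a 3-token Semantic ID
```

The resulting code sequence can then serve as the item's token identifier in the LLM's vocabulary.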
Collaborative Modality Augmentation methods seek to align collaborative information with language, usually by projecting embeddings derived from traditional collaborative models into the language space.
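A minimal version of this alignment is a learned linear projector that maps a collaborative-filtering embedding into the LLM's hidden dimension, where it is consumed as a soft token. The dimensions and names below are illustrative assumptions only:

```python
import numpy as np

def project_to_language_space(cf_emb, W, b):
    """Map a collaborative-filtering embedding (e.g. from matrix
    factorization) into the LLM hidden dimension via a linear
    projector; the output is typically prepended as a soft token."""
    return cf_emb @ W + b

rng = np.random.default_rng(0)
cf_dim, llm_dim = 64, 4096          # assumed sizes for illustration
W = rng.normal(scale=0.02, size=(cf_dim, llm_dim))
b = np.zeros(llm_dim)
soft_token = project_to_language_space(rng.normal(size=cf_dim), W, b)
```

In practice `W` and `b` are trained jointly with (or against a frozen) LLM so the projected vector is interpretable in the language space.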
Prompts Augmentation methods utilize non-LM techniques to improve the quality of prompts.
Retrieve-and-Rerank methods first retrieve top-ranked candidates using non-LM techniques, and then apply LLMs to rerank them for final recommendation.
| Venue | Code | Paper |
|---|---|---|
| Arxiv'23 | Code | Zero-Shot Next-Item Recommendation using Large Pretrained Language Models |
| PGAI@CIKM'23 | Code | LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking |
| Arxiv'23 | None | PALR: Personalization Aware LLMs for Recommendation |
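The two-stage pattern above can be sketched as follows. Both scorers here are toy stand-ins (a real system would use, e.g., a sequential recommender for stage 1 and a prompted LLM for stage 2):

```python
def retrieve_and_rerank(user, items, retriever_score, llm_rerank, k=20):
    """Stage 1: a lightweight non-LLM retriever scores all items and
    keeps the top-k candidates. Stage 2: the LLM reranks only those."""
    candidates = sorted(items, key=lambda i: retriever_score(user, i),
                        reverse=True)[:k]
    return llm_rerank(user, candidates)

# Toy usage with placeholder scorers.
items = list(range(100))
score = lambda u, i: -abs(i - 42)          # retriever favors items near 42
rerank = lambda u, cands: sorted(cands)    # placeholder "LLM" reranker
top = retrieve_and_rerank("user_1", items, score, rerank, k=5)
```

The key efficiency point is that the LLM only ever sees `k` candidates instead of the full catalog.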
Although many benchmarks exist for recommender systems, comprehensive comparisons between Pure and Augmented LLM Recommenders under consistent, fair, and modern evaluation settings remain lacking. To fill this gap, we design a unified experimental framework and use it to systematically assess both categories. Details of the benchmark datasets can be found in the paper and in Benchmark Formulation. The following are the results of existing representative papers.
For results discussion, please also refer to the paper.
The goal of recommendation systems is to provide accurate suggestions based on collaborative information, such as user-item interaction patterns. To achieve this, it is essential for recommenders to effectively model users' underlying behavior. LLMs, trained on vast text corpora, are expected to implicitly encode some aspects of such patterns. However, recent research has shown that directly leveraging the implicit collaborative knowledge within LLMs remains a challenge.
Even with exhaustive tuning, LLM Recommenders may still be influenced by the pretrained language semantics. This can prevent LLMs from faithfully capturing the true collaborative semantics.
The echo chamber effect refers to a situation in which individuals are predominantly exposed to information that reinforces their preexisting beliefs, often due to selective exposure, algorithmic filtering, or even the underlying social biases inherent in LLMs. In recommender systems, this can result in users repeatedly receiving a narrow range of items, irrespective of their current intent.
Position bias refers to the tendency for the perceived relevance or importance of recommended items to be influenced by their position in the prompt input list, which should ideally yield symmetric outputs under permutations. In recommendation systems, especially in zero-shot prompting scenarios, the position of the ground-truth item within the candidate set is significantly affected by this bias.
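A common mitigation, hedged here as a sketch rather than any specific paper's method, is to query the LLM over several random permutations of the candidate list and majority-vote the answer, so no single position dominates:

```python
import random

def debiased_pick(user, candidates, llm_choose, n_perm=8, seed=0):
    """Mitigate position bias: query the (possibly position-biased)
    chooser over several shuffled candidate orders, then return the
    item chosen most often."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_perm):
        order = candidates[:]
        rng.shuffle(order)
        choice = llm_choose(user, order)
        votes[choice] = votes.get(choice, 0) + 1
    return max(votes, key=votes.get)

# Toy chooser standing in for an LLM call.
choose = lambda u, order: "B" if "B" in order else order[0]
picked = debiased_pick("user_1", ["A", "B", "C"], choose)
```

The cost is `n_perm` LLM calls per decision, which is why permutation averaging is usually reserved for evaluation rather than serving.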
Cold-Start and Cross-Domain Generalizability are long-standing challenges in recommendation. LLM Recommenders offer a promising solution due to their ability to understand rich textual metadata, and much recent work pursues this direction; nevertheless, opportunities for enhancement remain.
- Remaining unsolved issue 1: the conditional-probability objective of the decoder tends to overfit to items seen during training, significantly reducing the capability to generate cold-start items.
- Remaining unsolved issue 2: whether incorporating collaborative signals may degrade performance, as collaborative-filtering-based methods tend to suffer more in cold-start scenarios.
- Remaining unsolved issue: this direction remains largely underexplored, with relatively few studies addressing and analyzing the issue.
The dataset preprocessing for the experiments can be reproduced by following the instructions below.
To avoid differences in the `random.seed` mechanism across Python versions or environments, we have stored the datasets we used (the naive numerical IDs dataset and the reranking dataset) on Google Drive. Notice that one might still need to preprocess other information from Dataset Preparation.
```shell
cd prepare_dataset
sh download_data.sh
sh prepare_dataset.sh
```
This will create `./data/` in the main directory with the corresponding downloaded datasets.
- Dataset Explanation
```json
{
  "preprocessed_*.train.json": "Training dataset for sequential recsys.",
  "preprocessed_*.valid.json": "Validation dataset for sequential recsys.",
  "preprocessed_*.test.json": "Testing dataset for sequential recsys.",
  "preprocessed_meta_*.json": "Filtered item meta data (only items present in train + valid + test).",
  "preprocessed_review_*.json": "Train + Valid + Test."
}
```
We assigned each user and item a random, unique naive numerical ID. To preprocess it, you can run:
```shell
cd prepare_dataset
sh random_hashing.sh
```
This will create `Random_*` folders with random naive numerical IDs for users and items inside `./data/`.
- Dataset Explanation
```json
{
  "user_item_hash_table.json": "The table between naive numerical IDs and the original user_id or parent_asin.",
  "meta.json": "Meta data of the given item.",
  "review_*.json": "Review data for the [train / valid / test] scope.",
  "review.json": "All review data (train + valid + test)."
}
```
Besides sequential recommendation, other recommendation tasks include `reranking`, `binary`, `rating`, `explanation`, `conversational`, and so on. We further provide the dataset setup for the reranking task.
The reranking task aims to recommend items from a set of candidates. For LLM Recommenders, the candidate set size is usually set to `n` items, with `1` positive item and `n-1` non-interacted random negative items. In our construction, we set `n=20`.
```shell
cd prepare_dataset && sh prepare_random_negative.sh
```
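Under the hood, the candidate construction follows the 1-positive / `n-1`-negatives recipe described above; a minimal sketch (the helper and IDs here are illustrative, not the script's actual code):

```python
import random

def build_candidates(positive, interacted, all_items, n=20, seed=0):
    """Build a reranking candidate pool: 1 positive item plus n-1
    random negatives the user never interacted with, shuffled so the
    positive's position is random."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted and i != positive]
    candidates = rng.sample(pool, n - 1) + [positive]
    rng.shuffle(candidates)
    return candidates

# Toy usage: user interacted with i_1 and i_2; ground truth is i_9.
cands = build_candidates("i_9", {"i_1", "i_2"},
                         [f"i_{k}" for k in range(100)], n=20)
```

Fixing the seed keeps the sampled negatives reproducible across runs, which is the same concern that motivated storing the prepared datasets on Google Drive.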
- Dataset Explanation
```json
{
  "random_numerical or original_ids": "The key is the user and the corresponding list is the item candidate pool.",
  "label_random_numerical or label_original_ids": "The key is the user and the corresponding value is the positive item."
}
```
If you find our survey and this repository beneficial for your research, please kindly cite our paper.
```bibtex
@misc{huang2025augmentnotcomparativestudy,
  title={Augment or Not? A Comparative Study of Pure and Augmented Large Language Model Recommenders},
  author={Wei-Hsiang Huang and Chen-Wei Ke and Wei-Ning Chiu and Yu-Xuan Su and Chun-Chun Yang and Chieh-Yuan Cheng and Yun-Nung Chen and Pu-Jen Cheng},
  year={2025},
  eprint={2505.23053},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2505.23053},
}
```