- UniDM: A Unified Framework for Data Manipulation with Large Language Models. Yichen Qian, Yongyi He, Rong Zhu, Jintao Huang, Zhijian Ma, Haibin Wang, Yaohua Wang, Xiuyu Sun, Defu Lian, Bolin Ding, Jingren Zhou. Arxiv Link
- Reimagining LLM-Powered Unstructured Data Analysis with DocETL. Shreya Shankar, Aditya Parameswaran, Eugene Wu. Blog Post (Paper coming soon).
- DataChain. GitHub Repo
- Curator. GitHub Repo
- HuggingFace Synthetic Data Generator. Blog Post
- Surveying the effects of quality, diversity, and complexity in synthetic data from large language models. Alex Havrilla, Andrew Dai, Laura O'Mahony, Koen Oostermeijer, Vera Zisler, Alon Albalak, Fabrizio Milo, Sharath Chandra Raparthy, Kanishk Gandhi, Baber Abbasi, Duy Phung, Maia Iyer, Dakota Mahan, Chase Blagden, Srishti Gureja, Mohammed Hamdy, Wen-Ding Li, Giovanni Paolini, Pawan Sasanka Ammanamanchi, Elliot Meyerson. 2024. Arxiv Link
- A Survey on Data Synthesis and Augmentation for Large Language Models. Wang et al. 2024. Arxiv Link
- A Survey of Data Synthesis Approaches. Chang et al. 2024. Arxiv Link
- Efficient Guided Generation for Large Language Models. Brandon T. Willard, Remi Louf. Arxiv Link
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models. Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen. Arxiv Link
- Efficient Memory Management for Large Language Model Serving with PagedAttention. Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica. Arxiv Link
- Building DoorDash's product knowledge graph with large language models. Steven Xu, Sree Chaitanya Vadrevu. Blog Post
- DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback. Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal. Arxiv Link
- Gorilla: Large Language Model Connected with Massive APIs. Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez. 2023. Arxiv Link
- Agentic Information Retrieval. Weinan Zhang, Junwei Liao, Ning Li, Kounianhua Du. Arxiv Link
- Speculations on Test-Time Scaling. Sasha Rush, Daniel Ritter. YouTube Link
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, Chi Wang. 2023. Arxiv Link
- Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise. Rose E. Wang, Ana T. Ribeiro, Carly D. Robinson, Susanna Loeb, Dora Demszky. 2024. Arxiv Link
Classifies 550,000 tutor messages into strategy categories such as "Prompt Student to Explain" or "Ask Question to Guide Thinking".