This repository provides a curated list of datasets for research in the food domain, covering recipes, ingredients, food ordering, nutritional information, flavor data, and user interactions. These datasets are valuable for applications in machine learning, AI, and computational gastronomy.
-
Description:
Recipe1M+ is a large-scale, multimodal dataset containing over 1 million structured cooking recipes and 13 million food images. It was designed for cross-modal research, particularly for learning embeddings that align recipe text (ingredients and instructions) with food images. It extends Recipe1M by substantially increasing the number of images through web searches.
Paper: Link to Paper
Citation: Marín, J., Biswas, A., Ofli, F., et al. (2019). Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. IEEE TPAMI.
Dataset: Link to Dataset
- Image Data: ✔️
- Nutritional Data: ✔️ (50,637 recipes with details mapped from the USDA database)
- Flavor or Taste Data: ❌
- Recipe Data: ✔️
- Title
- List of ingredients (with quantities and units parsed)
- Instructions
- Course labels
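Because Recipe1M+ pairs each recipe's title, ingredients and instructions with images, a common first step is to flatten each recipe into a single string for a text encoder. The sketch below assumes a JSON layout modelled on Recipe1M's `layer1.json` (`id`, `title`, and `ingredients`/`instructions` as lists of `{"text": ...}` objects); treat these field names as an assumption and check them against the files you actually download.

```python
import json


def flatten_recipe(record: dict) -> dict:
    """Turn one Recipe1M-style JSON record into plain strings.

    ASSUMPTION: the record layout is {"id", "title",
    "ingredients": [{"text": ...}], "instructions": [{"text": ...}]}.
    """
    title = record["title"].strip()
    ingredients = [item["text"].strip() for item in record.get("ingredients", [])]
    instructions = [step["text"].strip() for step in record.get("instructions", [])]
    return {
        "id": record["id"],
        "title": title,
        "ingredients": ingredients,
        "instructions": instructions,
        # One string per recipe, e.g. as input to the text side of a cross-modal model.
        "text": " ".join([title, *ingredients, *instructions]),
    }


# Hypothetical sample record, not taken from the dataset.
sample = json.loads("""{
  "id": "r001",
  "title": "Simple Pancakes",
  "ingredients": [{"text": "1 cup flour"}, {"text": "1 egg"}, {"text": "1 cup milk"}],
  "instructions": [{"text": "Whisk everything together."}, {"text": "Fry on a hot pan."}]
}""")

recipe = flatten_recipe(sample)
print(recipe["ingredients"])  # ['1 cup flour', '1 egg', '1 cup milk']
```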
-
Description:
RecipeNLG is a dataset of 2.2 million recipes designed for semi-structured text generation tasks such as recipe generation. It uses named entity recognition (NER) to extract food entities.
Paper: Link to Paper
Citation: Bień, M., Gilski, M., Maciejewska, M., et al. (2020). RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation.
Dataset: Link to Dataset
- Image Data: ❌
- Nutritional Data: ❌
- Flavor or Taste Data: ❌
- Recipe Data: ✔️
- Title
- List of ingredients (with quantities and units)
- Step-by-step instructions
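RecipeNLG's NER-extracted food entities make it natural to frame generation as "entities in, recipe out". The following is a minimal sketch assuming a CSV with `title`, `ingredients`, `directions` and `NER` columns whose list cells are JSON-encoded strings; the column names, the cell encoding and the `Ingredients:` prompt format are all assumptions for illustration, not the format used in the RecipeNLG paper.

```python
import csv
import io
import json

# Hypothetical one-row extract. ASSUMPTION: column names and JSON-encoded list cells.
SAMPLE_CSV = '''title,ingredients,directions,NER
Garlic Toast,"[""2 slices bread"", ""1 clove garlic""]","[""Toast the bread."", ""Rub with garlic.""]","[""bread"", ""garlic""]"
'''


def to_training_pair(row: dict) -> tuple[str, str]:
    """Build a (prompt, target) pair: food entities in, full recipe text out."""
    entities = json.loads(row["NER"])
    ingredients = json.loads(row["ingredients"])
    directions = json.loads(row["directions"])
    prompt = "Ingredients: " + ", ".join(entities)  # hypothetical prompt format
    target = "\n".join([row["title"], *ingredients, *directions])
    return prompt, target


pairs = [to_training_pair(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(pairs[0][0])  # Ingredients: bread, garlic
```

If the list cells turn out to use Python-literal quoting instead of JSON, `ast.literal_eval` is a safer parser than `json.loads`.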
-
Description:
A multilingual, visio-linguistic dataset containing 2.8 million images and 9.5 million text samples spanning 37 countries and 33 languages. It is well suited to text-image retrieval and conditional image generation.
Paper: Link to Paper
Citation: Amat Olondriz, D., Palau Puigdevall, P., Salvador Palau, A.
Dataset: Link to Dataset
- Image Data: ✔️ (2.8M images)
- Textual Data: ✔️ (product names, descriptions)
- Nutritional Data: ❌
- Flavor or Taste Data: ❌
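Text-image retrieval typically embeds the text query and every image in a shared vector space and ranks images by similarity. A minimal sketch of the ranking step, using cosine similarity over hypothetical, hand-written three-dimensional embeddings; a real system would obtain them from multilingual text and image encoders, which are not shown here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], image_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k images whose embeddings are closest to the query."""
    ranked = sorted(image_vecs, key=lambda img_id: cosine(query_vec, image_vecs[img_id]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings standing in for encoder outputs.
images = {
    "img_pizza": [0.9, 0.1, 0.0],
    "img_salad": [0.1, 0.9, 0.2],
    "img_soup": [0.2, 0.3, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. the embedding of a text query such as "margherita pizza"
print(retrieve(query, images))  # ['img_pizza', 'img_soup']
```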
-
Description:
A dataset of over 1 million user-recipe interactions recorded between 2000 and 2018, covering 52,821 recipes across 27 categories.
Paper: Link to Paper
Citation: Gao, X., et al. "Hierarchical Attention Network for Visually-aware Food Recommendation."
Dataset: Link to Dataset
- Image Data: ✔️
- Nutritional Data: ❌
- Flavor or Taste Data: ❌
- Recipe Data: ✔️
- Title
- List of ingredients
- Instructions
- Recipe category
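With interactions spanning 2000 to 2018, a chronological evaluation split is a natural choice for recommendation experiments. A sketch of a leave-last-out split, assuming hypothetical `(user_id, recipe_id, date)` triples with ISO-8601 dates; the dataset's actual file layout is not described here.

```python
from collections import defaultdict

# Hypothetical (user_id, recipe_id, date) triples.
interactions = [
    ("u1", "r10", "2005-03-01"),
    ("u1", "r11", "2009-07-15"),
    ("u1", "r12", "2016-01-20"),
    ("u2", "r10", "2012-05-05"),
    ("u2", "r13", "2018-11-30"),
]


def leave_last_out(rows):
    """Hold out each user's most recent interaction for testing; keep the rest for training."""
    by_user = defaultdict(list)
    for user, recipe, ts in rows:
        by_user[user].append((ts, recipe))
    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # ISO-8601 date strings sort chronologically
        *history, (last_ts, last_recipe) = events
        train.extend((user, recipe, ts) for ts, recipe in history)
        test.append((user, last_recipe, last_ts))
    return train, test


train, test = leave_last_out(interactions)
print(test)  # [('u1', 'r12', '2016-01-20'), ('u2', 'r13', '2018-11-30')]
```

Users with a single interaction end up with no training history under this scheme; many pipelines filter such users out first.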
-
Description:
Contains 230,000 recipes and 1.1 million user-recipe interactions (reviews) spanning 18 years, which makes it useful for personalized recipe recommendation.
Paper: Link to Paper
Citation: Majumder, B. P., et al. (2019).
Dataset: Link to Dataset
- Image Data: ❌
- Nutritional Data: ❌
- Flavor or Taste Data: ❌
- Recipe Data: ✔️
- Title
- Ingredients
- Tags
- Step-by-step instructions
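For personalized recommendation from review data, one simple baseline is item co-occurrence: suggest recipes that were often reviewed by the same users as the recipes a given user already reviewed. A sketch with hypothetical users and recipe ids; this is a generic baseline, not the method of the cited paper.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical user -> reviewed-recipe sets.
reviews = {
    "alice": {"pad_thai", "green_curry", "spring_rolls"},
    "bob": {"pad_thai", "green_curry"},
    "carol": {"pad_thai", "lasagna"},
}


def build_cooccurrence(user_items: dict[str, set[str]]) -> dict[str, Counter]:
    """Count how often each ordered pair of recipes was reviewed by the same user."""
    co: dict[str, Counter] = defaultdict(Counter)
    for items in user_items.values():
        for a, b in permutations(items, 2):
            co[a][b] += 1
    return co


def recommend(user: str, user_items: dict[str, set[str]], co: dict[str, Counter], k: int = 2) -> list[str]:
    """Score unseen recipes by summed co-occurrence with the user's reviewed recipes."""
    seen = user_items[user]
    scores: Counter = Counter()
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    return [recipe for recipe, _ in scores.most_common(k)]


co = build_cooccurrence(reviews)
print(recommend("bob", reviews, co))  # ['spring_rolls', 'lasagna']
```

Counts are not normalized for popularity here; dividing by item frequency, or using cosine similarity between item vectors, is a common refinement.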
-
Description:
A database of 25,595 flavor molecules and 936 ingredients, providing molecular flavor profiles.
Paper: Kumar et al. (2022)
Citation: Goel, M., Grover, N., et al. (2024). FlavorDB2: An updated database of flavor molecules.
Dataset: Link to Dataset
- Image Data: ❌
- Nutritional Data: ❌
- Flavor or Taste Data: ✔️
- Molecular flavor profiles
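One common use of molecular flavor profiles is to compare ingredients by the flavor compounds they share. The sketch below scores ingredient pairs with the Jaccard index over molecule sets; the ingredient and molecule names are placeholders rather than actual database entries, and representing each profile as a set of molecule ids is an assumption.

```python
# Hypothetical ingredient -> flavor-molecule sets (placeholders, not real database entries).
PROFILES = {
    "ingredient_a": {"mol_1", "mol_2", "mol_3"},
    "ingredient_b": {"mol_2", "mol_3", "mol_4"},
    "ingredient_c": {"mol_5"},
}


def jaccard(a: str, b: str, profiles: dict[str, set[str]]) -> float:
    """Molecules shared by both ingredients, relative to all molecules in either."""
    union = profiles[a] | profiles[b]
    return len(profiles[a] & profiles[b]) / len(union) if union else 0.0


def best_partner(ingredient: str, profiles: dict[str, set[str]]) -> str:
    """Return the other ingredient with the highest flavor-molecule overlap."""
    others = [name for name in profiles if name != ingredient]
    return max(others, key=lambda other: jaccard(ingredient, other, profiles))


print(jaccard("ingredient_a", "ingredient_b", PROFILES))  # 0.5
print(best_partner("ingredient_a", PROFILES))  # ingredient_b
```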
-
Description:
A collection of synthetic and human-generated food-ordering examples for building task-oriented semantic parsing systems.
Paper: Link to Paper
Citation: Rubino, M., Guenon des Mesnards, N., et al. (2022).
Dataset: Link to Dataset
- Image Data: ❌
- Nutritional Data: ❌
- Flavor or Taste Data: ❌
- Textual Data: ✔️ (includes food orders with semantic parsing annotations)
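Semantic parsing annotations for food orders are often written as nested bracketed trees. Assuming such a bracketed notation (the exact format and labels such as `PIZZAORDER` and `DRINKTYPE` below are hypothetical), a small recursive-descent parser turns an annotation into nested lists from which order slots can be read:

```python
import re

# Hypothetical bracketed annotation; the dataset's real notation may differ.
ANNOTATION = "(ORDER (PIZZAORDER (NUMBER 2) (SIZE large) (TOPPING ham)) (DRINKORDER (NUMBER 1) (DRINKTYPE coke)))"


def tokenize(text: str) -> list[str]:
    """Split into '(' , ')' and bare words."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens: list[str]):
    """Recursively build a nested [label, child, ...] list from a token stream."""
    token = tokens.pop(0)
    if token != "(":
        return token  # leaf value such as 'large'
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        children.append(parse(tokens))
    tokens.pop(0)  # drop the closing ')'
    return [label, *children]


def slots(order_node: list) -> dict[str, str]:
    """Map each slot label under an order node to its (possibly multi-word) value."""
    return {child[0]: " ".join(child[1:]) for child in order_node[1:] if isinstance(child, list)}


tree = parse(tokenize(ANNOTATION))
print([slots(order) for order in tree[1:]])
# [{'NUMBER': '2', 'SIZE': 'large', 'TOPPING': 'ham'}, {'NUMBER': '1', 'DRINKTYPE': 'coke'}]
```

The parser assumes well-formed input; a missing closing bracket raises an `IndexError`. `slots` also assumes slot nodes contain only words, not further nested nodes.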
-
Description:
A structured database of 118,171 recipes integrating nutritional profiles and flavor molecules.
Paper: Sethi et al. (2020)
Citation: Batra, D., Diwan, N., et al. (2020).
Dataset: Link to Dataset
- Image Data: ❌
- Nutritional Data: ✔️
- Flavor or Taste Data: ✔️
- Recipe Data: ✔️
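Integrating nutritional profiles with recipes usually means scaling per-ingredient nutrient values by the quantity used and summing them. A sketch assuming a hypothetical per-100 g nutrient table and ingredient quantities already converted to grams; unit conversion and ingredient matching, often the hard part, are not shown, and the numbers are placeholders rather than RecipeDB values.

```python
# Hypothetical per-100 g nutrient table (placeholder values).
NUTRIENTS_PER_100G = {
    "ingredient_a": {"energy_kcal": 100.0, "protein_g": 10.0},
    "ingredient_b": {"energy_kcal": 50.0, "protein_g": 2.0},
}


def recipe_nutrition(ingredients: dict[str, float]) -> dict[str, float]:
    """Sum nutrients over {ingredient: grams used}, scaling each per-100 g profile."""
    totals: dict[str, float] = {}
    for name, grams in ingredients.items():
        for nutrient, per_100g in NUTRIENTS_PER_100G[name].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return totals


print(recipe_nutrition({"ingredient_a": 200.0, "ingredient_b": 50.0}))
# {'energy_kcal': 225.0, 'protein_g': 21.0}
```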
-
Description:
The HUMMUS dataset combines 507,335 recipes and 1.9 million user-recipe interactions, enriched with health-awareness metrics.
Paper: Link to Paper
Citation: Bölz, F., et al. (2023).
Dataset: Link to Dataset
- Image Data: ❌
- Nutritional Data: ✔️
- Flavor or Taste Data: ❌
- Recipe Data: ✔️
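One way health-awareness metrics can feed into recommendation is by re-ranking candidates on a blend of predicted relevance and a health score. The sketch below is a hypothetical illustration, not the HUMMUS authors' method: it assumes both scores are already normalized to [0, 1] and uses an arbitrary trade-off weight `alpha`.

```python
def health_aware_rerank(candidates: list[tuple[str, float, float]], alpha: float = 0.3) -> list[str]:
    """Re-rank (recipe_id, relevance, health) tuples by a weighted blend.

    ASSUMPTION: relevance and health are pre-normalized to [0, 1];
    alpha = 0 ranks by relevance only, alpha = 1 by health only.
    """
    def blended(item: tuple[str, float, float]) -> float:
        _, relevance, health = item
        return (1 - alpha) * relevance + alpha * health

    return [recipe_id for recipe_id, _, _ in sorted(candidates, key=blended, reverse=True)]


# Hypothetical candidates from an upstream recommender.
candidates = [
    ("recipe_1", 0.90, 0.20),
    ("recipe_2", 0.80, 0.90),
    ("recipe_3", 0.50, 0.95),
]
print(health_aware_rerank(candidates, alpha=0.0))  # ['recipe_1', 'recipe_2', 'recipe_3']
print(health_aware_rerank(candidates, alpha=0.5))  # ['recipe_2', 'recipe_3', 'recipe_1']
```

A linear blend keeps the trade-off easy to tune and explain; in practice `alpha` would be chosen on held-out data.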