Awesome Generative AI Data Scientist

The Future is using AI and ML Together

🚀🚀 100 Free Resources On Generative AI for Data Scientists

A curated list of 100+ resources to help you become a Generative AI Data Scientist. This repository includes resources on building GenAI Data Science applications with Large Language Models (LLMs) and deploying LLMs and Generative AI/ML with Cloud-based solutions.

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).

Awesome Real-World AI Use Cases

🚀🚀 AI Data Science Team In Python: An AI-powered data science team of copilots that uses agents to help you perform common data science tasks 10X faster. Examples | Github
🚀 Awesome LLM Apps: LLM RAG AI Apps with Step-By-Step Tutorials
AI Hedge Fund: Proof of concept for an AI-powered hedge fund
AI Financial Agent: A financial agent for investment research
Strutured Report Generation (LangGraph): How to build an agent that can orchestrate the end-to-end process of report planning, web research, and writing. We show that this agent can produce reports of varying and easily configurable format. Video | Blog | Code
Uber QueryGPT: Uber's QueryGPT uses large language models (LLM), vector databases, and similarity search to generate complex queries from English (Natural Language) questions that are provided by the user as input. The tool enhances the productivity of engineers, operations managers, and data scientists at Uber.
Nir Diamant GenAI Agents: Tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems. GitHub

Python Libraries

Data Science And AI Agents

🚀🚀 AI Data Science Team In Python: AI Agents to help you perform common data science tasks 10X faster. Examples | Github
🚀 PandasAI: Open Source AI Agents for Data Analysis. Documentation | Github

Coding Agents

Qwen-Agent: A framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. It also comes with example applications such as Browser Assistant, Code Interpreter, and Custom Assistant. Documentation | Examples | Github

AI Frameworks (Build Your Own)

LangChain: A framework for developing applications powered by large language models (LLMs). Documentation | Github Cookbook
LangGraph: A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Documentation Tutorials
LangSmith: LangSmith is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. Documentation | Github
LlamaIndex: LlamaIndex is a framework for building context-augmented generative AI applications with LLMs. Documentation | Github
LlamaIndex Workflows: LlamaIndex workflows is a mechanism for orchestrating actions in the increasingly-complex AI application we see our users building.
CrewAI: Streamline workflows across industries with powerful AI agents. Documentation | Github
AutoGen: Microsoft's programming framework for agentic AI.
Pydantic AI: Python agent framework designed to make building production-grade applications with Generative AI less painful. Github
ControlFlow: Prefect's Python framework for building agentic AI workflows. Documentation | Github

AI Frameworks (Drag and Drop)

LangGraph Studio: IDE that enables visualization, interaction, and debugging of complex agentic applications
Langflow: A low-code tool that makes building powerful AI agents and workflows that can use any API, model, or database easier. Documentation | Github
Pyspur: Graph-Based Editor for LLM Workflows Documentation | Github

LLM Providers

OpenAI: The official Python library for the OpenAI API
Hugging Face Models: Open LLM models by Meta, Mistral, and hundreds of other providers
Anthropic Claude: The official Python library for the Anthropic API
Meta Llama Models: The open source AI model you can fine-tune, distill and deploy anywhere.
Google Gemini: The official Python library for the Google Gemini API
Ollama: Get up and running with large language models locally.
Grok: The official Python Library for the Groq API

LangChain Platform

LangChain: A framework for developing applications powered by large language models (LLMs). Documentation | Github Cookbook
LangGraph: A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Documentation Tutorials
LangSmith: LangSmith is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. Documentation | Github

Huggingface Platform

Huggingface: An open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. Documentation
Transformers: Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models.
Tokenizers: Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility [Documentation] | Github
Sentence Transformers: Sentence Transformers (a.k.a. SBERT) is the go-to Python module for accessing, using, and training state-of-the-art text and image embedding models.
smolagents: The simplest framework out there to build powerful agents Documentation | Github

Vector Databases (RAG)

ChromaDB: The fastest way to build Python or JavaScript LLM apps with memory!
FAISS: A library for efficient similarity search and clustering of dense vectors.
Qdrant: High-Performance Vector Search at Scale
Pinecone: The official Pinecone Python SDK.
Milvus: Milvus is an open-source vector database built to power embedding similarity search and AI applications.

Pretraining

PyTorch: PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
TensorFlow: TensorFlow is an open-source machine learning library developed by Google.
JAX: Google’s library for high-performance computing and automatic differentiation.
tinygrad: A minimalistic deep learning library with a focus on simplicity and educational use, created by George Hotz.
micrograd: A simple, lightweight autograd engine for educational purposes, created by Andrej Karpathy.

Fine-tuning

Transformers: Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
Unsloth: Finetune Llama 3.2, Mistral, Phi-3.5 & Gemma 2-5x faster with 80% less memory!
LitGPT: 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
AutoTrain: No code fine-tuning of LLMs and other machine learning tasks.

Testing and Monitoring (Observability)

LangSmith: LangSmith is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. Documentation | Github
Opik: Opik is an open-source platform for evaluating, testing and monitoring LLM applications
MLflow Tracing and Evaluation: MLflow has a suite of features for LLMs. MLflow LLM Documentation | Model Tracing | Model Evaluation | GitHub

Document Parsing

LangChain Document Loaders: LangChain has hundreds of integrations with various data sources to load data from: Slack, Notion, Google Drive, etc.
Embedchain: Create an AI app on your own data in a minute Documentation Github Repo
Docling by IBM: Parse documents and export them to the desired format with ease and speed. Github
Markitdown by Microsoft: Python tool for converting files and office documents to Markdown.

Web Parsing (HTML) and Crawlers

Gitingest: Turn any Git repository into a simple text ingest of its codebase. This is useful for feeding a codebase into any LLM. Github
Crawl4AI: Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Documentation | Github
GPT Crawler: Crawl a site to generate knowledge files to create your own custom GPT from a URL. Documentation | Github

Agents and Tools (Build Your Own)

LangChain Agents: Build agents with LangChain.
LangChain Tools: Integrate Tools (Function Calling) with LangChain.
smolagents: The simplest framework out there to build powerful agents Documentation | Github
Agentarium: open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. GitHub

Agents and Tools (Prebuilt)

Phidata: An open-source platform to build, ship and monitor agentic systems. Documentation | Github
Composio: Integration Platform for AI Agents & LLMs (works with LangChain, CrewAI, etc). Documentation | Github

LLM Memory

Mem0: Mem0 is a self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. Documentation | Github
Memary: Open Source Memory Layer For Autonomous Agents

LLMOps

MLflow:
Agenta: Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place. Documentation
LLMOps: Best practices designed to support your LLMOps initiatives
Helicone: Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. Documentation | Github

Code Sandbox (Security)

E2B: E2B is an open-source runtime for executing AI-generated code in secure cloud sandboxes. Made for agentic & AI use cases. Documentation | Github

Miscellaneous

AI Suite: Simple, unified interface to multiple Generative AI providers.
AdalFlow: The library to build & auto-optimize LLM applications, from Chatbot, RAG, to Agent by SylphAI.
dspy: DSPy: The framework for programming—not prompting—foundation models.
AutoPrompt: A framework for prompt tuning using Intent-based Prompt Calibration.
PromptFify: A library for prompt engineering that simplifies NLP tasks (e.g., NER, classification) using LLMs like GPT.
LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format.
Jupyter Agent: Let a LLM agent write and execute code inside a notebook
Jupyter AI: A generative AI extension for JupyterLab Documentation
Browser-Use: Make websites accessible for AI agents
AI Agent Service Toolkit: Full toolkit for running an AI agent service built with LangGraph, FastAPI and Streamlit App | GitHub

LLM Deployment (Cloud Services)

AWS Bedrock: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon
Microsoft Azure AI Services: Azure AI services help developers and organizations rapidly create intelligent, cutting-edge, market-ready, and responsible applications with out-of-the-box and prebuilt and customizable APIs and models.
Google Vertex AI: Vertex AI is a fully-managed, unified AI development platform for building and using generative AI.
NVIDIA NIM: NVIDIA NIM™, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.

Examples and Cookbooks

Building AI

LangChain Cookbook: Example code for building applications with LangChain, with an emphasis on more applied and end-to-end examples.
LangGraph Examples: Example code for building applications with LangGraph
Llama Index Examples: Example code for building applications with Llama Index
Streamlit LLM Examples: Streamlit LLM app examples for getting started

Deploying AI

Amazon Web Services (AWS)

Azure Generative AI Examples: Prompt Flow and RAG Examples for use with the Microsoft Azure Cloud platform
Amazon Bedrock Workshop: Introduces how to leverage foundation models (FMs) through Amazon Bedrock

Microsoft Azure

Microsoft Generative AI for Beginners 21 Lessons teaching everything you need to know to start building Generative AI applications Github
Microsoft Intro to Generative AI Course

Google Cloud Platform (GCP)

Google Vertex AI Examples: Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop and manage machine learning and generative AI workflows using Google Cloud Vertex AI
Google Generative AI Examples: Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI

NVIDIA

NVIDIA NIM Anywhere: An entry point for developing with NIMs that natively scales out to full-sized labs and up to production environments.
NVIDIA NIM Deploy: Reference implementations, example documents, and architecture guides that can be used as a starting point to deploy multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments.

Newsletters

Python AI/ML Tips: Free newsletter on Generative AI and Data Science.
unwind ai: Latest AI news, tools, and tutorials for AI Developers

Courses and Training

Free Training

Generative AI Data Scientist Workshops Get free training on how to build and deploy Generative AI / ML Solutions. Register for the next free workshop here.

Paid Courses

8-Week AI Bootcamp To Become A Generative AI-Data Scientist: Focused on helping you become a Generative AI Data Scientist. Learn How To Build and Deploy AI-Powered Data Science Solutions using LangChain, LangGraph, Pandas, Scikit Learn, Streamlit, AWS, Bedrock, and EC2.

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
README.md		README.md

business-science/awesome-generative-ai-data-scientist

Folders and files

Latest commit

History

Repository files navigation