Automated Generic EDA as LLM Feed

This project provides an automated exploratory data analysis (EDA) pipeline that profiles a structured dataset and reports the results in natural language. The pipeline uses Large Language Models (LLMs) to interpret and present the statistical summaries, visualizations, and key findings it produces.

Features

Data Summary: Automatically generates descriptive statistics for numerical, categorical, and mixed datasets.
Outlier Detection: Identifies potential outliers using statistical methods.
Data Cleaning Suggestions: Highlights missing values, duplicates, and inconsistencies, and suggests preprocessing steps.
Correlation Analysis: Computes correlation metrics and highlights significant relationships between variables.
Automated Visualizations: Generates relevant visualizations (e.g., histograms, scatter plots, heatmaps) to support the findings.
LLM Integration: Translates technical EDA results into human-readable summaries and business insights.
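
As a rough illustration of the outlier detection, data cleaning, and correlation features above, here is a minimal sketch using pandas. The function names (iqr_outliers, cleaning_report, strong_correlations), the 1.5 x IQR rule, and the 0.8 correlation threshold are assumptions chosen for the example; the project may use different statistical methods.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


def cleaning_report(df: pd.DataFrame) -> dict:
    """Count missing values per column and fully duplicated rows."""
    return {
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
    }


def strong_correlations(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """List numeric column pairs whose absolute Pearson correlation meets the threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:  # NaN comparisons are False, so NaN pairs are skipped
                pairs.append((a, b, round(float(r), 3)))
    return pairs
```

Each helper returns plain Python or pandas objects, so the results can be collected into a single dictionary before being handed to the LLM step.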

In this script:

The load_dataset function loads the dataset from the given path.
The perform_detailed_textual_eda function performs the EDA and extracts relevant data values into a dictionary.
The generate_vector_embeddings function generates vector embeddings for the extracted values using a pre-trained sentence-transformers model.
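
The three functions could look roughly like the sketch below. Only the function names come from this README; the signatures, the keys of the returned dictionary, and the all-MiniLM-L6-v2 model name are assumptions made for illustration. The embedding step requires the sentence-transformers package, which is imported lazily so the other two functions work without it.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a CSV or Excel file into a DataFrame, chosen by file extension."""
    path = Path(path)
    if path.suffix.lower() in {".xlsx", ".xls"}:
        return pd.read_excel(path)
    return pd.read_csv(path)


def perform_detailed_textual_eda(df):
    """Collect basic EDA values into a plain dictionary (assumed layout)."""
    numeric = df.select_dtypes(include="number")
    return {
        "n_rows": int(df.shape[0]),
        "n_columns": int(df.shape[1]),
        "column_types": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing_values": {col: int(n) for col, n in df.isna().sum().items()},
        # describe() raises on a frame with no columns, so guard that case
        "numeric_summary": numeric.describe().round(3).to_dict() if not numeric.empty else {},
    }


def generate_vector_embeddings(eda_values, model_name="all-MiniLM-L6-v2"):
    """Embed each 'key: value' pair with a pre-trained sentence-transformers model."""
    # Imported here because the package is large and only needed for this step.
    from sentence_transformers import SentenceTransformer

    sentences = [f"{key}: {value}" for key, value in eda_values.items()]
    model = SentenceTransformer(model_name)  # model name is an assumption
    return model.encode(sentences)
```

A typical call chain would be load_dataset, then perform_detailed_textual_eda on the resulting DataFrame, then generate_vector_embeddings on the returned dictionary.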

Architecture

Input: Upload a structured dataset (CSV, Excel, etc.).
EDA Processing:
- Data profiling
- Statistical computations
- Visualization generation
LLM Feed: Processed EDA results are converted into natural language summaries using an LLM.
Output: Detailed EDA report as text, images, or PDF.
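
The LLM Feed step can be pictured as turning the EDA results into a prompt and passing it to a model. The sketch below is one possible shape, not the project's actual implementation: build_eda_prompt, summarize_eda, and the prompt wording are hypothetical, and the llm argument stands for any callable that takes a prompt string and returns text (for example, a thin wrapper around your provider's client).

```python
import json


def build_eda_prompt(eda_results: dict, audience: str = "business stakeholders") -> str:
    """Serialize EDA results to JSON and wrap them in summarization instructions."""
    # default=str keeps non-JSON types (timestamps, NumPy scalars) from raising
    payload = json.dumps(eda_results, indent=2, default=str)
    return (
        "You are a data analyst. Summarize the exploratory data analysis "
        f"results below for {audience}. Mention data quality issues, "
        "notable outliers, and strong correlations.\n\n"
        f"EDA results (JSON):\n{payload}"
    )


def summarize_eda(eda_results: dict, llm) -> str:
    """Send the prompt to `llm`, a hypothetical callable mapping str -> str."""
    return llm(build_eda_prompt(eda_results))
```

Keeping the model behind a plain callable makes it easy to swap providers or to substitute a stub while testing the rest of the pipeline.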

Installation

Clone the repository:

git clone https://github.com/your-username/automated-generic-eda-llm-feed.git
cd automated-generic-eda-llm-feed

Install dependencies:
```
pip install -r requirements.txt
```
Set up your LLM API credentials (e.g., OpenAI API or other providers). Update config.yaml or environment variables with your API key.
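
If you choose the environment-variable route, the key can be read at startup as in the sketch below. The helper name get_api_key is hypothetical. OPENAI_API_KEY is the variable the openai library looks for by default; use whatever name your provider expects. The layout of config.yaml is not documented here, so it is not shown.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the pipeline."
        )
    return key
```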

Dependencies

Python 3.8+
Libraries:
    pandas
    numpy
    matplotlib
    seaborn
    scikit-learn
    openai (or another LLM library)
    sentence-transformers (used by generate_vector_embeddings)
