-
Notifications
You must be signed in to change notification settings - Fork 0
feat: #1 - Jsonl Support #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| """ | ||
| Configuration constants for the application. | ||
|
|
||
| This module contains reusable constants used across the application, | ||
| particularly for file processing and data transformation operations. | ||
| """ | ||
|
|
||
| # Delimiter used when flattening nested JSON objects into flat column names | ||
| # Example: {"user": {"name": "John"}} becomes {"user__name": "John"} | ||
| NESTED_FIELD_DELIMITER = "__" | ||
|
|
||
| # Delimiter used when creating column names for array indices | ||
| # Example: {"tags": ["python", "api"]} becomes {"tags_0": "python", "tags_1": "api"} | ||
| ARRAY_INDEX_DELIMITER = "_" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,12 +3,13 @@ | |
| import sqlite3 | ||
| import io | ||
| import re | ||
| from typing import Dict, Any, List | ||
| from typing import Dict, Any, List, Set | ||
| from .sql_security import ( | ||
| execute_query_safely, | ||
| validate_identifier, | ||
| SQLSecurityError | ||
| ) | ||
| from .constants import NESTED_FIELD_DELIMITER, ARRAY_INDEX_DELIMITER | ||
|
|
||
| def sanitize_table_name(table_name: str) -> str: | ||
| """ | ||
|
|
@@ -171,4 +172,183 @@ def convert_json_to_sqlite(json_content: bytes, table_name: str) -> Dict[str, An | |
| } | ||
|
|
||
| except Exception as e: | ||
| raise Exception(f"Error converting JSON to SQLite: {str(e)}") | ||
| raise Exception(f"Error converting JSON to SQLite: {str(e)}") | ||
|
|
||
def flatten_json_record(obj: Any, parent_key: str = "") -> Dict[str, Any]:
    """
    Recursively flatten a nested JSON object into a flat dictionary.

    - Nested dictionaries are flattened using NESTED_FIELD_DELIMITER
      (e.g., {"user": {"name": "John"}} -> {"user__name": "John"})
    - Nested lists are flattened using ARRAY_INDEX_DELIMITER with index
      notation (e.g., {"tags": ["a", "b"]} -> {"tags_0": "a", "tags_1": "b"})
    - Primitive values (strings, numbers, booleans, None) are kept as-is

    Known limitation: keys are merged with dict.update(), so a literal field
    name that already embeds a delimiter (e.g. "user__name" next to
    {"user": {"name": ...}}, or "items_0" next to {"items": [...]}) collides
    with the generated key and the later value silently wins.

    Args:
        obj: The object to flatten (dict, list, or primitive value)
        parent_key: The parent key path (used for recursion)

    Returns:
        A flat dictionary with concatenated keys
    """
    items: Dict[str, Any] = {}

    if isinstance(obj, dict):
        # Recurse into nested dictionaries, joining key paths with the delimiter
        for key, value in obj.items():
            new_key = f"{parent_key}{NESTED_FIELD_DELIMITER}{key}" if parent_key else key
            items.update(flatten_json_record(value, new_key))

    elif isinstance(obj, list):
        # Recurse into lists, suffixing the element index onto the key
        for idx, item in enumerate(obj):
            new_key = f"{parent_key}{ARRAY_INDEX_DELIMITER}{idx}"
            items.update(flatten_json_record(item, new_key))

    else:
        # Base case: primitive value (string, number, boolean, None).
        # Bug fix: a top-level primitive record (no parent key) previously
        # produced an empty-string column name; fall back to "value" instead.
        items[parent_key if parent_key else "value"] = obj

    return items
|
|
||
def discover_jsonl_schema(jsonl_content: bytes) -> Set[str]:
    """
    Scan the entire JSONL payload and collect every flattened field name.

    JSONL records may differ in shape from line to line (schema evolution),
    so a full pass over the file is required before a unified column set
    can be established.

    Args:
        jsonl_content: The raw JSONL file content as bytes

    Returns:
        A set of all unique flattened field names found across all records

    Raises:
        ValueError: If no valid JSON records are found or if parsing fails
    """
    discovered: Set[str] = set()
    record_count = 0

    for line_num, raw_line in enumerate(jsonl_content.decode('utf-8').strip().split('\n'), 1):
        stripped = raw_line.strip()
        if not stripped:
            # Blank lines are tolerated and skipped
            continue

        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON on line {line_num}: {str(e)}")

        discovered.update(flatten_json_record(parsed).keys())
        record_count += 1

    if record_count == 0:
        raise ValueError("No valid JSON records found in JSONL file")

    return discovered
|
|
||
def convert_jsonl_to_sqlite(jsonl_content: bytes, table_name: str) -> Dict[str, Any]:
    """
    Convert JSONL (JSON Lines) file content to a SQLite table.

    JSONL files contain one JSON object per line. This function:
    1. Discovers all possible fields across all records (handles schema evolution)
    2. Flattens nested structures using configurable delimiters
    3. Creates a pandas DataFrame with all discovered columns
    4. Writes to the SQLite database

    Args:
        jsonl_content: The raw JSONL file content as bytes
        table_name: The desired name for the SQLite table

    Returns:
        Dictionary containing:
            - table_name: The sanitized table name
            - schema: Dictionary mapping column names to SQLite types
            - row_count: Number of rows inserted
            - sample_data: List of sample records (up to 5)

    Raises:
        Exception: If parsing or database operations fail
    """
    conn = None
    try:
        # Sanitize the table name before it is interpolated into any SQL
        table_name = sanitize_table_name(table_name)

        # First pass: discover the union of fields across all records.
        # This also validates every line, so json.loads in the second
        # pass below cannot raise JSONDecodeError.
        all_fields = discover_jsonl_schema(jsonl_content)

        # Second pass: parse and flatten all records
        records = []
        for line in jsonl_content.decode('utf-8').strip().split('\n'):
            line = line.strip()
            if not line:
                continue

            flattened = flatten_json_record(json.loads(line))
            # Ensure all discovered fields are present (fill missing with None)
            records.append({field: flattened.get(field) for field in all_fields})

        if not records:
            raise ValueError("No valid records found in JSONL file")

        df = pd.DataFrame(records)

        # Clean column names (lowercase, replace spaces/dashes with underscores)
        df.columns = [col.lower().replace(' ', '_').replace('-', '_') for col in df.columns]

        conn = sqlite3.connect("db/database.db")

        # Write DataFrame to SQLite, replacing any existing table
        df.to_sql(table_name, conn, if_exists='replace', index=False)

        # Get schema information using safe query execution
        cursor_info = execute_query_safely(
            conn,
            "PRAGMA table_info({table})",
            identifier_params={'table': table_name}
        )
        columns_info = cursor_info.fetchall()
        schema = {col[1]: col[2] for col in columns_info}  # column_name: data_type

        # Get sample data (up to 5 rows) using safe query execution
        cursor_sample = execute_query_safely(
            conn,
            "SELECT * FROM {table} LIMIT 5",
            identifier_params={'table': table_name}
        )
        sample_rows = cursor_sample.fetchall()
        column_names = [col[1] for col in columns_info]
        sample_data = [dict(zip(column_names, row)) for row in sample_rows]

        # Get row count using safe query execution
        cursor_count = execute_query_safely(
            conn,
            "SELECT COUNT(*) FROM {table}",
            identifier_params={'table': table_name}
        )
        row_count = cursor_count.fetchone()[0]

        return {
            'table_name': table_name,
            'schema': schema,
            'row_count': row_count,
            'sample_data': sample_data
        }

    except Exception as e:
        # Bug fix: chain the original exception (`from e`) so the real
        # cause is preserved in the traceback instead of being discarded.
        raise Exception(f"Error converting JSONL to SQLite: {str(e)}") from e
    finally:
        # Bug fix: the original leaked the connection whenever a step after
        # sqlite3.connect() raised; always close it on both paths.
        if conn is not None:
            conn.close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Delimiter collision causes silent data loss during flattening
The `flatten_json_record` function uses `items.update()` to merge flattened results, which silently overwrites values when key collisions occur. If a JSON record contains a field name that already includes the delimiter (like `"user__name"`) alongside a nested structure that flattens to the same key (like `{"user": {"name": "value"}}`), the later value overwrites the earlier one without warning. Similarly, fields like `"items_0"` will collide with `{"items": ["value"]}`. This can cause silent data loss when processing JSONL files with field names containing `__` or `_N` patterns.