From ff7461c2e538ca028270216a0d7bdf4b43d3ba66 Mon Sep 17 00:00:00 2001 From: Yiheng Tao Date: Tue, 15 Jul 2025 08:40:51 -0700 Subject: [PATCH] feat: Add DataFrame web interface specification - Add comprehensive requirements document with 7 user stories covering DataFrame viewing, manipulation, and management - Include detailed technical design with REST API endpoints, web components, and integration architecture - Provide complete implementation plan with 19 tasks for building web interface - Supports DataFrame dashboard, interactive operations, file upload, and export functionality - Integrates with existing MCP server infrastructure and dataframe service tools Addresses need for web-based DataFrame management and analysis capabilities. --- .kiro/specs/dataframe-web-interface/design.md | 330 ++++++++++++++++++ .../dataframe-web-interface/requirements.md | 94 +++++ .kiro/specs/dataframe-web-interface/tasks.md | 169 +++++++++ 3 files changed, 593 insertions(+) create mode 100644 .kiro/specs/dataframe-web-interface/design.md create mode 100644 .kiro/specs/dataframe-web-interface/requirements.md create mode 100644 .kiro/specs/dataframe-web-interface/tasks.md diff --git a/.kiro/specs/dataframe-web-interface/design.md b/.kiro/specs/dataframe-web-interface/design.md new file mode 100644 index 0000000..c8aed83 --- /dev/null +++ b/.kiro/specs/dataframe-web-interface/design.md @@ -0,0 +1,330 @@ +# Design Document + +## Overview + +The DataFrame Web Interface will extend the existing MCP server with a comprehensive web-based dashboard for managing and manipulating DataFrames. The interface will integrate seamlessly with the current server architecture, providing both viewing and interactive manipulation capabilities through a modern web UI. + +The design leverages the existing dataframe service tool and dataframe manager infrastructure, adding a new web layer that exposes these capabilities through RESTful APIs and an intuitive user interface. 
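To make the integration concrete, the following sketch shows the core serialization step the new web layer performs: slicing a stored DataFrame into a single page and shaping it for the JSON responses described in this document. It assumes pandas; the function name `get_dataframe_page` and the exact field names are illustrative, not a finalized API.

```python
import pandas as pd


def get_dataframe_page(df: pd.DataFrame, page: int = 1, page_size: int = 50) -> dict:
    """Illustrative sketch: serialize one page of a stored DataFrame for the web UI."""
    page = max(1, page)
    page_size = max(1, min(page_size, 1000))  # cap page size to bound response size
    offset = (page - 1) * page_size
    chunk = df.iloc[offset:offset + page_size]  # materialize only the requested rows
    return {
        "data": chunk.to_dict(orient="records"),
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "total_rows": len(df),
        "page": page,
        "page_size": page_size,
        "has_more": offset + page_size < len(df),
    }
```

Calling `get_dataframe_page(df, page=2, page_size=50)` would return rows 50–99 plus a `has_more` flag for the UI's pagination controls.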
+ +## Architecture + +### High-Level Architecture + +```mermaid +graph TB + subgraph "Web Layer" + UI[DataFrame Web Interface] + API[DataFrame API Endpoints] + end + + subgraph "Existing MCP Server" + Server[Starlette Server] + Routes[API Routes] + Templates[Jinja2 Templates] + end + + subgraph "DataFrame Services" + DFTool[DataFrame Service Tool] + DFManager[DataFrame Manager] + Storage[DataFrame Storage] + end + + subgraph "Data Sources" + Files[Local Files] + URLs[Remote URLs] + Uploads[File Uploads] + end + + UI --> API + API --> Server + Server --> Routes + Routes --> DFTool + DFTool --> DFManager + DFManager --> Storage + + Files --> DFTool + URLs --> DFTool + Uploads --> API +``` + +### Component Integration + +The new DataFrame web interface will integrate with existing components: + +- **Server Integration**: New routes added to the existing Starlette server +- **Template System**: New templates using the existing Jinja2 setup and base.html +- **API Pattern**: Following the established API pattern in `server/api/` +- **Navigation**: Adding DataFrame link to the existing navigation bar + +## Components and Interfaces + +### 1. 
Web Interface Components + +#### DataFrame Dashboard (`dataframes.html`) +- **Purpose**: Main dashboard showing all stored DataFrames +- **Features**: + - Tabular list of DataFrames with metadata + - Storage statistics summary + - Quick action buttons (delete, refresh, cleanup) + - Visual indicators for expired DataFrames + +#### DataFrame Viewer (`dataframe_detail.html`) +- **Purpose**: Detailed view and manipulation interface for individual DataFrames +- **Features**: + - Data preview with pagination + - Interactive operation panel + - Pandas expression executor + - Export functionality + - Filtering and sorting controls + +#### Data Upload Modal +- **Purpose**: Interface for loading new data +- **Features**: + - File upload with drag-and-drop + - URL input for remote data + - Format selection and options + - Progress indication + +### 2. API Endpoints + +#### DataFrame Management API (`server/api/dataframes.py`) + +```python +# Core DataFrame operations +GET /api/dataframes # List all DataFrames +GET /api/dataframes/{df_id} # Get DataFrame details +DELETE /api/dataframes/{df_id} # Delete DataFrame +POST /api/dataframes/cleanup # Clean up expired DataFrames + +# Data loading +POST /api/dataframes/upload # Upload file and create DataFrame +POST /api/dataframes/load-url # Load data from URL + +# Data operations +GET /api/dataframes/{df_id}/data # Get DataFrame data with pagination +POST /api/dataframes/{df_id}/execute # Execute pandas expression +POST /api/dataframes/{df_id}/export # Export DataFrame + +# Statistics and metadata +GET /api/dataframes/stats # Get storage statistics +GET /api/dataframes/{df_id}/summary # Get DataFrame summary +``` + +#### Request/Response Schemas + +```python +# DataFrame List Response +{ + "dataframes": [ + { + "df_id": "string", + "created_at": "datetime", + "shape": [rows, cols], + "memory_usage": "bytes", + "expires_at": "datetime?", + "is_expired": "boolean", + "tags": {"key": "value"}, + "source": "string" + } + ], + 
"total_count": "number", + "storage_stats": { + "total_memory_mb": "number", + "total_dataframes": "number", + "expired_count": "number" + } +} + +# DataFrame Data Response +{ + "data": [{"col1": "val1", "col2": "val2"}], + "columns": ["col1", "col2"], + "dtypes": {"col1": "object", "col2": "int64"}, + "total_rows": "number", + "page": "number", + "page_size": "number", + "has_more": "boolean" +} + +# Execute Operation Response +{ + "success": "boolean", + "result": { + "data": "array|object", + "shape": [rows, cols], + "execution_time_ms": "number", + "result_type": "dataframe|series|scalar" + }, + "error": "string?" +} +``` + +### 3. Backend Services + +#### DataFrame Web Service (`utils/dataframe_web_service/`) +- **Purpose**: Business logic layer between API and DataFrame Manager +- **Responsibilities**: + - Data validation and sanitization + - Pagination logic + - Export format handling + - Error handling and logging + +#### File Upload Handler +- **Purpose**: Handle multipart file uploads +- **Features**: + - Temporary file management + - Format detection + - Size validation + - Progress tracking + +## Data Models + +### DataFrame Metadata Extension +Extend existing `DataFrameMetadata` with web-specific fields: + +```python +class WebDataFrameMetadata(DataFrameMetadata): + """Extended metadata for web interface.""" + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.last_accessed: Optional[datetime] = None + self.access_count: int = 0 + self.preview_cached: bool = False + + @property + def display_name(self) -> str: + """User-friendly display name.""" + return self.tags.get('display_name', self.df_id) + + @property + def source_display(self) -> str: + """Formatted source information.""" + source = self.tags.get('source', 'Unknown') + if source.startswith('http'): + return f"URL: {source[:50]}..." 
+        return f"File: {os.path.basename(source)}"
+```
+
+### Pagination Model
+```python
+class PaginationParams:
+    """Parameters for data pagination."""
+
+    def __init__(self, page: int = 1, page_size: int = 50):
+        self.page = max(1, page)
+        self.page_size = min(max(5, page_size), 1000)  # Clamp to 5-1000; requirement 2.5 permits previews as small as 5 rows
+
+    @property
+    def offset(self) -> int:
+        return (self.page - 1) * self.page_size
+```
+
+## Error Handling
+
+### API Error Responses
+Standardized error response format:
+
+```python
+{
+    "success": false,
+    "error": {
+        "code": "DATAFRAME_NOT_FOUND",
+        "message": "DataFrame with ID 'abc123' not found or expired",
+        "details": {
+            "df_id": "abc123",
+            "suggestion": "Check if the DataFrame has expired or been deleted"
+        }
+    }
+}
+```
+
+### Error Categories
+- **DATAFRAME_NOT_FOUND**: DataFrame doesn't exist or has expired
+- **INVALID_OPERATION**: Unsupported pandas operation
+- **SYNTAX_ERROR**: Invalid pandas expression syntax
+- **MEMORY_LIMIT**: Operation would exceed memory limits
+- **UPLOAD_ERROR**: File upload or processing failed
+- **EXPORT_ERROR**: Data export failed
+
+### Client-Side Error Handling
+- Toast notifications for user feedback
+- Graceful degradation for failed operations
+- Retry mechanisms for transient failures
+- Clear error messages with actionable suggestions
+
+## Testing Strategy
+
+### Unit Tests
+- **API Endpoints**: Test all CRUD operations and edge cases
+- **DataFrame Operations**: Test pandas expression execution and validation
+- **File Upload**: Test various file formats and error conditions
+- **Pagination**: Test boundary conditions and performance
+
+### Integration Tests
+- **End-to-End Workflows**: Complete user journeys from upload to export
+- **DataFrame Manager Integration**: Test interaction with existing services
+- **Memory Management**: Test with large DataFrames and cleanup operations
+
+### Performance Tests
+- **Large DataFrame Handling**: Test with DataFrames of various sizes
+- **Concurrent Access**: Multiple 
users accessing same DataFrames +- **Memory Usage**: Monitor memory consumption during operations +- **Response Times**: Ensure acceptable performance for common operations + +### Browser Tests +- **Cross-Browser Compatibility**: Test on major browsers +- **Responsive Design**: Test on different screen sizes +- **JavaScript Functionality**: Test interactive features +- **File Upload**: Test drag-and-drop and file selection + +## Security Considerations + +### Input Validation +- **Pandas Expression Sanitization**: Prevent code injection in expressions +- **File Upload Validation**: Restrict file types and sizes +- **Parameter Validation**: Validate all API parameters + +### Access Control +- **DataFrame Isolation**: Ensure users can only access their DataFrames +- **Operation Restrictions**: Limit dangerous pandas operations +- **Resource Limits**: Prevent excessive memory or CPU usage + +### Data Protection +- **Temporary File Cleanup**: Ensure uploaded files are cleaned up +- **Memory Management**: Prevent memory leaks from large DataFrames +- **Error Information**: Avoid exposing sensitive system information + +## Performance Optimization + +### Caching Strategy +- **Preview Data Caching**: Cache first few rows for quick display +- **Metadata Caching**: Cache DataFrame metadata to reduce lookups +- **Operation Result Caching**: Cache results of expensive operations + +### Lazy Loading +- **Data Pagination**: Load data on-demand with pagination +- **Column Information**: Load column details only when needed +- **Large Result Handling**: Stream large results instead of loading in memory + +### Resource Management +- **Memory Monitoring**: Track memory usage and implement limits +- **Background Cleanup**: Periodic cleanup of expired DataFrames +- **Connection Pooling**: Efficient database connections for metadata storage + +## Deployment Considerations + +### Static Assets +- **CSS/JS Bundling**: Optimize frontend assets for production +- **CDN Integration**: Use 
CDN for common libraries +- **Asset Versioning**: Cache busting for updated assets + +### Configuration +- **Environment Variables**: Configurable limits and settings +- **Feature Flags**: Toggle features for different environments +- **Monitoring**: Health checks and performance metrics + +### Scalability +- **Horizontal Scaling**: Design for multiple server instances +- **Database Optimization**: Efficient queries for metadata operations +- **Load Balancing**: Handle multiple concurrent users diff --git a/.kiro/specs/dataframe-web-interface/requirements.md b/.kiro/specs/dataframe-web-interface/requirements.md new file mode 100644 index 0000000..ef67c04 --- /dev/null +++ b/.kiro/specs/dataframe-web-interface/requirements.md @@ -0,0 +1,94 @@ +# Requirements Document + +## Introduction + +This feature will enhance the existing dataframe service module by adding a comprehensive web interface that provides visibility into stored DataFrames and enables interactive manipulation through a user-friendly dashboard. The web interface will integrate with the existing MCP server infrastructure and provide both viewing and manipulation capabilities for DataFrames managed by the dataframe service tool. + +## Requirements + +### Requirement 1 + +**User Story:** As a data analyst, I want to view all stored DataFrames in a centralized dashboard, so that I can quickly see what data is available and understand its characteristics. + +#### Acceptance Criteria + +1. WHEN I navigate to the DataFrames page THEN the system SHALL display a list of all stored DataFrames with their metadata +2. WHEN viewing the DataFrame list THEN the system SHALL show DataFrame ID, creation date, shape, memory usage, and expiration status for each DataFrame +3. WHEN a DataFrame is expired THEN the system SHALL visually indicate its expired status with appropriate styling +4. 
WHEN I click on a DataFrame entry THEN the system SHALL display detailed metadata including column types, tags, and source information
+5. IF no DataFrames are stored THEN the system SHALL display a helpful message with instructions on how to load data
+
+### Requirement 2
+
+**User Story:** As a data analyst, I want to preview DataFrame contents directly in the web interface, so that I can understand the data structure and content without using command-line tools.
+
+#### Acceptance Criteria
+
+1. WHEN I select a DataFrame THEN the system SHALL display a preview of the first 10 rows in a formatted table
+2. WHEN viewing a DataFrame preview THEN the system SHALL show column names and data types, and SHALL render each data type appropriately
+3. WHEN the DataFrame is large THEN the system SHALL provide pagination controls to navigate through the data
+4. WHEN displaying data THEN the system SHALL handle null values, long text, and special characters properly
+5. WHEN I request a different number of preview rows THEN the system SHALL allow me to specify between 5 and 100 rows
+
+### Requirement 3
+
+**User Story:** As a data analyst, I want to execute pandas operations through a web interface, so that I can manipulate and analyze data without switching to a command-line environment.
+
+#### Acceptance Criteria
+
+1. WHEN I select a DataFrame THEN the system SHALL provide a text input for pandas expressions
+2. WHEN I enter a valid pandas expression THEN the system SHALL execute it and display the results
+3. WHEN the operation returns a DataFrame THEN the system SHALL display it in a formatted table with pagination
+4. WHEN the operation returns scalar values THEN the system SHALL display them in an appropriate format
+5. WHEN I enter an invalid expression THEN the system SHALL display clear error messages with syntax guidance
+6. 
WHEN executing operations THEN the system SHALL show execution time and result metadata + +### Requirement 4 + +**User Story:** As a data analyst, I want to perform common DataFrame operations through interactive controls, so that I can analyze data without writing pandas code. + +#### Acceptance Criteria + +1. WHEN viewing a DataFrame THEN the system SHALL provide buttons for common operations (head, tail, describe, info) +2. WHEN I click describe THEN the system SHALL display statistical summaries in a formatted table +3. WHEN I click info THEN the system SHALL show DataFrame information including memory usage and column details +4. WHEN I use filtering controls THEN the system SHALL allow me to filter by column values using dropdown menus and input fields +5. WHEN I apply filters THEN the system SHALL update the display to show only matching rows +6. WHEN I use sorting controls THEN the system SHALL allow me to sort by any column in ascending or descending order + +### Requirement 5 + +**User Story:** As a data analyst, I want to manage stored DataFrames through the web interface, so that I can organize and clean up my data workspace efficiently. + +#### Acceptance Criteria + +1. WHEN viewing the DataFrame list THEN the system SHALL provide delete buttons for each DataFrame +2. WHEN I click delete THEN the system SHALL ask for confirmation before removing the DataFrame +3. WHEN I confirm deletion THEN the system SHALL remove the DataFrame and update the list display +4. WHEN I want to load new data THEN the system SHALL provide a form to upload files or specify URLs +5. WHEN uploading data THEN the system SHALL support CSV, JSON, Excel, and Parquet formats +6. WHEN data loading completes THEN the system SHALL automatically refresh the DataFrame list and show the new entry + +### Requirement 6 + +**User Story:** As a data analyst, I want to export DataFrame results and visualizations, so that I can share insights and use data in other tools. 
+ +#### Acceptance Criteria + +1. WHEN viewing DataFrame results THEN the system SHALL provide export options for CSV and JSON formats +2. WHEN I click export THEN the system SHALL generate and download the file with appropriate formatting +3. WHEN viewing statistical results THEN the system SHALL provide options to export summary tables +4. WHEN displaying large results THEN the system SHALL allow exporting either the full dataset or current view +5. WHEN exporting data THEN the system SHALL preserve data types and handle special characters correctly + +### Requirement 7 + +**User Story:** As a system administrator, I want to monitor DataFrame storage usage and performance, so that I can ensure optimal system performance and resource management. + +#### Acceptance Criteria + +1. WHEN accessing the DataFrames page THEN the system SHALL display overall storage statistics +2. WHEN viewing storage stats THEN the system SHALL show total memory usage, DataFrame count, and available space +3. WHEN DataFrames are approaching expiration THEN the system SHALL highlight them with warning indicators +4. WHEN I request cleanup THEN the system SHALL provide a button to remove all expired DataFrames +5. WHEN cleanup completes THEN the system SHALL show a summary of removed DataFrames and freed memory diff --git a/.kiro/specs/dataframe-web-interface/tasks.md b/.kiro/specs/dataframe-web-interface/tasks.md new file mode 100644 index 0000000..35c05ee --- /dev/null +++ b/.kiro/specs/dataframe-web-interface/tasks.md @@ -0,0 +1,169 @@ +# Implementation Plan + +- [ ] 1. 
Create DataFrame API endpoints and core backend services + - Implement RESTful API endpoints for DataFrame management operations + - Create business logic layer for data validation and processing + - Set up error handling and response formatting + - _Requirements: 1.1, 1.2, 1.3, 1.4, 5.1, 5.2, 5.3, 7.1, 7.2, 7.4_ + +- [ ] 1.1 Implement DataFrame API module structure + - Create `server/api/dataframes.py` with all endpoint function stubs + - Add DataFrame routes to `server/api/__init__.py` + - Create request/response models and validation schemas + - _Requirements: 1.1, 5.1_ + +- [ ] 1.2 Implement DataFrame listing and metadata endpoints + - Code GET `/api/dataframes` endpoint to list all stored DataFrames + - Code GET `/api/dataframes/{df_id}` endpoint for individual DataFrame details + - Code GET `/api/dataframes/stats` endpoint for storage statistics + - Implement pagination and filtering logic for DataFrame lists + - _Requirements: 1.1, 1.2, 1.3, 7.1, 7.2_ + +- [ ] 1.3 Implement DataFrame data retrieval endpoints + - Code GET `/api/dataframes/{df_id}/data` endpoint with pagination support + - Code GET `/api/dataframes/{df_id}/summary` endpoint for DataFrame summaries + - Implement data serialization and formatting for web display + - Add support for column filtering and row limiting + - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5_ + +- [ ] 1.4 Implement DataFrame operation execution endpoints + - Code POST `/api/dataframes/{df_id}/execute` endpoint for pandas expressions + - Implement expression validation and sanitization + - Add execution time tracking and result formatting + - Create error handling for invalid expressions and operations + - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6_ + +- [ ] 2. 
Create DataFrame management and file upload functionality + - Implement file upload handling with multiple format support + - Create DataFrame deletion and cleanup operations + - Add data loading from URLs and local files + - _Requirements: 5.4, 5.5, 5.6, 7.4, 7.5_ + +- [ ] 2.1 Implement file upload and data loading endpoints + - Code POST `/api/dataframes/upload` endpoint for file uploads + - Code POST `/api/dataframes/load-url` endpoint for URL-based data loading + - Implement multipart file handling and temporary file management + - Add support for CSV, JSON, Excel, and Parquet formats with options + - _Requirements: 5.4, 5.5, 5.6_ + +- [ ] 2.2 Implement DataFrame deletion and cleanup endpoints + - Code DELETE `/api/dataframes/{df_id}` endpoint for individual DataFrame deletion + - Code POST `/api/dataframes/cleanup` endpoint for expired DataFrame cleanup + - Implement confirmation mechanisms and batch operations + - Add cleanup statistics and reporting + - _Requirements: 5.1, 5.2, 5.3, 7.4, 7.5_ + +- [ ] 2.3 Implement data export functionality + - Code POST `/api/dataframes/{df_id}/export` endpoint for data export + - Add support for CSV and JSON export formats + - Implement streaming for large dataset exports + - Create export progress tracking and file download handling + - _Requirements: 6.1, 6.2, 6.3, 6.4, 6.5_ + +- [ ] 3. 
Create web interface templates and frontend components + - Build main DataFrame dashboard template + - Create detailed DataFrame viewer template + - Implement interactive controls and modals + - _Requirements: 1.1, 1.2, 1.4, 2.1, 2.2, 2.3, 2.4, 2.5_ + +- [ ] 3.1 Create DataFrame dashboard template + - Create `server/templates/dataframes.html` extending base template + - Implement DataFrame list table with sortable columns + - Add storage statistics display and visual indicators for expired DataFrames + - Create action buttons for refresh, cleanup, and new data loading + - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 7.1, 7.2, 7.3_ + +- [ ] 3.2 Create DataFrame detail viewer template + - Create `server/templates/dataframe_detail.html` for individual DataFrame view + - Implement data preview table with pagination controls + - Add pandas expression input form with syntax highlighting + - Create operation result display area with formatting + - _Requirements: 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6_ + +- [ ] 3.3 Implement interactive operation controls + - Create common operation buttons (head, tail, describe, info) + - Implement filtering controls with column-based filters + - Add sorting controls for data display + - Create export buttons with format selection + - _Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 6.1, 6.2_ + +- [ ] 4. 
Add JavaScript functionality for dynamic interactions + - Implement AJAX calls for all API endpoints + - Create dynamic table updates and pagination + - Add file upload with progress indication + - _Requirements: 2.3, 3.6, 4.4, 4.5, 5.4, 5.5_ + +- [ ] 4.1 Implement core JavaScript API client + - Create `dataframes.js` with functions for all API endpoints + - Implement error handling and user feedback mechanisms + - Add loading states and progress indicators + - Create utility functions for data formatting and display + - _Requirements: 3.6, 5.3, 7.5_ + +- [ ] 4.2 Implement dynamic table functionality + - Create JavaScript for DataFrame list table updates + - Implement client-side sorting and filtering + - Add pagination controls with AJAX loading + - Create auto-refresh functionality for real-time updates + - _Requirements: 1.4, 2.3, 4.5, 4.6_ + +- [ ] 4.3 Implement pandas expression executor interface + - Create JavaScript for expression input and execution + - Add syntax highlighting and validation feedback + - Implement result display with proper formatting + - Create execution history and common expression shortcuts + - _Requirements: 3.1, 3.2, 3.3, 3.4, 3.5, 3.6_ + +- [ ] 4.4 Implement file upload and data loading interface + - Create drag-and-drop file upload functionality + - Implement upload progress tracking and cancellation + - Add URL input form with validation + - Create format selection and options configuration + - _Requirements: 5.4, 5.5, 5.6_ + +- [ ] 5. 
Integrate DataFrame interface with existing server infrastructure + - Add DataFrame navigation link to existing navbar + - Create DataFrame page route handler + - Integrate with existing template and styling system + - _Requirements: 1.1, 1.2_ + +- [ ] 5.1 Add DataFrame routes to main server + - Add DataFrame page route to `server/main.py` + - Create route handler function for DataFrame dashboard + - Add DataFrame detail page route with ID parameter + - Update navigation template to include DataFrame link + - _Requirements: 1.1, 1.2_ + +- [ ] 5.2 Integrate DataFrame API with existing API structure + - Import DataFrame API routes in `server/api/__init__.py` + - Add DataFrame endpoints to the main API routes list + - Ensure consistent error handling with existing APIs + - Test integration with existing server middleware + - _Requirements: 1.1, 5.1, 5.2, 5.3_ + +- [ ] 6. Create comprehensive test suite for DataFrame web interface + - Write unit tests for all API endpoints + - Create integration tests for complete workflows + - Add performance tests for large DataFrame handling + - _Requirements: All requirements for validation_ + +- [ ] 6.1 Implement API endpoint unit tests + - Create `test_dataframes_api.py` with tests for all endpoints + - Test CRUD operations, error handling, and edge cases + - Add tests for file upload and data loading functionality + - Create tests for pandas expression execution and validation + - _Requirements: 1.1, 1.2, 1.3, 2.1, 3.1, 5.1, 5.4_ + +- [ ] 6.2 Implement integration tests for complete workflows + - Create end-to-end tests for data upload to visualization workflow + - Test DataFrame lifecycle from creation to deletion + - Add tests for concurrent access and data consistency + - Create tests for export functionality and file generation + - _Requirements: 2.1, 2.2, 2.3, 5.5, 5.6, 6.1, 6.2, 6.3_ + +- [ ] 6.3 Implement performance and load tests + - Create tests for large DataFrame handling and memory usage + - Add tests for pagination 
performance with large datasets + - Test concurrent user access and operation execution + - Create tests for cleanup operations and resource management + - _Requirements: 2.3, 7.1, 7.2, 7.4, 7.5_
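For reference when implementing task 1.4 (expression validation and sanitization) — the security-critical piece of this plan — a minimal allowlist-based validator is sketched below. It is illustrative only: `validate_expression`, the allowlist contents, and the error-code strings are assumptions, not decisions made by this spec.

```python
import ast

# Hypothetical allowlist of DataFrame attributes/methods the web executor permits.
ALLOWED_ATTRS = {
    "head", "tail", "describe", "info", "shape", "columns", "dtypes",
    "sort_values", "groupby", "mean", "sum", "value_counts",
}
BLOCKED_NODES = (ast.Lambda, ast.Await, ast.NamedExpr)


def validate_expression(expr: str) -> tuple[bool, str]:
    """Return (ok, reason) for an expression evaluated against a DataFrame named `df`."""
    try:
        # mode="eval" already rejects statements such as imports and assignments.
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        return False, f"SYNTAX_ERROR: {exc.msg}"
    for node in ast.walk(tree):
        if isinstance(node, BLOCKED_NODES):
            return False, "INVALID_OPERATION: construct not allowed"
        if isinstance(node, ast.Attribute) and (
            node.attr.startswith("_") or node.attr not in ALLOWED_ATTRS
        ):
            # Blocks dunder escapes (e.g. __class__) and non-allowlisted methods.
            return False, f"INVALID_OPERATION: attribute '{node.attr}' not allowed"
        if isinstance(node, ast.Name) and node.id != "df":
            return False, f"INVALID_OPERATION: unknown name '{node.id}'"
    return True, "ok"
```

A production implementation would pair AST filtering like this with resource limits (execution timeouts, memory caps) rather than relying on the allowlist alone.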