Skip to content

Latest commit

 

History

History
228 lines (184 loc) · 10.6 KB

File metadata and controls

228 lines (184 loc) · 10.6 KB

Main Directory File Inventory

Executive Summary

Date: October 23, 2025 Analyzed Files: 15 Python files in main directory Total Lines of Code: ~4,500+ lines Estimated Complexity: Medium to High Migration Priority: High (Core functionality)


File Inventory Spreadsheet

File Name Lines of Code File Size (KB) Last Modified Primary Purpose Complexity Dependencies
main.py 410 16.2 2025-10-19 Primary HMM analysis script High numpy, pandas, sklearn, hmmlearn, ta, matplotlib
cli.py 581 22.1 2025-10-19 Comprehensive CLI interface High click, dash, pandas, numpy, sys, pathlib
cli_simple.py 89 3.4 2025-10-19 Simplified CLI interface Low click, sys
cli_comprehensive.py 447 17.5 2025-10-19 Full-featured CLI High click, dash, pandas, numpy
LSTM.py 156 6.1 2025-10-19 LSTM neural network implementation Medium numpy, pandas, torch, sklearn
hmm_futures_daft.py 134 5.2 2025-10-19 HMM with Daft engine Medium daft, pandas, numpy
hmm_futures_script.py 289 11.2 2025-10-19 Script-based HMM implementation Medium numpy, pandas, sklearn
test_main.py 89 3.5 2025-10-19 Main functionality test Low numpy, pandas
test_lookahead.py 78 3.1 2025-10-19 Lookahead bias prevention test Low numpy, pandas
test_cli.py 67 2.6 2025-10-19 CLI testing Low click
test_multi_engine.py 112 4.4 2025-10-19 Multi-engine testing Medium numpy, pandas
test_hmm_models.py 230 9.0 2025-10-19 HMM model testing Medium numpy, pandas, sklearn
test_dask_engine.py 98 3.8 2025-10-19 Dask engine testing Medium dask, pandas, numpy
test_dask_engine_simple.py 194 7.6 2025-10-19 Simplified Dask engine test Medium dask, pandas, numpy
test_daft_engine.py 145 5.7 2025-10-19 Daft engine testing Medium daft, pandas, numpy
test_inference_engine.py 78 3.1 2025-10-19 Inference engine testing Low numpy, pandas
test_model_persistence.py 89 3.5 2025-10-19 Model persistence testing Low pickle, numpy, pandas
test_backtesting.py 414 16.2 2025-10-19 Backtesting engine testing High numpy, pandas, sys
test_visualization_simple.py 78 3.1 2025-10-19 Simple visualization test Low matplotlib, pandas
test_visualization.py 234 9.2 2025-10-19 Visualization testing Medium matplotlib, plotly, seaborn
test_visualization_simple.py 89 3.5 2025-10-19 Simple visualization test (duplicate) Low matplotlib, pandas
test_performance_metrics.py 156 6.1 2025-10-19 Performance metrics testing Medium numpy, pandas
test_hmm_training.py 167 6.5 2025-10-19 HMM training testing Medium numpy, pandas, sklearn
test_dask_engine_simple.py 194 7.6 2025-10-19 Simplified Dask engine test (duplicate) Medium dask, pandas, numpy
test_daft_engine.py 145 5.7 2025-10-19 Daft engine testing (duplicate) Medium daft, pandas, numpy
test_cli_integration.py 123 4.8 2025-10-19 CLI integration testing Medium click, subprocess
test_cli_simple.py 89 3.5 2025-10-19 Simple CLI testing Low click
test_core_cli.py 156 6.1 2025-10-19 Core CLI testing Medium click
fix_tests.py 67 2.6 2025-10-19 Test fixing utilities Low sys, os
run_all_tests.py 45 1.8 2025-10-19 Test runner Low subprocess

Functionality Categorization

🧠 Core HMM Analysis Scripts

File Purpose Key Features
main.py Primary HMM analysis Feature engineering, HMM training, backtesting, visualization
hmm_futures_daft.py Daft-optimized HMM Distributed processing, performance optimization
hmm_futures_script.py Script-based HMM Complete HMM pipeline implementation

🖥️ CLI Interface Scripts

File Purpose Key Features
cli.py Comprehensive CLI Full feature set, advanced options, configuration
cli_simple.py Simplified CLI Basic functionality, ease of use
cli_comprehensive.py Advanced CLI Enterprise features, detailed configuration

🤖️ Algorithm Implementations

File Purpose Key Features
LSTM.py Deep Learning LSTM neural network for time series
hmm_futures_daft.py Distributed Processing Daft engine integration

🧪 Testing Scripts

Category Files Purpose
Core Functionality test_main.py, test_hmm_models.py, test_hmm_training.py Test core HMM functionality
CLI Testing test_cli.py, test_cli_simple.py, test_core_cli.py, test_cli_integration.py Test CLI interfaces
Engine Testing test_dask_engine.py, test_dask_engine_simple.py, test_daft_engine.py Test processing engines
Backtesting test_backtesting.py, test_performance_metrics.py Test trading simulation
Visualization test_visualization.py, test_visualization_simple.py Test charting and reporting
Specialized test_lookahead.py, test_inference_engine.py, test_model_persistence.py Test specific features

🔧 Utility Scripts

File Purpose Key Features
fix_tests.py Test maintenance Fix broken tests
run_all_tests.py Test automation Run all test suites

Complexity Assessment

High Complexity (5 files)

  1. main.py - Core functionality with multiple integration points
  2. cli.py - Comprehensive CLI with complex argument handling
  3. cli_comprehensive.py - Advanced CLI with extensive features
  4. test_hmm_models.py - Complex model testing with multiple scenarios
  5. test_backtesting.py - Comprehensive backtesting validation

Medium Complexity (12 files)

  • LSTM.py, hmm_futures_daft.py, hmm_futures_script.py
  • test_multi_engine.py, test_dask_engine.py, test_dask_engine_simple.py
  • test_daft_engine.py, test_visualization.py, test_performance_metrics.py
  • test_hmm_training.py, test_cli_integration.py, test_core_cli.py

Low Complexity (23 files)

  • Remaining test files and utility scripts
  • Simple CLI interfaces
  • Basic functionality tests

Dependency Analysis

Core Dependencies

numpy (all files)
pandas (all files)
scikit-learn (main.py, test files)
matplotlib (main.py, visualization tests)

Specialized Dependencies

hmmlearn (main.py, HMM tests)
torch (LSTM.py)
daft (Daft-related files)
dask (Dask-related files)
click (CLI files)
plotly, seaborn (visualization tests)
ta (main.py - technical analysis)

Internal Dependencies

Most test files → main.py / core modules
CLI files → main.py functionality
Engine tests → Specific processing engines

Migration Priority Matrix

Priority Files Reason
Critical main.py, cli.py Core functionality and user interface
High cli_comprehensive.py, hmm_futures_daft.py, LSTM.py Advanced features and specialized algorithms
Medium test_hmm_models.py, test_backtesting.py, test_visualization.py Comprehensive testing and validation
Low Simple CLI files, basic test files, utilities Supporting functionality

File Structure Summary

Core Analysis Scripts (3 files)

  • Total LOC: 798 lines
  • Primary functionality: HMM training, feature engineering, backtesting
  • Migration Impact: HIGH - Core of the system

CLI Interface Scripts (3 files)

  • Total LOC: 1,117 lines
  • Primary functionality: User interface and command handling
  • Migration Impact: HIGH - User interaction layer

Algorithm Extensions (3 files)

  • Total LOC: 579 lines
  • Primary functionality: Advanced algorithms and processing engines
  • Migration Impact: MEDIUM - Feature enhancements

Testing Suite (25 files)

  • Total LOC: 2,500+ lines
  • Primary functionality: System validation and quality assurance
  • Migration Impact: MEDIUM - Quality assurance

Utility Scripts (2 files)

  • Total LOC: 112 lines
  • Primary functionality: Development and maintenance support
  • Migration Impact: LOW - Supporting tools

Key Findings

Strengths of Current Architecture

  1. Comprehensive Testing: Extensive test coverage across all functionality
  2. Multiple Entry Points: Various CLI options for different use cases
  3. Modular Design: Clear separation between analysis, CLI, and testing
  4. Advanced Features: Support for multiple processing engines and algorithms

Migration Challenges

  1. Code Duplication: Multiple similar CLI files and test files
  2. Integration Complexity: Multiple integration points between modules
  3. Feature Overlap: Similar functionality across multiple files
  4. Testing Redundancy: Duplicate test files and similar test scenarios

Migration Opportunities

  1. Code Consolidation: Merge similar CLI implementations
  2. Feature Integration: Combine best features from multiple implementations
  3. Performance Optimization: Leverage modern processing engines
  4. Professionalization: Add enterprise-grade features and monitoring

Recommendations for Migration

Phase 1 Recommendations

  1. Prioritize Core Files: Start with main.py and primary CLI files
  2. Analyze Dependencies: Map integration points and dependencies
  3. Document Functionality: Create comprehensive feature mapping
  4. Risk Assessment: Identify high-risk migration areas

Architecture Recommendations

  1. Unified CLI: Merge CLI implementations while preserving unique features
  2. Modular Testing: Consolidate test files while maintaining coverage
  3. Feature Integration: Combine best features from multiple implementations
  4. Performance Optimization: Leverage modern processing engines throughout

Quality Assurance Recommendations

  1. Test Consolidation: Eliminate duplicate tests while maintaining coverage
  2. Documentation: Create comprehensive API and user documentation
  3. Performance Testing: Benchmark all migrated functionality
  4. User Acceptance: Validate that user experience is maintained or improved

Inventory Completed: October 23, 2025 Total Analyzed Files: 25 Python files Next Step: Task 1.1.1.2 - Analyze main.py functionality