Excel support #178
base: master
Implements full Excel file processing functionality for nf-schema, addressing the need for direct Excel workbook support without manual CSV conversion.

## Key Features

- **Full Excel Format Support**: XLSX, XLSM, XLSB, and XLS files using Apache POI 5.4.1
- **Sheet Selection**: Select specific sheets by name or index via options parameter
- **Data Type Preservation**: Proper handling of strings, numbers, booleans, dates, and formulas
- **Schema Integration**: Full compatibility with existing JSON schema validation pipeline
- **Backward Compatibility**: Zero impact on existing CSV/TSV/JSON/YAML functionality

## Implementation Details

### Core Components

- **WorkbookConverter.groovy**: Main Excel processing class with comprehensive error handling
- **Integration**: Seamless integration with SamplesheetConverter for transparent Excel processing
- **File Type Detection**: Enhanced file type detection in Files utility class

### Architecture

- **Clean Separation**: Excel processing handled in dedicated WorkbookConverter class
- **Configuration Integration**: Uses existing ValidationConfig for consistent error handling
- **Modular Design**: Separated header processing, row processing, and cell value extraction

### New Dependencies

- Apache POI 5.4.1 for Excel format support
- POI-OOXML for modern Excel formats (XLSX, XLSM)
- POI-Scratchpad for legacy Excel formats (XLS)

## Usage Examples

```nextflow
// Basic Excel usage - works just like CSV
params.input = "samplesheet.xlsx"
params.schema = "assets/schema_input.json"

include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    samplesheet = samplesheetToList(params.input, params.schema)
}
```

```nextflow
// Select specific sheet by name
samplesheet = samplesheetToList(params.input, params.schema, [sheet: "Sample_Data"])

// Select sheet by index (0-based)
samplesheet = samplesheetToList(params.input, params.schema, [sheet: 0])
```

## Testing

- WorkbookConverter unit tests with comprehensive error handling scenarios
- File type detection tests for all Excel formats
- Integration tests planned for full workflow validation

## Impact

- **User Experience**: Users can work directly with Excel files from data analysts/collaborators
- **Workflow Simplification**: Eliminates manual CSV conversion step
- **Data Fidelity**: Preserves original data types and formatting
- **Enterprise Ready**: Supports common Excel formats used in research/industry

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
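For readers wondering how a `sheet` option that accepts either a name or a 0-based index might be resolved, here is a minimal sketch in plain Java (the plugin itself is Groovy). It is not the code in `WorkbookConverter.groovy`: the class `SheetSelector`, the method `resolveSheet`, and the choice to fall back to the first sheet when no selector is given are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical helper, not part of nf-schema: maps a sheet selector to a position
// in the workbook's list of sheet names.
class SheetSelector {

    static int resolveSheet(List<String> sheetNames, Object selector) {
        if (selector == null) {
            // Assumption: with no selector, use the first sheet.
            return 0;
        }
        if (selector instanceof Integer) {
            int index = (Integer) selector;
            if (index < 0 || index >= sheetNames.size()) {
                throw new IllegalArgumentException(
                    "Sheet index " + index + " is out of range; workbook has "
                    + sheetNames.size() + " sheet(s)");
            }
            return index;
        }
        if (selector instanceof String) {
            int index = sheetNames.indexOf((String) selector);
            if (index < 0) {
                throw new IllegalArgumentException(
                    "Sheet '" + selector + "' not found; available sheets: " + sheetNames);
            }
            return index;
        }
        throw new IllegalArgumentException(
            "Unsupported sheet selector type: " + selector.getClass().getName());
    }
}
```

Failing loudly on an unknown name or an out-of-range index, rather than silently picking another sheet, keeps a mistyped `sheet` option from validating the wrong data.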
2510d12 to c716966
This is impressive! Can you add some more tests, though? This seems to have a lot of logic behind it, and I want to be sure everything works as expected.
  if ( commaCount == tabCount ){
-     log.error("Could not derive file type from ${file}. Please specify the file extension (CSV, TSV, YML, YAML and JSON are supported).".toString())
+     log.error("Could not derive file type from ${file}. Please specify the file extension (CSV, TSV, YML, YAML, JSON, and Excel formats are supported).".toString())
Maybe also specify exactly which Excel formats are supported?
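One possible way to do this is to build the message from a single list of extensions, so the error text and the detection logic cannot drift apart. The sketch below is plain Java rather than the plugin's Groovy; `ExcelFormats`, `isExcel`, and `unsupportedTypeMessage` are hypothetical names, and the extension list is taken from the PR description (XLSX, XLSM, XLSB, XLS).

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of nf-schema.
class ExcelFormats {

    // Extensions listed in the PR description.
    static final List<String> EXCEL_EXTENSIONS = List.of("xlsx", "xlsm", "xlsb", "xls");

    // True if the file name ends in one of the Excel extensions (case-insensitive).
    static boolean isExcel(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXCEL_EXTENSIONS.contains(ext);
    }

    // Error text that names every supported Excel extension explicitly.
    static String unsupportedTypeMessage(String file) {
        int last = EXCEL_EXTENSIONS.size() - 1;
        String head = String.join(", ", EXCEL_EXTENSIONS.subList(0, last));
        String excel = (head + " and " + EXCEL_EXTENSIONS.get(last)).toUpperCase(Locale.ROOT);
        return "Could not derive file type from " + file
            + ". Please specify the file extension (CSV, TSV, YML, YAML, JSON, "
            + excel + " are supported).";
    }
}
```

With this shape, adding or dropping a format means changing one list, and the log message follows automatically.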
Summary
Implements comprehensive Excel file processing functionality for nf-schema, addressing GitHub issue #177.
Users can now use Excel workbooks (XLSX, XLSM, XLSB, XLS) directly without manual conversion to CSV format.
Key Features
Implementation Details
Core Components
Utils.castToType() method that was converting typed data to null
Commit Structure
Testing
Usage Examples
Impact
Closes #177
🤖 Generated with Claude Code
Co-Authored-By: Claude [email protected]