feat(converters): CSVToDocument supports row-level conversion #9773

xoaryaa · 2025-09-08T13:08:56Z

Related Issues

fixes Enhance CSVToDocument Converter to Support Row-Level Conversion #8848

Proposed Changes:

Add optional conversion_mode: Literal["file","row"] (default: "file") to CSVToDocument.
In conversion_mode="row", convert each CSV row into one Document.
- content comes from a user-selected content_column (if provided); otherwise it is a readable "key: value" listing of the row.
- All remaining CSV columns are copied into Document.meta as {column_name: value}.
- Adds row_number to meta for traceability.
Add CSV parsing options delimiter and quotechar (passed to csv.DictReader).
Preserve existing behavior & API: "file" mode remains the default and unchanged.
Friendly fallback: if row parsing fails, log a warning and fall back to "file" mode.
Docs & release note added.
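
For readers who want a feel for row mode without opening the diff, here is a minimal standard-library sketch of the behavior described above. It is not the code in this PR: Doc is a hypothetical stand-in for Document, rows_to_docs is an illustrative helper name, and two details are assumptions on my part (row_number starts at 0, and with content_column=None every column is also copied into meta).

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document (not the Haystack class)."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def rows_to_docs(
    text: str,
    content_column: Optional[str] = None,
    delimiter: str = ",",
    quotechar: str = '"',
) -> List[Doc]:
    """Turn each CSV row into one Doc (illustrative helper, not the PR's API)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    docs: List[Doc] = []
    # Assumption: rows are numbered from 0, counting data rows only.
    for row_number, row in enumerate(reader):
        if content_column is not None:
            # Content comes from the selected column; the rest goes to meta.
            content = row.get(content_column) or ""
            meta = {k: v for k, v in row.items() if k != content_column}
        else:
            # No content column: build a readable "key: value" listing.
            content = "\n".join(f"{k}: {v}" for k, v in row.items())
            # Assumption: all columns are still copied into meta here.
            meta = dict(row)
        meta["row_number"] = row_number
        docs.append(Doc(content=content, meta=meta))
    return docs


docs = rows_to_docs("id,text,lang\n1,hello,en\n2,hola,es\n", content_column="text")
for doc in docs:
    print(doc.content, doc.meta)
```

The warning-and-fallback path to "file" mode is left out of this sketch to keep it short.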

How did you test it?

Unit tests (all passing locally):
- test_row_mode_with_content_column: asserts per-row Document creation, content from selected column, and remaining columns in meta.
- test_row_mode_without_content_column: asserts readable "key: value" listing when content_column=None.
- test_row_mode_meta_merging: verifies ByteStream/meta merging into each row’s meta.
Existing CSV converter tests still pass (backward-compat).
Manual verification with a small CSV (2–3 rows) to check file_path handling and delimiters.

Notes for the reviewer

Backward compatibility is preserved by keeping "file" as the default conversion_mode.
store_full_path behavior is respected in row mode (file_path is still shortened to the file name unless the full path is explicitly requested).
Column names that collide with existing meta keys are not overwritten; row columns are only added where keys don’t exist.
Happy to extend with optional type-casting for common numeric/boolean columns in a follow-up if desired.
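
To make the merging rules in these notes concrete, here is a rough sketch of how per-row meta could be assembled. build_row_meta is a hypothetical helper name used only for illustration, and I am assuming that row_number is written last, so it would replace a CSV column of the same name; the notes above do not cover that case.

```python
import os
from typing import Any, Dict


def build_row_meta(
    base_meta: Dict[str, Any],
    row: Dict[str, Any],
    row_number: int,
    store_full_path: bool = False,
) -> Dict[str, Any]:
    """Merge ByteStream/user meta with one CSV row (illustrative only)."""
    meta = dict(base_meta)  # copy so the caller's dict is not mutated

    # Shorten file_path to the file name unless the full path is requested.
    file_path = meta.get("file_path")
    if file_path and not store_full_path:
        meta["file_path"] = os.path.basename(file_path)

    # Row columns never overwrite keys that already exist in meta.
    for key, value in row.items():
        if key not in meta:
            meta[key] = value

    # Assumption: row_number is set last and wins over a same-named column.
    meta["row_number"] = row_number
    return meta


merged = build_row_meta(
    {"file_path": "/data/in/users.csv", "source": "upload"},
    {"source": "csv", "name": "Ada"},
    row_number=3,
)
print(merged)
```

In this example the existing "source" value is kept and only "name" is added from the row.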

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I documented my code
I ran pre-commit hooks and fixed any issue

coveralls · 2025-09-08T13:13:12Z

Pull Request Test Coverage Report for Build 18377253514

Details

0 of 0 changed or added relevant lines in 0 files are covered.
4 unchanged lines in 1 file lost coverage.
Overall coverage remained the same at 92.062%

Files with Coverage Reduction	New Missed Lines	%
components/converters/csv.py	4	95.45%

Totals
Change from base Build 18340248289:	0.0%
Covered Lines:	13233
Relevant Lines:	14374

💛 - Coveralls

mpangrazzi

Hi @xoaryaa!

Your PR claims to fix #8848, but the pushed changes are completely unrelated to CSVToDocument. The PR instead implements a request_headers feature for the LinkContentFetcher component. Are you aware of this? Maybe you pushed the wrong code by mistake?

xoaryaa · 2025-09-08T14:53:26Z

Yes, you are right. I’m working on uploading the correct code.


mpangrazzi

I've left a few comments!