fix(logging-export): CSV escaping, delimiter validation, ticks bounds, no-op progress #194
…, no-op progress

Bundles three follow-ups to PR #167, all flagged by Qodo on the parallel daqifi-python-core port (PR #103) and verified equivalent in Python before being ported here.

Closes #191 (no-op progress finalization):
- The empty-channel early return now calls progress?.Report(100), so callers (UI progress bars) don't stall at <100% on no-op exports.

Closes #192 (DateTime.MinValue marked INVALID):
- FormatTimestamp's `ticks <= 0` check rejected ticks=0 even though it's a legal DateTime value (DateTime.MinValue = 0001-01-01 00:00:00). Tightened to `ticks < 0`. The existing test Export_ZeroTicks_WritesInvalidToken is updated to assert the new correct format ("0001-01-01T00:00:00..."); the relative-time invalid-tick test switched to a negative tick (-1), since 0 now formats as "0.000".

Closes #193 (CSV escaping + delimiter validation, 5-part fix):
1. New EscapeCsvField helper with a formulaSafe parameter: RFC 4180 quoting for fields containing the delimiter / `"` / CR / LF, plus formula-injection mitigation that prefixes `'` to fields whose first non-whitespace character is `=`/`+`/`-`/`@`.
2. Delimiter validation rejects empty / multi-char / CR / LF / double-quote (`"` is reserved as the quoting character, so allowing it as a delimiter would produce ambiguous CSV).
3. Header writes use formulaSafe=true (channel keys are user-controllable; spreadsheet apps would evaluate "=DevA:..." as a formula on open).
4. Whitespace-prefixed formula characters are caught via TrimStart before the first-character check (" =SUM(A1)" still evaluates in spreadsheets, so a naive value[0] check would be bypassed).
5. Data-row writes (timestamps, values) pass formulaSafe=false: this output is generated internally, so RFC 4180 quoting still applies (necessary when ':' or '.' is the delimiter and timestamps/floats contain them), but formula mitigation is OFF; otherwise legitimate negative numbers like "-1.5" would be clobbered into "'-1.5".
13 new tests cover header escaping (delimiter, quote, formula-leading char, whitespace-prefixed formula), data-row escaping (':' delimiter quotes ISO timestamp, '.' delimiter quotes relative time + value), negative-value regression (no leading apostrophe), invalid-delimiter validation (5 cases parametrized via Theory), and no-op progress finalization. 905 total tests pass on net9.0 + net10.0 (previously 892, with 2 failing on the prior buggy assertions).
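To illustrate the #192 boundary change, here is a minimal Python sketch (not the C# `FormatTimestamp` itself) that maps .NET-style ticks (100 ns intervals since 0001-01-01) to an ISO 8601 string. The name `format_timestamp`, the `"INVALID"` token spelling, the microsecond-precision output, and the upper-bound check against `DateTime.MaxValue.Ticks` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical constants mirroring .NET semantics: one tick is 100 ns, and
# DateTime.MaxValue.Ticks is 3155378975999999999.
TICKS_PER_MICROSECOND = 10
MAX_TICKS = 3_155_378_975_999_999_999
INVALID_TOKEN = "INVALID"  # assumed spelling of the invalid marker


def format_timestamp(ticks: int) -> str:
    """Format .NET-style ticks as an ISO 8601 string (hypothetical helper)."""
    # Only negative (or out-of-range) ticks are invalid. ticks == 0 is
    # DateTime.MinValue, i.e. 0001-01-01T00:00:00, and must format normally.
    if ticks < 0 or ticks > MAX_TICKS:
        return INVALID_TOKEN
    # Python datetimes only carry microseconds, so the 7th fractional digit
    # of a .NET tick value is truncated here.
    moment = datetime.min + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)
    # isoformat() zero-pads the year to four digits.
    return moment.isoformat(timespec="microseconds")
```

The only behavioral change #192 makes is the comparison: with `ticks <= 0`, `format_timestamp(0)` would have taken the INVALID branch instead of producing `0001-01-01T00:00:00.000000`.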
Review Summary by Qodo
Fix CSV export escaping, delimiter validation, ticks bounds, and progress finalization
Walkthroughs
Description
• Fix DateTime.MinValue (ticks=0) incorrectly marked as INVALID; only negative ticks are invalid
• Implement RFC 4180 CSV escaping with formula-injection mitigation for headers
• Add delimiter validation to reject empty, multi-char, CR, LF, and double-quote characters
• Finalize progress reporting (100%) on no-op exports to prevent UI stalls
• Add 13 new tests covering CSV escaping, delimiter validation, and edge cases
Diagram
flowchart LR
A["CSV Export"] --> B["Delimiter Validation"]
A --> C["Timestamp Formatting"]
A --> D["Field Escaping"]
B --> E["Reject Invalid Delimiters"]
C --> F["Handle ticks=0 as Valid"]
D --> G["RFC 4180 Quoting"]
D --> H["Formula Injection Mitigation"]
H --> I["Headers: formulaSafe=true"]
H --> J["Data: formulaSafe=false"]
A --> K["Progress Reporting"]
K --> L["Report 100% on No-op"]
File Changes
1. src/Daqifi.Core.Tests/Logging/Export/CsvExporterTests.cs
Code Review by Qodo
Qodo finding "Data fields skip formula mitigation": this is intentional, and it is the resolution of the fifth follow-up on #193 from the Python port (daqifi-python-core PR #105, pass 8).

Rationale: applying formula mitigation to data fields would clobber legitimate negative numeric values. `FormatValue(-1.5)` returns `"-1.5"`; with `formulaSafe: true` that would become `"'-1.5"` in the CSV, breaking any consumer that expects numeric tokens. The split is:
- header = formulaSafe=true (channel keys are user-controllable strings like `=DevA:...` that Excel would evaluate);
- data = formulaSafe=false (timestamps from `FormatTimestamp` and numbers from `G` formatting are internally generated; their leading `-` is a sign, not a formula character).

The new test `Export_NegativeValue_NotApostrophePrefixed` is the regression coverage for this. Before the fix, my data-row escape applied formulaSafe by default and broke negative numbers; the Qodo /agentic_review on the parallel Python PR caught it, and it is documented as the fifth follow-up on #193.

RFC 4180 quoting (delimiter / `"` / CR / LF) IS still applied to data fields, so timestamps containing `:` or `.` are correctly quoted when those characters are chosen as delimiters (`Export_ColonDelimiter_QuotesIsoTimestamp` and `Export_DotDelimiter_QuotesRelativeTimestampAndValue` cover that). Only the formula-injection prefix is scoped to headers.

Code stays as-is.
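The header/data split can be sketched in Python. `escape_csv_field` and its `formula_safe` flag are hypothetical stand-ins for the C# `EscapeCsvField(…, formulaSafe)` helper, and the order of operations (formula prefix first, then RFC 4180 quoting) is an assumption; the sketch only shows why the same helper gives different results for headers and data.

```python
FORMULA_CHARS = ("=", "+", "-", "@")


def escape_csv_field(value: str, delimiter: str, formula_safe: bool) -> str:
    """Escape one CSV field (hypothetical Python analogue of EscapeCsvField).

    Assumes the delimiter has already been validated as a single character.
    """
    if formula_safe:
        # Skip leading whitespace first: " =SUM(A1)" is still a formula in
        # spreadsheet apps. str.lstrip() with no arguments strips Unicode whitespace.
        if value.lstrip().startswith(FORMULA_CHARS):
            value = "'" + value
    # RFC 4180: quote fields containing the delimiter, a double quote, CR, or LF,
    # doubling any embedded double quotes. This applies to headers and data alike.
    if any(ch in value for ch in (delimiter, '"', "\r", "\n")):
        value = '"' + value.replace('"', '""') + '"'
    return value
```

With a `,` delimiter, `escape_csv_field("=DevA:ch1", ",", True)` yields `'=DevA:ch1`, while the data value `escape_csv_field("-1.5", ",", False)` stays `-1.5`. With `:` as the delimiter, a data-row timestamp such as `12:00:01` is still wrapped in double quotes, which is the RFC 4180 half of the behavior that remains on for data rows.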
/improve

/agentic_review

Persistent review updated to latest commit 8a0df25
PR Code Suggestions ✨
Latest suggestions up to 2f77284
Previous suggestions
Suggestions up to commit 6da7050
Suggestions up to commit 904746b
✅ Suggestions up to commit e656b8c
✅ Suggestions up to commit 8a0df25
…against Unicode whitespace
Qodo flagged that the formula-injection check trimmed only ' ' and '\t' before testing for a leading '=', '+', '-', or '@'. CSV formula-injection PoCs place NBSP (U+00A0), thin spaces, or Unicode line separators (U+2028) ahead of '=' to evade trim-based checks; spreadsheets still treat the resulting cell as a formula because they normalize or ignore the leading whitespace. Switched the skip loop to char.IsWhiteSpace, which covers the full Unicode whitespace set and closes that bypass class. Skipped a second suggestion to coalesce null inputs to empty: EscapeCsvField is private and current callers always pass strings, so adding null tolerance for an impossible case would violate the "only validate at boundaries" rule.
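The bypass class can be reproduced in a few lines of Python. Here `str.isspace()` plays the role of `char.IsWhiteSpace` (both treat U+00A0, U+2003, and U+2028 as whitespace); the two function names are hypothetical and exist only to contrast the old space/tab trim with the Unicode-aware skip loop.

```python
FORMULA_CHARS = "=+-@"


def starts_with_formula_ascii_trim(value: str) -> bool:
    """Old behavior: only spaces and tabs are skipped before the first-char check."""
    trimmed = value.lstrip(" \t")
    return bool(trimmed) and trimmed[0] in FORMULA_CHARS


def starts_with_formula_unicode(value: str) -> bool:
    """New behavior: skip every Unicode whitespace character, like char.IsWhiteSpace."""
    for ch in value:
        if ch.isspace():
            continue
        return ch in FORMULA_CHARS
    return False  # empty or all-whitespace field
```

For the input `"\u00a0=1"`, the space/tab trim leaves the NBSP in place and reports no formula, while the Unicode-aware loop skips it and flags the `=`.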
/improve

/agentic_review

Persistent review updated to latest commit b1e27f2

Persistent suggestions updated to latest commit b1e27f2
#194 follow-up)
Locks in the char.IsWhiteSpace upgrade from the previous commit. NBSP (U+00A0) and EM SPACE (U+2003) before '=' previously slipped past the trim-only check; the regression test confirms both now get the leading apostrophe. U+2028 LINE SEPARATOR, also covered by the runtime fix, is omitted from the test data because C# treats a raw U+2028 as a source-file newline even inside string literals; its runtime behavior is identical to the other whitespace cases, since char.IsWhiteSpace covers U+2028.
/improve

/agentic_review

Persistent review updated to latest commit e656b8c

Persistent suggestions updated to latest commit e656b8c

Convergence summary (Qodo /agentic_review pass 3):
Test gate: 905 pass / 2 skipped (real-hardware) / 0 fail; CI build green. Ready for human review.

/improve

/agentic_review

Persistent review updated to latest commit e656b8c

Persistent suggestions updated to latest commit e656b8c
Excel, Google Sheets, and pandas (with default options) trim unquoted leading/trailing whitespace in CSV fields, silently losing the exact value through round-trip. Detect and quote those fields up front so the round-trip preserves the value verbatim.
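A minimal Python sketch of the extra quoting rule, assuming it is simply OR-ed into the existing RFC 4180 check. `needs_quoting` and `quote_if_needed` are hypothetical names, and whether the C# code uses `char.IsWhiteSpace` or a narrower character set for this test is not stated in the commit.

```python
def needs_quoting(value: str, delimiter: str) -> bool:
    """Return True if a CSV field must be wrapped in double quotes (hypothetical)."""
    # RFC 4180 triggers: delimiter, double quote, CR, LF.
    if any(ch in value for ch in (delimiter, '"', "\r", "\n")):
        return True
    # Extra rule: leading/trailing whitespace may be trimmed by readers when the
    # field is unquoted, so quote it to keep the value verbatim on round-trip.
    return bool(value) and (value[0].isspace() or value[-1].isspace())


def quote_if_needed(value: str, delimiter: str) -> str:
    """Apply RFC 4180 quoting (with doubled inner quotes) only when required."""
    if needs_quoting(value, delimiter):
        return '"' + value.replace('"', '""') + '"'
    return value
```

Fields without edge whitespace, such as `1.5`, are still written bare, so the rule only changes output for values that would otherwise lose characters when read back.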
/improve

/agentic_review

Persistent review updated to latest commit 904746b

Persistent suggestions updated to latest commit 904746b
When the delimiter is '\r' or '\n', interpolating it raw into the ArgumentException message produced a multi-line, hard-to-grep error log. Format the bad value as 'null' / 'empty' / 'U+XXXX' so the message stays a single line and unambiguously identifies the bad character.
/improve

/agentic_review

Persistent review updated to latest commit 6da7050

Persistent suggestions updated to latest commit 6da7050
Enumerate every code point of the invalid delimiter (with its length) in the ArgumentException detail string instead of reporting only the first character. Multi-character delimiters like ",," were misreported as a single "U+002C" — the new format reads len=2 [U+002C U+002C] and stays log-safe (no raw CR/LF interpolated).
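The delimiter rule from #193 and the log-safe detail format can be sketched together in Python. `describe_delimiter` and `validate_delimiter` are hypothetical names, `ValueError` stands in for the C# `ArgumentException`, and the exact message wording is an assumption; only the `null` / `empty` / `len=N [U+XXXX …]` shape comes from the commits above. Python iterates code points while C# strings hold UTF-16 code units, so the two can differ for characters outside the BMP.

```python
from typing import Optional

# Characters that can never serve as a delimiter: CR and LF end records, and
# the double quote is reserved for RFC 4180 quoting.
FORBIDDEN_DELIMITERS = {"\r", "\n", '"'}


def describe_delimiter(delimiter: Optional[str]) -> str:
    """Render a delimiter as a single-line, log-safe description."""
    if delimiter is None:
        return "null"
    if delimiter == "":
        return "empty"
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in delimiter)
    return f"len={len(delimiter)} [{code_points}]"


def validate_delimiter(delimiter: Optional[str]) -> str:
    """Return the delimiter unchanged, or raise ValueError if it is unusable."""
    if delimiter is None or len(delimiter) != 1 or delimiter in FORBIDDEN_DELIMITERS:
        # The message never embeds the raw delimiter, so CR/LF cannot split log lines.
        raise ValueError(f"Invalid CSV delimiter: {describe_delimiter(delimiter)}")
    return delimiter
```

For example, `validate_delimiter(",,")` raises with the detail `len=2 [U+002C U+002C]`, and `validate_delimiter("\r")` reports `len=1 [U+000D]` instead of breaking the log line.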
/improve

/agentic_review

Persistent review updated to latest commit 2f77284

Persistent suggestions updated to latest commit 2f77284
Live-device smoke test
Built the core example app against this PR (project reference, not the 0.20.0 NuGet) and exercised the end-to-end
Setup
dotnet build Daqifi.Core.Cli.csproj \
-p:DaqifiCoreProjectPath=.../Daqifi.Core.csproj -c Release
# Step 1: download an existing protobuf log from SD
dotnet run ... -- --serial /dev/cu.usbmodem101 \
--sd-download log_20260502_125152.bin
# Step 2: parse + export via Daqifi.Core.CsvExporter
dotnet run ... -- --sd-parse sd_log.bin \
--sd-export-csv out.csv
Result
Export succeeded: 3,908 incoming samples → 1,955 CSV lines (1 header + 1,954 timestamp-bucketed rows × 2 channel columns), 64,390 bytes in 0.01s (313K rows/s, 4.9 MB/s).
Verifications
What this didn't exercise
Couldn't directly hit these on real-device data; they're covered by the 13 new unit tests in this PR:
Bottom line
PR #194 does not regress the live
Summary
Three follow-ups to PR #167, all flagged by Qodo on the parallel daqifi-python-core port (PR #103) and verified equivalent in Python before being ported here.
- `FormatTimestamp` no longer rejects ticks=0 (`DateTime.MinValue`) as INVALID; only negative ticks are.
- `EscapeCsvField` helper applied to header AND data rows; delimiter validation; formula-injection mitigation scoped to header fields only (so legitimate negative numbers don't get clobbered).

Test plan
`:` quotes ISO timestamp, `.` quotes relative time + values), negative-value regression, invalid-delimiter Theory (empty / multi-char / CR / LF / `"`), and no-op progress finalization.

Refs