fix(logging-export): escape CSV header fields against delimiter, quote, newline, and formula injection #193

@cptkoolbeenz

Issue

Daqifi.Core.Logging.Export.CsvExporter writes channel descriptor keys directly into the header row without any RFC 4180 escaping or formula-injection mitigation:

// CsvExporter.cs (current)
writer.WriteLine(string.Join(options.Delimiter, channels.Select(c => c.Key)));

This breaks for several realistic inputs:

Input | Bug
--- | ---
Channel name contains the delimiter (e.g. ,) | Field count drift; downstream parsers misalign columns
Channel name contains " | Field becomes parser-ambiguous
Channel name contains \r or \n | Row breaks mid-header
Device name starts with =, +, -, @ | Excel/LibreOffice/Sheets evaluates the cell as a formula on open, a known CSV-injection class (=HYPERLINK, `=cmd

The same RFC 4180 bug (delimiter / quote / newline) exists in the data-row write path; the formula-injection class does not apply there (see "Scope" below).
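The first row of the table can be reproduced in a few lines. The sketch below is Python rather than C#, chosen only because the issue references the Python port. The channel keys are hypothetical, and the naive join stands in for the string.Join call quoted above.

```python
import csv
import io

# Hypothetical channel keys; the second one contains the delimiter.
channel_keys = ["Dev1:AI0", "Dev1:AI1,filtered", "Dev1:AI2"]
delimiter = ","

# Naive join, analogous to string.Join(options.Delimiter, ...) in the issue.
naive_header = delimiter.join(channel_keys)

# A standard CSV parser splits the unescaped key into two columns.
parsed = next(csv.reader(io.StringIO(naive_header), delimiter=delimiter))

print(len(channel_keys))  # 3 keys written
print(len(parsed))        # 4 columns read back
```

Every data row still carries three values, so each column after the broken key is shifted by one relative to the header.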

Fix shape (mirrors what the Python port just landed)

Add a single EscapeCsvField(string value, string delimiter, bool formulaSafe) helper that:

  1. If formulaSafe is true AND value starts with =/+/-/@ (after skipping leading whitespace), prefix with a literal ' to neutralize formula evaluation
  2. If value contains the delimiter, ", \r, or \n, double any embedded " and wrap in "..."
  3. Otherwise return verbatim
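The three steps above can be sketched as follows. This is an illustrative Python version, not the code from either port: the name escape_csv_field and the FORMULA_PREFIXES constant are assumptions. It also assumes the apostrophe goes in front of the whole value, including any leading whitespace, and that the delimiter has already been validated.

```python
# Characters that spreadsheet applications treat as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def escape_csv_field(value: str, delimiter: str = ",", formula_safe: bool = True) -> str:
    """Escape one CSV field following the three steps described in the issue."""
    # Step 1: neutralize a leading formula trigger, ignoring leading whitespace.
    if formula_safe and value.lstrip().startswith(FORMULA_PREFIXES):
        value = "'" + value
    # Step 2: quote the field if it contains the delimiter, a quote, CR, or LF,
    # doubling any embedded quotes as RFC 4180 requires.
    if any(ch in value for ch in (delimiter, '"', "\r", "\n")):
        return '"' + value.replace('"', '""') + '"'
    # Step 3: nothing special in the field, return it unchanged.
    return value


print(escape_csv_field("Dev1:AI0"))   # Dev1:AI0
print(escape_csv_field("Dev1,AI0"))   # "Dev1,AI0"
print(escape_csv_field('5" probe'))   # "5"" probe"
print(escape_csv_field("=1+1"))       # '=1+1
```

Running the formula check before the quoting check means a value such as =A1,B1 is both prefixed and quoted, so both mitigations apply to the same field.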

Also validate that options.Delimiter is a single non-newline character at the top of ExportAsync so callers can't construct an unparseable file from a multi-char or \n delimiter.
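A minimal sketch of that delimiter check, again in Python for illustration. The function name validate_delimiter and the choice of ValueError are assumptions; the C# exporter would presumably throw an argument exception at the top of ExportAsync instead.

```python
def validate_delimiter(delimiter: str) -> None:
    """Reject delimiters that would make the output unparseable."""
    # Multi-character (or empty) delimiters cannot be read back unambiguously.
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly one character")
    # A newline delimiter would be indistinguishable from a row terminator.
    if delimiter in ("\r", "\n"):
        raise ValueError("delimiter must not be a newline character")


validate_delimiter(",")   # accepted
validate_delimiter("\t")  # accepted

for bad in ["", ";;", "\n", "\r"]:
    try:
        validate_delimiter(bad)
    except ValueError as exc:
        print(repr(bad), "rejected:", exc)
```

Validating once, before any row is written, keeps the per-field escaping logic free of delimiter sanity checks.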

Scope: which fields get formulaSafe = true?

formulaSafe: true (header + any future user-provided text fields):

  • Header row entries (channel keys) — these embed ChannelDescriptor.DeviceName, which IS user/device-provided and therefore an attacker-controlled vector.
  • Any future text data field.

formulaSafe: false (internally-generated numeric / timestamp data):

  • Timestamp fields: ISO 8601 (2024-01-01T...) or relative seconds (0.000, 1.500). None of these starts with =/+/@. The "INVALID()" fallback for out-of-range ticks starts with I — also non-formula.
  • Numeric value fields: double.ToString("G", CultureInfo.InvariantCulture). Output is restricted to digits, ., - (sign), + (exponent sign), e/E, NaN, Infinity. The only overlap with =/+/-/@ is the leading - on negative numbers, where the - is a meaningful sign, NOT a formula prefix. Mitigating it would corrupt the value (-1.5 → '-1.5), so spreadsheets / pandas / Tableau would parse it as a string instead of a number.

Formula injection on data fields requires attacker-controlled text reaching a numeric column. The exporter sources values exclusively from ISampleSource.StreamSamples(), which returns double, so the type system guarantees that no =/+/@ prefix can appear. Mitigation on these fields adds no security and breaks downstream numeric consumers.

This carve-out is documented in code (see CsvExporter.ExportAllSamplesAsync and ExportAveragedAsync where formulaSafe: false is passed for timestamp + numeric value rows).
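The effect of the carve-out can be illustrated with a small Python sketch. The compact escape_csv_field helper here is hypothetical and only follows the three steps listed under "Fix shape". The header key and sample values are made up for the example.

```python
def escape_csv_field(value: str, delimiter: str = ",", formula_safe: bool = True) -> str:
    # Hypothetical helper: formula prefix first, then RFC 4180 quoting.
    if formula_safe and value.lstrip().startswith(("=", "+", "-", "@")):
        value = "'" + value
    if any(ch in value for ch in (delimiter, '"', "\r", "\n")):
        return '"' + value.replace('"', '""') + '"'
    return value


# Header keys are device-provided text, so they get formula_safe=True.
header = [escape_csv_field(k, formula_safe=True) for k in ["time", "=1+1:AI0"]]

# Data fields are internally generated, so they get formula_safe=False.
values = ["0.000", "-1.5"]
plain_row = [escape_csv_field(v, formula_safe=False) for v in values]

# Applying the mitigation to data fields turns the negative number into text.
mitigated_row = [escape_csv_field(v, formula_safe=True) for v in values]

print(header)                         # ['time', "'=1+1:AI0"]
print([float(v) for v in plain_row])  # [0.0, -1.5]

try:
    float(mitigated_row[1])
    parsed_ok = True
except ValueError:
    parsed_ok = False
print(parsed_ok)                      # False
```

With formula_safe=False the negative sample still parses as a number, while the mitigated variant no longer does. That is the breakage the carve-out avoids.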

Python port reference

daqifi-python-core PR #103 — see daqifi/export.py:_escape_csv_field and the TestCsvHeaderEscaping / TestDelimiterValidation test classes for the full RFC 4180 + formula-injection coverage shape.

Why this matters

Per the portomatic porting principle (capability parity), the Python port now defends against this; the upstream C# core should too so the two ports stay behaviorally aligned and don't ship known-bad CSVs to downstream tooling (Excel, pandas, Tableau, etc.).

Surfaced by Qodo agentic_review pass 4 on daqifi-python-core PR #103.
