
feat: #1 - Jsonl Support #2

Open
JAVince wants to merge 2 commits into main from feature-1-00d4f54e-add-jsonl-support


JAVince (Owner) commented Dec 7, 2025

Summary

This PR implements support for uploading JSONL (JSON Lines) files to the application. This addresses issue #1.

Issue Context:

  • Add support for uploading jsonl files
  • Parse entire jsonl file to get all possible fields
  • Use standard library only (no new dependencies)
  • Generate one new table per jsonl file (similar to CSV and JSON uploads)
  • Concatenate nested fields and nested lists with __ as delimiter (updatable in constants)
  • Use _0, _1, etc. to denote list items (using delimiter and index notation)
  • Update UI to inform users about jsonl upload capability
  • Create test jsonl files in test directory
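The flattening rules above can be illustrated with a minimal sketch. `flatten_record` is a hypothetical name, modeled on (not copied from) the PR's `flatten_json_record`; the delimiter values are the ones given in the issue, and appending the list index with a single `_` (so `items` becomes `items_0`) is an assumption consistent with the Bugbot review of this PR.

```python
from typing import Any, Dict

# Assumed values taken from the issue text; the PR keeps the real ones in constants.py.
NESTED_FIELD_DELIMITER = "__"
LIST_INDEX_DELIMITER = "_"


def flatten_record(obj: Any, parent_key: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one level of column -> value (hypothetical)."""
    items: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Nested object fields are joined to their parent with "__".
            new_key = f"{parent_key}{NESTED_FIELD_DELIMITER}{key}" if parent_key else key
            items.update(flatten_record(value, new_key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            # List items get "_0", "_1", ... appended to the parent key.
            new_key = f"{parent_key}{LIST_INDEX_DELIMITER}{index}"
            items.update(flatten_record(value, new_key))
    else:
        # Primitive value: string, number, boolean, or None.
        items[parent_key] = obj
    return items
```

For example, `{"user": {"name": "Ada", "tags": ["x", "y"]}, "items": [{"sku": "A1"}]}` flattens to the columns `user__name`, `user__tags_0`, `user__tags_1`, and `items_0__sku`. Empty objects and empty lists produce no columns in this sketch.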

Implementation Plan

See: specs/jsonl-file-upload-support.md

Changes Made

Core Functionality

  • Added JSONL file processing support in file_processor.py
  • Implemented schema inference by scanning all lines to collect all possible fields
  • Added nested object flattening with __ delimiter (configurable via constants)
  • Added nested array handling with _0, _1 index notation
  • Type detection for columns (string, integer, float, boolean, null)
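Column type detection across all records could look roughly like the following sketch. The function name and the widening rules (ignore `None`, widen integer plus float to float, fall back to string for any other mix) are assumptions for illustration, not rules stated in this PR.

```python
from typing import Any, Iterable


def infer_column_type(values: Iterable[Any]) -> str:
    """Return "null", "boolean", "integer", "float", or "string" (hypothetical rules)."""
    seen = set()
    for value in values:
        if value is None:
            continue  # nulls do not constrain the type
        # bool is a subclass of int in Python, so it must be checked first.
        if isinstance(value, bool):
            seen.add("boolean")
        elif isinstance(value, int):
            seen.add("integer")
        elif isinstance(value, float):
            seen.add("float")
        else:
            seen.add("string")
    if not seen:
        return "null"  # every value was None (or there were no values)
    if len(seen) == 1:
        return seen.pop()
    if seen == {"integer", "float"}:
        return "float"  # numeric widening
    return "string"  # any other mix falls back to text
```

The `bool` check ordering matters: without it, `True` would be counted as an integer.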

Constants Configuration

  • Created constants.py with configurable delimiters:
    • NESTED_FIELD_DELIMITER = "__" for nested objects
    • LIST_INDEX_DELIMITER = "_" for array indices

Server Updates

  • Updated server.py to accept .jsonl file extension
  • Added JSONL to allowed MIME types
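The PR description does not show how server.py dispatches uploads, so the following is only a sketch of extension-based routing. `detect_upload_format` and the `SUPPORTED_FORMATS` table are hypothetical names.

```python
from pathlib import Path

# Hypothetical extension table; the real /api/upload handler may differ.
SUPPORTED_FORMATS = {".csv": "csv", ".json": "json", ".jsonl": "jsonl"}


def detect_upload_format(filename: str) -> str:
    """Map an uploaded filename to a format label by its exact, case-insensitive suffix."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return SUPPORTED_FORMATS[suffix]
```

Comparing the exact suffix, rather than using a substring test such as `".json" in filename`, keeps `events.jsonl` from being routed to the JSON converter.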

UI Updates

  • Updated README.md to inform users about JSONL upload support
  • Added documentation about nested field flattening behavior

Test Files

  • Created test_events.jsonl - simple event data with nested fields
  • Created test_nested.jsonl - complex nested structures with arrays

Test Coverage

  • Added unit tests for JSONL processing
  • Tests for nested object flattening
  • Tests for nested array handling
  • Tests for mixed data types
  • Tests for empty lines and malformed JSON handling
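The empty-line and malformed-JSON behavior covered by these tests can be sketched as a line reader. `iter_jsonl` is a hypothetical helper, and recording skipped lines in an `errors` list is an assumed design choice. The PR only says malformed lines are skipped.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def iter_jsonl(lines: Iterable[str], errors: List[Tuple[int, str]]) -> Iterator[Any]:
    """Yield parsed values from JSONL lines, skipping blank and malformed lines.

    Each malformed line is recorded in `errors` as (line_number, message).
    """
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        try:
            yield json.loads(stripped)
        except json.JSONDecodeError as exc:
            errors.append((line_number, exc.msg))
```

Because it only needs an iterable of strings, the same function works on an open file object or on `text.splitlines()`.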

Key Implementation Details

  • Standard Library Only: Uses only Python's built-in json module
  • Full File Scan: Reads entire JSONL file to infer complete schema before processing
  • Configurable Delimiters: All delimiters stored in constants.py for easy updates
  • Robust Error Handling: Skips malformed lines, handles missing fields gracefully
  • Consistent with Existing Formats: Follows same pattern as CSV and JSON uploads
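The full-file scan exists because a field may first appear on any line. A minimal sketch of collecting the column set, assuming records have already been flattened, is an ordered union of keys. `collect_columns` is a hypothetical name, not the PR's `discover_jsonl_schema()`.

```python
from typing import Any, Dict, Iterable, List


def collect_columns(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Ordered union of keys across every (already flattened) record."""
    columns: Dict[str, None] = {}
    for record in records:
        for key in record:
            columns.setdefault(key, None)  # dicts preserve first-seen insertion order
    return list(columns)
```

With `[{"id": 1, "user__name": "Ada"}, {"id": 2}, {"id": 3, "meta__source": "api"}]`, the result is `["id", "user__name", "meta__source"]`. A schema inferred from the first line alone would miss `meta__source`.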

Closes #1

ADW ID: 00d4f54e


Note

Adds JSONL file upload support with nested flattening and full-file schema discovery, updates API handling, docs, and tests.

  • Backend:
    • JSONL Processing: Add convert_jsonl_to_sqlite() with full-file schema discovery via discover_jsonl_schema() and nested flattening via flatten_json_record() in core/file_processor.py.
    • Config: Introduce core/constants.py with NESTED_FIELD_DELIMITER and ARRAY_INDEX_DELIMITER used by the flattener.
    • API: Update /api/upload in server.py to accept .jsonl and route to the new converter.
  • Tests:
    • Add JSONL fixtures tests/assets/test_events.jsonl and tests/assets/test_nested.jsonl.
    • Extend tests/core/test_file_processor.py with unit tests for flattening, nested arrays/objects, schema evolution, and error cases.
  • Docs:
    • Update README.md to include .jsonl in upload capabilities and API notes.
    • Add spec specs/jsonl-file-upload-support.md and issue issue-1.md.
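The final step of `convert_jsonl_to_sqlite()`, loading flattened records into one table, can be sketched with the standard-library `sqlite3` module. `load_rows` and `quote_ident` are hypothetical helpers, and leaving the columns untyped is a simplification; the PR also performs type detection.

```python
import sqlite3
from typing import Any, Dict, List


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_rows(conn: sqlite3.Connection, table: str,
              columns: List[str], rows: List[Dict[str, Any]]) -> None:
    """Create `table` with one column per discovered field and insert every row.

    Fields missing from a row are inserted as NULL (dict.get returns None).
    """
    if not columns:
        raise ValueError("no columns discovered; nothing to create")
    column_sql = ", ".join(quote_ident(c) for c in columns)
    table_sql = quote_ident(table)
    # Columns are left untyped; SQLite accepts column definitions without a type.
    conn.execute(f"CREATE TABLE {table_sql} ({column_sql})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table_sql} ({column_sql}) VALUES ({placeholders})",
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
```

Quoting identifiers matters here because flattened column names come straight from user data.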

Written by Cursor Bugbot for commit 53543b6. This will update automatically on new commits.

JAVince mentioned this pull request Dec 7, 2025

cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


else:
    # Base case: primitive value (string, number, boolean, None)
    items[parent_key] = obj


Bug: Flattening top-level primitives creates empty-string column name

The flatten_json_record function handles top-level primitive values (non-dict, non-list) by storing them with parent_key as the key. When called with a top-level primitive and empty parent_key (e.g., a JSONL line containing just 123 or "hello"), this creates a dictionary entry with an empty string key {"": value}. This empty column name could cause database issues or unexpected behavior. While JSONL typically contains objects, the code doesn't validate this assumption before flattening, allowing malformed JSONL files to produce problematic schemas.
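One way to address this is to validate each parsed line before flattening. The following is a sketch, and `parse_jsonl_object` is a hypothetical helper. Returning `None` lets the caller skip or count the line. This sketch also rejects top-level lists, on the assumption that each JSONL record should map to exactly one row.

```python
import json
from typing import Any, Dict, Optional


def parse_jsonl_object(line: str) -> Optional[Dict[str, Any]]:
    """Parse one JSONL line; return None unless it holds a JSON object.

    Malformed JSON still raises json.JSONDecodeError for the caller to handle.
    """
    value = json.loads(line)
    # A bare 123 or "hello" would flatten to an empty-string column name;
    # lists are rejected too, assuming one object per row.
    if not isinstance(value, dict):
        return None
    return value
```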


"lockfileVersion": 3,
"requires": true,
"packages": {}
}

Bug: Accidentally committed empty package-lock.json in wrong directory

A nearly empty package-lock.json file was created at app/server/app/client/, which is an incorrect nested path. The actual client code lives at app/client/, not under app/server/. The file appears to have been created accidentally and should not be committed to the repository.


new_key = f"{parent_key}{NESTED_FIELD_DELIMITER}{key}" if parent_key else key
# Recursively flatten
flattened = flatten_json_record(value, new_key)
items.update(flattened)

Bug: Delimiter collision causes silent data loss during flattening

The flatten_json_record function uses items.update() to merge flattened results, which silently overwrites values when key collisions occur. If a JSON record contains a field name that already includes the delimiter (like "user__name") alongside a nested structure that flattens to the same key (like {"user": {"name": "value"}}), the later value overwrites the earlier one without warning. Similarly, fields like "items_0" will collide with {"items": ["value"]}. This can cause silent data loss when processing JSONL files with field names containing __ or _N patterns.
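One possible remedy is to make the flattener fail loudly instead of overwriting. This is a sketch, not the fix adopted in this PR. `flatten_strict` and `FlattenCollisionError` are hypothetical names, and the delimiter values are assumed from the PR description. Alternatives include renaming the colliding column or skipping the record.

```python
from typing import Any, Dict, Optional

NESTED_FIELD_DELIMITER = "__"  # assumed, per the PR description
LIST_INDEX_DELIMITER = "_"     # assumed, per the PR description


class FlattenCollisionError(ValueError):
    """Raised when two different paths flatten to the same column name."""


def flatten_strict(obj: Any, parent_key: str = "",
                   out: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Flatten nested dicts/lists, raising instead of silently overwriting a column."""
    if out is None:
        out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{NESTED_FIELD_DELIMITER}{key}" if parent_key else key
            flatten_strict(value, new_key, out)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flatten_strict(value, f"{parent_key}{LIST_INDEX_DELIMITER}{index}", out)
    else:
        # Writing into one shared dict lets us detect a key that was already produced.
        if parent_key in out:
            raise FlattenCollisionError(f"column {parent_key!r} produced more than once")
        out[parent_key] = obj
    return out
```

Both collisions described above, `"user__name"` next to `{"user": {"name": ...}}` and `"items_0"` next to `{"items": [...]}`, raise `FlattenCollisionError` with this sketch rather than losing a value.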


