[RFC]: Lumilake 2026 Q2 Roadmap #7

### Motivation

This RFC outlines the development roadmap for Lumilake in Q2 2026. We are gathering ideas and feedback.

### Proposed change

#### Code health & security

- [x] CI improvement #8
  - Add `pytest-cov` to CI to track test coverage.
  - Stop ignoring `tests/server/` in the pytest CI run.

- [ ] Harden untrusted-input and failure paths
  - Strengthen the try-except blocks (`runtime/server.py:303,314,322,648,693`, `parser/n8n.py:457`) so hook failures stop being silently swallowed.
  - Replace regex-based SQL table extraction in `runtime_graph.py:1529–1533` and `909–911` with a parser or explicit `table_name` param (injection risk).
  - Sanitize `request_id` / `batch_id` before joining into archive paths in `runtime_manager/flowmesh.py:113` (path traversal).
  - Strip directory components in `_artifact_name_from_uri()` (`routes/jobs.py:673–685`) using `Path(name).name`.
  - Add max-body-size and YAML/JSON depth limits before parsing in `routes/jobs.py:140–154` (DoS via a 100 MB or deeply nested payload).
  - Fail fast in `parser/n8n.py:120–124` when a node has no `type` instead of silently passing the filter.
  - Add an explicit iteration cap to the n8n topo-sort loop (`parser/n8n.py:165–268`) so a progress-tracking bug can't hang the parser.
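The two path-related items above (archive path joins and artifact names) could be sketched roughly as follows. This is a minimal illustration, not the actual Lumilake code: the helper names `safe_id`, `artifact_name`, and `archive_path`, the allowed-character set, and the 128-character limit are all assumptions made for the example.

```python
import re
from pathlib import Path, PurePosixPath

# Hypothetical allow-list for identifiers; the real rule is up to the project.
_SAFE_ID = re.compile(r"[A-Za-z0-9_.-]{1,128}")


def safe_id(value: str) -> str:
    """Reject identifiers that could escape a directory when joined."""
    if value in {".", ".."} or not _SAFE_ID.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def artifact_name(uri: str) -> str:
    """Keep only the final path component of a URI-like string.

    PurePosixPath is used so "/" is the separator on every OS; the roadmap
    item mentions Path(name).name, which behaves the same on POSIX hosts.
    """
    name = PurePosixPath(uri.split("?", 1)[0]).name
    if not name or name in {".", ".."}:
        raise ValueError(f"no usable artifact name in {uri!r}")
    return name


def archive_path(root: Path, request_id: str, batch_id: str) -> Path:
    """Join validated ids under root and verify the result stays inside it."""
    root = root.resolve()
    candidate = (root / safe_id(request_id) / safe_id(batch_id)).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError("archive path escapes the archive root")
    return candidate
```

Validating the ids and then re-checking the resolved path is deliberately redundant: the allow-list catches malformed input early, and the containment check still holds if the allow-list is later relaxed.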

- [ ] Tighten runtime code quality
  - Remove `getattr` for known-optional fields in `halo_dp.py`; replace with typed `Optional` fields.
  - Unify duplicated parameter resolution for `_resolve_data_retrieval_params()` in `runtime_graph.py:848–889` vs `980–1018`.
  - Unify YAML and n8n parsers behind one IR; require explicit `FormatOp` instead of invisible auto-wrap.
  - Guard `_build_candidate_pool` against the `item_map[workflow_id]` race when items are dequeued between selection and access (`priority_queue.py:248`).
  - Assert / log when `finalize_workflows()` pops a missing workflow id (`priority_queue.py:161–163`).
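For the two `priority_queue.py` items above, one possible shape is a single `dict.get` / `dict.pop` lookup per id, so there is no window between a membership check and the indexing. The function signatures and the `item_map` layout below are assumptions for illustration; the real functions likely take different arguments.

```python
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger("priority_queue_sketch")  # hypothetical logger name

_MISSING = object()  # sentinel so a stored value of None is not mistaken for "absent"


def build_candidate_pool(selected_ids: Iterable[str], item_map: Dict[str, Any]) -> List[Any]:
    """Collect items for the selected ids, skipping ids dequeued in the meantime."""
    pool = []
    for workflow_id in selected_ids:
        item = item_map.get(workflow_id, _MISSING)  # one lookup, no KeyError
        if item is _MISSING:
            logger.debug("workflow %s was dequeued before access; skipping", workflow_id)
            continue
        pool.append(item)
    return pool


def finalize_workflows(workflow_ids: Iterable[str], item_map: Dict[str, Any]) -> List[str]:
    """Remove finished workflows and report ids that were not present."""
    missing = []
    for workflow_id in workflow_ids:
        if item_map.pop(workflow_id, _MISSING) is _MISSING:
            logger.warning("finalize_workflows: unknown workflow id %s", workflow_id)
            missing.append(workflow_id)
    return missing
```

If the map is mutated from multiple threads, the lookups would additionally need to run under whatever lock already protects the queue; this sketch only removes the check-then-index pattern.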

#### Usability — SDK, CLI, docs, errors

- [x] SDK / CLI polish & parity #10
- [x] Improve logs & error actionability #9
- [x] Fix gaps in doc #12

#### Dependencies

- [ ] Trim, pin, and consolidate dependencies #16

#### Generalizability

- [ ] Decouple the runtime graph builder from vllm / transformers / diffusers / omni + HF assumptions
  - Introduce a `ModelRegistry` / backend-strategy seam in `runtime/runtime_graph.py` (≈L478–519).
  - Replace free-form `data_spec` / `model_spec` / `inference_spec` dicts (≈L1148–1160) with Pydantic discriminated unions per `(backend, task_type)`.
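A backend-strategy seam of the kind described above might look like the following sketch, which keys builders by `(backend, task_type)`. Only the name `ModelRegistry` comes from the roadmap; the `register` / `build` methods, the builder signature, and the example backends' return values are assumptions, and typed specs (for example Pydantic models) would replace the plain `dict` in a fuller version.

```python
from typing import Any, Callable, Dict, Tuple

Key = Tuple[str, str]                       # (backend, task_type)
Builder = Callable[[Dict[str, Any]], Any]   # hypothetical builder signature


class ModelRegistry:
    """Maps (backend, task_type) to a builder so the graph code has no backend branches."""

    def __init__(self) -> None:
        self._builders: Dict[Key, Builder] = {}

    def register(self, backend: str, task_type: str) -> Callable[[Builder], Builder]:
        def decorator(fn: Builder) -> Builder:
            key = (backend, task_type)
            if key in self._builders:
                raise ValueError(f"builder already registered for {key}")
            self._builders[key] = fn
            return fn
        return decorator

    def build(self, backend: str, task_type: str, spec: Dict[str, Any]) -> Any:
        try:
            builder = self._builders[(backend, task_type)]
        except KeyError:
            raise LookupError(f"no builder for backend={backend!r}, task_type={task_type!r}") from None
        return builder(spec)


registry = ModelRegistry()


@registry.register("vllm", "text-generation")
def _build_vllm_text(spec: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder result; a real builder would construct a graph node.
    return {"engine": "vllm", "model": spec["model_id"]}


@registry.register("diffusers", "text-to-image")
def _build_diffusers_image(spec: Dict[str, Any]) -> Dict[str, Any]:
    return {"engine": "diffusers", "model": spec["model_id"]}
```

With this shape, supporting a new backend means registering one more builder rather than editing the graph builder itself.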

#### Performance

- [ ] Job manager: priority queue fairness
  - Bring `_apply_user_fairness()` down from O(N²) by precomputing `user_to_ids` (`priority_queue.py:390–441`).
  - Track oldest enqueue timestamp incrementally instead of scanning every queue on `get_pending_stats()` (`priority_queue.py:143–156`).
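To illustrate the `user_to_ids` idea, the sketch below groups workflow ids by user in one pass and then interleaves users round-robin. What `_apply_user_fairness()` actually computes is not stated here, so the round-robin policy, the `(workflow_id, user)` input shape, and the function names are assumptions; the point is only that the grouping is built once instead of rescanning the queue per item.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


def group_by_user(items: List[Tuple[str, str]]) -> Dict[str, Deque[str]]:
    """Build user -> workflow ids in a single O(N) pass, preserving queue order."""
    user_to_ids: Dict[str, Deque[str]] = defaultdict(deque)
    for workflow_id, user in items:
        user_to_ids[user].append(workflow_id)
    return user_to_ids


def fair_order(items: List[Tuple[str, str]]) -> List[str]:
    """Interleave workflows so each user gets one slot per round."""
    # dicts preserve insertion order, so users are visited in first-seen order.
    pending = deque(group_by_user(items).values())
    order: List[str] = []
    while pending:
        user_queue = pending.popleft()
        order.append(user_queue.popleft())
        if user_queue:                 # user still has work: back of the line
            pending.append(user_queue)
    return order
```

Every id is appended and popped exactly once, so the whole pass is linear in the number of queued workflows.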

- [ ] Query Optimizer
  - Skip the redundant topo sort in graph rewriting and memoize `remap()` on graph prefixing (`runtime_graph.py:167–237`).
  - Canonicalize + intern state tuples in Halo-DP to avoid blow-up on deep graphs (`runtime/optimizer/schedule/halo_dp.py`).
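Canonicalizing and interning state tuples could be sketched as below. It assumes a DP state is an unordered set of node ids represented as strings, which may not match how Halo-DP actually encodes states; `StateInterner` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

State = Tuple[str, ...]


class StateInterner:
    """Returns one shared tuple object per distinct (order-insensitive) state."""

    def __init__(self) -> None:
        self._pool: Dict[State, State] = {}

    def intern(self, members: Iterable[str]) -> State:
        # Canonical form: deduplicated and sorted, so permutations collapse.
        canonical = tuple(sorted(set(members)))
        # setdefault returns the already-stored tuple when an equal one exists.
        return self._pool.setdefault(canonical, canonical)

    def __len__(self) -> int:
        return len(self._pool)
```

Because equal states share one object, memo tables keyed by state stay smaller, and equality checks between interned states can fall back to identity.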

- [ ] Storage and scheduling I/O
  - Batch S3 artifact writes (tar/zip + multipart) instead of one stat + get/put per artifact (`utils/job_storage.py:35–72`).
  - Replace the `LUMILAKE_POLL_INTERVAL_SECONDS` sleep loop with an event / condition variable for worker availability (`runtime/server.py:673–747`).
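The polling-loop item above could be approached with a condition variable along these lines. The sketch assumes a thread-based server; if `runtime/server.py` is asyncio-based, `asyncio.Condition` would be the analogous primitive. `WorkerPool`, `release`, and `acquire` are hypothetical names.

```python
import threading
from collections import deque
from typing import Deque, Optional


class WorkerPool:
    """Idle-worker queue where waiters block until a worker is released."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._idle: Deque[str] = deque()

    def release(self, worker_id: str) -> None:
        """Mark a worker as available and wake one waiting scheduler thread."""
        with self._cond:
            self._idle.append(worker_id)
            self._cond.notify()

    def acquire(self, timeout: Optional[float] = None) -> Optional[str]:
        """Block until a worker is idle; return None if the timeout expires."""
        with self._cond:
            # wait_for re-checks the predicate after every wakeup, which
            # handles spurious wakeups and competing waiters.
            if not self._cond.wait_for(lambda: bool(self._idle), timeout=timeout):
                return None
            return self._idle.popleft()
```

Compared with a fixed sleep interval, the waiter wakes as soon as `release` is called, and an idle system does no periodic work.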

### Alternatives considered

_No response_

### Migration / compatibility

_No response_

### Feedback period

_No response_

### CC list

_No response_

### Before submitting

- [x] I have searched existing issues and confirmed this is not a duplicate.
