perf(docs write --tab --markdown): table cells inserted one-per-batchUpdate; multi-table docs trip Docs API write quota #699

@sebsnyk

Symptom

gog docs write --replace --markdown --tab=<id> issues one Docs API batchUpdate per table cell, not one per table or per body. On a doc with ~400 cells across multiple tables, that is ~400 batchUpdate calls. The per-minute Docs API write quota is 60 requests per user, so a moderately table-heavy markdown body cannot land in a single uninterrupted push. Mid-push 429s leave the doc in a partial state: the outer body lands, cells in the early tables land, cells in the later tables are silently absent, and the downstream image / chip / style steps never run.

The per-cell calls were surfaced by a downstream markdown-import workflow pushing a ~40 KB body with 4 tables: pushes consistently hit 429 after ~3 minutes, and the resulting Doc shows blank cells in the trailing tables, un-replaced inline image placeholders, and zero Person chips because the rest of the pipeline never runs.
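
The quota arithmetic above can be checked with a few lines of Go. This is an illustrative sketch only: minMinutes is a hypothetical helper, not part of gog, and it assumes the 60-writes-per-minute-per-user figure quoted above with each batchUpdate counting as one write.

```go
package main

import "fmt"

// minMinutes returns the smallest number of one-minute quota windows needed
// to issue `calls` write requests when at most `perMinute` fit in one window.
// Hypothetical helper for illustration; not part of gog.
func minMinutes(calls, perMinute int) int {
	if calls <= 0 {
		return 0
	}
	return (calls + perMinute - 1) / perMinute // integer ceil(calls/perMinute)
}

func main() {
	fmt.Println(minMinutes(400, 60)) // per-cell calls for ~400 cells: 7
	fmt.Println(minMinutes(1, 60))   // the same body as a single batch: 1
}
```

Even with perfect pacing, ~400 per-cell calls span at least seven quota windows, whereas a single batch fits in one.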

Repro

# Make a markdown file with one non-trivial table (68 cells here; any cell count >60 will do)
cat > /tmp/burst.md <<'MD'
| A | B | C | D |
| :- | :- | :- | :- |
| 1 | 2 | 3 | 4 |
| 5 | 6 | 7 | 8 |
| 9 | 10 | 11 | 12 |
| 13 | 14 | 15 | 16 |
| 17 | 18 | 19 | 20 |
| 21 | 22 | 23 | 24 |
| 25 | 26 | 27 | 28 |
| 29 | 30 | 31 | 32 |
| 33 | 34 | 35 | 36 |
| 37 | 38 | 39 | 40 |
| 41 | 42 | 43 | 44 |
| 45 | 46 | 47 | 48 |
| 49 | 50 | 51 | 52 |
| 53 | 54 | 55 | 56 |
| 57 | 58 | 59 | 60 |
| 61 | 62 | 63 | 64 |
MD

DOC=$(gog docs create "cell-burst test" --pageless --file /tmp/burst.md | grep ^id | awk '{print $2}')
gog docs tabs add "$DOC" --title "Tab 2"
TAB=$(gog docs list-tabs "$DOC" | awk 'NR==3{print $1}')

# Push the same markdown into Tab 2 via the per-tab markdown converter:
gog docs write "$DOC" --replace --markdown --tab="$TAB" --file /tmp/burst.md
# Observe many sequential batchUpdate calls; on a real-world multi-table doc this
# trips the 60/min Docs API write quota and partial state results.

To see the burst directly, run tcpdump -i any -A 'host docs.googleapis.com' during the push (the traffic is TLS-encrypted, so this shows the timing of the requests rather than their contents), or watch the open doc in a browser tab: cells visibly populate one at a time over 30+ seconds rather than landing instantly.

Root cause

internal/cmd/docs_table_inserter.go, lines 80-107:

// Step 4: Insert text into each cell
for rowIdx := 0; rowIdx < len(cells); rowIdx++ {
    for colIdx := 0; colIdx < len(cells[rowIdx]); colIdx++ {
        ...
        _, err := ti.svc.Documents.BatchUpdate(ti.docID, &docs.BatchUpdateDocumentRequest{
            Requests: requests,
        }).Context(ctx).Do()
        ...
        ti.updateIndicesAfter(cellIdx, insertedLen, cellIndices, &tableEndIndex)
    }
}

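Each cell's InsertText plus its per-cell formatting goes out as its own batchUpdate. The index tracking after each call (updateIndicesAfter) is the manual bookkeeping that makes the per-cell loop work. That bookkeeping does not need a network round trip, though: the Docs API applies the requests in a batch in order, and each request sees the document state left by the one before it. The same index arithmetic can therefore be done locally while building a single request list, or avoided altogether by inserting cell text from the last cell to the first so that the indices still to be used never shift.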
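
A toy model of that sequential behaviour, as a sketch rather than gog's or the Docs API's actual code: the document is a plain ASCII string, and insert and apply are hypothetical stand-ins for an InsertText request and for a batch being applied in order. Real Docs indices count UTF-16 code units and real table cells carry structural offsets, both of which are ignored here.

```go
package main

import "fmt"

// insert is a hypothetical stand-in for one InsertText request.
type insert struct {
	Index int
	Text  string
}

// apply runs the inserts in order against doc; each insert sees the result
// of the previous one, mimicking how requests in one batch are applied.
func apply(doc string, reqs []insert) string {
	for _, r := range reqs {
		doc = doc[:r.Index] + r.Text + doc[r.Index:]
	}
	return doc
}

func main() {
	doc := "[][][]" // three empty "cells"; their insertion points are 1, 3, 5

	// Forward order: every index must be shifted by the text inserted so far.
	forward := []insert{{1, "aa"}, {3 + 2, "b"}, {5 + 2 + 1, "ccc"}}

	// Reverse order: the original indices stay valid, no shifting needed.
	reverse := []insert{{5, "ccc"}, {3, "b"}, {1, "aa"}}

	fmt.Println(apply(doc, forward)) // [aa][b][ccc]
	fmt.Println(apply(doc, reverse)) // [aa][b][ccc]
}
```

Both orderings produce the same document from a single ordered list, which suggests the per-cell round trips are not required for correct indexing.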

internal/cmd/docs_mutation.go:198 insertDocsMarkdownAt issues an additional batchUpdate for the non-table body, then iterates tables and calls TableInserter.InsertNativeTable per table — each table is also its own pair of batchUpdates (table structure first via InsertTableRequest, then the per-cell loop above). Image insertion does its own batchUpdate after.

Expected behaviour

For a gog docs write --replace --markdown --tab=<id> invocation, all the body + table structure + per-cell text + image-placeholder + formatting requests should land in ONE documents.batchUpdate (or at most one per ~500-request chunk per the Docs API limit). The Docs API processes requests sequentially within a batch and interprets each request's indices against the document state left by the preceding requests, so the offsets that updateIndicesAfter currently tracks between calls can be computed up front while the request list is built. No per-cell round trip is needed.

For the burst-repro markdown above (one 17-row × 4-col table = 68 cells), this would change the wire profile from ~70 batchUpdate calls to 1 batchUpdate (with ~70 InsertText + N formatting requests in its requests array, well under the 500-request cap).
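
One possible shape for the per-cell part of that single request list, sketched under stated assumptions: textInsert and flattenCells are hypothetical names, the cell start indices are taken as given (made up in main), and they are assumed to increase in row-major order. Nothing here is gog's actual code.

```go
package main

import "fmt"

// textInsert is a hypothetical stand-in for one InsertText request.
type textInsert struct {
	Index int
	Text  string
}

// flattenCells turns a grid of cell texts plus each cell's start index into
// one request list ordered last cell first, so that an insert never shifts
// the index of a request that comes after it in the list.
// Empty cells produce no request.
func flattenCells(cells [][]string, startIdx [][]int) []textInsert {
	var reqs []textInsert
	for r := len(cells) - 1; r >= 0; r-- {
		for c := len(cells[r]) - 1; c >= 0; c-- {
			if cells[r][c] == "" {
				continue
			}
			reqs = append(reqs, textInsert{Index: startIdx[r][c], Text: cells[r][c]})
		}
	}
	return reqs
}

func main() {
	const rows, cols = 17, 4 // the repro table: 1 header row + 16 data rows
	cells := make([][]string, rows)
	startIdx := make([][]int, rows)
	next := 0 // made-up, strictly increasing indices for illustration only
	for r := range cells {
		cells[r] = make([]string, cols)
		startIdx[r] = make([]int, cols)
		for c := range cells[r] {
			cells[r][c] = fmt.Sprintf("r%dc%d", r, c)
			next += 2
			startIdx[r][c] = next
		}
	}
	reqs := flattenCells(cells, startIdx)
	fmt.Println(len(reqs)) // 68 requests that could share one batch
}
```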

Acceptance criteria

  • A markdown body with N table cells and M paragraph elements lands in O(1) batchUpdate calls (split only when total requests exceed the Docs API 500-request cap; in that case auto-split into ceil(total/500) batches with a stderr note per split).
  • Mid-push 429s no longer leave partial table content when the body fits in one batch (a batchUpdate is applied atomically, so either all of its requests land or none do).
  • The --no-network-replay testing harness (the existing tests under docs_write_markdown_*_test.go) sees exactly 1 batchUpdate per gog docs write --replace --markdown --tab=... invocation (zero tables) and ceil(total_requests/500) batchUpdates for table-heavy bodies.
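
The auto-split in the first criterion could look roughly like the sketch below. splitBatches and maxPerBatch are hypothetical names; the 500 figure is the cap quoted in this issue, and the wording of the stderr note is made up.

```go
package main

import (
	"fmt"
	"os"
)

// maxPerBatch is the per-batch request cap quoted in this issue.
const maxPerBatch = 500

// splitBatches cuts reqs into consecutive slices of at most size elements,
// preserving order. It yields ceil(len(reqs)/size) batches, and no batches
// for an empty input. Hypothetical helper; not part of gog.
func splitBatches[T any](reqs []T, size int) [][]T {
	if size <= 0 {
		size = maxPerBatch
	}
	var out [][]T
	for len(reqs) > 0 {
		n := size
		if len(reqs) < n {
			n = len(reqs)
		}
		out = append(out, reqs[:n])
		reqs = reqs[n:]
	}
	return out
}

func main() {
	reqs := make([]int, 1203) // placeholder for 1203 queued requests
	batches := splitBatches(reqs, maxPerBatch)
	for i, b := range batches {
		if len(batches) > 1 {
			fmt.Fprintf(os.Stderr, "note: batch %d/%d (%d requests)\n", i+1, len(batches), len(b))
		}
	}
	fmt.Println(len(batches)) // 3
}
```

Because order is preserved, a request list built for sequential application stays valid across the split. Atomicity then holds per batch only, so a failure between batches could still leave partial content for bodies that exceed the cap.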

Adjacent gogcli issues on the same --tab path (same per-tab markdown converter)

Three independent bugs in the same code path suggest the per-tab converter would benefit from a focused rewrite that emits a single request list and submits one batchUpdate.
