Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return row counts for SQL ingestion (storage.sql.ingest()). #2059

Merged
merged 3 commits into from
Apr 30, 2024

Conversation

smerritt
Copy link
Contributor

D1 bills based on rows read and written, but sql.ingest() didn't return those. Now it does.

This required changing the return type of sql.ingest(), but it's an experimental API so that should be okay.

let sqlCode = "INSERT INTO tbl (col) VALUES (123); INSERT I";
let result = sql.ingest(sqlCode);

Old:

result == " INSERT I"

New:

result.remainder == " INSERT I"
result.meta.rowsRead == 0
result.meta.rowsWritten == 1

This "meta" attribute also gives us a convenient spot to stick additional information. For example, if we wanted to count how many SQL statements ingest() executed, we could easily add "result.meta.statementsExecuted" without breaking any existing users.

Implementation-wise, there might be an optimization opportunity around SqlStorage::IngestResult::getMeta(). Right now, it allocates a new Meta every time it's called, so if you do the obvious thing like so:

totalRead += result.meta.rowsRead
totalWritten += result.meta.rowsWritten
totalRedFish += result.meta.redFish // and so on

then that might end up allocating and discarding multiple Meta objects, and some caching could fix that. On the other hand, I'm not very familiar[1] with JSG internals, so maybe that's already happening somehow. I have no idea.

[1] This is a terrible understatement.

@geelen
Copy link
Contributor

geelen commented Apr 29, 2024

Thanks for looking at this @smerritt! I think this would actually work fine, but interested in following up on this idea (from internal chat):

At the SQLite level, we've got a pair of DB-wide row counters (read and written), which would be really handy here.
workerd's API has SqliteDatabase::Query() resetting the row counters, though, which sort of makes sense and sort of doesn't. I'd almost prefer if the DB-wide counters just kept on going up† and Query computed deltas.

Part of me wants DB-wide non-resetting counters to make it easier to do stuff like this, and part of me thinks that it'd also make it easy to compute garbage values, like if you had a function that was synchronous but got an async step wedged in the middle later on, your before/after row counts could end up garbage.

We already make some fairly strict assumptions around the SQL api being complete synchronous, in particular calling sql.databaseSize and executing SELECT total_changes() as changes, last_insert_rowid() as last_row_id before and after each D1 query (which could be a batch with multiple SQL queries inside) in order to calculate the meta param of the return object.

I don't think there's any real worry in adding sql.totalRowsRead and sql.totalRowsWritten as ever-incrementing counters (accessible from JS), then making Query read the value before and after to produce the value for getRowsRead() and getRowsWritten(). Then APIs like ingest (and whatever a future DO + SQL user might need) can simply read the value before and after each logical operation and return it to the user in whatever form makes sense.

At the very least, this removes any concept of meta from the .ingest API and makes the change fully backwards-compatible.

@kentonv
Copy link
Member

kentonv commented Apr 29, 2024

I think we intentionally didn't add DB-wide counters because queries can be running concurrently (you can have multiple cursors) and so these counters felt like a footgun.

Regarding the interface change to ingest(), my nitpick is I would not create a meta property, just flatten the counters into result itself. This is closer to what exec() returns -- a cursor which has rowsRead and rowsWritten as top-level properties.

@smerritt
Copy link
Contributor Author

Flattening the counters into result sounds good to me. It's still the case that we can add more properties and methods later on, so I'm happy.

@smerritt
Copy link
Contributor Author

Fixup: f583865

@smerritt smerritt force-pushed the smerritt/sql-ingest-row-counts branch from f583865 to a20dc3c Compare April 29, 2024 18:19
@smerritt
Copy link
Contributor Author

This is ready for review.

ingest's return type is now flat: it's got three properties: remainder, rowsRead, and rowsWritten.

As far as the DB-wide counters, Kenton's right: they're a footgun. If you compute deltas around a chunk of purely synchronous code, you'll get correct numbers, but if any asynchrony creeps in between the queries across which you are computing deltas, your deltas can be too large. Summing row counts across queries is much less error-prone. I think it's better to take the interface change now, while it's still experimental, and just deal with a bit of jank in the D1 worker for a few weeks.

@smerritt smerritt marked this pull request as ready for review April 29, 2024 21:34
@smerritt smerritt requested review from a team as code owners April 29, 2024 21:34
D1 bills based on rows read and written, but sql.ingest() didn't
return those. Now it does.

This required changing the return type of sql.ingest(), but it's an
experimental API so that should be okay.

  let sqlCode = "INSERT INTO tbl (col) VALUES (123); INSERT I";
  let result = sql.ingest(sqlCode);

Old:

  result == " INSERT I"

New:

  result.remainder == " INSERT I"
  result.rowsRead == 0
  result.rowsWritten == 1

This also gives us a convenient spot to stick
additional information. For example, if we wanted to count how many
SQL statements ingest() executed, we could easily add
"result.statementsExecuted" without breaking any existing users.

[1] This is a terrible understatement.
@smerritt smerritt force-pushed the smerritt/sql-ingest-row-counts branch from a20dc3c to 479edba Compare April 29, 2024 22:39
@smerritt
Copy link
Contributor Author

479edba is a rebase on main.

@geelen
Copy link
Contributor

geelen commented Apr 30, 2024

@smerritt ok, I'm happy with this. I can tweak the d1-worker stuff as necessary. Can I ask to add one more counter: numQueries? It turns out that was the one bit we used to show to users in Wrangler, that we've now lost due to no longer parsing the SQL locally:

image

Wrangler wants this so it can show users how many statements were
executed. This matches how it used to work when Wrangler parsed the
SQL locally.
@smerritt
Copy link
Contributor Author

@geelen Commit 0cc33b7 adds the statement count.

src/workerd/api/sql.h Outdated Show resolved Hide resolved
src/workerd/api/sql.h Outdated Show resolved Hide resolved
This removes a bunch of boilerplate C++. It does not change the JS
interface.
Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM

@smerritt smerritt merged commit 7737c74 into cloudflare:main Apr 30, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants