Skip to content

Database configuration and migration #42

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 28 commits into
base: develop
Choose a base branch
from
Draft

Conversation

JanJakes
Copy link
Contributor

@JanJakes JanJakes commented Apr 15, 2025

This PR extracts existing SQLite database configuration to a WP_SQLite_Configurator and implements WP_SQLite_Information_Schema_Reconstructor with the ability to backfill missing table data in information schema.

How it works:

  1. The configurator checks the SQLite driver version against a version stored in the database, and if the driver is ahead, it will execute the database (re)configuration (see WP_SQLite_Configurator::ensure_database_configured()), while acquiring an EXCLUSIVE lock to prevent race conditions (multiple requests triggering the same migration).
  2. When the configurator runs, it ensures that any internal tables are created (global variables, information schema), and then it proceeds to the WP_SQLite_Information_Schema_Reconstructor.
  3. The reconstructor then compares existing SQLite tables against tables recorded in the information schema, backfills those that are missing, and removes those that no longer exist.
  4. To backfill a table when in WordPress, and it is a WordPress table, the definition is taken from wp_get_db_schema().
  5. To backfill a non-WordPress table, the information schema data is derived from _mysql_data_types_cache, if it exists, otherwise directly from the SQLite schema using SQLite column affinity rules.

@JanJakes JanJakes force-pushed the driver-migration branch 2 times, most recently from 6148a0b to 2250ed3 Compare April 15, 2025 15:13
@JanJakes JanJakes force-pushed the driver-migration branch 4 times, most recently from 7d2c0c1 to 6e94ae8 Compare April 17, 2025 13:42
@JanJakes JanJakes requested a review from adamziel April 17, 2025 14:51
@adamziel
Copy link
Contributor

adamziel commented Apr 17, 2025

Great start! And also a good idea to use a cascade of data sources for backfilling. There's some more way to go, though — I've commented on a bunch of potential issues


jobs:
verify-version:
name: Verify that plugin version matches SQLITE_DRIVER_VERSION constant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this slightly longer name?

Suggested change
name: Verify that plugin version matches SQLITE_DRIVER_VERSION constant
name: Assert the WordPress plugin header declares the same version as the SQLITE_DRIVER_VERSION constant

$parser = new WP_MySQL_Parser( self::$mysql_grammar, $tokens );
$ast = $parser->parse();

// TODO: Translate and execute all queries in the SQL input string.
Copy link
Contributor

@adamziel adamziel Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need to support that:

Does this mean WordPress actually doesn't support multiple queries in here?

Also, dbDelta supports multiple queries by exploding the string by ; 🙈 The query parser would make it much safer once we can contribute to core again. cc @dmsnell

Also, it would be useful to have a first-class API for looping over multiple queries in a string – or even in a stream. It's not for this PR for sure, but it seems like we have all the parts that we need. We'd just need to detect when it fails due to a syntax error vs incomplete input and add a method such as append_bytes() and presto. Then we could just run any, arbitrarily large mysql dump.

@@ -2182,7 +2236,7 @@ private function translate( $node ): ?string {
'%s AS %s',
$value,
$this->quote_sqlite_identifier(
'@@' . ( $type_token ? "$type_token->value." : '' ) . $original_name
'@@' . ( $type_token ? "{$type_token->get_value()}." : '' ) . $original_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: slightly clearer code style

Suggested change
'@@' . ( $type_token ? "{$type_token->get_value()}." : '' ) . $original_name
'@@' . ( $type_token ? $type_token->get_value().'' : '' ) . $original_name

@@ -3481,10 +3540,9 @@ private function unquote_sqlite_identifier( string $quoted_identifier ): string
$first_byte = $quoted_identifier[0] ?? null;
if ( '"' === $first_byte || '`' === $first_byte ) {
$unquoted = substr( $quoted_identifier, 1, -1 );
} else {
$unquoted = $quoted_identifier;
return str_replace( $first_byte . $first_byte, $first_byte, $unquoted );
Copy link
Contributor

@adamziel adamziel Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a state machine for this? Or is this actually this easy for all possible scenarios? E.g. quadruple""""quotes, backslashes without \""NO_BACKSLASH_ESCAPES\"" etc.?

FROM sqlite_schema
WHERE type = 'table'
AND name != ?
AND name NOT LIKE ? ESCAPE '\'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about ESCAPE

* @return string[] The names of tables in the information schema.
*/
private function get_information_schema_table_names(): array {
$tables_table = $this->schema_builder->get_table_name( false, 'tables' );
Copy link
Contributor

@adamziel adamziel Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always nervous about having to remember which calls produce a contextually-safe part of the query and which don't. Is there anything we could do to the name to make this clear? As in get_quoted_table_name or get_escaped_table_name() or so?

Alternatively – and for the future – we might want to consider our own flavor of prepared statements that supports binding table names. But that's way beyond the scope here.

Tangentially related, @dmsnell taught me that backticks can be a part of MySQL tables names! So a table can be called wp_plugin_`meta`_data

*
* @return array<string, WP_Parser_Node> The WordPress CREATE TABLE statements.
*/
private function get_wp_create_table_statements(): array {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private function generate_create_table_statement( string $table_name ): string {
// Columns.
$columns = $this->driver->execute_sqlite_query(
sprintf( 'PRAGMA table_xinfo("%s")', $table_name )
Copy link
Contributor

@adamziel adamziel Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto my point from earlier about safe vs unsafe inputs. Can we either explicitly quote this, even if we don't have to, just to make tell the future reader "yup, all safe, nothing to look for here"?


// Key columns.
$key_columns = $this->driver->execute_sqlite_query(
'SELECT * FROM pragma_index_info("' . $key_info['name'] . '")'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double quote is a valid part of a table name and perhaps also a key name in SQLite, let's quote eagerly.

sqlite> create table "tricky_""table""_name" (id int);
sqlite> .tables
tricky_"table"_name

}

// Unquoted string literal. E.g.: abc
return $this->escape_mysql_string_literal( $default_value );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this happen in practice? Or should this be an error? I'm worried about converting things like DEFAULT NOW() to DEFAULT "NOW()"

}

/**
* 5. Otherwise, the affinity is NUMERIC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great documentation

Comment on lines +611 to +614
private function escape_mysql_string_literal( string $literal ): string {
// See: https://www.php.net/manual/en/mysqli.real-escape-string.php
return "'" . addcslashes( $literal, "\0\n\r'\"\Z" ) . "'";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsnell can you poke any holes in this? We assume everything is UTF-8, and if it isn't then GIGO.

* @return string The quoted identifier value.
*/
private function quote_sqlite_identifier( string $unquoted_identifier ): string {
return '`' . str_replace( '`', '``', $unquoted_identifier ) . '`';
Copy link
Contributor

@adamziel adamziel Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you told me once that double backticks are the only way to escape a backtick so this seems fine. Can we mention that in the docstring with a documentation link? It just looks suspiciously easy and I don't want to mislead anyone in the future into searching for a security issue here.

$first_byte = $quoted_identifier[0] ?? null;
if ( '"' === $first_byte || '`' === $first_byte ) {
$unquoted = substr( $quoted_identifier, 1, -1 );
return str_replace( $first_byte . $first_byte, $first_byte, $unquoted );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto my state machine question from earlier. Should we use the parser for this?

@adamziel
Copy link
Contributor

I left some more notes. We're getting close here. Thank you for persevering @JanJakes! This is the last stretch before we can finally enjoy the new parser everywhere ❤️ I know I'm going deep on a piece of code we only need for a one-off migration, but a) it still matters and b) I think we'll reuse it later on for migrating between the database backends.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants