From 54611e3fbe4ce792a457a0397b22cff8f5a4e98f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Oberm=C3=BCller?= Date: Tue, 25 Feb 2025 22:23:51 +0100 Subject: [PATCH 1/3] Add RFC for query migrations --- .../2025-02-25-query-migrations.md | 81 +++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 requests-for-comments/2025-02-25-query-migrations.md diff --git a/requests-for-comments/2025-02-25-query-migrations.md b/requests-for-comments/2025-02-25-query-migrations.md new file mode 100644 index 0000000..952e7e0 --- /dev/null +++ b/requests-for-comments/2025-02-25-query-migrations.md @@ -0,0 +1,81 @@ +# Request for comments: Query Migrations + +## Problem statement + +_Who are we building for, what are their needs (include example cases), why is this important?_ + +We'd like to evolve the [query schema](https://github.com/PostHog/posthog/blob/master/frontend/src/queries/schema/index.ts) while keeping backwards compatibility for existing queries. + +## Context + +_What are our competitors doing, what are the technical constraints, what are customers asking for, what does the data tell us, are there external motivations (e.g. launch week, enterprise contract)?_ + +Queries are persisted in various places: + +1. The `query` field of the Insights model. +2. Local storage for the "query draft" feature. +3. The url for shareable insight links. +4. Notebook nodes (and canvas element). +5. The activity log, when changing a query. +6. Queries that are converted from `filters` e.g. when users create an insight by directly calling the api. + +Some of these places are available frontend-side (queries from a url) and some are available backend-side (queries from an insight model). + +This makes it difficult to migrate a query with a not backwards compatibly change e.g. converting a boolean to an enum. Backward compatible changes i.e. adding an optional field can still be done. + +Any errors in a query cause Pydantic to fail the validation and thus insights with the error will fail to compute. + +## Design + +_What are the key user experience and technical design decisions / trade-offs?_ + +Migrating the schema from one version to the next should be as easy as a database migration for developers. + +### Option a) Add a backend side migration and allow the frontend to use this in a new endpoint + +We could add only a backend side migration and let the frontend call a `/query/validation` endpoint that also migrates queries to newer versions. + +#### Pros + +- Only one place where we'd need to add a new migration +- Probably the easiest one to maintain + +#### Cons + +- The frontend needs to do an additional round-trip whenever a new query is encountered +- Might be hard to implement frontend side due to the async nature e.g. when migrating notebook nodes + +### Option b) Add a backend side and frontend side migration + +#### Pros + +- This is how we are currently doing it / trying to do it +- No round-trip to the backend necessary for new queries + +#### Cons + +- We need to maintain a frontend side and backend side migration. It'll be hard to keep them in sync +- We don't have immediate feedback from the Pydantic validation on validity of queries that are migrated frontend side + +### Option c) Allow only backward compatible changes + +#### Pros + +- Easiest to implement +- Easiest to maintain + +#### Cons + +- We can only add optional fields +- Technical debt will increase + +I'm favoring option a, where we add a backend side migration and call that from the frontend, as it seems to be the easiest (and as such, most scalable) solution for future developers adding query migrations. + +## Implementation + +- Get in touch with customers to deprecate `filters` based insights (already happening). Remove all remaining `filters` conversions. + +tbd: + +- Should we add a version field to the queries, so that we know which migrations to run +- Should we keep migrations forever or should we have a system in place that discards frontend side queries that are older than a certain time From 5969ef7d017b16297c6321356bbae1668f7e985a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Oberm=C3=BCller?= Date: Tue, 22 Apr 2025 09:46:47 +0200 Subject: [PATCH 2/3] Update 2025-02-25-query-migrations.md --- requests-for-comments/2025-02-25-query-migrations.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/requests-for-comments/2025-02-25-query-migrations.md b/requests-for-comments/2025-02-25-query-migrations.md index 952e7e0..62e45ec 100644 --- a/requests-for-comments/2025-02-25-query-migrations.md +++ b/requests-for-comments/2025-02-25-query-migrations.md @@ -2,14 +2,10 @@ ## Problem statement -_Who are we building for, what are their needs (include example cases), why is this important?_ - We'd like to evolve the [query schema](https://github.com/PostHog/posthog/blob/master/frontend/src/queries/schema/index.ts) while keeping backwards compatibility for existing queries. ## Context -_What are our competitors doing, what are the technical constraints, what are customers asking for, what does the data tell us, are there external motivations (e.g. launch week, enterprise contract)?_ - Queries are persisted in various places: 1. The `query` field of the Insights model. @@ -27,8 +23,6 @@ Any errors in a query cause Pydantic to fail the validation and thus insights wi ## Design -_What are the key user experience and technical design decisions / trade-offs?_ - Migrating the schema from one version to the next should be as easy as a database migration for developers. ### Option a) Add a backend side migration and allow the frontend to use this in a new endpoint From 8f1d9d804aa8b6665d759a2e5da77c3ce15561e4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Oberm=C3=BCller?= Date: Tue, 22 Apr 2025 09:51:06 +0200 Subject: [PATCH 3/3] Update 2025-02-25-query-migrations.md --- .../2025-02-25-query-migrations.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/requests-for-comments/2025-02-25-query-migrations.md b/requests-for-comments/2025-02-25-query-migrations.md index 62e45ec..6a03a13 100644 --- a/requests-for-comments/2025-02-25-query-migrations.md +++ b/requests-for-comments/2025-02-25-query-migrations.md @@ -14,10 +14,11 @@ Queries are persisted in various places: 4. Notebook nodes (and canvas element). 5. The activity log, when changing a query. 6. Queries that are converted from `filters` e.g. when users create an insight by directly calling the api. +7. ...? _(let me know)_ -Some of these places are available frontend-side (queries from a url) and some are available backend-side (queries from an insight model). +Some of these places are available frontend-side (e.g. queries in a url) and some are available backend-side (e.g. queries in an insight model). -This makes it difficult to migrate a query with a not backwards compatibly change e.g. converting a boolean to an enum. Backward compatible changes i.e. adding an optional field can still be done. +This makes it difficult to migrate a query with a non-backwards compatibly change e.g. converting a boolean to an enum. Backward compatible changes e.g. adding an optional field are no problem. Any errors in a query cause Pydantic to fail the validation and thus insights with the error will fail to compute. @@ -65,11 +66,15 @@ We could add only a backend side migration and let the frontend call a `/query/v I'm favoring option a, where we add a backend side migration and call that from the frontend, as it seems to be the easiest (and as such, most scalable) solution for future developers adding query migrations. +### Option d) Something easy I'm not seeing right now? + +- Let me know. + ## Implementation - Get in touch with customers to deprecate `filters` based insights (already happening). Remove all remaining `filters` conversions. +- Implement the above. -tbd: - +To be discusssed: - Should we add a version field to the queries, so that we know which migrations to run - Should we keep migrations forever or should we have a system in place that discards frontend side queries that are older than a certain time