
feat(uptime): snuba table and configuration for new dataset #6686

Closed · wants to merge 20 commits

Conversation

@JoshFerge (Member) commented Dec 18, 2024

github-actions bot commented Dec 18, 2024

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py

-- start migrations

-- forward migration uptime_monitor_checks : 0001_uptime_monitor_checks
Local op: CREATE TABLE IF NOT EXISTS uptime_monitor_checks_local (organization_id UInt64, project_id UInt64, environment LowCardinality(Nullable(String)), uptime_subscription_id UInt64, uptime_check_id UUID, scheduled_check_time DateTime, timestamp DateTime, _sort_timestamp DateTime, duration UInt64, region_id Nullable(UInt16), check_status LowCardinality(String), check_status_reason LowCardinality(Nullable(String)), http_status_code UInt16, trace_id UUID, retention_days UInt16) ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/uptime_monitor_checks/{shard}/default/uptime_monitor_checks_local', '{replica}') PRIMARY KEY (organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id) ORDER BY (organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id) PARTITION BY (toMonday(_sort_timestamp)) TTL _sort_timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS uptime_monitor_checks_dist (organization_id UInt64, project_id UInt64, environment LowCardinality(Nullable(String)), uptime_subscription_id UInt64, uptime_check_id UUID, scheduled_check_time DateTime, timestamp DateTime, _sort_timestamp DateTime, duration UInt64, region_id Nullable(UInt16), check_status LowCardinality(String), check_status_reason LowCardinality(Nullable(String)), http_status_code UInt16, trace_id UUID, retention_days UInt16) ENGINE Distributed(`cluster_one_sh`, default, uptime_monitor_checks_local, cityHash64(reinterpretAsUInt128(trace_id)));
Local op: ALTER TABLE uptime_monitor_checks_local ADD INDEX IF NOT EXISTS bf_trace_id trace_id TYPE bloom_filter GRANULARITY 1;
-- end forward migration uptime_monitor_checks : 0001_uptime_monitor_checks




-- backward migration uptime_monitor_checks : 0001_uptime_monitor_checks
Distributed op: DROP TABLE IF EXISTS uptime_monitor_checks_dist;
Local op: DROP TABLE IF EXISTS uptime_monitor_checks_local;
-- end backward migration uptime_monitor_checks : 0001_uptime_monitor_checks

) -> anyhow::Result<(Vec<UptimeMonitorCheckRow>, f64)> {
    // Deserialize the Kafka payload into the uptime check message before
    // mapping it onto ClickHouse rows.
    let monitor_message: UptimeMonitorCheckMessage = serde_json::from_slice(payload)?;

    let rows = vec![UptimeMonitorCheckRow {

Comment on lines 56 to 58
    # do i actually need primary key to be different than sorting key?
    primary_key="(organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id)",
    order_by="(organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id)",
@JoshFerge (Member Author) commented Dec 18, 2024

For our query pattern, it probably makes sense to include project_id and subscription_id in the order_by key; uptime_check_id corresponds to event_id.

local_table_name = f"{table_prefix}_local"
dist_table_name = f"{table_prefix}_dist"

## what about all the fancy codecs? do we need those?
@JoshFerge (Member Author):

Should I use all the codecs that the spans table uses? Or does it matter?

Contributor:

Start with no codec and we'll see where we need them. By default, everything is ZSTD(1).
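
If codecs do become necessary later, they could be layered onto individual columns. A minimal sketch, assuming the MigrationModifiers type used elsewhere in Snuba migrations accepts a codecs list (hypothetical, not part of this PR):

    # Hypothetical per-column codec overrides, layered before the default ZSTD.
    from snuba.clickhouse.columns import Column, DateTime, UInt
    from snuba.migrations.columns import MigrationModifiers as Modifiers

    extra_codec_columns = [
        # DoubleDelta tends to work well for monotonically increasing timestamps.
        Column("_sort_timestamp", DateTime(Modifiers(codecs=["DoubleDelta", "ZSTD(1)"]))),
        # T64 is a common choice for integers with a narrow value range.
        Column("duration", UInt(64, Modifiers(codecs=["T64", "ZSTD(1)"]))),
    ]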

@JoshFerge (Member Author):

Should I split this PR up? What would be the right way to do that? Could I leave the processor off the config to start?

@JoshFerge marked this pull request as ready for review December 18, 2024 17:36
@JoshFerge requested a review from a team as a code owner December 18, 2024 17:36
@JoshFerge requested a review from a team December 18, 2024 17:36
@JoshFerge assigned phacops and unassigned phacops Dec 18, 2024
Member:

What's the script for?

@JoshFerge (Member Author):

I'm using it to load in data locally to test row size, compression ratio, and querying the data using the EAP API.
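
For context, a rough sketch of what such a loading script could look like (hypothetical, not the script in this PR), assuming a local ClickHouse and the uptime_monitor_checks_local table from the migration above:

    # Hypothetical local-loading sketch: insert fake uptime check rows and
    # inspect per-column compression.
    import random
    import uuid
    from datetime import datetime, timedelta, timezone

    from clickhouse_driver import Client

    client = Client("localhost")
    now = datetime.now(timezone.utc).replace(tzinfo=None)

    rows = [
        (
            1,                           # organization_id
            1,                           # project_id
            "production",                # environment
            42,                          # uptime_subscription_id
            uuid.uuid4(),                # uptime_check_id
            now - timedelta(minutes=i),  # scheduled_check_time
            now - timedelta(minutes=i),  # timestamp
            now - timedelta(minutes=i),  # _sort_timestamp
            random.randint(50, 500),     # duration
            1,                           # region_id
            "success",                   # check_status
            None,                        # check_status_reason
            200,                         # http_status_code
            uuid.uuid4(),                # trace_id
            90,                          # retention_days
        )
        for i in range(10_000)
    ]

    client.execute(
        "INSERT INTO uptime_monitor_checks_local "
        "(organization_id, project_id, environment, uptime_subscription_id, "
        "uptime_check_id, scheduled_check_time, timestamp, _sort_timestamp, "
        "duration, region_id, check_status, check_status_reason, "
        "http_status_code, trace_id, retention_days) VALUES",
        rows,
    )

    # Compressed vs. uncompressed bytes per column give a rough compression ratio.
    for name, compressed, uncompressed in client.execute(
        "SELECT name, formatReadableSize(data_compressed_bytes), "
        "formatReadableSize(data_uncompressed_bytes) "
        "FROM system.columns WHERE table = 'uptime_monitor_checks_local'"
    ):
        print(name, compressed, uncompressed)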


  stream_loader:
    processor: UptimeMonitorChecksProcessor
    default_topic: snuba-uptime-monitor-checks
@JoshFerge (Member Author):

I believe we can actually reuse the uptime-results topic and don't need our own separate one for Snuba here.

Column("organization_id", UInt(64)),
Column("project_id", UInt(64)),
Column("environment", String(Modifiers(nullable=True, low_cardinality=True))),
Column("uptime_subscription_id", UInt(64)),
@JoshFerge (Member Author):

What about uptime_subscription_region_id or project_uptime_subscription_id?

    columns=columns,
    engine=table_engines.ReplacingMergeTree(
        # do i actually need primary key to be different than sorting key?
        primary_key="(organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id)",
@JoshFerge (Member Author):

Do we actually need trace_id here?

@phacops (Contributor) commented Dec 18, 2024

The EAP RPC has a way to query by trace_id. If you're not putting a trace_id somewhere, the requests on this endpoint will be slower.

Also, you don't need to set both the sort key and the primary key. What we usually want is the primary key to be as small as possible (it's kept in RAM), and there's no need to be overly specific since, at some point, the extra key columns filter out so little data compared to what still needs to be scanned.

In your case, I would put the timestamp right after project_id because, after that, how many uptime_subscription_id values will you have? If it's something like under 8k, they might all fit into one part, and scanning the primary key and then the part is no faster than just scanning the part.

The sort key can have more fields since it dictates how the data is laid out on disk and the behavior of the ReplacingMergeTree, so there you can be as specific as possible.

@JoshFerge (Member Author):

> The EAP RPC has a way to query by trace_id. If you're not putting a trace_id somewhere, the requests on this endpoint will be slower.

I've added a bloom filter index in the migration.

> Also, you don't need to set both the sort key and the primary key. What we usually want is the primary key to be as small as possible (it's kept in RAM), and there's no need to be overly specific since, at some point, the extra key columns filter out so little data compared to what still needs to be scanned.

Thank you. Modified.
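
For illustration, a minimal sketch of how the engine block might look after that change, using only the two keyword arguments shown in the diff (the exact keys are assumptions; the real ones landed in the follow-up migration PR):

    engine=table_engines.ReplacingMergeTree(
        # Short primary key: it is held in memory, so a prefix with the
        # timestamp right after project_id is enough to prune parts.
        primary_key="(organization_id, project_id, _sort_timestamp)",
        # Longer sort key: it dictates on-disk layout and ReplacingMergeTree
        # deduplication, so it can stay as specific as needed.
        order_by="(organization_id, project_id, _sort_timestamp, uptime_subscription_id, uptime_check_id, trace_id)",
    ),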

@@ -45,6 +45,7 @@ class Topic(Enum):
    PROFILE_CHUNKS = "snuba-profile-chunks"

    REPLAYEVENTS = "ingest-replay-events"
    UPTIME_MONITOR_CHECKS = "snuba-uptime-monitor-checks"
@JoshFerge (Member Author):

Change this to the uptime results topic.
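
In other words, the enum entry would presumably point at the existing topic instead of a new one; the exact topic string below is an assumption:

    class Topic(Enum):
        ...
        # Hypothetical: reuse the existing uptime results topic rather than a
        # dedicated snuba-uptime-monitor-checks topic.
        UPTIME_MONITOR_CHECKS = "uptime-results"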

@JoshFerge (Member Author):

Need to actually figure out if we need to enrich data from Postgres, or if we modify the uptime checker.

@phacops (Contributor) left a comment:

You should put the storage config in the EAP and you won't need an entity since you're going to use the EAP RPC.

@JoshFerge requested a review from a team as a code owner December 18, 2024 21:18
@phacops (Contributor) commented Dec 18, 2024

> Should I split this PR up? What would be the right way to do that?

You need to split the PRs, as Snuba won't let you add a migration with other code. Put the migration in another PR and keep the configuration and the processors here.

@JoshFerge (Member Author):

> > Should I split this PR up? What would be the right way to do that?
>
> You need to split the PRs, as Snuba won't let you add a migration with other code. Put the migration in another PR and keep the configuration and the processors here.

Migration moved to #6690.

@JoshFerge (Member Author):

> You should put the storage config in the EAP and you won't need an entity since you're going to use the EAP RPC.

Done.

@@ -178,6 +181,11 @@ def __init__(
        storage_sets_keys={StorageSetKey.PROFILE_CHUNKS},
        readiness_state=ReadinessState.PARTIAL,
    ),
    MigrationGroup.UPTIME_MONITOR_CHECKS: _MigrationGroup(
@JoshFerge (Member Author):

TODO: determine if we should change the StorageSetKey here
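
For reference, the new entry presumably mirrors the PROFILE_CHUNKS one shown in the diff above; the storage set key and readiness state here are assumptions (the key is exactly what the TODO questions):

    MigrationGroup.UPTIME_MONITOR_CHECKS: _MigrationGroup(
        # Assumed values; see the TODO about whether a different StorageSetKey
        # should be used for this dataset.
        storage_sets_keys={StorageSetKey.UPTIME_MONITOR_CHECKS},
        readiness_state=ReadinessState.PARTIAL,
    ),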

@JoshFerge (Member Author):

Are there any further tests I need to add? Do I need to add a test in Python similar to

    from dataclasses import dataclass

?

codecov bot commented Dec 20, 2024

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1293 1 1292 3
Top failed test: tests.migrations.test_runner::test_no_schema_differences (10.2s run time)
Stack trace:
Traceback (most recent call last):
  File ".../snuba/clickhouse/native.py", line 200, in execute
    result_data = query_execute()
                  ^^^^^^^^^^^^^^^
  File ".../snuba/clickhouse/native.py", line 183, in query_execute
    return conn.execute(  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 373, in execute
    rv = self.process_ordinary_query(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 571, in process_ordinary_query
    return self.receive_result(with_column_types=with_column_types,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../sentry_sdk/integrations/clickhouse_driver.py", line 112, in _inner_end
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 204, in receive_result
    return result.get_result()
           ^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../site-packages/clickhouse_driver/result.py", line 50, in get_result
    for packet in self.packet_generator:
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 220, in packet_generator
    packet = self.receive_packet()
             ^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 237, in receive_packet
    raise packet.exception
clickhouse_driver.errors.ServerException: Code: 60.
DB::Exception: Table default.uptime_monitor_checks_local does not exist. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c61ff37 in ..................................................................................../usr/bin/clickhouse
1. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&) @ 0x0000000007132547 in ..................................................................................../usr/bin/clickhouse
2. DB::IDatabase::getTable(String const&, std::shared_ptr<DB::Context const>) const @ 0x00000000114add99 in ..................................................................................../usr/bin/clickhouse
3. DB::DatabaseCatalog::getTableImpl(DB::StorageID const&, std::shared_ptr<DB::Context const>, std::optional<DB::Exception>*) const @ 0x00000000116dcf68 in ..................................................................................../usr/bin/clickhouse
4. DB::DatabaseCatalog::getTable(DB::StorageID const&, std::shared_ptr<DB::Context const>) const @ 0x00000000116e6709 in ..................................................................................../usr/bin/clickhouse
5. DB::InterpreterDescribeQuery::execute() @ 0x0000000011e8db60 in ..................................................................................../usr/bin/clickhouse
6. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000122bfe15 in ..................................................................................../usr/bin/clickhouse
7. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x00000000122bb5b5 in ..................................................................................../usr/bin/clickhouse
8. DB::TCPHandler::runImpl() @ 0x0000000013137519 in ..................................................................................../usr/bin/clickhouse
9. DB::TCPHandler::run() @ 0x00000000131498f9 in ..................................................................................../usr/bin/clickhouse
10. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in ..................................................................................../usr/bin/clickhouse
11. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in ..................................................................................../usr/bin/clickhouse
12. Poco::PooledThread::run() @ 0x0000000015c7a667 in ..................................................................................../usr/bin/clickhouse
13. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in ..................................................................................../usr/bin/clickhouse
14. ? @ 0x00007f261f732609 in ?
15. ? @ 0x00007f261f657353 in ?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../tests/migrations/test_runner.py", line 474, in test_no_schema_differences
    local_schema = get_local_schema(conn, table_name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../snuba/migrations/parse_schema.py", line 322, in get_local_schema
    cols[:6] for cols in conn.execute("DESCRIBE TABLE %s" % table_name).results
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../snuba/clickhouse/native.py", line 285, in execute
    raise ClickhouseError(e.message, code=e.code) from e
snuba.clickhouse.errors.ClickhouseError: DB::Exception: Table default.uptime_monitor_checks_local does not exist. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c61ff37 in ..................................................................................../usr/bin/clickhouse
1. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&) @ 0x0000000007132547 in ..................................................................................../usr/bin/clickhouse
2. DB::IDatabase::getTable(String const&, std::shared_ptr<DB::Context const>) const @ 0x00000000114add99 in ..................................................................................../usr/bin/clickhouse
3. DB::DatabaseCatalog::getTableImpl(DB::StorageID const&, std::shared_ptr<DB::Context const>, std::optional<DB::Exception>*) const @ 0x00000000116dcf68 in ..................................................................................../usr/bin/clickhouse
4. DB::DatabaseCatalog::getTable(DB::StorageID const&, std::shared_ptr<DB::Context const>) const @ 0x00000000116e6709 in ..................................................................................../usr/bin/clickhouse
5. DB::InterpreterDescribeQuery::execute() @ 0x0000000011e8db60 in ..................................................................................../usr/bin/clickhouse
6. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000122bfe15 in ..................................................................................../usr/bin/clickhouse
7. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x00000000122bb5b5 in ..................................................................................../usr/bin/clickhouse
8. DB::TCPHandler::runImpl() @ 0x0000000013137519 in ..................................................................................../usr/bin/clickhouse
9. DB::TCPHandler::run() @ 0x00000000131498f9 in ..................................................................................../usr/bin/clickhouse
10. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in ..................................................................................../usr/bin/clickhouse
11. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in ..................................................................................../usr/bin/clickhouse
12. Poco::PooledThread::run() @ 0x0000000015c7a667 in ..................................................................................../usr/bin/clickhouse
13. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in ..................................................................................../usr/bin/clickhouse
14. ? @ 0x00007f261f732609 in ?
15. ? @ 0x00007f261f657353 in ?


@@ -2,4 +2,4 @@


def test_dlq() -> None:
    assert len(get_dlq_topics()) == 9
Member:

Feel free to delete this test; IMO it's just annoying to deal with as you add new datasets.

@JoshFerge (Member Author):

Closing, as this wasn't in the right order.

@JoshFerge closed this Dec 20, 2024