
feat(uptime): snuba table and configuration for new dataset #6686

Closed · wants to merge 20 commits

Conversation

@JoshFerge (Member) commented Dec 18, 2024

github-actions bot commented Dec 18, 2024

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py

-- start migrations

-- forward migration uptime_monitor_checks : 0001_uptime_monitor_checks
Local op: CREATE TABLE IF NOT EXISTS uptime_monitor_checks_local (organization_id UInt64, project_id UInt64, environment LowCardinality(Nullable(String)), uptime_subscription_id UInt64, uptime_check_id UUID, scheduled_check_time DateTime, timestamp DateTime, _sort_timestamp DateTime, duration UInt64, region_id Nullable(UInt16), check_status LowCardinality(String), check_status_reason LowCardinality(Nullable(String)), http_status_code UInt16, trace_id UUID, retention_days UInt16) ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/uptime_monitor_checks/{shard}/default/uptime_monitor_checks_local', '{replica}') PRIMARY KEY (organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id) ORDER BY (organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id) PARTITION BY (toMonday(_sort_timestamp)) TTL _sort_timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS uptime_monitor_checks_dist (organization_id UInt64, project_id UInt64, environment LowCardinality(Nullable(String)), uptime_subscription_id UInt64, uptime_check_id UUID, scheduled_check_time DateTime, timestamp DateTime, _sort_timestamp DateTime, duration UInt64, region_id Nullable(UInt16), check_status LowCardinality(String), check_status_reason LowCardinality(Nullable(String)), http_status_code UInt16, trace_id UUID, retention_days UInt16) ENGINE Distributed(`cluster_one_sh`, default, uptime_monitor_checks_local, cityHash64(reinterpretAsUInt128(trace_id)));
Local op: ALTER TABLE uptime_monitor_checks_local ADD INDEX IF NOT EXISTS bf_trace_id trace_id TYPE bloom_filter GRANULARITY 1;
-- end forward migration uptime_monitor_checks : 0001_uptime_monitor_checks




-- backward migration uptime_monitor_checks : 0001_uptime_monitor_checks
Distributed op: DROP TABLE IF EXISTS uptime_monitor_checks_dist;
Local op: DROP TABLE IF EXISTS uptime_monitor_checks_local;
-- end backward migration uptime_monitor_checks : 0001_uptime_monitor_checks

) -> anyhow::Result<(Vec<UptimeMonitorCheckRow>, f64)> {
    // Deserialize the Kafka payload into the uptime check message before
    // mapping it onto ClickHouse rows.
    let monitor_message: UptimeMonitorCheckMessage = serde_json::from_slice(payload)?;

    let rows = vec![UptimeMonitorCheckRow {

Comment on lines 56 to 58
    # do i actually need primary key to be different than sorting key?
    primary_key="(organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id)",
    order_by="(organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id)",
@JoshFerge (Member Author) commented Dec 18, 2024

For our query pattern, it probably makes sense to include project_id and subscription_id in the order_by key; uptime_check_id corresponds to event_id.

local_table_name = f"{table_prefix}_local"
dist_table_name = f"{table_prefix}_dist"

## what about all the fancy codecs? do we need those?
@JoshFerge (Member Author):

Should I use all the codecs that the spans table uses? Or does it matter?

Contributor:

Start with no codec and we'll see where we need them. By default, everything is ZSTD(1).
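
If codecs do become necessary later, they could be layered onto individual columns. A minimal sketch, assuming the MigrationModifiers type used elsewhere in Snuba migrations accepts a codecs list (hypothetical, not part of this PR):

    # Hypothetical per-column codec overrides, layered before the default ZSTD.
    from snuba.clickhouse.columns import Column, DateTime, UInt
    from snuba.migrations.columns import MigrationModifiers as Modifiers

    extra_codec_columns = [
        # DoubleDelta tends to work well for monotonically increasing timestamps.
        Column("_sort_timestamp", DateTime(Modifiers(codecs=["DoubleDelta", "ZSTD(1)"]))),
        # T64 is a common choice for integers with a narrow value range.
        Column("duration", UInt(64, Modifiers(codecs=["T64", "ZSTD(1)"]))),
    ]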

@JoshFerge (Member Author):

Should I split this PR up? What would be the right way to do that? Could I leave the processor off the config to start?

@JoshFerge marked this pull request as ready for review December 18, 2024 17:36
@JoshFerge requested a review from a team as a code owner December 18, 2024 17:36
@JoshFerge requested a review from a team December 18, 2024 17:36
@JoshFerge assigned phacops and unassigned phacops Dec 18, 2024
Member:

What's the script for?

@JoshFerge (Member Author):

I'm using it to load in data locally to test row size, compression ratio, and querying the data using the EAP API.
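
For context, a rough sketch of what such a loading script could look like (hypothetical, not the script in this PR), assuming a local ClickHouse and the uptime_monitor_checks_local table from the migration above:

    # Hypothetical local-loading sketch: insert fake uptime check rows and
    # inspect per-column compression.
    import random
    import uuid
    from datetime import datetime, timedelta, timezone

    from clickhouse_driver import Client

    client = Client("localhost")
    now = datetime.now(timezone.utc).replace(tzinfo=None)

    rows = [
        (
            1,                           # organization_id
            1,                           # project_id
            "production",                # environment
            42,                          # uptime_subscription_id
            uuid.uuid4(),                # uptime_check_id
            now - timedelta(minutes=i),  # scheduled_check_time
            now - timedelta(minutes=i),  # timestamp
            now - timedelta(minutes=i),  # _sort_timestamp
            random.randint(50, 500),     # duration
            1,                           # region_id
            "success",                   # check_status
            None,                        # check_status_reason
            200,                         # http_status_code
            uuid.uuid4(),                # trace_id
            90,                          # retention_days
        )
        for i in range(10_000)
    ]

    client.execute(
        "INSERT INTO uptime_monitor_checks_local "
        "(organization_id, project_id, environment, uptime_subscription_id, "
        "uptime_check_id, scheduled_check_time, timestamp, _sort_timestamp, "
        "duration, region_id, check_status, check_status_reason, "
        "http_status_code, trace_id, retention_days) VALUES",
        rows,
    )

    # Compressed vs. uncompressed bytes per column give a rough compression ratio.
    for name, compressed, uncompressed in client.execute(
        "SELECT name, formatReadableSize(data_compressed_bytes), "
        "formatReadableSize(data_uncompressed_bytes) "
        "FROM system.columns WHERE table = 'uptime_monitor_checks_local'"
    ):
        print(name, compressed, uncompressed)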


  stream_loader:
    processor: UptimeMonitorChecksProcessor
    default_topic: snuba-uptime-monitor-checks
@JoshFerge (Member Author):

I believe we can actually reuse the uptime-results topic and don't need our own separate one for Snuba here.

Column("organization_id", UInt(64)),
Column("project_id", UInt(64)),
Column("environment", String(Modifiers(nullable=True, low_cardinality=True))),
Column("uptime_subscription_id", UInt(64)),
@JoshFerge (Member Author):

What about uptime_subscription_region_id or project_uptime_subscription_id?

    columns=columns,
    engine=table_engines.ReplacingMergeTree(
        # do i actually need primary key to be different than sorting key?
        primary_key="(organization_id, project_id, uptime_subscription_id, _sort_timestamp, uptime_check_id, trace_id)",
@JoshFerge (Member Author):

Do we actually need trace_id here?

@phacops (Contributor) commented Dec 18, 2024

The EAP RPC has a way to query by trace_id. If you're not putting a trace_id somewhere, the requests on this endpoint will be slower.

Also, you don't need to set both the sort key and the primary key. What we usually want is the primary key to be as small as possible (it's kept in RAM), and there's no need to be overly specific since, at some point, the extra key columns filter out so little data compared to what still needs to be scanned.

In your case, I would put the timestamp right after project_id because, after that, how many uptime_subscription_id values will you have? If it's something like under 8k, they might all fit into one part, and scanning the primary key and then the part is no faster than just scanning the part.

The sort key can have more fields since it dictates how the data is laid out on disk and the behavior of the ReplacingMergeTree, so there you can be as specific as possible.

@JoshFerge (Member Author):

> The EAP RPC has a way to query by trace_id. If you're not putting a trace_id somewhere, the requests on this endpoint will be slower.

I've added a bloom filter index in the migration.

> Also, you don't need to set both the sort key and the primary key. What we usually want is the primary key to be as small as possible (it's kept in RAM), and there's no need to be overly specific since, at some point, the extra key columns filter out so little data compared to what still needs to be scanned.

Thank you. Modified.
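
For illustration, a minimal sketch of how the engine block might look after that change, using only the two keyword arguments shown in the diff (the exact keys are assumptions; the real ones landed in the follow-up migration PR):

    engine=table_engines.ReplacingMergeTree(
        # Short primary key: it is held in memory, so a prefix with the
        # timestamp right after project_id is enough to prune parts.
        primary_key="(organization_id, project_id, _sort_timestamp)",
        # Longer sort key: it dictates on-disk layout and ReplacingMergeTree
        # deduplication, so it can stay as specific as needed.
        order_by="(organization_id, project_id, _sort_timestamp, uptime_subscription_id, uptime_check_id, trace_id)",
    ),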

@@ -45,6 +45,7 @@ class Topic(Enum):
    PROFILE_CHUNKS = "snuba-profile-chunks"

    REPLAYEVENTS = "ingest-replay-events"
    UPTIME_MONITOR_CHECKS = "snuba-uptime-monitor-checks"
@JoshFerge (Member Author):

Change this to the uptime results topic.
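
In other words, the enum entry would presumably point at the existing topic instead of a new one; the exact topic string below is an assumption:

    class Topic(Enum):
        ...
        # Hypothetical: reuse the existing uptime results topic rather than a
        # dedicated snuba-uptime-monitor-checks topic.
        UPTIME_MONITOR_CHECKS = "uptime-results"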

@JoshFerge (Member Author):

Need to actually figure out if we need to enrich data from Postgres, or if we modify the uptime checker.

@phacops (Contributor) left a comment:

You should put the storage config in the EAP and you won't need an entity since you're going to use the EAP RPC.

@JoshFerge requested a review from a team as a code owner December 18, 2024 21:18
@phacops (Contributor) commented Dec 18, 2024

> Should I split this PR up? What would be the right way to do that?

You need to split the PRs, as Snuba won't let you add a migration with other code. Put the migration in another PR and keep the configuration and the processors here.

@JoshFerge (Member Author):

> > Should I split this PR up? What would be the right way to do that?
>
> You need to split the PRs, as Snuba won't let you add a migration with other code. Put the migration in another PR and keep the configuration and the processors here.

Migration moved to #6690.

@JoshFerge (Member Author):

> You should put the storage config in the EAP and you won't need an entity since you're going to use the EAP RPC.

Done.

@@ -178,6 +181,11 @@ def __init__(
        storage_sets_keys={StorageSetKey.PROFILE_CHUNKS},
        readiness_state=ReadinessState.PARTIAL,
    ),
    MigrationGroup.UPTIME_MONITOR_CHECKS: _MigrationGroup(
@JoshFerge (Member Author):

TODO: determine if we should change the StorageSetKey here
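
For reference, the new entry presumably mirrors the PROFILE_CHUNKS one shown in the diff above; the storage set key and readiness state here are assumptions (the key is exactly what the TODO questions):

    MigrationGroup.UPTIME_MONITOR_CHECKS: _MigrationGroup(
        # Assumed values; see the TODO about whether a different StorageSetKey
        # should be used for this dataset.
        storage_sets_keys={StorageSetKey.UPTIME_MONITOR_CHECKS},
        readiness_state=ReadinessState.PARTIAL,
    ),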

@JoshFerge (Member Author):

Are there any further tests I need to add? Do I need to add a test in Python similar to

    from dataclasses import dataclass

?

codecov bot commented Dec 20, 2024

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1293 1 1292 3
Top failed test: tests.migrations.test_runner::test_no_schema_differences (10.2s run time)
Stack trace:
Traceback (most recent call last):
  File ".../snuba/clickhouse/native.py", line 200, in execute
    result_data = query_execute()
                  ^^^^^^^^^^^^^^^
  File ".../snuba/clickhouse/native.py", line 183, in query_execute
    return conn.execute(  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 373, in execute
    rv = self.process_ordinary_query(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 571, in process_ordinary_query
    return self.receive_result(with_column_types=with_column_types,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../sentry_sdk/integrations/clickhouse_driver.py", line 112, in _inner_end
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 204, in receive_result
    return result.get_result()
           ^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../site-packages/clickhouse_driver/result.py", line 50, in get_result
    for packet in self.packet_generator:
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 220, in packet_generator
    packet = self.receive_packet()
             ^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 237, in receive_packet
    raise packet.exception
clickhouse_driver.errors.ServerException: Code: 60.
DB::Exception: Table default.uptime_monitor_checks_local does not exist. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c61ff37 in ..................................................................................../usr/bin/clickhouse
1. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&) @ 0x0000000007132547 in ..................................................................................../usr/bin/clickhouse
2. DB::IDatabase::getTable(String const&, std::shared_ptr<DB::Context const>) const @ 0x00000000114add99 in ..................................................................................../usr/bin/clickhouse
3. DB::DatabaseCatalog::getTableImpl(DB::StorageID const&, std::shared_ptr<DB::Context const>, std::optional<DB::Exception>*) const @ 0x00000000116dcf68 in ..................................................................................../usr/bin/clickhouse
4. DB::DatabaseCatalog::getTable(DB::StorageID const&, std::shared_ptr<DB::Context const>) const @ 0x00000000116e6709 in ..................................................................................../usr/bin/clickhouse
5. DB::InterpreterDescribeQuery::execute() @ 0x0000000011e8db60 in ..................................................................................../usr/bin/clickhouse
6. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000122bfe15 in ..................................................................................../usr/bin/clickhouse
7. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x00000000122bb5b5 in ..................................................................................../usr/bin/clickhouse
8. DB::TCPHandler::runImpl() @ 0x0000000013137519 in ..................................................................................../usr/bin/clickhouse
9. DB::TCPHandler::run() @ 0x00000000131498f9 in ..................................................................................../usr/bin/clickhouse
10. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in ..................................................................................../usr/bin/clickhouse
11. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in ..................................................................................../usr/bin/clickhouse
12. Poco::PooledThread::run() @ 0x0000000015c7a667 in ..................................................................................../usr/bin/clickhouse
13. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in ..................................................................................../usr/bin/clickhouse
14. ? @ 0x00007f261f732609 in ?
15. ? @ 0x00007f261f657353 in ?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../tests/migrations/test_runner.py", line 474, in test_no_schema_differences
    local_schema = get_local_schema(conn, table_name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../snuba/migrations/parse_schema.py", line 322, in get_local_schema
    cols[:6] for cols in conn.execute("DESCRIBE TABLE %s" % table_name).results
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../snuba/clickhouse/native.py", line 285, in execute
    raise ClickhouseError(e.message, code=e.code) from e
snuba.clickhouse.errors.ClickhouseError: DB::Exception: Table default.uptime_monitor_checks_local does not exist. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c61ff37 in ..................................................................................../usr/bin/clickhouse
1. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&) @ 0x0000000007132547 in ..................................................................................../usr/bin/clickhouse
2. DB::IDatabase::getTable(String const&, std::shared_ptr<DB::Context const>) const @ 0x00000000114add99 in ..................................................................................../usr/bin/clickhouse
3. DB::DatabaseCatalog::getTableImpl(DB::StorageID const&, std::shared_ptr<DB::Context const>, std::optional<DB::Exception>*) const @ 0x00000000116dcf68 in ..................................................................................../usr/bin/clickhouse
4. DB::DatabaseCatalog::getTable(DB::StorageID const&, std::shared_ptr<DB::Context const>) const @ 0x00000000116e6709 in ..................................................................................../usr/bin/clickhouse
5. DB::InterpreterDescribeQuery::execute() @ 0x0000000011e8db60 in ..................................................................................../usr/bin/clickhouse
6. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000122bfe15 in ..................................................................................../usr/bin/clickhouse
7. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x00000000122bb5b5 in ..................................................................................../usr/bin/clickhouse
8. DB::TCPHandler::runImpl() @ 0x0000000013137519 in ..................................................................................../usr/bin/clickhouse
9. DB::TCPHandler::run() @ 0x00000000131498f9 in ..................................................................................../usr/bin/clickhouse
10. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in ..................................................................................../usr/bin/clickhouse
11. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in ..................................................................................../usr/bin/clickhouse
12. Poco::PooledThread::run() @ 0x0000000015c7a667 in ..................................................................................../usr/bin/clickhouse
13. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in ..................................................................................../usr/bin/clickhouse
14. ? @ 0x00007f261f732609 in ?
15. ? @ 0x00007f261f657353 in ?


@@ -2,4 +2,4 @@


def test_dlq() -> None:
    assert len(get_dlq_topics()) == 9
Member:

Feel free to delete this test; IMO it's just annoying to deal with as you add new datasets.

@JoshFerge (Member Author):

Closing, as this wasn't in the right order.

@JoshFerge closed this Dec 20, 2024