[PLAT-16134][PLAT-16253] YCQL microseconds precision alert + remote bootstrap alert enhancement

Summary:
With default gflags, YCQL operations with microsecond precision may cause issues with the user workload, because inserts support microseconds while selects do not.
A special gflag was introduced recently to address this issue, but we need to make sure no microsecond precision operations are executed before using this gflag.
Adding an alert to make sure the user is aware of any microsecond precision operations on the cluster.
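
As an illustration of the precision mismatch, here is a minimal Python sketch that takes the microsecond timestamp used in the test plan and rounds it down to the previous millisecond, which is the behavior the alert summary describes for future writes. The helper names `floor_to_millisecond` and `has_microsecond_precision` are hypothetical; this is not YugabyteDB code and only shows the arithmetic.

```python
from datetime import datetime


def floor_to_millisecond(ts: datetime) -> datetime:
    """Round a timestamp down to the previous millisecond."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


def has_microsecond_precision(ts: datetime) -> bool:
    """True when the timestamp carries digits below one millisecond."""
    return ts.microsecond % 1000 != 0


# Timestamp literal taken from the INSERT statement in the test plan.
inserted = datetime.strptime(
    "2024-09-25 15:52:40.768819+0000", "%Y-%m-%d %H:%M:%S.%f%z"
)
stored = floor_to_millisecond(inserted)

print(inserted.isoformat(), has_microsecond_precision(inserted))
print(stored.isoformat(), has_microsecond_precision(stored))
```

A write like the first one is what the new alert is meant to surface; the second value is what a millisecond-only path would keep.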

Also, the current increased remote bootstraps alert uses the absolute number of bootstraps as its threshold, with a default value of 0.
This typically does not work, because the number of bootstraps depends heavily on the total number of tablets, among other factors, and a few running bootstraps are not a big deal.
Changed the alert to use the remote bootstrap count over 10 minutes as a percentage of the total number of tablets, with 10% as the default threshold.
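
The percentage check can be sketched in Python as follows. This is a hedged illustration of the arithmetic only: the function names are hypothetical, and the two inputs stand in for the values of the bootstrap-count increase over 10 minutes and the `ts_live_tablet_peers` metric used in the new query.

```python
def remote_bootstrap_percent(bootstraps_last_10m: float, live_tablet_peers: float) -> float:
    """Remote bootstraps in the window as a percentage of tablet peers."""
    if live_tablet_peers <= 0:
        raise ValueError("live_tablet_peers must be positive")
    return bootstraps_last_10m / live_tablet_peers * 100


def should_alert(
    bootstraps_last_10m: float,
    live_tablet_peers: float,
    threshold_percent: float = 10.0,  # default WARNING threshold from this change
) -> bool:
    """GREATER_THAN comparison against the percentage threshold."""
    return remote_bootstrap_percent(bootstraps_last_10m, live_tablet_peers) > threshold_percent


# A handful of bootstraps on a large universe stays below the threshold.
print(should_alert(3, 600))
# A burst that touches a sizeable share of tablet peers crosses it.
print(should_alert(80, 600))
```

With 600 tablet peers, 3 bootstraps amount to 0.5% and stay quiet, whereas the previous absolute threshold of 0 would have fired on them.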

Test Plan:
Upgrade YBA.
Run the following YCQL operations:

CREATE TABLE test (
    id int PRIMARY KEY,
    created_date timestamp
) WITH default_time_to_live = 0
    AND transactions = {'enabled': 'true'};

INSERT INTO test(id, created_date) VALUES (1, currenttimestamp()) ;

INSERT INTO test(id, created_date) VALUES (2, '2024-09-25 15:52:40.768819+0000') ;

Make sure the YCQL microsecond precision alert is raised and has a link to https://docs.yugabyte.com/preview/releases/techadvisories/ta-23476/.
Wait for around 10 minutes and make sure the alert is resolved.

Create an "Increase in remote bootstraps" alert policy.
Make sure the alert expression is valid.
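
A quick way to sanity-check the expression outside YBA is to fill in the template placeholders and confirm that none are left. The Python sketch below does only that. The substitution logic, the `CONDITIONS` mapping, and the all-zero universe UUID are assumptions for illustration, not YBA's actual rendering code, and a real validity check still requires loading the result into Prometheus.

```python
import re

# New INCREASED_REMOTE_BOOTSTRAPS queryTemplate from alert_templates.yml.
QUERY_TEMPLATE = (
    'sum by (universe_uuid) (increase(rpc_latency_count{export_type="tserver_export", '
    'service_type="ConsensusService", universe_uuid="__universeUuid__", '
    'server_type="yb_consensus", service_method="StartRemoteBootstrap"}[10m])) '
    '/ max by (universe_uuid) (ts_live_tablet_peers{universe_uuid="__universeUuid__"}) * 100 '
    "{{ query_condition }} {{ query_threshold }}"
)

# Assumed mapping from threshold condition names to PromQL operators.
CONDITIONS = {"GREATER_THAN": ">", "LESS_THAN": "<"}


def render_query(template: str, universe_uuid: str, condition: str, threshold: float) -> str:
    """Substitute the three placeholders used by the template."""
    rendered = template.replace("__universeUuid__", universe_uuid)
    rendered = rendered.replace("{{ query_condition }}", CONDITIONS[condition])
    return rendered.replace("{{ query_threshold }}", f"{threshold:g}")


def unresolved_placeholders(expr: str) -> list:
    """Return any {{ ... }} or __name__ markers still present."""
    return re.findall(r"\{\{.*?\}\}|__\w+?__", expr)


# Hypothetical universe UUID; threshold and condition are the new defaults.
rendered = render_query(
    QUERY_TEMPLATE, "00000000-0000-0000-0000-000000000000", "GREATER_THAN", 10.0
)
print(rendered)
print(unresolved_placeholders(rendered))
```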

Reviewers: skurapati, #yba-api-review!

Reviewed By: skurapati

Subscribers: sanketh, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D40716
anmalysh-yb committed Dec 18, 2024
1 parent 2c5c8bd commit e5a8422
Showing 7 changed files with 87 additions and 20 deletions.
@@ -61,6 +61,7 @@ public enum AlertTemplate {
HIGH_NUM_YEDIS_CONNECTIONS,
YSQL_THROUGHPUT,
YCQL_THROUGHPUT,
YCQL_MICROSECOND_TIMESTAMPS_DETECTED,
MASTER_LEADER_MISSING,
MASTER_UNDER_REPLICATED,
LEADERLESS_TABLETS,
7 changes: 7 additions & 0 deletions managed/src/main/java/com/yugabyte/yw/models/common/Unit.java
@@ -50,6 +50,13 @@ public enum Unit {
.metricName("sec")
.integer(true)
.thresholdConditionReadOnly(true)),
MINUTE(
new UnitBuilder()
.measure(Measure.TIME)
.displayName("min")
.metricName("min")
.integer(true)
.thresholdConditionReadOnly(true)),
DAY(
new UnitBuilder()
.measure(Measure.TIME)
48 changes: 38 additions & 10 deletions managed/src/main/resources/alert/alert_templates.yml
@@ -1492,6 +1492,33 @@ templates:
Maximum throughput for YSQL operations for universe '{{ $labels.source_name }}'
is above {{ $labels.threshold }}. Current value is {{ $value | printf "%.0f" }}
YCQL_MICROSECOND_TIMESTAMPS_DETECTED:
name: YCQL inserts with microseconds precision
description: YCQL inserts with microseconds precision detected, which is not fully supported.
queryTemplate: max by (universe_uuid) (increase(cql_microseconds_timestamps_used{universe_uuid="__universeUuid__"}[{{ query_threshold }}m])) {{ query_condition }} 0
createForNewCustomer: true
defaultThresholdMap:
WARNING:
threshold: 60
targetType: UNIVERSE
defaultThresholdCondition: GREATER_THAN
defaultThresholdUnit: MINUTE
thresholdUnitName: min
labels:
affected_node_names: >-
{{ range $index, $element := query "max by (universe_uuid, node_name)
(increase(cql_microseconds_timestamps_used{universe_uuid='{{ $labels.universe_uuid }}'}[{{ query_threshold }}m]))
{{ query_condition }} 0" }}{{if $index}},{{end}}{{ $element.Labels.node_name }}{{ end }}
annotations:
summary: >-
YCQL inserts with microseconds precision detected on universe '{{ $labels.source_name }}',
which is not fully supported. Please change application code to use milliseconds precision,
or set cql_revert_to_partial_microsecond_support=false gflag value to round
down all future microsecond precision timestamps to previous millisecond.
See https://docs.yugabyte.com/preview/releases/techadvisories/ta-23476
for more details.
Affected nodes: {{ $labels.affected_node_names }}
YCQL_OP_AVG_LATENCY:
name: YCQL average latency is high
description: Average latency of YCQL operations is above threshold
@@ -1662,28 +1689,29 @@ templates:
INCREASED_REMOTE_BOOTSTRAPS:
name: Increase in remote bootstraps
description: Increase in remote bootstraps detected during last 5 minutes
queryTemplate: sum by (universe_uuid) (increase(rpc_latency_count{export_type="tserver_export", service_type="ConsensusService", universe_uuid="__universeUuid__", server_type="yb_consensus", service_method="StartRemoteBootstrap"}[5m]))
description: Increase in remote bootstraps detected during last 10 minutes
queryTemplate: sum by (universe_uuid) (increase(rpc_latency_count{export_type="tserver_export",
service_type="ConsensusService", universe_uuid="__universeUuid__", server_type="yb_consensus",
service_method="StartRemoteBootstrap"}[10m])) / max by (universe_uuid)
(ts_live_tablet_peers{universe_uuid="__universeUuid__"}) * 100
{{ query_condition }} {{ query_threshold }}
createForNewCustomer: false
defaultThresholdMap:
WARNING:
threshold: 0.0
threshold: 10.0
targetType: UNIVERSE
defaultThresholdCondition: GREATER_THAN
defaultThresholdUnit: COUNT
thresholdUnitName: bootstrap(s)
defaultThresholdUnit: PERCENT
thresholdMaxValue: 100.0
thresholdUnitName: '%'
labels:
affected_node_names: >-
{{ range $index, $element := query "sum by (universe_uuid, node_name)
(increase(rpc_latency_count{export_type='tserver_export', service_type='ConsensusService', universe_uuid='{{ $labels.universe_uuid }}', server_type='yb_consensus', service_method='StartRemoteBootstrap'}[5m]))
{{ query_condition }} {{ query_threshold }}" }}{{if $index}},{{end}}{{ $element.Labels.node_name }}{{ end }}
{{ range $index, $element := query "max by (universe_uuid, node_name)
(up{universe_uuid='{{ $labels.universe_uuid }}',export_type='tserver_export'})" }}{{if $index}},{{end}}{{ $element.Labels.node_name }}{{ end }}
annotations:
summary: >-
Increase in remote bootstraps detected for universe '{{ $labels.source_name }}'.
Affected nodes: {{ $labels.affected_node_names }}
TABLET_SERVER_AVG_READ_LATENCY:
name: Tablet server average read latency is high
@@ -0,0 +1,2 @@
alter table alert_configuration drop constraint ck_ac_threshold_unit;
alter table alert_configuration add constraint ck_ac_threshold_unit check (threshold_unit in ('STATUS','COUNT','PERCENT','MILLISECOND','SECOND','MINUTE','DAY','MEGABYTE'));
@@ -0,0 +1,29 @@
update alert_configuration
set
threshold_unit = 'PERCENT',
thresholds = '{"WARNING":{"condition":"GREATER_THAN","threshold":10.0}}'
where template = 'INCREASED_REMOTE_BOOTSTRAPS';

update alert_definition set config_written = false where configuration_uuid IN
(select uuid from alert_configuration where template = 'INCREASED_REMOTE_BOOTSTRAPS');

-- YCQL microseconds precision inserts detected alert
insert into alert_configuration
(uuid, customer_uuid, name, description, create_time, target_type, target, thresholds, threshold_unit, template, duration_sec, active, default_destination)
select
gen_random_uuid(),
uuid,
'YCQL inserts with microseconds precision',
'YCQL inserts with microseconds precision detected, which is not fully supported.',
current_timestamp,
'UNIVERSE',
'{"all":true}',
'{"WARNING":{"condition":"GREATER_THAN","threshold":60.0}}',
'MINUTE',
'YCQL_MICROSECOND_TIMESTAMPS_DETECTED',
0,
true,
true
from customer;

select create_universe_alert_definitions('YCQL inserts with microseconds precision');
10 changes: 5 additions & 5 deletions managed/src/main/resources/swagger-strict.json
@@ -674,12 +674,12 @@
},
"template" : {
"description" : "Template name",
"enum" : [ "REPLICATION_LAG", "CLOCK_SKEW", "CLOCK_SYNC_CHECK_FAILED", "MEMORY_CONSUMPTION", "HEALTH_CHECK_ERROR", "HEALTH_CHECK_NOTIFICATION_ERROR", "UNIVERSE_METRIC_COLLECTION_FAILURE", "BACKUP_FAILURE", "BACKUP_DELETION_FAILURE", "BACKUP_SCHEDULE_FAILURE", "INACTIVE_CRON_NODES", "ALERT_QUERY_FAILED", "ALERT_CONFIG_WRITING_FAILED", "ALERT_NOTIFICATION_ERROR", "ALERT_NOTIFICATION_CHANNEL_ERROR", "NODE_DOWN", "NODE_RESTART", "NODE_CPU_USAGE", "NODE_DISK_USAGE", "NODE_SYSTEM_DISK_USAGE", "NODE_FILE_DESCRIPTORS_USAGE", "NODE_OOM_KILLS", "DB_VERSION_MISMATCH", "DB_INSTANCE_DOWN", "DB_INSTANCE_RESTART", "DB_FATAL_LOGS", "DB_ERROR_LOGS", "DB_CORE_FILES", "DB_YSQL_CONNECTION", "DB_YCQL_CONNECTION", "DB_REDIS_CONNECTION", "DB_MEMORY_OVERLOAD", "DB_COMPACTION_OVERLOAD", "DB_QUEUES_OVERFLOW", "DB_DRIVE_FAILURE", "DB_WRITE_READ_TEST_ERROR", "DDL_ATOMICITY_CHECK", "NODE_TO_NODE_CA_CERT_EXPIRY", "NODE_TO_NODE_CERT_EXPIRY", "CLIENT_TO_NODE_CA_CERT_EXPIRY", "CLIENT_TO_NODE_CERT_EXPIRY", "ENCRYPTION_AT_REST_CONFIG_EXPIRY", "SSH_KEY_EXPIRY", "SSH_KEY_ROTATION_FAILURE", "PITR_CONFIG_FAILURE", "YSQL_OP_AVG_LATENCY", "YCQL_OP_AVG_LATENCY", "YSQL_OP_P99_LATENCY", "YCQL_OP_P99_LATENCY", "HIGH_NUM_YSQL_CONNECTIONS", "HIGH_NUM_YCQL_CONNECTIONS", "HIGH_NUM_YEDIS_CONNECTIONS", "YSQL_THROUGHPUT", "YCQL_THROUGHPUT", "MASTER_LEADER_MISSING", "MASTER_UNDER_REPLICATED", "LEADERLESS_TABLETS", "UNDER_REPLICATED_TABLETS", "PRIVATE_ACCESS_KEY_STATUS", "UNIVERSE_OS_UPDATE_REQUIRED", "DB_YCQL_WEB_SERVER_DOWN", "DB_YSQL_WEB_SERVER_DOWN", "INCREASED_REMOTE_BOOTSTRAPS", "TABLET_SERVER_AVG_READ_LATENCY", "TABLET_SERVER_AVG_WRITE_LATENCY", "REACTOR_DELAYS", "RPC_QUEUE_SIZE", "LOG_CACHE_SIZE", "CACHE_MISS", "HA_STANDBY_SYNC", "NODE_AGENT_DOWN", "UNIVERSE_RELEASE_FILES_STATUS", "HA_VERSION_MISMATCH", "TABLET_PEERS_GUARDRAIL", "XCLUSTER_CONFIG_TABLE_BAD_STATE", "NODE_CLOCK_DRIFT", "UNIVERSE_UNEXPECTED_MASTERS_RUNNING", "UNIVERSE_UNEXPECTED_TSERVERS_RUNNING" ],
"enum" : [ "REPLICATION_LAG", "CLOCK_SKEW", "CLOCK_SYNC_CHECK_FAILED", "MEMORY_CONSUMPTION", "HEALTH_CHECK_ERROR", "HEALTH_CHECK_NOTIFICATION_ERROR", "UNIVERSE_METRIC_COLLECTION_FAILURE", "BACKUP_FAILURE", "BACKUP_DELETION_FAILURE", "BACKUP_SCHEDULE_FAILURE", "INACTIVE_CRON_NODES", "ALERT_QUERY_FAILED", "ALERT_CONFIG_WRITING_FAILED", "ALERT_NOTIFICATION_ERROR", "ALERT_NOTIFICATION_CHANNEL_ERROR", "NODE_DOWN", "NODE_RESTART", "NODE_CPU_USAGE", "NODE_DISK_USAGE", "NODE_SYSTEM_DISK_USAGE", "NODE_FILE_DESCRIPTORS_USAGE", "NODE_OOM_KILLS", "DB_VERSION_MISMATCH", "DB_INSTANCE_DOWN", "DB_INSTANCE_RESTART", "DB_FATAL_LOGS", "DB_ERROR_LOGS", "DB_CORE_FILES", "DB_YSQL_CONNECTION", "DB_YCQL_CONNECTION", "DB_REDIS_CONNECTION", "DB_MEMORY_OVERLOAD", "DB_COMPACTION_OVERLOAD", "DB_QUEUES_OVERFLOW", "DB_DRIVE_FAILURE", "DB_WRITE_READ_TEST_ERROR", "DDL_ATOMICITY_CHECK", "NODE_TO_NODE_CA_CERT_EXPIRY", "NODE_TO_NODE_CERT_EXPIRY", "CLIENT_TO_NODE_CA_CERT_EXPIRY", "CLIENT_TO_NODE_CERT_EXPIRY", "ENCRYPTION_AT_REST_CONFIG_EXPIRY", "SSH_KEY_EXPIRY", "SSH_KEY_ROTATION_FAILURE", "PITR_CONFIG_FAILURE", "YSQL_OP_AVG_LATENCY", "YCQL_OP_AVG_LATENCY", "YSQL_OP_P99_LATENCY", "YCQL_OP_P99_LATENCY", "HIGH_NUM_YSQL_CONNECTIONS", "HIGH_NUM_YCQL_CONNECTIONS", "HIGH_NUM_YEDIS_CONNECTIONS", "YSQL_THROUGHPUT", "YCQL_THROUGHPUT", "YCQL_MICROSECOND_TIMESTAMPS_DETECTED", "MASTER_LEADER_MISSING", "MASTER_UNDER_REPLICATED", "LEADERLESS_TABLETS", "UNDER_REPLICATED_TABLETS", "PRIVATE_ACCESS_KEY_STATUS", "UNIVERSE_OS_UPDATE_REQUIRED", "DB_YCQL_WEB_SERVER_DOWN", "DB_YSQL_WEB_SERVER_DOWN", "INCREASED_REMOTE_BOOTSTRAPS", "TABLET_SERVER_AVG_READ_LATENCY", "TABLET_SERVER_AVG_WRITE_LATENCY", "REACTOR_DELAYS", "RPC_QUEUE_SIZE", "LOG_CACHE_SIZE", "CACHE_MISS", "HA_STANDBY_SYNC", "NODE_AGENT_DOWN", "UNIVERSE_RELEASE_FILES_STATUS", "HA_VERSION_MISMATCH", "TABLET_PEERS_GUARDRAIL", "XCLUSTER_CONFIG_TABLE_BAD_STATE", "NODE_CLOCK_DRIFT", "UNIVERSE_UNEXPECTED_MASTERS_RUNNING", "UNIVERSE_UNEXPECTED_TSERVERS_RUNNING" ],
"type" : "string"
},
"thresholdUnit" : {
"description" : "Threshold unit",
"enum" : [ "STATUS", "COUNT", "PERCENT", "MILLISECOND", "SECOND", "DAY", "MEGABYTE" ],
"enum" : [ "STATUS", "COUNT", "PERCENT", "MILLISECOND", "SECOND", "MINUTE", "DAY", "MEGABYTE" ],
"type" : "string"
},
"thresholds" : {
@@ -727,7 +727,7 @@
"type" : "string"
},
"template" : {
"enum" : [ "REPLICATION_LAG", "CLOCK_SKEW", "CLOCK_SYNC_CHECK_FAILED", "MEMORY_CONSUMPTION", "HEALTH_CHECK_ERROR", "HEALTH_CHECK_NOTIFICATION_ERROR", "UNIVERSE_METRIC_COLLECTION_FAILURE", "BACKUP_FAILURE", "BACKUP_DELETION_FAILURE", "BACKUP_SCHEDULE_FAILURE", "INACTIVE_CRON_NODES", "ALERT_QUERY_FAILED", "ALERT_CONFIG_WRITING_FAILED", "ALERT_NOTIFICATION_ERROR", "ALERT_NOTIFICATION_CHANNEL_ERROR", "NODE_DOWN", "NODE_RESTART", "NODE_CPU_USAGE", "NODE_DISK_USAGE", "NODE_SYSTEM_DISK_USAGE", "NODE_FILE_DESCRIPTORS_USAGE", "NODE_OOM_KILLS", "DB_VERSION_MISMATCH", "DB_INSTANCE_DOWN", "DB_INSTANCE_RESTART", "DB_FATAL_LOGS", "DB_ERROR_LOGS", "DB_CORE_FILES", "DB_YSQL_CONNECTION", "DB_YCQL_CONNECTION", "DB_REDIS_CONNECTION", "DB_MEMORY_OVERLOAD", "DB_COMPACTION_OVERLOAD", "DB_QUEUES_OVERFLOW", "DB_DRIVE_FAILURE", "DB_WRITE_READ_TEST_ERROR", "DDL_ATOMICITY_CHECK", "NODE_TO_NODE_CA_CERT_EXPIRY", "NODE_TO_NODE_CERT_EXPIRY", "CLIENT_TO_NODE_CA_CERT_EXPIRY", "CLIENT_TO_NODE_CERT_EXPIRY", "ENCRYPTION_AT_REST_CONFIG_EXPIRY", "SSH_KEY_EXPIRY", "SSH_KEY_ROTATION_FAILURE", "PITR_CONFIG_FAILURE", "YSQL_OP_AVG_LATENCY", "YCQL_OP_AVG_LATENCY", "YSQL_OP_P99_LATENCY", "YCQL_OP_P99_LATENCY", "HIGH_NUM_YSQL_CONNECTIONS", "HIGH_NUM_YCQL_CONNECTIONS", "HIGH_NUM_YEDIS_CONNECTIONS", "YSQL_THROUGHPUT", "YCQL_THROUGHPUT", "MASTER_LEADER_MISSING", "MASTER_UNDER_REPLICATED", "LEADERLESS_TABLETS", "UNDER_REPLICATED_TABLETS", "PRIVATE_ACCESS_KEY_STATUS", "UNIVERSE_OS_UPDATE_REQUIRED", "DB_YCQL_WEB_SERVER_DOWN", "DB_YSQL_WEB_SERVER_DOWN", "INCREASED_REMOTE_BOOTSTRAPS", "TABLET_SERVER_AVG_READ_LATENCY", "TABLET_SERVER_AVG_WRITE_LATENCY", "REACTOR_DELAYS", "RPC_QUEUE_SIZE", "LOG_CACHE_SIZE", "CACHE_MISS", "HA_STANDBY_SYNC", "NODE_AGENT_DOWN", "UNIVERSE_RELEASE_FILES_STATUS", "HA_VERSION_MISMATCH", "TABLET_PEERS_GUARDRAIL", "XCLUSTER_CONFIG_TABLE_BAD_STATE", "NODE_CLOCK_DRIFT", "UNIVERSE_UNEXPECTED_MASTERS_RUNNING", "UNIVERSE_UNEXPECTED_TSERVERS_RUNNING" ],
"enum" : [ "REPLICATION_LAG", "CLOCK_SKEW", "CLOCK_SYNC_CHECK_FAILED", "MEMORY_CONSUMPTION", "HEALTH_CHECK_ERROR", "HEALTH_CHECK_NOTIFICATION_ERROR", "UNIVERSE_METRIC_COLLECTION_FAILURE", "BACKUP_FAILURE", "BACKUP_DELETION_FAILURE", "BACKUP_SCHEDULE_FAILURE", "INACTIVE_CRON_NODES", "ALERT_QUERY_FAILED", "ALERT_CONFIG_WRITING_FAILED", "ALERT_NOTIFICATION_ERROR", "ALERT_NOTIFICATION_CHANNEL_ERROR", "NODE_DOWN", "NODE_RESTART", "NODE_CPU_USAGE", "NODE_DISK_USAGE", "NODE_SYSTEM_DISK_USAGE", "NODE_FILE_DESCRIPTORS_USAGE", "NODE_OOM_KILLS", "DB_VERSION_MISMATCH", "DB_INSTANCE_DOWN", "DB_INSTANCE_RESTART", "DB_FATAL_LOGS", "DB_ERROR_LOGS", "DB_CORE_FILES", "DB_YSQL_CONNECTION", "DB_YCQL_CONNECTION", "DB_REDIS_CONNECTION", "DB_MEMORY_OVERLOAD", "DB_COMPACTION_OVERLOAD", "DB_QUEUES_OVERFLOW", "DB_DRIVE_FAILURE", "DB_WRITE_READ_TEST_ERROR", "DDL_ATOMICITY_CHECK", "NODE_TO_NODE_CA_CERT_EXPIRY", "NODE_TO_NODE_CERT_EXPIRY", "CLIENT_TO_NODE_CA_CERT_EXPIRY", "CLIENT_TO_NODE_CERT_EXPIRY", "ENCRYPTION_AT_REST_CONFIG_EXPIRY", "SSH_KEY_EXPIRY", "SSH_KEY_ROTATION_FAILURE", "PITR_CONFIG_FAILURE", "YSQL_OP_AVG_LATENCY", "YCQL_OP_AVG_LATENCY", "YSQL_OP_P99_LATENCY", "YCQL_OP_P99_LATENCY", "HIGH_NUM_YSQL_CONNECTIONS", "HIGH_NUM_YCQL_CONNECTIONS", "HIGH_NUM_YEDIS_CONNECTIONS", "YSQL_THROUGHPUT", "YCQL_THROUGHPUT", "YCQL_MICROSECOND_TIMESTAMPS_DETECTED", "MASTER_LEADER_MISSING", "MASTER_UNDER_REPLICATED", "LEADERLESS_TABLETS", "UNDER_REPLICATED_TABLETS", "PRIVATE_ACCESS_KEY_STATUS", "UNIVERSE_OS_UPDATE_REQUIRED", "DB_YCQL_WEB_SERVER_DOWN", "DB_YSQL_WEB_SERVER_DOWN", "INCREASED_REMOTE_BOOTSTRAPS", "TABLET_SERVER_AVG_READ_LATENCY", "TABLET_SERVER_AVG_WRITE_LATENCY", "REACTOR_DELAYS", "RPC_QUEUE_SIZE", "LOG_CACHE_SIZE", "CACHE_MISS", "HA_STANDBY_SYNC", "NODE_AGENT_DOWN", "UNIVERSE_RELEASE_FILES_STATUS", "HA_VERSION_MISMATCH", "TABLET_PEERS_GUARDRAIL", "XCLUSTER_CONFIG_TABLE_BAD_STATE", "NODE_CLOCK_DRIFT", "UNIVERSE_UNEXPECTED_MASTERS_RUNNING", "UNIVERSE_UNEXPECTED_TSERVERS_RUNNING" ],
"type" : "string"
},
"uuids" : {
@@ -888,7 +888,7 @@
},
"template" : {
"description" : "Template name",
"enum" : [ "REPLICATION_LAG", "CLOCK_SKEW", "CLOCK_SYNC_CHECK_FAILED", "MEMORY_CONSUMPTION", "HEALTH_CHECK_ERROR", "HEALTH_CHECK_NOTIFICATION_ERROR", "UNIVERSE_METRIC_COLLECTION_FAILURE", "BACKUP_FAILURE", "BACKUP_DELETION_FAILURE", "BACKUP_SCHEDULE_FAILURE", "INACTIVE_CRON_NODES", "ALERT_QUERY_FAILED", "ALERT_CONFIG_WRITING_FAILED", "ALERT_NOTIFICATION_ERROR", "ALERT_NOTIFICATION_CHANNEL_ERROR", "NODE_DOWN", "NODE_RESTART", "NODE_CPU_USAGE", "NODE_DISK_USAGE", "NODE_SYSTEM_DISK_USAGE", "NODE_FILE_DESCRIPTORS_USAGE", "NODE_OOM_KILLS", "DB_VERSION_MISMATCH", "DB_INSTANCE_DOWN", "DB_INSTANCE_RESTART", "DB_FATAL_LOGS", "DB_ERROR_LOGS", "DB_CORE_FILES", "DB_YSQL_CONNECTION", "DB_YCQL_CONNECTION", "DB_REDIS_CONNECTION", "DB_MEMORY_OVERLOAD", "DB_COMPACTION_OVERLOAD", "DB_QUEUES_OVERFLOW", "DB_DRIVE_FAILURE", "DB_WRITE_READ_TEST_ERROR", "DDL_ATOMICITY_CHECK", "NODE_TO_NODE_CA_CERT_EXPIRY", "NODE_TO_NODE_CERT_EXPIRY", "CLIENT_TO_NODE_CA_CERT_EXPIRY", "CLIENT_TO_NODE_CERT_EXPIRY", "ENCRYPTION_AT_REST_CONFIG_EXPIRY", "SSH_KEY_EXPIRY", "SSH_KEY_ROTATION_FAILURE", "PITR_CONFIG_FAILURE", "YSQL_OP_AVG_LATENCY", "YCQL_OP_AVG_LATENCY", "YSQL_OP_P99_LATENCY", "YCQL_OP_P99_LATENCY", "HIGH_NUM_YSQL_CONNECTIONS", "HIGH_NUM_YCQL_CONNECTIONS", "HIGH_NUM_YEDIS_CONNECTIONS", "YSQL_THROUGHPUT", "YCQL_THROUGHPUT", "MASTER_LEADER_MISSING", "MASTER_UNDER_REPLICATED", "LEADERLESS_TABLETS", "UNDER_REPLICATED_TABLETS", "PRIVATE_ACCESS_KEY_STATUS", "UNIVERSE_OS_UPDATE_REQUIRED", "DB_YCQL_WEB_SERVER_DOWN", "DB_YSQL_WEB_SERVER_DOWN", "INCREASED_REMOTE_BOOTSTRAPS", "TABLET_SERVER_AVG_READ_LATENCY", "TABLET_SERVER_AVG_WRITE_LATENCY", "REACTOR_DELAYS", "RPC_QUEUE_SIZE", "LOG_CACHE_SIZE", "CACHE_MISS", "HA_STANDBY_SYNC", "NODE_AGENT_DOWN", "UNIVERSE_RELEASE_FILES_STATUS", "HA_VERSION_MISMATCH", "TABLET_PEERS_GUARDRAIL", "XCLUSTER_CONFIG_TABLE_BAD_STATE", "NODE_CLOCK_DRIFT", "UNIVERSE_UNEXPECTED_MASTERS_RUNNING", "UNIVERSE_UNEXPECTED_TSERVERS_RUNNING" ],
"enum" : [ "REPLICATION_LAG", "CLOCK_SKEW", "CLOCK_SYNC_CHECK_FAILED", "MEMORY_CONSUMPTION", "HEALTH_CHECK_ERROR", "HEALTH_CHECK_NOTIFICATION_ERROR", "UNIVERSE_METRIC_COLLECTION_FAILURE", "BACKUP_FAILURE", "BACKUP_DELETION_FAILURE", "BACKUP_SCHEDULE_FAILURE", "INACTIVE_CRON_NODES", "ALERT_QUERY_FAILED", "ALERT_CONFIG_WRITING_FAILED", "ALERT_NOTIFICATION_ERROR", "ALERT_NOTIFICATION_CHANNEL_ERROR", "NODE_DOWN", "NODE_RESTART", "NODE_CPU_USAGE", "NODE_DISK_USAGE", "NODE_SYSTEM_DISK_USAGE", "NODE_FILE_DESCRIPTORS_USAGE", "NODE_OOM_KILLS", "DB_VERSION_MISMATCH", "DB_INSTANCE_DOWN", "DB_INSTANCE_RESTART", "DB_FATAL_LOGS", "DB_ERROR_LOGS", "DB_CORE_FILES", "DB_YSQL_CONNECTION", "DB_YCQL_CONNECTION", "DB_REDIS_CONNECTION", "DB_MEMORY_OVERLOAD", "DB_COMPACTION_OVERLOAD", "DB_QUEUES_OVERFLOW", "DB_DRIVE_FAILURE", "DB_WRITE_READ_TEST_ERROR", "DDL_ATOMICITY_CHECK", "NODE_TO_NODE_CA_CERT_EXPIRY", "NODE_TO_NODE_CERT_EXPIRY", "CLIENT_TO_NODE_CA_CERT_EXPIRY", "CLIENT_TO_NODE_CERT_EXPIRY", "ENCRYPTION_AT_REST_CONFIG_EXPIRY", "SSH_KEY_EXPIRY", "SSH_KEY_ROTATION_FAILURE", "PITR_CONFIG_FAILURE", "YSQL_OP_AVG_LATENCY", "YCQL_OP_AVG_LATENCY", "YSQL_OP_P99_LATENCY", "YCQL_OP_P99_LATENCY", "HIGH_NUM_YSQL_CONNECTIONS", "HIGH_NUM_YCQL_CONNECTIONS", "HIGH_NUM_YEDIS_CONNECTIONS", "YSQL_THROUGHPUT", "YCQL_THROUGHPUT", "YCQL_MICROSECOND_TIMESTAMPS_DETECTED", "MASTER_LEADER_MISSING", "MASTER_UNDER_REPLICATED", "LEADERLESS_TABLETS", "UNDER_REPLICATED_TABLETS", "PRIVATE_ACCESS_KEY_STATUS", "UNIVERSE_OS_UPDATE_REQUIRED", "DB_YCQL_WEB_SERVER_DOWN", "DB_YSQL_WEB_SERVER_DOWN", "INCREASED_REMOTE_BOOTSTRAPS", "TABLET_SERVER_AVG_READ_LATENCY", "TABLET_SERVER_AVG_WRITE_LATENCY", "REACTOR_DELAYS", "RPC_QUEUE_SIZE", "LOG_CACHE_SIZE", "CACHE_MISS", "HA_STANDBY_SYNC", "NODE_AGENT_DOWN", "UNIVERSE_RELEASE_FILES_STATUS", "HA_VERSION_MISMATCH", "TABLET_PEERS_GUARDRAIL", "XCLUSTER_CONFIG_TABLE_BAD_STATE", "NODE_CLOCK_DRIFT", "UNIVERSE_UNEXPECTED_MASTERS_RUNNING", "UNIVERSE_UNEXPECTED_TSERVERS_RUNNING" ],
"type" : "string"
},
"thresholdConditionReadOnly" : {
@@ -920,7 +920,7 @@
},
"thresholdUnit" : {
"description" : "Threshold unit",
"enum" : [ "STATUS", "COUNT", "PERCENT", "MILLISECOND", "SECOND", "DAY", "MEGABYTE" ],
"enum" : [ "STATUS", "COUNT", "PERCENT", "MILLISECOND", "SECOND", "MINUTE", "DAY", "MEGABYTE" ],
"type" : "string"
},
"thresholdUnitName" : {