This repository was archived by the owner on Feb 28, 2025. It is now read-only.

AIOps Log Anomaly K3s failure story #773

Closed
tybalex opened this issue Nov 4, 2022 · 1 comment


tybalex commented Nov 4, 2022

No description provided.

@tybalex tybalex self-assigned this Nov 4, 2022

tybalex commented Nov 10, 2022

The Fault:
The K3s TLS certificates have expired and are no longer valid.

The Story:
A user has a K3s cluster that has been running for a while. Eventually the TLS certificates expire, and errors such as the following start to appear in the logs:

Oct 30 00:11:50 k3s-opni-ds-ac-61-etcd-cp-nodes-64410f8e-kjfb5 k3s[287067]: time="2022-01-01T00:11:50Z" level=error msg="CA cert validation failed: Get \"https://127.0.0.1:6443/cacerts\": x509: certificate has expired or is not yet valid: current time 2022-10-30T00:11:50Z is before 2022-10-31T22:37:33Z"

Opni's log anomaly detection model would mark this type of log message as anomalous, which tells the user that the K3s certificates need to be renewed or reconfigured.
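As a rough illustration of the signal involved, the sketch below filters log lines for the x509 validity error shown in the sample above. It is a plain string match, not Opni's anomaly detection model; the helper name `cert_time_errors` is hypothetical, and the `journalctl` unit name `k3s` is an assumption about how K3s was installed.

```shell
# Hypothetical helper: pass through only the lines that carry the
# x509 validity-window error seen in the sample log above.
# This is a fixed-string match, not Opni's anomaly detection model.
cert_time_errors() {
  grep -F 'x509: certificate has expired or is not yet valid'
}

# Example usage on a systemd host (assumes the service unit is named "k3s"):
#   journalctl -u k3s --no-pager | cert_time_errors
```

A match here only confirms that the error text is present in the logs; it says nothing about which certificate is affected.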

A few real cases and their related GitHub issues:
Clusters were up and stable for a while (say 6 months) and the certificates expired at some point.
ct-Open-Source/team-container#66
k3s-io/k3s#5163
In an air-gapped environment, the user rebooted the system and the system time was reset to the year 2018.
k3s-io/k3s#6368
The user did not configure the cluster correctly.
k3s-io/k3s#6339
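The cases above can produce the same x509 error for different reasons. The error detail in Go's x509 messages reads "is after" when the current time is past the certificate's validity window (expired) and "is before" when the clock is earlier than the window (not yet valid, as with a clock reset to the past). The sketch below separates the two, assuming the K3s logs carry that standard text as the sample log suggests; the function name `classify_cert_time_error` is hypothetical.

```shell
# Hypothetical helper: read log lines from stdin and label each
# x509 validity error as "expired" or "not-yet-valid".
# Lines without the error are ignored.
classify_cert_time_error() {
  while IFS= read -r line; do
    case "$line" in
      *'x509: certificate has expired or is not yet valid'*'is after'*)
        echo "expired" ;;        # clock is past the end of the validity window
      *'x509: certificate has expired or is not yet valid'*'is before'*)
        echo "not-yet-valid" ;;  # clock is earlier than the start of the window
    esac
  done
}
```

Fed the sample log line from the story, this prints `not-yet-valid`, because that message ends with "is before".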

How to simulate the fault:
SSH into a K3s etcd/control-plane node and run the following commands as the root user.
First, disable NTP synchronization with timedatectl:

timedatectl set-ntp no 

This command will turn off NTP synchronization and you can then manually adjust the system's time.

Finally, set the system time to a date in the past:
timedatectl set-time 2022-01-01

This sets the system time of this node to January 1, 2022.

You should then observe anomalous logs from K3s.
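The simulation steps can be collected into a small script, sketched below. The `run` wrapper, the `DRY_RUN` variable, and the function names are hypothetical helpers for illustration. The `restore_clock` step (re-enabling NTP) is an addition that is not part of the original instructions, and the `journalctl` unit name `k3s` is an assumption.

```shell
# Sketch only: run(), DRY_RUN, and the function names are hypothetical.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Turn off NTP and move the clock to a past date (default 2022-01-01).
simulate_clock_fault() {
  past_date="${1:-2022-01-01}"
  run timedatectl set-ntp no              # stop NTP from correcting the clock
  run timedatectl set-time "$past_date"   # set the system time into the past
}

# Not in the original steps: re-enable NTP so the clock resynchronizes.
restore_clock() {
  run timedatectl set-ntp yes
}

# Example usage as root on the etcd/control-plane node:
#   simulate_clock_fault 2022-01-01
#   journalctl -u k3s -f    # assumes the service unit is named "k3s"
#   restore_clock
```

Running `DRY_RUN=1 simulate_clock_fault` first prints the two `timedatectl` commands without touching the clock, which is a cheap way to check the script before using it on a real node.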

@tybalex tybalex closed this as completed Nov 10, 2022