Skip to content

Commit

Permalink
Adds prometheus export point to the agent pod. (#2983)
Browse files Browse the repository at this point in the history
* Add metrics system (prometheus) to the agent.

* Always have metrics, just doesnt do anything.

* Add metrics for sniffer.

* Add metrics for stealer.

* How to install prometheus // Add more metrics to the reply

* Do not register the gauge multiple times.

* tcpoutgoing metrics

* udpoutgoing metrics

* docs

* move metrics to modules

* bump protocol

* fix axum-server version

* info -> trace

* just have metrics atomics everywhere

* drop actors

* gate metrics behind config

* use socketaddr

* spawn inside spawn

* error impl send

* move where we inc subs for sniffer

* no clone

* dec things closer to remove

* better help

Co-authored-by: Michał Smolarek <[email protected]>

* fix help

Co-authored-by: Michał Smolarek <[email protected]>

* no async file manager

* split filtered unfiltered steal inc

* remove unfiltered inc from wrong place

* cancellation token

* unit test for metrics

* remove unused

* rustfmt

* schema

* schema 2

* md

* appease clippy

* crypto provider for test

* encode_to_string // cancellation on error

* better error

Co-authored-by: Michał Smolarek <[email protected]>

* no more metricserror

* outdated docs

* connection metrics to run tasks

* unused

* remove todo

* line

Co-authored-by: Michał Smolarek <[email protected]>

* inc not dec

Co-authored-by: Michał Smolarek <[email protected]>

* inc stream fd

Co-authored-by: Michał Smolarek <[email protected]>

* only dec once

Co-authored-by: Michał Smolarek <[email protected]>

* drop filemanager updates metrics

* sniffer drop and update_packet_filter metrics

* remove dec from extra sub case

* move sub to PortSub add

* drop for PortSubs, zero subs counter

* new and drop for unfiltered task

* remove inc from run

* drop for filtered

* docs

* remove metrics from some places

* cancel

* near insert and remove

* connection sub inc

* connected clients

* dns request

* http in progress

* protocol

* http request in progress

* dns

* 2 new decs in tcp outgoing

* dec fd twice

* filtered port

* some docs

* improve error with display impl

* Make UdpOutgoing look more like TcpOutgoing

* AgentError everywhere

* allow type complexity

* Remove a few extra AgentResult

* bump protocol

* fix link doc

* fix wrong doc

* Put global gauges in state so tests dont explode.

* Fix repeated value.

* changelog

* Dont use default registry.

* no ignoring alreadyreg

* lil docs

---------

Co-authored-by: Michał Smolarek <[email protected]>
  • Loading branch information
meowjesty and Razz4780 authored Jan 22, 2025
1 parent 15f9e34 commit 640eeac
Show file tree
Hide file tree
Showing 49 changed files with 1,895 additions and 668 deletions.
584 changes: 325 additions & 259 deletions Cargo.lock

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions changelog.d/2975.added.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add prometheus metrics to the mirrord-agent.
12 changes: 10 additions & 2 deletions mirrord-schema.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "LayerFileConfig",
"description": "mirrord allows for a high degree of customization when it comes to which features you want to enable, and how they should function.\n\nAll of the configuration fields have a default value, so a minimal configuration would be no configuration at all.\n\nThe configuration supports templating using the [Tera](https://keats.github.io/tera/docs/) template engine. Currently we don't provide additional values to the context, if you have anything you want us to provide please let us know.\n\nTo use a configuration file in the CLI, use the `-f <CONFIG_PATH>` flag. Or if using VSCode Extension or JetBrains plugin, simply create a `.mirrord/mirrord.json` file or use the UI.\n\nTo help you get started, here are examples of a basic configuration file, and a complete configuration file containing all fields.\n\n### Basic `config.json` {#root-basic}\n\n```json { \"target\": \"pod/bear-pod\", \"feature\": { \"env\": true, \"fs\": \"read\", \"network\": true } } ```\n\n### Basic `config.json` with templating {#root-basic-templating}\n\n```json { \"target\": \"{{ get_env(name=\"TARGET\", default=\"pod/fallback\") }}\", \"feature\": { \"env\": true, \"fs\": \"read\", \"network\": true } } ```\n\n### Complete `config.json` {#root-complete}\n\nDon't use this example as a starting point, it's just here to show you all the available options. ```json { \"accept_invalid_certificates\": false, \"skip_processes\": \"ide-debugger\", \"target\": { \"path\": \"pod/bear-pod\", \"namespace\": \"default\" }, \"connect_tcp\": null, \"agent\": { \"log_level\": \"info\", \"json_log\": false, \"labels\": { \"user\": \"meow\" }, \"annotations\": { \"cats.io/inject\": \"enabled\" }, \"namespace\": \"default\", \"image\": \"ghcr.io/metalbear-co/mirrord:latest\", \"image_pull_policy\": \"IfNotPresent\", \"image_pull_secrets\": [ { \"secret-key\": \"secret\" } ], \"ttl\": 30, \"ephemeral\": false, \"communication_timeout\": 30, \"startup_timeout\": 360, \"network_interface\": \"eth0\", \"flush_connections\": true }, \"feature\": { \"env\": { \"include\": \"DATABASE_USER;PUBLIC_ENV\", \"exclude\": \"DATABASE_PASSWORD;SECRET_ENV\", \"override\": { \"DATABASE_CONNECTION\": \"db://localhost:7777/my-db\", \"LOCAL_BEAR\": \"panda\" }, \"mapping\": { \".+_TIMEOUT\": \"1000\" } }, \"fs\": { \"mode\": \"write\", \"read_write\": \".+\\\\.json\" , \"read_only\": [ \".+\\\\.yaml\", \".+important-file\\\\.txt\" ], \"local\": [ \".+\\\\.js\", \".+\\\\.mjs\" ] }, \"network\": { \"incoming\": { \"mode\": \"steal\", \"http_filter\": { \"header_filter\": \"host: api\\\\..+\" }, \"port_mapping\": [[ 7777, 8888 ]], \"ignore_localhost\": false, \"ignore_ports\": [9999, 10000] }, \"outgoing\": { \"tcp\": true, \"udp\": true, \"filter\": { \"local\": [\"tcp://1.1.1.0/24:1337\", \"1.1.5.0/24\", \"google.com\", \":53\"] }, \"ignore_localhost\": false, \"unix_streams\": \"bear.+\" }, \"dns\": { \"enabled\": true, \"filter\": { \"local\": [\"1.1.1.0/24:1337\", \"1.1.5.0/24\", \"google.com\"] } } }, \"copy_target\": { \"scale_down\": false } }, \"operator\": true, \"kubeconfig\": \"~/.kube/config\", \"sip_binaries\": \"bash\", \"telemetry\": true, \"kube_context\": \"my-cluster\" } ```\n\n# Options {#root-options}",
"description": "mirrord allows for a high degree of customization when it comes to which features you want to enable, and how they should function.\n\nAll of the configuration fields have a default value, so a minimal configuration would be no configuration at all.\n\nThe configuration supports templating using the [Tera](https://keats.github.io/tera/docs/) template engine. Currently we don't provide additional values to the context, if you have anything you want us to provide please let us know.\n\nTo use a configuration file in the CLI, use the `-f <CONFIG_PATH>` flag. Or if using VSCode Extension or JetBrains plugin, simply create a `.mirrord/mirrord.json` file or use the UI.\n\nTo help you get started, here are examples of a basic configuration file, and a complete configuration file containing all fields.\n\n### Basic `config.json` {#root-basic}\n\n```json { \"target\": \"pod/bear-pod\", \"feature\": { \"env\": true, \"fs\": \"read\", \"network\": true } } ```\n\n### Basic `config.json` with templating {#root-basic-templating}\n\n```json { \"target\": \"{{ get_env(name=\"TARGET\", default=\"pod/fallback\") }}\", \"feature\": { \"env\": true, \"fs\": \"read\", \"network\": true } } ```\n\n### Complete `config.json` {#root-complete}\n\nDon't use this example as a starting point, it's just here to show you all the available options. ```json { \"accept_invalid_certificates\": false, \"skip_processes\": \"ide-debugger\", \"target\": { \"path\": \"pod/bear-pod\", \"namespace\": \"default\" }, \"connect_tcp\": null, \"agent\": { \"log_level\": \"info\", \"json_log\": false, \"labels\": { \"user\": \"meow\" }, \"annotations\": { \"cats.io/inject\": \"enabled\" }, \"namespace\": \"default\", \"image\": \"ghcr.io/metalbear-co/mirrord:latest\", \"image_pull_policy\": \"IfNotPresent\", \"image_pull_secrets\": [ { \"secret-key\": \"secret\" } ], \"ttl\": 30, \"ephemeral\": false, \"communication_timeout\": 30, \"startup_timeout\": 360, \"network_interface\": \"eth0\", \"flush_connections\": true, \"metrics\": \"0.0.0.0:9000\", }, \"feature\": { \"env\": { \"include\": \"DATABASE_USER;PUBLIC_ENV\", \"exclude\": \"DATABASE_PASSWORD;SECRET_ENV\", \"override\": { \"DATABASE_CONNECTION\": \"db://localhost:7777/my-db\", \"LOCAL_BEAR\": \"panda\" }, \"mapping\": { \".+_TIMEOUT\": \"1000\" } }, \"fs\": { \"mode\": \"write\", \"read_write\": \".+\\\\.json\" , \"read_only\": [ \".+\\\\.yaml\", \".+important-file\\\\.txt\" ], \"local\": [ \".+\\\\.js\", \".+\\\\.mjs\" ] }, \"network\": { \"incoming\": { \"mode\": \"steal\", \"http_filter\": { \"header_filter\": \"host: api\\\\..+\" }, \"port_mapping\": [[ 7777, 8888 ]], \"ignore_localhost\": false, \"ignore_ports\": [9999, 10000] }, \"outgoing\": { \"tcp\": true, \"udp\": true, \"filter\": { \"local\": [\"tcp://1.1.1.0/24:1337\", \"1.1.5.0/24\", \"google.com\", \":53\"] }, \"ignore_localhost\": false, \"unix_streams\": \"bear.+\" }, \"dns\": { \"enabled\": true, \"filter\": { \"local\": [\"1.1.1.0/24:1337\", \"1.1.5.0/24\", \"google.com\"] } } }, \"copy_target\": { \"scale_down\": false } }, \"operator\": true, \"kubeconfig\": \"~/.kube/config\", \"sip_binaries\": \"bash\", \"telemetry\": true, \"kube_context\": \"my-cluster\" } ```\n\n# Options {#root-options}",
"type": "object",
"properties": {
"accept_invalid_certificates": {
Expand Down Expand Up @@ -255,7 +255,7 @@
"properties": {
"annotations": {
"title": "agent.annotations {#agent-annotations}",
"description": "Allows setting up custom annotations for the agent Job and Pod.\n\n```json { \"annotations\": { \"cats.io/inject\": \"enabled\" } } ```",
"description": "Allows setting up custom annotations for the agent Job and Pod.\n\n```json { \"annotations\": { \"cats.io/inject\": \"enabled\" \"prometheus.io/scrape\": \"true\", \"prometheus.io/port\": \"9000\" } } ```",
"type": [
"object",
"null"
Expand Down Expand Up @@ -378,6 +378,14 @@
"null"
]
},
"metrics": {
"title": "agent.metrics {#agent-metrics}",
"description": "Enables prometheus metrics for the agent pod.\n\nYou might need to add annotations to the agent pod depending on how prometheus is configured to scrape for metrics.\n\n```json { \"metrics\": \"0.0.0.0:9000\" } ```",
"type": [
"string",
"null"
]
},
"namespace": {
"title": "agent.namespace {#agent-namespace}",
"description": "Namespace where the agent shall live. Note: Doesn't work with ephemeral containers. Defaults to the current kubernetes namespace.",
Expand Down
3 changes: 3 additions & 0 deletions mirrord/agent/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,8 @@ x509-parser = "0.16"
rustls.workspace = true
envy = "0.4"
socket2.workspace = true
prometheus = { version = "0.13", features = ["process"] }
axum = { version = "0.7", features = ["macros"] }
iptables = { git = "https://github.com/metalbear-co/rust-iptables.git", rev = "e66c7332e361df3c61a194f08eefe3f40763d624" }
rawsocket = { git = "https://github.com/metalbear-co/rawsocket.git" }
procfs = "0.17.0"
Expand All @@ -78,3 +80,4 @@ rstest.workspace = true
mockall = "0.13"
test_bin = "0.4"
rcgen.workspace = true
reqwest.workspace = true
234 changes: 234 additions & 0 deletions mirrord/agent/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,237 @@ Agent part of [mirrord](https://github.com/metalbear-co/mirrord) responsible for
mirrord-agent is written in Rust for safety, low memory consumption and performance.

mirrord-agent is distributed as a container image (currently only x86) that is published on [GitHub Packages publicly](https://github.com/metalbear-co/mirrord-agent/pkgs/container/mirrord-agent).

## Enabling prometheus metrics

To start the metrics server, you'll need to add this config to your `mirrord.json`:

```json
{
"agent": {
"metrics": "0.0.0.0:9000",
"annotations": {
"prometheus.io/scrape": "true",
"prometheus.io/port": "9000"
}
}
```

Remember to change the `port` in both `metrics` and `annotations`, they have to match,
otherwise prometheus will try to scrape on `port: 80` or other commonly used ports.

### Installing prometheus

Run `kubectl apply -f {file-name}.yaml` on these sequences of `yaml` files and you should
get prometheus running in your cluster. You can access the dashboard from your browser at
`http://{cluster-ip}:30909`, if you're using minikube it might be
`http://192.168.49.2:30909`.

You'll get prometheus running under the `monitoring` namespace, but it'll be able to look
into resources from all namespaces. The config in `configmap.yaml` sets prometheus to look
at pods only, if you want to use it to scrape other stuff, check
[this example](https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml).

1. `create-namespace.yaml`

```yaml
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
```

2. `cluster-role.yaml`

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
verbs: ["get", "list", "watch"]
```
3. `service-account.yaml`

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: monitoring
```

4. `cluster-role-binding.yaml`

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitoring
```

5. `configmap.yaml`

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
# A scrape configuration for running Prometheus on a Kubernetes cluster.
# This uses separate scrape configs for cluster components (i.e. API server, node)
# and services to allow each to use different authentication configs.
#
# Kubernetes labels will be added as Prometheus labels on metrics via the
# `labelmap` relabeling action.
#
# If you are using Kubernetes 1.7.2 or earlier, please take note of the comments
# for the kubernetes-cadvisor job; you will need to edit or remove this job.

# Keep at most 100 sets of details of targets dropped by relabeling.
# This information is used to display in the UI for troubleshooting.
global:
keep_dropped_targets: 100

# Scrape config for API servers.
#
# Kubernetes exposes API servers as endpoints to the default/kubernetes
# service so this uses `endpoints` role and uses relabelling to only keep
# the endpoints associated with the default/kubernetes service using the
# default named port `https`. This works for single API server deployments as
# well as HA API server deployments.
scrape_configs:
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape to be configured
# for all the declared ports (or port-free target if none is declared)
# or only some ports.
- job_name: "kubernetes-pods"

kubernetes_sd_configs:
- role: pod

relabel_configs:
# Example relabel to scrape only pods that have
# "example.io/should_be_scraped = true" annotation.
# - source_labels: [__meta_kubernetes_pod_annotation_example_io_should_be_scraped]
# action: keep
# regex: true
#
# Example relabel to customize metric path based on pod
# "example.io/metric_path = <metric path>" annotation.
# - source_labels: [__meta_kubernetes_pod_annotation_example_io_metric_path]
# action: replace
# target_label: __metrics_path__
# regex: (.+)
#
# Example relabel to scrape only single, desired port for the pod
# based on pod "example.io/scrape_port = <port>" annotation.
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod
```
- If you make any changes to the 5-configmap.yaml file, remember to `kubectl apply` it
**before** restarting the `prometheus` deployment.

6. `deployment.yaml`

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
spec:
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9090"
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: prom/prometheus
args:
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- name: web
containerPort: 9090
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus
restartPolicy: Always
volumes:
- name: prometheus-config-volume
configMap:
defaultMode: 420
name: prometheus-config
```

7. `service.yaml`

```yaml
apiVersion: v1
kind: Service
metadata:
name: prometheus-service
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
spec:
selector:
app: prometheus
type: NodePort
ports:
- port: 8080
targetPort: 9090
nodePort: 30909
```
9 changes: 8 additions & 1 deletion mirrord/agent/src/cli.rs
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
#![deny(missing_docs)]

use std::net::SocketAddr;

use clap::{Parser, Subcommand};
use mirrord_protocol::{
MeshVendor, AGENT_IPV6_ENV, AGENT_NETWORK_INTERFACE_ENV, AGENT_OPERATOR_CERT_ENV,
MeshVendor, AGENT_IPV6_ENV, AGENT_METRICS_ENV, AGENT_NETWORK_INTERFACE_ENV,
AGENT_OPERATOR_CERT_ENV,
};

const DEFAULT_RUNTIME: &str = "containerd";
Expand All @@ -28,6 +31,10 @@ pub struct Args {
#[arg(short = 'i', long, env = AGENT_NETWORK_INTERFACE_ENV)]
pub network_interface: Option<String>,

/// Controls whether metrics are enabled, and the address to set up the metrics server.
#[arg(long, env = AGENT_METRICS_ENV)]
pub metrics: Option<SocketAddr>,

/// Return an error after accepting the first client connection, in order to test agent error
/// cleanup.
///
Expand Down
18 changes: 17 additions & 1 deletion mirrord/agent/src/client_connection.rs
Original file line number Diff line number Diff line change
Expand Up @@ -208,7 +208,7 @@ enum ConnectionFramed {

#[cfg(test)]
mod test {
use std::sync::Arc;
use std::sync::{Arc, Once};

use futures::StreamExt;
use mirrord_protocol::ClientCodec;
Expand All @@ -220,10 +220,19 @@ mod test {

use super::*;

static CRYPTO_PROVIDER: Once = Once::new();

/// Verifies that [`AgentTlsConnector`] correctly accepts a
/// connection from a server using the provided certificate.
#[tokio::test]
async fn agent_tls_connector_valid_cert() {
CRYPTO_PROVIDER.call_once(|| {
rustls::crypto::CryptoProvider::install_default(
rustls::crypto::aws_lc_rs::default_provider(),
)
.expect("Failed to install crypto provider")
});

let cert = rcgen::generate_simple_self_signed(vec!["operator".to_string()]).unwrap();
let cert_bytes = cert.cert.der();
let key_bytes = cert.key_pair.serialize_der();
Expand Down Expand Up @@ -269,6 +278,13 @@ mod test {
/// connection from a server using some other certificate.
#[tokio::test]
async fn agent_tls_connector_invalid_cert() {
CRYPTO_PROVIDER.call_once(|| {
rustls::crypto::CryptoProvider::install_default(
rustls::crypto::aws_lc_rs::default_provider(),
)
.expect("Failed to install crypto provider")
});

let server_cert = rcgen::generate_simple_self_signed(vec!["operator".to_string()]).unwrap();
let cert_bytes = server_cert.cert.der();
let key_bytes = server_cert.key_pair.serialize_der();
Expand Down
4 changes: 2 additions & 2 deletions mirrord/agent/src/container_handle.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use std::{collections::HashMap, sync::Arc};

use crate::{
error::Result,
error::AgentResult,
runtime::{Container, ContainerInfo, ContainerRuntime},
};

Expand All @@ -22,7 +22,7 @@ pub(crate) struct ContainerHandle(Arc<Inner>);
impl ContainerHandle {
/// Retrieve info about the container and initialize this struct.
#[tracing::instrument(level = "trace")]
pub(crate) async fn new(container: Container) -> Result<Self> {
pub(crate) async fn new(container: Container) -> AgentResult<Self> {
let ContainerInfo { pid, env: raw_env } = container.get_info().await?;

let inner = Inner { pid, raw_env };
Expand Down
Loading

0 comments on commit 640eeac

Please sign in to comment.