| 1 | +# Security Guidelines for Cluster API Users |
| 2 | + |
| 3 | +This document compiles security best practices for using Cluster API. We recommend that organizations adapt these guidelines to their specific infrastructure and security requirements to ensure safe operations. |
| 4 | + |
| 5 | +## Comprehensive auditing |
| 6 | + |
| 7 | +To ensure comprehensive auditing, the following components require audit configuration: |
| 8 | + |
| 9 | +- **Cluster-level Auditing** |
| 10 | + - Auditing on the management cluster |
| 11 | + - API server auditing for all workload clusters |
| 12 | + |
| 13 | +- **Node/VM-level Auditing** |
| 14 | + - Audit access to kubeconfig files located on the node |
| 15 | + - Audit access or edits to CA private keys and certificate files located on the node |
| 16 | + |
| 17 | +- **Cloud Provider Auditing** |
| 18 | + - Cloud API auditing to log all actions performed using cloud credentials |
| 19 | + |
| 20 | +After configuring these audit sources, centralize the logs using aggregation tools and implement real-time monitoring and alerting to detect suspicious activities and security incidents. |
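As a minimal sketch of what alerting on centralized API server audit logs could look like, the following assumes the logs are available as JSON lines in the Kubernetes `audit.k8s.io/v1` event shape (`verb`, `user.username`, `objectRef.resource`, `responseStatus.code`). The `SENSITIVE_RESOURCES` watch list and the `flag_suspicious` helper are hypothetical names chosen for illustration, not part of Cluster API.

```python
import json

# Hypothetical watch list: resources whose access is worth a closer look.
SENSITIVE_RESOURCES = {"secrets", "clusters", "machines"}


def flag_suspicious(audit_lines):
    """Return a summary of audit events that touch sensitive resources
    or that were rejected with 401/403."""
    flagged = []
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        resource = (event.get("objectRef") or {}).get("resource", "")
        code = (event.get("responseStatus") or {}).get("code", 0)
        if resource in SENSITIVE_RESOURCES or code in (401, 403):
            flagged.append({
                "user": (event.get("user") or {}).get("username", "unknown"),
                "verb": event.get("verb", ""),
                "resource": resource,
                "code": code,
            })
    return flagged
```

In practice this kind of rule usually lives in the log aggregation or SIEM tool itself; the sketch only illustrates the matching logic.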
| 21 | + |
| 22 | +## Use least privileges |
| 23 | + |
| 24 | +To minimize security risks related to cloud provider access, create dedicated cloud credentials that have only the permissions needed to manage the lifecycle of a cluster. Avoid using administrative or root accounts for Cluster API operations, and use separate credentials for separate purposes, such as one set for the management cluster and another for workload clusters. |
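One way to review credentials for excess permissions is to lint the policy attached to them. The sketch below assumes an AWS-style IAM policy document (a `Statement` list with `Effect`, `Action`, and `Resource`); other providers use different formats. The helper name `find_overbroad_statements` is hypothetical.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overbroad_statements(policy):
    """Return (statement_index, reason) pairs for Allow statements
    that use a wildcard action or a bare wildcard resource."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements do not grant permissions
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((index, "wildcard action"))
        if "*" in resources:
            findings.append((index, "wildcard resource"))
    return findings
```

Treat the findings as prompts for review rather than violations: some actions only accept `"Resource": "*"`, so a flagged statement may still be the narrowest one available.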
| 25 | + |
| 26 | +## Limit access |
| 27 | + |
| 28 | +Implement access restrictions to protect cluster infrastructure. |
| 29 | + |
| 30 | +### Control Plane Protection |
| 31 | + |
| 32 | +Limit who can create pods on control plane nodes through multiple methods: |
| 33 | + |
| 34 | +- **Taints and Tolerations**: Apply `NoSchedule` taints to control plane nodes to prevent general workload scheduling. See [Kubernetes Taints and Tolerations documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) |
| 35 | +- **RBAC Policies**: Restrict pod creation permissions using Role-Based Access Control. See [Kubernetes RBAC documentation](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) |
| 36 | +- **Admission Controllers**: Implement admission webhooks to enforce pod placement policies. See [Dynamic Admission Control](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) |
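To illustrate the kind of check an admission policy might apply, the sketch below tests whether a pod manifest carries a toleration matching the control plane taint. It assumes the taint is `node-role.kubernetes.io/control-plane:NoSchedule` (the one kubeadm applies by default; verify this for your bootstrap provider) and follows the toleration matching rules from the Kubernetes documentation. The function names are hypothetical.

```python
# Assumed control plane taint; adjust if your clusters use a different one.
CONTROL_PLANE_TAINT = {
    "key": "node-role.kubernetes.io/control-plane",
    "effect": "NoSchedule",
}


def tolerates(toleration, taint):
    """Return True if a single toleration matches the given taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists matches every taint.
        return key == "" or key == taint["key"]
    # Equal: both key and value must match.
    return (key == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def tolerates_control_plane_taint(pod):
    """Return True if any toleration in the pod spec matches the taint."""
    tolerations = (pod.get("spec") or {}).get("tolerations") or []
    return any(tolerates(t, CONTROL_PLANE_TAINT) for t in tolerations)
```

A matching toleration only means the taint no longer repels the pod; it does not force scheduling onto a control plane node. A webhook could still use a check like this to reject such pods from users who are not allowed to run workloads there.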
| 37 | + |
| 38 | +### SSH Access |
| 39 | + |
| 40 | +Disable or restrict SSH access to nodes in a cluster to prevent unauthorized modifications and access to sensitive files. |
| 41 | + |
| 42 | +## Second pair of eyes |
| 43 | + |
| 44 | +Implement a review process where at least two people must approve privileged actions such as creating, deleting, or updating clusters. GitOps provides an effective way to enforce this requirement through pull request workflows, where changes to cluster configurations must be reviewed and approved by another team member before being merged and applied to the infrastructure. |
| 45 | + |
| 46 | +## Implement comprehensive alerting |
| 47 | + |
| 48 | +Configure alerts in the centralized audit log system to detect security incidents and resource anomalies. |
| 49 | + |
| 50 | +### Security Event Monitoring |
| 51 | + |
| 52 | +- Alert when Cluster API components are modified, restarted, or experience unexpected state changes |
| 53 | +- Monitor and alert on unauthorized changes to sensitive files on machine images |
| 54 | +- Alert on unexpected machine restarts or shutdowns |
| 55 | +- Monitor deletion or modification of the load balancers that front API servers, such as AWS Elastic Load Balancers (ELB) |
| 56 | + |
| 57 | +### Resource Activity Monitoring |
| 58 | + |
| 59 | +- Alert on all cloud resource creation, update, and deletion activities |
| 60 | +- Identify anomalous patterns such as mass resource creation or deletion |
| 61 | +- Monitor for resources created outside expected boundaries |
| 62 | + |
| 63 | +### Resource Limit Monitoring |
| 64 | + |
| 65 | +- Alert when the number of clusters approaches or exceeds defined soft limits |
| 66 | +- Monitor node creation rates and alert when approaching capacity limits |
| 67 | +- Track usage against cloud provider quotas and organizational limits |
| 68 | +- Alert on excessive API calls or resource creation requests |
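The soft-limit checks above can be sketched as a small threshold function. The 80% warning ratio and the names `LimitStatus` and `check_limit` are assumptions made for illustration; choose thresholds that match your own quotas.

```python
from dataclasses import dataclass


@dataclass
class LimitStatus:
    name: str
    current: int
    soft_limit: int
    level: str  # "ok", "approaching", or "exceeded"


def check_limit(name, current, soft_limit, warn_ratio=0.8):
    """Classify current usage against a soft limit.

    warn_ratio is the fraction of the limit at which usage counts
    as approaching it (an assumed default of 80%).
    """
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    if current > soft_limit:
        level = "exceeded"
    elif current >= warn_ratio * soft_limit:
        level = "approaching"
    else:
        level = "ok"
    return LimitStatus(name, current, soft_limit, level)
```

For example, `check_limit("clusters", 9, 10)` reports `approaching`, which an alerting pipeline could route as a warning before the limit is actually crossed.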
| 69 | + |
| 70 | +## Cluster isolation and segregation |
| 71 | + |
| 72 | +Implement multiple layers of isolation to prevent privilege escalation from workload clusters to the management cluster. |
| 73 | + |
| 74 | +### Account/Subscription Separation |
| 75 | + |
| 76 | +Separate workload clusters into different AWS accounts or Azure subscriptions, and use dedicated accounts for the management cluster and for production workloads. This approach provides a strong security boundary at the cloud provider level. |
| 77 | + |
| 78 | +### Network Boundaries |
| 79 | + |
| 80 | +Separate workload and management clusters at the network level. Use a dedicated VPC/VNet for each cluster type to prevent lateral movement between clusters. |
| 81 | + |
| 82 | +### Certificate Authority Isolation |
| 83 | + |
| 84 | +Do not build a chain of trust for cluster CAs. Each cluster must have its own independent CA so that a compromised workload cluster CA does not grant access to the management cluster. See [Kubernetes PKI certificates and requirements](https://kubernetes.io/docs/setup/best-practices/certificates/) for best practices. |
| 85 | + |
| 86 | +## Prevent runtime updates |
| 87 | + |
| 88 | +Implement controls to prevent tampering with machine images at runtime. Disable or restrict updates to machine images at runtime and prevent unauthorized modifications through SSH access restrictions. Following [immutable infrastructure](https://glossary.cncf.io/immutable-infrastructure/) practices ensures that any changes require deploying new images rather than modifying running systems. |