diff --git a/.gitignore b/.gitignore index 8b03c06b..fabcc5d1 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,3 @@ site/ .DS_Store +venv \ No newline at end of file diff --git a/docs/capx/latest b/docs/capx/latest index 9ac194b0..3979bfc0 120000 --- a/docs/capx/latest +++ b/docs/capx/latest @@ -1 +1 @@ -v1.3.x \ No newline at end of file +v1.8.x \ No newline at end of file diff --git a/docs/capx/v1.4.x/addons/install_csi_driver.md b/docs/capx/v1.4.x/addons/install_csi_driver.md new file mode 100644 index 00000000..afb4bdc8 --- /dev/null +++ b/docs/capx/v1.4.x/addons/install_csi_driver.md @@ -0,0 +1,215 @@ +# Nutanix CSI Driver installation with CAPX + +The Nutanix CSI driver is fully supported on CAPI/CAPX deployed clusters where all the nodes meet the [Nutanix CSI driver prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). + +There are three methods to install the Nutanix CSI driver on a CAPI/CAPX cluster: + +- Helm +- ClusterResourceSet +- CAPX Flavor + +For more information, check the next sections. + +## CAPI Workload cluster prerequisites for the Nutanix CSI Driver + +Kubernetes workers need the following prerequisites to use the Nutanix CSI Drivers: + +- iSCSI initiator package (for Volumes based block storage) +- NFS client package (for Files based storage) + +These packages may already be present in the image you use with your infrastructure provider or you can also rely on your bootstrap provider to install them. More info is available in the [Prerequisites docs](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-plugin-prerequisites-r.html){target=_blank}. + +The package names and installation method will also vary depending on the operating system you plan to use. + +In the example below, `kubeadm` bootstrap provider is used to deploy these packages on top of an Ubuntu 20.04 image. 
The `kubeadm` bootstrap provider allows defining `preKubeadmCommands` that will be launched before Kubernetes cluster creation. These `preKubeadmCommands` can be defined both in `KubeadmControlPlane` for master nodes and in `KubeadmConfigTemplate` for worker nodes. + +In the example with an Ubuntu 20.04 image, both `KubeadmControlPlane` and `KubeadmConfigTemplate` must be modified as in the example below: + +```yaml +spec: + template: + spec: + # ....... + preKubeadmCommands: + - echo "before kubeadm call" > /var/log/prekubeadm.log + - apt update + - apt install -y nfs-common open-iscsi + - systemctl enable --now iscsid +``` +## Install the Nutanix CSI Driver with Helm + +A recent [Helm](https://helm.sh){target=_blank} version is needed (tested with Helm v3.10.1). + +The example below must be applied on a ready workload cluster. The workload cluster's kubeconfig can be retrieved and used to connect with the following command: + +```shell +clusterctl get kubeconfig $CLUSTER_NAME -n $CLUSTER_NAMESPACE > $CLUSTER_NAME-KUBECONFIG +export KUBECONFIG=$(pwd)/$CLUSTER_NAME-KUBECONFIG +``` + +Once connected to the cluster, follow the [CSI documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-driver-install-t.html){target=_blank}. + +First, install the [nutanix-csi-snapshot](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-snapshot){target=_blank} chart followed by the [nutanix-csi-storage](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-storage){target=_blank} chart. 
+ +See an example below: + +```shell +# Add the official Nutanix Helm repo and get the latest update +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +# Install the nutanix-csi-snapshot chart +helm install nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system --create-namespace + +# Install the nutanix-csi-storage chart +helm install nutanix-storage nutanix/nutanix-csi-storage -n ntnx-system --set createSecret=false +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with `ClusterResourceSet` + +The `ClusterResourceSet` feature was introduced to automatically apply a set of resources (such as CNI/CSI) defined by administrators to matching created/existing workload clusters. + +### Enabling the `ClusterResourceSet` feature + +At the time of writing, `ClusterResourceSet` is an experimental feature that must be enabled during the initialization of a management cluster with the `EXP_CLUSTER_RESOURCE_SET` feature gate. + +To do this, add `EXP_CLUSTER_RESOURCE_SET: "true"` in the `clusterctl` configuration file or just `export EXP_CLUSTER_RESOURCE_SET=true` before initializing the management cluster with `clusterctl init`. + +If the management cluster is already initialized, the `ClusterResourceSet` feature can be enabled by changing the configuration of the `capi-controller-manager` deployment in the `capi-system` namespace. + + ```shell + kubectl edit deployment -n capi-system capi-controller-manager + ``` + +Locate the section below: + +```yaml + - args: + - --leader-elect + - --metrics-bind-addr=localhost:8080 + - --feature-gates=MachinePool=false,ClusterResourceSet=false,ClusterTopology=false +``` + +Then replace `ClusterResourceSet=false` with `ClusterResourceSet=true`. + +!!! note + Editing the `deployment` resource will cause Kubernetes to automatically start new versions of the containers with the feature enabled.
+ + + +### Prepare the Nutanix CSI `ClusterResourceSet` + +#### Create the `ConfigMap` for the CSI Plugin + +First, create a `ConfigMap` that contains a YAML manifest with all resources to install the Nutanix CSI driver. + +Since the Nutanix CSI Driver is provided as a Helm chart, use `helm` to extract it before creating the `ConfigMap`. See an example below: + +```shell +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +kubectl create ns ntnx-system --dry-run=client -o yaml > nutanix-csi-namespace.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system > nutanix-csi-snapshot.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-storage -n ntnx-system > nutanix-csi-storage.yaml + +kubectl create configmap nutanix-csi-crs --from-file=nutanix-csi-namespace.yaml --from-file=nutanix-csi-snapshot.yaml --from-file=nutanix-csi-storage.yaml +``` + +#### Create the `ClusterResourceSet` + +Next, create the `ClusterResourceSet` resource that will map the `ConfigMap` defined above to clusters using a `clusterSelector`. + +The `ClusterResourceSet` needs to be created inside the management cluster. See an example below: + +```yaml +--- +apiVersion: addons.cluster.x-k8s.io/v1alpha3 +kind: ClusterResourceSet +metadata: + name: nutanix-csi-crs +spec: + clusterSelector: + matchLabels: + csi: nutanix + resources: + - kind: ConfigMap + name: nutanix-csi-crs +``` + +The `clusterSelector` field controls how Cluster API will match this `ClusterResourceSet` on one or more workload clusters. In the example scenario, the `matchLabels` approach is being used where the `ClusterResourceSet` will be applied to all workload clusters having the `csi: nutanix` label present. If the label isn't present, the `ClusterResourceSet` won't apply to that workload cluster. + +The `resources` field references the `ConfigMap` created above, which contains the manifests for installing the Nutanix CSI driver. 
+ +#### Assign the `ClusterResourceSet` to a workload cluster + +Assign this `ClusterResourceSet` to the workload cluster by adding the correct label to the `Cluster` resource. + +This can be done before workload cluster creation by editing the output of the `clusterctl generate cluster` command or by modifying an already deployed workload cluster. + +In both cases, `Cluster` resources should look like this: + +```yaml +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: workload-cluster-name + namespace: workload-cluster-namespace + labels: + csi: nutanix +# ... +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with a CAPX flavor + +The CAPX provider can utilize a flavor to automatically deploy the Nutanix CSI using a `ClusterResourceSet`. + +### Prerequisites + +The following requirements must be met: + +- The operating system must meet the [Nutanix CSI OS prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). +- The Management cluster must be installed with the [`CLUSTER_RESOURCE_SET` feature gate](#enabling-the-clusterresourceset-feature). + +### Installation + +Specify the `csi` flavor during workload cluster creation. 
See an example below: + +```shell +clusterctl generate cluster my-cluster -f csi +``` + +Additional environment variables are required: + +- `WEBHOOK_CA`: Base64 encoded CA certificate used to sign the webhook certificate +- `WEBHOOK_CERT`: Base64 encoded certificate for the webhook validation component +- `WEBHOOK_KEY`: Base64 encoded key for the webhook validation component + +The three components referenced above can be automatically created and referenced using [this script](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/main/scripts/gen-self-cert.sh){target=_blank}: + +``` +source scripts/gen-self-cert.sh +``` + +The certificate must reference the following names: + +- csi-snapshot-webhook +- csi-snapshot-webhook.ntnx-system +- csi-snapshot-webhook.ntnx-system.svc + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Nutanix CSI Driver Configuration + +After the driver is installed, it must be configured for use by defining, at a minimum, a `Secret` and a `StorageClass`. + +This can be done manually in the workload clusters or by using a `ClusterResourceSet` in the management cluster as explained above. + +See the official [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:CSI-Volume-Driver-v2_6){target=_blank} on the Nutanix Portal for more configuration information. diff --git a/docs/capx/v1.4.x/credential_management.md b/docs/capx/v1.4.x/credential_management.md new file mode 100644 index 00000000..bebbc5a0 --- /dev/null +++ b/docs/capx/v1.4.x/credential_management.md @@ -0,0 +1,93 @@ +# Credential Management +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs to manage the required Kubernetes cluster infrastructure resources. + +PC credentials are required to authenticate to the PC APIs.
CAPX currently supports two mechanisms to supply the required credentials: + +- Credentials injected into the CAPX manager deployment +- Workload cluster specific credentials + +## Credentials injected into the CAPX manager deployment +By default, credentials will be injected into the CAPX manager deployment when CAPX is initialized. See the [getting started guide](./getting_started.md) for more information on the initialization. + +Upon initialization, a `nutanix-creds` secret will automatically be created in the `capx-system` namespace. This secret will contain the values supplied via the `NUTANIX_USER` and `NUTANIX_PASSWORD` parameters. + +The `nutanix-creds` secret will be used for workload cluster deployment if no other credential is supplied. + +### Example +An example of the automatically created `nutanix-creds` secret can be found below: +```yaml +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: nutanix-creds + namespace: capx-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +## Workload cluster specific credentials +Users can override the [credentials injected into the CAPX manager deployment](#credentials-injected-into-the-capx-manager-deployment) by supplying a credential specific to a workload cluster. The credentials can be supplied by creating a secret in the same namespace as the `NutanixCluster`. + +The secret can be referenced by adding a `credentialRef` inside the `prismCentral` attribute contained in the `NutanixCluster`. +The secret will also be deleted when the `NutanixCluster` is deleted. + +Note: There is a 1:1 relation between the secret and the `NutanixCluster` object.
+ +### Example +Create a secret in the namespace of the `NutanixCluster`: + +```yaml +--- +apiVersion: v1 +kind: Secret +metadata: + name: "" + namespace: "" +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +Add a `prismCentral` and corresponding `credentialRef` to the `NutanixCluster`: + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: "" + namespace: "" +spec: + prismCentral: + ... + credentialRef: + name: "" + kind: Secret +... +``` + +See the [NutanixCluster](./types/nutanix_cluster.md) documentation for all supported configuration parameters for the `prismCentral` and `credentialRef` attribute. \ No newline at end of file diff --git a/docs/capx/v1.4.x/experimental/autoscaler.md b/docs/capx/v1.4.x/experimental/autoscaler.md new file mode 100644 index 00000000..2af57213 --- /dev/null +++ b/docs/capx/v1.4.x/experimental/autoscaler.md @@ -0,0 +1,129 @@ +# Using Autoscaler in combination with CAPX + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +[Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank} can be used in combination with Cluster API to automatically add or remove machines in a cluster. + +Autoscaler can be used in different deployment scenarios. This page will provide an overview of multiple autoscaler deployment scenarios in combination with CAPX. +See the [Testing](#testing) section to see how scale-up/scale-down events can be triggered to validate the autoscaler behaviour. 
+ +More in-depth information on Autoscaler functionality can be found in the [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank}. + +All Autoscaler configuration parameters can be found [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank}. + +## Scenario 1: Management cluster managing an external workload cluster +In this scenario, Autoscaler will be running on a management cluster and it will manage an external workload cluster. See the management cluster managing an external workload cluster section of [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster){target=_blank} for more information. + +### Steps +1. Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. + + !!! note + Make sure a CNI is installed in the workload cluster. + +4. Download the example [Autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +5. Modify the `deployment.yaml` file: + - Change the namespace of all resources to the namespaces of the workload cluster. + - Choose an autoscale image. + - Change the following parameters in the `Deployment` resource: +```YAML + spec: + containers: + name: cluster-autoscaler + command: + - /cluster-autoscaler + args: + - --cloud-provider=clusterapi + - --kubeconfig=/mnt/kubeconfig/kubeconfig.yml + - --clusterapi-cloud-config-authoritative + - -v=1 + volumeMounts: + - mountPath: /mnt/kubeconfig + name: kubeconfig + readOnly: true + ... 
+ volumes: + - name: kubeconfig + secret: + secretName: -kubeconfig + items: + - key: value + path: kubeconfig.yml +``` +7. Apply the `deployment.yaml` file. +```bash +kubectl apply -f deployment.yaml +``` +8. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +9. Test Autoscaler. Go to the [Testing](#testing) section. + +## Scenario 2: Autoscaler running on workload cluster +In this scenario, Autoscaler will be deployed [on top of the workload cluster](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-a-joined-cluster-using-service-account-credentials){target=_blank} directly. For Autoscaler to work, the workload cluster resources must be moved from the management cluster to the workload cluster. + +### Steps +1. Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. +2. Get the kubeconfig file for the workload cluster and use this kubeconfig to log in to the workload cluster. +```bash +clusterctl get kubeconfig -n /path/to/kubeconfig +``` +3. Install a CNI in the workload cluster. +4. Initialise the CAPX components on top of the workload cluster: +```bash +clusterctl init --infrastructure nutanix +``` +5. Migrate the workload cluster custom resources to the workload cluster. Run the following command from the management cluster: +```bash +clusterctl move -n --to-kubeconfig /path/to/kubeconfig +``` +6. Verify that the cluster has been migrated by running the following command on the workload cluster: +```bash +kubectl get cluster -A +``` +7. Download the example [autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +8.
Create the Autoscaler namespace: +```bash +kubectl create ns autoscaler +``` +9. Apply the `deployment.yaml` file +```bash +kubectl apply -f deployment.yaml +``` +10. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +11. Test Autoscaler. Go to the [Testing](#testing) section. + +## Testing + +1. Deploy an example Kubernetes application. For example, the one used in the [Kubernetes HorizontalPodAutoscaler Walkthrough](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/). +```bash +kubectl apply -f https://k8s.io/examples/application/php-apache.yaml +``` +2. Increase the number of replicas of the application to trigger a scale-up event: +``` +kubectl scale deployment php-apache --replicas 100 +``` +3. Decrease the number of replicas of the application again to trigger a scale-down event. + + !!! note + In case of issues, check the logs of the Autoscaler pods. + +4. After a while, CAPX will add more machines. Refer to the [Autoscaler configuration parameters](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank} to tweak the behaviour and timeouts. + +## Autoscaler node group annotations +Autoscaler uses the following annotations to define the upper and lower boundaries of the managed machines: + +| Annotation | Example Value | Description | +|-------------------------------------------------------------|---------------|-----------------------------------------------| +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size | 5 | Maximum number of machines in this node group | +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size | 1 | Minimum number of machines in this node group | + +These annotations must be applied to the `MachineDeployment` resources of a CAPX cluster.
+ +### Example +```YAML +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + annotations: + cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5" + cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1" +``` \ No newline at end of file diff --git a/docs/capx/v1.4.x/experimental/capx_multi_pe.md b/docs/capx/v1.4.x/experimental/capx_multi_pe.md new file mode 100644 index 00000000..bd52ccd7 --- /dev/null +++ b/docs/capx/v1.4.x/experimental/capx_multi_pe.md @@ -0,0 +1,30 @@ +# Creating a workload CAPX cluster spanning Prism Element clusters + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +This page will explain how to deploy CAPX-based Kubernetes clusters where worker nodes span multiple Prism Element (PE) clusters. + +!!! note + All the PE clusters must be managed by the same Prism Central (PC) instance. + +The topology will look like this: + +- One PC managing multiple PEs +- One CAPI management cluster +- One CAPI workload cluster with multiple `MachineDeployment` resources + +Refer to the [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to get started with CAPX. + +To create workload clusters spanning multiple Prism Element clusters, a `MachineDeployment` and `NutanixMachineTemplate` resource must be created for each Prism Element cluster. The Prism Element specific parameters (name/UUID, subnet, ...) are referenced in the `NutanixMachineTemplate`. + +## Steps +1. Create a management cluster that has the CAPX infrastructure provider deployed. +2. Create a `cluster.yml` file containing the workload cluster definition. Refer to the steps defined in the [CAPI quickstart guide](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to create an example `cluster.yml` file. +3. Add additional `MachineDeployment` and `NutanixMachineTemplate` resources.
+ + By default there is only one machine template and machine deployment defined. To add nodes residing on another Prism Element cluster, a new `MachineDeployment` and `NutanixMachineTemplate` resource needs to be added to the yaml file. The autogenerated `MachineDeployment` and `NutanixMachineTemplate` resource definitions can be used as a baseline. + + Make sure to modify the `MachineDeployment` and `NutanixMachineTemplate` parameters. + +4. Apply the modified `cluster.yml` file to the management cluster. diff --git a/docs/capx/v1.4.x/experimental/oidc.md b/docs/capx/v1.4.x/experimental/oidc.md new file mode 100644 index 00000000..0c274121 --- /dev/null +++ b/docs/capx/v1.4.x/experimental/oidc.md @@ -0,0 +1,31 @@ +# OIDC integration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +Kubernetes allows users to authenticate using various authentication mechanisms. One of these mechanisms is OIDC. Information on how Kubernetes interacts with OIDC providers can be found in the [OpenID Connect Tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens){target=_blank} section of the official Kubernetes documentation. + + +Follow the steps below to configure a CAPX cluster to use an OIDC identity provider. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +3. Modify/add the `spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs` attribute and add the required [API server parameters](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuring-the-api-server){target=_blank}. 
See the [example](#example) below. +4. Apply the `cluster.yaml` file +5. Log in with the OIDC provider once the cluster is provisioned + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + ... + oidc-client-id: + oidc-issuer-url: + ... +``` + diff --git a/docs/capx/v1.4.x/experimental/proxy.md b/docs/capx/v1.4.x/experimental/proxy.md new file mode 100644 index 00000000..c8f940d4 --- /dev/null +++ b/docs/capx/v1.4.x/experimental/proxy.md @@ -0,0 +1,62 @@ +# Proxy configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a proxy to connect to external networks. This proxy configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a proxy. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the proxy configuration. + 1. `KubeadmControlPlane`: + * Add the proxy configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the proxy configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +4. 
Apply the `cluster.yaml` file + +## Example + +```YAML +--- +# controlplane proxy settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +--- +# worker proxy settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +``` + diff --git a/docs/capx/v1.4.x/experimental/registry_mirror.md b/docs/capx/v1.4.x/experimental/registry_mirror.md new file mode 100644 index 00000000..307a9425 --- /dev/null +++ b/docs/capx/v1.4.x/experimental/registry_mirror.md @@ -0,0 +1,96 @@ +# Registry Mirror configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a private registry to act as a mirror of an external public registry. This registry mirror configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a registry mirror. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the registry mirror configuration. + 1.
`KubeadmControlPlane`: + * Add the registry mirror configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Update `/etc/containerd/config.toml` commands to apply the registry mirror config in `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the registry mirror configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Update `/etc/containerd/config.toml` commands to apply the registry mirror config in `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +4. Apply the `cluster.yaml` file + +## Example + +This example will configure a registry mirror for the following namespace: + +* registry.k8s.io +* ghcr.io +* quay.io + +and redirect them to corresponding projects of the `` registry. + +```YAML +--- +# controlplane proxy settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... 
+--- +# worker proxy settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... +``` + diff --git a/docs/capx/v1.4.x/experimental/vpc.md b/docs/capx/v1.4.x/experimental/vpc.md new file mode 100644 index 00000000..3513e47e --- /dev/null +++ b/docs/capx/v1.4.x/experimental/vpc.md @@ -0,0 +1,40 @@ +# Creating a workload CAPX cluster in a Nutanix Flow VPC + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +!!! note + Nutanix Flow VPCs are only validated with CAPX 1.1.3+ + +[Nutanix Flow Virtual Networking](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9:Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9){target=_blank} allows users to create Virtual Private Clouds (VPCs) with Overlay networking. +The steps below will illustrate how a CAPX cluster can be deployed inside an overlay subnet (NAT) inside a VPC while the management cluster resides outside of the VPC. + + +## Steps +1. 
[Request a floating IP](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Networking-Guide:ear-flow-nw-request-floating-ip-pc-t.html){target=_blank} +2. Link the floating IP to an internal IP address inside the overlay subnet that will be used to deploy the CAPX cluster. This address will be assigned to the CAPX load balancer. To prevent IP conflicts, make sure the IP address is not part of the IP pool defined in the subnet. +3. Generate a `cluster.yaml` file with the required CAPX cluster configuration where the `CONTROL_PLANE_ENDPOINT_IP` is set to the floating IP requested in the first step. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +4. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +5. Modify the `spec.kubeadmConfigSpec.files.*.content` attribute and change the `kube-vip` definition so that it is similar to the [example](#example) below. +6. Apply the `cluster.yaml` file. +7. When the CAPX workload cluster is deployed, it will be reachable via the floating IP. + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + apiVersion: v1 + kind: Pod + metadata: + name: kube-vip + namespace: kube-system + spec: + containers: + - env: + - name: address + value: "" +``` + diff --git a/docs/capx/v1.4.x/getting_started.md b/docs/capx/v1.4.x/getting_started.md new file mode 100644 index 00000000..c1643abd --- /dev/null +++ b/docs/capx/v1.4.x/getting_started.md @@ -0,0 +1,159 @@ +# Getting Started + +This is a guide on getting started with Cluster API Provider Nutanix Cloud Infrastructure (CAPX). To learn about Cluster API in more depth, check out the [Cluster API book](https://cluster-api.sigs.k8s.io/){target=_blank}.
+ +For more information on how to install the Nutanix CSI Driver on a CAPX cluster, visit [Nutanix CSI Driver installation with CAPX](./addons/install_csi_driver.md). + +For more information on how CAPX handles credentials, visit [Credential Management](./credential_management.md). + +For more information on the port requirements for CAPX, visit [Port Requirements](./port_requirements.md). + +!!! note + [Nutanix Cloud Controller Manager (CCM)](../../ccm/latest/overview.md) is a mandatory component starting from CAPX v1.3.0. Ensure all CAPX-managed Kubernetes clusters are configured to use Nutanix CCM before upgrading to v1.3.0 or later. See [CAPX v1.4.x Upgrade Procedure](./tasks/capx_v14x_upgrade_procedure.md). + +## Production Workflow + +### Build OS image for NutanixMachineTemplate resource +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) uses the [Image Builder](https://image-builder.sigs.k8s.io/){target=_blank} project to build OS images used for the Nutanix machines. + +Follow the steps detailed in [Building CAPI Images for Nutanix Cloud Platform (NCP)](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#building-capi-images-for-nutanix-cloud-platform-ncp){target=_blank} to use Image Builder on the Nutanix Cloud Platform. + +For a list of operating systems, visit the OS image [Configuration](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#configuration){target=_blank} page.
+ +### Prerequisites for using Cluster API Provider Nutanix Cloud Infrastructure +The [Cluster API installation](https://cluster-api.sigs.k8s.io/user/quick-start.html#installation){target=_blank} section provides an overview of all required prerequisites: + +- [Common Prerequisites](https://cluster-api.sigs.k8s.io/user/quick-start.html#common-prerequisites){target=_blank} +- [Install and/or configure a Kubernetes cluster](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-andor-configure-a-kubernetes-cluster){target=_blank} +- [Install clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl){target=_blank} +- (Optional) [Enabling Feature Gates](https://cluster-api.sigs.k8s.io/user/quick-start.html#enabling-feature-gates){target=_blank} + +Make sure these prerequisites have been met before moving to the [Configure and Install Cluster API Provider Nutanix Cloud Infrastructure](#configure-and-install-cluster-api-provider-nutanix-cloud-infrastructure) step. + + +### Configure and Install Cluster API Provider Nutanix Cloud Infrastructure +To initialize Cluster API Provider Nutanix Cloud Infrastructure, `clusterctl` requires the following variables, which should be set in either `~/.cluster-api/clusterctl.yaml` or as environment variables. 
+``` +NUTANIX_ENDPOINT: "" # IP or FQDN of Prism Central +NUTANIX_USER: "" # Prism Central user +NUTANIX_PASSWORD: "" # Prism Central password +NUTANIX_INSECURE: false # or true + +KUBERNETES_VERSION: "v1.22.9" +WORKER_MACHINE_COUNT: 3 +NUTANIX_SSH_AUTHORIZED_KEY: "" + +NUTANIX_PRISM_ELEMENT_CLUSTER_NAME: "" +NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME: "" +NUTANIX_SUBNET_NAME: "" + +EXP_CLUSTER_RESOURCE_SET: true # Required for Nutanix CCM installation +``` + +You can also see the required list of variables by running the following: +``` +clusterctl generate cluster mycluster -i nutanix --list-variables +Required Variables: + - CONTROL_PLANE_ENDPOINT_IP + - KUBERNETES_VERSION + - NUTANIX_ENDPOINT + - NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME + - NUTANIX_PASSWORD + - NUTANIX_PRISM_ELEMENT_CLUSTER_NAME + - NUTANIX_SSH_AUTHORIZED_KEY + - NUTANIX_SUBNET_NAME + - NUTANIX_USER + +Optional Variables: + - CONTROL_PLANE_ENDPOINT_PORT (defaults to "6443") + - CONTROL_PLANE_MACHINE_COUNT (defaults to 1) + - KUBEVIP_LB_ENABLE (defaults to "false") + - KUBEVIP_SVC_ENABLE (defaults to "false") + - NAMESPACE (defaults to current Namespace in the KubeConfig file) + - NUTANIX_INSECURE (defaults to "false") + - NUTANIX_MACHINE_BOOT_TYPE (defaults to "legacy") + - NUTANIX_MACHINE_MEMORY_SIZE (defaults to "4Gi") + - NUTANIX_MACHINE_VCPU_PER_SOCKET (defaults to "1") + - NUTANIX_MACHINE_VCPU_SOCKET (defaults to "2") + - NUTANIX_PORT (defaults to "9440") + - NUTANIX_SYSTEMDISK_SIZE (defaults to "40Gi") + - WORKER_MACHINE_COUNT (defaults to 0) +``` + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `CONTROL_PLANE_ENDPOINT_IP` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. + +!!! 
warning + Make sure [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled before running `clusterctl init`. + +Now you can instantiate Cluster API with the following: +``` +clusterctl init -i nutanix +``` + +### Deploy a workload cluster on Nutanix Cloud Infrastructure +``` +export TEST_CLUSTER_NAME=mytestcluster1 +export TEST_NAMESPACE=mytestnamespace +CONTROL_PLANE_ENDPOINT_IP=x.x.x.x clusterctl generate cluster ${TEST_CLUSTER_NAME} \ + -i nutanix \ + --target-namespace ${TEST_NAMESPACE} \ + --kubernetes-version v1.22.9 \ + --control-plane-machine-count 1 \ + --worker-machine-count 3 > ./cluster.yaml +kubectl create ns ${TEST_NAMESPACE} +kubectl apply -f ./cluster.yaml -n ${TEST_NAMESPACE} +``` +To customize the configuration of the default `cluster.yaml` file generated by CAPX, visit the [NutanixCluster](./types/nutanix_cluster.md) and [NutanixMachineTemplate](./types/nutanix_machine_template.md) documentation. + +### Access a workload cluster +To access resources on the cluster, you can get the kubeconfig with the following: +``` +clusterctl get kubeconfig ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} > ${TEST_CLUSTER_NAME}.kubeconfig +kubectl --kubeconfig ./${TEST_CLUSTER_NAME}.kubeconfig get nodes +``` + +### Install CNI on a workload cluster + +You must deploy a Container Network Interface (CNI) based pod network add-on so that your pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed. + +!!! note + Make sure that your pod network does not overlap with any of the host networks. You are likely to see problems if there is any overlap. If you find a collision between your network plugin's preferred pod network and some of your host networks, you must choose a suitable alternative CIDR block to use instead. It can be configured inside the `cluster.yaml` generated by `clusterctl generate cluster` before applying it.
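
Before applying the generated manifest, it can help to confirm which pod CIDR it declares. The following is a minimal sketch, not part of CAPX or `clusterctl`: the `pod_cidrs` helper is hypothetical, and it assumes the pod network is declared as a block-style YAML list under `clusterNetwork.pods.cidrBlocks` of the `Cluster` resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration only): print the pod CIDR blocks
# declared under `clusterNetwork.pods.cidrBlocks` in a manifest file.
# Assumes a block-style YAML list ("- <cidr>" entries, one per line).
pod_cidrs() {
  awk '
    # Entering the "pods:" mapping
    /^[ \t]*pods:[ \t]*$/ { in_pods = 1; in_cidrs = 0; next }
    # The "cidrBlocks:" key directly below "pods:"
    in_pods && !in_cidrs && /^[ \t]*cidrBlocks:[ \t]*$/ { in_cidrs = 1; next }
    # List entries of the pod cidrBlocks
    in_pods && in_cidrs && /^[ \t]*-[ \t]/ {
      sub(/^[ \t]*-[ \t]*/, "")   # drop the list marker
      gsub(/"/, "")               # drop optional double quotes
      print
      next
    }
    # Any other line ends the pods block
    { in_pods = 0; in_cidrs = 0 }
  ' "$1"
}
```

Running `pod_cidrs ./cluster.yaml` prints one CIDR per line, which can then be compared by hand against the host subnets. If the manifest uses an inline list instead of a block-style list, the helper prints nothing and the value has to be checked manually.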
+ +Several external projects provide Kubernetes pod networks using CNI, some of which also support [Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/){target=_blank}. + +See a list of add-ons that implement the [Kubernetes networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-network-model){target=_blank}. At time of writing, the most common are [Calico](https://www.tigera.io/project-calico/){target=_blank} and [Cilium](https://cilium.io){target=_blank}. + +Follow the specific install guide for your selected CNI and install only one pod network per cluster. + +Once a pod network has been installed, you can confirm that it is working by checking that the CoreDNS pod is running in the output of `kubectl get pods --all-namespaces`. + + +### Kube-vip settings + +Kube-vip is a true load balancing solution for the Kubernetes control plane. It distributes API requests across control plane nodes. It also has the capability to provide load balancing for Kubernetes services. + +You can tweak kube-vip settings by using the following properties: + +- `KUBEVIP_LB_ENABLE` + +This setting allows control plane load balancing using IPVS. See +[Control Plane Load-Balancing documentation](https://kube-vip.io/docs/about/architecture/#control-plane-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ENABLE` + +This setting enables a service of type LoadBalancer. See +[Kubernetes Service Load Balancing documentation](https://kube-vip.io/docs/about/architecture/#kubernetes-service-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ELECTION` + +This setting enables Load Balancing of Load Balancers. See [Load Balancing Load Balancers](https://kube-vip.io/docs/usage/kubernetes-services/#load-balancing-load-balancers-when-using-arp-mode-yes-you-read-that-correctly-kube-vip-v050){target=_blank} for further information. 
+ +### Delete a workload cluster +To remove a workload cluster from your management cluster, remove the cluster object and the provider will clean up all resources. + +``` +kubectl delete cluster ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} +``` +!!! note + Deleting the entire cluster template with `kubectl delete -f ./cluster.yaml` may lead to pending resources requiring manual cleanup. diff --git a/docs/capx/v1.4.x/pc_certificates.md b/docs/capx/v1.4.x/pc_certificates.md new file mode 100644 index 00000000..f3fe1699 --- /dev/null +++ b/docs/capx/v1.4.x/pc_certificates.md @@ -0,0 +1,149 @@ +# Certificate Trust + +CAPX invokes Prism Central APIs using the HTTPS protocol. CAPX has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +!!! note + For more information about replacing Prism Central certificates, see the [Nutanix AOS Security Guide](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Security-Guide-v6_5:mul-security-ssl-certificate-pc-t.html){target=_blank}. + +## Enable certificate verification (default) +By default, CAPX performs certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a certificate issued by a publicly trusted certificate authority. +No additional configuration is required in CAPX. + +## Configure an additional trust bundle +CAPX allows users to configure an additional trust bundle. This allows CAPX to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable needs to be set. The value of the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable contains the trust bundle (PEM format) in base64 encoded format.
See the [Configuring the trust bundle environment variable](#configuring-the-trust-bundle-environment-variable) section for more information. + +It is also possible to configure the additional trust bundle manually by creating a custom `cluster-template`. See the [Configuring the additional trust bundle manually](#configuring-the-additional-trust-bundle-manually) section for more information + +The `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable can be set when initializing the CAPX provider or when creating a workload cluster. If the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` is configured when the CAPX provider is initialized, the additional trust bundle will be used for every CAPX workload cluster. If it is only configured when creating a workload cluster, it will only be applicable for that specific workload cluster. + + +### Configuring the trust bundle environment variable + +Create a PEM encoded file containing the root certificate and all intermediate certificates. Example: +``` +$ cat cert.crt +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +``` + +Use a `base64` tool to encode these contents in base64. The command below will provide a `base64` string. +``` +$ cat cert.crt | base64 + +``` +!!! note + Make sure the `base64` string does not contain any newlines (`\n`). If the output string contains newlines, remove them manually or check the manual of the `base64` tool on how to generate a `base64` string without newlines. + +Use the `base64` string as value for the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable. +``` +$ export NUTANIX_ADDITIONAL_TRUST_BUNDLE="" +``` + +### Configuring the additional trust bundle manually + +To configure the additional trust bundle manually without using the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable present in the default `cluster-template` files, it is required to: + +- Create a `ConfigMap` containing the additional trust bundle. 
+- Configure the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec. + +#### Creating the additional trust bundle ConfigMap + +CAPX supports two different formats for the ConfigMap containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the NutanixCluster spec + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `NutanixCluster` spec. Add the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec as shown below. Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + ... + prismCentral: + ... + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + insecure: false +``` + +!!! note + the default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. 
+ + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `NutanixCluster` spec. Certificate verification will be disabled even if an additional trust bundle is configured. + +Disabled certificate verification example: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + ... + insecure: true + ... +``` \ No newline at end of file diff --git a/docs/capx/v1.4.x/port_requirements.md b/docs/capx/v1.4.x/port_requirements.md new file mode 100644 index 00000000..af182abb --- /dev/null +++ b/docs/capx/v1.4.x/port_requirements.md @@ -0,0 +1,19 @@ +# Port Requirements + +CAPX uses the ports documented below to create workload clusters. + +!!! note + This page only documents the ports specifically required by CAPX and does not provide the full overview of all ports required in the CAPI framework. 
+ +## Management cluster + +| Source | Destination | Protocol | Port | Description | +|--------------------|---------------------|----------|------|--------------------------------------------------------------------------------------------------| +| Management cluster | External Registries | TCP | 443 | Pull container images from [CAPX public registries](#public-registries-utilized-when-using-capx) | +| Management cluster | Prism Central | TCP | 9440 | Management cluster communication to Prism Central | + +## Public registries utilized when using CAPX + +| Registry name | +|---------------| +| ghcr.io | diff --git a/docs/capx/v1.4.x/tasks/capx_v14x_upgrade_procedure.md b/docs/capx/v1.4.x/tasks/capx_v14x_upgrade_procedure.md new file mode 100644 index 00000000..14602f73 --- /dev/null +++ b/docs/capx/v1.4.x/tasks/capx_v14x_upgrade_procedure.md @@ -0,0 +1,83 @@ +# CAPX v1.4.x Upgrade Procedure + +Starting from CAPX v1.3.0, all CAPX-managed Kubernetes clusters must use the Nutanix Cloud Controller Manager (CCM). + +Before upgrading CAPX instances to v1.3.0 or later, follow the [steps](#steps) detailed below for each of the CAPX-managed Kubernetes clusters that don't use Nutanix CCM. + + +## Steps + +This procedure uses [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} to install Nutanix CCM, but it can also be installed using the [Nutanix CCM Helm chart](https://artifacthub.io/packages/helm/nutanix/nutanix-cloud-provider){target=_blank}. + +!!! warning + Make sure [CRS](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled on the management cluster before following the procedure. + +Perform the following steps for each of the CAPX-managed Kubernetes clusters that are not configured to use Nutanix CCM: + +1. 
Add the `cloud-provider: external` configuration in the `KubeadmConfigTemplate` resources: + ```YAML + apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 + kind: KubeadmConfigTemplate + spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + ``` +2. Add the `cloud-provider: external` configuration in the `KubeadmControlPlane` resource: +```YAML +--- +apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 +kind: KubeadmConfigTemplate +spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + cloud-provider: external + controllerManager: + extraArgs: + cloud-provider: external + initConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +``` +3. Add the Nutanix CCM CRS resources: + + - [nutanix-ccm-crs.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.4.0/templates/ccm/nutanix-ccm-crs.yaml){target=_blank} + - [nutanix-ccm-secret.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.4.0/templates/ccm/nutanix-ccm-secret.yaml) + - [nutanix-ccm.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.4.0/templates/ccm/nutanix-ccm.yaml) + + Make sure to update each of the variables before applying the `YAML` files. + +4. Add the `ccm: nutanix` label to the `Cluster` resource: + ```YAML + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + labels: + ccm: nutanix + ``` +5. Verify if the Nutanix CCM pod is up and running: +``` +kubectl get pod -A -l k8s-app=nutanix-cloud-controller-manager +``` +6. 
Trigger a new rollout of the Kubernetes nodes by performing a Kubernetes upgrade or by using `clusterctl alpha rollout restart`. See the [clusterctl alpha rollout](https://cluster-api.sigs.k8s.io/clusterctl/commands/alpha-rollout#restart){target=_blank} documentation for more information. +7. Upgrade CAPX to v1.4.0 by following the [clusterctl upgrade](https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html?highlight=clusterctl%20upgrade%20pla#clusterctl-upgrade){target=_blank} documentation. \ No newline at end of file diff --git a/docs/capx/v1.4.x/tasks/modify_machine_configuration.md b/docs/capx/v1.4.x/tasks/modify_machine_configuration.md new file mode 100644 index 00000000..04a43a95 --- /dev/null +++ b/docs/capx/v1.4.x/tasks/modify_machine_configuration.md @@ -0,0 +1,11 @@ +# Modifying Machine Configurations + +Since all attributes of the `NutanixMachineTemplate` resources are immutable, follow the [Updating Infrastructure Machine Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html?highlight=machine%20template#updating-infrastructure-machine-templates){target=_blank} procedure to modify the configuration of machines in an existing CAPX cluster. +See the [NutanixMachineTemplate](../types/nutanix_machine_template.md) documentation for all supported configuration parameters. + +!!! note + Manually modifying existing and linked `NutanixMachineTemplate` resources will not trigger a rolling update of the machines. + +!!! note + Do not modify the virtual machine configuration of CAPX cluster nodes manually in Prism/Prism Central. + CAPX will not automatically revert the configuration change, but scale-up, scale-down, and upgrade operations will override manual modifications. Only use the `Updating Infrastructure Machine Templates` procedure referenced above to perform configuration changes.
 \ No newline at end of file diff --git a/docs/capx/v1.4.x/troubleshooting.md b/docs/capx/v1.4.x/troubleshooting.md new file mode 100644 index 00000000..c023d13e --- /dev/null +++ b/docs/capx/v1.4.x/troubleshooting.md @@ -0,0 +1,13 @@ +# Troubleshooting + +## Clusterctl failed with GitHub rate limit error + +By design, `clusterctl` fetches artifacts from repositories hosted on GitHub. This operation is subject to [GitHub API rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting){target=_blank}. + +While this is generally okay for the majority of users, there is still a chance that some users (especially developers or CI tools) hit this limit: + +``` +Error: failed to get repository client for the XXX with name YYY: error creating the GitHub repository client: failed to get GitHub latest version: failed to get the list of versions: rate limit for github api has been reached. Please wait one hour or get a personal API tokens a assign it to the GITHUB_TOKEN environment variable +``` + +As explained in the error message, you can increase your API rate limit by [creating a GitHub personal token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token){target=_blank} and setting a `GITHUB_TOKEN` environment variable using the token. diff --git a/docs/capx/v1.4.x/types/nutanix_cluster.md b/docs/capx/v1.4.x/types/nutanix_cluster.md new file mode 100644 index 00000000..09325cab --- /dev/null +++ b/docs/capx/v1.4.x/types/nutanix_cluster.md @@ -0,0 +1,64 @@ +# NutanixCluster + +The `NutanixCluster` resource defines the configuration of a CAPX Kubernetes cluster.
+ +Example of a `NutanixCluster` resource: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + address: ${NUTANIX_ENDPOINT} + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + credentialRef: + kind: Secret + name: ${CLUSTER_NAME} + insecure: ${NUTANIX_INSECURE=false} + port: ${NUTANIX_PORT=9440} +``` + +## NutanixCluster spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixCluster` resource. + +### Configuration parameters + +| Key |Type |Description | +|--------------------------------------------|------|----------------------------------------------------------------------------------| +|controlPlaneEndpoint |object|Defines the host IP and port of the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.host |string|Host IP to be assigned to the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.port |int |Port of the CAPX Kubernetes cluster. Default: `6443` | +|prismCentral |object|(Optional) Prism Central endpoint definition. | +|prismCentral.address |string|IP/FQDN of Prism Central. | +|prismCentral.port |int |Port of Prism Central. Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking. Default: `false` | +|prismCentral.credentialRef |object|Reference to credentials used for Prism Central connection. | +|prismCentral.credentialRef.kind |string|Kind of the credentialRef. Allowed value: `Secret` | +|prismCentral.credentialRef.name |string|Name of the secret containing the Prism Central credentials. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret containing the Prism Central credentials. 
 | +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace|string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle.| +|failureDomains |list |(Optional) Failure domains for the Kubernetes nodes. | +|failureDomains.[].name |string|Name of the failure domain. | +|failureDomains.[].cluster |object|Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed. | +|failureDomains.[].cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|failureDomains.[].cluster.name |string|Name of the Prism Element cluster. | +|failureDomains.[].cluster.uuid |string|UUID of the Prism Element cluster. | +|failureDomains.[].subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|failureDomains.[].subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|failureDomains.[].subnets.[].name |string|Name of the subnet. | +|failureDomains.[].subnets.[].uuid |string|UUID of the subnet. | +|failureDomains.[].controlPlane |bool |Indicates if a failure domain is suited for control plane nodes. | + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `controlPlaneEndpoint.host` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster.
\ No newline at end of file diff --git a/docs/capx/v1.4.x/types/nutanix_machine_template.md b/docs/capx/v1.4.x/types/nutanix_machine_template.md new file mode 100644 index 00000000..516d1eea --- /dev/null +++ b/docs/capx/v1.4.x/types/nutanix_machine_template.md @@ -0,0 +1,84 @@ +# NutanixMachineTemplate +The `NutanixMachineTemplate` resource defines the configuration of a CAPX Kubernetes VM. + +Example of a `NutanixMachineTemplate` resource. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixMachineTemplate +metadata: + name: "${CLUSTER_NAME}-mt-0" + namespace: "${NAMESPACE}" +spec: + template: + spec: + providerID: "nutanix://${CLUSTER_NAME}-m1" + # Supported options for boot type: legacy and uefi + # Defaults to legacy if not set + bootType: ${NUTANIX_MACHINE_BOOT_TYPE=legacy} + vcpusPerSocket: ${NUTANIX_MACHINE_VCPU_PER_SOCKET=1} + vcpuSockets: ${NUTANIX_MACHINE_VCPU_SOCKET=2} + memorySize: "${NUTANIX_MACHINE_MEMORY_SIZE=4Gi}" + systemDiskSize: "${NUTANIX_SYSTEMDISK_SIZE=40Gi}" + image: + type: name + name: "${NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME}" + cluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnet: + - type: name + name: "${NUTANIX_SUBNET_NAME}" + # Adds additional categories to the virtual machines. + # Note: Categories must already be present in Prism Central + # additionalCategories: + # - key: AppType + # value: Kubernetes + # Adds the cluster virtual machines to a project defined in Prism Central. + # Replace NUTANIX_PROJECT_NAME with the correct project defined in Prism Central + # Note: Project must already be present in Prism Central. + # project: + # type: name + # name: "NUTANIX_PROJECT_NAME" + # gpus: + # - type: name + # name: "GPU NAME" +``` + +## NutanixMachineTemplate spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixMachineTemplate` resource. 
+ +### Configuration parameters +| Key |Type |Description| +|------------------------------------|------|--------------------------------------------------------------------------------------------------------| +|bootType |string|Boot type of the VM. Depends on the OS image used. Allowed values: `legacy`, `uefi`. Default: `legacy` | +|vcpusPerSocket |int |Amount of vCPUs per socket. Default: `1` | +|vcpuSockets |int |Amount of vCPU sockets. Default: `2` | +|memorySize |string|Amount of Memory. Default: `4Gi` | +|systemDiskSize |string|Amount of storage assigned to the system disk. Default: `40Gi` | +|image |object|Reference (name or uuid) to the OS image used for the system disk. | +|image.type |string|Type to identify the OS image. Allowed values: `name` and `uuid` | +|image.name |string|Name of the image. | +|image.uuid |string|UUID of the image. | +|cluster |object|(Optional) Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|cluster.name |string|Name of the Prism Element cluster. | +|cluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | +|additionalCategories |list |Reference to the categories to be assigned to the VMs. These categories already exist in Prism Central. | +|additionalCategories.[].key |string|Key of the category. | +|additionalCategories.[].value |string|Value of the category. | +|project |object|Reference (name or uuid) to the project. This project must already exist in Prism Central. | +|project.type |string|Type to identify the project. Allowed values: `name` and `uuid` | +|project.name |string|Name of the project. 
 | +|project.uuid |string|UUID of the project. | +|gpus |list |Reference (name or deviceID) to the GPUs to be assigned to the VMs. Can be vGPU or Passthrough. | +|gpus.[].type |string|Type to identify the GPU. Allowed values: `name` and `deviceID` | +|gpus.[].name |string|Name of the GPU or the vGPU profile | +|gpus.[].deviceID |string|DeviceID of the GPU or the vGPU profile | + +!!! note + The `cluster` and `subnets` configuration parameters are optional when failure domains are defined on the `NutanixCluster` and `MachineDeployment` resources. \ No newline at end of file diff --git a/docs/capx/v1.4.x/user_requirements.md b/docs/capx/v1.4.x/user_requirements.md new file mode 100644 index 00000000..05e971a5 --- /dev/null +++ b/docs/capx/v1.4.x/user_requirements.md @@ -0,0 +1,37 @@ +# User Requirements + +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs using a Prism Central user account. + +CAPX supports two types of PC users: + +- Local users: must be assigned the `Prism Central Admin` role. +- Domain users: must be assigned a role that at least has the [Minimum required CAPX permissions for domain users](#minimum-required-capx-permissions-for-domain-users) assigned. + +See [Credential Management](./credential_management.md){target=_blank} for more information on how to pass the user credentials to CAPX. + +## Minimum required CAPX permissions for domain users + +The following permissions are required for Prism Central domain users: + +- Create Category Mapping +- Create Image +- Create Or Update Name Category +- Create Or Update Value Category +- Create Virtual Machine +- Delete Category Mapping +- Delete Image +- Delete Name Category +- Delete Value Category +- Delete Virtual Machine +- Detach Volume Group From AHV VM +- View Category Mapping +- View Cluster +- View Image +- View Name Category +- View Project +- View Subnet +- View Value Category +- View Virtual Machine + +!!! 
note + The list of permissions has been validated on PC 2022.6 and above. diff --git a/docs/capx/v1.4.x/validated_integrations.md b/docs/capx/v1.4.x/validated_integrations.md new file mode 100644 index 00000000..5d61d932 --- /dev/null +++ b/docs/capx/v1.4.x/validated_integrations.md @@ -0,0 +1,62 @@ +# Validated Integrations + +Validated integrations are a defined set of specifically tested configurations between technologies that represent the most common combinations that Nutanix customers are using or deploying with CAPX. For these integrations, Nutanix has directly, or through certified partners, exercised a full range of platform tests as part of the product release process. + +## Integration Validation Policy + +Nutanix follows the version validation policies below: + +- Validate at least one active AOS LTS (long term support) version. Validated AOS LTS version for a specific CAPX version is listed in the [AOS](#aos) section.
+ + !!! note + + Typically the latest LTS release at time of CAPX release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +- Validate the latest AOS STS (short term support) release at time of CAPX release. +- Validate at least one active Prism Central (PC) version. Validated PC version for a specific CAPX version is listed in the [Prism Central](#prism-central) section.
+ + !!! note + + Typically the latest PC release at time of CAPX release except when latest is initial release in train (e.g. x.y.0). Exact version depends on timing and customer adoption. + +- Validate at least one active Cluster-API (CAPI) version. Validated CAPI version for a specific CAPX version is listed in the [Cluster-API](#cluster-api) section.
+ + !!! note + + Typically the latest Cluster-API release at time of CAPX release except when latest is initial release in train (e.g. x.y.0). Exact version depends on timing and customer adoption. + +## Validated versions +### Cluster-API +| CAPX | CAPI v1.2.x | CAPI v1.3.x | CAPI v1.4.x | CAPI v1.5.x | CAPI v1.6.x | CAPI v1.7.x | +|--------|-------------|-------------|-------------|-------------|-------------|-------------| +| v1.4.x | No | Yes | Yes | Yes | Yes | Yes | +| v1.3.x | No | Yes | Yes | Yes | Yes | No | +| v1.2.x | No | Yes | Yes | Yes | No | No | +| v1.1.x | Yes | Yes | No | No | No | No | +| v1.0.x | Yes | No | No | No | No | No | +| v0.5.x | Yes | No | No | No | No | No | + +See the [Validated Kubernetes Versions](https://cluster-api.sigs.k8s.io/reference/versions.html?highlight=version#supported-kubernetes-versions){target=_blank} page for more information on CAPI validated versions. + +### AOS + +| CAPX | 5.20.4.5 (LTS) | 6.1.1.5 (STS) | 6.5.x (LTS) | 6.6 (STS) | 6.7 (STS) | 6.8 (STS) | +|--------|----------------|---------------|-------------|-----------|-----------|-----------| +| v1.4.x | No | No | Yes | No | No | Yes | +| v1.3.x | No | No | Yes | Yes | Yes | No | +| v1.2.x | No | No | Yes | Yes | Yes | No | +| v1.1.x | No | No | Yes | No | No | No | +| v1.0.x | Yes | Yes | No | No | No | No | +| v0.5.x | Yes | Yes | No | No | No | No | + + +### Prism Central + +| CAPX | 2022.1.0.2 | pc.2022.6 | pc.2022.9 | pc.2023.x | pc.2024.x | +|--------|------------|-----------|-----------|-----------|-----------| +| v1.4.x | No | Yes | No | Yes | Yes | +| v1.3.x | No | Yes | No | Yes | No | +| v1.2.x | No | Yes | Yes | Yes | No | +| v1.1.x | No | Yes | No | No | No | +| v1.0.x | Yes | Yes | No | No | No | +| v0.5.x | Yes | Yes | No | No | No | diff --git a/docs/capx/v1.5.x/addons/install_csi_driver.md b/docs/capx/v1.5.x/addons/install_csi_driver.md new file mode 100644 index 00000000..afb4bdc8 --- /dev/null +++ 
b/docs/capx/v1.5.x/addons/install_csi_driver.md @@ -0,0 +1,215 @@ +# Nutanix CSI Driver installation with CAPX + +The Nutanix CSI driver is fully supported on CAPI/CAPX deployed clusters where all the nodes meet the [Nutanix CSI driver prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). + +There are three methods to install the Nutanix CSI driver on a CAPI/CAPX cluster: + +- Helm +- ClusterResourceSet +- CAPX Flavor + +For more information, check the next sections. + +## CAPI Workload cluster prerequisites for the Nutanix CSI Driver + +Kubernetes workers need the following prerequisites to use the Nutanix CSI Drivers: + +- iSCSI initiator package (for Volumes based block storage) +- NFS client package (for Files based storage) + +These packages may already be present in the image you use with your infrastructure provider or you can also rely on your bootstrap provider to install them. More info is available in the [Prerequisites docs](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-plugin-prerequisites-r.html){target=_blank}. + +The package names and installation method will also vary depending on the operating system you plan to use. + +In the example below, `kubeadm` bootstrap provider is used to deploy these packages on top of an Ubuntu 20.04 image. The `kubeadm` bootstrap provider allows defining `preKubeadmCommands` that will be launched before Kubernetes cluster creation. These `preKubeadmCommands` can be defined both in `KubeadmControlPlane` for master nodes and in `KubeadmConfigTemplate` for worker nodes. + +In the example with an Ubuntu 20.04 image, both `KubeadmControlPlane` and `KubeadmConfigTemplate` must be modified as in the example below: + +```yaml +spec: + template: + spec: + # ....... 
+ preKubeadmCommands: + - echo "before kubeadm call" > /var/log/prekubeadm.log + - apt update + - apt install -y nfs-common open-iscsi + - systemctl enable --now iscsid +``` +## Install the Nutanix CSI Driver with Helm + +A recent [Helm](https://helm.sh){target=_blank} version is needed (tested with Helm v3.10.1). + +The example below must be applied on a ready workload cluster. The workload cluster's kubeconfig can be retrieved and used to connect with the following command: + +```shell +clusterctl get kubeconfig $CLUSTER_NAME -n $CLUSTER_NAMESPACE > $CLUSTER_NAME-KUBECONFIG +export KUBECONFIG=$(pwd)/$CLUSTER_NAME-KUBECONFIG +``` + +Once connected to the cluster, follow the [CSI documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-driver-install-t.html){target=_blank}. + +First, install the [nutanix-csi-snapshot](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-snapshot){target=_blank} chart followed by the [nutanix-csi-storage](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-storage){target=_blank} chart. + +See an example below: + +```shell +#Add the official Nutanix Helm repo and get the latest update +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +# Install the nutanix-csi-snapshot chart +helm install nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system --create-namespace + +# Install the nutanix-csi-storage chart +helm install nutanix-storage nutanix/nutanix-csi-storage -n ntnx-system --set createSecret=false +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with `ClusterResourceSet` + +The `ClusterResourceSet` feature was introduced to automatically apply a set of resources (such as CNI/CSI) defined by administrators to matching created/existing workload clusters. 
+ +### Enabling the `ClusterResourceSet` feature + +At the time of writing, `ClusterResourceSet` is an experimental feature that must be enabled during the initialization of a management cluster with the `EXP_CLUSTER_RESOURCE_SET` feature gate. + +To do this, add `EXP_CLUSTER_RESOURCE_SET: "true"` in the `clusterctl` configuration file or just `export EXP_CLUSTER_RESOURCE_SET=true` before initializing the management cluster with `clusterctl init`. + +If the management cluster is already initialized, the `ClusterResourceSet` feature can be enabled by changing the configuration of the `capi-controller-manager` deployment in the `capi-system` namespace. + + ```shell + kubectl edit deployment -n capi-system capi-controller-manager + ``` + +Locate the section below: + +```yaml + - args: + - --leader-elect + - --metrics-bind-addr=localhost:8080 + - --feature-gates=MachinePool=false,ClusterResourceSet=false,ClusterTopology=false +``` + +Then replace `ClusterResourceSet=false` with `ClusterResourceSet=true`. + +!!! note + Editing the `deployment` resource will cause Kubernetes to automatically roll out new pods with the feature enabled. + + + +### Prepare the Nutanix CSI `ClusterResourceSet` + +#### Create the `ConfigMap` for the CSI Plugin + +First, create a `ConfigMap` that contains a YAML manifest with all resources to install the Nutanix CSI driver. + +Since the Nutanix CSI Driver is provided as a Helm chart, use `helm` to extract it before creating the `ConfigMap`. 
See an example below: + +```shell +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +kubectl create ns ntnx-system --dry-run=client -o yaml > nutanix-csi-namespace.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system > nutanix-csi-snapshot.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-storage -n ntnx-system > nutanix-csi-storage.yaml + +kubectl create configmap nutanix-csi-crs --from-file=nutanix-csi-namespace.yaml --from-file=nutanix-csi-snapshot.yaml --from-file=nutanix-csi-storage.yaml +``` + +#### Create the `ClusterResourceSet` + +Next, create the `ClusterResourceSet` resource that will map the `ConfigMap` defined above to clusters using a `clusterSelector`. + +The `ClusterResourceSet` needs to be created inside the management cluster. See an example below: + +```yaml +--- +apiVersion: addons.cluster.x-k8s.io/v1alpha3 +kind: ClusterResourceSet +metadata: + name: nutanix-csi-crs +spec: + clusterSelector: + matchLabels: + csi: nutanix + resources: + - kind: ConfigMap + name: nutanix-csi-crs +``` + +The `clusterSelector` field controls how Cluster API will match this `ClusterResourceSet` on one or more workload clusters. In the example scenario, the `matchLabels` approach is being used where the `ClusterResourceSet` will be applied to all workload clusters having the `csi: nutanix` label present. If the label isn't present, the `ClusterResourceSet` won't apply to that workload cluster. + +The `resources` field references the `ConfigMap` created above, which contains the manifests for installing the Nutanix CSI driver. + +#### Assign the `ClusterResourceSet` to a workload cluster + +Assign this `ClusterResourceSet` to the workload cluster by adding the correct label to the `Cluster` resource. + +This can be done before workload cluster creation by editing the output of the `clusterctl generate cluster` command or by modifying an already deployed workload cluster. 
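For an already deployed workload cluster, the label can also be added with `kubectl` against the management cluster. The sketch below is only an illustration and not part of the CAPX tooling: the `label_cluster_for_csi` helper name is hypothetical, and it assumes the current kubeconfig points at the management cluster that holds the `Cluster` resource.

```shell
# Hypothetical helper (not part of CAPX): adds the csi=nutanix label to an
# existing Cluster resource so the ClusterResourceSet clusterSelector matches it.
# Assumes KUBECONFIG points at the management cluster.
# Usage: label_cluster_for_csi <cluster-name> <cluster-namespace>
label_cluster_for_csi() {
  kubectl label cluster "$1" --namespace "$2" csi=nutanix --overwrite
}
```

With the example names used on this page, it would be called as `label_cluster_for_csi workload-cluster-name workload-cluster-namespace`.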
+ +In both cases, the `Cluster` resource should look like this: + +```yaml +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: workload-cluster-name + namespace: workload-cluster-namespace + labels: + csi: nutanix +# ... +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with a CAPX flavor + +The CAPX provider can utilize a flavor to automatically deploy the Nutanix CSI driver using a `ClusterResourceSet`. + +### Prerequisites + +The following requirements must be met: + +- The operating system must meet the [Nutanix CSI OS prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). +- The management cluster must be installed with the [`EXP_CLUSTER_RESOURCE_SET` feature gate](#enabling-the-clusterresourceset-feature). + +### Installation + +Specify the `csi` flavor during workload cluster creation. See an example below: + +```shell +clusterctl generate cluster my-cluster -f csi +``` + +Additional environment variables are required: + +- `WEBHOOK_CA`: Base64 encoded CA certificate used to sign the webhook certificate +- `WEBHOOK_CERT`: Base64 encoded certificate for the webhook validation component +- `WEBHOOK_KEY`: Base64 encoded key for the webhook validation component + +The three components referenced above can be automatically created and referenced using [this script](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/main/scripts/gen-self-cert.sh){target=_blank}: + +``` +source scripts/gen-self-cert.sh +``` + +The certificate must reference the following names: + +- csi-snapshot-webhook +- csi-snapshot-webhook.ntnx-system +- csi-snapshot-webhook.ntnx-system.svc + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Nutanix CSI Driver Configuration + +After the driver is installed, it must be configured for use by minimally defining a `Secret` and `StorageClass`. 
+ +This can be done manually in the workload clusters or by using a `ClusterResourceSet` in the management cluster as explained above. + +See the Official [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:CSI-Volume-Driver-v2_6){target=_blank} on the Nutanix Portal for more configuration information. diff --git a/docs/capx/v1.5.x/credential_management.md b/docs/capx/v1.5.x/credential_management.md new file mode 100644 index 00000000..bebbc5a0 --- /dev/null +++ b/docs/capx/v1.5.x/credential_management.md @@ -0,0 +1,93 @@ +# Credential Management +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs to manage the required Kubernetes cluster infrastructure resources. + +PC credentials are required to authenticate to the PC APIs. CAPX currently supports two mechanisms to supply the required credentials: + +- Credentials injected into the CAPX manager deployment +- Workload cluster specific credentials + +## Credentials injected into the CAPX manager deployment +By default, credentials will be injected into the CAPX manager deployment when CAPX is initialized. See the [getting started guide](./getting_started.md) for more information on the initialization. + +Upon initialization a `nutanix-creds` secret will automatically be created in the `capx-system` namespace. This secret will contain the values supplied via the `NUTANIX_USER` and `NUTANIX_PASSWORD` parameters. + +The `nutanix-creds` secret will be used for workload cluster deployment if no other credential is supplied. 
+ +### Example +An example of the automatically created `nutanix-creds` secret can be found below: +```yaml +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: nutanix-creds + namespace: capx-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +## Workload cluster specific credentials +Users can override the [credentials injected in CAPX manager deployment](#credentials-injected-into-the-capx-manager-deployment) by supplying a credential specific to a workload cluster. The credentials can be supplied by creating a secret in the same namespace as the `NutanixCluster` namespace. + +The secret can be referenced by adding a `credentialRef` inside the `prismCentral` attribute contained in the `NutanixCluster`. +The secret will also be deleted when the `NutanixCluster` is deleted. + +Note: There is a 1:1 relation between the secret and the `NutanixCluster` object. + +### Example +Create a secret in the namespace of the `NutanixCluster`: + +```yaml +--- +apiVersion: v1 +kind: Secret +metadata: + name: "" + namespace: "" +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +Add a `prismCentral` and corresponding `credentialRef` to the `NutanixCluster`: + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: "" + namespace: "" +spec: + prismCentral: + ... + credentialRef: + name: "" + kind: Secret +... +``` + +See the [NutanixCluster](./types/nutanix_cluster.md) documentation for all supported configuration parameters for the `prismCentral` and `credentialRef` attribute. 
\ No newline at end of file diff --git a/docs/capx/v1.5.x/experimental/autoscaler.md b/docs/capx/v1.5.x/experimental/autoscaler.md new file mode 100644 index 00000000..2af57213 --- /dev/null +++ b/docs/capx/v1.5.x/experimental/autoscaler.md @@ -0,0 +1,129 @@ +# Using Autoscaler in combination with CAPX + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +[Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank} can be used in combination with Cluster API to automatically add or remove machines in a cluster. + +Autoscaler can be used in different deployment scenarios. This page will provide an overview of multiple autoscaler deployment scenarios in combination with CAPX. +See the [Testing](#testing) section to see how scale-up/scale-down events can be triggered to validate the autoscaler behaviour. + +More in-depth information on Autoscaler functionality can be found in the [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank}. + +All Autoscaler configuration parameters can be found [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank}. + +## Scenario 1: Management cluster managing an external workload cluster +In this scenario, Autoscaler will be running on a management cluster and it will manage an external workload cluster. See the management cluster managing an external workload cluster section of [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster){target=_blank} for more information. + +### Steps +1. 
Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. + + !!! note + Make sure a CNI is installed in the workload cluster. + +4. Download the example [Autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +5. Modify the `deployment.yaml` file: + - Change the namespace of all resources to the namespaces of the workload cluster. + - Choose an autoscale image. + - Change the following parameters in the `Deployment` resource: +```YAML + spec: + containers: + name: cluster-autoscaler + command: + - /cluster-autoscaler + args: + - --cloud-provider=clusterapi + - --kubeconfig=/mnt/kubeconfig/kubeconfig.yml + - --clusterapi-cloud-config-authoritative + - -v=1 + volumeMounts: + - mountPath: /mnt/kubeconfig + name: kubeconfig + readOnly: true + ... + volumes: + - name: kubeconfig + secret: + secretName: -kubeconfig + items: + - key: value + path: kubeconfig.yml +``` +7. Apply the `deployment.yaml` file. +```bash +kubectl apply -f deployment.yaml +``` +8. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +9. Test Autoscaler. Go to the [Testing](#testing) section. + +## Scenario 2: Autoscaler running on workload cluster +In this scenario, Autoscaler will be deployed [on top of the workload cluster](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-a-joined-cluster-using-service-account-credentials){target=_blank} directly. In order for Autoscaler to work, it is required that the workload cluster resources are moved from the management cluster to the workload cluster. + +### Steps +1. Deploy a management cluster and workload cluster. 
The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. +2. Get the kubeconfig file for the workload cluster and use this kubeconfig to login to the workload cluster. +```bash +clusterctl get kubeconfig -n /path/to/kubeconfig +``` +3. Install a CNI in the workload cluster. +4. Initialise the CAPX components on top of the workload cluster: +```bash +clusterctl init --infrastructure nutanix +``` +5. Migrate the workload cluster custom resources to the workload cluster. Run following command from the management cluster: +```bash +clusterctl move -n --to-kubeconfig /path/to/kubeconfig +``` +6. Verify if the cluster has been migrated by running following command on the workload cluster: +```bash +kubectl get cluster -A +``` +7. Download the example [autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +8. Create the Autoscaler namespace: +```bash +kubectl create ns autoscaler +``` +9. Apply the `deployment.yaml` file +```bash +kubectl apply -f deployment.yaml +``` +10. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +11. Test Autoscaler. Go to the [Testing](#testing) section. + +## Testing + +1. Deploy an example Kubernetes application. For example, the one used in the [Kubernetes HorizontalPodAutoscaler Walkthrough](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/). +```bash +kubectl apply -f https://k8s.io/examples/application/php-apache.yaml +``` +2. Increase the amount of replicas of the application to trigger a scale-up event: +``` +kubectl scale deployment php-apache --replicas 100 +``` +3. Decrease the amount of replicas of the application again to trigger a scale-down event. + + !!! note + In case of issues check the logs of the Autoscaler pods. + +4. 
After a while, CAPX will add more machines. Refer to the [Autoscaler configuration parameters](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank} to tweak the behaviour and timeouts. + +## Autoscaler node group annotations +Autoscaler uses the following annotations to define the upper and lower boundaries of the managed machines: + +| Annotation | Example Value | Description | +|-------------------------------------------------------------|---------------|-----------------------------------------------| +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size | 5 | Maximum number of machines in this node group | +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size | 1 | Minimum number of machines in this node group | + +These annotations must be applied to the `MachineDeployment` resources of a CAPX cluster. + +### Example +```YAML +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + annotations: + cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5" + cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1" +``` \ No newline at end of file diff --git a/docs/capx/v1.5.x/experimental/capx_multi_pe.md b/docs/capx/v1.5.x/experimental/capx_multi_pe.md new file mode 100644 index 00000000..bd52ccd7 --- /dev/null +++ b/docs/capx/v1.5.x/experimental/capx_multi_pe.md @@ -0,0 +1,30 @@ +# Creating a workload CAPX cluster spanning Prism Element clusters + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +This page explains how to deploy CAPX-based Kubernetes clusters whose worker nodes span multiple Prism Element (PE) clusters. + +!!! note + All the PE clusters must be managed by the same Prism Central (PC) instance. 
+ +The topology will look like this: + +- One PC managing multiple PEs +- One CAPI management cluster +- One CAPI workload cluster with multiple `MachineDeployment` resources + +Refer to the [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to get started with CAPX. + +To create workload clusters spanning multiple Prism Element clusters, create a `MachineDeployment` and `NutanixMachineTemplate` resource for each Prism Element cluster. The Prism Element specific parameters (name/UUID, subnet, ...) are referenced in the `NutanixMachineTemplate`. + +## Steps +1. Create a management cluster that has the CAPX infrastructure provider deployed. +2. Create a `cluster.yml` file containing the workload cluster definition. Refer to the steps defined in the [CAPI quickstart guide](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to create an example `cluster.yml` file. +3. Add additional `MachineDeployment` and `NutanixMachineTemplate` resources. + + By default there is only one machine template and machine deployment defined. To add nodes residing on another Prism Element cluster, a new `MachineDeployment` and `NutanixMachineTemplate` resource need to be added to the YAML file. The autogenerated `MachineDeployment` and `NutanixMachineTemplate` resource definitions can be used as a baseline. + + Make sure to modify the `MachineDeployment` and `NutanixMachineTemplate` parameters. + +4. Apply the modified `cluster.yml` file to the management cluster. diff --git a/docs/capx/v1.5.x/experimental/oidc.md b/docs/capx/v1.5.x/experimental/oidc.md new file mode 100644 index 00000000..0c274121 --- /dev/null +++ b/docs/capx/v1.5.x/experimental/oidc.md @@ -0,0 +1,31 @@ +# OIDC integration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. 
+ +Kubernetes allows users to authenticate using various authentication mechanisms. One of these mechanisms is OIDC. Information on how Kubernetes interacts with OIDC providers can be found in the [OpenID Connect Tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens){target=_blank} section of the official Kubernetes documentation. + + +Follow the steps below to configure a CAPX cluster to use an OIDC identity provider. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +3. Modify/add the `spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs` attribute and add the required [API server parameters](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuring-the-api-server){target=_blank}. See the [example](#example) below. +4. Apply the `cluster.yaml` file +5. Log in with the OIDC provider once the cluster is provisioned + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + ... + oidc-client-id: + oidc-issuer-url: + ... +``` + diff --git a/docs/capx/v1.5.x/experimental/proxy.md b/docs/capx/v1.5.x/experimental/proxy.md new file mode 100644 index 00000000..c8f940d4 --- /dev/null +++ b/docs/capx/v1.5.x/experimental/proxy.md @@ -0,0 +1,62 @@ +# Proxy configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a proxy to connect to external networks. This proxy configuration needs to be applied to control plane and worker nodes. 
+ +Follow the steps below to configure a CAPX cluster to use a proxy. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the proxy configuration. + 1. `KubeadmControlPlane`: + * Add the proxy configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the proxy configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +4. Apply the `cluster.yaml` file + +## Example + +```YAML +--- +# controlplane proxy settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +--- +# worker proxy settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... 
+``` + diff --git a/docs/capx/v1.5.x/experimental/registry_mirror.md b/docs/capx/v1.5.x/experimental/registry_mirror.md new file mode 100644 index 00000000..307a9425 --- /dev/null +++ b/docs/capx/v1.5.x/experimental/registry_mirror.md @@ -0,0 +1,96 @@ +# Registry Mirror configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a private registry to act as a mirror of an external public registry. This registry mirror configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a registry mirror. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the registry mirror configuration. + 1. `KubeadmControlPlane`: + * Add the registry mirror configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add a command to `spec.kubeadmConfigSpec.preKubeadmCommands` that updates `/etc/containerd/config.toml` to apply the registry mirror config. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the registry mirror configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add a command to `spec.template.spec.preKubeadmCommands` that updates `/etc/containerd/config.toml` to apply the registry mirror config. Do not modify other items in the list. +3. Apply the `cluster.yaml` file. + +## Example + +This example will configure a registry mirror for the following namespaces: + +* registry.k8s.io +* ghcr.io +* quay.io + +and redirect them to corresponding projects of the `` registry. 
+ +```YAML +--- +# controlplane registry mirror settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... +--- +# worker registry mirror settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... 
+``` + diff --git a/docs/capx/v1.5.x/experimental/vpc.md b/docs/capx/v1.5.x/experimental/vpc.md new file mode 100644 index 00000000..3513e47e --- /dev/null +++ b/docs/capx/v1.5.x/experimental/vpc.md @@ -0,0 +1,40 @@ +# Creating a workload CAPX cluster in a Nutanix Flow VPC + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +!!! note + Nutanix Flow VPCs are only validated with CAPX 1.1.3+ + +[Nutanix Flow Virtual Networking](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9:Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9){target=_blank} allows users to create Virtual Private Clouds (VPCs) with Overlay networking. +The steps below will illustrate how a CAPX cluster can be deployed inside an overlay subnet (NAT) inside a VPC while the management cluster resides outside of the VPC. + + +## Steps +1. [Request a floating IP](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Networking-Guide:ear-flow-nw-request-floating-ip-pc-t.html){target=_blank} +2. Link the floating IP to an internal IP address inside the overlay subnet that will be used to deploy the CAPX cluster. This address will be assigned to the CAPX loadbalancer. To prevent IP conflicts, make sure the IP address is not part of the IP-pool defined in the subnet. +3. Generate a `cluster.yaml` file with the required CAPX cluster configuration where the `CONTROL_PLANE_ENDPOINT_IP` is set to the floating IP requested in the first step. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +4. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +5. Modify the `spec.kubeadmConfigSpec.files.*.content` attribute and change the `kube-vip` definition similar to the [example](#example) below. +6. 
Apply the `cluster.yaml` file. +7. When the CAPX workload cluster is deployed, it will be reachable via the floating IP. + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + apiVersion: v1 + kind: Pod + metadata: + name: kube-vip + namespace: kube-system + spec: + containers: + - env: + - name: address + value: "" +``` + diff --git a/docs/capx/v1.5.x/getting_started.md b/docs/capx/v1.5.x/getting_started.md new file mode 100644 index 00000000..d8191883 --- /dev/null +++ b/docs/capx/v1.5.x/getting_started.md @@ -0,0 +1,159 @@ +# Getting Started + +This is a guide on getting started with Cluster API Provider Nutanix Cloud Infrastructure (CAPX). To learn about Cluster API in more depth, check out the [Cluster API book](https://cluster-api.sigs.k8s.io/){target=_blank}. + +For more information on how to install the Nutanix CSI Driver on a CAPX cluster, visit [Nutanix CSI Driver installation with CAPX](./addons/install_csi_driver.md). + +For more information on how CAPX handles credentials, visit [Credential Management](./credential_management.md). + +For more information on the port requirements for CAPX, visit [Port Requirements](./port_requirements.md). + +!!! note + [Nutanix Cloud Controller Manager (CCM)](../../ccm/latest/overview.md) is a mandatory component starting from CAPX v1.3.0. Ensure all CAPX-managed Kubernetes clusters are configured to use Nutanix CCM before upgrading to v1.3.0 or later. See [CAPX v1.5.x Upgrade Procedure](./tasks/capx_v15x_upgrade_procedure.md). + +## Production Workflow + +### Build OS image for NutanixMachineTemplate resource +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) uses the [Image Builder](https://image-builder.sigs.k8s.io/){target=_blank} project to build OS images used for the Nutanix machines.
+ +Follow the steps detailed in [Building CAPI Images for Nutanix Cloud Platform (NCP)](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#building-capi-images-for-nutanix-cloud-platform-ncp){target=_blank} to use Image Builder on the Nutanix Cloud Platform. + +For a list of operating systems visit the OS image [Configuration](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#configuration){target=_blank} page. + +### Prerequisites for using Cluster API Provider Nutanix Cloud Infrastructure +The [Cluster API installation](https://cluster-api.sigs.k8s.io/user/quick-start.html#installation){target=_blank} section provides an overview of all required prerequisites: + +- [Common Prerequisites](https://cluster-api.sigs.k8s.io/user/quick-start.html#common-prerequisites){target=_blank} +- [Install and/or configure a Kubernetes cluster](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-andor-configure-a-kubernetes-cluster){target=_blank} +- [Install clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl){target=_blank} +- (Optional) [Enabling Feature Gates](https://cluster-api.sigs.k8s.io/user/quick-start.html#enabling-feature-gates){target=_blank} + +Make sure these prerequisites have been met before moving to the [Configure and Install Cluster API Provider Nutanix Cloud Infrastructure](#configure-and-install-cluster-api-provider-nutanix-cloud-infrastructure) step. + + +### Configure and Install Cluster API Provider Nutanix Cloud Infrastructure +To initialize Cluster API Provider Nutanix Cloud Infrastructure, `clusterctl` requires the following variables, which should be set in either `~/.cluster-api/clusterctl.yaml` or as environment variables. 
+``` +NUTANIX_ENDPOINT: "" # IP or FQDN of Prism Central +NUTANIX_USER: "" # Prism Central user +NUTANIX_PASSWORD: "" # Prism Central password +NUTANIX_INSECURE: false # or true + +KUBERNETES_VERSION: "v1.22.9" +WORKER_MACHINE_COUNT: 3 +NUTANIX_SSH_AUTHORIZED_KEY: "" + +NUTANIX_PRISM_ELEMENT_CLUSTER_NAME: "" +NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME: "" +NUTANIX_SUBNET_NAME: "" + +EXP_CLUSTER_RESOURCE_SET: true # Required for Nutanix CCM installation +``` + +You can also see the required list of variables by running the following: +``` +clusterctl generate cluster mycluster -i nutanix --list-variables +Required Variables: + - CONTROL_PLANE_ENDPOINT_IP + - KUBERNETES_VERSION + - NUTANIX_ENDPOINT + - NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME + - NUTANIX_PASSWORD + - NUTANIX_PRISM_ELEMENT_CLUSTER_NAME + - NUTANIX_SSH_AUTHORIZED_KEY + - NUTANIX_SUBNET_NAME + - NUTANIX_USER + +Optional Variables: + - CONTROL_PLANE_ENDPOINT_PORT (defaults to "6443") + - CONTROL_PLANE_MACHINE_COUNT (defaults to 1) + - KUBEVIP_LB_ENABLE (defaults to "false") + - KUBEVIP_SVC_ENABLE (defaults to "false") + - NAMESPACE (defaults to current Namespace in the KubeConfig file) + - NUTANIX_INSECURE (defaults to "false") + - NUTANIX_MACHINE_BOOT_TYPE (defaults to "legacy") + - NUTANIX_MACHINE_MEMORY_SIZE (defaults to "4Gi") + - NUTANIX_MACHINE_VCPU_PER_SOCKET (defaults to "1") + - NUTANIX_MACHINE_VCPU_SOCKET (defaults to "2") + - NUTANIX_PORT (defaults to "9440") + - NUTANIX_SYSTEMDISK_SIZE (defaults to "40Gi") + - WORKER_MACHINE_COUNT (defaults to 0) +``` + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `CONTROL_PLANE_ENDPOINT_IP` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. + +!!! 
warning + Make sure [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled before running `clusterctl init`. + +Now you can instantiate Cluster API with the following: +``` +clusterctl init -i nutanix +``` + +### Deploy a workload cluster on Nutanix Cloud Infrastructure +``` +export TEST_CLUSTER_NAME=mytestcluster1 +export TEST_NAMESPACE=mytestnamespace +CONTROL_PLANE_ENDPOINT_IP=x.x.x.x clusterctl generate cluster ${TEST_CLUSTER_NAME} \ + -i nutanix \ + --target-namespace ${TEST_NAMESPACE} \ + --kubernetes-version v1.22.9 \ + --control-plane-machine-count 1 \ + --worker-machine-count 3 > ./cluster.yaml +kubectl create ns ${TEST_NAMESPACE} +kubectl apply -f ./cluster.yaml -n ${TEST_NAMESPACE} +``` +To customize the configuration of the default `cluster.yaml` file generated by CAPX, visit the [NutanixCluster](./types/nutanix_cluster.md) and [NutanixMachineTemplate](./types/nutanix_machine_template.md) documentation. + +### Access a workload cluster +To access resources on the cluster, you can get the kubeconfig with the following: +``` +clusterctl get kubeconfig ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} > ${TEST_CLUSTER_NAME}.kubeconfig +kubectl --kubeconfig ./${TEST_CLUSTER_NAME}.kubeconfig get nodes +``` + +### Install CNI on a workload cluster + +You must deploy a Container Network Interface (CNI) based pod network add-on so that your pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed. + +!!! note + Make sure your pod network does not overlap with any of the host networks. You are likely to see problems if there is any overlap. If you find a collision between your network plugin's preferred pod network and some of your host networks, you must choose a suitable alternative CIDR block to use instead. It can be configured inside the `cluster.yaml` generated by `clusterctl generate cluster` before applying it.
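As a minimal sketch of where the pod CIDR is set, the generated `cluster.yaml` contains a Cluster API `Cluster` resource whose `spec.clusterNetwork.pods.cidrBlocks` field defines the pod network. The CIDR values below are illustrative assumptions, not recommendations; choose ranges that do not overlap with your host networks.

```YAML
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 10.244.0.0/16   # illustrative value; must not overlap with host networks
    services:
      cidrBlocks:
        - 10.96.0.0/12    # illustrative value
```

Depending on the CNI you select, its own configuration may also need to reference the same pod CIDR; check the install guide of that CNI.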
+ +Several external projects provide Kubernetes pod networks using CNI, some of which also support [Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/){target=_blank}. + +See a list of add-ons that implement the [Kubernetes networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-network-model){target=_blank}. At time of writing, the most common are [Calico](https://www.tigera.io/project-calico/){target=_blank} and [Cilium](https://cilium.io){target=_blank}. + +Follow the specific install guide for your selected CNI and install only one pod network per cluster. + +Once a pod network has been installed, you can confirm that it is working by checking that the CoreDNS pod is running in the output of `kubectl get pods --all-namespaces`. + + +### Kube-vip settings + +Kube-vip is a true load balancing solution for the Kubernetes control plane. It distributes API requests across control plane nodes. It also has the capability to provide load balancing for Kubernetes services. + +You can tweak kube-vip settings by using the following properties: + +- `KUBEVIP_LB_ENABLE` + +This setting allows control plane load balancing using IPVS. See +[Control Plane Load-Balancing documentation](https://kube-vip.io/docs/about/architecture/#control-plane-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ENABLE` + +This setting enables a service of type LoadBalancer. See +[Kubernetes Service Load Balancing documentation](https://kube-vip.io/docs/about/architecture/#kubernetes-service-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ELECTION` + +This setting enables Load Balancing of Load Balancers. See [Load Balancing Load Balancers](https://kube-vip.io/docs/usage/kubernetes-services/#load-balancing-load-balancers-when-using-arp-mode-yes-you-read-that-correctly-kube-vip-v050){target=_blank} for further information. 
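The fragment below is a sketch of how these properties might be set, assuming they are consumed like the other `clusterctl` variables (in `~/.cluster-api/clusterctl.yaml` or as environment variables) before running `clusterctl generate cluster`. The `"true"` values are assumptions for illustration; enable only the features you need.

```YAML
# Fragment of ~/.cluster-api/clusterctl.yaml (illustrative values)
KUBEVIP_LB_ENABLE: "true"      # control plane load balancing using IPVS
KUBEVIP_SVC_ENABLE: "true"     # services of type LoadBalancer
KUBEVIP_SVC_ELECTION: "true"   # load balancing of load balancers
```

Because these values are rendered into the generated `cluster.yaml`, changing them afterwards would presumably require regenerating or editing that file.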
+ +### Delete a workload cluster +To remove a workload cluster from your management cluster, remove the cluster object and the provider will clean up all resources. + +``` +kubectl delete cluster ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} +``` +!!! note + Deleting the entire cluster template with `kubectl delete -f ./cluster.yaml` may lead to pending resources requiring manual cleanup. diff --git a/docs/capx/v1.5.x/pc_certificates.md b/docs/capx/v1.5.x/pc_certificates.md new file mode 100644 index 00000000..f3fe1699 --- /dev/null +++ b/docs/capx/v1.5.x/pc_certificates.md @@ -0,0 +1,149 @@ +# Certificate Trust + +CAPX invokes Prism Central APIs using the HTTPS protocol. CAPX has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +!!! note + For more information about replacing Prism Central certificates, see the [Nutanix AOS Security Guide](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Security-Guide-v6_5:mul-security-ssl-certificate-pc-t.html){target=_blank}. + +## Enable certificate verification (default) +By default, CAPX will perform certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a certificate issued by a publicly trusted certificate authority. +No additional configuration is required in CAPX. + +## Configure an additional trust bundle +CAPX allows users to configure an additional trust bundle. This will allow CAPX to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable needs to be set. The value of the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable contains the trust bundle (PEM format) in base64 encoded format.
See the [Configuring the trust bundle environment variable](#configuring-the-trust-bundle-environment-variable) section for more information. + +It is also possible to configure the additional trust bundle manually by creating a custom `cluster-template`. See the [Configuring the additional trust bundle manually](#configuring-the-additional-trust-bundle-manually) section for more information + +The `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable can be set when initializing the CAPX provider or when creating a workload cluster. If the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` is configured when the CAPX provider is initialized, the additional trust bundle will be used for every CAPX workload cluster. If it is only configured when creating a workload cluster, it will only be applicable for that specific workload cluster. + + +### Configuring the trust bundle environment variable + +Create a PEM encoded file containing the root certificate and all intermediate certificates. Example: +``` +$ cat cert.crt +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +``` + +Use a `base64` tool to encode these contents in base64. The command below will provide a `base64` string. +``` +$ cat cert.crt | base64 + +``` +!!! note + Make sure the `base64` string does not contain any newlines (`\n`). If the output string contains newlines, remove them manually or check the manual of the `base64` tool on how to generate a `base64` string without newlines. + +Use the `base64` string as value for the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable. +``` +$ export NUTANIX_ADDITIONAL_TRUST_BUNDLE="" +``` + +### Configuring the additional trust bundle manually + +To configure the additional trust bundle manually without using the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable present in the default `cluster-template` files, it is required to: + +- Create a `ConfigMap` containing the additional trust bundle. 
+- Configure the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec. + +#### Creating the additional trust bundle ConfigMap + +CAPX supports two different formats for the ConfigMap containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the NutanixCluster spec + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `NutanixCluster` spec. Add the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec as shown below. Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + ... + prismCentral: + ... + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + insecure: false +``` + +!!! note + the default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. 
+ + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `NutanixCluster` spec. Certificate verification will be disabled even if an additional trust bundle is configured. + +Disabled certificate verification example: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + ... + insecure: true + ... +``` \ No newline at end of file diff --git a/docs/capx/v1.5.x/port_requirements.md b/docs/capx/v1.5.x/port_requirements.md new file mode 100644 index 00000000..af182abb --- /dev/null +++ b/docs/capx/v1.5.x/port_requirements.md @@ -0,0 +1,19 @@ +# Port Requirements + +CAPX uses the ports documented below to create workload clusters. + +!!! note + This page only documents the ports specifically required by CAPX and does not provide the full overview of all ports required in the CAPI framework. 
+ +## Management cluster + +| Source | Destination | Protocol | Port | Description | +|--------------------|---------------------|----------|------|--------------------------------------------------------------------------------------------------| +| Management cluster | External Registries | TCP | 443 | Pull container images from [CAPX public registries](#public-registries-utilized-when-using-capx) | +| Management cluster | Prism Central | TCP | 9440 | Management cluster communication to Prism Central | + +## Public registries utilized when using CAPX + +| Registry name | +|---------------| +| ghcr.io | diff --git a/docs/capx/v1.5.x/tasks/capx_v15x_upgrade_procedure.md b/docs/capx/v1.5.x/tasks/capx_v15x_upgrade_procedure.md new file mode 100644 index 00000000..5361700b --- /dev/null +++ b/docs/capx/v1.5.x/tasks/capx_v15x_upgrade_procedure.md @@ -0,0 +1,83 @@ +# CAPX v1.5.x Upgrade Procedure + +Starting from CAPX v1.3.0, it is required for all CAPX-managed Kubernetes clusters to use the Nutanix Cloud Controller Manager (CCM). + +Before upgrading CAPX instances to v1.3.0 or later, it is required to follow the [steps](#steps) detailed below for each of the CAPX-managed Kubernetes clusters that don't use Nutanix CCM. + + +## Steps + +This procedure uses [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} to install Nutanix CCM, but it can also be installed using the [Nutanix CCM Helm chart](https://artifacthub.io/packages/helm/nutanix/nutanix-cloud-provider){target=_blank}. + +!!! warning + Make sure [CRS](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled on the management cluster before following the procedure. + +Perform the following steps for each of the CAPX-managed Kubernetes clusters that are not configured to use Nutanix CCM: + +1. 
Add the `cloud-provider: external` configuration in the `KubeadmConfigTemplate` resources: + ```YAML + apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 + kind: KubeadmConfigTemplate + spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + ``` +2. Add the `cloud-provider: external` configuration in the `KubeadmControlPlane` resource: +```YAML +--- +apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 +kind: KubeadmConfigTemplate +spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + cloud-provider: external + controllerManager: + extraArgs: + cloud-provider: external + initConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +``` +3. Add the Nutanix CCM CRS resources: + + - [nutanix-ccm-crs.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.5.0/templates/ccm/nutanix-ccm-crs.yaml){target=_blank} + - [nutanix-ccm-secret.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.5.0/templates/ccm/nutanix-ccm-secret.yaml) + - [nutanix-ccm.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.5.0/templates/ccm/nutanix-ccm.yaml) + + Make sure to update each of the variables before applying the `YAML` files. + +4. Add the `ccm: nutanix` label to the `Cluster` resource: + ```YAML + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + labels: + ccm: nutanix + ``` +5. Verify if the Nutanix CCM pod is up and running: +``` +kubectl get pod -A -l k8s-app=nutanix-cloud-controller-manager +``` +6. 
Trigger a new rollout of the Kubernetes nodes by performing a Kubernetes upgrade or by using `clusterctl alpha rollout restart`. See the [clusterctl alpha rollout](https://cluster-api.sigs.k8s.io/clusterctl/commands/alpha-rollout#restart){target=_blank} documentation for more information. +7. Upgrade CAPX to v1.5.0 by following the [clusterctl upgrade](https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html?highlight=clusterctl%20upgrade%20pla#clusterctl-upgrade){target=_blank} documentation. \ No newline at end of file diff --git a/docs/capx/v1.5.x/tasks/modify_machine_configuration.md b/docs/capx/v1.5.x/tasks/modify_machine_configuration.md new file mode 100644 index 00000000..04a43a95 --- /dev/null +++ b/docs/capx/v1.5.x/tasks/modify_machine_configuration.md @@ -0,0 +1,11 @@ +# Modifying Machine Configurations + +Since all attributes of the `NutanixMachineTemplate` resources are immutable, follow the [Updating Infrastructure Machine Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html?highlight=machine%20template#updating-infrastructure-machine-templates){target=_blank} procedure to modify the configuration of machines in an existing CAPX cluster. +See the [NutanixMachineTemplate](../types/nutanix_machine_template.md) documentation for all supported configuration parameters. + +!!! note + Manually modifying existing and linked `NutanixMachineTemplate` resources will not trigger a rolling update of the machines. + +!!! note + Do not modify the virtual machine configuration of CAPX cluster nodes manually in Prism/Prism Central. + CAPX will not automatically revert the configuration change, but performing scale-up/scale-down/upgrade operations will override manual modifications. Only use the `Updating Infrastructure Machine Templates` procedure referenced above to perform configuration changes.
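The referenced procedure can be sketched as follows: create a copy of the existing `NutanixMachineTemplate` under a new name with the changed values, then point the `MachineDeployment` (or `KubeadmControlPlane`) at the new template. The resource names and the memory size below are hypothetical and only illustrate the pattern; other required template fields are omitted.

```YAML
---
# New template: a copy of the existing one under a new (hypothetical) name
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: NutanixMachineTemplate
metadata:
  name: "${CLUSTER_NAME}-mt-1"
  namespace: "${NAMESPACE}"
spec:
  template:
    spec:
      memorySize: "8Gi"   # illustrative changed value
      # ... remaining fields copied unchanged from the existing template
---
# MachineDeployment (hypothetical name) updated to reference the new template
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: "${CLUSTER_NAME}-wmd"
  namespace: "${NAMESPACE}"
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: NutanixMachineTemplate
        name: "${CLUSTER_NAME}-mt-1"
```

Changing the `infrastructureRef` is what triggers the rolling update; editing the old template in place would not.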
\ No newline at end of file diff --git a/docs/capx/v1.5.x/troubleshooting.md b/docs/capx/v1.5.x/troubleshooting.md new file mode 100644 index 00000000..c023d13e --- /dev/null +++ b/docs/capx/v1.5.x/troubleshooting.md @@ -0,0 +1,13 @@ +# Troubleshooting + +## Clusterctl failed with GitHub rate limit error + +By design, `clusterctl` fetches artifacts from repositories hosted on GitHub; this operation is subject to [GitHub API rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting){target=_blank}. + +While this is generally okay for the majority of users, there is still a chance that some users (especially developers or CI tools) hit this limit: + +``` +Error: failed to get repository client for the XXX with name YYY: error creating the GitHub repository client: failed to get GitHub latest version: failed to get the list of versions: rate limit for github api has been reached. Please wait one hour or get a personal API tokens a assign it to the GITHUB_TOKEN environment variable +``` + +As explained in the error message, you can increase your API rate limit by [creating a GitHub personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token){target=_blank} and setting the `GITHUB_TOKEN` environment variable to that token. diff --git a/docs/capx/v1.5.x/types/nutanix_cluster.md b/docs/capx/v1.5.x/types/nutanix_cluster.md new file mode 100644 index 00000000..09325cab --- /dev/null +++ b/docs/capx/v1.5.x/types/nutanix_cluster.md @@ -0,0 +1,64 @@ +# NutanixCluster + +The `NutanixCluster` resource defines the configuration of a CAPX Kubernetes cluster.
+ +Example of a `NutanixCluster` resource: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + address: ${NUTANIX_ENDPOINT} + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + credentialRef: + kind: Secret + name: ${CLUSTER_NAME} + insecure: ${NUTANIX_INSECURE=false} + port: ${NUTANIX_PORT=9440} +``` + +## NutanixCluster spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixCluster` resource. + +### Configuration parameters + +| Key |Type |Description | +|--------------------------------------------|------|----------------------------------------------------------------------------------| +|controlPlaneEndpoint |object|Defines the host IP and port of the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.host |string|Host IP to be assigned to the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.port |int |Port of the CAPX Kubernetes cluster. Default: `6443` | +|prismCentral |object|(Optional) Prism Central endpoint definition. | +|prismCentral.address |string|IP/FQDN of Prism Central. | +|prismCentral.port |int |Port of Prism Central. Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking. Default: `false` | +|prismCentral.credentialRef |object|Reference to credentials used for Prism Central connection. | +|prismCentral.credentialRef.kind |string|Kind of the credentialRef. Allowed value: `Secret` | +|prismCentral.credentialRef.name |string|Name of the secret containing the Prism Central credentials. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret containing the Prism Central credentials. 
| +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace|string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle.| +|failureDomains |list |(Optional) Failure domains for the Kubernetes nodes | +|failureDomains.[].name |string|Name of the failure domain | +|failureDomains.[].cluster |object|Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|failureDomains.[].cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|failureDomains.[].cluster.name |string|Name of the Prism Element cluster. | +|failureDomains.[].cluster.uuid |string|UUID of the Prism Element cluster. | +|failureDomains.[].subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|failureDomains.[].subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|failureDomains.[].subnets.[].name |string|Name of the subnet. | +|failureDomains.[].subnets.[].uuid |string|UUID of the subnet. | +|failureDomains.[].controlPlane |bool |Indicates if a failure domain is suited for control plane nodes | + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `controlPlaneEndpoint.host` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster.
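The `failureDomains` parameters in the table can be combined as in the sketch below. The field names come from the table; the failure domain, cluster, and subnet names are hypothetical placeholders, and the rest of the `NutanixCluster` spec is omitted.

```YAML
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: NutanixCluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
spec:
  failureDomains:
    - name: fd-1                 # hypothetical failure domain name
      cluster:
        type: name
        name: pe-cluster-1       # hypothetical Prism Element cluster name
      subnets:
        - type: name
          name: subnet-1         # hypothetical subnet name
      controlPlane: true         # suited for control plane nodes
    - name: fd-2
      cluster:
        type: name
        name: pe-cluster-2
      subnets:
        - type: name
          name: subnet-2
      controlPlane: true
```

Either `name` or `uuid` can be used for the cluster and subnet references, with `type` set accordingly.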
\ No newline at end of file diff --git a/docs/capx/v1.5.x/types/nutanix_machine_template.md b/docs/capx/v1.5.x/types/nutanix_machine_template.md new file mode 100644 index 00000000..516d1eea --- /dev/null +++ b/docs/capx/v1.5.x/types/nutanix_machine_template.md @@ -0,0 +1,84 @@ +# NutanixMachineTemplate +The `NutanixMachineTemplate` resource defines the configuration of a CAPX Kubernetes VM. + +Example of a `NutanixMachineTemplate` resource. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixMachineTemplate +metadata: + name: "${CLUSTER_NAME}-mt-0" + namespace: "${NAMESPACE}" +spec: + template: + spec: + providerID: "nutanix://${CLUSTER_NAME}-m1" + # Supported options for boot type: legacy and uefi + # Defaults to legacy if not set + bootType: ${NUTANIX_MACHINE_BOOT_TYPE=legacy} + vcpusPerSocket: ${NUTANIX_MACHINE_VCPU_PER_SOCKET=1} + vcpuSockets: ${NUTANIX_MACHINE_VCPU_SOCKET=2} + memorySize: "${NUTANIX_MACHINE_MEMORY_SIZE=4Gi}" + systemDiskSize: "${NUTANIX_SYSTEMDISK_SIZE=40Gi}" + image: + type: name + name: "${NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME}" + cluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnet: + - type: name + name: "${NUTANIX_SUBNET_NAME}" + # Adds additional categories to the virtual machines. + # Note: Categories must already be present in Prism Central + # additionalCategories: + # - key: AppType + # value: Kubernetes + # Adds the cluster virtual machines to a project defined in Prism Central. + # Replace NUTANIX_PROJECT_NAME with the correct project defined in Prism Central + # Note: Project must already be present in Prism Central. + # project: + # type: name + # name: "NUTANIX_PROJECT_NAME" + # gpus: + # - type: name + # name: "GPU NAME" +``` + +## NutanixMachineTemplate spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixMachineTemplate` resource. 
+ +### Configuration parameters +| Key |Type |Description| +|------------------------------------|------|--------------------------------------------------------------------------------------------------------| +|bootType |string|Boot type of the VM. Depends on the OS image used. Allowed values: `legacy`, `uefi`. Default: `legacy` | +|vcpusPerSocket |int |Amount of vCPUs per socket. Default: `1` | +|vcpuSockets |int |Amount of vCPU sockets. Default: `2` | +|memorySize |string|Amount of Memory. Default: `4Gi` | +|systemDiskSize |string|Amount of storage assigned to the system disk. Default: `40Gi` | +|image |object|Reference (name or uuid) to the OS image used for the system disk. | +|image.type |string|Type to identify the OS image. Allowed values: `name` and `uuid` | +|image.name |string|Name of the image. | +|image.uuid |string|UUID of the image. | +|cluster |object|(Optional) Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|cluster.name |string|Name of the Prism Element cluster. | +|cluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | +|additionalCategories |list |Reference to the categories to be assigned to the VMs. These categories already exist in Prism Central. | +|additionalCategories.[].key |string|Key of the category. | +|additionalCategories.[].value |string|Value of the category. | +|project |object|Reference (name or uuid) to the project. This project must already exist in Prism Central. | +|project.type |string|Type to identify the project. Allowed values: `name` and `uuid` | +|project.name |string|Name of the project. 
| +|project.uuid |string|UUID of the project. | +|gpus |object|Reference (name or deviceID) to the GPUs to be assigned to the VMs. Can be vGPU or Passthrough. | +|gpus.[].type |string|Type to identify the GPU. Allowed values: `name` and `deviceID` | +|gpus.[].name |string|Name of the GPU or the vGPU profile | +|gpus.[].deviceID |string|DeviceID of the GPU or the vGPU profile | + +!!! note + The `cluster` or `subnets` configuration parameters are optional in case failure domains are defined on the `NutanixCluster` and `MachineDeployment` resources. \ No newline at end of file diff --git a/docs/capx/v1.5.x/user_requirements.md b/docs/capx/v1.5.x/user_requirements.md new file mode 100644 index 00000000..05e971a5 --- /dev/null +++ b/docs/capx/v1.5.x/user_requirements.md @@ -0,0 +1,37 @@ +# User Requirements + +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs using a Prism Central user account. + +CAPX supports two types of PC users: + +- Local users: must be assigned the `Prism Central Admin` role. +- Domain users: must be assigned a role that at least has the [Minimum required CAPX permissions for domain users](#minimum-required-capx-permissions-for-domain-users) assigned. + +See [Credential Management](./credential_management.md){target=_blank} for more information on how to pass the user credentials to CAPX. + +## Minimum required CAPX permissions for domain users + +The following permissions are required for Prism Central domain users: + +- Create Category Mapping +- Create Image +- Create Or Update Name Category +- Create Or Update Value Category +- Create Virtual Machine +- Delete Category Mapping +- Delete Image +- Delete Name Category +- Delete Value Category +- Delete Virtual Machine +- Detach Volume Group From AHV VM +- View Category Mapping +- View Cluster +- View Image +- View Name Category +- View Project +- View Subnet +- View Value Category +- View Virtual Machine + +!!! 
note + The list of permissions has been validated on PC 2022.6 and above. diff --git a/docs/capx/v1.5.x/validated_integrations.md b/docs/capx/v1.5.x/validated_integrations.md new file mode 100644 index 00000000..c90d43a8 --- /dev/null +++ b/docs/capx/v1.5.x/validated_integrations.md @@ -0,0 +1,65 @@ +# Validated Integrations + +Validated integrations are a defined set of specifically tested configurations between technologies that represent the most common combinations that Nutanix customers are using or deploying with CAPX. For these integrations, Nutanix has directly, or through certified partners, exercised a full range of platform tests as part of the product release process. + +## Integration Validation Policy + +Nutanix follows the version validation policies below: + +- Validate at least one active AOS LTS (long term support) version. Validated AOS LTS version for a specific CAPX version is listed in the [AOS](#aos) section.
+ + !!! note + + Typically the latest LTS release at time of CAPX release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +- Validate the latest AOS STS (short term support) release at time of CAPX release. +- Validate at least one active Prism Central (PC) version. Validated PC version for a specific CAPX version is listed in the [Prism Central](#prism-central) section.
+ + !!! note + + Typically the latest PC release at time of CAPX release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +- Validate at least one active Cluster-API (CAPI) version. Validated CAPI version for a specific CAPX version is listed in the [Cluster-API](#cluster-api) section.
+ + !!! note + + Typically the latest Cluster-API release at time of CAPX release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +## Validated versions +### Cluster-API +| CAPX | CAPI v1.3.x | CAPI v1.4.x | CAPI v1.5.x | CAPI v1.6.x | CAPI v1.7.x | CAPI v1.8.x | +|--------|-------------|-------------|-------------|-------------|-------------|-------------| +| v1.5.x | Yes | Yes | Yes | Yes | Yes | Yes | +| v1.4.x | Yes | Yes | Yes | Yes | Yes | No | +| v1.3.x | Yes | Yes | Yes | Yes | No | No | +| v1.2.x | Yes | Yes | Yes | No | No | No | +| v1.1.x | Yes | No | No | No | No | No | +| v1.0.x | No | No | No | No | No | No | +| v0.5.x | No | No | No | No | No | No | + +See the [Validated Kubernetes Versions](https://cluster-api.sigs.k8s.io/reference/versions.html?highlight=version#supported-kubernetes-versions){target=_blank} page for more information on CAPI validated versions. + +### AOS + +| CAPX | 5.20.4.5 (LTS) | 6.1.1.5 (STS) | 6.5.x (LTS) | 6.6 (STS) | 6.7 (STS) | 6.8 (STS) | +|--------|----------------|---------------|-------------|-----------|-----------|-----------| +| v1.5.x | No | No | Yes | No | No | Yes | +| v1.4.x | No | No | Yes | No | No | Yes | +| v1.3.x | No | No | Yes | Yes | Yes | No | +| v1.2.x | No | No | Yes | Yes | Yes | No | +| v1.1.x | No | No | Yes | No | No | No | +| v1.0.x | Yes | Yes | No | No | No | No | +| v0.5.x | Yes | Yes | No | No | No | No | + + +### Prism Central + +| CAPX | 2022.1.0.2 | pc.2022.6 | pc.2022.9 | pc.2023.x | pc.2024.x | +|--------|------------|-----------|-----------|-----------|-----------| +| v1.5.x | No | Yes | No | Yes | Yes | +| v1.4.x | No | Yes | No | Yes | Yes | +| v1.3.x | No | Yes | No | Yes | No | +| v1.2.x | No | Yes | Yes | Yes | No | +| v1.1.x | No | Yes | No | No | No | +| v1.0.x | Yes | Yes | No | No | No | +| v0.5.x | Yes | Yes | No | No | No | diff --git a/docs/capx/v1.6.x/addons/install_csi_driver.md 
b/docs/capx/v1.6.x/addons/install_csi_driver.md new file mode 100644 index 00000000..afb4bdc8 --- /dev/null +++ b/docs/capx/v1.6.x/addons/install_csi_driver.md @@ -0,0 +1,215 @@ +# Nutanix CSI Driver installation with CAPX + +The Nutanix CSI driver is fully supported on CAPI/CAPX deployed clusters where all the nodes meet the [Nutanix CSI driver prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). + +There are three methods to install the Nutanix CSI driver on a CAPI/CAPX cluster: + +- Helm +- ClusterResourceSet +- CAPX Flavor + +For more information, check the next sections. + +## CAPI Workload cluster prerequisites for the Nutanix CSI Driver + +Kubernetes workers need the following prerequisites to use the Nutanix CSI Drivers: + +- iSCSI initiator package (for Volumes based block storage) +- NFS client package (for Files based storage) + +These packages may already be present in the image you use with your infrastructure provider or you can also rely on your bootstrap provider to install them. More info is available in the [Prerequisites docs](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-plugin-prerequisites-r.html){target=_blank}. + +The package names and installation method will also vary depending on the operating system you plan to use. + +In the example below, `kubeadm` bootstrap provider is used to deploy these packages on top of an Ubuntu 20.04 image. The `kubeadm` bootstrap provider allows defining `preKubeadmCommands` that will be launched before Kubernetes cluster creation. These `preKubeadmCommands` can be defined both in `KubeadmControlPlane` for master nodes and in `KubeadmConfigTemplate` for worker nodes. + +In the example with an Ubuntu 20.04 image, both `KubeadmControlPlane` and `KubeadmConfigTemplate` must be modified as in the example below: + +```yaml +spec: + template: + spec: + # ....... 
+ preKubeadmCommands: + - echo "before kubeadm call" > /var/log/prekubeadm.log + - apt update + - apt install -y nfs-common open-iscsi + - systemctl enable --now iscsid +``` +## Install the Nutanix CSI Driver with Helm + +A recent [Helm](https://helm.sh){target=_blank} version is needed (tested with Helm v3.10.1). + +The example below must be applied on a ready workload cluster. The workload cluster's kubeconfig can be retrieved and used to connect with the following command: + +```shell +clusterctl get kubeconfig $CLUSTER_NAME -n $CLUSTER_NAMESPACE > $CLUSTER_NAME-KUBECONFIG +export KUBECONFIG=$(pwd)/$CLUSTER_NAME-KUBECONFIG +``` + +Once connected to the cluster, follow the [CSI documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-driver-install-t.html){target=_blank}. + +First, install the [nutanix-csi-snapshot](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-snapshot){target=_blank} chart followed by the [nutanix-csi-storage](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-storage){target=_blank} chart. + +See an example below: + +```shell +#Add the official Nutanix Helm repo and get the latest update +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +# Install the nutanix-csi-snapshot chart +helm install nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system --create-namespace + +# Install the nutanix-csi-storage chart +helm install nutanix-storage nutanix/nutanix-csi-storage -n ntnx-system --set createSecret=false +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with `ClusterResourceSet` + +The `ClusterResourceSet` feature was introduced to automatically apply a set of resources (such as CNI/CSI) defined by administrators to matching created/existing workload clusters. 
+ +### Enabling the `ClusterResourceSet` feature + +At the time of writing, `ClusterResourceSet` is an experimental feature that must be enabled during the initialization of a management cluster with the `EXP_CLUSTER_RESOURCE_SET` feature gate. + +To do this, add `EXP_CLUSTER_RESOURCE_SET: "true"` in the `clusterctl` configuration file or just `export EXP_CLUSTER_RESOURCE_SET=true` before initializing the management cluster with `clusterctl init`. + +If the management cluster is already initialized, the `ClusterResourceSet` feature can be enabled by changing the configuration of the `capi-controller-manager` deployment in the `capi-system` namespace. + + ```shell + kubectl edit deployment -n capi-system capi-controller-manager + ``` + +Locate the `--feature-gates` argument and replace `ClusterResourceSet=false` with `ClusterResourceSet=true`. After the change, the section should look like this: + +```yaml + - args: + - --leader-elect + - --metrics-bind-addr=localhost:8080 + - --feature-gates=MachinePool=false,ClusterResourceSet=true,ClusterTopology=false +``` + +!!! note + Editing the `deployment` resource will cause Kubernetes to automatically roll out new pods with the feature enabled. + + + +### Prepare the Nutanix CSI `ClusterResourceSet` + +#### Create the `ConfigMap` for the CSI Plugin + +First, create a `ConfigMap` that contains a YAML manifest with all resources to install the Nutanix CSI driver. + +Since the Nutanix CSI Driver is provided as a Helm chart, use `helm` to extract it before creating the `ConfigMap`. 
See an example below: + +```shell +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +kubectl create ns ntnx-system --dry-run=client -o yaml > nutanix-csi-namespace.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system > nutanix-csi-snapshot.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-storage -n ntnx-system > nutanix-csi-storage.yaml + +kubectl create configmap nutanix-csi-crs --from-file=nutanix-csi-namespace.yaml --from-file=nutanix-csi-snapshot.yaml --from-file=nutanix-csi-storage.yaml +``` + +#### Create the `ClusterResourceSet` + +Next, create the `ClusterResourceSet` resource that will map the `ConfigMap` defined above to clusters using a `clusterSelector`. + +The `ClusterResourceSet` needs to be created inside the management cluster. See an example below: + +```yaml +--- +apiVersion: addons.cluster.x-k8s.io/v1alpha3 +kind: ClusterResourceSet +metadata: + name: nutanix-csi-crs +spec: + clusterSelector: + matchLabels: + csi: nutanix + resources: + - kind: ConfigMap + name: nutanix-csi-crs +``` + +The `clusterSelector` field controls how Cluster API will match this `ClusterResourceSet` on one or more workload clusters. In the example scenario, the `matchLabels` approach is being used where the `ClusterResourceSet` will be applied to all workload clusters having the `csi: nutanix` label present. If the label isn't present, the `ClusterResourceSet` won't apply to that workload cluster. + +The `resources` field references the `ConfigMap` created above, which contains the manifests for installing the Nutanix CSI driver. + +#### Assign the `ClusterResourceSet` to a workload cluster + +Assign this `ClusterResourceSet` to the workload cluster by adding the correct label to the `Cluster` resource. + +This can be done before workload cluster creation by editing the output of the `clusterctl generate cluster` command or by modifying an already deployed workload cluster. 
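For an already deployed workload cluster, the label can also be added from the command line. The snippet below is a minimal sketch rather than part of the documented procedure: the helper name `label_cluster_for_csi` is hypothetical, and it assumes `kubectl` is pointed at the management cluster, where the `Cluster` resource lives.

```shell
# Hypothetical helper (not part of CAPX): add the csi=nutanix label to an
# existing Cluster resource so the ClusterResourceSet selector matches it.
# Assumes kubectl currently targets the management cluster.
label_cluster_for_csi() {
  cluster_name="$1"
  cluster_namespace="$2"
  # --overwrite keeps the command idempotent if the label already exists.
  kubectl label cluster "$cluster_name" -n "$cluster_namespace" csi=nutanix --overwrite
}
```

For example, `label_cluster_for_csi workload-cluster-name workload-cluster-namespace` would add the `csi: nutanix` label that the `clusterSelector` matches on.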
+ +In both cases, the `Cluster` resource should look like this: + +```yaml +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: workload-cluster-name + namespace: workload-cluster-namespace + labels: + csi: nutanix +# ... +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with a CAPX flavor + +The CAPX provider can utilize a flavor to automatically deploy the Nutanix CSI driver using a `ClusterResourceSet`. + +### Prerequisites + +The following requirements must be met: + +- The operating system must meet the [Nutanix CSI OS prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). +- The management cluster must be initialized with the [`EXP_CLUSTER_RESOURCE_SET` feature gate](#enabling-the-clusterresourceset-feature) enabled. + +### Installation + +Specify the `csi` flavor during workload cluster creation. See an example below: + +```shell +clusterctl generate cluster my-cluster -f csi +``` + +Additional environment variables are required: + +- `WEBHOOK_CA`: Base64 encoded CA certificate used to sign the webhook certificate +- `WEBHOOK_CERT`: Base64 encoded certificate for the webhook validation component +- `WEBHOOK_KEY`: Base64 encoded key for the webhook validation component + +The three components referenced above can be automatically created and referenced using [this script](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/main/scripts/gen-self-cert.sh){target=_blank}: + +```shell +source scripts/gen-self-cert.sh +``` + +The certificate must reference the following names: + +- csi-snapshot-webhook +- csi-snapshot-webhook.ntnx-system +- csi-snapshot-webhook.ntnx-system.svc + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Nutanix CSI Driver Configuration + +After the driver is installed, it must be configured for use by minimally defining a `Secret` and `StorageClass`. 
+ +This can be done manually in the workload clusters or by using a `ClusterResourceSet` in the management cluster as explained above. + +See the Official [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:CSI-Volume-Driver-v2_6){target=_blank} on the Nutanix Portal for more configuration information. diff --git a/docs/capx/v1.6.x/credential_management.md b/docs/capx/v1.6.x/credential_management.md new file mode 100644 index 00000000..bebbc5a0 --- /dev/null +++ b/docs/capx/v1.6.x/credential_management.md @@ -0,0 +1,93 @@ +# Credential Management +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs to manage the required Kubernetes cluster infrastructure resources. + +PC credentials are required to authenticate to the PC APIs. CAPX currently supports two mechanisms to supply the required credentials: + +- Credentials injected into the CAPX manager deployment +- Workload cluster specific credentials + +## Credentials injected into the CAPX manager deployment +By default, credentials will be injected into the CAPX manager deployment when CAPX is initialized. See the [getting started guide](./getting_started.md) for more information on the initialization. + +Upon initialization a `nutanix-creds` secret will automatically be created in the `capx-system` namespace. This secret will contain the values supplied via the `NUTANIX_USER` and `NUTANIX_PASSWORD` parameters. + +The `nutanix-creds` secret will be used for workload cluster deployment if no other credential is supplied. 
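To check which credentials were injected, the secret can be read back from the management cluster. This is a minimal sketch under a few assumptions: `kubectl` targets the management cluster, the secret uses the default `nutanix-creds` name and `capx-system` namespace described above, and a `base64` binary that accepts `-d` (such as GNU coreutils) is available. The helper name `show_capx_credentials` is hypothetical.

```shell
# Hypothetical helper (not part of CAPX): print the decoded credentials
# payload stored in the nutanix-creds secret.
# Secret data is base64 encoded by the Kubernetes API, so decode it after
# extracting the "credentials" key with a JSONPath expression.
show_capx_credentials() {
  kubectl get secret nutanix-creds -n capx-system \
    -o jsonpath='{.data.credentials}' | base64 -d
}
```

The output contains the Prism Central password in clear text, so avoid running it in shared terminals or capturing it in logs.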
+ +### Example +An example of the automatically created `nutanix-creds` secret can be found below: +```yaml +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: nutanix-creds + namespace: capx-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +## Workload cluster specific credentials +Users can override the [credentials injected in CAPX manager deployment](#credentials-injected-into-the-capx-manager-deployment) by supplying a credential specific to a workload cluster. The credentials can be supplied by creating a secret in the same namespace as the `NutanixCluster` namespace. + +The secret can be referenced by adding a `credentialRef` inside the `prismCentral` attribute contained in the `NutanixCluster`. +The secret will also be deleted when the `NutanixCluster` is deleted. + +Note: There is a 1:1 relation between the secret and the `NutanixCluster` object. + +### Example +Create a secret in the namespace of the `NutanixCluster`: + +```yaml +--- +apiVersion: v1 +kind: Secret +metadata: + name: "" + namespace: "" +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +Add a `prismCentral` and corresponding `credentialRef` to the `NutanixCluster`: + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: "" + namespace: "" +spec: + prismCentral: + ... + credentialRef: + name: "" + kind: Secret +... +``` + +See the [NutanixCluster](./types/nutanix_cluster.md) documentation for all supported configuration parameters for the `prismCentral` and `credentialRef` attribute. 
\ No newline at end of file diff --git a/docs/capx/v1.6.x/experimental/autoscaler.md b/docs/capx/v1.6.x/experimental/autoscaler.md new file mode 100644 index 00000000..2af57213 --- /dev/null +++ b/docs/capx/v1.6.x/experimental/autoscaler.md @@ -0,0 +1,129 @@ +# Using Autoscaler in combination with CAPX + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +[Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank} can be used in combination with Cluster API to automatically add or remove machines in a cluster. + +Autoscaler can be used in different deployment scenarios. This page will provide an overview of multiple autoscaler deployment scenarios in combination with CAPX. +See the [Testing](#testing) section to see how scale-up/scale-down events can be triggered to validate the autoscaler behaviour. + +More in-depth information on Autoscaler functionality can be found in the [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank}. + +All Autoscaler configuration parameters can be found [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank}. + +## Scenario 1: Management cluster managing an external workload cluster +In this scenario, Autoscaler will be running on a management cluster and it will manage an external workload cluster. See the management cluster managing an external workload cluster section of [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster){target=_blank} for more information. + +### Steps +1. 
Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. + + !!! note + Make sure a CNI is installed in the workload cluster. + +4. Download the example [Autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +5. Modify the `deployment.yaml` file: + - Change the namespace of all resources to the namespaces of the workload cluster. + - Choose an autoscale image. + - Change the following parameters in the `Deployment` resource: +```YAML + spec: + containers: + name: cluster-autoscaler + command: + - /cluster-autoscaler + args: + - --cloud-provider=clusterapi + - --kubeconfig=/mnt/kubeconfig/kubeconfig.yml + - --clusterapi-cloud-config-authoritative + - -v=1 + volumeMounts: + - mountPath: /mnt/kubeconfig + name: kubeconfig + readOnly: true + ... + volumes: + - name: kubeconfig + secret: + secretName: -kubeconfig + items: + - key: value + path: kubeconfig.yml +``` +7. Apply the `deployment.yaml` file. +```bash +kubectl apply -f deployment.yaml +``` +8. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +9. Test Autoscaler. Go to the [Testing](#testing) section. + +## Scenario 2: Autoscaler running on workload cluster +In this scenario, Autoscaler will be deployed [on top of the workload cluster](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-a-joined-cluster-using-service-account-credentials){target=_blank} directly. In order for Autoscaler to work, it is required that the workload cluster resources are moved from the management cluster to the workload cluster. + +### Steps +1. Deploy a management cluster and workload cluster. 
The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. +2. Get the kubeconfig file for the workload cluster and use this kubeconfig to login to the workload cluster. +```bash +clusterctl get kubeconfig -n /path/to/kubeconfig +``` +3. Install a CNI in the workload cluster. +4. Initialise the CAPX components on top of the workload cluster: +```bash +clusterctl init --infrastructure nutanix +``` +5. Migrate the workload cluster custom resources to the workload cluster. Run following command from the management cluster: +```bash +clusterctl move -n --to-kubeconfig /path/to/kubeconfig +``` +6. Verify if the cluster has been migrated by running following command on the workload cluster: +```bash +kubectl get cluster -A +``` +7. Download the example [autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +8. Create the Autoscaler namespace: +```bash +kubectl create ns autoscaler +``` +9. Apply the `deployment.yaml` file +```bash +kubectl apply -f deployment.yaml +``` +10. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +11. Test Autoscaler. Go to the [Testing](#testing) section. + +## Testing + +1. Deploy an example Kubernetes application. For example, the one used in the [Kubernetes HorizontalPodAutoscaler Walkthrough](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/). +```bash +kubectl apply -f https://k8s.io/examples/application/php-apache.yaml +``` +2. Increase the amount of replicas of the application to trigger a scale-up event: +``` +kubectl scale deployment php-apache --replicas 100 +``` +3. Decrease the amount of replicas of the application again to trigger a scale-down event. + + !!! note + In case of issues check the logs of the Autoscaler pods. + +4. 
After a while, CAPX will add more machines. Refer to the [Autoscaler configuration parameters](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank} to tweak the behaviour and timeouts. + +## Autoscaler node group annotations +Autoscaler uses the following annotations to define the upper and lower boundaries of the managed machines: + +| Annotation | Example Value | Description | +|-------------------------------------------------------------|---------------|-----------------------------------------------| +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size | 5 | Maximum amount of machines in this node group | +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size | 1 | Minimum amount of machines in this node group | + +These annotations must be applied to the `MachineDeployment` resources of a CAPX cluster. + +### Example +```YAML +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + annotations: + cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5" + cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1" +``` \ No newline at end of file diff --git a/docs/capx/v1.6.x/experimental/capx_multi_pe.md b/docs/capx/v1.6.x/experimental/capx_multi_pe.md new file mode 100644 index 00000000..bd52ccd7 --- /dev/null +++ b/docs/capx/v1.6.x/experimental/capx_multi_pe.md @@ -0,0 +1,30 @@ +# Creating a workload CAPX cluster spanning Prism Element clusters + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +This page will explain how to deploy CAPX-based Kubernetes clusters where worker nodes span multiple Prism Element (PE) clusters. + +!!! note + All the PE clusters must be managed by the same Prism Central (PC) instance. 
+ +The topology will look like this: + +- One PC managing multiple PEs +- One CAPI management cluster +- One CAPI workload cluster with multiple `MachineDeployment` resources + +Refer to the [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to get started with CAPX. + +To create workload clusters spanning multiple Prism Element clusters, it is required to create a `MachineDeployment` and `NutanixMachineTemplate` resource for each Prism Element cluster. The Prism Element specific parameters (name/UUID, subnet,...) are referenced in the `NutanixMachineTemplate`. + +## Steps +1. Create a management cluster that has the CAPX infrastructure provider deployed. +2. Create a `cluster.yml` file containing the workload cluster definition. Refer to the steps defined in the [CAPI quickstart guide](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to create an example `cluster.yml` file. +3. Add additional `MachineDeployment` and `NutanixMachineTemplate` resources. + + By default there is only one machine template and machine deployment defined. To add nodes residing on another Prism Element cluster, new `MachineDeployment` and `NutanixMachineTemplate` resources need to be added to the YAML file. The autogenerated `MachineDeployment` and `NutanixMachineTemplate` resource definitions can be used as a baseline. + + Make sure to modify the `MachineDeployment` and `NutanixMachineTemplate` parameters. + +4. Apply the modified `cluster.yml` file to the management cluster. diff --git a/docs/capx/v1.6.x/experimental/oidc.md b/docs/capx/v1.6.x/experimental/oidc.md new file mode 100644 index 00000000..0c274121 --- /dev/null +++ b/docs/capx/v1.6.x/experimental/oidc.md @@ -0,0 +1,31 @@ +# OIDC integration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. 
+ +Kubernetes allows users to authenticate using various authentication mechanisms. One of these mechanisms is OIDC. Information on how Kubernetes interacts with OIDC providers can be found in the [OpenID Connect Tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens){target=_blank} section of the official Kubernetes documentation. + + +Follow the steps below to configure a CAPX cluster to use an OIDC identity provider. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +3. Modify/add the `spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs` attribute and add the required [API server parameters](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuring-the-api-server){target=_blank}. See the [example](#example) below. +4. Apply the `cluster.yaml` file +5. Log in with the OIDC provider once the cluster is provisioned + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + ... + oidc-client-id: + oidc-issuer-url: + ... +``` + diff --git a/docs/capx/v1.6.x/experimental/proxy.md b/docs/capx/v1.6.x/experimental/proxy.md new file mode 100644 index 00000000..c8f940d4 --- /dev/null +++ b/docs/capx/v1.6.x/experimental/proxy.md @@ -0,0 +1,62 @@ +# Proxy configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a proxy to connect to external networks. This proxy configuration needs to be applied to control plane and worker nodes. 
+ +Follow the steps below to configure a CAPX cluster to use a proxy. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the proxy configuration. + 1. `KubeadmControlPlane`: + * Add the proxy configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the proxy configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +4. Apply the `cluster.yaml` file + +## Example + +```YAML +--- +# controlplane proxy settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +--- +# worker proxy settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... 
+``` + diff --git a/docs/capx/v1.6.x/experimental/registry_mirror.md b/docs/capx/v1.6.x/experimental/registry_mirror.md new file mode 100644 index 00000000..307a9425 --- /dev/null +++ b/docs/capx/v1.6.x/experimental/registry_mirror.md @@ -0,0 +1,96 @@ +# Registry Mirror configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a private registry to act as a mirror of an external public registry. This registry mirror configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a registry mirror. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the registry mirror configuration. + 1. `KubeadmControlPlane`: + * Add the registry mirror configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add a command that updates `/etc/containerd/config.toml` to apply the registry mirror config to `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the registry mirror configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add a command that updates `/etc/containerd/config.toml` to apply the registry mirror config to `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +3. Apply the `cluster.yaml` file + +## Example + +This example will configure a registry mirror for the following registries: + +* registry.k8s.io +* ghcr.io +* quay.io + +and redirect them to corresponding projects of the `` registry. 
+ +```YAML +--- +# controlplane registry mirror settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... +--- +# worker registry mirror settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... 
+``` + diff --git a/docs/capx/v1.6.x/experimental/vpc.md b/docs/capx/v1.6.x/experimental/vpc.md new file mode 100644 index 00000000..3513e47e --- /dev/null +++ b/docs/capx/v1.6.x/experimental/vpc.md @@ -0,0 +1,40 @@ +# Creating a workload CAPX cluster in a Nutanix Flow VPC + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +!!! note + Nutanix Flow VPCs are only validated with CAPX 1.1.3+ + +[Nutanix Flow Virtual Networking](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9:Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9){target=_blank} allows users to create Virtual Private Clouds (VPCs) with Overlay networking. +The steps below will illustrate how a CAPX cluster can be deployed inside an overlay subnet (NAT) inside a VPC while the management cluster resides outside of the VPC. + + +## Steps +1. [Request a floating IP](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Networking-Guide:ear-flow-nw-request-floating-ip-pc-t.html){target=_blank} +2. Link the floating IP to an internal IP address inside the overlay subnet that will be used to deploy the CAPX cluster. This address will be assigned to the CAPX loadbalancer. To prevent IP conflicts, make sure the IP address is not part of the IP-pool defined in the subnet. +3. Generate a `cluster.yaml` file with the required CAPX cluster configuration where the `CONTROL_PLANE_ENDPOINT_IP` is set to the floating IP requested in the first step. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +4. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +5. Modify the `spec.kubeadmConfigSpec.files.*.content` attribute and change the `kube-vip` definition similar to the [example](#example) below. +6. 
Apply the `cluster.yaml` file. +7. When the CAPX workload cluster is deployed, it will be reachable via the floating IP. + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + apiVersion: v1 + kind: Pod + metadata: + name: kube-vip + namespace: kube-system + spec: + containers: + - env: + - name: address + value: "" +``` + diff --git a/docs/capx/v1.6.x/getting_started.md b/docs/capx/v1.6.x/getting_started.md new file mode 100644 index 00000000..3866492a --- /dev/null +++ b/docs/capx/v1.6.x/getting_started.md @@ -0,0 +1,159 @@ +# Getting Started + +This is a guide on getting started with Cluster API Provider Nutanix Cloud Infrastructure (CAPX). To learn about Cluster API in more depth, check out the [Cluster API book](https://cluster-api.sigs.k8s.io/){target=_blank}. + +For more information on how to install the Nutanix CSI Driver on a CAPX cluster, visit [Nutanix CSI Driver installation with CAPX](./addons/install_csi_driver.md). + +For more information on how CAPX handles credentials, visit [Credential Management](./credential_management.md). + +For more information on the port requirements for CAPX, visit [Port Requirements](./port_requirements.md). + +!!! note + [Nutanix Cloud Controller Manager (CCM)](../../ccm/latest/overview.md) is a mandatory component starting from CAPX v1.3.0. Ensure all CAPX-managed Kubernetes clusters are configured to use Nutanix CCM before upgrading to v1.3.0 or later. See [CAPX v1.6.x Upgrade Procedure](./tasks/capx_v16x_upgrade_procedure.md). + +## Production Workflow + +### Build OS image for NutanixMachineTemplate resource +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) uses the [Image Builder](https://image-builder.sigs.k8s.io/){target=_blank} project to build OS images used for the Nutanix machines. 
+ +Follow the steps detailed in [Building CAPI Images for Nutanix Cloud Platform (NCP)](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#building-capi-images-for-nutanix-cloud-platform-ncp){target=_blank} to use Image Builder on the Nutanix Cloud Platform. + +For a list of operating systems visit the OS image [Configuration](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#configuration){target=_blank} page. + +### Prerequisites for using Cluster API Provider Nutanix Cloud Infrastructure +The [Cluster API installation](https://cluster-api.sigs.k8s.io/user/quick-start.html#installation){target=_blank} section provides an overview of all required prerequisites: + +- [Common Prerequisites](https://cluster-api.sigs.k8s.io/user/quick-start.html#common-prerequisites){target=_blank} +- [Install and/or configure a Kubernetes cluster](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-andor-configure-a-kubernetes-cluster){target=_blank} +- [Install clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl){target=_blank} +- (Optional) [Enabling Feature Gates](https://cluster-api.sigs.k8s.io/user/quick-start.html#enabling-feature-gates){target=_blank} + +Make sure these prerequisites have been met before moving to the [Configure and Install Cluster API Provider Nutanix Cloud Infrastructure](#configure-and-install-cluster-api-provider-nutanix-cloud-infrastructure) step. + + +### Configure and Install Cluster API Provider Nutanix Cloud Infrastructure +To initialize Cluster API Provider Nutanix Cloud Infrastructure, `clusterctl` requires the following variables, which should be set in either `~/.cluster-api/clusterctl.yaml` or as environment variables. 
+``` +NUTANIX_ENDPOINT: "" # IP or FQDN of Prism Central +NUTANIX_USER: "" # Prism Central user +NUTANIX_PASSWORD: "" # Prism Central password +NUTANIX_INSECURE: false # or true + +KUBERNETES_VERSION: "v1.22.9" +WORKER_MACHINE_COUNT: 3 +NUTANIX_SSH_AUTHORIZED_KEY: "" + +NUTANIX_PRISM_ELEMENT_CLUSTER_NAME: "" +NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME: "" +NUTANIX_SUBNET_NAME: "" + +EXP_CLUSTER_RESOURCE_SET: true # Required for Nutanix CCM installation +``` + +You can also see the required list of variables by running the following: +``` +clusterctl generate cluster mycluster -i nutanix --list-variables +Required Variables: + - CONTROL_PLANE_ENDPOINT_IP + - KUBERNETES_VERSION + - NUTANIX_ENDPOINT + - NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME + - NUTANIX_PASSWORD + - NUTANIX_PRISM_ELEMENT_CLUSTER_NAME + - NUTANIX_SSH_AUTHORIZED_KEY + - NUTANIX_SUBNET_NAME + - NUTANIX_USER + +Optional Variables: + - CONTROL_PLANE_ENDPOINT_PORT (defaults to "6443") + - CONTROL_PLANE_MACHINE_COUNT (defaults to 1) + - KUBEVIP_LB_ENABLE (defaults to "false") + - KUBEVIP_SVC_ENABLE (defaults to "false") + - NAMESPACE (defaults to current Namespace in the KubeConfig file) + - NUTANIX_INSECURE (defaults to "false") + - NUTANIX_MACHINE_BOOT_TYPE (defaults to "legacy") + - NUTANIX_MACHINE_MEMORY_SIZE (defaults to "4Gi") + - NUTANIX_MACHINE_VCPU_PER_SOCKET (defaults to "1") + - NUTANIX_MACHINE_VCPU_SOCKET (defaults to "2") + - NUTANIX_PORT (defaults to "9440") + - NUTANIX_SYSTEMDISK_SIZE (defaults to "40Gi") + - WORKER_MACHINE_COUNT (defaults to 0) +``` + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `CONTROL_PLANE_ENDPOINT_IP` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. + +!!! 
warning + Make sure [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled before running `clusterctl init`. + +Now you can instantiate Cluster API with the following: +``` +clusterctl init -i nutanix +``` + +### Deploy a workload cluster on Nutanix Cloud Infrastructure +``` +export TEST_CLUSTER_NAME=mytestcluster1 +export TEST_NAMESPACE=mytestnamespace +CONTROL_PLANE_ENDPOINT_IP=x.x.x.x clusterctl generate cluster ${TEST_CLUSTER_NAME} \ + -i nutanix \ + --target-namespace ${TEST_NAMESPACE} \ + --kubernetes-version v1.22.9 \ + --control-plane-machine-count 1 \ + --worker-machine-count 3 > ./cluster.yaml +kubectl create ns ${TEST_NAMESPACE} +kubectl apply -f ./cluster.yaml -n ${TEST_NAMESPACE} +``` +To customize the configuration of the default `cluster.yaml` file generated by CAPX, visit the [NutanixCluster](./types/nutanix_cluster.md) and [NutanixMachineTemplate](./types/nutanix_machine_template.md) documentation. + +### Access a workload cluster +To access resources on the cluster, you can get the kubeconfig with the following: +``` +clusterctl get kubeconfig ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} > ${TEST_CLUSTER_NAME}.kubeconfig +kubectl --kubeconfig ./${TEST_CLUSTER_NAME}.kubeconfig get nodes +``` + +### Install CNI on a workload cluster + +You must deploy a Container Network Interface (CNI) based pod network add-on so that your pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed. + +!!! note + Make sure your pod network does not overlap with any of the host networks. You are likely to see problems if there is any overlap. If you find a collision between your network plugin's preferred pod network and some of your host networks, you must choose a suitable alternative CIDR block to use instead. The pod CIDR block can be configured inside the `cluster.yaml` file generated by `clusterctl generate cluster` before applying it. 
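To see which pod CIDR the generated manifest will use before applying it, a quick sketch is to print the `pods` block of the `Cluster` resource. This assumes `cluster.yaml` defines `spec.clusterNetwork.pods.cidrBlocks` (the standard Cluster API field); the `show_pod_cidrs` helper name is made up for this example.

```shell
# Hypothetical helper: print the "pods:" key and the two lines after it,
# which normally hold "cidrBlocks:" and the first CIDR entry.
show_pod_cidrs() {
  grep -A 2 'pods:' "$1"
}

# Only inspect the manifest if it exists in the current directory.
if [ -f ./cluster.yaml ]; then
  show_pod_cidrs ./cluster.yaml || echo "no pods block found in cluster.yaml"
fi
```

Compare the printed range with the subnets used by the cluster nodes, and edit the `cidrBlocks` entry in `cluster.yaml` if they collide.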
+ +Several external projects provide Kubernetes pod networks using CNI, some of which also support [Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/){target=_blank}. + +See a list of add-ons that implement the [Kubernetes networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-network-model){target=_blank}. At time of writing, the most common are [Calico](https://www.tigera.io/project-calico/){target=_blank} and [Cilium](https://cilium.io){target=_blank}. + +Follow the specific install guide for your selected CNI and install only one pod network per cluster. + +Once a pod network has been installed, you can confirm that it is working by checking that the CoreDNS pod is running in the output of `kubectl get pods --all-namespaces`. + + +### Kube-vip settings + +Kube-vip is a true load balancing solution for the Kubernetes control plane. It distributes API requests across control plane nodes. It also has the capability to provide load balancing for Kubernetes services. + +You can tweak kube-vip settings by using the following properties: + +- `KUBEVIP_LB_ENABLE` + +This setting allows control plane load balancing using IPVS. See +[Control Plane Load-Balancing documentation](https://kube-vip.io/docs/about/architecture/#control-plane-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ENABLE` + +This setting enables a service of type LoadBalancer. See +[Kubernetes Service Load Balancing documentation](https://kube-vip.io/docs/about/architecture/#kubernetes-service-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ELECTION` + +This setting enables Load Balancing of Load Balancers. See [Load Balancing Load Balancers](https://kube-vip.io/docs/usage/kubernetes-services/#load-balancing-load-balancers-when-using-arp-mode-yes-you-read-that-correctly-kube-vip-v050){target=_blank} for further information. 
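The kube-vip properties above are ordinary `clusterctl` template variables, so one way to set them is through the environment before generating the manifest. The sketch below assumes both optional features are wanted and that `true` is the enabling value (the variables default to `"false"`); adjust to your needs.

```shell
# Assumption: enable both optional kube-vip features.
# Remove either line to keep that feature at its default ("false").
export KUBEVIP_LB_ENABLE=true
export KUBEVIP_SVC_ENABLE=true

# Show the kube-vip variables that clusterctl will read from the environment.
env | grep '^KUBEVIP_'
```

With these exported, rerun the `clusterctl generate cluster` command from the deploy step so the values are rendered into `cluster.yaml`.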
+ +### Delete a workload cluster +To remove a workload cluster from your management cluster, remove the cluster object and the provider will clean up all resources. + +``` +kubectl delete cluster ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} +``` +!!! note + Deleting the entire cluster template with `kubectl delete -f ./cluster.yaml` may lead to pending resources requiring manual cleanup. diff --git a/docs/capx/v1.6.x/pc_certificates.md b/docs/capx/v1.6.x/pc_certificates.md new file mode 100644 index 00000000..f3fe1699 --- /dev/null +++ b/docs/capx/v1.6.x/pc_certificates.md @@ -0,0 +1,149 @@ +# Certificate Trust + +CAPX invokes Prism Central APIs using the HTTPS protocol. CAPX has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +!!! note + For more information about replacing Prism Central certificates, see the [Nutanix AOS Security Guide](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Security-Guide-v6_5:mul-security-ssl-certificate-pc-t.html){target=_blank}. + +## Enable certificate verification (default) +By default, CAPX performs certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a certificate issued by a publicly trusted certificate authority. +No additional configuration is required in CAPX. + +## Configure an additional trust bundle +CAPX allows users to configure an additional trust bundle. This allows CAPX to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable needs to be set. The value of the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable contains the trust bundle (PEM format) in base64 encoded format. 
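A minimal sketch of producing that value as a single line is shown here. Some `base64` implementations wrap their output, so the sketch strips newlines with `tr`; the `encode_trust_bundle` helper name is hypothetical, and `cert.crt` is simply the file name used in the examples on this page.

```shell
# Hypothetical helper: base64-encode a PEM file and remove any line
# wrapping that the local base64 tool may add.
encode_trust_bundle() {
  base64 < "$1" | tr -d '\n'
}

# Only export the variable when the PEM file is present.
if [ -f cert.crt ]; then
  NUTANIX_ADDITIONAL_TRUST_BUNDLE="$(encode_trust_bundle cert.crt)"
  export NUTANIX_ADDITIONAL_TRUST_BUNDLE
fi
```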
See the [Configuring the trust bundle environment variable](#configuring-the-trust-bundle-environment-variable) section for more information. + +It is also possible to configure the additional trust bundle manually by creating a custom `cluster-template`. See the [Configuring the additional trust bundle manually](#configuring-the-additional-trust-bundle-manually) section for more information + +The `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable can be set when initializing the CAPX provider or when creating a workload cluster. If the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` is configured when the CAPX provider is initialized, the additional trust bundle will be used for every CAPX workload cluster. If it is only configured when creating a workload cluster, it will only be applicable for that specific workload cluster. + + +### Configuring the trust bundle environment variable + +Create a PEM encoded file containing the root certificate and all intermediate certificates. Example: +``` +$ cat cert.crt +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +``` + +Use a `base64` tool to encode these contents in base64. The command below will provide a `base64` string. +``` +$ cat cert.crt | base64 + +``` +!!! note + Make sure the `base64` string does not contain any newlines (`\n`). If the output string contains newlines, remove them manually or check the manual of the `base64` tool on how to generate a `base64` string without newlines. + +Use the `base64` string as value for the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable. +``` +$ export NUTANIX_ADDITIONAL_TRUST_BUNDLE="" +``` + +### Configuring the additional trust bundle manually + +To configure the additional trust bundle manually without using the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable present in the default `cluster-template` files, it is required to: + +- Create a `ConfigMap` containing the additional trust bundle. 
+- Configure the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec. + +#### Creating the additional trust bundle ConfigMap + +CAPX supports two different formats for the ConfigMap containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the NutanixCluster spec + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `NutanixCluster` spec. Add the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec as shown below. Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + ... + prismCentral: + ... + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + insecure: false +``` + +!!! note + the default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. 
+ + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `NutanixCluster` spec. Certificate verification will be disabled even if an additional trust bundle is configured. + +Disabled certificate verification example: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + ... + insecure: true + ... +``` \ No newline at end of file diff --git a/docs/capx/v1.6.x/port_requirements.md b/docs/capx/v1.6.x/port_requirements.md new file mode 100644 index 00000000..af182abb --- /dev/null +++ b/docs/capx/v1.6.x/port_requirements.md @@ -0,0 +1,19 @@ +# Port Requirements + +CAPX uses the ports documented below to create workload clusters. + +!!! note + This page only documents the ports specifically required by CAPX and does not provide the full overview of all ports required in the CAPI framework. 
+ +## Management cluster + +| Source | Destination | Protocol | Port | Description | +|--------------------|---------------------|----------|------|--------------------------------------------------------------------------------------------------| +| Management cluster | External Registries | TCP | 443 | Pull container images from [CAPX public registries](#public-registries-utilized-when-using-capx) | +| Management cluster | Prism Central | TCP | 9440 | Management cluster communication to Prism Central | + +## Public registries utilized when using CAPX + +| Registry name | +|---------------| +| ghcr.io | diff --git a/docs/capx/v1.6.x/tasks/capx_v16x_upgrade_procedure.md b/docs/capx/v1.6.x/tasks/capx_v16x_upgrade_procedure.md new file mode 100644 index 00000000..8e998c97 --- /dev/null +++ b/docs/capx/v1.6.x/tasks/capx_v16x_upgrade_procedure.md @@ -0,0 +1,83 @@ +# CAPX v1.6.x Upgrade Procedure + +Starting from CAPX v1.3.0, it is required for all CAPX-managed Kubernetes clusters to use the Nutanix Cloud Controller Manager (CCM). + +Before upgrading CAPX instances to v1.3.0 or later, it is required to follow the [steps](#steps) detailed below for each of the CAPX-managed Kubernetes clusters that don't use Nutanix CCM. + + +## Steps + +This procedure uses [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} to install Nutanix CCM but it can also be installed using the [Nutanix CCM Helm chart](https://artifacthub.io/packages/helm/nutanix/nutanix-cloud-provider){target=_blank}. + +!!! warning + Make sure [CRS](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled on the management cluster before following the procedure. + +Perform following steps for each of the CAPX-managed Kubernetes clusters that are not configured to use Nutanix CCM: + +1. 
Add the `cloud-provider: external` configuration in the `KubeadmConfigTemplate` resources: + ```YAML + apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 + kind: KubeadmConfigTemplate + spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + ``` +2. Add the `cloud-provider: external` configuration in the `KubeadmControlPlane` resource: +```YAML +--- +apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 +kind: KubeadmConfigTemplate +spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + cloud-provider: external + controllerManager: + extraArgs: + cloud-provider: external + initConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +``` +3. Add the Nutanix CCM CRS resources: + + - [nutanix-ccm-crs.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.6.0/templates/ccm/nutanix-ccm-crs.yaml){target=_blank} + - [nutanix-ccm-secret.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.6.0/templates/ccm/nutanix-ccm-secret.yaml) + - [nutanix-ccm.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.6.0/templates/ccm/nutanix-ccm.yaml) + + Make sure to update each of the variables before applying the `YAML` files. + +4. Add the `ccm: nutanix` label to the `Cluster` resource: + ```YAML + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + labels: + ccm: nutanix + ``` +5. Verify if the Nutanix CCM pod is up and running: +``` +kubectl get pod -A -l k8s-app=nutanix-cloud-controller-manager +``` +6. 
Trigger a new rollout of the Kubernetes nodes by performing a Kubernetes upgrade or by using `clusterctl alpha rollout restart`. See the [clusterctl alpha rollout](https://cluster-api.sigs.k8s.io/clusterctl/commands/alpha-rollout#restart){target=_blank} documentation for more information. +7. Upgrade CAPX to v1.6.0 by following the [clusterctl upgrade](https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html?highlight=clusterctl%20upgrade%20pla#clusterctl-upgrade){target=_blank} documentation. \ No newline at end of file diff --git a/docs/capx/v1.6.x/tasks/modify_machine_configuration.md b/docs/capx/v1.6.x/tasks/modify_machine_configuration.md new file mode 100644 index 00000000..04a43a95 --- /dev/null +++ b/docs/capx/v1.6.x/tasks/modify_machine_configuration.md @@ -0,0 +1,11 @@ +# Modifying Machine Configurations + +Since all attributes of the `NutanixMachineTemplate` resources are immutable, follow the [Updating Infrastructure Machine Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html?highlight=machine%20template#updating-infrastructure-machine-templates){target=_blank} procedure to modify the configuration of machines in an existing CAPX cluster. +See the [NutanixMachineTemplate](../types/nutanix_machine_template.md) documentation for all supported configuration parameters. + +!!! note + Manually modifying existing and linked `NutanixMachineTemplate` resources will not trigger a rolling update of the machines. + +!!! note + Do not modify the virtual machine configuration of CAPX cluster nodes manually in Prism/Prism Central. + CAPX will not automatically revert the configuration change, but performing scale-up/scale-down/upgrade operations will override manual modifications. Only use the Updating Infrastructure Machine Templates procedure referenced above to perform configuration changes. 
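In the referenced procedure, a copy of the template is created under a new name and the owning object is then pointed at it. The sketch below only builds the JSON merge patch for a `MachineDeployment` (`spec.template.spec.infrastructureRef.name`); the helper name, the template name, and the variables in the commented `kubectl patch` call are illustrative assumptions rather than CAPX-specific tooling.

```shell
# Hypothetical helper: build a JSON merge patch that switches a
# MachineDeployment to the infrastructure template named in $1.
infra_ref_patch() {
  printf '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"%s"}}}}}\n' "$1"
}

# Print the patch for a placeholder template name.
infra_ref_patch "my-cluster-mt-1"

# Possible use against a management cluster (all names are placeholders):
#   kubectl patch machinedeployment "$MD_NAME" -n "$NAMESPACE" --type merge \
#     -p "$(infra_ref_patch my-cluster-mt-1)"
```

Changing the reference is what triggers the rolling update; editing the existing template in place does not.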
\ No newline at end of file diff --git a/docs/capx/v1.6.x/troubleshooting.md b/docs/capx/v1.6.x/troubleshooting.md new file mode 100644 index 00000000..c023d13e --- /dev/null +++ b/docs/capx/v1.6.x/troubleshooting.md @@ -0,0 +1,13 @@ +# Troubleshooting + +## Clusterctl failed with GitHub rate limit error + +By design, `clusterctl` fetches artifacts from repositories hosted on GitHub, so this operation is subject to [GitHub API rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting){target=_blank}. + +While this is generally okay for the majority of users, there is still a chance that some users (especially developers or CI tools) hit this limit: + +``` +Error: failed to get repository client for the XXX with name YYY: error creating the GitHub repository client: failed to get GitHub latest version: failed to get the list of versions: rate limit for github api has been reached. Please wait one hour or get a personal API tokens a assign it to the GITHUB_TOKEN environment variable +``` + +As explained in the error message, you can increase your API rate limit by [creating a GitHub personal token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token){target=_blank} and assigning it to the `GITHUB_TOKEN` environment variable. diff --git a/docs/capx/v1.6.x/types/nutanix_cluster.md b/docs/capx/v1.6.x/types/nutanix_cluster.md new file mode 100644 index 00000000..09325cab --- /dev/null +++ b/docs/capx/v1.6.x/types/nutanix_cluster.md @@ -0,0 +1,64 @@ +# NutanixCluster + +The `NutanixCluster` resource defines the configuration of a CAPX Kubernetes cluster. 
+ +Example of a `NutanixCluster` resource: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + address: ${NUTANIX_ENDPOINT} + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + credentialRef: + kind: Secret + name: ${CLUSTER_NAME} + insecure: ${NUTANIX_INSECURE=false} + port: ${NUTANIX_PORT=9440} +``` + +## NutanixCluster spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixCluster` resource. + +### Configuration parameters + +| Key |Type |Description | +|--------------------------------------------|------|----------------------------------------------------------------------------------| +|controlPlaneEndpoint |object|Defines the host IP and port of the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.host |string|Host IP to be assigned to the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.port |int |Port of the CAPX Kubernetes cluster. Default: `6443` | +|prismCentral |object|(Optional) Prism Central endpoint definition. | +|prismCentral.address |string|IP/FQDN of Prism Central. | +|prismCentral.port |int |Port of Prism Central. Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking. Default: `false` | +|prismCentral.credentialRef |object|Reference to credentials used for Prism Central connection. | +|prismCentral.credentialRef.kind |string|Kind of the credentialRef. Allowed value: `Secret` | +|prismCentral.credentialRef.name |string|Name of the secret containing the Prism Central credentials. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret containing the Prism Central credentials. 
| +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace|string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle.| +|failureDomains |list |(Optional) Failure domains for the Kubernetes nodes | +|failureDomains.[].name |string|Name of the failure domain | +|failureDomains.[].cluster |object|Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|failureDomains.[].cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|failureDomains.[].cluster.name |string|Name of the Prism Element cluster. | +|failureDomains.[].cluster.uuid |string|UUID of the Prism Element cluster. | +|failureDomains.[].subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|failureDomains.[].subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|failureDomains.[].subnets.[].name |string|Name of the subnet. | +|failureDomains.[].subnets.[].uuid |string|UUID of the subnet. | +|failureDomains.[].controlPlane |bool |Indicates if a failure domain is suited for control plane nodes | + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `controlPlaneEndpoint.host` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. 
\ No newline at end of file diff --git a/docs/capx/v1.6.x/types/nutanix_machine_template.md b/docs/capx/v1.6.x/types/nutanix_machine_template.md new file mode 100644 index 00000000..4aa613b8 --- /dev/null +++ b/docs/capx/v1.6.x/types/nutanix_machine_template.md @@ -0,0 +1,124 @@ +# NutanixMachineTemplate +The `NutanixMachineTemplate` resource defines the configuration of a CAPX Kubernetes VM. + +Example of a `NutanixMachineTemplate` resource. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixMachineTemplate +metadata: + name: "${CLUSTER_NAME}-mt-0" + namespace: "${NAMESPACE}" +spec: + template: + spec: + providerID: "nutanix://${CLUSTER_NAME}-m1" + # Supported options for boot type: legacy and uefi + # Defaults to legacy if not set + bootType: ${NUTANIX_MACHINE_BOOT_TYPE=legacy} + vcpusPerSocket: ${NUTANIX_MACHINE_VCPU_PER_SOCKET=1} + vcpuSockets: ${NUTANIX_MACHINE_VCPU_SOCKET=2} + memorySize: "${NUTANIX_MACHINE_MEMORY_SIZE=4Gi}" + systemDiskSize: "${NUTANIX_SYSTEMDISK_SIZE=40Gi}" + image: + type: name + name: "${NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME}" + cluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnet: + - type: name + name: "${NUTANIX_SUBNET_NAME}" + # Adds additional categories to the virtual machines. + # Note: Categories must already be present in Prism Central + # additionalCategories: + # - key: AppType + # value: Kubernetes + # Adds the cluster virtual machines to a project defined in Prism Central. + # Replace NUTANIX_PROJECT_NAME with the correct project defined in Prism Central + # Note: Project must already be present in Prism Central. + # project: + # type: name + # name: "NUTANIX_PROJECT_NAME" + # gpus: + # - type: name + # name: "GPU NAME" + # Note: Either of `image` or `imageLookup` must be set, but not both. 
+ # imageLookup: + # format: "NUTANIX_IMAGE_LOOKUP_FORMAT" + # baseOS: "NUTANIX_IMAGE_LOOKUP_BASE_OS" + # dataDisks: + # - diskSize: + # deviceProperties: + # deviceType: Disk + # adapterType: SCSI + # deviceIndex: 1 + # storageConfig: + # diskMode: Standard + # storageContainer: + # type: name + # name: "NUTANIX_VM_DISK_STORAGE_CONTAINER" + # dataSource: + # type: name + # name: "NUTANIX_DATA_SOURCE_IMAGE_NAME" +``` + +## NutanixMachineTemplate spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixMachineTemplate` resource. + +### Configuration parameters +| Key |Type |Description | +|----------------------------------------------------|------|--------------------------------------------------------------------------------------------------------| +|bootType |string|Boot type of the VM. Depends on the OS image used. Allowed values: `legacy`, `uefi`. Default: `legacy` | +|vcpusPerSocket |int |Amount of vCPUs per socket. Default: `1` | +|vcpuSockets |int |Amount of vCPU sockets. Default: `2` | +|memorySize |string|Amount of Memory. Default: `4Gi` | +|systemDiskSize |string|Amount of storage assigned to the system disk. Default: `40Gi` | +|image |object|Reference (name or uuid) to the OS image used for the system disk. | +|image.type |string|Type to identify the OS image. Allowed values: `name` and `uuid` | +|image.name |string|Name of the image. | +|image.uuid |string|UUID of the image. | +|cluster |object|(Optional) Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|cluster.name |string|Name of the Prism Element cluster. | +|cluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. 
Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | +|additionalCategories |list |Reference to the categories to be assigned to the VMs. These categories already exist in Prism Central. | +|additionalCategories.[].key |string|Key of the category. | +|additionalCategories.[].value |string|Value of the category. | +|project |object|Reference (name or uuid) to the project. This project must already exist in Prism Central. | +|project.type |string|Type to identify the project. Allowed values: `name` and `uuid` | +|project.name |string|Name of the project. | +|project.uuid |string|UUID of the project. | +|gpus |object|Reference (name or deviceID) to the GPUs to be assigned to the VMs. Can be vGPU or Passthrough. | +|gpus.[].type |string|Type to identify the GPU. Allowed values: `name` and `deviceID` | +|gpus.[].name |string|Name of the GPU or the vGPU profile | +|gpus.[].deviceID |string|DeviceID of the GPU or the vGPU profile | +|imageLookup |object|(Optional) Reference to a container that holds how to look up rhcos images for the cluster. | +|imageLookup.format |string|Naming format to look up the image for the machine. Default: `capx-{{.BaseOS}}-{{.K8sVersion}}-*` | +|imageLookup.baseOS |string|Name of the base operating system to use for image lookup. | +|dataDisks |list |(Optional) Reference to the data disks to be attached to the VM. | +|dataDisks.[].diskSize |string|Size (in Quantity format) of the disk attached to the VM. The minimum diskSize is `1GB`. | +|dataDisks.[].deviceProperties |object|(Optional) Reference to the properties of the disk device. | +|dataDisks.[].deviceProperties.deviceType |string|VM disk device type. Allowed values: `Disk` (default) and `CDRom` | +|dataDisks.[].deviceProperties.adapterType |string|Adapter type of the disk address. | +|dataDisks.[].deviceProperties.deviceIndex |int |(Optional) Index of the disk address. 
Allowed values: non-negative integers (default: `0`) | +|dataDisks.[].storageConfig |object|(Optional) Reference to the storage configuration parameters of the VM disks. | +|dataDisks.[].storageConfig.diskMode |string|Specifies the disk mode. Allowed values: `Standard` (default) and `Flash` | +|dataDisks.[].storageConfig.storageContainer |object|(Optional) Reference (name or uuid) to the storage_container used by the VM disk. | +|dataDisks.[].storageConfig.storageContainer.type |string|Type to identify the storage container. Allowed values: `name` and `uuid` | +|dataDisks.[].storageConfig.storageContainer.name |string|Name of the storage container. | +|dataDisks.[].storageConfig.storageContainer.uuid |string|UUID of the storage container. | +|dataDisks.[].dataSource |object|(Optional) Reference (name or uuid) to a data source image for the VM disk. | +|dataDisks.[].dataSource.type |string|Type to identify the data source image. Allowed values: `name` and `uuid` | +|dataDisks.[].dataSource.name |string|Name of the data source image. | +|dataDisks.[].dataSource.uuid |string|UUID of the data source image. | + +!!! note + - The `cluster` or `subnets` configuration parameters are optional in case failure domains are defined on the `NutanixCluster` and `MachineDeployment` resources. + - If the `deviceType` is `Disk`, the valid `adapterType` can be `SCSI`, `IDE`, `PCI`, `SATA` or `SPAPR`. If the `deviceType` is `CDRom`, the valid `adapterType` can be `IDE` or `SATA`. + - Either of `image` or `imageLookup` must be set, but not both. + - For a Machine VM, the `deviceIndex` for the disks with the same `deviceType.adapterType` combination should start from `0` and increase consecutively afterwards. Note that for each Machine VM, the `Disk.SCSI.0` and `CDRom.IDE.0` are reserved to be used by the VM's system. So for `dataDisks` of Disk.SCSI and CDRom.IDE, the `deviceIndex` should start from `1`. 
\ No newline at end of file diff --git a/docs/capx/v1.6.x/user_requirements.md b/docs/capx/v1.6.x/user_requirements.md new file mode 100644 index 00000000..05e971a5 --- /dev/null +++ b/docs/capx/v1.6.x/user_requirements.md @@ -0,0 +1,37 @@ +# User Requirements + +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs using a Prism Central user account. + +CAPX supports two types of PC users: + +- Local users: must be assigned the `Prism Central Admin` role. +- Domain users: must be assigned a role that at least has the [Minimum required CAPX permissions for domain users](#minimum-required-capx-permissions-for-domain-users) assigned. + +See [Credential Management](./credential_management.md){target=_blank} for more information on how to pass the user credentials to CAPX. + +## Minimum required CAPX permissions for domain users + +The following permissions are required for Prism Central domain users: + +- Create Category Mapping +- Create Image +- Create Or Update Name Category +- Create Or Update Value Category +- Create Virtual Machine +- Delete Category Mapping +- Delete Image +- Delete Name Category +- Delete Value Category +- Delete Virtual Machine +- Detach Volume Group From AHV VM +- View Category Mapping +- View Cluster +- View Image +- View Name Category +- View Project +- View Subnet +- View Value Category +- View Virtual Machine + +!!! note + The list of permissions has been validated on PC 2022.6 and above. diff --git a/docs/capx/v1.6.x/validated_integrations.md b/docs/capx/v1.6.x/validated_integrations.md new file mode 100644 index 00000000..6240f4b7 --- /dev/null +++ b/docs/capx/v1.6.x/validated_integrations.md @@ -0,0 +1,52 @@ +# Validated Integrations + +Validated integrations are a defined set of specifically tested configurations between technologies that represent the most common combinations that Nutanix customers are using or deploying with CAPX. 
For these integrations, Nutanix has directly, or through certified partners, exercised a full range of platform tests as part of the product release process. + +## Integration Validation Policy + +Nutanix follows the version validation policies below: + +- Validate at least one active AOS LTS (long term support) version. Validated AOS LTS version for a specific CAPX version is listed in the [AOS](#aos) section.
+ + !!! note + + Typically the latest LTS release at time of CAPX release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +- Validate the latest AOS STS (short term support) release at time of CAPX release. +- Validate at least one active Prism Central (PC) version. Validated PC version for a specific CAPX version is listed in the [Prism Central](#prism-central) section.
+ + !!! note + + Typically the latest PC release at time of CAPX release except when latest is initial release in train (e.g. x.y.0). Exact version depends on timing and customer adoption. + +- Validate at least one active Cluster-API (CAPI) version. Validated CAPI version for a specific CAPX version is listed in the [Cluster-API](#cluster-api) section.
+ + !!! note + + Typically the latest Cluster-API release at time of CAPX release except when latest is initial release in train (e.g. x.y.0). Exact version depends on timing and customer adoption. + +## Validated versions +### Cluster-API +| CAPX | CAPI v1.3.x | CAPI v1.4.x | CAPI v1.5.x | CAPI v1.6.x | CAPI v1.7.x | CAPI v1.8.x | CAPI v1.9.x | +|--------|-------------|-------------|-------------|-------------|-------------|-------------|-------------| +| v1.6.x | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| v1.5.x | Yes | Yes | Yes | Yes | Yes | Yes | No | +| v1.4.x | Yes | Yes | Yes | Yes | Yes | No | No | + +See the [Validated Kubernetes Versions](https://cluster-api.sigs.k8s.io/reference/versions.html?highlight=version#supported-kubernetes-versions){target=_blank} page for more information on CAPI validated versions. + +### AOS + +| CAPX | 6.5.x (LTS) | 6.8 (STS) | 6.10 | 7.0 | 7.3 | +|--------|-------------|-----------|------|-----|-----| +| v1.6.x | No | Yes | Yes | Yes | Yes | +| v1.5.x | Yes | Yes | Yes | Yes | Yes | +| v1.4.x | Yes | Yes | No | No | No | + +### Prism Central + +| CAPX | pc.2022.6 | pc.2023.x | pc.2024.x | pc.7.3 | +|--------|-----------|-----------|-----------|--------| +| v1.6.x | No | Yes | Yes | Yes | +| v1.5.x | Yes | Yes | Yes | Yes | +| v1.4.x | Yes | Yes | Yes | No | diff --git a/docs/capx/v1.7.x/addons/install_csi_driver.md b/docs/capx/v1.7.x/addons/install_csi_driver.md new file mode 100644 index 00000000..afb4bdc8 --- /dev/null +++ b/docs/capx/v1.7.x/addons/install_csi_driver.md @@ -0,0 +1,215 @@ +# Nutanix CSI Driver installation with CAPX + +The Nutanix CSI driver is fully supported on CAPI/CAPX deployed clusters where all the nodes meet the [Nutanix CSI driver prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). + +There are three methods to install the Nutanix CSI driver on a CAPI/CAPX cluster: + +- Helm +- ClusterResourceSet +- CAPX Flavor + +See the following sections for details on each method. 
+ +## CAPI Workload cluster prerequisites for the Nutanix CSI Driver + +Kubernetes workers need the following prerequisites to use the Nutanix CSI Drivers: + +- iSCSI initiator package (for Volumes based block storage) +- NFS client package (for Files based storage) + +These packages may already be present in the image you use with your infrastructure provider or you can also rely on your bootstrap provider to install them. More info is available in the [Prerequisites docs](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-plugin-prerequisites-r.html){target=_blank}. + +The package names and installation method will also vary depending on the operating system you plan to use. + +In the example below, `kubeadm` bootstrap provider is used to deploy these packages on top of an Ubuntu 20.04 image. The `kubeadm` bootstrap provider allows defining `preKubeadmCommands` that will be launched before Kubernetes cluster creation. These `preKubeadmCommands` can be defined both in `KubeadmControlPlane` for master nodes and in `KubeadmConfigTemplate` for worker nodes. + +In the example with an Ubuntu 20.04 image, both `KubeadmControlPlane` and `KubeadmConfigTemplate` must be modified as in the example below: + +```yaml +spec: + template: + spec: + # ....... + preKubeadmCommands: + - echo "before kubeadm call" > /var/log/prekubeadm.log + - apt update + - apt install -y nfs-common open-iscsi + - systemctl enable --now iscsid +``` +## Install the Nutanix CSI Driver with Helm + +A recent [Helm](https://helm.sh){target=_blank} version is needed (tested with Helm v3.10.1). + +The example below must be applied on a ready workload cluster. 
The workload cluster's kubeconfig can be retrieved and used to connect with the following command: + +```shell +clusterctl get kubeconfig $CLUSTER_NAME -n $CLUSTER_NAMESPACE > $CLUSTER_NAME-KUBECONFIG +export KUBECONFIG=$(pwd)/$CLUSTER_NAME-KUBECONFIG +``` + +Once connected to the cluster, follow the [CSI documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-driver-install-t.html){target=_blank}. + +First, install the [nutanix-csi-snapshot](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-snapshot){target=_blank} chart followed by the [nutanix-csi-storage](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-storage){target=_blank} chart. + +See an example below: + +```shell +#Add the official Nutanix Helm repo and get the latest update +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +# Install the nutanix-csi-snapshot chart +helm install nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system --create-namespace + +# Install the nutanix-csi-storage chart +helm install nutanix-storage nutanix/nutanix-csi-storage -n ntnx-system --set createSecret=false +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with `ClusterResourceSet` + +The `ClusterResourceSet` feature was introduced to automatically apply a set of resources (such as CNI/CSI) defined by administrators to matching created/existing workload clusters. + +### Enabling the `ClusterResourceSet` feature + +At the time of writing, `ClusterResourceSet` is an experimental feature that must be enabled during the initialization of a management cluster with the `EXP_CLUSTER_RESOURCE_SET` feature gate. + +To do this, add `EXP_CLUSTER_RESOURCE_SET: "true"` in the `clusterctl` configuration file or just `export EXP_CLUSTER_RESOURCE_SET=true` before initializing the management cluster with `clusterctl init`. 
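As a minimal sketch of the environment-variable approach described above: the feature gate is exported in the current shell and the management cluster is then initialized. The `--infrastructure nutanix` argument is the one used in the Autoscaler page of these docs; the `command -v` guard is added here for illustration only and assumes `clusterctl` is expected on the `PATH`.

```shell
# Enable the experimental ClusterResourceSet feature for this shell session.
export EXP_CLUSTER_RESOURCE_SET=true

# Initialize the management cluster with the CAPX provider.
# The guard only avoids a confusing error when clusterctl is not installed.
if command -v clusterctl >/dev/null 2>&1; then
  clusterctl init --infrastructure nutanix
else
  echo "clusterctl not found on PATH; install it before initializing" >&2
fi
```

The variable only needs to be set in the shell that runs `clusterctl init`; it has no effect on a management cluster that is already initialized.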
+ +If the management cluster is already initialized, the `ClusterResourceSet` feature can be enabled by changing the configuration of the `capi-controller-manager` deployment in the `capi-system` namespace. + + ```shell + kubectl edit deployment -n capi-system capi-controller-manager + ``` + +Locate the section below: + +```yaml + - args: + - --leader-elect + - --metrics-bind-addr=localhost:8080 + - --feature-gates=MachinePool=false,ClusterResourceSet=true,ClusterTopology=false +``` + +If the `--feature-gates` argument contains `ClusterResourceSet=false`, replace it with `ClusterResourceSet=true` so that the argument matches the listing above. + +!!! note + Editing the `deployment` resource will cause Kubernetes to automatically roll out new pods with the feature enabled. + + + +### Prepare the Nutanix CSI `ClusterResourceSet` + +#### Create the `ConfigMap` for the CSI Plugin + +First, create a `ConfigMap` that contains a YAML manifest with all resources to install the Nutanix CSI driver. + +Since the Nutanix CSI Driver is provided as a Helm chart, use `helm` to extract it before creating the `ConfigMap`. See an example below: + +```shell +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +kubectl create ns ntnx-system --dry-run=client -o yaml > nutanix-csi-namespace.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system > nutanix-csi-snapshot.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-storage -n ntnx-system > nutanix-csi-storage.yaml + +kubectl create configmap nutanix-csi-crs --from-file=nutanix-csi-namespace.yaml --from-file=nutanix-csi-snapshot.yaml --from-file=nutanix-csi-storage.yaml +``` + +#### Create the `ClusterResourceSet` + +Next, create the `ClusterResourceSet` resource that will map the `ConfigMap` defined above to clusters using a `clusterSelector`. + +The `ClusterResourceSet` needs to be created inside the management cluster. 
See an example below: + +```yaml +--- +apiVersion: addons.cluster.x-k8s.io/v1alpha3 +kind: ClusterResourceSet +metadata: + name: nutanix-csi-crs +spec: + clusterSelector: + matchLabels: + csi: nutanix + resources: + - kind: ConfigMap + name: nutanix-csi-crs +``` + +The `clusterSelector` field controls how Cluster API will match this `ClusterResourceSet` on one or more workload clusters. In the example scenario, the `matchLabels` approach is being used where the `ClusterResourceSet` will be applied to all workload clusters having the `csi: nutanix` label present. If the label isn't present, the `ClusterResourceSet` won't apply to that workload cluster. + +The `resources` field references the `ConfigMap` created above, which contains the manifests for installing the Nutanix CSI driver. + +#### Assign the `ClusterResourceSet` to a workload cluster + +Assign this `ClusterResourceSet` to the workload cluster by adding the correct label to the `Cluster` resource. + +This can be done before workload cluster creation by editing the output of the `clusterctl generate cluster` command or by modifying an already deployed workload cluster. + +In both cases, `Cluster` resources should look like this: + +```yaml +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: workload-cluster-name + namespace: workload-cluster-namespace + labels: + csi: nutanix +# ... +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with a CAPX flavor + +The CAPX provider can utilize a flavor to automatically deploy the Nutanix CSI using a `ClusterResourceSet`. + +### Prerequisites + +The following requirements must be met: + +- The operating system must meet the [Nutanix CSI OS prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). 
+- The management cluster must be installed with the [`EXP_CLUSTER_RESOURCE_SET` feature gate](#enabling-the-clusterresourceset-feature). + +### Installation + +Specify the `csi` flavor during workload cluster creation. See an example below: + +```shell +clusterctl generate cluster my-cluster -f csi +``` + +Additional environment variables are required: + +- `WEBHOOK_CA`: Base64-encoded CA certificate used to sign the webhook certificate +- `WEBHOOK_CERT`: Base64-encoded certificate for the webhook validation component +- `WEBHOOK_KEY`: Base64-encoded key for the webhook validation component + +The three components referenced above can be automatically created and referenced using [this script](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/main/scripts/gen-self-cert.sh){target=_blank}: + +``` +source scripts/gen-self-cert.sh +``` + +The certificate must reference the following names: + +- csi-snapshot-webhook +- csi-snapshot-webhook.ntnx-system +- csi-snapshot-webhook.ntnx-system.svc + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Nutanix CSI Driver Configuration + +After the driver is installed, it must be configured for use by defining at least a `Secret` and a `StorageClass`. + +This can be done manually in the workload clusters or by using a `ClusterResourceSet` in the management cluster as explained above. + +See the official [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:CSI-Volume-Driver-v2_6){target=_blank} on the Nutanix Portal for more configuration information. 
diff --git a/docs/capx/v1.7.x/credential_management.md b/docs/capx/v1.7.x/credential_management.md new file mode 100644 index 00000000..bebbc5a0 --- /dev/null +++ b/docs/capx/v1.7.x/credential_management.md @@ -0,0 +1,93 @@ +# Credential Management +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs to manage the required Kubernetes cluster infrastructure resources. + +PC credentials are required to authenticate to the PC APIs. CAPX currently supports two mechanisms to supply the required credentials: + +- Credentials injected into the CAPX manager deployment +- Workload cluster specific credentials + +## Credentials injected into the CAPX manager deployment +By default, credentials will be injected into the CAPX manager deployment when CAPX is initialized. See the [getting started guide](./getting_started.md) for more information on the initialization. + +Upon initialization a `nutanix-creds` secret will automatically be created in the `capx-system` namespace. This secret will contain the values supplied via the `NUTANIX_USER` and `NUTANIX_PASSWORD` parameters. + +The `nutanix-creds` secret will be used for workload cluster deployment if no other credential is supplied. + +### Example +An example of the automatically created `nutanix-creds` secret can be found below: +```yaml +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: nutanix-creds + namespace: capx-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +## Workload cluster specific credentials +Users can override the [credentials injected in CAPX manager deployment](#credentials-injected-into-the-capx-manager-deployment) by supplying a credential specific to a workload cluster. The credentials can be supplied by creating a secret in the same namespace as the `NutanixCluster` namespace. 
+ +The secret can be referenced by adding a `credentialRef` inside the `prismCentral` attribute contained in the `NutanixCluster`. +The secret will also be deleted when the `NutanixCluster` is deleted. + +Note: There is a 1:1 relation between the secret and the `NutanixCluster` object. + +### Example +Create a secret in the namespace of the `NutanixCluster`: + +```yaml +--- +apiVersion: v1 +kind: Secret +metadata: + name: "" + namespace: "" +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +Add a `prismCentral` and corresponding `credentialRef` to the `NutanixCluster`: + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: "" + namespace: "" +spec: + prismCentral: + ... + credentialRef: + name: "" + kind: Secret +... +``` + +See the [NutanixCluster](./types/nutanix_cluster.md) documentation for all supported configuration parameters for the `prismCentral` and `credentialRef` attribute. \ No newline at end of file diff --git a/docs/capx/v1.7.x/experimental/autoscaler.md b/docs/capx/v1.7.x/experimental/autoscaler.md new file mode 100644 index 00000000..2af57213 --- /dev/null +++ b/docs/capx/v1.7.x/experimental/autoscaler.md @@ -0,0 +1,129 @@ +# Using Autoscaler in combination with CAPX + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +[Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank} can be used in combination with Cluster API to automatically add or remove machines in a cluster. + +Autoscaler can be used in different deployment scenarios. This page will provide an overview of multiple autoscaler deployment scenarios in combination with CAPX. 
+See the [Testing](#testing) section to see how scale-up/scale-down events can be triggered to validate the autoscaler behaviour. + +More in-depth information on Autoscaler functionality can be found in the [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank}. + +All Autoscaler configuration parameters can be found [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank}. + +## Scenario 1: Management cluster managing an external workload cluster +In this scenario, Autoscaler will be running on a management cluster and it will manage an external workload cluster. See the management cluster managing an external workload cluster section of [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster){target=_blank} for more information. + +### Steps +1. Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. + + !!! note + Make sure a CNI is installed in the workload cluster. + +4. Download the example [Autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +5. Modify the `deployment.yaml` file: + - Change the namespace of all resources to the namespaces of the workload cluster. + - Choose an autoscale image. 
+ - Change the following parameters in the `Deployment` resource: +```YAML + spec: + containers: + name: cluster-autoscaler + command: + - /cluster-autoscaler + args: + - --cloud-provider=clusterapi + - --kubeconfig=/mnt/kubeconfig/kubeconfig.yml + - --clusterapi-cloud-config-authoritative + - -v=1 + volumeMounts: + - mountPath: /mnt/kubeconfig + name: kubeconfig + readOnly: true + ... + volumes: + - name: kubeconfig + secret: + secretName: -kubeconfig + items: + - key: value + path: kubeconfig.yml +``` +7. Apply the `deployment.yaml` file. +```bash +kubectl apply -f deployment.yaml +``` +8. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +9. Test Autoscaler. Go to the [Testing](#testing) section. + +## Scenario 2: Autoscaler running on workload cluster +In this scenario, Autoscaler will be deployed [on top of the workload cluster](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-a-joined-cluster-using-service-account-credentials){target=_blank} directly. In order for Autoscaler to work, it is required that the workload cluster resources are moved from the management cluster to the workload cluster. + +### Steps +1. Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. +2. Get the kubeconfig file for the workload cluster and use this kubeconfig to login to the workload cluster. +```bash +clusterctl get kubeconfig -n /path/to/kubeconfig +``` +3. Install a CNI in the workload cluster. +4. Initialise the CAPX components on top of the workload cluster: +```bash +clusterctl init --infrastructure nutanix +``` +5. Migrate the workload cluster custom resources to the workload cluster. Run following command from the management cluster: +```bash +clusterctl move -n --to-kubeconfig /path/to/kubeconfig +``` +6. 
Verify that the cluster has been migrated by running the following command on the workload cluster: +```bash +kubectl get cluster -A +``` +7. Download the example [autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +8. Create the Autoscaler namespace: +```bash +kubectl create ns autoscaler +``` +9. Apply the `deployment.yaml` file: +```bash +kubectl apply -f deployment.yaml +``` +10. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +11. Test Autoscaler. Go to the [Testing](#testing) section. + +## Testing + +1. Deploy an example Kubernetes application. For example, the one used in the [Kubernetes HorizontalPodAutoscaler Walkthrough](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/). +```bash +kubectl apply -f https://k8s.io/examples/application/php-apache.yaml +``` +2. Increase the number of replicas of the application to trigger a scale-up event: +``` +kubectl scale deployment php-apache --replicas 100 +``` +3. Decrease the number of replicas of the application again to trigger a scale-down event. + + !!! note + In case of issues, check the logs of the Autoscaler pods. + +4. After a while, CAPX will add more machines. Refer to the [Autoscaler configuration parameters](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank} to tweak the behaviour and timeouts. 
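Step 3 of the testing procedure does not list a command. A minimal sketch, assuming the `php-apache` deployment from step 1 lives in the current namespace and that one replica is low enough to trigger a scale-down; the `REPLICAS` variable and the `command -v` guard are introduced here for illustration only:

```shell
# Target replica count for the scale-down test (illustrative value).
REPLICAS=1

# Scale the example application back down. Autoscaler should eventually
# remove machines that are no longer needed, depending on its timeouts.
if command -v kubectl >/dev/null 2>&1; then
  kubectl scale deployment php-apache --replicas "$REPLICAS"
else
  echo "kubectl not found on PATH" >&2
fi
```

The resulting machine changes can be followed with `kubectl get machines -A` on the cluster that holds the Cluster API resources.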
+ +## Autoscaler node group annotations +Autoscaler uses the following annotations to define the upper and lower boundaries of the managed machines: + +| Annotation | Example Value | Description | +|-------------------------------------------------------------|---------------|-----------------------------------------------| +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size | 5 | Maximum number of machines in this node group | +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size | 1 | Minimum number of machines in this node group | + +These annotations must be applied to the `MachineDeployment` resources of a CAPX cluster. + +### Example +```YAML +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + annotations: + cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5" + cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1" +``` \ No newline at end of file diff --git a/docs/capx/v1.7.x/experimental/capx_multi_pe.md b/docs/capx/v1.7.x/experimental/capx_multi_pe.md new file mode 100644 index 00000000..bd52ccd7 --- /dev/null +++ b/docs/capx/v1.7.x/experimental/capx_multi_pe.md @@ -0,0 +1,30 @@ +# Creating a workload CAPX cluster spanning Prism Element clusters + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +This page explains how to deploy CAPX-based Kubernetes clusters whose worker nodes span multiple Prism Element (PE) clusters. + +!!! note + All the PE clusters must be managed by the same Prism Central (PC) instance. + +The topology will look like this: + +- One PC managing multiple PEs +- One CAPI management cluster +- One CAPI workload cluster with multiple `MachineDeployment` resources + +Refer to the [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to get started with CAPX. 
+ +To create workload clusters spanning multiple Prism Element clusters, it is required to create a `MachineDeployment` and `NutanixMachineTemplate` resource for each Prism Element cluster. The Prism Element specific parameters (name/UUID, subnet,...) are referenced in the `NutanixMachineTemplate`. + +## Steps +1. Create a management cluster that has the CAPX infrastructure provider deployed. +2. Create a `cluster.yml` file containing the workload cluster definition. Refer to the steps defined in the [CAPI quickstart guide](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to create an example `cluster.yml` file. +3. Add additional `MachineDeployment` and `NutanixMachineTemplate` resources. + + By default there is only one machine template and machine deployment defined. To add nodes residing on another Prism Element cluster, a new `MachineDeployment` and `NutanixMachineTemplate` resource needs to be added to the yaml file. The autogenerated `MachineDeployment` and `NutanixMachineTemplate` resource definitions can be used as a baseline. + + Make sure to modify the `MachineDeployment` and `NutanixMachineTemplate` parameters. + +4. Apply the modified `cluster.yml` file to the management cluster. diff --git a/docs/capx/v1.7.x/experimental/oidc.md b/docs/capx/v1.7.x/experimental/oidc.md new file mode 100644 index 00000000..0c274121 --- /dev/null +++ b/docs/capx/v1.7.x/experimental/oidc.md @@ -0,0 +1,31 @@ +# OIDC integration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +Kubernetes allows users to authenticate using various authentication mechanisms. One of these mechanisms is OIDC. Information on how Kubernetes interacts with OIDC providers can be found in the [OpenID Connect Tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens){target=_blank} section of the official Kubernetes documentation. 
+ + +Follow the steps below to configure a CAPX cluster to use an OIDC identity provider. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +3. Modify/add the `spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs` attribute and add the required [API server parameters](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuring-the-api-server){target=_blank}. See the [example](#example) below. +4. Apply the `cluster.yaml` file +5. Log in with the OIDC provider once the cluster is provisioned + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + ... + oidc-client-id: + oidc-issuer-url: + ... +``` + diff --git a/docs/capx/v1.7.x/experimental/proxy.md b/docs/capx/v1.7.x/experimental/proxy.md new file mode 100644 index 00000000..c8f940d4 --- /dev/null +++ b/docs/capx/v1.7.x/experimental/proxy.md @@ -0,0 +1,62 @@ +# Proxy configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a proxy to connect to external networks. This proxy configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a proxy. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. 
Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the proxy configuration. + 1. `KubeadmControlPlane`: + * Add the proxy configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the proxy configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +4. Apply the `cluster.yaml` file + +## Example + +```YAML +--- +# controlplane proxy settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +--- +# worker proxy settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +``` + diff --git a/docs/capx/v1.7.x/experimental/registry_mirror.md b/docs/capx/v1.7.x/experimental/registry_mirror.md new file mode 100644 index 00000000..307a9425 --- /dev/null +++ b/docs/capx/v1.7.x/experimental/registry_mirror.md @@ -0,0 +1,96 @@ +# Registry Mirror configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. 
+ +CAPX can be configured to use a private registry to act as a mirror of an external public registry. This registry mirror configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a registry mirror. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the registry mirror configuration. + 1. `KubeadmControlPlane`: + * Add the registry mirror configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add commands that update `/etc/containerd/config.toml` with the registry mirror config to `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the registry mirror configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add commands that update `/etc/containerd/config.toml` with the registry mirror config to `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +3. Apply the `cluster.yaml` file. + +## Example + +This example configures a registry mirror for the following namespaces: + +* registry.k8s.io +* ghcr.io +* quay.io + +and redirects them to the corresponding projects of the `` registry.
+ +```YAML +--- +# controlplane registry mirror settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... +--- +# worker registry mirror settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ...
+``` + diff --git a/docs/capx/v1.7.x/experimental/vpc.md b/docs/capx/v1.7.x/experimental/vpc.md new file mode 100644 index 00000000..3513e47e --- /dev/null +++ b/docs/capx/v1.7.x/experimental/vpc.md @@ -0,0 +1,40 @@ +# Creating a workload CAPX cluster in a Nutanix Flow VPC + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +!!! note + Nutanix Flow VPCs are only validated with CAPX 1.1.3+ + +[Nutanix Flow Virtual Networking](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9:Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9){target=_blank} allows users to create Virtual Private Clouds (VPCs) with Overlay networking. +The steps below will illustrate how a CAPX cluster can be deployed inside an overlay subnet (NAT) inside a VPC while the management cluster resides outside of the VPC. + + +## Steps +1. [Request a floating IP](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Networking-Guide:ear-flow-nw-request-floating-ip-pc-t.html){target=_blank} +2. Link the floating IP to an internal IP address inside the overlay subnet that will be used to deploy the CAPX cluster. This address will be assigned to the CAPX loadbalancer. To prevent IP conflicts, make sure the IP address is not part of the IP-pool defined in the subnet. +3. Generate a `cluster.yaml` file with the required CAPX cluster configuration where the `CONTROL_PLANE_ENDPOINT_IP` is set to the floating IP requested in the first step. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +4. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +5. Modify the `spec.kubeadmConfigSpec.files.*.content` attribute and change the `kube-vip` definition similar to the [example](#example) below. +6. 
Apply the `cluster.yaml` file. +7. When the CAPX workload cluster is deployed, it will be reachable via the floating IP. + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + apiVersion: v1 + kind: Pod + metadata: + name: kube-vip + namespace: kube-system + spec: + containers: + - env: + - name: address + value: "" +``` + diff --git a/docs/capx/v1.7.x/getting_started.md b/docs/capx/v1.7.x/getting_started.md new file mode 100644 index 00000000..f4dbc487 --- /dev/null +++ b/docs/capx/v1.7.x/getting_started.md @@ -0,0 +1,280 @@ +# Getting Started + +This is a guide on getting started with Cluster API Provider Nutanix Cloud Infrastructure (CAPX). To learn about Cluster API in more depth, check out the [Cluster API book](https://cluster-api.sigs.k8s.io/){target=_blank}. + +For more information on how to install the Nutanix CSI Driver on a CAPX cluster, visit [Nutanix CSI Driver installation with CAPX](./addons/install_csi_driver.md). + +For more information on how CAPX handles credentials, visit [Credential Management](./credential_management.md). + +For more information on the port requirements for CAPX, visit [Port Requirements](./port_requirements.md). + +!!! note + [Nutanix Cloud Controller Manager (CCM)](../../ccm/latest/overview.md) is a mandatory component starting from CAPX v1.3.0. Ensure all CAPX-managed Kubernetes clusters are configured to use Nutanix CCM before upgrading to v1.3.0 or later. See [CAPX v1.7.x Upgrade Procedure](./tasks/capx_v17x_upgrade_procedure.md). + +## Production Workflow + +### Build OS image for NutanixMachineTemplate resource +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) uses the [Image Builder](https://image-builder.sigs.k8s.io/){target=_blank} project to build OS images used for the Nutanix machines.
+ +Follow the steps detailed in [Building CAPI Images for Nutanix Cloud Platform (NCP)](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#building-capi-images-for-nutanix-cloud-platform-ncp){target=_blank} to use Image Builder on the Nutanix Cloud Platform. + +For a list of operating systems visit the OS image [Configuration](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#configuration){target=_blank} page. + +### Prerequisites for using Cluster API Provider Nutanix Cloud Infrastructure +The [Cluster API installation](https://cluster-api.sigs.k8s.io/user/quick-start.html#installation){target=_blank} section provides an overview of all required prerequisites: + +- [Common Prerequisites](https://cluster-api.sigs.k8s.io/user/quick-start.html#common-prerequisites){target=_blank} +- [Install and/or configure a Kubernetes cluster](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-andor-configure-a-kubernetes-cluster){target=_blank} +- [Install clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl){target=_blank} +- (Optional) [Enabling Feature Gates](https://cluster-api.sigs.k8s.io/user/quick-start.html#enabling-feature-gates){target=_blank} + +Make sure these prerequisites have been met before moving to the [Configure and Install Cluster API Provider Nutanix Cloud Infrastructure](#configure-and-install-cluster-api-provider-nutanix-cloud-infrastructure) step. + +### Configure and Install Cluster API Provider Nutanix Cloud Infrastructure +To initialize Cluster API Provider Nutanix Cloud Infrastructure, `clusterctl` requires the following variables, which should be set in either `~/.cluster-api/clusterctl.yaml` or as environment variables. 
+``` +NUTANIX_ENDPOINT: "" # IP or FQDN of Prism Central +NUTANIX_USER: "" # Prism Central user +NUTANIX_PASSWORD: "" # Prism Central password +NUTANIX_INSECURE: false # or true + +KUBERNETES_VERSION: "v1.22.9" +WORKER_MACHINE_COUNT: 3 +NUTANIX_SSH_AUTHORIZED_KEY: "" + +NUTANIX_PRISM_ELEMENT_CLUSTER_NAME: "" +NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME: "" +NUTANIX_SUBNET_NAME: "" + +EXP_CLUSTER_RESOURCE_SET: true # Required for Nutanix CCM installation +``` + +You can also see the required list of variables by running the following: +``` +clusterctl generate cluster mycluster -i nutanix --list-variables +Required Variables: + - CONTROL_PLANE_ENDPOINT_IP + - KUBERNETES_VERSION + - NUTANIX_ENDPOINT + - NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME + - NUTANIX_PASSWORD + - NUTANIX_PRISM_ELEMENT_CLUSTER_NAME + - NUTANIX_SSH_AUTHORIZED_KEY + - NUTANIX_SUBNET_NAME + - NUTANIX_USER + +Optional Variables: + - CONTROL_PLANE_ENDPOINT_PORT (defaults to "6443") + - CONTROL_PLANE_MACHINE_COUNT (defaults to 1) + - KUBEVIP_LB_ENABLE (defaults to "false") + - KUBEVIP_SVC_ENABLE (defaults to "false") + - NAMESPACE (defaults to current Namespace in the KubeConfig file) + - NUTANIX_INSECURE (defaults to "false") + - NUTANIX_MACHINE_BOOT_TYPE (defaults to "legacy") + - NUTANIX_MACHINE_MEMORY_SIZE (defaults to "4Gi") + - NUTANIX_MACHINE_VCPU_PER_SOCKET (defaults to "1") + - NUTANIX_MACHINE_VCPU_SOCKET (defaults to "2") + - NUTANIX_PORT (defaults to "9440") + - NUTANIX_SYSTEMDISK_SIZE (defaults to "40Gi") + - WORKER_MACHINE_COUNT (defaults to 0) +``` + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `CONTROL_PLANE_ENDPOINT_IP` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. + +!!! 
warning + Make sure [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled before running `clusterctl init` + +Now you can instantiate Cluster API with the following: +``` +clusterctl init -i nutanix +``` + +### Deploy a workload cluster on Nutanix Cloud Infrastructure +``` +export TEST_CLUSTER_NAME=mytestcluster1 +export TEST_NAMESPACE=mytestnamespace +CONTROL_PLANE_ENDPOINT_IP=x.x.x.x clusterctl generate cluster ${TEST_CLUSTER_NAME} \ + -i nutanix \ + --target-namespace ${TEST_NAMESPACE} \ + --kubernetes-version v1.22.9 \ + --control-plane-machine-count 1 \ + --worker-machine-count 3 > ./cluster.yaml +kubectl create ns ${TEST_NAMESPACE} +kubectl apply -f ./cluster.yaml -n ${TEST_NAMESPACE} +``` +To customize the configuration of the default `cluster.yaml` file generated by CAPX, visit the [NutanixCluster](./types/nutanix_cluster.md) and [NutanixMachineTemplate](./types/nutanix_machine_template.md) documentation. + +### Access a workload cluster +To access resources on the cluster, you can get the kubeconfig with the following: +``` +clusterctl get kubeconfig ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} > ${TEST_CLUSTER_NAME}.kubeconfig +kubectl --kubeconfig ./${TEST_CLUSTER_NAME}.kubeconfig get nodes +``` + +### Install CNI on a workload cluster + +You must deploy a Container Network Interface (CNI) based pod network add-on so that your pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed. + +!!! note + Take care that your pod network must not overlap with any of the host networks. You are likely to see problems if there is any overlap. If you find a collision between your network plugin's preferred pod network and some of your host networks, you must choose a suitable alternative CIDR block to use instead. It can be configured inside the `cluster.yaml` generated by `clusterctl generate cluster` before applying it. 
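
As a rough illustration of the overlap check described in the note above, the following shell sketch compares two IPv4 CIDR blocks. The function names (`ip_to_int`, `cidr_range`, `cidrs_overlap`) and the example CIDR values are hypothetical and are not part of CAPX or `clusterctl`; the sketch assumes well-formed IPv4 input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "first last" (as integers) for a CIDR block such as 10.0.0.0/16.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( base & mask ))
  last=$(( first | (~mask & 0xFFFFFFFF) ))
  echo "$first $last"
}

# Succeed (exit status 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  # After this line: $1/$2 = first/last of block A, $3/$4 = first/last of block B.
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example with hypothetical values: a pod network and a host network.
if cidrs_overlap "192.168.0.0/16" "192.168.10.0/24"; then
  echo "overlap: choose a different pod CIDR"
fi
```

If the check reports an overlap, pick another pod CIDR in the generated `cluster.yaml` before applying it.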
+ +Several external projects provide Kubernetes pod networks using CNI, some of which also support [Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/){target=_blank}. + +See a list of add-ons that implement the [Kubernetes networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-network-model){target=_blank}. At time of writing, the most common are [Calico](https://www.tigera.io/project-calico/){target=_blank} and [Cilium](https://cilium.io){target=_blank}. + +Follow the specific install guide for your selected CNI and install only one pod network per cluster. + +Once a pod network has been installed, you can confirm that it is working by checking that the CoreDNS pod is running in the output of `kubectl get pods --all-namespaces`. + +### Add Failure Domain to Cluster + +To update your cluster to use new or modified failure domains after initial deployment, follow these steps: + +1. Create NutanixFailureDomain resource + + For example, define a failure domain in example.yaml: +``` +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-domain-1 +spec: + prismElementCluster: + type: name + name: "PrismClusterA" + subnets: + - type: name + name: "SubnetA" + - type: name + name: "SubnetB" +``` + +2. Apply the resource + +``` +kubectl apply -f example.yaml +``` + +3. Edit the NutanixCluster resource to reference the failure domain(s) + +``` +kubectl edit nutanixcluster -n +``` + + In the spec section, add the controlPlaneFailureDomains field: + +``` +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: +spec: + controlPlaneFailureDomains: # add controlPlaneFailureDomains + - name: "fd-domain-1" # failureDomain name + - name: "fd-domain-2" # failureDomain name + controlPlaneEndpoint: + prismCentral: +``` + +4. 
Verify the update + + Check that the failure domains are registered with the cluster: + +``` +kubectl get cluster -n -o yaml +``` + + Look for the failureDomains in status section: + +``` +failureDomains: + fd-domain-1: + controlPlane: true + fd-domain-2: + controlPlane: true +``` + +### Add Failure Domain to MachineDeployment + +To associate a MachineDeployment with a specific failure domain: + +1. Export the MachineDeployment definition + +``` +kubectl get machinedeployments -n -o yaml > machinedeployment.yaml +``` + +2. Edit the manifest to add the failure domain + + Under spec.template.spec, add a failureDomain field: + +``` +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + name: your-machinedeployment + namespace: your-namespace +spec: + replicas: 3 + selector: + matchLabels: + cluster.x-k8s.io/deployment-name: your-machinedeployment + template: + metadata: + labels: + cluster.x-k8s.io/deployment-name: your-machinedeployment + spec: + failureDomain: "fd-domain-1" + # other fields like bootstrap, infrastructureRef ... +``` + +3. Apply the changes + +``` +kubectl apply -f machinedeployment.yaml +``` + +4. Verify the Update + + Confirm that the failure domain field was updated: + +``` +kubectl get machinedeployments -n -o yaml | grep failureDomain +``` + +5. Check placement of machines + + Ensure new machines are placed in the specified failure domain: + +``` +kubectl get machines -l cluster.x-k8s.io/deployment-name= -n -o yaml +``` + +### Kube-vip settings + +Kube-vip is a true load balancing solution for the Kubernetes control plane. It distributes API requests across control plane nodes. It also has the capability to provide load balancing for Kubernetes services. + +You can tweak kube-vip settings by using the following properties: + +- `KUBEVIP_LB_ENABLE` + +This setting allows control plane load balancing using IPVS. 
See +[Control Plane Load-Balancing documentation](https://kube-vip.io/docs/about/architecture/#control-plane-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ENABLE` + +This setting enables a service of type LoadBalancer. See +[Kubernetes Service Load Balancing documentation](https://kube-vip.io/docs/about/architecture/#kubernetes-service-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ELECTION` + +This setting enables Load Balancing of Load Balancers. See [Load Balancing Load Balancers](https://kube-vip.io/docs/usage/kubernetes-services/#load-balancing-load-balancers-when-using-arp-mode-yes-you-read-that-correctly-kube-vip-v050){target=_blank} for further information. + +### Delete a workload cluster +To remove a workload cluster from your management cluster, remove the cluster object and the provider will clean-up all resources. + +``` +kubectl delete cluster ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} +``` +!!! note + Deleting the entire cluster template with `kubectl delete -f ./cluster.yaml` may lead to pending resources requiring manual cleanup. diff --git a/docs/capx/v1.7.x/pc_certificates.md b/docs/capx/v1.7.x/pc_certificates.md new file mode 100644 index 00000000..f3fe1699 --- /dev/null +++ b/docs/capx/v1.7.x/pc_certificates.md @@ -0,0 +1,149 @@ +# Certificate Trust + +CAPX invokes Prism Central APIs using the HTTPS protocol. CAPX has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +!!! note + For more information about replacing Prism Central certificates, see the [Nutanix AOS Security Guide](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Security-Guide-v6_5:mul-security-ssl-certificate-pc-t.html){target=_blank}. 
+ +## Enable certificate verification (default) +By default, CAPX will perform certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a certificate issued by a publicly trusted certificate authority. +No additional configuration is required in CAPX. + +## Configure an additional trust bundle +CAPX allows users to configure an additional trust bundle. This will allow CAPX to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable needs to be set. The value of the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable contains the trust bundle (PEM format) in base64 encoded format. See the [Configuring the trust bundle environment variable](#configuring-the-trust-bundle-environment-variable) section for more information. + +It is also possible to configure the additional trust bundle manually by creating a custom `cluster-template`. See the [Configuring the additional trust bundle manually](#configuring-the-additional-trust-bundle-manually) section for more information. + +The `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable can be set when initializing the CAPX provider or when creating a workload cluster. If the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` is configured when the CAPX provider is initialized, the additional trust bundle will be used for every CAPX workload cluster. If it is only configured when creating a workload cluster, it will only be applicable for that specific workload cluster. + + +### Configuring the trust bundle environment variable + +Create a PEM encoded file containing the root certificate and all intermediate certificates. Example: +``` +$ cat cert.crt +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +``` + +Use a `base64` tool to encode these contents in base64. The command below will provide a `base64` string.
+``` +$ cat cert.crt | base64 + +``` +!!! note + Make sure the `base64` string does not contain any newlines (`\n`). If the output string contains newlines, remove them manually or check the manual of the `base64` tool on how to generate a `base64` string without newlines. + +Use the `base64` string as value for the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable. +``` +$ export NUTANIX_ADDITIONAL_TRUST_BUNDLE="" +``` + +### Configuring the additional trust bundle manually + +To configure the additional trust bundle manually without using the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable present in the default `cluster-template` files, it is required to: + +- Create a `ConfigMap` containing the additional trust bundle. +- Configure the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec. + +#### Creating the additional trust bundle ConfigMap + +CAPX supports two different formats for the ConfigMap containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the NutanixCluster spec + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `NutanixCluster` spec. Add the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec as shown below. 
Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + ... + prismCentral: + ... + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + insecure: false +``` + +!!! note + the default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. + + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `NutanixCluster` spec. Certificate verification will be disabled even if an additional trust bundle is configured. + +Disabled certificate verification example: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + ... + insecure: true + ... +``` \ No newline at end of file diff --git a/docs/capx/v1.7.x/port_requirements.md b/docs/capx/v1.7.x/port_requirements.md new file mode 100644 index 00000000..af182abb --- /dev/null +++ b/docs/capx/v1.7.x/port_requirements.md @@ -0,0 +1,19 @@ +# Port Requirements + +CAPX uses the ports documented below to create workload clusters. + +!!! note + This page only documents the ports specifically required by CAPX and does not provide the full overview of all ports required in the CAPI framework. 
+ +## Management cluster + +| Source | Destination | Protocol | Port | Description | +|--------------------|---------------------|----------|------|--------------------------------------------------------------------------------------------------| +| Management cluster | External Registries | TCP | 443 | Pull container images from [CAPX public registries](#public-registries-utilized-when-using-capx) | +| Management cluster | Prism Central | TCP | 9440 | Management cluster communication to Prism Central | + +## Public registries utilized when using CAPX + +| Registry name | +|---------------| +| ghcr.io | diff --git a/docs/capx/v1.7.x/tasks/capx_v17x_upgrade_procedure.md b/docs/capx/v1.7.x/tasks/capx_v17x_upgrade_procedure.md new file mode 100644 index 00000000..16a2c91a --- /dev/null +++ b/docs/capx/v1.7.x/tasks/capx_v17x_upgrade_procedure.md @@ -0,0 +1,83 @@ +# CAPX v1.7.x Upgrade Procedure + +Starting from CAPX v1.3.0, it is required for all CAPX-managed Kubernetes clusters to use the Nutanix Cloud Controller Manager (CCM). + +Before upgrading CAPX instances to v1.3.0 or later, it is required to follow the [steps](#steps) detailed below for each of the CAPX-managed Kubernetes clusters that don't use Nutanix CCM. + + +## Steps + +This procedure uses [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} to install Nutanix CCM but it can also be installed using the [Nutanix CCM Helm chart](https://artifacthub.io/packages/helm/nutanix/nutanix-cloud-provider){target=_blank}. + +!!! warning + Make sure [CRS](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled on the management cluster before following the procedure. + +Perform the following steps for each of the CAPX-managed Kubernetes clusters that are not configured to use Nutanix CCM: + +1.
Add the `cloud-provider: external` configuration in the `KubeadmConfigTemplate` resources: + ```YAML + apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 + kind: KubeadmConfigTemplate + spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + ``` +2. Add the `cloud-provider: external` configuration in the `KubeadmControlPlane` resource: +```YAML +--- +apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 +kind: KubeadmConfigTemplate +spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + cloud-provider: external + controllerManager: + extraArgs: + cloud-provider: external + initConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +``` +3. Add the Nutanix CCM CRS resources: + + - [nutanix-ccm-crs.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.7.0/templates/ccm/nutanix-ccm-crs.yaml){target=_blank} + - [nutanix-ccm-secret.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.7.0/templates/ccm/nutanix-ccm-secret.yaml) + - [nutanix-ccm.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.7.0/templates/ccm/nutanix-ccm.yaml) + + Make sure to update each of the variables before applying the `YAML` files. + +4. Add the `ccm: nutanix` label to the `Cluster` resource: + ```YAML + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + labels: + ccm: nutanix + ``` +5. Verify if the Nutanix CCM pod is up and running: +``` +kubectl get pod -A -l k8s-app=nutanix-cloud-controller-manager +``` +6. 
Trigger a new rollout of the Kubernetes nodes by performing a Kubernetes upgrade or by using `clusterctl alpha rollout restart`. See the [clusterctl alpha rollout](https://cluster-api.sigs.k8s.io/clusterctl/commands/alpha-rollout#restart){target=_blank} documentation for more information. +7. Upgrade CAPX to v1.7.0 by following the [clusterctl upgrade](https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html?highlight=clusterctl%20upgrade%20pla#clusterctl-upgrade){target=_blank} documentation. \ No newline at end of file diff --git a/docs/capx/v1.7.x/tasks/modify_machine_configuration.md b/docs/capx/v1.7.x/tasks/modify_machine_configuration.md new file mode 100644 index 00000000..04a43a95 --- /dev/null +++ b/docs/capx/v1.7.x/tasks/modify_machine_configuration.md @@ -0,0 +1,11 @@ +# Modifying Machine Configurations + +Since all attributes of the `NutanixMachineTemplate` resources are immutable, follow the [Updating Infrastructure Machine Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html?highlight=machine%20template#updating-infrastructure-machine-templates){target=_blank} procedure to modify the configuration of machines in an existing CAPX cluster. +See the [NutanixMachineTemplate](../types/nutanix_machine_template.md) documentation for all supported configuration parameters. + +!!! note + Manually modifying existing and linked `NutanixMachineTemplate` resources will not trigger a rolling update of the machines. + +!!! note + Do not modify the virtual machine configuration of CAPX cluster nodes manually in Prism/Prism Central. + CAPX will not automatically revert the configuration change, but scale-up, scale-down, and upgrade operations will override manual modifications. Only use the `Updating Infrastructure Machine Templates` procedure referenced above to perform configuration changes.
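
The upstream procedure referenced above amounts to creating a new `NutanixMachineTemplate` under a new name and then pointing the `MachineDeployment` at it. The sketch below covers only the second half, under the assumption that the new template already exists. The helper name `infra_ref_patch` and the resource names `workers-v2`, `my-md`, and `my-namespace` are hypothetical.

```shell
# Print a JSON merge patch that points a MachineDeployment at the
# infrastructure machine template named in $1.
infra_ref_patch() {
  printf '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"%s"}}}}}\n' "$1"
}

# Example (hypothetical names; requires access to the management cluster):
#   kubectl patch machinedeployment my-md -n my-namespace \
#     --type merge -p "$(infra_ref_patch workers-v2)"

# Show the patch that would be sent.
infra_ref_patch workers-v2
```

Changing the template reference, rather than editing the template in place, is what triggers the rolling update of the machines.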
\ No newline at end of file diff --git a/docs/capx/v1.7.x/troubleshooting.md b/docs/capx/v1.7.x/troubleshooting.md new file mode 100644 index 00000000..c023d13e --- /dev/null +++ b/docs/capx/v1.7.x/troubleshooting.md @@ -0,0 +1,13 @@ +# Troubleshooting + +## Clusterctl failed with GitHub rate limit error + +By design, `clusterctl` fetches artifacts from repositories hosted on GitHub. This operation is subject to [GitHub API rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting){target=_blank}. + +While this is generally okay for the majority of users, there is still a chance that some users (especially developers or CI tools) hit this limit: + +``` +Error: failed to get repository client for the XXX with name YYY: error creating the GitHub repository client: failed to get GitHub latest version: failed to get the list of versions: rate limit for github api has been reached. Please wait one hour or get a personal API tokens a assign it to the GITHUB_TOKEN environment variable +``` + +As explained in the error message, you can increase your API rate limit by [creating a GitHub personal token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token){target=_blank} and setting the `GITHUB_TOKEN` environment variable to that token. diff --git a/docs/capx/v1.7.x/types/nutanix_cluster.md b/docs/capx/v1.7.x/types/nutanix_cluster.md new file mode 100644 index 00000000..daa8d8cc --- /dev/null +++ b/docs/capx/v1.7.x/types/nutanix_cluster.md @@ -0,0 +1,55 @@ +# NutanixCluster + +The `NutanixCluster` resource defines the configuration of a CAPX Kubernetes cluster.
+ +Example of a `NutanixCluster` resource: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + address: ${NUTANIX_ENDPOINT} + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + credentialRef: + kind: Secret + name: ${CLUSTER_NAME} + insecure: ${NUTANIX_INSECURE=false} + port: ${NUTANIX_PORT=9440} +``` + +## NutanixCluster spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixCluster` resource. + +### Configuration parameters + +| Key |Type |Description | +|--------------------------------------------|------|----------------------------------------------------------------------------------| +|controlPlaneEndpoint |object|Defines the host IP and port of the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.host |string|Host IP to be assigned to the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.port |int |Port of the CAPX Kubernetes cluster. Default: `6443` | +|prismCentral |object|(Optional) Prism Central endpoint definition. | +|prismCentral.address |string|IP/FQDN of Prism Central. | +|prismCentral.port |int |Port of Prism Central. Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking. Default: `false` | +|prismCentral.credentialRef |object|Reference to credentials used for Prism Central connection. | +|prismCentral.credentialRef.kind |string|Kind of the credentialRef. Allowed value: `Secret` | +|prismCentral.credentialRef.name |string|Name of the secret containing the Prism Central credentials. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret containing the Prism Central credentials. 
| +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace|string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle.| +|controlPlaneFailureDomains |list |(Optional) List of local references to failure domains for control plane nodes. | +|controlPlaneFailureDomains.Name |string|Name of the failure domain used for control plane nodes. | + +!!! note + To prevent duplicate IP assignments, the IP address assigned to the `controlPlaneEndpoint.host` variable must not be part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. \ No newline at end of file diff --git a/docs/capx/v1.7.x/types/nutanix_failure_domains.md b/docs/capx/v1.7.x/types/nutanix_failure_domains.md new file mode 100644 index 00000000..cefae92c --- /dev/null +++ b/docs/capx/v1.7.x/types/nutanix_failure_domains.md @@ -0,0 +1,99 @@ +# NutanixFailureDomain + +The `NutanixFailureDomain` resource defines the configuration of a CAPX Kubernetes failure domain. + +Example of a `NutanixFailureDomain` resource: +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: "${FAILURE_DOMAIN_NAME}" + namespace: "${CLUSTER_NAMESPACE}" +spec: + prismElementCluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnets: + - type: uuid + uuid: "${NUTANIX_SUBNET_UUID}" + - type: name + name: "${NUTANIX_SUBNET_NAME}" +``` + +## NutanixFailureDomain spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixFailureDomain` resource.
+ +### Configuration parameters +| Key |Type |Description | +|--------------------------------------------|------|--------------------------------------------------------------------------------------------| +|prismElementCluster |object|Identifies the Prism Element cluster in Prism Central for the failure domain. | +|prismElementCluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|prismElementCluster.name |string|Name of the Prism Element cluster. | +|prismElementCluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | + +!!! note + The `NutanixFailureDomain` resource allows you to define logical groupings of Nutanix infrastructure for high availability and workload placement in Kubernetes clusters managed by CAPX. Each failure domain maps to a Prism Element cluster and a set of subnets, ensuring that workloads can be distributed across different infrastructure segments. + +## Usage Notes + +- The `prismElementCluster` field is **required** and must specify either the `name` or `uuid` of the Prism Element cluster. +- The `subnets` field is **required**. You can provide one or more subnets by `name` or `uuid`. +- Failure domains are used by Cluster API to spread machines across different infrastructure segments for resilience.
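As a hedged illustration of the usage notes above, the sketch below shows how a `NutanixCluster` might reference failure domains for its control plane nodes through the `controlPlaneFailureDomains` field. The failure domain names `fd-1` and `fd-2` are hypothetical, the lowercase `name` key is an assumption based on the usual Kubernetes local object reference form, and the remaining `NutanixCluster` fields are omitted for brevity.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: NutanixCluster
metadata:
  name: "${CLUSTER_NAME}"
  namespace: "${CLUSTER_NAMESPACE}"
spec:
  # Other NutanixCluster fields (controlPlaneEndpoint, prismCentral) are omitted here.
  controlPlaneFailureDomains:
    # Hypothetical names: each entry is assumed to match the metadata.name of a
    # NutanixFailureDomain resource in the same namespace.
    - name: fd-1
    - name: fd-2
```

Listing more than one failure domain gives Cluster API distinct infrastructure segments to spread the control plane machines across.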
+ +## Example Scenarios + +### Single Subnet by UUID + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-uuid +spec: + prismElementCluster: + type: uuid + uuid: "00000000-0000-0000-0000-000000000000" + subnets: + - type: uuid + uuid: "11111111-1111-1111-1111-111111111111" +``` + +### Multiple Subnets by Name + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-names +spec: + prismElementCluster: + type: name + name: "PrismClusterA" + subnets: + - type: name + name: "SubnetA" + - type: name + name: "SubnetB" +``` + +### Multiple Subnets by Name and UUID + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-mixed +spec: + prismElementCluster: + type: name + name: "PrismClusterA" + subnets: + - type: name + name: "SubnetA" + - type: uuid + uuid: "11111111-1111-1111-1111-111111111111" +``` \ No newline at end of file diff --git a/docs/capx/v1.7.x/types/nutanix_machine_template.md b/docs/capx/v1.7.x/types/nutanix_machine_template.md new file mode 100644 index 00000000..4aa613b8 --- /dev/null +++ b/docs/capx/v1.7.x/types/nutanix_machine_template.md @@ -0,0 +1,124 @@ +# NutanixMachineTemplate +The `NutanixMachineTemplate` resource defines the configuration of a CAPX Kubernetes VM. + +Example of a `NutanixMachineTemplate` resource.
+ +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixMachineTemplate +metadata: + name: "${CLUSTER_NAME}-mt-0" + namespace: "${NAMESPACE}" +spec: + template: + spec: + providerID: "nutanix://${CLUSTER_NAME}-m1" + # Supported options for boot type: legacy and uefi + # Defaults to legacy if not set + bootType: ${NUTANIX_MACHINE_BOOT_TYPE=legacy} + vcpusPerSocket: ${NUTANIX_MACHINE_VCPU_PER_SOCKET=1} + vcpuSockets: ${NUTANIX_MACHINE_VCPU_SOCKET=2} + memorySize: "${NUTANIX_MACHINE_MEMORY_SIZE=4Gi}" + systemDiskSize: "${NUTANIX_SYSTEMDISK_SIZE=40Gi}" + image: + type: name + name: "${NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME}" + cluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnet: + - type: name + name: "${NUTANIX_SUBNET_NAME}" + # Adds additional categories to the virtual machines. + # Note: Categories must already be present in Prism Central + # additionalCategories: + # - key: AppType + # value: Kubernetes + # Adds the cluster virtual machines to a project defined in Prism Central. + # Replace NUTANIX_PROJECT_NAME with the correct project defined in Prism Central + # Note: Project must already be present in Prism Central. + # project: + # type: name + # name: "NUTANIX_PROJECT_NAME" + # gpus: + # - type: name + # name: "GPU NAME" + # Note: Either of `image` or `imageLookup` must be set, but not both. + # imageLookup: + # format: "NUTANIX_IMAGE_LOOKUP_FORMAT" + # baseOS: "NUTANIX_IMAGE_LOOKUP_BASE_OS" + # dataDisks: + # - diskSize: + # deviceProperties: + # deviceType: Disk + # adapterType: SCSI + # deviceIndex: 1 + # storageConfig: + # diskMode: Standard + # storageContainer: + # type: name + # name: "NUTANIX_VM_DISK_STORAGE_CONTAINER" + # dataSource: + # type: name + # name: "NUTANIX_DATA_SOURCE_IMAGE_NAME" +``` + +## NutanixMachineTemplate spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixMachineTemplate` resource. 
+ +### Configuration parameters +| Key |Type |Description | +|----------------------------------------------------|------|--------------------------------------------------------------------------------------------------------| +|bootType |string|Boot type of the VM. Depends on the OS image used. Allowed values: `legacy`, `uefi`. Default: `legacy` | +|vcpusPerSocket |int |Amount of vCPUs per socket. Default: `1` | +|vcpuSockets |int |Amount of vCPU sockets. Default: `2` | +|memorySize |string|Amount of Memory. Default: `4Gi` | +|systemDiskSize |string|Amount of storage assigned to the system disk. Default: `40Gi` | +|image |object|Reference (name or uuid) to the OS image used for the system disk. | +|image.type |string|Type to identify the OS image. Allowed values: `name` and `uuid` | +|image.name |string|Name of the image. | +|image.uuid |string|UUID of the image. | +|cluster |object|(Optional) Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|cluster.name |string|Name of the Prism Element cluster. | +|cluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | +|additionalCategories |list |Reference to the categories to be assigned to the VMs. These categories already exist in Prism Central. | +|additionalCategories.[].key |string|Key of the category. | +|additionalCategories.[].value |string|Value of the category. | +|project |object|Reference (name or uuid) to the project. This project must already exist in Prism Central. | +|project.type |string|Type to identify the project. 
Allowed values: `name` and `uuid` | +|project.name |string|Name of the project. | +|project.uuid |string|UUID of the project. | +|gpus |object|Reference (name or deviceID) to the GPUs to be assigned to the VMs. Can be vGPU or Passthrough. | +|gpus.[].type |string|Type to identify the GPU. Allowed values: `name` and `deviceID` | +|gpus.[].name |string|Name of the GPU or the vGPU profile | +|gpus.[].deviceID |string|DeviceID of the GPU or the vGPU profile | +|imageLookup |object|(Optional) Reference to a container that holds how to look up rhcos images for the cluster. | +|imageLookup.format |string|Naming format to look up the image for the machine. Default: `capx-{{.BaseOS}}-{{.K8sVersion}}-*` | +|imageLookup.baseOS |string|Name of the base operating system to use for image lookup. | +|dataDisks |list |(Optional) Reference to the data disks to be attached to the VM. | +|dataDisks.[].diskSize |string|Size (in Quantity format) of the disk attached to the VM. The minimum diskSize is `1GB`. | +|dataDisks.[].deviceProperties |object|(Optional) Reference to the properties of the disk device. | +|dataDisks.[].deviceProperties.deviceType |string|VM disk device type. Allowed values: `Disk` (default) and `CDRom` | +|dataDisks.[].deviceProperties.adapterType |string|Adapter type of the disk address. | +|dataDisks.[].deviceProperties.deviceIndex |int |(Optional) Index of the disk address. Allowed values: non-negative integers (default: `0`) | +|dataDisks.[].storageConfig |object|(Optional) Reference to the storage configuration parameters of the VM disks. | +|dataDisks.[].storageConfig.diskMode |string|Specifies the disk mode. Allowed values: `Standard` (default) and `Flash` | +|dataDisks.[].storageConfig.storageContainer |object|(Optional) Reference (name or uuid) to the storage_container used by the VM disk. | +|dataDisks.[].storageConfig.storageContainer.type |string|Type to identify the storage container. 
Allowed values: `name` and `uuid` | +|dataDisks.[].storageConfig.storageContainer.name |string|Name of the storage container. | +|dataDisks.[].storageConfig.storageContainer.uuid |string|UUID of the storage container. | +|dataDisks.[].dataSource |object|(Optional) Reference (name or uuid) to a data source image for the VM disk. | +|dataDisks.[].dataSource.type |string|Type to identify the data source image. Allowed values: `name` and `uuid` | +|dataDisks.[].dataSource.name |string|Name of the data source image. | +|dataDisks.[].dataSource.uuid |string|UUID of the data source image. | + +!!! note + - The `cluster` or `subnets` configuration parameters are optional in case failure domains are defined on the `NutanixCluster` and `MachineDeployment` resources. + - If the `deviceType` is `Disk`, the valid `adapterType` can be `SCSI`, `IDE`, `PCI`, `SATA` or `SPAPR`. If the `deviceType` is `CDRom`, the valid `adapterType` can be `IDE` or `SATA`. + - Either of `image` or `imageLookup` must be set, but not both. + - For a Machine VM, the `deviceIndex` for the disks with the same `deviceType.adapterType` combination should start from `0` and increase consecutively afterwards. Note that for each Machine VM, the `Disk.SCSI.0` and `CDRom.IDE.0` are reserved to be used by the VM's system. So for `dataDisks` of Disk.SCSI and CDRom.IDE, the `deviceIndex` should start from `1`. \ No newline at end of file diff --git a/docs/capx/v1.7.x/user_requirements.md b/docs/capx/v1.7.x/user_requirements.md new file mode 100644 index 00000000..05e971a5 --- /dev/null +++ b/docs/capx/v1.7.x/user_requirements.md @@ -0,0 +1,37 @@ +# User Requirements + +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs using a Prism Central user account. + +CAPX supports two types of PC users: + +- Local users: must be assigned the `Prism Central Admin` role. 
+- Domain users: must be assigned a role that at least has the [Minimum required CAPX permissions for domain users](#minimum-required-capx-permissions-for-domain-users) assigned. + +See [Credential Management](./credential_management.md){target=_blank} for more information on how to pass the user credentials to CAPX. + +## Minimum required CAPX permissions for domain users + +The following permissions are required for Prism Central domain users: + +- Create Category Mapping +- Create Image +- Create Or Update Name Category +- Create Or Update Value Category +- Create Virtual Machine +- Delete Category Mapping +- Delete Image +- Delete Name Category +- Delete Value Category +- Delete Virtual Machine +- Detach Volume Group From AHV VM +- View Category Mapping +- View Cluster +- View Image +- View Name Category +- View Project +- View Subnet +- View Value Category +- View Virtual Machine + +!!! note + The list of permissions has been validated on PC 2022.6 and above. diff --git a/docs/capx/v1.7.x/validated_integrations.md b/docs/capx/v1.7.x/validated_integrations.md new file mode 100644 index 00000000..83ee53a0 --- /dev/null +++ b/docs/capx/v1.7.x/validated_integrations.md @@ -0,0 +1,55 @@ +# Validated Integrations + +Validated integrations are a defined set of specifically tested configurations between technologies that represent the most common combinations that Nutanix customers are using or deploying with CAPX. For these integrations, Nutanix has directly, or through certified partners, exercised a full range of platform tests as part of the product release process. + +## Integration Validation Policy + +Nutanix follows the version validation policies below: + +- Validate at least one active AOS LTS (long term support) version. Validated AOS LTS version for a specific CAPX version is listed in the [AOS](#aos) section.
+ + !!! note + + Typically the latest LTS release at the time of the CAPX release, except when the latest is the initial release in its train (e.g. x.y.0). The exact version depends on timing and customer adoption. + +- Validate the latest AOS STS (short term support) release at the time of the CAPX release. +- Validate at least one active Prism Central (PC) version. The validated PC version for a specific CAPX version is listed in the [Prism Central](#prism-central) section.
+ + !!! note + + Typically the latest PC release at the time of the CAPX release, except when the latest is the initial release in its train (e.g. x.y.0). The exact version depends on timing and customer adoption. + +- Validate at least one active Cluster-API (CAPI) version. The validated CAPI version for a specific CAPX version is listed in the [Cluster-API](#cluster-api) section.
+ + !!! note + + Typically the latest Cluster-API release at the time of the CAPX release, except when the latest is the initial release in its train (e.g. x.y.0). The exact version depends on timing and customer adoption. + +## Validated versions +### Cluster-API +| CAPX | CAPI v1.3.x | CAPI v1.4.x | CAPI v1.5.x | CAPI v1.6.x | CAPI v1.7.x | CAPI v1.8.x | CAPI v1.9.x | +|--------|-------------|-------------|-------------|-------------|-------------|-------------|-------------| +| v1.7.x | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| v1.6.x | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| v1.5.x | Yes | Yes | Yes | Yes | Yes | Yes | No | +| v1.4.x | Yes | Yes | Yes | Yes | Yes | No | No | + +See the [Validated Kubernetes Versions](https://cluster-api.sigs.k8s.io/reference/versions.html?highlight=version#supported-kubernetes-versions){target=_blank} page for more information on CAPI validated versions. + +### AOS + +| CAPX | 6.5.x (LTS) | 6.8 (STS) | 6.10 | 7.0 | 7.3 | +|--------|-------------|-----------|------|-----|-----| +| v1.7.x | No | Yes | Yes | Yes | Yes | +| v1.6.x | No | Yes | Yes | Yes | Yes | +| v1.5.x | Yes | Yes | Yes | Yes | Yes | +| v1.4.x | Yes | Yes | No | No | No | + +### Prism Central + +| CAPX | pc.2022.6 | pc.2023.x | pc.2024.x | pc.7.3 | +|--------|-----------|-----------|-----------|--------| +| v1.7.x | No | Yes | Yes | Yes | +| v1.6.x | No | Yes | Yes | Yes | +| v1.5.x | Yes | Yes | Yes | Yes | +| v1.4.x | Yes | Yes | Yes | No | diff --git a/docs/capx/v1.8.x/addons/install_csi_driver.md b/docs/capx/v1.8.x/addons/install_csi_driver.md new file mode 100644 index 00000000..afb4bdc8 --- /dev/null +++ b/docs/capx/v1.8.x/addons/install_csi_driver.md @@ -0,0 +1,215 @@ +# Nutanix CSI Driver installation with CAPX + +The Nutanix CSI driver is fully supported on CAPI/CAPX deployed clusters where all the nodes meet the [Nutanix CSI driver prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver).
+ +There are three methods to install the Nutanix CSI driver on a CAPI/CAPX cluster: + +- Helm +- ClusterResourceSet +- CAPX Flavor + +For more information, check the next sections. + +## CAPI Workload cluster prerequisites for the Nutanix CSI Driver + +Kubernetes workers need the following prerequisites to use the Nutanix CSI Drivers: + +- iSCSI initiator package (for Volumes based block storage) +- NFS client package (for Files based storage) + +These packages may already be present in the image you use with your infrastructure provider or you can also rely on your bootstrap provider to install them. More info is available in the [Prerequisites docs](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-plugin-prerequisites-r.html){target=_blank}. + +The package names and installation method will also vary depending on the operating system you plan to use. + +In the example below, `kubeadm` bootstrap provider is used to deploy these packages on top of an Ubuntu 20.04 image. The `kubeadm` bootstrap provider allows defining `preKubeadmCommands` that will be launched before Kubernetes cluster creation. These `preKubeadmCommands` can be defined both in `KubeadmControlPlane` for master nodes and in `KubeadmConfigTemplate` for worker nodes. + +In the example with an Ubuntu 20.04 image, both `KubeadmControlPlane` and `KubeadmConfigTemplate` must be modified as in the example below: + +```yaml +spec: + template: + spec: + # ....... + preKubeadmCommands: + - echo "before kubeadm call" > /var/log/prekubeadm.log + - apt update + - apt install -y nfs-common open-iscsi + - systemctl enable --now iscsid +``` +## Install the Nutanix CSI Driver with Helm + +A recent [Helm](https://helm.sh){target=_blank} version is needed (tested with Helm v3.10.1). + +The example below must be applied on a ready workload cluster. 
The workload cluster's kubeconfig can be retrieved and used to connect with the following command: + +```shell +clusterctl get kubeconfig $CLUSTER_NAME -n $CLUSTER_NAMESPACE > $CLUSTER_NAME-KUBECONFIG +export KUBECONFIG=$(pwd)/$CLUSTER_NAME-KUBECONFIG +``` + +Once connected to the cluster, follow the [CSI documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:csi-csi-driver-install-t.html){target=_blank}. + +First, install the [nutanix-csi-snapshot](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-snapshot){target=_blank} chart followed by the [nutanix-csi-storage](https://github.com/nutanix/helm/tree/master/charts/nutanix-csi-storage){target=_blank} chart. + +See an example below: + +```shell +#Add the official Nutanix Helm repo and get the latest update +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +# Install the nutanix-csi-snapshot chart +helm install nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system --create-namespace + +# Install the nutanix-csi-storage chart +helm install nutanix-storage nutanix/nutanix-csi-storage -n ntnx-system --set createSecret=false +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with `ClusterResourceSet` + +The `ClusterResourceSet` feature was introduced to automatically apply a set of resources (such as CNI/CSI) defined by administrators to matching created/existing workload clusters. + +### Enabling the `ClusterResourceSet` feature + +At the time of writing, `ClusterResourceSet` is an experimental feature that must be enabled during the initialization of a management cluster with the `EXP_CLUSTER_RESOURCE_SET` feature gate. + +To do this, add `EXP_CLUSTER_RESOURCE_SET: "true"` in the `clusterctl` configuration file or just `export EXP_CLUSTER_RESOURCE_SET=true` before initializing the management cluster with `clusterctl init`. 
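A minimal sketch of the configuration-file option described above. It assumes the `clusterctl` configuration file lives at its usual default location (`$XDG_CONFIG_HOME/cluster-api/clusterctl.yaml` on recent `clusterctl` releases); that path is an assumption and should be verified for the version in use.

```yaml
# clusterctl configuration file
# (assumed path: $XDG_CONFIG_HOME/cluster-api/clusterctl.yaml)
# Enables the experimental ClusterResourceSet feature gate so that it is
# picked up the next time `clusterctl init` initializes a management cluster.
EXP_CLUSTER_RESOURCE_SET: "true"
```

Keeping the setting in the configuration file, rather than exporting it in a shell session, makes the choice persistent across later `clusterctl` invocations.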
+ +If the management cluster is already initialized, the `ClusterResourceSet` feature can be enabled by changing the configuration of the `capi-controller-manager` deployment in the `capi-system` namespace. + + ```shell + kubectl edit deployment -n capi-system capi-controller-manager + ``` + +Locate the section below: + +```yaml + - args: + - --leader-elect + - --metrics-bind-addr=localhost:8080 + - --feature-gates=MachinePool=false,ClusterResourceSet=false,ClusterTopology=false +``` + +Then replace `ClusterResourceSet=false` with `ClusterResourceSet=true`. + +!!! note + Editing the `deployment` resource will cause Kubernetes to automatically start new versions of the containers with the feature enabled. + + + +### Prepare the Nutanix CSI `ClusterResourceSet` + +#### Create the `ConfigMap` for the CSI Plugin + +First, create a `ConfigMap` that contains a YAML manifest with all resources to install the Nutanix CSI driver. + +Since the Nutanix CSI Driver is provided as a Helm chart, use `helm` to extract it before creating the `ConfigMap`. See an example below: + +```shell +helm repo add nutanix https://nutanix.github.io/helm/ +helm repo update + +kubectl create ns ntnx-system --dry-run=client -o yaml > nutanix-csi-namespace.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-snapshot -n ntnx-system > nutanix-csi-snapshot.yaml +helm template nutanix-csi-snapshot nutanix/nutanix-csi-storage -n ntnx-system > nutanix-csi-storage.yaml + +kubectl create configmap nutanix-csi-crs --from-file=nutanix-csi-namespace.yaml --from-file=nutanix-csi-snapshot.yaml --from-file=nutanix-csi-storage.yaml +``` + +#### Create the `ClusterResourceSet` + +Next, create the `ClusterResourceSet` resource that will map the `ConfigMap` defined above to clusters using a `clusterSelector`. + +The `ClusterResourceSet` needs to be created inside the management cluster.
See an example below: + +```yaml +--- +apiVersion: addons.cluster.x-k8s.io/v1alpha3 +kind: ClusterResourceSet +metadata: + name: nutanix-csi-crs +spec: + clusterSelector: + matchLabels: + csi: nutanix + resources: + - kind: ConfigMap + name: nutanix-csi-crs +``` + +The `clusterSelector` field controls how Cluster API will match this `ClusterResourceSet` on one or more workload clusters. In the example scenario, the `matchLabels` approach is being used where the `ClusterResourceSet` will be applied to all workload clusters having the `csi: nutanix` label present. If the label isn't present, the `ClusterResourceSet` won't apply to that workload cluster. + +The `resources` field references the `ConfigMap` created above, which contains the manifests for installing the Nutanix CSI driver. + +#### Assign the `ClusterResourceSet` to a workload cluster + +Assign this `ClusterResourceSet` to the workload cluster by adding the correct label to the `Cluster` resource. + +This can be done before workload cluster creation by editing the output of the `clusterctl generate cluster` command or by modifying an already deployed workload cluster. + +In both cases, `Cluster` resources should look like this: + +```yaml +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: workload-cluster-name + namespace: workload-cluster-namespace + labels: + csi: nutanix +# ... +``` + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Install the Nutanix CSI Driver with a CAPX flavor + +The CAPX provider can utilize a flavor to automatically deploy the Nutanix CSI using a `ClusterResourceSet`. + +### Prerequisites + +The following requirements must be met: + +- The operating system must meet the [Nutanix CSI OS prerequisites](#capi-workload-cluster-prerequisites-for-the-nutanix-csi-driver). 
+- The management cluster must be installed with the [`EXP_CLUSTER_RESOURCE_SET` feature gate](#enabling-the-clusterresourceset-feature). + +### Installation + +Specify the `csi` flavor during workload cluster creation. See an example below: + +```shell +clusterctl generate cluster my-cluster -f csi +``` + +Additional environment variables are required: + +- `WEBHOOK_CA`: Base64 encoded CA certificate used to sign the webhook certificate +- `WEBHOOK_CERT`: Base64 encoded certificate for the webhook validation component +- `WEBHOOK_KEY`: Base64 encoded key for the webhook validation component + +The three components referenced above can be automatically created and referenced using [this script](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/main/scripts/gen-self-cert.sh){target=_blank}: + +``` +source scripts/gen-self-cert.sh +``` + +The certificate must reference the following names: + +- csi-snapshot-webhook +- csi-snapshot-webhook.ntnx-system +- csi-snapshot-webhook.ntnx-system.svc + +!!! warning + For correct Nutanix CSI driver deployment, a fully functional CNI deployment must be present. + +## Nutanix CSI Driver Configuration + +After the driver is installed, it must be configured for use by defining, at a minimum, a `Secret` and a `StorageClass`. + +This can be done manually in the workload clusters or by using a `ClusterResourceSet` in the management cluster as explained above. + +See the official [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_6:CSI-Volume-Driver-v2_6){target=_blank} on the Nutanix Portal for more configuration information.
diff --git a/docs/capx/v1.8.x/credential_management.md b/docs/capx/v1.8.x/credential_management.md new file mode 100644 index 00000000..bebbc5a0 --- /dev/null +++ b/docs/capx/v1.8.x/credential_management.md @@ -0,0 +1,93 @@ +# Credential Management +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs to manage the required Kubernetes cluster infrastructure resources. + +PC credentials are required to authenticate to the PC APIs. CAPX currently supports two mechanisms to supply the required credentials: + +- Credentials injected into the CAPX manager deployment +- Workload cluster specific credentials + +## Credentials injected into the CAPX manager deployment +By default, credentials will be injected into the CAPX manager deployment when CAPX is initialized. See the [getting started guide](./getting_started.md) for more information on the initialization. + +Upon initialization a `nutanix-creds` secret will automatically be created in the `capx-system` namespace. This secret will contain the values supplied via the `NUTANIX_USER` and `NUTANIX_PASSWORD` parameters. + +The `nutanix-creds` secret will be used for workload cluster deployment if no other credential is supplied. + +### Example +An example of the automatically created `nutanix-creds` secret can be found below: +```yaml +--- +apiVersion: v1 +kind: Secret +type: Opaque +metadata: + name: nutanix-creds + namespace: capx-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +## Workload cluster specific credentials +Users can override the [credentials injected in CAPX manager deployment](#credentials-injected-into-the-capx-manager-deployment) by supplying a credential specific to a workload cluster. The credentials can be supplied by creating a secret in the same namespace as the `NutanixCluster` namespace. 
+ +The secret can be referenced by adding a `credentialRef` inside the `prismCentral` attribute contained in the `NutanixCluster`. +The secret will also be deleted when the `NutanixCluster` is deleted. + +Note: There is a 1:1 relation between the secret and the `NutanixCluster` object. + +### Example +Create a secret in the namespace of the `NutanixCluster`: + +```yaml +--- +apiVersion: v1 +kind: Secret +metadata: + name: "" + namespace: "" +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "", + "password": "" + }, + "prismElements": null + } + } + ] +``` + +Add a `prismCentral` and corresponding `credentialRef` to the `NutanixCluster`: + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: "" + namespace: "" +spec: + prismCentral: + ... + credentialRef: + name: "" + kind: Secret +... +``` + +See the [NutanixCluster](./types/nutanix_cluster.md) documentation for all supported configuration parameters for the `prismCentral` and `credentialRef` attribute. \ No newline at end of file diff --git a/docs/capx/v1.8.x/experimental/autoscaler.md b/docs/capx/v1.8.x/experimental/autoscaler.md new file mode 100644 index 00000000..2af57213 --- /dev/null +++ b/docs/capx/v1.8.x/experimental/autoscaler.md @@ -0,0 +1,129 @@ +# Using Autoscaler in combination with CAPX + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +[Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank} can be used in combination with Cluster API to automatically add or remove machines in a cluster. + +Autoscaler can be used in different deployment scenarios. This page will provide an overview of multiple autoscaler deployment scenarios in combination with CAPX. 
+See the [Testing](#testing) section to see how scale-up/scale-down events can be triggered to validate the autoscaler behaviour. + +More in-depth information on Autoscaler functionality can be found in the [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md){target=_blank}. + +All Autoscaler configuration parameters can be found [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank}. + +## Scenario 1: Management cluster managing an external workload cluster +In this scenario, Autoscaler will be running on a management cluster and it will manage an external workload cluster. See the management cluster managing an external workload cluster section of [Kubernetes documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster){target=_blank} for more information. + +### Steps +1. Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. + + !!! note + Make sure a CNI is installed in the workload cluster. + +4. Download the example [Autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +5. Modify the `deployment.yaml` file: + - Change the namespace of all resources to the namespaces of the workload cluster. + - Choose an autoscale image. 
+ - Change the following parameters in the `Deployment` resource: +```YAML + spec: + containers: + name: cluster-autoscaler + command: + - /cluster-autoscaler + args: + - --cloud-provider=clusterapi + - --kubeconfig=/mnt/kubeconfig/kubeconfig.yml + - --clusterapi-cloud-config-authoritative + - -v=1 + volumeMounts: + - mountPath: /mnt/kubeconfig + name: kubeconfig + readOnly: true + ... + volumes: + - name: kubeconfig + secret: + secretName: -kubeconfig + items: + - key: value + path: kubeconfig.yml +``` +7. Apply the `deployment.yaml` file. +```bash +kubectl apply -f deployment.yaml +``` +8. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +9. Test Autoscaler. Go to the [Testing](#testing) section. + +## Scenario 2: Autoscaler running on workload cluster +In this scenario, Autoscaler will be deployed [on top of the workload cluster](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#autoscaler-running-in-a-joined-cluster-using-service-account-credentials){target=_blank} directly. In order for Autoscaler to work, it is required that the workload cluster resources are moved from the management cluster to the workload cluster. + +### Steps +1. Deploy a management cluster and workload cluster. The [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} can be used as a starting point. +2. Get the kubeconfig file for the workload cluster and use this kubeconfig to login to the workload cluster. +```bash +clusterctl get kubeconfig -n /path/to/kubeconfig +``` +3. Install a CNI in the workload cluster. +4. Initialise the CAPX components on top of the workload cluster: +```bash +clusterctl init --infrastructure nutanix +``` +5. Migrate the workload cluster custom resources to the workload cluster. Run following command from the management cluster: +```bash +clusterctl move -n --to-kubeconfig /path/to/kubeconfig +``` +6. 
Verify that the cluster has been migrated by running the following command on the workload cluster: +```bash +kubectl get cluster -A +``` +7. Download the example [autoscaler deployment file](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/examples/deployment.yaml){target=_blank}. +8. Create the Autoscaler namespace: +```bash +kubectl create ns autoscaler +``` +9. Apply the `deployment.yaml` file: +```bash +kubectl apply -f deployment.yaml +``` +10. Add the [annotations](#autoscaler-node-group-annotations) to the workload cluster `MachineDeployment` resource. +11. Test Autoscaler. Go to the [Testing](#testing) section. + +## Testing + +1. Deploy an example Kubernetes application. For example, the one used in the [Kubernetes HorizontalPodAutoscaler Walkthrough](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/). +```bash +kubectl apply -f https://k8s.io/examples/application/php-apache.yaml +``` +2. Increase the number of replicas of the application to trigger a scale-up event: +```bash +kubectl scale deployment php-apache --replicas 100 +``` +3. Decrease the number of replicas of the application again to trigger a scale-down event. + + !!! note + In case of issues, check the logs of the Autoscaler pods. + +4. After a while, CAPX will add more machines. Refer to the [Autoscaler configuration parameters](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca){target=_blank} to tweak the behaviour and timeouts. 
+ +## Autoscaler node group annotations +Autoscaler uses the following annotations to define the upper and lower boundaries of the managed machines: + +| Annotation | Example Value | Description | +|-------------------------------------------------------------|---------------|-----------------------------------------------| +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size | 5 | Maximum number of machines in this node group | +| cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size | 1 | Minimum number of machines in this node group | + +These annotations must be applied to the `MachineDeployment` resources of a CAPX cluster. + +### Example +```YAML +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + annotations: + cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5" + cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1" +``` \ No newline at end of file diff --git a/docs/capx/v1.8.x/experimental/oidc.md b/docs/capx/v1.8.x/experimental/oidc.md new file mode 100644 index 00000000..0c274121 --- /dev/null +++ b/docs/capx/v1.8.x/experimental/oidc.md @@ -0,0 +1,31 @@ +# OIDC integration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +Kubernetes allows users to authenticate using various authentication mechanisms. One of these mechanisms is OIDC. Information on how Kubernetes interacts with OIDC providers can be found in the [OpenID Connect Tokens](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens){target=_blank} section of the official Kubernetes documentation. + + +Follow the steps below to configure a CAPX cluster to use an OIDC identity provider. + +## Steps +1. 
Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +3. Modify/add the `spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs` attribute and add the required [API server parameters](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuring-the-api-server){target=_blank}. See the [example](#example) below. +4. Apply the `cluster.yaml` file +5. Log in with the OIDC provider once the cluster is provisioned + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + ... + oidc-client-id: + oidc-issuer-url: + ... +``` + diff --git a/docs/capx/v1.8.x/experimental/proxy.md b/docs/capx/v1.8.x/experimental/proxy.md new file mode 100644 index 00000000..c8f940d4 --- /dev/null +++ b/docs/capx/v1.8.x/experimental/proxy.md @@ -0,0 +1,62 @@ +# Proxy configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a proxy to connect to external networks. This proxy configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a proxy. + +## Steps +1. Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the proxy configuration. + 1. `KubeadmControlPlane`: + * Add the proxy configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. 
+ * Add `systemctl` commands to apply the proxy config in `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the proxy configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add `systemctl` commands to apply the proxy config in `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +4. Apply the `cluster.yaml` file + +## Example + +```YAML +--- +# controlplane proxy settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +--- +# worker proxy settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [Service] + Environment="HTTP_PROXY=" + Environment="HTTPS_PROXY=" + Environment="NO_PROXY=" + owner: root:root + path: /etc/systemd/system/containerd.service.d/http-proxy.conf + ... + preKubeadmCommands: + - sudo systemctl daemon-reload + - sudo systemctl restart containerd + ... +``` + diff --git a/docs/capx/v1.8.x/experimental/registry_mirror.md b/docs/capx/v1.8.x/experimental/registry_mirror.md new file mode 100644 index 00000000..307a9425 --- /dev/null +++ b/docs/capx/v1.8.x/experimental/registry_mirror.md @@ -0,0 +1,96 @@ +# Registry Mirror configuration + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +CAPX can be configured to use a private registry to act as a mirror of an external public registry. This registry mirror configuration needs to be applied to control plane and worker nodes. + +Follow the steps below to configure a CAPX cluster to use a registry mirror. + +## Steps +1. 
Generate a `cluster.yaml` file with the required CAPX cluster configuration. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +2. Edit the `cluster.yaml` file and modify the following resources as shown in the [example](#example) below to add the registry mirror configuration. + 1. `KubeadmControlPlane`: + * Add the registry mirror configuration to the `spec.kubeadmConfigSpec.files` list. Do not modify other items in the list. + * Add a command that updates `/etc/containerd/config.toml` to apply the registry mirror config to `spec.kubeadmConfigSpec.preKubeadmCommands`. Do not modify other items in the list. + 2. `KubeadmConfigTemplate`: + * Add the registry mirror configuration to the `spec.template.spec.files` list. Do not modify other items in the list. + * Add a command that updates `/etc/containerd/config.toml` to apply the registry mirror config to `spec.template.spec.preKubeadmCommands`. Do not modify other items in the list. +3. Apply the `cluster.yaml` file. + +## Example + +This example will configure a registry mirror for the following registries: + +* registry.k8s.io +* ghcr.io +* quay.io + +and redirect them to corresponding projects of the `` registry. 
+ +```YAML +--- +# controlplane registry mirror settings +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... +--- +# worker registry mirror settings +kind: KubeadmConfigTemplate +spec: + template: + spec: + files: + - content: | + [host."https:///v2/registry.k8s.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/registry.k8s.io/hosts.toml + - content: | + [host."https:///v2/ghcr.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/ghcr.io/hosts.toml + - content: | + [host."https:///v2/quay.io"] + capabilities = ["pull", "resolve"] + skip_verify = false + override_path = true + owner: root:root + path: /etc/containerd/certs.d/quay.io/hosts.toml + ... + preKubeadmCommands: + - echo '\n[plugins."io.containerd.grpc.v1.cri".registry]\n config_path = "/etc/containerd/certs.d"' >> /etc/containerd/config.toml + ... 
+``` + diff --git a/docs/capx/v1.8.x/experimental/vpc.md b/docs/capx/v1.8.x/experimental/vpc.md new file mode 100644 index 00000000..3513e47e --- /dev/null +++ b/docs/capx/v1.8.x/experimental/vpc.md @@ -0,0 +1,40 @@ +# Creating a workload CAPX cluster in a Nutanix Flow VPC + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +!!! note + Nutanix Flow VPCs are only validated with CAPX 1.1.3+ + +[Nutanix Flow Virtual Networking](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9:Nutanix-Flow-Virtual-Networking-Guide-vpc_2022_9){target=_blank} allows users to create Virtual Private Clouds (VPCs) with Overlay networking. +The steps below will illustrate how a CAPX cluster can be deployed inside an overlay subnet (NAT) inside a VPC while the management cluster resides outside of the VPC. + + +## Steps +1. [Request a floating IP](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Flow-Networking-Guide:ear-flow-nw-request-floating-ip-pc-t.html){target=_blank} +2. Link the floating IP to an internal IP address inside the overlay subnet that will be used to deploy the CAPX cluster. This address will be assigned to the CAPX loadbalancer. To prevent IP conflicts, make sure the IP address is not part of the IP-pool defined in the subnet. +3. Generate a `cluster.yaml` file with the required CAPX cluster configuration where the `CONTROL_PLANE_ENDPOINT_IP` is set to the floating IP requested in the first step. Refer to the [Getting Started](../getting_started.md){target=_blank} page for more information on how to generate a `cluster.yaml` file. Do not apply the `cluster.yaml` file. +4. Edit the `cluster.yaml` file and search for the `KubeadmControlPlane` resource. +5. Modify the `spec.kubeadmConfigSpec.files.*.content` attribute and change the `kube-vip` definition similar to the [example](#example) below. +6. 
Apply the `cluster.yaml` file. +7. When the CAPX workload cluster is deployed, it will be reachable via the floating IP. + +## Example +```YAML +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + files: + - content: | + apiVersion: v1 + kind: Pod + metadata: + name: kube-vip + namespace: kube-system + spec: + containers: + - env: + - name: address + value: "" +``` + diff --git a/docs/capx/v1.8.x/getting_started.md b/docs/capx/v1.8.x/getting_started.md new file mode 100644 index 00000000..5a002ed8 --- /dev/null +++ b/docs/capx/v1.8.x/getting_started.md @@ -0,0 +1,280 @@ +# Getting Started + +This is a guide on getting started with Cluster API Provider Nutanix Cloud Infrastructure (CAPX). To learn more about Cluster API, check out the [Cluster API book](https://cluster-api.sigs.k8s.io/){target=_blank}. + +For more information on how to install the Nutanix CSI Driver on a CAPX cluster, visit [Nutanix CSI Driver installation with CAPX](./addons/install_csi_driver.md). + +For more information on how CAPX handles credentials, visit [Credential Management](./credential_management.md). + +For more information on the port requirements for CAPX, visit [Port Requirements](./port_requirements.md). + +!!! note + [Nutanix Cloud Controller Manager (CCM)](../../ccm/latest/overview.md) is a mandatory component starting from CAPX v1.3.0. Ensure all CAPX-managed Kubernetes clusters are configured to use Nutanix CCM before upgrading to v1.3.0 or later. See [CAPX v1.8.x Upgrade Procedure](./tasks/capx_v18x_upgrade_procedure.md). + +## Production Workflow + +### Build OS image for NutanixMachineTemplate resource +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) uses the [Image Builder](https://image-builder.sigs.k8s.io/){target=_blank} project to build OS images used for the Nutanix machines. 
+ +Follow the steps detailed in [Building CAPI Images for Nutanix Cloud Platform (NCP)](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#building-capi-images-for-nutanix-cloud-platform-ncp){target=_blank} to use Image Builder on the Nutanix Cloud Platform. + +For a list of operating systems visit the OS image [Configuration](https://image-builder.sigs.k8s.io/capi/providers/nutanix.html#configuration){target=_blank} page. + +### Prerequisites for using Cluster API Provider Nutanix Cloud Infrastructure +The [Cluster API installation](https://cluster-api.sigs.k8s.io/user/quick-start.html#installation){target=_blank} section provides an overview of all required prerequisites: + +- [Common Prerequisites](https://cluster-api.sigs.k8s.io/user/quick-start.html#common-prerequisites){target=_blank} +- [Install and/or configure a Kubernetes cluster](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-andor-configure-a-kubernetes-cluster){target=_blank} +- [Install clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl){target=_blank} +- (Optional) [Enabling Feature Gates](https://cluster-api.sigs.k8s.io/user/quick-start.html#enabling-feature-gates){target=_blank} + +Make sure these prerequisites have been met before moving to the [Configure and Install Cluster API Provider Nutanix Cloud Infrastructure](#configure-and-install-cluster-api-provider-nutanix-cloud-infrastructure) step. + +### Configure and Install Cluster API Provider Nutanix Cloud Infrastructure +To initialize Cluster API Provider Nutanix Cloud Infrastructure, `clusterctl` requires the following variables, which should be set in either `~/.cluster-api/clusterctl.yaml` or as environment variables. 
+``` +NUTANIX_ENDPOINT: "" # IP or FQDN of Prism Central +NUTANIX_USER: "" # Prism Central user +NUTANIX_PASSWORD: "" # Prism Central password +NUTANIX_INSECURE: false # or true + +KUBERNETES_VERSION: "v1.22.9" +WORKER_MACHINE_COUNT: 3 +NUTANIX_SSH_AUTHORIZED_KEY: "" + +NUTANIX_PRISM_ELEMENT_CLUSTER_NAME: "" +NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME: "" +NUTANIX_SUBNET_NAME: "" + +EXP_CLUSTER_RESOURCE_SET: true # Required for Nutanix CCM installation +``` + +You can also see the required list of variables by running the following: +``` +clusterctl generate cluster mycluster -i nutanix --list-variables +Required Variables: + - CONTROL_PLANE_ENDPOINT_IP + - KUBERNETES_VERSION + - NUTANIX_ENDPOINT + - NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME + - NUTANIX_PASSWORD + - NUTANIX_PRISM_ELEMENT_CLUSTER_NAME + - NUTANIX_SSH_AUTHORIZED_KEY + - NUTANIX_SUBNET_NAME + - NUTANIX_USER + +Optional Variables: + - CONTROL_PLANE_ENDPOINT_PORT (defaults to "6443") + - CONTROL_PLANE_MACHINE_COUNT (defaults to 1) + - KUBEVIP_LB_ENABLE (defaults to "false") + - KUBEVIP_SVC_ENABLE (defaults to "false") + - NAMESPACE (defaults to current Namespace in the KubeConfig file) + - NUTANIX_INSECURE (defaults to "false") + - NUTANIX_MACHINE_BOOT_TYPE (defaults to "legacy") + - NUTANIX_MACHINE_MEMORY_SIZE (defaults to "4Gi") + - NUTANIX_MACHINE_VCPU_PER_SOCKET (defaults to "1") + - NUTANIX_MACHINE_VCPU_SOCKET (defaults to "2") + - NUTANIX_PORT (defaults to "9440") + - NUTANIX_SYSTEMDISK_SIZE (defaults to "40Gi") + - WORKER_MACHINE_COUNT (defaults to 0) +``` + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP-address to the `CONTROL_PLANE_ENDPOINT_IP` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. + +!!! 
warning + Make sure [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled before running `clusterctl init` + +Now you can instantiate Cluster API with the following: +``` +clusterctl init -i nutanix +``` + +### Deploy a workload cluster on Nutanix Cloud Infrastructure +``` +export TEST_CLUSTER_NAME=mytestcluster1 +export TEST_NAMESPACE=mytestnamespace +CONTROL_PLANE_ENDPOINT_IP=x.x.x.x clusterctl generate cluster ${TEST_CLUSTER_NAME} \ + -i nutanix \ + --target-namespace ${TEST_NAMESPACE} \ + --kubernetes-version v1.22.9 \ + --control-plane-machine-count 1 \ + --worker-machine-count 3 > ./cluster.yaml +kubectl create ns ${TEST_NAMESPACE} +kubectl apply -f ./cluster.yaml -n ${TEST_NAMESPACE} +``` +To customize the configuration of the default `cluster.yaml` file generated by CAPX, visit the [NutanixCluster](./types/nutanix_cluster.md) and [NutanixMachineTemplate](./types/nutanix_machine_template.md) documentation. + +### Access a workload cluster +To access resources on the cluster, you can get the kubeconfig with the following: +``` +clusterctl get kubeconfig ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} > ${TEST_CLUSTER_NAME}.kubeconfig +kubectl --kubeconfig ./${TEST_CLUSTER_NAME}.kubeconfig get nodes +``` + +### Install CNI on a workload cluster + +You must deploy a Container Network Interface (CNI) based pod network add-on so that your pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed. + +!!! note + Take care that your pod network must not overlap with any of the host networks. You are likely to see problems if there is any overlap. If you find a collision between your network plugin's preferred pod network and some of your host networks, you must choose a suitable alternative CIDR block to use instead. It can be configured inside the `cluster.yaml` generated by `clusterctl generate cluster` before applying it. 
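
A minimal sketch of where the pod network is set, assuming the generated `cluster.yaml` follows the standard Cluster API `Cluster` schema, in which the pod CIDR lives under `spec.clusterNetwork.pods.cidrBlocks`. The cluster name and the CIDR value shown here are placeholders for illustration, not CAPX defaults, and only the relevant fields are shown:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: mytestcluster1          # placeholder cluster name
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 172.20.0.0/16         # placeholder; pick a range that does not overlap any host network
```

The selected CNI typically needs to be configured with the same CIDR, so check its install guide before applying the file.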
+ +Several external projects provide Kubernetes pod networks using CNI, some of which also support [Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/){target=_blank}. + +See a list of add-ons that implement the [Kubernetes networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-network-model){target=_blank}. At time of writing, the most common are [Calico](https://www.tigera.io/project-calico/){target=_blank} and [Cilium](https://cilium.io){target=_blank}. + +Follow the specific install guide for your selected CNI and install only one pod network per cluster. + +Once a pod network has been installed, you can confirm that it is working by checking that the CoreDNS pod is running in the output of `kubectl get pods --all-namespaces`. + +### Add Failure Domain to Cluster + +To update your cluster to use new or modified failure domains after initial deployment, follow these steps: + +1. Create NutanixFailureDomain resource + + For example, define a failure domain in example.yaml: +``` +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-domain-1 +spec: + prismElementCluster: + type: name + name: "PrismClusterA" + subnets: + - type: name + name: "SubnetA" + - type: name + name: "SubnetB" +``` + +2. Apply the resource + +``` +kubectl apply -f example.yaml +``` + +3. Edit the NutanixCluster resource to reference the failure domain(s) + +``` +kubectl edit nutanixcluster -n +``` + + In the spec section, add the controlPlaneFailureDomains field: + +``` +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: +spec: + controlPlaneFailureDomains: # add controlPlaneFailureDomains + - name: "fd-domain-1" # failureDomain name + - name: "fd-domain-2" # failureDomain name + controlPlaneEndpoint: + prismCentral: +``` + +4. 
Verify the update + + Check that the failure domains are registered with the cluster: + +``` +kubectl get cluster -n -o yaml +``` + + Look for the failureDomains in status section: + +``` +failureDomains: + fd-domain-1: + controlPlane: true + fd-domain-2: + controlPlane: true +``` + +### Add Failure Domain to MachineDeployment + +To associate a MachineDeployment with a specific failure domain: + +1. Export the MachineDeployment definition + +``` +kubectl get machinedeployments -n -o yaml > machinedeployment.yaml +``` + +2. Edit the manifest to add the failure domain + + Under spec.template.spec, add a failureDomain field: + +``` +apiVersion: cluster.x-k8s.io/v1beta1 +kind: MachineDeployment +metadata: + name: your-machinedeployment + namespace: your-namespace +spec: + replicas: 3 + selector: + matchLabels: + cluster.x-k8s.io/deployment-name: your-machinedeployment + template: + metadata: + labels: + cluster.x-k8s.io/deployment-name: your-machinedeployment + spec: + failureDomain: "fd-domain-1" + # other fields like bootstrap, infrastructureRef ... +``` + +3. Apply the changes + +``` +kubectl apply -f machinedeployment.yaml +``` + +4. Verify the Update + + Confirm that the failure domain field was updated: + +``` +kubectl get machinedeployments -n -o yaml | grep failureDomain +``` + +5. Check placement of machines + + Ensure new machines are placed in the specified failure domain: + +``` +kubectl get machines -l cluster.x-k8s.io/deployment-name= -n -o yaml +``` + +### Kube-vip settings + +Kube-vip is a true load balancing solution for the Kubernetes control plane. It distributes API requests across control plane nodes. It also has the capability to provide load balancing for Kubernetes services. + +You can tweak kube-vip settings by using the following properties: + +- `KUBEVIP_LB_ENABLE` + +This setting allows control plane load balancing using IPVS. 
See +[Control Plane Load-Balancing documentation](https://kube-vip.io/docs/about/architecture/#control-plane-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ENABLE` + +This setting enables a service of type LoadBalancer. See +[Kubernetes Service Load Balancing documentation](https://kube-vip.io/docs/about/architecture/#kubernetes-service-load-balancing){target=_blank} for further information. + +- `KUBEVIP_SVC_ELECTION` + +This setting enables Load Balancing of Load Balancers. See [Load Balancing Load Balancers](https://kube-vip.io/docs/usage/kubernetes-services/#load-balancing-load-balancers-when-using-arp-mode-yes-you-read-that-correctly-kube-vip-v050){target=_blank} for further information. + +### Delete a workload cluster +To remove a workload cluster from your management cluster, remove the cluster object and the provider will clean-up all resources. + +``` +kubectl delete cluster ${TEST_CLUSTER_NAME} -n ${TEST_NAMESPACE} +``` +!!! note + Deleting the entire cluster template with `kubectl delete -f ./cluster.yaml` may lead to pending resources requiring manual cleanup. diff --git a/docs/capx/v1.8.x/pc_certificates.md b/docs/capx/v1.8.x/pc_certificates.md new file mode 100644 index 00000000..f3fe1699 --- /dev/null +++ b/docs/capx/v1.8.x/pc_certificates.md @@ -0,0 +1,149 @@ +# Certificate Trust + +CAPX invokes Prism Central APIs using the HTTPS protocol. CAPX has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +!!! note + For more information about replacing Prism Central certificates, see the [Nutanix AOS Security Guide](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Security-Guide-v6_5:mul-security-ssl-certificate-pc-t.html){target=_blank}. 
+ +## Enable certificate verification (default) +By default, CAPX will perform certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a certificate issued by a publicly trusted certificate authority. +No additional configuration is required in CAPX. + +## Configure an additional trust bundle +CAPX allows users to configure an additional trust bundle. This will allow CAPX to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable needs to be set. The value of the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable contains the trust bundle (PEM format) in base64 encoded format. See the [Configuring the trust bundle environment variable](#configuring-the-trust-bundle-environment-variable) section for more information. + +It is also possible to configure the additional trust bundle manually by creating a custom `cluster-template`. See the [Configuring the additional trust bundle manually](#configuring-the-additional-trust-bundle-manually) section for more information. + +The `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable can be set when initializing the CAPX provider or when creating a workload cluster. If the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` is configured when the CAPX provider is initialized, the additional trust bundle will be used for every CAPX workload cluster. If it is only configured when creating a workload cluster, it will only be applicable for that specific workload cluster. + + +### Configuring the trust bundle environment variable + +Create a PEM encoded file containing the root certificate and all intermediate certificates. Example: +``` +$ cat cert.crt +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +-----BEGIN CERTIFICATE----- + +-----END CERTIFICATE----- +``` + +Use a `base64` tool to encode these contents in base64. The command below will provide a `base64` string. 
+``` +$ cat cert.crt | base64 + +``` +!!! note + Make sure the `base64` string does not contain any newlines (`\n`). If the output string contains newlines, remove them manually or check the manual of the `base64` tool on how to generate a `base64` string without newlines. + +Use the `base64` string as value for the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable. +``` +$ export NUTANIX_ADDITIONAL_TRUST_BUNDLE="" +``` + +### Configuring the additional trust bundle manually + +To configure the additional trust bundle manually without using the `NUTANIX_ADDITIONAL_TRUST_BUNDLE` environment variable present in the default `cluster-template` files, it is required to: + +- Create a `ConfigMap` containing the additional trust bundle. +- Configure the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec. + +#### Creating the additional trust bundle ConfigMap + +CAPX supports two different formats for the ConfigMap containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the NutanixCluster spec + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `NutanixCluster` spec. Add the `prismCentral.additionalTrustBundle` object in the `NutanixCluster` spec as shown below. 
Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + ... + prismCentral: + ... + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + insecure: false +``` + +!!! note + The default value of the `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + + If the `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. + + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `NutanixCluster` spec. Certificate verification will be disabled even if an additional trust bundle is configured. + +Disabled certificate verification example: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + ... + insecure: true + ... +``` \ No newline at end of file diff --git a/docs/capx/v1.8.x/port_requirements.md b/docs/capx/v1.8.x/port_requirements.md new file mode 100644 index 00000000..af182abb --- /dev/null +++ b/docs/capx/v1.8.x/port_requirements.md @@ -0,0 +1,19 @@ +# Port Requirements + +CAPX uses the ports documented below to create workload clusters. + +!!! note + This page only documents the ports specifically required by CAPX and does not provide the full overview of all ports required in the CAPI framework. 
+ +## Management cluster + +| Source | Destination | Protocol | Port | Description | +|--------------------|---------------------|----------|------|--------------------------------------------------------------------------------------------------| +| Management cluster | External Registries | TCP | 443 | Pull container images from [CAPX public registries](#public-registries-utilized-when-using-capx) | +| Management cluster | Prism Central | TCP | 9440 | Management cluster communication to Prism Central | + +## Public registries utilized when using CAPX + +| Registry name | +|---------------| +| ghcr.io | diff --git a/docs/capx/v1.8.x/tasks/capx_v18x_upgrade_procedure.md b/docs/capx/v1.8.x/tasks/capx_v18x_upgrade_procedure.md new file mode 100644 index 00000000..0f0e6154 --- /dev/null +++ b/docs/capx/v1.8.x/tasks/capx_v18x_upgrade_procedure.md @@ -0,0 +1,83 @@ +# CAPX v1.8.x Upgrade Procedure + +Starting from CAPX v1.3.0, it is required for all CAPX-managed Kubernetes clusters to use the Nutanix Cloud Controller Manager (CCM). + +Before upgrading CAPX instances to v1.3.0 or later, it is required to follow the [steps](#steps) detailed below for each of the CAPX-managed Kubernetes clusters that don't use Nutanix CCM. + + +## Steps + +This procedure uses [Cluster Resource Set (CRS)](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} to install Nutanix CCM but it can also be installed using the [Nutanix CCM Helm chart](https://artifacthub.io/packages/helm/nutanix/nutanix-cloud-provider){target=_blank}. + +!!! warning + Make sure [CRS](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set){target=_blank} is enabled on the management cluster before following the procedure. + +Perform following steps for each of the CAPX-managed Kubernetes clusters that are not configured to use Nutanix CCM: + +1. 
Add the `cloud-provider: external` configuration in the `KubeadmConfigTemplate` resources: + ```YAML + apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 + kind: KubeadmConfigTemplate + spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + ``` +2. Add the `cloud-provider: external` configuration in the `KubeadmControlPlane` resource: +```YAML +--- +apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 +kind: KubeadmConfigTemplate +spec: + template: + spec: + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +--- +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +spec: + kubeadmConfigSpec: + clusterConfiguration: + apiServer: + extraArgs: + cloud-provider: external + controllerManager: + extraArgs: + cloud-provider: external + initConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external + joinConfiguration: + nodeRegistration: + kubeletExtraArgs: + cloud-provider: external +``` +3. Add the Nutanix CCM CRS resources: + + - [nutanix-ccm-crs.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.8.0/templates/ccm/nutanix-ccm-crs.yaml){target=_blank} + - [nutanix-ccm-secret.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.8.0/templates/ccm/nutanix-ccm-secret.yaml) + - [nutanix-ccm.yaml](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/v1.8.0/templates/ccm/nutanix-ccm.yaml) + + Make sure to update each of the variables before applying the `YAML` files. + +4. Add the `ccm: nutanix` label to the `Cluster` resource: + ```YAML + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + labels: + ccm: nutanix + ``` +5. Verify if the Nutanix CCM pod is up and running: +``` +kubectl get pod -A -l k8s-app=nutanix-cloud-controller-manager +``` +6. 
Trigger a new rollout of the Kubernetes nodes by performing a Kubernetes upgrade or by using `clusterctl alpha rollout restart`. See the [clusterctl alpha rollout](https://cluster-api.sigs.k8s.io/clusterctl/commands/alpha-rollout#restart){target=_blank} documentation for more information. +7. Upgrade CAPX to v1.8.0 by following the [clusterctl upgrade](https://cluster-api.sigs.k8s.io/clusterctl/commands/upgrade.html?highlight=clusterctl%20upgrade%20pla#clusterctl-upgrade){target=_blank} documentation. \ No newline at end of file diff --git a/docs/capx/v1.8.x/tasks/modify_machine_configuration.md b/docs/capx/v1.8.x/tasks/modify_machine_configuration.md new file mode 100644 index 00000000..04a43a95 --- /dev/null +++ b/docs/capx/v1.8.x/tasks/modify_machine_configuration.md @@ -0,0 +1,11 @@ +# Modifying Machine Configurations + +Since all attributes of the `NutanixMachineTemplate` resources are immutable, follow the [Updating Infrastructure Machine Templates](https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html?highlight=machine%20template#updating-infrastructure-machine-templates){target=_blank} procedure to modify the configuration of machines in an existing CAPX cluster. +See the [NutanixMachineTemplate](../types/nutanix_machine_template.md) documentation for all supported configuration parameters. + +!!! note + Manually modifying existing and linked `NutanixMachineTemplate` resources will not trigger a rolling update of the machines. + +!!! note + Do not modify the virtual machine configuration of CAPX cluster nodes manually in Prism/Prism Central. + CAPX will not automatically revert the configuration change, but performing scale-up/scale-down/upgrade operations will override manual modifications. Only use the `Updating Infrastructure Machine Templates` procedure referenced above to perform configuration changes. 
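+
+A minimal sketch of that procedure, assuming an existing template named `my-cluster-mt-0` and a `MachineDeployment` named `my-cluster-wmd` (both hypothetical names, as are the namespace and the changed value): create a copy of the template under a new name with the modified values, then point the `MachineDeployment` at the copy. Only the relevant fields are shown, so these fragments are not complete manifests.
+
+```yaml
+# Copy of the existing template under a new, hypothetical name.
+apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
+kind: NutanixMachineTemplate
+metadata:
+  name: my-cluster-mt-1        # hypothetical; the original was my-cluster-mt-0
+  namespace: default           # hypothetical
+spec:
+  template:
+    spec:
+      memorySize: "8Gi"        # the value being changed (illustrative)
+      # ... all other fields copied unchanged from the original template
+---
+# Fragment of the MachineDeployment that consumes the template.
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineDeployment
+metadata:
+  name: my-cluster-wmd         # hypothetical
+  namespace: default           # hypothetical
+spec:
+  template:
+    spec:
+      infrastructureRef:
+        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
+        kind: NutanixMachineTemplate
+        name: my-cluster-mt-1  # referencing the new template starts the rollout
+```
+
+Because the rollout is driven by the changed `infrastructureRef`, the original template can be left in place until the new machines are healthy.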
+ \ No newline at end of file diff --git a/docs/capx/v1.8.x/topology/capx_multi_pe.md b/docs/capx/v1.8.x/topology/capx_multi_pe.md new file mode 100644 index 00000000..bd52ccd7 --- /dev/null +++ b/docs/capx/v1.8.x/topology/capx_multi_pe.md @@ -0,0 +1,30 @@ +# Creating a workload CAPX cluster spanning Prism Element clusters + +!!! warning + The scenario and features described on this page are experimental. It's important to note that they have not been fully validated. + +This page explains how to deploy CAPX-based Kubernetes clusters where worker nodes span multiple Prism Element (PE) clusters. + +!!! note + All the PE clusters must be managed by the same Prism Central (PC) instance. + +The topology will look like this: + +- One PC managing multiple PEs +- One CAPI management cluster +- One CAPI workload cluster with multiple `MachineDeployment` resources + +Refer to the [CAPI quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to get started with CAPX. + +To create workload clusters spanning multiple Prism Element clusters, it is required to create a `MachineDeployment` and `NutanixMachineTemplate` resource for each Prism Element cluster. The Prism Element specific parameters (name/UUID, subnet, ...) are referenced in the `NutanixMachineTemplate`. + +## Steps +1. Create a management cluster that has the CAPX infrastructure provider deployed. +2. Create a `cluster.yml` file containing the workload cluster definition. Refer to the steps defined in the [CAPI quickstart guide](https://cluster-api.sigs.k8s.io/user/quick-start.html){target=_blank} to create an example `cluster.yml` file. +3. Add additional `MachineDeployment` and `NutanixMachineTemplate` resources. + + By default there is only one machine template and machine deployment defined. To add nodes residing on another Prism Element cluster, a new `MachineDeployment` and `NutanixMachineTemplate` resource need to be added to the `cluster.yml` file. 
The autogenerated `MachineDeployment` and `NutanixMachineTemplate` resource definitions can be used as a baseline. + + Make sure to modify the `MachineDeployment` and `NutanixMachineTemplate` parameters. + +4. Apply the modified `cluster.yml` file to the management cluster. diff --git a/docs/capx/v1.8.x/troubleshooting.md b/docs/capx/v1.8.x/troubleshooting.md new file mode 100644 index 00000000..c023d13e --- /dev/null +++ b/docs/capx/v1.8.x/troubleshooting.md @@ -0,0 +1,13 @@ +# Troubleshooting + +## Clusterctl failed with GitHub rate limit error + +By design Clusterctl fetches artifacts from repositories hosted on GitHub, this operation is subject to [GitHub API rate limits](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting){target=_blank}. + +While this is generally okay for the majority of users, there is still a chance that some users (especially developers or CI tools) hit this limit: + +``` +Error: failed to get repository client for the XXX with name YYY: error creating the GitHub repository client: failed to get GitHub latest version: failed to get the list of versions: rate limit for github api has been reached. Please wait one hour or get a personal API tokens a assign it to the GITHUB_TOKEN environment variable +``` + +As explained in the error message, you can increase your API rate limit by [creating a GitHub personal token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token){target=_blank} and setting a `GITHUB_TOKEN` environment variable using the token. diff --git a/docs/capx/v1.8.x/types/nutanix_cluster.md b/docs/capx/v1.8.x/types/nutanix_cluster.md new file mode 100644 index 00000000..daa8d8cc --- /dev/null +++ b/docs/capx/v1.8.x/types/nutanix_cluster.md @@ -0,0 +1,55 @@ +# NutanixCluster + +The `NutanixCluster` resource defines the configuration of a CAPX Kubernetes cluster. 
+ +Example of a `NutanixCluster` resource: + +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixCluster +metadata: + name: ${CLUSTER_NAME} + namespace: ${NAMESPACE} +spec: + controlPlaneEndpoint: + host: ${CONTROL_PLANE_ENDPOINT_IP} + port: ${CONTROL_PLANE_ENDPOINT_PORT=6443} + prismCentral: + address: ${NUTANIX_ENDPOINT} + additionalTrustBundle: + kind: ConfigMap + name: user-ca-bundle + credentialRef: + kind: Secret + name: ${CLUSTER_NAME} + insecure: ${NUTANIX_INSECURE=false} + port: ${NUTANIX_PORT=9440} +``` + +## NutanixCluster spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixCluster` resource. + +### Configuration parameters + +| Key |Type |Description | +|--------------------------------------------|------|----------------------------------------------------------------------------------| +|controlPlaneEndpoint |object|Defines the host IP and port of the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.host |string|Host IP to be assigned to the CAPX Kubernetes cluster. | +|controlPlaneEndpoint.port |int |Port of the CAPX Kubernetes cluster. Default: `6443` | +|prismCentral |object|(Optional) Prism Central endpoint definition. | +|prismCentral.address |string|IP/FQDN of Prism Central. | +|prismCentral.port |int |Port of Prism Central. Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking. Default: `false` | +|prismCentral.credentialRef |object|Reference to credentials used for Prism Central connection. | +|prismCentral.credentialRef.kind |string|Kind of the credentialRef. Allowed value: `Secret` | +|prismCentral.credentialRef.name |string|Name of the secret containing the Prism Central credentials. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret containing the Prism Central credentials. 
| +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace|string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle.| +|controlPlaneFailureDomains |list |(Optional) List of local references to failure domains for control plane nodes. | +|controlPlaneFailureDomains.[].name |string|Name of the failure domain used for control plane nodes. | + +!!! note + To prevent duplicate IP assignments, it is required to assign an IP address to the `controlPlaneEndpoint.host` variable that is not part of the Nutanix IPAM or DHCP range assigned to the subnet of the CAPX cluster. \ No newline at end of file diff --git a/docs/capx/v1.8.x/types/nutanix_failure_domains.md b/docs/capx/v1.8.x/types/nutanix_failure_domains.md new file mode 100644 index 00000000..cefae92c --- /dev/null +++ b/docs/capx/v1.8.x/types/nutanix_failure_domains.md @@ -0,0 +1,99 @@ +# NutanixFailureDomain + +The `NutanixFailureDomain` resource defines the configuration of a CAPX Kubernetes failure domain. + +Example of a `NutanixFailureDomain` resource: +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: "${FAILURE_DOMAIN_NAME}" + namespace: "${CLUSTER_NAMESPACE}" +spec: + prismElementCluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnets: + - type: uuid + uuid: "${NUTANIX_SUBNET_UUID}" + - type: name + name: "${NUTANIX_SUBNET_NAME}" +``` + +## NutanixFailureDomain spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixFailureDomain` resource. 
+ +### Configuration parameters +| Key |Type |Description | +|--------------------------------------------|------|--------------------------------------------------------------------------------------------| +|prismElementCluster |object|Identifies the Prism Element cluster in Prism Central for the failure domain. | +|prismElementCluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|prismElementCluster.name |string|Name of the Prism Element cluster. | +|prismElementCluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | + +!!! note + The `NutanixFailureDomain` resource allows you to define logical groupings of Nutanix infrastructure for high availability and workload placement in Kubernetes clusters managed by CAPX. Each failure domain maps to a Prism Element cluster and a set of subnets, ensuring that workloads can be distributed across different infrastructure segments. + +## Usage Notes + +- The `prismElementCluster` field is **required** and must specify either the `name` or `uuid` of the Prism Element cluster. +- The `subnets` field is **required**. You can provide one or more subnets by `name` or `uuid`. +- Failure domains are used by Cluster API to spread machines across different infrastructure segments for resilience. 
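+
+As a sketch of how a failure domain is consumed, the fragment below assumes a `NutanixFailureDomain` named `fd-uuid` already exists and uses hypothetical cluster and `MachineDeployment` names. Control plane nodes pick it up through the `controlPlaneFailureDomains` list of the `NutanixCluster`, while worker nodes are assumed to reference it through the standard Cluster API `failureDomain` field of the `MachineDeployment`. Only the fields relevant to failure domains are shown, so these are not complete manifests.
+
+```yaml
+# Control plane nodes: reference the failure domain from the NutanixCluster.
+apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
+kind: NutanixCluster
+metadata:
+  name: my-cluster             # hypothetical name
+spec:
+  controlPlaneFailureDomains:
+    - name: fd-uuid            # name of an existing NutanixFailureDomain
+---
+# Worker nodes: set the failure domain on the MachineDeployment template.
+apiVersion: cluster.x-k8s.io/v1beta1
+kind: MachineDeployment
+metadata:
+  name: my-cluster-wmd-fd1     # hypothetical name
+spec:
+  template:
+    spec:
+      failureDomain: fd-uuid   # assumed to match the NutanixFailureDomain name
+```
+
+Defining one `MachineDeployment` per failure domain keeps the placement of each worker pool explicit.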
+ +## Example Scenarios + +### Single Subnet by UUID + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-uuid +spec: + prismElementCluster: + type: uuid + uuid: "00000000-0000-0000-0000-000000000000" + subnets: + - type: uuid + uuid: "11111111-1111-1111-1111-111111111111" +``` + +### Multiple Subnets by Name + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-names +spec: + prismElementCluster: + type: name + name: "PrismClusterA" + subnets: + - type: name + name: "SubnetA" + - type: name + name: "SubnetB" +``` + +### Multiple Subnets by Name and UUID + +```yaml +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixFailureDomain +metadata: + name: fd-names +spec: + prismElementCluster: + type: name + name: "PrismClusterA" + subnets: + - type: name + name: "SubnetA" + - type: uuid + uuid: "11111111-1111-1111-1111-111111111111" +``` \ No newline at end of file diff --git a/docs/capx/v1.8.x/types/nutanix_machine_template.md b/docs/capx/v1.8.x/types/nutanix_machine_template.md new file mode 100644 index 00000000..4aa613b8 --- /dev/null +++ b/docs/capx/v1.8.x/types/nutanix_machine_template.md @@ -0,0 +1,124 @@ +# NutanixMachineTemplate +The `NutanixMachineTemplate` resource defines the configuration of a CAPX Kubernetes VM. + +Example of a `NutanixMachineTemplate` resource: 
+ +```YAML +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: NutanixMachineTemplate +metadata: + name: "${CLUSTER_NAME}-mt-0" + namespace: "${NAMESPACE}" +spec: + template: + spec: + providerID: "nutanix://${CLUSTER_NAME}-m1" + # Supported options for boot type: legacy and uefi + # Defaults to legacy if not set + bootType: ${NUTANIX_MACHINE_BOOT_TYPE=legacy} + vcpusPerSocket: ${NUTANIX_MACHINE_VCPU_PER_SOCKET=1} + vcpuSockets: ${NUTANIX_MACHINE_VCPU_SOCKET=2} + memorySize: "${NUTANIX_MACHINE_MEMORY_SIZE=4Gi}" + systemDiskSize: "${NUTANIX_SYSTEMDISK_SIZE=40Gi}" + image: + type: name + name: "${NUTANIX_MACHINE_TEMPLATE_IMAGE_NAME}" + cluster: + type: name + name: "${NUTANIX_PRISM_ELEMENT_CLUSTER_NAME}" + subnet: + - type: name + name: "${NUTANIX_SUBNET_NAME}" + # Adds additional categories to the virtual machines. + # Note: Categories must already be present in Prism Central + # additionalCategories: + # - key: AppType + # value: Kubernetes + # Adds the cluster virtual machines to a project defined in Prism Central. + # Replace NUTANIX_PROJECT_NAME with the correct project defined in Prism Central + # Note: Project must already be present in Prism Central. + # project: + # type: name + # name: "NUTANIX_PROJECT_NAME" + # gpus: + # - type: name + # name: "GPU NAME" + # Note: Either of `image` or `imageLookup` must be set, but not both. + # imageLookup: + # format: "NUTANIX_IMAGE_LOOKUP_FORMAT" + # baseOS: "NUTANIX_IMAGE_LOOKUP_BASE_OS" + # dataDisks: + # - diskSize: + # deviceProperties: + # deviceType: Disk + # adapterType: SCSI + # deviceIndex: 1 + # storageConfig: + # diskMode: Standard + # storageContainer: + # type: name + # name: "NUTANIX_VM_DISK_STORAGE_CONTAINER" + # dataSource: + # type: name + # name: "NUTANIX_DATA_SOURCE_IMAGE_NAME" +``` + +## NutanixMachineTemplate spec +The table below provides an overview of the supported parameters of the `spec` attribute of a `NutanixMachineTemplate` resource. 
+ +### Configuration parameters +| Key |Type |Description | +|----------------------------------------------------|------|--------------------------------------------------------------------------------------------------------| +|bootType |string|Boot type of the VM. Depends on the OS image used. Allowed values: `legacy`, `uefi`. Default: `legacy` | +|vcpusPerSocket |int |Amount of vCPUs per socket. Default: `1` | +|vcpuSockets |int |Amount of vCPU sockets. Default: `2` | +|memorySize |string|Amount of Memory. Default: `4Gi` | +|systemDiskSize |string|Amount of storage assigned to the system disk. Default: `40Gi` | +|image |object|Reference (name or uuid) to the OS image used for the system disk. | +|image.type |string|Type to identify the OS image. Allowed values: `name` and `uuid` | +|image.name |string|Name of the image. | +|image.uuid |string|UUID of the image. | +|cluster |object|(Optional) Reference (name or uuid) to the Prism Element cluster. Name or UUID can be passed | +|cluster.type |string|Type to identify the Prism Element cluster. Allowed values: `name` and `uuid` | +|cluster.name |string|Name of the Prism Element cluster. | +|cluster.uuid |string|UUID of the Prism Element cluster. | +|subnets |list |(Optional) Reference (name or uuid) to the subnets to be assigned to the VMs. | +|subnets.[].type |string|Type to identify the subnet. Allowed values: `name` and `uuid` | +|subnets.[].name |string|Name of the subnet. | +|subnets.[].uuid |string|UUID of the subnet. | +|additionalCategories |list |Reference to the categories to be assigned to the VMs. These categories already exist in Prism Central. | +|additionalCategories.[].key |string|Key of the category. | +|additionalCategories.[].value |string|Value of the category. | +|project |object|Reference (name or uuid) to the project. This project must already exist in Prism Central. | +|project.type |string|Type to identify the project. 
Allowed values: `name` and `uuid` | +|project.name |string|Name of the project. | +|project.uuid |string|UUID of the project. | +|gpus |object|Reference (name or deviceID) to the GPUs to be assigned to the VMs. Can be vGPU or Passthrough. | +|gpus.[].type |string|Type to identify the GPU. Allowed values: `name` and `deviceID` | +|gpus.[].name |string|Name of the GPU or the vGPU profile | +|gpus.[].deviceID |string|DeviceID of the GPU or the vGPU profile | +|imageLookup |object|(Optional) Reference to a container that holds how to look up rhcos images for the cluster. | +|imageLookup.format |string|Naming format to look up the image for the machine. Default: `capx-{{.BaseOS}}-{{.K8sVersion}}-*` | +|imageLookup.baseOS |string|Name of the base operating system to use for image lookup. | +|dataDisks |list |(Optional) Reference to the data disks to be attached to the VM. | +|dataDisks.[].diskSize |string|Size (in Quantity format) of the disk attached to the VM. The minimum diskSize is `1GB`. | +|dataDisks.[].deviceProperties |object|(Optional) Reference to the properties of the disk device. | +|dataDisks.[].deviceProperties.deviceType |string|VM disk device type. Allowed values: `Disk` (default) and `CDRom` | +|dataDisks.[].deviceProperties.adapterType |string|Adapter type of the disk address. | +|dataDisks.[].deviceProperties.deviceIndex |int |(Optional) Index of the disk address. Allowed values: non-negative integers (default: `0`) | +|dataDisks.[].storageConfig |object|(Optional) Reference to the storage configuration parameters of the VM disks. | +|dataDisks.[].storageConfig.diskMode |string|Specifies the disk mode. Allowed values: `Standard` (default) and `Flash` | +|dataDisks.[].storageConfig.storageContainer |object|(Optional) Reference (name or uuid) to the storage_container used by the VM disk. | +|dataDisks.[].storageConfig.storageContainer.type |string|Type to identify the storage container. 
Allowed values: `name` and `uuid` | +|dataDisks.[].storageConfig.storageContainer.name |string|Name of the storage container. | +|dataDisks.[].storageConfig.storageContainer.uuid |string|UUID of the storage container. | +|dataDisks.[].dataSource |object|(Optional) Reference (name or uuid) to a data source image for the VM disk. | +|dataDisks.[].dataSource.type |string|Type to identify the data source image. Allowed values: `name` and `uuid` | +|dataDisks.[].dataSource.name |string|Name of the data source image. | +|dataDisks.[].dataSource.uuid |string|UUID of the data source image. | + +!!! note + - The `cluster` or `subnets` configuration parameters are optional in case failure domains are defined on the `NutanixCluster` and `MachineDeployment` resources. + - If the `deviceType` is `Disk`, the valid `adapterType` can be `SCSI`, `IDE`, `PCI`, `SATA` or `SPAPR`. If the `deviceType` is `CDRom`, the valid `adapterType` can be `IDE` or `SATA`. + - Either of `image` or `imageLookup` must be set, but not both. + - For a Machine VM, the `deviceIndex` for the disks with the same `deviceType.adapterType` combination should start from `0` and increase consecutively afterwards. Note that for each Machine VM, the `Disk.SCSI.0` and `CDRom.IDE.0` are reserved to be used by the VM's system. So for `dataDisks` of Disk.SCSI and CDRom.IDE, the `deviceIndex` should start from `1`. \ No newline at end of file diff --git a/docs/capx/v1.8.x/user_requirements.md b/docs/capx/v1.8.x/user_requirements.md new file mode 100644 index 00000000..6ee9b802 --- /dev/null +++ b/docs/capx/v1.8.x/user_requirements.md @@ -0,0 +1,67 @@ +# User Requirements + +Cluster API Provider Nutanix Cloud Infrastructure (CAPX) interacts with Nutanix Prism Central (PC) APIs using a Prism Central user account. + +CAPX supports two types of PC users: + +- Local users: must be assigned the `Prism Central Admin` role. 
+- Domain users: must be assigned a role that at least has the [Minimum required CAPX permissions for domain users](#minimum-required-capx-permissions-for-domain-users) assigned. + +See [Credential Management](./credential_management.md){target=_blank} for more information on how to pass the user credentials to CAPX. + +## Minimum required CAPX permissions for domain users + +The following permissions are required for Prism Central domain users: + +- Create Category +- View Cluster Pgpu Profiles +- View Cluster Vgpu Profiles +- Create Image +- Create New Virtual Machine +- Delete Image +- Delete Category +- Delete Virtual Machine +- Detach Volume Group From AHV VM +- Power On Virtual Machine +- View Category +- View Cluster +- View Image +- View Project +- View Subnet +- View Virtual Machine + +!!! note + The list of permissions has been validated on PC 7.3 and above. + +## CAPX v1.8.x Upgrade Requirements + +When upgrading CAPX v1.7.x to v1.8.x, users must meet the following additional requirements: + +The following permissions are required for Prism Central domain users: + +- Create Category +- Create Category Mapping +- Create Image +- Create New Virtual Machine +- Create Or Update Name Category +- Create Or Update Value Category +- Create Virtual Machine +- Delete Category +- Delete Category Mapping +- Delete Image +- Delete Name Category +- Delete Value Category +- Delete Virtual Machine +- Detach Volume Group From AHV VM +- Power On Virtual Machine +- View Category +- View Category Mapping +- View Cluster +- View Cluster Pgpu Profiles +- View Cluster Vgpu Profiles +- View Image +- View Name Category +- View Project +- View Subnet +- View Value Category +- View Virtual Machine diff --git a/docs/capx/v1.8.x/validated_integrations.md b/docs/capx/v1.8.x/validated_integrations.md new file mode 100644 index 00000000..8e407150 --- /dev/null +++ b/docs/capx/v1.8.x/validated_integrations.md @@ -0,0 +1,56 @@ +# Validated Integrations + +Validated integrations are a 
defined set of specifically tested configurations between technologies that represent the most common combinations that Nutanix customers are using or deploying with CAPX. For these integrations, Nutanix has directly, or through certified partners, exercised a full range of platform tests as part of the product release process. + +## Integration Validation Policy + +Nutanix follows the version validation policies below: + +- Validate at least one active AOS LTS (long term support) version. Validated AOS LTS version for a specific CAPX version is listed in the [AOS](#aos) section.
+ + !!! note + + Typically the latest LTS release at the time of the CAPX release, except when the latest is the initial release in a train (e.g. x.y.0). The exact version depends on timing and customer adoption. + +- Validate the latest AOS STS (short term support) release at the time of the CAPX release. +- Validate at least one active Prism Central (PC) version. The validated PC version for a specific CAPX version is listed in the [Prism Central](#prism-central) section.
+ + !!! note + + Typically the latest PC release at the time of the CAPX release, except when the latest is the initial release in a train (e.g. x.y.0). The exact version depends on timing and customer adoption. + +- Validate at least one active Cluster-API (CAPI) version. The validated CAPI version for a specific CAPX version is listed in the [Cluster-API](#cluster-api) section.
+ + !!! note + + Typically the latest Cluster-API release at the time of the CAPX release, except when the latest is the initial release in a train (e.g. x.y.0). The exact version depends on timing and customer adoption. + +## Validated versions +### Cluster-API +| CAPX | CAPI v1.8.x | CAPI v1.9.x | CAPI v1.10.x | +|--------|-------------|-------------|--------------| +| v1.8.x | Yes | Yes | Yes | +| v1.7.x | Yes | Yes | Yes | +| v1.6.x | Yes | Yes | No | + +See the [Validated Kubernetes Versions](https://cluster-api.sigs.k8s.io/reference/versions.html?highlight=version#supported-kubernetes-versions){target=_blank} page for more information on CAPI validated versions. + +### AOS + +| CAPX | 6.5.x (LTS) | 6.8 (STS) | 6.10 | 7.0 | 7.3 | +|--------|-------------|-----------|------|-----|-----| +| v1.8.x | No | No | No | No | Yes | +| v1.7.x | No | Yes | Yes | Yes | Yes | +| v1.6.x | No | Yes | Yes | Yes | Yes | + +!!! warning "Cloud-Init Compatibility with AOS 7.3" + + When using CAPX v1.8.x with AOS 7.3, operating systems that do not use cloud-config for cloud-init may experience issues. Ensure your OS images are configured to use cloud-config format for cloud-init to avoid compatibility problems. + +### Prism Central + +| CAPX | pc.2022.6 | pc.2023.x | pc.2024.x | pc.7.3 | +|--------|-----------|-----------|-----------|--------| +| v1.8.x | No | No | No | Yes | +| v1.7.x | No | Yes | Yes | Yes | +| v1.6.x | No | Yes | Yes | Yes | diff --git a/docs/ccm/latest b/docs/ccm/latest index 7157a9c5..88fe387d 120000 --- a/docs/ccm/latest +++ b/docs/ccm/latest @@ -1 +1 @@ -v0.3.x \ No newline at end of file +v0.5.x \ No newline at end of file diff --git a/docs/ccm/v0.4.x/ccm_configuration.md b/docs/ccm/v0.4.x/ccm_configuration.md new file mode 100644 index 00000000..63e1b714 --- /dev/null +++ b/docs/ccm/v0.4.x/ccm_configuration.md @@ -0,0 +1,66 @@ +# Nutanix CCM Configuration + +Nutanix CCM can be configured via a `JSON` formatted file stored in a configmap called `nutanix-config`. 
This configmap is located in the same namespace as the Nutanix CCM deployment. See the `manifests/cloud-provider-nutanix-deployment.yaml` file for details on the Nutanix CCM deployment. + +Example `nutanix-config` configmap: +```YAML +--- +kind: ConfigMap +apiVersion: v1 +metadata: + name: nutanix-config + namespace: kube-system +data: + nutanix_config.json: |- + { + "prismCentral": { + "address": "${NUTANIX_ENDPOINT}", + "port": ${NUTANIX_PORT}, + "insecure": ${NUTANIX_INSECURE}, + "credentialRef": { + "kind": "secret", + "name": "nutanix-creds" + }, + "additionalTrustBundle": { + "kind": "ConfigMap", + "name": "user-ca-bundle" + } + }, + "enableCustomLabeling": false, + "ignoredNodeIPs": [], + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "${NUTANIX_REGION_CATEGORY}", + "zoneCategory": "${NUTANIX_ZONE_CATEGORY}" + } + } + } + +``` + +The table below provides an overview of the supported configuration parameters. + +### Configuration parameters + +| Key |Type |Description | +|---------------------------------------------------|------|------------------------------------------------------------------------------------------------------------------------------------------------------| +|topologyDiscovery |object|(Optional) Configures the topology discovery mode.
`Prism` topology discovery is used by default if the `topologyDiscovery` attribute is not set. | +|topologyDiscovery.type |string|Topology Discovery mode. Can be `Prism` or `Categories`. See [Topology Discovery](./topology_discovery.md) for more information. | +|topologyDiscovery.topologyCategories |object|Required if topology discovery mode is `Categories`.
| +|topologyDiscovery.topologyCategories.regionCategory|string|Category key defining the region of the Kubernetes node. | +|topologyDiscovery.topologyCategories.zoneCategory |string|Category key defining the zone of the Kubernetes node. | +|enableCustomLabeling |bool |Boolean value to enable custom labeling. See [Custom Labeling](./custom_labeling.md) for more information.
Default: `false` | +|ignoredNodeIPs |array |List of node IPs to ignore. Optional. | +|prismCentral |object|Prism Central endpoint configuration. | +|prismCentral.address |string|FQDN/IP of the Prism Central endpoint. | +|prismCentral.port |int |Port to connect to Prism Central.
Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking.
Default: `false` | +|prismCentral.credentialRef |object|Prism Central credential configuration. See [Credentials](./ccm_credentials.md) for more information. | +|prismCentral.credentialRef.kind |string|Credential kind.
Allowed value: `secret` | +|prismCentral.credentialRef.name |string|Name of the secret. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret. | +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace |string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle. See [Certificate Trust](./pc_certificates.md) for more information.| \ No newline at end of file diff --git a/docs/ccm/v0.4.x/ccm_credentials.md b/docs/ccm/v0.4.x/ccm_credentials.md new file mode 100644 index 00000000..7bda06e2 --- /dev/null +++ b/docs/ccm/v0.4.x/ccm_credentials.md @@ -0,0 +1,29 @@ +# Credentials + +Nutanix CCM requires credentials to connect to Prism Central. These credentials need to be stored in a secret in the following format: + +```YAML +--- +apiVersion: v1 +kind: Secret +metadata: + name: nutanix-creds + namespace: kube-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "$NUTANIX_USERNAME", + "password": "$NUTANIX_PASSWORD" + }, + "prismElements": null + } + } + ] + +``` + +See [Requirements](./requirements.md) for more information on the required permissions. \ No newline at end of file diff --git a/docs/ccm/v0.4.x/custom_labeling.md b/docs/ccm/v0.4.x/custom_labeling.md new file mode 100644 index 00000000..4db89462 --- /dev/null +++ b/docs/ccm/v0.4.x/custom_labeling.md @@ -0,0 +1,14 @@ +# Custom Labeling + +Enabling the Nutanix CCM custom labeling feature will add additional labels to the Kubernetes nodes. 
See [Nutanix CCM Configuration](./ccm_configuration.md) for more information on how to configure CCM to enable custom labeling. + +The following labels will be added: + +|Label |Description | +|------------------------------|-----------------------------------------------------------------| +|nutanix.com/prism-element-uuid|UUID of the Prism Element cluster hosting the Kubernetes node VM.| +|nutanix.com/prism-element-name|Name of the Prism Element cluster hosting the Kubernetes node VM.| +|nutanix.com/prism-host-uuid |UUID of the Prism AHV host hosting the Kubernetes node VM. | +|nutanix.com/prism-host-name |Name of the Prism AHV host hosting the Kubernetes node VM. | + +Nutanix CCM will reconcile the labels periodically. \ No newline at end of file diff --git a/docs/ccm/v0.4.x/overview.md b/docs/ccm/v0.4.x/overview.md new file mode 100644 index 00000000..02a426c7 --- /dev/null +++ b/docs/ccm/v0.4.x/overview.md @@ -0,0 +1,22 @@ +# Overview + +Nutanix CCM provides Cloud Controller Manager functionality to Kubernetes clusters running on the Nutanix AHV hypervisor. Visit the [Kubernetes Cloud Controller Manager](https://kubernetes.io/docs/concepts/architecture/cloud-controller/) documentation for more information about the general design of a Kubernetes CCM. + +Nutanix CCM communicates with Prism Central (PC) to fetch all required information. See the [Requirements](./requirements.md) page for more details. 
+ +## Nutanix CCM functionality + +|Version|Node Controller|Route Controller|Service Controller| +|-------|---------------|----------------|------------------| +|v0.4.x |Yes |No |No | +|v0.3.x |Yes |No |No | +|v0.2.x |Yes |No |No | + + +Nutanix CCM specific features: + +|Version|[Topology Discovery](./topology_discovery.md)|[Custom Labeling](./custom_labeling.md)| +|-------|---------------------------------------------|---------------------------------------| +|v0.4.x |Prism, Categories |Yes | +|v0.3.x |Prism, Categories |Yes | +|v0.2.x |Prism, Categories |Yes | \ No newline at end of file diff --git a/docs/ccm/v0.4.x/pc_certificates.md b/docs/ccm/v0.4.x/pc_certificates.md new file mode 100644 index 00000000..be9071bf --- /dev/null +++ b/docs/ccm/v0.4.x/pc_certificates.md @@ -0,0 +1,104 @@ +# Certificate Trust + +CCM invokes Prism Central APIs using the HTTPS protocol. CCM has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +## Enable certificate verification (default) +By default CCM will perform certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a publicly trusted certificate authority. +No additional configuration is required in CCM. + +## Configure an additional trust bundle +CCM allows users to configure an additional trust bundle. This will allow CCM to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, see the [Configuring the additional trust bundle](#configuring-the-additional-trust-bundle) section for more information. 
+ + +### Configuring the additional trust bundle + +To configure the additional trust bundle it is required to: + +- Create a `ConfigMap` containing the additional trust bundle +- Configure the `prismCentral.additionalTrustBundle` object in the CCM `ConfigMap` called `nutanix-config`. + +#### Creating the additional trust bundle ConfigMap + +CCM supports two different formats for the `ConfigMap` containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the CCM for an additional trust bundle + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `nutanix-config` `ConfigMap`. Add the `prismCentral.additionalTrustBundle` object as shown below. Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```JSON + ... + "prismCentral": { + ... + "additionalTrustBundle": { + "kind": "ConfigMap", + "name": "user-ca-bundle" + } + }, + ... +``` + +!!! note + The default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! 
note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. + + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `nutanix-config` `ConfigMap`. When this attribute is set to `true`, certificate verification is disabled even if an additional trust bundle is configured. + +Example of how to disable certificate verification: + +```JSON +... +"prismCentral": { + ... + "insecure": true +}, +... +``` \ No newline at end of file diff --git a/docs/ccm/v0.4.x/requirements.md b/docs/ccm/v0.4.x/requirements.md new file mode 100644 index 00000000..0bb867e2 --- /dev/null +++ b/docs/ccm/v0.4.x/requirements.md @@ -0,0 +1,33 @@ +# Requirements + +This section provides an overview of the requirements for Nutanix CCM: + +## Port requirements + +Nutanix CCM uses Prism Central APIs to fetch the required information for the Kubernetes nodes. As a result, the Kubernetes nodes need to have access to the Prism Central endpoint that is configured in the `nutanix-config` configmap. + +|Source |Destination |Protocol |Port |Description | +|------------------|--------------------|----------|-----|----------------------------------------| +|Kubernetes nodes |Prism Central |TCP |9440 |Nutanix CCM communication to Prism Central| + +## User permissions +Nutanix CCM will only perform read operations and requires a user account with an assigned `Viewer` role to consume Prism Central APIs. + +### Required roles: Local user + +|Role |Required| +|-------------------|--------| +|User Admin |No | +|Prism Central Admin|No | + +!!! 
note + + For local users, if no role is assigned, the local user will only get `Viewer` permissions. + +### Required roles: Directory user + +Assign the following role in the user role-mapping if a non-local user is required: + +|Role |Required| +|-------------------|--------| +|Viewer |Yes | diff --git a/docs/ccm/v0.4.x/topology_discovery.md b/docs/ccm/v0.4.x/topology_discovery.md new file mode 100644 index 00000000..7349e5b7 --- /dev/null +++ b/docs/ccm/v0.4.x/topology_discovery.md @@ -0,0 +1,124 @@ +# Topology Discovery + +One of the responsibilities of the CCM node controller is to annotate and label the nodes in a Kubernetes cluster with topology (region and zone) information. The Nutanix Cloud Controller Manager supports the following topology discovery methods: + +- [Prism](#prism) +- [Categories](#categories) + +The topology discovery method can be configured via the `nutanix-config` configmap. See [Nutanix CCM Configuration](./ccm_configuration.md) for more information on the configuration parameters. + +## Prism + +Prism-based topology discovery is the default mode for Nutanix CCM. In this mode CCM will discover the Prism Element (PE) cluster and Prism Central (PC) instance that host the Kubernetes node VM. Prism Central is configured as the region for the node, while Prism Element is configured as the zone. 
+ +Prism-based topology discovery can be configured by omitting the `topologyDiscovery` attribute from the `nutanix-config` configmap or by passing the following object: +```JSON + "topologyDiscovery": { + "type": "Prism" + } +``` + +### Example +If a Kubernetes Node VM is hosted on PC `my-pc-instance` and PE `my-pe-cluster-1`, Nutanix CCM will assign the following labels to the Kubernetes node: + +|Key |Value | +|-----------------------------|---------------| +|topology.kubernetes.io/region|my-pc-instance | +|topology.kubernetes.io/zone |my-pe-cluster-1| + +## Categories + +The category-based topology discovery mode allows users to assign categories to Prism Element clusters and Kubernetes Node VMs to define a custom topology. Nutanix CCM will hierarchically search for the required categories on the VM/PE. + +!!! note + + Categories assigned to the VM object will take precedence over the categories assigned to the PE cluster. + +The categories must already exist in the PC environment. CCM will not create or assign the categories. +Visit the [Prism Central documentation](https://portal.nutanix.com/page/documents/details?targetId=Prism-Central-Guide-vpc_2022_6:ssp-ssp-categories-manage-pc-c.html){target=_blank} for more information regarding categories. 
+ +To enable the Categories topology discovery mode for Nutanix CCM, provide the following information in the `topologyDiscovery` attribute: + +```JSON + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "${NUTANIX_REGION_CATEGORY}", + "zoneCategory": "${NUTANIX_ZONE_CATEGORY}" + } + } +``` + +### Example + +Define a set of categories in PC that will be used for topology discovery: + +|Key |Value | +|------------------|-----------------------| +|my-region-category|region-1, region-2 | +|my-zone-category |zone-1, zone-2, zone-3 | + +Assign the categories to the Nutanix entities: + +|Nutanix entity |Categories | +|---------------|------------------------------------------------------| +|my-pe-cluster-1|my-region-category:region-1
my-zone-category:zone-2| +|my-pe-cluster-2|my-region-category:region-2
my-zone-category:zone-3| +|k8s-node-3 |my-region-category:region-2
my-zone-category:zone-2| +|k8s-node-4 |my-zone-category:zone-1 | + + +Configure CCM to use categories for topology discovery: +```JSON + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "my-region-category", + "zoneCategory": "my-zone-category" + } + } +``` + +!!! example "Scenario 1: Kubernetes node k8s-node-1 is running on my-pe-cluster-1" + + The following topology labels will be assigned to Kubernetes node `k8s-node-1`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-1 | + |topology.kubernetes.io/zone |zone-2 | + + Categories assigned to PE will be used. + +!!! example "Scenario 2: Kubernetes node k8s-node-2 is running on my-pe-cluster-2" + + The following topology labels will be assigned to Kubernetes node `k8s-node-2`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-2 | + |topology.kubernetes.io/zone |zone-3 | + + Categories assigned to PE will be used. + +!!! example "Scenario 3: Kubernetes node k8s-node-3 is running on my-pe-cluster-2" + + The following topology labels will be assigned to Kubernetes node `k8s-node-3`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-2 | + |topology.kubernetes.io/zone |zone-2 | + + Categories assigned to the VM will be used. + +!!! example "Scenario 4: Kubernetes node k8s-node-4 is running on my-pe-cluster-1" + + The following topology labels will be assigned to Kubernetes node `k8s-node-4`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-1 | + |topology.kubernetes.io/zone |zone-1 | + + In this scenario Nutanix CCM will use the value of the `my-zone-category` category that is assigned to the VM. Since the `my-region-category` is not assigned to the VM, Nutanix CCM will search for the category on PE and use the corresponding category value. 
\ No newline at end of file diff --git a/docs/ccm/v0.5.x/ccm_configuration.md b/docs/ccm/v0.5.x/ccm_configuration.md new file mode 100644 index 00000000..1df8e394 --- /dev/null +++ b/docs/ccm/v0.5.x/ccm_configuration.md @@ -0,0 +1,66 @@ +# Nutanix CCM Configuration + +Nutanix CCM can be configured via a `JSON` formatted file stored in a configmap called `nutanix-config`. This configmap is located in the same namespace as the Nutanix CCM deployment. See the `manifests/cloud-provider-nutanix-deployment.yaml` file for details on the Nutanix CCM deployment. + +Example `nutanix-config` configmap: +```YAML +--- +kind: ConfigMap +apiVersion: v1 +metadata: + name: nutanix-config + namespace: kube-system +data: + nutanix_config.json: |- + { + "prismCentral": { + "address": "${NUTANIX_ENDPOINT}", + "port": ${NUTANIX_PORT}, + "insecure": ${NUTANIX_INSECURE}, + "credentialRef": { + "kind": "secret", + "name": "nutanix-creds" + }, + "additionalTrustBundle": { + "kind": "ConfigMap", + "name": "user-ca-bundle" + } + }, + "enableCustomLabeling": false, + "ignoredNodeIPs": [], + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "${NUTANIX_REGION_CATEGORY}", + "zoneCategory": "${NUTANIX_ZONE_CATEGORY}" + } + } + } + +``` + +The table below provides an overview of the supported configuration parameters. + +### Configuration parameters + +| Key |Type |Description | +|---------------------------------------------------|------|------------------------------------------------------------------------------------------------------------------------------------------------------| +|topologyDiscovery |object|(Optional) Configures the topology discovery mode.
`Prism` topology discovery is used by default if the `topologyDiscovery` attribute is not set. | +|topologyDiscovery.type |string|Topology Discovery mode. Can be `Prism` or `Categories`. See [Topology Discovery](./topology_discovery.md) for more information. | +|topologyDiscovery.topologyCategories |object|Required if topology discovery mode is `Categories`.
| +|topologyDiscovery.topologyCategories.regionCategory|string|Category key defining the region of the Kubernetes node. | +|topologyDiscovery.topologyCategories.zoneCategory |string|Category key defining the zone of the Kubernetes node. | +|enableCustomLabeling |bool |Boolean value to enable custom labeling. See [Custom Labeling](./custom_labeling.md) for more information.
Default: `false` | +|ignoredNodeIPs |array |List of node IPs, IP ranges (e.g. "10.0.0.1-10.0.0.10"), or CIDR prefixes (e.g. "10.0.0.0/24") to ignore. Optional. | +|prismCentral |object|Prism Central endpoint configuration. | +|prismCentral.address |string|FQDN/IP of the Prism Central endpoint. | +|prismCentral.port |int |Port to connect to Prism Central.
Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking.
Default: `false` | +|prismCentral.credentialRef |object|Prism Central credential configuration. See [Credentials](./ccm_credentials.md) for more information. | +|prismCentral.credentialRef.kind |string|Credential kind.
Allowed value: `secret` | +|prismCentral.credentialRef.name |string|Name of the secret. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret. | +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace |string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle. See [Certificate Trust](./pc_certificates.md) for more information.| \ No newline at end of file diff --git a/docs/ccm/v0.5.x/ccm_credentials.md b/docs/ccm/v0.5.x/ccm_credentials.md new file mode 100644 index 00000000..7bda06e2 --- /dev/null +++ b/docs/ccm/v0.5.x/ccm_credentials.md @@ -0,0 +1,29 @@ +# Credentials + +Nutanix CCM requires credentials to connect to Prism Central. These credentials need to be stored in a secret in the following format: + +```YAML +--- +apiVersion: v1 +kind: Secret +metadata: + name: nutanix-creds + namespace: kube-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "$NUTANIX_USERNAME", + "password": "$NUTANIX_PASSWORD" + }, + "prismElements": null + } + } + ] + +``` + +See [Requirements](./requirements.md) for more information on the required permissions. \ No newline at end of file diff --git a/docs/ccm/v0.5.x/custom_labeling.md b/docs/ccm/v0.5.x/custom_labeling.md new file mode 100644 index 00000000..4db89462 --- /dev/null +++ b/docs/ccm/v0.5.x/custom_labeling.md @@ -0,0 +1,14 @@ +# Custom Labeling + +Enabling the Nutanix CCM custom labeling feature will add additional labels to the Kubernetes nodes. 
See [Nutanix CCM Configuration](./ccm_configuration.md) for more information on how to configure CCM to enable custom labeling. + +The following labels will be added: + +|Label |Description | +|------------------------------|-----------------------------------------------------------------| +|nutanix.com/prism-element-uuid|UUID of the Prism Element cluster hosting the Kubernetes node VM.| +|nutanix.com/prism-element-name|Name of the Prism Element cluster hosting the Kubernetes node VM.| +|nutanix.com/prism-host-uuid |UUID of the Prism AHV host hosting the Kubernetes node VM. | +|nutanix.com/prism-host-name |Name of the Prism AHV host hosting the Kubernetes node VM. | + +Nutanix CCM will reconcile the labels periodically. \ No newline at end of file diff --git a/docs/ccm/v0.5.x/overview.md b/docs/ccm/v0.5.x/overview.md new file mode 100644 index 00000000..698169aa --- /dev/null +++ b/docs/ccm/v0.5.x/overview.md @@ -0,0 +1,24 @@ +# Overview + +Nutanix CCM provides Cloud Controller Manager functionality to Kubernetes clusters running on the Nutanix AHV hypervisor. Visit the [Kubernetes Cloud Controller Manager](https://kubernetes.io/docs/concepts/architecture/cloud-controller/) documentation for more information about the general design of a Kubernetes CCM. + +Nutanix CCM communicates with Prism Central (PC) to fetch all required information. See the [Requirements](./requirements.md) page for more details. 
+ +## Nutanix CCM functionality + +|Version|Node Controller|Route Controller|Service Controller| +|-------|---------------|----------------|------------------| +|v0.5.x |Yes |No |No | +|v0.4.x |Yes |No |No | +|v0.3.x |Yes |No |No | +|v0.2.x |Yes |No |No | + + +Nutanix CCM specific features: + +|Version|[Topology Discovery](./topology_discovery.md)|[Custom Labeling](./custom_labeling.md)| +|-------|---------------------------------------------|---------------------------------------| +|v0.5.x |Prism, Categories |Yes | +|v0.4.x |Prism, Categories |Yes | +|v0.3.x |Prism, Categories |Yes | +|v0.2.x |Prism, Categories |Yes | \ No newline at end of file diff --git a/docs/ccm/v0.5.x/pc_certificates.md b/docs/ccm/v0.5.x/pc_certificates.md new file mode 100644 index 00000000..be9071bf --- /dev/null +++ b/docs/ccm/v0.5.x/pc_certificates.md @@ -0,0 +1,104 @@ +# Certificate Trust + +CCM invokes Prism Central APIs using the HTTPS protocol. CCM has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +## Enable certificate verification (default) +By default CCM will perform certificate verification when invoking Prism Central API calls. This requires Prism Central to be configured with a publicly trusted certificate authority. +No additional configuration is required in CCM. + +## Configure an additional trust bundle +CCM allows users to configure an additional trust bundle. This will allow CCM to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, see the [Configuring the additional trust bundle](#configuring-the-additional-trust-bundle) section for more information. 
+ + +### Configuring the additional trust bundle + +To configure the additional trust bundle it is required to: + +- Create a `ConfigMap` containing the additional trust bundle +- Configure the `prismCentral.additionalTrustBundle` object in the CCM `ConfigMap` called `nutanix-config`. + +#### Creating the additional trust bundle ConfigMap + +CCM supports two different formats for the `ConfigMap` containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the CCM for an additional trust bundle + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `nutanix-config` `ConfigMap`. Add the `prismCentral.additionalTrustBundle` object as shown below. Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```JSON + ... + "prismCentral": { + ... + "additionalTrustBundle": { + "kind": "ConfigMap", + "name": "user-ca-bundle" + } + }, + ... +``` + +!!! note + The default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! 
note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. + + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `nutanix-config` `ConfigMap`. When this attribute is set to `true`, certificate verification is disabled even if an additional trust bundle is configured. + +Example of how to disable certificate verification: + +```JSON +... +"prismCentral": { + ... + "insecure": true +}, +... +``` \ No newline at end of file diff --git a/docs/ccm/v0.5.x/requirements.md b/docs/ccm/v0.5.x/requirements.md new file mode 100644 index 00000000..0bb867e2 --- /dev/null +++ b/docs/ccm/v0.5.x/requirements.md @@ -0,0 +1,33 @@ +# Requirements + +This section provides an overview of the requirements for Nutanix CCM: + +## Port requirements + +Nutanix CCM uses Prism Central APIs to fetch the required information for the Kubernetes nodes. As a result, the Kubernetes nodes need to have access to the Prism Central endpoint that is configured in the `nutanix-config` configmap. + +|Source |Destination |Protocol |Port |Description | +|------------------|--------------------|----------|-----|----------------------------------------| +|Kubernetes nodes |Prism Central |TCP |9440 |Nutanix CCM communication to Prism Central| + +## User permissions +Nutanix CCM will only perform read operations and requires a user account with an assigned `Viewer` role to consume Prism Central APIs. + +### Required roles: Local user + +|Role |Required| +|-------------------|--------| +|User Admin |No | +|Prism Central Admin|No | + +!!! 
note + + For local users, if no role is assigned, the local user will only get `Viewer` permissions. + +### Required roles: Directory user + +Assign the following role in the user role-mapping if a non-local user is required: + +|Role |Required| +|-------------------|--------| +|Viewer |Yes | diff --git a/docs/ccm/v0.5.x/topology_discovery.md b/docs/ccm/v0.5.x/topology_discovery.md new file mode 100644 index 00000000..7349e5b7 --- /dev/null +++ b/docs/ccm/v0.5.x/topology_discovery.md @@ -0,0 +1,124 @@ +# Topology Discovery + +One of the responsibilities of the CCM node controller is to annotate and label the nodes in a Kubernetes cluster with topology (region and zone) information. The Nutanix Cloud Controller Manager supports the following topology discovery methods: + +- [Prism](#prism) +- [Categories](#categories) + +The topology discovery method can be configured via the `nutanix-config` configmap. See [Nutanix CCM Configuration](./ccm_configuration.md) for more information on the configuration parameters. + +## Prism + +Prism-based topology discovery is the default mode for Nutanix CCM. In this mode CCM will discover the Prism Element (PE) cluster and Prism Central (PC) instance that host the Kubernetes node VM. Prism Central is configured as the region for the node, while Prism Element is configured as the zone. 
+ +Prism-based topology discovery can be configured by omitting the `topologyDiscovery` attribute from the `nutanix-config` configmap or by passing the following object: +```JSON + "topologyDiscovery": { + "type": "Prism" + } +``` + +### Example +If a Kubernetes Node VM is hosted on PC `my-pc-instance` and PE `my-pe-cluster-1`, Nutanix CCM will assign the following labels to the Kubernetes node: + +|Key |Value | +|-----------------------------|---------------| +|topology.kubernetes.io/region|my-pc-instance | +|topology.kubernetes.io/zone |my-pe-cluster-1| + +## Categories + +The category-based topology discovery mode allows users to assign categories to Prism Element clusters and Kubernetes Node VMs to define a custom topology. Nutanix CCM will hierarchically search for the required categories on the VM/PE. + +!!! note + + Categories assigned to the VM object will take precedence over the categories assigned to the PE cluster. + +The categories must already exist in the PC environment. CCM will not create or assign the categories. +Visit the [Prism Central documentation](https://portal.nutanix.com/page/documents/details?targetId=Prism-Central-Guide-vpc_2022_6:ssp-ssp-categories-manage-pc-c.html){target=_blank} for more information regarding categories. 
+ +To enable the Categories topology discovery mode for Nutanix CCM, provide the following information in the `topologyDiscovery` attribute: + +```JSON + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "${NUTANIX_REGION_CATEGORY}", + "zoneCategory": "${NUTANIX_ZONE_CATEGORY}" + } + } +``` + +### Example + +Define a set of categories in PC that will be used for topology discovery: + +|Key |Value | +|------------------|-----------------------| +|my-region-category|region-1, region-2 | +|my-zone-category |zone-1, zone-2, zone-3 | + +Assign the categories to the Nutanix entities: + +|Nutanix entity |Categories | +|---------------|------------------------------------------------------| +|my-pe-cluster-1|my-region-category:region-1
my-zone-category:zone-2| +|my-pe-cluster-2|my-region-category:region-2
my-zone-category:zone-3| +|k8s-node-3 |my-region-category:region-2
my-zone-category:zone-2| +|k8s-node-4 |my-zone-category:zone-1 | + + +Configure CCM to use categories for topology discovery: +```JSON + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "my-region-category", + "zoneCategory": "my-zone-category" + } + } +``` + +!!! example "Scenario 1: Kubernetes node k8s-node-1 is running on my-pe-cluster-1" + + The following topology labels will be assigned to Kubernetes node `k8s-node-1`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-1 | + |topology.kubernetes.io/zone |zone-2 | + + Categories assigned to PE will be used. + +!!! example "Scenario 2: Kubernetes node k8s-node-2 is running on my-pe-cluster-2" + + The following topology labels will be assigned to Kubernetes node `k8s-node-2`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-2 | + |topology.kubernetes.io/zone |zone-3 | + + Categories assigned to PE will be used. + +!!! example "Scenario 3: Kubernetes node k8s-node-3 is running on my-pe-cluster-2" + + The following topology labels will be assigned to Kubernetes node `k8s-node-3`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-2 | + |topology.kubernetes.io/zone |zone-2 | + + Categories assigned to the VM will be used. + +!!! example "Scenario 4: Kubernetes node k8s-node-4 is running on my-pe-cluster-1" + + The following topology labels will be assigned to Kubernetes node `k8s-node-4`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-1 | + |topology.kubernetes.io/zone |zone-1 | + + In this scenario Nutanix CCM will use the value of the `my-zone-category` category that is assigned to the VM. Since the `my-region-category` is not assigned to the VM, Nutanix CCM will search for the category on PE and use the corresponding category value. 
\ No newline at end of file diff --git a/docs/ccm/v0.6.x/ccm_configuration.md b/docs/ccm/v0.6.x/ccm_configuration.md new file mode 100644 index 00000000..1df8e394 --- /dev/null +++ b/docs/ccm/v0.6.x/ccm_configuration.md @@ -0,0 +1,66 @@ +# Nutanix CCM Configuration + +Nutanix CCM can be configured via a `JSON`-formatted file stored in a configmap called `nutanix-config`. This configmap is located in the same namespace as the Nutanix CCM deployment. See the `manifests/cloud-provider-nutanix-deployment.yaml` file for details on the Nutanix CCM deployment. + +Example `nutanix-config` configmap: +```YAML +--- +kind: ConfigMap +apiVersion: v1 +metadata: + name: nutanix-config + namespace: kube-system +data: + nutanix_config.json: |- + { + "prismCentral": { + "address": "${NUTANIX_ENDPOINT}", + "port": ${NUTANIX_PORT}, + "insecure": ${NUTANIX_INSECURE}, + "credentialRef": { + "kind": "secret", + "name": "nutanix-creds" + }, + "additionalTrustBundle": { + "kind": "ConfigMap", + "name": "user-ca-bundle" + } + }, + "enableCustomLabeling": false, + "ignoredNodeIPs": [], + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "${NUTANIX_REGION_CATEGORY}", + "zoneCategory": "${NUTANIX_ZONE_CATEGORY}" + } + } + } + +``` + +The table below provides an overview of the supported configuration parameters. + +### Configuration parameters + +| Key |Type |Description | +|---------------------------------------------------|------|------------------------------------------------------------------------------------------------------------------------------------------------------| +|topologyDiscovery |object|(Optional) Configures the topology discovery mode.
`Prism` topology discovery is used by default if `topologyDiscovery` attribute is not passed. | +|topologyDiscovery.type |string|Topology Discovery mode. Can be `Prism` or `Categories`. See [Topology Discovery](./topology_discovery.md) for more information. | +|topologyDiscovery.topologyCategories |object|Required if topology discovery mode is `Categories`.
| +|topologyDiscovery.topologyCategories.regionCategory|string|Category key defining the region of the Kubernetes node. | +|topologyDiscovery.topologyCategories.zoneCategory |string|Category key defining the zone of the Kubernetes node. | +|enableCustomLabeling |bool |Boolean value to enable custom labeling. See [Custom Labeling](./custom_labeling.md) for more information.
Default: `false` | +|ignoredNodeIPs |array |List of node IPs, IP ranges (e.g. "10.0.0.1-10.0.0.10"), or CIDR prefixes (e.g. "10.0.0.0/24") to ignore. Optional. | +|prismCentral |object|Prism Central endpoint configuration. | +|prismCentral.address |string|FQDN/IP of the Prism Central endpoint. | +|prismCentral.port |int |Port to connect to Prism Central.
Default: `9440` | +|prismCentral.insecure |bool |Disable Prism Central certificate checking.
Default: `false` | +|prismCentral.credentialRef |object|Prism Central credential configuration. See [Credentials](./ccm_credentials.md) for more information. | +|prismCentral.credentialRef.kind |string|Credential kind.
Allowed value: `secret` | +|prismCentral.credentialRef.name |string|Name of the secret. | +|prismCentral.credentialRef.namespace |string|(Optional) Namespace of the secret. | +|prismCentral.additionalTrustBundle |object|Reference to the certificate trust bundle used for Prism Central connection. | +|prismCentral.additionalTrustBundle.kind |string|Kind of the additionalTrustBundle. Allowed value: `ConfigMap` | +|prismCentral.additionalTrustBundle.name |string|Name of the `ConfigMap` containing the Prism Central trust bundle. | +|prismCentral.additionalTrustBundle.namespace |string|(Optional) Namespace of the `ConfigMap` containing the Prism Central trust bundle. See [Certificate Trust](./pc_certificates.md) for more information.| \ No newline at end of file diff --git a/docs/ccm/v0.6.x/ccm_credentials.md b/docs/ccm/v0.6.x/ccm_credentials.md new file mode 100644 index 00000000..7bda06e2 --- /dev/null +++ b/docs/ccm/v0.6.x/ccm_credentials.md @@ -0,0 +1,29 @@ +# Credentials + +Nutanix CCM requires credentials to connect to Prism Central. These credentials need to be stored in a secret in the following format: + +```YAML +--- +apiVersion: v1 +kind: Secret +metadata: + name: nutanix-creds + namespace: kube-system +stringData: + credentials: | + [ + { + "type": "basic_auth", + "data": { + "prismCentral":{ + "username": "$NUTANIX_USERNAME", + "password": "$NUTANIX_PASSWORD" + }, + "prismElements": null + } + } + ] + +``` + +See [Requirements](./requirements.md) for more information on the required permissions. \ No newline at end of file diff --git a/docs/ccm/v0.6.x/custom_labeling.md b/docs/ccm/v0.6.x/custom_labeling.md new file mode 100644 index 00000000..4db89462 --- /dev/null +++ b/docs/ccm/v0.6.x/custom_labeling.md @@ -0,0 +1,14 @@ +# Custom Labeling + +Enabling the Nutanix CCM custom labeling feature will add additional labels to the Kubernetes nodes. 
See [Nutanix CCM Configuration](./ccm_configuration.md) for more information on how to configure CCM to enable custom labeling. + +The following labels will be added: + +|Label |Description | +|------------------------------|-----------------------------------------------------------------| +|nutanix.com/prism-element-uuid|UUID of the Prism Element cluster hosting the Kubernetes node VM.| +|nutanix.com/prism-element-name|Name of the Prism Element cluster hosting the Kubernetes node VM.| +|nutanix.com/prism-host-uuid |UUID of the Prism AHV host hosting the Kubernetes node VM. | +|nutanix.com/prism-host-name |Name of the Prism AHV host hosting the Kubernetes node VM. | + +Nutanix CCM will reconcile the labels periodically. \ No newline at end of file diff --git a/docs/ccm/v0.6.x/overview.md b/docs/ccm/v0.6.x/overview.md new file mode 100644 index 00000000..36d7ebce --- /dev/null +++ b/docs/ccm/v0.6.x/overview.md @@ -0,0 +1,37 @@ +# Overview + +Nutanix CCM provides Cloud Controller Manager functionality to Kubernetes clusters running on the Nutanix AHV hypervisor. Visit the [Kubernetes Cloud Controller Manager](https://kubernetes.io/docs/concepts/architecture/cloud-controller/) documentation for more information about the general design of a Kubernetes CCM. + +Nutanix CCM communicates with Prism Central (PC) to fetch all required information. See the [Requirements](./requirements.md) page for more details. 
+ +## Nutanix CCM functionality + +|Version|Node Controller|Route Controller|Service Controller| +|-------|---------------|----------------|------------------| +|v0.6.x |Yes |No |No | +|v0.5.x |Yes |No |No | +|v0.4.x |Yes |No |No | +|v0.3.x |Yes |No |No | +|v0.2.x |Yes |No |No | + + +Nutanix CCM specific features: + +|Version|[Topology Discovery](./topology_discovery.md)|[Custom Labeling](./custom_labeling.md)| +|-------|---------------------------------------------|---------------------------------------| +|v0.6.x |Prism, Categories |Yes | +|v0.5.x |Prism, Categories |Yes | +|v0.4.x |Prism, Categories |Yes | +|v0.3.x |Prism, Categories |Yes | +|v0.2.x |Prism, Categories |Yes | + +## What's New in v0.6.x + +CCM v0.6.x introduces the following enhancements: + +- **Enhanced Node Discovery**: Improved node discovery mechanisms for better cloud integration +- **Performance Optimizations**: Optimized API calls to Prism Central for reduced latency +- **Improved Logging**: Enhanced logging capabilities for better troubleshooting and monitoring +- **Bug Fixes**: Various stability improvements and bug fixes from v0.5.x + +For detailed configuration examples and migration guidance, see the [Configuration](./ccm_configuration.md) page. \ No newline at end of file diff --git a/docs/ccm/v0.6.x/pc_certificates.md b/docs/ccm/v0.6.x/pc_certificates.md new file mode 100644 index 00000000..be9071bf --- /dev/null +++ b/docs/ccm/v0.6.x/pc_certificates.md @@ -0,0 +1,104 @@ +# Certificate Trust + +CCM invokes Prism Central APIs using the HTTPS protocol. CCM has different methods to handle the trust of the Prism Central certificates: + +- Enable certificate verification (default) +- Configure an additional trust bundle +- Disable certificate verification + +See the respective sections below for more information. + +## Enable certificate verification (default) +By default CCM will perform certificate verification when invoking Prism Central API calls. 
This requires Prism Central to be configured with a publicly trusted certificate authority. +No additional configuration is required in CCM. + +## Configure an additional trust bundle +CCM allows users to configure an additional trust bundle. This will allow CCM to verify certificates that are not issued by a publicly trusted certificate authority. + +To configure an additional trust bundle, see the [Configuring the additional trust bundle](#configuring-the-additional-trust-bundle) section for more information. + + +### Configuring the additional trust bundle + +To configure the additional trust bundle it is required to: + +- Create a `ConfigMap` containing the additional trust bundle +- Configure the `prismCentral.additionalTrustBundle` object in the CCM `ConfigMap` called `nutanix-config`. + +#### Creating the additional trust bundle ConfigMap + +CCM supports two different formats for the `ConfigMap` containing the additional trust bundle. The first one is to add the additional trust bundle as a multi-line string in the `ConfigMap`, the second option is to add the trust bundle in `base64` encoded format. See the examples below. + +Multi-line string example: +```YAML +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +data: + ca.crt: | + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- + -----BEGIN CERTIFICATE----- + + -----END CERTIFICATE----- +``` + +`base64` example: + +```YAML +apiVersion: v1 +kind: ConfigMap +metadata: + name: user-ca-bundle + namespace: ${NAMESPACE} +binaryData: + ca.crt: +``` + +!!! note + The `base64` string needs to be added as `binaryData`. + + +#### Configuring the CCM for an additional trust bundle + +When the additional trust bundle `ConfigMap` is created, it needs to be referenced in the `nutanix-config` `ConfigMap`. Add the `prismCentral.additionalTrustBundle` object as shown below. Make sure the correct additional trust bundle `ConfigMap` is referenced. + +```JSON + ... 
+ "prismCentral": { + ... + "additionalTrustBundle": { + "kind": "ConfigMap", + "name": "user-ca-bundle" + } + }, + ... +``` + +!!! note + The default value of `prismCentral.insecure` attribute is `false`. It can be omitted when an additional trust bundle is configured. + If `prismCentral.insecure` attribute is set to `true`, all certificate verification will be disabled. + + +## Disable certificate verification + +!!! note + Disabling certificate verification is not recommended for production purposes and should only be used for testing. + + +Certificate verification can be disabled by setting the `prismCentral.insecure` attribute to `true` in the `nutanix-config` `ConfigMap`. Certificate verification will be disabled even if an additional trust bundle is configured and the `prismCentral.insecure` attribute is set to `true`. + +Example of how to disable certificate verification: + +```JSON +... +"prismCentral": { + ... + "insecure": true +}, +... +``` \ No newline at end of file diff --git a/docs/ccm/v0.6.x/requirements.md b/docs/ccm/v0.6.x/requirements.md new file mode 100644 index 00000000..8df2c94c --- /dev/null +++ b/docs/ccm/v0.6.x/requirements.md @@ -0,0 +1,41 @@ +# Requirements + +Nutanix Cloud Controller Manager (CCM) interacts with Nutanix Prism Central (PC) APIs using a Prism Central user account to fetch the required information for Kubernetes nodes. + +CCM supports two types of PC users: + +- Local users: automatically get `Viewer` permissions when no role is assigned. +- Domain users: must be assigned a role that includes the `Viewer` role. + +## Port requirements + +Nutanix CCM uses Prism Central APIs to communicate with the Prism Central endpoint configured in the `nutanix-config` configmap. 
The following network connectivity is required: + +|Source |Destination |Protocol |Port |Description | +|------------------|--------------------|----------|-----|----------------------------------------| +|Kubernetes nodes |Prism Central |TCP |9440 |Nutanix CCM communication to Prism Central| + +## User permissions + +Nutanix CCM performs read-only operations and requires minimal permissions to consume Prism Central APIs. + +### Required permissions for local users + +Local users automatically receive the necessary permissions: + +- View Cluster +- View Category +- View Host +- View Virtual Machine + +!!! note + For local users, if no role is assigned, the local user will only get `Viewer` permissions, which are sufficient for CCM operations. + +### Required permissions for domain users + +The following role must be assigned for Prism Central domain users: + +- Viewer + +!!! note + Domain users must be explicitly assigned the `Viewer` role in the user role-mapping configuration. diff --git a/docs/ccm/v0.6.x/topology_discovery.md b/docs/ccm/v0.6.x/topology_discovery.md new file mode 100644 index 00000000..7349e5b7 --- /dev/null +++ b/docs/ccm/v0.6.x/topology_discovery.md @@ -0,0 +1,124 @@ +# Topology Discovery + +One of the responsibilities of the CCM node controller is to annotate and label the nodes in a Kubernetes cluster with topology (region and zone) information. The Nutanix Cloud Controller Manager supports the following topology discovery methods: + +- [Prism](#prism) +- [Categories](#categories) + +The topology discovery method can be configured via the `nutanix-config` configmap. See [Nutanix CCM Configuration](./ccm_configuration.md) for more information on the configuration parameters. + +## Prism + +Prism-based topology discovery is the default mode for Nutanix CCM. In this mode CCM will discover the Prism Element (PE) cluster and Prism Central (PC) instance that host the Kubernetes node VM. 
Prism Central is configured as the region for the node, while Prism Element is configured as the zone. + +Prism-based topology discovery can be configured by omitting the `topologyDiscovery` attribute from the `nutanix-config` configmap or by passing the following object: +```JSON + "topologyDiscovery": { + "type": "Prism" + } +``` + +### Example +If a Kubernetes Node VM is hosted on PC `my-pc-instance` and PE `my-pe-cluster-1`, Nutanix CCM will assign the following labels to the Kubernetes node: + +|Key |Value | +|-----------------------------|---------------| +|topology.kubernetes.io/region|my-pc-instance | +|topology.kubernetes.io/zone |my-pe-cluster-1| + +## Categories + +The category-based topology discovery mode allows users to assign categories to Prism Element clusters and Kubernetes Node VMs to define a custom topology. Nutanix CCM searches for the required categories hierarchically: first on the VM, then on the PE cluster. + +!!! note + + Categories assigned to the VM object will take precedence over the categories assigned to the PE cluster. + +The categories must already exist in the PC environment. CCM does not create or assign them. +Visit the [Prism Central documentation](https://portal.nutanix.com/page/documents/details?targetId=Prism-Central-Guide-vpc_2022_6:ssp-ssp-categories-manage-pc-c.html){target=_blank} for more information regarding categories. 
+ +To enable the Categories topology discovery mode for Nutanix CCM, provide the following information in the `topologyDiscovery` attribute: + +```JSON + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "${NUTANIX_REGION_CATEGORY}", + "zoneCategory": "${NUTANIX_ZONE_CATEGORY}" + } + } +``` + +### Example + +Define a set of categories in PC that will be used for topology discovery: + +|Key |Value | +|------------------|-----------------------| +|my-region-category|region-1, region-2 | +|my-zone-category |zone-1, zone-2, zone-3 | + +Assign the categories to the Nutanix entities: + +|Nutanix entity |Categories | +|---------------|------------------------------------------------------| +|my-pe-cluster-1|my-region-category:region-1
my-zone-category:zone-2| +|my-pe-cluster-2|my-region-category:region-2
my-zone-category:zone-3| +|k8s-node-3 |my-region-category:region-2
my-zone-category:zone-2| +|k8s-node-4 |my-zone-category:zone-1 | + + +Configure CCM to use categories for topology discovery: +```JSON + "topologyDiscovery": { + "type": "Categories", + "topologyCategories": { + "regionCategory": "my-region-category", + "zoneCategory": "my-zone-category" + } + } +``` + +!!! example "Scenario 1: Kubernetes node k8s-node-1 is running on my-pe-cluster-1" + + The following topology labels will be assigned to Kubernetes node `k8s-node-1`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-1 | + |topology.kubernetes.io/zone |zone-2 | + + Categories assigned to PE will be used. + +!!! example "Scenario 2: Kubernetes node k8s-node-2 is running on my-pe-cluster-2" + + The following topology labels will be assigned to Kubernetes node `k8s-node-2`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-2 | + |topology.kubernetes.io/zone |zone-3 | + + Categories assigned to PE will be used. + +!!! example "Scenario 3: Kubernetes node k8s-node-3 is running on my-pe-cluster-2" + + The following topology labels will be assigned to Kubernetes node `k8s-node-3`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-2 | + |topology.kubernetes.io/zone |zone-2 | + + Categories assigned to the VM will be used. + +!!! example "Scenario 4: Kubernetes node k8s-node-4 is running on my-pe-cluster-1" + + The following topology labels will be assigned to Kubernetes node `k8s-node-4`: + + |Key |Value | + |-----------------------------|---------------| + |topology.kubernetes.io/region|region-1 | + |topology.kubernetes.io/zone |zone-1 | + + In this scenario Nutanix CCM will use the value of the `my-zone-category` category that is assigned to the VM. Since `my-region-category` is not assigned to the VM, Nutanix CCM will search for the category on the PE and use the corresponding category value. 
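
The precedence rule used in the four scenarios (VM categories first, then PE categories) can be sketched in a few lines of Python. This is only an illustration of the lookup order described on this page, not the CCM implementation; the function name `resolve_topology` and its dictionary arguments are hypothetical and exist only for this example.

```python
from typing import Dict, Optional

REGION_LABEL = "topology.kubernetes.io/region"
ZONE_LABEL = "topology.kubernetes.io/zone"


def resolve_topology(
    vm_categories: Dict[str, str],
    pe_categories: Dict[str, str],
    region_category: str,
    zone_category: str,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: pick each topology value from the VM if present,
    otherwise fall back to the PE cluster hosting the VM."""

    def lookup(key: str) -> Optional[str]:
        # Categories assigned to the VM take precedence over PE categories.
        if key in vm_categories:
            return vm_categories[key]
        return pe_categories.get(key)

    return {
        REGION_LABEL: lookup(region_category),
        ZONE_LABEL: lookup(zone_category),
    }


# Scenario 4: k8s-node-4 only carries the zone category; the region comes from the PE.
labels = resolve_topology(
    vm_categories={"my-zone-category": "zone-1"},
    pe_categories={"my-region-category": "region-1", "my-zone-category": "zone-2"},
    region_category="my-region-category",
    zone_category="my-zone-category",
)
print(labels)
```

With the Scenario 4 inputs, the zone is taken from the VM (`zone-1`) and the region from `my-pe-cluster-1` (`region-1`), which matches the label table above.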
\ No newline at end of file diff --git a/docs/ccm/v0.6.x/validated_integrations.md b/docs/ccm/v0.6.x/validated_integrations.md new file mode 100644 index 00000000..807f6051 --- /dev/null +++ b/docs/ccm/v0.6.x/validated_integrations.md @@ -0,0 +1,52 @@ +# Validated Integrations + +Validated integrations are a defined set of specifically tested configurations between technologies that represent the most common combinations that Nutanix customers are using or deploying with Nutanix CCM. For these integrations, Nutanix has directly, or through certified partners, exercised a full range of platform tests as part of the product release process. + +## Integration Validation Policy + +Nutanix follows the version validation policies below for CCM: + +- Validate at least one active AOS LTS (long term support) version. Validated AOS LTS version for a specific CCM version is listed in the [AOS](#aos) section.
+ + !!! note + + Typically the latest LTS release at time of CCM release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +- Validate the latest AOS STS (short term support) release at time of CCM release. +- Validate at least one active Prism Central (PC) version. Validated PC version for a specific CCM version is listed in the [Prism Central](#prism-central) section.
+ + !!! note + + Typically the latest PC release at time of CCM release except when latest is initial release in train (eg x.y.0). Exact version depends on timing and customer adoption. + +- At least two active Kubernetes versions. Validated Kubernetes versions for a specific CCM version are listed in the [Kubernetes](#kubernetes) section.
+ + !!! note + + Typically the current stable Kubernetes release and the previous stable release at time of CCM release. + +## Validated versions + +### AOS + +| CCM | 6.5.x (LTS) | 6.8 (STS) | 6.10 | 7.0 | 7.3 | +|--------|-------------|-----------|------|-----|-----| +| v0.6.x | No | No | No | No | Yes | +| v0.5.x | Yes | Yes | Yes | Yes | Yes | +| v0.4.x | Yes | Yes | Yes | Yes | No | + +### Prism Central + +| CCM | pc.2022.6 | pc.2023.x | pc.2024.x | pc.7.3 | +|--------|-----------|-----------|-----------|--------| +| v0.6.x | No | No | No | Yes | +| v0.5.x | Yes | Yes | Yes | Yes | +| v0.4.x | Yes | Yes | No | No | + +### CAPX Integration + +| CCM | CAPX v1.6.x | CAPX v1.7.x | CAPX v1.8.x | +|--------|-------------|-------------|-------------| +| v0.6.x | Yes | Yes | Yes | +| v0.5.x | Yes | Yes | Yes | +| v0.4.x | Yes | Yes | No | diff --git a/docs/gpt-in-a-box/kubernetes/v0.1/custom_model.md b/docs/gpt-in-a-box/kubernetes/v0.1/custom_model.md deleted file mode 100644 index 8e1be37d..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.1/custom_model.md +++ /dev/null @@ -1,31 +0,0 @@ -# Custom Model Support -We provide the capability to generate a MAR file with custom models and start an inference server using Kubeflow serving.
-!!! note - A model is recognised as a custom model if it's model name is not present in the model_config file. - -## Generate Model Archive File for Custom Models -To generate the MAR file, run the following: -``` -python3 $WORK_DIR/llm/download.py --no_download [--repo_version --handler ] --model_name --model_path --output -``` - -* **no_download**: Set flag to skip downloading the model files, must be set for custom models -* **model_name**: Name of custom model, this name must not be in model_config -* **repo_version**: Any model version, defaults to "1.0" (optional) -* **model_path**: Absolute path of custom model files (should be non empty) -* **output**: Mount path to your nfs server to be used in the kube PV where config.properties and model archive file be stored -* **handler**: Path to custom handler, defaults to llm/handler.py (optional)
- -## Start Inference Server with Custom Model Archive File -Run the following command for starting Kubeflow serving and running inference on the given input with a custom MAR file: -``` -bash $WORK_DIR/llm/run.sh -n -g -f -m -e [OPTIONAL -d ] -``` - -* **n**: Name of custom model, this name must not be in model_config -* **d**: Absolute path of input data folder (Optional) -* **g**: Number of gpus to be used to execute (Set 0 to use cpu) -* **f**: NFS server address with share path information -* **m**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **e**: Name of the deployment metadata - diff --git a/docs/gpt-in-a-box/kubernetes/v0.1/generating_mar.md b/docs/gpt-in-a-box/kubernetes/v0.1/generating_mar.md deleted file mode 100644 index b2172a6d..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.1/generating_mar.md +++ /dev/null @@ -1,28 +0,0 @@ -## Download model files and Generate MAR file -Run the following command for downloading model files and generating MAR file: -``` -python3 $WORK_DIR/llm/download.py [--repo_version ] --model_name --output --hf_token -``` - -* **model_name**: Name of model -* **output**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **repo_version**: Commit id of model's repo from HuggingFace (optional, if not provided default set in model_config will be used) -* **hf_token**: Your HuggingFace token. Needed to download LLAMA(2) models. - -The available LLMs are mpt_7b (mosaicml/mpt_7b), falcon_7b (tiiuae/falcon-7b), llama2_7b (meta-llama/Llama-2-7b-hf). - -### Examples -The following are example commands to generate the model archive file. 
- -Download MPT-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/download.py --model_name mpt_7b --output /mnt/llm -``` -Download Falcon-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/download.py --model_name falcon_7b --output /mnt/llm -``` -Download Llama2-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/download.py --model_name llama2_7b --output /mnt/llm --hf_token -``` diff --git a/docs/gpt-in-a-box/kubernetes/v0.1/getting_started.md b/docs/gpt-in-a-box/kubernetes/v0.1/getting_started.md deleted file mode 100644 index e26dfd54..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.1/getting_started.md +++ /dev/null @@ -1,85 +0,0 @@ -# Getting Started -This is a guide on getting started with GPT-in-a-Box deployment on a Kubernetes Cluster. You can find the open source repository for the K8s version [here](https://github.com/nutanix/nai-llm-k8s). - -## Setup - -Inference experiments are done on a single NKE Cluster with Kubernetes version 1.25.6-0. The NKE Cluster has 3 non-gpu worker nodes with 12 vCPUs and 16G memory and 120 GB Storage. The cluster includes at least 1 gpu worker node with 12 vCPUs and 40G memory, 120 GB Storage and 1 A100-40G GPU passthrough. - -!!! note - Tested with python 3.10, a python virtual environment is preferred to managed dependencies. - -### Spec -**Jump node:** -OS: 22.04 -Resources: 1 VM with 8CPUs, 16G memory and 300 GB storage - -**NKE:** -NKE Version: 2.8 -K8s version: 1.25.6-0 -Resources: 3 cpu nodes with 12 vCPUs, 16G memory and 120 GB storage. 
- At least 1 gpu node with 12 vCPUs, 40G memory and 120 GB storage (1 A100-40G GPU passthrough) - -**NFS Server:** -Resources: 3 FSVMs with 4 vCPUs, 12 GB memory and 1 TB storage - - -| Software Dependency Matrix(Installed) | | -| --- | --- | -| Istio | 1.17.2 | -| Knative serving | 1.10.1 | -| Cert manager(Jetstack) | 1.3.0 | -| Kserve | 0.11.1 | - -### Jump machine setup -All commands are executed inside the jump machine. -Prerequisites are kubectl and helm. Both are required to orchestrate and set up necessary items in the NKE cluster. - -* [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) -* [helm](https://helm.sh/docs/intro/install/) - -Have a NFS mounted into your jump machine at a specific location. This mount location is required to be supplied as parameter to the execution scripts - -Command to mount NFS to local folder -``` -mount -t nfs : -``` -![Screenshot of a Jump Machine Setup.](image1.png) - - -**Follow the steps below to install the necessary prerequisites.** - -### Download and set up KubeConfig -Download and set up KubeConfig by following the steps outlined in [Downloading the Kubeconfig](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Kubernetes-Engine-v2_5:top-download-kubeconfig-t.html) on the Nutanix Support Portal. 
- -### Configure Nvidia Driver in the cluster using helm commands -For NKE 2.8, run the following command as per the [official documentaton](https://portal.nutanix.com/page/documents/details?targetId=Release-Notes-Nutanix-Kubernetes-Engine-v2_8:top-validated-config-r.html): -``` -helm repo add nvidia https://nvidia.github.io/gpu-operator && helm repo update -helm install --wait -n gpu-operator --create-namespace gpu-operator nvidia/gpu-operator --version=v23.3.1 --set toolkit.version=v1.13.1-centos7 -``` - -For NKE 2.9, refer the [official documentation](https://portal.nutanix.com/page/documents/details?targetId=Release-Notes-Nutanix-Kubernetes-Engine-v2_9:top-validated-config-r.html) for the validated config. - -### Download nutanix package and Install python libraries -Download the **v0.1** release version from [NAI-LLM-K8s Releases](https://github.com/nutanix/nai-llm-k8s/releases/tag/v0.1) and untar the release. Set the working directory to the root folder containing the extracted release. -``` -export WORK_DIR=absolute_path_to_empty_release_directory -mkdir $WORK_DIR -tar -xvf -C $WORK_DIR --strip-components=1 -``` - -### Kubeflow serving installation into the cluster -``` -curl -s "https://raw.githubusercontent.com/kserve/kserve/v0.11.1/hack/quick_install.sh" | bash -``` -Now we have our cluster ready for inference. 
- -### Install pip3 -``` -sudo apt-get install python3-pip -``` - -### Install required packages -``` -pip install -r $WORK_DIR/llm/requirements.txt -``` diff --git a/docs/gpt-in-a-box/kubernetes/v0.1/image1.png b/docs/gpt-in-a-box/kubernetes/v0.1/image1.png deleted file mode 100644 index 5be8e71b..00000000 Binary files a/docs/gpt-in-a-box/kubernetes/v0.1/image1.png and /dev/null differ diff --git a/docs/gpt-in-a-box/kubernetes/v0.1/inference_requests.md b/docs/gpt-in-a-box/kubernetes/v0.1/inference_requests.md deleted file mode 100644 index 796591f9..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.1/inference_requests.md +++ /dev/null @@ -1,58 +0,0 @@ -Kubeflow serving can be inferenced and managed through it's Inference APIs. Find out more about Kubeflow serving APIs in the official [Inference API](https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#model-inference) documentation. -### Set HOST and PORT -The first step is to [determine the ingress IP and ports](https://kserve.github.io/website/0.8/get_started/first_isvc/#4-determine-the-ingress-ip-and-ports) and set INGRESS_HOST and INGRESS_PORT. -The following command assigns the IP address of the host where the Istio Ingress Gateway pod is running to the INGRESS_HOST variable: -``` -export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}') -``` -The following command assigns the node port used for the HTTP2 service of the Istio Ingress Gateway to the INGRESS_PORT variable: -``` -export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}') -``` - -### Set Service Host Name -Next step is to determine service hostname. 
-This command retrieves the hostname of a specific InferenceService in a Kubernetes environment by extracting it from the status.url field and assigns it to the SERVICE_HOSTNAME variable: -``` -SERVICE_HOSTNAME=$(kubectl get inferenceservice -o jsonpath='{.status.url}' | cut -d "/" -f 3) -``` -#### Example: -``` -SERVICE_HOSTNAME=$(kubectl get inferenceservice llm-deploy -o jsonpath='{.status.url}' | cut -d "/" -f 3) -``` - -### Curl request to get inference -In the next step inference can be done on the deployed model. -The following is the template command for inferencing with a json file: -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/{model_name}/infer -d @{input_file_path} -``` -#### Examples: -Curl request for MPT-7B model -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mpt_7b/infer -d @$WORK_DIR/data/qa/sample_test1.json -``` -Curl request for Falcon-7B model -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/falcon_7b/infer -d @$WORK_DIR/data/summarize/sample_test1.json -``` -Curl request for Llama2-7B model -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/llama2_7b/infer -d @$WORK_DIR/data/translate/sample_test1.json -``` - -### Input data format -Input data should be in **JSON** format. 
The input should be a '.json' file containing the prompt in the format below: -``` -{ - "id": "42", - "inputs": [ - { - "name": "input0", - "shape": [-1], - "datatype": "BYTES", - "data": ["Capital of India?"] - } - ] -} -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/kubernetes/v0.1/inference_server.md b/docs/gpt-in-a-box/kubernetes/v0.1/inference_server.md deleted file mode 100644 index 3ea3166d..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.1/inference_server.md +++ /dev/null @@ -1,45 +0,0 @@ -## Start and run Kubeflow Serving - -Run the following command for starting Kubeflow serving and running inference on the given input: -``` -bash $WORK_DIR/llm/run.sh -n -g -f -m -e [OPTIONAL -d -v -t ] -``` - -* **n**: Name of model -* **d**: Absolute path of input data folder (Optional) -* **g**: Number of gpus to be used to execute (Set 0 to use cpu) -* **f**: NFS server address with share path information -* **m**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **e**: Name of the deployment metadata -* **v**: Commit id of model's repo from HuggingFace (optional, if not provided default set in model_config will be used) -* **t**: Your HuggingFace token. Needed for LLAMA(2) model. - -The available LLMs model names are mpt_7b (mosaicml/mpt_7b), falcon_7b (tiiuae/falcon-7b), llama2_7b (meta-llama/Llama-2-7b-hf). -Should print "Inference Run Successful" as a message once the Inference Server has successfully started. - -### Examples -The following are example commands to start the Inference Server. 
- -For 1 GPU Inference with official MPT-7B model and keep inference server alive: -``` -bash $WORK_DIR/llm/run.sh -n mpt_7b -d data/translate -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -``` -For 1 GPU Inference with official Falcon-7B model and keep inference server alive: -``` -bash $WORK_DIR/llm/run.sh -n falcon_7b -d data/qa -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -``` -For 1 GPU Inference with official Llama2-7B model and keep inference server alive: -``` -bash $WORK_DIR/llm/run.sh -n llama2_7b -d data/summarize -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -t -``` - -### Cleanup Inference deployment - -Run the following command to stop the inference server and unmount PV and PVC. -``` -python3 $WORK_DIR/llm/cleanup.py --deploy_name -``` -Example: -``` -python3 $WORK_DIR/llm/cleanup.py --deploy_name llm-deploy -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/custom_model.md b/docs/gpt-in-a-box/kubernetes/v0.2/custom_model.md deleted file mode 100644 index 57096966..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/custom_model.md +++ /dev/null @@ -1,33 +0,0 @@ -# Custom Model Support -In some cases you may want to use a custom model, e.g. a custom fine-tuned model. We provide the capability to generate a MAR file with custom models and start an inference server using Kubeflow serving.
- -## Generate Model Archive File for Custom Models - -!!! note - The model files should be placed in an NFS share accessible by the Nutanix package. This directory will be passed to the --model_path argument. You'll also need to provide the --output path where you want the model archive export to be stored. - -To generate the MAR file, run the following: -``` -python3 $WORK_DIR/llm/generate.py --skip_download [--repo_version --handler ] --model_name --model_path --output -``` - -* **skip_download**: Set flag to skip downloading the model files, must be set for custom models -* **model_name**: Name of custom model -* **repo_version**: Any model version, defaults to "1.0" (optional) -* **model_path**: Absolute path of custom model files (should be non empty) -* **output**: Mount path to your nfs server to be used in the kube PV where config.properties and model archive file be stored -* **handler**: Path to custom handler, defaults to llm/handler.py (optional)
- -## Start Inference Server with Custom Model Archive File -Run the following command for starting Kubeflow serving and running inference on the given input with a custom MAR file: -``` -bash $WORK_DIR/llm/run.sh -n -g -f -m -e [OPTIONAL -d ] -``` - -* **n**: Name of custom model, this name must not be in model_config -* **d**: Absolute path of input data folder (Optional) -* **g**: Number of gpus to be used to execute (Set 0 to use cpu) -* **f**: NFS server address with share path information -* **m**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **e**: Name of the deployment metadata - diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/generating_mar.md b/docs/gpt-in-a-box/kubernetes/v0.2/generating_mar.md deleted file mode 100644 index 1e8ccd68..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/generating_mar.md +++ /dev/null @@ -1,28 +0,0 @@ -# Generate PyTorch Model Archive File -We will download the model files and generate a Model Archive file for the desired LLM, which will be used by TorchServe to load the model. Find out more about Torch Model Archiver [here](https://github.com/pytorch/serve/blob/master/model-archiver/README.md). - -Run the following command for downloading model files and generating MAR file: -``` -python3 $WORK_DIR/llm/generate.py [--hf_token --repo_version ] --model_name --output -``` - -* **model_name**: Name of a [validated model](validated_models.md) -* **output**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **repo_version**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used) -* **hf_token**: Your HuggingFace token. Needed to download LLAMA(2) models. (It can alternatively be set using the environment variable 'HF_TOKEN') - -### Examples -The following are example commands to generate the model archive file. 
- -Download MPT-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/generate.py --model_name mpt_7b --output /mnt/llm -``` -Download Falcon-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/generate.py --model_name falcon_7b --output /mnt/llm -``` -Download Llama2-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/generate.py --model_name llama2_7b --output /mnt/llm --hf_token -``` diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/getting_started.md b/docs/gpt-in-a-box/kubernetes/v0.2/getting_started.md deleted file mode 100644 index 7cb0b9d5..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/getting_started.md +++ /dev/null @@ -1,85 +0,0 @@ -# Getting Started -This is a guide on getting started with GPT-in-a-Box deployment on a Kubernetes Cluster. You can find the open source repository for the K8s version [here](https://github.com/nutanix/nai-llm-k8s). - -## Setup - -Inference experiments are done on a single NKE Cluster with Kubernetes version 1.25.6-0. The NKE Cluster has 3 non-gpu worker nodes with 12 vCPUs and 16G memory and 120 GB Storage. The cluster includes at least 1 gpu worker node with 12 vCPUs and 40G memory, 120 GB Storage and 1 A100-40G GPU passthrough. - -!!! note - Tested with python 3.10, a python virtual environment is preferred to managed dependencies. - -### Spec -**Jump node:** -OS: 22.04 -Resources: 1 VM with 8CPUs, 16G memory and 300 GB storage - -**NKE:** -NKE Version: 2.8 -K8s version: 1.25.6-0 -Resources: 3 cpu nodes with 12 vCPUs, 16G memory and 120 GB storage. 
- At least 1 gpu node with 12 vCPUs, 40G memory and 120 GB storage (1 A100-40G GPU passthrough) - -**NFS Server:** -Resources: 3 FSVMs with 4 vCPUs, 12 GB memory and 1 TB storage - - -| Software Dependency Matrix(Installed) | | -| --- | --- | -| Istio | 1.17.2 | -| Knative serving | 1.10.1 | -| Cert manager(Jetstack) | 1.3.0 | -| Kserve | 0.11.1 | - -### Jump machine setup -All commands are executed inside the jump machine. -Prerequisites are kubectl and helm. Both are required to orchestrate and set up necessary items in the NKE cluster. - -* [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) -* [helm](https://helm.sh/docs/intro/install/) - -Have a NFS mounted into your jump machine at a specific location. This mount location is required to be supplied as parameter to the execution scripts - -Command to mount NFS to local folder -``` -mount -t nfs : -``` -![Screenshot of a Jump Machine Setup.](image1.png) - - -**Follow the steps below to install the necessary prerequisites.** - -### Download and set up KubeConfig -Download and set up KubeConfig by following the steps outlined in [Downloading the Kubeconfig](https://portal.nutanix.com/page/documents/details?targetId=Nutanix-Kubernetes-Engine-v2_5:top-download-kubeconfig-t.html) on the Nutanix Support Portal. 
- -### Configure Nvidia Driver in the cluster using helm commands -For NKE 2.8, run the following command as per the [official documentation](https://portal.nutanix.com/page/documents/details?targetId=Release-Notes-Nutanix-Kubernetes-Engine-v2_8:top-validated-config-r.html): -``` -helm repo add nvidia https://nvidia.github.io/gpu-operator && helm repo update -helm install --wait -n gpu-operator --create-namespace gpu-operator nvidia/gpu-operator --version=v23.3.1 --set toolkit.version=v1.13.1-centos7 -``` - -For NKE 2.9, refer to the [official documentation](https://portal.nutanix.com/page/documents/details?targetId=Release-Notes-Nutanix-Kubernetes-Engine-v2_9:top-validated-config-r.html) for the validated config. - -### Download nutanix package and Install python libraries -Download the **v0.2.2** release version from [NAI-LLM-K8s Releases](https://github.com/nutanix/nai-llm-k8s/releases/tag/v0.2.2) and untar the release. Set the working directory to the root folder containing the extracted release. -``` -export WORK_DIR=absolute_path_to_empty_release_directory -mkdir $WORK_DIR -tar -xvf -C $WORK_DIR --strip-components=1 -``` - -### Kubeflow serving installation into the cluster -``` -curl -s "https://raw.githubusercontent.com/kserve/kserve/v0.11.1/hack/quick_install.sh" | bash -``` -Now we have our cluster ready for inference. - -### Install pip3 -``` -sudo apt-get install python3-pip -``` - -### Install required packages -``` -pip install -r $WORK_DIR/llm/requirements.txt -``` diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/huggingface_model.md b/docs/gpt-in-a-box/kubernetes/v0.2/huggingface_model.md deleted file mode 100644 index 9c2f5be6..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/huggingface_model.md +++ /dev/null @@ -1,46 +0,0 @@ -# HuggingFace Model Support -!!! Note - To start the inference server for the [**Validated Models**](validated_models.md), refer to the [**Deploying Inference Server**](inference_server.md) documentation. 
- -We provide the capability to download model files from any HuggingFace repository and generate a MAR file to start an inference server using Kubeflow serving.
- -To start the Inference Server for any other HuggingFace model, follow the steps below. - -## Generate Model Archive File for HuggingFace Models -Run the following command for downloading and generating the Model Archive File (MAR) with the HuggingFace Model files : -``` -python3 $WORK_DIR/llm/generate.py [--hf_token --repo_version --handler ] --model_name --repo_id --model_path --output -``` - -* **model_name**: Name of HuggingFace model -* **repo_id**: HuggingFace Repository ID of the model -* **repo_version**: Commit ID of model's HuggingFace repository, defaults to latest HuggingFace commit ID (optional) -* **model_path**: Absolute path of custom model files (should be empty) -* **output**: Mount path to your nfs server to be used in the kube PV where config.properties and model archive file be stored -* **handler**: Path to custom handler, defaults to llm/handler.py (optional)
-* **hf_token**: Your HuggingFace token. Needed to download and verify LLAMA(2) models. - -### Example -Download model files and generate model archive for codellama/CodeLlama-7b-hf: -``` -python3 $WORK_DIR/llm/generate.py --model_name codellama_7b_hf --repo_id codellama/CodeLlama-7b-hf --model_path /models/codellama_7b_hf/model_files --output /mnt/llm -``` - -## Start Inference Server with HuggingFace Model Archive File -Run the following command for starting Kubeflow serving and running inference on the given input with a custom MAR file: -``` -bash $WORK_DIR/llm/run.sh -n -g -f -m -e [OPTIONAL -d ] -``` - -* **n**: Name of HuggingFace model -* **d**: Absolute path of input data folder (Optional) -* **g**: Number of gpus to be used to execute (Set 0 to use cpu) -* **f**: NFS server address with share path information -* **m**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **e**: Name of the deployment metadata - -### Example -To start Inference Server with codellama/CodeLlama-7b-hf: -``` -bash $WORK_DIR/llm/run.sh -n codellama_7b_hf -d data/qa -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -``` diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/image1.png b/docs/gpt-in-a-box/kubernetes/v0.2/image1.png deleted file mode 100644 index 5be8e71b..00000000 Binary files a/docs/gpt-in-a-box/kubernetes/v0.2/image1.png and /dev/null differ diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/inference_requests.md b/docs/gpt-in-a-box/kubernetes/v0.2/inference_requests.md deleted file mode 100644 index eb2d101c..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/inference_requests.md +++ /dev/null @@ -1,59 +0,0 @@ -Kubeflow serving can be inferenced and managed through its Inference APIs. Find out more about Kubeflow serving APIs in the official [Inference API](https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#model-inference) documentation. 
- -### Set HOST and PORT -The first step is to [determine the ingress IP and ports](https://kserve.github.io/website/0.8/get_started/first_isvc/#4-determine-the-ingress-ip-and-ports) and set INGRESS_HOST and INGRESS_PORT. -The following command assigns the IP address of the host where the Istio Ingress Gateway pod is running to the INGRESS_HOST variable: -``` -export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}') -``` -The following command assigns the node port used for the HTTP2 service of the Istio Ingress Gateway to the INGRESS_PORT variable: -``` -export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}') -``` - -### Set Service Host Name -Next step is to determine service hostname. -This command retrieves the hostname of a specific InferenceService in a Kubernetes environment by extracting it from the status.url field and assigns it to the SERVICE_HOSTNAME variable: -``` -SERVICE_HOSTNAME=$(kubectl get inferenceservice -o jsonpath='{.status.url}' | cut -d "/" -f 3) -``` -#### Example: -``` -SERVICE_HOSTNAME=$(kubectl get inferenceservice llm-deploy -o jsonpath='{.status.url}' | cut -d "/" -f 3) -``` - -### Curl request to get inference -In the next step inference can be done on the deployed model. 
-The following is the template command for inferencing with a json file: -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/{model_name}/infer -d @{input_file_path} -``` -#### Examples: -Curl request for MPT-7B model -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mpt_7b/infer -d @$WORK_DIR/data/qa/sample_text1.json -``` -Curl request for Falcon-7B model -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/falcon_7b/infer -d @$WORK_DIR/data/summarize/sample_text1.json -``` -Curl request for Llama2-7B model -``` -curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/llama2_7b/infer -d @$WORK_DIR/data/translate/sample_text1.json -``` - -### Input data format -Input data should be in **JSON** format. 
The input should be a '.json' file containing the prompt in the format below: -``` -{ - "id": "42", - "inputs": [ - { - "name": "input0", - "shape": [-1], - "datatype": "BYTES", - "data": ["Capital of India?"] - } - ] -} -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/inference_server.md b/docs/gpt-in-a-box/kubernetes/v0.2/inference_server.md deleted file mode 100644 index 58cb9b06..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/inference_server.md +++ /dev/null @@ -1,43 +0,0 @@ -## Start and run Kubeflow Serving - -Run the following command for starting Kubeflow serving and running inference on the given input: -``` -bash $WORK_DIR/llm/run.sh -n -g -f -m -e [OPTIONAL -d -v ] -``` - -* **n**: Name of a [validated model](validated_models.md) -* **d**: Absolute path of input data folder (Optional) -* **g**: Number of gpus to be used to execute (Set 0 to use cpu) -* **f**: NFS server address with share path information -* **m**: Mount path to your nfs server to be used in the kube PV where model files and model archive file be stored -* **e**: Desired name of the deployment metadata (will be created) -* **v**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used) - -Should print "Inference Run Successful" as a message once the Inference Server has successfully started. - -### Examples -The following are example commands to start the Inference Server. 
- -For 1 GPU Inference with official MPT-7B model and keep inference server alive: -``` -bash $WORK_DIR/llm/run.sh -n mpt_7b -d data/translate -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -``` -For 1 GPU Inference with official Falcon-7B model and keep inference server alive: -``` -bash $WORK_DIR/llm/run.sh -n falcon_7b -d data/qa -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -``` -For 1 GPU Inference with official Llama2-7B model and keep inference server alive: -``` -bash $WORK_DIR/llm/run.sh -n llama2_7b -d data/summarize -g 1 -e llm-deploy -f '1.1.1.1:/llm' -m /mnt/llm -``` - -### Cleanup Inference deployment - -Run the following command to stop the inference server and unmount PV and PVC. -``` -python3 $WORK_DIR/llm/cleanup.py --deploy_name -``` -Example: -``` -python3 $WORK_DIR/llm/cleanup.py --deploy_name llm-deploy -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/kubernetes/v0.2/validated_models.md b/docs/gpt-in-a-box/kubernetes/v0.2/validated_models.md deleted file mode 100644 index 3ed7c8b6..00000000 --- a/docs/gpt-in-a-box/kubernetes/v0.2/validated_models.md +++ /dev/null @@ -1,16 +0,0 @@ -# Validated Models for Kubernetes Version - -GPT-in-a-Box has been validated on a curated set of HuggingFace models Information pertaining to these models is stored in the ```llm/model_config.json``` file. - -The Validated Models are : - -| Model Name | HuggingFace Repository ID | -| --- | --- | -| mpt_7b | [mosaicml/mpt_7b](https://huggingface.co/mosaicml/mpt-7b) | -| falcon_7b | [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) | -| llama2_7b | [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | -| codellama_7b_python | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) | -| llama2_7b_chat | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | - -!!! 
note - To start the inference server with any HuggingFace model, refer to [**HuggingFace Model Support**](huggingface_model.md) documentation. \ No newline at end of file diff --git a/docs/gpt-in-a-box/overview.md b/docs/gpt-in-a-box/overview.md deleted file mode 100644 index f9bd01f5..00000000 --- a/docs/gpt-in-a-box/overview.md +++ /dev/null @@ -1,11 +0,0 @@ -# Nutanix GPT-in-a-Box Documentation - -Welcome to the official home dedicated to documenting how to run Nutanix GPT-in-a-Box. Nutanix GPT-in-a-Box is a new turnkey solution that includes everything needed to build AI-ready infrastructure. Here, you'll find information and code to run Nutanix GPT-in-a-Box on Virtual Machines or Kubernetes Clusters. - -This new solution includes: - -- Software-defined Nutanix Cloud Platform™ infrastructure supporting GPU-enabled server nodes for seamless scaling of virtualized compute, storage, and networking supporting both traditional virtual machines and Kubernetes-orchestrated containers -- Files and Objects storage; to fine-tune and run a choice of GPT models -- Open source software to deploy and run AI workloads including PyTorch framework & KubeFlow MLOps platform -- The management interface for enhanced terminal UI or standard CLI -- Support for a curated set of LLMs including Llama2, Falcon and MPT diff --git a/docs/gpt-in-a-box/support.md b/docs/gpt-in-a-box/support.md deleted file mode 100644 index f2f75c64..00000000 --- a/docs/gpt-in-a-box/support.md +++ /dev/null @@ -1,14 +0,0 @@ -# Nutanix GPT-in-a-Box Support - -Nutanix maintains public GitHub repositories for GPT in a box. Support is handled directly via the repository. Issues and enhancement requests can be submitted in the Issues tab of the relevant repository. Search for and review existing open issues before submitting a new issue. 
To report a new issue, navigate to the GitHub repository: - -[GitHub - nutanix/nai-llm](https://github.com/nutanix/nai-llm) - -This is the official repository for the virtual machine version of Nutanix GPT-in-a-Box. - -[GitHub - nutanix/nai-llm-k8s](https://github.com/nutanix/nai-llm-k8s) - -This is the official repository for the Kubernetes version of Nutanix GPT-in-a-Box. - -The support procedure is documented in [KB 16159](https://portal.nutanix.com/page/documents/kbs/details?targetId=kA0VO0000000dJ70AI). - diff --git a/docs/gpt-in-a-box/vm/v0.2/custom_model.md b/docs/gpt-in-a-box/vm/v0.2/custom_model.md deleted file mode 100644 index 997a4bbe..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/custom_model.md +++ /dev/null @@ -1,29 +0,0 @@ -# Custom Model Support -We provide the capability to generate a MAR file with custom models and start an inference server using it with TorchServe. -!!! note - A model is recognised as a custom model if its model name is not present in the model_config file. 
- -## Generate Model Archive File for Custom Models -Run the following command for generating the Model Archive File (MAR) with the Custom Model files : -``` -python3 $WORK_DIR/llm/download.py --no_download [--repo_version --handler ] --model_name --model_path --mar_output -``` -Where the arguments are : - -- **model_name**: Name of custom model -- **repo_version**: Any model version, defaults to "1.0" (optional) -- **model_path**: Absolute path of custom model files (should be a non empty folder) -- **mar_output**: Absolute path of export of MAR file (.mar) -- **no_download**: Flag to skip downloading the model files, must be set for custom models -- **handler**: Path to custom handler, defaults to llm/handler.py (optional) - -## Start Inference Server with Custom Model Archive File -Run the following command to start TorchServe (Inference Server) and run inference on the provided input for custom models: -``` -bash $WORK_DIR/llm/run.sh -n -a [OPTIONAL -d ] -``` -Where the arguments are : - -- **n**: Name of custom model -- **d**: Absolute path of input data folder (optional) -- **a**: Absolute path to the Model Store directory \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.2/generating_mar.md b/docs/gpt-in-a-box/vm/v0.2/generating_mar.md deleted file mode 100644 index 4aed2fb0..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/generating_mar.md +++ /dev/null @@ -1,38 +0,0 @@ -# Generate PyTorch Model Archive File -We will download the model files and generate a Model Archive file for the desired LLM, which will be used by TorchServe to load the model. Find out more about Torch Model Archiver [here](https://github.com/pytorch/serve/blob/master/model-archiver/README.md). - -Make two new directories, one to store the model files (model_path) and another to store the Model Archive files (mar_output). - -!!! note - The model store directory (i.e, mar_output) can be the same for multiple Model Archive files. 
But model files directory (i.e, model_path) should be empty if you're downloading the model. - -Run the following command for downloading model files and generating the Model Archive File (MAR) of the desired LLM: -``` -python3 $WORK_DIR/llm/download.py [--no_download --repo_version ] --model_name --model_path --mar_output --hf_token -``` -Where the arguments are : - -- **model_name**: Name of model -- **repo_version**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used) -- **model_path**: Absolute path of model files (should be empty if downloading) -- **mar_output**: Absolute path of export of MAR file (.mar) -- **no_download**: Flag to skip downloading the model files -- **hf_token**: Your HuggingFace token. Needed to download and verify LLAMA(2) models. - -The available LLMs are mpt_7b (mosaicml/mpt_7b), falcon_7b (tiiuae/falcon-7b), llama2_7b (meta-llama/Llama-2-7b-hf). - -## Examples -The following are example commands to generate the model archive file. 
- -Download MPT-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/download.py --model_name mpt_7b --model_path /home/ubuntu/models/mpt_7b/model_files --mar_output /home/ubuntu/models/model_store -``` -Download Falcon-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/download.py --model_name falcon_7b --model_path /home/ubuntu/models/falcon_7b/model_files --mar_output /home/ubuntu/models/model_store -``` -Download Llama2-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/download.py --model_name llama2_7b --model_path /home/ubuntu/models/llama2_7b/model_files --mar_output /home/ubuntu/models/model_store --hf_token -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.2/getting_started.md b/docs/gpt-in-a-box/vm/v0.2/getting_started.md deleted file mode 100644 index 8c3dad24..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/getting_started.md +++ /dev/null @@ -1,49 +0,0 @@ -# Getting Started -This is a guide on getting started with GPT-in-a-Box deployment on a Virtual Machine. You can find the open source repository for the virtual machine version [here](https://github.com/nutanix/nai-llm). - -Tested Specifications: - -| Specification | Tested Version | -| --- | --- | -| Python | 3.10 | -| Operating System | Ubuntu 20.04 | -| GPU | NVIDIA A100 40G | -| CPU | 8 vCPUs | -| System Memory | 32 GB | - -Follow the steps below to install the necessary prerequisites. - -### Install openjdk, pip3 -Run the following command to install pip3 and openjdk -``` -sudo apt-get install openjdk-17-jdk python3-pip -``` - -### Install NVIDIA Drivers -To install the NVIDIA Drivers, refer to the official [Installation Reference](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#runfile). - -Proceed to downloading the latest [Datacenter NVIDIA drivers](https://www.nvidia.com/download/index.aspx) for your GPU type. 
- -For NVIDIA A100, Select A100 in Datacenter Tesla for Linux 64 bit with CUDA toolkit 11.7, latest driver is 515.105.01. - -``` -curl -fSsl -O https://us.download.nvidia.com/tesla/515.105.01/NVIDIA-Linux-x86_64-515.105.01.run -sudo sh NVIDIA-Linux-x86_64-515.105.01.run -s -``` -!!! note - We don’t need to install CUDA toolkit separately as it is bundled with PyTorch installation. Just NVIDIA driver installation is enough. - -### Download Nutanix package -Download the **v0.2** release version from the [NAI-LLM Releases](https://github.com/nutanix/nai-llm/releases/tag/v0.2) and untar the release on the node. Set the working directory to the root folder containing the extracted release. - -``` -export WORK_DIR=absolute_path_to_empty_release_directory -mkdir $WORK_DIR -tar -xvf -C $WORK_DIR --strip-components=1 -``` - -### Install required packages -Run the following command to install the required python packages. -``` -pip install -r $WORK_DIR/llm/requirements.txt -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.2/inference_requests.md b/docs/gpt-in-a-box/vm/v0.2/inference_requests.md deleted file mode 100644 index b69243ab..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/inference_requests.md +++ /dev/null @@ -1,82 +0,0 @@ -# Inference Requests -The Inference Server can be inferenced through the TorchServe Inference API. Find out more about it in the official [TorchServe Inference API](https://pytorch.org/serve/inference_api.html) documentation. - -**Server Configuration** - -| Variable | Value | -| --- | --- | -| inference_server_endpoint | localhost | -| inference_port | 8080 | - -The following are example cURL commands to send inference requests to the Inference Server. - -## Ping Request -To find out the status of a TorchServe server, you can use the ping API that TorchServe supports: -``` -curl http://{inference_server_endpoint}:{inference_port}/ping -``` -### Example -``` -curl http://localhost:8080/ping -``` -!!! 
note - This only provides information on whether the TorchServe server is running. To check whether a model is successfully registered, use the "List Registered Models" request in the [Management Requests](management_requests.md#list-registered-models) documentation. - -## Inference Requests -The following is the template command for inferencing with a text file: -``` -curl -v -H "Content-Type: application/text" http://{inference_server_endpoint}:{inference_port}/predictions/{model_name} -d @path/to/data.txt -``` - -The following is the template command for inferencing with a json file: -``` -curl -v -H "Content-Type: application/json" http://{inference_server_endpoint}:{inference_port}/predictions/{model_name} -d @path/to/data.json -``` - -Input data files can be found in the `$WORK_DIR/data` folder. - -### Examples - -For MPT-7B model -``` -curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/mpt_7b -d @$WORK_DIR/data/qa/sample_text1.txt -``` -``` -curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/mpt_7b -d @$WORK_DIR/data/qa/sample_text4.json -``` - -For Falcon-7B model -``` -curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/falcon_7b -d @$WORK_DIR/data/summarize/sample_text1.txt -``` -``` -curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/falcon_7b -d @$WORK_DIR/data/summarize/sample_text3.json -``` - -For Llama2-7B model -``` -curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/llama2_7b -d @$WORK_DIR/data/translate/sample_text1.txt -``` -``` -curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/llama2_7b -d @$WORK_DIR/data/translate/sample_text3.json -``` - -### Input data format -Input data can be in either **text** or **JSON** format. - -1. For text format, the input should be a '.txt' file containing the prompt - -2. 
For JSON format, the input should be a '.json' file containing the prompt in the format below: -``` -{ - "id": "42", - "inputs": [ - { - "name": "input0", - "shape": [-1], - "datatype": "BYTES", - "data": ["Capital of India?"] - } - ] -} -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.2/inference_server.md b/docs/gpt-in-a-box/vm/v0.2/inference_server.md deleted file mode 100644 index a89a8079..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/inference_server.md +++ /dev/null @@ -1,37 +0,0 @@ -# Deploying Inference Server -Run the following command to start TorchServe (Inference Server) and run inference on the provided input: -``` -bash $WORK_DIR/llm/run.sh -n -a [OPTIONAL -d -v ] -``` -Where the arguments are: - -- **n**: Name of model -- **v**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used) -- **d**: Absolute path of input data folder (optional) -- **a**: Absolute path to the Model Store directory - -The available LLM model names are mpt_7b (mosaicml/mpt-7b), falcon_7b (tiiuae/falcon-7b), llama2_7b (meta-llama/Llama-2-7b-hf). - -Once the Inference Server has successfully started, you should see a "Ready For Inferencing" message. - -### Examples -The following are example commands to start the Inference Server. - -For Inference with official MPT-7B model: -``` -bash $WORK_DIR/llm/run.sh -n mpt_7b -d $WORK_DIR/data/translate -a /home/ubuntu/models/model_store -``` -For Inference with official Falcon-7B model: -``` -bash $WORK_DIR/llm/run.sh -n falcon_7b -d $WORK_DIR/data/qa -a /home/ubuntu/models/model_store -``` -For Inference with official Llama2-7B model: -``` -bash $WORK_DIR/llm/run.sh -n llama2_7b -d $WORK_DIR/data/summarize -a /home/ubuntu/models/model_store -``` - -## Stop Inference Server and Cleanup -Run the following command to stop the Inference Server and clean up temporarily generated files. 
-``` -python3 $WORK_DIR/llm/cleanup.py -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.2/management_requests.md b/docs/gpt-in-a-box/vm/v0.2/management_requests.md deleted file mode 100644 index cb9819c6..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/management_requests.md +++ /dev/null @@ -1,133 +0,0 @@ -# Management Requests -The Inference Server can be managed through the TorchServe Management API. Find out more about it in the official [TorchServe Management API](https://pytorch.org/serve/management_api.html) documentation. - -**Server Configuration** - -| Variable | Value | -| --- | --- | -| inference_server_endpoint | localhost | -| management_port | 8081 | - -The following are example cURL commands to send management requests to the Inference Server. - -## List Registered Models -To list all registered models, the template command is: -``` -curl http://{inference_server_endpoint}:{management_port}/models -``` - -### Example -For all registered models -``` -curl http://localhost:8081/models -``` - -## Describe Registered Models -Once a model is loaded on the Inference Server, we can use the following request to describe the model and its configuration. 
- -The following is the template command for the same: -``` -curl http://{inference_server_endpoint}:{management_port}/models/{model_name} -``` -Example response of the describe models request: -``` -[ - { - "modelName": "llama2_7b", - "modelVersion": "6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9", - "modelUrl": "llama2_7b_6fdf2e6.mar", - "runtime": "python", - "minWorkers": 1, - "maxWorkers": 1, - "batchSize": 1, - "maxBatchDelay": 200, - "loadedAtStartup": false, - "workers": [ - { - "id": "9000", - "startTime": "2023-11-28T06:39:28.081Z", - "status": "READY", - "memoryUsage": 0, - "pid": 57379, - "gpu": true, - "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::13423 MiB" - } - ], - "jobQueueStatus": { - "remainingCapacity": 1000, - "pendingRequests": 0 - } - } -] -``` - -!!! note - From this request, you can validate if a model is ready for inferencing. You can do this by referring to the values under the "workers" -> "status" keys of the response. - -### Examples -For MPT-7B model -``` -curl http://localhost:8081/models/mpt_7b -``` -For Falcon-7B model -``` -curl http://localhost:8081/models/falcon_7b -``` -For Llama2-7B model -``` -curl http://localhost:8081/models/llama2_7b -``` - -## Register Additional Models -TorchServe allows the registering (loading) of multiple models simultaneously. To register multiple models, make sure that the Model Archive Files for the concerned models are stored in the same directory. 
- -The following is the template command for the same: -``` -curl -X POST "http://{inference_server_endpoint}:{management_port}/models?url={model_archive_file_name}.mar&initial_workers=1&synchronous=true" -``` - -### Examples -For MPT-7B model -``` -curl -X POST "http://localhost:8081/models?url=mpt_7b.mar&initial_workers=1&synchronous=true" -``` -For Falcon-7B model -``` -curl -X POST "http://localhost:8081/models?url=falcon_7b.mar&initial_workers=1&synchronous=true" -``` -For Llama2-7B model -``` -curl -X POST "http://localhost:8081/models?url=llama2_7b.mar&initial_workers=1&synchronous=true" -``` -!!! note - Make sure the Model Archive file name given in the cURL request is correct and is present in the model store directory. - -## Edit Registered Model Configuration -The model can be configured after registration using the Management API of TorchServe. - -The following is the template command for the same: -``` -curl -v -X PUT "http://{inference_server_endpoint}:{management_port}/models/{model_name}?min_workers={number}&max_workers={number}&batch_size={number}&max_batch_delay={delay_in_ms}" -``` - -### Examples -For MPT-7B model -``` -curl -v -X PUT "http://localhost:8081/models/mpt_7b?min_worker=2&max_worker=2" -``` -For Falcon-7B model -``` -curl -v -X PUT "http://localhost:8081/models/falcon_7b?min_worker=2&max_worker=2" -``` -For Llama2-7B model -``` -curl -v -X PUT "http://localhost:8081/models/llama2_7b?min_worker=2&max_worker=2" -``` -!!! note - Make sure to have enough GPU and System Memory before increasing number of workers, else the additional workers will fail to load. 
- -## Unregister a Model -The following is the template command to unregister a model from the Inference Server: -``` -curl -X DELETE "http://{inference_server_endpoint}:{management_port}/models/{model_name}/{repo_version}" -``` diff --git a/docs/gpt-in-a-box/vm/v0.2/model_version.md b/docs/gpt-in-a-box/vm/v0.2/model_version.md deleted file mode 100644 index 8816593b..00000000 --- a/docs/gpt-in-a-box/vm/v0.2/model_version.md +++ /dev/null @@ -1,8 +0,0 @@ -# Model Version Support -We provide the capability to download and register various commits of the single model from HuggingFace. By specifying the commit ID as "repo_version", you can produce MAR files for multiple iterations of the same model and register them simultaneously. To transition between these versions, you can set a default version within TorchServe while it is running and inference the desired version. - -## Set Default Model Version -If multiple versions of the same model are registered, we can set a particular version as the default for inferencing by running the following command: -``` -curl -v -X PUT "http://{inference_server_endpoint}:{management_port}/{model_name}/{repo_version}/set-default" -``` diff --git a/docs/gpt-in-a-box/vm/v0.3/custom_model.md b/docs/gpt-in-a-box/vm/v0.3/custom_model.md deleted file mode 100644 index f6abf945..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/custom_model.md +++ /dev/null @@ -1,31 +0,0 @@ -# Custom Model Support -In some cases you may want to use a custom model, e.g. a custom fine-tuned model. We provide the capability to generate a MAR file with custom model files and start an inference server using it with Torchserve. - -## Generate Model Archive File for Custom Models - -!!! note - The model archive files should be placed in a directory accessible by the Nutanix package, e.g. /home/ubuntu/models/<custom_model_name>/model_files. This directory will be passed to the --model_path argument. 
You'll also need to provide the --mar_output path where you want the model archive export to be stored. - -Run the following command for generating the Model Archive File (MAR) with the Custom Model files : -``` -python3 $WORK_DIR/llm/generate.py --skip_download [--repo_version --handler ] --model_name --model_path --mar_output -``` -Where the arguments are : - -- **model_name**: Name of custom model -- **repo_version**: Any model version, defaults to "1.0" (optional) -- **model_path**: Absolute path of custom model files (should be a non empty folder) -- **mar_output**: Absolute path of export of MAR file (.mar) -- **skip_download**: Flag to skip downloading the model files, must be set for custom models -- **handler**: Path to custom handler, defaults to llm/handler.py (optional) - -## Start Inference Server with Custom Model Archive File -Run the following command to start TorchServe (Inference Server) and run inference on the provided input for custom models: -``` -bash $WORK_DIR/llm/run.sh -n -a [OPTIONAL -d ] -``` -Where the arguments are : - -- **n**: Name of custom model -- **d**: Absolute path of input data folder (optional) -- **a**: Absolute path to the Model Store directory \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.3/generating_mar.md b/docs/gpt-in-a-box/vm/v0.3/generating_mar.md deleted file mode 100644 index a1b6f495..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/generating_mar.md +++ /dev/null @@ -1,36 +0,0 @@ -# Generate PyTorch Model Archive File -We will download the model files and generate a Model Archive file for the desired LLM, which will be used by TorchServe to load the model. Find out more about Torch Model Archiver [here](https://github.com/pytorch/serve/blob/master/model-archiver/README.md). - -Make two new directories, one to store the model files (model_path) and another to store the Model Archive files (mar_output). - -!!! 
note - The model store directory (i.e, mar_output) can be the same for multiple Model Archive files. But model files directory (i.e, model_path) should be empty if you're downloading the model. - -Run the following command for downloading model files and generating the Model Archive File (MAR) of the desired LLM: -``` -python3 $WORK_DIR/llm/generate.py [--skip_download --repo_version --hf_token ] --model_name --model_path --mar_output -``` -Where the arguments are : - -- **model_name**: Name of a [validated model](validated_models.md) -- **repo_version**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used) -- **model_path**: Absolute path of model files (should be empty if downloading) -- **mar_output**: Absolute path of export of MAR file (.mar) -- **skip_download**: Flag to skip downloading the model files -- **hf_token**: Your HuggingFace token. Needed to download and verify LLAMA(2) models. (It can alternatively be set using the environment variable 'HF_TOKEN') - -## Examples -The following are example commands to generate the model archive file. 
- -Download MPT-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/generate.py --model_name mpt_7b --model_path /home/ubuntu/models/mpt_7b/model_files --mar_output /home/ubuntu/models/model_store -``` -Download Falcon-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/generate.py --model_name falcon_7b --model_path /home/ubuntu/models/falcon_7b/model_files --mar_output /home/ubuntu/models/model_store -``` -Download Llama2-7B model files and generate model archive for it: -``` -python3 $WORK_DIR/llm/generate.py --model_name llama2_7b --model_path /home/ubuntu/models/llama2_7b/model_files --mar_output /home/ubuntu/models/model_store --hf_token -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.3/getting_started.md b/docs/gpt-in-a-box/vm/v0.3/getting_started.md deleted file mode 100644 index c868c75d..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/getting_started.md +++ /dev/null @@ -1,49 +0,0 @@ -# Getting Started -This is a guide on getting started with GPT-in-a-Box deployment on a Virtual Machine. You can find the open source repository for the virtual machine version [here](https://github.com/nutanix/nai-llm). - -Tested Specifications: - -| Specification | Tested Version | -| --- | --- | -| Python | 3.10 | -| Operating System | Ubuntu 20.04 | -| GPU | NVIDIA A100 40G | -| CPU | 8 vCPUs | -| System Memory | 32 GB | - -Follow the steps below to install the necessary prerequisites. - -### Install openjdk, pip3 -Run the following command to install pip3 and openjdk -``` -sudo apt-get install openjdk-17-jdk python3-pip -``` - -### Install NVIDIA Drivers -To install the NVIDIA Drivers, refer to the official [Installation Reference](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#runfile). - -Proceed to downloading the latest [Datacenter NVIDIA drivers](https://www.nvidia.com/download/index.aspx) for your GPU type. 
- -For NVIDIA A100, Select A100 in Datacenter Tesla for Linux 64 bit with CUDA toolkit 11.7, latest driver is 515.105.01. - -``` -curl -fSsl -O https://us.download.nvidia.com/tesla/515.105.01/NVIDIA-Linux-x86_64-515.105.01.run -sudo sh NVIDIA-Linux-x86_64-515.105.01.run -s -``` -!!! note - There is no need to install CUDA toolkit separately as it is bundled with PyTorch installation. The NVIDIA driver installation is sufficient. - -### Download Nutanix package -Download the **v0.3** release version from the [NAI-LLM Releases](https://github.com/nutanix/nai-llm/releases/tag/v0.3) and untar the release on the node. Set the working directory to the root folder containing the extracted release. - -``` -export WORK_DIR=absolute_path_to_empty_release_directory -mkdir $WORK_DIR -tar -xvf -C $WORK_DIR --strip-components=1 -``` - -### Install required packages -Run the following command to install the required python packages. -``` -pip install -r $WORK_DIR/llm/requirements.txt -``` diff --git a/docs/gpt-in-a-box/vm/v0.3/huggingface_model.md b/docs/gpt-in-a-box/vm/v0.3/huggingface_model.md deleted file mode 100644 index 6abf2836..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/huggingface_model.md +++ /dev/null @@ -1,45 +0,0 @@ -# HuggingFace Model Support -!!! Note - To start the inference server for the [**Validated Models**](validated_models.md), refer to the [**Deploying Inference Server**](inference_server.md) documentation. - -We provide the capability to download model files from any HuggingFace repository and generate a MAR file to start an inference server using it with Torchserve. - -To start the Inference Server for any other HuggingFace model, follow the steps below. 
- -## Generate Model Archive File for HuggingFace Models -Run the following command for downloading and generating the Model Archive File (MAR) with the HuggingFace Model files : -``` -python3 $WORK_DIR/llm/generate.py [--hf_token --repo_version --handler ] --model_name --repo_id --model_path --mar_output -``` -Where the arguments are : - -- **model_name**: Name of HuggingFace model -- **repo_id**: HuggingFace Repository ID of the model -- **repo_version**: Commit ID of model's HuggingFace repository, defaults to latest HuggingFace commit ID (optional) -- **model_path**: Absolute path of model files (should be an empty folder) -- **mar_output**: Absolute path of export of MAR file (.mar) -- **handler**: Path to custom handler, defaults to llm/handler.py (optional) -- **hf_token**: Your HuggingFace token. Needed to download and verify LLAMA(2) models. - -### Example -Download model files and generate model archive for codellama/CodeLlama-7b-hf: -``` -python3 $WORK_DIR/llm/generate.py --model_name codellama_7b_hf --repo_id codellama/CodeLlama-7b-hf --model_path /models/codellama_7b_hf/model_files --mar_output /models/model_store -``` - -## Start Inference Server with HuggingFace Model -Run the following command to start TorchServe (Inference Server) and run inference on the provided input for HuggingFace models: -``` -bash $WORK_DIR/llm/run.sh -n -a [OPTIONAL -d ] -``` -Where the arguments are : - -- **n**: Name of HuggingFace model -- **d**: Absolute path of input data folder (optional) -- **a**: Absolute path to the Model Store directory - -### Example -To start Inference Server with codellama/CodeLlama-7b-hf: -``` -bash $WORK_DIR/llm/run.sh -n codellama_7b_hf -a /models/model_store -d $WORK_DIR/data/summarize -``` diff --git a/docs/gpt-in-a-box/vm/v0.3/inference_requests.md b/docs/gpt-in-a-box/vm/v0.3/inference_requests.md deleted file mode 100644 index 22c6905d..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/inference_requests.md +++ /dev/null @@ -1,82 +0,0 @@ -# 
Inference Requests -The Inference Server can be inferenced through the TorchServe Inference API. Find out more about it in the official [TorchServe Inference API](https://pytorch.org/serve/inference_api.html) documentation. - -**Server Configuration** - -| Variable | Value | -| --- | --- | -| inference_server_endpoint | localhost | -| inference_port | 8080 | - -The following are example cURL commands to send inference requests to the Inference Server. - -## Ping Request -To find out the status of a TorchServe server, you can use the ping API that TorchServe supports: -``` -curl http://{inference_server_endpoint}:{inference_port}/ping -``` -### Example -``` -curl http://localhost:8080/ping -``` -!!! note - This only provides information on whether the TorchServe server is running. To check whether a model is successfully registered on TorchServe, you can [**list all models**](management_requests.md#list-registered-models) and [**describe a registered model**](management_requests.md#describe-registered-models). - -## Inference Requests -The following is the template command for inferencing with a text file: -``` -curl -v -H "Content-Type: application/text" http://{inference_server_endpoint}:{inference_port}/predictions/{model_name} -d @path/to/data.txt -``` - -The following is the template command for inferencing with a json file: -``` -curl -v -H "Content-Type: application/json" http://{inference_server_endpoint}:{inference_port}/predictions/{model_name} -d @path/to/data.json -``` - -Input data files can be found in the `$WORK_DIR/data` folder. 
- -### Examples - -For MPT-7B model -``` -curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/mpt_7b -d @$WORK_DIR/data/qa/sample_text1.txt -``` -``` -curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/mpt_7b -d @$WORK_DIR/data/qa/sample_text4.json -``` - -For Falcon-7B model -``` -curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/falcon_7b -d @$WORK_DIR/data/summarize/sample_text1.txt -``` -``` -curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/falcon_7b -d @$WORK_DIR/data/summarize/sample_text3.json -``` - -For Llama2-7B model -``` -curl -v -H "Content-Type: application/text" http://localhost:8080/predictions/llama2_7b -d @$WORK_DIR/data/translate/sample_text1.txt -``` -``` -curl -v -H "Content-Type: application/json" http://localhost:8080/predictions/llama2_7b -d @$WORK_DIR/data/translate/sample_text3.json -``` - -### Input data format -Input data can be in either **text** or **JSON** format. - -1. For text format, the input should be a '.txt' file containing the prompt - -2. 
For JSON format, the input should be a '.json' file containing the prompt in the format below: -``` -{ - "id": "42", - "inputs": [ - { - "name": "input0", - "shape": [-1], - "datatype": "BYTES", - "data": ["Capital of India?"] - } - ] -} -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.3/inference_server.md b/docs/gpt-in-a-box/vm/v0.3/inference_server.md deleted file mode 100644 index 4a899d9a..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/inference_server.md +++ /dev/null @@ -1,36 +0,0 @@ -# Deploying Inference Server - -Run the following command to start TorchServe (Inference Server) and run inference on the provided input: -``` -bash $WORK_DIR/llm/run.sh -n -a [OPTIONAL -d -v ] -``` -Where the arguments are : - -- **n**: Name of a [validated model](validated_models.md) -- **v**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used) -- **d**: Absolute path of input data folder (optional) -- **a**: Absolute path to the Model Store directory - -Once the Inference Server has successfully started, you should see a "Ready For Inferencing" message. - -### Examples -The following are example commands to start the Inference Server. - -For Inference with official MPT-7B model: -``` -bash $WORK_DIR/llm/run.sh -n mpt_7b -d $WORK_DIR/data/translate -a /home/ubuntu/models/model_store -``` -For Inference with official Falcon-7B model: -``` -bash $WORK_DIR/llm/run.sh -n falcon_7b -d $WORK_DIR/data/qa -a /home/ubuntu/models/model_store -``` -For Inference with official Llama2-7B model: -``` -bash $WORK_DIR/llm/run.sh -n llama2_7b -d $WORK_DIR/data/summarize -a /home/ubuntu/models/model_store -``` - -## Stop Inference Server and Cleanup -Run the following command to stop the Inference Server and clean up temporarily generate files. 
-``` -python3 $WORK_DIR/llm/cleanup.py -``` \ No newline at end of file diff --git a/docs/gpt-in-a-box/vm/v0.3/management_requests.md b/docs/gpt-in-a-box/vm/v0.3/management_requests.md deleted file mode 100644 index cb9819c6..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/management_requests.md +++ /dev/null @@ -1,133 +0,0 @@ -# Management Requests -The Inference Server can be managed through the TorchServe Management API. Find out more about it in the official [TorchServe Management API](https://pytorch.org/serve/management_api.html) documentation - -**Server Configuration** - -| Variable | Value | -| --- | --- | -| inference_server_endpoint | localhost | -| management_port | 8081 | - -The following are example cURL commands to send management requests to the Inference Server. - -## List Registered Models -To describe all registered models, the template command is: -``` -curl http://{inference_server_endpoint}:{management_port}/models -``` - -### Example -For all registered models -``` -curl http://localhost:8081/models -``` - -## Describe Registered Models -Once a model is loaded on the Inference Server, we can use the following request to describe the model and it's configuration. 
- -The following is the template command for the same: -``` -curl http://{inference_server_endpoint}:{management_port}/models/{model_name} -``` -Example response of the describe models request: -``` -[ - { - "modelName": "llama2_7b", - "modelVersion": "6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9", - "modelUrl": "llama2_7b_6fdf2e6.mar", - "runtime": "python", - "minWorkers": 1, - "maxWorkers": 1, - "batchSize": 1, - "maxBatchDelay": 200, - "loadedAtStartup": false, - "workers": [ - { - "id": "9000", - "startTime": "2023-11-28T06:39:28.081Z", - "status": "READY", - "memoryUsage": 0, - "pid": 57379, - "gpu": true, - "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::13423 MiB" - } - ], - "jobQueueStatus": { - "remainingCapacity": 1000, - "pendingRequests": 0 - } - } -] -``` - -!!! note - From this request, you can validate if a model is ready for inferencing. You can do this by referring to the values under the "workers" -> "status" keys of the response. - -### Examples -For MPT-7B model -``` -curl http://localhost:8081/models/mpt_7b -``` -For Falcon-7B model -``` -curl http://localhost:8081/models/falcon_7b -``` -For Llama2-7B model -``` -curl http://localhost:8081/models/llama2_7b -``` - -## Register Additional Models -TorchServe allows the registering (loading) of multiple models simultaneously. To register multiple models, make sure that the Model Archive Files for the concerned models are stored in the same directory. 
- -The following is the template command for the same: -``` -curl -X POST "http://{inference_server_endpoint}:{management_port}/models?url={model_archive_file_name}.mar&initial_workers=1&synchronous=true" -``` - -### Examples -For MPT-7B model -``` -curl -X POST "http://localhost:8081/models?url=mpt_7b.mar&initial_workers=1&synchronous=true" -``` -For Falcon-7B model -``` -curl -X POST "http://localhost:8081/models?url=falcon_7b.mar&initial_workers=1&synchronous=true" -``` -For Llama2-7B model -``` -curl -X POST "http://localhost:8081/models?url=llama2_7b.mar&initial_workers=1&synchronous=true" -``` -!!! note - Make sure the Model Archive file name given in the cURL request is correct and is present in the model store directory. - -## Edit Registered Model Configuration -The model can be configured after registration using the Management API of TorchServe. - -The following is the template command for the same: -``` -curl -v -X PUT "http://{inference_server_endpoint}:{management_port}/models/{model_name}?min_workers={number}&max_workers={number}&batch_size={number}&max_batch_delay={delay_in_ms}" -``` - -### Examples -For MPT-7B model -``` -curl -v -X PUT "http://localhost:8081/models/mpt_7b?min_worker=2&max_worker=2" -``` -For Falcon-7B model -``` -curl -v -X PUT "http://localhost:8081/models/falcon_7b?min_worker=2&max_worker=2" -``` -For Llama2-7B model -``` -curl -v -X PUT "http://localhost:8081/models/llama2_7b?min_worker=2&max_worker=2" -``` -!!! note - Make sure to have enough GPU and System Memory before increasing number of workers, else the additional workers will fail to load. 
- -## Unregister a Model -The following is the template command to unregister a model from the Inference Server: -``` -curl -X DELETE "http://{inference_server_endpoint}:{management_port}/models/{model_name}/{repo_version}" -``` diff --git a/docs/gpt-in-a-box/vm/v0.3/model_version.md b/docs/gpt-in-a-box/vm/v0.3/model_version.md deleted file mode 100644 index 647199ca..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/model_version.md +++ /dev/null @@ -1,12 +0,0 @@ -# Model Version Support -We provide the capability to download and register various commits of the single model from HuggingFace. Follow the steps below for the same : - -- [Generate MAR files](generating_mar.md) for the required HuggingFace commits by passing it's commit ID in the "--repo_version" argument -- [Deploy TorchServe](inference_server.md) with any one of the versions passed through the "--repo_version" argument -- Register the rest of the required versions through the [register additional models](management_requests.md#register-additional-models) request. - -## Set Default Model Version -If multiple versions of the same model are registered, we can set a particular version as the default for inferencing by running the following command: -``` -curl -v -X PUT "http://{inference_server_endpoint}:{management_port}/{model_name}/{repo_version}/set-default" -``` diff --git a/docs/gpt-in-a-box/vm/v0.3/validated_models.md b/docs/gpt-in-a-box/vm/v0.3/validated_models.md deleted file mode 100644 index f92cd1dc..00000000 --- a/docs/gpt-in-a-box/vm/v0.3/validated_models.md +++ /dev/null @@ -1,16 +0,0 @@ -# Validated Models for Virtual Machine Version - -GPT-in-a-Box has been validated on a curated set of HuggingFace models. Information pertaining to these models is stored in the ```llm/model_config.json``` file. 
- -The Validated Models are : - -| Model Name | HuggingFace Repository ID | -| --- | --- | -| mpt_7b | [mosaicml/mpt_7b](https://huggingface.co/mosaicml/mpt-7b) | -| falcon_7b | [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) | -| llama2_7b | [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | -| codellama_7b_python | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) | -| llama2_7b_chat | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | - -!!! note - To start the inference server with any HuggingFace model, refer to [**HuggingFace Model Support**](huggingface_model.md) documentation. \ No newline at end of file diff --git a/docs/openshift/operators/csi/index.md b/docs/openshift/operators/csi/index.md index a069b841..02c099ab 100644 --- a/docs/openshift/operators/csi/index.md +++ b/docs/openshift/operators/csi/index.md @@ -26,29 +26,92 @@ With Nutanix CSI Provider you can: 2. Install the Operator by using the "openshift-cluster-csi-drivers" namespace and selecting defaults. ### Installing the CSI Driver using the Operator - 1. In the OpenShift web console, navigate to the Operators → Installed Operators page. 2. Select **Nutanix CSI Operator**. 3. Select **Create instance** and then **Create**. +4. To install Nutanix CSI Driver interacting in PC Mode + + apiVersion: crd.nutanix.com/v1alpha1 + kind: NutanixCsiStorage + metadata: + name: nutanixcsistorage + namespace: openshift-cluster-csi-drivers + spec: + ntnxInitConfigMap: + usePC : true + +5. To install Nutanix CSI Driver interacting in PE Mode + + apiVersion: crd.nutanix.com/v1alpha1 + kind: NutanixCsiStorage + metadata: + name: nutanixcsistorage + namespace: openshift-cluster-csi-drivers + spec: + ntnxInitConfigMap: + usePC : false + +CSI 3.3.8 supports PC service account-based authentication in Nutanix Volumes and Nutanix Files. 
Instead of using username and password secrets, Prism Central administrators can now create service accounts, configure RBAC, and use generated API keys for secure storage provisioning. + +6. To install Nutanix CSI Driver with service account based authentication + + apiVersion: crd.nutanix.com/v1alpha1 + kind: NutanixCsiStorage + metadata: + name: nutanixcsistorage + namespace: openshift-cluster-csi-drivers + spec: + authType: "service-auth" + ntnxInitConfigMap: + usePC : true + ### Configuring the K8s secret and storage class -In order to use this driver, create the relevant storage classes and secrets using the OpenShift CLI, by followinig the below section: +In order to use this driver, create the relevant storage classes and secrets using the OpenShift CLI, by following the below section: -1. Create a secret yaml file like the below example and apply (`oc -n openshift-cluster-csi-drivers apply -f `). +1. Depending on the mode of interaction of the CSI Driver(Interacting with PC or PE), create a secret yaml file like the below example and apply (`oc -n openshift-cluster-csi-drivers apply -f `). + ### Nutanix PC based secret apiVersion: v1 kind: Secret metadata: - name: ntnx-secret + name: ntnx-pc-secret namespace: openshift-cluster-csi-drivers stringData: - # prism-element-ip:prism-port:admin:password - key: 10.0.0.14:9440:admin:password + # prism-central-ip:prism-port:username:password. + key: 1.2.3.4:9440:admin:password -2. Create storage class yaml like the below example and apply (`oc apply -f `). + ### Nutanix PE based secret + apiVersion: v1 + kind: Secret + metadata: + name: ntnx-pe-secret + namespace: openshift-cluster-csi-drivers + stringData: + # prism-element-ip:prism-port:username:password. 
+ key: 1.2.3.4:9440:admin:password + files-key: "fileserver01.sample.com:csi:password1" # For dynamic files mode + + ### Nutanix PC secret with service account based authentication + apiVersion: v1 + kind: Secret + metadata: + name: ntnx-pc-secret + namespace: openshift-cluster-csi-drivers + type: Opaque + stringData: + host: 1.2.3.4 + port: "9440" + key_type: "api-key" + key_value: "xxxxxxxxxxx" + auth_type: "service-auth" + +2. Depending on the mode of interaction of the CSI Driver (interacting with PC or PE, and storageType NutanixVolumes or NutanixFiles), create a storage class yaml file like the below example and apply (`oc -n openshift-cluster-csi-drivers apply -f `). + + ### Nutanix Volumes on PE based installation kind: StorageClass apiVersion: storage.k8s.io/v1 @@ -56,25 +119,113 @@ In order to use this driver, create the relevant storage classes and secrets usi name: nutanix-volume provisioner: csi.nutanix.com parameters: - csi.storage.k8s.io/provisioner-secret-name: ntnx-secret + csi.storage.k8s.io/provisioner-secret-name: ntnx-pe-secret + csi.storage.k8s.io/provisioner-secret-namespace: openshift-cluster-csi-drivers + csi.storage.k8s.io/node-publish-secret-name: ntnx-pe-secret + csi.storage.k8s.io/node-publish-secret-namespace: openshift-cluster-csi-drivers + csi.storage.k8s.io/controller-expand-secret-name: ntnx-pe-secret + csi.storage.k8s.io/controller-expand-secret-namespace: openshift-cluster-csi-drivers + csi.storage.k8s.io/controller-publish-secret-name: ntnx-pe-secret + csi.storage.k8s.io/controller-publish-secret-namespace: openshift-cluster-csi-drivers + csi.storage.k8s.io/fstype: ext4 + storageContainer: default-container + storageType: NutanixVolumes + #description: "description added to each storage object created by the driver" + #isSegmentedIscsiNetwork: "false" + #whitelistIPMode: ENABLED + #chapAuth: ENABLED + #isLVMVolume: "false" + #numLVMDisks: 4 + allowVolumeExpansion: true + reclaimPolicy: Delete + + ### Nutanix dynamic files on PE based
installation + + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: nutanix-dynfiles + provisioner: csi.nutanix.com + parameters: + dynamicProv: ENABLED + nfsServerName: fs + csi.storage.k8s.io/provisioner-secret-name: ntnx-pe-secret csi.storage.k8s.io/provisioner-secret-namespace: openshift-cluster-csi-drivers - csi.storage.k8s.io/node-publish-secret-name: ntnx-secret + csi.storage.k8s.io/node-publish-secret-name: ntnx-pe-secret csi.storage.k8s.io/node-publish-secret-namespace: openshift-cluster-csi-drivers - csi.storage.k8s.io/controller-expand-secret-name: ntnx-secret + csi.storage.k8s.io/controller-expand-secret-name: ntnx-pe-secret csi.storage.k8s.io/controller-expand-secret-namespace: openshift-cluster-csi-drivers + csi.storage.k8s.io/controller-publish-secret-name: ntnx-pe-secret + csi.storage.k8s.io/controller-publish-secret-namespace: openshift-cluster-csi-drivers + storageType: NutanixFiles + squashType: "none" + #description: "description added to each storage object created by the driver" + allowVolumeExpansion: true + + ### Nutanix Volumes on PC based installation + + kind: StorageClass + apiVersion: storage.k8s.io/v1 + metadata: + name: nutanix-volume + provisioner: csi.nutanix.com + parameters: csi.storage.k8s.io/fstype: ext4 - dataServiceEndPoint: 10.0.0.15:3260 storageContainer: default-container storageType: NutanixVolumes + #description: "description added to each storage object created by the driver" + #isSegmentedIscsiNetwork: "false" #whitelistIPMode: ENABLED #chapAuth: ENABLED + #isLVMVolume: "false" + #numLVMDisks: 4 allowVolumeExpansion: true reclaimPolicy: Delete **Note:** By default, new RHCOS based nodes are provisioned with the required `scsi-initiator-utils` package installed, but with the `iscsid` service disabled. This can result in messages like `iscsiadm: can not connect to iSCSI daemon (111)!`. When this occurs, confirm that the `iscsid.service` is running on worker nodes. 
It can be enabled and started globally using the Machine Config Operator or directly on each node using systemctl (`sudo systemctl enable --now iscsid`). -See the Managing Storage section of [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_5:csi-csi-plugin-storage-c.html){target=_blank} on the Nutanix Portal for more information on configuring storage classes. +See the Managing Storage section of [CSI Driver documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v3_3:csi-csi-plugin-storage-c.html){target=_blank} on the Nutanix Portal for more information on configuring storage classes. + +### Upgrading Nutanix CSI Driver from 2.6.x to 3.3 +Please read the following instructions carefully before upgrading from 2.6.x to 3.3. For more information, refer to the [documentation](https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v3_3:CSI-Volume-Driver-v3_3). + +1. Please do not upgrade to the CSI 3.x operator if: + * You are using LVM volumes. + +2. To upgrade from the CSI 2.6.x operator to the CSI 3.3 operator (interacting with Prism Central): + * Create a Nutanix Prism Central secret as explained above. + * Delete the csidriver object from the cluster: + + ``` + oc delete csidriver csi.nutanix.com + ``` + + * In the installed operators, go to Nutanix CSI Operator and change the subscription channel from stable to stable-3.x. + If you have installed the operator with automatic update approval, the operator will be automatically upgraded to CSI 3.3, and then the nutanixcsistorage resource will be upgraded. + If you have installed the operator with manual update approval, an update plan will be generated. Upon approval, the operator will be upgraded. + +3. Direct upgrades from CSI 2.6.x to CSI 3.3 interacting with Prism Element are not supported.
+ The only option is to recreate the nutanixcsistorage instance using the following procedure: + - In the installed operators, go to Nutanix CSI Operator and delete the nutanixcsistorage instance. + - Next, change the subscription channel from stable to stable-3.x. + - Verify the following points: + - Ensure a Nutanix Prism Element secret is present in the namespace. + - Ensure that all storage classes with `provisioner: csi.nutanix.com` have a controller publish secret, as shown below. + + ``` + csi.storage.k8s.io/controller-publish-secret-name: ntnx-pe-secret + csi.storage.k8s.io/controller-publish-secret-namespace: openshift-cluster-csi-drivers + ``` + + If this secret is not present in a storage class, delete and recreate that storage class with the required secrets. + - Create a new instance of nutanixcsistorage from this operator by specifying `usePC: false` in the YAML spec section. + - Caution: Moving from a CSI driver that interacts with Prism Central to a CSI driver that interacts with Prism Element is not supported. + +4. Troubleshooting: + + If the upgrade is unsuccessful and you want to revert to CSI 2.6.x, delete the csidriver object as explained above, uninstall the operator (there is no need to delete the nutanixcsistorage custom resource), and install CSI 2.6.x from the stable channel. + ### Using the Nutanix CSI Operator on restricted networks @@ -82,4 +233,4 @@ For OpenShift Container Platform clusters that are installed on restricted netwo The Nutanix CSI Operator is fully compatible with a restricted networks architecture and supported in disconnected mode. Follow the [OpenShift documentation](https://docs.openshift.com/container-platform/latest/operators/admin/olm-restricted-networks.html){target=_blank} to configure. -You need to mirror the `certified-operator-index` and keep the `nutanixcsioperator` package in your pruned index.
\ No newline at end of file +You need to mirror the `certified-operator-index` and keep the `nutanixcsioperator` package in your pruned index. diff --git a/docs/openshift/post-install/index.md b/docs/openshift/post-install/index.md index b8e55f96..59bc1df4 100644 --- a/docs/openshift/post-install/index.md +++ b/docs/openshift/post-install/index.md @@ -97,7 +97,7 @@ Based on requirements, choose one of the following options: #nfsServerName above is File Server Name in Prism without DNS suffix, not the FQDN. csi.storage.k8s.io/provisioner-secret-name: ntnx-secret csi.storage.k8s.io/provisioner-secret-namespace: openshift-cluster-csi-drivers - storageType: NutanixFiles + storageType: NutanixFiles 2. Create a PVC yaml file like the below example and apply in the openshift-image-registry namespace (`oc -n openshift-image-registry apply -f `). diff --git a/mkdocs.yml b/mkdocs.yml index d0aa99a2..ad1f4025 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,200 +1,306 @@ site_name: opendocs.nutanix.com theme: - name: material - logo: images/nutanix_x_white.png - features: - - navigation.instant - - content.code.annotate - - navigation.tabs - - navigation.top - favicon: images/favicon.png - icon: - admonition: - note: material/note + name: material + logo: images/nutanix_x_white.png + features: + - navigation.instant + - content.code.annotate + - navigation.tabs + - navigation.top + favicon: images/favicon.png + icon: + admonition: + note: material/note extra_css: - - stylesheets/extra.css + - stylesheets/extra.css nav: - - "Solutions": - - "Cloud Native": - - "Overview": "index.md" - - "Cluster API Provider: Nutanix (CAPX)": - - "v1.3.x (Latest)": - - "Getting Started": "capx/v1.3.x/getting_started.md" - - "Types": - - "NutanixCluster": "capx/v1.3.x/types/nutanix_cluster.md" - - "NutanixMachineTemplate": "capx/v1.3.x/types/nutanix_machine_template.md" - - "Certificate Trust": "capx/v1.3.x/pc_certificates.md" - - "Credential Management": "capx/v1.3.x/credential_management.md" - 
- "Tasks": - - "Modifying Machine Configuration": "capx/v1.3.x/tasks/modify_machine_configuration.md" - - "CAPX v1.3.x Upgrade Procedure": "capx/v1.3.x/tasks/capx_v13x_upgrade_procedure.md" - - "Port Requirements": "capx/v1.3.x/port_requirements.md" - - "User Requirements": "capx/v1.3.x/user_requirements.md" - - "Addons": - - "CSI Driver Installation": "capx/v1.3.x/addons/install_csi_driver.md" - - "Validated Integrations": "capx/v1.3.x/validated_integrations.md" - - "Experimental": - - "Multi-PE CAPX cluster": "capx/v1.3.x/experimental/capx_multi_pe.md" - - "Autoscaler": "capx/v1.3.x/experimental/autoscaler.md" - - "OIDC Integration": "capx/v1.3.x/experimental/oidc.md" - - "Flow VPC": "capx/v1.3.x/experimental/vpc.md" - - "Proxy Configuration": "capx/v1.3.x/experimental/proxy.md" - - "Registry Mirror Configuration": "capx/v1.3.x/experimental/registry_mirror.md" - - "Troubleshooting": "capx/v1.3.x/troubleshooting.md" - - "v1.2.x": - - "Getting Started": "capx/v1.2.x/getting_started.md" - - "Types": - - "NutanixCluster": "capx/v1.2.x/types/nutanix_cluster.md" - - "NutanixMachineTemplate": "capx/v1.2.x/types/nutanix_machine_template.md" - - "Certificate Trust": "capx/v1.2.x/pc_certificates.md" - - "Credential Management": "capx/v1.2.x/credential_management.md" - - "Tasks": - - "Modifying Machine Configuration": "capx/v1.2.x/tasks/modify_machine_configuration.md" - - "Port Requirements": "capx/v1.2.x/port_requirements.md" - - "User Requirements": "capx/v1.2.x/user_requirements.md" - - "Addons": - - "CSI Driver Installation": "capx/v1.2.x/addons/install_csi_driver.md" - - "Validated Integrations": "capx/v1.2.x/validated_integrations.md" - - "Experimental": - - "Multi-PE CAPX cluster": "capx/v1.2.x/experimental/capx_multi_pe.md" - - "Autoscaler": "capx/v1.2.x/experimental/autoscaler.md" - - "OIDC Integration": "capx/v1.2.x/experimental/oidc.md" - - "Flow VPC": "capx/v1.2.x/experimental/vpc.md" - - "Proxy Configuration": "capx/v1.2.x/experimental/proxy.md" - - "Registry 
Mirror Configuration": "capx/v1.2.x/experimental/registry_mirror.md" - - "Troubleshooting": "capx/v1.2.x/troubleshooting.md" - - "v1.1.x": - - "Getting Started": "capx/v1.1.x/getting_started.md" - - "Types": - - "NutanixCluster": "capx/v1.1.x/types/nutanix_cluster.md" - - "NutanixMachineTemplate": "capx/v1.1.x/types/nutanix_machine_template.md" - - "Certificate Trust": "capx/v1.1.x/pc_certificates.md" - - "Credential Management": "capx/v1.1.x/credential_management.md" - - "Tasks": - - "Modifying Machine Configuration": "capx/v1.1.x/tasks/modify_machine_configuration.md" - - "Port Requirements": "capx/v1.1.x/port_requirements.md" - - "User Requirements": "capx/v1.1.x/user_requirements.md" - - "Addons": - - "CSI Driver Installation": "capx/v1.1.x/addons/install_csi_driver.md" - - "Validated Integrations": "capx/v1.1.x/validated_integrations.md" - - "Experimental": - - "Multi-PE CAPX cluster": "capx/v1.1.x/experimental/capx_multi_pe.md" - - "Autoscaler": "capx/v1.1.x/experimental/autoscaler.md" - - "OIDC Integration": "capx/v1.1.x/experimental/oidc.md" - - "Flow VPC": "capx/v1.1.x/experimental/vpc.md" - - "Proxy Configuration": "capx/v1.1.x/experimental/proxy.md" - - "Registry Mirror Configuration": "capx/v1.1.x/experimental/registry_mirror.md" - - "Troubleshooting": "capx/v1.1.x/troubleshooting.md" - - "v1.0.x": - - "Getting Started": "capx/v1.0.x/getting_started.md" - - "Types": - - "NutanixCluster": "capx/v1.0.x/types/nutanix_cluster.md" - - "NutanixMachineTemplate": "capx/v1.0.x/types/nutanix_machine_template.md" - - "Credential Management": "capx/v1.0.x/credential_management.md" - - "Tasks": - - "Modifying Machine Configuration": "capx/v1.0.x/tasks/modify_machine_configuration.md" - - "Port Requirements": "capx/v1.0.x/port_requirements.md" - - "Addons": - - "CSI Driver Installation": "capx/v1.0.x/addons/install_csi_driver.md" - - "Validated Integrations": "capx/v1.0.x/validated_integrations.md" - - "Experimental": - - "Multi-PE CAPX cluster": 
"capx/v1.0.x/experimental/capx_multi_pe.md" - - "Autoscaler": "capx/v1.0.x/experimental/autoscaler.md" - - "Troubleshooting": "capx/v1.0.x/troubleshooting.md" - - "v0.5.x": - - "Getting Started": "capx/v0.5.x/getting_started.md" - - "Credential Management": "capx/v0.5.x/credential_management.md" - - "Addons": - - "CSI Driver Installation": "capx/v0.5.x/addons/install_csi_driver.md" - - "Validated Integrations": "capx/v0.5.x/validated_integrations.md" - - "Experimental": - - "Multi-PE CAPX cluster": "capx/v0.5.x/experimental/capx_multi_pe.md" - - "Autoscaler": "capx/v0.5.x/experimental/autoscaler.md" - - "Troubleshooting": "capx/v0.5.x/troubleshooting.md" - - "Nutanix Cloud Controller Manager (CCM)": - - "v0.3.x (Latest)": - - "Overview": "ccm/v0.3.x/overview.md" - - "Requirements": "ccm/v0.3.x/requirements.md" - - "Configuration": "ccm/v0.3.x/ccm_configuration.md" - - "Certificate Trust": "ccm/v0.3.x/pc_certificates.md" - - "Credentials": "ccm/v0.3.x/ccm_credentials.md" - - "Topology Discovery": "ccm/v0.3.x/topology_discovery.md" - - "Custom Labeling": "ccm/v0.3.x/custom_labeling.md" - - "v0.2.0": - - "Overview": "ccm/v0.2.x/overview.md" - - "Requirements": "ccm/v0.2.x/requirements.md" - - "Configuration": "ccm/v0.2.x/ccm_configuration.md" - - "Credentials": "ccm/v0.2.x/ccm_credentials.md" - - "Topology Discovery": "ccm/v0.2.x/topology_discovery.md" - - "Custom Labeling": "ccm/v0.2.x/custom_labeling.md" - - "Red Hat OpenShift": - - "Install": - - "Agnostic": "openshift/install/agnostic/index.md" - - "IPI": "openshift/install/ipi/index.md" - - "Assisted Installer": "openshift/install/assisted_installer/index.md" - - "Post Install": "openshift/post-install/index.md" - - Operators: - - "CSI": "openshift/operators/csi/index.md" - - "Google Anthos": - - "Architecture": "anthos/architecture/index.md" - - "Install": - - "Manual": "anthos/install/manual/index.md" - - "Amazon EKS Anywhere": - - "Install": "eksa/install/index.md" - - "GPT-in-a-Box": - - "Overview": 
"gpt-in-a-box/overview.md" - - "Deploy on Virtual Machine": - - "v0.3": - - "Getting Started": "gpt-in-a-box/vm/v0.3/getting_started.md" - - "Validated Models": "gpt-in-a-box/vm/v0.3/validated_models.md" - - "Generating Model Archive File": "gpt-in-a-box/vm/v0.3/generating_mar.md" - - "Deploying Inference Server": "gpt-in-a-box/vm/v0.3/inference_server.md" - - "Inference Requests": "gpt-in-a-box/vm/v0.3/inference_requests.md" - - "Model Version Support": "gpt-in-a-box/vm/v0.3/model_version.md" - - "HuggingFace Model Support": "gpt-in-a-box/vm/v0.3/huggingface_model.md" - - "Custom Model Support": "gpt-in-a-box/vm/v0.3/custom_model.md" - - "Management Requests": "gpt-in-a-box/vm/v0.3/management_requests.md" - - "v0.2": - - "Getting Started": "gpt-in-a-box/vm/v0.2/getting_started.md" - - "Generating Model Archive File": "gpt-in-a-box/vm/v0.2/generating_mar.md" - - "Deploying Inference Server": "gpt-in-a-box/vm/v0.2/inference_server.md" - - "Inference Requests": "gpt-in-a-box/vm/v0.2/inference_requests.md" - - "Model Version Support": "gpt-in-a-box/vm/v0.2/model_version.md" - - "Custom Model Support": "gpt-in-a-box/vm/v0.2/custom_model.md" - - "Management Requests": "gpt-in-a-box/vm/v0.2/management_requests.md" - - "Deploy on Kubernetes": - - "v0.2": - - "Getting Started": "gpt-in-a-box/kubernetes/v0.2/getting_started.md" - - "Validated Models": "gpt-in-a-box/kubernetes/v0.2/validated_models.md" - - "Generating Model Archive File": "gpt-in-a-box/kubernetes/v0.2/generating_mar.md" - - "Deploying Inference Server": "gpt-in-a-box/kubernetes/v0.2/inference_server.md" - - "Inference Requests": "gpt-in-a-box/kubernetes/v0.2/inference_requests.md" - - "HuggingFace Model Support": "gpt-in-a-box/kubernetes/v0.2/huggingface_model.md" - - "Custom Model Support": "gpt-in-a-box/kubernetes/v0.2/custom_model.md" - - "v0.1": - - "Getting Started": "gpt-in-a-box/kubernetes/v0.1/getting_started.md" - - "Generating Model Archive File": "gpt-in-a-box/kubernetes/v0.1/generating_mar.md" - 
- "Deploying Inference Server": "gpt-in-a-box/kubernetes/v0.1/inference_server.md" - - "Inference Requests": "gpt-in-a-box/kubernetes/v0.1/inference_requests.md" - - "Custom Model Support": "gpt-in-a-box/kubernetes/v0.1/custom_model.md" - - "Support": "gpt-in-a-box/support.md" - - "Guides": - - "Cloud Native": - - "Red Hat OpenShift": - - "Install": - - "IPI": "guides/openshift/install/ipi/index.md" - - "Custom Cloud Native Role": "guides/cloud_native_role/index.md" + - "Solutions": + - "Cloud Native": + - "Overview": "index.md" + - "Cluster API Provider: Nutanix (CAPX)": + - "v1.8.x (latest)": + - "Getting Started": "capx/v1.8.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.8.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.8.x/types/nutanix_machine_template.md" + - "NutanixFailureDomain": "capx/v1.8.x/types/nutanix_failure_domains.md" + - "Certificate Trust": "capx/v1.8.x/pc_certificates.md" + - "Credential Management": "capx/v1.8.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.8.x/tasks/modify_machine_configuration.md" + - "CAPX v1.8.x Upgrade Procedure": "capx/v1.8.x/tasks/capx_v18x_upgrade_procedure.md" + - "Port Requirements": "capx/v1.8.x/port_requirements.md" + - "User Requirements": "capx/v1.8.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.8.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.8.x/validated_integrations.md" + - "Topology": + - "Multi-PE CAPX cluster": "capx/v1.8.x/topology/capx_multi_pe.md" + - "Experimental": + - "Autoscaler": "capx/v1.8.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.8.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.8.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.8.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.8.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.8.x/troubleshooting.md" + - "v1.7.x": + - "Getting Started": 
"capx/v1.7.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.7.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.7.x/types/nutanix_machine_template.md" + - "NutanixFailureDomain": "capx/v1.7.x/types/nutanix_failure_domains.md" + - "Certificate Trust": "capx/v1.7.x/pc_certificates.md" + - "Credential Management": "capx/v1.7.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.7.x/tasks/modify_machine_configuration.md" + - "CAPX v1.7.x Upgrade Procedure": "capx/v1.7.x/tasks/capx_v17x_upgrade_procedure.md" + - "Port Requirements": "capx/v1.7.x/port_requirements.md" + - "User Requirements": "capx/v1.7.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.7.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.7.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.7.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.7.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.7.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.7.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.7.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.7.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.7.x/troubleshooting.md" + - "v1.6.x": + - "Getting Started": "capx/v1.6.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.6.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.6.x/types/nutanix_machine_template.md" + - "Certificate Trust": "capx/v1.6.x/pc_certificates.md" + - "Credential Management": "capx/v1.6.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.6.x/tasks/modify_machine_configuration.md" + - "CAPX v1.6.x Upgrade Procedure": "capx/v1.6.x/tasks/capx_v16x_upgrade_procedure.md" + - "Port Requirements": "capx/v1.6.x/port_requirements.md" + - "User Requirements": "capx/v1.6.x/user_requirements.md" + - "Addons": + - 
"CSI Driver Installation": "capx/v1.6.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.6.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.6.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.6.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.6.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.6.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.6.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.6.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.6.x/troubleshooting.md" + - "v1.5.x": + - "Getting Started": "capx/v1.5.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.5.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.5.x/types/nutanix_machine_template.md" + - "Certificate Trust": "capx/v1.5.x/pc_certificates.md" + - "Credential Management": "capx/v1.5.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.5.x/tasks/modify_machine_configuration.md" + - "CAPX v1.5.x Upgrade Procedure": "capx/v1.5.x/tasks/capx_v15x_upgrade_procedure.md" + - "Port Requirements": "capx/v1.5.x/port_requirements.md" + - "User Requirements": "capx/v1.5.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.5.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.5.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.5.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.5.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.5.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.5.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.5.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.5.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.5.x/troubleshooting.md" + - "v1.4.x": + - "Getting Started": "capx/v1.4.x/getting_started.md" + - "Types": + - "NutanixCluster": 
"capx/v1.4.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.4.x/types/nutanix_machine_template.md" + - "Certificate Trust": "capx/v1.4.x/pc_certificates.md" + - "Credential Management": "capx/v1.4.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.4.x/tasks/modify_machine_configuration.md" + - "CAPX v1.4.x Upgrade Procedure": "capx/v1.4.x/tasks/capx_v14x_upgrade_procedure.md" + - "Port Requirements": "capx/v1.4.x/port_requirements.md" + - "User Requirements": "capx/v1.4.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.4.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.4.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.4.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.4.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.4.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.4.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.4.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.4.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.4.x/troubleshooting.md" + - "v1.3.x": + - "Getting Started": "capx/v1.3.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.3.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.3.x/types/nutanix_machine_template.md" + - "Certificate Trust": "capx/v1.3.x/pc_certificates.md" + - "Credential Management": "capx/v1.3.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.3.x/tasks/modify_machine_configuration.md" + - "CAPX v1.3.x Upgrade Procedure": "capx/v1.3.x/tasks/capx_v13x_upgrade_procedure.md" + - "Port Requirements": "capx/v1.3.x/port_requirements.md" + - "User Requirements": "capx/v1.3.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.3.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.3.x/validated_integrations.md" + - 
"Experimental": + - "Multi-PE CAPX cluster": "capx/v1.3.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.3.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.3.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.3.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.3.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.3.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.3.x/troubleshooting.md" + - "v1.2.x": + - "Getting Started": "capx/v1.2.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.2.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.2.x/types/nutanix_machine_template.md" + - "Certificate Trust": "capx/v1.2.x/pc_certificates.md" + - "Credential Management": "capx/v1.2.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.2.x/tasks/modify_machine_configuration.md" + - "Port Requirements": "capx/v1.2.x/port_requirements.md" + - "User Requirements": "capx/v1.2.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.2.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.2.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.2.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.2.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.2.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.2.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.2.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.2.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.2.x/troubleshooting.md" + - "v1.1.x": + - "Getting Started": "capx/v1.1.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.1.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.1.x/types/nutanix_machine_template.md" + - "Certificate Trust": "capx/v1.1.x/pc_certificates.md" + - "Credential Management": "capx/v1.1.x/credential_management.md" 
+ - "Tasks": + - "Modifying Machine Configuration": "capx/v1.1.x/tasks/modify_machine_configuration.md" + - "Port Requirements": "capx/v1.1.x/port_requirements.md" + - "User Requirements": "capx/v1.1.x/user_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.1.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.1.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.1.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.1.x/experimental/autoscaler.md" + - "OIDC Integration": "capx/v1.1.x/experimental/oidc.md" + - "Flow VPC": "capx/v1.1.x/experimental/vpc.md" + - "Proxy Configuration": "capx/v1.1.x/experimental/proxy.md" + - "Registry Mirror Configuration": "capx/v1.1.x/experimental/registry_mirror.md" + - "Troubleshooting": "capx/v1.1.x/troubleshooting.md" + - "v1.0.x": + - "Getting Started": "capx/v1.0.x/getting_started.md" + - "Types": + - "NutanixCluster": "capx/v1.0.x/types/nutanix_cluster.md" + - "NutanixMachineTemplate": "capx/v1.0.x/types/nutanix_machine_template.md" + - "Credential Management": "capx/v1.0.x/credential_management.md" + - "Tasks": + - "Modifying Machine Configuration": "capx/v1.0.x/tasks/modify_machine_configuration.md" + - "Port Requirements": "capx/v1.0.x/port_requirements.md" + - "Addons": + - "CSI Driver Installation": "capx/v1.0.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v1.0.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX cluster": "capx/v1.0.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v1.0.x/experimental/autoscaler.md" + - "Troubleshooting": "capx/v1.0.x/troubleshooting.md" + - "v0.5.x": + - "Getting Started": "capx/v0.5.x/getting_started.md" + - "Credential Management": "capx/v0.5.x/credential_management.md" + - "Addons": + - "CSI Driver Installation": "capx/v0.5.x/addons/install_csi_driver.md" + - "Validated Integrations": "capx/v0.5.x/validated_integrations.md" + - "Experimental": + - "Multi-PE CAPX 
cluster": "capx/v0.5.x/experimental/capx_multi_pe.md" + - "Autoscaler": "capx/v0.5.x/experimental/autoscaler.md" + - "Troubleshooting": "capx/v0.5.x/troubleshooting.md" + - "Nutanix Cloud Controller Manager (CCM)": + - "v0.6.x (Latest)": + - "Overview": "ccm/v0.6.x/overview.md" + - "Requirements": "ccm/v0.6.x/requirements.md" + - "Configuration": "ccm/v0.6.x/ccm_configuration.md" + - "Certificate Trust": "ccm/v0.6.x/pc_certificates.md" + - "Credentials": "ccm/v0.6.x/ccm_credentials.md" + - "Topology Discovery": "ccm/v0.6.x/topology_discovery.md" + - "Custom Labeling": "ccm/v0.6.x/custom_labeling.md" + - "Validated Integrations": "ccm/v0.6.x/validated_integrations.md" + - "v0.5.x": + - "Overview": "ccm/v0.5.x/overview.md" + - "Requirements": "ccm/v0.5.x/requirements.md" + - "Configuration": "ccm/v0.5.x/ccm_configuration.md" + - "Certificate Trust": "ccm/v0.5.x/pc_certificates.md" + - "Credentials": "ccm/v0.5.x/ccm_credentials.md" + - "Topology Discovery": "ccm/v0.5.x/topology_discovery.md" + - "Custom Labeling": "ccm/v0.5.x/custom_labeling.md" + - "v0.4.x": + - "Overview": "ccm/v0.4.x/overview.md" + - "Requirements": "ccm/v0.4.x/requirements.md" + - "Configuration": "ccm/v0.4.x/ccm_configuration.md" + - "Certificate Trust": "ccm/v0.4.x/pc_certificates.md" + - "Credentials": "ccm/v0.4.x/ccm_credentials.md" + - "Topology Discovery": "ccm/v0.4.x/topology_discovery.md" + - "Custom Labeling": "ccm/v0.4.x/custom_labeling.md" + - "v0.3.x": + - "Overview": "ccm/v0.3.x/overview.md" + - "Requirements": "ccm/v0.3.x/requirements.md" + - "Configuration": "ccm/v0.3.x/ccm_configuration.md" + - "Certificate Trust": "ccm/v0.3.x/pc_certificates.md" + - "Credentials": "ccm/v0.3.x/ccm_credentials.md" + - "Topology Discovery": "ccm/v0.3.x/topology_discovery.md" + - "Custom Labeling": "ccm/v0.3.x/custom_labeling.md" + - "v0.2.0": + - "Overview": "ccm/v0.2.x/overview.md" + - "Requirements": "ccm/v0.2.x/requirements.md" + - "Configuration": "ccm/v0.2.x/ccm_configuration.md" + - 
"Credentials": "ccm/v0.2.x/ccm_credentials.md" + - "Topology Discovery": "ccm/v0.2.x/topology_discovery.md" + - "Custom Labeling": "ccm/v0.2.x/custom_labeling.md" + - "Red Hat OpenShift": + - "Install": + - "Agnostic": "openshift/install/agnostic/index.md" + - "IPI": "openshift/install/ipi/index.md" + - "Assisted Installer": "openshift/install/assisted_installer/index.md" + - "Post Install": "openshift/post-install/index.md" + - Operators: + - "CSI": "openshift/operators/csi/index.md" + - "Google Anthos": + - "Architecture": "anthos/architecture/index.md" + - "Install": + - "Manual": "anthos/install/manual/index.md" + - "Amazon EKS Anywhere": + - "Install": "eksa/install/index.md" + - "Guides": + - "Cloud Native": + - "Red Hat OpenShift": + - "Install": + - "IPI": "guides/openshift/install/ipi/index.md" + - "Custom Cloud Native Role": "guides/cloud_native_role/index.md" markdown_extensions: - - attr_list - - admonition - - pymdownx.details - - pymdownx.superfences - - tables - - toc: - permalink: true -copyright: Copyright © 2021 - 2023 Nutanix, Inc. + - attr_list + - admonition + - pymdownx.details + - pymdownx.superfences + - tables + - toc: + permalink: true +copyright: Copyright © 2021 - 2024 Nutanix, Inc. extra: - generator: false + generator: false repo_url: https://github.com/nutanix-cloud-native/opendocs repo_name: nutanix-cloud-native/opendocs edit_uri: ""