This project is a basic example demonstrating the integration of Terraform, Azure VM, Docker, Elasticsearch, and Python Faker. The project includes:
- Terraform for infrastructure as code (IaC) to provision and manage an Azure Virtual Machine.
- Azure VM as the compute resource for running Docker and Elasticsearch.
- Docker for containerizing and managing Elasticsearch.
- Elasticsearch as the search and analytics engine, set up and managed on the Azure VM.
- Python Faker for generating fake data to populate the Elasticsearch index for testing and demonstration purposes.
The workflow covers:
- Infrastructure as Code (IaC): Using Terraform scripts to provision an Azure Virtual Machine.
- Docker Setup: Installing and configuring Docker on the Azure VM to containerize Elasticsearch.
- Elasticsearch Setup: Installing and configuring Elasticsearch within Docker on the Azure VM.
- Data Generation: Using Python Faker to generate realistic fake data for populating Elasticsearch.
- Monitoring and Management: Instructions for monitoring Elasticsearch performance and logs using Azure tools.
- Clone the repository.
- Follow the instructions in the README.md to initialize Terraform and set up the Azure VM.
- Install and configure Docker on the VM.
- Set up Elasticsearch in a Docker container on the VM.
- Use the Python scripts provided to generate and index fake data in Elasticsearch.
- Monitor the Elasticsearch instance using the provided Azure monitoring tools and commands.
- Azure account
- Terraform installed
- SSH access to the Azure VM
- Docker installed on the Azure VM
- Basic knowledge of Elasticsearch and Python
Before starting with the setup of the Elasticsearch cluster on Azure, ensure you have the following prerequisites:
- Azure Account: Ensure you have an active Azure subscription. The free tier should be sufficient for this exercise.
- Azure CLI: Install the Azure CLI on your Mac.
brew update && brew install azure-cli
- Terraform: Install Terraform on your Mac.
brew tap hashicorp/tap && brew install hashicorp/tap/terraform
- Docker: Install Docker Desktop for Mac.
brew install --cask docker
- Open your terminal and log in to Azure:
az login
This command will open a web browser window where you can log in with your Azure credentials. Once logged in, you can close the browser window.
- Create a resource group where you will deploy your resources. Replace myResourceGroup and eastus with your preferred resource group name and location:
az group create --name myResourceGroup --location eastus
This command creates a resource group in the specified location.
Now, you have set up the necessary prerequisites and created a resource group for deploying the Elasticsearch cluster.
- Initialize Terraform:
terraform init
- Apply the Configuration:
terraform apply
Confirm the apply action by typing yes when prompted. It should start building the VM in your Azure account.
If your virtual machine does not have a public IP address assigned, you will need to create and associate one. Here are the steps to assign a public IP address to your Azure VM:
- Create a Public IP Address:
az network public-ip create --resource-group <RESOURCE_GROUP> --name <PUBLIC_IP_NAME> --allocation-method Dynamic
Replace <RESOURCE_GROUP> with your resource group name and <PUBLIC_IP_NAME> with a name for your new public IP address.
- Find the Network Interface of the VM:
NIC_ID=$(az vm show --resource-group <RESOURCE_GROUP> --name <VM_NAME> --query "networkProfile.networkInterfaces[0].id" --output tsv)
- Find the Name of the Network Interface and IP Configuration:
NIC_NAME=$(az network nic show --ids $NIC_ID --query "name" --output tsv)
IP_CONFIG_NAME=$(az network nic show --ids $NIC_ID --query "ipConfigurations[0].name" --output tsv)
- Associate the Public IP Address with the Network Interface:
az network nic ip-config update --resource-group <RESOURCE_GROUP> --nic-name $NIC_NAME --name $IP_CONFIG_NAME --public-ip-address <PUBLIC_IP_NAME>
Replace <RESOURCE_GROUP> and <PUBLIC_IP_NAME> with the appropriate values; $NIC_NAME and $IP_CONFIG_NAME were set in the previous step. You can also find them in the VM's network interface configuration in the Azure portal or by querying:
az network nic show --ids <NIC_ID> --query "{NICName:name, IPConfig:ipConfigurations[0].name}" --output json
- Verify the Public IP Address:
az vm list-ip-addresses --name <VM_NAME> --resource-group <RESOURCE_GROUP> --output table
This process will associate a new public IP address with your VM, allowing you to SSH into it using the new public IP address.
- SSH into the Azure VM:
ssh azureuser@<public-ip>
- Install Docker:
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ${USER}
- Log Out and Log In to Apply Docker Group Changes:
exit
ssh azureuser@<public-ip-of-your-vm>
- Verify Docker Installation:
docker --version
You should see the Docker version information, confirming that Docker is installed and running.
- Install Docker Compose:
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version
This command downloads Docker Compose and makes it executable. The version command should confirm the installation.
- Create the Docker Compose File:
Ensure you are in the correct directory (e.g., /home/azureuser, or wherever you want to store your configuration files). Create the docker-compose.yml file on the VM, which will set up the Elasticsearch instance:
nano docker-compose.yml
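The compose file itself is not included in this guide; a minimal single-node sketch is shown below. The image tag, heap size, and volume name are assumptions — adjust them to the Elasticsearch version you want:

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.10
    container_name: elasticsearch
    environment:
      - discovery.type=single-node        # single-node dev cluster, skips multi-node bootstrap checks
      - ES_JAVA_OPTS=-Xms512m -Xmx512m    # keep the JVM heap small on a test VM
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data
volumes:
  esdata:
```

The named volume keeps index data across container restarts; on an 8.x image you would additionally need to disable or configure security (xpack.security.enabled).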
- Start the Docker container:
docker-compose up -d
- Verify Elasticsearch is Running: Check the Docker containers:
docker ps
You should see the Elasticsearch container running.
- Verify Elasticsearch Access:
From the VM, you can test if Elasticsearch is running:
curl -X GET "localhost:9200"
By following these steps, you should be able to successfully set up and run Elasticsearch on your Azure VM using Docker and Docker Compose.
Use Python and Faker to generate fake data.
- SSH into the Azure VM (if not already connected):
ssh azureuser@<public-ip-of-your-vm>
- Install Python and Required Libraries:
Ensure Python and pip are installed. If not, install them:
sudo apt-get update
sudo apt-get install -y python3 python3-pip
- Install the required Python libraries:
pip3 install faker elasticsearch
- Create the Python Script to Generate Fake Data:
Add the content to populate_data.py
nano populate_data.py
- Run the Python Script:
Execute the script to populate Elasticsearch with fake data:
python3 populate_data.py
- Verify the Data in Elasticsearch:
You can verify that the data has been populated by querying Elasticsearch:
curl -X GET "localhost:9200/people_search?pretty"
That should be it! Below are some things you can try and explore to learn more about managing and monitoring Elasticsearch instances.
ssh azureuser@<public-ip>
sudo systemctl stop elasticsearch
Follow the official Elasticsearch documentation for your specific update. For example, if using APT:
sudo apt-get update
sudo apt-get install elasticsearch
sudo systemctl start elasticsearch
curl -X GET "localhost:9200"
terraform init
Make sure your Terraform configuration files reflect the desired state and then run:
terraform apply
Confirm the apply action by typing yes when prompted.
Check out the previous version of your Terraform files.
git checkout <previous-commit-id>
terraform apply
curl -X GET "localhost:9200/_cat/shards?v"
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"move": {
"index": "index_name",
"shard": 0,
"from_node": "node1",
"to_node": "node2"
}
}
]
}'
This usually requires reindexing. You can use the _reindex API to copy data to a new index with more shards.
curl -X GET "localhost:9200/_cluster/health?pretty"
Status indicators:
- green: All primary and replica shards are active.
- yellow: All primary shards are active, but some/all replica shards are not allocated.
- red: Some/all primary shards are not active.
Use Azure Monitor to track the performance metrics of your Elasticsearch VM. Navigate to the Azure portal, go to your VM resource, and select "Metrics" to visualize CPU, memory, disk IO, and network metrics.
Configure Azure Log Analytics to collect logs from your VM. Go to the Log Analytics workspace in the Azure portal, and set up log collection from your VM. Use queries in the workspace to analyze the logs.
Enable diagnostic settings for your VM to send performance counters, event logs, and other monitoring data to Azure Monitor.
sudo journalctl -u elasticsearch
Use Kusto Query Language (KQL) in Azure Log Analytics to query logs:
search "Elasticsearch" | where ResourceId == "<Your VM Resource ID>"
Question: You currently have a single Elasticsearch node set up using Terraform. Due to increased load, you need to add more nodes to your cluster. How would you modify your Terraform configuration to achieve this?
Answer: You need to modify your Terraform configuration to add additional VM instances and configure them to join the Elasticsearch cluster. Here's a basic example of how you can scale from one to three nodes:
Before (single node):
resource "azurerm_virtual_machine" "es" {
count = 1
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es.id]
vm_size = "Standard_DS2_v2"
# Other VM configuration...
}
After (three nodes):
resource "azurerm_virtual_machine" "es" {
count = 3
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es[count.index].id]
vm_size = "Standard_DS2_v2"
# Other VM configuration...
}
resource "azurerm_network_interface" "es" {
count = 3
name = "es-nic-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
# Other NIC configuration...
}
Question: One of your Elasticsearch shards is failing, and you need to reallocate it to a different node. How would you approach this problem using both Elasticsearch API and Terraform?
Answer:
Identify the Failed Shard:
curl -X GET "localhost:9200/_cat/shards?v"
Reallocate the Shard:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"move": {
"index": "index_name",
"shard": 0,
"from_node": "node1",
"to_node": "node2"
}
}
]
}'
Modify your Terraform configuration to add more redundancy and improve fault tolerance. This might involve adding more nodes, increasing the number of replicas, or improving node specifications.
Increase the Number of Replicas:
resource "elasticsearch_index" "example" {
name = "example-index"
settings = jsonencode({
number_of_replicas = 2
})
}
Ensure High Availability by Distributing Nodes Across Availability Zones:
resource "azurerm_virtual_machine" "es" {
count = 3
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es[count.index].id]
vm_size = "Standard_DS2_v2"
availability_zone = count.index + 1
# Other VM configuration...
}
Question: How would you set up monitoring and alerting for your Elasticsearch cluster using Terraform and Azure Monitor?
Answer:
Set Up Log Analytics Workspace:
resource "azurerm_log_analytics_workspace" "example" {
name = "loganalyticsworkspace"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
sku = "PerGB2018"
retention_in_days = 30
}
Enable Diagnostic Settings for Elasticsearch VM:
resource "azurerm_monitor_diagnostic_setting" "example" {
name = "example-diagnostics"
target_resource_id = azurerm_virtual_machine.es[0].id
log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id
logs {
category = "AllLogs"
enabled = true
retention_policy {
enabled = true
days = 7
}
}
metrics {
category = "AllMetrics"
enabled = true
retention_policy {
enabled = true
days = 7
}
}
}
Create Alerts Based on Logs and Metrics:
resource "azurerm_monitor_metric_alert" "example" {
name = "example-metric-alert"
resource_group_name = azurerm_resource_group.example.name
scopes = [azurerm_virtual_machine.es[0].id]
description = "Alerts when CPU usage is high"
criteria {
metric_namespace = "Microsoft.Compute/virtualMachines"
metric_name = "Percentage CPU"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
action {
action_group_id = azurerm_monitor_action_group.example.id
}
}
These scenarios and corresponding Terraform configurations should give you a solid foundation.
Once the new nodes are added, you can use the Elasticsearch API to retry any failed shard allocations and let the cluster rebalance:
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
Question: You need to decommission some of your Elasticsearch nodes to reduce costs. How would you modify your Terraform configuration to remove these nodes, and what steps would you take to ensure the shards are safely reallocated?
Answer:
Before (five nodes):
resource "azurerm_virtual_machine" "es" {
count = 5
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es[count.index].id]
vm_size = "Standard_DS2_v2"
# Other VM configuration...
}
resource "azurerm_network_interface" "es" {
count = 5
name = "es-nic-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
# Other NIC configuration...
}
After (three nodes):
resource "azurerm_virtual_machine" "es" {
count = 3
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es[count.index].id]
vm_size = "Standard_DS2_v2"
# Other VM configuration...
}
resource "azurerm_network_interface" "es" {
count = 3
name = "es-nic-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
# Other NIC configuration...
}
Before applying the changes, reallocate shards away from the nodes you plan to remove:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"move": {
"index": "index_name",
"shard": 0,
"from_node": "node4",
"to_node": "node1"
}
},
{
"move": {
"index": "index_name",
"shard": 1,
"from_node": "node4",
"to_node": "node2"
}
}
]
}'
Then apply the Terraform changes:
terraform apply
Question: You need to calculate the optimal number of shards for your Elasticsearch cluster based on the memory allocated to each node. Each node has 32GB of RAM, and you want to allocate 50% of the memory to the heap. How would you determine the number of shards?
Answer:
- Total RAM per node: 32GB
- Heap size per node: 50% of 32GB = 16GB
- Recommended shard size: 10-40GB
- Assuming an average shard size of 30GB, calculate the total number of shards the cluster can handle based on the available heap size.
- Number of nodes: 5
- Total heap size: 5 nodes * 16GB = 80GB
- Average shard size: 30GB
- Total shards: 80GB / 30GB ≈ 2.67 shards per node
To simplify, allocate 2 shards per node initially, adjusting based on actual data size and performance:
resource "elasticsearch_index" "example" {
name = "example-index"
settings = jsonencode({
number_of_shards = 10
number_of_replicas = 1
})
}
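As a sanity check, the sizing arithmetic above can be written out in a few lines of Python (the 50% heap rule and the 30GB average shard size are the assumptions stated in the answer):

```python
# Shard-sizing arithmetic from the answer above.
nodes = 5
ram_per_node_gb = 32
heap_per_node_gb = ram_per_node_gb * 0.5      # 50% of RAM goes to the JVM heap
total_heap_gb = nodes * heap_per_node_gb      # cluster-wide heap
avg_shard_size_gb = 30                        # assumed average shard size (10-40GB recommended)

shards_supported = total_heap_gb / avg_shard_size_gb
print(heap_per_node_gb, total_heap_gb, round(shards_supported, 2))
```

Rounding 2.67 down to 2 shards per node across 5 nodes gives the 10 primary shards used in the index settings above.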
Question: Your Elasticsearch cluster is currently handling 2TB of data with 10 nodes. Each node has 32GB of RAM and 16GB of heap allocated. Your data volume is expected to double to 4TB in the next month. How many additional nodes will you need to handle this increased data volume while maintaining the same shard size?
Answer:
- Current Setup:
- Data volume: 2TB
- Number of nodes: 10
- RAM per node: 32GB
- Heap size per node: 16GB
- Current shard size: assume 200GB per node (2TB / 10 nodes)
- Expected Data Volume:
- New data volume: 4TB
- Desired shard size: 200GB per node
- Calculate Number of Nodes:
- Total data volume: 4TB (4000GB)
- Shard size per node: 200GB
- Number of nodes required: 4000GB / 200GB per node = 20 nodes
- Additional Nodes Needed:
- Current nodes: 10
- Additional nodes: 20 - 10 = 10 nodes
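The node count follows directly from the ratios above; a quick sketch:

```python
# Capacity-planning arithmetic from the answer above.
current_nodes = 10
current_data_gb = 2 * 1000                           # 2TB today
data_per_node_gb = current_data_gb / current_nodes   # 200GB per node
new_data_gb = 4 * 1000                               # expected 4TB next month

nodes_required = new_data_gb / data_per_node_gb
additional_nodes = int(nodes_required) - current_nodes
print(int(nodes_required), additional_nodes)
```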
Terraform Configuration (Scaling to 20 Nodes):
resource "azurerm_virtual_machine" "es" {
count = 20
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es[count.index].id]
vm_size = "Standard_DS2_v2"
# Other VM configuration...
}
resource "azurerm_network_interface" "es" {
count = 20
name = "es-nic-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
# Other NIC configuration...
}
Question: You notice that some nodes in your Elasticsearch cluster are underutilized while others are overloaded. How would you rebalance the shards to achieve a more even distribution?
Answer:
- Check Current Shard Distribution:
curl -X GET "localhost:9200/_cat/shards?v"
- Rebalance Shards Using Elasticsearch API:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_replica": {
"index": "index_name",
"shard": 0,
"node": "underutilized_node"
}
},
{
"cancel": {
"index": "index_name",
"shard": 0,
"node": "overloaded_node",
"allow_primary": true
}
}
]
}'
- Monitor the Cluster:
curl -X GET "localhost:9200/_cluster/health?pretty"
Question: How would you set up auto-scaling for your Elasticsearch cluster using Terraform and Azure Monitor, ensuring that the cluster scales up or down based on CPU usage?
Answer:
- Set Up Log Analytics Workspace:
resource "azurerm_log_analytics_workspace" "example" {
name = "loganalyticsworkspace"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
sku = "PerGB2018"
retention_in_days = 30
}
- Enable Diagnostic Settings for Elasticsearch VM:
resource "azurerm_monitor_diagnostic_setting" "example" {
name = "example-diagnostics"
target_resource_id = azurerm_virtual_machine.es[0].id
log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id
logs {
category = "AllLogs"
enabled = true
retention_policy {
enabled = true
days = 7
}
}
metrics {
category = "AllMetrics"
enabled = true
retention_policy {
enabled = true
days = 7
}
}
}
- Set Up Auto-Scaling Rules:
resource "azurerm_monitor_autoscale_setting" "example" {
name = "example-autoscale"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
target_resource_id = azurerm_virtual_machine_scale_set.example.id
profile {
name = "defaultProfile"
capacity {
minimum = 1
maximum = 10
default = 3
}
rule {
metric_trigger {
metric_name = "Percentage CPU"
metric_resource_id = azurerm_virtual_machine_scale_set.example.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "GreaterThan"
threshold = 75
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = 1
cooldown = "PT5M"
}
}
rule {
metric_trigger {
metric_name = "Percentage CPU"
metric_resource_id = azurerm_virtual_machine_scale_set.example.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "LessThan"
threshold = 25
}
scale_action {
direction = "Decrease"
type = "ChangeCount"
value = 1
cooldown = "PT5M"
}
}
}
}
Question: Your cluster is experiencing slow query performance due to the high load on certain indices. How would you adjust shard allocation to improve query performance?
Answer:
- Analyze Current Shard Allocation:
curl -X GET "localhost:9200/_cat/shards?v"
- Adjust Index Settings: Increase the number of replicas for better query performance:
curl -X PUT "localhost:9200/index_name/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"number_of_replicas": 2
}
}'
- Use the Reroute API to Rebalance Shards:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_replica": {
"index": "index_name",
"shard": 0,
"node": "node1"
}
},
{
"allocate_replica": {
"index": "index_name",
"shard": 1,
"node": "node2"
}
}
]
}'
Question: One of your Elasticsearch nodes has failed, causing some primary shards to become unavailable. How would you handle this situation to recover the failed shards and ensure high availability?
Answer:
- Identify the Failed Shards:
curl -X GET "localhost:9200/_cat/shards?v"
- Promote Replica Shards to Primary:
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_replica": {
"index": "index_name",
"shard": 0,
"node": "node2"
}
}
]
}'
- Replace the Failed Node:
- Remove the failed node from the cluster configuration.
- Add a new node to replace the failed one using Terraform.
Terraform Configuration (Replacing a Node):
resource "azurerm_virtual_machine" "es" {
count = 10
name = "es-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
network_interface_ids = [azurerm_network_interface.es[count.index].id]
vm_size = "Standard_DS2_v2"
# Other VM configuration...
}
resource "azurerm_network_interface" "es" {
count = 10
name = "es-nic-${count.index}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
# Other NIC configuration...
}
Question: How do you check the health and status of your Elasticsearch cluster using cURL commands?
Answer:
- Check Cluster Health:
curl -X GET "localhost:9200/_cluster/health?pretty"
- Check Cluster State:
curl -X GET "localhost:9200/_cluster/state?pretty"
- Check Node Information:
curl -X GET "localhost:9200/_nodes?pretty"
- Check Shard Allocation:
curl -X GET "localhost:9200/_cat/shards?v"
- Check Index Health:
curl -X GET "localhost:9200/_cat/indices?v"
- Check Pending Tasks:
curl -X GET "localhost:9200/_cat/pending_tasks?v"
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"
# Check cluster state
curl -X GET "localhost:9200/_cluster/state?pretty"
# Check node information
curl -X GET "localhost:9200/_nodes?pretty"
# Check shard allocation
curl -X GET "localhost:9200/_cat/shards?v"
# Check index health
curl -X GET "localhost:9200/_cat/indices?v"
# Check pending tasks
curl -X GET "localhost:9200/_cat/pending_tasks?v"
# Rebalance shards
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
{
"commands": [
{
"allocate_replica": {
"index": "index_name",
"shard": 0,
"node": "target_node"
}
},
{
"cancel": {
"index": "index_name",
"shard": 0,
"node": "source_node",
"allow_primary": true
}
}
]
}'
- from_node: The source node in shard allocation and reallocation.
- to_node: The target node in shard allocation and reallocation.
- count.index: Used in Terraform to index resources.
- _cat API: Elasticsearch API endpoint for quick access to cluster information.
- cache: Used to cache shards for faster access.
- replica: Copies of the primary shard used for fault tolerance and increased search throughput.
- primary: The original shard responsible for indexing and updating documents.
- name: The name of the resource.
- location: The geographic location of the resource.
- resource_group_name: The name of the resource group.
- network_interface_ids: The IDs of the network interfaces associated with the VM.
- vm_size: The size of the virtual machine.
- shard: A basic unit of storage in Elasticsearch. Each index is divided into shards.
- replica shard: A copy of a primary shard. Provides redundancy and improves search performance.
- primary shard: The original shard that handles indexing operations.
- allocation: The process of assigning shards to nodes.
- rebalancing: The process of redistributing shards across the nodes in a cluster to ensure even distribution.
# Check Elasticsearch version
curl -X GET "localhost:9200"
# Check all indices
curl -X GET "localhost:9200/_cat/indices?v"
# Check all shards
curl -X GET "localhost:9200/_cat/shards?v"
# Check cluster settings
curl -X GET "localhost:9200/_cluster/settings?pretty"
# Update cluster settings
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}'
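If you script these checks, a tiny helper that interprets the _cluster/health response can be handy. The sample payload below is illustrative, not captured from a real cluster:

```python
import json

def cluster_status(health_body: str) -> str:
    """Extract the 'status' field (green/yellow/red) from a _cluster/health response body."""
    return json.loads(health_body)["status"]

# Illustrative response body, as a single-node cluster might return:
sample = '{"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}'
print(cluster_status(sample))
```

You could pipe the curl output into such a script to alert on anything that is not green.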