30 changes: 24 additions & 6 deletions README-EKS.md
@@ -88,10 +88,10 @@ Example node name: `ip-10-0-120-104.us-west-2.compute.internal` or `i-02a3f32795

#### 3b. Execute APerf Collection

Use the provided `eks-aperf.sh` script to run APerf on the selected node:
Use the provided `kubectl-aperf` script to run APerf on the selected node:

```bash
bash ./eks-aperf.sh \
./kubectl-aperf \
--aperf_image="${APERF_ECRREPO}:latest" \
--node="ip-10-0-120-104.us-west-2.compute.internal"
```
@@ -111,7 +111,7 @@ bash ./eks-aperf.sh \

```bash
# Run APerf for 60 seconds with profiling enabled
bash ./eks-aperf.sh \
./kubectl-aperf \
--aperf_image="${APERF_ECRREPO}:latest" \
--node="ip-10-0-120-104.us-west-2.compute.internal" \
--aperf_options="-p 60 --profile" \
@@ -122,7 +122,7 @@ bash ./eks-aperf.sh \

```bash
# Run APerf with custom CPU and memory settings
bash ./eks-aperf.sh \
./kubectl-aperf \
--aperf_image="${APERF_ECRREPO}:latest" \
--node="ip-10-0-120-104.us-west-2.compute.internal" \
--cpu-request="2.0" \
@@ -133,7 +133,7 @@ bash ./eks-aperf.sh \

#### 3c. Collect Results

The `eks-aperf.sh` script will automatically run the following steps:
The `kubectl-aperf` script will automatically run the following steps:

1. **Pod Deployment**: Deploy a privileged pod on the specified node
2. **APerf Record**: Runs APerf record inside the pod with the specified options
@@ -145,7 +145,7 @@ The APerf report will be downloaded as a compressed tarball file with a timestam

Example output from a successful run of the script:
```bash
$ bash ./eks-aperf.sh --aperf_image="${APERF_ECRREPO}:latest" --namespace=aperf --node ip-10-0-120-104.us-west-2.compute.internal --aperf_options="-p 30 --profile"
$ ./kubectl-aperf --aperf_image="${APERF_ECRREPO}:latest" --namespace=aperf --node ip-10-0-120-104.us-west-2.compute.internal --aperf_options="-p 30 --profile"

Tageted node instance type... m6g.8xlarge
Check namespace security policy... Namespace 'aperf' has 'privileged' policy - privileged pods allowed.
@@ -198,6 +198,24 @@ Done!
- The pod is automatically cleaned up after execution


## Installing as a kubectl Plugin

You can install `kubectl-aperf` as a kubectl plugin so that it can be run as `kubectl aperf` instead of `./kubectl-aperf`.

To do so, run the following commands:

```bash
sudo mv kubectl-aperf /usr/local/bin/
kubectl plugin list
kubectl aperf --help
```

Now you can run it as:

```bash
kubectl aperf --aperf_image="${APERF_ECRREPO}:latest" --node="ip-10-0-120-104.us-west-2.compute.internal"
```

## Known Limitations

**Note**: The `--profile-java` option is not currently fully supported with this script.
125 changes: 95 additions & 30 deletions eks-aperf.sh → kubectl-aperf
@@ -12,11 +12,9 @@ NAMESPACE="default"
APERF_OPTIONS=""
NODE_NAME=""
APERF_IMAGE=""
REPORT_NAME="aperf_record"
OPEN_BROWSER=true
SHOW_HELP=false
CPU_REQUEST="1.0"
MEMORY_REQUEST="1Gi"
CPU_LIMIT="4.0"
MEMORY_LIMIT="4Gi"

# Define color and formatting codes
BOLD="\033[1m"
@@ -34,10 +32,8 @@ while [ $# -gt 0 ]; do
--namespace) dest="NAMESPACE";;
--aperf_options) dest="APERF_OPTIONS";;
--aperf_image) dest="APERF_IMAGE";;
--cpu-request) dest="CPU_REQUEST";;
Review comment (Contributor):

Is there a reason why these options were removed?

--memory-request) dest="MEMORY_REQUEST";;
--cpu-limit) dest="CPU_LIMIT";;
--memory-limit) dest="MEMORY_LIMIT";;
--report-name) dest="REPORT_NAME";;
--open-browser) dest="OPEN_BROWSER";;
--help)
SHOW_HELP=true
shift
@@ -68,10 +64,8 @@ if [ "$SHOW_HELP" = true ]; then
echo " --node Required. The name of the Kubernetes node to run aperf on"
echo " --namespace Optional. The Kubernetes namespace (default: '${NAMESPACE}')"
echo " --aperf_options Optional. Options to pass to aperf (default: '${APERF_OPTIONS}')"
echo " --cpu-request Optional. CPU request (default: '${CPU_REQUEST}')"
echo " --memory-request Optional. Memory request (default: '${MEMORY_REQUEST}')"
echo " --cpu-limit Optional. CPU limit (default: '${CPU_LIMIT}')"
echo " --memory-limit Optional. Memory limit (default: '${MEMORY_LIMIT}')"
echo " --report-name Optional. Name for aperf record/report (default: '${REPORT_NAME}')"
echo " --open-browser Optional. Open report in browser (default: ${OPEN_BROWSER})"
echo " --help Show this help message"
exit 0
fi
@@ -93,6 +87,43 @@ fi

POD_NAME="aperf-pod-${NODE_NAME//[.]/-}"

# Get node taints and generate tolerations
echo -e "${BOLD}Checking node taints...${NC}"
Review comment (Contributor):

The result is printed on a new line; can we have it on a single line?

bash ./kubectl-aperf --aperf_image="${APERF_ECRREPO}:latest"  --node="i-087ae37512508cca5"
Checking node taints...
  No taints found on node
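
One possible way to address this, as a minimal sketch rather than the script's actual code: print the label without a trailing newline and append the result to the same line, similar to the `echo -ne` pattern the script already uses for the pod deletion message. The helper name `report_taints` and the `GREEN` escape value are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Color codes; the GREEN value is an assumption, the others match the page.
BOLD="\033[1m"; GREEN="\033[0;32m"; YELLOW="\033[0;33m"; NC="\033[0m"

# Hypothetical helper: prints the label and the result on one line.
# $1 is the raw taints string (empty when the node has no taints).
report_taints() {
  local taints="$1"
  # %b expands the backslash escapes; no newline is printed yet.
  printf '%b' "${BOLD}Checking node taints...${NC} "
  if [ -n "$taints" ]; then
    printf '%b\n' "${YELLOW}taints found, adding tolerations to pod spec${NC}"
  else
    printf '%b\n' "${GREEN}no taints found${NC}"
  fi
}

report_taints ""
```

The per-taint details could still be printed on their own indented lines after this summary line.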

TAINTS=$(kubectl get node ${NODE_NAME} -o jsonpath='{.spec.taints[*]}' 2>/dev/null)
Review comment (Contributor, @salvatoredipietro, Oct 27, 2025):

We probably need to check that the node exists before checking its taints; otherwise, if the user enters a wrong node name, the script fails without a clear reason. What do you think?
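
A minimal sketch of such a check, assuming it runs before the taint lookup. The helper names `node_exists` and `require_node` are hypothetical and not part of the script; the only kubectl call used is `kubectl get node <name>`, which exits non-zero when the node cannot be retrieved.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when kubectl can retrieve the node.
node_exists() {
  kubectl get node "$1" >/dev/null 2>&1
}

# Hypothetical helper: fail early with a clear message.
require_node() {
  local node="$1"
  if ! node_exists "$node"; then
    # The lookup can also fail when the cluster is unreachable,
    # so the message mentions both possibilities.
    echo "Error: cannot get node '${node}' (wrong name, or cluster not reachable). Run 'kubectl get nodes' to list valid names." >&2
    return 1
  fi
}
```

In the script this could be called as `require_node "${NODE_NAME}" || exit 1` just before the "Checking node taints..." step.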


TOLERATIONS=""
if [ -n "$TAINTS" ]; then
echo -e " ${YELLOW}Node has taints, adding tolerations to pod spec${NC}"

# Parse taints and create tolerations YAML
TOLERATIONS=" tolerations:"

# Get taints as JSON array and process each one
TAINT_COUNT=$(kubectl get node ${NODE_NAME} -o json | jq -r '.spec.taints | length' 2>/dev/null || echo "0")

for ((i=0; i<$TAINT_COUNT; i++)); do
KEY=$(kubectl get node ${NODE_NAME} -o json | jq -r ".spec.taints[$i].key" 2>/dev/null)
VALUE=$(kubectl get node ${NODE_NAME} -o json | jq -r ".spec.taints[$i].value" 2>/dev/null)
EFFECT=$(kubectl get node ${NODE_NAME} -o json | jq -r ".spec.taints[$i].effect" 2>/dev/null)

echo -e " Taint: ${KEY}=${VALUE}:${EFFECT}"

TOLERATIONS="${TOLERATIONS}
- key: \"${KEY}\""

if [ "$VALUE" != "null" ] && [ -n "$VALUE" ]; then
TOLERATIONS="${TOLERATIONS}
value: \"${VALUE}\""
fi

TOLERATIONS="${TOLERATIONS}
effect: \"${EFFECT}\"
operator: \"Equal\""
done
else
echo -e " ${GREEN}No taints found on node${NC}"
fi
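
The loop above calls `kubectl get node` three times per taint. As a hedged alternative sketch, the taints could be fetched once and converted to YAML by a small helper. The helper name `build_tolerations`, the `|` field separator, and the YAML indentation are assumptions for illustration; the sketch also assumes kubectl prints an empty string for a taint without a `value` field, which is my understanding of its default handling of missing template keys.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "key|value|effect" lines on stdin and prints
# a tolerations YAML fragment. Prints nothing when there is no input.
build_tolerations() {
  local key value effect header_printed=false
  # '|' is not whitespace, so an empty value field ("key||effect") is kept.
  while IFS='|' read -r key value effect; do
    [ -n "$key" ] || continue
    if [ "$header_printed" = false ]; then
      printf '  tolerations:\n'
      header_printed=true
    fi
    printf '    - key: "%s"\n' "$key"
    if [ -n "$value" ]; then
      printf '      value: "%s"\n' "$value"
    fi
    printf '      effect: "%s"\n' "$effect"
    printf '      operator: "Equal"\n'
  done
}

# Possible usage in the script (single kubectl call, no jq):
# TOLERATIONS=$(kubectl get node "${NODE_NAME}" \
#   -o jsonpath='{range .spec.taints[*]}{.key}{"|"}{.value}{"|"}{.effect}{"\n"}{end}' \
#   | build_tolerations)
```

With this shape, an empty `TOLERATIONS` still means "no taints", so the existing `${TOLERATIONS}` placeholder in the pod YAML would not need to change.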

# Create pod YAML as a variable
POD_YAML=$(cat << EOF
apiVersion: v1
@@ -104,6 +135,7 @@ metadata:
spec:
nodeSelector:
kubernetes.io/hostname: "${NODE_NAME}"
${TOLERATIONS}
containers:
- name: aperf-runner
image: ${APERF_IMAGE}
@@ -115,25 +147,17 @@ spec:
set -e

echo -e "\nStarting Aperf recording execution..."
echo "Run: /usr/bin/aperf record -r aperf_record ${APERF_OPTIONS}"
sudo /usr/bin/aperf record -r aperf_record ${APERF_OPTIONS}
echo "Run: /usr/bin/aperf record -r ${REPORT_NAME} ${APERF_OPTIONS}"
sudo /usr/bin/aperf record -r ${REPORT_NAME} ${APERF_OPTIONS}
echo "APerf record completed"

echo -e "\nStarting Aperf report generation..."
echo "Run: /usr/bin/aperf report -r aperf_record -n aperf_report"
sudo /usr/bin/aperf report -r aperf_record -n aperf_report
echo "Run: /usr/bin/aperf report -r ${REPORT_NAME} -n ${REPORT_NAME}_report"
sudo /usr/bin/aperf report -r ${REPORT_NAME} -n ${REPORT_NAME}_report
echo "APerf report generation completed"

echo -e "\nWaiting for files to be copied..."
sleep 7200

resources:
requests:
memory: "${MEMORY_REQUEST}"
cpu: "${CPU_REQUEST}"
limits:
memory: "${MEMORY_LIMIT}"
cpu: "${CPU_LIMIT}"
volumeMounts:
- mountPath: /boot
name: boot-volume
@@ -174,10 +198,13 @@ fi

# Show resource usage for pods on this node
echo -e "${BOLD}Resource usage for pods on ${NODE_NAME}:${NC}"
rm /tmp/allpods.out 2> /dev/null; \
kubectl top pods --all-namespaces > /tmp/allpods.out && \
head -n 1 /tmp/allpods.out && \
grep "$(kubectl get pods --all-namespaces --field-selector spec.nodeName=${NODE_NAME} -o jsonpath='{range .items[*]}{.metadata.name}{" "}{end}' | sed 's/[[:space:]]*$//' | sed 's/[[:space:]]/\\|/g')" /tmp/allpods.out --color=never
if kubectl top pods --all-namespaces > /tmp/allpods.out 2>/dev/null; then
head -n 1 /tmp/allpods.out
grep "$(kubectl get pods --all-namespaces --field-selector spec.nodeName=${NODE_NAME} -o jsonpath='{range .items[*]}{.metadata.name}{" "}{end}' | sed 's/[[:space:]]*$//' | sed 's/[[:space:]]/\\|/g')" /tmp/allpods.out --color=never || echo " No pods found on this node"
rm /tmp/allpods.out 2>/dev/null || true
else
echo " ${YELLOW}Note: kubectl top not available (metrics-server may not be installed)${NC}"
Review comment (Contributor):

The formatting is wrong here, and can we have it on a single line?

Resource usage for pods on i-087ae375125081ba5:
  \033[0;33mNote: kubectl top not available (metrics-server may not be installed)\033[0m
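
The literal `\033[0;33m` in this output is consistent with the new `echo` call lacking `-e`, unlike the other colored messages in the script. A minimal sketch of one fix uses `printf` with `%b` for the color variables; the helper name `warn_note` is hypothetical.

```shell
#!/usr/bin/env bash
YELLOW="\033[0;33m"
NC="\033[0m"

# Hypothetical helper: prints an indented yellow note.
# %b expands the \033 escapes in the color variables only;
# the message itself is passed through %s unchanged.
warn_note() {
  printf '  %b%s%b\n' "$YELLOW" "Note: $1" "$NC"
}

warn_note "kubectl top not available (metrics-server may not be installed)"
```

Switching the existing line to `echo -e` would be the smaller change; `printf` is shown because its escape handling does not depend on the shell's `echo` implementation.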

fi

# Create APerf pod
echo -e "\n${BOLD}Created pod configuration for node:${NC} ${NODE_NAME}${NC}"
@@ -215,13 +242,51 @@ done
kill $LOGS_PID 2>/dev/null || true

# Copy files from pod to local directory
LOCAL_FILE="aperf_report_${POD_STARTTIME}.tar.gz"
LOCAL_FILE="${REPORT_NAME}_${POD_STARTTIME}.tar.gz"
EXTRACT_DIR="${REPORT_NAME}_${POD_STARTTIME}"
echo -e "${NC}${BOLD}Aperf completed. Copying files from pod ${POD_NAME}...${NC}"
kubectl cp ${NAMESPACE}/${POD_NAME}:aperf_report.tar.gz ${LOCAL_FILE}
kubectl cp ${NAMESPACE}/${POD_NAME}:${REPORT_NAME}_report.tar.gz ${LOCAL_FILE}

# Delete the pod after copying files
echo -ne "${BOLD}Deleting pod to clean up resources...${NC} "
kubectl delete pod ${POD_NAME} -n ${NAMESPACE}

echo -e "${BOLD}${GREEN}Files copied to${NC} ${BLUE}${LOCAL_FILE}${NC}"

# Extract the tar.gz file
echo -e "${BOLD}Extracting report files...${NC}"
mkdir -p "${EXTRACT_DIR}"
tar -xzf "${LOCAL_FILE}" -C "${EXTRACT_DIR}"
echo -e " ${GREEN}Extracted to${NC} ${BLUE}${EXTRACT_DIR}/${NC}"

# Open index.html in browser if enabled
if [ "$OPEN_BROWSER" = true ]; then
INDEX_FILE="${EXTRACT_DIR}/${REPORT_NAME}_report/index.html"

if [ -f "$INDEX_FILE" ]; then
echo -e "${BOLD}Opening report in browser...${NC}"

# Detect OS and open browser accordingly
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS
open "$INDEX_FILE"
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
Review comment (Contributor):

What about Windows support?
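
A hedged sketch of how Windows shells might be covered: map the `OSTYPE` value to an opener command and fall back to the manual message when none is available. The helper names are hypothetical. The sketch assumes that Git Bash/MSYS2 reports `msys` and provides a `start` wrapper, and that Cygwin reports `cygwin` and provides `cygstart`; both should be verified on a real Windows machine. WSL typically reports `linux-gnu`, so it would take the existing Linux branch.

```shell
#!/usr/bin/env bash

# Hypothetical helper: maps an OSTYPE value to a file-opening command.
# Returns non-zero for unrecognized systems.
opener_for_ostype() {
  case "$1" in
    darwin*)    echo "open" ;;
    linux-gnu*) echo "xdg-open" ;;
    cygwin*)    echo "cygstart" ;;  # assumption: provided by Cygwin
    msys*)      echo "start" ;;     # assumption: provided by Git Bash / MSYS2
    *)          return 1 ;;
  esac
}

# Hypothetical helper: opens the report, or asks the user to open it.
open_report() {
  local file="$1" opener
  if opener=$(opener_for_ostype "${OSTYPE:-}") && command -v "$opener" >/dev/null 2>&1; then
    "$opener" "$file"
  else
    echo "Could not detect browser command. Please open manually: ${file}"
  fi
}
```

The `sensible-browser` fallback from the existing Linux branch is omitted here for brevity and could be kept as a second attempt.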

# Linux
if command -v xdg-open &> /dev/null; then
xdg-open "$INDEX_FILE"
elif command -v sensible-browser &> /dev/null; then
sensible-browser "$INDEX_FILE"
else
echo -e " ${YELLOW}Could not detect browser command. Please open manually:${NC} ${BLUE}${INDEX_FILE}${NC}"
fi
else
echo -e " ${YELLOW}Unsupported OS. Please open manually:${NC} ${BLUE}${INDEX_FILE}${NC}"
fi
else
echo -e " ${YELLOW}Warning: index.html not found at ${INDEX_FILE}${NC}"
echo -e " ${YELLOW}Extracted contents:${NC}"
ls -la "${EXTRACT_DIR}/"
fi
fi

echo -e "${BOLD}${GREEN}Done!${NC}"