2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -27,7 +27,7 @@ This changelog keeps track of work items that have been completed and are ready

### New

- **General**: TODO ([#TODO](https://github.com/kedacore/http-add-on/issues/TODO))
- **General**: Add environment variables for leader election timing configuration ([#1365](https://github.com/kedacore/http-add-on/pull/1365))

### Improvements

41 changes: 41 additions & 0 deletions docs/operate.md
@@ -75,3 +75,44 @@ Optional variables
The interceptor proxy can log incoming requests for debugging and monitoring purposes. Request logging can be enabled by setting the `KEDA_HTTP_LOG_REQUESTS` environment variable to `true` on the interceptor deployment (`false` by default).

### Configuring Service Failover

## Configuring the KEDA HTTP Add-on Operator

### Leader Election Timing

When running multiple replicas of the operator for high availability, you can tune leader election timing with the following environment variables on the operator deployment:

- **`KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION`** - Duration that non-leader candidates wait, measured from the last observed renewal, before they may force-acquire leadership. Default: `15s` (controller-runtime default)
- **`KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE`** - Duration that the acting leader keeps retrying to renew leadership before giving up. Default: `10s` (controller-runtime default)
- **`KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD`** - Duration that leader election clients wait between attempts to acquire or renew leadership. Default: `2s` (controller-runtime default)

Example `env` entries for the operator container in its Deployment:
```yaml
env:
  - name: KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION
    value: "30s"
  - name: KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE
    value: "20s"
  - name: KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD
    value: "5s"
```
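The body of `util.ResolveOsEnvDuration` is not part of this diff. Assuming it returns `nil` for an unset variable and otherwise parses the value with Go's `time.ParseDuration` (an assumption, though it is consistent with this PR's tests, which expect `30s` to parse and `invalid` to fail), the example values above would be interpreted as in this sketch. `parseOptionalDuration` is a hypothetical stand-in, not the add-on's API; in real code the `(raw, found)` pair would come from `os.LookupEnv`.

```go
package main

import (
	"fmt"
	"time"
)

// parseOptionalDuration is a hypothetical stand-in for util.ResolveOsEnvDuration,
// whose body is not shown in the diff. Assumed behavior: an unset or empty
// variable yields nil (so controller-runtime keeps its default), and any other
// value must be a Go duration string such as "30s".
func parseOptionalDuration(raw string, found bool) (*time.Duration, error) {
	if !found || raw == "" {
		return nil, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return nil, err
	}
	return &d, nil
}

func main() {
	// The three example values from the env block, plus a malformed and an unset value.
	inputs := []struct {
		raw   string
		found bool
	}{
		{"30s", true},
		{"20s", true},
		{"5s", true},
		{"invalid", true},
		{"", false},
	}
	for _, in := range inputs {
		d, err := parseOptionalDuration(in.raw, in.found)
		switch {
		case err != nil:
			fmt.Printf("%q: error: %v\n", in.raw, err)
		case d == nil:
			fmt.Printf("%q: unset, controller-runtime default applies\n", in.raw)
		default:
			fmt.Printf("%q: %v\n", in.raw, *d)
		}
	}
}
```

If that assumption holds, a unitless value such as `30` would be rejected, because `time.ParseDuration` requires a unit suffix for non-zero values.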

### Timing Parameter Constraints

**Important:** These parameters have strict ordering requirements to prevent leadership conflicts and unnecessary failover:

```
LeaseDuration > RenewDeadline > RetryPeriod
```

**Why this matters:**
- **LeaseDuration > RenewDeadline**: Ensures the leader gives up renewing (and stops acting as leader) before its lease expires, so another replica cannot acquire the lease while the old leader still believes it holds it (split-brain)
- **RenewDeadline > RetryPeriod**: Allows multiple retry attempts during the renewal window, preventing unnecessary leadership changes due to transient failures
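To make the two inequalities concrete, the sketch below computes, for the default values and for the example values above, the approximate margin between the leader giving up and other candidates being allowed to take over (`LeaseDuration - RenewDeadline`), and how many retry intervals fit into one renewal window (`RenewDeadline / RetryPeriod`). This is plain arithmetic for illustration, not the client-go leader election loop; real attempt counts also depend on API request latency. The `timing` type and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// timing groups the three leader election parameters.
type timing struct {
	name          string
	leaseDuration time.Duration
	renewDeadline time.Duration
	retryPeriod   time.Duration
}

// safetyMargin approximates how long other candidates must still wait after the
// leader has given up renewing before the lease can be taken over.
func safetyMargin(c timing) time.Duration {
	return c.leaseDuration - c.renewDeadline
}

// retryIntervals counts the full retry periods that fit in one renewal window.
func retryIntervals(c timing) int64 {
	return int64(c.renewDeadline / c.retryPeriod)
}

func main() {
	configs := []timing{
		{"controller-runtime defaults", 15 * time.Second, 10 * time.Second, 2 * time.Second},
		{"example env values", 30 * time.Second, 20 * time.Second, 5 * time.Second},
	}
	for _, c := range configs {
		fmt.Printf("%s: margin %v, %d retry intervals per renewal window\n",
			c.name, safetyMargin(c), retryIntervals(c))
	}
}
```

For the defaults this gives a 5s margin and 5 retry intervals; for the example values, a 10s margin and 4 intervals. A longer lease also means candidates wait longer before replacing a leader that exited without releasing its lease.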

**Configuration Guidelines:**

1. **Configure all three together**: When overriding any parameter, it's recommended to set all three. Unset parameters fall back to their defaults, so a partial override can violate the ordering; for example, setting only `KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE` to `20s` is rejected because the default lease duration (`15s`) is not greater than it.

2. **All values must be positive**: Each duration must be greater than 0.

3. **Validation failure**: If the operator detects an invalid configuration at startup, it will log an error and exit immediately before attempting leader election.
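The check below is a simplified, hypothetical re-implementation of the logic in `ValidateLeaderElectionConfig` from this PR: unset values are replaced by the 15s/10s/2s defaults before the ordering is checked, which is why a partial override can be rejected. `orDefault`, `checkOrdering`, and `ptr` are illustrative names; the operator itself calls `util.ValidateLeaderElectionConfig`.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults assumed to match controller-runtime's leader election defaults.
const (
	defaultLease = 15 * time.Second
	defaultRenew = 10 * time.Second
	defaultRetry = 2 * time.Second
)

// orDefault returns the override when set, otherwise the default.
func orDefault(override *time.Duration, def time.Duration) time.Duration {
	if override != nil {
		return *override
	}
	return def
}

// checkOrdering resolves the effective values and enforces
// lease > renew > retry > 0, similar to the validation in this PR.
func checkOrdering(lease, renew, retry *time.Duration) error {
	l := orDefault(lease, defaultLease)
	r := orDefault(renew, defaultRenew)
	p := orDefault(retry, defaultRetry)
	if l <= 0 || r <= 0 || p <= 0 {
		return fmt.Errorf("all durations must be positive (lease %v, renew %v, retry %v)", l, r, p)
	}
	if l <= r {
		return fmt.Errorf("lease duration (%v) must be greater than renew deadline (%v)", l, r)
	}
	if r <= p {
		return fmt.Errorf("renew deadline (%v) must be greater than retry period (%v)", r, p)
	}
	return nil
}

// ptr is a small helper for building optional overrides.
func ptr(d time.Duration) *time.Duration { return &d }

func main() {
	// Only the renew deadline overridden: 20s is not below the 15s default lease.
	fmt.Println(checkOrdering(nil, ptr(20*time.Second), nil))
	// All three overridden consistently: accepted.
	fmt.Println(checkOrdering(ptr(30*time.Second), ptr(20*time.Second), ptr(5*time.Second)))
	// Nothing overridden: the defaults already satisfy the ordering.
	fmt.Println(checkOrdering(nil, nil, nil))
}
```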
27 changes: 27 additions & 0 deletions operator/main.go
@@ -34,6 +34,7 @@ import (
	httpv1alpha1 "github.com/kedacore/http-add-on/operator/apis/http/v1alpha1"
	httpcontrollers "github.com/kedacore/http-add-on/operator/controllers/http"
	"github.com/kedacore/http-add-on/operator/controllers/http/config"
	"github.com/kedacore/http-add-on/pkg/util"
	// +kubebuilder:scaffold:imports
)

@@ -86,6 +87,29 @@ func main() {
		os.Exit(1)
	}

	leaseDuration, err := util.ResolveOsEnvDuration("KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION")
	if err != nil {
		setupLog.Error(err, "invalid KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION")
		os.Exit(1)
	}

	renewDeadline, err := util.ResolveOsEnvDuration("KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE")
	if err != nil {
		setupLog.Error(err, "invalid KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE")
		os.Exit(1)
	}

	retryPeriod, err := util.ResolveOsEnvDuration("KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD")
	if err != nil {
		setupLog.Error(err, "invalid KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD")
		os.Exit(1)
	}

	if err := util.ValidateLeaderElectionConfig(leaseDuration, renewDeadline, retryPeriod); err != nil {
		setupLog.Error(err, "invalid leader election configuration")
		os.Exit(1)
	}

	var namespaces map[string]cache.Config
	if baseConfig.WatchNamespace != "" {
		namespaces = map[string]cache.Config{
@@ -103,6 +127,9 @@ func main() {
		LeaderElection:                enableLeaderElection,
		LeaderElectionID:              "http-add-on.keda.sh",
		LeaderElectionReleaseOnCancel: true,
		LeaseDuration:                 leaseDuration,
		RenewDeadline:                 renewDeadline,
		RetryPeriod:                   retryPeriod,
		Cache: cache.Options{
			DefaultNamespaces: namespaces,
		},
128 changes: 128 additions & 0 deletions operator/main_test.go
@@ -0,0 +1,128 @@
/*
Copyright 2025 The KEDA Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package main

import (
	"testing"
	"time"

	"github.com/stretchr/testify/assert"

	"github.com/kedacore/http-add-on/pkg/util"
)

func TestLeaderElectionEnvVarsIntegration(t *testing.T) {
	tests := []struct {
		name          string
		envVars       map[string]string
		expectedLease *time.Duration
		expectedRenew *time.Duration
		expectedRetry *time.Duration
		expectError   bool
	}{
		{
			name: "all environment variables set with valid values",
			envVars: map[string]string{
				"KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION": "30s",
				"KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE": "20s",
				"KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD":   "5s",
			},
			expectedLease: durationPtr(30 * time.Second),
			expectedRenew: durationPtr(20 * time.Second),
			expectedRetry: durationPtr(5 * time.Second),
			expectError:   false,
		},
		{
			name:          "no environment variables set - should return nil for defaults",
			envVars:       map[string]string{},
			expectedLease: nil,
			expectedRenew: nil,
			expectedRetry: nil,
			expectError:   false,
		},
		{
			name: "invalid lease duration",
			envVars: map[string]string{
				"KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION": "invalid",
			},
			expectError: true,
		},
		{
			name: "invalid renew deadline",
			envVars: map[string]string{
				"KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE": "not-a-duration",
			},
			expectError: true,
		},
		{
			name: "invalid retry period",
			envVars: map[string]string{
				"KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD": "xyz",
			},
			expectError: true,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			for key, value := range tt.envVars {
				t.Setenv(key, value)
			}

			leaseDuration, leaseErr := util.ResolveOsEnvDuration("KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION")
			renewDeadline, renewErr := util.ResolveOsEnvDuration("KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE")
			retryPeriod, retryErr := util.ResolveOsEnvDuration("KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD")

			if tt.expectError {
				// At least one of the errors should be non-nil
				hasError := false
				if _, ok := tt.envVars["KEDA_HTTP_OPERATOR_LEADER_ELECTION_LEASE_DURATION"]; ok && leaseErr != nil {
					hasError = true
				}
				if _, ok := tt.envVars["KEDA_HTTP_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE"]; ok && renewErr != nil {
					hasError = true
				}
				if _, ok := tt.envVars["KEDA_HTTP_OPERATOR_LEADER_ELECTION_RETRY_PERIOD"]; ok && retryErr != nil {
					hasError = true
				}
				if !hasError {
					t.Errorf("expected error but got none")
				}
			} else {
				// No errors expected
				if leaseErr != nil {
					t.Errorf("unexpected error for lease duration: %v", leaseErr)
				}
				if renewErr != nil {
					t.Errorf("unexpected error for renew deadline: %v", renewErr)
				}
				if retryErr != nil {
					t.Errorf("unexpected error for retry period: %v", retryErr)
				}

				// Verify the parsed values match expectations
				assert.Equal(t, tt.expectedLease, leaseDuration)
				assert.Equal(t, tt.expectedRenew, renewDeadline)
				assert.Equal(t, tt.expectedRetry, retryPeriod)
			}
		})
	}
}

func durationPtr(d time.Duration) *time.Duration {
	return &d
}
52 changes: 52 additions & 0 deletions pkg/util/env_resolver.go
@@ -17,6 +17,7 @@ limitations under the License.
package util

import (
	"fmt"
	"os"
	"strconv"
	"time"
@@ -52,3 +53,54 @@ func ResolveOsEnvDuration(envName string) (*time.Duration, error) {

	return nil, nil
}

// Controller-runtime default values for leader election
// Reference: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/manager/manager.go
const (
	defaultLeaseDuration = 15 * time.Second
	defaultRenewDeadline = 10 * time.Second
	defaultRetryPeriod   = 2 * time.Second
)

// ValidateLeaderElectionConfig ensures LeaseDuration > RenewDeadline > RetryPeriod
// to prevent multiple active leaders and unnecessary leadership churn.
// This validation checks against the actual runtime values (user-provided or defaults)
// to catch invalid partial configurations.
func ValidateLeaderElectionConfig(leaseDuration, renewDeadline, retryPeriod *time.Duration) error {
	// Resolve actual values that will be used at runtime (user-provided or defaults)
	lease := defaultLeaseDuration
	if leaseDuration != nil {
		lease = *leaseDuration
	}

	renew := defaultRenewDeadline
	if renewDeadline != nil {
		renew = *renewDeadline
	}

	retry := defaultRetryPeriod
	if retryPeriod != nil {
		retry = *retryPeriod
	}

	// Validate all values are positive
	if lease <= 0 {
		return fmt.Errorf("lease duration must be greater than 0, got %v", lease)
	}
	if renew <= 0 {
		return fmt.Errorf("renew deadline must be greater than 0, got %v", renew)
	}
	if retry <= 0 {
		return fmt.Errorf("retry period must be greater than 0, got %v", retry)
	}

	// Validate relationships: LeaseDuration > RenewDeadline > RetryPeriod
	if lease <= renew {
		return fmt.Errorf("lease duration (%v) must be greater than renew deadline (%v)", lease, renew)
	}
	if renew <= retry {
		return fmt.Errorf("renew deadline (%v) must be greater than retry period (%v)", renew, retry)
	}

	return nil
}