Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

functionaltests: add first test case #14935

Merged
merged 29 commits into from
Jan 17, 2025
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
874c46a
github-actions: support for functional-tests
v1v Dec 13, 2024
5ba3747
add functionaltests framework
endorama Dec 27, 2024
1707372
goimports
endorama Dec 27, 2024
265c38f
add license
endorama Dec 30, 2024
81f1b8b
remove apm-perf override
endorama Dec 30, 2024
8edc826
remove comment
endorama Dec 30, 2024
a490d4f
support prod region
endorama Dec 31, 2024
999d77a
production -> pro
endorama Dec 31, 2024
c14da87
fix production region
endorama Dec 31, 2024
b34e570
Merge branch 'main' into functionaltests
endorama Dec 31, 2024
8747630
Apply suggestions from code review
endorama Jan 7, 2025
a031acb
fix target cli flag
endorama Dec 31, 2024
bcc83e3
remove NewConfig
endorama Jan 8, 2025
06dd665
return errs from ingest
endorama Jan 8, 2025
e182ae3
require in ecAPICheck
endorama Jan 9, 2025
fd73107
wait until wrote
endorama Jan 9, 2025
410fab3
reuse ec_deployment infra tf module
endorama Dec 31, 2024
991a9f1
goimports
endorama Jan 9, 2025
34530df
fix RunBlockingWait
endorama Jan 9, 2025
28e468c
manually install terraform in ubuntu-latest
endorama Jan 9, 2025
c48e063
check all data streams doc count
endorama Jan 9, 2025
e00ef0a
add debugging info to assertDatastreams
endorama Jan 9, 2025
52a22a6
ac -> ecc
endorama Jan 15, 2025
3ae84c4
remove constants
endorama Jan 15, 2025
d6a6146
remove docsPerDatastream
endorama Jan 15, 2025
0a02a50
enable self monitoring for EC deployment
endorama Jan 16, 2025
db94233
assert elasticsearch logs
endorama Jan 16, 2025
b97cf14
remove print
endorama Jan 17, 2025
7e6ec11
Merge branch 'main' into functionaltests
endorama Jan 17, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 53 additions & 0 deletions .github/workflows/functional-tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
---
name: functional-tests

on:
workflow_dispatch: ~
schedule:
- cron: '0 3 * * 1-5'

permissions:
contents: read
id-token: write

endorama marked this conversation as resolved.
Show resolved Hide resolved
jobs:
run:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
environment:
- 'qa'
- 'pro'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer prd (like the MKI cluster names) or prod rather than using pro, which I've never seen used anywhere. Is it just me?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share the preference for another acronym, I supposed this could not be changed but let's check with @v1v . Is this something that can be changed? My understanding is that pro is part of the secret name we use in the pipeline, but I'm not sure if it's used elsewhere or it's a secret specific for this pipeline.

If it's specific, could we update it to prd or production?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been using pro for a few years as far I as I recall. And we use pro in several places and changing might imply a little bit of effort and likely some breaking changes (so others will need to change it too). However let me cc @kuisathaverat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sound pretty complex and time consuming. I have an easy workaround that could be applied in the GitHub Workflow itself: use an if on matrix.environment to trigger the secret retrieval with a different value. Low effort and would solve the main confusion, so I would proceed with it instead of having an extended effort to fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use everywhere pro, qa, and staging, changing it on the secrets meant we have to change it everywhere (oblt-framework, oblt-cli, oblt-robot, observability-test-environments) to be consistent

Copy link
Member Author

@endorama endorama Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's settle on using it only for the secret (via ifs in the pipeline) then use a more standard naming in the code. Not a big deal once it's documented. I'll take care of it.

steps:
- uses: actions/checkout@v4

- uses: actions/setup-go@v5
with:
go-version-file: 'functionaltests/go.mod'

- uses: elastic/oblt-actions/google/auth@v1

- uses: google-github-actions/get-secretmanager-secrets@e5bb06c2ca53b244f978d33348d18317a7f263ce # v2.2.2
with:
export_to_environment: true
secrets: |-
EC_API_KEY:elastic-observability/elastic-cloud-observability-team-${{ matrix.environment }}-api-key

- run: cd functionaltests && go test -v -timeout=20m -target "${{ matrix.environment }}" ./
endorama marked this conversation as resolved.
Show resolved Hide resolved

notify:
if: always()
runs-on: ubuntu-latest
needs:
- run
steps:
- id: check
uses: elastic/oblt-actions/check-dependent-jobs@v1
with:
jobs: ${{ toJSON(needs) }}
- uses: elastic/oblt-actions/slack/notify-result@v1
with:
bot-token: ${{ secrets.SLACK_BOT_TOKEN }}
channel-id: "#apm-server"
status: ${{ steps.check.outputs.status }}
144 changes: 144 additions & 0 deletions functionaltests/8_15_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package functionaltests

import (
"context"
"testing"
"time"

"github.com/elastic/apm-server/functionaltests/internal/esclient"
"github.com/elastic/apm-server/functionaltests/internal/terraform"
"github.com/elastic/go-elasticsearch/v8/typedapi/types"

"github.com/stretchr/testify/require"
)

func TestUpgrade_8_15_4_to_8_16_0(t *testing.T) {
require.NoError(t, ecAPICheck(t))

start := time.Now()
ctx := context.Background()

t.Log("creating deploment with terraform")
tf, err := terraform.New(t, t.Name())
require.NoError(t, err)
ecTarget := terraform.Var("ec_target", *target)
ecRegion := terraform.Var("ec_region", regionFrom(*target))
version := terraform.Var("stack_version", "8.15.4")
name := terraform.Var("name", t.Name())
require.NoError(t, tf.Apply(ctx, ecTarget, ecRegion, version, name))
t.Logf("time elapsed: %s", time.Now().Sub(start))

t.Cleanup(func() {
if !t.Failed() || (t.Failed() && *cleanupOnFailure) {
t.Log("cleanup terraform resources")
require.NoError(t, tf.Destroy(ctx, ecTarget, ecRegion, name, version))
} else {
t.Log("test failed and cleanup-on-failure is false, skipping cleanup")
}
})

var deploymentID string
var escfg esclient.Config
tf.Output("deployment_id", &deploymentID)
tf.Output("apm_url", &escfg.APMServerURL)
tf.Output("es_url", &escfg.ElasticsearchURL)
tf.Output("username", &escfg.Username)
tf.Output("password", &escfg.Password)
tf.Output("kb_url", &escfg.KibanaURL)

t.Logf("created deployment %s", deploymentID)

ac, err := esclient.New(escfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this called ac? Shouldn't it be called ec, esc or esClient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 52a22a6, this was a leftover. Choose ecc as Elastic Cloud client.

require.NoError(t, err)

t.Log("creating APM API key")
apikey, err := ac.CreateAPMAPIKey(ctx, t.Name())
require.NoError(t, err)

ingest(t, escfg.APMServerURL, apikey)

// Wait few seconds before proceeding to ensure ES indexed all our documents.
// Manual tests had failures due to only 4 data streams being reported
// when no delay was used. Manual inspection always revealed the correct
// number of data streams.
time.Sleep(1 * time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I love the 1m wait approach. If we know ahead of time how many documents we are going to end up with, wouldn't it be best to ensure that the aggregate result of ApmDocCount() is at least that number? That'll avoid arbitrary waiting times here and there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upside of this approach was that it had a very simple implementation. I refactored it to verify docs count in some data streams in fd73107 (#14935). Only data streams not related to aggregations are checked, as aggregations one have different numbers of documents across runs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also had to add a check on aggregations data streams, as test where sometimes failing. Added in c48e063 (#14935). As the number of document isn't know in advanced I used the docs count not changing as a proxy. In my test this prevented the failure in 1m aggregation data streams to happen again.


oldCount, err := ac.ApmDocCount(ctx)
require.NoError(t, err)

t.Log("check data streams")
var dss []types.DataStream
dss, err = ac.GetDataStream(ctx, "*apm*")
require.NoError(t, err)
assertDatastreams(t, checkDatastreamWant{
Quantity: 8,
PreferIlm: false,
DSManagedBy: "Data stream lifecycle",
IndicesPerDs: 1,
IndicesManagedBy: []string{"Data stream lifecycle"},
}, dss)
t.Logf("time elapsed: %s", time.Now().Sub(start))

t.Log("upgrade to 8.16.0")
require.NoError(t, tf.Apply(ctx, ecTarget, ecRegion, name, terraform.Var("stack_version", "8.16.0")))
t.Logf("time elapsed: %s", time.Now().Sub(start))

t.Log("check number of documents")
newCount, err := ac.ApmDocCount(ctx)
require.NoError(t, err)
assertDocCountEqual(t, oldCount, newCount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we asserting that the previously indexed data isn't lost somehow? If that's the case or if I've missunderstood, can you add a comment to explain why this is necssary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the reason. It's not necessary but helps ensuring that after upgrade the state didn't change and we can safely proceed with further ingestion and assertions. I added a clarifying comment in fd73107 (#14935)

We never expect this to fail unless something broke during the upgrade that affected APM data.


t.Log("check data streams after upgrade, no rollover expected")
dss, err = ac.GetDataStream(ctx, "*apm*")
require.NoError(t, err)
assertDatastreams(t, checkDatastreamWant{
Quantity: 8,
PreferIlm: false,
DSManagedBy: "Data stream lifecycle",
IndicesPerDs: 1,
IndicesManagedBy: []string{"Data stream lifecycle"},
}, dss)

ingest(t, escfg.APMServerURL, apikey)
time.Sleep(1 * time.Minute)

t.Log("check number of documents")
newCount2, err := ac.ApmDocCount(ctx)
require.NoError(t, err)
assertDocCountGreaterThan(t, oldCount, newCount2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greater than is fine but wouldn't it be best to assert the exact number of documents? Or is this something that we expect to change over time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the refactoring in fd73107 (#14935) I also change this to assert the delta number of documents after the second ingestion run.


// Confirm datastreams are
// v managed by DSL if created after 8.15.0
// x managed by ILM if created before 8.15.0
t.Log("check data streams and verify lazy rollover happened")
dss2, err := ac.GetDataStream(ctx, "*apm*")
require.NoError(t, err)
assertDatastreams(t, checkDatastreamWant{
Quantity: 8,
PreferIlm: false,
DSManagedBy: "Data stream lifecycle",
IndicesPerDs: 2,
IndicesManagedBy: []string{"Data stream lifecycle", "Data stream lifecycle"},
}, dss2)
t.Logf("time elapsed: %s", time.Now().Sub(start))

// check ES logs, there should be no errors
// TODO: how to get these from Elastic Cloud? Is it possible?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could set up monitoring to self and the ES logs should end up in an index in this same cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 great idea, will look into it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked like a charm, implemented in 0a02a50 and db94233

}
27 changes: 27 additions & 0 deletions functionaltests/TestUpgrade_8_15_4_to_8_16_0/main.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
resource "ec_deployment" "example_minimal" {
endorama marked this conversation as resolved.
Show resolved Hide resolved
name = var.name

region = var.ec_region
version = var.stack_version
deployment_template_id = "aws-storage-optimized"

elasticsearch = {
hot = {
autoscaling = {}
}

ml = {
autoscaling = {
autoscale = true
}
}
}

kibana = {
topology = {}
}

integrations_server = {}

tags = merge(local.ci_tags, module.tags.tags)
}
23 changes: 23 additions & 0 deletions functionaltests/TestUpgrade_8_15_4_to_8_16_0/outputs.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
output "deployment_id" {
value = ec_deployment.example_minimal.id
}

output "apm_url" {
value = ec_deployment.example_minimal.integrations_server.endpoints.apm
}

output "es_url" {
value = ec_deployment.example_minimal.elasticsearch.https_endpoint
}

output "username" {
value = ec_deployment.example_minimal.elasticsearch_username
}
output "password" {
value = ec_deployment.example_minimal.elasticsearch_password
sensitive = true
}

output "kb_url" {
value = ec_deployment.example_minimal.kibana.https_endpoint
}
20 changes: 20 additions & 0 deletions functionaltests/TestUpgrade_8_15_4_to_8_16_0/tags.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
resource "time_static" "created_date" {}

locals {
ci_tags = {
environment = var.ENVIRONMENT
repo = var.REPO
branch = var.BRANCH
build = var.BUILD_ID
created_date = coalesce(var.CREATED_DATE, time_static.created_date.unix)
}
project = "apm-server-functionaltest"
}

module "tags" {
source = "../../testing/infra/terraform/modules/tags"
# use the convention for team/shared owned resources if we are running in CI.
# assume this is an individually owned resource otherwise.
project = local.project
}

21 changes: 21 additions & 0 deletions functionaltests/TestUpgrade_8_15_4_to_8_16_0/terraform.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
terraform {
required_version = ">= 0.12.29"

required_providers {
ec = {
source = "elastic/ec"
version = "0.12.2"
}
}
}

locals {
api_endpoints = {
qa = "https://public-api.qa.cld.elstc.co"
pro = "https://api.elastic-cloud.com"
}
}

provider "ec" {
endpoint = local.api_endpoints[var.ec_target]
}
47 changes: 47 additions & 0 deletions functionaltests/TestUpgrade_8_15_4_to_8_16_0/vars.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
variable "ec_target" {
type = string
description = "The Elastic Cloud environment to target"
validation {
condition = contains(["qa", "pro"], var.ec_target)
error_message = "Valid values are (qa, pro)."
}
}

variable "ec_region" {
type = string
description = "The Elastic Cloud region to target"
}

variable "name" {
type = string
description = "The deployment name"
}

variable "stack_version" {
type = string
description = "The Elasticsearch version to bootstrap"
}

# CI variables
variable "BRANCH" {
description = "Branch name or pull request for tagging purposes"
default = "unknown-branch"
}

variable "BUILD_ID" {
description = "Build ID in the CI for tagging purposes"
default = "unknown-build"
}

variable "CREATED_DATE" {
description = "Creation date in epoch time for tagging purposes"
default = ""
}

variable "ENVIRONMENT" {
default = "unknown-environment"
}

variable "REPO" {
default = "unknown-repo-name"
}
41 changes: 41 additions & 0 deletions functionaltests/go.mod
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
module github.com/elastic/apm-server/functionaltests

go 1.23.2

require (
github.com/elastic/apm-perf v0.0.0-20241230130730-2ad47482b731
github.com/elastic/go-elasticsearch/v8 v8.16.0
github.com/hashicorp/terraform-exec v0.21.0
github.com/stretchr/testify v1.10.0
go.uber.org/zap v1.27.0
)

require (
github.com/Microsoft/go-winio v0.6.2 // indirect
github.com/apparentlymart/go-textseg/v15 v15.0.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/elastic/elastic-transport-go/v8 v8.6.0 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/hashicorp/go-version v1.6.0 // indirect
github.com/hashicorp/terraform-json v0.22.1 // indirect
github.com/klauspost/compress v1.17.11 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/tidwall/gjson v1.18.0 // indirect
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.1 // indirect
github.com/tidwall/sjson v1.2.5 // indirect
github.com/zclconf/go-cty v1.14.4 // indirect
go.elastic.co/apm/v2 v2.6.2 // indirect
go.elastic.co/fastjson v1.4.0 // indirect
go.opentelemetry.io/otel v1.32.0 // indirect
go.opentelemetry.io/otel/metric v1.32.0 // indirect
go.opentelemetry.io/otel/trace v1.32.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/crypto v0.28.0 // indirect
golang.org/x/sync v0.10.0 // indirect
golang.org/x/text v0.20.0 // indirect
golang.org/x/time v0.8.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
Loading
Loading