-
Notifications
You must be signed in to change notification settings - Fork 209
NO-ISSUE: OTA-1605 Create MustGetKubeClient function for e2e test #1249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a6a8ada to
37caff4
Compare
a79646a to
d944000
Compare
|
Please note that
|
d944000 to
067cd4b
Compare
|
@JianLi-RH: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Tested on my local machine: Run the case: |
067cd4b to
bac7793
Compare
|
/cc |
|
/cc |
.openshift-tests-extension/openshift_payload_cluster-version-operator.json
Show resolved
Hide resolved
bac7793 to
51facb3
Compare
|
After modification, the case is still valid: |
51facb3 to
8e9da36
Compare
.openshift-tests-extension/openshift_payload_cluster-version-operator.json
Outdated
Show resolved
Hide resolved
72354d1 to
3a8631b
Compare
|
/testwith openshift/cluster-version-operator/main/e2e-agnostic-ovn http://openshift/origin#30316 |
|
@hongkailiu, |
|
/testwith openshift/cluster-version-operator/main/e2e-agnostic-ovn openshift/origin#30316 |
@hongkailiu Thanks for getting into consideration! My comment was raising the risk of changing test coverage, since the risk has been assessed to be manageable, it seems safe to go. |
|
/test e2e-aws-ovn-techpreview |
|
/testwith openshift/cluster-version-operator/main/e2e-agnostic-ovn openshift/origin#30316 |
|
@DavidHurta @hongkailiu The ote test case passed in https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-version-operator-1249-openshift-origin-30316-e2e-agnostic-ovn/1986710370059816960 |
|
/hold I have some requests for changes. I am explicitly holding the PR while I review the full PR. |
.openshift-tests-extension/openshift_payload_cluster-version-operator.json
Outdated
Show resolved
Hide resolved
test/cvo/cvo.go
Outdated
| client = utilities.MustGetKubeClient() | ||
| }) | ||
|
|
||
| g.It(`should not install resources annotated with release.openshift.io/delete=true`, g.Label("Conformance", "High", "42543"), func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that the 42543 label represents the polarion ID test case. Is that correct? If so, is it common or expected to write these IDs in labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is polarion ID, I have no idea if it is common but I think it is helpful for us when we want to read the case from polarion.
| @@ -1,12 +1,39 @@ | |||
| package cvo | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not related to the code
I will hold us to a higher standard regarding the git hygiene, if I may.
Please see the Contributing guide for the cluster-version-operator repository.
I am mainly concerned about the following points:
-
Make commits of logical units.
-
Make sure your commit messages are in the proper format (see below).
- The PR title.
Currently, the PR consists of a single commit jianl - First e2e test, which consists of multiple logical units, and the commit title does not follow the recommended format. While PR titles are not explicitly mentioned in the guide, a PR title should state what a PR distinctly and briefly does.
Please keep in mind that these guidelines have their practical purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the commit, because this is the first OTE PR, it contains many logic, so the description may not describes everything. I will keep PR simple and describe the logic in the future.
test/cvo/cvo.go
Outdated
| }) | ||
| }) | ||
|
|
||
| var _ = g.Describe(`[Jira:"Cluster Version Operator"] The cluster version operator`, g.Ordered, g.Label("cvo"), func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the cvo label commonly used in other places as well? We already specify the component via [Jira:<Component>]. As such, I am interested in the purpose of the label or its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label and [Jira:] are two difference things.
We can use label to restrict which case can be included in a binary, for example:
specs = specs.Select(extensiontests.HasLabel("cvo"))
And we can also set platform for special test cases to restrict their execution, for example restrict 42543 run on GCP:
specs = specs.Select(extensiontests.HasLabel("42543")).Include(extensiontests.PlatformEquals("gcp"))
Label also gives us the possibility that work with many different teams/binaries, for example we want to run all LEVEL-0 cases, we can use specs.AddLabel(“LEVEL0"), for more details, you can refer: https://github.com/openshift/enhancements/blob/master/enhancements/testing/openshift-tests-extension.md#info---extension-metadata
In short, labels gives us more possibilities.
test/cvo/cvo.go
Outdated
| client = utilities.MustGetKubeClient() | ||
| }) | ||
|
|
||
| g.It(`should not install resources annotated with release.openshift.io/delete=true`, g.Label("Conformance", "High", "42543"), func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the High label represent, please? Can you link me a reference/docs for the label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High is a property of the case on polarion, with it we can create test suites with same level, then we can run test suite by case level.
For example, we want to run all High level test cases on a special version, then we can follow below steps:
ext.AddSuite(extension.Suite{
Name: "high level test cases",
Parents: []string{"openshift/conformance"},
Qualifiers: []string{`"High" in labels`},
})
Then run it:
$ _output/linux/amd64/cluster-version-operator-tests run-suite "high level test cases"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"High" in labels means the High keyword in labels list, labels can be found by _output/linux/amd64/cluster-version-operator-tests list, for example:
$ _output/linux/amd64/cluster-version-operator-tests list
[
{
"name": "[Jira:\"Cluster Version Operator\"] cluster-version-operator-tests should support passing tests the sanity test should pass",
"labels": {},
"resources": {
"isolation": {}
},
"source": "openshift:payload:cluster-version-operator",
"lifecycle": "blocking",
"environmentSelector": {}
},
{
"name": "[Jira:\"Cluster Version Operator\"] The cluster version operator should not install resources annotated with release.openshift.io/delete=true",
"labels": {
"42543": {},
"Conformance": {},
"High": {},
"cvo": {}
},
"resources": {
"isolation": {}
},
"source": "openshift:payload:cluster-version-operator",
"lifecycle": "blocking",
"environmentSelector": {}
}
]
Similarly, we can use name.contains("[Serial]") in Qualifiers.
| [ | ||
| { | ||
| "name": "[Jira:Cluster Version Operator] cluster-version-operator-tests should support passing tests", | ||
| "name": "[Jira:\"Cluster Version Operator\"] cluster-version-operator-tests should support passing tests should support passing tests", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note the manually deleted and recreated metadata file in a separate commit. The file should only be updated via the make update. We are intentionally bypassing this as of the moment, but that should be clearly communicated in a commit description, as in "we know what we are doing, in the future, do not do that, do it properly or else history is lost". Otherwise, we may risk a higher chance of invalid changes in the future from people misunderstanding a commit change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be updating the README file to clearly state this for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I am not quite understand your concern.
Are you saying that we should use a separate commit to change the matedata file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Commits should be of logical units. Such a change deserves a separate commit with a disclaimer due to the mentioned concerns.
For example, see my past cfab71d commit that renamed a test this way.
To better understand why I advocate for this, see my past comment, which cites the metadata validation from the OTE enhancement.
| ) | ||
|
|
||
| var _ = Describe("[Jira:Cluster Version Operator] cluster-version-operator-tests", func() { | ||
| It("should support passing tests", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It helps with the integration. It also provides a signal of the integration working in the future. Such sanity tests are also in several other OpenShift repositories that already integrated the OTE or are in the process of the integration.
test/cvo/cvo.go
Outdated
| _, err = client.CoreV1().Services("openshift-cloud-credential-operator").Get(context.TODO(), "controller-manager-service", metav1.GetOptions{}) | ||
| o.Expect(kerrors.IsNotFound(err)).To(o.BeTrue(), "The NotFound error should occur") | ||
|
|
||
| g.By("checking if ClusterRoleBindings default-account-openshift-machine-config-operator does not exist") | ||
| _, err = client.RbacV1().ClusterRoleBindings().Get(context.TODO(), "default-account-openshift-machine-config-operator", metav1.GetOptions{}) | ||
| o.Expect(kerrors.IsNotFound(err)).To(o.BeTrue(), "The NotFound error should occur") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid such test logic is not sufficient. Please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing for the existence of hard-coded objects we expect to have a manifest annotation release.openshift.io/delete: "true" may work for the current minor release; however, there is no guarantee that such tests would cover the expected functionality in the next minor release or any future minor release.
For example:
4.y: A team has decided for the next release to stop creating a specific deployment.4.(y+1): The deployment manifest now has the delete annotation.4.(y+2): The team can safely remove the entire manifest.
The written test logic would keep passing even though the hard-coded objects are not present, not due to the delete annotation; however, due to the manifests being gone. As such, the test would not test the expected should not install resources annotated with release.openshift.io/delete=true functionality and would provide a false passing signal.
For more information regarding the delete annotation and the mentioned potential issue, see the Subsequent Releases Strategy section of the Manifest Annotation For Object Deletion developer guide.
As of the moment, I personally can't think of an alternative solution while not using the oc CLI. Thus, I am leaving the resolution open. Maybe it's worth revisiting the discussion regarding importing origin utilities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, there are potential issues like you said. Right now it is a technical blocker for me, see my comments in:
#1249 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally (like we did before), we loop over manifests extracted from a payload, find the ones with the annotation, and ensure the non-installation.
Before a better solution (something like above) is available, the current solution is fine with me.
My reasoning:
Say the manifest is not even included in the payload in a future minor.
Our test still passes (good) but it basically checks nothing (bad).
In that bad case, we should update the manifests to check in the CURRENT minor version. If nothing is there, then remove the test case.
This refresh of the list should be done automatically but we need a more powerful tool for it.
If we find that the annotation comes and goes very often on the manifests over minor versions, we could do this (just throwing out an idea into the air, still not depending on oc-cli):
- Make a script to extract the manifests with annotation
- Save the list as a file and the test loads it and check the manifests.
- When the first build of each minor version is available, we update the list. (So once in a quarter, not too much of a burden)
I do not see we need to invest in that direction right now. We can always do it when we feel needing it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should update the manifests to check in the CURRENT minor version.
The manifests can also be added and removed via backports, thus affecting all existing minor versions.
If we find that the annotation comes and goes very often on the manifests over minor versions
Who would be monitoring this?
When the first build of each minor version is available, we update the list.
Again, the manifests can also be removed and added via backports.
I do not see we need to invest in that direction right now.
Me as well, but for other reasons.
This means we would need to monitor all minor releases periodically via automation, as our attention is better utilized on more important efforts. This, in conclusion, means at minimum an initial development cost for the automation tooling and its maintenance.
Otherwise, the current solution may theoretically stop covering the expected functionality as soon as the day after merging. As a result, we, the OTA team, cannot rely on the test to cover the expected functionality. Then we would have a passing signal for a functionality, and we cannot depend on the signal being useful. That’s a rather large regression in the migrated test. I would personally rather drop the test case than go this route.
However, we still have the option to loop through all manifests and thus not change the covered functionality. This could be done by implementing the original logic, not using the oc CLI, or by importing origin packages. This could be done using a Go library for processing container images to fetch and extract the release image. We can then loop through the manifests as done originally. This would be done in collaboration with the software and quality engineers. No regression in covered functionality, no extreme imports hopefully, no implementation of automation for a single test case.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have to face this issue, whenever you migrate an existing case it uses oc in the implementation and you do not find an easy workaround for it. I honestly do not know how many such cases there are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have to face this issue, whenever you migrate an existing case it uses oc in the implementation and you do not find an easy workaround for it. I honestly do not know how many such cases there are.
@hongkailiu @DavidHurta This issue not only blocked me but also blocked all team to start their work because they are waiting for the MustGetKubeClient() ready to use, that's why I said don't waist time one my case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced Kubernetes/client-go by Openshift/client-go, Openshift/client-go is more easy to use and gives us many efficent functions to manipulate Cluster resources.
One thing I don't solved is that all above client could not get clusterversionoperator/cluster which is the log level object. cc @DavidHurta @hongkailiu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding how to list the manifest files, I took a quick look at the oc repository, it looks very complicated.
I prefer @hongkailiu 's Opt1 and Opt2, BTW I have another option: let's list some important resources by resource type then check all resources, for example :
g.It("demo", func() {
client := utilities.MustGetClient()
auths, _ := client.ConfigV1().Authentications().List(context.TODO(), metav1.ListOptions{})
for _, auth := range auths.Items {
if slices.Contains(auth.Annotations, "release.openshift.io/delete" {
panic("xxxxxx")
}
}
})There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the manifest with the annotation is removed.
This has a very low frequency, last time we hit the issue is at 4.20, the manifest with "release.openshift.io/delete" was introduced in 4.7 or 4.8, and the file was removed in 4.20.
80af1c8 to
56a29e0
Compare
|
@hongkailiu @DavidHurta I just removed the case, right now the PR only finished one thing: MustGetKubeClient(). |
|
/retest |
|
/hold I want to investigate more about client-go. |
@JianLi-RH, in case the investigation is not done by the end of the day, you want simply more time to explore or are simply busy, I would like to ask you whether it would be fine for me to rename the sanity test in a separate PR. That specific change is blocking my openshift/origin#30316 PR. I am happy to rename the test personally in a separate PR to unblock my PR while we are working on this one. |
|
@DavidHurta of course, go ahead, please. I am not sure when I can finish the investigation. |
…create a clientset instance. Clientset instance can be used to connect to cluster, manage resources. With clientset instance we can evaluate the posibility of moving other tests to OTE. The reference to OTE framework: https://docs.google.com/document/d/1cFZj9QdzW8hbHc3H0Nce-2xrJMtpDJrwAse9H7hLiWk/edit?tab=t.0#heading=h.8cf3f4eii1q8
afebe69 to
d2300d9
Compare
openshift/client-go gives us more efficient functions to access cluster
d2300d9 to
f98e03c
Compare
|
/unhold |
|
@JianLi-RH: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
In this PR, I finished 2 things:
/cc @wking @DavidHurta @PratikMahajan @hongkailiu @jhou1 @dis016 @jiajliu