Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial must-gather base code #1

Merged
merged 2 commits into from
Jan 23, 2023
Merged

Conversation

reginapizza
Copy link
Contributor

@reginapizza reginapizza commented Jan 13, 2023

Initial base for the gitops-must-gather

How to test:

Since the repo with the code base is my fork of redhat-developer/gitops-must-gather, the image that is built from it is in my quay repo, so please log onto your cluster and then run oc adm must-gather --image=quay.io/rescott/gitops-must-gather:latest. Your files will be created in your current folder under must-gather.local.[number]

@reginapizza reginapizza changed the title Initial must-gather base code [DRAFT] Initial must-gather base code Jan 13, 2023
@reginapizza reginapizza force-pushed the main branch 11 times, most recently from 303ff63 to 0524a5a Compare January 16, 2023 03:14
@reginapizza reginapizza changed the title [DRAFT] Initial must-gather base code Initial must-gather base code Jan 16, 2023
gather_gitops.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@drpaneas drpaneas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @reginapizza !! 💪

There are couple of places we need to polish this a bit, but nonetheless it's a very good first step.

  • See my individual PR comments in the code and let me know if you they don't make sense.
  • The important issue is about fixing the double-quote escape character escape issues (just fire your commands one by one and see if they are working properly) as they don't work atm. You can also consider using a different function for each of those, to make the code easier to read and maintain e.g. getEvents, getApplicationSets, etc..
  • Other things that can be improved here is the error handling as it doesn't container any. It would be great if you could incorporate that, so in the end there is a errors.txt file containing the oc commands that failed to be executed. You. can also take it one step further and try to re-try these failed commands once again, in case there was a temporary glitch or connection issue (with the cluster).
  • Also, this is a must-gather script for OpenShift clusters only, not Kubernetes in general. It would be great if you could add a check if the cluster is an OpenShift cluster or not and print a warning message (e.g. "this cluster is not supported by the must-gather-gitops tool. This is a best effort").
  • Completely optional: it would be great if the user (engineer) has the option to change the output directory, where all the files and logs will be gathered. Currently the script saves the collected information in hardcoded folder. Adding an option to set the output folder with the command-line flag argument can make this more flexible. But again, this is just optional and can be done later. Not a prio.

Dockerfile Outdated
RUN chmod +x /usr/bin/gather

# install jq to parse json within bash scripts
RUN curl -o /usr/local/bin/jq http://stedolan.github.io/jq/download/linux64/jq && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to fetch jq, because it's already provided by the base quay.io/openshift/origin-must-gather:4.6 image ;) See:

$ docker run -it quay.io/openshift/origin-must-gather bash -c 'rpm -qf `which jq`'
jq-1.6-3.el8.x86_64

FYI:

In general is not a good practice to fetch binaries from upsream and inject them in this way. This is because it can introduce security vulnerabilities, and it can also make the image build process slower and less reliable. When a package is installed via a package manager, it is usually tested and verified by the package maintainer, which means that it is more likely to be secure and stable. Additionally, packages installed via a package manager are usually updated and maintained, which can help to ensure that security vulnerabilities are addressed in a timely manner. When a binary is fetched from the internet, it is not guaranteed that it will be secure or stable, and it is also not guaranteed that it will be updated or maintained. Therefore, it is best practice to install packages via a package manager in production environments.

Copy link
Contributor Author

@reginapizza reginapizza Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing it, but when I do that I run into the error POD /usr/bin/gather: line 21: jq: command not found :/ is there a better way I could do it besides fetching jq in the Dockerfile?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably due us using an outdated version of the must-gather base image:

$ docker run --rm -it quay.io/openshift/origin-must-gather:4.6 jq
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "jq": executable file not found in $PATH: unknown.

vs using the 4.11 version:

$ docker run --rm -it quay.io/openshift/origin-must-gather:4.11 jq
jq - commandline JSON processor [version 1.6]

Usage:	jq [options] <jq filter> [file...]
	jq [options] --args <jq filter> [strings...]
	jq [options] --jsonargs <jq filter> [JSON_TEXTS...]

jq is a tool for processing JSON inputs, applying the given filter to
its JSON text inputs and producing the filter's results as JSON on
standard output.

The simplest filter is ., which copies jq's input to its output
unmodified (except for formatting, but note that IEEE754 is used
for number representation internally, with all that that implies).

For more advanced filters see the jq(1) manpage ("man jq")
and/or https://stedolan.github.io/jq

Example:

	$ echo '{"foo": 0}' | jq .
	{
		"foo": 0
	}

For a listing of options, use jq --help.

Is there a specific reason we use the 4.6 image?

Copy link
Contributor

@drpaneas drpaneas Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess Regina picked 4.6 because that's what Istio uses. I see no reason why we shouldn't use the latest. Good catch Jann!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated origin-must-gather:4.11 image, but now I'm getting an error Error from server (Forbidden): pods "must-gather-ll5hh" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "gather", "copy" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "gather", "copy" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "gather", "copy" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "gather", "copy" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the cluster logging operator also ran into this problem and made this PR to fix it, should I also follow the same steps to resolve it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: I have updated the base image to openshift-must-gather:4.13 just to keep it most up-to-date. The problem is still persisting but it's happening only on 4.13 clusters. At the moment, all other custom must-gathers I've tried also run into the same problem on 4.13 clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created this issue for this bug, as discussed over slack we can address this issue in a later PR as 4.13 still has a while until it's GA.

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
@reginapizza reginapizza force-pushed the main branch 12 times, most recently from 1b0aefe to 104283e Compare January 17, 2023 20:33
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, @reginapizza! I have not tested it locally yet (but definitely will do tomorrow!), but I had a couple of nitpicking remarks below.

gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
@drpaneas
Copy link
Contributor

drpaneas commented Jan 18, 2023

@reginapizza for every command you run that has 2>&1 in the end, if the commands fails to get executed, please put it in the errors file. for example:

failed commands:
 * oc get batman -n gotham-city
 * oc get secret bruceID
 * etc

This is useful because a. it checks what works and what not and puts it in a single place to look for. So it makes easier to see if there are any problems with the commands. It's actually useful for reviewing this PR as well. b. if one of the commands starts failing in the future, we will know about it.

@jannfis WDYT? If not, then I would suggest to drop the 2>&1. We need one way to see what operations failed to be gathered.

Makefile Outdated Show resolved Hide resolved
@reginapizza reginapizza force-pushed the main branch 7 times, most recently from b91055e to 23345f4 Compare January 19, 2023 20:31

**Test acceptance criteria**:

* [ ] Unit Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we actually going to write unit-tests and e2e tests for the must-gather?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not at the moment but ideally we will have some, I've created #4

gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
gather_gitops.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@drpaneas drpaneas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci
Copy link

openshift-ci bot commented Jan 20, 2023

@drpaneas: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jannfis
Copy link
Member

jannfis commented Jan 20, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 20, 2023
@drpaneas drpaneas merged commit 5e3bc95 into redhat-developer:main Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants