-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial must-gather base code #1
Conversation
303ff63
to
0524a5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @reginapizza !! 💪
There are couple of places we need to polish this a bit, but nonetheless it's a very good first step.
- See my individual PR comments in the code and let me know if you they don't make sense.
- The important issue is about fixing the double-quote escape character escape issues (just fire your commands one by one and see if they are working properly) as they don't work atm. You can also consider using a different function for each of those, to make the code easier to read and maintain e.g.
getEvents
,getApplicationSets
, etc.. - Other things that can be improved here is the error handling as it doesn't container any. It would be great if you could incorporate that, so in the end there is a
errors.txt
file containing theoc
commands that failed to be executed. You. can also take it one step further and try to re-try these failed commands once again, in case there was a temporary glitch or connection issue (with the cluster). - Also, this is a must-gather script for OpenShift clusters only, not Kubernetes in general. It would be great if you could add a check if the cluster is an OpenShift cluster or not and print a warning message (e.g. "this cluster is not supported by the must-gather-gitops tool. This is a best effort").
- Completely optional: it would be great if the user (engineer) has the option to change the output directory, where all the files and logs will be gathered. Currently the script saves the collected information in hardcoded folder. Adding an option to set the output folder with the command-line flag argument can make this more flexible. But again, this is just optional and can be done later. Not a prio.
Dockerfile
Outdated
RUN chmod +x /usr/bin/gather | ||
|
||
# install jq to parse json within bash scripts | ||
RUN curl -o /usr/local/bin/jq http://stedolan.github.io/jq/download/linux64/jq && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to fetch jq
, because it's already provided by the base quay.io/openshift/origin-must-gather:4.6
image ;) See:
$ docker run -it quay.io/openshift/origin-must-gather bash -c 'rpm -qf `which jq`'
jq-1.6-3.el8.x86_64
FYI:
In general is not a good practice to fetch binaries from upsream and inject them in this way. This is because it can introduce security vulnerabilities, and it can also make the image build process slower and less reliable. When a package is installed via a package manager, it is usually tested and verified by the package maintainer, which means that it is more likely to be secure and stable. Additionally, packages installed via a package manager are usually updated and maintained, which can help to ensure that security vulnerabilities are addressed in a timely manner. When a binary is fetched from the internet, it is not guaranteed that it will be secure or stable, and it is also not guaranteed that it will be updated or maintained. Therefore, it is best practice to install packages via a package manager in production environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried removing it, but when I do that I run into the error POD /usr/bin/gather: line 21: jq: command not found
:/ is there a better way I could do it besides fetching jq
in the Dockerfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably due us using an outdated version of the must-gather base image:
$ docker run --rm -it quay.io/openshift/origin-must-gather:4.6 jq
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "jq": executable file not found in $PATH: unknown.
vs using the 4.11 version:
$ docker run --rm -it quay.io/openshift/origin-must-gather:4.11 jq
jq - commandline JSON processor [version 1.6]
Usage: jq [options] <jq filter> [file...]
jq [options] --args <jq filter> [strings...]
jq [options] --jsonargs <jq filter> [JSON_TEXTS...]
jq is a tool for processing JSON inputs, applying the given filter to
its JSON text inputs and producing the filter's results as JSON on
standard output.
The simplest filter is ., which copies jq's input to its output
unmodified (except for formatting, but note that IEEE754 is used
for number representation internally, with all that that implies).
For more advanced filters see the jq(1) manpage ("man jq")
and/or https://stedolan.github.io/jq
Example:
$ echo '{"foo": 0}' | jq .
{
"foo": 0
}
For a listing of options, use jq --help.
Is there a specific reason we use the 4.6
image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess Regina picked 4.6 because that's what Istio uses. I see no reason why we shouldn't use the latest. Good catch Jann!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated origin-must-gather:4.11 image, but now I'm getting an error Error from server (Forbidden): pods "must-gather-ll5hh" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "gather", "copy" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "gather", "copy" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "gather", "copy" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "gather", "copy" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like the cluster logging operator also ran into this problem and made this PR to fix it, should I also follow the same steps to resolve it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: I have updated the base image to openshift-must-gather:4.13 just to keep it most up-to-date. The problem is still persisting but it's happening only on 4.13 clusters. At the moment, all other custom must-gathers I've tried also run into the same problem on 4.13 clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created this issue for this bug, as discussed over slack we can address this issue in a later PR as 4.13 still has a while until it's GA.
1b0aefe
to
104283e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff, @reginapizza! I have not tested it locally yet (but definitely will do tomorrow!), but I had a couple of nitpicking remarks below.
@reginapizza for every command you run that has
This is useful because a. it checks what works and what not and puts it in a single place to look for. So it makes easier to see if there are any problems with the commands. It's actually useful for reviewing this PR as well. b. if one of the commands starts failing in the future, we will know about it. @jannfis WDYT? If not, then I would suggest to drop the |
b91055e
to
23345f4
Compare
|
||
**Test acceptance criteria**: | ||
|
||
* [ ] Unit Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we actually going to write unit-tests and e2e tests for the must-gather?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not at the moment but ideally we will have some, I've created #4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@drpaneas: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
Initial base for the gitops-must-gather
How to test:
Since the repo with the code base is my fork of redhat-developer/gitops-must-gather, the image that is built from it is in my quay repo, so please log onto your cluster and then run
oc adm must-gather --image=quay.io/rescott/gitops-must-gather:latest
. Your files will be created in your current folder undermust-gather.local.[number]