[Epic] Extend Amun feature wise to be a testing platform for containers in a cluster #630

Open
fridex opened this issue Aug 11, 2021 · 8 comments
Labels
area/amun Issues or PRs related to Amun area/knowledge-graph Issues or PRs related to Knowledge Graph kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/user-experience Issues or PRs related to the User Experience of our Services, Tools, and Libraries. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@fridex
Contributor

fridex commented Aug 11, 2021

Is your feature request related to a problem? Please describe.

As a developer, I would like to know how applications deployed into the cluster behave in different respects, so that I can observe what is happening to my application based on statistics produced by the service.

As a data scientist, I would like to have a unified report that can be analyzed from different points of view, so that I can spot possible issues with the container image deployed in the cluster. To support this, I would like to be able to reuse Jupyter notebooks that automatically load reports produced by the service.

As we already have the deployment and core features of Amun in place, this is more about abstracting out some features and eventually providing more, such as GPU utilization for the container when it is run in a cluster.

Workflow:

  1. User submits an Amun inspection with a pre-built container image, respecting the supplied configuration (node placement in the cluster, GPU requirements, CPU requirements, ...)
  2. Amun runs the application the way the user requests (e.g. run a training phase of a machine learning model)
  3. Amun captures runtime statistics of the application (CPU utilization, process statistics from the PCB, GPU utilization, networking, ...) and reports them as JSON
  4. A prepared Jupyter notebook is used to automatically visualize the statistics for users
  5. Users can spot issues, discrepancies, or other runtime characteristics from the report to analyze the application behavior in the cluster
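
To make steps 2 and 3 of the workflow above a bit more concrete, here is a minimal sketch of running a workload and reporting a few runtime statistics as JSON. It is not Amun's actual API or report schema: the function name `run_and_collect` and all report field names are hypothetical, it only uses the Python standard library (the `resource` module is Unix-only), and it covers CPU time and memory only, not GPU or networking.

```python
import json
import resource
import subprocess
import sys
import time


def run_and_collect(command):
    """Run `command` as a child process and return basic runtime statistics.

    Hypothetical helper for illustration; field names are assumptions,
    not the Amun inspection report format.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "command": list(command),
        "exit_code": completed.returncode,
        "wall_time_seconds": wall,
        # CPU time consumed by waited-for children during this run.
        "cpu_user_seconds": after.ru_utime - before.ru_utime,
        "cpu_system_seconds": after.ru_stime - before.ru_stime,
        # High-water mark over all waited-for children, not a per-run delta;
        # units are platform dependent (kilobytes on Linux, bytes on macOS).
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Stand-in workload; a real inspection would run the user's application.
    workload = [sys.executable, "-c", "print(sum(i * i for i in range(10**6)))"]
    print(json.dumps(run_and_collect(workload), indent=2))
```

A notebook (step 4) could then load such a report with `json.load` and plot the fields it cares about.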
@goern
Member

goern commented Aug 13, 2021

/kind feature
/triage needs-information
/area amun
/area knowledge-graph

@sesheta sesheta added kind/feature Categorizes issue or PR as related to a new feature. triage/needs-information Indicates an issue needs more information in order to work on it. area/amun Issues or PRs related to Amun area/knowledge-graph Issues or PRs related to Knowledge Graph labels Aug 13, 2021
@sesheta
Member

sesheta commented Sep 12, 2021

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@sesheta sesheta closed this as completed Sep 12, 2021
@sesheta
Member

sesheta commented Sep 12, 2021

@sesheta: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fridex
Contributor Author

fridex commented Sep 13, 2021

/reopen
/remove-lifecycle rotten

@sesheta
Member

sesheta commented Sep 13, 2021

@fridex: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sesheta sesheta reopened this Sep 13, 2021
@goern
Member

goern commented Oct 6, 2021

/lifecycle frozen

@sesheta sesheta added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Oct 6, 2021
@goern goern removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Oct 6, 2021
@codificat
Member

/triage accepted
/priority backlog

@sesheta sesheta added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 6, 2021
@codificat
Member

From today's backlog refinement: this might require some additional research on what, if, and how to proceed.

The focus would be on the generated, user-facing, and user-accessible report.

/remove-triage accepted
/triage needs-information
/sig user-experience

@sesheta sesheta added triage/needs-information Indicates an issue needs more information in order to work on it. sig/user-experience Issues or PRs related to the User Experience of our Services, Tools, and Libraries. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Sep 19, 2022
@codificat codificat changed the title Extend Amun feature wise to be a testing platform for containers in a cluster [Epic] Extend Amun feature wise to be a testing platform for containers in a cluster Sep 19, 2022
@codificat codificat moved this to 🆕 New in Planning Board Sep 26, 2022
Projects
Status: 🆕 New
4 participants