@yingzhanredhat commented Oct 28, 2025

Add HyperFleet cluster and nodepool API critical user journey

@yingzhanredhat force-pushed the ying-cuj branch 5 times, most recently from 12da68f to bbea23e on November 4, 2025 03:03
"region": "us-east1",
"nodeCount": 3
},
"status": {
@rafabene Nov 4, 2025

I would just add a comment that the status has more information as defined here: https://github.com/openshift-hyperfleet/architecture/pull/15/files#diff-68f9f8913b8bb6ffd0831939a808049c6972f7e4b634cf76283580c77362ee94R707

Example:

"status": {
  "phase": "",
  "phaseDescription": "",
  "conditions": [...],
  "adapters": [...]
}

@yingzhanredhat (author)

@rafabene Is the flow like this: create a cluster with a request body, and the response returns all the fields, right? What are the initial values? (If that is right and the initial status is still being designed, I can leave the detailed content out of the journey doc.)

"status": {
  "phase": "",
  "phaseDescription": "",
  "conditions": [...],
  "adapters": [...]
}

HTTP 200 OK
{
"cluster_id": "550e8400-e29b-41d4-a716-446655440000",
"statuses": [
@rafabene Nov 4, 2025

"region": "us-east1",
"nodeCount": 3
},
"status": {

same here

"region": "us-east1",
"nodeCount": 3
},
"status": {

here

"nodeCount": 3
},
"status": {
"phase": "Ready",

here

"nodeCount": 3
},
"status": {
"phase": "Ready",

here

"nodeCount": 2
},
"status": {
"phase": "Not Ready",

here

"nodeCount": 2
},
"status": {
"phase": "Not Ready",

here

"region": "us-east1",
"nodeCount": 3
},
"status": {

here

"region": "us-east1",
"nodeCount": 3
},
"status": {

here

"nodeCount": 5
},
"status": {
"phase": "Not Ready",

here

HTTP 200 OK
{
"cluster_id": "550e8400-e29b-41d4-a716-446655440000",
"statuses": [

"machineType": "n1-standard-4"
},
"status": {
"phase": "Pending",

NodePools will also return an aggregated status.

"machineType": "n1-standard-4"
},
"status": {
"phase": "Not Ready",

here

"machineType": "n1-standard-4"
},
"status": {
"phase": "Ready",

here

"machineType": "n1-standard-4"
},
"status": {
"phase": "Ready",

here

"machineType": "n1-highmem-8"
},
"status": {
"phase": "Not Ready",

here

"machineType": "n1-highmem-8"
},
"status": {
"phase": "Not Ready",

here

"nodeCount": 8,
"machineType": "n1-standard-8"
},
"status": {

here

"machineType": "n1-standard-8"
},
"status": {
"phase": "Terminating",

here

@rafabene commented Nov 4, 2025

After reviewing the status guide documentation, I've confirmed the
correct structure for both endpoints:

GET/POST /v1/clusters/{clusterId}/statuses → Uses adapterStatuses

POST Request:

{
  "adapterStatuses": [
    {
      "adapter": "validation",
      "observedGeneration": 1,
      "conditions": [
        {
          "type": "Available",
          "status": "False",
          "reason": "JobRunning",
          "message": "Job is executing",
          "lastTransitionTime": "2025-10-17T12:00:05Z"
        },
        {
          "type": "Applied",
          "status": "True",
          "reason": "JobLaunched",
          "message": "Job created successfully",
          "lastTransitionTime": "2025-10-17T12:00:05Z"
        },
        {
          "type": "Health",
          "status": "True",
          "reason": "NoErrors",
          "message": "Adapter is healthy",
          "lastTransitionTime": "2025-10-17T12:00:05Z"
        }
      ],
      "data": {
        "validationResults": {
          "route53ZoneFound": true,
          "s3BucketAccessible": true
        }
      },
      "metadata": {
        "jobName": "validation-cls-123-gen1"
      },
      "lastUpdated": "2025-10-17T12:00:05Z"
    }
  ]
}

GET Response:
{
  "id": "status-cls-550e8400",
  "type": "clusterStatus",
  "href": "/api/hyperfleet/v1/clusters/cls-550e8400/statuses",
  "clusterId": "cls-550e8400",
  "adapterStatuses": [
    {
      "adapter": "validation",
      "observedGeneration": 1,
      "conditions": [
        {
          "type": "Available",
          "status": "True",
          "reason": "JobSucceeded",
          "message": "Job completed successfully after 115 seconds",
          "lastTransitionTime": "2025-10-17T12:02:00Z"
        },
        {
          "type": "Applied",
          "status": "True",
          "reason": "JobLaunched",
          "message": "Kubernetes Job created successfully",
          "lastTransitionTime": "2025-10-17T12:00:05Z"
        },
        {
          "type": "Health",
          "status": "True",
          "reason": "AllChecksPassed",
          "message": "All validation checks passed",
          "lastTransitionTime": "2025-10-17T12:02:00Z"
        }
      ],
      "data": {
        "validationResults": {
          "route53ZoneFound": true,
          "s3BucketAccessible": true,
          "quotaSufficient": true
        }
      },
      "metadata": {
        "jobName": "validation-cls-123-gen1",
        "executionTime": "115s"
      },
      "lastUpdated": "2025-10-17T12:02:00Z"
    }
  ],
  "lastUpdated": "2025-10-17T12:05:00Z"
}

✅ GET /v1/clusters/{clusterId} → Uses status

{
  "id": "cls-550e8400",
  "name": "my-cluster",
  "generation": 1,
  "spec": {
    "cloud": "aws",
    "region": "us-east-1",
    "domain": "example.com"
  },
  "metadata": {
    "createdAt": "2025-10-17T12:00:00Z",
    "updatedAt": "2025-10-17T12:05:00Z"
  },
  "status": {
    "phase": "Ready",
    "phaseDescription": "All required adapters completed successfully",
    "conditions": [
      {
        "type": "AllAdaptersReady",
        "status": "True",
        "reason": "AllRequiredAdaptersAvailable",
        "message": "All required adapters completed successfully",
        "lastTransitionTime": "2025-10-17T12:05:00Z"
      }
    ],
    "adapters": [
      {
        "name": "validation",
        "available": "True",
        "observedGeneration": 1
      },
      {
        "name": "dns",
        "available": "True",
        "observedGeneration": 1
      }
    ],
    "lastUpdated": "2025-10-17T12:05:00Z"
  }
}
| Endpoint | Field Name | Content |
|---|---|---|
| /clusters/{id}/statuses | adapterStatuses | Detailed adapter statuses (full conditions, data, metadata) |
| /clusters/{id} | status.adapters | Aggregated summary (name, available, observedGeneration) |

Key Points:

  • The /statuses endpoint provides the detailed status resource with all
    adapter information.
  • The /clusters and /nodepools endpoints provide the cluster and nodepool resources with an aggregated
    status summary.
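To make the split between the two endpoints concrete, here is a minimal Python sketch of how detailed `adapterStatuses` entries could be reduced to the `status.adapters` summary. It assumes the summary keeps only each adapter's `Available` condition, as in the example responses, and falls back to a hypothetical `"Unknown"` when an adapter has not reported that condition yet. The function name `summarize_adapters` is illustrative and not part of the HyperFleet API.

```python
# Sketch only: field names follow the example payloads in this thread.
from typing import Any, Dict, List


def summarize_adapters(adapter_statuses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reduce detailed adapterStatuses entries to {name, available, observedGeneration}."""
    summary = []
    for entry in adapter_statuses:
        # Keep only the "Available" condition; "Unknown" is an assumed default
        # for adapters that have not reported it yet.
        available = next(
            (c["status"] for c in entry.get("conditions", []) if c["type"] == "Available"),
            "Unknown",
        )
        summary.append(
            {
                "name": entry["adapter"],
                "available": available,
                "observedGeneration": entry["observedGeneration"],
            }
        )
    return summary
```

Applied to the `adapterStatuses` array of the GET /statuses response above, this yields `[{"name": "validation", "available": "True", "observedGeneration": 1}]`, which matches the shape of the entries under `status.adapters`.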


---

## Journey 5: Delete Cluster (Deprovisioning) (Post-MVP)


Is this really a post-MVP journey? Deleting a cluster seems like a critical path to prevent excessive resource accumulation during testing/development.

@yingzhanredhat (author)

It seems so. I got this from the Jira description: https://issues.redhat.com/browse/HYPERFLEET-120


**System Response / User Sees:**
```json
HTTP 202 Accepted


Does a 204 make more sense here? I'm not sure we need to return the whole cluster object on a successful delete.

@@ -0,0 +1,1277 @@
# HyperFleet API - Customer Critical Journey

## Overview


It'd be nice if either in this section or an executive summary, we outlined at the top which user journeys are in scope and which are out of scope for MVP. That way we'll prevent people from scrolling down through all 10.

@yingzhanredhat (author) Nov 5, 2025

Agreed, will add it.


This document maps the critical user journeys for internal users interacting with the HyperFleet API. Each journey includes user actions, system responses, and the architectural components involved in processing the request.

> **Note:** All API request/response payloads shown in this document are still in design progress and not the final version. They will be updated once the API specification is published.


I'd recommend either removing this if our payloads are finalized, or simplifying the sample responses in this document to be less specific if they're truly still in design.

@yingzhanredhat (author)

Will simplify the request and response bodies.


## Persona

**Internal Platform Engineer**


Can we expand a bit on this persona? I think we can add more depth for a CUJ here:

  • What is this person trying to achieve? ("provision clusters in <10 minutes with zero manual intervention")
  • What are their pain points? ("Managing multi-cloud clusters requires different tools/APIs for each provider")
  • What are their current tools/workflow? ("Uses cloud CLIs and terraform, takes 2 hours per cluster")
  • What is their success criteria? ("95% of clusters reach ready without manual intervention")


---

# Cluster Journeys


I think it could help to have a "prerequisites" section here for things that happen before the user journey starts. For example, maybe things like "cloud provider credentials configured, HyperFleet deployed, API access set up", etc.

@yingzhanredhat (author)

Agreed. I will get more detailed information from team members to know which sections are required in the MVP phase.

GET /api/hyperfleet/v1/clusters/550e8400-e29b-41d4-a716-446655440000/statuses
```

**System Response / User Sees (DNS Failure Example):**


It looks like both adapters completed successfully with all conditions showing "True". Let's update this to show the journey a bit more accurately.


---

## Journey 6: Cluster Access Control and Permissions(Out of MVP)


Can we make this "Post-MVP" to match the others?


---

## Journey 6: Cluster Access Control and Permissions(Out of MVP)


Can we elaborate a bit more on this journey? It does a good job of showing the access denial, but it'd be nice to have a complete workflow like the other journeys in this document. I'm thinking things like

  • How User A sets up or configures permissions
  • How User A would grant access to another user
  • How the newly-granted user can successfully access the cluster
  • The denial case

@AlexVulaj

This is a comprehensive document with great detail, but it reads more like API specifications than a true CUJ document to me. It's also not immediately clear what's in scope for MVP vs Post-MVP, which I believe could be addressed with a quick summary up front.

Overall I think we could add more information about user goals, context, and expected outcomes. I'd like to see more narrative context (why is the user doing this? what problem are they solving?) - maybe some diagrams or flow charts could help there. We may even want to consider moving the detailed JSON payloads out to a separate API spec to really show an emphasis on the user journey overall.


## Overview

This document maps the critical user journeys for internal users interacting with the HyperFleet API. Each journey includes user actions, system responses, and the architectural components involved in processing the request.


Can we define what an "internal user" is?


**Note on Cluster Phase Values:**
The HyperFleet API uses the following cluster phase values in `status.phase`:
- **`Pending`** - Cluster created, waiting for initial adapter processing


I think Pending and Not Ready are really the same.
I don't think the subtle difference of "waiting for initial adapter processing" is meaningful for the user.

@yingzhanredhat (author)

- **`Terminating`** - Cluster deletion initiated, cleanup in progress

Adapter phase values (in `statuses` table):
- **`Pending`** - Adapter hasn't started processing yet


I think it is not yet decided whether creating a cluster will create "empty" statuses for all adapters in the "Pending" state.

IMO, most probably creating a cluster will produce an empty array of statuses, which will be populated as the adapters start reporting back their state.

The API needs to know at least how many adapters need to complete before the cluster is considered ready.
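If statuses do start as an empty array, readiness has to be computed against a known list of required adapters. The Python sketch below assumes a hypothetical `required` set of adapter names (how the API would be configured with it is not decided in this thread) and treats the cluster as `Ready` only when each of them reports `Available: True` for the cluster's current `generation`. With nothing reported yet it simply returns `Not Ready`, which also fits the suggestion to merge Pending into Not Ready.

```python
# Sketch only: the "required adapters" configuration is hypothetical.
from typing import Any, Dict, List, Set


def cluster_phase(generation: int, required: Set[str], adapter_statuses: List[Dict[str, Any]]) -> str:
    """Return "Ready" once every required adapter is Available for this generation."""
    ready = set()
    for entry in adapter_statuses:
        available = any(
            c["type"] == "Available" and c["status"] == "True"
            for c in entry.get("conditions", [])
        )
        # Ignore reports that refer to an older spec generation.
        if available and entry.get("observedGeneration") == generation:
            ready.add(entry["adapter"])
    # With a non-empty required set, an empty statuses array yields "Not Ready".
    return "Ready" if required <= ready else "Not Ready"
```

Comparing `observedGeneration` with the cluster's `generation` keeps a stale report from an earlier spec from marking an updated cluster as ready.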

"spec": {
"provider": "gcp",
"region": "us-east1",
"nodeCount": 3


NodePools will be created in a different API request.
We can create clusters with zero nodes.

So, the process will be:

  • Make a cluster request -> will return a clusterId
  • Make a nodepool request -> node creation will wait until cluster is ready
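A rough Python sketch of the two-step flow in this comment. The nodepool path (`/clusters/{id}/nodepools`), the `id` field in the create response, and the `RecordingClient` stand-in are assumptions for illustration only. Whether a nodepool request is accepted before the cluster is Ready is still an open question in the reply below, so the Ready gate is shown as a separate check that node provisioning, not the request itself, would apply.

```python
# Sketch only: the nodepool path, response fields and client are hypothetical.
from typing import Any, Dict, List, Tuple

API_BASE = "/api/hyperfleet/v1"


class RecordingClient:
    """Stand-in for an HTTP client: records POSTs and fakes an "id" in the reply."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def post(self, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((path, body))
        return {"id": f"id-{len(self.calls)}", **body}


def create_cluster_then_nodepool(
    client: RecordingClient, cluster_spec: Dict[str, Any], nodepool_spec: Dict[str, Any]
) -> str:
    # Step 1: the cluster request carries no node settings (zero nodes).
    cluster = client.post(f"{API_BASE}/clusters", {"spec": cluster_spec})
    cluster_id = cluster["id"]
    # Step 2: a separate nodepool request references the returned cluster ID.
    client.post(f"{API_BASE}/clusters/{cluster_id}/nodepools", {"spec": nodepool_spec})
    return cluster_id


def nodepool_may_provision(cluster: Dict[str, Any]) -> bool:
    """Node creation waits until the parent cluster reports phase "Ready"."""
    return cluster.get("status", {}).get("phase") == "Ready"
```

Keeping `nodeCount` and `machineType` only in the nodepool spec mirrors the change proposed in the reply below: the cluster request body no longer carries node configuration.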

@yingzhanredhat (author) Nov 5, 2025

  1. I will remove all node-related configuration from the cluster request body.
  2. Is it allowed to create a nodepool for a cluster that is not ready?

Comment on lines +99 to +101
"status": {
"phase": "Not Ready",
"lastTransitionTime": "2025-10-28T12:00:15Z"


The cluster.status will also contain an array of adapter conditions,
BUT not all adapter conditions: only the one named Available.

Why is this?
If the cluster response contains this info, we can reduce the number of API calls adapters make to learn whether each individual adapter is available, which they use in their preconditions to decide whether they have work to do.

If we don't send this info in the cluster response, the adapters have to fetch both the cluster and the statuses.

Well, writing this I realize that the adapters may not even require the cluster info after all, but only /clusters/{cluster_id}/statuses... we need to confirm this.
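As a sketch of the precondition this would enable, the hypothetical Python function below decides whether an adapter should run using only the cluster response. It assumes `status.adapters` entries shaped like the `{name, available, observedGeneration}` summary discussed earlier in the thread, plus an illustrative `depends_on` list of upstream adapters; neither the dependency model nor the field names are final.

```python
# Sketch only: the dependency list and summary fields are assumptions.
from typing import Any, Dict, List


def should_run(adapter: str, depends_on: List[str], cluster: Dict[str, Any]) -> bool:
    """Decide from the cluster response alone whether `adapter` has work to do."""
    generation = cluster["generation"]
    summary = {a["name"]: a for a in cluster.get("status", {}).get("adapters", [])}

    def available(name: str) -> bool:
        entry = summary.get(name)
        return (
            entry is not None
            and entry["available"] == "True"
            and entry["observedGeneration"] == generation
        )

    # Nothing to do if this adapter already finished for the current generation.
    if available(adapter):
        return False
    # Otherwise run only once every upstream adapter is Available.
    return all(available(dep) for dep in depends_on)
```

With the GET /v1/clusters/{clusterId} example from earlier (generation 1, `validation` and `dns` both available), `should_run("dns", ["validation"], cluster)` returns `False`, because `dns` has already reported for generation 1.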

@@ -0,0 +1,1277 @@
# HyperFleet API - Customer Critical Journey


I have one suggestion for the document format.
Because it is so long due to the many JSON examples, I tend to get lost in the details.

It may be a good idea to use collapsed sections for the examples. wdyt?

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-collapsed-sections

(tip: you can ask cursor/claude to create the details and also the summary automatically 😉)

@yingzhanredhat (author)

Good idea. Let me update it.


## Journey 3: List and Filter Clusters

### Step 1: List All Clusters


Results will be paginated. Here it shows the total property, but there will also be size and page.
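A minimal Python sketch of a paginated, filtered list request. It assumes the response body carries `items`, `total`, `page` and `size` (the comment names the last three; `items` is a guess), that pages are 1-based, and that the filter key is still open (`phase` or `status.phase`). Passing `urllib.parse.quote` as the encoder produces `Not%20Ready` rather than the `Not+Ready` that `urlencode` would emit by default.

```python
# Sketch only: the response fields and the filter key are assumptions,
# not the published HyperFleet API.
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import quote, urlencode

CLUSTERS_PATH = "/api/hyperfleet/v1/clusters"


def list_url(page: int, size: int, filters: Optional[Dict[str, str]] = None) -> str:
    """Build a list URL; filter keys (e.g. "phase" or "status.phase") are hypothetical."""
    params: Dict[str, Any] = {**(filters or {}), "page": page, "size": size}
    # quote (not the default quote_plus) encodes the space in "Not Ready" as %20.
    return f"{CLUSTERS_PATH}?{urlencode(params, quote_via=quote)}"


def iter_clusters(
    fetch: Callable[[str], Dict[str, Any]],
    size: int = 50,
    filters: Optional[Dict[str, str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield clusters page by page until "total" items have been seen."""
    page, seen = 1, 0
    while True:
        body = fetch(list_url(page, size, filters))
        items = body.get("items", [])
        yield from items
        seen += len(items)
        # Stop on an empty page as well, so a wrong "total" cannot loop forever.
        if not items or seen >= body["total"]:
            return
        page += 1
```

For example, `list_url(1, 50, {"phase": "Not Ready"})` returns `/api/hyperfleet/v1/clusters?phase=Not%20Ready&page=1&size=50`; swapping the key for `status.phase` changes only the parameter name, since `.` is not percent-encoded.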


**User Action:**
```bash
GET /api/hyperfleet/v1/clusters?phase=Not%20Ready
```


Should the query refer to the full property path, e.g. status.phase="Not%20Ready"?

```json
HTTP 201 Created
{
"id": "my-nodepool",
```


I guess that my-nodepool will be the name, and the id will be a UUID?


5 participants