Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 0 additions & 90 deletions rfc/.template.md

This file was deleted.

75 changes: 75 additions & 0 deletions rfc/bulksave.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
# To improve the spinnaker sql storage performance by using bulk save api

| | |
|-|-|
| **Status** | _**Proposed**, Accepted, Implemented, Obsolete_ |
| **RFC #** | |
| **Author(s)** | Sanjeev Thatiparthi (@sanopsmx) (https://github.com/sanopsmx)
| **SIG / WG** | sig-security

## Overview

https://github.com/spinnaker/spinnaker/issues/6147

Spinnaker API currently supports saving one pipeline at a time. In order to save multiple pipelines,
the external system needs to iteratively call the save pipeline method. In a good deployment,
the pipeline save goes through multiple hops and can take 2 to 5s and can be slower in other deployments.
The overall time to update all the pipelines can run into hours.

### Goals and Non-Goals

To improve the performance of gate, orca and front50 microservices by using bulk insert query.
We need to write a bulk save api which accepts an list of pipelines in a single rest api call
and save them in front50 using bulk insert query.

## Motivation and Rationale

In many deployments, the pipelines are managed with templates and pipeline json structures are generated
from these templates based on enterprise requirements and then saved to Spinnaker using the API.
This operation occurs frequently enough that there is a requirement for efficiency in saving bulk pipelines in Spinnaker.
The changes to pipelines can occur in a Spinnaker deployments because of

1. Changes in policies associated with pipelines
2. Changes to metadata in the pipelines for better analytics of executed pipelines
3. Synchronizing pipelines in Spinnaker with Git spanning multiple applications

Applying the changes to pipelines with one pipeline at a time with currently available API takes hours to complete.
Since the operation of bulk updates is performed often and multiple organizations,
it makes sense to have Spinnaker support batch updates of pipelines.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you go into detail about what other approaches you've tried that lead to your need for a Bulk Save API? You mention that you're currently calling the API sequentially but have you tried calling the API concurrently? If so, what were the results? Have you tried using the savePipeline task in Orca and submitting multiple tasks? I'm interested to hear about any performance issues you've experienced using these approaches if you have. If not, why?

I believe I asked @emagana to include some of this information in the original design proposal.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add some helpful advice for future proposals, including data like that is critical for the community, not just the TOC, to evaluate the need for features, especially performance improvements. Understanding the full scope of what's been tried will enable us to either accept the proposal as is or offer alternative approaches and iterate on a design that suits the entire community. To be clear, I don't believe this is one of those cases. Just offering some advice on writing effective proposals :D.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#194

This has the history of the previous implementation.

@ajordens @robzienert
has suggested this approach.


## Timeline

Will be implemented in 1-2 weeks time from the approval of this governance PR.

## Design

We propose a solution of updating multiple pipelines in one API call.

We propose to use the batch update strategy of saving multiple pipelines in a single rest call.
The API will accept a list of pipelines and save it to a data store using bulk insert queries (SQL backend)
if supported else will fallback to existing one pipeline at a time.
The batch update API will also require service account configurations in the pipeline and
will not support dynamic service accounts for the pipelines. This allows for performance
improvement by the Gate API calling the front50 service without having to go through Orca.

We create a new rest api.This rest api will accept a list of pipelines as an array and pass the json to front50.

We use sql bulk insert to save the all the pipelines list to the database.

*Assumptions:*
This solution works only for RDBMS database store.
Save pipelines run in the context of an application, irrespective of the content of the pipeline.
In the case of bulk save, we will use the application name as “bulk save”.
The implication is that the tasks for bulk save will not show up under tasks in their respective applications.

*Performance Statistics*

These are the few performance stats collected using bulk insert queries( sql storage).

| Pipeline size | No. of pipelines | Total size | Round trip time in secs | Configuration changes | Comments |
|:-------------:|------------------|------------|-------------------------|---------------------------------------|----------|
| ~7.1 kb | 1000 | ~7.1 MB | 23.260 | okhttp3 default timeout of 10 secs. | |
| ~7.1 kb | 1450 | ~10.3 MB | 26.878 | okhttp3 default timeout of 10 secs. | |
| ~7.1 kb | 1757 | ~12.5 MB | 39.478 | okhttp3 default timeout of 10 secs. | |
| ~7.1 kb | 3513 | ~25 MB | 80.371 | Increased okhttp3 timeout to 60 secs. | |