[Blog post] AD imputation customer success story #4006
Conversation
This PR contains the content for publishing the blog on the AD imputation customer success story.

Signed-off-by: kaituo <[email protected]>
Thank you for submitting a blog post! The blog post review process is: Submit a PR -> (Optional) Peer review -> Doc review -> Editorial review -> Marketing review -> Published.
Hi @kaituo, it looks like you're adding a new blog post but don't have a linked issue. Please link this PR to an open issue using one of GitHub's closing keywords in the PR description. If an issue hasn't been created yet, please create one and then link it to this PR.

added `Closes #issue-number`
Signed-off-by: kaituo <[email protected]>
@kolchfa-aws - Adding you for tech review.
## Introduction
Anomaly detection in Amazon OpenSearch Service enables users to automatically identify unusual patterns and behaviors in their data streams. This powerful capability has become an essential tool for many organizations seeking to monitor system health, detect issues early, and maintain operational excellence.

However, through continuous customer feedback and real-world usage, we have identified areas where the Anomaly Detection plugin could be further improved, particularly in how it handles scenarios with missing or insufficient input data.
"continuous customer feedback and real-world usage" seems a bit hand-wavy. This whole sentence could be more clearly focused on "identifying customer use cases which are not well handled" or some similar tone.
fixed
- **`PREVIOUS` (last known value):** This is best if you want to effectively ignore missing data by carrying the last observation forward.
- **`ZERO` or `FIXED_VALUES`:** These methods are similar and should be used when you want missing data to be treated as a potential anomaly. By filling in a rare or out-of-range value (like zero or a specific constant), you make the imputed point stand out to the detector. This approach contrasts with `PREVIOUS`, which aims to make missing data blend in. A minimal sketch of all three strategies appears after this list.
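To make the behavioral difference concrete, here is a minimal Python sketch of the three strategies. The function and names are illustrative assumptions for this post, not the Anomaly Detection plugin's actual implementation or API.

```python
# Minimal sketch of the three imputation strategies (illustrative only,
# not the Anomaly Detection plugin's actual code).
def impute(series, method="previous", fixed_value=0.0):
    """Fill None entries in `series` according to the chosen method."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
            filled.append(v)
        elif method == "previous":
            # Carry the last observation forward so gaps blend in.
            filled.append(last)
        elif method == "zero":
            # Fill with zero so the gap stands out as a potential anomaly.
            filled.append(0.0)
        elif method == "fixed_values":
            # Fill with a chosen constant, e.g. an out-of-range sentinel.
            filled.append(fixed_value)
    return filled

print(impute([10.0, None, 12.0, None], method="previous"))  # [10.0, 10.0, 12.0, 12.0]
print(impute([10.0, None, 12.0, None], method="zero"))      # [10.0, 0.0, 12.0, 0.0]
```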
### Algorithm sketch
I'm curious who the audience of this post is. Up until here it read like a layman's interpretation and I understood it well, then suddenly we're jumping into math. Is a blog post the right place for these equations? Is there a way to publish a smaller technical-focused blog and link to it from a layman's blog?
The target audience is engineers and scientists: both those who just want to know how to use imputation and advanced users who want to understand the underlying implementation. I don't know of an option to publish a more technically focused blog. What I can do is move most of the math into an appendix so that the main flow stays friendly to entry-level users.
For all $t$, we have $0 \le f_t \le 1$ and $0 \le q_t \le 1$.

*Proof.* In the binary model, $0 \le n^{\mathrm{imp}}_t \le L$ by construction, hence $0 \le f_t \le 1$ and $q_t = 1 - f_t \in [0,1]$. In the fractional model, the window sum $S_t$ is the sum of the last $L$ mass terms, each of which lies in $[0,1]$. Therefore, $0 \le S_t \le L$, which implies $f_t \in [0,1]$ and $q_t \in [0,1]$.
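The following hypothetical Python sketch mirrors the binary model: it counts how many of the last $L$ points were imputed, so $f_t$ and $q_t$ stay in $[0, 1]$ by construction. Names such as `quality_signal` are assumptions for illustration, not plugin code.

```python
from collections import deque

def quality_signal(imputed_flags, L):
    """Binary model sketch: flag is 1 if point t was imputed, 0 if observed.

    Returns q_t = 1 - f_t at each step, where f_t is the imputed
    fraction over the last L points. Illustrative only.
    """
    window = deque(maxlen=L)  # keeps at most the last L flags
    qs = []
    for flag in imputed_flags:
        window.append(flag)
        f_t = sum(window) / L  # 0 <= n_imp_t <= L, so 0 <= f_t <= 1
        qs.append(1.0 - f_t)   # hence 0 <= q_t <= 1
    return qs

# Three consecutive imputed points inside a window of L = 4:
print(quality_signal([0, 0, 1, 1, 1, 0], L=4))
# [1.0, 1.0, 0.75, 0.5, 0.25, 0.25]
```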
Continuing my other comment, a mathematical "proof" really seems out of place in a non-technical-audience blog.
answered in previous comment.
Also, because $q_t$ is always in the range $[0, 1]$, the smoothed statistic $\mathrm{DQ}_t$ is also guaranteed to remain within $[0, 1]$. This follows directly from the standard exponential smoothing recurrence, where the new value is a convex combination (a weighted average with nonnegative weights that sum to 1, so it lies between its inputs) of the previous smoothed value $\mathrm{DQ}_{t-1}$ and the current observation $q_t$ (specifically $(1-\lambda)\mathrm{DQ}_{t-1}+\lambda q_t$ with $0 < \lambda < 1$), ensuring it never leaves the bounds defined by the input signal (see Wikipedia's article on ["Convex combination"](https://en.wikipedia.org/wiki/Convex_combination#:~:text=As%20a%20particular%20example%2C%20every,1)).

During a sustained period of missing data, as $f_t$ trends up, $q_t$ trends down, and $\mathrm{DQ}_t$ follows suit, decreasing smoothly. Conversely, when real data returns and $f_t$ trends down, $q_t$ trends up, and $\mathrm{DQ}_t$ reliably recovers towards 1. This ensures the gating mechanism, which relies on these signals, is stable and responds to persistent changes in data quality rather than short-term noise.
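As a hypothetical illustration of the recurrence $\mathrm{DQ}_t=(1-\lambda)\mathrm{DQ}_{t-1}+\lambda q_t$, the sketch below shows $\mathrm{DQ}_t$ decaying smoothly during a simulated outage and recovering when data returns. The value of $\lambda$ here is an assumption; the plugin's actual constant may differ.

```python
def smooth_quality(q_values, lam=0.2, dq_init=1.0):
    """Exponential smoothing: DQ_t = (1 - lam) * DQ_{t-1} + lam * q_t.

    Each step is a convex combination of values in [0, 1], so DQ_t
    can never leave [0, 1]. Illustrative sketch only.
    """
    dq, history = dq_init, []
    for q in q_values:
        dq = (1 - lam) * dq + lam * q
        history.append(round(dq, 3))
    return history

# q_t drops to 0 during a simulated outage, then recovers to 1:
print(smooth_quality([1, 1, 0, 0, 0, 1, 1]))
# [1.0, 1.0, 0.8, 0.64, 0.512, 0.61, 0.688]
```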
## System architecture
Here we get back into the practical nature of a scalable design, but I almost missed it skimming through math I didn't quite understand despite being a math major :)
Feel free to comment on how to make the math more understandable :)
## Conclusion
Klarna’s experience underscored a simple but easily overlooked truth: in real-world monitoring, **“no data” is sometimes the most important data point of all**. By treating silent intervals as a first-class signal rather than a gap to ignore, we were able to close a blind spot where critical outages could otherwise slip by undetected.
I thought about commenting near the top when "no data" was first introduced, but it's really just too vague a term and I'm not sure it should be the key word you focus on. There are really multiple types of "no data":
- data missing because you didn't collect it (not the data's problem, truly missing and we shouldn't impute anything)
- data missing because that's the anomaly itself (the focus here, possibly could mention survivorship bias and the need to impute "something")
- data present that's just the baseline (treated as "no anomaly" and provides useful data in a Bayesian context)
I think the attempt to squeeze too much out of "no data" doesn't address the complexities of data quality well enough.
I tried to distinguish the first two cases you mentioned in the Solution part. Let me know what you think.
Signed-off-by: kaituo <[email protected]>
dbwiddis left a comment
LGTM! Thanks for addressing my observations!
@kolchfa-aws We are ready for Doc review.
Signed-off-by: kaituo <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
natebower left a comment
Editorial review
Signed-off-by: Nathan Bower <[email protected]>
natebower left a comment
@pajuric Please hold off on publishing for now, as we’re waiting for approval from the customer’s leadership team.
Signed-off-by: Kaituo Li <[email protected]>
@natebower I updated the photo link. Do you mind approving again?
natebower left a comment
LGTM
Per @kaituo - the blog is still on hold pending customer approval.
Description
This PR contains the content for publishing the blog on the AD imputation customer success story.
Issues Resolved
Closes #4005
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.