Add DynamoDB item Time to Live #99
base: master
Conversation
…nd fix and a potential bug
…conds for clarity and add comment
At least one pull request committer is not linked to a user. See https://help.github.com/en/articles/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user
Thanks!
This looks cool. :) Definitely useful to have.
Note: "TTL" stands for "Time To Live" not "Time To Leave"
My biggest concern is that TTL expiry and removal are only eventually consistent across multiple partitions.
This is related to how TTL is implemented in DynamoDB. The docs for TTL state "Shortly after the date and time of the specified timestamp". The TTL is implemented by a per-partition background scan, which means that items are not actually deleted in global TTL order but only per partition. This may result in odd recovery behavior.
Similarly, the TTL docs have this note: "Items that have expired, but haven't yet been deleted by TTL, still appear in reads, queries, and scans."
Suppose the log (in insert time order) for a persistent entity is (E - event, S - snapshot):
`E0, E1, E2, S0, E3, E4, S1`
As the Es may be on multiple partitions, the TTL expiry may result in:
`E0, E2, S0, E3, E4, S1`
Note that `E1` was expired prior to `E0`, which is entirely possible given Amazon's description of how TTL is implemented.
Presuming the above problem is accurate, some mitigations:
- A separate TTL for snapshots. If the journal TTL is configured to be a bit larger than the snapshot TTL, any out-of-order TTL removal of a journal item can only occur prior to the snapshot item, which should be safe for recovery.
- Checking the `expiresAt` during read.
- Documenting this risk. This behavior may be totally fine for some applications.
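The second mitigation could be sketched like this (illustrative names only, not the PR's actual code; it assumes the expiry attribute is stored as epoch seconds, the format DynamoDB TTL uses):

```scala
import java.time.Instant

// DynamoDB deletes expired items lazily, so expired-but-not-yet-deleted items
// can still appear in reads. Filter them out at read time by comparing the
// stored expiry (epoch seconds) against the current time.
final case class JournalItem(sequenceNr: Long, expiresAt: Option[Long])

def notExpired(item: JournalItem, now: Instant): Boolean =
  item.expiresAt.forall(_ > now.getEpochSecond)

// e.g. during recovery: items.filter(notExpired(_, Instant.now()))
```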
A similar concern relates to usage with existing systems:
What happens if this is enabled on an existing system? Suppose an existing system has persisted events and snapshots, and then the developers configure the system to use a TTL. What will happen during recovery? In the case of no snapshot, would the recovery fail?
This also applies the TTL to all items using DDB persistence. Is there a way to configure this per persistent entity?
I don't think any of the above is blocking. Documentation, warnings and asserts could be added to ensure it's reasonably safe.
I'm going to consider this further before approval.
```scala
def readTTLConfig(c: Config): Option[DynamoDBTTLConfig] = {
  for {
    fieldName <- Try(c getString configFieldName).toOption
    if fieldName.trim.nonEmpty
```
This guard uses `fieldName.trim`, but below the `fieldName` is not trimmed. Consider a map after `toOption`: `toOption.map(_.trim)`.
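The suggested change could look roughly like this (a sketch; `readString` stands in for the PR's `c getString configFieldName`, which is not reproduced in full here):

```scala
import scala.util.Try

// Trim once right after toOption so the non-empty guard and every later
// use of fieldName see the same (trimmed) value.
def readFieldName(readString: => String): Option[String] =
  Try(readString).toOption
    .map(_.trim)
    .filter(_.nonEmpty)
```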
Good catch! Should be good now.
```diff
- fut.map { serialized =>
+ serializePersistentRepr(reprPayload, serializer).map { serialized =>
```
nice simplification!
thanks!
README.md
Outdated

> Note that `E1` was expired prior to `E0`. Which is entirely possible given the description from Amazon on how TTL is implemented.
>
> As a consequence, a separate TTL for snapshot should be preferred. The journal TTL should be configured to be a bit larger than the snapshot. Thus, any out-of-order TTL removal of journal item can be prior to the snapshot item.
Could you help me understand exactly what having the snapshot TTL a bit larger than the journal one would change?
Let's say we have: `E1, E2, E3, S1, E4, E5`
If the journal TTL is lower than the snapshot TTL, `E4` and `E5` could expire before `S1`, and thus information could be lost.
Yes. That is correct.
Considering this further: I don't think the requirement "the snapshot TTL < journal TTL" is sufficient by itself. There also needs to be consideration of the snapshot interval when defining the TTL.
The property I'm thinking is valuable: the persistent entity can be recovered by replaying the events since any persisted snapshot.
So, if there is a sequence like `E1, E2, S1, E3, E4, S2, E5` for a persistent entity, then an equal entity state can be recovered from the replay `S1, E3, E4, E5` or `S2, E5`.
I think the minimum requirement is:
- snapshot interval much less than snapshot TTL

With the requirement:
- snapshot TTL less than journal TTL, at least 48hrs less

being necessary only to ensure the entity can be recovered by replaying the events since any persisted snapshot.
Whew! Lots of consideration for a user of this, but still valuable to have the TTL. Especially since even a nice TTL of, say, 20d is going to be much, much larger than the snapshot interval. The potential inconsistency seems really unlikely to occur in a real-world application?
EG: suppose the journal TTL was less than the snapshot interval. This satisfies the requirement "snapshot TTL less than journal TTL", but not the property above.
Prior to any TTL expiry: `E1, E2, E3, E4`. Note no snapshot yet. Expiry and delete occurs for `E1` and `E2`: `E3, E4`. Oops! We can no longer recover the persistent entity state.
Compare to the snapshot interval << journal TTL: `E1, E2, E3, E4`. Expiring hasn't occurred yet. The snapshot interval must elapse first (per the given above): `E1, E2, E3, E4, S1`. Expiry and delete occurs for `E1` and `E2`: `E3, E4, S1`. Still safe: the persistent entity can be entirely recovered from `S1`. Expiry and delete occurs for `E3` and `E4`: `S1`. Still safe: the persistent entity can be entirely recovered from `S1`.
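The two requirements discussed in this thread could be encoded as a validation check, roughly like this (an illustrative sketch, not part of the PR; the 48h margin and the function name are assumptions):

```scala
import scala.concurrent.duration._

// Requirements from the discussion:
// 1. snapshot interval << snapshot TTL, so a snapshot exists before the
//    oldest events can expire;
// 2. snapshot TTL + margin <= journal TTL, so replay from *any* persisted
//    snapshot works; the margin hedges against DynamoDB's lazy,
//    per-partition TTL deletion.
def validateTtlConfig(
    snapshotInterval: FiniteDuration,
    snapshotTtl: FiniteDuration,
    journalTtl: FiniteDuration,
    margin: FiniteDuration = 48.hours): Either[String, Unit] =
  if (snapshotInterval >= snapshotTtl)
    Left(s"snapshot interval ($snapshotInterval) must be much less than the snapshot TTL ($snapshotTtl)")
  else if (snapshotTtl + margin > journalTtl)
    Left(s"journal TTL ($journalTtl) should exceed the snapshot TTL ($snapshotTtl) by at least $margin")
  else
    Right(())
```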
Thanks very much for your detailed feedback @coreyoconnor! I pushed a few commits to try to address the feedback. I added a section in the README to describe how to use this and what to be cautious about. There is one thing I am not sure about in the readme. See my comment https://github.com/akka/akka-persistence-dynamodb/pull/99/files#r731892138
EDIT: Also, do you think
Thank you for your patience. I'm working to review this this week. Looks like I still have some CI fixes to push first.
Add a note to the docs that an appropriate TTL also depends on snapshot interval. Then LGTM! Thanks!
I still have to fix CI before merging tho. Working on that.
> Before enabling the TTL feature, make sure to have an understanding of the potential impacts detailed below.
>
> #### Don't rely on the TTL for business logic related to the expiration
> As per specified in the [AWS DynamoDB configuration](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html),
Nice description of the design considerations required!
README.md
Outdated

> Note that `E1` was expired prior to `E0`. Which is entirely possible given the description from Amazon on how TTL is implemented.
>
> As a consequence, a separate TTL for snapshot should be preferred. The journal TTL should be configured to be a bit larger than the snapshot. Thus, any out-of-order TTL removal of journal item can be prior to the snapshot item.
With the above consideration I'd rewrite: "As a consequence, a separate TTL for snapshot should be preferred. The journal TTL should be configured to be a bit larger than the snapshot. Thus, any out-of-order TTL removal of journal item can be prior to the snapshot item."
To something like:
As a consequence, configuring the TTLs requires careful consideration. For an entity to be recovered from the persisted snapshot and journal events, the snapshot interval must be less than the snapshot TTL. Further, to ensure the useful property that the entity can be recovered by replaying the events since any persisted snapshot, the journal TTL should be larger than the snapshot TTL.
README.md
Outdated

> In DynamoDB, this feature can be activated in a per-table basis by specifying the name of the field containing the expiring date.
>
> Expiring items is available for journal and snapshot tables. In order to activate it, `dynamodb-item-ttl-config.field-name` and `dynamodb-item-ttl-config.ttl` need to be specified:
"need to be specified. For example, given a system with a snapshot interval of 10 days. The TTLs could be configured:"
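A concrete example along those lines could look like this (hypothetical values; the key names come from the PR description, but the nesting under the journal and snapshot plugin sections is an assumption):

```hocon
# Illustrative only: snapshot interval of 10 days, snapshot TTL of 20 days,
# journal TTL of 30 days, satisfying
# "snapshot interval << snapshot TTL < journal TTL".
my-dynamodb-journal {
  dynamodb-item-ttl-config {
    field-name = "expiresAt"
    ttl = 30d
  }
}

my-dynamodb-snapshot-store {
  dynamodb-item-ttl-config {
    field-name = "expiresAt"
    ttl = 20d
  }
}
```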
# Conflicts:
#	src/main/scala/akka/persistence/dynamodb/DynamoDBConfig.scala
#	src/main/scala/akka/persistence/dynamodb/journal/DynamoDBJournalRequests.scala
#	src/main/scala/akka/persistence/dynamodb/snapshot/DynamoDBSnapshotConfig.scala
Hi folks,
I'd like to propose a change to add a TTL to each item to leverage DynamoDB's TTL feature. I am not aware of any prior discussion about this subject.
To add more details, we've added some optional configuration for journal and snapshot tables to specify the TTL:
With this configuration, it would add a field `expiresAt` to each journal item inserted, with value `now + 30d`.
To facilitate the review process, this can be separated into two parts:
I also tested it against dynamo in AWS:
Note: I am checking with my company for the CLA
Cheers
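For illustration, the `expiresAt` value described above (insert time plus the configured TTL) could be computed like this (a sketch with assumed names, not the PR's actual code; DynamoDB's TTL comparison uses epoch seconds):

```scala
import java.time.Instant
import scala.concurrent.duration._

// DynamoDB TTL compares the named attribute against the current epoch time
// in *seconds*, so the stored value must be epoch seconds.
def expiresAtEpochSeconds(ttl: FiniteDuration, now: Instant): Long =
  now.getEpochSecond + ttl.toSeconds

// e.g. a 30-day TTL relative to insert time:
// val expiresAt = expiresAtEpochSeconds(30.days, Instant.now())
```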