Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New group consistency algorithm #6401

Closed
link2xt opened this issue Jan 2, 2025 · 12 comments
Closed

New group consistency algorithm #6401

link2xt opened this issue Jan 2, 2025 · 12 comments
Assignees

Comments

@link2xt
Copy link
Collaborator

link2xt commented Jan 2, 2025

PR is at #6404

Existing group consistency algorithm is built in ad-hoc way by fixing known cases where it practically failed and adding the tests each time. At this moment there is one unfixed case in the form of the PR #6021

Instead of fixing this case one more time with another patch, current plan is to rebuild group consistency from scratch. Algorithm is currently defined at https://github.com/chatmail/specs/tree/main/group-membership with a TLA+ model that was used to reject simpler solutions by finding counter-examples for "eventual consistency" property. There is a Python model of the algorithm at https://github.com/chatmail/specs/tree/main/gmc which shows that #6021 case is fixed and also tests the "immediate consistency" property. We will also keep all existing tests already defined in the core codebase, they should pass and maybe changed if there is a good reason, but not removed, especially when they are testing compatibility with older Delta Chat and MUAs that is not captured by the formal spec.

The idea of the new algorithm is to maintain group member set as a Last-Write-Wins set CRDT. To determine if some operation happened earlier or later we are going to use message timestamps with second precision. Using logical clocks (e.g. vector clocks) was considered but sending them around would increase the amount of data and the result of having temporarily unsynchronized clock between devices is not fatal. Nowadays most of the devices are mobile devices that have clocks synchronized over the network without users having to know about it, I personally don't remember having to adjust my clock on any of the devices in the last 5-10 years at least, it just works and we don't need high resolution.

Currently chats_contacts table consists of two columns chat_id and contact_id. We will also add add_timestamp and remove_timestamp columns that default to 0. If add_timestamp >= remove_timestamp, the member is actually in the chat, otherwise the row is a tombstone. Note that if add_timestamp == remove_timestamp, member is considered to be part of the chat as it simplifies database migration.

We also add new headers Chat-Group-Past-Members and and Chat-Group-Member-Timestamps. Chat-Group-Past-Members contains a list of past group members in a format similar to the To: header.

Chat-Group-Member-Timestamps contains space-separated (just like the To field for easy wrapping)
unix timestamps (in seconds, just integers) for the To field followed by Chat-Group-Past-Members field in exactly this order. I have not decided yet if it makes sense to include the timestamp of the sender, i.e. for the From field. When member leaves the group, self address is anyway included in the Chat-Group-Past-Members. We also want to have sealed sender at some point. I will leave the sender timestamp out for now. If this becomes a problem, we can add the sender explicitly in the To field, this is probably a cleaner solution.

There is no special compression for timestamps, messages are compressed by OpenPGP anyway. There are some ideas to e.g. only send differences for timestamps other than the first one or hex-encode, but I am not going to do this for the first implementation. We can play with this late before merging.

When a message is received, if there is a Chat-Version header and Chat-Group-Member-Timestamps header
exists, do the merging. First, adjust the received timestamps so they are not in the future by taking the minimum of the current timestamp and the received timestamp. Same when loading the timestamps from the database, if they are in the future assume them to be the current timestamp.

For every member that does not exist in the chats_contacts table yet, just accept the state from the received message and the new timestamp. If the member already exists in the chats_contacts table, accept the new state if the received timestamp is higher than stored timestamp. In case of the same timestamp and conflict, e.g. local state says the member is removed (is_tombstone is true) and received state says the member is added (it is in the To: field rather than Past-Members: field), prefer adding the member. Adding members is preferred to removing members because in this case added member eventually learns that it is still part of the group and can leave while if the user is removed they stop receiving messages and may not notice that they are no longer in the group.

If the message is received with Chat-Version but without Chat-Group-Member-Timestamps, it is an old Delta Chat message. In this case if there is a Chat-Member-Added or Chat-Member-Removed header, do the action locally if the message is newer than the timestamp stored locally.

If the message is received without Chat-Version header, it is a message sent from non-Delta Chat client. In this case treat all members in the To field as if they are added with timestamp 0. This allows adding members to the group
with non-DC clients, but not re-adding or removing them. It is mostly to support ad-hoc groups which are actually email threads and normally don't live long enough to have several iterations of member removals and re-additions.

Expiration for tombstones is moved into #6427

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 4, 2025

For this protocol to work, we need to make sure all group messages that have Chat-Group-Member-Timestamps headers have the same number of timestamps as the number of members in the To and Chat-Group-Past-Members combined. I am going to check it with debug_assert! where I can.

Currently Delta Chat adds self address to the "To" field if "To" field is empty. If Alice and Bob are in the group, Alice removes Bob and sends "member removed" to "Bob", but "Bob" is not included in the "To" field so Alice adds own address to the "To" field. Proper solution for the case when we don't have any members for the To field is to include empty hidden-recipients:; group there. However, current Delta Chat cannot handle such messages. Messages with empty group in the To field that are not broadcast list messages go to trash. I fixed this in the preparation PR: #6407
However, we cannot wait until everyone upgrades to Delta Chat with this change, so as a workaround we will probably have to always explicitly include self address in the To field to ensure that it is never empty.

@Hocuri
Copy link
Collaborator

Hocuri commented Jan 5, 2025

Consider this case:

  • Alice exports a backup
  • Someone removes Charlie
  • Fast-forward 60 days
  • Alice imports the old backup and sends into the group
  • Since there is no tombstone of Charlie, he will be re-added.

One solution would be: Never accept changes if the received timestamp (i.e. get_header("Chat-Group-Member-Timestamps")[index_of_this_member]) is older than 60 days. This would be bad for eventual consistency, since after 60 days the algorithm won't try and achieve consistency anymore.

An alternative that may be a bit better for eventual consistency would be: Never accept changes if the maximum received timestamp for all members in this group (i.e. max(get_header("Chat-Group-Member-Timestamps"))) is older than 60 days.

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 5, 2025

Since there is no tombstone of Charlie, he will be re-added.

This is expected. This at least achieves eventual consistency, even if by re-adding group members.

If the group is somewhat active, Alice will likely receive at least one message after restoring the backup and sending a message to the group. We can maybe make Alice remove whatever members have timestamp older than 60 days and not present in the To: field as soon as she receives some message in the group.

Any workarounds that we add should not result in the failure to achieve consistency, such as Alice forever sending messages to some member and other members never accepting this member into the memberlist.

@iequidoo
Copy link
Collaborator

iequidoo commented Jan 6, 2025

Since there is no tombstone of Charlie, he will be re-added.

This is expected. The only fix is to always send the timestamp and have no "60 days" expiration. Other solutions break eventual consistency.

Can Charlie keep his own tombstone so that Delta Chat can at least show to Charlie that it may be an outdated addition? It shouldn't blame anyone and Charlie can decide to leave the group again more quickly.

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 6, 2025

I think I will not implement this 60 days rule for initial implementation and postpone it for another PR.

@iequidoo
Copy link
Collaborator

iequidoo commented Jan 6, 2025

Also it seems that Chat-Group-Past-Members list length should be limited to smth like len("To") + 64 and stalest members should be removed from it even if they were tombed < 60 days ago, otherwise the algorithm doesn't scale well in some cases.

Also it seems that past members (and even non-members if they know the group id) can readd themselves, maybe we should ignore self-timestamps for To and only apply them for Chat-Group-Past-Members?

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 6, 2025

Also it seems that Chat-Group-Past-Members list length should be limited to smth like len("To") + 64 and stalest members should be removed from it even if they were tombed < 60 days ago, otherwise the algorithm doesn't scale well in some cases.

This is already the case with the To field, we should have some reasonable limit like 500 members per chat_id in the chats_contacts table and not allow to add more members when the limit is reached.

Also it seems that past members (and even non-members if they know the group id) can readd themselves, maybe we should ignore self-timestamps for To and only apply them for Chat-Group-Past-Members?

Malicious users are explicitly out of scope for https://github.com/chatmail/specs/tree/main/group-membership, the problem is difficult enough already. Such user can already do worse things like creating a group with the same ID and different members then introducing such new group to previously known member, creating a clone group with known past members etc. If you really need to prevent such user from at least joining the group, you need to generate a new ID.

Any attempts to e.g. ignore adding self to the group will result in "member is not part of the group" errors due to reordering in cases without any attackers.

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 7, 2025

Current idea to fix the "user restored old backup" case after discussion with @Hocuri:

  1. For each group, store the "memberlist timestamp". We already have it stored in the param, can be reused.
  2. If "memberlist timestamp" is older than 60 days, the group is considered stale.
  3. If the group is stale, we don't send Chat-Group-Member-Timestamps outside, because it will anyway have only 0 timestamps and no tombstones. This will look like messages sent by legacy Delta Chat, so only explicit member changes will be applied.
  4. If the group is stale and we receive any message, instead of doing synchronization we remove all entries for the chat from chats_contacts table and just apply the new To: memberlist as a new one and bump the memberlist timestamp. All the timestamps we take from the Chat-Group-Member-Timestamps if there is any, otherwise just set timestamps to 0 for all members in the To: field.

If the user who restored a backup is a chatmail user, they will likely have only some recent group messages and will synchronize to not-so-outdated state. If the user is a "gmail user" who has the full archive on IMAP, it will take some time to process messages but should not result in problems as long as user does not chat until fully downloading everything. With the recent fix to remove Secure-Join messages after processing (#6354) no automatic messages should be sent during fetching. Maybe we should not allow sending anything until the first full sync, but this is out of scope for the issue.

@iequidoo
Copy link
Collaborator

iequidoo commented Jan 7, 2025

4. If the group is stale and we receive any message, instead of doing synchronization we remove all entries for the chat from chats_contacts table and just apply the new To: memberlist as a new one and bump the memberlist timestamp. All the timestamps we take from the Chat-Group-Member-Timestamps if there is any, otherwise just set timestamps to 0 for all members in the To: field.

The only thing is that this "memberlist timestamp" is also needs to be sent. Otherwise if two such stale states meet, it's not clear which one to prefer. Maybe quite a rare scenario (e.g. two users restore from old backups), but still. Sending the whole state is obviously better anyway.

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 7, 2025

Maybe quite a rare scenario (e.g. two users restore from old backups), but still.

Yes, we talked about it and decided to ignore it. It is always possible to come up with a scenario when the chat is not active and then two users restore backups, one from 1 year old and another from 5 years old etc., things are going to break then.

@iequidoo
Copy link
Collaborator

iequidoo commented Jan 7, 2025

Yes, we talked about it and decided to ignore it. It is always possible to come up with a scenario when the chat is not active and then two users restore backups, one from 1 year old and another from 5 years old etc., things are going to break then.

We can add additional timestamps to messages headers, they can basically work as Chat-Group-Member-Timestamps, MemberListTimestamp is maybe indeed not so useful because it only fixes some rare cases, but some timestamp for ChatGroupName can be, that would be better than a crutch i'm adding in #6414.

link2xt added a commit that referenced this issue Jan 11, 2025
This implements new group consistency algorithm described in
<#6401>

New `Chat-Group-Member-Timestamps` header is added
to send timestamps of member additions and removals.
Member is part of the chat if its addition timestamp
is greater or equal to the removal timestamp.
@link2xt
Copy link
Collaborator Author

link2xt commented Jan 12, 2025

Closing this issue as fixed by #6404.
For 60-day expiration of tombstones I am creating another issue #6427 and collecting everything related into the first post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants