Limit to_device EDU size to 65536 #18416

Open

MatMaul wants to merge 11 commits into element-hq:develop from tchapgouv:edu-limit-size

+304 −58

Contributor

MatMaul commented May 9, 2025

If a set of messages exceeds this limit, the messages are split across several EDUs.

Should fix #17035.

There is currently no official specced limit for EDUs, but the consensus seems to be that it would be useful to have one to avoid this bug by bounding the transaction size.

As a side effect it also limits the size of a single to-device message to a bit less than 65536.

This should probably be added to the spec similarly to the message size limit.
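
As an illustration of the splitting described above, here is a minimal standalone sketch, not the code in this PR. It assumes messages arrive as a `{user_id: {device_id: message}}` map, measures only the EDU `content` with a compact key-sorted JSON encoding as a stand-in for canonical JSON, and re-measures the whole content after every addition (simple but quadratic). The names `encoded_size` and `split_into_edus` are hypothetical, and `ValueError` stands in for the `EventSizeError` the PR raises.

```python
import json
import secrets
from typing import Any, Dict, List

MAX_EDU_SIZE = 65536  # the limit this PR introduces

JsonDict = Dict[str, Any]


def encoded_size(obj: Any) -> int:
    """Byte length of a compact, key-sorted JSON encoding (stand-in for canonical JSON)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def split_into_edus(
    messages: Dict[str, Dict[str, JsonDict]],
    sender: str,
    message_type: str,
    max_size: int = MAX_EDU_SIZE,
) -> List[JsonDict]:
    """Greedily pack {user_id: {device_id: message}} into EDU contents of at most max_size bytes."""

    def new_content() -> JsonDict:
        # Each EDU content gets its own 16-character message_id.
        return {
            "messages": {},
            "sender": sender,
            "type": message_type,
            "message_id": secrets.token_hex(8),
        }

    edus: List[JsonDict] = []
    current = new_content()
    for user_id, by_device in messages.items():
        for device_id, message in by_device.items():
            # Tentatively add the message, then measure the whole content.
            current["messages"].setdefault(user_id, {})[device_id] = message
            if encoded_size(current) <= max_size:
                continue
            # Too big: undo the add, flush what we had, and start a fresh EDU.
            del current["messages"][user_id][device_id]
            if not current["messages"][user_id]:
                del current["messages"][user_id]
            if current["messages"]:
                edus.append(current)
            current = new_content()
            current["messages"][user_id] = {device_id: message}
            if encoded_size(current) > max_size:
                # The PR raises EventSizeError here; ValueError keeps the sketch standalone.
                raise ValueError("a single to-device message does not fit in one EDU")
    if current["messages"]:
        edus.append(current)
    return edus
```

With five messages of roughly 30 KB each, this sketch packs two messages per EDU and leaves the fifth in an EDU of its own.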

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file.
Code style is correct


          Limit to_device EDU size to 65536

61fa1b9

MatMaul force-pushed the edu-limit-size branch from 8add186 to 61fa1b9 Compare

May 9, 2025 14:19

MatMaul marked this pull request as ready for review

May 9, 2025 14:38

MatMaul requested a review from a team as a code owner

May 9, 2025 14:38

MatMaul and others added 10 commits

May 12, 2025 01:43


          Increment to_device stream for each EDU otherwise we loose some

c80f24d


          Simplify

eda00e1


          Add comment

6627bed


          Cosmetic

c6bc691


          Cosmetic

57ab541


          Add logs

a0e6dc3


          Improve logs

9ce1488


          fix bug

5e86c59


          Improve logs

35d98b6


          Merge remote-tracking branch 'origin/develop' into edu-limit-size

6be7bcc

MadLittleMods reviewed

synapse/api/constants.py

    
@@ -28,6 +28,7 @@
  # the max size of a (canonical-json-encoded) event
  MAX_PDU_SIZE = 65536
+ MAX_EDU_SIZE = 65536

Contributor

MadLittleMods May 20, 2025

Suggested change

      
- MAX_EDU_SIZE = 65536
+ # This isn't spec'ed but is our own reasonable default to play nice with Synapse's
+ # `max_request_size`/`max_request_body_size`. We chose the same as `MAX_PDU_SIZE` as our
+ # `max_request_body_size` math is currently limited by 200 `MAX_PDU_SIZE` things. The
+ # spec for a `/federation/v1/send` request sets the limit at 100 EDU's and 50 PDU's
+ # which is below that 200 `MAX_PDU_SIZE` limit (`max_request_body_size`).
+ #
+ # Allowing oversized EDU's results in failed `/federation/v1/send` transactions (because
+ # the request overall can overrun the `max_request_body_size`) which are retried over
+ # and over and prevent other outbound federation traffic from happening.
+ MAX_EDU_SIZE = 65536
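
The sizing argument in the suggested comment can be checked with a few lines of arithmetic. This is only a sketch: the `200 * MAX_PDU_SIZE` body limit and the 50 PDU / 100 EDU caps are taken from the comment itself, the two `*_PER_TRANSACTION` names are used here for illustration, and the JSON envelope of the transaction is ignored.

```python
MAX_PDU_SIZE = 65536
MAX_EDU_SIZE = 65536

# Per-transaction caps quoted in the review comment.
MAX_PDUS_PER_TRANSACTION = 50
MAX_EDUS_PER_TRANSACTION = 100

# Assumption from the review comment: the request body limit is 200 * MAX_PDU_SIZE.
max_request_body_size = 200 * MAX_PDU_SIZE

# Largest payload a transaction can carry if every PDU and EDU is at its limit.
worst_case_payload = (
    MAX_PDUS_PER_TRANSACTION * MAX_PDU_SIZE
    + MAX_EDUS_PER_TRANSACTION * MAX_EDU_SIZE
)
headroom = max_request_body_size - worst_case_payload

print(worst_case_payload, max_request_body_size, headroom)
```

With both limits at 65536 the worst case is 150 × 65536 bytes, which leaves 50 × 65536 bytes of headroom under the body limit; without an EDU limit, no such bound exists.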

synapse/handlers/devicemessage.py

+                  """
+                  This function takes a dictionary of messages and splits them into several EDUs if needed.
+                  It will raise an EventSizeError if a single message is too large to fit into an EDU.

Contributor

MadLittleMods May 20, 2025

What happens if the EventSizeError is raised? How does Synapse recover? Are we sure that outbound federation doesn't remain stuck?

Contributor

MadLittleMods May 20, 2025

Judging from the tests, it looks like we stop messages that are too large from even being put in the outbox, which is great! Can you confirm that?

synapse/handlers/devicemessage.py

Comment on lines +304 to +307

+                          edu_contents = get_device_message_edu_contents(
+                              sender_user_id, message_type, messages, context
+                          )
+                          remote_edu_contents[destination] = edu_contents

Contributor

MadLittleMods May 20, 2025

Instead of changing the structure of remote_edu_contents from a map of destination to EDU meta into a map of destination to multiple EDU metas, could we just call add_messages_to_device_inbox(...) multiple times?

synapse/handlers/devicemessage.py

+                      "type": message_type,
+                      "message_id": random_string(16),
+                  }
+                  # This is the size of the full EDU without any messages and without the opentracing context

Contributor

MadLittleMods May 20, 2025

Why is the BASE_EDU_SIZE calculated without BASE_EDU_CONTENT["org.matrix.opentracing_context"]?
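
To make the concern concrete, here is a small sketch of how much the tracing context adds to the encoded content. The context value below is hypothetical (real contexts vary from request to request), and `encoded_size` is an illustrative helper using compact key-sorted JSON rather than Synapse's canonical JSON encoder.

```python
import json
from typing import Any


def encoded_size(obj: Any) -> int:
    """Byte length of a compact, key-sorted JSON encoding."""
    return len(json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8"))


base_content = {
    "messages": {},
    "sender": "@alice:example.org",
    "type": "m.room_encrypted",
    "message_id": "0123456789abcdef",
}

# Hypothetical tracing context, stored as a JSON-encoded string inside the content.
context = {"uber-trace-id": "abc123:def456:0:1"}
with_context = {**base_content, "org.matrix.opentracing_context": json.dumps(context)}

# Bytes that a base size computed without the context would fail to account for.
context_overhead = encoded_size(with_context) - encoded_size(base_content)
print(context_overhead)
```

If the base size leaves these bytes out, an EDU packed right up to the limit could exceed it by that amount once the context is attached.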

synapse/handlers/devicemessage.py

+                      if current_edu_size + message_entry_size > MAX_EDU_SIZE:
+                          edu_contents.append(current_edu_content)
+                          logger.debug(
+                              "Splitting %d device messages from %s into EDU msgid %s, %d EDUs queued",

Contributor

MadLittleMods May 20, 2025

Suggested change

      
- "Splitting %d device messages from %s into EDU msgid %s, %d EDUs queued",
+ "Splitting %d to-device messages from %s into EDU (message_id=%s), (total EDUs so far: %d)",

synapse/handlers/devicemessage.py


		edu_contents = []

		current_edu_content: JsonDict = deepcopy(BASE_EDU_CONTENT)

Contributor

MadLittleMods May 20, 2025

Instead of this cloning, perhaps it's easier to understand if we just have a little helper (maybe performs better as well 🤷):

    def create_new_to_device_edu_content() -> JsonDict:
        """Create a new `m.direct_to_device` EDU `content` object with a unique message ID."""
        content = {
            "messages": {},
            "sender": sender_user_id,
            "type": message_type,
            "message_id": random_string(16),
            "org.matrix.opentracing_context": json_encoder.encode(context)
        }
        return content

synapse/handlers/devicemessage.py

+              ) -> List[JsonDict]:
+                  """
+                  This function takes a dictionary of messages and splits them into several EDUs if needed.

Contributor

MadLittleMods May 20, 2025

Could use a docstring for the args and return.

It could also use some context on why we care to split, similar to how we explain it for MAX_EDU_SIZE above.

synapse/handlers/devicemessage.py

Comment on lines +489 to +495

+                      logger.debug(
+                          "Queuing last %d device messages from %s into EDU msgid %s, %d EDUs queued",
+                          len(current_edu_content["messages"]),
+                          sender_user_id,
+                          current_edu_content["message_id"],
+                          len(edu_contents),
+                      )

Contributor

MadLittleMods May 20, 2025

Suggested change

      
- logger.debug(
-     "Queuing last %d device messages from %s into EDU msgid %s, %d EDUs queued",
-     len(current_edu_content["messages"]),
-     sender_user_id,
-     current_edu_content["message_id"],
-     len(edu_contents),
- )
+ logger.debug(
+     "Splitting remaining %d device messages from %s into EDU (message_id=%s), (total EDUs so far: %d)",
+     len(current_edu_content["messages"]),
+     sender_user_id,
+     current_edu_content["message_id"],
+     len(edu_contents),
+ )

tests/rest/client/test_sendtodevice.py


		mock_send_transaction.reset_mock()

		# 2 messages, each just big enough to fit in an EDU

Contributor

MadLittleMods May 20, 2025

Suggested change

      
- # 2 messages, each just big enough to fit in an EDU
+ # 2 messages, each just big enough to fit into their own EDU

tests/rest/client/test_sendtodevice.py


		self.assertEqual(mock_send_transaction.call_count, 2)

		# A transaction can contain up to 100 EDUs but synapse reserves 10 EDUs for other purposes

Contributor

MadLittleMods May 20, 2025

For my own understanding, this happens at

synapse/synapse/federation/sender/per_destination_queue.py

Line 747 in 99cbd33

) = await self.queue._get_to_device_message_edus(edu_limit - 10)

It would be good to label this magic value as a constant which we could also cross-reference here.
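
A minimal sketch of what that could look like. The constant name `RESERVED_EDUS_PER_TRANSACTION` and the helper `to_device_edu_limit` are hypothetical, chosen only for illustration; the values 100 and 10 come from the test comment quoted above.

```python
# Maximum number of EDUs in one transaction (value quoted in the test comment).
MAX_EDUS_PER_TRANSACTION = 100

# Hypothetical name for the magic value: EDU slots kept free for other EDU types
# so that to-device messages cannot fill a whole transaction.
RESERVED_EDUS_PER_TRANSACTION = 10


def to_device_edu_limit(edu_limit: int = MAX_EDUS_PER_TRANSACTION) -> int:
    """Number of to-device EDUs that may be pulled into a single transaction."""
    return edu_limit - RESERVED_EDUS_PER_TRANSACTION
```

The test could then reference the same constant instead of repeating the number in a comment.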
