Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch_collection method is not thread safe #788

Open
matino opened this issue Oct 22, 2014 · 15 comments
Open

switch_collection method is not thread safe #788

matino opened this issue Oct 22, 2014 · 15 comments

Comments

@matino
Copy link

matino commented Oct 22, 2014

Hi guys,

We just run into an issue when archiving data from one collection into another using switch_collection in multi threaded environment (sample code we used below):

def archive(self):
        self.switch_collection(Item._meta['archive_collection'])
        self.save()
        self.switch_collection(Item._meta['collection'])
        self.delete()

The scenario is a following:
1st thread runs archive, executes self.switch_collection(Item._meta['archive_collection'])
2nd thread wants to query different document and doesn't find it, because 1st thread has switched collection.

It would be a good idea to make switch_collection (and preferably switch_db) method thread safe.
What do you think?

@mmarksnippety
Copy link

Hi

I did a little investigation. Document.switch_collection internally use context manager switch_collection. That manager rewrite collection name on document class. Is this behavior necessary?

@DavidBord
Copy link
Contributor

@matino , why not wrap with lock your archive and query methods?

@DavidBord
Copy link
Contributor

@mmarksnippety , I think it is because otherwise how would you be able to do other actions on the second collection?

@matino
Copy link
Author

matino commented Nov 3, 2014

@DavidBord - I would have to lock all queries in a non archived collection, remember about locking new queries - not really an elegant solution for me.

@DavidBord
Copy link
Contributor

I think that read locking the collection while in the context of switch_collection would also lock the collection for reading from the collection switch_collection switched to

@matino
Copy link
Author

matino commented Nov 4, 2014

I need to be 100% sure ;)

@DavidBord
Copy link
Contributor

But its not only about you, right? :)
I don't think that starving all readers between calls to switch_collection is a proper behaviour so I am marking this as won't fix

Regardless of mongoengine, I would have separated the archive and query stuff into 2 separate processes because I wouldn't want a problem with one of them to affect the other

@matino
Copy link
Author

matino commented Nov 5, 2014

@DavidBord - fine by me, thanks for your 2prompt replies anyway! Maybe there should be a warning at least in the docs about this behavior? What do you think?

@calvinwyoung
Copy link

+1 for including a warning in the docs. We've been seeing a lot of strange bugs that all turned out to be caused by the non-thread-safety of the switch_db context manager. It would've been really helpful if there were a big fat warning in the docs.

@rturk
Copy link

rturk commented Jan 6, 2015

Why not completely review this or maybe even remove this feature? I know that this sound bald but beyond the thread issue the concept of a "switch_db" is already dangerous, doesn't looks like an streamlined code.

Ideally one should use simply use connA and connB to reference different collections, because in fact they are different connections

@iburakov
Copy link

iburakov commented Nov 21, 2018

Ran into the same issue today, also in a multithreaded environment. Hours spend catching an inconsistent bug because of such behaviour. Ended up in mongoengine's sources (leaky abstraction, yeah). Docs, unfortunately, still have no info about switch_db's (lack of) thread-safety. A little example of consequences:

Thread 1:

q = Question.objects(...)  # assuming that we use some db
# processing q's data, time goes...
q.save()  # intending to use the same db as for the query, nothing seems to affect q's connection

Thread 2:

outgoing_question = Question(...)
outgoing_question.switch_db('some_alias')
# at this moment q from Thread 1 successfully goes to 'some_alias', wow!
outgoing_question.save() 

Thread 2 is almost the same as the usage shown in switch_db's docstring. Expected behaviour's logic is like that: switching with an instance, thus affecting only this instance, right? Unexpectedly, no. First lines of current switch_db's implementation:

    def switch_db(self, db_alias, keep_created=True):
        """< docstring > """
        with switch_db(self.__class__, db_alias) as cls:  # switches the db on the whole class
            collection = cls._get_collection()
            db = cls._get_db()

Summarizing: it's quite an unexpected move to switch the whole Document's database affecting all queries and instances when using switch_db method of an instance, isn't it? Manual locking seems completely inelegant in my case as well, because real code is rather more complex than the examples I've provided, but that seems to be the only option for now.

+1 for feature implementation reviewing, or at least a sad warning in docs.

@samuelfekete
Copy link

My approach to resolve this problem was to stop using switch_collection altogether and instead dynamically generate one class per collection using inheritance. See here for more details.

I think something similar can be done to switch_collection itself, by having it dynamically generate a new class per collection requested (if it hasn't already generated it).

@upcFrost
Copy link

upcFrost commented Jan 8, 2021

6 years passed and the bug which is "just a little bit" critical unless you do a lot of concurrency testing is still open :(

@jessemcl-flwls
Copy link

rather than switch, what about using using(), i haven't looked at the source code yet, but presumably this only affects the calling thread? https://docs.mongoengine.org/apireference.html#mongoengine.queryset.QuerySet.using (and I assume there is a similar thing for inserts, updates, etc.)...?

@jessemcl-flwls
Copy link

jessemcl-flwls commented Aug 19, 2022

I took a look at the code and it all uses get_db() under-the-hood in a non-thread-safe way. I've proposed a change to support switching db's in a safe way, which could be extended to collections. See here: #2686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants