Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change: Improve save() performance by skipping index creation #2702

Merged

Conversation

ShaneHarvey
Copy link
Contributor

@ShaneHarvey ShaneHarvey commented Oct 27, 2022

Indexes are now only created when a Model is first used, eg the first call to save() or on the first call to _get_collection(), or when the new meta["auto_create_index_on_save"] option is set to True. This is a minor breaking change for some applications. As a workaround apps can explicitly call ensure_indexes() or set meta["auto_create_index_on_save"] to True.

Note that with the default settings (auto_create_index=True + auto_create_index_on_save=False) indexes are still created after on a Document's first use, or first use after Document.drop_collection().

I hope this PR can finally resolve #1446. I reviewed that issue, the previous attempts to fix it (#1457 and #1511), and the original issue that added this behavior (#812) and I believe the solution in this PR is a good compromise between having better default behavior while also offering an escape hatch for the old behavior (via auto_create_index_on_save).

Here are the benchmark results:

$ python benchmarks/test_save_with_indexes.py
--------------------------------------------------------------------------------
Save 10000 documents with 0 indexes.
2.8389482499987935s
--------------------------------------------------------------------------------
Save 10000 documents with 1 index.
2.782498458000191s
--------------------------------------------------------------------------------
Save 10000 documents with 2 indexes.
2.7451970830006758s
--------------------------------------------------------------------------------
Save 10000 documents with 1 index (auto_create_index_on_save=True).
4.725924582999141s
--------------------------------------------------------------------------------
Save 10000 documents with 2 indexes (auto_create_index_on_save=True).
4.777219208997849s

4.72/2.78 = a significant 1.7x speed up for save() on a document with one field and one index.

Note: I've made some relatively minor changes to the other benchmarks as well, mainly just to explicitly use w=1 write concern. This change significantly improves the runtime (and the effectiveness) of the benchmarks, otherwise they end up using the server's default writeConcern which is now w:majority. I can move this to a new PR if desired.

Indexes are now only created when a Model is first used, eg the first
call to save() or on the first call to _get_collection(), or when the new
meta["auto_create_index_on_save"] option is set to True. This is a minor
breaking change for some applications. As a workaround apps can explicitly
call ensure_indexes() or set meta["auto_create_index_on_save"] to True.
@ShaneHarvey
Copy link
Contributor Author

The tests pass locally, @bagerard this is ready for you consideration. Could you approve the github workflow?

Copy link

@stlucasgarcia stlucasgarcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping this will be merged soon 🙏🏻

@bagerard
Copy link
Collaborator

Thanks for resolving this @ShaneHarvey , much appreciated!

@ShaneHarvey
Copy link
Contributor Author

Glad to see this merged, thanks @bagerard!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mongoengine killed mongodb performance when used with pymongo 3.x
3 participants