Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy loading and caching for attributes set in _loadData(..) #1510

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

eliasbenb
Copy link

@eliasbenb eliasbenb commented Apr 4, 2025

Description

This PR introduces lazy loading for data attributes in all Plex objects as well as a caching mechanism when loading those attributes using a new @cached_data_property decorator. This change significantly improves performance by deferring the loading of computationally expensive attributes until when the user requests it.

The cache of properties created with @cached_data_property will also automatically be invalidated when _loadData(...) is called with a new XML data object.

Fixes: #1509

Type of Change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the docstring for new or existing methods
  • I have added tests when applicable

Additional Notes

The initial performance testing is very promising. Below is a test script that was ran against the production python-plexapi package and this PR's version:

import timeit

def test():
    from plexapi.server import PlexServer

    plex = PlexServer()

    res = []
    for section in plex.library.sections():
        items = section.all()
        res.extend([(i.title, i.lastViewedAt) for i in items])

if __name__ == "__main__":
    print(timeit.timeit(test, number=1))

In this testing scenario, the program does not need access to any of the expensive Video attributes (e.g. images, genres, directors). Since those expensive queries are skipped, the lazy-loaded testing results are ~3x faster than the production python-plexapi package.

# Time in seconds
Lazy-loaded: 5.732901584000501
Production: 17.07976305699958

Below are the Pyspy profiler outputs between the two packages against the above script:

Lazy-loaded:

lazy-loaded

Production:

production

As can clearly be seen, python-plexapi (purple) is no longer the bottleneck, instead, the requests and urllib packages (which are making the HTTP requests to the Plex server) are the new bottlenecks.

- Cached properties are defined using a `cached_data_property` decorator
- Property caches are automatically invalidated whenever the `_data` atribute is changed
- The implementation uses a metaclass to collect and track all cached properties across class inheritances
These attributes include those that call `findItems` and `listAttrs`
@JonnyWong16
Copy link
Collaborator

Unrelated, but the call to cleanXMLString could also be optimized by only cleaning the string when required by catching ParseError.

#1450

@eliasbenb
Copy link
Author

@JonnyWong16 the change to cleanXMLString resulted in the function's percentage CPU time going down from 19.1% to 12.3% (tested with the scenario from my PR description).

@eliasbenb
Copy link
Author

I just realized the cleanXMLString change is out of scope for this PR, let me know if you'd rather have me revert it.

@JonnyWong16
Copy link
Collaborator

Yeah, I was just thinking about it when I saw your graph. A separate PR would be good.

… expensive) before trying again with cleaning"

This reverts commit e8348df.
@JonnyWong16
Copy link
Collaborator

The cache invalidation isn't working.

from plexapi.server import PlexServer

plex = PlexServer()
movie = plex.library.section("Movies").get("GoldenEye")

id(movie._data)  # 2701410045600
movie.__dict__['genres']  # KeyError as expected
movie.genres  # retrieved - [<Genre:Action>, <Genre:Adventure>]
movie.__dict__['genres']  # cached - [<Genre:Action>, <Genre:Adventure>]

movie.reload()
id(movie._data)  # 2701409346640 - ID changed
movie.__dict__['genres']  # still cached - [<Genre:Action>, <Genre:Adventure>]

Copy link
Author

@eliasbenb eliasbenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that PlexPartialObject inherits _loadData from PlexObject, so this commit doesn't really do anything 🤦.

At the very least, it makes it clear where loadData is going to.

@JonnyWong16
Copy link
Collaborator

Still a bunch of tests failing.

@eliasbenb
Copy link
Author

eliasbenb commented Apr 4, 2025

I'm not really sure how to setup the testing environment myself, so this is just my understanding for why each test is failing from reading the pytest code and logs.

🚫 test_library_add_edit_delete, test_library_and_section_search_for_movie:

These two involve editing a cached attribute (locations). Will need to investigate.

🚫 test_photo_Photoalbum_mixins_fields, test_photo_Photo_mixins_fields, test_photo_Photo_mixins_tags, test_Playlist_mixins_fields, test_video_Season_mixins_tags:

All the above tests involve the cached property fields. The problem is fields is being returned as an empty list. When I tried printing fields in my local movies and TV libraries, it would also always return an empty list when using the prod python-plexapi package.

Is fields specific to photo libraries? There isn't much documentation on it so I'm not even sure what it is.

test_create_playqueue, test_video_Movie_attrs:

These two seem to be failing on all other PRs, so I'm assuming they are unrelated to this PR.

test_video_Movie_reload_kwargs:

The test tries to call assert movie.__dict__.get('media') == [] which won't work with new changes since the attribute doesn't exist until getattr is called. So, the test will need to be modified to assert movie.media == []

test_video_Season_attrs:

I can reproduce this. No guids are printed for season.guids. Interestingly, this only occurs for seasons. Movies, episodes, and shows are unaffected. Will investigate.

@eliasbenb
Copy link
Author

eliasbenb commented Apr 4, 2025

test_video_Movie_reload_kwargs and test_video_Season_attrs should be fixed now.

The former was fixed by modifying the test to use getattr instead of accessing attributes through __dict__ (which will fail for cached attributes that haven't been accessed yet).

The latter was due to a typo on my part, I was using functools.cached_property when I should've been using the new cached_data_property.

I'll revisit the remaining 3 issues another day. They aren't as straightforward.

@JonnyWong16
Copy link
Collaborator

test_library_and_section_search_for_movie:

This is only failing because the previous test for editing library locations failed so it ended up with two movies.

🚫 test_photo_Photoalbum_mixins_fields, test_photo_Photo_mixins_fields, test_photo_Photo_mixins_tags, test_Playlist_mixins_fields, test_video_Season_mixins_tags:

fields are a list of locked edits.

test_video_Movie_reload_kwargs:

The reason I used movie.__dict__['media'] here is because I explicitly don't want to trigger the auto-reload on accessing empty attributes via movie.media.

@eliasbenb
Copy link
Author

eliasbenb commented Apr 4, 2025

This is only failing because the previous test for editing library locations failed so it ended up with two movies.

Makes sense thanks for clarifying, I'll merge them in my comment.

fields are a list of locked edits.

In that case it would seem that all the the renaming issues are related to edits not working. Although, edits do work at least for some attributes. So it might be local to just fields. Will investigate more tomorrow.

The reason I used movie.__dict__['media'] here is because I explicitly don't want to trigger the auto-reload on accessing empty attributes via movie.media.

Not very familiar with how auto reloads works but would this work?

def test_video_Movie_reload_kwargs(movie):
    assert len(movie.media)
    assert movie.summary is not None
    movie.reload(includeFields=False, **movie._EXCLUDES)

    original_auto_reload = movie._autoReload
    movie._autoReload = False

    assert movie.media == []
    assert movie.summary is None

    movie._autoReload = original_auto_reload 

@JonnyWong16
Copy link
Collaborator

Yes, that should work for the movie reload test.

@JonnyWong16
Copy link
Collaborator

JonnyWong16 commented Apr 5, 2025

🚫 test_library_add_edit_delete:

  • LibrarySection has it's own reload() method which doesn't call _loadData(). That needs a special way to handle the caching invalidating or refactor it in a way to call _loadData().

🚫 test_photo_Photoalbum_mixins_fields, test_photo_Photo_mixins_fields, test_photo_Photo_mixins_tags, test_Playlist_mixins_fields, test_video_Season_mixins_tags:

  • Photoalbum, Photo, and Playlist are missing the call to PlexObject._loadData(self, data)
  • Season was fixed with 526e47b

@eliasbenb
Copy link
Author

I think Playable, PlexSession and PlexHistory are treated more like a mixin than base classes. The only time PlexSession is ever inherited is in something that also inherits from PlexObject.

Alright, then it should work without inheriting PlexObject.

Instead of changing _loadData to call PlexObject._loadData(), we just insert a method in between.

Done and yes, it's a lot cleaner.

But now I'm questioning if everything should be using super() instead.

The last change removed most of the cls.func(self, ...) calls. There are still some that existed before this PR but I'll leave them since it's out of scope.

@eliasbenb
Copy link
Author

There's also another issue of whether _loadData(...) should do the self._data assignment. It makes sense aesthetically but also adds a lot of duplicated lines.

Putting it in _invalidateCacheAndLoadData(...) in simpler but possibly less clear.

I'm leaning towards leaving it in just _invalidateCacheAndLoadData(...) but what do you think?

if prop_name in self.__dict__:
del self.__dict__[prop_name]

def _invalidateCacheAndLoadData(self, data):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit nit-picky, but swap the order of the two methods _invalidateCachedProperties() and _invalidateCacheAndLoadData().

And forgot to change the call to _loadData() in __init__ above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_loadData() in __init__ still needs to be changed?

@JonnyWong16
Copy link
Collaborator

JonnyWong16 commented Apr 7, 2025

There's also another issue of whether _loadData(...) should do the self._data assignment. It makes sense aesthetically but also adds a lot of duplicated lines.

Putting it in _invalidateCacheAndLoadData(...) in simpler but possibly less clear.

I'm leaning towards leaving it in just _invalidateCacheAndLoadData(...) but what do you think?

I think it's fine where you have it in _invalidateCacheAndLoadData(...). You need to get the id() of the old data before you reassign it.

@JonnyWong16
Copy link
Collaborator

flake8 tests are failing due to unused imports.

@eliasbenb
Copy link
Author

flake8 tests are failing due to unused imports.

Unrelated to the PR but I'm doing the development in a VSCode devcontainer. I was assuming that

	"features": {
		"ghcr.io/devcontainers-extra/features/flake8:2": {}
	}

Was enough for flake8 to be installed, but apparently you still need to get the flake8 extension from the VSCode marketplace on top of that. Linting is working correctly now, so I should be good on flake8 issues now.

@eliasbenb
Copy link
Author

#1510 (comment)

_loadData() in __init__ still needs to be changed?

Looks like this repo doesn't allow non collaborators to submit review comments, so my reply was lost.

My original reply was: "Because this is the init function, there shouldn't be any cache to invalidate. So, I think it's fine as is. Let me know what you think."

@JonnyWong16
Copy link
Collaborator

Looks like this repo doesn't allow non collaborators to submit review comments, so my reply was lost.

@pkkid Can you look into this?

My original reply was: "Because this is the init function, there shouldn't be any cache to invalidate. So, I think it's fine as is. Let me know what you think."

Okay, that's what I thought as well.

@eliasbenb
Copy link
Author

If it helps, this is what it looks like on my end. My message has a "Pending" tag:

image

@eliasbenb
Copy link
Author

Also, this was another comment that was lost for the same reason:

library.Hub has a special reload function that calls self.items = self.fetchItems(self.key), making the cached property somewhat useless if the more and key condition is passed.

I'm a little confused on why the remaining items are fetched only after a reload, why isn't it fetched in _loadData?

https://github.com/eliasbenb/python-plexapi/blob/88e165f5edb22b1d5b6528993b6989babfa2f64b/plexapi/library.py#L2236-L2248

@pkkid
Copy link
Collaborator

pkkid commented Apr 8, 2025

I think the permissions were fixed.
image

@eliasbenb
Copy link
Author

@pkkid, I don't think it worked. I tried sending another comment and it's still hidden from public view.

@JonnyWong16
Copy link
Collaborator

I found one more instance of _loadData() here. But I don't think this one matters because there are not cached Setting attributes?

self._settings[id]._loadData(elem)

Also, this was another comment that was lost for the same reason:

library.Hub has a special reload function that calls self.items = self.fetchItems(self.key), making the cached property somewhat useless if the more and key condition is passed.

I'm a little confused on why the remaining items are fetched only after a reload, why isn't it fetched in _loadData?

https://github.com/eliasbenb/python-plexapi/blob/88e165f5edb22b1d5b6528993b6989babfa2f64b/plexapi/library.py#L2236-L2248

If you do LibrarySection.hubs() it only returns "partial hubs" with just the first few items. You need to reload() the hub to retrieve the full list of items.

@eliasbenb
Copy link
Author

eliasbenb commented Apr 9, 2025

I found one more instance of _loadData() here. But I don't think this one matters because there are not cached Setting attributes?

Yes, doesn't really matter but I'll change it to _invalidateCacheAndLoadData just to be more compatible with the new caching system in case of any future changes.

If you do LibrarySection.hubs() it only returns "partial hubs" with just the first few items. You need to reload() the hub to retrieve the full list of items.

I'm assuming this was a requirement because getting the full list was too expensive before? If that's the case, would it make sense now that the attribute is lazy loaded to always return the full hub when called?

@JonnyWong16
Copy link
Collaborator

JonnyWong16 commented Apr 9, 2025

I'm assuming this was a requirement because getting the full list was too expensive before? If that's the case, would it make sense now that the attribute is lazy loaded to always return the full hub when called?

I think that's reasonable now.

@deprecated the reload function to just return items() and update items() to fetch all items when called. Technically, Hub.reload() should reload it's own attributes but that would be a breaking change. Don't forget the update the docstring for the more attribute.

Side note (for a separate PR), I was just looking at hubs and there is a random (bool) attribute that we are missing.

@eliasbenb
Copy link
Author

eliasbenb commented Apr 9, 2025

@deprecated the reload function to just return items() and update items() to fetch all items when called. Technically, Hub.reload() should reload it's own attributes but that would be a breaking change. Don't forget the update the docstring for the more attribute.

I read your comment edits after pushing a change. So, I went with something a little different. Let me know what you think of it. a06a43f.

I believe my change mirrors the preexisting behavior 1-to-1 without needing to deprecate the reload function (since it still serves a purpose of invalidating the cache).

Although, I'm realizing I didn't update the docstring for the reload function to reflect the change.

Edit: Some more logic I was missing c98a9d1

@JonnyWong16
Copy link
Collaborator

JonnyWong16 commented Apr 9, 2025

You're right, for some reason I thought reload() returned the items.

I'm working on another PR to refactor a bunch of the reload methods to unify them so just leave it for now. Just update the doc strings.

Copy link
Collaborator

@JonnyWong16 JonnyWong16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good now. Thanks for all your work.

I think the next follow up PR is to remove all the manual _cache attributes now that we have better caching.

https://github.com/search?q=repo%3Apushingkarmaorg%2Fpython-plexapi+%22%23+cache%22&type=code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to load attributes on demand then to cache them instead of loading them on object instantiation
3 participants