Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 18 additions & 1 deletion pptx_blueprint/__init__.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import pathlib
import pptx
import re
from typing import Union, Iterable, Tuple
from pptx.shapes.base import BaseShape

Expand All @@ -23,6 +24,7 @@ def __init__(self, filename: _Pathlike) -> None:
"""
self._template_path = filename
self._presentation = pptx.Presentation(filename)
self._copy_tags_to_name()
pass

def replace_text(self, label: str, new_text: str) -> None:
Expand Down Expand Up @@ -71,7 +73,7 @@ def _find_shapes(self,
matched_shapes = []

def _find_shapes_in_slide(slide):
return filter(lambda shape: shape.text == f'{{{tag_name}}}', slide.shapes)
return filter(lambda shape: shape.name == f'{{{tag_name}}}', slide.shapes)

if slide_number == '*':
slides = self._presentation.slides
Expand All @@ -87,6 +89,21 @@ def _find_shapes_in_slide(slide):

return matched_shapes

def _get_all_shapes(self) -> Iterable[BaseShape]:
# Do we need all the shapes? Perhaps we should filter on tags here.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to iterate over each shape at least once and apply the re.match to identify whether there's some work for us to do. It's a decision of style if you want to first get all shapes and then match and work, or if you get all matching slides and then do some work. For a (extremely) big presentation, however, generating a list of all slides may be quite memory expensive and possibly slow. Both issues could be eliminated if you'd just replace the square brackets with regular brackets, thereby changing the list comprehension into a generator expression.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please inline...

Copy link
Author

@pajot pajot Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please inline...

Do you mean the comment? Isn't it a bit long to be inlined? (PEP8 says "use inline comments sparingly ;) )

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlining means to eliminate the function and use the code of the function body instead of the function call. This is reasonable in this particular case because

  • the function is only used once (and unlikely to be needed multiple times)
  • the inline version is almost equally short and equally descriptive.
    instead of
    all_shapes = self._get_all_shapes()
    
    just use
    all_shapes = (shape
                  for slide in self._presentation.slides
                  for shape in slide.shapes)
    

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying my comment, Tim. I now see that I could have been a little more verbose :-)

all_shapes = [shape for slide in self._presentation.slides for shape in slide.shapes]
return all_shapes

def _copy_tags_to_name(self) -> None:
all_shapes = self._get_all_shapes()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please inline

# This regex matches on tags
regex_tag = re.compile(r'^\s*(\{\w+\})\s*$')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be defined as a class constant?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further, why are the curly brackets part of the group? Do we really want to set the shape.name to e.g. '{location}' instead of 'location'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further, why are the curly brackets part of the group? Do we really want to set the shape.name to e.g. '{location}' instead of 'location'?

Tim and I had discussed this, yes. Leaving the curly braces in makes it clear that this is a tag.

Copy link
Owner

@timhoffm timhoffm Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be defined as a class constant?

Could do, but if so, please private. I don't want to make this user-configurable as of now. Strictly speaking, it would even be sufficient not to explicitly compile the regexp and just use inline if re.match(r'^\s*(\{\w+\})\s*$', shape.text):. re caches the last few (I belive to remember 100?) expressions. Therefore, the overhead for this basic call vs. the compiled regexp is just a dict lookup, which is negligible.

Do we really want to set the shape.name to e.g. '{location}' instead of 'location'?

I tend to say yes. While the curly brace feels a bit bulky here, it serves to highly reduce the likelihood of a name collision between a tag and an existing shape name.

for shape in all_shapes:
# We only copy contents we recognize as tags
match = regex_tag.match(shape.text)
if match:
shape.name = match.group(1)

def save(self, filename: _Pathlike) -> None:
"""Saves the updated pptx to the specified filepath.

Expand Down
14 changes: 14 additions & 0 deletions tests/test_template.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
from pptx_blueprint import Template
from pathlib import Path
import pytest
import re


@pytest.fixture
Expand Down Expand Up @@ -38,3 +39,16 @@ def test_find_shapes_from_one_slide(template):
def test_find_shapes_index_out_of_range(template):
with pytest.raises(IndexError):
shapes = template._find_shapes(0, 'logo')

def test_get_all_shapes(template):
shapes = template._get_all_shapes()
for shape in shapes:
assert str(shape.text) and str(shape.name)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how does this test that I 'get all the shapes'? I'm afraid that you'd have to mock the _presentation to verify this...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply named the test after the function it is testing. But no, it is not actually testing that I have all the shapes. But code to test this is necessarily going to be as complex as the code that performs it.

At the very least, the code checks that what was obtained has at least some properties we would expect to find in slides :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the test would be rather complex. At least you'd have to monkey patch the Presentation class (because we don't use dependency injection) and configure the mock (which seems tedious). Further, there are people out there that say "don't test private methods, just test the public interface", i.e., you would not write a test for _get_all_shapes() at all, but only test it indirectly by testing the public method it is used in.

Would probably the simplest to just give the test a more descriptive (cool slang for "longer") name ;-)


def test_copy_tags_to_name(template):
template._copy_tags_to_name()
all_shapes = template._get_all_shapes()
regex_tag = re.compile(r'\{[\s\w]*\}')
for shape in all_shapes:
if regex_tag.match(shape.text):
assert shape.text == shape.name
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really what you wan't to test? Taking a look at the regex you defined in the implementation, for a shape having the text ' {location} ', you would set the name as '{location}'. Hence, in this case, shape.text != shape.name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this test needs to be fixed :)