Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New: Interactive Segmentor Engine #2

Open
wants to merge 26 commits into
base: develop
Choose a base branch
from

Conversation

Jiaqi-Lv
Copy link
Collaborator

@Jiaqi-Lv Jiaqi-Lv commented Feb 3, 2022

No description provided.

@Jiaqi-Lv Jiaqi-Lv marked this pull request as draft February 3, 2022 21:10
@Jiaqi-Lv Jiaqi-Lv marked this pull request as ready for review February 7, 2022 23:01
@Jiaqi-Lv
Copy link
Collaborator Author

Jiaqi-Lv commented Feb 7, 2022

Create interactive_segmentation dataset.

Copy link
Owner

@mostafajahanifar mostafajahanifar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jiaqi,
Sorry for the late review, I'm experiencing three super busy weeks. I hope my late response has not stopped you from further development.

Anyway, Thanks for this code, it seems to do the job. However, there are some major comments from me that need to be answered or addressed before going forward.

raise ValueError(f"`{mode}` is not supported.")

self.img_path = img_path
self.label = label
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the label?
if it's the label of the patch. include it in the docstring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the label of the input image. I'll include it in the docstring.
However, we might not need it at all?

tiatoolbox/models/dataset/interactive_segmentation.py Outdated Show resolved Hide resolved
return data


def get_boundingBox(self, idx):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_bounding_box

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the reason for writing this bit as a separate method unless it is written in a @staticmethod form to be used later in other modules as well. My suggestion is to merge this bit into the init method with good documentations (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this as a function so we only compute a bounding box for a click when we are loading the actual patch in getitem.
Sure. But does that mean we compute all the bounding boxes in init and store all of them? Will it be a memory overhead? Or is it ok?

tiatoolbox/models/dataset/interactive_segmentation.py Outdated Show resolved Hide resolved
tiatoolbox/models/dataset/interactive_segmentation.py Outdated Show resolved Hide resolved

return bounds

def get_exclusionMap(self, idx, boundingBox):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has a good property that it does not need saving a big array in RAM but its computationally extensive. Imagine we have 100000 points in the point list, each time that __get_item__ method is called, this function also calls a for loop over all 99999 other cells, which is not optimized I believe. There are some ways to avoid this:

1- Use the original NuClick approach: creating a big array of click_point and crop the bounding box region from it with each get_item . This way you get rid of the internal for loop and it's fast. Disadvantage is that it might require at max 500 MB of RAM for big whole slide image. (which I'm fine with)

2- This approach is to try vectorizing the inner for loop. Instead of going over all otherPoints in a for loop, try using array filtering, for example:

xy_locations = np.concat([x_locations, y_locations], axis=1)
sel = np.any(xy_locations  > boundingBox[:2], axis=1)
sel |= np.any(xy_locations  < boundingBox[2:], axis=1)
filtered_xy_locs = xy_locations[sel]

exclusionMap[0, filtered_xy_locs[:, 1]-boundingBox[1], filtered_xy_locs[:, 0]-boundingBox[0]] = 1

To be honest, I just write this snippet in the comments and haven't checked if it works, but there should be a similar trick out there to avoid writing the inner for loop.

3- The smartest way maybe is to go with KDtree algorithm. I'm not an expert with this concept, but this might be the solution to both speed and memory efficiency. For example, in the init method you form the KDtree to find the K nearest neighbors of each point and for each point when you are forming the exclusion map, you somehow extract the nearest neighbors that fall inside the same bounding box as well.

All and all, my suggestion is this: check for a high number of point annotations (like random 100000 points) and see how long does it take with your current exclusionMap creation function. If it was considerable, neglect it and go with the classic way that we are doing in orifinal NuClick i.e., creating a big click_map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the code to use the second approach. I think this is good enough?

tiatoolbox/models/dataset/interactive_segmentation.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants