Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tkakar/cat 673 create builder for seg mask #92

Merged
merged 22 commits into from
Oct 28, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -129,4 +129,6 @@ dmypy.json
.pyre/

# VSCode
.VSCode
.VSCode

.DS_Store
26 changes: 23 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# portal-visualization

Given HuBMAP Dataset JSON, creates a Vitessce configuration.
Given HuBMAP Dataset JSON (e.g. https://portal.hubmapconsortium.org/browse/dataset/004d4f157df4ba07356cd805131dfc04.json), creates a Vitessce configuration.

## Release process

Expand All @@ -23,7 +23,7 @@ $ pip install .
$ src/vis-preview.py --help
usage: vis-preview.py [-h] (--url URL | --json JSON) [--assaytypes_url URL]
[--assets_url URL] [--token TOKEN] [--marker MARKER]
[--to_json]
[--to_json] [--epic_uuid UUID]

Given HuBMAP Dataset JSON, generate a Vitessce viewconf, and load vitessce.io.

Expand All @@ -39,6 +39,25 @@ optional arguments:
--marker MARKER Marker to highlight in visualization; Only used in
some visualizations.
--to_json Output viewconf, rather than open in browser.
--epic_uuid UUID uuid of the EPIC dataset
```


```
Notes:
1. To get the token, look for Authorization Bearer {token represented by a long string} under `search-api` network calls under th network tab in developer's tool when browsing a dataset in portal
tkakar marked this conversation as resolved.
Show resolved Hide resolved
2. If you added an argument to the vis-preview.py script, do update the help docs in README, otherwise build will throw error
tkakar marked this conversation as resolved.
Show resolved Hide resolved
3.
tkakar marked this conversation as resolved.
Show resolved Hide resolved

```



## Build & Testing
```
To build: `python -m build`
`To run the tests `./test.sh`. Install the `flake8` and `autopep8` packages.

```

## Background
Expand All @@ -47,7 +66,8 @@ optional arguments:

Data for the Vitessce visualization almost always comes via raw data that is processed by [ingest-pipeline](https://github.com/hubmapconsortium/ingest-pipeline) airflow dags.
Harvard often contributes our own custom pipelines to these dags that can be found in [portal-containers](https://github.com/hubmapconsortium/portal-containers).
The outputs of these pipelines are then converted into view configurations for Vitessce by the [portal backend](https://github.com/hubmapconsortium/portal-ui/blob/0b43a468fff0256a466a3bf928a83893321ea1d9/context/app/api/client.py#L165),
The outputs of these pipelines are then converted into view configurations for Vitessce by the [portal backend](https://github.com/hubmapconsortium/portal-ui/blob/0b43a468fff0256a466a3bf928a83893321ea1d9/context/app/api/client.py#L165), The `vis-preview.py` mimics the invocation of `get_view_config_builder` for development and testing purposes independently, i.e., without using the [portal backend](https://github.com/hubmapconsortium/portal-ui/blob/0b43a468fff0256a466a3bf928a83893321ea1d9/context/app/api/client.py#L165).
tkakar marked this conversation as resolved.
Show resolved Hide resolved

using code in this repo, when a `Dataset` that should be visualized is requested in the client.
The view configurations are built using the [Vitessce-Python API](https://vitessce.github.io/vitessce-python/).

Expand Down
5 changes: 4 additions & 1 deletion src/portal_visualization/builders/base_builders.py
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,10 @@ def _build_assets_url(self, rel_path, use_token=True):
'https://example.com/uuid/rel_path/to/clusters.ome.tiff?token=groups_token'

"""
base_url = urllib.parse.urljoin(self._assets_endpoint, f"{self._uuid}/{rel_path}")
uuid = self._uuid
if hasattr(self, "_epic_uuid"): # pragma: no cover
uuid = self._epic_uuid
base_url = urllib.parse.urljoin(self._assets_endpoint, f"{uuid}/{rel_path}")
token_param = urllib.parse.urlencode({"token": self._groups_token})
return f"{base_url}?{token_param}" if use_token else base_url

Expand Down
123 changes: 109 additions & 14 deletions src/portal_visualization/builders/epic_builders.py
Original file line number Diff line number Diff line change
@@ -1,13 +1,23 @@
from abc import ABC, abstractmethod
from vitessce import VitessceConfig
from abc import abstractmethod
from vitessce import VitessceConfig, ObsSegmentationsOmeTiffWrapper, AnnDataWrapper, \
get_initial_coordination_scope_prefix, CoordinationLevel as CL
from .base_builders import ConfCells
from ..utils import get_conf_cells
from .base_builders import ViewConfBuilder
from requests import get
import re

from ..paths import OFFSETS_DIR, IMAGE_PYRAMID_DIR

zarr_path = 'hubmap_ui/seg-to-mudata-zarr/secondary_analysis.zarr'

# EPIC builders take in a vitessce conf output by a previous builder and modify it
# accordingly to add the EPIC-specific configuration.
class EPICConfBuilder(ABC):
def __init__(self, base_conf: ConfCells, epic_uuid) -> None:


class EPICConfBuilder(ViewConfBuilder): # pragma: no cover
def __init__(self, epic_uuid, base_conf: ConfCells, entity, groups_token, assets_endpoint, **kwargs) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if we wanted the EPICConfBuilder to extend the base ViewConfBuilder (i.e. in case there are any additional EPIC-specific patterns that become apparent) but I think it should be fine; I think the extra _epic_uuid logic you included in the base builder takes care of my initial concerns.

super().__init__(entity, groups_token, assets_endpoint, **kwargs)

conf, cells = base_conf

Expand Down Expand Up @@ -43,17 +53,102 @@ def apply(self):
def _apply(self, conf): # pragma: no cover
pass

def zarr_store_url(self):
adata_url = self._build_assets_url(zarr_path, use_token=False)
return adata_url

def segmentations_url(self, img_path):
img_url = self._build_assets_url(img_path)
return (
img_url,
str(
re.sub(
r"ome\.tiff?",
"offsets.json",
re.sub(IMAGE_PYRAMID_DIR, OFFSETS_DIR, img_url),
)
),
)

class SegmentationMaskBuilder(EPICConfBuilder):

class SegmentationMaskBuilder(EPICConfBuilder): # pragma: no cover
def _apply(self, conf):
zarr_url = self.zarr_store_url()
datasets = conf.get_datasets()
print(f"Found {len(datasets)} datasets")
# Proof of concept using one of the kaggle segmentation masks for now
# segmentations = ObsSegmentationsOmeTiffWrapper(
# img_url='https://assets.hubmapconsortium.org/c9d9ab5c9ee9642b60dd351024968627/ometiff-pyramids/VAN0042-RK-3-18-registered-PAS-to-postAF-registered.ome_mask.ome.tif?token=AgndN7NVbn83wwDXjpnY1Y0lDoJj2j7zOGmn1WN6qr9pqdkjKmt9C1XYm4KrlWrOXE9rVJvpnEKrPjIXrlKd1hmDGjV',
# # offsets_path=f'./{name}/{name}/offsets/{name}.segmentations.offsets.json',
# obs_types_from_channel_names=True,
# )
# dataset.add_object(segmentations)
pass
# TODO: add the correct path to the segmentation mask ome-tiff (image-pyramid)
seg_path = f'{self.segmentations_url("seg")}/'
# print(seg_path)
seg_path = (
'https://assets.hubmapconsortium.org/c9d9ab5c9ee9642b60dd351024968627/'
'ometiff-pyramids/VAN0042-RK-3-18-registered-PAS-to-postAF-registered.ome_mask.ome.tif?'
'token=AgzQXm7nvOW32vWw0EPpKonwbOqjNBzNvvW1p15855NoYglJxyfkC8rlJJWy8V6E8MeyXOwlpKdNBnHb5qnv7f8oeeG'
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitGuardian nagged me about including the token in the previously committed code, even though they expire quickly 😆

I believe we can access the token via self._groups_token like the superclass does now that we're extending it: https://github.com/hubmapconsortium/portal-visualization/blob/main/src/portal_visualization/builders/base_builders.py#L106-L107

mask_names = self.read_metadata_from_url()
mask_names = ['mask1', 'mask2'] # for testing purposes
if (mask_names is not None):
segmentation_objects = create_segmentation_objects(zarr_url, mask_names)
segmentations = ObsSegmentationsOmeTiffWrapper(
img_url=seg_path,
obs_types_from_channel_names=True,
coordination_values={
"fileUid": "segmentation-mask"
}
)

for dataset in datasets:
dataset.add_object(segmentations)
for obj in segmentation_objects:
dataset.add_object(obj)

# TODO: what happens if these views already exist , and if there are other views, how to place these?
spatial_view = conf.add_view("spatialBeta", dataset=dataset, w=8, h=12)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, this is definitely important to determine! While we may be able to handle the more basic image-only assays like Histology/PAS microscopy by appending to existing configurations, given that we anticipate there will be segmentation masks for various assays and that we'll have a few different base confs as a result, my understanding is that we may need to re-set up views and coordinations from scratch for those. We should discuss with @keller-mark to confirm/see if he has any alternative suggestions.

lc_view = conf.add_view("layerControllerBeta", dataset=dataset, w=4, h=12, x=8, y=0)
# without add_view can't access the metaCoordincatinSpace
# (e.g. get_coordination_scope() https://python-docs.vitessce.io/api_config.html?
# highlight=coordination#vitessce.config.VitessceChainableConfig.get_coordination_scope)
conf.link_views_by_dict([spatial_view, lc_view], {
"segmentationLayer": CL([
{
"fileUid": "segmentation-mask",
"spatialLayerVisible": True,
"spatialLayerOpacity": 1,
}
])

}, meta=True, scope_prefix=get_initial_coordination_scope_prefix("A", "obsSegmentations"))

def read_metadata_from_url(self):
url = f'{self.zarr_store_url()}/metadata.json'
print(f"metadata.json URL: {url}")
# url ='https://portal.hubmapconsortium.org/browse/dataset/004d4f157df4ba07356cd805131dfc04.json'
request_init = self._get_request_init() or {}
response = get(url, **request_init)
if response.status_code == 200:
data = response.json()
if isinstance(data, dict) and "mask_name" in data:
mask_name = data["mask_name"]
print(f"Mask name found: {mask_name}")
return mask_name
else:
print("'mask_name' key not found in the response.")
return None
else:
# raise Exception(f"Failed to retrieve data: {response.status_code} - {response.reason}")
pass # for testing purposes


def create_segmentation_objects(base_url, mask_names): # pragma: no cover
segmentation_objects = []
for mask_name in mask_names:
mask_url = f'{base_url}/{mask_name}.zarr'
segmentations_zarr = AnnDataWrapper(
adata_url=mask_url,
obs_locations_path="obsm/X_spatial",
obs_labels_names=mask_name,
coordination_values={
"obsType": mask_name
}
)
segmentation_objects.append(segmentations_zarr)

return segmentation_objects
2 changes: 1 addition & 1 deletion src/portal_visualization/epic_factory.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,5 @@

# This function will determine which builder to use for the given entity.
# Since we only have one builder for EPICs right now, we can just return it.
def get_epic_builder(epic_uuid):
def get_epic_builder(epic_uuid): # pragma: no cover
return SegmentationMaskBuilder
64 changes: 55 additions & 9 deletions src/vis-preview.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import requests

from portal_visualization.builder_factory import get_view_config_builder
from portal_visualization.epic_factory import get_epic_builder


def main(): # pragma: no cover
Expand Down Expand Up @@ -42,39 +43,84 @@ def main(): # pragma: no cover
parser.add_argument(
'--to_json', action='store_true',
help='Output viewconf, rather than open in browser.')
# parser.add_argument(
# '--parent_uuid', action='store_true',
# help='Parent uuid for the dataset',
# default=None)
parser.add_argument(
'--epic_uuid', metavar='UUID',
help='uuid of the EPIC dataset',
default=None)

#
# parser.add_argument(
# '--epic_builder', action='store_true',
# help='Whether to use the epic_builder or not',
# default=None)

args = parser.parse_args()
marker = args.marker
# epic_builder = args.epic_builder
epic_uuid = args.epic_uuid
# parent_uuid = args.parent_uuid # this may not be needed, as the --url provides the parent dataset json?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent_uuid is necessary for visualizing image pyramid support datasets, since the logic in builder_factory expects parent is not None for those cases and performs an assaytype lookup on the parent to determine which specific image pyramid builder to use (to handle evolution of how image pyramids were handled over time): https://github.com/hubmapconsortium/portal-visualization/blob/main/src/portal_visualization/builder_factory.py#L75-L76


if args.url:
response = requests.get(args.url)
if response.status_code == 403:
# Even if the user has provided a globus token,
# that isn't useful when making requests to our portal.
raise Exception('Protected data: Download JSON via browser; Redo with --json')
tkakar marked this conversation as resolved.
Show resolved Hide resolved
response.raise_for_status()
json_str = response.text
else:
json_str = args.json.read_text()
entity = json.loads(json_str)

def get_assaytype(uuid):
def get_assaytype(entity):
uuid = entity.get("uuid")
headers = {}
if args.token:
headers['Authorization'] = f'Bearer {args.token}'
requests.get(f'{defaults["assaytypes_url"]}/{uuid}', headers=headers).json()
try:
response = requests.get(f'{defaults["assaytypes_url"]}{uuid}', headers=headers)
if response.status_code != 200:
print(f"Error: Received status code {response.status_code}")
else:
try:
data = response.json()
return data
except Exception as e:
print(f"Error in parsing the response {str(e)}")
except Exception as e:
print(f"Error accessing {defaults['assaytypes_url']}{uuid}: {str(e)}")

Builder = get_view_config_builder(entity, get_assaytype)
builder = Builder(entity, args.token, args.assets_url)
print(f'Using: {builder.__class__.__name__}', file=stderr)
conf_cells = builder.get_conf_cells(marker=marker)
if args.to_json:
print(json.dumps(conf_cells.conf, indent=2))

if (epic_uuid is not None and conf_cells is not None): # pragma: no cover
EpicBuilder = get_epic_builder(epic_uuid)
epic_builder = EpicBuilder(epic_uuid, conf_cells, entity, args.token, args.assets_url)
print(f'Using: {epic_builder.__class__.__name__}', file=stderr)
conf_cells = epic_builder.get_conf_cells()

if isinstance(conf_cells.conf, list):
conf_as_json = json.dumps(conf_cells.conf[0])
else:
conf_as_json = json.dumps(conf_cells.conf)
data_url = f'data:,{quote_plus(conf_as_json)}'
vitessce_url = f'http://vitessce.io/#?url={data_url}'
open_new_tab(vitessce_url)

if args.to_json:
print(conf_as_json)

# For testing
# with open ('epic.json','w') as file:
# if isinstance(conf_cells.conf, list):
# json.dump( conf_cells.conf[0], file, indent=4, separators=(',', ': '))
# else:
# json.dump( conf_cells.conf, file, indent=4, separators=(',', ': '))

data_url = f'data:,{quote_plus(conf_as_json)}'
vitessce_url = f'http://vitessce.io/#?url={data_url}'
open_new_tab(vitessce_url)


if __name__ == "__main__": # pragma: no cover
Expand Down
31 changes: 16 additions & 15 deletions test/test_builders.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@
import pytest
import zarr

from src.portal_visualization.epic_factory import get_epic_builder
from src.portal_visualization.builders.base_builders import ConfCells
# from src.portal_visualization.epic_factory import get_epic_builder
# from src.portal_visualization.builders.base_builders import ConfCells
from src.portal_visualization.builder_factory import (
get_view_config_builder,
has_visualization,
Expand Down Expand Up @@ -147,6 +147,7 @@ def test_entity_to_vitessce_conf(entity_path, mocker):
# but to test the end-to-end integration, they are useful.
groups_token = environ.get("GROUPS_TOKEN", "groups_token")
assets_url = environ.get("ASSETS_URL", "https://example.com")
# epic_uuid = environ.get("EPIC_UUID", "epic_uuid")
builder = Builder(entity, groups_token, assets_url)
conf, cells = builder.get_conf_cells(marker=marker)

Expand All @@ -173,22 +174,22 @@ def test_entity_to_vitessce_conf(entity_path, mocker):
# TODO: This is a stub for now, real tests for the EPIC builders
# will be added in a future PR.

epic_builder = get_epic_builder(entity["uuid"])
assert epic_builder is not None
# epic_builder = get_epic_builder(conf, epic_uuid)
# assert epic_builder is not None

if conf is None:
with pytest.raises(ValueError):
epic_builder(ConfCells(conf, cells), entity["uuid"]).get_conf_cells()
return
# if conf is None:
# with pytest.raises(ValueError):
# epic_builder(ConfCells(conf, cells), epic_uuid, entity, groups_token, parent).get_conf_cells()
# return

built_epic_conf, _ = epic_builder(
ConfCells(conf, cells), entity["uuid"]
).get_conf_cells()
# built_epic_conf, _ = epic_builder(
# ConfCells(conf, cells), epic_uuid, entity, groups_token, parent
# ).get_conf_cells()

assert built_epic_conf is not None
assert json.dumps(built_epic_conf, indent=2, sort_keys=True) == json.dumps(
conf, indent=2, sort_keys=True
)
# assert built_epic_conf is not None
# assert json.dumps(built_epic_conf, indent=2, sort_keys=True) == json.dumps(
# conf, indent=2, sort_keys=True
# )


@pytest.mark.parametrize("entity_path", bad_entity_paths, ids=lambda path: path.name)
Expand Down
Loading
Loading