KeyError and OperationalErrors when loading SEG file from NBIA #344

Open
jjjermiah opened this issue Apr 16, 2025 · 11 comments

@jjjermiah

Thanks for this great package!
I'm trying to use the Segmentation class to process DICOM-SEG files published on the NBIA, and I'm running into two distinct issues with the ISPY2 and NSCLC-Radiogenomics collections:

NSCLC-Radiogenomics

This file has pixel data, and the instantiation itself fails immediately.

hd.seg.Segmentation.from_file('data/NSCLC_Radiogenomics/R01-001/SEG_Series28767652/00000001.dcm')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[5], line 1
----> 1 hd.seg.Segmentation.from_file('data/NSCLC_Radiogenomics/R01-001/SEG_Series28767652/00000001.dcm')

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/image.py:4796, in _Image.from_file(cls, fp, lazy_frame_retrieval)
   4794     image._file_reader = reader
   4795 else:
-> 4796     image = cls.from_dataset(_wrapped_dcmread(fp), copy=False)
   4798 return image

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/seg/sop.py:2864, in Segmentation.from_dataset(cls, dataset, copy)
   2858             pixel_measures = PixelMeasuresSequence.from_sequence(
   2859                 pffg_item.PixelMeasuresSequence,
   2860                 copy=False,
   2861             )
   2862             pffg_item.PixelMeasuresSequence = pixel_measures
-> 2864 seg = super().from_dataset(seg, copy=False)
   2866 return cast(Self, seg)

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/image.py:1153, in _Image.from_dataset(cls, dataset, copy)
   1150 im.__class__ = cls
   1151 im = cast(Self, im)
-> 1153 im._build_luts()
   1154 return im

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/image.py:1931, in _Image._build_luts(self)
   1926 self._coordinate_system = get_image_coordinate_system(
   1927     self
   1928 )
   1930 if is_multiframe_image(self):
-> 1931     self._build_luts_multiframe()
   1932 else:
   1933     self._build_luts_single_frame()

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/image.py:2237, in _Image._build_luts_multiframe(self)
   2235 grp_ptr = func_grp_pointers[ptr]
   2236 if grp_ptr is not None:
-> 2237     dim_val = frame_item[grp_ptr][0][ptr].value
   2238 else:
   2239     dim_val = frame_item[ptr].value

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/pydicom/dataset.py:1051, in Dataset.__getitem__(self, key)
   1048     except Exception as exc:
   1049         raise KeyError(f"'{key}'") from exc
-> 1051 elem = self._dict[tag]
   1053 if isinstance(elem, RawDataElement):
   1054     # If a deferred read, then go get the value now
   1055     if elem.value is None and elem.length != 0:

KeyError: (0062,000A)

ISPY2

The instantiation here works, but attempting to access the volume fails.

hd.seg.Segmentation.from_file('data/ISPY2/ISPY2-111038/SEG_Series20495892/00000001.dcm').get_volume()
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Cell In[4], line 1
----> 1 hd.seg.Segmentation.from_file('data/ISPY2/ISPY2-111038/SEG_Series20495892/00000001.dcm').get_volume()

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/seg/sop.py:4487, in Segmentation.get_volume(self, slice_start, slice_end, row_start, row_end, column_start, column_end, as_indices, dtype, segment_numbers, combine_segments, relabel, rescale_fractional, skip_overlap_checks, apply_palette_color_lut, apply_icc_profile, allow_missing_positions, rtol, atol)
   4479     affine = volume_geometry[
   4480         :,
   4481         row_start - 1:,
   4482         column_start - 1:,
   4483     ].affine
   4484 else:
   4485     # Check that the combination of frame numbers and segment numbers
   4486     # uniquely identify segmentation frames
-> 4487     if not self._do_columns_identify_unique_frames(columns):
   4488         raise RuntimeError(
   4489             'Volume positions and segment numbers do not '
   4490             'uniquely identify frames of the segmentation image.'
   4491         )
   4493     (
   4494         stack_table_def,
   4495         volume_geometry,
   (...)   4502         as_indices=as_indices,
   4503     )

File ~/bhklab/radiomics/Projects/med-imagetools/.pixi/envs/dev/lib/python3.12/site-packages/highdicom/image.py:2788, in _Image._do_columns_identify_unique_frames(self, column_names, filter)
   2785     total = self.number_of_frames
   2786     filter = ''
-> 2788 n_unique_combos = cur.execute(
   2789     "SELECT COUNT(*) FROM "
   2790     f"(SELECT 1 FROM FrameLUT {filter} GROUP BY {col_str})"
   2791 ).fetchone()[0]
   2792 return n_unique_combos == total

OperationalError: no such column: ReferencedSegmentNumber

Is this expected behaviour?

My main goal is to read in these SEG files, align to their reference CT/MR files, and do some processing in python.

Some of the access methods that I'd like to use are:

Segmentation.get_plane_positions()
Segmentation.get_volume_geometry()
Segmentation.iter_segments()
Segmentation.get_volume()

Supplementary Data

# ISPY2 SEG UID:
SeriesInstanceUID = 1.3.6.1.4.1.14519.5.2.1.181531379856362150361263442486520495892
SOPInstanceUID = 1.3.6.1.4.1.14519.5.2.1.100457706057013982244711005164575873732

# NSCLC Radiogenomics UID:
SeriesInstanceUID = 1.3.6.1.4.1.14519.5.2.1.4334.1501.208304249098874719086628767652
SOPInstanceUID = 1.3.6.1.4.1.14519.5.2.1.4334.1501.194849568492835137645001869117

dcm dumps:
NSCLC_Radiogenomics__SEG_Series28767652__00000001.dcmdump.txt
ISPY2-111038__SEG_Series20496892__00000001.dcmdump.txt

Please let me know how I can help with this as well.

@fedorov
Member

fedorov commented Apr 16, 2025

The specific SEG from NSCLC-Radiogenomics seems to have been created using what was probably the first-ever iteration of the DICOM SEG writer in 3D Slicer, so it could be that those objects have issues. I would need to look into this in more detail.

ISPY2 segmentations are known to be invalid - the creator incorrectly implemented the conversion, and utilized individual bits within a byte to encode different segments. I think if you want to parse those out you would need to write some custom code. I don't know if the conventions used are documented.
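To illustrate the kind of custom code this might take, here is a minimal sketch that splits integer pixel values into one binary mask per bit. It assumes each bit of a stored value flags one segment; the function name `split_bit_planes` is hypothetical, and which bit maps to which segment is not known here and would need to be confirmed against the collection's own documentation.

```python
def split_bit_planes(pixels, n_bits=8):
    """Split a 2D grid of integer pixel values into one binary mask per bit.

    Returns a dict mapping bit index (0 = least significant) to a 2D list
    of 0/1 values. The bit-to-segment mapping is an assumption and is not
    taken from the ISPY2 documentation.
    """
    return {
        bit: [[(value >> bit) & 1 for value in row] for row in pixels]
        for bit in range(n_bits)
    }


# Toy 2x3 frame: 5 = 0b101 sets bits 0 and 2, 2 = 0b010 sets bit 1.
frame = [
    [0, 5, 2],
    [1, 0, 7],
]
planes = split_bit_planes(frame, n_bits=3)
```

Each entry of `planes` can then be treated as a separate candidate segment mask and inspected on its own.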

For your reference, all of the public DICOM collections from TCIA are also available in IDC, and you can download any collection/patient/study/series files using idc-index with a command line like this:

idc download 1.3.6.1.4.1.14519.5.2.1.4334.1501.208304249098874719086628767652

@CPBridge
Collaborator

CPBridge commented Apr 16, 2025

Thanks @fedorov for this very useful background information.

If the ISPY2 images are as incorrect as you describe, then I think that it's probably a non-starter for highdicom to try and support reading them and I'm afraid you are on your own @jjjermiah, sorry.

For the NSCLC dataset, I can try to take a look soon. @fedorov I'm assuming that the copy of these files in the IDC matches the ones on TCIA that seem to be causing problems here? Or did we fix them somehow before ingesting them into the IDC?

@fedorov
Member

fedorov commented Apr 16, 2025

> If the ISPY2 images are as incorrect as you describe, then I think that it's probably a non-starter for highdicom to try and support reading them and I'm afraid you are on your own.

I agree, I would not expect highdicom to handle it out of the box. But I think highdicom could perhaps still be useful for deciphering those files. I should go back through my notes and post on the IDC forum the exact issues we identified with that dataset.

> For the NSCLC dataset, I can try to take a look soon. @fedorov I'm assuming that the copy of these files in the IDC matches the ones on TCIA that seem to be causing problems here? Or did we fix them somehow before ingesting them into the IDC?

Yes, what we have in IDC should be the same. No, we did not fix anything before ingesting. If we decide to fix those, they should be resubmitted to TCIA and get into IDC when we re-fetch the updated content from TCIA.

@CPBridge
Collaborator

Hi @jjjermiah, I believe #345 should resolve your issue with the NSCLC dataset. I have only tested it on one case, though. Perhaps you would be willing to test the rest of the dataset and let us know if you come across other problems before we merge?

I think the files are perfectly valid, but are "unusual" in the sense that one of the attributes used as a Dimension Index is found in the SharedFunctionalGroupsSequence rather than the PerFrameFunctionalGroupsSequence. Generally this would not be expected, because it is unusual to index along a dimension that doesn't change. But from reviewing the relevant part of the standard, I think it is an entirely valid thing to do, and it's a simple fix.
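The fallback idea can be sketched as follows. This is only an illustration under stated assumptions, not the actual change in #345: plain dictionaries stand in for DICOM datasets, the helper name `get_dimension_value` is hypothetical, and SegmentIdentificationSequence / ReferencedSegmentNumber are used purely as an example of a dimension index attribute that is stored once in the shared functional groups.

```python
def get_dimension_value(per_frame_item, shared_item, group_keyword, keyword):
    """Look up a dimension index value for one frame.

    Checks the per-frame functional group first, then falls back to the
    shared functional group. Plain dicts stand in for DICOM datasets; each
    functional group sequence is modelled as a list of one-item dicts.
    """
    for source in (per_frame_item, shared_item):
        group = source.get(group_keyword)
        if group and keyword in group[0]:
            return group[0][keyword]
    raise KeyError(f"{keyword} not found in per-frame or shared functional groups")


# The segment number is stored once, in the shared functional groups.
shared = {"SegmentIdentificationSequence": [{"ReferencedSegmentNumber": 1}]}
# Each per-frame item only carries the attribute that changes per frame.
frames = [
    {"PlanePositionSequence": [{"ImagePositionPatient": [0.0, 0.0, 0.0]}]},
    {"PlanePositionSequence": [{"ImagePositionPatient": [0.0, 0.0, 2.5]}]},
]
segment_numbers = [
    get_dimension_value(
        frame, shared, "SegmentIdentificationSequence", "ReferencedSegmentNumber"
    )
    for frame in frames
]
```

With a lookup like this, every frame reports the same segment number even though none of the per-frame items contains it, which is the situation that a per-frame-only lookup would turn into a KeyError.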

@jjjermiah
Author

@CPBridge thanks a bunch for addressing this. I will test on the rest of the dataset and let you know what I find.

@AlexNmSED

@jjjermiah Hello, I am also working on the ISPY2 dataset. I am confused by the mask analysis files they provide. How do I get the annotation of the tumor region? I hope you can help.

@jjjermiah
Author

@AlexNmSED given that it doesn't follow the standard, we've also chosen not to support its unique approach to storing segmentation data.
Without looking deeply into it, I would probably just find which region has a segment label for the tumor and then disregard everything else, but I'm unsure whether that's even correct.

@AlexNmSED

AlexNmSED commented May 9, 2025 via email

@jjjermiah
Author

> For my purposes, it would be sufficient to simply identify which slices contain the tumor; I don't necessarily need to extract the exact tumor regions. Do you have any suggestions on how to obtain the slice indices where the tumor appears, or perhaps any sample code that could help with this?

Oh, if that's the case, then the mask values are encoded by values of 0, as described in Analysis-mask-files-description.v20211020.docx
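A minimal sketch of finding the slice indices, assuming the mask has already been read into a list of 2D slices and that a voxel value of 0 marks the region of interest. Both the helper name `slices_containing` and the default `target_value=0` are assumptions for illustration; the value to search for should be checked against the description document, and the 255 background in the toy data is arbitrary.

```python
def slices_containing(volume, target_value=0):
    """Return indices of slices in which at least one voxel equals target_value.

    volume is a sequence of 2D slices, each a sequence of rows of integers.
    """
    return [
        index
        for index, plane in enumerate(volume)
        if any(value == target_value for row in plane for value in row)
    ]


# Toy 3-slice volume: only slices 1 and 2 contain the value 0.
volume = [
    [[255, 255], [255, 255]],
    [[255, 0], [255, 255]],
    [[0, 0], [255, 255]],
]
tumor_slices = slices_containing(volume, target_value=0)
```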

@AlexNmSED

Thank you for your help.

@AlexNmSED

@jjjermiah Hello. Do you know how to obtain the DCE-MRI data from the I-SPY2 study at each different time point, specifically the pre-contrast images and the images from each contrast-enhanced phase?
