all scripts used to extract data #309

Open
Joishee05 wants to merge 7 commits into main from Thorax

@Joishee05
Collaborator

@Joishee05 Joishee05 commented Mar 2, 2026

Closes #295

Extracts the data from Thorax.pptx.

Questions:
A lot of the images extracted from the PowerPoint look very similar since the presentation is just four pairs of bones. For the images, would it make sense to keep all the images and rename them? I tried naming them, and it got confusing after a while.

I haven't gotten around to renaming the rest of them. Extra files will be deleted later.

@leandrumartin Let me know if this looks good and I will create a separate PR.

@leandrumartin
Collaborator

Currently it does look like the server expects each bone and sub-bone to have its own image. This does seem unnecessary; I'll work on a fix to make it so that if the server doesn't find an image for a sub-bone, it will grab the bone image instead. That way, we won't have to do all the manual renaming of sub-bone images when the image just matches the image of the bone.
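The fallback described above could look roughly like the following. This is a minimal sketch, not the actual server code: the function name `resolve_image`, the flat `images` mapping, and the example IDs and filenames are all hypothetical and chosen only for illustration.

```python
from typing import Optional


def resolve_image(
    images: dict[str, str],
    bone_id: str,
    sub_bone_id: Optional[str] = None,
) -> Optional[str]:
    """Return the sub-bone's image if one exists, else the parent bone's image.

    `images` is a hypothetical mapping from a bone or sub-bone ID to an
    image filename; the real server may store this differently.
    """
    if sub_bone_id is not None and sub_bone_id in images:
        return images[sub_bone_id]
    # No dedicated sub-bone image: fall back to the parent bone (or None).
    return images.get(bone_id)


# Hypothetical data: only one sub-bone has its own image.
images = {"sternum": "sternum.png", "manubrium": "manubrium.png"}

own_image = resolve_image(images, "sternum", "manubrium")
fallback_image = resolve_image(images, "sternum", "xiphoid_process")
```

With a lookup like this, a sub-bone without its own image resolves to the parent bone's image, so no renamed duplicate is needed.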

So in the meantime, for any image that on the slides is supposed to represent a sub-bone but is really just the image of the parent bone with a colored region annotation overlaid on top, you can keep only the parent bone image.

Also, please remake this as a PR into the data branch, not main.

@leandrumartin
Collaborator

Slight correction to my previous comment: it doesn't really matter how the server grabs the subbone images. Which boneset/bone/subbone is linked to which image is determined by the "images" field in the description JSON. So to link a subbone to a specific image, you can (as described in the documentation for cleaning the output of Extract_Bone_Descriptions.py) include that image filename in the "images" field. Duplicate images are therefore not needed at all.
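To illustrate the point above, here is a minimal sketch of several entries sharing one image through the "images" field. Only the "images" field name comes from this discussion; the "id" and "subbones" keys, the filenames, and the helper `referenced_images` are assumptions made up for the example and may not match the real description JSON schema.

```python
import json

# Hypothetical description JSON: only the "images" key is taken from the
# discussion; "id", "subbones", and all filenames are illustrative.
description_json = """
{
  "id": "sternum",
  "images": ["sternum.png"],
  "subbones": [
    {"id": "manubrium", "images": ["sternum.png"]},
    {"id": "xiphoid_process", "images": ["sternum.png"]}
  ]
}
"""


def referenced_images(entry: dict) -> set[str]:
    """Collect every filename listed in an "images" field, recursively."""
    found = set(entry.get("images", []))
    for child in entry.get("subbones", []):
        found |= referenced_images(child)
    return found


description = json.loads(description_json)
needed = referenced_images(description)

# Hypothetical set of files extracted from the slides.
extracted = {"sternum.png", "sternum_copy1.png", "sternum_copy2.png"}
unreferenced = sorted(extracted - needed)
print(unreferenced)  # ['sternum_copy1.png', 'sternum_copy2.png']
```

Under these assumptions, any extracted file that no "images" field mentions is a candidate for deletion rather than renaming.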

@Joishee05 Joishee05 marked this pull request as ready for review March 9, 2026 16:19
@Joishee05 Joishee05 requested a review from leandrumartin as a code owner March 9, 2026 16:19
Collaborator

@leandrumartin leandrumartin left a comment

The content of all the files looks good. Just make sure when you remake this as a PR into data that the unnecessary files are removed and everything is placed into the correct folders.

  • The Thorax.xml/ppt/ folder should not be committed
  • As with the other PR, when remaking this as a PR into data, ensure the files are placed into the appropriate existing folders instead of adding new ones

Collaborator

This file should not be committed

Collaborator

@leandrumartin leandrumartin Mar 10, 2026

This should be renamed to boneset/thorax.json when placed into the data branch.

Collaborator

This file should not be committed

Collaborator

Unneeded in database, should not be committed

Development

Successfully merging this pull request may close these issues.

Extract Thorax PowerPoint data and add to database

2 participants