Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bacpop 208 Migration for visualise update #9

Merged
merged 45 commits into from
Mar 6, 2025
Merged

Conversation

absternator
Copy link
Contributor

@absternator absternator commented Feb 27, 2025

The following is a migration to update projects before the updates made to combine network + microreact call into 1 visualise call. as per https://github.com/bacpop/beebop_py.

There are 4 main data updates made.

  • update redis keys with name migration. These are all updated to be called visualise from now on.
  • add pruned graphml if not exists. makes copy called pruned which is used by frontend
  • rename microreact folders to visualise including all prefix names. eg. microreact_9 --> visualise_9
  • move network files from network folder to corresponding visualuse_X folder. The cytoscape.csv file is copied into all visualise_X folders in project folder. The graphml files are copied as per respective clustern number. eg. /network/network_component_9.graphml --> /visualise_9/visualise_9_component_9.graphml.

Note there will still be issues with very old projects before v9 upgrade where cluster microreact_X, where X was randomly arbritary. As now it is the cluster number. But these networks are not going to work regardless of PR changes

This has been run on dev and can be tested there.
Testing:
Head to https://beebop-dev.dide.ic.ac.uk/ and ensure your old projects load and work.

@absternator
Copy link
Contributor Author

TOOD: merge in beebop and beebop_py and revert this branch to main and then merge this PR in... Then can run on prod

Copy link
Contributor

@EmmaLRussell EmmaLRussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty tidy to me!

Comment on lines 28 to 40
# Get all valid output folders
all_output_folders = [
os.path.join(base_folder, item)
for item in os.listdir(base_folder)
if os.path.isdir(os.path.join(base_folder, item))
]

# Filter for folders with network folder (i.e old output folders to update)
output_folders = [
folder
for folder in all_output_folders
if os.path.exists(os.path.join(folder, "network"))
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just do these in one loop I guess and save a few lines, though you would have one unpythonically long line...

Suggested change
# Get all valid output folders
all_output_folders = [
os.path.join(base_folder, item)
for item in os.listdir(base_folder)
if os.path.isdir(os.path.join(base_folder, item))
]
# Filter for folders with network folder (i.e old output folders to update)
output_folders = [
folder
for folder in all_output_folders
if os.path.exists(os.path.join(folder, "network"))
]
# Get all valid output folders
output_folders = [
os.path.join(base_folder, item)
for item in os.listdir(base_folder)
if os.path.isdir(os.path.join(base_folder, item)) and os.path.exists(os.path.join(base_folder, item "network"))
]

if r.exists("beebop:hash:job:microreact"):
microreact_data = r.hgetall("beebop:hash:job:microreact")
for field, value in microreact_data.items():
r.hset("beebop:hash:job:visualise", field, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually renaming the key is it? It's leaving the microreact key and values as they are, but just copying them into the visualise key. Which is fair enough since the visualise key may already exist by the time you run the migration, and I'm not sure you'd want to do a RENAME in that case. (Plus it wouldn't be rollback-able.) So maybe just change the comment and log message as they're a little misleading..?

# Remove backup_folder inside the container if exists
if docker exec $CONTAINER_NAME test -d $BACKUP_FOLDER; then
docker exec $CONTAINER_NAME rm -rf $BACKUP_FOLDER
echo "Successfully restored PopPUNK output from backup."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the wrong log message!

fi

# Remove migration.py inside the container
docker exec $CONTAINER_NAME rm -f /beebop/storage/migration.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the point of no return for the migration, perhaps this is also where you should be deleting the old redis keys too? I don't think that's happening otherwise..?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yup good idea

os.path.join(parent_folder, dir_name)
) and dir_name.startswith("visualise_"):
# Rename file
new_fname = file.replace("network", dir_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to get a new file name? Were all the old files called "network.csv" and this is rename them to "visualise_CLUSTERNUMBER.csv"?


if not os.path.exists(new_fname_path):
shutil.copyfile(
os.path.join(network_folder, file), new_fname_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is renaming it in the network folder, and then it subsequently it moves it to the visualise folder? Why not just do both of those things in one copyfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i realised that later im stupid... il just leave as its working and I don't wanna break it... (as we don't have to maintain this code shouldn't be a biggie)

logger.info(f"Renamed file: {file} → {new_fname}")

# Move file to visualise folder
if not os.path.exists(os.path.join(visualise_path, new_fname)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here - why not just rename and move in one go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above comment im stupid

@absternator absternator requested a review from EmmaLRussell March 5, 2025 09:09
Copy link
Contributor

@EmmaLRussell EmmaLRussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good. Not sure why the changes to the integration test needed..?

Comment on lines +30 to +31
# Give services time to fully initialize
time.sleep(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh dear! wonder why that's crept in...

Copy link
Contributor Author

@absternator absternator Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i tried so many things(as you can see from commits haha) and this was the only things let It actually consistently pass without have SSL / connection errors

@@ -27,8 +26,11 @@ def test_start_beebop():
session = requests.Session()
session.verify = False
session.trust_env = False
os.environ['CURL_CA_BUNDLE'] = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was this about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was depricated in requests version 2.28

@absternator absternator merged commit 9ce9868 into main Mar 6, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants