Periodic deletion of Bulkrax /imports directory #2091

Open
eporter23 opened this issue Jan 18, 2023 · 13 comments
Labels
Middleware Flag issues for Middleware/DevOps teams

Comments

@eporter23
Contributor

eporter23 commented Jan 18, 2023

To prevent Curate from running out of disk space (as happened on 1/18/23), we need to expand our deletion/cleanup jobs to periodically clean the /imports directory, where we upload files to be used for Bulkrax ingests.

To investigate:

  1. What is stored in the /working directory, and whether it should be periodically cleared
  2. How to clear the /imports directories, which are stored per release

We currently have an AWX job that clears out the /tmp directory:

  • Curate-Prod EBS Local Volume Subdirectory Clean

Recommended frequency: same as the above job
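
For illustration only, here is a minimal sketch of what an age-based cleanup of a directory such as /imports could look like. It is not the existing AWX job. The helper names and the retention period are assumptions, since the thread does not state how old a file must be before it is removed.

```ruby
require 'fileutils'

# Hypothetical helper: returns the top-level entries of `dir` whose
# modification time is older than `max_age_days`, measured from `now`.
def stale_entries(dir, max_age_days:, now: Time.now)
  cutoff = now - (max_age_days * 24 * 60 * 60)
  Dir.children(dir)
     .map { |name| File.join(dir, name) }
     .select { |path| File.lstat(path).mtime < cutoff }
     .sort
end

# Hypothetical helper: deletes stale entries and returns the removed paths.
# With dry_run: true nothing is deleted; the candidates are only reported.
# A missing directory is treated as "nothing to clean".
def clean_directory(dir, max_age_days:, now: Time.now, dry_run: false)
  return [] unless Dir.exist?(dir)

  stale = stale_entries(dir, max_age_days: max_age_days, now: now)
  stale.each { |path| FileUtils.rm_rf(path) } unless dry_run
  stale
end
```

A scheduled task could call `clean_directory('/path/to/imports', max_age_days: 7, dry_run: true)` first to review the candidates; the path and the 7-day value here are placeholders. Note that a subdirectory's modification time only changes when entries are added to or removed from it, so an upload folder that is still being read may already look stale.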

@eporter23 eporter23 added Middleware Flag issues for Middleware/DevOps teams Software Engineering Flag work for software engineering team labels Jan 18, 2023
@bwatson78 bwatson78 self-assigned this Feb 22, 2023
@bwatson78
Contributor

bwatson78 commented Feb 22, 2023

Questions:

@jcrompton42

/tmp is local.

@eporter23
Contributor Author

eporter23 commented Feb 22, 2023

Yes, that is fine, but will that impact @jcrompton42's file copy process that moves files from EFS to /imports for Bulkrax? See #1985

@bwatson78
Contributor

bwatson78 commented Feb 22, 2023

It would just need to be updated, I think/hope.

@jcrompton42

What @bwatson78 said. When the /imports location changes, can someone make me a ticket to update that?

@bwatson78
Contributor

@jcrompton42 Yup, I will.

@bwatson78
Contributor

Notes:

  • Removing the 'WORKING_PATH' ENV variable would change /working to the default of /tmp/uploads, which currently doesn't exist.
  • If I unset config.import_path = 'imports' in the Bulkrax initializer, the import folder will default to /tmp/imports.
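
To make the fallback behavior in these notes concrete, here is a small sketch of "use the configured value, otherwise the default". It does not reproduce Bulkrax's or Curate's actual code: the constant and method names are hypothetical, and the two default paths are simply copied from the notes above.

```ruby
# Hypothetical defaults, copied from the notes in this thread.
# They are NOT read from Bulkrax or Curate.
DEFAULT_WORKING_PATH = '/tmp/uploads'
DEFAULT_IMPORT_PATH = '/tmp/imports'

# Returns the configured value, or the default when it is nil or blank.
def path_or_default(configured, default)
  value = configured.to_s.strip
  value.empty? ? default : value
end

# Working directory: taken from the WORKING_PATH variable when it is set.
# `env` can be ENV or any Hash-like object, which keeps this easy to test.
def working_path(env = ENV)
  path_or_default(env['WORKING_PATH'], DEFAULT_WORKING_PATH)
end

# Import directory: taken from the initializer setting when one is given.
def import_path(configured = nil)
  path_or_default(configured, DEFAULT_IMPORT_PATH)
end
```

Under these assumptions, `working_path({})` falls back to `/tmp/uploads` and `import_path('imports')` keeps the configured value, which matches the two cases described in the notes.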

@bwatson78
Contributor

@rotated8 Knowing all of the info above, should I move forward with bringing both to /tmp in Arch as a trial run?

@eporter23
Contributor Author

eporter23 commented Mar 2, 2023

@bwatson78 Tested the work in Curate v2.7.0 by running an export from test and importing a manually uploaded zip file into arch. The import worked fine. Here's the path that was generated: Screen Shot 2023-03-02 at 1.43.21 PM.png

@eporter23
Contributor Author

@bwatson78 Related to the above comment: when I exported from test, I noticed the CSV again had a couple of those extraneous multi-valued field columns, for abstract and publisher. Screen Shot 2023-03-02 at 1.08.39 PM.png
Screen Shot 2023-03-02 at 1.08.49 PM.png

@eporter23
Contributor Author

eporter23 commented Mar 7, 2023

@bwatson78 @jcrompton42 do we need another ticket related to updating the EFS to Curate file copy job? (#1985)

@jcrompton42

I updated the file copy job so we should be good on that.

@rotated8 rotated8 removed the Software Engineering Flag work for software engineering team label Sep 11, 2023
@eporter23
Contributor Author

eporter23 commented Oct 4, 2023

Notes from Slack discussion 10/4/23: this is implemented in prod, but each release has its own tmp/imports folder, so every new release leaves files behind in the previous release's imports folder. In other words, each deploy to prod creates a separate release with its own /tmp directory. Keeping this open until we determine next steps, which will be tracked in a separate ticket.
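
As a starting point for that follow-up, here is a sketch that lists the imports folders left behind by earlier releases. It assumes a layout of `<deploy_root>/releases/<release_name>/tmp/imports`, which is not confirmed anywhere in this thread, and the method name is hypothetical. It only reports candidates and deletes nothing.

```ruby
# ASSUMED layout (not confirmed in this thread):
#   <deploy_root>/releases/<release_name>/tmp/imports
#
# Hypothetical helper: returns the imports directories that belong to
# releases other than `current_release`, sorted by path.
def leftover_import_dirs(deploy_root, current_release:)
  pattern = File.join(deploy_root, 'releases', '*', 'tmp', 'imports')
  Dir.glob(pattern)
     .select { |path| File.directory?(path) }
     .reject do |path|
       # path is .../releases/<release_name>/tmp/imports, so the release
       # name is the directory two levels above it.
       release_name = File.basename(File.dirname(File.dirname(path)))
       release_name == current_release
     end
     .sort
end
```

The returned list could be reviewed by hand or passed to whatever cleanup job is eventually chosen; how long old releases are kept on disk would determine whether this is needed at all.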


4 participants