
allowing multiple pack storage locations #123

@zhubonan


One problem I face with my current AiiDA-based workflow is the growing size of the repository versus the finite size of the fast SSD storage. The SSD can fill up quite quickly if I have to run a few "large" calculations for which a lot of data is needed during post-processing and is critical for provenance. In theory, most of the files stored by AiiDA are not frequently accessed and are perfectly fine sitting on a slow storage location, e.g. a spinning disk or an NFS mount. On the other hand, having the whole repository on a slow storage location can slow down the daemon and workflows.

I think this package can give a natural solution to this problem. Here, the loose "objects" can be written onto a fast-to-write disk. Read-only access to the "fully" packed packs no longer benefits from fast disk speed, so they can be moved to slow storage if needed, e.g.:

  • loose files -> objectstore folder on fast SSD
  • not fully packed pack file -> objectstore folder on fast SSD
  • full pack file with only read access -> additional folders on slow storage location

At the moment, all of the packs (named by integer numbers) are stored under the packs folder. Would it be possible to allow multiple storage locations to be used (for the fully "packed" ones)? I think it should just be a matter of iterating over the storage locations and checking whether the file exists, or a dictionary mapping pack ids to their locations could be built when the Container class is instantiated to reduce the overhead.
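To make the two options concrete, here is a minimal sketch of what I have in mind. It assumes only that each pack is a file named by its integer id inside a pack folder; `find_pack` and `build_pack_index` are hypothetical names for illustration and are not part of the package's current API.

```python
from pathlib import Path
from typing import Dict, Sequence


def find_pack(pack_id: int, pack_dirs: Sequence[Path]) -> Path:
    """Option 1: check each storage location in order on every lookup."""
    for pack_dir in pack_dirs:
        candidate = pack_dir / str(pack_id)
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"pack {pack_id} not found in any storage location")


def build_pack_index(pack_dirs: Sequence[Path]) -> Dict[int, Path]:
    """Option 2: scan all locations once and cache pack id -> file path.

    Earlier directories take precedence, so the fast location should be
    listed first.
    """
    index: Dict[int, Path] = {}
    for pack_dir in pack_dirs:
        if not pack_dir.is_dir():
            continue  # e.g. the slow storage is not mounted
        for entry in pack_dir.iterdir():
            name = entry.name
            # Only files whose name is a plain integer are treated as packs.
            if entry.is_file() and name.isascii() and name.isdigit():
                index.setdefault(int(name), entry)
    return index
```

The cached dictionary avoids repeated existence checks on a slow or networked file system, but it would need refreshing whenever a pack is moved from the fast location to the slow one.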

Please let me know what you think about this idea. Thanks!


Labels: wontfix (This will not be worked on)
