12 changes: 6 additions & 6 deletions .pre-commit-config.yaml
@@ -11,20 +11,20 @@ exclude: &exclude_files >
repos:

- repo: https://github.com/pre-commit/pre-commit-hooks
-rev: v4.4.0
+rev: v6.0.0
hooks:
- id: check-json
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace

- repo: https://github.com/ikamensh/flynt/
-rev: '1.0.1'
+rev: '1.0.6'
hooks:
- id: flynt

- repo: https://github.com/executablebooks/mdformat
-rev: '0.7.17'
+rev: '1.0.0'
hooks:
- id: mdformat
additional_dependencies:
@@ -34,20 +34,20 @@ repos:
files: (?x)^(README\.md|CHANGELOG\.md)$

- repo: https://github.com/asottile/pyupgrade
-rev: v3.14.0
+rev: v3.21.2
hooks:
- id: pyupgrade
args: [--py37-plus]

- repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.8.6
+rev: v0.14.7
hooks:
- id: ruff-format
- id: ruff
args: [--fix, --exit-non-zero-on-fix, --show-fixes]

- repo: https://github.com/pre-commit/mirrors-mypy
-rev: v1.5.1
+rev: v1.19.0
hooks:
- id: mypy
additional_dependencies:
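Taken together, the bumps above leave the hook pins as follows (a sketch showing only the changed `rev` lines; hook ids and arguments are unchanged in this PR). Version bumps like these are typically generated by running `pre-commit autoupdate`:

```yaml
# Resulting hook pins after this PR (rev lines only; hooks bodies omitted).
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v6.0.0
- repo: https://github.com/ikamensh/flynt/
  rev: '1.0.6'
- repo: https://github.com/executablebooks/mdformat
  rev: '1.0.0'
- repo: https://github.com/asottile/pyupgrade
  rev: v3.21.2
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.14.7
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v1.19.0
```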
58 changes: 29 additions & 29 deletions CHANGELOG.md
@@ -2,41 +2,41 @@

## v1.4.0 (6 October 2025)

-- Add `readable`, `writable`, and `closed` properties to stream classes for TextIOWrapper compatibility [\[1c73d64\]](https://github.com/aiidateam/disk-objectstore/commit/1c73d64137e1b093918337609cb6c8a6dece4a7b)
+- Add `readable`, `writable`, and `closed` properties to stream classes for TextIOWrapper compatibility [[1c73d64]](https://github.com/aiidateam/disk-objectstore/commit/1c73d64137e1b093918337609cb6c8a6dece4a7b)

## v1.3.0 (17 April 2025)

-- Change API of `database.get_session` to always raise an error [\[6686ad0\]](https://github.com/aiidateam/disk-objectstore/commit/6686ad0c3280bf90e1954b3b8052ec999e8532be)
+- Change API of `database.get_session` to always raise an error [[6686ad0]](https://github.com/aiidateam/disk-objectstore/commit/6686ad0c3280bf90e1954b3b8052ec999e8532be)

-- Add support for Python 3.13 [\[9b02a50\]](https://github.com/aiidateam/disk-objectstore/commit/9b02a50360749db1ea28ebe20661bf074d6c63a0)
+- Add support for Python 3.13 [[9b02a50]](https://github.com/aiidateam/disk-objectstore/commit/9b02a50360749db1ea28ebe20661bf074d6c63a0)

-- Properly close SQL connections that led to open file descriptors in Python3.13 [\[6686ad0\]](https://github.com/aiidateam/disk-objectstore/commit/6686ad0c3280bf90e1954b3b8052ec999e8532be) and [\[f5eed0f\]](https://github.com/aiidateam/disk-objectstore/commit/f5eed0f1afd1576f17e5d71d31df9717041fc9f3)
+- Properly close SQL connections that led to open file descriptors in Python3.13 [[6686ad0]](https://github.com/aiidateam/disk-objectstore/commit/6686ad0c3280bf90e1954b3b8052ec999e8532be) and [[f5eed0f]](https://github.com/aiidateam/disk-objectstore/commit/f5eed0f1afd1576f17e5d71d31df9717041fc9f3)

## v1.2.0 (26 September 2024)

This only enforces proper semantic versioning as the last release added a new functionality. No changes have been added.

## v1.1.1 (19 September 2024)

-- Added progress bar functionality for repack and pack_all_loose [\[737f9c7\]](https://github.com/aiidateam/disk-objectstore/commit/737f9c71151bf7ac297c6431688b4a75eac91b7c)
+- Added progress bar functionality for repack and pack_all_loose [[737f9c7]](https://github.com/aiidateam/disk-objectstore/commit/737f9c71151bf7ac297c6431688b4a75eac91b7c)

## v1.1.0 (7 March 2024)

### Features

-- Add functionality to easily create a container backup [\[23c784a\]](https://github.com/aiidateam/disk-objectstore/commit/23c784a221954a1518a3e35affdec53681f809b7)
+- Add functionality to easily create a container backup [[23c784a]](https://github.com/aiidateam/disk-objectstore/commit/23c784a221954a1518a3e35affdec53681f809b7)

## v1.0.0 (September 2023)

### Features

-- Add support for `whence=2` in `PackedObjectReader.seek` [\[5515ab6\]](https://github.com/aiidateam/disk-objectstore/commit/5515ab6d75581b36ecb3e0b8ff37407e05abefda)
-- Add support for changing compression when repacking, and add auto compression heuristics [\[599e87c\]](https://github.com/aiidateam/disk-objectstore/commit/599e87c852427e02062f04f5f3d2276013410710)
-- Improve efficiency when accessing packed compressed objects [\[10edd63\]](https://github.com/aiidateam/disk-objectstore/commit/10edd6395455d7c59361e608396b672289d8de58)
+- Add support for `whence=2` in `PackedObjectReader.seek` [[5515ab6]](https://github.com/aiidateam/disk-objectstore/commit/5515ab6d75581b36ecb3e0b8ff37407e05abefda)
+- Add support for changing compression when repacking, and add auto compression heuristics [[599e87c]](https://github.com/aiidateam/disk-objectstore/commit/599e87c852427e02062f04f5f3d2276013410710)
+- Improve efficiency when accessing packed compressed objects [[10edd63]](https://github.com/aiidateam/disk-objectstore/commit/10edd6395455d7c59361e608396b672289d8de58)

### Changes

-- A number of API methods changed the return type from bare dictionaries to dataclass instances [\[7a63462\]](https://github.com/aiidateam/disk-objectstore/commit/7a634626ea3e5f35aa3cdd458daf9d8b825d759a)
+- A number of API methods changed the return type from bare dictionaries to dataclass instances [[7a63462]](https://github.com/aiidateam/disk-objectstore/commit/7a634626ea3e5f35aa3cdd458daf9d8b825d759a)

- `Container.get_object_stream_and_meta -> ObjectMeta`
- `Container.get_objects_meta -> ObjectMeta`
@@ -47,43 +47,43 @@ This only enforces proper semantic versioning as the last release added a new fu

The dataclasses are importable from `disk_objectstore.dataclasses`.

-- A number of API methods replaced using `os.path` with `str` paths, for `pathlib.Path` [\[df96142\]](https://github.com/aiidateam/disk-objectstore/commit/df9614236b7d420fb610313d70ffae51e7aead75)
+- A number of API methods replaced using `os.path` with `str` paths, for `pathlib.Path` [[df96142]](https://github.com/aiidateam/disk-objectstore/commit/df9614236b7d420fb610313d70ffae51e7aead75)
The following methods now return a `pathlib.Path` instance:

- `Container.get_folder`
- `LazyOpener.path`

-- Various improvements to docs and code [\[5ba9316\]](https://github.com/aiidateam/disk-objectstore/commit/5ba93162cd49d9b1ca7149c502349bfb06833255)
+- Various improvements to docs and code [[5ba9316]](https://github.com/aiidateam/disk-objectstore/commit/5ba93162cd49d9b1ca7149c502349bfb06833255)

### Devops

-- Moving documentation to `sphinx+myst` [\[2002f3c\]](https://github.com/aiidateam/disk-objectstore/commit/2002f3c3ec07f7ff46a04df293c8c9a7dff4db6a)
-- Adopt PEP 621 and move build spec to `pyproject.toml` [\[4bd0c4e\]](https://github.com/aiidateam/disk-objectstore/commit/4bd0c4e01eaf3c149d4e11921b7ff4d42a5d5da5)
-- Make types more permissive [\[c012056\]](https://github.com/aiidateam/disk-objectstore/commit/c0120568a992b41a55b325f3217d4902b5281070)
+- Moving documentation to `sphinx+myst` [[2002f3c]](https://github.com/aiidateam/disk-objectstore/commit/2002f3c3ec07f7ff46a04df293c8c9a7dff4db6a)
+- Adopt PEP 621 and move build spec to `pyproject.toml` [[4bd0c4e]](https://github.com/aiidateam/disk-objectstore/commit/4bd0c4e01eaf3c149d4e11921b7ff4d42a5d5da5)
+- Make types more permissive [[c012056]](https://github.com/aiidateam/disk-objectstore/commit/c0120568a992b41a55b325f3217d4902b5281070)

### Dependencies

-- Add Python 3.11 support [\[afdae26\]](https://github.com/aiidateam/disk-objectstore/commit/afdae261a5849e994b5920ca07665fc6a19f3852)
-- Unpin `sqlalchemy` adding support for `>=1.4.22` [\[a2a987f\]](https://github.com/aiidateam/disk-objectstore/commit/a2a987f02a128b7cc265982e102d210e6e17d6f6)
-- Removed uneeded `ablog` dependencies [\[8165f58\]](https://github.com/aiidateam/disk-objectstore/commit/8165f58fefdd40b55555eef9a2d40ee280593232)
+- Add Python 3.11 support [[afdae26]](https://github.com/aiidateam/disk-objectstore/commit/afdae261a5849e994b5920ca07665fc6a19f3852)
+- Unpin `sqlalchemy` adding support for `>=1.4.22` [[a2a987f]](https://github.com/aiidateam/disk-objectstore/commit/a2a987f02a128b7cc265982e102d210e6e17d6f6)
+- Removed uneeded `ablog` dependencies [[8165f58]](https://github.com/aiidateam/disk-objectstore/commit/8165f58fefdd40b55555eef9a2d40ee280593232)

## v0.6.0 (September 2021)

- ⬆️ UPGRADE: Remove Python support for 3.5 and 3.6, and add support for 3.9.
-- ⬆️ UPGRADE: SQLAlchemy v1.4 (with v2 API) [\[#114\]](https://github.com/aiidateam/disk-objectstore/pull/114)
-- ✨ NEW: Add basic CLI [\[#117\]](https://github.com/aiidateam/disk-objectstore/pull/117) (see README.md for details)
-- 🔧 MAINTAIN: Add type annotations and mypy type checking [\[#113\]](https://github.com/aiidateam/disk-objectstore/pull/113)
+- ⬆️ UPGRADE: SQLAlchemy v1.4 (with v2 API) [[#114]](https://github.com/aiidateam/disk-objectstore/pull/114)
+- ✨ NEW: Add basic CLI [[#117]](https://github.com/aiidateam/disk-objectstore/pull/117) (see README.md for details)
+- 🔧 MAINTAIN: Add type annotations and mypy type checking [[#113]](https://github.com/aiidateam/disk-objectstore/pull/113)

## v0.5.0 (November 2020)

-- Various general (but very important) speed improvements [\[#96\]](https://github.com/aiidateam/disk-objectstore/pull/96) [\[#102\]](https://github.com/aiidateam/disk-objectstore/pull/102)
-- Add callbacks to a number of functions (e.g. export, add_objects_to_pack, ... to allow showing progress bars or similar indicators [\[#96\]](https://github.com/aiidateam/disk-objectstore/pull/96)
-- Implement repacking (at least when not changing hashing or compression) [\[#96\]](https://github.com/aiidateam/disk-objectstore/pull/96)
-- Remove `export` function, implement `import_objects` function instead, to be called on the other side (it's more efficient) [\[#96\]](https://github.com/aiidateam/disk-objectstore/pull/96)
-- Add support for VACUUMing operations on the SQLite database (very important for efficiency) [\[#96\]](https://github.com/aiidateam/disk-objectstore/pull/96)
-- Add support for multiple hashing algorithms [\[#96\]](https://github.com/aiidateam/disk-objectstore/pull/96)
-- Add concept of (unique) `container_id` [\[#97\]](https://github.com/aiidateam/disk-objectstore/pull/97)
-- Generalize the compression algorithm implementation, and multiple algorithms are supported now [\[#99\]](https://github.com/aiidateam/disk-objectstore/pull/99)
+- Various general (but very important) speed improvements [[#96]](https://github.com/aiidateam/disk-objectstore/pull/96) [[#102]](https://github.com/aiidateam/disk-objectstore/pull/102)
+- Add callbacks to a number of functions (e.g. export, add_objects_to_pack, ... to allow showing progress bars or similar indicators [[#96]](https://github.com/aiidateam/disk-objectstore/pull/96)
+- Implement repacking (at least when not changing hashing or compression) [[#96]](https://github.com/aiidateam/disk-objectstore/pull/96)
+- Remove `export` function, implement `import_objects` function instead, to be called on the other side (it's more efficient) [[#96]](https://github.com/aiidateam/disk-objectstore/pull/96)
+- Add support for VACUUMing operations on the SQLite database (very important for efficiency) [[#96]](https://github.com/aiidateam/disk-objectstore/pull/96)
+- Add support for multiple hashing algorithms [[#96]](https://github.com/aiidateam/disk-objectstore/pull/96)
+- Add concept of (unique) `container_id` [[#97]](https://github.com/aiidateam/disk-objectstore/pull/97)
+- Generalize the compression algorithm implementation, and multiple algorithms are supported now [[#99]](https://github.com/aiidateam/disk-objectstore/pull/99)

## v0.4.0 (20 July 2020)

4 changes: 2 additions & 2 deletions disk_objectstore/cli.py
@@ -167,12 +167,12 @@ def optimize(dostore: ContainerContext, non_interactive: bool, compress: bool, v
if not non_interactive:
click.confirm('Is this the only process accessing the container?', abort=True)
size = sum(f.stat().st_size for f in dostore.path.glob('**/*') if f.is_file())
-click.echo(f'Initial container size: {round(size/1000, 2)} Mb')
+click.echo(f'Initial container size: {round(size / 1000, 2)} Mb')
with dostore.container as container:
container.pack_all_loose(compress=compress)
container.clean_storage(vacuum=vacuum)
size = sum(f.stat().st_size for f in dostore.path.glob('**/*') if f.is_file())
-click.echo(f'Final container size: {round(size/1000, 2)} Mb')
+click.echo(f'Final container size: {round(size / 1000, 2)} Mb')


@main.command('backup')
24 changes: 12 additions & 12 deletions disk_objectstore/container.py
@@ -1548,9 +1548,9 @@ def add_streamed_objects_to_pack( # pylint: disable=too-many-locals, too-many-b
operations! (See e.g. the `import_files()` method).
:return: a list of object hash keys
"""
-assert isinstance(
-compress, bool
-), 'Only True of False are valid `compress` modes when adding direclty to a pack'
+assert isinstance(compress, bool), (
+'Only True of False are valid `compress` modes when adding direclty to a pack'
+)
yield_per_size = 1000
hashkeys: list[str] = []

@@ -1875,9 +1875,9 @@ def loosen_object(self, hashkey):
# This always rewrites it as loose
written_hashkey = self.add_streamed_object(stream)

-assert (
-written_hashkey == hashkey
-), 'Mismatch in the hashkey when rewriting an existing object as loose! {written_hashkey} vs {hashkey}'
+assert written_hashkey == hashkey, (
+'Mismatch in the hashkey when rewriting an existing object as loose! {written_hashkey} vs {hashkey}'
+)
return self._get_loose_path_from_hashkey(hashkey)

def _vacuum(self) -> None:
@@ -2546,14 +2546,14 @@ def repack_pack( # pylint: disable=too-many-branches,too-many-statements,too-ma
In case of "close", the value is None.
return value of the callback function is ignored.
"""
-assert (
-pack_id != self._REPACK_PACK_ID
-), f"The specified pack_id '{pack_id}' is invalid, it is the one used for repacking"
+assert pack_id != self._REPACK_PACK_ID, (
+f"The specified pack_id '{pack_id}' is invalid, it is the one used for repacking"
+)

# Check that it does not exist
-assert not self._get_pack_path_from_pack_id(
-self._REPACK_PACK_ID, allow_repack_pack=True
-).exists(), f"The repack pack '{self._REPACK_PACK_ID}' already exists, probably a previous repacking aborted?"
+assert not self._get_pack_path_from_pack_id(self._REPACK_PACK_ID, allow_repack_pack=True).exists(), (
+f"The repack pack '{self._REPACK_PACK_ID}' already exists, probably a previous repacking aborted?"
+)

session = self._get_operation_session()

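The mechanical change running through `container.py` (and the example scripts below) is ruff-format's layout for asserts with long messages: the message, not the condition, is wrapped in parentheses. A minimal sketch of the two spellings, using a hypothetical `check_hashkey` helper (unlike the source line it mimics, the message here carries its `f` prefix so the placeholders actually interpolate):

```python
def check_hashkey(written_hashkey: str, hashkey: str) -> None:
    # Pre-ruff-format layout: the *condition* is parenthesized so the long
    # message can sit on its own line.
    assert (
        written_hashkey == hashkey
    ), f'Mismatch in the hashkey! {written_hashkey} vs {hashkey}'

    # ruff-format layout: the condition stays on the assert line and the
    # long *message* is parenthesized instead.
    assert written_hashkey == hashkey, (
        f'Mismatch in the hashkey! {written_hashkey} vs {hashkey}'
    )
```

Both forms raise the same `AssertionError`; ruff-format rewrites the first into the second without changing behavior.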
10 changes: 5 additions & 5 deletions disk_objectstore/examples/example_objectstore.py
@@ -71,7 +71,7 @@ def main(
files = {}

start_counts = container.count_objects()
-print(f"Currently known objects: {start_counts['packed']} packed, {start_counts['loose']} loose")
+print(f'Currently known objects: {start_counts["packed"]} packed, {start_counts["loose"]} loose')
print('Pack objects on disk:', start_counts['pack_files'])

print(f'Generating {num_files} files in memory...')
@@ -95,9 +95,9 @@

# Check that no loose files were created
counts = container.count_objects()
-assert (
-counts['loose'] == start_counts['loose']
-), f"Mismatch (loose in packed case): {start_counts['loose']} != {counts['loose']}"
+assert counts['loose'] == start_counts['loose'], (
+f'Mismatch (loose in packed case): {start_counts["loose"]} != {counts["loose"]}'
+)
## Cannot do this with the hash key implenentation - I might have stored the same object twice
# assert counts['packed'
# ] == start_counts['packed'] + num_files, 'Mismatch (packed in packed case): {} + {} != {}'.format(
@@ -156,7 +156,7 @@ def main(
# Check that all loose files are gone
counts = container.count_objects()
loose_folder = container._get_loose_folder() # pylint: disable=protected-access
-assert not counts['loose'], 'loose objects left: ' f'{os.listdir(loose_folder)}'
+assert not counts['loose'], f'loose objects left: {os.listdir(loose_folder)}'
## I cannot do this because I could have overlap if the object is identical and has the same hash key
# assert counts['packed'] == start_counts['packed'] + start_counts[
# 'loose'] + num_files, 'Mismatch (post-pack): {} + {} + {} != {}'.format(
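The other recurring rewrite in the example scripts is ruff-format's quote normalization inside f-strings: single quotes on the outside, double quotes for the subscripts inside the braces. A small sketch (with made-up counts) showing that the two spellings produce identical strings:

```python
counts = {'packed': 3, 'loose': 1}

# Older spelling: double quotes outside so the subscripts can use single quotes.
before = f"Currently known objects: {counts['packed']} packed, {counts['loose']} loose"

# ruff-format's preference: single quotes outside, double quotes in the braces.
# This is valid on all supported Pythons, since the inner quote character
# simply differs from the outer one.
after = f'Currently known objects: {counts["packed"]} packed, {counts["loose"]} loose'

assert before == after  # only the quoting changed, not the result
```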
20 changes: 10 additions & 10 deletions disk_objectstore/examples/profile_zeros.py
@@ -32,7 +32,7 @@ def main_run(container, size_gb, compress_packs):
size_bytes = size_gb * 1024 * 1024 * 1024

start_counts = container.count_objects()
-print(f"Currently known objects: {start_counts['packed']} packed, {start_counts['loose']} loose")
+print(f'Currently known objects: {start_counts["packed"]} packed, {start_counts["loose"]} loose')
print('Pack objects on disk:', start_counts['pack_files'])

zero_stream = ZeroStream(length=size_bytes)
@@ -44,12 +44,12 @@

# Check that no loose files were created
counts = container.count_objects()
-assert (
-counts['loose'] == start_counts['loose']
-), f"Mismatch (loose in packed case): {start_counts['loose']} != {counts['loose']}"
-assert (
-counts['packed'] == start_counts['packed'] + 1
-), f"Mismatch (packed in packed case): {start_counts['packed']} + 1 != {counts['packed']}"
+assert counts['loose'] == start_counts['loose'], (
+f'Mismatch (loose in packed case): {start_counts["loose"]} != {counts["loose"]}'
+)
+assert counts['packed'] == start_counts['packed'] + 1, (
+f'Mismatch (packed in packed case): {start_counts["packed"]} + 1 != {counts["packed"]}'
+)

# print container size info
size_info = container.get_total_size()
@@ -167,9 +167,9 @@ def main(size_gb, path, clear, check_memory_measurement, with_line_profiler, com
interval=memory_check_interval,
)
# Check that it's not an empty list
-assert (
-memory_report
-), f'>> Process too fast for checking memory usage with interval {memory_check_interval} s!!!'
+assert memory_report, (
+f'>> Process too fast for checking memory usage with interval {memory_check_interval} s!!!'
+)
print(
f'>> Max memory usage (check interval {memory_check_interval} s, '
f'{len(memory_report)} checks performed): {max(memory_report):.3f} MB'