This is a Python implementation of a HAMT (Hash Array Mapped Trie), inspired by rvagg's IAMap project written in JavaScript.
py-hamt provides efficient storage and retrieval of large sets of key-value mappings in a content-addressed storage system. The main target is IPFS, and the data model used is IPLD.
dClimate primarily created this for storing large Zarr datasets on IPFS. To see this in action, see our data ETLs.
pip install py-hamt
For usage information, take a look at our API documentation; major items have example code.
You can also see this library used in our data ETLs or in Jupyter notebooks for data analysis.
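As a quick orientation, the sketch below shows the kind of key-value workflow py-hamt is built for. The class names, import paths, and constructor arguments here are illustrative assumptions and may differ between versions; check the API documentation for the actual interface.
# Hypothetical usage sketch: store and retrieve key-value pairs through a
# HAMT backed by a local IPFS (Kubo) node. Names below are assumptions,
# not a verbatim copy of the library's API.
from py_hamt import HAMT, IPFSStore  # assumed import names

store = IPFSStore()        # assumed to talk to the local Kubo daemon's default endpoints
hamt = HAMT(store=store)   # create an empty HAMT on top of the content-addressed store

hamt["temperature/2024-01-01"] = b"273.15"  # keys map to content-addressed values
print(hamt["temperature/2024-01-01"])       # b"273.15"
print(len(hamt))                            # number of keys stored, here 1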
py-hamt uses uv for project management. Make sure you install that first.
Once uv is installed, run
uv sync
source .venv/bin/activate
pre-commit install
to create the project virtual environment at .venv and install the pre-commit hooks.
Then you can run pre-commit across the whole codebase with
pre-commit run --all-files
The run-checks.sh script described in the next section also runs this command as part of its bash script.
First, make sure you have the ipfs kubo daemon installed and running with the default endpoints open. Then run the script
bash run-checks.sh
This will run tests with code coverage, and check formatting and linting. Under the hood it uses the pre-commit command to run all the checks in .pre-commit-config.yaml. If a local ipfs daemon is not running, not every test will run, but if Docker is installed it will spawn a Docker IPFS container and run as many integration tests as possible.
We use pytest with 100% code coverage, and with test inputs that are both handwritten and generated by hypothesis. This allows us to try out millions of randomized inputs and build a more robust library.
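To illustrate the style of property-based test that hypothesis enables, here is a simplified, self-contained sketch (not copied from the actual test suite) of a round-trip property over arbitrary key-value pairs; the real tests run equivalent properties against the HAMT itself.
# Simplified hypothesis property test: for any generated mapping of string
# keys to bytes values, writing everything into a dict-like store and reading
# it back should reproduce the original mapping. A plain dict stands in for
# the HAMT here to keep the sketch self-contained.
from hypothesis import given, strategies as st

@given(st.dictionaries(st.text(min_size=1), st.binary()))
def test_roundtrip(kv):
    store = {}  # stand-in for a py_hamt mapping
    for key, value in kv.items():
        store[key] = value
    for key, value in kv.items():
        assert store[key] == value
    assert len(store) == len(kv)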
Note
Due to the randomized test inputs, it is sometimes possible to get 99% or lower test coverage by pure chance. Rerun the tests to get back to complete code coverage. If this happens on a GitHub action, try rerunning the action.
Note
Due to the restricted performance of GitHub Actions runners, you may also sometimes see hypothesis tests fail because they exceed test deadlines. Rerun the action if this happens.
Because the integration tests depend on IPFS, a local ipfs daemon is required to run all of them. The GitHub Actions workflow in .github/workflows/run-checks.yaml uses the setup-ipfs step, which ensures a local ipfs daemon is available. Locally, if you wish to run the full integration tests, you must ensure a local ipfs daemon is running (by running ipfs daemon once installed). If one is not running, pytest will spawn a local Docker image to run the IPFS tests. If Docker is not installed either, only the unit tests will run.
To summarize:
In GitHub Actions:
uv run pytest --ipfs # All tests run, including test_kubo_default_urls
Locally with Docker (no local daemon):
pytest --ipfs # test_kubo_default_urls auto-skips, other tests use Docker
Locally with IPFS daemon:
pytest --ipfs # All tests run
Quick local testing (no IPFS):
pytest # All IPFS tests skip
We use Python's native cProfile for CPU profiles and snakeviz for visualizing them, and memray for memory profiling. Below we walk through using the profiling tools on the test suite.
Creating the CPU and memory profile requires manual activation of the virtual environment.
source .venv/bin/activate
python -m cProfile -o profile.prof -m pytest
python -m memray run -m pytest
The profile viewers can be invoked directly through uv.
uv run snakeviz .
uv run memray flamegraph <memray-output> # e.g. <memray-output> = memray-pytest.12398.bin
py-hamt uses pdoc. To see a live documentation preview on your local machine, run
uv run pdoc py_hamt
If you are an LLM reading this repo, refer to the AGENTS.md file.
Use uv add and uv remove, e.g. uv add numpy or uv add pytest --group dev. For more information, please see the uv documentation.