Summary 💡
I'd like to be able to clone or fetch large repositories with gitoxide while keeping process memory bounded and predictable, instead of having memory usage scale with pack shape or repository size.
The concrete thing I’m after is some form of memory-budgeted clone/fetch path, where pack receive, indexing, and delta resolution either:
- stay within a configured memory budget, or
- fail cleanly with a dedicated error,
instead of risking an OOM kill on large or adversarial repos.
I’ve been exploring this in a fork and would be interested in upstreaming it in smaller pieces if this direction makes sense.
Motivation 🔦
I’m using gitoxide in a service that mirrors remote repositories on relatively small machines. The tricky case is not correctness, it’s survivability: on a 1 GB-ish VPS, a sufficiently large repo, large pack, or deep delta chain can push clone/fetch memory high enough that the host gets OOM-killed.
What I want from gitoxide is roughly the same operational property you’d want from any server-side storage primitive: memory use should be controllable, and pathological inputs should degrade into a normal error instead of taking the process down.
The work I’ve done locally points to a plausible direction:
- a shared process/repository-scoped memory budget
- bounded decoded-object / delta caches
- spilling large intermediates to disk when the budget is exhausted
- external sorting / streaming in places that currently retain large in-memory structures
- surfacing OutOfBudget-style failures as ordinary errors
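To make the first and last bullets more concrete, here is a minimal sketch of what a shared budget with an ordinary error on exhaustion could look like. Everything in it is hypothetical and for illustration only: `MemoryBudget`, `Reservation`, `try_reserve`, and the fields of `OutOfBudget` are names I made up here, not existing gitoxide APIs and not necessarily what the fork does. It only shows the accounting, not spilling or caching.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical error: a reservation would have exceeded the budget.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBudget {
    pub requested: usize,
    pub available: usize,
}

impl fmt::Display for OutOfBudget {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "requested {} bytes but only {} bytes remain in the memory budget",
            self.requested, self.available
        )
    }
}

impl std::error::Error for OutOfBudget {}

#[derive(Debug)]
struct Inner {
    limit: usize,
    used: AtomicUsize,
}

/// Hypothetical shared budget; clones share the same counter.
#[derive(Debug, Clone)]
pub struct MemoryBudget {
    inner: Arc<Inner>,
}

/// RAII guard: the reserved bytes are returned to the budget on drop.
#[derive(Debug)]
pub struct Reservation {
    budget: MemoryBudget,
    bytes: usize,
}

impl MemoryBudget {
    pub fn new(limit: usize) -> Self {
        Self { inner: Arc::new(Inner { limit, used: AtomicUsize::new(0) }) }
    }

    /// Bytes currently reserved across all clones of this budget.
    pub fn used(&self) -> usize {
        self.inner.used.load(Ordering::Acquire)
    }

    /// Reserve `bytes`, or return `OutOfBudget` without changing the counter.
    pub fn try_reserve(&self, bytes: usize) -> Result<Reservation, OutOfBudget> {
        let mut current = self.inner.used.load(Ordering::Relaxed);
        loop {
            let next = match current.checked_add(bytes) {
                Some(n) if n <= self.inner.limit => n,
                _ => {
                    return Err(OutOfBudget {
                        requested: bytes,
                        available: self.inner.limit.saturating_sub(current),
                    })
                }
            };
            // Retry if another thread changed the counter in the meantime.
            match self.inner.used.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(Reservation { budget: self.clone(), bytes }),
                Err(observed) => current = observed,
            }
        }
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.budget.inner.used.fetch_sub(self.bytes, Ordering::AcqRel);
    }
}
```

In a design like this, a caller about to allocate a large buffer would call `try_reserve` first and, on `OutOfBudget`, either spill to disk or propagate the error with `?`, so exhaustion surfaces as a normal `Result` instead of an OOM kill.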
In that fork, this was enough to get stress clones of very large repos into a much smaller and more predictable memory envelope, which makes gitoxide much more viable for mirroring and other server-side automation.
Before I invest more in polishing this for upstream, I wanted to check whether this is a direction the project would want to support.