Memory-bounded clone/fetch for large repositories #2527

@mjgil

Summary 💡

I'd like to be able to clone or fetch large repositories with gitoxide while keeping process memory bounded and predictable, instead of having memory usage scale with pack shape or repository size.

The concrete thing I’m after is some form of memory-budgeted clone/fetch path, where pack receive, indexing, and delta resolution either:

  1. stay within a configured memory budget, or
  2. fail cleanly with a dedicated error,

instead of risking an OOM kill on large or adversarial repos.
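To make the "stay within budget or fail cleanly" contract concrete, here is a minimal sketch of what a shared budget could look like. The names `MemoryBudget`, `Reservation`, and `OutOfBudget` are hypothetical and are not existing gitoxide APIs; this only illustrates the shape of the idea, not how the fork implements it.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical dedicated error, returned instead of letting the process grow unbounded.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBudget {
    pub requested: usize,
    pub available: usize,
}

impl fmt::Display for OutOfBudget {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "memory budget exhausted: requested {} bytes, {} available",
            self.requested, self.available
        )
    }
}

impl std::error::Error for OutOfBudget {}

/// Hypothetical byte budget shared between pack receive, indexing, and delta resolution.
#[derive(Debug)]
pub struct MemoryBudget {
    limit: usize,
    used: AtomicUsize,
}

impl MemoryBudget {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, used: AtomicUsize::new(0) })
    }

    /// Bytes currently reserved.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }

    /// Reserve `bytes`, or fail with `OutOfBudget` without changing the accounting.
    pub fn try_reserve(self: &Arc<Self>, bytes: usize) -> Result<Reservation, OutOfBudget> {
        let mut current = self.used.load(Ordering::Acquire);
        loop {
            let available = self.limit.saturating_sub(current);
            if bytes > available {
                return Err(OutOfBudget { requested: bytes, available });
            }
            // `current + bytes` cannot overflow because `bytes <= limit - current`.
            match self.used.compare_exchange_weak(
                current,
                current + bytes,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(Reservation { budget: Arc::clone(self), bytes }),
                Err(actual) => current = actual, // another thread raced us; retry
            }
        }
    }
}

/// RAII guard: the reserved bytes are returned to the budget when this is dropped.
pub struct Reservation {
    budget: Arc<MemoryBudget>,
    bytes: usize,
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.budget.used.fetch_sub(self.bytes, Ordering::AcqRel);
    }
}
```

A caller would reserve before allocating a large buffer and propagate `OutOfBudget` with `?` like any other error. Note that this only accounts for memory the caller explicitly reserves; it does not measure actual allocator usage.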

I’ve been exploring this in a fork and would be interested in upstreaming it in smaller pieces if this direction makes sense.

Motivation 🔦

I’m using gitoxide in a service that mirrors remote repositories on relatively small machines. The tricky part is not correctness but survivability: on a VPS with roughly 1 GB of RAM, a sufficiently large repo, a large pack, or a deep delta chain can push clone/fetch memory high enough that the process gets OOM-killed.

What I want from gitoxide is roughly the same operational property you’d want from any server-side storage primitive: memory use should be controllable, and pathological inputs should degrade into a normal error instead of taking the process down.

The work I’ve done locally points to a plausible direction:

  • a shared process/repository-scoped memory budget
  • bounded decoded-object / delta caches
  • spilling large intermediates to disk when the budget is exhausted
  • external sorting / streaming in places that currently retain large in-memory structures
  • surfacing OutOfBudget-style failures as ordinary errors
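As an illustration of the "spill to disk" item, here is a rough sketch of a buffer that stays in memory up to a threshold and moves to a file beyond it. `SpillBuffer`, its methods, and the caller-supplied `spill_path` are hypothetical names invented for this example; they are not part of gitoxide, and the fork's actual mechanism may look quite different.

```rust
use std::fs::{self, File};
use std::io::{self, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Hypothetical intermediate buffer: in memory up to `max_in_memory` bytes,
/// then spilled to `spill_path` for the rest of its lifetime.
pub struct SpillBuffer {
    max_in_memory: usize,
    spill_path: PathBuf,
    memory: Vec<u8>,
    file: Option<File>,
    len: u64,
}

impl SpillBuffer {
    pub fn new(max_in_memory: usize, spill_path: PathBuf) -> Self {
        Self { max_in_memory, spill_path, memory: Vec::new(), file: None, len: 0 }
    }

    pub fn is_spilled(&self) -> bool {
        self.file.is_some()
    }

    /// Total bytes appended so far, regardless of where they live.
    pub fn len(&self) -> u64 {
        self.len
    }

    /// Append a chunk, spilling everything to disk once the in-memory cap would be exceeded.
    pub fn append(&mut self, chunk: &[u8]) -> io::Result<()> {
        if self.file.is_none() && self.memory.len() + chunk.len() > self.max_in_memory {
            let mut file = File::options()
                .read(true)
                .write(true)
                .create(true)
                .truncate(true)
                .open(&self.spill_path)?;
            file.write_all(&self.memory)?;
            self.memory = Vec::new(); // release the heap allocation
            self.file = Some(file);
        }
        match self.file.as_mut() {
            Some(file) => file.write_all(chunk)?,
            None => self.memory.extend_from_slice(chunk),
        }
        self.len += chunk.len() as u64;
        Ok(())
    }

    /// Stream the full contents into `out` without materializing them in memory.
    pub fn copy_to<W: Write>(&mut self, out: &mut W) -> io::Result<u64> {
        match self.file.as_mut() {
            Some(file) => {
                file.seek(SeekFrom::Start(0))?;
                let copied = io::copy(file, out)?;
                file.seek(SeekFrom::End(0))?; // restore the append position
                Ok(copied)
            }
            None => {
                out.write_all(&self.memory)?;
                Ok(self.memory.len() as u64)
            }
        }
    }
}

impl Drop for SpillBuffer {
    fn drop(&mut self) {
        // Close the handle first, then remove the spill file on a best-effort basis.
        if self.file.take().is_some() {
            let _ = fs::remove_file(&self.spill_path);
        }
    }
}
```

In a real integration the threshold would presumably come from the shared budget rather than a fixed per-buffer cap, and the spill location would need to live somewhere sensible such as the repository's temporary directory; both are left open here.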

In that fork, this was enough to get stress clones of very large repos into a much smaller and more predictable memory envelope, which makes gitoxide much more viable for mirroring and other server-side automation.

Before I invest more in polishing this for upstream, I wanted to check whether this is a direction the project would want to support.
