@hi-wesley hi-wesley commented Nov 22, 2025

Summary

This PR adds a new merge method, core_ta, which implements Task Arithmetic in Core Space as described in:

Panariello et al., "Core Space: Composing and Editing Large Language Models in a Common Space" (NeurIPS 2025, to appear).

The method performs model merging in a shared "core space" constructed from low-rank factorizations of parameter deltas, then reconstructs the merged weights back in the original parameter space.
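
In symbols, a compact restatement of the pipeline detailed under "What this PR changes" below (writing `Vh_A_ref` as $V_{A,\mathrm{ref}}^{\top}$):

$$
\Delta_t = W_t - W_0 \approx B_t A_t, \qquad
M_t = U_{B,\mathrm{ref}}^{\top}\, \Delta_t\, V_{A,\mathrm{ref}}, \qquad
W_{\mathrm{merged}} = W_0 + U_{B,\mathrm{ref}} \Big( \sum_t M_t \Big) V_{A,\mathrm{ref}}^{\top}
$$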

What this PR changes

  • Adds mergekit/merge_methods/core_space.py implementing:

    • core_space_task_arithmetic(tensors, base_tensor, rank=16) decorated via easy_define.merge_method with:

      • name="core_ta"
      • pretty_name="Task Arithmetic in Core Space"
      • a reference URL to the Core Space paper.
    • _core_space_ta_single, which for each 2D weight matrix (sketched in code after this list):

      • Computes per-model deltas relative to base_tensor.
      • Approximates each delta with a low-rank SVD factorization (ΔW ≈ B @ A, LoRA-style).
      • Builds shared "core" bases from stacked factors (A_stack, B_stack) via SVD.
      • Projects each task into Core Space to obtain a small core matrix M_t.
      • Performs task arithmetic in Core Space (M_merged = Σ_t M_t).
      • Reconstructs the merged delta (Δ_merged = U_B_ref @ M_merged @ Vh_A_ref) and adds it back to base_tensor.
    • A fallback path for non-matrix tensors (base_tensor.ndim < 2) that uses standard task arithmetic on the deltas, to avoid unnecessary SVD.

  • Registers the method in mergekit/merge_methods/__init__.py so it is available as merge_method: core_ta in YAML configs.
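
For concreteness, here is a minimal PyTorch sketch of the per-matrix procedure described above. The function name `core_space_ta_sketch`, its signature, and the explicit `weights` list are illustrative assumptions; the actual `_core_space_ta_single` additionally handles multihead/batched slices, dtype casting, and the 1D fallback.

```python
import torch


def core_space_ta_sketch(
    base: torch.Tensor,                 # (m, n) base weight W_0
    task_weights: list[torch.Tensor],   # per-model weights W_t, each (m, n)
    weights: list[float],               # task-arithmetic coefficients
    rank: int = 16,
) -> torch.Tensor:
    # 1. Per-task deltas and their rank-r LoRA-style factors: ΔW_t ≈ B_t @ A_t.
    deltas, b_factors, a_factors = [], [], []
    for w_t in task_weights:
        delta = (w_t - base).float()
        u, s, vh = torch.linalg.svd(delta, full_matrices=False)
        r = min(rank, s.shape[0])
        b_factors.append(u[:, :r] * s[:r])  # (m, r)
        a_factors.append(vh[:r, :])         # (r, n)
        deltas.append(delta)

    # 2. Shared "core" bases from the stacked factors.
    b_stack = torch.cat(b_factors, dim=1)                            # (m, T*r)
    a_stack = torch.cat(a_factors, dim=0)                            # (T*r, n)
    u_b_ref, _, _ = torch.linalg.svd(b_stack, full_matrices=False)   # (m, k_B)
    _, _, vh_a_ref = torch.linalg.svd(a_stack, full_matrices=False)  # (k_A, n)

    # 3. Project each task delta into Core Space and do task arithmetic there.
    m_merged = torch.zeros(u_b_ref.shape[1], vh_a_ref.shape[0])
    for coeff, delta in zip(weights, deltas):
        m_t = u_b_ref.T @ delta @ vh_a_ref.T  # small core matrix M_t (k_B, k_A)
        m_merged += coeff * m_t

    # 4. Reconstruct the merged delta in the original space and add it back.
    delta_merged = u_b_ref @ m_merged @ vh_a_ref
    return (base.float() + delta_merged).to(base.dtype)
```

Stacking the per-task factors before the second SVD is what makes the bases shared across tasks, so the per-task core matrices `M_t` live in the same coordinates and can be combined by simple summation.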

Usage example

A minimal YAML example:

```yaml
models:
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters:
      weight: 1.0
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters:
      weight: 1.0

base_model: mistralai/Mistral-7B-Instruct-v0.2

merge_method: core_ta

parameters:
  rank: 8

dtype: bfloat16
tokenizer_source: base
```
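
Assuming the standard mergekit CLI entry point, a config like the one above would then be run with `mergekit-yaml <config>.yaml <output-directory>` (paths illustrative).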

---

> [!NOTE]
> Introduces `core_ta` merge method performing task arithmetic in a shared core space via low-rank SVD factorization, with 1D fallback and multihead support, and registers it for use.
> 
> - **Merge Methods**:
>   - **`core_ta` (Task Arithmetic in Core Space)** in `mergekit/merge_methods/core_space.py`:
>     - For 2D weights: factors per-model deltas (`ΔW`) with SVD into LoRA-style `B @ A`, derives shared core bases, sums task matrices in core space, and reconstructs merged weights.
>     - Supports batched/multihead tensors by per-slice processing; parameter `rank` controls truncation.
>     - Falls back to simple delta summation for non-matrix tensors.
>   - **Registration**: imports `core_space` in `mergekit/merge_methods/__init__.py` to expose `merge_method: core_ta`.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit fc2e1191d865d9442bdafd2c07ab658be7f0bc3b.</sup>

github-actions bot commented Nov 22, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@hi-wesley
Author

I have read the CLA Document and I hereby sign the CLA
