Query vs Output subspace impact comparison

Investigate the relative impact of decomposing the Query latent space versus the Output latent space.

Decomposing the query space is an already-established technique in DeepSeek-V3.

Let's say we find that decomposing the Query space:
- Doesn't improve throughput.
- Hurts performance on the benchmark task.

(We might expect to see this given that DeepSeek-V2-Lite doesn't use a query latent space!)

If that's the case, it would be helpful context for our results.

In particular, it would be interesting to see the relative impact of the two decompositions. Does one seem more beneficial than the other?

**Approach**

- We could evaluate this in the Encoder model, the Decoder, or both.
- There are 8 possible variations, and we've already done 3.

**Tasks:**

- [ ] Some variants to compare (we already have some of these):
  1. [x] No decompositions (standard MHA)
  2. [ ] KV only
  3. [ ] Query only
  4. [ ] Output only
  5. [x] Query and KV (standard MLA)
  6. [ ] Query and Output
  7. [ ] KV and Output
  8. [x] Query, KV, and Output (MLA-o)
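
The eight variants in the checklist are the on/off combinations of three decompositions (Query, KV, Output). A minimal sketch for generating that grid and tracking which runs remain is below; the names `COMPONENTS`, `DONE`, and `enumerate_variants` are hypothetical and not taken from the project's code.

```python
from itertools import product

# Component names follow the checklist; this layout is illustrative only.
COMPONENTS = ("query", "kv", "output")

# Variants marked as done in the checklist: standard MHA, standard MLA, MLA-o.
DONE = {
    frozenset(),
    frozenset({"query", "kv"}),
    frozenset({"query", "kv", "output"}),
}


def enumerate_variants():
    """Return every on/off combination of the three decompositions."""
    variants = []
    for flags in product((False, True), repeat=len(COMPONENTS)):
        decomposed = frozenset(c for c, on in zip(COMPONENTS, flags) if on)
        variants.append({"decomposed": decomposed, "done": decomposed in DONE})
    return variants


variants = enumerate_variants()
remaining = [v for v in variants if not v["done"]]

for v in variants:
    label = ", ".join(sorted(v["decomposed"])) or "none (standard MHA)"
    print(f"[{'x' if v['done'] else ' '}] {label}")
```

The order printed here differs from the numbering in the checklist; only the set of combinations is meant to match.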

I think comparing "Query only" and "Output only" against standard MHA would be the most direct comparison.
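
As a rough sanity check before running anything, the sketch below counts attention projection weights for these three variants. It assumes each decomposition is a plain low-rank factorization of the corresponding projection through a latent of size `rank`, and it ignores biases, norms, and positional-encoding details. The dimensions (`d_model=512`, 8 heads of size 64, rank 128) are made-up placeholders, not the project's settings.

```python
def proj_params(d_in, d_out, rank=None):
    """Weights in a linear map d_in -> d_out, optionally factored through a latent."""
    if rank is None:
        return d_in * d_out
    # Down-projection to the latent, then up-projection out of it.
    return d_in * rank + rank * d_out


def attention_params(d_model, n_heads, d_head, q_rank=None, o_rank=None):
    """Projection weight count for one attention layer (K and V left undecomposed)."""
    d_attn = n_heads * d_head
    q = proj_params(d_model, d_attn, q_rank)
    kv = 2 * proj_params(d_model, d_attn)
    o = proj_params(d_attn, d_model, o_rank)
    return q + kv + o


# Hypothetical sizes, chosen only for illustration.
dims = dict(d_model=512, n_heads=8, d_head=64)

mha = attention_params(**dims)
query_only = attention_params(**dims, q_rank=128)
output_only = attention_params(**dims, o_rank=128)

print(f"standard MHA: {mha:,}")
print(f"Query only:   {query_only:,}")
print(f"Output only:  {output_only:,}")
```

With equal ranks and square projections, the two single-decomposition variants have the same weight count under these assumptions, so any difference in throughput or benchmark score would have to come from where the bottleneck sits rather than from model size. That is the question the experiment is meant to answer.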
