Conversation

@DajanaV (Contributor) commented Nov 6, 2025

Mirrored from ggml-org/llama.cpp#17063

Adds the ops needed for the new hybrid models, including Qwen3 Next and Kimi Linear.

Prerequisite to merging ggml-org/llama.cpp#16095

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Based on the analysis of PR #109 in the llama.cpp repository, this pull request introduces new mathematical operations to the GGML tensor library without impacting core inference performance.

Summary

The PR adds several new mathematical operations, including CUMSUM (cumulative sum along a dimension), TRI (triangular matrix operations), SOLVE_TRI (triangular system solver), and the unary operations EXPM1 (e^x - 1) and SOFTPLUS (log(1 + e^x)). These additions expand GGML's mathematical capabilities but do not modify existing inference pathways.
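For reference, the semantics of these ops can be illustrated in plain C. This is a sketch of the math only, under the assumption that the GGML ops follow the standard definitions of these functions; the `_ref` helper names below are hypothetical, not GGML's API.

```c
// Reference semantics of the new ops, sketched in plain C.
// Illustrative only: these are not GGML's kernels or identifiers.
#include <math.h>
#include <stdio.h>

// SOFTPLUS: log(1 + e^x), clamped for numerical stability.
static double softplus_ref(double x) {
    if (x >  20.0) return x;      // log(1 + e^x) ~= x for large x
    if (x < -20.0) return exp(x); // ~= e^x for very negative x
    return log1p(exp(x));
}

// EXPM1: e^x - 1, via libm's expm1 to avoid cancellation near 0.
static double expm1_ref(double x) {
    return expm1(x);
}

// CUMSUM: inclusive prefix sum along a 1-D array.
static void cumsum_ref(const double *src, double *dst, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc += src[i];
        dst[i] = acc;
    }
}

// SOLVE_TRI (lower-triangular case): solve L*x = b by forward
// substitution, assuming a non-singular row-major lower-triangular L.
static void solve_tri_lower_ref(const double *L, const double *b,
                                double *x, int n) {
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j) {
            s -= L[i*n + j] * x[j];
        }
        x[i] = s / L[i*n + i];
    }
}

int main(void) {
    const double v[4] = {1, 2, 3, 4};
    double c[4];
    cumsum_ref(v, c, 4);
    printf("cumsum:   %g %g %g %g\n", c[0], c[1], c[2], c[3]); // 1 3 6 10

    const double L[4] = {2, 0,
                         1, 4};   // 2x2 lower-triangular matrix
    const double b[2] = {2, 9};
    double x[2];
    solve_tri_lower_ref(L, b, x, 2);
    printf("solve:    %g %g\n", x[0], x[1]);     // x = (1, 2)
    printf("softplus: %g\n", softplus_ref(0.0)); // log 2 ~= 0.693
    printf("expm1:    %g\n", expm1_ref(1e-9));   // ~1e-9
    return 0;
}
```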

Key Findings

Performance Impact: The largest percentage change identified was an increased response time in the ggml_set_op_params_i32 function. However, this function is a utility for setting operation parameters and is not part of the critical inference path. Core functions such as llama_decode, llama_encode, and llama_tokenize remain unchanged, indicating no impact on tokens-per-second performance.
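For context, a set-op-params helper of this kind just writes a scalar into a small per-tensor parameter block that the kernel reads back later, which is why a slower response time here only affects graph construction, not per-token compute. A simplified stand-in illustrating the mechanism (the toy_ struct and sizes are illustrative assumptions, not GGML's actual layout):

```c
// Toy model of a set-op-params helper: scalar op arguments (e.g. a
// TRI "which triangle" flag) live in a fixed per-tensor byte block.
// Names and sizes here are hypothetical, not GGML's actual layout.
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OP_PARAMS_BYTES 16 // illustrative size of the parameter block

struct toy_tensor {
    uint8_t op_params[OP_PARAMS_BYTES]; // scalar args read by the kernel
};

// Store the i-th int32 parameter.
static void toy_set_op_params_i32(struct toy_tensor *t, uint32_t i, int32_t v) {
    assert((i + 1) * sizeof(int32_t) <= OP_PARAMS_BYTES);
    memcpy(t->op_params + i * sizeof(int32_t), &v, sizeof(v));
}

// Read it back, as a compute kernel would.
static int32_t toy_get_op_params_i32(const struct toy_tensor *t, uint32_t i) {
    int32_t v;
    memcpy(&v, t->op_params + i * sizeof(int32_t), sizeof(v));
    return v;
}

int main(void) {
    struct toy_tensor t = {0};
    toy_set_op_params_i32(&t, 0, 1);
    printf("param 0 = %d\n", toy_get_op_params_i32(&t, 0)); // 1
    return 0;
}
```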

Core Function Analysis: The changes do not affect any of the performance-critical components identified in the project summary:

  • Model Processing Module functions remain untouched
  • Token Processing Module is unaffected
  • Memory Management and Batch Processing modules are unchanged
  • No modifications to inference-critical paths

Power Consumption: Analysis shows minimal power consumption changes at the binary level, with new operations adding computational overhead only when explicitly used. The main affected binary is the core GGML library.

Code Structure: The implementation follows established GGML patterns by adding new operation types to enums, implementing forward computation functions, and extending backend support. The changes are well-contained within the mathematical operations framework.
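A minimal, self-contained sketch of that pattern, with hypothetical names (op_tag, compute_forward_cumsum, and the tensor1d struct are stand-ins, not the PR's actual identifiers): an enum entry is added, a forward kernel is implemented, and the dispatch switch gains a case.

```c
// Sketch of the op-registration pattern described above.
// All identifiers are illustrative, not GGML's real types.
#include <stddef.h>
#include <stdio.h>

enum op_tag { OP_NONE, OP_CUMSUM };   // 1. new enum entry

struct tensor1d {
    enum op_tag  op;
    size_t       n;
    const float *src;
    float       *dst;
};

// 2. forward computation for the new op
static void compute_forward_cumsum(struct tensor1d *t) {
    float acc = 0.0f;
    for (size_t i = 0; i < t->n; ++i) {
        acc += t->src[i];
        t->dst[i] = acc;
    }
}

// 3. dispatch extended with the new case
static void compute_forward(struct tensor1d *t) {
    switch (t->op) {
        case OP_CUMSUM: compute_forward_cumsum(t); break;
        default: break;
    }
}

int main(void) {
    const float src[3] = {1.f, 2.f, 3.f};
    float dst[3];
    struct tensor1d t = {OP_CUMSUM, 3, src, dst};
    compute_forward(&t);
    printf("%g %g %g\n", dst[0], dst[1], dst[2]); // 1 3 6
    return 0;
}
```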

Code Review: The GitHub analysis shows proper implementation of new mathematical operations with appropriate test coverage and documentation updates. No critical issues were identified in the implementation.

Conclusion: This PR represents a feature addition rather than a performance modification. The new mathematical operations expand GGML's capabilities without affecting existing inference performance, making it a low-risk enhancement to the codebase's mathematical foundation.

@DajanaV force-pushed the main branch 27 times, most recently from 6aa5dc2 to 81cedf2 on November 10, 2025 at 16:10
@DajanaV force-pushed the main branch 13 times, most recently from 20900e4 to 2e1e7c4 on November 12, 2025 at 08:11