Halo: Batch Query Processing and Optimization for Agentic Workflows

This repository contains the prototype of Halo, a novel system that unifies LLM serving with query optimization to efficiently process agentic workflows in batches.

Halo's design has three key features:

Unified Framework: Halo integrates LLM serving and query optimization into a single framework, simplifying deployment and management.
Batch Processing: The system is designed for batch processing and handles large volumes of queries efficiently by leveraging techniques such as cache reuse and prefix caching.
Query Optimization: Halo optimizes query execution to reduce latency and redundant context exchange, adapting to varying workloads and resource availability.
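To see why prefix caching pays off in batch workloads, consider many queries that share the same op prompt. The sketch below is a toy cost model of our own, not Halo's implementation. It assumes that a "token" is a whitespace-separated word and that each prompt can reuse the longest prefix already computed for an earlier prompt in the batch, similar in spirit to a radix-tree prefix cache. Real serving engines typically match prefixes at a fixed block granularity, so actual savings will differ.

```python
# Toy model of prefix caching over a batch of prompts (NOT Halo's code).
# Assumption: a "token" is a whitespace-separated word.

def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefill_cost(prompts: list[str], prefix_cache: bool) -> int:
    """Count the prompt tokens that must be freshly computed for the whole batch."""
    token_lists = [p.split() for p in prompts]
    if not prefix_cache:
        return sum(len(toks) for toks in token_lists)
    cost = 0
    cached: list[list[str]] = []
    for toks in token_lists:
        # Reuse the longest prefix already computed for an earlier prompt;
        # only the remaining tokens need fresh computation.
        reuse = max((shared_prefix_len(toks, c) for c in cached), default=0)
        cost += len(toks) - reuse
        cached.append(toks)
    return cost


instruction = "Please try to answer the question with multi-step Chain of Thought."
questions = ["What is the capital of France?", "What is 2 + 2?", "Who wrote Hamlet?"]
prompts = [f"{instruction} {q}" for q in questions]

print(prefill_cost(prompts, prefix_cache=False))  # 47
print(prefill_cost(prompts, prefix_cache=True))   # 23
```

In this example, the shared 11-word instruction (plus the common "What is") is computed once rather than repeatedly, which cuts the toy prefill work from 47 to 23 tokens.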

We hope Halo can be deployed in broader scenarios and achieve larger cost savings in the era of large generative models.

Installation

Install uv for environment management:

curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

Create the virtual environment and install Halo's dependencies:

uv venv
uv sync
source .venv/bin/activate
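After activation, you can run a quick sanity check from Python to confirm that the interpreter belongs to the virtual environment and that the `halo` package resolves. This convenience sketch is our own and uses only the standard library. The helper names `in_virtualenv` and `halo_importable` are illustrative and are not part of Halo. Run the check with the environment's `python` from the repository root.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    # In a PEP 405 virtual environment (such as one created by `uv venv`),
    # sys.prefix points at the environment and sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


def halo_importable() -> bool:
    # find_spec returns None when no top-level module named "halo" can be found.
    return importlib.util.find_spec("halo") is not None


print(f"Python {sys.version.split()[0]} at {sys.prefix}")
print(f"virtual environment active: {in_virtualenv()}")
print(f"halo importable: {halo_importable()}")
```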

Usage

Write a declarative YAML configuration file that describes your workflow's ops and how they connect. For example:

start_ops:
  - op0
end_ops:
  - op1

ops:
  op0:
    model: meta-llama/Llama-3.2-3B-Instruct
    prompt: "Please try to answer the question with multi-step Chain of Thought."
    max_tokens: 512
    input_ops: []
    output_ops:
      - op1

  op1:
    model: meta-llama/Llama-3.1-8B-Instruct
    prompt: "Please answer the question again based on the previous context and your own reasoning."
    max_tokens: 1024
    input_ops:
      - op0
    output_ops: []
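The ops in this config form a directed acyclic graph: `input_ops` and `output_ops` list each op's upstream and downstream neighbors, while `start_ops` and `end_ops` mark the graph's entry and exit points. The sketch below checks that structure and derives an execution order. It is not part of Halo. The consistency rules (edges listed on both ends, start ops without inputs, end ops without outputs) are our assumptions inferred from the example, and the `check_workflow` helper is hypothetical. The dict mirrors what a YAML loader such as PyYAML's `yaml.safe_load` would return for the file above, with the prompts omitted for brevity.

```python
from graphlib import TopologicalSorter

# The example workflow, as the dict a YAML loader would return (prompts omitted).
config = {
    "start_ops": ["op0"],
    "end_ops": ["op1"],
    "ops": {
        "op0": {"model": "meta-llama/Llama-3.2-3B-Instruct", "max_tokens": 512,
                "input_ops": [], "output_ops": ["op1"]},
        "op1": {"model": "meta-llama/Llama-3.1-8B-Instruct", "max_tokens": 1024,
                "input_ops": ["op0"], "output_ops": []},
    },
}


def check_workflow(cfg: dict) -> list[str]:
    """Validate edge consistency (assumed rules) and return a valid execution order."""
    ops = cfg["ops"]
    for name, op in ops.items():
        # Every edge must be declared on both of its endpoints.
        for src in op["input_ops"]:
            if name not in ops[src]["output_ops"]:
                raise ValueError(f"{src} -> {name} missing from {src}.output_ops")
        for dst in op["output_ops"]:
            if name not in ops[dst]["input_ops"]:
                raise ValueError(f"{name} -> {dst} missing from {dst}.input_ops")
    for name in cfg["start_ops"]:
        if ops[name]["input_ops"]:
            raise ValueError(f"start op {name} has inputs")
    for name in cfg["end_ops"]:
        if ops[name]["output_ops"]:
            raise ValueError(f"end op {name} has outputs")
    # TopologicalSorter expects {node: predecessors}; it raises CycleError on cycles.
    graph = {name: op["input_ops"] for name, op in ops.items()}
    return list(TopologicalSorter(graph).static_order())


print(check_workflow(config))  # ['op0', 'op1']
```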

Create your queries:

from halo.components import Query

queries = [
    Query(
        id=0,
        prompt="What is the capital of France?",
    ),
    # ... more queries
]

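Execute the workflow with Halo's optimizer:

from halo.optimizers import Optimizer_v

YAML_CONFIG_PATH = "path/to/workflow.yaml"  # placeholder: path to your YAML config

optimizer = Optimizer_v(YAML_CONFIG_PATH)
# With return_queries=True, execute() returns the processed queries
queries = optimizer.execute(queries, return_queries=True)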
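Batch processing here means executing the workflow op by op rather than query by query. The following is a simplified sketch of that idea under our own assumptions and does not describe how `Optimizer_v` works internally. The `run_batched` function and the `llm` callable are hypothetical, and each op's prompt is simply concatenated with upstream outputs and the question. The op definitions reuse the models and prompts from the example YAML.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical batched LLM interface: (model name, prompts) -> one output per prompt.
BatchLLM = Callable[[str, list[str]], list[str]]


def run_batched(ops: dict, questions: list[str], llm: BatchLLM) -> dict[str, list[str]]:
    """Run each op once over the whole batch, in dependency order."""
    graph = {name: op["input_ops"] for name, op in ops.items()}
    outputs: dict[str, list[str]] = {}
    for name in TopologicalSorter(graph).static_order():
        op = ops[name]
        prompts = []
        for i, question in enumerate(questions):
            # Upstream outputs for query i become this op's context.
            context = "\n".join(outputs[src][i] for src in op["input_ops"])
            prompts.append("\n".join(filter(None, [op["prompt"], context, question])))
        # One call per op: the whole batch is sent to the op's model together.
        outputs[name] = llm(op["model"], prompts)
    return outputs


# Usage with a fake model that records each batched call.
calls = []

def fake_llm(model: str, prompts: list[str]) -> list[str]:
    calls.append((model, prompts))
    short = model.split("/")[-1]
    return [f"[{short}] answer {i}" for i in range(len(prompts))]

ops = {
    "op0": {"model": "meta-llama/Llama-3.2-3B-Instruct",
            "prompt": "Please try to answer the question with multi-step Chain of Thought.",
            "input_ops": []},
    "op1": {"model": "meta-llama/Llama-3.1-8B-Instruct",
            "prompt": "Please answer the question again based on the previous context and your own reasoning.",
            "input_ops": ["op0"]},
}
out = run_batched(ops, ["What is the capital of France?", "Who wrote Hamlet?"], fake_llm)
print([(model, len(batch)) for model, batch in calls])
# [('meta-llama/Llama-3.2-3B-Instruct', 2), ('meta-llama/Llama-3.1-8B-Instruct', 2)]
```

Because ops run in dependency order, every upstream output exists before a downstream op needs it. Because each model receives a full batch in one call, queries that share an op prompt can also benefit from prefix caching.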

Citation

If you find this project useful, please consider citing our work:

@misc{shen2025batchqueryprocessingoptimization,
      title={Batch Query Processing and Optimization for Agentic Workflows}, 
      author={Junyi Shen and Noppanat Wadlom and Yao Lu},
      year={2025},
      eprint={2509.02121},
      archivePrefix={arXiv},
      primaryClass={cs.DB},
      url={https://arxiv.org/abs/2509.02121}, 
}
