
Add OpenAI Batch API support and batch_size passthrough#424

Open
m7mdhka wants to merge 4 commits into google:main from m7mdhka:fix/openai-batch-batchsize

Conversation


@m7mdhka m7mdhka commented Mar 23, 2026

Description

This PR adds true provider-native batching support for OpenAI via the OpenAI Batch API, and makes BaseLanguageModel.infer_batch() respect batch_size by passing it through to provider implementations as a hint.

Choose one: Feature

Key changes

  • BaseLanguageModel.infer_batch() now validates batch_size > 0 and forwards it into infer(..., batch_size=...) as a provider hint.
  • OpenAI: adds an OpenAI Batch API helper and wires it into the OpenAI provider behind an explicit batch config and threshold.
  • Gemini/Ollama: strips batch_size from runtime kwargs to avoid it leaking into provider payload/options.
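The three bullets above can be sketched together. This is a minimal, hypothetical simplification (class and method names mirror the PR description, but the bodies are illustrative, not the actual langextract implementation): the base class validates the `batch_size` hint and forwards it, and a provider that does not support native batching pops it from its kwargs so it never reaches the request payload.

```python
from typing import Any, Sequence


class BaseLanguageModel:
  """Minimal sketch of the base-class side of the batch_size passthrough."""

  def infer(self, batch_prompts: Sequence[str], **kwargs: Any) -> list[str]:
    raise NotImplementedError

  def infer_batch(
      self, prompts: Sequence[str], batch_size: int = 32
  ) -> list[str]:
    # Validate the hint, then forward it so providers can choose to honor it.
    if batch_size <= 0:
      raise ValueError(f"batch_size must be positive, got {batch_size}")
    return self.infer(prompts, batch_size=batch_size)


class OllamaModel(BaseLanguageModel):
  """Sketch of the kwarg-hygiene side for a provider without native batching."""

  def infer(self, batch_prompts: Sequence[str], **kwargs: Any) -> list[str]:
    # Drop the hint so it does not leak into the provider's request options.
    kwargs.pop("batch_size", None)
    return [f"ok:{p}" for p in batch_prompts]
```

A provider with native batching (like the OpenAI path in this PR) would instead read the hint and its batch config before deciding whether to route through the Batch API helper.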

Files

  • Base batching: langextract/core/base_model.py
  • OpenAI provider + batch wiring: langextract/providers/openai.py
  • OpenAI Batch helper: langextract/providers/openai_batch.py
  • Kwarg hygiene: langextract/providers/gemini.py, langextract/providers/ollama.py
  • Tests: tests/openai_batch_test.py, tests/inference_test.py

Risks / notes

  • OpenAI batch mode is opt-in via config and only triggers above a threshold; non-batch behavior remains unchanged by default.
  • Batch jobs are asynchronous (polling with a timeout), and output ordering is normalized using each request's custom_id.

How Has This Been Tested?

  • pytest -q
  • ./autoformat.sh
  • pylint --rcfile=.pylintrc langextract/providers/openai_batch.py langextract/providers/openai.py langextract/core/base_model.py langextract/providers/gemini.py langextract/providers/ollama.py
  • pylint --rcfile=tests/.pylintrc tests/openai_batch_test.py tests/inference_test.py
  • pre-commit run --files langextract/core/base_model.py langextract/providers/openai.py langextract/providers/openai_batch.py langextract/providers/gemini.py langextract/providers/ollama.py tests/inference_test.py tests/openai_batch_test.py

Checklist

  • I have read and acknowledged Google's Open Source Code of Conduct.
  • I have read the Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have made any needed documentation changes, or noted in the linked issue(s) that documentation elsewhere needs updating.
  • I have added tests, or I have ensured existing tests cover the changes.
  • I have followed Google's Python Style Guide and ran pylint over the affected code.

m7mdhka added 3 commits March 23, 2026 06:34
- Pass batch_size through BaseLanguageModel.infer_batch() as a provider hint
- Add OpenAI Batch API helper and wire it into OpenAI provider
- Prevent batch_size leaking into Gemini/Ollama payload kwargs
- Add OpenAI batch helper unit tests
- Refactor helper to satisfy pylint return-statement limit
- Fix test fakes so pylint passes under tests/.pylintrc
- Record verified test/lint commands in PR_openai_batch.md
@github-actions

No linked issues found. Please link an issue in your pull request description or title.

Per our Contributing Guidelines, all PRs must:

  • Reference an issue with one of:
    • Closing keywords: Fixes #123, Closes #123, Resolves #123 (auto-closes on merge in the same repository)
    • Reference keywords: Related to #123, Refs #123, Part of #123, See #123 (links without closing)
  • The linked issue should have 5+ 👍 reactions from unique users (excluding bots and the PR author)
  • Include discussion demonstrating the importance of the change

You can also use cross-repo references like owner/repo#123 or full URLs.

@github-actions github-actions bot added the size/L Pull request with 600-1000 lines changed label Mar 23, 2026
@google-cla

google-cla bot commented Mar 23, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions

github-actions bot commented Apr 2, 2026

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

@jfryb753-netizen

(Comment duplicated the PR description verbatim.)

@github-actions

⚠️ Branch Update Required

Your branch is 4 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.


Labels

size/L Pull request with 600-1000 lines changed
