
Fix bug: Update requirements and notebooks for DL examples #633

Merged
YanxuanLiu merged 7 commits into NVIDIA:main from YanxuanLiu:dl-inf-env
Jun 26, 2026

Conversation

@YanxuanLiu

@YanxuanLiu YanxuanLiu commented Jun 16, 2026

Collaborator

Fixes bugs in the requirements files and notebooks for the DL inference examples.

  • Updated versions in requirements
  • Updated notebooks to align with latest environment and dependencies.

@YanxuanLiu YanxuanLiu self-assigned this Jun 16, 2026
@YanxuanLiu YanxuanLiu changed the title from "Fix bug: Update requirements and notebooks for DL examples" to "[DO NOT REVIEW] Fix bug: Update requirements and notebooks for DL examples" Jun 16, 2026
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Contributor

Greptile Summary

This PR fixes several bugs in TensorFlow deep-learning inference notebooks and tightens dependency pins across the shared requirements files. The core fixes address a [UNK]-in-vocabulary corruption in text_classification_tf, a target-column data-leakage bug in keras_preprocessing_tf, and a non-deterministic vocabulary deduplication via list(set()) that could silently shuffle token indices between runs.

  • Dependency updates (requirements.txt, torch_requirements.txt): datasets is pinned to ==3.*, huggingface-hub<1.0 is added to guard against breaking API changes, and torch/torchvision/torch-tensorrt are locked to exact versions (2.8.0/0.23.0/2.8.0).
  • TF model loading (conditional_generation_tf, pipelines_tf): use_safetensors=False is added to all TFAutoModel* and TFT5* loads to force the PyTorch .bin path, fixing an incompatibility in newer transformers+TF combinations; explicit model/tokenizer objects are now passed to pipeline() instead of letting it auto-download.
  • Vocabulary normalization (text_classification_tf): normalize_vocabulary is moved before export_model.save() so the saved .keras file already contains the cleaned vocab, and the function now correctly removes both "" and "[UNK]" (special tokens that set_vocabulary prepends automatically) while preserving insertion order.
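
The vocabulary fix in the last bullet can be sketched in plain Python. The name `normalize_vocabulary` comes from the PR, but the body below is an assumption about how such a function might look, not the notebook's actual code, and the sample tokens are made up for illustration. It shows why `dict.fromkeys` is a safer deduplication step than `list(set())`: it keeps first-seen order, so token indices stay stable between runs.

```python
def normalize_vocabulary(vocab):
    """Drop reserved tokens and duplicates while keeping first-seen order.

    Hypothetical sketch: the real notebook function may differ.
    """
    reserved = {"", "[UNK]"}  # tokens the summary says set_vocabulary adds itself
    # dict.fromkeys preserves insertion order (Python 3.7+), unlike set().
    return [token for token in dict.fromkeys(vocab) if token not in reserved]


# Illustrative input only; a real vocabulary would come from get_vocabulary().
raw_vocab = ["", "[UNK]", "the", "movie", "the", "great"]
clean_vocab = normalize_vocabulary(raw_vocab)
print(clean_vocab)  # ['the', 'movie', 'great']

# Running it again on an already-clean vocabulary changes nothing.
print(normalize_vocabulary(clean_vocab) == clean_vocab)  # True
```

Because the function is idempotent, calling it a second time before saving a second copy of the model is harmless.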

Confidence Score: 5/5

Safe to merge — all changes are targeted bug fixes and dependency stabilization with no new logic paths that could regress existing behavior.

The changes correct real bugs (data leakage from the wrong DataFrame reference in df_to_dataset, non-deterministic vocabulary ordering from list(set()), missing [UNK] filter before set_vocabulary) and add defensive version pins. The refactoring in the notebooks is straightforward and backed by executed cell outputs in the notebook itself. The only rough edge is a leftover unused device variable in predict_batch_fn, which is cosmetic and has no effect on correctness or GPU placement.
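
The data-leakage bug mentioned above is easy to reproduce in miniature. The sketch below uses a plain dict of lists as a stand-in for a pandas DataFrame, since both expose `copy()`, `pop()`, and `items()`; the column names `age` and `fee` and both function names are hypothetical, and the real `df_to_dataset` builds a TensorFlow dataset rather than returning a dict.

```python
def df_to_dataset_buggy(dataframe):
    df = dataframe.copy()
    labels = df.pop("target")
    # Bug: iterates the original frame, which still contains 'target',
    # so the label leaks into the feature set.
    features = {name: values for name, values in dataframe.items()}
    return features, labels


def df_to_dataset_fixed(dataframe):
    df = dataframe.copy()
    labels = df.pop("target")
    # Fix: iterate the copy from which 'target' was popped.
    features = {name: values for name, values in df.items()}
    return features, labels


# Stand-in for a DataFrame; column names are illustrative only.
frame = {"age": [3, 5], "fee": [100, 0], "target": [1, 0]}

buggy_features, _ = df_to_dataset_buggy(frame)
fixed_features, labels = df_to_dataset_fixed(frame)

print("target" in buggy_features)  # True
print("target" in fixed_features)  # False
print(labels)                      # [1, 0]
```

A model trained on the buggy features can read the answer directly from its inputs, which inflates validation metrics without improving real predictions.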

The two requirements files (requirements.txt, torch_requirements.txt) carry the dependency constraints that have been discussed in prior review threads; they are worth a second look if there are concerns about the numpy <2 upper bound or the tensorrt index URL CUDA alignment.
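
To make the numpy concern concrete, here is a rough check of which versions the `>=1.26.4,<2` range admits. It is a simplified sketch that only handles plain dotted numeric versions; the helper names are hypothetical, and real resolvers follow the full version-specifier rules (pre-releases, local versions, and so on).

```python
def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4). Numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_numpy_pin(version):
    """Approximate the '>=1.26.4,<2' range from requirements.txt."""
    parsed = parse_version(version)
    return (1, 26, 4) <= parsed < (2,)


for candidate in ["1.26.3", "1.26.4", "1.99.0", "2.0.0"]:
    print(candidate, satisfies_numpy_pin(candidate))
```

Any 2.x release falls outside the range, which is the upper-bound behavior the review flags for a second look.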

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/ML+DL-Examples/Spark-DL/dl_inference/requirements.txt | Pins datasets to ==3.*, adds huggingface-hub<1.0, widens numpy to >=1.26.4,<2; <2 upper bound still excludes NumPy 2.x which may conflict with modern tensorflow. |
| examples/ML+DL-Examples/Spark-DL/dl_inference/torch_requirements.txt | Pins torch/torchvision/torch-tensorrt to 2.8.0/0.23.0/2.8.0; tensorrt index URL still points to cu121 which may not carry a matching wheel for torch 2.8.0. |
| examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/conditional_generation_tf.ipynb | Adds use_safetensors=False to all three TFT5ForConditionalGeneration loads to force the PyTorch .bin format, fixing a model-loading incompatibility with newer transformers/TF combinations. |
| examples/ML+DL-Examples/Spark-DL/dl_inference/huggingface/pipelines_tf.ipynb | Model and tokenizer are now loaded explicitly with use_safetensors=False; device=device replaced with dtype=None across all inference paths. Leftover device variable in predict_batch_fn is dead code after the refactoring. |
| examples/ML+DL-Examples/Spark-DL/dl_inference/tensorflow/keras_preprocessing_tf.ipynb | Fixes a data-leakage bug in df_to_dataset where dataframe.items() was used after df.pop('target'); now correctly uses df.items(). Also refactors the train/val/test split. |
| examples/ML+DL-Examples/Spark-DL/dl_inference/tensorflow/text_classification_tf.ipynb | Moves normalize_vocabulary earlier (before export_model.save); fixes non-deterministic list(set()) deduplication and correctly filters [UNK] which set_vocabulary expects to be absent. |
| examples/ML+DL-Examples/Spark-DL/dl_inference/vllm/qwen-2.5-7b_vllm.ipynb | Removes task="generate" (now auto-inferred by vllm) and increases wait_retries from 60 to 180. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[vectorize_layer.adapt on train data] --> B[Train + evaluate model]
    B --> C["normalize_vocabulary(get_vocabulary())\nFilters '' and '[UNK]', preserves order"]
    C --> D[vectorize_layer.set_vocabulary]
    D --> E[export_model.save → text_model.keras]
    E --> F[Spark predict_batch_udf loads model]
    D --> G["normalize_vocabulary again\n(idempotent on already-clean vocab)"]
    G --> H[vectorize_layer.set_vocabulary]
    H --> I[export_model.save → text_model_cleaned.keras]
    I --> J[Triton server loads cleaned model]
    style C fill:#d4edda,stroke:#28a745
    style D fill:#d4edda,stroke:#28a745
    style G fill:#fff3cd,stroke:#ffc107
    style H fill:#fff3cd,stroke:#ffc107

Reviews (2): Last reviewed commit: "Update DL inference requirement headers"

Comment thread examples/ML+DL-Examples/Spark-DL/dl_inference/vllm_requirements.txt Outdated
Comment thread examples/ML+DL-Examples/Spark-DL/dl_inference/requirements.txt Outdated
@YanxuanLiu YanxuanLiu changed the title from "[DO NOT REVIEW] Fix bug: Update requirements and notebooks for DL examples" to "Fix bug: Update requirements and notebooks for DL examples" Jun 25, 2026
YanxuanLiu and others added 7 commits June 25, 2026 18:35
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
Signed-off-by: YanxuanLiu <yanxuanl@nvidia.com>
@YanxuanLiu YanxuanLiu marked this pull request as ready for review June 25, 2026 10:38

@rishic3 rishic3 left a comment

Collaborator


Looks good, thanks @YanxuanLiu. On a broader note, almost no one is writing any new TensorFlow, so we should consider deprecating the _tf notebooks.

@YanxuanLiu

Collaborator Author

> Looks good, thanks @YanxuanLiu. On a broader note, almost no one is writing any new TensorFlow, so we should consider deprecating the _tf notebooks.

Sure, will remove them from our default notebook list.

@YanxuanLiu YanxuanLiu merged commit 1151c92 into NVIDIA:main Jun 26, 2026
4 checks passed
3 participants