[HuggingFace][Neuronx] Training - DLC for Optimum-neuron 0.3.0 - Neuron SDK 2.24.1 PyTorch 2.7.1 - Transformers 4.51.3 #5292
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# https://github.com/aws/deep-learning-containers/blob/master/available_images.md | ||
# refer to the above page to pull latest PyTorch Neuronx image | ||
|
||
# docker image region us-west-2 | ||
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04 | ||
|
||
LABEL maintainer="Amazon AI" | ||
LABEL dlc_major_version="2" | ||
|
||
# Version args | ||
ARG OPTIMUM_NEURON_VERSION=0.3.0 | ||
ARG TRANSFORMERS_VERSION=4.51.0 | ||
ARG DATASETS_VERSION=4.1.0 | ||
ARG GEVENT_VERSION=24.10.3 | ||
ARG PYTHON=python3 | ||
|
||
RUN apt-get remove -y --purge emacs && \ | ||
apt-get autoremove -y | ||
|
||
RUN pip install --upgrade pip | ||
|
||
# We need to set this environment variable to avoid the following error when building KenLM: | ||
# https://github.com/kpu/kenlm/issues/462 | ||
ENV CMAKE_POLICY_VERSION_MINIMUM=3.5 | ||
|
||
# Install Hugging Face libraries and its dependencies | ||
# Install optimum-neuron with this extra starting from next release. \ | ||
# "optimum-neuron[training]"==${OPTIMUM_NEURON_VERSION} \ | ||
RUN pip install --no-cache-dir \ | ||
"sagemaker==2.232.2" \ | ||
evaluate \ | ||
"transformers[sklearn,sentencepiece,audio,vision]==${TRANSFORMERS_VERSION}" \ | ||
datasets==${DATASETS_VERSION} \ | ||
"optimum-neuron[training]==${OPTIMUM_NEURON_VERSION}" \ | ||
gevent==${GEVENT_VERSION} | ||
|
||
# Pin numpy to version required by neuronx-cc | ||
# Update Pillow, urllib3, wandb versions to fix high and critical vulnerabilities | ||
# neuronx-cc has requirement networkx~=2.6 | ||
RUN pip install -U \ | ||
"tensorboard>=2.11.0" \ | ||
"numpy>=1.24.3,<=1.25.2" \ | ||
"numba" \ | ||
"Pillow==10.3.0" \ | ||
Review comment: Pillow is already being installed earlier as well.
"requests<2.32.0" \ | ||
wandb \ | ||
pytorch-lightning \ | ||
Jinja2 \ | ||
mlflow \ | ||
tornado \ | ||
"awscli<2" \ | ||
boto3 \ | ||
botocore \ | ||
google-auth \ | ||
"urllib3>=1.26.17,<1.27" \ | ||
"networkx==2.6.3" \ | ||
bokeh \ | ||
torchvision==0.22.0 \ | ||
"opencv-python<4.12.0" | ||
|
||
RUN apt-get update \ | ||
Review comment: Do all APT installs at the beginning.
&& apt-get install -y --no-install-recommends \ | ||
git-lfs \ | ||
libgssapi-krb5-2 \ | ||
libexpat1 \ | ||
expat \ | ||
libarchive13 \ | ||
libgstreamer1.0-0 \ | ||
libgstreamer-plugins-base1.0-0 \ | ||
&& apt-get upgrade -y apparmor \ | ||
&& apt-get clean \ | ||
&& rm -rf /var/lib/apt/lists/* \ | ||
# The pytorch-training-neuronx base image comes with unneeded files for setting up apex | ||
# In order to pass the sanity test in the deep-learning-containers workflow, we will remove it here | ||
&& rm -rf /root/apex_setup.py | ||
|
||
ENV WANDB_MODE=disabled |
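
Requirement specifiers with extras, such as `optimum-neuron[training]==${OPTIMUM_NEURON_VERSION}` in the Dockerfile above, contain square brackets, which a shell treats as a glob character class whenever the argument is unquoted. The following is a minimal sketch of that behaviour, not part of the PR: the temporary directory, the decoy file name, and the helpers `unquoted_arg` and `quoted_arg` are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so the demo leaves no files behind.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Decoy file whose name matches the pattern optimum-neuron[training]==0.3.0,
# because [training] is a character class matching one of: t r a i n g.
touch 'optimum-neuront==0.3.0'

# Unquoted: the shell expands the pattern against the current directory.
unquoted_arg() { printf '%s\n' optimum-neuron[training]==0.3.0; }

# Quoted: the argument reaches the command exactly as written.
quoted_arg() { printf '%s\n' "optimum-neuron[training]==0.3.0"; }

# Capture both results while the decoy file still exists, then clean up.
unquoted="$(unquoted_arg)"
quoted="$(quoted_arg)"
cd / && rm -rf "$workdir"

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the decoy file present, the unquoted form expands to the file name while the quoted form keeps the specifier intact. In a clean build directory the unquoted form normally passes through unchanged because nothing matches, so quoting is a defensive habit rather than a fix for a failure observed in this PR.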
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"76839": "[pkg: gevent] [installed: 24.10.3]", | ||
"79077": "[pkg: h2] [installed: 4.2.0]", | ||
"71691": "[pkg: mlflow] [installed: 3.4.0]", | ||
"77740": "[pkg: protobuf] [installed: 3.20.3]", | ||
"78558": "[pkg: regex] [installed: 2024.11.6]", | ||
"77680": "[pkg: requests] [installed: 2.31.0]", | ||
"71064": "[pkg: requests] [installed: 2.31.0]", | ||
"77986": "[pkg: transformers] [installed: 4.51.0]", | ||
"78153": "[pkg: transformers] [installed: 4.51.0]", | ||
"79596": "[pkg: transformers] [installed: 4.51.0]", | ||
"79595": "[pkg: transformers] [installed: 4.51.0]", | ||
"79855": "[pkg: transformers] [installed: 4.51.0]", | ||
"78688": "[pkg: transformers] [installed: 4.51.0]", | ||
"77744": "[pkg: urllib3] [installed: 1.26.20]", | ||
"78828": "[pkg: torch] [installed: 2.7.0]" | ||
} |
Review comment: Can you explain what this is doing?
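
The PR does not state what the JSON file above is for. Its shape, numeric IDs mapped to a package name and an installed version, resembles an allowlist of vulnerability-scan findings to ignore; that reading is an assumption. Under that assumption, the sketch below tallies how many IDs are listed per package, using the entries copied from the diff. The `allowlist` variable and the `count_for_pkg` helper are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Entries copied verbatim from the JSON shown in the diff.
allowlist='
"76839": "[pkg: gevent] [installed: 24.10.3]"
"79077": "[pkg: h2] [installed: 4.2.0]"
"71691": "[pkg: mlflow] [installed: 3.4.0]"
"77740": "[pkg: protobuf] [installed: 3.20.3]"
"78558": "[pkg: regex] [installed: 2024.11.6]"
"77680": "[pkg: requests] [installed: 2.31.0]"
"71064": "[pkg: requests] [installed: 2.31.0]"
"77986": "[pkg: transformers] [installed: 4.51.0]"
"78153": "[pkg: transformers] [installed: 4.51.0]"
"79596": "[pkg: transformers] [installed: 4.51.0]"
"79595": "[pkg: transformers] [installed: 4.51.0]"
"79855": "[pkg: transformers] [installed: 4.51.0]"
"78688": "[pkg: transformers] [installed: 4.51.0]"
"77744": "[pkg: urllib3] [installed: 1.26.20]"
"78828": "[pkg: torch] [installed: 2.7.0]"
'

# Hypothetical helper: count the listed IDs for one package name.
# -F treats the search text as a fixed string, so the brackets are literal.
count_for_pkg() {
  printf '%s\n' "$allowlist" | grep -cF "[pkg: $1]"
}

# Per-package summary: extract the name after "pkg: " and count repeats.
printf '%s\n' "$allowlist" \
  | sed -n 's/.*\[pkg: \([^]]*\)].*/\1/p' \
  | sort | uniq -c

echo "transformers entries: $(count_for_pkg transformers)"
```

Most of the listed IDs belong to `transformers`, which matches the version pinned by `TRANSFORMERS_VERSION` in the Dockerfile.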
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
#!/bin/bash | ||
# telemetry.sh | ||
if [ -f /usr/local/bin/deep_learning_container.py ] && [[ -z "${OPT_OUT_TRACKING}" || "${OPT_OUT_TRACKING,,}" != "true" ]]; then | ||
( | ||
python /usr/local/bin/deep_learning_container.py \ | ||
--framework "huggingface_pytorch" \ | ||
--framework-version "2.7.0" \ | ||
--container-type "training" \ | ||
&>/dev/null & | ||
) | ||
fi | ||
|
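
The guard in the telemetry script skips the background call only when `OPT_OUT_TRACKING` equals `true` in any letter case; the `-z` test is redundant, since an empty value already differs from `true`. The sketch below isolates that decision so it can be tried without the container. The `telemetry_enabled` helper is hypothetical, and it lowercases with `tr` instead of `${OPT_OUT_TRACKING,,}` so that it also runs in shells that lack bash 4 case modification.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): succeeds when telemetry should run.
# Mirrors the opt-out test in the script above: only the value "true",
# compared case-insensitively, disables the call.
telemetry_enabled() {
  lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  [ "$lowered" != "true" ]
}

# Try a few candidate values for OPT_OUT_TRACKING, including the unset/empty case.
for candidate in "" true TRUE True false 1; do
  if telemetry_enabled "$candidate"; then
    echo "OPT_OUT_TRACKING='$candidate' -> telemetry runs"
  else
    echo "OPT_OUT_TRACKING='$candidate' -> telemetry skipped"
  fi
done
```

Values such as `1` or `yes` do not opt out under this logic; only a case-insensitive `true` does.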
Review comment: tensorboard is also a duplicate. Please combine all of them.