
error: failed to solve: DeadlineExceeded: context deadline exceeded after everything is done #5687

Open
huliang-microsoft opened this issue Jan 28, 2025 · 3 comments

huliang-microsoft commented Jan 28, 2025

Contributing guidelines and issue reporting guide

Well-formed report checklist

  • I have found a bug, and the documentation does not mention anything about my problem
  • I have found a bug, and there are no open or closed issues related to my problem
  • I have provided version/information about my environment and done my best to provide a reproducer

Description of bug


Hello, we are using docker build in our Azure DevOps pipeline. We recently started to see a lot of "error: failed to solve: DeadlineExceeded: context deadline exceeded" errors at the very end of the docker build process. It does not fail every time; the failures are intermittent.

I have checked issue #4327. The title is similar, but it seems we are hitting a different problem.
Sorry for the vague report; it looks to me like the main docker build process completes and the error happens during "unpacking". I could not reproduce this on my local machine, so I am guessing it might be hardware related, but I would like to get some idea of what errors can occur at this stage and what the cause could be.

Thanks in advance.
The Error Message:

#16 exporting to image
#16 exporting layers
#16 exporting layers 196.1s done
#16 exporting manifest sha256:fb9dc3cf390b85bc6e61087220867593d4f72ac34ccc71a9a28d803412848629 0.0s done
#16 exporting config sha256:110dd9bc89e59618f0fde0befabbd7844192ff9aa707f4f98d643d6a7632d66c 0.0s done
#16 naming to ***/imagemultiseverity-cuda-standard_nc6s_v3:20250128.5 done
#16 unpacking to ***/imagemultiseverity-cuda-standard_nc6s_v3:20250128.5
#16 unpacking to ***/imagemultiseverity-cuda-standard_nc6s_v3:20250128.5 40.8s done
#16 DONE 237.0s
error: failed to solve: DeadlineExceeded: context deadline exceeded

For comparison, this successful run took even longer:

#16 exporting to image
#16 exporting layers
#16 exporting layers 195.4s done
#16 exporting manifest sha256:9047cebdcd6d65852156f290d35327112b943b5b2e41c2d040b63ffeb4ed0dfa 0.0s done
#16 exporting config sha256:ce2b56ad7fe42ef20335dbf5392f6c4df5fa1aeee03c114534f4eed11cfbd1c5 0.0s done
#16 naming to ***/imagemultiseverity-cuda-standard_nc6s_v3:20250128.5 done
#16 unpacking to ***/imagemultiseverity-cuda-standard_nc6s_v3:20250128.5
#16 unpacking to ***/imagemultiseverity-cuda-standard_nc6s_v3:20250128.5 64.6s done
#16 DONE 260.2s

Our Dockerfile is not complicated; attaching it here:

# base image
FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04:20240709.v1

USER root

ARG PIP_INDEX_URL

# Disable pip cache
ENV PIP_NO_CACHE_DIR=1

# Install necessary packages and dependencies
RUN apt-get update && \
    apt-get install -y wget kmod runit

# Delete existing files and folders in /var/runit if it exists
RUN [ -d /var/runit ] && rm -rf /var/runit/* || echo "/var/runit does not exist, skipping cleanup"



WORKDIR /app
# set up environment
ARG CONDA_ENV_FILE=src/dependencies.yaml
COPY $CONDA_ENV_FILE /app/conda_env.yaml
RUN conda init
RUN conda env create --name model-env -f /app/conda_env.yaml && \
    conda clean -ay

COPY src/ /app/src

# copy model files
COPY artifacts/model /app/model

# Environment settings for selected SKU: Standard_NC6s_v3
ENV GUNICORN_CMD_ARGS="--threads 12 --workers=1 --bind=0.0.0.0:8501"
ENV DYNAMICBATCH_ENABLED="True"
ENV DYNAMICBATCH_IDLEBATCHSIZE="1"
ENV DYNAMICBATCH_MAXBATCHINTERVAL="0.002"
ENV DYNAMICBATCH_MAXBATCHSIZE="8"

ENV RAI_AZUREML_MODEL_DIR=/app
ENV PYTHONPATH="/app/src"

# configure ports and entrypoint
EXPOSE 8501

# Prepare runit service
RUN mkdir -p /var/runit/service && \
    cp /app/src/run.sh /var/runit/service/run && \
    chmod +x /var/runit/service/run

# Start the service
CMD ["runsvdir", "/var/runit"]

Reproduction

Unfortunately, I could not reproduce this on my local machine, so I suspect it might be related to the hardware.
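Since the failure only reproduces on the build agents, one low-effort way to gather more signal is to pull the Docker daemon logs from the agent around the time of the failure. A sketch, assuming the agent VM runs a systemd-managed dockerd (the unit name may differ on your agent image):

```shell
# Dump recent dockerd logs; the DeadlineExceeded error should
# appear here with more context than the CLI output provides.
sudo journalctl -u docker.service --since "30 minutes ago" --no-pager
```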

Version information

Docker version 24.0.9, build 293681613032e6d1a39cc88115847d3984195c24
tonistiigi (Member) commented:

Please post the error with --debug. It should include stack traces then.

unpacking to

What's the setup you are using? "Unpacking" here should mean extracting to the containerd image store, although from the progress output it seems the unpacking/exporting phase finished successfully.
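For context on the "unpacking to" step: whether BuildKit unpacks into the containerd image store depends on the daemon configuration. On Docker 24.x the documented opt-in is the containerd snapshotter feature flag in /etc/docker/daemon.json; this fragment is shown only as an illustration of the setting in question, not as a suggested fix:

```json
{
  "features": {
    "containerd-snapshotter": true
  }
}
```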

huliang-microsoft (Author) commented:

Hello @tonistiigi, thanks for replying. I am using an Azure DevOps build task, and it seems that this specific task does not provide a debug flag. I am checking with the support engineering team to see if they can add the flag, and I will update this thread once I get the output.

huliang-microsoft (Author) commented:

Hello @tonistiigi, we have an interesting problem with the --debug flag. The docker command in the pipeline returns the following (the command has a lot of labels; to make it easier to read, I removed them):

/bin/docker build -f /mnt/vss/_work/docker/artifacts/dockerfiles/Dockerfile_TextMultiSeverity_CUDA_Standard_NC6s_v3 --label ... --build-arg PIP_INDEX_URL --debug --force-rm --pull -t ***/textmultiseverity-cuda-standard_nc6s_v3:20250131.10 /mnt/vss/_work/docker
unknown flag: --debug

I see another issue mentioning that --debug must come after docker and before build (docker/docs#20540), but when I ran the docker command locally, it seems I can put --debug anywhere in the docker build command.
The docker versions in the pipeline and on my local machine are the same: 24.0.9. Any idea why the debug flag is not recognized?

Thanks
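As a note for anyone hitting the same flag-placement error: on the classic docker CLI, --debug (-D) is a global flag, so it belongs before the subcommand. A sketch of the working invocation (image name, Dockerfile path, and context path are placeholders):

```shell
# --debug is a global docker CLI flag, so it must precede "build":
docker --debug build -f path/to/Dockerfile -t myimage:latest .
```

Local builds that tolerate --debug elsewhere may be going through a different build frontend than the pipeline's /bin/docker, which would explain the discrepancy between the two environments.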
