[MISC] add decorator for logging exceptions #1512
Conversation
Signed-off-by: Sheng Zha <[email protected]>
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1512/5ce7c5f1b8e212a853a4d08717e0ccf875b7822a/index.html
Signed-off-by: Sheng Zha <[email protected]>
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1512/f9a5fb71925c75e9ca484b7a0e908756319460bf/index.html
Codecov Report
@@            Coverage Diff             @@
##           master    #1512      +/-   ##
==========================================
- Coverage   86.49%   85.87%    -0.63%
==========================================
  Files          55       55
  Lines        7502     7396     -106
==========================================
- Hits         6489     6351     -138
- Misses       1013     1045      +32
Continue to review full report at Codecov.
@@ -49,7 +49,7 @@ RUN cd ${WORKDIR} \
     && git clone https://github.com/dmlc/gluon-nlp \
     && cd gluon-nlp \
     && git checkout master \
-    && python3 -m pip install -U -e ."[extras]"
+    && python3 -m pip install -U -e ."[extras,dev]"
@leezu the docker build for GPU keeps failing in the horovod build step.
horovod build error
@szha From the error message, it seems to be related to how the mxnet integration is written. Currently, horovod calls the MXNet API to determine some GPU-related flags, and it will fail if the instance used does not have a GPU or is not configured appropriately. You may follow the guide in https://github.com/dmlc/gluon-nlp/tree/master/tools/docker#build-by-yourself and try again (you need to edit the Docker daemon config).
@sxjscience thanks. I think my system already has nvidia-docker2 installed and the config entry added. I think you are right that this has to do with how the horovod integration is written. It's having trouble finding mxnet for some reason.
OK. I saw the following warning in the log, so I thought the GPU was not being used.
@sxjscience using
I think we could try to automate our docker pipeline.
@szha do you mean this error occurs when you rebuild the container?
That's not correct, because nvidia-docker only takes effect at runtime and not at build time. You need to follow the steps in https://github.com/dmlc/gluon-nlp/tree/master/tools/docker#build-by-yourself
@leezu @sxjscience thanks for helping. I noticed that I previously missed the "default-runtime" entry in the config. Sorry for missing that. I was able to complete the build after adding that entry, and I'm pushing the GPU docker image now.
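For reference (not part of this PR's diff), enabling GPU access during `docker build` is typically done by making the NVIDIA runtime the default in the Docker daemon config, commonly `/etc/docker/daemon.json`. The snippet below is only a sketch of that kind of entry; the exact paths and values on a given machine may differ:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
```

After editing the config, the Docker daemon needs to be restarted (e.g. `sudo systemctl restart docker`) before the setting applies to subsequent builds.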
Looks like there might be an upstream change, as tests/test_data_tokenizers.py::test_spacy_tokenizer failed.
Signed-off-by: Sheng Zha <[email protected]>
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1512/0a41311da7e394cce3459f93c90beef34c55f767/index.html
Description
Add a decorator for logging exceptions.
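For illustration, here is a minimal sketch of what an exception-logging decorator can look like. The name `log_exceptions` and the logger setup are assumptions for this example; the actual decorator added in this PR may differ.

```python
import functools
import logging


def log_exceptions(func):
    """Log any exception raised by ``func`` before re-raising it.

    Illustrative sketch only; not necessarily the decorator added in this PR.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logging's exception() records the message together with the traceback
            logging.getLogger(func.__module__).exception(
                'Exception raised in %s', func.__qualname__)
            raise
    return wrapper


@log_exceptions
def divide(a, b):
    return a / b
```

Calling `divide(1, 0)` would log the `ZeroDivisionError` with its traceback through the module's logger and then re-raise it, so the failure remains visible to the caller.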
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team