Crash when trying to use training in Applio with Zluda 3.7.2 #28

Open
AznamirWoW opened this issue Jul 13, 2024 · 2 comments
Labels: implementation (Unimplemented feature(s))

Comments

@AznamirWoW

Using Applio 3.2.1

  1. Modified the install script to use the cu118 (CUDA 11.8) builds of PyTorch:

    pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118

  2. Downloaded ZLUDA 3.7.2 and patched torch under env/Lib/site-packages/torch/lib with 3 DLLs.

  3. Modified the Applio code to add the following wherever it imports torch:
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)

  4. Ran the training process:

  • Preprocess dataset: works
  • Extract features: works
  • Start training: fails regardless of the batch size with the following error (raised at line 545 of the original train.py):

thread '' panicked at zluda_rtc\src\lib.rs:34:16:
[ZLUDA] HIPRTC failed: 11
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
thread '' panicked at library\panic_unwind\src\seh.rs:260:8:
Rust panics cannot be copied
stack backtrace:
0: 0x7ff83202601a - nvrtcGetLoweredName
1: 0x7ff832032c8b - nvrtcGetLoweredName
2: 0x7ff8320247d1 - nvrtcGetLoweredName
3: 0x7ff832025e06 - nvrtcGetLoweredName
4: 0x7ff83202761f - nvrtcGetLoweredName
5: 0x7ff8320272b7 - nvrtcGetLoweredName
6: 0x7ff832027b5d - nvrtcGetLoweredName
7: 0x7ff8320279db - nvrtcGetLoweredName
8: 0x7ff8320266a9 - nvrtcGetLoweredName
9: 0x7ff8320276d6 - nvrtcGetLoweredName
10: 0x7ff832037527 - nvrtcGetLoweredName
11: 0x7ff83202ab9e - nvrtcGetLoweredName
12: 0x7ff83de02255 - std::_Init_locks::operator=
13: 0x7ff83de01fb8 - std::_Init_locks::operator=
14: 0x7ff83de024c9 - __ExceptionPtrCurrentException
15: 0x7ffff58ebeed - THPPointer<_frame>::operator bool
16: 0x7ffff62afd7b - c10d::PythonCommHook::runHook
17: 0x7ff849711080 -
18: 0x7ff8497126a5 - _NLG_Return2
19: 0x7ff84f811c96 - RtlCaptureContext2
20: 0x7ffff58ef44b - c10::ivalue::Future::devices
21: 0x7ff81fe682f6 - cfunction_call
at \objects\methodobject.c:543
22: 0x7ff81fe2554c - _PyObject_MakeTpCall
at \objects\call.c:215
23: 0x7ff81fe278f1 - method_vectorcall
at \objects\classobject.c:83
24: 0x7ff81fe8fc5a - slot_tp_call
at \objects\typeobject.c:7497
25: 0x7ff81fe2554c - _PyObject_MakeTpCall
at \objects\call.c:215
26: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
27: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
28: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault
at \python\ceval.c:4213
29: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
30: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
31: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
32: 0x7ff81fe277d2 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
33: 0x7ff81fe277d2 - method_vectorcall
at \objects\classobject.c:53
34: 0x7ff81fe2566d - PyVectorcall_Call
at \objects\call.c:267
35: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
36: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
37: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
38: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
39: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
40: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
41: 0x7ff81fe277d2 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
42: 0x7ff81fe277d2 - method_vectorcall
at \objects\classobject.c:53
43: 0x7ff81fe2566d - PyVectorcall_Call
at \objects\call.c:267
44: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
45: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
46: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
47: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
48: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
49: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
50: 0x7ff81fe25391 - _PyObject_FastCallDictTstate
at \objects\call.c:153
51: 0x7ff81fe25ad2 - _PyObject_Call_Prepend
at \objects\call.c:431
52: 0x7ff81fe8fc0c - slot_tp_call
at \objects\typeobject.c:7494
53: 0x7ff81fe2554c - _PyObject_MakeTpCall
at \objects\call.c:215
54: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
55: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
56: 0x7ff81ff1af08 - _PyEval_EvalFrameDefault
at \python\ceval.c:4231
57: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
58: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
59: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
60: 0x7ff81fe277d2 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
61: 0x7ff81fe277d2 - method_vectorcall
at \objects\classobject.c:53
62: 0x7ff81fe2566d - PyVectorcall_Call
at \objects\call.c:267
63: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
64: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
65: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
66: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
67: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
68: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
69: 0x7ff81fe277d2 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
70: 0x7ff81fe277d2 - method_vectorcall
at \objects\classobject.c:53
71: 0x7ff81fe2566d - PyVectorcall_Call
at \objects\call.c:267
72: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
73: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
74: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
75: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
76: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
77: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
78: 0x7ff81fe25391 - _PyObject_FastCallDictTstate
at \objects\call.c:153
79: 0x7ff81fe25ad2 - _PyObject_Call_Prepend
at \objects\call.c:431
80: 0x7ff81fe8fc0c - slot_tp_call
at \objects\typeobject.c:7494
81: 0x7ff81fe2554c - _PyObject_MakeTpCall
at \objects\call.c:215
82: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
83: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
84: 0x7ff81ff1af08 - _PyEval_EvalFrameDefault
at \python\ceval.c:4231
85: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
86: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
87: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
88: 0x7ff81fe27669 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
89: 0x7ff81fe278f1 - method_vectorcall
at \objects\classobject.c:83
90: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
91: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
92: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
93: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
94: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
95: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
96: 0x7ff81fe27669 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
97: 0x7ff81fe278f1 - method_vectorcall
at \objects\classobject.c:83
98: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
99: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
100: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
101: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
102: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
103: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
104: 0x7ff81fe253b4 - _PyObject_FastCallDictTstate
at \objects\call.c:142
105: 0x7ff81fe25ad2 - _PyObject_Call_Prepend
at \objects\call.c:431
106: 0x7ff81fe8fc0c - slot_tp_call
at \objects\typeobject.c:7494
107: 0x7ff81fe257a7 - _PyObject_Call
at \objects\call.c:305
108: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
109: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
110: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
111: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
112: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
113: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
114: 0x7ff81fe27669 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
115: 0x7ff81fe278f1 - method_vectorcall
at \objects\classobject.c:83
116: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
117: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
118: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
119: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
120: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
121: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
122: 0x7ff81fe27669 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
123: 0x7ff81fe278f1 - method_vectorcall
at \objects\classobject.c:83
124: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
125: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
126: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
127: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
128: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
129: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
130: 0x7ff81fe27669 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
131: 0x7ff81fe278f1 - method_vectorcall
at \objects\classobject.c:83
132: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
133: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
134: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
135: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
136: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
137: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
138: 0x7ff81fe253b4 - _PyObject_FastCallDictTstate
at \objects\call.c:142
139: 0x7ff81fe25ad2 - _PyObject_Call_Prepend
at \objects\call.c:431
140: 0x7ff81fe8fc0c - slot_tp_call
at \objects\typeobject.c:7494
141: 0x7ff81fe2554c - _PyObject_MakeTpCall
at \objects\call.c:215
142: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
143: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
144: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault
at \python\ceval.c:4213
145: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
146: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
147: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
148: 0x7ff81ff161a9 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
149: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
150: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
151: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault
at \python\ceval.c:4213
152: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
153: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
154: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
155: 0x7ff81ff1e8f2 - PyObject_Call
at \objects\call.c:317
156: 0x7ff81ff1e8f2 - do_call_core
at \python\ceval.c:5945
157: 0x7ff81ff19d19 - _PyEval_EvalFrameDefault
at \python\ceval.c:4277
158: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
159: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
160: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
161: 0x7ff81ff161a9 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
162: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
163: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
164: 0x7ff81ff1aeb5 - _PyEval_EvalFrameDefault
at \python\ceval.c:4198
165: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
166: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
167: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
168: 0x7ff81ff161a9 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
169: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
170: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
171: 0x7ff81ff1aeb5 - _PyEval_EvalFrameDefault
at \python\ceval.c:4198
172: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
173: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
174: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
175: 0x7ff81ff161a9 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
176: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
177: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
178: 0x7ff81ff1a8e2 - _PyEval_EvalFrameDefault
at \python\ceval.c:4213
179: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
180: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
181: 0x7ff81fe2585e - _PyFunction_Vectorcall
at \objects\call.c:347
182: 0x7ff81ff161a9 - _PyObject_VectorcallTstate
at \include\cpython\abstract.h:114
183: 0x7ff81ff1e6f2 - PyObject_Vectorcall
at \include\cpython\abstract.h:123
184: 0x7ff81ff1e6f2 - call_function
at \python\ceval.c:5893
185: 0x7ff81ff1af08 - _PyEval_EvalFrameDefault
at \python\ceval.c:4231
186: 0x7ff81ff1ce7b - _PyEval_EvalFrame
at \include\internal\pycore_ceval.h:46
187: 0x7ff81ff1ce7b - _PyEval_Vector
at \python\ceval.c:5067
188: 0x7ff81ff178c2 - PyEval_EvalCode
at \python\ceval.c:1134
189: 0x7ff81ff8e08e - run_eval_code_obj
at \python\pythonrun.c:1291
190: 0x7ff81ff8e168 - run_mod
at \python\pythonrun.c:1312
191: 0x7ff81ff8dc39 - PyRun_StringFlags
at \python\pythonrun.c:1183
192: 0x7ff81ff8c21b - PyRun_SimpleStringFlags
at \python\pythonrun.c:503
193: 0x7ff81fda8ef7 - pymain_run_command
at \modules\main.c:252
194: 0x7ff81fda8ef7 - pymain_run_python
at \modules\main.c:582
195: 0x7ff81fda9e93 - Py_RunMain
at \modules\main.c:670
196: 0x7ff81fda9e93 - pymain_main
at \modules\main.c:1066
197: 0x7ff81fda9f06 - Py_Main
at \modules\main.c:1078
198: 0x7ff694f11494 - invoke_main
at d:\agent_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:90
199: 0x7ff694f11494 - __scrt_common_main_seh
at d:\agent_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
200: 0x7ff84ea37374 - BaseThreadInitThunk
201: 0x7ff84f7bcc91 - RtlUserThreadStart
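The setup steps above can be sketched as a single startup hook, so the backend flags live in one place instead of being pasted after every `import torch`. This is a minimal sketch, not Applio or ZLUDA code: `is_cu118_build` and `apply_zluda_workarounds` are hypothetical names, and the torch module is passed in as a parameter. The flags only restate what the reporter set; they do not by themselves address the HIPRTC failure.

```python
"""Hypothetical startup hook collecting the ZLUDA settings from the report.

The torch module is passed in as an argument, so this file does not import
torch itself.
"""


def is_cu118_build(torch_version: str) -> bool:
    # Wheels from the cu118 index report versions such as "2.1.1+cu118".
    return torch_version.endswith("+cu118")


def apply_zluda_workarounds(torch) -> None:
    """Apply the backend flags the reporter added next to each torch import."""
    # Step 1 assumption: the cu118 wheels were installed before patching.
    if not is_cu118_build(torch.__version__):
        raise RuntimeError(
            f"expected a +cu118 torch build, got {torch.__version__!r}"
        )
    # Turn cuDNN off entirely.
    torch.backends.cudnn.enabled = False
    # Keep only the plain "math" scaled-dot-product-attention path and
    # disable the flash and memory-efficient kernels.
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
```

Usage would be `apply_zluda_workarounds(torch)` immediately after `import torch`. These flags are per-process, so if Applio's trainer starts separate worker processes (the `c10d` frames in the trace suggest it does), the hook would have to run in each of them as well.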

lshqqytiger added the bug (Something isn't working) and implementation (Unimplemented feature(s)) labels on Jul 14, 2024
@AznamirWoW (Author)

Found that removing the "@torch.jit.script" decorator prevents the crash.
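Instead of deleting each `@torch.jit.script` decorator by hand, TorchScript can be switched off for a whole process with the `PYTORCH_JIT=0` environment variable, which the PyTorch docs describe for debugging; with it set, script and tracing annotations are disabled and `torch.jit.script` returns the function unchanged. This is a sketch under the assumption that the variable is in place before torch is first imported in every process; `env_with_jit_disabled` is a hypothetical helper, and this route has not been verified on ZLUDA here.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_jit_disabled(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with PYTORCH_JIT=0.

    torch reads PYTORCH_JIT when it is first imported, so the variable must be
    set before any `import torch` in the target process. `base` is not mutated.
    """
    env = dict(os.environ if base is None else base)
    env["PYTORCH_JIT"] = "0"
    return env
```

A launcher could then start training with `subprocess.run([sys.executable, script_path], env=env_with_jit_disabled(), check=True)`, where `script_path` is a placeholder. Child processes inherit this environment. On Windows, `set PYTORCH_JIT=0` in the batch file before Python starts has the same effect.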


@lshqqytiger (Owner)

#34

lshqqytiger closed this as not planned on Aug 27, 2024
lshqqytiger mentioned this issue on Jan 3, 2025
lshqqytiger reopened this on Jan 3, 2025
lshqqytiger removed the bug (Something isn't working) label on Jan 3, 2025
lshqqytiger self-assigned this on Jan 3, 2025