Instruction: I-21-0 with opcode: TensorTensor couldn't be allocated in SB Error #1060
Each SBUF partition in NeuronCore-v2 has only 192 KiB (196,608 bytes) of physical memory. When a TensorTensor instruction (triggered by a call to nl.add) is executed, all of its input and output tensors must fit in SBUF together. More info on SBUF: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/trainium_inferentia2_arch.html#trainium-inferentia2-arch. Here, you will need to reduce the tile size of your nl.add() calls: use loops to iterate over different chunks of your original tensor.
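A minimal sketch of the chunking arithmetic in plain Python. It splits a free dimension into (start, end) ranges so that each chunk fits in one partition's 196,608 bytes, under the simplifying assumption that the two inputs and the output of nl.add are the same size and all live in SBUF at once. The helper name and the three-operand assumption are hypothetical, not part of the NKI API; in a kernel, each range would presumably drive one loop iteration that loads, adds and stores one slice.

```python
import math

# Bytes available in one SBUF partition, as reported by the compiler log in
# this issue ("Total SB Partition Size: 196608"), i.e. 192 KiB.
SBUF_PARTITION_BYTES = 196_608


def plan_free_dim_chunks(free_elems, dtype_bytes, operands=3,
                         budget=SBUF_PARTITION_BYTES):
    """Split a free dimension into [start, end) ranges that fit in one partition.

    Hypothetical helper. Assumes every operand of the element-wise op
    (two inputs + one output for an add) has the same chunk size and that
    all of them must be resident in SBUF at the same time.
    """
    if free_elems == 0:
        return []
    max_elems = budget // (operands * dtype_bytes)
    if max_elems == 0:
        raise ValueError("even one element per operand exceeds the budget")
    n_chunks = math.ceil(free_elems / max_elems)
    # Balance the chunks instead of leaving a tiny remainder at the end.
    chunk = math.ceil(free_elems / n_chunks)
    return [(start, min(start + chunk, free_elems))
            for start in range(0, free_elems, chunk)]
```

For example, a free dimension of 24,642 fp32 elements (98,568 bytes per partition, the size reported for position_out in the compiler log in this issue) needs two chunks: plan_free_dim_chunks(24642, 4) returns [(0, 12321), (12321, 24642)], each using 147,852 bytes per partition for three operands.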
Hello,
Thanks for your response. We don't believe this is a memory issue as we are not increasing the size of the output tensor, we are simply adding in place. Do you mind taking a look at our code? Thank you!
Best,
Sherine Ismail
@sherinei could you share your latest code via gist or GitHub with @aws-serina-tan @AWSNB @aws-zhehongb @JonathanHenson @aws-qieqingy @EmilyWebber and, if you can, the code you use for the in-place add? Are you using a = a+b, a += b, a[...] = a+b, or a[...] += b?
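For reference, here is how the four spellings behave in plain NumPy: only a = a + b rebinds the name to a brand-new array, a[...] = a + b still builds a full-size temporary before copying it back, and the two += forms update the existing buffer. This is a sketch of host-language semantics only; NKI traces the kernel and the compiler may lower each form differently, so it does not predict what ends up in SBUF.

```python
import numpy as np


def compare_add_forms():
    """Report, for each spelling, whether the name still refers to the original
    buffer and what the original buffer holds afterwards (NumPy semantics only)."""
    b = np.ones(4, dtype=np.float32)
    results = {}

    a = orig = np.zeros(4, dtype=np.float32)
    a = a + b              # allocates a new array and rebinds the name `a`
    results["a = a + b"] = (a is orig, float(orig[0]))

    a = orig = np.zeros(4, dtype=np.float32)
    a += b                 # ndarray.__iadd__ updates the existing buffer
    results["a += b"] = (a is orig, float(orig[0]))

    a = orig = np.zeros(4, dtype=np.float32)
    a[...] = a + b         # builds a temporary a + b, then copies it into `a`
    results["a[...] = a + b"] = (a is orig, float(orig[0]))

    a = orig = np.zeros(4, dtype=np.float32)
    a[...] += b            # in-place add through a full view of the same buffer
    results["a[...] += b"] = (a is orig, float(orig[0]))

    return results


for form, (same_buffer, value) in compare_add_forms().items():
    print(form, "| same buffer:", same_buffer, "| orig[0] =", value)
```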
Also, we noticed that our code works (runs and passes tests without errors) only when we use nki.simulate_kernel. Let me know if you think of anything that could be causing this issue. Thanks again!
Best,
Sherine Ismail
Just shared it. You can find the code in part2/conv2d.py in the fused_conv2d_maxpool() function. We specifically do:
# store to output array in hbm
res = nl.add(position_out, bias_i)
position_out_reshaped = res.reshape((c_out_pmax, tiled_out_height, out_width))
nl.store(X_out[b, out_i*c_out_pmax:out_i*c_out_pmax + c_out_pmax, start_out_height:end_out_height], value = position_out_reshaped) # account for pooling later
Thanks again.
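A NumPy model of the same bias add and store, done one chunk of output rows at a time so that the temporary holding position_out + bias never spans the whole tile. The shapes mirror the snippet above (position_out flattened as column h * out_width + w, bias_i of shape (P, 1)); rows_per_tile and the function name are hypothetical. In the real kernel each iteration would presumably be an nl.add on the slice followed by an nl.store into the matching rows of X_out, but that is an assumption, not tested here.

```python
import numpy as np


def add_bias_and_store_tiled(position_out, bias, x_out, rows_per_tile):
    """Add a per-partition bias and write the result row-chunk by row-chunk.

    position_out: (P, H * W) accumulator, flattened as column h * W + w
    bias:         (P, 1) per-channel bias, broadcast along the free dimension
    x_out:        (P, H, W) destination, standing in for the X_out slice in HBM
    """
    P, H, W = x_out.shape
    assert position_out.shape == (P, H * W)
    assert bias.shape == (P, 1)
    for h0 in range(0, H, rows_per_tile):
        h1 = min(h0 + rows_per_tile, H)
        # Only a (P, (h1 - h0) * W) temporary is live per iteration.
        res = position_out[:, h0 * W:h1 * W] + bias
        x_out[:, h0:h1, :] = res.reshape(P, h1 - h0, W)
    return x_out


rng = np.random.default_rng(0)
P, H, W = 4, 5, 3
position_out = rng.standard_normal((P, H * W)).astype(np.float32)
bias = rng.standard_normal((P, 1)).astype(np.float32)
out = add_bias_and_store_tiled(position_out, bias, np.empty((P, H, W), np.float32), 2)
print(np.allclose(out, (position_out + bias).reshape(P, H, W)))  # True
```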
@sherinei a couple of other comments on the code:
Line 185: nl.add(row_out, bias_i) ==> you are not assigning the result of the add to any destination; this should be c = nl.add(a, b).
Lines 184-192: try adding the bias after copying the matmul output to SBUF, and instead of +=, try nl.add:
row_out[...] = nl.matmul(w[:, in_i*c_in_pmax:in_i*c_in_pmax + c_in_pmax, i, j], x_row)
# nl.add(row_out, bias_i) -- move this to add after data is in sbuf
# copy per row output into corresponding index in sbuf array
row_out_sbuf = nl.ndarray(shape=row_out.shape, dtype=row_out.dtype, buffer=nl.sbuf)
row_out_sbuf = nl.copy(row_out, dtype=row_out.dtype) # from psum to sbuf
row_out_sbuf[...] = nl.add(row_out_sbuf, bias_i) # putting this here to do sbuf to sbuf
# print(row_out_sbuf.shape, bias_i.shape)
po_start_index = h * out_width
po_end_index = po_start_index + out_width
position_out[:, po_start_index:po_end_index] = nl.add(position_out[:, po_start_index:po_end_index], row_out_sbuf) # changing from += to nl.add or can do something like a[...] = a + b
Hi, sorry for the confusion: the issue we have is actually the nl.add on line 193 in the code. We forgot to delete the earlier nl.add, which shouldn't be there.
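The point about Line 185 (the result of nl.add not being assigned) is the same as with NumPy's functional ufuncs: the call returns a new array and leaves its arguments alone unless the result is kept. The sketch uses np.add as a stand-in; the out= form at the end is NumPy's own in-place variant and is not meant to imply an equivalent nl.add parameter.

```python
import numpy as np

row_out = np.zeros((2, 3), dtype=np.float32)
bias = np.ones((2, 1), dtype=np.float32)

np.add(row_out, bias)                 # result is computed and then discarded
assert (row_out == 0).all()           # row_out is untouched

c = np.add(row_out, bias)             # keep the result, as in c = nl.add(a, b)
np.add(row_out, bias, out=row_out)    # NumPy-only way to write into row_out itself
```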
Got it.
1. You should still change line 191 to use nl.add instead of +=.
2. Did you confirm position_out and bias_i have the same shape?
For (1), we tried changing it but got the following error:
position_out[:, po_start_index:po_end_index] = nl.add(position_out[:, po_start_index:po_end_index], row_out_sbuf)
SyntaxError: Unexpected output dependencies, missing indices in the dst access: i, j
For (2), position_out is (128, tiled_out_height * out_width) and bias_i is (128, 1); however, we were told that nl.add(...) would broadcast bias_i to the same shape as position_out.
You are right on (2): these specific dimensions are indeed broadcastable, so the code is good there. Re (1), see Hongbin's comments about indices inside loops in case that helps.
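Under NumPy's broadcasting rules (shapes are compared from the trailing dimension and size-1 dimensions are stretched), (128, tiled_out_height * out_width) and (128, 1) combine to (128, tiled_out_height * out_width), so each partition's bias is added across its whole row. The check below uses NumPy and made-up sizes; it says nothing about how much SBUF the broadcast operand occupies on device.

```python
import numpy as np

# Hypothetical sizes; only the (128, N) vs (128, 1) pattern matters here.
c_out_pmax, tiled_out_height, out_width = 128, 2, 4
position_out = np.zeros((c_out_pmax, tiled_out_height * out_width), dtype=np.float32)
bias_i = np.arange(c_out_pmax, dtype=np.float32).reshape(c_out_pmax, 1)

res = np.add(position_out, bias_i)   # (128, 8) + (128, 1) -> (128, 8)
print(res.shape)                     # (128, 8)
```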
We don't think those comments help with our issue: position_out[:, po_start_index:po_end_index] += row_out_sbuf runs fine, but both position_out[:, po_start_index:po_end_index] = position_out[:, po_start_index:po_end_index] + row_out_sbuf and position_out[:, po_start_index:po_end_index] = nl.add(position_out[:, po_start_index:po_end_index], row_out_sbuf) return errors. position_out has shape (c_out_pmax, tiled_out_height * out_width).
For the fp32 test, you need to tile the computation into multiple smaller tiles to overcome this problem.
Hello, we are getting the following error when trying to use nl.add:
`Running correctness test for conv2d kernel with larger images...[GCA035] Instruction: I-21-0 with opcode: TensorTensor couldn't be allocated in SB
Memory Location Accessed:
res.48_i0: 888 Bytes per Partition and total of: 113664 Bytes in SB
position_out_i0: 98568 Bytes per Partition and total of: 12616704 Bytes in SB
position_out_i0: 98568 Bytes per Partition and total of: 12616704 Bytes in SB
Total Accessed Bytes per partition by instruction: 198024
Total SB Partition Size: 196608
Traceback (most recent call last):
File "/home/ubuntu/asst4-trainium/part2/test_harness.py", line 196, in <module>
test_result = test_correctness_conv2d_kernel(conv2d, use_larger_images=True)
File "/home/ubuntu/asst4-trainium/part2/test_harness.py", line 85, in test_correctness_conv2d_kernel
out = kernel(*args, **kwargs)
File "neuronxcc/nki/compile.py", line 92, in neuronxcc.nki.compile.GenericKernel.call
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 174, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.Kernel.call
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 422, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel.post_process_call
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 425, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel.post_process_call
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 508, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel._compile
RuntimeError: Compilation failed for fused_conv2d_maxpool with error Command '['neuronx-cc', 'compile', '--framework', 'XLA', 'penguin.py', '--internal-tensorizer-opt-level=nki', '--pipeline', 'compile', 'SaveTemps', '--target', 'trn1', '--disable-internal-io-dge', '--output=file.neff']' returned non-zero exit status 70.`
We're not sure what's causing this error. Any help would be appreciated. Thanks.
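The numbers in the log add up: the failing instruction touches 888 + 98,568 + 98,568 = 198,024 bytes in each partition, 1,416 bytes more than the 196,608 available, so the operands listed for that single instruction do not fit even though the add is in place (position_out_i0 appears twice, presumably once as input and once as output). A quick check in Python; the partition count of 128 is inferred from the log, where each "total ... in SB" figure is exactly 128 times the per-partition one.

```python
# Per-partition byte counts copied from the GCA035 log above. position_out_i0
# is listed twice, presumably once as an input and once as the output.
accessed_per_partition = [
    ("res.48_i0", 888),
    ("position_out_i0", 98568),
    ("position_out_i0", 98568),
]
SB_PARTITION_SIZE = 196608   # "Total SB Partition Size" in the log
NUM_PARTITIONS = 128         # inferred: per-partition bytes * 128 = log totals

total = sum(n for _, n in accessed_per_partition)
overflow = total - SB_PARTITION_SIZE
totals_in_sb = {name: n * NUM_PARTITIONS for name, n in accessed_per_partition}
print(total, overflow)       # 198024 1416
```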