
SIGSEGV reported by Sentry when a NullReferenceException is thrown in Release mode #9055

Open
tranb3r opened this issue Jun 25, 2024 · 44 comments
Labels
Area: App Runtime Issues in `libmonodroid.so`. need-attention A xamarin-android contributor needs to review

Comments

@tranb3r

tranb3r commented Jun 25, 2024

Android framework version

net8.0-android

Affected platform version

VS 2022 17.10.0

Description

I'm getting SIGSEGV reports from Sentry when running my Maui app on Android.
I've narrowed it down to a NullReferenceException happening in Release mode.
However, I don't understand why I get both a NullReferenceException and a SIGSEGV, since I'm using only managed code.
Even when I catch the NullReferenceException, the SIGSEGV still occurs, without the app actually crashing.
Also, I haven't seen anything weird in the log. How is Sentry catching this SIGSEGV? I'm lost here...
Could you please take a look at my repro? https://github.com/tranb3r/Issues/tree/main/MauiAppSegfault

Steps to Reproduce

  1. Open the repro app in Visual Studio and set your DSN for Sentry before running the app.
  2. Run the app in Release mode on Android.
  3. Tap on the Run button. A NullReferenceException happens, but it's caught silently.
  4. Close the app, launch it again. An error report with SIGSEGV is sent to Sentry immediately.

Did you find any workaround?

No workaround.

Relevant log output

No response

@tranb3r tranb3r added Area: App Runtime Issues in `libmonodroid.so`. needs-triage Issues that need to be assigned. labels Jun 25, 2024
@tranb3r tranb3r changed the title from "SIGSEGV reported by Sentry when a NullReferenceException is thrown while using HtmlAgilityPack in Release mode" to "SIGSEGV reported by Sentry when a NullReferenceException is thrown in Release mode" Jun 26, 2024
@grendello
Contributor

@tranb3r considering I have no idea what Sentry is and how to use it, I will need more info from you :)

Can you add to this issue the logcat output with the segfault, as well as the managed exception stack trace?

The fact that you don't use native code directly doesn't mean it isn't involved, in fact, it's always involved in one manner or another.

If you're able to reproduce this issue locally, please capture logcat output using the following commands:

> adb shell setprop debug.mono.log default,assembly,mono_log_level=debug,mono_log_mask=all
> adb logcat -G 64M
> adb logcat -c
rem Start and crash the app here, wait 2-3 seconds and then:
> adb logcat -d > logcat.txt

@grendello grendello added need-info Issues that need more information from the author. and removed needs-triage Issues that need to be assigned. labels Jun 27, 2024
@tranb3r
Author

tranb3r commented Jun 27, 2024

considering I have no idea what Sentry is and how to use it, I will need more info from you :)

@grendello
Sentry is cloud-based error tracking for applications.
If you want to reproduce this issue on your machine, you can create a free account in 2 minutes, and then simply copy your account id (it's called DSN) into MauiProgram.cs.
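
Roughly, the wiring looks like this; a minimal sketch, assuming the Sentry.Maui package's UseSentry extension and the MAUI template's implicit usings (the DSN value is a placeholder):

// MauiProgram.cs (sketch)
public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();
        builder
            .UseMauiApp<App>()
            .UseSentry(options =>
            {
                // Paste the DSN from your Sentry project settings here.
                options.Dsn = "https://<publicKey>@<host>/<projectId>";
            });
        return builder.Build();
    }
}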

Can you add to this issue the logcat output with the segfault, as well as the managed exception stack trace?

The managed exception is caught. I've added an INFO log so you can see it in the logcat.

06-27 14:20:50.722 11869 11869 I MauiAppSegFault: System.NullReferenceException: Object reference not set to an instance of an object
06-27 14:20:50.722 11869 11869 I MauiAppSegFault:    at MauiAppSegfault.MainPage.Button_OnClicked(Object sender, EventArgs e)

The application is not crashing, and I haven't seen a trace for the Segfault.
But Sentry is capturing it, so it must be somewhere.

So here are two logcats:

  • first is a repro WITH Sentry: run the app, click on the Run button, the exception is caught, close the app, run it again, the segfault is sent to Sentry.
    logcat_with_sentry.txt
  • second is a repro WITHOUT Sentry: run the app, click on the Run button, the exception is caught, close the app, run it again.
    logcat_without_sentry.txt

@grendello
Contributor

@tranb3r thanks, I'd still like to see the error your instance of Sentry records, the one with SIGSEGV. It would be helpful if you pasted it (and whatever context surrounds it) here.

@tranb3r
Author

tranb3r commented Jun 27, 2024

[screenshot: Sentry event details]

[screenshot: Sentry event details]

@tranb3r
Author

tranb3r commented Jun 27, 2024

@grendello
I've posted screenshots of everything I can see in Sentry.

@grendello
Contributor

@tranb3r thanks! They clearly have the whole stack trace somewhere, since without it they wouldn't be able to show the registers or the frame where the crash happens. Alas, their UI makes the trace useless - could you try digging in the UI to find the raw data they parse?

@tranb3r
Author

tranb3r commented Jun 27, 2024

I really think that's everything I can get from the UI.

Adding @jamescrosswell , hope you don't mind.
James, maybe you can provide more context?

@grendello
Contributor

It's too bad the raw data is missing. With these crashes the context is everything - not just the header, but most importantly the frames themselves. With some kinds of signals (e.g. SIGABRT), the lines preceding the native trace are often crucial. The contents of the registers, as shown in the screenshot, are mostly of secondary interest - they don't give us any information about the location of the crash with regards to the source code (except, in this case, for the RIP register, which points to the code position at the crash). Even if the trace doesn't contain file:line information, it contains addresses relative to the loaded shared libraries/executables, and we can translate them post-mortem to code locations (not always, but in most cases). The information contained in the UI above is, alas, not helpful.

@tranb3r
Copy link
Author

tranb3r commented Jun 28, 2024

Here is the data that is saved by Sentry when the error occurs. It contains a bit more than what is visible in the UI. Maybe you can take a look?
68e4d535-186b-42dc-56f2-c213526eed94.envelope.json

@grendello
Contributor

The only frame in this data is this:

"frames": [{
  "instruction_addr": "0x7085005af665",
  "package": "/data/app/~~lhSuplZuwh7nocNaMhXmWw==/com.companyname.mauiappsegfault-henAPYLgWXjjYhjFwUCWbQ==/split_config.x86_64.apk!/lib/x86_64/libaot-MauiAppSegfault.dll.so",
  "image_addr": "0x7085005ae000"
}]

which is weird, because AOT libraries don't contain directly executable code. They contain code-as-data that is loaded by the runtime, patched up and made executable, so the frame shouldn't point to anything in that shared library. However, could you try without AOT and see if you still get the segfault?

I don't know if this is the only frame that was logged by Android or the only frame that was deemed worthy of being saved by Sentry, but without the rest it's very, VERY, hard to see what's going on.

The fact that the segfault is reported by Sentry, but is not in your logcat, is very weird. The only explanation that comes to mind is that Sentry intercepts it and prevents it from ending up in the logcat, exiting the application "cleanly" instead. Is such a scenario possible?

@tranb3r
Author

tranb3r commented Jun 28, 2024

However, could you try without AOT and see if you still get the segfault?

Yes, I'm still getting the segfault without AOT (`<RunAOTCompilation>false</RunAOTCompilation>`).

The fact that the segfault is reported by Sentry, but is not in your logcat, is very weird. The only explanation that comes to mind is that Sentry intercepts it and prevents it from ending up in the logcat, exiting the application "cleanly" instead. Is such a scenario possible?

This is why I also pasted the logcat without sentry. I don't see the segfault in this log.
So, I don't think that Sentry is preventing the error from ending up in the logcat.
Now the question is, how is Sentry capturing a segfault that we cannot see in the logcat?
cc @jamescrosswell @bitsandfoxes

@tranb3r tranb3r closed this as completed Jun 28, 2024
@tranb3r tranb3r reopened this Jun 28, 2024
@grendello
Contributor

@tranb3r Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application. This could prevent the signal from being logged in the logcat.
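
For illustration, chaining with POSIX sigaction looks roughly like this; a minimal sketch, not Sentry's actual code. The crucial parts are saving the previous action at install time and forwarding to it from the handler; skipping the forwarding step is exactly what would keep the signal out of the logcat:

#include <signal.h>
#include <string.h>

/* Sketch only: a SIGSEGV handler that chains to the previously installed one. */
static struct sigaction g_previous;

static void chaining_handler(int signum, siginfo_t *info, void *ctx)
{
    /* ... record the crash event here ... */

    /* Forward to whoever was installed before us (e.g. the runtime's handler).
     * Without this step the signal dies here and nothing reaches logcat. */
    if ((g_previous.sa_flags & SA_SIGINFO) && g_previous.sa_sigaction != NULL)
        g_previous.sa_sigaction(signum, info, ctx);
    else if (g_previous.sa_handler != SIG_IGN && g_previous.sa_handler != SIG_DFL)
        g_previous.sa_handler(signum);
}

static void install_chaining_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = chaining_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigaction(SIGSEGV, &sa, &g_previous); /* previous action saved for chaining */
}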

@tranb3r
Author

tranb3r commented Jun 28, 2024

@tranb3r Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application. This could prevent the signal from being logged in the logcat.

Sure.
But when the app does not use Sentry, there is no segfault in the logcat either, right?
So it means it's not Sentry that is preventing the segfault from being logged.

@grendello
Contributor

@tranb3r Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application. This could prevent the signal from being logged in the logcat.

Sure. But when the app does not use Sentry, there is no segfault in the logcat either, right? So it means it's not Sentry that is preventing the segfault from being logged.

It may also mean that Sentry is causing and catching the segfault. But we don't have enough information to figure that out yet.

@tranb3r
Author

tranb3r commented Jun 28, 2024

I will consider opening an issue in the Sentry SDK repository if I do not get an answer from James or Stefan here.

@bruno-garcia
Member

Sentry could intercept the signal if it installed its own handler and didn't chain it up to the previous one before exiting the application.

sentry-native should chain the handlers so this shouldn't be the case. @Swatinem or @markushi could confirm this

Now the question is, how is Sentry capturing a segfault that we cannot see in the logcat?

I'm not sure how this could happen, and I wonder if it's something related to .NET's usage of signals that our SDK is capturing as an error. IIRC .NET uses some signals to communicate between the native and .NET layers. I wonder if that could be related here?

Looking at the screenshot of this event in Sentry I noticed it's missing symbols. Any chance you can upload debug symbols when building the app so we can see the stack trace? Upload is done automatically via msbuild if you configure your .NET project.

Thanks for the repro though, at least we have something to dig into. @bitsandfoxes said he's taking a look.

@bitsandfoxes

I'm running the repro without any problems and can't seem to reproduce the segfault being sent. I'm on Android 14 on a Pixel 6; might this be a device-specific issue?

@tranb3r
Author

tranb3r commented Jun 28, 2024

I'm running the repro without any problems and can't seem to reproduce the segfault being sent. I'm on Android 14 on a Pixel 6; might this be a device-specific issue?

Did you follow all the steps? (Release mode; tap the Run button; restart the app.)
I'm reproducing on an emulator and a Pixel 5. Could you please try to repro on either an emulator or a Pixel 5?
I can also provide an APK for you to test on a Pixel 6 if you give me your DSN.

@bitsandfoxes

Did you follow all the steps? (Release mode; tap the Run button; restart the app.)

Let me try that again.

@dotnet-policy-service dotnet-policy-service bot added need-attention A xamarin-android contributor needs to review and removed need-info Issues that need more information from the author. labels Jul 1, 2024
@tranb3r
Author

tranb3r commented Jul 2, 2024

I can confirm my repro on:

  • Android 10, 11, 12, 13, 14
  • Pixel 5, 6, 7; Moto G20; Xiaomi Redmi 7; OnePlus 8 Pro

As soon as the Run button is pressed and the exception is triggered, a last_crash file and an xxx.envelope file are created in the cache folder of the app. But the exception is caught and the app is not crashing. The crash report is sent to Sentry when the app is restarted.

Is Sentry capturing a crash of the app? Or is it the Sentry SDK that is actually crashing?

@bitsandfoxes

bitsandfoxes commented Jul 3, 2024

So after a whole bunch of testing:

  1. This only happens in Release and not in Debug.
  2. This only happens if there is an actual exception,
    i.e. `var s = default(string); var c = s.Length;`
  3. `throw new Exception()` does not cause a signal to reach the signal handler.

@grendello is this intended behavior? And if so, how does the runtime handle the signal? What is getting checked to ignore the signal safely?

For context, the SDK hooks itself into the signal chain and receives a signal, creating an event from it. That signal then gets forwarded and seems to get swallowed.
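
To make the difference between points 2 and 3 above concrete, an illustrative snippet (not the repro app's exact code):

// Sketch: the dereference compiles to a raw memory read, so the CPU faults
// (SIGSEGV) and the runtime's signal handler converts the fault into the
// managed exception. An explicit throw is purely managed and raises no signal.
try
{
    var s = default(string);
    _ = s.Length;              // null access -> SIGSEGV -> NullReferenceException
}
catch (NullReferenceException)
{
    // Caught in managed code, but a native handler installed ahead of the
    // runtime's (e.g. the Sentry Native SDK's) has already seen the SIGSEGV.
}

try
{
    throw new Exception();     // no signal is involved here
}
catch (Exception)
{
}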

@grendello
Contributor

@bitsandfoxes no, the behavior isn't correct... I wonder if your handler catches the signal on a thread that's not attached to MonoVM and thus there are no handlers to chain to.

@bruno-garcia
Member

To confirm: a C# null reference is supposed to trigger SIGSEGV? I mean, it seems that's what's happening, but the app doesn't crash. If Sentry wasn't there, the signal probably wouldn't be 'noticed' by anything else in the app. Correct?

@grendello
Contributor

@bruno-garcia no, a managed null reference shouldn't do that. SIGSEGV is always an issue with some native code. If Sentry weren't there, then the Android launcher process would have caught it, logged it in logcat and created a tombstone. Since there's no segfault without Sentry, however, the problem either exists in Sentry's native code, or something Sentry does to/in managed land triggers a bug in the MonoVM runtime.

@bruno-garcia
Member

If Sentry weren't there, then the Android launcher process would have caught it, logged it in logcat and created a tombstone.

That doesn't seem to be the case, as OP mentioned that without Sentry still nothing shows up in logcat. Unless I misunderstood things.

Since there's no segfault without Sentry, however, the problem either exists in Sentry's native code or something Sentry does to/in managed land triggers a bug in the MonoVM runtime.

That's possible (except the signal might still exist even without Sentry, since nothing shows up in logcat either with or without Sentry).

I'm saying that after reading:

This is why I also pasted the logcat without sentry. I don't see the segfault in this log.

@grendello
Contributor

If Sentry weren't there, then the Android launcher process would have caught it, logged it in logcat and created a tombstone.

That doesn't seem to be the case, as OP mentioned that without Sentry still nothing shows up in logcat. Unless I misunderstood things.

It's a very, very rare case that an Android application crashes with a segfault and nothing gets logged. I think I've seen only one such instance over the years. Sometimes there's very little logged, but there's always a trace. If MonoVM fails to catch the signal, ART or Zygote will. Mono won't silently handle and ignore a segfault, so the fact that we see nothing in the logcat without Sentry means that most likely (very likely) no signal is raised.

Since there's no segfault without Sentry, however, the problem either exists in Sentry's native code or something Sentry does to/in managed land triggers a bug in the MonoVM runtime.

That's possible (except the signal might still exist even without Sentry, since nothing shows up in logcat either with or without Sentry).

This is very unlikely, as explained above. The only way for that to happen is if there were a signal handler installed somewhere which would handle but not log the segfault. Neither ART nor MonoVM would do that, and there's no good reason to swallow such destructive and dangerous signals; I can't imagine why a legitimate piece of software would do that. The only scenario I can imagine is one where we have a corrupted chain of signal handlers. For instance, consider a situation where both ART and MonoVM uninstall their own handlers and the chain is left with a handler in the middle that does some processing when it captures a signal and then passes it on, if there is another handler installed. If we assume that the other handlers are gone and our hypothetical handler doesn't have logging code, nor was it designed to abort the application, the app can keep running. I can imagine this scenario with software like Sentry running in the application, but I can't imagine it without Sentry present (barring the application itself doing that, of course).

@tranb3r
Author

tranb3r commented Jul 24, 2024

Is there anything I can do to help make some progress?

@supervacuus

Hi, I'm a Sentry Native SDK maintainer. I had the chance to investigate this topic further to understand what was happening.

In short, what is happening is the following:

  1. The dotnet JIT and AOT generate machine code for the provided code snippet that will cause a page fault in the CPU (specifically, accessing the NULL page).
  2. The kernel must respond in this case and will invoke the process' signal handler with a SIGSEGV.
    Since the Native SDK installs its signal handler last, it will be the first in the signal chain.
  3. The Native SDK is unaware that it is running inside the CLR, so the SIGSEGV will produce a crash event that will be reported as a native crash.
  4. Ultimately, our signal handler invokes the next one in the chain, i.e., the one from the dotnet runtime.
  5. The runtime handler identifies the signal source as part of the generated machine code for its managed code and raises a managed code exception (NullReferenceException) that, if uncaught, will also produce a crash event.
  6. In the case where a signal can be "converted" into a managed code exception, the dotnet runtime handlers will discontinue the signal chain (since the program must continue with the exception), which is why debuggerd will not log any crashes in logcat (and no tombstones will be generated either).

This is why you will get two events and see no crashes in logcat. The SIGSEGV isn't an additional crash (and isn't provoked by the Native SDK either) but the result of optimized CLR-generated native code. The dotnet CLR expects this to happen and converts it to a NullReferenceException in its signal handlers.

The Native SDK will also receive that SIGSEGV, but it acts like the signal was an unrecoverable native crash. That should change. An approach we already tested on Linux (where we can see the same behavior) was to invoke the dotnet runtime handler at the start of our handler (rather than the end).

In the case of a native-provoked managed code exception, the rest of our handler would then never execute (and, as a result, never produce a crash event). Only if the runtime handler continues the signal chain (which would be either an unintended CLR crash or, more likely, a crash in some other native code) will we send a native crash.
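
A rough sketch of that ordering, assuming plain POSIX chaining (illustrative only; the check for whether the runtime consumed the signal is simplified, and capture_native_crash() is a hypothetical hook):

#include <signal.h>
#include <string.h>
#include <ucontext.h>

static struct sigaction g_runtime; /* the runtime's handler, saved at install time */

void capture_native_crash(int signum, siginfo_t *info, void *ctx); /* hypothetical */

static void deferring_handler(int signum, siginfo_t *info, void *ctx)
{
    ucontext_t before;
    memcpy(&before, ctx, sizeof(before));

    /* Let the runtime try to convert the fault into a managed exception first.
     * When it does, it redirects the program counter stored in ctx so that
     * execution resumes in exception-raising code, not at the fault site. */
    if ((g_runtime.sa_flags & SA_SIGINFO) && g_runtime.sa_sigaction != NULL)
        g_runtime.sa_sigaction(signum, info, ctx);

    /* Only if the runtime left the context untouched do we treat the signal
     * as a genuine native crash and report it. */
    if (memcmp(&before, ctx, sizeof(before)) == 0)
        capture_native_crash(signum, info, ctx);
}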


If you want to follow along with the process in the dotnet runtime:

  1. The dotnet runtime installs its signal handlers here:
    https://github.com/dotnet/runtime/blob/35f62c1c6938da074b4c350d6f81947d38bb316d/src/coreclr/src/pal/src/exception/signal.cpp#L165-L175

  2. While it installs different handlers for each signal type (in our case, sigsegv_handler()):
    https://github.com/dotnet/runtime/blob/35f62c1c6938da074b4c350d6f81947d38bb316d/src/coreclr/src/pal/src/exception/signal.cpp#L509-L572
    they all boil down to checking whether a signal should be handled as a native crash (typically by continuing the signal chain) or as a managed code crash provoked from the runtime's generated native code.

  3. The latter case ends up in common_signal_handler()
    https://github.com/dotnet/runtime/blob/35f62c1c6938da074b4c350d6f81947d38bb316d/src/coreclr/src/pal/src/exception/signal.cpp#L837-L906
    which produces an SEH exception record from the signal data and in the end calls SEHProcessException()
    https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/exception/seh.cpp#L250-L288,
    which either passes the exception on to additional hardware-exception handlers (our case), throws the SEH exception right there, or can't propagate the exception, in which case the path falls back to signal chaining.

  4. In the end, we either have a CPU exception that will be reported by the OS default handler (and tombstoned/logged to logcat via debuggerd on Android), or (the case in this issue) we never really return to the signal chain and the program sets the PC to continue by raising a managed code exception (sketched below), ending up here:
    https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/exceptionhandling.cpp#L5537-L5573.
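
To illustrate step 4: a handler can abandon the signal chain by rewriting the program counter in the ucontext, so that returning from the handler resumes execution in exception-raising code instead of at the faulting instruction. A hypothetical fragment, assuming glibc on x86-64 Linux (not the runtime's actual code):

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>

extern void raise_managed_exception(void); /* hypothetical exception-raising stub */

static void convert_to_managed_exception(int signum, siginfo_t *info, void *vctx)
{
    ucontext_t *ctx = (ucontext_t *)vctx;
    /* On return from this handler the thread continues at
     * raise_managed_exception, not at the instruction that faulted. */
    ctx->uc_mcontext.gregs[REG_RIP] = (greg_t)(uintptr_t)raise_managed_exception;
}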

I couldn't see any significant differences between the Linux and Android handling; the same is true for other POSIX systems as well (except where that layer only emulates lower-level mechanisms, i.e., Mach and SEH).


I also created a trivial dotnet program (running on a Linux GHA runner) that installs a signal handler similar to ours (without all the handling code, but maintaining the signal chain):

https://github.com/supervacuus/signals_dotnet

This program doesn't use any Sentry code and shows the same behavior: the last-installed signal handler will receive the signal provoked by the generated code, while the dotnet runtime handler (if invoked) will produce a NullReferenceException.

@jamescrosswell

Awesome - thank you @supervacuus !

An approach we already tested on Linux (where we can see the same behavior) was to invoke the dotnet runtime handler at the start of our handler (rather than the end).

It sounds like we can close this issue in the dotnet/android repo and create one instead in getsentry/sentry-native then?

@supervacuus

It sounds like we can close this issue in the dotnet/android repo and create one instead in getsentry/sentry-native then?

Yes, I think this is entirely a Sentry issue.

@grendello
Contributor

@supervacuus thanks for the analysis, however one thing must be noted: in the case of .NET for Android, the CLR isn't used; we use the MonoVM runtime. So while the overall behavior might be the same, the details may differ.

@supervacuus

@supervacuus thanks for the analysis, however one thing must be noted: in the case of .NET for Android, the CLR isn't used; we use the MonoVM runtime. So while the overall behavior might be the same, the details may differ.

Yeah, sorry, you are absolutely right @grendello. I did not mention this because I did not want to extend the already long comment. While the concrete implementation differs, MonoVM has a signal-to-exception mapping similar to the one in the CLR, split across several places in the Mono sources.

My point was also that we (Sentry) should probably have a more abstract view of the differences between the two implementations and care more about the similarity in behavior, as seen from the chaining in our handler.

But I am not a runtime dev, and any input you can give us on the implementation details and how they could affect choices in our handler is more than welcome!

@grendello
Contributor

@supervacuus got it :) The details, I presume, will be more or less the same as far as the mechanics are concerned; after all, we're dealing with the standard POSIX way of chaining signals. I mentioned MonoVM just for the record and for completeness of information, so that future readers of this issue have a clear picture of what's involved here.

@david-maw

I don't have much to add except to say it's happening to me too. Thanks @tranb3r for submitting this and giving me a hint as to what might have triggered the intermittent SIGSEGV I'm seeing in release code. So far it's highly intermittent and not reproducible, but this is at least a clue as to what to look for!

@gwise-vision

gwise-vision commented Sep 28, 2024

This does not crash the actual app; it just shows an error in Sentry.
It seems that Sentry is having issues with .NET 8. This issue should be fixed by Sentry.

@gwise-vision

(quoting @supervacuus's analysis above)

I think this is a bug in Sentry. How will Sentry fix this?

@bruno-garcia
Member

I think this is a bug in Sentry. How will Sentry fix this?

That's being tracked here:

And the PR addressing it is already open:

It's available on 7.15.0-alpha.1 of Sentry for Android (not the .NET SDK yet) while we test things out. But once it's all validated, we'll merge this to main and update the .NET SDK too.

@tranb3r
Author

tranb3r commented Dec 19, 2024

@supervacuus @jamescrosswell
I don't know exactly who is working on this issue...
I've tested my sample app with Sentry.Maui 5.0.0, and now, instead of simply logging a crash into Sentry when the exception is caught, it's actually crashing the app. So it's a lot worse than before.
There's no crash when removing the Sentry SDK; the exception is simply caught by code as expected.
Could you please take a look?

@bruno-garcia
Member

@supervacuus @jamescrosswell I don't know exactly who is working on this issue... I've tested my sample app with Sentry.Maui 5.0.0, and now, instead of simply logging a crash into Sentry when the exception is caught, it's actually crashing the app. So it's a lot worse than before. There's no crash when removing the Sentry SDK; the exception is simply caught by code as expected. Could you please take a look?

Could you share some context about the crash? The stack trace would help.
An issue on the repo would be very helpful.

@tranb3r
Author

tranb3r commented Dec 19, 2024

Here is the logcat.

12-19 20:20:27.031	17567	17567	com.companyname.mauiappsegfault	I	MauiAppSegFault	Button_OnClicked
12-19 20:20:27.032	17567	17567	com.companyname.mauiappsegfault	I	sentry-native	entering signal handler
12-19 20:20:27.033	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	defer to runtime signal handler at start
12-19 20:20:27.033	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	return from runtime signal handler, we handle the signal
12-19 20:20:27.055	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	captured backtrace from ucontext with 2 frames
12-19 20:20:27.055	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	captured backtrace with 2 frames
12-19 20:20:27.056	17567	17602	com.companyname.mauiappsegfault	D	EGL_emulation	app_time_stats: avg=798.73ms min=2.61ms max=7759.53ms count=10
12-19 20:20:27.056	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	merging scope into event
12-19 20:20:27.056	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	trying to read modules from /proc/self/maps
12-19 20:20:27.197	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	read 420 modules from /proc/self/maps
12-19 20:20:27.200	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	adding attachments to envelope
12-19 20:20:27.200	17567	17567	com.companyname.mauiappsegfault	D	sentry-native	sending envelope
12-19 20:20:27.201	17567	17567	com.companyname.mauiappsegfault	I	sentry-native	crash has been captured
12-19 20:20:27.203	17567	17567	com.companyname.mauiappsegfault	F	libc	Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x10 in tid 17567 (mauiappsegfault), pid 17567 (mauiappsegfault)
12-19 20:20:27.268	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17567: Bad address
12-19 20:20:27.268	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17568: Bad address
12-19 20:20:27.270	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17569: Bad address
12-19 20:20:27.270	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17570: Bad address
12-19 20:20:27.275	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17571: Bad address
12-19 20:20:27.279	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17573: Bad address
12-19 20:20:27.280	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17574: Bad address
12-19 20:20:27.288	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17575: Bad address
12-19 20:20:27.292	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17577: Bad address
12-19 20:20:27.293	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17578: Bad address
12-19 20:20:27.294	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17580: Bad address
12-19 20:20:27.294	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17582: Bad address
12-19 20:20:27.295	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17583: Bad address
12-19 20:20:27.298	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17584: Bad address
12-19 20:20:27.299	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17585: Bad address
12-19 20:20:27.299	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17589: Bad address
12-19 20:20:27.300	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17590: Bad address
12-19 20:20:27.303	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17591: Bad address
12-19 20:20:27.307	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17592: Bad address
12-19 20:20:27.310	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17596: Bad address
12-19 20:20:27.316	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17597: Bad address
12-19 20:20:27.318	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17598: Bad address
12-19 20:20:27.323	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17599: Bad address
12-19 20:20:27.324	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17600: Bad address
12-19 20:20:27.324	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17602: Bad address
12-19 20:20:27.330	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17605: Bad address
12-19 20:20:27.332	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17606: Bad address
12-19 20:20:27.334	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17610: Bad address
12-19 20:20:27.338	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17611: Bad address
12-19 20:20:27.341	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17613: Bad address
12-19 20:20:27.346	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17614: Bad address
12-19 20:20:27.348	17624	17624	No info available	E	crash_dump64	failed to get the guest state header for thread 17617: Bad address
12-19 20:20:27.377	17624	17624	No info available	I	crash_dump64	obtaining output fd from tombstoned, type: kDebuggerdTombstoneProto
12-19 20:20:27.385	237	237	tombstoned	I	tombstoned	received crash request for pid 17567
12-19 20:20:27.399	17624	17624	No info available	I	crash_dump64	performing dump of process 17567 (target tid = 17567)
12-19 20:20:27.794	0	0	No info available	I	logd	logdr: UID=10248 GID=10248 PID=17624 n tail=500 logMask=8 pid=17567 start=0ns deadline=0ns
12-19 20:20:27.818	0	0	No info available	I	logd	logdr: UID=10248 GID=10248 PID=17624 n tail=500 logMask=1 pid=17567 start=0ns deadline=0ns
12-19 20:20:27.896	17624	17624	crash_dump64	F	DEBUG	*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-19 20:20:27.897	17624	17624	crash_dump64	F	DEBUG	Build fingerprint: 'google/sdk_gphone64_x86_64/emu64xa:15/AE3A.240806.005/12228598:userdebug/dev-keys'
12-19 20:20:27.897	17624	17624	crash_dump64	F	DEBUG	Revision: '0'
12-19 20:20:27.897	17624	17624	crash_dump64	F	DEBUG	ABI: 'x86_64'
12-19 20:20:27.897	17624	17624	crash_dump64	F	DEBUG	Timestamp: 2024-12-19 20:20:27.449787200+0100
12-19 20:20:27.897	17624	17624	crash_dump64	F	DEBUG	Process uptime: 12s
12-19 20:20:27.898	17624	17624	crash_dump64	F	DEBUG	Cmdline: com.companyname.mauiappsegfault
12-19 20:20:27.898	17624	17624	crash_dump64	F	DEBUG	pid: 17567, tid: 17567, name: mauiappsegfault  >>> com.companyname.mauiappsegfault <<<
12-19 20:20:27.898	17624	17624	crash_dump64	F	DEBUG	uid: 10248
12-19 20:20:27.898	17624	17624	crash_dump64	F	DEBUG	signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000010
12-19 20:20:27.899	17624	17624	crash_dump64	F	DEBUG	Cause: null pointer dereference
12-19 20:20:27.900	17624	17624	crash_dump64	F	DEBUG	    rax 0000000000000010  rbx 0000743e7572be70  rcx 0000744091747018  rdx 0000000000000006
12-19 20:20:27.902	17624	17624	crash_dump64	F	DEBUG	    r8  0000000000000000  r9  0000000000000000  r10 0000000000000002  r11 0000743dc2c91550
12-19 20:20:27.903	17624	17624	crash_dump64	F	DEBUG	    r12 0000000000000007  r13 0000000041be0180  r14 0000743d3111b7e0  r15 0000743d311193d8
12-19 20:20:27.903	17624	17624	crash_dump64	F	DEBUG	    rdi 0000000000000000  rsi 0000000000000600
12-19 20:20:27.903	17624	17624	crash_dump64	F	DEBUG	    rbp 00007ffe58f40e80  rsp 00007ffe58f40c98  rip 0000743d345f4430
12-19 20:20:27.905	17624	17624	crash_dump64	F	DEBUG	2 total frames
12-19 20:20:27.908	17624	17624	crash_dump64	F	DEBUG	backtrace:
12-19 20:20:27.908	17624	17624	crash_dump64	F	DEBUG	      #00 pc 00000000001d0430  /data/app/~~kOlrZw0fx5jyxEGaHLBMig==/com.companyname.mauiappsegfault-ZUIs01PquU-grZ5Tk0CSZA==/split_config.x86_64.apk!libmonosgen-2.0.so (offset 0x117c000) (BuildId: 37021294544f624d009dfa77e9b7297559f13344)
12-19 20:20:27.908	17624	17624	crash_dump64	F	DEBUG	      #01 pc 0000000000004704  /data/app/~~kOlrZw0fx5jyxEGaHLBMig==/com.companyname.mauiappsegfault-ZUIs01PquU-grZ5Tk0CSZA==/split_config.x86_64.apk (offset 0x16c000)
12-19 20:20:27.969	237	237	tombstoned	E	tombstoned	Tombstone written to: tombstone_00

@tranb3r
Author

tranb3r commented Dec 19, 2024

@bruno-garcia
Do you want me to open another issue?
This one has not even been closed yet.

@jamescrosswell

jamescrosswell commented Dec 19, 2024

Do you want me to open another issue?

@tranb3r we had this issue and PR related in the sentry-dotnet repo:

I think that issue was a bit confusing though, as it described two things: the SIGSEGV and the missing logcat attachments.

In any event, yes I think you could open a new issue in the sentry-dotnet repo and just reference the issue/PR above as well as this issue in the dotnet/android repo.

@tranb3r
Author

tranb3r commented Dec 20, 2024

In any event, yes I think you could open a new issue in the sentry-dotnet repo and just reference the issue/PR above as well as this issue in the dotnet/android repo.

getsentry/sentry-dotnet#3861
