musings about the dump, which must be caused by atexit()

sigsegv_dump
Greg Alexander 5 years ago
parent 2d8d649cdd
commit e204c1ea74

69
NOTES

@@ -898,6 +898,75 @@ the manifest...
allowBackup="false" took immediate effect and had no surprises...
August 4, 2019.
I finally got a dump from a user (Hammad), and it's quite distressing.
The stack trace is roughly:
backtrace()
sigsegv_handler()
/system/bin/app_process64+0x2a90
__kernel_rt_sigreturn()
A5xContext::HwAddNop(unsigned int *, unsigned int)
EsxCmdMgr::IssuePendingIB1s(EsxFlushReason, int, int)
EsxCmdMgr::Flush(EsxFlushReason)
EsxContext::Destroy()
EglContext::DestroyEsxContext()
EglDisplay::MarkContextListForDestroy()
EglDisplay::Terminate(int)
EglDisplayList::Destroy()
EglDisplay::DestroyStaticListsMutexesAndTlsKeys()
EsxEntryDestruct()
/system/vendor/lib64/egl/libGLESv2_adreno.so+0x12780
[... cut off at 16 ...]
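A minimal sketch of how a dump could capture more than 16 frames, assuming the unwinder interface from <unwind.h> (_Unwind_Backtrace) is available and that unwind tables are present for the frames of interest. The names MAX_FRAMES, frame_buf, record_frame and capture_frames are hypothetical, and nothing here is known to match the backtrace() used in the dump above. Whether the unwinder is safe to call from a signal handler on a given platform is a further assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

/* Hypothetical depth limit; the dump above stopped at 16 frames. */
#define MAX_FRAMES 64

struct frame_buf {
    uintptr_t pcs[MAX_FRAMES];  /* program counter of each frame */
    size_t count;               /* number of entries filled in */
};

/* Called by the unwinder once per stack frame, innermost first. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct frame_buf *buf = arg;

    if (buf->count >= MAX_FRAMES) {
        return _URC_END_OF_STACK;   /* buffer full: stop walking */
    }
    buf->pcs[buf->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;          /* keep walking outward */
}

/* Fill buf with frame addresses and return how many were captured. */
static size_t capture_frames(struct frame_buf *buf)
{
    buf->count = 0;
    _Unwind_Backtrace(record_frame, buf);
    return buf->count;
}
```

The captured addresses still have to be turned into library+offset or symbol names (for example with dladdr()) before they read like the trace above.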
So many questions! I think app_process64 must be the actual C main() of
a process, responsible for branching into all the Android system
libraries? I imagine it's involved because it's somehow intercepted the
SIGSEGV and re-dispatched it to my handler? I don't see any way we could
have branched into libGLESv2_adreno from userland, so the SIGSEGV must
come from the UI thread, I guess? Maybe this SIGSEGV is actually the
sort of thing we'd get if we tried to call UI code from the non-UI
thread??
It looks like GLES is busy cleaning itself up, and it crashes. Why does it
crash? Why is it trying to clean itself up?
Hammad says there is no problem using sshd. I thought he meant that the
re-start logic is working for him, but his dropbear.err has multiple dumps
in it! The SIGSEGVs are apparently not killing the daemon.
There are no timestamps on the dumps, but it looks like they're
associated with activity anyway. Each dump happens between "Disconnect
received" and "sigchld". Some of them have "server select out"
interleaved into the dump, which I think is the result of Hammad running:
while true; do ssh phone 'exit'; done
That is, it appears he starts a new connection the very instant the old
connection ends. So the new connection comes into the server process
while the child process is in the act of dying.
The thing is, I don't see how it could possibly be getting signals from
the Java side of things, because it fork()s before setting up the signal
handling. It's not just running in a different thread, it should be a
totally separate process. I can test this but I don't think I'm wrong
about that.
So I guess just about the only thing that's really possible is that
there's an atexit() handler which survives the fork() because the fork()
isn't followed up with an execve(), so it runs when the child exits. It's
not caused by ARM, or even necessarily by Android 9. The reason it doesn't
show up in the emulator is that the libGLES that registers the atexit() is
vendor-supplied for specific hardware ("Adreno").
So I need to figure out how to bypass the atexit() somehow, perhaps by
calling _exit() directly?
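A small sketch of the mechanism, assuming nothing beyond POSIX: a handler registered with atexit() before fork() is inherited by the child and runs when the child calls exit(), but not when it calls _exit(). The handler here is a stand-in that writes one byte to a pipe; inherited_handler and handler_runs_in_child are hypothetical names, and whatever the vendor library really registers is not shown.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of a pipe; the handler uses it to report that it ran. */
static int report_fd = -1;

/* Stand-in for a cleanup hook registered before the fork(). */
static void inherited_handler(void)
{
    if (report_fd >= 0) {
        if (write(report_fd, "x", 1) != 1) {
            /* Nothing useful to do about a failed write during exit. */
        }
    }
}

/*
 * Fork a child that leaves via exit() or _exit() and return how many
 * times the inherited handler ran in that child, or -1 on error.
 */
static int handler_runs_in_child(int use_underscore_exit)
{
    static int registered = 0;
    int fds[2];
    int runs = 0;
    char c;
    pid_t pid;

    /* Register once, in the parent, before any fork(). */
    if (!registered) {
        if (atexit(inherited_handler) != 0) {
            return -1;
        }
        registered = 1;
    }
    if (pipe(fds) != 0) {
        return -1;
    }
    report_fd = fds[1];

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        report_fd = -1;
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (use_underscore_exit) {
            _exit(0);       /* skips every atexit() handler */
        }
        exit(0);            /* runs the handler inherited from the parent */
    }

    /* Parent: disarm its own copy of the handler, then count reports. */
    report_fd = -1;
    close(fds[1]);
    while (read(fds[0], &c, 1) == 1) {
        runs++;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return runs;
}
```

Calling handler_runs_in_child(0) should report one run and handler_runs_in_child(1) none. Note that _exit() also skips stdio flushing, so anything buffered in the child has to be flushed by hand first.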
XXX - merge back into main branch, because I'll want to keep the dump facility
XXX - make the dump go deeper in the stack
XXX - put a crash in an atexit() to be sure it presents about this way
XXX - test re-start mechanism, which doesn't seem to work on the first try if it crashes
XXX - test bypassing that crash
XXX - remove the crash, remove the debug fprintfs (select in/out, sigchld)
--- new release
