mirror of
http://galexander.org/git/simplesshd.git
synced 2025-01-13 16:40:54 +00:00
musings about the dump, which must be caused by atexit()
This commit is contained in:
parent
2d8d649cdd
commit
e204c1ea74
69
NOTES
@ -898,6 +898,75 @@ the manifest...
allowBackup="false" took immediate effect and had no surprises...


August 4, 2019.

I finally got a dump from a user (Hammad), and it's quite distressing.
The stack trace is roughly:
    backtrace()
    sigsegv_handler()
    /system/bin/app_process64+0x2a90
    __kernel_rt_sigreturn()
    A5xContext::HwAddNop(unsigned int *, unsigned int)
    EsxCmdMgr::IssuePendingIB1s(EsxFlushReason, int, int)
    EsxCmdMgr::Flush(EsxFlushReason)
    EsxContext::Destroy()
    EglContext::DestroyEsxContext()
    EglDisplay::MarkContextListForDestroy()
    EglDisplay::Terminate(int)
    EglDisplayList::Destroy()
    EglDisplay::DestroyStaticListsMutexesAndTlsKeys()
    EsxEntryDestruct()
    /system/vendor/lib64/egl/libGLESv2_adreno.so+0x12780
    [... cut off at 16 ...]

So many questions!  I think app_process64 must be the actual C main() of
a process, responsible for branching into all the android system
libraries?  I imagine it's involved because it's somehow intercepted the
SIGSEGV and re-dispatched it to my handler?  I don't see any way we could
have branched into libGLESv2_adreno from userland, so the SIGSEGV must
come from the UI thread, I guess?  Maybe this SIGSEGV is actually the
sort of thing we'd get if we tried to call UI code from the non-UI
thread??

It looks like GLES is busy cleaning itself up, and it crashes.  Why's it
crash?  Why's it trying to clean itself up?

Hammad says there is no problem using sshd...I thought he meant that the
re-start logic is working for him, but his dropbear.err has multiple dumps
in it!  The SIGSEGVs are apparently not killing the daemon.

There are no timestamps on the dumps, but it looks like they're
associated with activity anyways.  Each dump happens between "Disconnect
received" and "sigchld".  Some of them have "server select out"
interleaved into the dump, which I think is the result of Hammad running:
    while true; do ssh phone 'exit'; done
That is, it appears he starts a new connection the very instant the old
connection ends.  So the new connection comes into the server process
while the child process is in the act of dying.

The thing is, I don't see how it could possibly be getting signals from
the Java side of things, because it fork()s before setting up the signal
handling.  It's not just running in a different thread, it should be a
totally separate process.  I can test this but I don't think I'm wrong
about that.

So I guess just about the only thing that's really possible is that
there's an atexit() which survives the fork() because it isn't followed
up with an execve().  It's not caused by ARM, or even necessarily by
Android 9...the reason it doesn't show up in the emulator is that the
libGLES that registers the atexit() is vendor-supplied for specific
hardware ("Adreno").

So I need to figure out how to bypass the atexit() somehow, perhaps by
calling _exit() directly?

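A small sketch of the idea, assuming ordinary POSIX semantics.  cleanup_handler() is a hypothetical stand-in for whatever the vendor library registers, and handler_ran_in_child() is an illustrative helper, not part of the real code.  The handler is registered before fork(); a child that leaves through exit() runs it, while a child that leaves through _exit() does not.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* write end of a pipe, set only in the child */

/* Stand-in for the vendor library's exit-time cleanup. */
static void cleanup_handler(void)
{
    if (report_fd >= 0) {
        ssize_t n = write(report_fd, "x", 1);
        (void)n;
    }
}

/* Registers the handler (once) in the calling process, then forks without
 * exec.  The child leaves via _exit() if use_underscore_exit is nonzero,
 * otherwise via exit().  Returns 1 if the inherited handler ran in the
 * child, 0 if it did not, -1 on error. */
int handler_ran_in_child(int use_underscore_exit)
{
    static int registered = 0;
    int fds[2];
    char c;
    ssize_t n;
    pid_t pid;

    if (!registered) {
        if (atexit(cleanup_handler) != 0)
            return -1;
        registered = 1;
    }
    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (use_underscore_exit)
            _exit(0);               /* skips atexit() handlers */
        exit(0);                    /* runs the inherited handler */
    }
    close(fds[1]);
    n = read(fds[0], &c, 1);        /* 0 means EOF: nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

One thing to keep in mind with _exit(): it also skips flushing stdio buffers, so anything the child still has buffered would need an explicit fflush() first.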

XXX - merge back into main branch, because I'll want to keep the dump facility
XXX - make the dump go deeper in the stack
XXX - put a crash in an atexit() to be sure it presents about this way
XXX - test re-start mechanism, which doesn't seem to work on the first try if it crashes
XXX - test bypassing that crash
XXX - remove the crash, remove the debug fprintfs (select in/out, sigchld)

--- new release
