Description: Framework reboot due to a native crash (SIGSEGV) in zygote
Reproduction: Long-duration test with multiple apps; reproduction rate is 1/100.
Description:
Below is the tombstone for zygote:
Line 56603: 07-08 10:19:39.605 26565 26565 F DEBUG : Build fingerprint: ------------------------------------------------------------------
Line 56604: 07-08 10:19:39.605 26565 26565 F DEBUG : Revision: '0'
Line 56605: 07-08 10:19:39.605 26565 26565 F DEBUG : ABI: 'arm'
Line 56608: 07-08 10:19:39.606 26565 26565 F DEBUG : pid: 652, tid: 26546, name: HeapTaskDaemon >>> zygote <<<
Line 56609: 07-08 10:19:39.606 26565 26565 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x130
Line 56610: 07-08 10:19:39.606 26565 26565 F DEBUG : Cause: null pointer dereference
Line 56611: 07-08 10:19:39.606 26565 26565 F DEBUG : r0 dec0b000 r1 5616eb84 r2 00000007 r3 00000000
Line 56612: 07-08 10:19:39.606 26565 26565 F DEBUG : r4 00000130 r5 cd7929c8 r6 00000000 r7 e4fdc380
Line 56613: 07-08 10:19:39.606 26565 26565 F DEBUG : r8 cd7924bc r9 00000000 r10 00000002 r11 00008a0c
Line 56614: 07-08 10:19:39.607 26565 26565 F DEBUG : ip e49b8974 sp cd7924b0 lr e45e1507 pc e45ec316
Line 56712: 07-08 10:19:39.730 26565 26565 F DEBUG :
Line 56713: 07-08 10:19:39.730 26565 26565 F DEBUG : backtrace:
Line 56715: 07-08 10:19:39.730 26565 26565 F DEBUG : #00 pc 000aa316 /system/lib/libart.so (art::TimingLogger::Reset()+106)
Line 56716: 07-08 10:19:39.730 26565 26565 F DEBUG : #01 pc 0016663b /system/lib/libart.so (art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+178)
Line 56717: 07-08 10:19:39.730 26565 26565 F DEBUG : #02 pc 0018035d /system/lib/libart.so (art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool)+2420)
Line 56718: 07-08 10:19:39.730 26565 26565 F DEBUG : #03 pc 0018dbeb /system/lib/libart.so (art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool)+182)
Line 56719: 07-08 10:19:39.730 26565 26565 F DEBUG : #04 pc 00191b11 /system/lib/libart.so (art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+20)
Line 56720: 07-08 10:19:39.730 26565 26565 F DEBUG : #05 pc 001aa957 /system/lib/libart.so (art::gc::TaskProcessor::RunAllTasks(art::Thread*)+34)
Line 56721: 07-08 10:19:39.731 26565 26565 F DEBUG : #06 pc 0007463b /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.VMRuntime.clampGrowthLimit [DEDUPED]+74)
Line 56722: 07-08 10:19:39.731 26565 26565 F DEBUG : #07 pc 0014a85d /system/framework/arm/boot-core-libart.oat (offset 0x73000) (java.lang.Daemons$HeapTaskDaemon.runInternal+172)
Line 56723: 07-08 10:19:39.731 26565 26565 F DEBUG : #08 pc 000ec963 /system/framework/arm/boot-core-libart.oat (offset 0x73000) (java.lang.Daemons$Daemon.run+66)
Line 56724: 07-08 10:19:39.731 26565 26565 F DEBUG : #09 pc 002151b1 /system/framework/arm/boot-core-oj.oat (offset 0x106000) (java.lang.Thread.run+64)
Line 56725: 07-08 10:19:39.731 26565 26565 F DEBUG : #10 pc 00411575 /system/lib/libart.so (art_quick_invoke_stub_internal+68)
Line 56726: 07-08 10:19:39.731 26565 26565 F DEBUG : #11 pc 003eb045 /system/lib/libart.so (art_quick_invoke_stub+224)
Line 56727: 07-08 10:19:39.731 26565 26565 F DEBUG : #12 pc 000a183d /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+136)
Line 56728: 07-08 10:19:39.731 26565 26565 F DEBUG : #13 pc 003498d5 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
Line 56729: 07-08 10:19:39.731 26565 26565 F DEBUG : #14 pc 0034a62d /system/lib/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+320)
Line 56730: 07-08 10:19:39.731 26565 26565 F DEBUG : #15 pc 0036d0a3 /system/lib/libart.so (art::Thread::CreateCallback(void*)+866)
Line 56731: 07-08 10:19:39.731 26565 26565 F DEBUG : #16 pc 00072dcd /system/lib/libc.so (__pthread_start(void*)+22)
Line 56732: 07-08 10:19:39.731 26565 26565 F DEBUG : #17 pc 0001e3b1 /system/lib/libc.so (__start_thread+24)
We also observed a few app crashes before the zygote crash, in the code path where zygote forks these apps. Below is one of them:
pid: 17395, tid: 17395, name: o.android.imoi >>> zygote <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4
Cause: null pointer dereference
r0 00000000 r1 8b148311 r2 00000000 r3 00000000
r4 e4bdc424 r5 e4bdc420 r6 e4b97a64 r7 e4bdc3c8
r8 e4bc8000 r9 e4bdc448 r10 00003000 r11 00000003
ip 000000ff sp ffbe1230 lr e41e7dd5 pc e41e7de8
backtrace:
#00 pc 000a8de8 /system/lib/libart.so (art::CumulativeLogger::Reset()+68)
#01 pc 00166901 /system/lib/libart.so (art::gc::collector::GarbageCollector:: ()+192)
#02 pc 0017d853 /system/lib/libart.so (art::gc::Heap::ResetGcPerformanceInfo()+34)
#03 pc 003570db /system/lib/libart.so (art::Runtime::InitNonZygoteOrPostFork(_JNIEnv*, bool, art::Runtime::NativeBridgeAction, char const*, bool)+74)
#04 pc 002e8fb7 /system/lib/libart.so (art::ZygoteHooks_nativePostForkChild(_JNIEnv*, _jclass*, long long, int, unsigned char, unsigned char, _jstring*)+3146)
#05 pc 00074c63 /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.ZygoteHooks.nativePostForkChild+154)
#06 pc 000eba15 /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.ZygoteHooks.postForkChild+68)
#07 pc 00ba0ab9 /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.Zygote.callPostForkChildHooks+80)
#08 pc 00412975 /system/lib/libart.so (art_quick_invoke_stub_internal+68)
#09 pc 003eaec7 /system/lib/libart.so (art_quick_invoke_static_stub+222)
#10 pc 000a184f /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
#11 pc 00349655 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
#12 pc 0034947f /system/lib/libart.so (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+310)
#13 pc 00290219 /system/lib/libart.so (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+444)
#14 pc 0006e579 /system/lib/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
#15 pc 0011c2ed /system/lib/libandroid_runtime.so ((anonymous namespace)::ForkAndSpecializeCommon(_JNIEnv*, unsigned int, unsigned int, _jintArray*, int, _jobjectArray*, long long, long long, int, _jstring*, _jstring*, bool, _jintArray*, _jintArray*, bool, _jstring*, _jstring*)+4052)
#16 pc 0011ab37 /system/lib/libandroid_runtime.so (android::com_android_internal_os_Zygote_nativeForkAndSpecialize(_JNIEnv*, _jclass*, int, int, _jintArray*, int, _jobjectArray*, int, _jstring*, _jstring*, _jintArray*, _jintArray*, unsigned char, _jstring*, _jstring*)+470)
#17 pc 003b8ba3 /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.Zygote.nativeForkAndSpecialize+338)
#18 pc 00ba3a8b /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteConnection.processOneCommand+1450)
#19 pc 00ba7a5b /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteServer.runSelectLoop+770)
#20 pc 00ba5269 /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteInit.main+1696)
#21 pc 00412975 /system/lib/libart.so (art_quick_invoke_stub_internal+68)
#22 pc 003eaec7 /system/lib/libart.so (art_quick_invoke_static_stub+222)
#23 pc 000a184f /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
#24 pc 00349655 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
#25 pc 0034947f /system/lib/libart.so (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+310)
#26 pc 00290219 /system/lib/libart.so (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+444)
#27 pc 0006e579 /system/lib/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
#28 pc 0007073b /system/lib/libandroid_runtime.so (android::AndroidRuntime::start(char const*, android::Vector<android::String8> const&, bool)+462)
#29 pc 00001c8f /system/bin/app_process32 (main+1122)
#30 pc 000a2245 /system/lib/libc.so (__libc_init+48)
#31 pc 000017eb /system/bin/app_process32 (_start_main+38)
#32 pc 000000c4 <unknown>
Analysis:
We loaded the coredump in GDB:
#0 art::TimingLogger::Reset (this=0x130) at art/runtime/base/timing_logger.cc:148
No locals.
#1 0xe46a863e in Reset (this=0x120, gc_cause=<optimized out>, clear_soft_references=<optimized out>) at art/runtime/gc/collector/garbage_collector.cc:49
No locals.
#2 art::gc::collector::GarbageCollector::Run (this=0xe4fdc380, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=true) at art/runtime/gc/collector/garbage_collector.cc:92
start_time = 151785656375586
current_iteration = 0x120
end_time = <optimized out>
self = <optimized out>
#3 0xe46c2360 in art::gc::Heap::CollectGarbageInternal (this=0xe4f3c800, gc_type=art::gc::collector::kGcTypeFull, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=<optimized out>)
at art/runtime/gc/heap.cc:2648
runtime = 0xe4f3c400
self = 0xe42acc00
collector = 0xe4fdc380
#4 0xe46cfbee in art::gc::Heap::ConcurrentGC (this=0xe4f3c800, self=0xe42acc00, cause=art::gc::kGcCauseBackground, force_full=<optimized out>) at art/runtime/gc/heap.cc:3675
next_gc_type = art::gc::collector::kGcTypeSticky
tid = 26546
#5 0xe46d3b14 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x5616eb84) at art/runtime/gc/heap.cc:3620
heap = 0xe4f3c800
#6 0xe46ec958 in art::gc::TaskProcessor::RunAllTasks (this=0xe4f30200, self=0xe42acc00) at art/runtime/gc/task_processor.cc:129
task = 0xdec08000
#7 0x720bc63c in ?? ()
From this, we see that in frame #2, current_iteration holds 0x120, which is an invalid address. current_iteration is obtained through the garbage collector object; see the code below for reference.
In file -- art/runtime/gc/collector/garbage_collector.cc
91 Iteration* current_iteration = GetCurrentIteration();
92 current_iteration->Reset(gc_cause, clear_soft_references);
In file -- art/runtime/gc/heap.h
429 const collector::Iteration* GetCurrentGcIteration() const {
430 return &current_gc_iteration_;
431 }
1254 collector::Iteration current_gc_iteration_;
We also see that the collector object has been zeroed out, which appears to be the cause of the crash:
(gdb) f 2
#2 art::gc::collector::GarbageCollector::Run (this=0xe4fdc380, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=true) at art/runtime/gc/collector/garbage_collector.cc:92
92 in art/runtime/gc/collector/garbage_collector.cc
(gdb) x/100 this
0xe4fdc380: 0 0 0 0
0xe4fdc390: 0 0 0 0
0xe4fdc3a0: 0 0 0 0
0xe4fdc3b0: 0 0 0 0
0xe4fdc3c0: 0 0 0 0
0xe4fdc3d0: 0 0 0 0
0xe4fdc3e0: 0 0 0 0
0xe4fdc3f0: 0 0 0 0
0xe4fdc400: 0 0 0 0
0xe4fdc410: 0 0 0 0
0xe4fdc420: 0 0 0 0
0xe4fdc430: 0 0 0 0
0xe4fdc440: 0 0 0 0
0xe4fdc450: 0 0 0 0
0xe4fdc460: 0 0 0 0
0xe4fdc470: 0 0 0 0
0xe4fdc480: 0 0 0 0
0xe4fdc490: 0 0 0 0
0xe4fdc4a0: 0 0 0 0
0xe4fdc4b0: 0 0 0 0
0xe4fdc4c0: 0 0 0 0
0xe4fdc4d0: 0 0 0 0
0xe4fdc4e0: 0 0 0 0
0xe4fdc4f0: 0 0 0 0
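The small addresses in frames #0 to #2 are consistent with a null base pointer plus member offsets. Assuming GetCurrentIteration() returns heap_->GetCurrentGcIteration() (as in AOSP ART of this era; not verified against this build), a zeroed collector yields heap_ == 0, so current_iteration = 0 + offset of current_gc_iteration_ = 0x120. Iteration::Reset() then resets its TimingLogger member (assumed to be timings_) at 0x120 + 0x10 = 0x130, which is the fault address. The C++ sketch below reproduces this arithmetic with hypothetical stub structs (TimingLoggerStub, IterationStub, HeapStub) whose offsets are chosen to match the gdb values; they are not the real ART layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for ART's TimingLogger / Iteration / Heap.
// Only the member offsets matter; they are chosen to match the gdb frames
// (0x120 and 0x130) and are NOT taken from the real ART headers.
struct TimingLoggerStub {
  std::uint32_t placeholder;
};

struct IterationStub {
  std::uint8_t fields_before_timings[0x10];  // places timings_ at offset 0x10
  TimingLoggerStub timings_;
};

struct HeapStub {
  std::uint8_t fields_before_iteration[0x120];  // places current_gc_iteration_ at 0x120
  IterationStub current_gc_iteration_;
};

// Models "&heap_->current_gc_iteration_" for a given heap base address.
// Integer arithmetic is used, so no null pointer is formed or dereferenced here.
constexpr std::uintptr_t IterationAddress(std::uintptr_t heap_base) {
  return heap_base + offsetof(HeapStub, current_gc_iteration_);
}

// Models "&iteration->timings_", the `this` passed to TimingLogger::Reset().
constexpr std::uintptr_t TimingsAddress(std::uintptr_t iteration_base) {
  return iteration_base + offsetof(IterationStub, timings_);
}

// Heuristic (assumption for this sketch): a fault address inside the first
// 4 KiB page usually means "small member offset from a null base pointer".
constexpr bool LooksLikeNullBasePlusOffset(std::uintptr_t fault_addr) {
  return fault_addr < 0x1000;
}

// If heap_ is read as 0 from the zeroed collector object:
static_assert(IterationAddress(0) == 0x120, "frame #1: Iteration this=0x120");
static_assert(TimingsAddress(IterationAddress(0)) == 0x130, "frame #0: fault addr 0x130");
static_assert(LooksLikeNullBasePlusOffset(0x4), "app-side tombstone: fault addr 0x4");
```

The same first-page heuristic also fits the app-side tombstone (fault addr 0x4), although that offset alone does not show which pointer was null.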
The app crashes seen before this zygote crash also seem to be due to a similar reason: the collector object being NULL.
Debug approaches:
We have internally tried ASAN and malloc_debug to check whether such corruption can be caught.
Unfortunately, after enabling malloc_debug, the issue was no longer reproducible.
With ASAN enabled, the device runs slowly, which leads to other, unrelated issues.
Could you please provide any debugging suggestions, or share any similar instances of this issue?
Regards,
Deepika