Description:  Framework reboot caused by a native crash (SIGSEGV) in zygote.
Reproduction: Long-duration test with multiple apps; reproduction rate is 1/100.
Below is the tombstone for zygote:
       Line 56603:07-08 10:19:39.605 26565 26565 F DEBUG   : Build fingerprint: ------------------------------------------------------------------
        Line 56604: 07-08 10:19:39.605 26565 26565 F DEBUG   : Revision: '0'
        Line 56605: 07-08 10:19:39.605 26565 26565 F DEBUG   : ABI: 'arm'
        Line 56608: 07-08 10:19:39.606 26565 26565 F DEBUG   : pid: 652, tid: 26546, name: HeapTaskDaemon  >>> zygote <<<
        Line 56609: 07-08 10:19:39.606 26565 26565 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x130
        Line 56610: 07-08 10:19:39.606 26565 26565 F DEBUG   : Cause: null pointer dereference
        Line 56611: 07-08 10:19:39.606 26565 26565 F DEBUG   :     r0  dec0b000  r1  5616eb84  r2  00000007  r3  00000000
        Line 56612: 07-08 10:19:39.606 26565 26565 F DEBUG   :     r4  00000130  r5  cd7929c8  r6  00000000  r7  e4fdc380
        Line 56613: 07-08 10:19:39.606 26565 26565 F DEBUG   :     r8  cd7924bc  r9  00000000  r10 00000002  r11 00008a0c
        Line 56614: 07-08 10:19:39.607 26565 26565 F DEBUG   :     ip  e49b8974  sp  cd7924b0  lr  e45e1507  pc  e45ec316
        Line 56712: 07-08 10:19:39.730 26565 26565 F DEBUG   : 
        Line 56713: 07-08 10:19:39.730 26565 26565 F DEBUG   : backtrace:
        Line 56715: 07-08 10:19:39.730 26565 26565 F DEBUG   :     #00 pc 000aa316  /system/lib/libart.so (art::TimingLogger::Reset()+106)
        Line 56716: 07-08 10:19:39.730 26565 26565 F DEBUG   :     #01 pc 0016663b  /system/lib/libart.so (art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+178)
        Line 56717: 07-08 10:19:39.730 26565 26565 F DEBUG   :     #02 pc 0018035d  /system/lib/libart.so (art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool)+2420)
        Line 56718: 07-08 10:19:39.730 26565 26565 F DEBUG   :     #03 pc 0018dbeb  /system/lib/libart.so (art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool)+182)
        Line 56719: 07-08 10:19:39.730 26565 26565 F DEBUG   :     #04 pc 00191b11  /system/lib/libart.so (art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+20)
        Line 56720: 07-08 10:19:39.730 26565 26565 F DEBUG   :     #05 pc 001aa957  /system/lib/libart.so (art::gc::TaskProcessor::RunAllTasks(art::Thread*)+34)
        Line 56721: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #06 pc 0007463b  /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.VMRuntime.clampGrowthLimit [DEDUPED]+74)
        Line 56722: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #07 pc 0014a85d  /system/framework/arm/boot-core-libart.oat (offset 0x73000) (java.lang.Daemons$HeapTaskDaemon.runInternal+172)
        Line 56723: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #08 pc 000ec963  /system/framework/arm/boot-core-libart.oat (offset 0x73000) (java.lang.Daemons$Daemon.run+66)
        Line 56724: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #09 pc 002151b1  /system/framework/arm/boot-core-oj.oat (offset 0x106000) (java.lang.Thread.run+64)
        Line 56725: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #10 pc 00411575  /system/lib/libart.so (art_quick_invoke_stub_internal+68)
        Line 56726: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #11 pc 003eb045  /system/lib/libart.so (art_quick_invoke_stub+224)
        Line 56727: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #12 pc 000a183d  /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+136)
        Line 56728: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #13 pc 003498d5  /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
        Line 56729: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #14 pc 0034a62d  /system/lib/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+320)
        Line 56730: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #15 pc 0036d0a3  /system/lib/libart.so (art::Thread::CreateCallback(void*)+866)
        Line 56731: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #16 pc 00072dcd  /system/lib/libc.so (__pthread_start(void*)+22)
        Line 56732: 07-08 10:19:39.731 26565 26565 F DEBUG   :     #17 pc 0001e3b1  /system/lib/libc.so (__start_thread+24)
One more observation: we saw a few app crashes prior to the zygote crash, in the post-fork path where zygote forks these apps.
pid: 17395, tid: 17395, name: o.android.imoi  >>> zygote <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4
Cause: null pointer dereference
    r0  00000000  r1  8b148311  r2  00000000  r3  00000000
    r4  e4bdc424  r5  e4bdc420  r6  e4b97a64  r7  e4bdc3c8
    r8  e4bc8000  r9  e4bdc448  r10 00003000  r11 00000003
    ip  000000ff  sp  ffbe1230  lr  e41e7dd5  pc  e41e7de8
backtrace:
    #00 pc 000a8de8  /system/lib/libart.so (art::CumulativeLogger::Reset()+68)
    #01 pc 00166901  /system/lib/libart.so (art::gc::collector::GarbageCollector::      ()+192)
    #02 pc 0017d853  /system/lib/libart.so (art::gc::Heap::ResetGcPerformanceInfo()+34)
    #03 pc 003570db  /system/lib/libart.so (art::Runtime::InitNonZygoteOrPostFork(_JNIEnv*, bool, art::Runtime::NativeBridgeAction, char const*, bool)+74)
    #04 pc 002e8fb7  /system/lib/libart.so (art::ZygoteHooks_nativePostForkChild(_JNIEnv*, _jclass*, long long, int, unsigned char, unsigned char, _jstring*)+3146)
    #05 pc 00074c63  /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.ZygoteHooks.nativePostForkChild+154)
    #06 pc 000eba15  /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.ZygoteHooks.postForkChild+68)
    #07 pc 00ba0ab9  /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.Zygote.callPostForkChildHooks+80)
    #08 pc 00412975  /system/lib/libart.so (art_quick_invoke_stub_internal+68)
    #09 pc 003eaec7  /system/lib/libart.so (art_quick_invoke_static_stub+222)
    #10 pc 000a184f  /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
    #11 pc 00349655  /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
    #12 pc 0034947f  /system/lib/libart.so (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+310)
    #13 pc 00290219  /system/lib/libart.so (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+444)
    #14 pc 0006e579  /system/lib/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
    #15 pc 0011c2ed  /system/lib/libandroid_runtime.so ((anonymous namespace)::ForkAndSpecializeCommon(_JNIEnv*, unsigned int, unsigned int, _jintArray*, int, _jobjectArray*, long long, long long, int, _jstring*, _jstring*, bool, _jintArray*, _jintArray*, bool, _jstring*, _jstring*)+4052)
    #16 pc 0011ab37  /system/lib/libandroid_runtime.so (android::com_android_internal_os_Zygote_nativeForkAndSpecialize(_JNIEnv*, _jclass*, int, int, _jintArray*, int, _jobjectArray*, int, _jstring*, _jstring*, _jintArray*, _jintArray*, unsigned char, _jstring*, _jstring*)+470)
    #17 pc 003b8ba3  /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.Zygote.nativeForkAndSpecialize+338)
    #18 pc 00ba3a8b  /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteConnection.processOneCommand+1450)
    #19 pc 00ba7a5b  /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteServer.runSelectLoop+770)
    #20 pc 00ba5269  /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteInit.main+1696)
    #21 pc 00412975  /system/lib/libart.so (art_quick_invoke_stub_internal+68)
    #22 pc 003eaec7  /system/lib/libart.so (art_quick_invoke_static_stub+222)
    #23 pc 000a184f  /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
    #24 pc 00349655  /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
    #25 pc 0034947f  /system/lib/libart.so (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+310)
    #26 pc 00290219  /system/lib/libart.so (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+444)
    #27 pc 0006e579  /system/lib/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
    #28 pc 0007073b  /system/lib/libandroid_runtime.so (android::AndroidRuntime::start(char const*, android::Vector<android::String8> const&, bool)+462)
    #29 pc 00001c8f  /system/bin/app_process32 (main+1122)
    #30 pc 000a2245  /system/lib/libc.so (__libc_init+48)
    #31 pc 000017eb  /system/bin/app_process32 (_start_main+38)
    #32 pc 000000c4  <unknown>
 
Analysis:
Loaded coredump in GDB:
#0  art::TimingLogger::Reset (this=0x130) at art/runtime/base/timing_logger.cc:148
No locals.
#1  0xe46a863e in Reset (this=0x120, gc_cause=<optimized out>, clear_soft_references=<optimized out>) at art/runtime/gc/collector/garbage_collector.cc:49
No locals.
#2  art::gc::collector::GarbageCollector::Run (this=0xe4fdc380, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=true) at art/runtime/gc/collector/garbage_collector.cc:92
        start_time = 151785656375586
        current_iteration = 0x120
        end_time = <optimized out>
        self = <optimized out>
#3  0xe46c2360 in art::gc::Heap::CollectGarbageInternal (this=0xe4f3c800, gc_type=art::gc::collector::kGcTypeFull, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=<optimized out>)
    at art/runtime/gc/heap.cc:2648
        runtime = 0xe4f3c400
        self = 0xe42acc00
        collector = 0xe4fdc380
#4  0xe46cfbee in art::gc::Heap::ConcurrentGC (this=0xe4f3c800, self=0xe42acc00, cause=art::gc::kGcCauseBackground, force_full=<optimized out>) at art/runtime/gc/heap.cc:3675
        next_gc_type = art::gc::collector::kGcTypeSticky
        tid = 26546
#5  0xe46d3b14 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x5616eb84) at art/runtime/gc/heap.cc:3620
        heap = 0xe4f3c800
#6  0xe46ec958 in art::gc::TaskProcessor::RunAllTasks (this=0xe4f30200, self=0xe42acc00) at art/runtime/gc/task_processor.cc:129
        task = 0xdec08000
#7  0x720bc63c in ?? ()
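From here, we see that at frame 2, current_iteration = 0x120 holds an invalid address. This pointer is obtained through the garbage collector object: GetCurrentIteration() returns the address of the heap's current_gc_iteration_ member. A value as small as 0x120 is consistent with a null heap pointer plus the offset of that member. See the code below for reference.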
In file -- art/runtime/gc/collector/garbage_collector.cc
91   Iteration* current_iteration = GetCurrentIteration();
92   current_iteration->Reset(gc_cause, clear_soft_references);
In file -- art/runtime/gc/heap.h
429   const collector::Iteration* GetCurrentGcIteration() const {
430     return &current_gc_iteration_;
431   }
1254   collector::Iteration current_gc_iteration_;
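The address arithmetic behind the two small `this` values in the GDB output (0x120 and 0x130) can be sketched as follows. This is only an illustration under stated assumptions: the Mock* types are hypothetical stand-ins, not the real ART classes, and the padding sizes are chosen purely so that the member offsets match the addresses seen in this coredump. It also assumes that the heap pointer read from the zeroed collector object was null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the ART types. The padding sizes are assumptions
// picked so that the member offsets match the addresses in this coredump.
struct MockTimingLogger {
  std::uint32_t placeholder;
};

struct MockIteration {
  unsigned char fields_before_timings[0x10];  // assumed size
  MockTimingLogger timings;
};

struct MockHeap {
  unsigned char fields_before_iteration[0x120];  // assumed size
  MockIteration current_gc_iteration;
};

// Address produced by taking &heap->current_gc_iteration for a given heap pointer value.
std::uintptr_t IterationAddress(std::uintptr_t heap_ptr) {
  return heap_ptr + offsetof(MockHeap, current_gc_iteration);
}

// Address produced by taking &iteration->timings for a given iteration pointer value.
std::uintptr_t TimingsAddress(std::uintptr_t iteration_ptr) {
  return iteration_ptr + offsetof(MockIteration, timings);
}

// Recomputes the two addresses from the GDB output, starting from a null heap pointer.
void CheckAgainstCoredump() {
  const std::uintptr_t iteration = IterationAddress(0);  // heap pointer assumed null
  assert(iteration == 0x120);                  // frame #1: Reset (this=0x120)
  assert(TimingsAddress(iteration) == 0x130);  // frame #0: TimingLogger::Reset (this=0x130)
}
```

Under these assumptions, no pointer in the chain is literally NULL after the first step, which is why the fault address is 0x130 rather than 0x0 even though the tombstone reports a null pointer dereference.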
We also see that the collector object has been zeroed out, which seems to be the reason for the crash.
(gdb) f 2
#2  art::gc::collector::GarbageCollector::Run (this=0xe4fdc380, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=true) at art/runtime/gc/collector/garbage_collector.cc:92
92      in art/runtime/gc/collector/garbage_collector.cc
(gdb) x/100 this
0xe4fdc380:     0       0       0       0
0xe4fdc390:     0       0       0       0
0xe4fdc3a0:     0       0       0       0
0xe4fdc3b0:     0       0       0       0
0xe4fdc3c0:     0       0       0       0
0xe4fdc3d0:     0       0       0       0
0xe4fdc3e0:     0       0       0       0
0xe4fdc3f0:     0       0       0       0
0xe4fdc400:     0       0       0       0
0xe4fdc410:     0       0       0       0
0xe4fdc420:     0       0       0       0
0xe4fdc430:     0       0       0       0
0xe4fdc440:     0       0       0       0
0xe4fdc450:     0       0       0       0
0xe4fdc460:     0       0       0       0
0xe4fdc470:     0       0       0       0
0xe4fdc480:     0       0       0       0
0xe4fdc490:     0       0       0       0
0xe4fdc4a0:     0       0       0       0
0xe4fdc4b0:     0       0       0       0
0xe4fdc4c0:     0       0       0       0
0xe4fdc4d0:     0       0       0       0
0xe4fdc4e0:     0       0       0       0
0xe4fdc4f0:     0       0       0       0
The app crashes seen prior to this zygote crash also seem to be due to a similar reason: the collector object being NULL.
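The manual memory inspection above could also be expressed as a small check, for example to log or abort early in a debug build as soon as an object is found fully zeroed, which might help narrow down when the corruption happens. This is a minimal sketch only: IsRegionZeroed is a hypothetical helper and not an ART function, and where to call it is left open.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of ART): returns true if every byte in
// [ptr, ptr + size) is zero, i.e. the region looks like the dump above.
bool IsRegionZeroed(const void* ptr, std::size_t size) {
  assert(ptr != nullptr);
  const unsigned char* bytes = static_cast<const unsigned char*>(ptr);
  for (std::size_t i = 0; i < size; ++i) {
    if (bytes[i] != 0) {
      return false;  // found live data, region is not fully zeroed
    }
  }
  return true;
}
```

A caller would pass the object's address and sizeof the object; a true result for an object that should hold live data indicates the same state as seen in the coredump.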
Debug approaches -
We have internally tried ASAN and malloc_debug to check whether such corruption can be caught.
Unfortunately, after enabling malloc_debug, the issue was not reproducible.
With ASAN enabled, the device runs slowly, which results in other unrelated issues.
Could you please provide any debug suggestions or share any similar instances of this issue?
Regards,
Deepika