• After 15+ years, we've made a big change: Android Forums is now Early Bird Club. Learn more here.

Help random high iowait

beanfield

Lurker
I purchased my evo on the Sprint release date and have had random lagging issues since owning it. Most recently, I've been able to run a terminal emulator and/or adb to get access to 'top'. It seems that the situation most often (but not always) occurs when mediaserver is active (playing mp3's or connected to a2dp Bluetooth). Although sometimes it occurs while my phone is off and I notice it when I power the screen on.

The UI becomes extremely sluggish. When I run top, a few pids are taking up ~2-4% but most are idle. System and user CPU are low but IOW is 60-80%. It stays like this until I reboot the phone or kill off none system processes with quick sys info. I can't seem to pin it to one or a few processes....killing different pids at different times brings IOW down to 0 eventually....which makes me think it's a transient hardware issue or a kernel/memory management issue.

I'd normally use pidstat or iotop to get more info...but they're not available on Android.
I tried echoing 1 into block_dump under /proc to get more info on what pids are doing all the i/o but I don't have permissions. Eclair Cache Problem - Android Linux Kernel Development | Google Groups

Anyone have any ideas for troubleshooting this? It continues to cripple my phone's resources.


Edit: possible related thread http://androidforums.com/evo-4g-support-troubleshooting/203542-100-cpu-usage-but-doesnt-add-up.html
 
This issue cropped up 4 times today. The first time was in the car while connected to a2dp bluetooth. 4g is left on all the time. I was streaming music to a separate bluetooth adapter and it started stuttering on it's own. I pulled over and ran top in a terminal...it again showed iow in the 80-90% range. I had quick system info kill all non android core processes and the I/O wait went back to 0%.

At the gym it also popped up on it's own. I was listening to music with wired earbuds and the music started sputtering. Again, 4g was left on. I had the exact same symptom of iow going in the 80-90% range. This time I simply turned off 4g and within a ~10 seconds the I/O wait went back to 0%. Dunno if this was just a coincidence or really related to 4g and/or the 4g widget.

Then I started trying to manually trigger it by flipping on and off 4g using the widget. After a handful of tries, I got it to trigger by...ramping up I/O wait just like before. I immediately flipped off 4g and it went away. Dunno if this was just a coincidence or really related to 4g and/or the 4g widget.

When I finally got home, I tried flipping 4g on and off about 20-30 times. Eventually it happened again. There was a sudden onset of iow in the 80-90% range. I hooked it up to adb and got the following stats.

Notice the high IOW yet low cpu usage. No individual process really stand out here. The processes can't get cpu time because they're starved waiting on I/O.
$ top -n 1



User 4%, System 11%, IOW 83%, IRQ 0%
User 15 + Nice 0 + Sys 34 + Idle 0 + IOW 256 + IRQ 0 + SIRQ 0 = 305

PID CPU% S #THR VSS RSS PCY UID Name
98 6% S 67 295760K 38152K fg system system_server
10032 3% R 1 936K 400K fg shell top
766 3% S 17 170468K 15172K bg app_57 com.qikffc.android
9876 2% S 1 940K 244K fg app_87 top
9794 0% R 22 211024K 32264K bg app_38 com.android.browser
227 0% S 9 185660K 21784K fg app_50 com.htc.android.htcime
3007 0% S 37 272028K 29476K bg app_23 com.google.android.gm

$ cat /proc/meminfo
MemTotal: 424832 kB
MemFree: 82600 kB
Buffers: 12 kB
Cached: 30520 kB
SwapCached: 0 kB
Active: 139768 kB
Inactive: 153752 kB
Active(anon): 136240 kB
Inactive(anon): 143336 kB
Active(file): 3528 kB
Inactive(file): 10416 kB
Unevictable: 7188 kB
Mlocked: 6892 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 270188 kB
Mapped: 13104 kB
Shmem: 9396 kB
Slab: 12568 kB
SReclaimable: 2124 kB
SUnreclaim: 10444 kB
KernelStack: 7920 kB
PageTables: 15792 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 212416 kB
Committed_AS: 5607708 kB
VmallocTotal: 450560 kB
VmallocUsed: 164504 kB
VmallocChunk: 217092 kB

Notice the "b" column shows 1 process in block state...and the second to last column is the high I/O wait again.
$ vmstat
procs memory system cpu
r b free mapped anon slab in cs flt us ni sy id wa ir
0 1 84300 12980 270384 12564 160 240 4 2 0 9 0 89 0
0 1 83680 12996 270384 12544 1351 3023 39 4 0 10 0 85 0
0 1 85540 12404 270384 12412 277 524 8 0 0 1 0 99 0
1 1 86780 12196 269732 12508 943 1871 30 6 0 15 0 80 0
0 1 86904 12232 269756 12440 1181 2478 35 1 0 9 0 90 0
1 1 87400 12084 269760 12360 1046 2137 22 2 0 0 0 99 0
2 1 86532 12124 269760 12504 1326 3081 48 7 0 17 0 76 0

I let it run for about 5 min while poking around the system...so it ran for longer than the other times. I again turned off 4g with the widget and after about 60 sec iow went back down to 0. If it is related to 4g, it's as if whatever triggers this is some kind of loop that sends i/o requests and the sooner it's caught, the sooner it's able to kill the loop. The longer I wait to turn off 4g (if it truly is some process related to 4g that's causing this), the longer it takes for the system to kill the process while it's queued up waiting for the i/o requests to complete.

Anyway, the stats after it wen back to normal.

$ top -n 1



User 1%, System 4%, IOW 0%, IRQ 0%
User 4 + Nice 0 + Sys 13 + Idle 290 + IOW 0 + IRQ 0 + SIRQ 0 = 307

PID CPU% S #THR VSS RSS PCY UID Name
9876 2% S 1 940K 268K fg app_87 top
10211 2% R 1 936K 416K fg shell top
98 0% S 66 296080K 36652K fg system system_server
4 0% S 1 0K 0K fg root watchdog/0
5 0% S 1 0K 0K fg root events/0
6 0% S 1 0K 0K fg root khelper
7 0% S 1 0K 0K fg root async/mgr

$ cat /proc/meminfo
MemTotal: 424832 kB
MemFree: 70756 kB
Buffers: 8 kB
Cached: 40324 kB
SwapCached: 0 kB
Active: 138860 kB
Inactive: 165740 kB
Active(anon): 136572 kB
Inactive(anon): 144356 kB
Active(file): 2288 kB
Inactive(file): 21384 kB
Unevictable: 7188 kB
Mlocked: 6892 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 271464 kB
Mapped: 15916 kB
Shmem: 9472 kB
Slab: 12840 kB
SReclaimable: 2472 kB
SUnreclaim: 10368 kB
KernelStack: 7928 kB
PageTables: 16340 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 212416 kB
Committed_AS: 5904064 kB
VmallocTotal: 450560 kB
VmallocUsed: 168592 kB
VmallocChunk: 217092 kB

$ vmstat
procs memory system cpu
r b free mapped anon slab in cs flt us ni sy id wa ir
0 0 69392 16236 271460 12844 83 82 0 3 0 6 92 0 0
0 0 69392 16236 271460 12844 61 70 0 0 0 0 99 0 0
0 0 69392 16236 271460 12844 62 68 0 0 0 0 99 0 0
0 0 69392 16236 271460 12844 80 85 0 2 0 6 93 0 0
0 0 69392 16236 271460 12840 57 69 0 0 0 1 98 0 0
0 0 69392 16236 271460 12840 60 76 0 0 0 0 99 0 0
0 0 69392 16236 271460 12840 70 84 0 3 0 5 92 0 0
 
I think if I had permissions to enable block_dump, I could see exactly what process is causing all the I/O requests. Unfortunately, Sprint/HTC don't give us rights to write to /proc to help diagnose this.

$ pwd
/proc/sys/vm

$ ls -l
-rw-r--r-- root root 0 2010-12-07 21:53 overcommit_memory
-rw-r--r-- root root 0 2010-12-07 21:53 panic_on_oom
-rw-r--r-- root root 0 2010-12-07 21:53 oom_kill_allocating_task
-rw-r--r-- root root 0 2010-12-07 21:53 oom_dump_tasks
-rw-r--r-- root root 0 2010-12-07 21:53 overcommit_ratio
-rw-r--r-- root root 0 2010-12-07 21:53 page-cluster
-rw-r--r-- root root 0 2010-12-07 21:53 dirty_background_ratio
-rw-r--r-- root root 0 2010-12-07 21:53 dirty_background_bytes
-rw-r--r-- root root 0 2010-12-07 21:53 dirty_ratio
-rw-r--r-- root root 0 2010-12-07 21:53 dirty_bytes
-rw-r--r-- root root 0 2010-12-07 21:53 dirty_writeback_centisecs
-rw-r--r-- root root 0 2010-12-07 21:53 dirty_expire_centisecs
-r--r--r-- root root 0 2010-12-07 21:53 nr_pdflush_threads
-rw-r--r-- root root 0 2010-12-07 21:53 swappiness
-rw-r--r-- root root 0 2010-12-07 21:53 lowmem_reserve_ratio
-rw-r--r-- root root 0 2010-12-07 21:53 drop_caches
-rw-r--r-- root root 0 2010-12-07 21:53 min_free_kbytes
-rw-r--r-- root root 0 2010-12-07 21:53 min_free_order_shift
-rw-r--r-- root root 0 2010-12-07 21:53 percpu_pagelist_fraction
-rw-r--r-- root root 0 2010-12-07 21:53 max_map_count
-rw-r--r-- root root 0 2010-12-07 21:53 laptop_mode
-rw-r--r-- root root 0 2010-12-07 21:53 block_dump
-rw-r--r-- root root 0 2010-12-07 21:53 vfs_cache_pressure
-rw-r--r-- root root 0 2010-12-07 21:53 mmap_min_addr
-rw-r--r-- root root 0 2010-12-07 21:53 scan_unevictable_pages



$ echo 1 > /proc/sys/vm/block_dump ; cat /proc/kmsg ; echo 0 > /proc/sys/vm/block_dump

cannot create /proc/sys/vm/block_dump: permission denied
/proc/kmsg: Permission denied
cannot create /proc/sys/vm/block_dump: permission denied

Does anyone have any advice for writing to this location? It doesn't look like sudo is a part of android. I'd prefer not to root my device, I don't want to void my warranty if I need to work with sprint/htc support. Or can anyone else who's rooted their phone and having this issue do this above command when the problem is occurring?
 
Thanks for your detailed post beanfield. It's been very helpful for me in tracking down a similar issue on my Samsung Captivate. In my case, my IOW is so high that I have no idle cpu cycles available. It's killing performance. I can reboot and the problem will go away temporarily but then return within an hour.

Unfortunately, I haven't yet been able to identify the root cause of my high IOWAIT stats but I'm at least on the way to doing so. I have rooted my phone and ran the command to show /proc/kmsg. Here's what the majority of the blocks involve:

<7>[ 1862.645527] pool-2-thread-1(3488): WRITE block 569352 on mmcblk0p2 (8 sectors)

The mmcblk0p2 device is my /data directory. It seems that the pool-2-thread-* is a Java-related function. Not sure how to track that back to the app calling the function. I can find plenty of threads about identifying high IO in Linux but many of the commands (like iostat) are not available. Suggestions? Advice?

UPDATE: I removed Fancy Box (weather widget) and another weather widget which helped briefly. Now I'm seeing another process hog the WRITE blocks:

<7>[19645.214234] IntentService[A(3046): WRITE block 376 on mmcblk0p2 (8 sectors)

The WRITE block is different for every event so I don't *think* it's a bad sdcard or RAM. I'm not yet familiar with the Android filesystem so don't know what /data is mapping into (whether it's internal RAM or my add-on sdcard).
 
Back
Top Bottom