在正式解釋什么是fd泄露的時候,先看看三份log,是否有眼熟而不知所措感覺?結合公司同事的深入研究,總結了多種實際案例,才有了這篇文章,以后FD泄露問題在也不慌了。
log 1: Could not read input channel file descriptors from parcel
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: FATAL EXCEPTION: main
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: Process: com.miui.weather2, PID: 20556
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: java.lang.RuntimeException: Could not read input channel file descriptors from parcel.
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel.nativeReadFromParcel(Native Method)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel.readFromParcel(InputChannel.java:148)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel$1.createFromParcel(InputChannel.java:39)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.InputChannel$1.createFromParcel(InputChannel.java:37)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.InputBindResult.<init>(InputBindResult.java:68)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:112)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:110)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.view.IInputMethodManager$Stub$Proxy.startInputOrWindowGainedFocus(IInputMethodManager.java:723)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.inputmethod.InputMethodManager.startInputInner(InputMethodManager.java:1295)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.inputmethod.InputMethodManager.onPostWindowFocus(InputMethodManager.java:1543)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:4069)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:106)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.os.Looper.loop(Looper.java:171)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6642)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:518)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:873)
log 2:Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: FATAL EXCEPTION: main
06-22 11:59:30.335 2308 2308 E AndroidRuntime: Process: com.xiaomi.bluetooth, PID: 2308
06-22 11:59:30.335 2308 2308 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.nativeCreate(Native Method)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.start(Thread.java:730)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.c.dk(SynchronizedGattCallback.java:54)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.dk(GattPeripheral.java:97)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eN(GattPeripheral.java:227)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eq(GattPeripheral.java:221)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.z.run(PeripheralConnectionManager.java:462)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.handleCallback(Handler.java:754)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:95)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Looper.loop(Looper.java:160)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6202)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:874)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:764)
log 3:unable to open database file (code 14)
android.database.sqlite.SQLiteCantOpenDatabaseException: unable to open database file (code 14)
at android.database.sqlite.SQLiteConnection.nativeExecuteForChangedRowCount(Native Method)
at android.database.sqlite.SQLiteConnection.executeForChangedRowCount(SQLiteConnection.java:735)
at android.database.sqlite.SQLiteSession.executeForChangedRowCount(SQLiteSession.java:754)
at android.database.sqlite.SQLiteStatement.executeUpdateDelete(SQLiteStatement.java:64)
at android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:1653)
at android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1599)
at com.android.providers.telephony.TelephonyProvider.update(TelephonyProvider.java:2704)
at android.content.ContentProvider$Transport.update(ContentProvider.java:357)
at android.content.ContentResolver.update(ContentResolver.java:1688)
at com.android.internal.telephony.SubscriptionController.setCarrierText(SubscriptionController.java:1202)
at com.android.internal.telephony.SubscriptionControllerInjectorBase$DatabaseHandler.handleMessage(SubscriptionControllerInjectorBase.java:201)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:164)
at android.os.HandlerThread.run(HandlerThread.java:65)
相信大多人都覺得這類問題不好解決,更有人覺得這種問題直接加個try catch,結果下個版本接中上報。因為上面的Log基本上都不是案發現場,正式開始需要先補一波FD泄露的基礎知識。在Android進程系列第一篇---進程基礎中的2.3小節也有簡單介紹。
一、FD的相關概念
概念:Fd的全稱是File descriptor,在linux OS里,所有都可以抽象成文件,比如普通的文件、目錄、塊設備、字符設備、socket、管道等等。當通過一些系統調用(如open/socket等),會返回一個fd(就是一個數字)給你,然后根據這個fd對應的文件進行操作,比如讀、寫。
1、FD從何而來?
我們說fd是一個數字,那么這個數字是怎么計算出來的?在內核進程結構體task_struct中為每個進程維護了一個數組,數組下標就是fd,里面存儲的是對這個文件的描述了。里面就有files指針,維護著所有打開的文件信息:
struct task_struct {
...
/* Open file information: */
struct files_struct *files;
...
}
/*
* Open file table structure
*/
struct files_struct {
/*
* read mostly part
*/
atomic_t count;
bool resize_in_progress;
wait_queue_head_t resize_wait;
struct fdtable __rcu *fdt;
struct fdtable fdtab;
/*
* written part on a separate cache line in SMP
*/
spinlock_t file_lock ____cacheline_aligned_in_smp;
unsigned int next_fd;
unsigned long close_on_exec_init[1];
unsigned long open_fds_init[1];
unsigned long full_fds_bits_init[1];
struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};
files_struct中維護一個fdtable,fdtable里的fd就是一個數組,file結構體就是為打開文件的信息了。
struct fdtable {
unsigned int max_fds;
struct file __rcu **fd; /* current fd array */
unsigned long *close_on_exec;
unsigned long *open_fds;
unsigned long *full_fds_bits;
struct rcu_head rcu;
};
2、閾值
linux默認對每個進程最大能打開的fd的個數是1024(軟限制是1024,硬限制是4096),你可以通過/proc/$pid/limits查看Max open files:
當一個進程打開的文件數超過這個軟限制值1024將無法再打開文件了。所以就報出各種問題,和OOM問題一樣,crash堆棧有可能只是壓死駱駝的最后一根稻草,并不是真實案發現場。所以用完fd后需要close關閉這個fd,那么這個fd對應的數字就被系統回收了,下一次的open才會被重新利用。
軟限制和硬限制的區別
硬限制是可以在任何時候任何進程中設置 但硬限制只能由超級用戶提起
軟限制是內核實際執行的限制,任何進程都可以將軟限制設置為任意小于等于對進程限制的硬限制的操作fd
如果覺得fd不夠用了,也可以用下面方式調整.
getrlimit(RLIMIT_NOFILE, &rlim);
setrlimit(RLIMIT_NOFILE, &rlim);
ulimit -n 2048
android.system.Os.getrlimit(OsConstants.RLIMIT_NOFILE);
egg:
void modifyfdlimit() {
rlimit fdLimit;
fdLimit.rlim_cur = 30000;
fdLimit.rlim_max = 30000;
if (-1 == setrlimit(RLIMIT_NOFILE, & fdLimit)) {
printf("Set max fd open count fai. /nl");
char cmdBuffer[ 64];
sprintf(cmdBuffer, "ulimit -n %d", 30000);
if (-1 == system(cmdBuffer)) {
printf("%s failed. /n", cmdBuffer);
exit(0);
}
if (-1 == getrlimit(RLIMIT_NOFILE, & fdLimit)){
printf("Ulimit fd number failed.");
exit(0);
}
}
}
3、打開的fd查看
使用ls -la /proc/$pid/fd查看
nitrogen:/ # pidof system_server
1956
nitrogen:/ # ls -la /proc/1956/f
fd/ fdinfo/
nitrogen:/ # ls -la /proc/1956/fd
total 0
dr-x------ 2 system system 0 2018-11-02 10:57 .
dr-xr-xr-x 9 system system 0 2018-11-02 10:57 ..
lrwx------ 1 system system 64 2018-11-02 12:26 0 -> /dev/null
lrwx------ 1 system system 64 2018-11-02 12:26 1 -> /dev/null
lr-x------ 1 system system 64 2018-11-02 12:26 10 -> /system/framework/QPerformance.jar
lrwx------ 1 system system 64 2018-11-02 12:26 100 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 101 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 102 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 103 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 104 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 105 -> anon_inode:[eventpoll]
lr-x------ 1 system system 64 2018-11-02 12:26 106 -> anon_inode:inotify
lr-x------ 1 system system 64 2018-11-02 12:26 107 -> pipe:[36681]
l-wx------ 1 system system 64 2018-11-02 12:26 108 -> pipe:[36681]
lrwx------ 1 system system 64 2018-11-02 12:26 109 -> anon_inode:[eventfd]
lr-x------ 1 system system 64 2018-11-02 12:26 11 -> /system/framework/core-oj.jar
lrwx------ 1 system system 64 2018-11-02 12:26 110 -> anon_inode:[eventpoll]
lrwx------ 1 system system 64 2018-11-02 12:26 111 -> socket:[35371]
lrwx------ 1 system system 64 2018-11-02 12:26 112 -> socket:[35372]
lr-x------ 1 system system 64 2018-11-02 12:26 113 -> /system/media/theme/defau
二、Fd Leak案例
現在看幾種FD泄露問題的案例,FD泄露問題的特點是:
- 同一個問題可能出現不同堆棧, 比較隱晦
- Fd泄漏時內存可能不會出現不足,就算觸發GC也不一定能夠回收已經創建的文件句柄
日志關鍵字:
ashmem_create_region failed for ‘indirect ref table’: Too many open files
"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"
當你看到上面幾種crash的堆棧之后,就需要往fd泄露的方向上去思考了
1、Resource相關
使用輸入輸出流沒有關閉的可能會出問題,FileInputStream,FileOutputStream,FileReader,FileWriter 等,因為每打開一個文件需要fd。一些輸入流也提供了基于fd的構造方法
174 public FileInputStream(FileDescriptor fdObj) {
175 this(fdObj, false /* isFdOwner */);
176 }
177
下面是一種泄露案例
frameworks/base/services/core/java/com/android/server/pm/ResmonWhitelistPackage.java
10final class ResmonWhitelistPackage {
11 private final File mSystemDir;
12 private final File mWhitelistFile;
13
14 final ArrayList<String> mPackages = new ArrayList<String>();
15
16 ResmonWhitelistPackage() {
17 mSystemDir = new File("/system/", "etc");
18 mWhitelistFile = new File(mSystemDir, "resmonwhitelist.txt");
19 }
20
21 void readList() {
....
25 try {
26 /// M: Clear white list record before update it
27 mPackages.clear();
28 BufferedReader br = new BufferedReader(new FileReader(mWhitelistFile));
29 String line = br.readLine();
30 while (line != null) {
31 mPackages.add(line);
32 line = br.readLine();
33 }
34 br.close();
35 } catch (IOException e) {
36 //Log.e(PackageManagerService.TAG, "IO Exception happened while reading resmon whitelist");
37 e.printStackTrace();
38 }
39 }
40}
br.close并不是在finally語句中,可能會出現未關閉的可能。如果代碼寫的風騷一點,也有辦法。從 Java 7 build 105 版本開始,Java 7 的編譯器和運行環境支持新的 try-with-resources 語句,稱為 ARM 塊(Automatic Resource Management) ,自動資源管理。
private static void customBufferStreamCopy(File source, File target) {
try (InputStream fis = new FileInputStream(source);
OutputStream fos = new FileOutputStream(target)){
byte[] buf = new byte[8192];
int i;
while ((i = fis.read(buf)) != -1) {
fos.write(buf, 0, i);
}
}
catch (Exception e) {
e.printStackTrace();
}
}
代碼清晰,且不會發生泄露。
2、HandlerThread相關
使用HandlerThread不小心也會發生fd泄露,看看這個案例
2.1、現象
systemui總是crash,發生問題系統版本Android O
2.2、初步分析
pid: 18465, tid: 32737, name: async_sensor >>> com.android.systemui <<<
signal 5 (SIGTRAP), code -32763 (PTRACE_EVENT_STOP), fault addr 0x3e800007fe1
x0 fffffffffffffffc x1 000000735df1ec38 x2 0000000000000010 x3 00000000ffffffff
x4 0000000000000000 x5 0000000000000008 x6 0000007428971000 x7 0000000000bb3876
x8 0000000000000016 x9 7fffffffffffffff x10 000000000000000c x11 0000000000000000
x12 000000735df1ed38 x13 000000005b20a831 x14 002cd0c4e58dc31b x15 0000fdf7aa690a91
x16 00000074245e7498 x17 000000742453bd00 x18 0000000000000004 x19 000000735df1f588
x20 000000738727b708 x21 00000000ffffffff x22 000000735df1f588 x23 000000738727b660
x24 0000000000000028 x25 000000000000000c x26 0000000014a000b0 x27 0000007385815300
x28 00000000710507c8 x29 000000735df1ebe0 x30 000000742453bd38
sp 000000735df1ebc0 pc 00000074245866dc pstate 0000000060000000
v0 00000000000000000000000000000000 v1 00000000000000000000000000000001
v2 00000000000000002065766974616e3c v3 00000000000000000000000000000000
v4 00000000000000008020080200000000 v5 00000000000000004000000000000000
v6 00000000000000000000000000000000 v7 00000000000000008020080280200802
v8 00000000000000000000000000000000 v9 00000000000000000000000000000000
v10 00000000000000000000000000000000 v11 00000000000000000000000000000000
v12 00000000000000000000000000000000 v13 00000000000000000000000000000000
v14 00000000000000000000000000000000 v15 00000000000000000000000000000000
v16 40100401401004014010040140100401 v17 a0080000a00a0000a800aa0040404000
v18 80200800000000008020080200000000 v19 000000000000000000000000ebad8083
v20 000000000000000000000000ebad8084 v21 000000000000000000000000ebad8085
v22 000000000000000000000000ebad8086 v23 000000000000000000000000ebad8087
v24 000000000000000000000000ebad8088 v25 000000000000000000000000ebad8089
v26 000000000000000000000000ebad808a v27 000000000000000000000000ebad808b
v28 000000000000000000000000ebad808c v29 000000000000000000000000ebad808d
v30 000000000000000000000000ebad808e v31 00000000000000000000000041e00000
fpsr 00000013 fpcr 00000000
backtrace:
#00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8)
#01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52)
#02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
#03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
#04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
#05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140)
#06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)
乍看是處理消息的時候掛了?繼續查看log發現
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.922 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.922 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.922 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.922 1000 2155 2335 E OpenGLRenderer: GL error: GL_INVALID_OPERATION
06-15 22:00:33.922 1000 2155 2335 F OpenGLRenderer: glCopyTexSubImage2D error! GL_INVALID_OPERATION (0x502
狀態欄open fd超過1024, 看log有很多上面這種log,這個是真實案發現場嗎?
2.3、深入分析
O上發生NE時會將fd信息打印到tombstone文件中,看fd信息確實已經滿了,多為anon_inode:[eventfd]和anon_inode:dmabuf,
backtrace:
#00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8)
#01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52)
#02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
#03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
#04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
#05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140)
#06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)
....
fd 556: anon_inode:[eventpoll]
fd 557: anon_inode:[eventpoll]
fd 558: anon_inode:[eventfd]
fd 559: anon_inode:[eventpoll]
fd 560: anon_inode:[eventfd]
fd 561: anon_inode:[eventpoll]
fd 562: anon_inode:[eventfd]
fd 563: anon_inode:[eventfd]
fd 564: anon_inode:[eventpoll]
fd 565: anon_inode:[eventfd]
fd 566: anon_inode:[eventpoll]
fd 567: /dev/ashmem
..... //省略千行
fd 1022: anon_inode:dmabuf
fd 1023: socket:[3549620]
通過trace分析,還有一個關鍵的異常log,看到systemui進程有很多個async_sensor線程,為什么這個線程這么多呢?
pid: 11019, tid: 2301, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2431, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2522, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2542, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2600, name: async_sensor >>> com.android.systemui <<<
.....//省略若干
pid: 11019, tid: 5693, name: async_sensor >>> com.android.systemui <<<
搜查代碼async_sensor是什么?發現async_sensor是個HanderThread
看在DozeFactory中,有new AsyncSensorManager 的地方:
繼續查看assembleMachine方法在哪里調用的,DozeService中有調用 assembleMachine的地方
回頭在結合log發現,DozeService被頻繁的啟動,看來一步步的接近真相了。這個問題看來是鎖屏同事造成的問題。
isTest=false, canDoze=true, userId=0
06-15 21:48:11.888 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:13.337 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:24.519 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:33.577 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:50.640 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:56.540 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:28.240 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:29.207 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:30.206 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:33.332 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:37.435 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:50.529 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:50:20.010 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:50:30.148 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
......
2.4、修復方案
最終轉給鎖屏同事,將此修復
所以本問題的RootCase就是頻繁的啟動了DozeService,創建了大量的HandlerThread導致fd泄露,那么為什么HandlerThread和fd泄露有關系呢?跟蹤源碼發現HandlerThread創建會引起Looper的創建,每一個Looper在創建的時候會打開兩個fd,一個是eventfd,另外一個是mEpolled,這個和tombstone文件中打印的fd也對上了。
總結,這種泄露問題如果分析的時候,如果不知道HandlerThread會創建兩個fd的基本知識,那么這個問題比較難以分析。
2、Thread.start相關
線程啟動的時候,可能也會有fd泄露的風險,不過這種錯誤不太容易犯下,如果你真是在一個循環中創建1024線程,那么立刻見效,程序死掉。
trace1
java.lang.OutOfMemoryError: Could not allocate JNI Env
at java.lang.Thread.nativeCreate(Native Method)
at java.lang.Thread.start(Thread.java:729)
at com.android.server.wifi.WifiNative.startHal(WifiNative.java:1639)
at com.android.server.wifi.WifiStateMachine.setupDriverForSoftAp(WifiStateMachine.java:3970)
at com.android.server.wifi.WifiStateMachine.-wrap9(WifiStateMachine.java)
at com.android.server.wifi.WifiStateMachine$InitialState.processMessage(WifiStateMachine.java:4480)
at com.android.internal.util.StateMachine$SmHandler.processMsg(StateMachine.java:980)
at com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:799)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:163)
at android.os.HandlerThread.run(HandlerThread.java:61)
trace2
java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
at java.lang.Thread.nativeCreate(Native Method)
at java.lang.Thread.start(Thread.java:733)
at com.tencent.mm.sdk.f.b$a.start(SourceFile:61)
at com.tencent.mm.am.a.bU(SourceFile:60)
at com.tencent.mm.ui.MMAppMgr$8.tC(SourceFile:315)
at com.tencent.mm.sdk.platformtools.am.handleMessage(SourceFile:69)
at com.tencent.mm.sdk.platformtools.aj.handleMessage(SourceFile:173)
at com.tencent.mm.sdk.platformtools.aj.dispatchMessage(SourceFile:128)
at android.os.Looper.loop(Looper.java:176)
at android.app.ActivityThread.main(ActivityThread.java:6701)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
3、Inputchannel 相關
Inputchannel也會可能出現fd泄露問題,如下:
3.1 在Activity中不斷彈Dialog
public class Main2Activity extends Activity {
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main2);
}
public void onClick(View view) {
for (int i = 0; i < 1024; i++) {
AlertDialog.Builder builder = new AlertDialog.Builder(this);
builder.setTitle("fd").setIcon(R.drawable.ic_launcher_background).create();
builder.show();
}
}
}
不過一會,這個App就死了,報出了下面的問題
11-02 17:38:22.263 9351-9351/com.example.wangjing.rebootdemo E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.example.wangjing.rebootdemo, PID: 9351
java.lang.IllegalStateException: Could not execute method for android:onClick
at android.view.View$DeclaredOnClickListener.onClick(View.java:5391)
at android.view.View.performClick(View.java:6311)
at android.view.View$PerformClick.run(View.java:24833)
at android.os.Handler.handleCallback(Handler.java:794)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:173)
at android.app.ActivityThread.main(ActivityThread.java:6653)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Caused by: java.lang.reflect.InvocationTargetException
at java.lang.reflect.Method.invoke(Native Method)
at android.view.View$DeclaredOnClickListener.onClick(View.java:5386)
at android.view.View.performClick(View.java:6311)
at android.view.View$PerformClick.run(View.java:24833)
at android.os.Handler.handleCallback(Handler.java:794)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:173)
at android.app.ActivityThread.main(ActivityThread.java:6653)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Caused by: java.lang.RuntimeException: Could not read input channel file descriptors from parcel.
at android.view.InputChannel.nativeReadFromParcel(Native Method)
at android.view.InputChannel.readFromParcel(InputChannel.java:148)
at android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:804)
at android.view.ViewRootImpl.setView(ViewRootImpl.java:770)
at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:356)
at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:94)
at android.app.Dialog.show(Dialog.java:330)
at android.app.AlertDialog$Builder.show(AlertDialog.java:1114)
at com.example.wangjing.rebootdemo.Main2Activity.onClick(Main2Activity.java:28)
at java.lang.reflect.Method.invoke(Native Method)
at android.view.View$DeclaredOnClickListener.onClick(View.java:5386)
at android.view.View.performClick(View.java:6311)
at android.view.View$PerformClick.run(View.java:24833)
at android.os.Handler.handleCallback(Handler.java:794)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:173)
at android.app.ActivityThread.main(ActivityThread.java:6653)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
查看一下這個進程的fd信息,很多的socket,正常情況下,這些東西是沒有的
jason:/ # ps -ef |grep "wangjing"
u0_a161 13134 9462 3 17:43:12 ? 00:00:04 com.example.wangjing.rebootdemo
root 13385 13380 3 17:45:26 pts/1 00:00:00 grep wangjing
jason:/ # ls -la proc/13134/fd/
total 0
dr-x------ 2 u0_a161 u0_a161 0 2018-11-02 17:43 .
dr-xr-xr-x 9 u0_a161 u0_a161 0 2018-11-02 17:43 ..
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 0 -> /dev/null
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 1 -> /dev/null
lr-x------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 10 -> /system/framework/com.nxp.nfc.nq.jar
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 100 -> socket:[17760179]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 101 -> socket:[17753761]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 102 -> socket:[17773237]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 103 -> socket:[17760182]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 104 -> socket:[17760184]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 105 -> socket:[17773239]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 106 -> socket:[17776657]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 107 -> socket:[17774959]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 108 -> socket:[17776659]
......
5、Bitmap 相關
bitmap也是需要fd的,如下圖,沒有關閉,可能引發fd泄露的可能。
trace1
java.lang.RuntimeException: Could not allocate dup blob fd.
at android.graphics.Bitmap.nativeCreateFromParcel(Native Method)
at android.graphics.Bitmap.access$100(Bitmap.java:36)
at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1528)
at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1520)
at android.widget.RemoteViews$BitmapCache.<init>(RemoteViews.java:954)
at android.widget.RemoteViews.<init>(RemoteViews.java:1820)
at android.widget.RemoteViews.<init>(RemoteViews.java:1812)
at android.widget.RemoteViews.clone(RemoteViews.java:1905)
at android.app.Notification.cloneInto(Notification.java:1534)
at android.app.Notification.clone(Notification.java:1508)
at android.service.notification.StatusBarNotification.clone(StatusBarNotification.java:161)
at com.android.server.notification.NotificationManagerService$NotificationListeners.notifyPostedLocked(NotificationManagerService.java:3557)
at com.android.server.notification.NotificationManagerService$8.run(NotificationManagerService.java:2337)
at android.os.Handler.handleCallback(Handler.java:815)
at android.os.Handler.dispatchMessage(Handler.java:104)
at android.os.Looper.loop(Looper.java:207)
at com.android.server.SystemServer.run(SystemServer.java:410)
at com.android.server.SystemServer.main(SystemServer.java:255)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:933)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:782)
三、總結
通過上面的五個案例,總結常見的fd泄露的情景,一般出現下面的log,就需要懷疑是否有fd泄露的情況。
"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"
大批量的打開“anon_inode:[eventpoll]” 和 "pipe" 或者 "anon_inode:[eventfd]", 超過100個eventpoll, 通常情況下是開啟了太多的HandlerThread/Looper/MessageQueue, 線程忘記關閉, 或者looper 沒有釋放. 可以抓取hprof 進行快速分析
對于system server, 如果有大批量的socket 打開, 可能是因為Input Channel 沒有關閉, 此類同樣抓取hprof, 查看system server 中WindowState 的情況.
大量的打開“/dev/ashmem”, 如果是Context provider, 或者其他app, 很可能是打開數據庫沒有關閉, 或者數據庫鏈接頻繁打開忘記關閉. 這個時候查看這個進程的maps, cat proc/pid/maps, 即可看到這個ashmem 的name, 然后進一步可知道在哪里泄露.
3.1、容易復現
1.查看fd信息adb shell ls -a -l /proc/<pid>/fd ,lsof
2.查看進程線程信息:ps -t <pid>,或者抓進程trace, kill -3 <pid>
3.抓取hprof定位資源使用情況
3.2、難復現
1.對于應用自身fd泄漏發生JE時可以在復寫UncatchHandlerException在應用crash的時候通過readlink的方式讀取/proc/self/fd的信息,在后面發生的時候可以以獲取fd信息
2.O之后NE的Tombstone文件中有open files,可以查看打開的fd信息
3.抓取進程的ps信息或者trace信息
4.如果是inputchannel類型的,有可能是窗口類型的,因此可以查看window情況,dumpsys window