應用與系統穩定性第三篇---FD泄露問題漫談

在正式解釋什么是fd泄露的時候,先看看三份log,是否有眼熟而不知所措感覺?結合公司同事的深入研究,總結了多種實際案例,才有了這篇文章,以后FD泄露問題在也不慌了。

log 1: Could not read input channel file descriptors from parcel
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: FATAL EXCEPTION: main
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: Process: com.miui.weather2, PID: 20556
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime: java.lang.RuntimeException: Could not read input channel file descriptors from parcel.
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.InputChannel.nativeReadFromParcel(Native Method)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.InputChannel.readFromParcel(InputChannel.java:148)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.InputChannel$1.createFromParcel(InputChannel.java:39)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.InputChannel$1.createFromParcel(InputChannel.java:37)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at com.android.internal.view.InputBindResult.<init>(InputBindResult.java:68)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:112)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at com.android.internal.view.InputBindResult$1.createFromParcel(InputBindResult.java:110)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at com.android.internal.view.IInputMethodManager$Stub$Proxy.startInputOrWindowGainedFocus(IInputMethodManager.java:723)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.inputmethod.InputMethodManager.startInputInner(InputMethodManager.java:1295)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.inputmethod.InputMethodManager.onPostWindowFocus(InputMethodManager.java:1543)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:4069)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.os.Handler.dispatchMessage(Handler.java:106)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.os.Looper.loop(Looper.java:171)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at android.app.ActivityThread.main(ActivityThread.java:6642)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at java.lang.reflect.Method.invoke(Native Method)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:518)
06-22 20:34:43.035 10037 20556 20556 E AndroidRuntime:     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:873)
log 2:Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: FATAL EXCEPTION: main
06-22 11:59:30.335 2308 2308 E AndroidRuntime: Process: com.xiaomi.bluetooth, PID: 2308
06-22 11:59:30.335 2308 2308 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.nativeCreate(Native Method)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.Thread.start(Thread.java:730)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.c.dk(SynchronizedGattCallback.java:54)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.dk(GattPeripheral.java:97)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eN(GattPeripheral.java:227)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.m.eq(GattPeripheral.java:221)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.bluetooth.ble.z.run(PeripheralConnectionManager.java:462)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.handleCallback(Handler.java:754)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:95)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.os.Looper.loop(Looper.java:160)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6202)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:874)
06-22 11:59:30.335 2308 2308 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:764)
log 3:unable to open database file (code 14)
android.database.sqlite.SQLiteCantOpenDatabaseException: unable to open database file (code 14)
  at android.database.sqlite.SQLiteConnection.nativeExecuteForChangedRowCount(Native Method)
  at android.database.sqlite.SQLiteConnection.executeForChangedRowCount(SQLiteConnection.java:735)
  at android.database.sqlite.SQLiteSession.executeForChangedRowCount(SQLiteSession.java:754)
  at android.database.sqlite.SQLiteStatement.executeUpdateDelete(SQLiteStatement.java:64)
  at android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:1653)
  at android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1599)
  at com.android.providers.telephony.TelephonyProvider.update(TelephonyProvider.java:2704)
  at android.content.ContentProvider$Transport.update(ContentProvider.java:357)
  at android.content.ContentResolver.update(ContentResolver.java:1688)
  at com.android.internal.telephony.SubscriptionController.setCarrierText(SubscriptionController.java:1202)
  at com.android.internal.telephony.SubscriptionControllerInjectorBase$DatabaseHandler.handleMessage(SubscriptionControllerInjectorBase.java:201)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loop(Looper.java:164)
  at android.os.HandlerThread.run(HandlerThread.java:65)

相信大多人都覺得這類問題不好解決,更有人覺得這種問題直接加個try catch,結果下個版本接中上報。因為上面的Log基本上都不是案發現場,正式開始需要先補一波FD泄露的基礎知識。在Android進程系列第一篇---進程基礎中的2.3小節也有簡單介紹。

一、FD的相關概念

概念:Fd的全稱是File descriptor,在linux OS里,所有都可以抽象成文件,比如普通的文件、目錄、塊設備、字符設備、socket、管道等等。當通過一些系統調用(如open/socket等),會返回一個fd(就是一個數字)給你,然后根據這個fd對應的文件進行操作,比如讀、寫。

1、FD從何而來?

我們說fd是一個數字,那么這個數字是怎么計算出來的?在內核進程結構體task_struct中為每個進程維護了一個數組,數組下標就是fd,里面存儲的是對這個文件的描述了。里面就有files指針,維護著所有打開的文件信息:

struct task_struct {
...
   /* Open file information: */
   struct files_struct     *files;
...
}

/*
* Open file table structure
*/
struct files_struct {
 /*
  * read mostly part
  */
   atomic_t count;
   bool resize_in_progress;
   wait_queue_head_t resize_wait;

   struct fdtable __rcu *fdt;
   struct fdtable fdtab;
 /*
  * written part on a separate cache line in SMP
  */
   spinlock_t file_lock ____cacheline_aligned_in_smp;
   unsigned int next_fd;
   unsigned long close_on_exec_init[1];
   unsigned long open_fds_init[1];
   unsigned long full_fds_bits_init[1];
   struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

files_struct中維護一個fdtable,fdtable里的fd就是一個數組,file結構體就是為打開文件的信息了。

struct fdtable {
   unsigned int max_fds;
   struct file __rcu **fd;      /* current fd array */
   unsigned long *close_on_exec;
   unsigned long *open_fds;
   unsigned long *full_fds_bits;
   struct rcu_head rcu;
};
2、閾值

linux默認對每個進程最大能打開的fd的個數是1024(軟限制是1024,硬限制是4096),你可以通過/proc/$pid/limits查看Max open files:


fd閾值查看

當一個進程打開的文件數超過這個軟限制值1024將無法再打開文件了。所以就報出各種問題,和OOM問題一樣,crash堆棧有可能只是壓死駱駝的最后一根稻草,并不是真實案發現場。所以用完fd后需要close關閉這個fd,那么這個fd對應的數字就被系統回收了,下一次的open才會被重新利用。

軟限制和硬限制的區別
硬限制是可以在任何時候任何進程中設置 但硬限制只能由超級用戶提起
軟限制是內核實際執行的限制,任何進程都可以將軟限制設置為任意小于等于對進程限制的硬限制的操作fd

如果覺得fd不夠用了,也可以用下面方式調整.

getrlimit(RLIMIT_NOFILE, &rlim);
setrlimit(RLIMIT_NOFILE, &rlim);
ulimit -n 2048
android.system.Os.getrlimit(OsConstants.RLIMIT_NOFILE);

egg:

void modifyfdlimit() {
    rlimit fdLimit;
    fdLimit.rlim_cur = 30000;
    fdLimit.rlim_max = 30000;
    if (-1 == setrlimit(RLIMIT_NOFILE, & fdLimit)) {
        printf("Set max fd open count fai. /nl");
        char cmdBuffer[ 64];
        sprintf(cmdBuffer, "ulimit -n %d", 30000);
        if (-1 == system(cmdBuffer)) {
            printf("%s failed. /n", cmdBuffer);
             exit(0);
     }
    if (-1 == getrlimit(RLIMIT_NOFILE, & fdLimit)){
        printf("Ulimit fd number failed.");
        exit(0);
     }
 }
}

3、打開的fd查看

使用ls -la /proc/$pid/fd查看

nitrogen:/ # pidof  system_server
1956
nitrogen:/ # ls -la /proc/1956/f                                                                                                                                                                           
fd/      fdinfo/
nitrogen:/ # ls -la /proc/1956/fd                                                                                                                                                                          
total 0
dr-x------ 2 system system  0 2018-11-02 10:57 .
dr-xr-xr-x 9 system system  0 2018-11-02 10:57 ..
lrwx------ 1 system system 64 2018-11-02 12:26 0 -> /dev/null
lrwx------ 1 system system 64 2018-11-02 12:26 1 -> /dev/null
lr-x------ 1 system system 64 2018-11-02 12:26 10 -> /system/framework/QPerformance.jar
lrwx------ 1 system system 64 2018-11-02 12:26 100 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 101 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 102 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 103 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 104 -> anon_inode:[timerfd]
lrwx------ 1 system system 64 2018-11-02 12:26 105 -> anon_inode:[eventpoll]
lr-x------ 1 system system 64 2018-11-02 12:26 106 -> anon_inode:inotify
lr-x------ 1 system system 64 2018-11-02 12:26 107 -> pipe:[36681]
l-wx------ 1 system system 64 2018-11-02 12:26 108 -> pipe:[36681]
lrwx------ 1 system system 64 2018-11-02 12:26 109 -> anon_inode:[eventfd]
lr-x------ 1 system system 64 2018-11-02 12:26 11 -> /system/framework/core-oj.jar
lrwx------ 1 system system 64 2018-11-02 12:26 110 -> anon_inode:[eventpoll]
lrwx------ 1 system system 64 2018-11-02 12:26 111 -> socket:[35371]
lrwx------ 1 system system 64 2018-11-02 12:26 112 -> socket:[35372]
lr-x------ 1 system system 64 2018-11-02 12:26 113 -> /system/media/theme/defau

二、Fd Leak案例

現在看幾種FD泄露問題的案例,FD泄露問題的特點是:

  1. 同一個問題可能出現不同堆棧, 比較隱晦
  2. Fd泄漏時內存可能不會出現不足,就算觸發GC也不一定能夠回收已經創建的文件句柄

日志關鍵字:
ashmem_create_region failed for ‘indirect ref table’: Too many open files
"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"
當你看到上面幾種crash的堆棧之后,就需要往fd泄露的方向上去思考了

1、Resource相關

使用輸入輸出流沒有關閉的可能會出問題,FileInputStream,FileOutputStream,FileReader,FileWriter 等,因為每打開一個文件需要fd。一些輸入流也提供了基于fd的構造方法

174    public FileInputStream(FileDescriptor fdObj) {
175        this(fdObj, false /* isFdOwner */);
176    }
177

下面是一種泄露案例

frameworks/base/services/core/java/com/android/server/pm/ResmonWhitelistPackage.java
10final class ResmonWhitelistPackage {
11    private final File mSystemDir;
12    private final File mWhitelistFile;
13
14    final ArrayList<String> mPackages = new ArrayList<String>();
15
16    ResmonWhitelistPackage() {
17        mSystemDir = new File("/system/", "etc");
18        mWhitelistFile = new File(mSystemDir, "resmonwhitelist.txt");
19    }
20
21    void readList() {
               ....
25        try {
26            /// M: Clear white list record before update it
27            mPackages.clear();
28            BufferedReader br = new BufferedReader(new FileReader(mWhitelistFile));
29            String line = br.readLine();
30            while (line != null) {
31                mPackages.add(line);
32                line = br.readLine();
33            }
34            br.close();
35        } catch (IOException e) {
36            //Log.e(PackageManagerService.TAG, "IO Exception happened while reading resmon whitelist");
37            e.printStackTrace();
38        }
39    }
40}

br.close并不是在finally語句中,可能會出現未關閉的可能。如果代碼寫的風騷一點,也有辦法。從 Java 7 build 105 版本開始,Java 7 的編譯器和運行環境支持新的 try-with-resources 語句,稱為 ARM 塊(Automatic Resource Management) ,自動資源管理。

private static void customBufferStreamCopy(File source, File target) {
    try (InputStream fis = new FileInputStream(source);
        OutputStream fos = new FileOutputStream(target)){
 
        byte[] buf = new byte[8192];
 
        int i;
        while ((i = fis.read(buf)) != -1) {
            fos.write(buf, 0, i);
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

代碼清晰,且不會發生泄露。

2、HandlerThread相關

使用HandlerThread不小心也會發生fd泄露,看看這個案例

2.1、現象

systemui總是crash,發生問題系統版本Android O

2.2、初步分析
pid: 18465, tid: 32737, name: async_sensor >>> com.android.systemui <<<
signal 5 (SIGTRAP), code -32763 (PTRACE_EVENT_STOP), fault addr 0x3e800007fe1
x0 fffffffffffffffc x1 000000735df1ec38 x2 0000000000000010 x3 00000000ffffffff
x4 0000000000000000 x5 0000000000000008 x6 0000007428971000 x7 0000000000bb3876
x8 0000000000000016 x9 7fffffffffffffff x10 000000000000000c x11 0000000000000000
x12 000000735df1ed38 x13 000000005b20a831 x14 002cd0c4e58dc31b x15 0000fdf7aa690a91
x16 00000074245e7498 x17 000000742453bd00 x18 0000000000000004 x19 000000735df1f588
x20 000000738727b708 x21 00000000ffffffff x22 000000735df1f588 x23 000000738727b660
x24 0000000000000028 x25 000000000000000c x26 0000000014a000b0 x27 0000007385815300
x28 00000000710507c8 x29 000000735df1ebe0 x30 000000742453bd38
sp 000000735df1ebc0 pc 00000074245866dc pstate 0000000060000000
v0 00000000000000000000000000000000 v1 00000000000000000000000000000001
v2 00000000000000002065766974616e3c v3 00000000000000000000000000000000
v4 00000000000000008020080200000000 v5 00000000000000004000000000000000
v6 00000000000000000000000000000000 v7 00000000000000008020080280200802
v8 00000000000000000000000000000000 v9 00000000000000000000000000000000
v10 00000000000000000000000000000000 v11 00000000000000000000000000000000
v12 00000000000000000000000000000000 v13 00000000000000000000000000000000
v14 00000000000000000000000000000000 v15 00000000000000000000000000000000
v16 40100401401004014010040140100401 v17 a0080000a00a0000a800aa0040404000
v18 80200800000000008020080200000000 v19 000000000000000000000000ebad8083
v20 000000000000000000000000ebad8084 v21 000000000000000000000000ebad8085
v22 000000000000000000000000ebad8086 v23 000000000000000000000000ebad8087
v24 000000000000000000000000ebad8088 v25 000000000000000000000000ebad8089
v26 000000000000000000000000ebad808a v27 000000000000000000000000ebad808b
v28 000000000000000000000000ebad808c v29 000000000000000000000000ebad808d
v30 000000000000000000000000ebad808e v31 00000000000000000000000041e00000
fpsr 00000013 fpcr 00000000

backtrace:
#00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8)
#01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52)
#02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
#03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
#04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
#05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140)
#06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)

乍看是處理消息的時候掛了?繼續查看log發現

06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.921 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.921 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.921 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.922 1000 2155 2335 E Parcel : fcntl(F_DUPFD_CLOEXEC) failed in Parcel::read, i is 0, fds[i] is -1, fd_count is 2, error: Too many open files
06-15 22:00:33.922 1000 2155 2335 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -22
06-15 22:00:33.922 1000 2155 2335 I Adreno : DequeueBuffer: dequeueBuffer failed
06-15 22:00:33.922 1000 2155 2335 E OpenGLRenderer: GL error: GL_INVALID_OPERATION
06-15 22:00:33.922 1000 2155 2335 F OpenGLRenderer: glCopyTexSubImage2D error! GL_INVALID_OPERATION (0x502

狀態欄open fd超過1024, 看log有很多上面這種log,這個是真實案發現場嗎?

2.3、深入分析

O上發生NE時會將fd信息打印到tombstone文件中,看fd信息確實已經滿了,多為anon_inode:[eventfd]和anon_inode:dmabuf,

backtrace:
#00 pc 000000000006a6dc /system/lib64/libc.so (__epoll_pwait+8)
#01 pc 000000000001fd34 /system/lib64/libc.so (epoll_pwait+52)
#02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
#03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
#04 pc 0000000000111bac /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
#05 pc 0000000000c005cc /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.app.NativeActivity.onWindowFocusChangedNative [DEDUPED]+140)
#06 pc 0000000001773f00 /system/framework/arm64/boot-framework.oat (offset 0x9cb000) (android.os.MessageQueue.next+192)
....
fd 556: anon_inode:[eventpoll]
fd 557: anon_inode:[eventpoll]
fd 558: anon_inode:[eventfd]
fd 559: anon_inode:[eventpoll]
fd 560: anon_inode:[eventfd]
fd 561: anon_inode:[eventpoll]
fd 562: anon_inode:[eventfd]
fd 563: anon_inode:[eventfd]
fd 564: anon_inode:[eventpoll]
fd 565: anon_inode:[eventfd]
fd 566: anon_inode:[eventpoll]
fd 567: /dev/ashmem
..... //省略千行
fd 1022: anon_inode:dmabuf
fd 1023: socket:[3549620]

通過trace分析,還有一個關鍵的異常log,看到systemui進程有很多個async_sensor線程,為什么這個線程這么多呢?

pid: 11019, tid: 2301, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2431, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2522, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2542, name: async_sensor >>> com.android.systemui <<<
pid: 11019, tid: 2600, name: async_sensor >>> com.android.systemui <<<
.....//省略若干
pid: 11019, tid: 5693, name: async_sensor >>> com.android.systemui <<<

搜查代碼async_sensor是什么?發現async_sensor是個HanderThread

image.png

看在DozeFactory中,有new AsyncSensorManager 的地方:

image.png

繼續查看assembleMachine方法在哪里調用的,DozeService中有調用 assembleMachine的地方

image.png

回頭在結合log發現,DozeService被頻繁的啟動,看來一步步的接近真相了。這個問題看來是鎖屏同事造成的問題。

isTest=false, canDoze=true, userId=0
06-15 21:48:11.888 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:13.337 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:24.519 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:33.577 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:50.640 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:48:56.540 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:28.240 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:29.207 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:30.206 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:33.332 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:37.435 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:49:50.529 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:50:20.010 1000 1313 1384 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
06-15 21:50:30.148 1000 1313 1384 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.keyguard.doze.DozeService}, isTest=false, canDoze=true, userId=0
......
2.4、修復方案

最終轉給鎖屏同事,將此修復


image.png

所以本問題的RootCase就是頻繁的啟動了DozeService,創建了大量的HandlerThread導致fd泄露,那么為什么HandlerThread和fd泄露有關系呢?跟蹤源碼發現HandlerThread創建會引起Looper的創建,每一個Looper在創建的時候會打開兩個fd,一個是eventfd,另外一個是mEpolled,這個和tombstone文件中打印的fd也對上了。

image.png

總結,這種泄露問題如果分析的時候,如果不知道HandlerThread會創建兩個fd的基本知識,那么這個問題比較難以分析。

2、Thread.start相關

線程啟動的時候,可能也會有fd泄露的風險,不過這種錯誤不太容易犯下,如果你真是在一個循環中創建1024線程,那么立刻見效,程序死掉。

image.png

trace1

    java.lang.OutOfMemoryError: Could not allocate JNI Env
   at java.lang.Thread.nativeCreate(Native Method)
   at java.lang.Thread.start(Thread.java:729)
   at com.android.server.wifi.WifiNative.startHal(WifiNative.java:1639)
   at com.android.server.wifi.WifiStateMachine.setupDriverForSoftAp(WifiStateMachine.java:3970)
   at com.android.server.wifi.WifiStateMachine.-wrap9(WifiStateMachine.java)
   at com.android.server.wifi.WifiStateMachine$InitialState.processMessage(WifiStateMachine.java:4480)
   at com.android.internal.util.StateMachine$SmHandler.processMsg(StateMachine.java:980)
   at com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:799)
   at android.os.Handler.dispatchMessage(Handler.java:102)
   at android.os.Looper.loop(Looper.java:163)
   at android.os.HandlerThread.run(HandlerThread.java:61)

trace2

java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
at java.lang.Thread.nativeCreate(Native Method)
at java.lang.Thread.start(Thread.java:733)
at com.tencent.mm.sdk.f.b$a.start(SourceFile:61)
at com.tencent.mm.am.a.bU(SourceFile:60)
at com.tencent.mm.ui.MMAppMgr$8.tC(SourceFile:315)
at com.tencent.mm.sdk.platformtools.am.handleMessage(SourceFile:69)
at com.tencent.mm.sdk.platformtools.aj.handleMessage(SourceFile:173)
at com.tencent.mm.sdk.platformtools.aj.dispatchMessage(SourceFile:128)
at android.os.Looper.loop(Looper.java:176)
at android.app.ActivityThread.main(ActivityThread.java:6701)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
3、Inputchannel 相關

Inputchannel也會可能出現fd泄露問題,如下:


image.png
3.1 在Activity中不斷彈Dialog
public class Main2Activity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main2);
    }
    public void onClick(View view) {
        for (int i = 0; i < 1024; i++) {
            AlertDialog.Builder builder = new AlertDialog.Builder(this);
            builder.setTitle("fd").setIcon(R.drawable.ic_launcher_background).create();
            builder.show();
        }
    }
}

不過一會,這個App就死了,報出了下面的問題

11-02 17:38:22.263 9351-9351/com.example.wangjing.rebootdemo E/AndroidRuntime: FATAL EXCEPTION: main
 Process: com.example.wangjing.rebootdemo, PID: 9351
java.lang.IllegalStateException: Could not execute method for android:onClick
   at android.view.View$DeclaredOnClickListener.onClick(View.java:5391)
   at android.view.View.performClick(View.java:6311)
   at android.view.View$PerformClick.run(View.java:24833)
   at android.os.Handler.handleCallback(Handler.java:794)
   at android.os.Handler.dispatchMessage(Handler.java:99)
   at android.os.Looper.loop(Looper.java:173)
   at android.app.ActivityThread.main(ActivityThread.java:6653)
   at java.lang.reflect.Method.invoke(Native Method)
   at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
   at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Caused by: java.lang.reflect.InvocationTargetException
   at java.lang.reflect.Method.invoke(Native Method)
   at android.view.View$DeclaredOnClickListener.onClick(View.java:5386)
   at android.view.View.performClick(View.java:6311) 
   at android.view.View$PerformClick.run(View.java:24833) 
   at android.os.Handler.handleCallback(Handler.java:794) 
   at android.os.Handler.dispatchMessage(Handler.java:99) 
   at android.os.Looper.loop(Looper.java:173) 
   at android.app.ActivityThread.main(ActivityThread.java:6653) 
   at java.lang.reflect.Method.invoke(Native Method) 
   at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547) 
   at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821) 
Caused by: java.lang.RuntimeException: Could not read input channel file descriptors from parcel.
   at android.view.InputChannel.nativeReadFromParcel(Native Method)
   at android.view.InputChannel.readFromParcel(InputChannel.java:148)
   at android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:804)
   at android.view.ViewRootImpl.setView(ViewRootImpl.java:770)
   at android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:356)
   at android.view.WindowManagerImpl.addView(WindowManagerImpl.java:94)
   at android.app.Dialog.show(Dialog.java:330)
   at android.app.AlertDialog$Builder.show(AlertDialog.java:1114)
   at com.example.wangjing.rebootdemo.Main2Activity.onClick(Main2Activity.java:28)
   at java.lang.reflect.Method.invoke(Native Method) 
   at android.view.View$DeclaredOnClickListener.onClick(View.java:5386) 
   at android.view.View.performClick(View.java:6311) 
   at android.view.View$PerformClick.run(View.java:24833) 
   at android.os.Handler.handleCallback(Handler.java:794) 
   at android.os.Handler.dispatchMessage(Handler.java:99) 
   at android.os.Looper.loop(Looper.java:173) 
   at android.app.ActivityThread.main(ActivityThread.java:6653) 
   at java.lang.reflect.Method.invoke(Native Method) 
   at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547) 
   at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821) 

查看一下這個進程的fd信息,很多的socket,正常情況下,這些東西是沒有的

jason:/ # ps -ef |grep "wangjing"
u0_a161      13134  9462 3 17:43:12 ?     00:00:04 com.example.wangjing.rebootdemo
root         13385 13380 3 17:45:26 pts/1 00:00:00 grep wangjing
jason:/ # ls -la  proc/13134/fd/                                                                                                                                                                           
total 0
dr-x------ 2 u0_a161 u0_a161  0 2018-11-02 17:43 .
dr-xr-xr-x 9 u0_a161 u0_a161  0 2018-11-02 17:43 ..
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 0 -> /dev/null
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 1 -> /dev/null
lr-x------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 10 -> /system/framework/com.nxp.nfc.nq.jar
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 100 -> socket:[17760179]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 101 -> socket:[17753761]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 102 -> socket:[17773237]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 103 -> socket:[17760182]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 104 -> socket:[17760184]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 105 -> socket:[17773239]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 106 -> socket:[17776657]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 107 -> socket:[17774959]
lrwx------ 1 u0_a161 u0_a161 64 2018-11-02 17:45 108 -> socket:[17776659]
......
5、Bitmap 相關

bitmap也是需要fd的,如下圖,沒有關閉,可能引發fd泄露的可能。


image.png

trace1

java.lang.RuntimeException: Could not allocate dup blob fd.
   at android.graphics.Bitmap.nativeCreateFromParcel(Native Method)
   at android.graphics.Bitmap.access$100(Bitmap.java:36)
   at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1528)
   at android.graphics.Bitmap$1.createFromParcel(Bitmap.java:1520)
   at android.widget.RemoteViews$BitmapCache.<init>(RemoteViews.java:954)
   at android.widget.RemoteViews.<init>(RemoteViews.java:1820)
   at android.widget.RemoteViews.<init>(RemoteViews.java:1812)
   at android.widget.RemoteViews.clone(RemoteViews.java:1905)
   at android.app.Notification.cloneInto(Notification.java:1534)
   at android.app.Notification.clone(Notification.java:1508)
   at android.service.notification.StatusBarNotification.clone(StatusBarNotification.java:161)
   at com.android.server.notification.NotificationManagerService$NotificationListeners.notifyPostedLocked(NotificationManagerService.java:3557)
   at com.android.server.notification.NotificationManagerService$8.run(NotificationManagerService.java:2337)
   at android.os.Handler.handleCallback(Handler.java:815)
   at android.os.Handler.dispatchMessage(Handler.java:104)
   at android.os.Looper.loop(Looper.java:207)
   at com.android.server.SystemServer.run(SystemServer.java:410)
   at com.android.server.SystemServer.main(SystemServer.java:255)
   at java.lang.reflect.Method.invoke(Native Method)
   at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:933)
   at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:782)

三、總結

通過上面的五個案例,總結常見的fd泄露的情景,一般出現下面的log,就需要懷疑是否有fd泄露的情況。

"Too many open files"
"Could not allocate JNI Env"
"Could not allocate dup blob fd"
"Could not read input channel file descriptors from parcel"
"pthread_create"
"InputChannel is not initialized"
"Could not open input channel pair"

  • 大批量的打開“anon_inode:[eventpoll]” 和 "pipe" 或者 "anon_inode:[eventfd]", 超過100個eventpoll, 通常情況下是開啟了太多的HandlerThread/Looper/MessageQueue, 線程忘記關閉, 或者looper 沒有釋放. 可以抓取hprof 進行快速分析

  • 對于system server, 如果有大批量的socket 打開, 可能是因為Input Channel 沒有關閉, 此類同樣抓取hprof, 查看system server 中WindowState 的情況.

  • 大量的打開“/dev/ashmem”, 如果是Context provider, 或者其他app, 很可能是打開數據庫沒有關閉, 或者數據庫鏈接頻繁打開忘記關閉. 這個時候查看這個進程的maps, cat proc/pid/maps, 即可看到這個ashmem 的name, 然后進一步可知道在哪里泄露.

3.1、容易復現

1.查看fd信息adb shell ls -a -l /proc/<pid>/fd ,lsof

2.查看進程線程信息:ps -t <pid>,或者抓進程trace, kill -3 <pid>

3.抓取hprof定位資源使用情況

3.2、難復現

1.對于應用自身fd泄漏發生JE時可以在復寫UncatchHandlerException在應用crash的時候通過readlink的方式讀取/proc/self/fd的信息,在后面發生的時候可以以獲取fd信息

2.O之后NE的Tombstone文件中有open files,可以查看打開的fd信息

3.抓取進程的ps信息或者trace信息

4.如果是inputchannel類型的,有可能是窗口類型的,因此可以查看window情況,dumpsys window

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市,隨后出現的幾起案子,更是在濱河造成了極大的恐慌,老刑警劉巖,帶你破解...
    沈念sama閱讀 227,797評論 6 531
  • 序言:濱河連續發生了三起死亡事件,死亡現場離奇詭異,居然都是意外死亡,警方通過查閱死者的電腦和手機,發現死者居然都...
    沈念sama閱讀 98,179評論 3 414
  • 文/潘曉璐 我一進店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人,你說我怎么就攤上這事。” “怎么了?”我有些...
    開封第一講書人閱讀 175,628評論 0 373
  • 文/不壞的土叔 我叫張陵,是天一觀的道長。 經常有香客問我,道長,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 62,642評論 1 309
  • 正文 為了忘掉前任,我火速辦了婚禮,結果婚禮上,老公的妹妹穿的比我還像新娘。我一直安慰自己,他們只是感情好,可當我...
    茶點故事閱讀 71,444評論 6 405
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著,像睡著了一般。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發上,一...
    開封第一講書人閱讀 54,948評論 1 321
  • 那天,我揣著相機與錄音,去河邊找鬼。 笑死,一個胖子當著我的面吹牛,可吹牛的內容都是我干的。 我是一名探鬼主播,決...
    沈念sama閱讀 43,040評論 3 440
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了?” 一聲冷哼從身側響起,我...
    開封第一講書人閱讀 42,185評論 0 287
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后,有當地人在樹林里發現了一具尸體,經...
    沈念sama閱讀 48,717評論 1 333
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內容為張勛視角 年9月15日...
    茶點故事閱讀 40,602評論 3 354
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發現自己被綠了。 大學時的朋友給我發了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 42,794評論 1 369
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖,靈堂內的尸體忽然破棺而出,到底是詐尸還是另有隱情,我是刑警寧澤,帶...
    沈念sama閱讀 38,316評論 5 358
  • 正文 年R本政府宣布,位于F島的核電站,受9級特大地震影響,放射性物質發生泄漏。R本人自食惡果不足惜,卻給世界環境...
    茶點故事閱讀 44,045評論 3 347
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧,春花似錦、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 34,418評論 0 26
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至,卻和暖如春,著一層夾襖步出監牢的瞬間,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 35,671評論 1 281
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人。 一個月前我還...
    沈念sama閱讀 51,414評論 3 390
  • 正文 我出身青樓,卻偏偏與公主長得像,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當晚...
    茶點故事閱讀 47,750評論 2 370

推薦閱讀更多精彩內容

  • 轉載請注明出處(http://www.lxweimin.com/p/5f538820e370),您的打賞是小編繼續...
    福later閱讀 27,605評論 8 72
  • mean to add the formatted="false" attribute?.[ 46% 47325/...
    ProZoom閱讀 2,715評論 0 3
  • 想要學習好中文,永遠不要認為學會課堂 那些就可以了。課堂的積累只是很少一部分,對于我們遠遠不夠。 有...
    古卷青鋒閱讀 4,450評論 0 3
  • 今天肥肥來上課了。但是她一直沒有說話,就窩在座位上。下課后,我和小微白大姐打算拉著肥肥去上廁所,想和肥肥說說話。 ...
    小棉襖呀閱讀 472評論 0 0
  • 記得三年前,我是個過一天算一天的人,還一身債,拿著一份死工資,每月都超支的月光族,總是前月用了后月糧,于是我很渴望...
    品微閱讀 291評論 0 0