如何处理:java . util . concurrent。TimeoutException: android.os.BinderProxy.finalize()在10秒错误后超时?

We're seeing a number of TimeoutExceptions in GcWatcher.finalize, BinderProxy.finalize, and PlainSocketImpl.finalize. 90+% of them happen on Android 4.3. We're getting reports of this from Crittercism from users out in the field.

我们在GcWatcher中看到了一些超时异常。最终,BinderProxy。完成,PlainSocketImpl.finalize。90%以上发生在Android 4.3上。我们从该领域的用户那里得到了批评的报告。

如何处理:java . util . concurrent。TimeoutException: android.os.BinderProxy.finalize()在10秒错误后超时?

The error is a variation of: "com.android.internal.BinderInternal$GcWatcher.finalize() timed out after 10 seconds"

错误是:com. android.intern. binderinternal . binderinternal $GcWatcher.finalize()在10秒后超时。

java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds
at android.os.BinderProxy.destroy(Native Method)
at android.os.BinderProxy.finalize(Binder.java:459)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:187)
at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:170)
at java.lang.Thread.run(Thread.java:841)

So far we haven't had any luck reproducing the problem in house or figuring out what might have caused it.

到目前为止,我们还没有在家里重现这个问题,也没有弄清楚是什么原因导致了这个问题。

Any ideas what can cause this? Any idea how to debug this and find out which part of the app causes this? Anything that sheds light on the issue helps.

有什么想法吗?你知道如何调试和找出应用程序的哪个部分导致了这个问题吗?任何能阐明这个问题的东西都有帮助。

More Stacktraces:

加:

1   android.os.BinderProxy.destroy  
2   android.os.BinderProxy.finalize Binder.java, line 482
3   java.lang.Daemons$FinalizerDaemon.doFinalize    Daemons.java, line 187
4   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
5   java.lang.Thread.run    Thread.java, line 841  

2

2

1   java.lang.Object.wait   
2   java.lang.Object.wait   Object.java, line 401
3   java.lang.ref.ReferenceQueue.remove ReferenceQueue.java, line 102
4   java.lang.ref.ReferenceQueue.remove ReferenceQueue.java, line 73
5   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
6   java.lang.Thread.run

3

3

1   java.util.HashMap.newKeyIterator    HashMap.java, line 907
2   java.util.HashMap$KeySet.iterator   HashMap.java, line 913
3   java.util.HashSet.iterator  HashSet.java, line 161
4   java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers    ThreadPoolExecutor.java, line 755
5   java.util.concurrent.ThreadPoolExecutor.interruptIdleWorkers    ThreadPoolExecutor.java, line 778
6   java.util.concurrent.ThreadPoolExecutor.shutdown    ThreadPoolExecutor.java, line 1357
7   java.util.concurrent.ThreadPoolExecutor.finalize    ThreadPoolExecutor.java, line 1443
8   java.lang.Daemons$FinalizerDaemon.doFinalize    Daemons.java, line 187
9   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
10  java.lang.Thread.run

4

4

1   com.android.internal.os.BinderInternal$GcWatcher.finalize   BinderInternal.java, line 47
2   java.lang.Daemons$FinalizerDaemon.doFinalize    Daemons.java, line 187
3   java.lang.Daemons$FinalizerDaemon.run   Daemons.java, line 170
4   java.lang.Thread.run

9 个解决方案

#1


160  

Full disclosure - I'm the author of the previously mentioned talk in TLV DroidCon.

我是TLV DroidCon中先前提到的谈话的作者。

I had a chance to examine this issue across many Android applications, and discuss it with other developers who encountered it - and we all got to the same point: this issue cannot be avoided, only minimized.

我有机会在许多Android应用程序中检查这个问题,并与遇到这个问题的其他开发人员进行讨论——我们都得出了相同的结论:这个问题无法避免,只能最小化。

I took a closer look at the default implementation of the Android Garbage collector code, to understand better why this exception is Thrown and on what could be the possible causes. I even found a possible root cause during experimentation.

我仔细查看了Android垃圾收集器代码的默认实现,以便更好地理解为什么会抛出这个异常,以及可能的原因。我甚至在实验中发现了一个可能的根本原因。

The root of the problem is at the point a device "Goes to Sleep" for a while - this means that the OS has decided to lower the battery consumption by stopping most User Land processes for a while, and turning Screen off, reducing CPU cycles, etc. The way this is done - is on a Linux system level where the processes are Paused mid run. This can happen at any time during normal Application execution, but it will stop at at a Native system call, as the context switching is done on the kernel level. So - this is where the Dalvik GC joins the story. The Dalvik GC code (as implemented in the Dalvik project in the AOSP site) is not a complicated piece of code. The basic way it work is covered in my DroidCon slides. what I did not cover is the basic GC loop - at the point where the collector has a list of Objects to finalize (and destroy). the loop logic at the base can be simplified like this:

问题的根源是在设备“睡觉”一段时间——这意味着操作系统决定降低电池消耗通过阻止大多数用户土地进程,关闭屏幕,减少CPU周期,等等。这样做的方式是在Linux系统级别的过程中期暂停运行。这可以在正常应用程序执行期间的任何时候发生,但是在本地系统调用时就会停止,因为上下文切换是在内核级完成的。这就是Dalvik GC加入故事的地方。Dalvik GC代码(在AOSP站点的Dalvik项目中实现)不是一段复杂的代码。我的DroidCon幻灯片介绍了它的基本工作方式。我没有介绍的是基本的GC循环——此时收集器有一个要完成(并销毁)的对象列表。底部的循环逻辑可以这样简化:

  1. take starting_timestamp,
  2. starting_timestamp,
  3. remove object for list of objects to release,
  4. 删除要释放的对象列表的对象,
  5. release object - finalize() and call native destroy() if required,
  6. 发布对象——finalize(),并在需要时调用native destroy(),
  7. take end_timestamp,
  8. end_timestamp,
  9. calculate (end_timestamp - starting_timestamp) and compare against a hard coded timeout value of 10 seconds,
  10. 计算(end_timestamp - starting_timestamp)并与硬编码的10秒超时值进行比较,
  11. if timeout has reached - throw the concurrent.TimeoutException and kill the process.
  12. 如果超时已达到-抛出并发。TimeoutException异常并终止进程。

Now consider the following scenario:

Application runs along doing its thing. this is not a User facing application, it runs in the background. During this background operation, Objects are created, used and need to be collected to release memory. Application does not bother with a Wakelock - as this will affect the battery adversely, and seems unnecessary. this means the Application will invoke the GC from time to time. Normally the GC runs is completed without a hitch. Sometimes (very rarely) the System will decide to Sleep in the middle of the GC run. This will happen if you run your application long enough, and monitor the Dalvik memory logs closely. Now - consider the timestamp logic of the basic GC loop - it is possible for the device to start the run, take a start_stamp, and go to sleep at the destroy() native call on a system object. when it wakes up and resumes the run, the destroy() will finish, and the next end_stamp will be the time it took the destroy() call+the sleep time. If the sleep time was long - over 10 seconds, the concurrent.timeout exception will be thrown.

应用程序执行它的操作。这不是一个面向用户的应用程序,它在后台运行。在此后台操作期间,将创建、使用并需要收集对象来释放内存。应用程序不需要使用尾流锁——因为这会对电池产生负面影响,而且似乎没有必要。这意味着应用程序将不时地调用GC。正常情况下,GC运行是顺利完成的。有时(很少)系统会决定在GC运行的中间休眠。如果您运行应用程序的时间足够长,并且密切监视Dalvik内存日志,就会发生这种情况。现在——考虑基本GC循环的时间戳逻辑——设备可以启动运行、获取start_stamp并在系统对象上的destroy()本机调用中休眠。当它醒来并继续运行时,destroy()将完成,下一个end_stamp将是它调用destroy() +休眠时间的时间。如果睡眠时间很长——超过10秒,则同时进行。将抛出超时异常。

I have seen this in the graphs generated from the analysis python script - for Android System Applications, not just my own monitored apps. collect enough logs, you will eventually see it.

我从分析python脚本生成的图表中看到了这一点——对于Android系统应用程序,而不仅仅是我自己的监控应用程序。收集足够的日志,你最终会看到它。

Bottom line:

The issue cannot be avoided - you will encounter it if your app runs in the background. You can mitigate by taking a wakelock, and prevent the device from sleeping, but that is a different story altogether, and a new headache, and maybe another talk in another con.

这个问题无法避免——如果你的应用程序在后台运行,你就会遇到它。你可以通过打个响铃来减轻疼痛,并阻止设备睡觉,但这完全是另外一回事了,还有一个新的头痛,也许还有另一个骗局。

You can minimize the problem by reducing GC calls - making the scenario less likely. tips are in the slides.

您可以通过减少GC调用来最小化问题,从而降低场景发生的可能性。提示在幻灯片里。

I have not yet had the chance to go over the Dalvik 2 (a.k.a ART) GC code - which boasts a new Generational Compacting feature, or performed any experiments on an Android Lollipop.

我还没有机会去看Dalvik 2 (a.k。GC代码——它拥有一个新的代际压缩特性,或者在Android棒棒糖上进行任何实验。

Added 7/5/2015:

添加7/5/2015:

After reviewing the Crash reports aggregation for this crash type, it looks like these crashes from version 5.0+ of Android OS (Lollipop with ART) only account for .5% of this crash type. This means that the ART GC changes has reduced the frequency of these crashes.

在查看了这个崩溃类型的崩溃报告聚合之后,看起来这些来自于Android OS (Lollipop with ART) 5.0+版本的崩溃只占这个崩溃类型的0.5%。这意味着ART GC更改减少了这些崩溃的频率。

Added 6/1/2016:

添加6/1/2016:

Looks like the Android project has added a lot of info on how the GC works in Dalvik 2.0 (a.k.a ART). You can read about it here - Debugging ART Garbage Collection . It also discusses some tools to get information on the GC behavior for your app. Sending a SIGQUIT to your app process will essentially cause an ANR, and dump the application state to a log file for analysis.

看起来Android项目已经添加了很多关于Dalvik 2.0的GC工作的信息(a.k。一个艺术)。你可以在这里阅读-调试艺术垃圾收集。它还讨论了一些工具来获取关于应用程序GC行为的信息。向应用程序进程发送SIGQUIT将导致ANR,并将应用程序状态转储到日志文件中进行分析。

#2


59  

We see this constantly, all over our app, using Crashlytics. The crash usually happens way down in platform code. A small sampling:

在我们的应用程序中,我们经常看到这种情况,使用的是急刹车。崩溃通常发生在平台代码中。一个小抽样:

android.database.CursorWindow.finalize() timed out after 10 seconds

finalize()在10秒后超时

java.util.regex.Matcher.finalize() timed out after 10 seconds

matcher. finalize()在10秒后超时。

android.graphics.Bitmap$BitmapFinalizer.finalize() timed out after 10 seconds

android.graphics.Bitmap$BitmapFinalizer.finalize()在10秒后超时。

org.apache.http.impl.conn.SingleClientConnManager.finalize() timed out after 10 seconds

org.apache.http. impl.int/singleclientconnmanager .finalize()超时10秒。

java.util.concurrent.ThreadPoolExecutor.finalize() timed out after 10 seconds

concurrent.threadpoolexecutor. finalize()超时10秒。

android.os.BinderProxy.finalize() timed out after 10 seconds

finalize()在10秒后超时

android.graphics.Path.finalize() timed out after 10 seconds

finalize()在10秒后超时

The devices on which this happens are overwhelmingly (but not exclusively) devices manufactured by Samsung. That could just mean that most of our users are using Samsung devices; alternately it could indicate a problem with Samsung devices. I'm not really sure.

发生这种情况的设备绝大多数(但并非全部)都是三星制造的设备。这可能意味着我们的大多数用户都在使用三星的设备;另外,它也可能预示着三星设备的问题。我不是很确定。

I suppose this doesn't really answer your questions, but I just wanted to reinforce that this seems quite common, and is not specific to your application.

我想这并不能真正回答您的问题,但我只想强调,这似乎很常见,并不是您的应用程序特有的。

#3


14  

I found some slides about this issue.

我找到了一些关于这个问题的幻灯片。

http://de.slideshare.net/DroidConTLV/android-crash-analysis-and-the-dalvik-garbage-collector-tools-and-tips

http://de.slideshare.net/DroidConTLV/android-crash-analysis-and-the-dalvik-garbage-collector-tools-and-tips

In this slides the author tells that it seems to be a problem with GC, if there are a lot of objects or huge objects in heap. The slide also include a reference to a sample app and a python script to analyze this issue.

在这张幻灯片中,作者告诉我们,如果堆中有很多对象或大型对象,那么GC就会出现问题。这张幻灯片还包括了一个示例应用程序和一个python脚本,以分析这个问题。

https://github.com/oba2cat3/GCTest

https://github.com/oba2cat3/GCTest

https://github.com/oba2cat3/logcat2memorygraph

https://github.com/oba2cat3/logcat2memorygraph

Furthermore I found a hint in comment #3 on this side: https://code.google.com/p/android/issues/detail?id=53418#c3

此外,我在注释#3中发现了一个提示:https://code.google.com/p/android/issues/detail?id=53418#c3

#4


4  

Broadcast Receivers timeout after 10 seconds. Possibly your doing an asynchronous call (wrong) from a broadcast receiver and 4.3 actually detects it.

广播接收器超时10秒。可能你做了一个异步调用(错误)从广播接收器和4.3实际上检测它。

#5


1  

One thing which is invariably true is that at this time, the device would be suffocating for some memory (which is usually the reason for GC to most likely get triggered).

有一件事总是正确的,那就是在这个时候,设备会因为某些内存而窒息(这通常是GC最有可能被触发的原因)。

As mentioned by almost all authors earlier, this issue surfaces when Android tries to run GC while the app is in background. In most of the cases where we observed it, user paused the app by locking their screen. This might also indicate memory leak somewhere in the application, or the device being too loaded already. So the only legitimate way to minimize it is:

正如前面几乎所有作者提到的,当Android试图在后台运行GC时,这个问题就会出现。在我们观察到的大多数情况下,用户通过锁定屏幕来暂停应用程序。这也可能表明应用程序中的某个地方存在内存泄漏,或者设备已经加载过度。所以唯一合法的方法是

  • to ensure there are no memory leaks, and
  • 确保没有内存泄漏,并且
  • to reduce the memory footprint of the app in general.
  • 减少应用程序的内存占用。

#6


1  

try {
    Class<?> c = Class.forName("java.lang.Daemons");
    Field maxField = c.getDeclaredField("MAX_FINALIZE_NANOS");
    maxField.setAccessible(true);
    maxField.set(null, Long.MAX_VALUE);
} catch (ClassNotFoundException e) {
    e.printStackTrace();
} catch (NoSuchFieldException e) {
    e.printStackTrace();
} catch (IllegalAccessException e) {
    e.printStackTrace();
}

#7


1  

We solved the problem by stopping the FinalizerWatchdogDaemon.

我们通过停止finalizerwatchdog守护进程解决了这个问题。

public static void fix() {
    try {
        Class clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");

        Method method = clazz.getSuperclass().getDeclaredMethod("stop");
        method.setAccessible(true);

        Field field = clazz.getDeclaredField("INSTANCE");
        field.setAccessible(true);

        method.invoke(field.get(null));

    }
    catch (Throwable e) {
        e.printStackTrace();
    }
}

You can call the method in Application's lifecycle, like attachBaseContext(). For the same reason, you also can specific the phone's manufacture to fix the problem, it's up to you.

您可以在应用程序的生命周期中调用该方法,如attachBaseContext()。出于同样的原因,你也可以指定手机的制造来解决这个问题,这取决于你。

#8


0  

The finalizeQueue may be too long

finalizeQueue可能太长了。

i think that java may require GC.SuppressFinalize() & GC.ReRegisterForFinalize() to let user reduce the finalizedQueue length explicitly

我认为java可能需要GC.SuppressFinalize()和GC.ReRegisterForFinalize()来让用户显式地减少finalizedQueue长度

if the JVM' source code is available, may implement these method ourself, such as android ROM maker

如果JVM的源代码可用,可以自己实现这些方法,比如android ROM maker

#9


0  

It seems like a Android Runtime bug. There seems to be finalizer that runs in its separate thread and calls finalize() method on objects if they are not in the current frame of the stacktrace. For example following code(created to verify this issue) ended with the crash.

这似乎是一个Android运行时的错误。似乎有一个终结器运行在它的独立线程中,如果对象不在stacktrace的当前框架中,那么它会对对象调用finalize()方法。例如,下面的代码(创建来验证这个问题)以崩溃结束。

Let's have some cursor that do something in finalize method(e.g. SqlCipher ones, do close() which locks to the database that is currently in use)

让我们使用一些游标来完成finalize方法(例如。SqlCipher ones, do close()锁定当前正在使用的数据库

private static class MyCur extends MatrixCursor {


    public MyCur(String[] columnNames) {
        super(columnNames);
    }

    @Override
    protected void finalize() {
        super.finalize();

        try {
            for (int i = 0; i < 1000; i++)
                Thread.sleep(30);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

And we do some long running stuff having opened cursor:

我们做了一些长时间的事情,打开光标:

for (int i = 0; i < 7; i++) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                MyCur cur = null;
                try {
                    cur = new MyCur(new String[]{});
                    longRun();
                } finally {
                    cur.close();
                }
            }

            private void longRun() {
                try {
                    for (int i = 0; i < 1000; i++)
                        Thread.sleep(30);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
    }

This causes following error:

这将导致以下错误:

FATAL EXCEPTION: FinalizerWatchdogDaemon
                                                                        Process: la.la.land, PID: 29206
                                                                        java.util.concurrent.TimeoutException: MyCur.finalize() timed out after 10 seconds
                                                                            at java.lang.Thread.sleep(Native Method)
                                                                            at java.lang.Thread.sleep(Thread.java:371)
                                                                            at java.lang.Thread.sleep(Thread.java:313)
                                                                            at MyCur.finalize(MessageList.java:1791)
                                                                            at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:222)
                                                                            at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:209)
                                                                            at java.lang.Thread.run(Thread.java:762)

The production variant with SqlCipher is very similiar:

SqlCipher的产品变体非常相似:

 

12-21 15:40:31.668: E/EH(32131): android.content.ContentResolver$CursorWrapperInner.finalize() timed out after 10 seconds
12-21 15:40:31.668: E/EH(32131): java.util.concurrent.TimeoutException: android.content.ContentResolver$CursorWrapperInner.finalize() timed out after 10 seconds
12-21 15:40:31.668: E/EH(32131): 	at java.lang.Object.wait(Native Method)
12-21 15:40:31.668: E/EH(32131): 	at java.lang.Thread.parkFor$(Thread.java:2128)
12-21 15:40:31.668: E/EH(32131): 	at sun.misc.Unsafe.park(Unsafe.java:325)
12-21 15:40:31.668: E/EH(32131): 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:161)
12-21 15:40:31.668: E/EH(32131): 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:840)
12-21 15:40:31.668: E/EH(32131): 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:873)
12-21 15:40:31.668: E/EH(32131): 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
12-21 15:40:31.668: E/EH(32131): 	at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:200)
12-21 15:40:31.668: E/EH(32131): 	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
12-21 15:40:31.668: E/EH(32131): 	at net.sqlcipher.database.SQLiteDatabase.lock(SourceFile:518)
12-21 15:40:31.668: E/EH(32131): 	at net.sqlcipher.database.SQLiteProgram.close(SourceFile:294)
12-21 15:40:31.668: E/EH(32131): 	at net.sqlcipher.database.SQLiteQuery.close(SourceFile:136)
12-21 15:40:31.668: E/EH(32131): 	at net.sqlcipher.database.SQLiteCursor.close(SourceFile:510)
12-21 15:40:31.668: E/EH(32131): 	at android.database.CursorWrapper.close(CursorWrapper.java:50)
12-21 15:40:31.668: E/EH(32131): 	at android.database.CursorWrapper.close(CursorWrapper.java:50)
12-21 15:40:31.668: E/EH(32131): 	at android.content.ContentResolver$CursorWrapperInner.close(ContentResolver.java:2746)
12-21 15:40:31.668: E/EH(32131): 	at android.content.ContentResolver$CursorWrapperInner.finalize(ContentResolver.java:2757)
12-21 15:40:31.668: E/EH(32131): 	at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:222)
12-21 15:40:31.668: E/EH(32131): 	at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:209)
12-21 15:40:31.668: E/EH(32131): 	at java.lang.Thread.run(Thread.java:762)

 

Resume: Close cursors ASAP. At least on Samsung S8 with Android 7 where the issue have been seen.

简历:尽快关闭游标。至少在三星S8和安卓7上,这一问题已经出现。