【多线程问题】android8.1 cts测试原生bug多线程导致CtsJvmtiRedefineClassesHostTestCases中,testJvmti Fail

1、问题描述

android8.1 cts测试 CtsJvmtiTagging case概率failed,如图

【多线程问题】android8.1 cts测试原生bug多线程导致CtsJvmtiRedefineClassesHostTestCases中,testJvmti Fail

2、分析过程

log如下:

4003  4021 E zygote : No implementation found forvoid art.Main.setTag(java.lang.Object, long) (tried Java_art_Main_setTag andJava_art_Main_setTag__Ljava_lang_Object_2J)

4003  4021 I TestRunner: failed:testTagging(android.jvmti.cts.JvmtiTaggingTest)

4003  4021 I TestRunner: ----- begin exception-----

4003  4021 I TestRunner: java.lang.UnsatisfiedLinkError: No implementation foundfor void art.Main.setTag(java.lang.Object, long) (tried Java_art_Main_setTagand Java_art_Main_setTag__Ljava_lang_Object_2J)

4003  4021 I TestRunner:   at art.Main.setTag(Native Method)

4003  4021 I TestRunner:   atandroid.jvmti.cts.JvmtiTaggingTest.test(JvmtiTaggingTest.java:36)

4003  4021 I TestRunner:   atandroid.jvmti.cts.JvmtiTaggingTest.testTagging(JvmtiTaggingTest.java:65)

4003  4021 I TestRunner:   at java.lang.reflect.Method.invoke(Native Method)

4003  4021 I TestRunner:   atorg.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)

4003  4021 I TestRunner:   atorg.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

4003  4021 I TestRunner:   atorg.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)

4003  4021 I TestRunner:   atorg.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)

4003  4021 I TestRunner:   at android.support.test.internal.runner.junit4.statement.RunBefores.evaluate(RunBefores.java:80)

4003  4021 I TestRunner:   at org.junit.rules.RunRules.evaluate(RunRules.java:20)

……

4003  4021 I TestRunner: ----- end exception -----

4003  4021 I TestRunner: finished:testTagging(android.jvmti.cts.JvmtiTaggingTest)

从上面log可知UnsatisfiedLinkError错误是在查找art.Main.setTag JNI函数时没有找到发生crash。
这里连接问题机器,对应进程有mmap libctsjvmtiagent.so库到内存,从对应库中是可以找到setTag
函数符号,同时从代码看没有看出明显异常,这里查找setTag函数是通过dlopen打开
libctsjvmtiagent.so库,通过dlsym查找符号,但是dlopen是成功的,dlsym这里返回的是NULL,
才会出现上面的错误。
搭建cts环境,做单包测试,会概率复现问题。在调用dlsym代码添加log,抓取log如下:

4003  4021 ITestRunner: run started: 3 tests

4003  4021 ITestRunner: started: testTagging(android.jvmti.cts.JvmtiTaggingTest)

4003  4003 Ezygote  : [pxy]DoDlOpenhandle = 0xc63afa9d path= /data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so

4003  4003 Ezygote  : [pxy]FindSymbol func =0x94f89471      sym_name =Agent_OnLoad       handle =0xc63afa9d

4003  4003 Ezygote  : [pxy]FindSymbol func =0x94f8967d      sym_name =Agent_OnAttach     handle =0xc63afa9d

4003  4003 Ezygote  : [pxy]FindSymbol func = 0x0          sym_name = Agent_OnUnload       handle =0xc63afa9d

4003  4021 Ezygote  :[pxy][---1---FindCodeForNativeMethod GetState= Runnable"]      native_method = 0x0

4003  4021 Ezygote  : [pxy][---2--FindCodeForNativeMethodGetState= Runnable"]       native_method= 0x0

4003  4021 Ezygote  : [pxy]--agent--FindCodeForNativeMethodInAgents  agent = NULL!!!!!!

4003  4021 Ezygote  :[pxy]--agent--FindCodeForNativeMethodInAgents agent1 = Agent { name="", args="", handle=0xb0723180} agent2 = Agent { name="", args="", handle=0xb0723180 }agent_size = 0

4003  4021 Ezygote  :[pxy][---3---FindCodeForNativeMethod GetState= Runnable"]      native_method = 0x0

4003  4021 Ezygote  :[pxy][---4---FindCodeForNativeMethod GetState= Runnable"]      native_method = 0x0

4003  4021 Ezygote  : [pxy]No implementation found for voidart.Main.setTag(java.lang.Object, long) (tried Java_art_Main_setTag andJava_art_Main_setTag__Ljava_lang_Object_2J)

4003  4021 ITestRunner: failed: testTagging(android.jvmti.cts.JvmtiTaggingTest)

从上面复测结果可以看出,dlopen有正常打开返回libctsjvmtiagent.so地址,但是在调用FindCodeForNativeMethodInAgents 函数时发现没有调用FindSymbol函数查找art.Main.setTag函数,原因是agent list为空。所以找不到对应的lib和handle信息,直接返回空。同时发现这里查找Agent_OnAttach符号和查找art.Main.setTag不是同一个线程执行。这里有多线程的问题么?继续分析,代码如下:

这里是Runtime::Current()->GetAgents()为空即agents_为空,所以这里返回空,导致没有找到symbol,发生crash

  staticvoid* FindCodeForNativeMethodInAgents(ArtMethod* m)REQUIRES_SHARED(Locks::mutator_lock_) {

  std::stringjni_short_name(m->JniShortName());

  std::stringjni_long_name(m->JniLongName());

  for (const ti::Agent&agent : Runtime::Current()->GetAgents()) {

    void* fn= agent.FindSymbol(jni_short_name);

    if (fn !=nullptr) {

      returnfn;

    }

    fn =agent.FindSymbol(jni_long_name);

    if (fn !=nullptr) {

      returnfn;

    }

  }

  return nullptr;

}

继续分log发现在上面代码执行完返回err后,对应的art.Main.setTag函数可以找到,log如下:

4003  4021 ITestRunner: started: testTaggingGC(android.jvmti.cts.JvmtiTaggingTest)

4003  4021 Ezygote  :[pxy][---1---FindCodeForNativeMethod GetState= Runnable"]         native_method = 0x0

4003  4021 Ezygote  :[pxy][---2--FindCodeForNativeMethod GetState= Runnable"]           native_method = 0x0

4003  4021 Ezygote  : [pxy]--agent--FindCodeForNativeMethodInAgents  agent1 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } agent2 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } agent_size = 1

4003  4021 Ezygote  : [pxy]--aa--FindCodeForNativeMethodInAgents Foundimplementation for void art.Main.setTag(java.lang.Object, long) (symbol:Java_art_Main_setTag__Ljava_lang_Object_2J) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d }

4003  4021 Ezygote  : [pxy]FindSymbol func = 0x94f93259 sym_name = Java_art_Main_setTag       handle=0xc63afa9d

4003  4021 Ezygote  :[pxy][---3---FindCodeForNativeMethod GetState= Runnable"]         native_method = 0x94f93259

4003  4021 Ezygote  :[pxy][---4---FindCodeForNativeMethod GetState= Runnable"]         native_method = 0x94f93259

4003  4021 Ezygote  :[pxy][---1---FindCodeForNativeMethod GetState= Runnable"]         native_method = 0x0

4003  4021 Ezygote  :[pxy][---2--FindCodeForNativeMethod GetState= Runnable"]           native_method = 0x0

4003  4021 Ezygote  :[pxy]--agent--FindCodeForNativeMethodInAgents agent1 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } agent2 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so", args="",handle=0xc63afa9d } agent_size = 1

4003  4021 Ezygote  :[pxy]--aa--FindCodeForNativeMethodInAgents Found implementation for longart.Main.getTag(java.lang.Object) (symbol:Java_art_Main_getTag__Ljava_lang_Object_2) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d }

4003  4021 Ezygote  : [pxy]FindSymbol func =0x94f9328d sym_name =Java_art_Main_getTag       handle=0xc63afa9d

4003  4021 Ezygote  :[pxy][---3---FindCodeForNativeMethod GetState= Runnable"]         native_method = 0x94f9328d

4003  4021 Ezygote  :[pxy][---4---FindCodeForNativeMethod GetState= Runnable"]         native_method = 0x94f9328d

4003  4021 ITestRunner: finished: testTaggingGC(android.jvmti.cts.JvmtiTaggingTest)

这里说明获取agents_时执行慢了,与正常log对比,正常log在调用完dlopen操作后很快会执行FindSymbol函数。正常log下:

4210  4227 I TestRunner: run finished: 18 tests, 0failed, 0 ignored

4402  4418 I TestRunner: run started: 3 tests

4402  4418 I TestRunner: started: testTagging(android.jvmti.cts.JvmtiTaggingTest)

4402  4402 E zygote : [pxy]DoDlOpen handle = 0x2c715c7d path =/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so

4402  4402 E zygote : [pxy]FindSymbol func = 0x8d77a471          sym_name= Agent_OnLoad       handle =0x2c715c7d

4402  4402 E zygote : [pxy]FindSymbol func = 0x8d77a67d          sym_name= Agent_OnAttach     handle =0x2c715c7d

4402  4402 E zygote : [pxy]FindSymbol func = 0x0     sym_name= Agent_OnUnload     handle =0x2c715c7d

4402  4402 E zygote : Unable to find 'Agent_OnUnload' symbol in Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0x2c715c7d }

4402  4418 E zygote : [pxy]--aa--FindCodeForNativeMethodInAgentsFound implementation for void art.Main.setTag(java.lang.Object, long) (symbol:Java_art_Main_setTag__Ljava_lang_Object_2J) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0x2c715c7d }

4402  4418 E zygote : [pxy]FindSymbol func =0x8d784259          sym_name = Java_art_Main_setTag      handle =0x2c715c7d

4402  4418 E zygote : [pxy]--aa--FindCodeForNativeMethodInAgents Found implementation forlong art.Main.getTag(java.lang.Object) (symbol:Java_art_Main_getTag__Ljava_lang_Object_2) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0x2c715c7d }

4402  4418 E zygote : [pxy]FindSymbol func = 0x8d78428d          sym_name= Java_art_Main_getTag       handle=0x2c715c7d

4402  4418 I TestRunner: finished:testTagging(android.jvmti.cts.JvmtiTaggingTest)

4402  4418 I TestRunner: started:testTaggingGC(android.jvmti.cts.JvmtiTaggingTest)

4402  4418 I TestRunner: finished:testTaggingGC(android.jvmti.cts.JvmtiTaggingTest)

经过log对比看,问题锁定在获取agents的流程上没有执行完,同时这里从log中可以看出打开lib库是4402主线程android.jvmti.cts.tagging执行的,而之后findSymbol操作时通过4418子线程TestRunner执行。说明这里存在多线程问题。从上面log看4402主线程中执行DoLoadHelper操作没有完成,4402线程执行完dlopen后就切换到4418运行执行FindCodeForNativeMethodInAgents操作获取agents_,而4402线程没有执行完DoLoadHelper函数流程在callback函数执行前时间片到了cpu就切换到了4418上运行,而这时agents_还没有被赋值,使4418线程在读取agents_数据时为空,发生crash。至此问题rootcause找到。

【代码流程分析】

我们这里以单条测试case为例,cts测试异常case如下

cts-tf > run cts -m CtsJvmtiTaggingHostTestCases –t android.jvmti.cts.JvmtiHostTestTagging#testJvmti

由这个测试case可知测试文件包名和文件名

cts/hostsidetests/jvmti/tagging/AndroidTest.xml

       <option name="jar"value="CtsJvmtiTaggingHostTestCases.jar" />

       <option name="set-option"value="test-file-name:CtsJvmtiTaggingDeviceApp.apk" />

       <option name="set-option"value="package-name:android.jvmti.cts.tagging" />

测试appCtsJvmtiTaggingDeviceApp.apk代码路径                   

cts/hostsidetests/jvmti/tagging/app/src/android/jvmti/cts/JvmtiTaggingTest.java

编译选项

LOCAL_JNI_SHARED_LIBRARIES:= libctsjvmtiagent 

LOCAL_PACKAGE_NAME :=CtsJvmtiTaggingDeviceApp

测试app 进程名为包名android.jvmti.cts.tagging

cts/hostsidetests/jvmti/tagging/app/AndroidManifest.xml:19:   package="android.jvmti.cts.tagging">

 

下面是执行TestRunner测试具体的case testJvmti

mTestPackageName =android.jvmti.cts.tagging

mTestApk =CtsJvmtiTaggingDeviceApp.apk

代码路径:cts/hostsidetests/jvmti/base/host/src/android/jvmti/cts/JvmtiHostTest.java

72    public void testJvmti()throws Exception {

......

92        RemoteAndroidTestRunner runner = newRemoteAndroidTestRunner(mTestPackageName, RUNNER,device.getIDevice());

95       runner.setMaxTimeToOutputResponse(2,TimeUnit.MINUTES);

97       TestResults tr = new TestResults(new AttachAgent(device, mTestPackageName,mTestApk));//这里会attach对应的libctsjvmtiagent.so

99        device.runInstrumentationTests(runner,tr);//这里会启动一个线程TestRunner,之后会去调用dlsym查找相关符号

......

103    }

我们先来看下主线程是如何调用AttachAgent来进行对agent初始化的。

111    private class AttachAgent implementsRunnable {

112        private ITestDevice mDevice;

113        private String mPkg;

114        private String mApk;

115

116        public AttachAgent(ITestDevice device,String pkg, String apk) {

117            this.mDevice = device;

118            this.mPkg = pkg;

119            this.mApk = apk;

120        }

121

122        @Override

123        public void run(){//主线程的启动

124            try {

……

133                     //libctsjvmtiagent.so放到data/data/android.jvmti.cts.tagging的目录下

134        StringagentInDataData = installLibToDataData(pwd, "libctsjvmtiagent.so");

135

136       String attachCmd = "cmd activity attach-agent " + mPkg + "" + agentInDataData;

137       String attachReply =mDevice.executeShellCommand(attachCmd);//这里运行上面的命令,通过AMS调用runtime里面的AttachAgent代码执行

 

代码路径:

frameworks/base/services/core/java/com/android/server/am/ActivityManagerShellCommand.java

132    public int onCommand(String cmd) {

......

242                case "attach-agent":

243                    return runAttachAgent(pw);

代码路径

frameworks/base/services/core/java/com/android/server/am/ActivityManagerShellCommand.java

2553    int runAttachAgent(PrintWriter pw) {

……

2557        String process = getNextArgRequired();

2558        String agent = getNextArgRequired();

......

2564        mInternal.attachAgent(process,agent);

2565        return 0;

2566    }


frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

24890   public void attachAgent(String process, String path) {

......

24905               proc.thread.attachAgent(path);

 

frameworks/base/core/java/android/app/ActivityThread.java

1102       public void attachAgent(String agent) {

1103           sendMessage(H.ATTACH_AGENT, agent);

1104        }

通过消息处理会调用下面函数

1896               case ATTACH_AGENT:

1897                    handleAttachAgent((String) msg.obj);

……

3225   static final void handleAttachAgent(String agent) {

3226       try {

3227           VMDebug.attachAgent(agent);//这里会去调用runtime中的AttachAgent

3228        }

art/runtime/native/dalvik_system_VMDebug.cc

          NATIVE_METHOD(VMDebug,attachAgent, "(Ljava/lang/String;)V"),

         

511static void VMDebug_attachAgent(JNIEnv* env,jclass, jstring agent) {

......

526   ScopedUtfChars chars(env, agent);

530   filename = chars.c_str();

533 Runtime::Current()->AttachAgent(filename);//这里会调用runtime.cc中的AttachAgent

534}

runtime中的AttachAgent会调用agent.Attachdlopen对应的jni层的动态库,并初始化JNI设置环境变量等参数

void Runtime::AttachAgent(const std::string&agent_arg) {

  std::stringerror_msg;

......

 

  ti::Agent agent(agent_arg);

  int res =0;

 ti::Agent::LoadError result = agent.Attach(&res, &error_msg);//这里会调用attach函数,间接调用dlopenAgent_OnAttach设置jni参数

 

  if (result== ti::Agent::kNoError) {

    agents_.push_back(std::move(agent));//这里会给agents_赋值,及设置libso路径,和so通过dlopen获取的handle


art\runtime\ti\agent.h

  LoadErrorAttach(/*out*/jint* call_res, /*out*/std::string* error_msg) {

    VLOG(agents) <<"Attaching agent: " << name_ << " " <<args_;

    return DoLoadHelper(true, call_res, error_msg);

  }

 

  Agent::LoadErrorAgent::DoLoadHelper(bool attaching,

                                    /*out*/jint* call_res,

                                    /*out*/std::string* error_msg) {

       ...... 

       LoadError err = DoDlOpen(error_msg);//dlopen打开libctsjvmtiagent.so

       AgentOnLoadFunctioncallback = attaching ? onattach_ : onload_;

       ......

       std::unique_ptr<char[]>copied_args(new char[args_.size() + 1]);

    strlcpy(copied_args.get(),args_.c_str(), args_.size() + 1);

    //这里的callback就是调用Agent_OnAttach

    *call_res=callback(Runtime::Current()->GetJavaVM(),copied_args.get(),

                      nullptr);

而上面log可知就是在这里主线程被调度出去了,导致agents_变量没有初始化完成,attach流程没走完,导致子线程读取agents_变量失败,发生crash。

至此问题分析完毕,修改需要进行同步保护,由模块提交google issues,google后续更新cts代码。