【多线程问题】android8.1 cts测试原生bug多线程导致CtsJvmtiRedefineClassesHostTestCases中,testJvmti Fail
1、问题描述
android8.1 cts测试 CtsJvmtiTagging case概率failed,如图
2、分析过程
log如下:
4003 4021 E zygote : No implementation found forvoid art.Main.setTag(java.lang.Object, long) (tried Java_art_Main_setTag andJava_art_Main_setTag__Ljava_lang_Object_2J) 4003 4021 I TestRunner: failed:testTagging(android.jvmti.cts.JvmtiTaggingTest) 4003 4021 I TestRunner: ----- begin exception----- 4003 4021 I TestRunner: java.lang.UnsatisfiedLinkError: No implementation foundfor void art.Main.setTag(java.lang.Object, long) (tried Java_art_Main_setTagand Java_art_Main_setTag__Ljava_lang_Object_2J) 4003 4021 I TestRunner: at art.Main.setTag(Native Method) 4003 4021 I TestRunner: atandroid.jvmti.cts.JvmtiTaggingTest.test(JvmtiTaggingTest.java:36) 4003 4021 I TestRunner: atandroid.jvmti.cts.JvmtiTaggingTest.testTagging(JvmtiTaggingTest.java:65) 4003 4021 I TestRunner: at java.lang.reflect.Method.invoke(Native Method) 4003 4021 I TestRunner: atorg.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) 4003 4021 I TestRunner: atorg.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) 4003 4021 I TestRunner: atorg.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) 4003 4021 I TestRunner: atorg.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) 4003 4021 I TestRunner: at android.support.test.internal.runner.junit4.statement.RunBefores.evaluate(RunBefores.java:80) 4003 4021 I TestRunner: at org.junit.rules.RunRules.evaluate(RunRules.java:20) …… 4003 4021 I TestRunner: ----- end exception ----- 4003 4021 I TestRunner: finished:testTagging(android.jvmti.cts.JvmtiTaggingTest) |
从上面log可知UnsatisfiedLinkError错误是在查找art.Main.setTag JNI函数时没有找到发生crash。
这里连接问题机器,对应进程有mmap libctsjvmtiagent.so库到内存,从对应库中是可以找到setTag
函数符号,同时从代码看没有看出明显异常,这里查找setTag函数是通过dlopen打开
libctsjvmtiagent.so库,通过dlsym查找符号,但是dlopen是成功的,dlsym这里返回的是NULL,
才会出现上面的错误。
搭建cts环境,做单包测试,会概率复现问题。在调用dlsym代码添加log,抓取log如下:
4003 4021 ITestRunner: run started: 3 tests 4003 4021 ITestRunner: started: testTagging(android.jvmti.cts.JvmtiTaggingTest) 4003 4003 Ezygote : [pxy]DoDlOpenhandle = 0xc63afa9d path= /data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so 4003 4003 Ezygote : [pxy]FindSymbol func =0x94f89471 sym_name =Agent_OnLoad handle =0xc63afa9d 4003 4003 Ezygote : [pxy]FindSymbol func =0x94f8967d sym_name =Agent_OnAttach handle =0xc63afa9d 4003 4003 Ezygote : [pxy]FindSymbol func = 0x0 sym_name = Agent_OnUnload handle =0xc63afa9d 4003 4021 Ezygote :[pxy][---1---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote : [pxy][---2--FindCodeForNativeMethodGetState= Runnable"] native_method= 0x0 4003 4021 Ezygote : [pxy]--agent--FindCodeForNativeMethodInAgents agent = NULL!!!!!! 4003 4021 Ezygote :[pxy]--agent--FindCodeForNativeMethodInAgents agent1 = Agent { name="", args="", handle=0xb0723180} agent2 = Agent { name="", args="", handle=0xb0723180 }agent_size = 0 4003 4021 Ezygote :[pxy][---3---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote :[pxy][---4---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote : [pxy]No implementation found for voidart.Main.setTag(java.lang.Object, long) (tried Java_art_Main_setTag andJava_art_Main_setTag__Ljava_lang_Object_2J) 4003 4021 ITestRunner: failed: testTagging(android.jvmti.cts.JvmtiTaggingTest) |
从上面复测结果可以看出,dlopen有正常打开返回libctsjvmtiagent.so地址,但是在调用FindCodeForNativeMethodInAgents 函数时发现没有调用FindSymbol函数查找art.Main.setTag函数,原因是agent list为空。所以找不到对应的lib和handle信息,直接返回空。同时发现这里查找Agent_OnAttach符号和查找art.Main.setTag不是同一个线程执行。这里有多线程的问题么?继续分析,代码如下:
这里是Runtime::Current()->GetAgents()为空即agents_为空,所以这里返回空,导致没有找到symbol,发生crash staticvoid* FindCodeForNativeMethodInAgents(ArtMethod* m)REQUIRES_SHARED(Locks::mutator_lock_) { std::stringjni_short_name(m->JniShortName()); std::stringjni_long_name(m->JniLongName()); for (const ti::Agent&agent : Runtime::Current()->GetAgents()) { void* fn= agent.FindSymbol(jni_short_name); if (fn !=nullptr) { returnfn; } fn =agent.FindSymbol(jni_long_name); if (fn !=nullptr) { returnfn; } } return nullptr; } |
4003 4021 ITestRunner: started: testTaggingGC(android.jvmti.cts.JvmtiTaggingTest) 4003 4021 Ezygote :[pxy][---1---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote :[pxy][---2--FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote : [pxy]--agent--FindCodeForNativeMethodInAgents agent1 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } agent2 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } agent_size = 1 4003 4021 Ezygote : [pxy]--aa--FindCodeForNativeMethodInAgents Foundimplementation for void art.Main.setTag(java.lang.Object, long) (symbol:Java_art_Main_setTag__Ljava_lang_Object_2J) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } 4003 4021 Ezygote : [pxy]FindSymbol func = 0x94f93259 sym_name = Java_art_Main_setTag handle=0xc63afa9d 4003 4021 Ezygote :[pxy][---3---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x94f93259 4003 4021 Ezygote :[pxy][---4---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x94f93259 4003 4021 Ezygote :[pxy][---1---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote :[pxy][---2--FindCodeForNativeMethod GetState= Runnable"] native_method = 0x0 4003 4021 Ezygote :[pxy]--agent--FindCodeForNativeMethodInAgents agent1 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } agent2 = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so", args="",handle=0xc63afa9d } agent_size = 1 4003 4021 Ezygote :[pxy]--aa--FindCodeForNativeMethodInAgents Found implementation for longart.Main.getTag(java.lang.Object) (symbol:Java_art_Main_getTag__Ljava_lang_Object_2) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0xc63afa9d } 4003 4021 Ezygote : [pxy]FindSymbol func =0x94f9328d sym_name =Java_art_Main_getTag handle=0xc63afa9d 4003 4021 Ezygote :[pxy][---3---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x94f9328d 4003 4021 Ezygote :[pxy][---4---FindCodeForNativeMethod GetState= Runnable"] native_method = 0x94f9328d 4003 4021 ITestRunner: finished: testTaggingGC(android.jvmti.cts.JvmtiTaggingTest) |
这里说明获取agents_时执行慢了,与正常log对比,正常log在调用完dlopen操作后很快会执行FindSymbol函数。正常log如下:
4210 4227 I TestRunner: run finished: 18 tests, 0failed, 0 ignored 4402 4418 I TestRunner: run started: 3 tests 4402 4418 I TestRunner: started: testTagging(android.jvmti.cts.JvmtiTaggingTest) 4402 4402 E zygote : [pxy]DoDlOpen handle = 0x2c715c7d path =/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so 4402 4402 E zygote : [pxy]FindSymbol func = 0x8d77a471 sym_name= Agent_OnLoad handle =0x2c715c7d 4402 4402 E zygote : [pxy]FindSymbol func = 0x8d77a67d sym_name= Agent_OnAttach handle =0x2c715c7d 4402 4402 E zygote : [pxy]FindSymbol func = 0x0 sym_name= Agent_OnUnload handle =0x2c715c7d 4402 4402 E zygote : Unable to find 'Agent_OnUnload' symbol in Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0x2c715c7d } 4402 4418 E zygote : [pxy]--aa--FindCodeForNativeMethodInAgentsFound implementation for void art.Main.setTag(java.lang.Object, long) (symbol:Java_art_Main_setTag__Ljava_lang_Object_2J) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0x2c715c7d } 4402 4418 E zygote : [pxy]FindSymbol func =0x8d784259 sym_name = Java_art_Main_setTag handle =0x2c715c7d 4402 4418 E zygote : [pxy]--aa--FindCodeForNativeMethodInAgents Found implementation forlong art.Main.getTag(java.lang.Object) (symbol:Java_art_Main_getTag__Ljava_lang_Object_2) in agent = Agent {name="/data/data/android.jvmti.cts.tagging/libctsjvmtiagent.so",args="", handle=0x2c715c7d } 4402 4418 E zygote : [pxy]FindSymbol func = 0x8d78428d sym_name= Java_art_Main_getTag handle=0x2c715c7d 4402 4418 I TestRunner: finished:testTagging(android.jvmti.cts.JvmtiTaggingTest) 4402 4418 I TestRunner: started:testTaggingGC(android.jvmti.cts.JvmtiTaggingTest) 4402 4418 I TestRunner: finished:testTaggingGC(android.jvmti.cts.JvmtiTaggingTest) |
经过log对比看,问题锁定在获取agents的流程上没有执行完,同时这里从log中可以看出打开lib库是4402主线程android.jvmti.cts.tagging执行的,而之后findSymbol操作时通过4418子线程TestRunner执行。说明这里存在多线程问题。从上面log看4402主线程中执行DoLoadHelper操作没有完成,4402线程执行完dlopen后就切换到4418运行执行FindCodeForNativeMethodInAgents操作获取agents_,而4402线程没有执行完DoLoadHelper函数流程在callback函数执行前时间片到了cpu就切换到了4418上运行,而这时agents_还没有被赋值,使4418线程在读取agents_数据时为空,发生crash。至此问题rootcause找到。
【代码流程分析】
我们这里以单条测试case为例,cts测试异常case如下
cts-tf > run cts -m CtsJvmtiTaggingHostTestCases –t android.jvmti.cts.JvmtiHostTestTagging#testJvmti
由这个测试case可知测试文件包名和文件名 cts/hostsidetests/jvmti/tagging/AndroidTest.xml <option name="jar"value="CtsJvmtiTaggingHostTestCases.jar" /> <option name="set-option"value="test-file-name:CtsJvmtiTaggingDeviceApp.apk" /> <option name="set-option"value="package-name:android.jvmti.cts.tagging" /> 测试app的CtsJvmtiTaggingDeviceApp.apk代码路径 cts/hostsidetests/jvmti/tagging/app/src/android/jvmti/cts/JvmtiTaggingTest.java 编译选项 LOCAL_JNI_SHARED_LIBRARIES:= libctsjvmtiagent LOCAL_PACKAGE_NAME :=CtsJvmtiTaggingDeviceApp 测试app 进程名为包名android.jvmti.cts.tagging cts/hostsidetests/jvmti/tagging/app/AndroidManifest.xml:19: package="android.jvmti.cts.tagging">
下面是执行TestRunner测试具体的case testJvmti mTestPackageName =android.jvmti.cts.tagging mTestApk =CtsJvmtiTaggingDeviceApp.apk 代码路径:cts/hostsidetests/jvmti/base/host/src/android/jvmti/cts/JvmtiHostTest.java 72 public void testJvmti()throws Exception { ...... 92 RemoteAndroidTestRunner runner = newRemoteAndroidTestRunner(mTestPackageName, RUNNER,device.getIDevice()); 95 runner.setMaxTimeToOutputResponse(2,TimeUnit.MINUTES); 97 TestResults tr = new TestResults(new AttachAgent(device, mTestPackageName,mTestApk));//这里会attach对应的libctsjvmtiagent.so 99 device.runInstrumentationTests(runner,tr);//这里会启动一个线程TestRunner,之后会去调用dlsym查找相关符号 ...... 103 } |
我们先来看下主线程是如何调用AttachAgent来进行对agent初始化的。
111 private class AttachAgent implementsRunnable { 112 private ITestDevice mDevice; 113 private String mPkg; 114 private String mApk; 115 116 public AttachAgent(ITestDevice device,String pkg, String apk) { 117 this.mDevice = device; 118 this.mPkg = pkg; 119 this.mApk = apk; 120 } 121 122 @Override 123 public void run(){//主线程的启动 124 try { …… 133 //将libctsjvmtiagent.so放到data/data/android.jvmti.cts.tagging的目录下 134 StringagentInDataData = installLibToDataData(pwd, "libctsjvmtiagent.so"); 135 136 String attachCmd = "cmd activity attach-agent " + mPkg + "" + agentInDataData; 137 String attachReply =mDevice.executeShellCommand(attachCmd);//这里运行上面的命令,通过AMS调用runtime里面的AttachAgent代码执行
代码路径: frameworks/base/services/core/java/com/android/server/am/ActivityManagerShellCommand.java 132 public int onCommand(String cmd) { ...... 242 case "attach-agent": 243 return runAttachAgent(pw); 代码路径 frameworks/base/services/core/java/com/android/server/am/ActivityManagerShellCommand.java 2553 int runAttachAgent(PrintWriter pw) { …… 2557 String process = getNextArgRequired(); 2558 String agent = getNextArgRequired(); ...... 2564 mInternal.attachAgent(process,agent); 2565 return 0; 2566 }
frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java 24890 public void attachAgent(String process, String path) { ...... 24905 proc.thread.attachAgent(path);
frameworks/base/core/java/android/app/ActivityThread.java 1102 public void attachAgent(String agent) { 1103 sendMessage(H.ATTACH_AGENT, agent); 1104 } 通过消息处理会调用下面函数 1896 case ATTACH_AGENT: 1897 handleAttachAgent((String) msg.obj); …… 3225 static final void handleAttachAgent(String agent) { 3226 try { 3227 VMDebug.attachAgent(agent);//这里会去调用runtime中的AttachAgent 3228 } art/runtime/native/dalvik_system_VMDebug.cc NATIVE_METHOD(VMDebug,attachAgent, "(Ljava/lang/String;)V"),
511static void VMDebug_attachAgent(JNIEnv* env,jclass, jstring agent) { ...... 526 ScopedUtfChars chars(env, agent); 530 filename = chars.c_str(); 533 Runtime::Current()->AttachAgent(filename);//这里会调用runtime.cc中的AttachAgent 534} runtime中的AttachAgent会调用agent.Attach来dlopen对应的jni层的动态库,并初始化JNI设置环境变量等参数 void Runtime::AttachAgent(const std::string&agent_arg) { std::stringerror_msg; ......
ti::Agent agent(agent_arg); int res =0; ti::Agent::LoadError result = agent.Attach(&res, &error_msg);//这里会调用attach函数,间接调用dlopen和Agent_OnAttach设置jni参数
if (result== ti::Agent::kNoError) { agents_.push_back(std::move(agent));//这里会给agents_赋值,及设置libso路径,和so通过dlopen获取的handle值
art\runtime\ti\agent.h LoadErrorAttach(/*out*/jint* call_res, /*out*/std::string* error_msg) { VLOG(agents) <<"Attaching agent: " << name_ << " " <<args_; return DoLoadHelper(true, call_res, error_msg); }
Agent::LoadErrorAgent::DoLoadHelper(bool attaching, /*out*/jint* call_res, /*out*/std::string* error_msg) { ...... LoadError err = DoDlOpen(error_msg);//dlopen打开libctsjvmtiagent.so AgentOnLoadFunctioncallback = attaching ? onattach_ : onload_; ...... std::unique_ptr<char[]>copied_args(new char[args_.size() + 1]); strlcpy(copied_args.get(),args_.c_str(), args_.size() + 1); //这里的callback就是调用Agent_OnAttach *call_res=callback(Runtime::Current()->GetJavaVM(),copied_args.get(), nullptr); 而上面log可知就是在这里主线程被调度出去了,导致agents_变量没有初始化完成,attach流程没走完,导致子线程读取agents_变量失败,发生crash。 |
至此问题分析完毕,修改需要进行同步保护,由模块提交google issues,google后续更新cts代码。