作为控制字符上的WSDL扼流圈消耗的Coldfusion Web服务

问题描述:

客户端以.doc格式将文件上载到服务器目录,并且使用POI按照Ray Camdens发布的内容抽取文本here内容保存在文本/备注字段中在MySQL数据库中,并作为wsdl使用的Web服务提供。所有这些都按预期工作,直到Web服务的使用者访问包含某些(我假定)控制字符的记录,此时Web服务会抛出500错误。作为控制字符上的WSDL扼流圈消耗的Coldfusion Web服务

在数据库中,问题行似乎有控制字符,并且当Firefox中显示文本字段时也有奇怪的字符。 enter image description here 的web服务,只是返回的返回类型的CF查询=“任何”,并为

​​

我相信在WSDL无法发送这些字符调用,所以是有办法对其进行编码,还是我只是必须使用正则表达式或其他东西去除它们?

<cfcomponent> 
    <cffunction output="false" access="remote" returntype="any" name="getPendingReferrals"> 
     <cfargument required="false" name="userName" type="string"/> 
     <cfargument required="false" name="password" type="string"/> 
     <cfargument required="false" name="maxrows" type="numeric" default="20"/>  
     <cfset var q=""> 

<cfinvoke component="cfcs.security" method="checkAuthenticated" returnvariable="checkAuth"> 
    <cfinvokeargument name="username" value="#arguments.userName#"> 
    <cfinvokeargument name="password" value="#arguments.password#"> 
</cfinvoke> 


<cfif checkAuth.authenticates is "true"> 
    <!--- log the login ---> 
    <cfset filename=#datepart("yyyy", now())#&#datepart("m", now())#&#datepart("d", now())#&"loginlog.txt"> 
    <CFSET OUTFILE = "#application.Root#"&"logs\"&"#filename#"> 
    <cfif #FileExists(OUTFILE)# is "Yes"> 
     <cffile action="append" file="#OUTFILE#" output="#checkAuth.userName#, #now()#, #remote_addr#, #Left(http_user_agent, 50)#"> 
    <cfelse> 
     <CFFILE action="write" output="#checkAuth.userName#, #now()#, #remote_addr#, #Left(http_user_agent, 50)#" file="#OUTFILE#"> 
    </cfif> 


     <cfif checkAuth.organisationID is 1> 
      <cfset toStr="toID=1"> 
     <cfelseif checkAuth.organisationID is 28> 
      <cfset toStr="(toID=28 OR toID=29)"> 
     </cfif> 

     <cfquery name="q" datasource='mySqlData' maxrows=#arguments.maxrows#> 
      SELECT messages.messageID, messages.toID, messages.fromID AS referrerID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.fromID) as referrerName,messages.threadID, messages.messageBody, messages.dateCreated, messages.dateSent, 
      messages.deleted, messages.createdByID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.createdByID) as createdByName, (SELECT organisationName FROM organisations WHERE messages.originatingOrganisationID = organisations.organisationID) as originatingOrganisationName, messages.originatingOrganisationID, messages.viewed, messages.referral, messages.actioned, messages.patientID, messages.refTypeID, messages.specialtyID, organisations.organisationName AS toOrganisationName, patients.nhsNumber AS patientNHSnumber, patients.patientTitle, patients.patientLastname, patients.patientFirstname, patients.patientDOB, patients.address1 as patientAddress1, patients.address2 AS patientAddress2, patients.address3 AS patientAddress3, patients.address4 AS patientAddress4, patients.postcode AS patientPostcode, patients.patientPhone1 
      FROM users INNER JOIN (organisations INNER JOIN (patients INNER JOIN messages ON patients.patientID = messages.patientID) ON organisations.organisationID = messages.toID) ON users.userID = messages.fromID 
      WHERE #toStr# 
      AND NOT actioned 
      AND NOT originatingOrganisationID=3 
      ORDER BY messageID 
      </cfquery> 


      <cfif isQuery(q)> 
       <cfreturn q> 
      <cfelse> 
       <cfreturn "Error : in query"> 
      </cfif> 



<cfelse> 
    <cfreturn "Error : failed to authenticate"> 
</cfif>  

</cffunction> 
+0

这将有助于看到getPendingReferrals方法的代码很可能你将不得不为XmlFormat数据,以便它是有效的XML – 2011-03-10 20:35:19

+0

我把getPendingReferrals.cfc放了起来。它真的只是返回一个查询,它的内容在messageBody字段导致问题。 – Saul 2011-03-10 23:56:27

您应该使用正则表达式去除所有高ascii字符。我发现的最好的之一是written up by Ben Nadel, here。 (虽然它不是完美的,I made some improvements to it in the comments

基本上,如果你只是想剥离出高ASCII字符,这样做:

<cfset result = reReplace(messageBody, "[^\x20-\x7E\x0D\x09]", "", "all") /> 

这个正则表达式采用白名单方式,只允许可打印的字符保留:

  • \x20-\x7E = {space}! “#$%&”()* +, -/0-9:;?< => @ AZ [\]^_`AZ {|}〜
  • \x0D =回车
  • \x09 =水平制表

如果你喜欢这种方法来消毒,可以使用肖恩·柯尼的方法,用循环更新查询:

<cfloop query="q"> 
    <cfset querySetCell(
     q, 
     "messageBody", 
     clean(q.messageBody[q.currentRow]), 
     q.currentRow 
    )/> 
</cfloop> 
<cffunction name="clean"> 
    <cfargument name="in" /> 
    <cfreturn reReplace(arguments.in, "[^\x20-\x7E\x0D\x09]", "", "all") /> 
</cffunction> 
+0

亚当的解决方案+1。它肯定会比单纯使用xmlFormat更安全 – 2011-03-11 15:24:12

+0

它是一个爱情节,+ 1的全面感谢,亚当和肖恩 – Saul 2011-03-11 17:42:16

肖恩被提,你将需要避免所有形式的特殊字符得到有效的XML - 看看http://www.petefreitag.com/item/202.cfm例如

这是不理想的,但你可以尝试这样的:

<cfloop query="q"> 
    <cfset querySetCell(q,"messageBody",xmlFormat(q.messageBody[q.currentRow]),q.currentRow) /> 
</cfloop> 

如果xmlFormat无法删除所有字符(它已知错过了几个),您可能需要编写一个手动方法将它们删除。