Solr's FieldStreamDataSource throws "unsupported type : class java.lang.String" (BLOB is NOT NULL)

Problem description:

(Edit: this is solr-6.6.0 on Ubuntu.)

I am trying to use Solr's DataImportHandler to index a MySQL database that stores RTF files as BLOBs. For this I am using FieldStreamDataSource, as described in these answers:

How do I index Rich Format Documents in Blobs

Unsupported type Exception on Importing Documents from Database

All the other, non-BLOB fields get indexed, but I keep getting the following Java runtime error from the FieldStreamDataSource.getData() method, as shown in the Solr log file:

Caused by: java.lang.RuntimeException: unsupported type : class java.lang.String
    at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:77)
    at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:47)
    at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:187)

This refers to these lines of the Java class (see the '<===' arrows):

@Override
public InputStream getData(String query) {
    Object o = wrapper.getVariableResolver().resolve(dataField);
    if (o == null) {
        throw new DataImportHandlerException(SEVERE, "No field available for name : " + dataField);
    } else if (o instanceof Blob) {          // <========= XXX
        Blob blob = (Blob) o;
        try {
            return blob.getBinaryStream();
        } catch (SQLException sqle) {
            LOG.info("Unable to get data from BLOB");
            return null;
        }
    } else if (o instanceof byte[]) {
        byte[] bytes = (byte[]) o;
        return new ByteArrayInputStream(bytes);
    } else {
        throw new RuntimeException("unsupported type : " + o.getClass()); // <========= XXX
    }
}

This should mean that getData() did not receive a Blob (or a byte[]), but a String.
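
To make the branching concrete, here is a minimal standalone sketch of the same type dispatch. It is not the Solr class: `TypeDispatchSketch` and `toStream` are hypothetical names, and the value is passed in directly instead of being looked up through Solr's variable resolver. It only illustrates that any `String`, whatever its content, ends up in the final branch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;

// Hypothetical standalone sketch; mirrors the branching of the quoted getData().
class TypeDispatchSketch {

    // Takes the already-resolved value as a parameter (assumption: in Solr this
    // value would come from the variable resolver).
    static InputStream toStream(Object o) {
        if (o == null) {
            throw new IllegalStateException("No field available");
        } else if (o instanceof Blob) {
            try {
                return ((Blob) o).getBinaryStream();
            } catch (SQLException e) {
                return null; // same fallback as the quoted code
            }
        } else if (o instanceof byte[]) {
            return new ByteArrayInputStream((byte[]) o);
        } else {
            // Strings (and every other type) land here
            throw new RuntimeException("unsupported type : " + o.getClass());
        }
    }

    public static void main(String[] args) {
        byte[] rtf = "{\\rtf1 hello}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toStream(rtf) != null); // prints true

        try {
            toStream("any string");
        } catch (RuntimeException e) {
            // prints: unsupported type : class java.lang.String
            System.out.println(e.getMessage());
        }
    }
}
```

The message printed by the second call has the same text as the one in the log, which is consistent with a String having reached the last branch.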

I know that many other threads suggest this error appears when the BLOB value is NULL in the database:

Unsupported type Exception on Importing Documents from Database

Solr DIH Throwing Error Unsupported Type Class Java.Lang.String

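However, that is not the case here: this test database has only 5 rows, all with non-null values. I also checked the debug output from Solr's DIH interface, which shows that the BLOB value is indeed retrieved from the database (see the '<===' arrows):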

"verbose-output": [ 
     "entity:reports1", 
     [ 
      "document#1", 
      [ 
      "query", 
      "SELECT id, institute, exam_date, age, acc, pacs FROM reports", 
      "time-taken", 
      "0:0:0.8", 
      null, 
      "----------- row #1-------------", 
      "id", 
      "1", 
      "institute", 
      "RADIOLOGY", 
      "age", 
      "68", 
      "acc", 
      "165184654", 
      "pacs", 
      "233215", 
      "exame_date", 
      "2016-02-05T00:00:00Z", 
      null, 
      "---------------------------------------------", 
      "entity:reports2", 
      [ 
       "query", 
       "SELECT report FROM reports WHERE id='1'", 
       "time-taken", 
       "0:0:0.6", 
       null, 
       "----------- row #1-------------", 
       "report",     // <========= COLUMN NAME RETURNED FROM THE SQL SELECT 
       "e1xydGYxXGFkZWZ[...]", // <========= VALUE RETURNED FROM THE SQL SELECT 
       null, 
       "---------------------------------------------", 
       "entity:report", 
       [ 
       "query", 
       "report", 
       "EXCEPTION",    // <========== EXCEPTION THROWN 
       "java.lang.RuntimeException: unsupported type : class java.lang.String\n\tat org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:77)\n\tat org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:47)\n\tat org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:187)\n\tat org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)\n\tat org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)\n\tat org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)\n\tat org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)\n\tat org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)\n\tat org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)\n\tat org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)\n\tat org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)\n\tat org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)\n\tat org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)\n\tat org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:180)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat java.lang.Thread.run(Thread.java:748)\n", 
        "time-taken", 
        "0:0:0.0" 
        ] 
       ] 
       ], 
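
As a quick sanity check on the value shown in the debug output, the sketch below decodes its first eight characters, on the assumption that the debug view renders binary data as Base64. `DebugValueSketch` and `decodePrefix` are hypothetical names; only `java.util.Base64` from the JDK is used. The rest of the value is elided in the output above and is left out here as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for inspecting the value shown in the DIH debug output.
class DebugValueSketch {

    // Decodes a Base64 fragment whose length is a multiple of 4 characters.
    static String decodePrefix(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // First 8 characters of the "report" value from the debug output
        System.out.println(decodePrefix("e1xydGYx")); // prints {\rtf1
    }
}
```

The decoded prefix is the start of an RTF header, which supports the reading that the column really does hold the RTF document and is not NULL.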

Here is my data-config.xml:

<dataConfig> 

     <dataSource 
      name="db" 
      type="JdbcDataSource" 
      driver="com.mysql.jdbc.Driver" 
      url="jdbc:mysql://localhost:3306/RIS" 
      user="root" 
      password="********"/> 

     <dataSource name="fieldStream" type="FieldStreamDataSource"/> 

     <document> 
      <entity 
       name="reports1" 
       query="SELECT id, institute, exam_date, age, acc, pacs FROM reports" 
       dataSource="db" 
      > 
       <field column="id"   name="id"/> 
       <field column="institute" name="institute"/> 
       <field column="exam_date" name="exam_date"/> 
       <field column="age"   name="age"/> 
       <field column="acc"   name="acc"/> 
       <field column="pacs"   name="pacs"/> 


       <entity 
        name="reports2" 
        query="SELECT report FROM reports WHERE id='${reports1.id}'" 
        dataSource="db" 
       > 
        <entity 
         name="report" 
         dataSource="fieldStream" 
         processor="TikaEntityProcessor" 
         url="report" 
         dataField="reports2.REPORT" 
         format="text" 
         onError="continue"> 
         <field column="text" name="report"/> 
        </entity> 
       </entity> 

      </entity> 

     </document> 
</dataConfig> 

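So it seems that the value retrieved from the DB's BLOB column is being passed on or recognized as a String rather than as a Blob. Can anyone help me see what I am doing wrong? I have searched everywhere and cannot find a solution :/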

Many thanks.

I found the (rather trivial) problem, although I had been trying for days.

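Changing the dataField attribute to lowercase did the trick, contrary to what this answer suggests. With the change below (see the '<===' arrow) in data-config.xml, all entries were indexed.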

<entity 
    name="reports2" 
    query="SELECT report FROM reports WHERE id='${reports1.id}'" 
    dataSource="db"> 
     <entity 
      name="report" 
      dataSource="fieldStream" 
      processor="TikaEntityProcessor" 
      url="report" 
      dataField="reports2.report" // <===== instead of reports2.REPORT 
      format="text" 
      onError="continue"> 

      <field column="text" name="report"/> 
     </entity>
</entity>
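
One plausible reading of why the case matters is sketched below. It is an illustration, not Solr's code: `CaseSensitiveLookupSketch` and `resolve` are hypothetical, and the behaviour of returning an empty String for a name that cannot be resolved is an assumption I have not checked against the Solr source. Under that assumption, a lookup for `REPORT` in a row keyed by `report` yields a String, which would match the exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a row lookup; not the Solr VariableResolver.
class CaseSensitiveLookupSketch {

    // ASSUMPTION: a name that cannot be resolved comes back as an empty String
    // instead of null.
    static Object resolve(Map<String, Object> row, String column) {
        Object value = row.get(column); // HashMap keys are case-sensitive
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        // Key spelled as in the SELECT: "report"
        row.put("report", new byte[] {0x7b, 0x5c, 0x72});

        System.out.println(resolve(row, "report").getClass().getSimpleName()); // prints byte[]
        System.out.println(resolve(row, "REPORT").getClass().getSimpleName()); // prints String
    }
}
```

If this is what happens, it would also explain why the same exception is reported for NULL BLOBs in the other threads: in both cases nothing usable is found under the requested name.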

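However, this left me with an unrelated problem that is probably worth pointing out (I don't know whether it is covered in another answer).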

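Because the column returned from the database ("report") has the same name as the Solr schema field ("report"), the indexing happened automatically right after the SQL query, storing the binary value from the BLOB. The text extracted by TikaEntityProcessor in the nested entity was ignored. I solved this by making sure the two fields (MySQL and Solr) have different names for this value.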

Below, the arrows ('<===') show where the indexing happened (earlier than intended).

<entity 
    name="reports2" 
    query="SELECT report FROM reports WHERE id='${reports1.id}'" // <======== DIH indexed this column (report) from the DB into the Solr field of same name (binary representation) 
    dataSource="db"> 
     <entity 
      name="report" 
      dataSource="fieldStream" 
      processor="TikaEntityProcessor" 
      url="report" 
      dataField="reports2.report" 
      format="text" 
      onError="continue"> 

      <field column="text" name="report"/> //<======== instead of waiting for this column (text), the output from Tika (extracted text) 
     </entity> 
</entity>