Apache Tika and the character limit when parsing documents

Question:

Could anybody please help me sort this out?

It can be done like this:

Tika tika = new Tika();
tika.setMaxStringLength(10*1024*1024);

But what if you don't use Tika directly, like this:

ContentHandler textHandler = new BodyContentHandler(); 
Metadata metadata = new Metadata(); 
Parser parser = new AutoDetectParser(); 

ParseContext ps = new ParseContext(); 
for (InputStream is : getInputStreams()) { 
    parser.parse(is, textHandler, metadata, ps); 
    is.close(); 
    System.out.println("Title: " + metadata.get("title")); 
    System.out.println("Author: " + metadata.get("Author")); 
} 

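Is there a way to set the limit here, given that you don't interact with WriteOutContentHandler directly? By the way, its write limit is set to -1 by default, which means no limit, yet the limit I actually get is 100000 characters.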

/** 
* The maximum number of characters to write to the character stream. 
* Set to -1 for no limit. 
*/ 
private final int writeLimit; 

/** 
* Number of characters written so far. 
*/ 
private int writeCount = 0; 

private WriteOutContentHandler(Writer writer, int writeLimit) { 
    this.writer = writer; 
    this.writeLimit = writeLimit; 
} 

/** 
* Creates a content handler that writes character events to 
* the given writer. 
* 
* @param writer writer 
*/ 
public WriteOutContentHandler(Writer writer) { 
    this(writer, -1); 
} 

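You must have overlooked the content handler constructor that takes a writeLimit. The no-argument BodyContentHandler() constructor applies a default write limit of 100000 characters, which is where your limit comes from.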

int writeLimit = 10*1024*1024; // or -1 for no limit
ContentHandler textHandler = new BodyContentHandler(writeLimit);
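
To make the effect of the write limit concrete, here is a minimal sketch of a write-limited character sink. It is a simplified model built only on the JDK, not Tika's actual code: the names WriteLimitSketch, LimitedSink and capture are hypothetical, and it reports truncation with a boolean, whereas Tika, as far as I know, signals a reached limit with an exception.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;

/** Simplified, hypothetical model of a write-limited sink (not Tika's code). */
public class WriteLimitSketch {

    /** Forwards characters to a writer until writeLimit is reached; -1 means no limit. */
    static final class LimitedSink {
        private final Writer writer;
        private final int writeLimit;
        private int writeCount = 0;

        LimitedSink(Writer writer, int writeLimit) {
            this.writer = writer;
            this.writeLimit = writeLimit;
        }

        /** Returns false if the input had to be truncated because of the limit. */
        boolean characters(char[] ch, int start, int length) throws IOException {
            if (writeLimit == -1 || writeCount + length <= writeLimit) {
                writer.write(ch, start, length);
                writeCount += length;
                return true;
            }
            // Only part of the input fits: write what is left of the budget.
            int remaining = writeLimit - writeCount;
            writer.write(ch, start, remaining);
            writeCount = writeLimit;
            return false;
        }
    }

    /** Pushes text through a LimitedSink and returns what was actually kept. */
    static String capture(String text, int writeLimit) {
        StringWriter out = new StringWriter();
        LimitedSink sink = new LimitedSink(out, writeLimit);
        try {
            sink.characters(text.toCharArray(), 0, text.length());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] chars = new char[150_000];
        Arrays.fill(chars, 'x');
        String text = new String(chars);

        System.out.println(capture(text, 100_000).length()); // prints 100000
        System.out.println(capture(text, -1).length());      // prints 150000
    }
}
```

With a limit of 100000 the extra characters are silently dropped, which matches the symptom in the question; with -1 everything passes through. In the real parser setup, the equivalent change is passing the desired limit (or -1) to the BodyContentHandler constructor.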
+0

OMG, I'm such a ..... thanks, man – lisak 2011-05-26 20:39:33

+1

#selfie .... :-) – bknopper 2014-05-27 08:57:29

+4

Note that -1 means unlimited characters! – 2014-07-31 11:54:17