Apache Tika and the character limit when parsing documents
Question:
Could anybody please help me sort this out? With the Tika facade, the character limit can be raised like this:
Tika tika = new Tika();
tika.setMaxStringLength(10*1024*1024);
But if you don't use the Tika facade directly, like this:
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
ParseContext ps = new ParseContext();
for (InputStream is : getInputStreams()) {
    parser.parse(is, textHandler, metadata, ps);
    is.close();
    System.out.println("Title: " + metadata.get("title"));
    System.out.println("Author: " + metadata.get("Author"));
}
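is there any way to set it, given that you never interact with WriteOutContentHandler yourself? By the way, its write limit defaults to -1, which means no limit, yet the limit I actually run into is 100000 characters.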
/**
 * The maximum number of characters to write to the character stream.
 * Set to -1 for no limit.
 */
private final int writeLimit;

/**
 * Number of characters written so far.
 */
private int writeCount = 0;

private WriteOutContentHandler(Writer writer, int writeLimit) {
    this.writer = writer;
    this.writeLimit = writeLimit;
}

/**
 * Creates a content handler that writes character events to
 * the given writer.
 *
 * @param writer writer
 */
public WriteOutContentHandler(Writer writer) {
    this(writer, -1);
}
Answer:
You must have overlooked the BodyContentHandler constructor that takes a writeLimit:
int writeLimit = 10 * 1024 * 1024; // or -1 for no limit
ContentHandler textHandler = new BodyContentHandler(writeLimit);
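As far as I can tell from the Tika source, the no-argument BodyContentHandler() ends up with a 100000-character write limit, while the -1 default quoted in the question only applies to the WriteOutContentHandler(Writer) constructor. To illustrate how such a write limit behaves, here is a rough sketch that uses only JDK classes. The names LimitedTextHandler and WriteLimitDemo are made up for this example, and the code is a simplified stand-in, not Tika's actual implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified stand-in for a write-limited content handler.
class LimitedTextHandler extends DefaultHandler {
    private final Writer writer;
    private final int writeLimit; // -1 means no limit
    private int writeCount = 0;   // characters written so far

    LimitedTextHandler(Writer writer, int writeLimit) {
        this.writer = writer;
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            if (writeLimit == -1 || writeCount + length <= writeLimit) {
                // Everything fits (or there is no limit): write it all.
                writer.write(ch, start, length);
                writeCount += length;
            } else {
                // Write only what still fits, then signal that the limit was hit.
                writer.write(ch, start, writeLimit - writeCount);
                writeCount = writeLimit;
                throw new SAXException("Write limit of " + writeLimit + " characters reached");
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}

// Hypothetical helper that feeds a string through the handler.
class WriteLimitDemo {
    static String capture(String text, int writeLimit) {
        StringWriter out = new StringWriter();
        LimitedTextHandler handler = new LimitedTextHandler(out, writeLimit);
        char[] chars = text.toCharArray();
        try {
            handler.characters(chars, 0, chars.length);
        } catch (SAXException e) {
            // Limit reached: whatever was written before the cut-off is kept.
        }
        return out.toString();
    }
}
```

With this sketch, WriteLimitDemo.capture("abcdef", 3) returns "abc", while WriteLimitDemo.capture("abcdef", -1) returns the whole string. That is the same distinction as between a BodyContentHandler created with a positive limit and one created with -1.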
OMG, I'm such a..... thanks, mate – lisak 2011-05-26 20:39:33
#selfie .... :-) – bknopper 2014-05-27 08:57:29
Note that -1 means an unlimited number of characters! – 2014-07-31 11:54:17