Exception when removing stop words with Apache Lucene

Problem description:

I am using the following code to remove stop words from the input text. When tokenStream.incrementToken() runs, I get this exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

Code:

public static String removeStopWords(String textFile) throws Exception {
    CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
    TokenStream tokenStream = new StandardTokenizer();
    tokenStream = new StopFilter(tokenStream, stopWords);
    StringBuilder sb = new StringBuilder();
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        String term = charTermAttribute.toString();
        sb.append(term + " ");
    }
    return sb.toString();
}

Answer

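The StandardTokenizer in your code is never given any input to read (textFile is not passed to it anywhere), which is why consuming it fails. Instantiate the TokenStream like this instead: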

TokenStream tokenStream = new StandardAnalyzer().tokenStream("field",new StringReader(textFile));
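For reference, here is a minimal sketch of the whole method built around that line, following the consuming workflow the exception message points to: reset(), then incrementToken() in a loop, then end(), then close(). The class name StopWordRemover is made up for illustration. The stop set is passed to StandardAnalyzer explicitly as an assumption on my part: as far as I know, Lucene 8.0 and later construct StandardAnalyzer with an empty stop set by default, so the no-argument constructor would not remove anything there. Note that StandardAnalyzer also lowercases the tokens.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name, used only for this illustration.
public class StopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        List<String> kept = new ArrayList<>();
        // try-with-resources closes the TokenStream first, then the Analyzer.
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();                    // must be called once before the first incrementToken()
            while (tokenStream.incrementToken()) {  // advances to the next token that survived the stop filter
                kept.add(term.toString());
            }
            tokenStream.end();                      // signals that the stream has been fully consumed
        }
        return String.join(" ", kept);
    }
}
```

Unlike the original loop, this joins the surviving terms with single spaces and leaves no trailing space.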

What is "field" in this code? – Rizstien

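"field" is the name of the field (IndexableField) that the created TokenStream will be used for. If your tokenStream is not specific to any field, you can pass null. Also, since your input is a String, you can use tokenStream(null, textFile); – darcula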
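Separately from the field-name question, the tokenizer chain from the question can also be kept without going through an Analyzer, as long as the tokenizer is given a Reader through setReader() before reset() is called. The sketch below assumes Lucene 7 or later, where StopFilter lives in org.apache.lucene.analysis (earlier 5.x/6.x releases keep it in org.apache.lucene.analysis.core); the class name TokenizerStopWordRemover is made up for illustration. Because this chain has no lowercasing step, stop-word matching is case-sensitive, so "The" would be kept while "the" is dropped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.StopFilter;   // assumption: Lucene 7+ package location
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name, used only for this illustration.
public class TokenizerStopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        Tokenizer tokenizer = new StandardTokenizer();
        // The step missing from the question: hand the tokenizer its input before reset().
        tokenizer.setReader(new StringReader(text));

        List<String> kept = new ArrayList<>();
        // Closing the outer filter also closes the wrapped tokenizer.
        try (TokenStream tokenStream =
                 new StopFilter(tokenizer, EnglishAnalyzer.getDefaultStopSet())) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                kept.add(term.toString());
            }
            tokenStream.end();
        }
        return String.join(" ", kept);
    }
}
```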

