Stanford CoreNLP missing root

Question:

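Running the Stanford CoreNLP online demo on the example sentence "A minimal software item that can be tested in isolation" gives the following collapsed, CC-processed dependencies: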

root (ROOT-0 , item-4) 
det (item-4 , A-1) 
amod (item-4 , minimal-2) 
nn (item-4 , software-3) 
nsubjpass (tested-8 , that-5) 
aux (tested-8 , can-6) 
auxpass (tested-8 , be-7) 
rcmod (item-4 , tested-8) 
prep_in (tested-8 , isolation-10)

From my Java class I get all of these except root(...). The code I am running is:

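import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

// Class name is illustrative; the original post showed only main().
public class DependencyDemo {
    public static void main(String[] args) {
        // Configure and build the annotation pipeline.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate the text passed as the first command-line argument.
        Annotation document = new Annotation(args[0]);
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // Collapsed, CC-processed dependency graph for this sentence.
            SemanticGraph dependencies = sentence.get(
                    SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            // Prints every stored edge, but not the root(...) relation.
            System.out.println(dependencies.toList());
        }
    }
}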

So the question is: why doesn't my Java code output the root(s)? Am I missing something?

Answer

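This is a good question, in that it exposes a weakness in the current code. At present, a root node and the edge from it are not stored in the graph.* Instead, they have to be accessed separately, through the graph's root (or list of roots), which is stored as a separate list. Here are two things that will work: (1) Add this code above the System.out.println: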

// Requires: import edu.stanford.nlp.ling.IndexedWord;
IndexedWord root = dependencies.getFirstRoot(); 
System.out.printf("ROOT(root-0, %s-%d)%n", root.word(), root.index());

(2) Use this instead of your current line:

System.out.println(dependencies.toString("readable"));

Unlike the other toList() or toString() methods, it will print the root(s).
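To make the point about roots being kept apart from the edges concrete, here is a minimal, self-contained sketch. It does not use CoreNLP: the classes RootAwarePrinter, Word and Edge are hypothetical stand-ins that store root words in a separate list, as the answer describes for SemanticGraph. Its toList() only sees stored edges (so no root line appears), while toListWithRoots() emits a root(ROOT-0, w-i) line for every root, not just the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (NOT the CoreNLP API): root words live in a separate
// list instead of being stored as an edge, as described in the answer above.
public class RootAwarePrinter {

    // A token and its 1-based position in the sentence, e.g. item-4.
    static final class Word {
        final String text;
        final int index;
        Word(String text, int index) { this.text = text; this.index = index; }
        @Override public String toString() { return text + "-" + index; }
    }

    // A typed edge governor -> dependent, e.g. det(item-4, A-1).
    static final class Edge {
        final String rel;
        final Word gov;
        final Word dep;
        Edge(String rel, Word gov, Word dep) { this.rel = rel; this.gov = gov; this.dep = dep; }
        @Override public String toString() { return rel + "(" + gov + ", " + dep + ")"; }
    }

    final List<Word> roots = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    // Mirrors the behaviour described for toList(): stored edges only, no root.
    List<String> toList() {
        List<String> out = new ArrayList<>();
        for (Edge e : edges) out.add(e.toString());
        return out;
    }

    // Emits a synthetic root(ROOT-0, w-i) line for EVERY root, then the edges.
    List<String> toListWithRoots() {
        List<String> out = new ArrayList<>();
        for (Word r : roots) out.add("root(ROOT-0, " + r + ")");
        out.addAll(toList());
        return out;
    }

    // A few of the demo's dependencies for "A minimal software item ...".
    static RootAwarePrinter demoGraph() {
        Word a = new Word("A", 1), minimal = new Word("minimal", 2), item = new Word("item", 4);
        Word tested = new Word("tested", 8);
        RootAwarePrinter g = new RootAwarePrinter();
        g.roots.add(item);
        g.edges.add(new Edge("det", item, a));
        g.edges.add(new Edge("amod", item, minimal));
        g.edges.add(new Edge("rcmod", item, tested));
        return g;
    }

    public static void main(String[] args) {
        RootAwarePrinter g = demoGraph();
        // [det(item-4, A-1), amod(item-4, minimal-2), rcmod(item-4, tested-8)]
        System.out.println(g.toList());
        // [root(ROOT-0, item-4), det(item-4, A-1), amod(item-4, minimal-2), rcmod(item-4, tested-8)]
        System.out.println(g.toListWithRoots());
    }
}
```

With the real library, the analogous loop would run over dependencies.getRoots() rather than relying on getFirstRoot(), which only returns the first root.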

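*There are historical reasons for this: we used not to have any explicit root. But at this point the behavior is awkward and dysfunctional, and should be changed. It will probably change in a future release.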

I managed to find another solution for my case: 'GrammaticalStructure gs = gsf.newGrammaticalStructure(tree); Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();' – werd 2013-05-01 22:06:54

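Yes, that works, because ROOT really is in that collection of dependencies. The minor cost is that you are paying to have them generated from the parse tree a second time. – 2013-05-02 22:43:32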
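In the flat typed-dependency collection discussed in these comments, the root is just another entry whose governor is the artificial ROOT-0 node. The sketch below models that with a hypothetical Dep class (not CoreNLP's TypedDependency) and separates the root entries from the ordinary edges by checking for governor index 0. That test is an assumption about how one might identify roots in such a list, not the library's own method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT CoreNLP's TypedDependency class: a flat list of
// dependency triples in which the root is an ordinary entry governed by ROOT-0.
public class TypedDepSplit {

    // One triple, printed as rel(gov-govIndex, dep-depIndex).
    static final class Dep {
        final String rel;
        final String gov;
        final int govIndex;
        final String dep;
        final int depIndex;

        Dep(String rel, String gov, int govIndex, String dep, int depIndex) {
            this.rel = rel;
            this.gov = gov;
            this.govIndex = govIndex;
            this.dep = dep;
            this.depIndex = depIndex;
        }

        @Override
        public String toString() {
            return rel + "(" + gov + "-" + govIndex + ", " + dep + "-" + depIndex + ")";
        }
    }

    // Root entries: those whose governor is the artificial node at index 0.
    static List<Dep> roots(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex == 0) out.add(d);
        }
        return out;
    }

    // Everything else: the edges a graph-style structure would store.
    static List<Dep> nonRootEdges(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex != 0) out.add(d);
        }
        return out;
    }

    // Three of the demo's dependencies, with the root included in the list.
    static List<Dep> demo() {
        List<Dep> deps = new ArrayList<>();
        deps.add(new Dep("root", "ROOT", 0, "item", 4));
        deps.add(new Dep("det", "item", 4, "A", 1));
        deps.add(new Dep("nsubjpass", "tested", 8, "that", 5));
        return deps;
    }

    public static void main(String[] args) {
        List<Dep> deps = demo();
        System.out.println("roots: " + roots(deps));        // roots: [root(ROOT-0, item-4)]
        System.out.println("edges: " + nonRootEdges(deps)); // edges: [det(item-4, A-1), nsubjpass(tested-8, that-5)]
    }
}
```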
