Introduction to the ECB+ Dataset

1. Event Definition

  • ACE: an “event” is defined as a specific occurrence of something that happens, often a change of state, involving participants (LDC, 2005B).
  • TimeML: “events” are characterized as “situations that happen or occur”.

ECB+ extends the definitions above and defines an event in news text as a combination of four components:

  • action: describes what happens or holds true
  • time slot: describes when something happens or holds true
  • location: specifies where something happens or holds true
  • participant: answers the question of who or what is involved with,
    undergoes change as a result of, or facilitates an event or a state. We
    divide event participants into human participants (viz. §2.2.3.3) and non-human
    participants

ECB+ annotation is event-centric.
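
The four-component definition above can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (`Event`, `Participant`, `is_human`) are assumptions made here and are not taken from the ECB+ annotation files or tooling. The example sentence is the earthquake sentence used in section 2.1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """A participant mention (hypothetical structure, for illustration only)."""
    text: str
    is_human: bool  # ECB+ distinguishes human from non-human participants


@dataclass
class Event:
    """An event as a combination of the four components described above."""
    action: str                      # what happens or holds true
    time: Optional[str] = None       # when it happens or holds true
    location: Optional[str] = None   # where it happens or holds true
    participants: List[Participant] = field(default_factory=list)


# "A small earthquake has hit Japan's eastern coast yesterday."
event = Event(
    action="hit",
    time="yesterday",
    location="Japan's eastern coast",
    participants=[Participant("A small earthquake", is_human=False)],
)
```

Time, location, and participants are optional in this sketch because a given sentence may not mention all of them; the action is the only required field, which matches the event-centric view of the annotation.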

2. Action Component Annotation

2.1 Mention part of speech

  • Verbs.
  • Nouns, including nominalizations and proper nouns, e.g., “The Civil War ended back in 1865.” and “Fast economic growth across the African continent”.
  • Present and past participles used attributively in modifier position, e.g., “The deceased men’s house was sold yesterday” and “The crying baby had a high fever.”
  • Predicative phrases expressed by adjectives, pronouns, or nouns, e.g., “Game Five hero David Ross was happy just to be here.”
  • Pronouns, e.g., “A small earthquake has hit Japan’s eastern coast yesterday. It did not trigger a tsunami.”

2.2 Action Classes

When annotating actions, ECB+ also labels each action with a class. Five of the classes come from TimeML: OCCURRENCE, PERCEPTION, REPORTING, ASPECTUAL, and STATE. ECB+ adds two further classes: ACTION_CAUSATIVE and ACTION_GENERIC.

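  • ACTION_OCCURRENCE: applies to most actions in news text; describes something that happens or occurs.
  • ACTION_PERCEPTION: describes actions related to perception, e.g., see, hear, listen.
  • ACTION_REPORTING: describes an organization or person declaring or announcing something, e.g., say, report, tell.
  • ACTION_ASPECTUAL: describes a facet of an event's history, such as its start, end, or continuation, e.g., begin, finish, stop, continue.
  • ACTION_STATE: describes circumstances in which something obtains or holds true, e.g., hope, love, live, peace.
  • ACTION_CAUSATIVE: describes action mentions that express causation, e.g., cause, lead to, result, facilitate, induce, produce, bring about.
  • ACTION_GENERIC: describes generic events that are not tied to a specific time or place.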
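
As a rough illustration of the class inventory above, the seven classes can be written as an enumeration, with a toy lookup built only from the example words listed in the bullets. The names `ActionClass`, `EXAMPLE_LEMMAS`, and `lookup_class` are hypothetical, and the member names drop the `ACTION_` prefix for brevity. This is not how ECB+ annotators assigned classes; a real mention's class depends on context, not on a word list.

```python
from enum import Enum, auto
from typing import Optional


class ActionClass(Enum):
    OCCURRENCE = auto()
    PERCEPTION = auto()
    REPORTING = auto()
    ASPECTUAL = auto()
    STATE = auto()
    CAUSATIVE = auto()
    GENERIC = auto()


# The five classes that the text says come from TimeML.
TIMEML_DERIVED = frozenset({
    ActionClass.OCCURRENCE,
    ActionClass.PERCEPTION,
    ActionClass.REPORTING,
    ActionClass.ASPECTUAL,
    ActionClass.STATE,
})

# Toy mapping that contains only the example words from the list above.
EXAMPLE_LEMMAS = {
    **dict.fromkeys(["see", "hear", "listen"], ActionClass.PERCEPTION),
    **dict.fromkeys(["say", "report", "tell"], ActionClass.REPORTING),
    **dict.fromkeys(["begin", "finish", "stop", "continue"], ActionClass.ASPECTUAL),
    **dict.fromkeys(["hope", "love", "live", "peace"], ActionClass.STATE),
    **dict.fromkeys(
        ["cause", "lead to", "result", "facilitate", "induce", "produce", "bring about"],
        ActionClass.CAUSATIVE,
    ),
}


def lookup_class(lemma: str) -> Optional[ActionClass]:
    """Return the class of a listed example word, or None if it is not listed."""
    return EXAMPLE_LEMMAS.get(lemma.strip().lower())
```

Returning `None` for unlisted words is a deliberate choice in this sketch: OCCURRENCE and GENERIC have no example words in the list, so no default class is guessed.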

3. Overview of main decisions with regard to event component annotation in ECB+

4. Overview of decisions made with regard to coreference annotation


5. Overview of seminal events in ECB+ components


6. ECB+ Statistics

  • Num of topics: 43
  • Num of texts: 982
  • Num of annotated action mentions: 15003
  • Num of annotated location mentions: 2205
  • Num of annotated time mentions: 2412
  • Num of annotated human participant mentions: 9621
  • Num of annotated non-human participant mentions: 3056
  • Num of unique intra-document chains: 185
  • Num of unique cross-document chains: 2319 intra-topic instances
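
To put these counts in proportion, the short sketch below computes the total number of annotated mentions and the average number of mentions per text. It uses only the figures listed above; the variable names are illustrative.

```python
# Mention counts and text count copied from the statistics above.
mention_counts = {
    "action": 15003,
    "location": 2205,
    "time": 2412,
    "human participant": 9621,
    "non-human participant": 3056,
}
num_texts = 982

# Total annotated mentions across all five component types.
total_mentions = sum(mention_counts.values())

# Average number of mentions of each type per text.
mentions_per_text = {
    name: count / num_texts for name, count in mention_counts.items()
}

for name, avg in mentions_per_text.items():
    print(f"{name}: {avg:.2f} mentions per text")
print(f"total: {total_mentions} mentions, {total_mentions / num_texts:.2f} per text")
```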