Introduction to the ECB+ Dataset
1. Event Definition
- ACE: an “event” is defined as a specific occurrence of something that happens, often a change of state, involving participants (LDC, 2005B).
- TimeML: “events” are characterized as “situations that happen or occur”.
ECB+ extends these definitions: an event drawn from news text is defined as a combination of four components.
- action: describing what happens or holds true
- time slot: describing when something happens or holds true
- location: specifying where something happens or holds true
- participant: gives the answer to the question of who or what is involved with, undergoes change as a result of, or facilitates an event or a state. Event participants are divided into human participants (viz. §2.2.3.3) and non-human participants.
ECB+ annotation is event-centric.
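The four-component definition above can be pictured as a small record type. The following is only an illustrative sketch, not the ECB+ file format: the class name `Event`, its field names, and the choice to make time and location optional are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """One event as a combination of the four components (hypothetical model)."""
    action: str                       # what happens or holds true
    time: Optional[str] = None        # when it happens or holds true
    location: Optional[str] = None    # where it happens or holds true
    # Participants are split into human and non-human, as in the definition.
    human_participants: List[str] = field(default_factory=list)
    non_human_participants: List[str] = field(default_factory=list)

    def participants(self) -> List[str]:
        """All participants, human first, then non-human."""
        return self.human_participants + self.non_human_participants


# Hypothetical reading of the sentence
# "A small earthquake has hit Japan's eastern coast yesterday."
quake = Event(
    action="hit",
    time="yesterday",
    location="Japan's eastern coast",
    non_human_participants=["A small earthquake"],
)
print(quake.participants())
```

Keeping the action mandatory and the other slots optional mirrors the event-centric view: the action anchors the event, and the remaining components attach to it when the text expresses them.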
2. Action Component Annotation
2.1 Mention part of speech
- Verbs.
- Nouns, including nominalizations and proper nouns. For example: “The Civil War ended back in 1865.”, “Fast economic growth across the African continent”.
- Present and past participles used attributively in modifier position. For example: “The deceased men’s house was sold yesterday”, “The crying baby had a high fever.”
- Predicative phrases expressed by adjectives, pronouns, or nouns. For example: “Game Five hero David Ross was happy just to be here.”
- Pronouns. For example: “A small earthquake has hit Japan’s eastern coast yesterday. It did not trigger a tsunami.”
2.2 Action Classes
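When annotating actions, ECB+ also labels each action with a class. Five classes come from TimeML (OCCURRENCE, PERCEPTION, REPORTING, ASPECTUAL, and STATE) and are tagged with an ACTION_ prefix. ECB+ adds two further classes: ACTION_CAUSATIVE and ACTION_GENERIC.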
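- ACTION_OCCURRENCE: applies to most actions in news text; describes something that happens or occurs.
- ACTION_PERCEPTION: describes actions of perception, e.g. see, hear, listen.
- ACTION_REPORTING: used when an organization or a person states or announces something, e.g. say, report, tell.
- ACTION_ASPECTUAL: describes a facet of an event's history, such as its start, end, or continuation, e.g. begin, finish, stop, continue.
- ACTION_STATE: describes circumstances in which something obtains or holds true, e.g. hope, love, live, peace.
- ACTION_CAUSATIVE: describes mentions that express causation, e.g. cause, lead to, result, facilitate, induce, produce, bring about.
- ACTION_GENERIC: describes generic events that are not tied to a specific time or place.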
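To make the class inventory concrete, here is a toy sketch that stores the seven classes in an enum and guesses a class from the example trigger words listed above. It is not how ECB+ was annotated (the corpus was labeled manually), and the names `ActionClass`, `EXAMPLE_TRIGGERS`, and `guess_action_class` are hypothetical. The fallback to OCCURRENCE is an assumption based on the remark that this class covers most actions in news; GENERIC cannot be decided from a single word, so the lookup never returns it.

```python
from enum import Enum


class ActionClass(Enum):
    OCCURRENCE = "ACTION_OCCURRENCE"
    PERCEPTION = "ACTION_PERCEPTION"
    REPORTING = "ACTION_REPORTING"
    ASPECTUAL = "ACTION_ASPECTUAL"
    STATE = "ACTION_STATE"
    CAUSATIVE = "ACTION_CAUSATIVE"
    GENERIC = "ACTION_GENERIC"


# Trigger words copied from the examples in the list above (not exhaustive).
EXAMPLE_TRIGGERS = {
    ActionClass.PERCEPTION: {"see", "hear", "listen"},
    ActionClass.REPORTING: {"say", "report", "tell"},
    ActionClass.ASPECTUAL: {"begin", "finish", "stop", "continue"},
    ActionClass.STATE: {"hope", "love", "live", "peace"},
    ActionClass.CAUSATIVE: {
        "cause", "lead to", "result", "facilitate",
        "induce", "produce", "bring about",
    },
}


def guess_action_class(lemma: str) -> ActionClass:
    """Naive lexicon lookup; unknown lemmas fall back to OCCURRENCE."""
    key = lemma.strip().lower()
    for action_class, triggers in EXAMPLE_TRIGGERS.items():
        if key in triggers:
            return action_class
    return ActionClass.OCCURRENCE


for word in ["say", "Begin", "lead to", "explode"]:
    print(word, "->", guess_action_class(word).value)
```

A real classifier would need the sentence context, since the same lemma can belong to different classes depending on how it is used.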
3. Overview of main decisions with regard to event component annotation in ECB+
4. Overview of decisions made with regard to coreference annotation
5. Overview of seminal events in ECB+ components
6. ECB+ Statistics
- Number of topics: 43
- Number of texts: 982
- Number of annotated action mentions: 15003
- Number of annotated location mentions: 2205
- Number of annotated time mentions: 2412
- Number of annotated human participant mentions: 9621
- Number of annotated non-human participant mentions: 3056
- Number of unique intra-document chains: 185
- Number of unique cross-document chains: 2319 intra-topic instances
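As a quick sanity check on these figures, the short sketch below adds up the five mention counts and derives each component's share and the average number of action mentions per text. The numbers are copied from the list above; the variable names are made up for this example.

```python
# Mention counts copied from the statistics list above.
mention_counts = {
    "action": 15003,
    "location": 2205,
    "time": 2412,
    "human participant": 9621,
    "non-human participant": 3056,
}
num_texts = 982

total_mentions = sum(mention_counts.values())
print("total annotated mentions:", total_mentions)  # 32297

# Share of each component among all annotated mentions.
shares = {name: count / total_mentions for name, count in mention_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Average number of action mentions per text.
actions_per_text = mention_counts["action"] / num_texts
print(f"action mentions per text: {actions_per_text:.2f}")
```

Action mentions make up a bit under half of all annotated mentions, which fits the event-centric design of the annotation.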