Printing child tags

Problem description:

I have data in an XML file that looks like this:

<SpeechSegment spkid="S0"> 
    <Word dur="0.22" stime="0.44">oh</Word> 
    <Word dur="0.27" stime="1.67">bedankt</Word> 
    <Word dur="0.3" stime="2.03">voor</Word> 
    <Word dur="0.53" stime="2.61">deelname</Word> 
</SpeechSegment>

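What I want to do is count the words in each segment and, once a segment has more than three words, insert another "SpeechSegment" tag. So my preferred output looks like this: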

<SpeechSegment spkid="S0"> 
    <Word dur="0.22" stime="0.44">oh</Word> 
    <Word dur="0.27" stime="1.67">bedankt</Word> 
    <Word dur="0.3" stime="2.03">voor</Word> 
    #count is more than 3 
    </SpeechSegment><SpeechSegment spkid="S0"> 
    <Word dur="0.53" stime="2.61">deelname</Word> 
</SpeechSegment>

I tried to do this with the following code:

import xml.etree.ElementTree as ET

raw = ET.parse("Interview_short.xml")
root = raw.getroot()
for child in root:
    print(child)

    count_list = 0
    for item in child:
        print(item)
        count_list = count_list + 1
        if count_list > 2:
            # add speech segment tag
            pass

I have a problem, though: this

print(child)

gives me this:

<Element 'SpeechSegment' at 0x20e3cf8>.

whereas I am looking for

<SpeechSegment spkid="S0">.

Adding .text after the item does not work. Any ideas about what is going wrong here?

Answer

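You can access a tag's attributes through .attrib on the element. In your case, child.attrib returns the dictionary {'spkid': 'S0'}. Printing the element itself only shows its default representation (tag name and memory address), and .text holds the text content between the tags, not the attributes.

Now you can access the keys and values of that dictionary in the normal Python way.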

child.attrib['spkid']
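
As a minimal sketch of the above, the following parses an inline copy of part of the sample segment and rebuilds the opening tag from .tag and .attrib. The wrapping <Interview> root element is an assumption (the real file's root is not shown in the question), and opening_tag is a hypothetical helper, not part of ElementTree. It does not escape special characters in attribute values.

```python
import xml.etree.ElementTree as ET

# Inline sample; the <Interview> root is assumed, since the real file's root is not shown.
SAMPLE = """<Interview>
<SpeechSegment spkid="S0">
    <Word dur="0.22" stime="0.44">oh</Word>
    <Word dur="0.27" stime="1.67">bedankt</Word>
</SpeechSegment>
</Interview>"""


def opening_tag(elem):
    """Rebuild an element's opening tag from its name and attributes (hypothetical helper)."""
    attrs = "".join(f' {key}="{value}"' for key, value in elem.attrib.items())
    return f"<{elem.tag}{attrs}>"


root = ET.fromstring(SAMPLE)
for child in root:
    print(child.attrib["spkid"])  # S0
    print(opening_tag(child))     # <SpeechSegment spkid="S0">
```

If the whole element including its children is wanted as text, ET.tostring(child, encoding="unicode") returns it as a string.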

Hope that helps.

If you are also asking how to add new tags, please specify that in your question.
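
For what it is worth, one possible way to get the output described in the question is sketched below. It is only an illustration under assumptions: the <Interview> root, the split_segments helper, and the MAX_WORDS limit of three are all hypothetical names and values chosen here, and every child of the root is treated as a segment. Rather than editing the tree while iterating over it, the sketch builds a new tree and copies at most three children into each new segment.

```python
import xml.etree.ElementTree as ET

# Inline sample; the <Interview> root is assumed, since the real file's root is not shown.
SAMPLE = """<Interview>
<SpeechSegment spkid="S0">
    <Word dur="0.22" stime="0.44">oh</Word>
    <Word dur="0.27" stime="1.67">bedankt</Word>
    <Word dur="0.3" stime="2.03">voor</Word>
    <Word dur="0.53" stime="2.61">deelname</Word>
</SpeechSegment>
</Interview>"""

MAX_WORDS = 3  # assumed limit, taken from the example output in the question


def split_segments(root, max_words=MAX_WORDS):
    """Return a new root whose segments hold at most max_words children each."""
    new_root = ET.Element(root.tag, root.attrib)
    for segment in root:
        words = list(segment)
        # An empty segment still yields one (empty) chunk so it is not dropped.
        for start in range(0, max(len(words), 1), max_words):
            # Each chunk reuses the original tag name and attributes (e.g. spkid).
            chunk = ET.SubElement(new_root, segment.tag, segment.attrib)
            chunk.extend(words[start:start + max_words])
    return new_root


new_root = split_segments(ET.fromstring(SAMPLE))
for segment in new_root:
    print(segment.attrib["spkid"], [word.text for word in segment])
# S0 ['oh', 'bedankt', 'voor']
# S0 ['deelname']
```

The new segments do not carry over the original whitespace between tags, so a serialized result will be indented differently from the input file.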