Parsing XML to get the value of a node

Question:

import xml.dom.minidom 

content = """ 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90"> 
    <url> 
    <loc>http://www.domain.com/</loc> 
    <lastmod>2011-01-27T23:55:42+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page1.html</loc> 
    <lastmod>2011-01-26T17:24:27+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page2.html</loc> 
    <lastmod>2011-01-26T15:35:07+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
</urlset> 
""" 

doc = xml.dom.minidom.parseString(content)  # avoid shadowing the xml package
urlset = doc.getElementsByTagName("urlset")[0]
url = urlset.getElementsByTagName("url") 

for i in range(0, url.length): 
    loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue 
    lastmod = url[i].getElementsByTagName("lastmod")[0].childNodes[0].nodeValue 
    changefreq = url[i].getElementsByTagName("changefreq")[0].childNodes[0].nodeValue 
    priority = url[i].getElementsByTagName("priority")[0].childNodes[0].nodeValue 
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))

Is there really no simpler way to get a node's value than this?

loc = url[i].getElementsByTagName("loc")[0].childNodes[0].nodeValue

Answer

There may be a better way to get a node's value... but this is at least a cleaner alternative that doesn't repeat itself:

import xml.dom.minidom 

content = """ 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.90"> 
    <url> 
    <loc>http://www.domain.com/</loc> 
    <lastmod>2011-01-27T23:55:42+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page1.html</loc> 
    <lastmod>2011-01-26T17:24:27+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
    <url> 
    <loc>http://www.domain.com/page2.html</loc> 
    <lastmod>2011-01-26T15:35:07+01:00</lastmod> 
    <changefreq>daily</changefreq> 
    <priority>0.5</priority> 
    </url> 
</urlset> 
""" 

def get_first_node_val(obj, tag): 
    return obj.getElementsByTagName(tag)[0].childNodes[0].nodeValue 

doc = xml.dom.minidom.parseString(content)  # avoid shadowing the xml package
urlset = doc.getElementsByTagName("urlset")[0]
urls = urlset.getElementsByTagName("url") 

for url in urls: 
    loc = get_first_node_val(url, "loc") 
    lastmod = get_first_node_val(url, "lastmod") 
    changefreq = get_first_node_val(url, "changefreq") 
    priority = get_first_node_val(url, "priority") 
    print("%s, %s, %s, %s" % (loc, lastmod, changefreq, priority))
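One caveat: get_first_node_val indexes [0] twice, so it raises IndexError when a tag is missing or an element is empty. A minimal sketch of a more forgiving variant, assuming that returning a default value (None unless given) is acceptable for missing or empty tags; the name get_text and its default parameter are hypothetical, not part of minidom:

```python
import xml.dom.minidom
from xml.dom import Node

# Trimmed sitemap entry (assumed for illustration) with no <changefreq>.
content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
  <url>
    <loc>http://www.domain.com/</loc>
    <lastmod>2011-01-27T23:55:42+01:00</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>"""

def get_text(parent, tag, default=None):
    """Hypothetical helper: text of the first <tag> under parent, or default.

    Joins every text/CDATA child so a value split across several
    text nodes is not truncated to its first piece.
    """
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return default  # tag is missing entirely
    parts = [child.data for child in nodes[0].childNodes
             if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)]
    return "".join(parts) if parts else default  # empty element -> default

doc = xml.dom.minidom.parseString(content)
for url in doc.getElementsByTagName("url"):
    print(get_text(url, "loc"), get_text(url, "changefreq", "n/a"))
    # prints: http://www.domain.com/ n/a
```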

Answer

Does this work: loc = getElementsByTagName("loc")[i].innerHTML?

That's not Python. – anjanesh 2012-08-03 07:19:25
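innerHTML is a browser (JavaScript) DOM property and minidom has no equivalent. If switching libraries is an option, the standard library's xml.etree.ElementTree offers findtext, which returns an element's text directly. This is a swapped-in library, not the minidom approach used elsewhere in this thread; the "sm" prefix below is an arbitrary local alias for the sitemap's default namespace, and the two-entry sitemap is a trimmed copy of the question's data:

```python
import xml.etree.ElementTree as ET

content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
  <url>
    <loc>http://www.domain.com/</loc>
    <lastmod>2011-01-27T23:55:42+01:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.domain.com/page1.html</loc>
    <lastmod>2011-01-26T17:24:27+01:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>"""

# The document uses a default namespace, so ElementTree tag names are
# namespace-qualified; map an (assumed) prefix to that URI for lookups.
ns = {"sm": "http://www.google.com/schemas/sitemap/0.90"}

root = ET.fromstring(content)
rows = []
for url in root.findall("sm:url", ns):
    # findtext returns the child's text, or None if the child is missing
    rows.append(tuple(url.findtext("sm:" + tag, namespaces=ns)
                      for tag in ("loc", "lastmod", "changefreq", "priority")))

for row in rows:
    print("%s, %s, %s, %s" % row)
```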

Answer

Why not just use firstChild?

loc = url[i].getElementsByTagName("loc").firstChild.nodeValue

Traceback (most recent call last): File "script.py", line 31, in <module> loc = url[i].getElementsByTagName("loc").firstChild.nodeValue AttributeError: 'NodeList' object has no attribute 'firstChild' – anjanesh 2012-08-03 07:58:35

from xml.dom.minidom import Node .. did you import Node? – 2012-08-03 08:23:35
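The AttributeError above comes from getElementsByTagName returning a NodeList: firstChild belongs to the individual nodes in that list, not to the list itself. A minimal sketch that indexes the list before using firstChild, run against a trimmed version of the question's sitemap (assumed for brevity):

```python
import xml.dom.minidom

content = """<urlset xmlns="http://www.google.com/schemas/sitemap/0.90">
  <url><loc>http://www.domain.com/</loc></url>
  <url><loc>http://www.domain.com/page1.html</loc></url>
</urlset>"""

doc = xml.dom.minidom.parseString(content)
url = doc.getElementsByTagName("url")

locs = []
for i in range(url.length):
    # Take the first <loc> Element from the NodeList, then its firstChild,
    # which is the Text node holding the value.
    loc = url[i].getElementsByTagName("loc")[0].firstChild.nodeValue
    locs.append(loc)

print(locs)
```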

Answer

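Here is an extension of get_first_node_val that also handles elements containing several children with the same tag, returning all of their values instead of only the first. For example, the following contains two loc elements: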

<url> 
<loc>http://domain.com/</loc> 
<loc>http://sub.domain.com</loc> 
<lastmod>2011-01-27T23:55:42+01:00</lastmod> 
<changefreq>daily</changefreq> 
<priority>0.5</priority> 
</url> 


def get_first_node_val(obj, tag):
    # Collect {tag: text} for every matching element, not just the first one
    element = []
    for x in obj.getElementsByTagName(tag):
        element.append({tag: x.childNodes[0].nodeValue})
    return element

Output

[{'loc': u'http://domain.com/'}, {'loc': u'http://sub.domain.com'}], [{'lastmod': u'2011-01-27T23:55:42+01:00'}], [{'changefreq': u'daily'}], [{'priority': u'0.5'}]
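A quick check of the multi-value version against the two-loc snippet, written for Python 3 (so strings print without the u prefix seen in the output above). The function body restates the answer's logic as a list comprehension; the variable names snippet and result are illustrative only:

```python
import xml.dom.minidom

snippet = """<url>
<loc>http://domain.com/</loc>
<loc>http://sub.domain.com</loc>
<lastmod>2011-01-27T23:55:42+01:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>"""

def get_first_node_val(obj, tag):
    # Despite the name, returns one {tag: text} dict per matching element
    return [{tag: node.childNodes[0].nodeValue}
            for node in obj.getElementsByTagName(tag)]

# The snippet's root element is <url> itself
url = xml.dom.minidom.parseString(snippet).documentElement
result = [get_first_node_val(url, tag)
          for tag in ("loc", "lastmod", "changefreq", "priority")]
print(result)
```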
