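Python XML parsing with ElementTree: how do I find the values of elements that share the same name?
Disclaimer: I am new to Python, XML, and programming in general. The code (which I lifted from the internet) works, but it has a problem that I can't find an answer to or wrap my head around.
I want to parse the XML file from the grants.gov XML extract website, remove all grants that are not in the "unrestricted" eligibility category (marked in the XML as an "EligibilityCategory" of "99"), and output a new XML file.
The code below correctly removes the funding opportunities I'm not interested in, but it also removes funding opportunities that have multiple EligibilityCategory elements where one of them is "99". I think this is because .find only grabs the first occurrence. I tried to use .findall but couldn't get it to work. Thanks in advance for your help.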
import xml.etree.ElementTree as etree

tree = etree.parse('sample.xml')
root = tree.getroot()

for FundingOppSynopsis in root.findall('FundingOppSynopsis'):
    ID = int(FundingOppSynopsis.find('EligibilityCategory').text)
    if ID != 99:
        root.remove(FundingOppSynopsis)

tree.write("Output/output.xml", xml_declaration=True, encoding='UTF-8', method="xml")
Sample (significantly pared down) XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Grants SYSTEM "http://apply07.grants.gov/search/dtd/XMLExtract.dtd">
<Grants>
<FundingOppSynopsis>
<FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
<ApplicationsDueDate>03242008</ApplicationsDueDate>
<Office>Risk Management Agency</Office>
<Agency>Department of Agriculture</Agency>
<EligibilityCategory>25</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>NPS-ARRAWHIS100315</FundingOppNumber>
<ApplicationsDueDate>11282009</ApplicationsDueDate>
<Office>National Park Service</Office>
<Agency>Department of the Interior</Agency>
<EligibilityCategory>00</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>OFDA-FY08-002-APS</FundingOppNumber>
<ApplicationsDueDate>10102008</ApplicationsDueDate>
<Office>None</Office>
<Agency>Agency for International Development</Agency>
<EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>AK-NOI08-0004</FundingOppNumber>
<ApplicationsDueDate>07142008</ApplicationsDueDate>
<Office>Bureau of Land Management</Office>
<Agency>Department of the Interior</Agency>
<EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>RD-RBP-BIOMASS-2007-FULL</FundingOppNumber>
<ApplicationsDueDate>11162007</ApplicationsDueDate>
<Office>Business and Cooperative Programs</Office>
<Agency>Department of Agriculture</Agency>
<EligibilityCategory>06</EligibilityCategory>
<EligibilityCategory>12</EligibilityCategory>
<EligibilityCategory>13</EligibilityCategory>
<EligibilityCategory>20</EligibilityCategory>
<EligibilityCategory>22</EligibilityCategory>
<EligibilityCategory>23</EligibilityCategory>
<EligibilityCategory>25</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>BAA07-10</FundingOppNumber>
<ApplicationsDueDateExplanation>The due dates and times established for the receipt of White Papers and Full Proposals are as indicated in Section IV, Paragraph 3 of the BAA. </ApplicationsDueDateExplanation>
<Office>Office of Procurement Operations - Grants Division</Office>
<Agency>Department of Homeland Security</Agency>
<EligibilityCategory>00</EligibilityCategory>
<EligibilityCategory>01</EligibilityCategory>
<EligibilityCategory>02</EligibilityCategory>
<EligibilityCategory>04</EligibilityCategory>
<EligibilityCategory>05</EligibilityCategory>
<EligibilityCategory>06</EligibilityCategory>
<EligibilityCategory>07</EligibilityCategory>
<EligibilityCategory>08</EligibilityCategory>
<EligibilityCategory>11</EligibilityCategory>
<EligibilityCategory>12</EligibilityCategory>
<EligibilityCategory>13</EligibilityCategory>
<EligibilityCategory>20</EligibilityCategory>
<EligibilityCategory>21</EligibilityCategory>
<EligibilityCategory>22</EligibilityCategory>
<EligibilityCategory>23</EligibilityCategory>
<EligibilityCategory>25</EligibilityCategory>
<EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>
</Grants>
You need to extract the list of categories with findall, then check whether 99 is in that list. You can use a list comprehension like this:
for FundingOppSynopsis in root.findall('FundingOppSynopsis'):
    IDs = [int(category.text) for category in FundingOppSynopsis.findall('EligibilityCategory')]
    if 99 not in IDs:
        root.remove(FundingOppSynopsis)
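To see the loop above work end to end, here is a minimal self-contained sketch. It assumes a three-record excerpt of the sample XML embedded as a string instead of sample.xml, and the helper name keep_unrestricted is made up for illustration; it is not part of the original code.

```python
import xml.etree.ElementTree as etree

# Trimmed-down excerpt of the sample XML (an assumption standing in for sample.xml)
SAMPLE = """<Grants>
<FundingOppSynopsis>
<FundingOppNumber>USDA-RMA-RME-2008-03</FundingOppNumber>
<EligibilityCategory>25</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>AK-NOI08-0004</FundingOppNumber>
<EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>BAA07-10</FundingOppNumber>
<EligibilityCategory>00</EligibilityCategory>
<EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>
</Grants>"""


def keep_unrestricted(root, wanted=99):
    """Hypothetical helper: drop every FundingOppSynopsis that does not list `wanted`."""
    # findall returns a list, so removing children from root inside the loop is safe
    for synopsis in root.findall('FundingOppSynopsis'):
        ids = [int(c.text) for c in synopsis.findall('EligibilityCategory')]
        if wanted not in ids:
            root.remove(synopsis)
    return root


root = keep_unrestricted(etree.fromstring(SAMPLE))
kept = [s.findtext('FundingOppNumber') for s in root.findall('FundingOppSynopsis')]
print(kept)  # ['AK-NOI08-0004', 'BAA07-10']
```

The record with both "00" and "99" survives, which is the case the .find version got wrong.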
You can use an XPath query to achieve what you want.
import xml.etree.ElementTree as etree

tree = etree.parse('sample.xml')
root = tree.getroot()

req = tree.findall("./FundingOppSynopsis[EligibilityCategory='99']")
for r in req:
    print(r)
This query returns a list of all FundingOppSynopsis elements in the document that have a child EligibilityCategory tag whose text is "99". More information about XPath queries here. More information about using XPath in Python here.
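The XPath query only selects the matching elements, so producing the filtered file still needs a removal step. One possible sketch, assuming a small inline excerpt of the sample and an in-memory buffer in place of Output/output.xml (both are stand-ins, not part of the original answer):

```python
import io
import xml.etree.ElementTree as etree

# Small inline excerpt of the sample XML (an assumption standing in for sample.xml)
SAMPLE = """<Grants>
<FundingOppSynopsis>
<FundingOppNumber>NPS-ARRAWHIS100315</FundingOppNumber>
<EligibilityCategory>00</EligibilityCategory>
</FundingOppSynopsis>
<FundingOppSynopsis>
<FundingOppNumber>BAA07-10</FundingOppNumber>
<EligibilityCategory>00</EligibilityCategory>
<EligibilityCategory>99</EligibilityCategory>
</FundingOppSynopsis>
</Grants>"""

tree = etree.ElementTree(etree.fromstring(SAMPLE))
root = tree.getroot()

# Elements with at least one EligibilityCategory child whose text is exactly '99'
matches = root.findall("./FundingOppSynopsis[EligibilityCategory='99']")

# Remove everything the query did not select
for synopsis in root.findall('FundingOppSynopsis'):
    if synopsis not in matches:
        root.remove(synopsis)

# Write to a bytes buffer here; pass a file path instead to write to disk
buffer = io.BytesIO()
tree.write(buffer, xml_declaration=True, encoding='UTF-8', method='xml')

remaining = [s.findtext('FundingOppNumber') for s in root.findall('FundingOppSynopsis')]
print(remaining)
```

Note that the predicate compares the element's full text as a string, so a value with surrounding whitespace would not match, whereas the int() conversion in the other answer would tolerate it.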
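Ah! I had tried to do this, but instead of writing out "FundingOppSynopsis" in full I used "*". I'm sure there were other syntax errors in there too, but I got frustrated, deleted it, and started from scratch. Thanks! – 2015-04-03 02:26:29
Thanks for your help! I'm selecting this answer because I'm still very new to Python, and it fits in directly with the tree.write method I use for the file output. – 2015-04-03 02:30:30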