Parsing XML with PHP and XMLReader
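I've been trying to parse a very large XML file with PHP and XMLReader, but I can't seem to get the results I'm after. Basically, I'm searching through a lot of information: if a given block contains a certain postal code, I want to return that bit of XML; otherwise, keep searching until that postal code turns up. Essentially, I'm breaking this large file down into a few small pieces, so that instead of having to look through thousands or millions of groups of information, there might be 10 or 20.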
Here is a bit of the XML, annotated with what I'm trying to do:
<!-- search through xml -->
<lineups country="USA">
<!-- cache TX02217 as a variable -->
<headend headendId="TX02217">
<!-- cache Grande Gables at The Terrace as a variable -->
<name>Grande Gables at The Terrace</name>
<!-- cache Grande Communications as a variable -->
<mso msoId="17541">Grande Communications</mso>
<marketIds>
<marketId type="DMA">635</marketId>
</marketIds>
<!-- check whether any of the postal codes equals the $pc variable that will be set in the PHP -->
<postalCodes>
<postalCode>11111</postalCode>
<postalCode>22222</postalCode>
<postalCode>33333</postalCode>
<postalCode>78746</postalCode>
</postalCodes>
<!-- cache Austin to a variable -->
<location>Austin</location>
<lineup>
<!-- cache all prgSvcId values to an array, e.g. 20014, 10722 -->
<station prgSvcId="20014">
<!-- cache all channels to an array, e.g. 002, 003 -->
<chan effDate="2006-01-16" tier="1">002</chan>
</station>
<station prgSvcId="10722">
<chan effDate="2006-01-16" tier="1">003</chan>
</station>
</lineup>
<areasServed>
<area>
<!-- cache community to a variable $community -->
<community>Thorndale</community>
<county code="45331" size="D">Milam</county>
<!-- cache state to a variable, e.g. TX -->
<state>TX</state>
</area>
<area>
<community>Thrall</community>
<county code="45491" size="B">Williamson</county>
<state>TX</state>
</area>
</areasServed>
</headend>
<!-- if any of the postal codes matched $pc: -->
<!-- echo back the xml from <headend> to </headend> -->
<!-- if none of the postal codes matched $pc: -->
<!-- clear variables and move to next <headend> -->
<headend>
etc
etc
etc
</headend>
<headend>
etc
etc
etc
</headend>
<headend>
etc
etc
etc
</headend>
</lineups>
PHP:
<?php
$pc = "78746";
$xmlfile = "myFile.xml";
$reader = new XMLReader();
if (!$reader->open($xmlfile)) {
    die("Could not open $xmlfile\n");
}
while ($reader->read()) {
    // search to see if groups contain $pc and echo info
}
$reader->close();
I know I'm making this harder than it should be, but I'm a bit overwhelmed trying to manipulate such a large file. Any help is appreciated.
Edit: Oh, you want to return the parent block? One moment.
Here is an example that uses XPath to pull out the postalCode elements matching 78746.
<?php
$string='<lineups country="USA">
<headend headendId="TX02217">
<name>Grande Gables at The Terrace</name>
<mso msoId="17541">Grande Communications</mso>
<marketIds>
<marketId type="DMA">635</marketId>
</marketIds>
<postalCodes>
<postalCode>11111</postalCode>
<postalCode>22222</postalCode>
<postalCode>33333</postalCode>
<postalCode>78746</postalCode>
</postalCodes>
<location>Austin</location>
<lineup>
<station prgSvcId="20014">
<chan effDate="2006-01-16" tier="1">002</chan>
</station>
<station prgSvcId="10722">
<chan effDate="2006-01-16" tier="1">003</chan>
</station>
</lineup>
<areasServed>
<area>
<community>Thorndale</community>
<county code="45331" size="D">Milam</county>
<state>TX</state>
</area>
<area>
<community>Thrall</community>
<county code="45491" size="B">Williamson</county>
<state>TX</state>
</area>
</areasServed>
</headend></lineups>';
$dom = new DOMDocument();
$dom->loadXML($string);
$xpath = new DOMXPath($dom);
$elements= $xpath->query('//lineups/headend/postalCodes/*[text()=78746]');
if ($elements !== false) { // query() returns false on an invalid expression, never null
foreach ($elements as $element) {
echo "<br/>[". $element->nodeName. "]";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
Output:
<br/>[postalCode]78746
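Since the goal is the whole parent <headend> rather than the matching <postalCode>, one option is to put the XPath predicate on <headend> itself and serialize each hit with saveXML(). This is a sketch that, like the example above, loads the entire document into DOM, so it only suits data that fits in memory. It assumes $pc contains no double quote (it is pasted into an XPath string literal); the function name headendsWithPostalCode is hypothetical.

```php
<?php
/**
 * Returns the outer XML of every <headend> listing $pc as a <postalCode>.
 * Hypothetical helper; loads the whole document, so only for small inputs.
 */
function headendsWithPostalCode(string $xml, string $pc): array
{
    // Assumption: $pc is a plain postal code. A double quote would break
    // the XPath string literal below, so reject it up front.
    if (strpos($pc, '"') !== false) {
        throw new InvalidArgumentException('Postal code must not contain a double quote');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    // The predicate sits on <headend>, so the query returns the parent
    // block instead of the matching <postalCode>.
    $query = sprintf('/lineups/headend[postalCodes/postalCode = "%s"]', $pc);

    $blocks = [];
    foreach ($xpath->query($query) as $headend) {
        $blocks[] = $dom->saveXML($headend); // outer XML of this node only
    }
    return $blocks;
}

// Usage with the $string from the example above:
// foreach (headendsWithPostalCode($string, '78746') as $block) { echo $block, "\n"; }
```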
Would it be something like if (count($nodes)) { echo $string; } instead of the foreach, or is there more to it? – mkaatman 2013-03-11 18:37:41
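Since the file is so large (possibly a gigabyte or more), I think the best way to tackle it is to go node by node with XMLReader. I can't load the whole file up front because it's too big. I don't want to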
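A minimal sketch of that node-by-node approach, using only the built-in XMLReader and SimpleXML extensions and no extra classes. It assumes, as in the sample, that the <headend> elements are siblings directly under <lineups>; the generator name matchingHeadends is hypothetical. Only the current <headend> is turned into a SimpleXMLElement, so memory use should stay roughly proportional to one block rather than to the whole file.

```php
<?php
/**
 * Yields the outer XML of every <headend> that lists $pc as a <postalCode>.
 * Hypothetical helper; assumes <headend> elements are siblings under the root.
 */
function matchingHeadends(XMLReader $reader, string $pc): Generator
{
    // Advance to the first <headend> start tag.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'headend') {
            break;
        }
    }

    while ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'headend') {
        // Only this one block is materialized.
        $outer = $reader->readOuterXml();
        $headend = simplexml_load_string($outer);

        foreach ($headend->postalCodes->postalCode as $code) {
            if ((string) $code === $pc) {
                yield $outer;
                break;
            }
        }

        // Skip the current subtree and move to the next sibling <headend>;
        // next() returns false once the end of the document is reached.
        if (!$reader->next('headend')) {
            break;
        }
    }
}

// Usage with the question's $pc and $xmlfile:
// $reader = new XMLReader();
// $reader->open($xmlfile);
// foreach (matchingHeadends($reader, $pc) as $xml) { echo $xml, "\n"; }
// $reader->close();
```

Using next('headend') instead of read() avoids walking through every child node of blocks that are already known not to match.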
For more flexibility with XMLReader, I normally create my own iterators that are able to work on the XMLReader object and provide the steps I need.
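That starts with a simple iteration over all nodes and goes up to iterating over elements (optionally with a specific name). Let's call the last one XMLElementIterator; it takes the reader and the element name as parameters.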
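The gist's classes are not reproduced here. As a rough, hypothetical stand-in for the idea of an element iterator that takes a reader and an optional element name, a PHP generator is enough. The function name xmlElements is an assumption, not part of the gist, and unlike a subtree-skipping iterator it also descends into nested elements.

```php
<?php
/**
 * Hypothetical stand-in for an element iterator: stops on every element
 * start tag, optionally only those whose local name equals $name.
 * Yields the reader itself, positioned on the element.
 */
function xmlElements(XMLReader $reader, ?string $name = null): Generator
{
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue; // skip text, whitespace, end tags, comments, ...
        }
        if ($name === null || $reader->localName === $name) {
            yield $reader;
        }
    }
}

// Usage: print the headendId attribute of each <headend>.
$reader = new XMLReader();
$reader->XML('<lineups><headend headendId="TX02217"/><headend headendId="X1"/></lineups>');
foreach (xmlElements($reader, 'headend') as $el) {
    echo $el->getAttribute('headendId'), "\n";
}
$reader->close();
```

Because the generator yields the live reader, anything done inside the loop (for example readOuterXml()) applies to the element the reader is currently on.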
In your case I would then create an iterator that returns a SimpleXMLElement for the current element and only takes the <headend> elements:
require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685
class HeadendIterator extends XMLElementIterator {
const ELEMENT_NAME = 'headend';
public function __construct(XMLReader $reader) {
parent::__construct($reader, self::ELEMENT_NAME);
}
/**
* @return SimpleXMLElement
*/
public function current() {
return simplexml_load_string($this->reader->readOuterXml());
}
}
With that iterator in place, the rest of the job is mostly a piece of cake. First, open the 10-gigabyte file:
$pc = "78746";
$xmlfile = '../data/lineups.xml';
$reader = new XMLReader();
$reader->open($xmlfile);
Then check whether the <headend> element contains the information and, if so, display the data/XML:
foreach (new HeadendIterator($reader) as $headend) {
/* @var $headend SimpleXMLElement */
if (!$headend->xpath("/*/postalCodes/postalCode[. = '$pc']")) {
continue;
}
echo 'Found, name: ', $headend->name, "\n";
echo "==========================================\n";
$headend->asXML('php://stdout');
}
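This does literally what you want to achieve: iterate over the large document (which is memory-friendly) until the element of interest is found, then process that concrete element, which is just XML. XMLReader::readOuterXml() is a handy tool for that here.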
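If DOM/XPath is preferred over SimpleXML for the matched block, XMLReader::expand() is another option: it returns a DOM copy of the current node, which can be imported into a fresh DOMDocument. The sketch assumes the reader is positioned on an element start tag; currentElementAsDom is a hypothetical helper name. Whether this beats readOuterXml() plus a re-parse depends on the data, so treat it as an alternative rather than an improvement.

```php
<?php
/**
 * Hypothetical helper: copies the element the reader is on into a new
 * DOMDocument, as that document's root element.
 */
function currentElementAsDom(XMLReader $reader): DOMDocument
{
    $node = $reader->expand();
    if ($node === false) {
        throw new RuntimeException('Could not expand the current node');
    }
    $doc = new DOMDocument();
    // Deep-copy the expanded node into the fresh document so DOMXPath
    // can query it as the document element.
    $doc->appendChild($doc->importNode($node, true));
    return $doc;
}

// Usage: report <headend> elements listing 78746, without readOuterXml().
$reader = new XMLReader();
$reader->XML('<lineups><headend headendId="TX02217"><postalCodes>'
    . '<postalCode>78746</postalCode></postalCodes></headend></lineups>');
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'headend') {
        $xpath = new DOMXPath(currentElementAsDom($reader));
        if ($xpath->evaluate('count(/headend/postalCodes/postalCode[. = "78746"])') > 0) {
            echo 'Found ', $xpath->evaluate('string(/headend/@headendId)'), "\n";
        }
    }
}
$reader->close();
```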
Example output:
Found, name: Grande Gables at The Terrace
==========================================
<?xml version="1.0"?>
<headend headendId="TX02217">
<name>Grande Gables at The Terrace</name>
<mso msoId="17541">Grande Communications</mso>
<marketIds>
<marketId type="DMA">635</marketId>
</marketIds>
<postalCodes>
<postalCode>11111</postalCode>
<postalCode>22222</postalCode>
<postalCode>33333</postalCode>
<postalCode>78746</postalCode>
</postalCodes>
<location>Austin</location>
<lineup>
<station prgSvcId="20014">
<chan effDate="2006-01-16" tier="1">002</chan>
</station>
<station prgSvcId="10722">
<chan effDate="2006-01-16" tier="1">003</chan>
</station>
</lineup>
<areasServed>
<area>
<community>Thorndale</community>
<county code="45331" size="D">Milam</county>
<state>TX</state>
</area>
<area>
<community>Thrall</community>
<county code="45491" size="B">Williamson</county>
<state>TX</state>
</area>
</areasServed>
</headend>
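I think you nailed it. This is exactly what I want to do. However, I'm not that familiar with PHP and can't follow your example. Could you simplify it? If you don't have time, I'll keep trying to understand it. Thanks for the reply! – user1129107 2013-03-12 03:08:30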
I copied your example. In the main PHP file I have include('iterator.php'); however, I get the following error: Fatal error: Class 'XMLElementIterator' not found in iterator.php – user1129107 2013-03-12 18:44:17
How can I use the parent 'XMLElementIterator' class without creating a new class? – 2013-08-26 17:03:45
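What are you actually looking for in that block of XML? XPath is your friend. Are you just trying to see whether it contains a predetermined value? – mkaatman 2013-03-11 18:15:28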
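Kind of. If I search this large file and a block contains the predetermined postal code, then I basically want to return that block. It would cut this huge file down to about 2% of its size. I'd still be returning XML, but the amount I'd have to reference would be drastically reduced. – user1129107 2013-03-11 18:21:14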