Scraping all images from a website using DOMDocument

Question:

I basically want to get all the images from any website using DOMDocument. But I can't even load my HTML, for some reason I don't understand.

$url="http://<any_url_here>/"; 
$dom = new DOMDocument(); 
@$dom->loadHTML($url); //i have also tried removing @ 
$dom->preserveWhiteSpace = false; 
$dom->saveHTML(); 
$images = $dom->getElementsByTagName('img'); 
foreach ($images as $image) 
{ 
    echo $image->getAttribute('src'); 
}

What happens is that nothing is printed. Am I doing something wrong in my code?

The reason you don't get an error message is probably this line: '@$dom->loadHTML($url);'. In PHP, '@' suppresses all error messages from that expression. – 2013-04-09 07:32:10

I already removed it, but still no result... – Leonid 2013-04-09 07:34:07

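You don't get a result because '$dom->loadHTML()' expects HTML. You are giving it a URL; you first need to get the HTML of the page you want to parse. You can use 'file_get_contents()' for that (see the answer). – 2013-04-09 07:36:13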
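A variant not used in this thread: DOMDocument also has loadHTMLFile(), which takes a file path (or a URL, when allow_url_fopen is enabled) and reads the document itself. The sketch below assumes that route; the function name imageSourcesFromFile is hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: let DOMDocument read the document itself.
// loadHTMLFile() accepts a local path, or a URL if allow_url_fopen is on.
function imageSourcesFromFile(string $pathOrUrl): array
{
    $dom = new DOMDocument();
    if ($dom->loadHTMLFile($pathOrUrl) === false) {
        return []; // could not read or parse the source
    }

    // Collect the src attribute of every <img> element.
    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        $sources[] = $image->getAttribute('src');
    }
    return $sources;
}
```

Usage would look like 'print_r(imageSourcesFromFile("http://www.google.com/"));'. Fetching with file_get_contents() first, as in the answer, keeps the download step separate, which makes it easier to check for a failed request before parsing.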

Answer

You don't get a result because $dom->loadHTML() expects HTML. You are giving it a URL; you first need to get the HTML of the page you want to parse. You can use file_get_contents() for that.

I use this in my image-scraping class, and it works fine for me.

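$html = file_get_contents('http://www.google.com/'); // download the page first
if ($html === false) {
    die('Could not download the page');
}
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading to take effect
$dom->loadHTML($html);            // parse the HTML string, not the URL
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src'), PHP_EOL;
}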
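To check the parsing step on its own, without any network request, the same DOM calls can be run on an inline HTML string. This is a minimal sketch; imageSourcesFromHtml and the sample markup are hypothetical. The empty-string guard matters because loadHTML() rejects an empty string, which is what a failed download can turn into.

```php
<?php
// Hypothetical helper: extract all <img> src values from an HTML string.
function imageSourcesFromHtml(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() does not accept an empty string
    }

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        $sources[] = $image->getAttribute('src');
    }
    return $sources;
}

$html = '<html><body><img src="/logo.png"><p><img src="photo.jpg" alt="x"></p></body></html>';
print_r(imageSourcesFromHtml($html)); // prints both src values in document order
```

Note that getAttribute('src') returns the attribute exactly as written, so relative paths such as "/logo.png" still have to be combined with the page's URL if absolute links are needed.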

Now I'm getting an 'Attribute class redefined in Entity' error. '$dom = new DOMDocument; $htmls = file_get_contents("http://philcooke.com/inspiration-happens-but-the-best-ideas-take-time/"); $dom->loadHTML($htmls);' – Leonid 2013-04-09 08:34:30

Your answer is almost correct. Just add an '@' character before '$dom->loadHTML($html)'. – Leonid 2013-04-09 08:40:17

Instead of prepending '@' to '$dom->loadHTML($html)' to suppress the errors, you can clean up the HTML with Tidy first: '$tidy = tidy_parse_string($html); $html = $tidy->html()->value;'. But maybe that is overkill. – 2013-11-28 08:09:01
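Another option, not mentioned in the thread: libxml_use_internal_errors(true) makes libxml queue parser warnings instead of emitting them, so they can be inspected with libxml_get_errors() rather than hidden by '@'. This is a sketch under that assumption; parseWithCollectedErrors and the sample markup are hypothetical, and the exact warning text depends on the libxml2 version.

```php
<?php
// Hypothetical helper: parse HTML while collecting libxml warnings
// instead of printing them (or silencing them with "@").
function parseWithCollectedErrors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // queue errors internally

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries a message and a line number, among others.
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();                  // empty the queue
    libxml_use_internal_errors($previous);  // restore the earlier setting

    return [$dom, $messages];
}

// A duplicated attribute, as in class="a" class="b", is the kind of markup
// that older libxml2 versions report as "Attribute class redefined".
[$dom, $messages] = parseWithCollectedErrors(
    '<html><body><img class="a" class="b" src="banner.png"></body></html>'
);
foreach ($dom->getElementsByTagName('img') as $image) {
    echo $image->getAttribute('src'), PHP_EOL;
}
foreach ($messages as $message) {
    echo 'libxml: ', $message, PHP_EOL;
}
```

Unlike '@', this keeps the warnings available for logging while the image list is still extracted from the malformed page.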

