How to get the text of an <a> tag using cURL?

Question:

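I'm getting the error "Fatal error: Call to undefined method DOMText::getAttribute()" with this code. I want to capture the link's text rather than its URL (the href; I wasn't sure what it's called). Could someone explain my mistake, or show me a different way to do this? Here is my code: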

<?php 

$target_url = "SITE I WANT"; 
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; 

// make the cURL request to $target_url 
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); 
curl_setopt($ch, CURLOPT_URL,$target_url); 
curl_setopt($ch, CURLOPT_FAILONERROR, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($ch, CURLOPT_AUTOREFERER, true); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); 
curl_setopt($ch, CURLOPT_TIMEOUT, 10); 
$html= curl_exec($ch); 
if (!$html) { 
    echo "<br />cURL error number:" .curl_errno($ch); 
    echo "<br />cURL error:" . curl_error($ch); 
    exit; 
} 

// parse the html into a DOMDocument 
$dom = new DOMDocument(); 
@$dom->loadHTML($html); 

// grab the text nodes of all the <a> tags on the page 
$xpath = new DOMXPath($dom); 
$hrefs = $xpath->evaluate("/html/body//a/text()"); 

for ($i = 0; $i < $hrefs->length; $i++) { 
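    $href = $hrefs->item($i); 
    // fails here: text() returns DOMText nodes, which have no getAttribute()
    $url = $href->getAttribute('href'); 
    storeLink($url,$target_url); // storeLink() is the asker's own function (not shown)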
    echo "<br />Link stored: $url"; 
} 
$id = "12";
$query = "DELETE FROM links WHERE id<=$id";
if (!mysql_query($query)) {
    echo "DELETE failed: $query<br />" .
        mysql_error() . "<br /><br />";
}
?>
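The error comes from the XPath expression: /html/body//a/text() selects the text nodes inside each <a>, and a DOMText node has no getAttribute() method. A minimal sketch of one way to keep that expression, assuming a hard-coded sample $html in place of the cURL response: read the link text from nodeValue, and step up to the enclosing <a> element through parentNode before calling getAttribute().

```php
<?php
// Sketch: keep "/html/body//a/text()", which returns DOMText nodes.
// The text lives in ->nodeValue; the <a> element is the text node's ->parentNode.
// $html is a hard-coded sample standing in for the cURL response.
$html = '<html><body>'
      . '<a href="/one">First link</a>'
      . '<p><a href="/two">Second link</a></p>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$textNodes = $xpath->evaluate('/html/body//a/text()');

$links = [];
foreach ($textNodes as $textNode) {
    $anchor = $textNode->parentNode; // the DOMElement for the enclosing <a>
    $links[] = [
        'text' => $textNode->nodeValue,
        'href' => $anchor instanceof DOMElement ? $anchor->getAttribute('href') : '',
    ];
}

foreach ($links as $link) {
    echo $link['text'] . ' : ' . $link['href'] . "\n";
}
```

One caveat: text() only returns the direct text children of each <a>, so a link such as <a href="/x"><b>Bold</b> text</a> yields just " text". Selecting the <a> elements themselves avoids that.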

Check the contents of $hrefs. Maybe you should use '/html/body//a' and then retrieve the text of each element. – MatRt

Could you show me the code for how I would do that? I'm fairly new to all of this. –

Take a look at @Adidi's answer; he/she has coded exactly what I just described in my comment. – MatRt
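MatRt's first suggestion, checking what $hrefs actually contains, can be done by printing the class of each returned node. A minimal sketch, assuming a small hard-coded $html sample: the text() query yields DOMText nodes, while the plain //a query yields DOMElement nodes, which do have getAttribute().

```php
<?php
// Sketch: inspect which node classes each XPath expression returns.
$html = '<html><body><a href="/one">First link</a></body></html>'; // sample input

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$classes = [];
foreach (['/html/body//a/text()', '/html/body//a'] as $expr) {
    foreach ($xpath->query($expr) as $node) {
        $classes[$expr][] = get_class($node); // e.g. DOMText or DOMElement
    }
}
print_r($classes);
```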

Answer

Here you go:

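$document = new DOMDocument(); 
libxml_use_internal_errors(true); // don't print warnings for malformed HTML
$document->loadHTML($html); 
$selector = new DOMXPath($document); 
// select the <a> elements themselves, not their text() children
$anchors = $selector->query('/html/body//a'); 

foreach($anchors as $a) { 
    $text = $a->nodeValue;             // the link text
    $href = $a->getAttribute('href');  // the link URL
    echo($text . ' : ' . $href . '<br />'); 
}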
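The same approach can be wrapped in a reusable function that returns the links instead of echoing them, for example so each one can be passed to the asker's storeLink(). This is a sketch, not part of the original answer: extractLinks() is a hypothetical helper name, and the sample $html is assumed. It uses textContent, which collects all text inside the <a> (including nested tags such as <b>), and relies on DOMElement::getAttribute() returning an empty string when the attribute is missing.

```php
<?php
// Sketch: the answer's approach as a reusable function.
// extractLinks() is a hypothetical helper, not a PHP built-in.
function extractLinks(string $html): array
{
    $document = new DOMDocument();
    libxml_use_internal_errors(true); // don't echo warnings for messy HTML
    $document->loadHTML($html);
    libxml_clear_errors();

    $selector = new DOMXPath($document);
    $links = [];
    foreach ($selector->query('/html/body//a') as $a) {
        $links[] = [
            'text' => trim($a->textContent),    // all text inside <a>, nested tags included
            'href' => $a->getAttribute('href'), // '' if the <a> has no href
        ];
    }
    return $links;
}

// Sample input standing in for the cURL response.
$html = '<html><body>'
      . '<a href="/one">First link</a>'
      . '<a href="/two"><b>Bold</b> link</a>'
      . '<a name="anchor-only">No href</a>'
      . '</body></html>';

foreach (extractLinks($html) as $link) {
    echo $link['text'] . ' : ' . $link['href'] . "\n";
}
```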

Thank you very much, sir! –
