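PHP: Remove all hyperlinks to a specific domain from text
Problem description:
Remove all hyperlinks that point to mydomain.com and keep all other hyperlinks that do not belong to this domain.
For every remaining URL, take the value between the tags and set it as the link's id.
1. About the first task:
I have this: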
$str = 'I have been searching <a href="http://www.google.com">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com">Yahoo</a> and I finally, ended up finding it at
<font size="1">My Site <a style="color:#0000ff;font-family:Arial,Helvetica,sans-serif" href="http://www.mydomain.com/go.php?offer=fine&pid=10" target="_blank" >My Link</a></font>. So you can visit <a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>';
I want this:
$str = 'I have been searching <a href="http://www.google.com">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com">Yahoo</a> and I finally, ended up finding it at . So you can visit ';
What I tried:
I tried the preg_replace below, but it removes all the links. I only want it to remove the links to mydomain.com and keep everything else.
$pattern = "/<a[^>]*>(.*)<\/a>/iU";
$final_str = preg_replace($pattern, "$1", $str);
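A minimal sketch of why the pattern above strips every link: it never inspects href, so it matches any <a> element. Requiring the href to start with the domain narrows the match. The shortened input string and the variable names $all and $onlyMine are illustrative, not part of the original code.

```php
<?php
// Illustrative input: one external link and one mydomain.com link.
$str = 'See <a href="http://www.google.com">Google</a> or '
     . '<a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>.';

// The attempted pattern: matches every <a>...</a> pair, whatever its href is,
// and keeps only the link text.
$all = preg_replace('/<a[^>]*>(.*)<\/a>/iU', '$1', $str);

// Narrowed pattern: only anchors whose href starts with the domain match,
// and the whole element (including its text) is removed.
$onlyMine = preg_replace(
    '~<a\s[^>]*href="http://www\.mydomain\.com[^"]*"[^>]*>.*?</a>~is',
    '',
    $str
);

echo $all, "\n";
echo $onlyMine, "\n";
```

Because [^>]* cannot cross the closing > of the opening tag, the href check stays inside a single <a ...> tag and cannot accidentally pick up the href of a later link.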
2. About the second task:
In the end, I want to end up with this:
$str = 'I have been searching <a href="http://www.google.com" id="Google">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com" id="Yahoo">Yahoo</a> and I finally, ended up finding it at . So you can visit ';
Answer
This should do the trick in 2 steps:
<?php
$str = 'I have been searching <a href="http://www.google.com">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com">Yahoo</a> and I finally, ended up finding it at <font size="1">My Site <a style="color:#0000ff;font-family:Arial,Helvetica,sans-serif" href="http://www.mydomain.com/go.php?offer=fine&pid=10" target="_blank" >My Link</a></font>. So you can visit <a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>';
// removing the domain links
$pattern1 = '|<a [^>]*href="http://www\.mydomain\.com[^"]*"[^>]*>.*</a>|iU';
$str = preg_replace($pattern1, '', $str);
// adding IDs
$pattern2 = '|(<a [^>]+)>(.*)</a>|iU';
$str = preg_replace($pattern2, '$1 id="$2">$2</a>', $str);
Let me know if you also need to get rid of the <font size="1">My Site </font> part.
Answer to both questions: http://php.net/manual/en/class.domdocument.php – PeeHaa 2012-03-25 00:34:01
Don't try to parse HTML with regular expressions. You will (/are) fail(ing). – PeeHaa 2012-03-25 00:34:51
Obligatory reference: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – 2012-03-25 00:52:20
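The comments above point to DOMDocument rather than regular expressions. A minimal sketch of that approach for both tasks might look like the following. The function name cleanLinks, its parameters, and the wrapper <div> are assumptions made for illustration; nobody in the thread posted this code.

```php
<?php
// Hypothetical helper: removes links to $domain and gives every other link
// an id taken from its text.
function cleanLinks(string $html, string $domain): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup such as a bare "&"
    // Wrap the fragment in a <div> so it has a single, easy-to-find root.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list to an array so removals do not disturb iteration.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        $pattern = '/(^|\.)' . preg_quote($domain, '/') . '$/i';
        if (is_string($host) && preg_match($pattern, $host)) {
            $a->parentNode->removeChild($a);                // task 1: drop the link
        } else {
            $a->setAttribute('id', trim($a->textContent));  // task 2: id from link text
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = 'I tried <a href="http://www.google.com">Google</a> and '
     . '<a href="http://www.yahoo.com">Yahoo</a>, then found it at '
     . '<font size="1">My Site <a href="http://www.mydomain.com/go.php?offer=fine&pid=10" '
     . 'target="_blank">My Link</a></font>.';

echo cleanLinks($str, 'mydomain.com'), "\n";
```

Comparing the parsed host instead of the raw href means attribute order, quoting style, and subdomains such as www do not matter. Link text that contains spaces or repeats would yield ids that are not valid or not unique; this sketch leaves that as in the question.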