Get all occurrences of a substring in a string

Question:

I want to do a simple thing: extract certain specific parts of code from a string (namely, an HTML file).

For example:

//Get a string from a website:
$homepage = file_get_contents('http://mywebsite.org');

//Then, search for a particular substring between two strings:
echo magic_substr($homepage, "<script language", "</script>");

//where magic_substr is this function (found on this awesome website):
function magic_substr($haystack, $start, $end) {

    // Position just after the first $start (0 if $start is not found)
    $index_start = strpos($haystack, $start);
    $index_start = ($index_start === false) ? 0 : $index_start + strlen($start);

    // First $end after that position; if there is none, take the rest of the string
    $index_end = strpos($haystack, $end, $index_start);
    $length = ($index_end === false) ? strlen($haystack) - $index_start : $index_end - $index_start;

    return substr($haystack, $index_start, $length);
}

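In this case, the output I want is all the scripts on the page, but I only get the first one. I think that is expected, since there is no recursion (or loop) in the function. But I don't know what the best way to do this is! Any suggestions?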

A puppy dies horribly every time you don't use a DOM parser (http://php.net/manual/en/book.dom.php) to find things in an HTML document. – moonwave99

Hi, I tried Simple HTML DOM Parser and ran into trouble with "max_nested_level"... so I switched to this approach :) – alessandronos

What was the problem with max_nested_level? I believe PHP Simple HTML DOM Parser can do this job. – raygo
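A minimal sketch of the DOM-parser route suggested above, using PHP's built-in DOMDocument (the extension behind the linked manual page) rather than Simple HTML DOM Parser. The function name `script_contents` and the sample HTML are illustrative assumptions; it returns the text of every `<script>` element, including an empty string for external scripts:

```php
<?php
// Sketch: collect the text of every <script> element with PHP's DOM extension.
// script_contents() is a hypothetical helper name.
function script_contents(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);        // restore the previous error mode

    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $node) {
        $scripts[] = $node->textContent;      // text between <script ...> and </script>
    }
    return $scripts;
}

$html = '<html><head><script language="javascript">var a = 1;</script></head>'
      . '<body><script src="x.js"></script><script>var b = 2;</script></body></html>';
print_r(script_contents($html));
```

Because the parser understands tags, this keeps working when attributes change order or scripts span several lines, which string searching does not guarantee.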

Answer

I like the Prototype/jQuery-style way of picking elements out of the DOM tree.

Try one of the libraries from "jQuery-like interface for PHP". I haven't tried any of them in PHP myself.

Edit:

To get valid HTML/XML, try Tidy, HTML Purifier or htmLawed.

I'll definitely try it! Thanks! – alessandronos
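For selector-style queries without an extra library, PHP's built-in DOMXPath is a close stand-in for the jQuery-like interfaces mentioned above (this is a swapped-in technique, not one of the libraries the answer links to). A sketch under that assumption, selecting only `<script>` tags that carry a `language` attribute, which is the DOM counterpart of searching for `"<script language"`; `scripts_with_language` is a hypothetical name:

```php
<?php
// Sketch: select <script> elements that have a language attribute via XPath.
// scripts_with_language() is a hypothetical helper name.
function scripts_with_language(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate malformed markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//script[@language]') as $node) {
        $result[] = [
            'language' => $node->getAttribute('language'),
            'code'     => $node->textContent,
        ];
    }
    return $result;
}

$html = '<script language="javascript">f();</script><script>g();</script>';
print_r(scripts_with_language($html));
```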

Answer

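Try this to extract the data between any given tags or strings. In your case:
extractor($homepage, "<script language", "</script>");
(The script tags did not display correctly in my post; use the same strings you defined in your example.)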

/*****************************************************************/
/* string extractor($str,$from,$to)                              */
/* return the text between the first $from and the next $to,    */
/* excluding $from and $to themselves; "" if either is missing.  */
/*****************************************************************/

function extractor($str,$from,$to)
{
    $from_pos = strpos($str,$from);
    if ($from_pos === false) {
        return ""; // $from does not occur
    }
    $from_pos = $from_pos + strlen($from);
    $to_pos = strpos($str,$to,$from_pos); // $to must come after $from
    if ($to_pos === false) {
        return ""; // no closing $to after $from
    }
    return substr($str,$from_pos,$to_pos-$from_pos);
}

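This is the same as "my" function :D I only get the first string between the $from string and the $to string... In my case there should be 19 matches of this kind. I know the HTML structure of the specific file I want to "parse", and I'm sure the "from" and "to" strings are always the same. – alessandronos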

OK, I'm posting a second answer; it returns an array of all occurrences –

Posted it now, check it out; it's at the bottom of the page –
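For comparison, the smallest change that makes magic_substr()/extractor() return every match is to loop and pass an offset to strpos(), so each search resumes right after the previous `$end`. A minimal sketch; `all_substr` is a hypothetical name and the sample HTML is made up:

```php
<?php
// Sketch: collect every substring between $start and the next $end,
// scanning left to right with strpos() offsets. all_substr() is hypothetical.
function all_substr(string $haystack, string $start, string $end): array
{
    $results = [];
    $offset = 0;
    while (($index_start = strpos($haystack, $start, $offset)) !== false) {
        $index_start += strlen($start);            // skip past $start itself
        $index_end = strpos($haystack, $end, $index_start);
        if ($index_end === false) {
            break;                                 // unterminated match: stop
        }
        $results[] = substr($haystack, $index_start, $index_end - $index_start);
        $offset = $index_end + strlen($end);       // resume after this $end
    }
    return $results;
}

$html = '<p>x</p><script language="js">a();</script><script language="js">b();</script>';
print_r(all_substr($html, '<script language="js">', '</script>'));
```

Unlike the substr()-and-repeat approach, this never copies the remaining string; it only moves the offset forward.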

Answer

/****************************************************************/
/* array getSelectiveContent($content,$from,$to,$exclude="")    */
/* return an array of all (trimmed, non-empty) pieces of        */
/* content found between each $from and the following $to,      */
/* skipping pieces that contain $exclude (if given).            */
/****************************************************************/

function getSelectiveContent($content,$from,$to,$exclude="")
{
    $return = array(); // array of elements to return
    $size_FROM = strlen($from);
    $size_TO = strlen($to);
    while(true)
    {
        $pos = strpos($content,$from); // find the first occurrence of $from
        if($pos === false)
        {
            break; // no more occurrences: stop the loop
        }
        $element = extractor($content,$from,$to); // fetch the first element
        $trimmed = trim($element);
        // keep non-empty elements that do not contain $exclude
        if($trimmed !== "" && ($exclude === "" || strpos($element,$exclude) === false))
        {
            $return[] = $trimmed; // put the fetched content in the array
        }
        // remove everything up to and including this $from ... $to pair
        $content = substr($content,$pos+$size_FROM+strlen($element)+$size_TO);
    }
    return $return;
}

It works fine now! Thanks!! – alessandronos

Then please mark it as accepted –
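The same idea can also be sketched with a single regular expression (a swapped-in technique, not the answerer's code). preg_quote() escapes the delimiters, `(.*?)` is non-greedy so each match stops at the nearest `$to`, and the `s` modifier lets `.` match newlines. Unlike getSelectiveContent() it neither trims nor drops empty matches and has no `$exclude` parameter; `between_all` is a hypothetical name:

```php
<?php
// Sketch: all text between each $from and the following $to, via preg_match_all().
// between_all() is a hypothetical helper name.
function between_all(string $content, string $from, string $to): array
{
    $pattern = '/' . preg_quote($from, '/') . '(.*?)' . preg_quote($to, '/') . '/s';
    preg_match_all($pattern, $content, $matches);
    return $matches[1]; // capture group 1 = text between $from and $to
}

$html = "<script language=\"js\">\na();\n</script><script language=\"js\">b();</script>";
print_r(between_all($html, '<script language="js">', '</script>'));
```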

Answer

$text="this is an example of text extract in from very long long text this is my test of the php";
$start="this";
$end="of";
$len1=strlen($start);
$len2=strlen($end);
$temp=$text;
// keep extracting while $start is still present
while (($pos1=strpos($temp,$start)) !== false) {
    // look for $end only after the end of $start
    $pos2=strpos($temp,$end,$pos1+$len1);
    if ($pos2 === false) {
        break; // no closing $end left
    }
    $subs=substr($temp,$pos1+$len1,$pos2-($pos1+$len1));
    echo $subs.'<br/>';
    // continue searching after this $end
    $temp=substr($temp,$pos2+$len2);
}

It seems to "output" anything :) – alessandronos
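The same extraction can also be sketched with explode() instead of a strpos()/substr() loop (a swapped-in technique; `between_by_explode` is a hypothetical name). Every piece after the first begins right after an occurrence of `$start`, so the text before the first `$end` in that piece is one match. `$start` must be non-empty. One behavioral difference: if `$start` appears again before the next `$end`, this version starts a new match at the later `$start`, whereas a plain strpos()/substr() loop keeps the repeated `$start` inside the match.

```php
<?php
// Sketch: split on $start, then cut each piece at its first $end.
// between_by_explode() is a hypothetical helper name.
function between_by_explode(string $text, string $start, string $end): array
{
    $pieces = explode($start, $text);
    array_shift($pieces);               // discard the text before the first $start
    $found = [];
    foreach ($pieces as $piece) {
        $pos = strpos($piece, $end);
        if ($pos !== false) {           // skip pieces with no closing $end
            $found[] = substr($piece, 0, $pos);
        }
    }
    return $found;
}

$text = "this is an example of text extract in from very long long text this is my test of the php";
foreach (between_by_explode($text, "this", "of") as $subs) {
    echo $subs . '<br/>';
}
```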

