How to get all possible directory combinations from a URL
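Question:
I'm having a hard time working out an algorithm to detect repeated directory patterns in a list of URLs. Can anyone suggest an approach? I'm fairly sure it will need a recursive call, but I can't decide how to keep a record for each possible pattern.
Note: this is in PHP.
Let's say you have some URLs: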
1. http://www.goodfood.com/recipes/special_occasion/desserts/pie/chocolate-pie.html
2. http://www.goodfood.com/recipes/special_occasion/desserts/pie/cherry-pie.html
3. http://www.goodfood.com/recipes/special_occasion/apps/chex-mix.html
4. http://www.goodfood.com/recipes/special_occasion/soup/tomato.html
5. http://www.goodfood.com/special/special_occasion/soup/beef-stew.html
6. http://www.goodfood.com/special/special_occasion/soup/vegetable.html
I want a way to find every possible directory pattern that occurs in more than one URL. The result would look something like this:
'recipes/special_occasion' is found in urls 1, 2, 3 and 4.
'recipes/special_occasion/desserts' is found in urls 1, and 2.
'recipes/special_occasion/desserts/pie' is found in urls 1, and 2.
'special_occasion/desserts/pie' is found in urls 1, and 2.
'desserts/pie' is found in urls 1, and 2.
'special_occasion/desserts' is found in urls 1, and 2.
'special/special_occasion' is found in urls 5, and 6.
'special/special_occasion/soup' is found in urls 5, and 6.
'special_occasion/soup' is found in urls 5, and 6.
My idea is to go through each URL, pull out every possible new pattern, and store it in an array. So far, I have:
$commonDomains = array();
foreach ($query AS $row) {
    $urlPath = parse_url($row['href'], PHP_URL_PATH);
    echo "$urlPath<br/>";
    $urlChunks = explode('/', $urlPath);
    //var_dump($urlChunks);
    foreach ($urlChunks AS $domain) {
        if (strlen($domain) > 0) {
            $thisDomain = $domain.'/';
            $commonDomains[$thisDomain][] = $row['id'];
        }
    }
    var_dump($commonDomains);
}
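Has anyone run into this before? It screams "pattern" to me, but I can't find an answer online, and everything I come up with is very complicated. Please help, thanks.
I have an example of what I'm working on: http://phpfiddle.org/main/code/kn4-zyh
Here are my results so far: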
/recipes/special_occasion/desserts/pie/grandmas-chocolate-pie.html
array(5) { [0]=> string(7) "recipes" [1]=> string(16) "special_occasion" [2]=> string(8) "desserts" [3]=> string(3) "pie" [4]=> string(27) "grandmas-chocolate-pie.html" }
0 : 4 : recipes/special_occasion/desserts/pie/grandmas-chocolate-pie.html
0 : 3 : recipes/special_occasion/desserts/pie
0 : 2 : recipes/special_occasion/desserts
0 : 1 : recipes/special_occasion
1 : 4 : special_occasion/desserts/pie/grandmas-chocolate-pie.html
2 : 4 : desserts/pie/grandmas-chocolate-pie.html
3 : 4 : pie/grandmas-chocolate-pie.html
0 : 4 : recipes/special_occasion/desserts/pie/grandmas-chocolate-pie.html
1 : 3 : special_occasion/desserts/pie
I'm missing:
2 : 3 : special_occasion/desserts
1 : 2 : recipes/special_occasion
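For what it's worth, the rows listed as missing are windows that neither start at the first chunk nor end at the last one, so a pair of nested loops over every start position and every length should cover them. Below is a minimal sketch under a few assumptions: `pathDirs` and `commonPatterns` are hypothetical helper names, the last chunk is treated as a file name when it contains a dot, and only runs of two or more directories are recorded, to match the expected results listed earlier. It needs PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: split a URL path into directory names, dropping the file name.
function pathDirs(string $url): array
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    // Assumption: the last segment is a file name if it contains a dot.
    if ($segments && strpos(end($segments), '.') !== false) {
        array_pop($segments);
    }
    return $segments;
}

// Map every contiguous run of $minLen or more directories to the IDs of the
// URLs that contain it, then keep only runs shared by more than one URL.
function commonPatterns(array $urls, int $minLen = 2): array
{
    $index = array();
    foreach ($urls as $id => $url) {
        $dirs = pathDirs($url);
        $count = count($dirs);
        for ($start = 0; $start < $count; $start++) {
            for ($len = $minLen; $start + $len <= $count; $len++) {
                $pattern = implode('/', array_slice($dirs, $start, $len));
                $index[$pattern][$id] = $id; // keyed by ID so each URL counts once
            }
        }
    }
    $shared = array_filter($index, function ($ids) {
        return count($ids) > 1;
    });
    return array_map('array_values', $shared);
}

$urls = array(
    1 => 'http://www.goodfood.com/recipes/special_occasion/desserts/pie/chocolate-pie.html',
    2 => 'http://www.goodfood.com/recipes/special_occasion/desserts/pie/cherry-pie.html',
    3 => 'http://www.goodfood.com/recipes/special_occasion/apps/chex-mix.html',
    4 => 'http://www.goodfood.com/recipes/special_occasion/soup/tomato.html',
    5 => 'http://www.goodfood.com/special/special_occasion/soup/beef-stew.html',
    6 => 'http://www.goodfood.com/special/special_occasion/soup/vegetable.html',
);

$shared = commonPatterns($urls);
foreach ($shared as $pattern => $ids) {
    echo "'$pattern' is found in urls " . implode(', ', $ids) . "\n";
}
```

On the six sample URLs this gives nine shared patterns. Note that it reports 'special_occasion/soup' for URLs 4, 5 and 6, because that run also occurs in the path of URL 4.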
Answer
An example that searches for a single directory:
$links = array(
    'http://www.goodfood.com/recipes/special_occasion/desserts/pie/chocolate-pie.html',
    'http://www.goodfood.com/recipes/special_occasion/desserts/pie/cherry-pie.html',
    'http://www.goodfood.com/recipes/special_occasion/apps/chex-mix.html',
    'http://www.goodfood.com/recipes/special_occasion/soup/tomato.html',
    'http://www.goodfood.com/special/special_occasion/soup/beef-stew.html',
    'http://www.goodfood.com/special/special_occasion/soup/vegetable.html',
);

// Split each URL path into its directory names, skipping empty chunks and the .html file name.
$dirs = array();
foreach ($links as $key => $link) {
    $urlPath = parse_url($link, PHP_URL_PATH);
    $arrayUrlPath = explode('/', $urlPath);
    $dirs[$key] = array();
    $counter = 0;
    foreach ($arrayUrlPath as $dir) {
        // Compare against '' rather than using empty(), so a directory named "0" is kept.
        if ($dir === '' || substr($dir, -5) === '.html') {
            continue;
        }
        $dirs[$key][$counter++] = $dir;
    }
}

// For every directory of every URL, list the other URLs that contain it.
$searchDirs = $dirs;
foreach ($searchDirs as $key => $dir) {
    foreach ($dir as $name) {
        echo 'dir: ' . $name . ', found in: ' . search($name, $key, $dirs) . "\n";
    }
}

// Return the 1-based numbers of all URLs, except $excludeKey, whose directories include $name.
function search($name, $excludeKey, $dirs)
{
    $return = array();
    foreach ($dirs as $key => $dir) {
        if ($key === $excludeKey) {
            continue;
        }
        if (in_array($name, $dir)) {
            $return[] = (int)$key + 1;
        }
    }
    return join(', ', $return);
}
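If you want to search for longer strings, rework the search function: explode $name and compare its parts against the directories stored under each $key. For example, if the dir is aaa/bbb/ccc, check that index 0 is aaa, index 1 is bbb and index 2 is ccc; if they don't match, move the pointer to index+1 and check again.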
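The check described above could be sketched roughly like this. The names `containsRun` and `searchPath` are made up for illustration (they are not part of the code above), and `$dirs` is assumed to have the same shape as the `$dirs` array built earlier: one list of directory names per URL. It needs PHP 7 or later for the type declarations.

```php
<?php
// Return true if $needle occurs as a contiguous run inside $haystack.
function containsRun(array $needle, array $haystack): bool
{
    $n = count($needle);
    if ($n === 0) {
        return false;
    }
    // Slide a window of $n items over $haystack, one index at a time.
    for ($i = 0; $i + $n <= count($haystack); $i++) {
        if (array_slice($haystack, $i, $n) === $needle) {
            return true;
        }
    }
    return false;
}

// Multi-directory variant of search(): $name may be a path such as 'aaa/bbb/ccc'.
function searchPath(string $name, $excludeKey, array $dirs): string
{
    $needle = explode('/', trim($name, '/'));
    $found = array();
    foreach ($dirs as $key => $dir) {
        if ($key === $excludeKey) {
            continue;
        }
        if (containsRun($needle, $dir)) {
            $found[] = (int) $key + 1; // 1-based URL number, as in search()
        }
    }
    return implode(', ', $found);
}

// Directory lists for the six sample URLs, in the same order as $links.
$dirs = array(
    array('recipes', 'special_occasion', 'desserts', 'pie'),
    array('recipes', 'special_occasion', 'desserts', 'pie'),
    array('recipes', 'special_occasion', 'apps'),
    array('recipes', 'special_occasion', 'soup'),
    array('special', 'special_occasion', 'soup'),
    array('special', 'special_occasion', 'soup'),
);

echo searchPath('special_occasion/desserts', 0, $dirs) . "\n"; // other URLs besides the first
echo searchPath('special_occasion/soup', null, $dirs) . "\n";  // null excludes nothing
```

Comparing slices with === keeps the window check short; for very long paths a manual element-by-element loop would avoid the temporary arrays, but for URL paths that should rarely matter.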
I hope I was able to help.
I admit this is a tough task :) – 2013-03-17 01:58:03