Scraping with curl and cookies
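Today I was given a task: scrape the Dianping site, https://www.dianping.com. Using PHP curl, I ran into two problems.
1. curl's settings for HTTPS. This one is easy to solve, although note that it turns off certificate verification: curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
2. After that, the scrape was still being redirected by dianping.com to a different URL. Analysis and troubleshooting showed that the site expects requests to carry cookies. See the screenshot.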
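3. Saving dianping.com's cookies from PHP's curl failed. Running the curl command-line tool on Linux instead wrote them straight to a file, which was simpler than doing it in PHP:
curl -c cookie.txt https://www.dianping.com/search/category/207/10
The contents of cookie.txt: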
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
.dianping.com TRUE / FALSE 0 PHOENIX_ID 0a0102fe-15a825c9312-1834aca
.dianping.com TRUE / FALSE 1551317789 s_ViewType 10
www.dianping.com FALSE / FALSE 0 JSESSIONID D5829965CE0CE4E539181967FE7FB063
.dianping.com TRUE / FALSE 1519781789 aburl 1
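To check what the cookie file actually holds, the listing above can be read back in PHP. The following is a minimal sketch, not part of the original post: parseNetscapeCookies is a hypothetical helper name, and the field order (domain, include-subdomains flag, path, secure flag, expiry, name, value) is the one libcurl writes in this file format.

```php
<?php
/**
 * Parse the text of a Netscape-format cookie file into a list of cookies.
 * Hypothetical helper for illustration only.
 */
function parseNetscapeCookies(string $text): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix; keep those.
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
            $httpOnly = true;
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Real files are tab-separated; splitting on any whitespace also
        // handles listings where the tabs were flattened to spaces.
        $fields = preg_split('/\s+/', $line, 7);
        if (count($fields) < 6) {
            continue; // malformed line
        }
        $fields = array_pad($fields, 7, ''); // a cookie may have an empty value
        $cookies[] = [
            'domain'            => $fields[0],
            'includeSubdomains' => $fields[1] === 'TRUE',
            'path'              => $fields[2],
            'secure'            => $fields[3] === 'TRUE',
            'expires'           => (int) $fields[4], // 0 means session cookie
            'name'              => $fields[5],
            'value'             => $fields[6],
            'httpOnly'          => $httpOnly,
        ];
    }
    return $cookies;
}
```

Calling parseNetscapeCookies(file_get_contents('cookie.txt')) on the file above should give four entries, including the JSESSIONID session cookie for www.dianping.com.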
4. Adding the cookie file directly in PHP and scraping again worked. See the screenshot and the code below.
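<?php
// Listing page to scrape.
$url = 'https://www.dianping.com/search/category/207/10#breadCrumb';
$curl = curl_init();
curl_setopt($curl, CURLOPT_HEADER, false);          // body only, no response headers
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);  // skip certificate verification (insecure)
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
// Send the cookies from the file written by "curl -c cookie.txt" in step 3.
curl_setopt($curl, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_TIMEOUT, 60);            // give up after 60 seconds
$contents = curl_exec($curl);
if ($contents === false) {
    // curl_exec() returns false on failure; report why.
    echo 'curl error: ' . curl_error($curl) . "\n";
}
var_dump($contents);
curl_close($curl);
?>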
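Step 3 does not say why saving cookies from PHP failed, so the following is only a sketch of how a cookie jar is usually set up in PHP curl, under two assumptions: the jar path is writable by the PHP process, and the handle is released after the request, because libcurl only writes the jar at that point. cookieJarOptions and fetchWithCookieJar are hypothetical names, and the options are trimmed compared with the listing above (no custom user agent, certificate verification left on).

```php
<?php
/** Build curl options for a request that both reads and writes $jarPath. */
function cookieJarOptions(string $url, string $jarPath): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 60,
        CURLOPT_COOKIEFILE     => $jarPath, // send cookies already in the jar
        CURLOPT_COOKIEJAR      => $jarPath, // write cookies back when the handle is released
    ];
}

/** Fetch $url, keeping cookies in $jarPath, and report where the request ended up. */
function fetchWithCookieJar(string $url, string $jarPath): array
{
    $curl = curl_init();
    curl_setopt_array($curl, cookieJarOptions($url, $jarPath));
    $body = curl_exec($curl);
    $result = [
        'body'         => $body,
        'error'        => $body === false ? curl_error($curl) : null,
        // Differs from $url when the server redirected the request (step 2).
        'effectiveUrl' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
    ];
    // Releasing the handle is what flushes the jar to disk
    // (curl_close() itself has no effect since PHP 8.0).
    unset($curl);
    return $result;
}
```

A call such as fetchWithCookieJar($url, __DIR__ . '/cookie.txt') uses an absolute path, which avoids depending on the script's working directory. Comparing the returned effectiveUrl with the requested URL is one way to notice the redirect described in step 2.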
Reposted from: https://my.oschina.net/7795442/blog/847439