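How to get cookies from a PHP curl response into a variable

Question:

So some guy at some other company thought it would be a great idea if, instead of using SOAP or XML-RPC or REST or any other reasonable communication protocol, he just embedded all of his responses as cookies in the headers.

I need to pull these cookies out of this curl response, hopefully as an array. If I have to waste a bunch of time writing a parser for this, I will be very unhappy.

Does anyone know how this can be done simply, preferably without writing anything to a file?

I would be very grateful if anyone can help me out with this.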

$ch = curl_init('http://www.google.com/'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
// get headers too with this line 
curl_setopt($ch, CURLOPT_HEADER, 1); 
$result = curl_exec($ch); 
// get cookie 
// multi-cookie variant contributed by @Combuster in comments 
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches); 
$cookies = array(); 
foreach($matches[1] as $item) { 
    parse_str($item, $cookie); 
    $cookies = array_merge($cookies, $cookie); 
} 
var_dump($cookies); 
+24

Unfortunately, I have a feeling that this is the right answer. I think it's ridiculous that curl can't just hand me a mapped array. – thirsty93 2009-05-22 15:26:53

+3

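I'll give it to you, but the preg_match was wrong. I didn't want just the session, although I understand why you would think that. But the genius who made their system is loading the cookies with an entire response map, like with a GET or POST. Stuff like this: Set-Cookie: price=1 Set-Cookie: status=accept. I needed a preg_match_all with '/^Set-Cookie: (.*?)=(.*?)$/sm' – thirsty93 2009-05-22 23:08:06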

+0

Not close enough, then? – TML 2009-05-24 05:48:38
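The name/value variant discussed in these comments could be sketched roughly as follows. The helper name `extractCookies` and the sample headers are made up for illustration, and the values are left exactly as sent (no URL-decoding), which sidesteps the way `parse_str` rewrites some characters in names.

```php
<?php
// Hypothetical helper: collect every Set-Cookie name/value pair from raw headers.
function extractCookies(string $rawHeaders): array
{
    $cookies = [];
    // One match per Set-Cookie line: group 1 is the name, group 2 is the value
    // (everything up to the first ';' or the end of the line).
    preg_match_all(
        '/^Set-Cookie:\s*([^=;\r\n]+)=([^;\r\n]*)/mi',
        $rawHeaders,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $m) {
        $cookies[trim($m[1])] = trim($m[2]);
    }
    return $cookies;
}

// Sample response text standing in for the result of curl_exec() with CURLOPT_HEADER on.
$sample = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: price=1; path=/\r\n"
        . "Set-Cookie: status=accept\r\n"
        . "Content-Type: text/html\r\n\r\n"
        . "<html></html>";
print_r(extractCookies($sample));
```

If two cookies share a name, the later one wins in this sketch; collect into a list instead if that matters.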

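If you use CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR, curl will read the cookies from and write them to a file. Once curl is done with it, you can read and/or modify that file however you want.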

+9

I think the goal is to not use a file – 2013-01-04 15:54:46

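My understanding is that cookies from curl must be written out to a file (curl -c cookie_file). If you're running curl through PHP's exec or system functions (or anything in that family), you should be able to save the cookies to a file, then open the file and read them in.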

+4

He almost certainly means php.net/curl :) – TML 2009-05-22 06:12:20

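libcurl also provides a cookie list option that extracts all known cookies: CURLOPT_COOKIELIST feeds cookies into the handle, and CURLINFO_COOKIELIST reads them back out. All you need is to make sure the PHP/CURL binding exposes it.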
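In PHP the extraction side is available as `curl_getinfo($ch, CURLINFO_COOKIELIST)`, which returns one Netscape-format line per cookie, the same tab-separated layout curl writes to a cookie jar file. A minimal sketch of turning those lines into a name/value map, assuming the seven-field layout; the helper name `parseNetscapeCookieLines` and the sample lines are invented for illustration:

```php
<?php
// Hypothetical helper: turn Netscape-format cookie lines into a name => value map.
function parseNetscapeCookieLines(array $lines): array
{
    $cookies = [];
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        // curl marks HttpOnly cookies with this prefix; strip it before parsing.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Sample lines in the shape curl produces. With a live handle you would pass
// curl_getinfo($ch, CURLINFO_COOKIELIST) instead; the cookie engine has to be
// enabled first, e.g. with curl_setopt($ch, CURLOPT_COOKIEFILE, '').
$sample = [
    "# Netscape HTTP Cookie File",
    "example.com\tFALSE\t/\tFALSE\t0\tprice\t1",
    "#HttpOnly_example.com\tFALSE\t/\tTRUE\t1893456000\tstatus\taccept",
];
print_r(parseNetscapeCookieLines($sample));
```

The same function should also work on the lines of a cookie jar file read with `file()`, since the format is shared.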

This does it without regular expressions, but requires the PECL HTTP extension:

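// Create the handle first (same example URL as in the other answers)
$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
curl_close($ch);

$headers = http_parse_headers($result);
$cookobjs = Array();
foreach($headers AS $k => $v){
    if (strtolower($k)=="set-cookie"){
        // A single Set-Cookie header may come back as a string, several as an array
        foreach((array)$v AS $k2 => $v2){
            $cookobjs[] = http_parse_cookie($v2);
        }
    }
}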

$cookies = Array(); 
foreach($cookobjs AS $row){ 
    $cookies[] = $row->cookies; 
} 

$tmp = Array(); 
// sort k=>v format 
foreach($cookies AS $v){ 
    foreach ($v AS $k1 => $v1){ 
     $tmp[$k1]=$v1; 
    } 
} 

$cookies = $tmp; 
print_r($cookies); 

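Although this question is quite old and the accepted answer is valid, I find it a bit uncomfortable because the content of the HTTP response (HTML, XML, JSON, binary or whatever) becomes mixed with the headers.

I've found a different alternative. cURL provides an option (CURLOPT_HEADERFUNCTION) to set a callback that will be called for each response header line. The function receives the curl handle and a string with the header line.

You can use code like this (adapted from TML's answer):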

$cookies = Array(); 
$ch = curl_init('http://www.google.com/'); 
// Ask for the callback. 
curl_setopt($ch, CURLOPT_HEADERFUNCTION, "curlResponseHeaderCallback"); 
$result = curl_exec($ch); 
var_dump($cookies); 

function curlResponseHeaderCallback($ch, $headerLine) { 
    global $cookies; 
    if (preg_match('/^Set-Cookie:\s*([^;]*)/mi', $headerLine, $cookie) == 1) 
     $cookies[] = $cookie; 
    return strlen($headerLine); // Needed by curl 
} 

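This solution has the drawback of using a global variable, but I guess that is not an issue for short scripts. And if curl is wrapped in a class, you can always use static methods and properties.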
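Another way around the global, sketched here under the assumption that closures are acceptable in your code base, is to build the callback with a by-reference `use`. The factory name `makeCookieCollector` is hypothetical, and the sample header lines are fed in by hand so the snippet runs without a network request.

```php
<?php
// Hypothetical factory: returns a header callback that appends "name=value"
// strings to the array passed in by reference, so no global is needed.
function makeCookieCollector(array &$cookies): callable
{
    return function ($ch, string $headerLine) use (&$cookies): int {
        if (preg_match('/^Set-Cookie:\s*([^;\r\n]*)/i', $headerLine, $m) === 1) {
            $cookies[] = trim($m[1]);
        }
        return strlen($headerLine); // curl expects the number of bytes handled
    };
}

$cookies = [];
$collector = makeCookieCollector($cookies);

// With a real request you would register it like this:
// curl_setopt($ch, CURLOPT_HEADERFUNCTION, $collector);
// Here the callback is called directly with two sample header lines instead.
$collector(null, "Set-Cookie: price=1; path=/\r\n");
$collector(null, "Content-Type: text/html\r\n");
print_r($cookies);
```

Unlike the code above, this sketch stores only the captured `name=value` string rather than the whole match array.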

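Someone here may find this useful. hhb_curl_exec2 works pretty much like curl_exec, but arg3 is an array that will be populated with the returned HTTP headers (numerically indexed), arg4 is an array that will be populated with the returned cookies ($cookies["expires"] => "Fri, 06-May-2016 05:58:51 GMT"), and arg5 will be populated with the raw, verbose information curl provides about the request.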

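The downside is that it requires CURLOPT_RETURNTRANSFER to be on, otherwise it errors out, and that it will overwrite CURLOPT_STDERR and CURLOPT_VERBOSE if you were already using them for something else (I might fix this later).

Example of how to use it: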

<?php 
header("content-type: text/plain;charset=utf8"); 
$ch=curl_init(); 
$headers=array(); 
$cookies=array(); 
$debuginfo=""; 
$body=""; 
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true); 
$body=hhb_curl_exec2($ch,'https://www.youtube.com/',$headers,$cookies,$debuginfo); 
var_dump('$cookies:',$cookies,'$headers:',$headers,'$debuginfo:',$debuginfo,'$body:',$body); 

And the functions themselves:

function hhb_curl_exec2($ch, $url, &$returnHeaders = array(), &$returnCookies = array(), &$verboseDebugInfo = "") 
{ 
    $returnHeaders = array(); 
    $returnCookies = array(); 
    $verboseDebugInfo = ""; 
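    // PHP 8+ returns a CurlHandle object from curl_init(); older versions return a resource.
    if (!($ch instanceof CurlHandle) && !(is_resource($ch) && get_resource_type($ch) === 'curl')) { 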
     throw new InvalidArgumentException('$ch must be a curl handle!'); 
    } 
    if (!is_string($url)) { 
     throw new InvalidArgumentException('$url must be a string!'); 
    } 
    $verbosefileh = tmpfile(); 
    $verbosefile = stream_get_meta_data($verbosefileh); 
    $verbosefile = $verbosefile['uri']; 
    curl_setopt($ch, CURLOPT_VERBOSE, 1); 
    curl_setopt($ch, CURLOPT_STDERR, $verbosefileh); 
    curl_setopt($ch, CURLOPT_HEADER, 1); 
    $html    = hhb_curl_exec($ch, $url); 
    $verboseDebugInfo = file_get_contents($verbosefile); 
    curl_setopt($ch, CURLOPT_STDERR, NULL); 
    fclose($verbosefileh); 
    unset($verbosefile, $verbosefileh); 
    $headers  = array(); 
    $crlf   = "\x0d\x0a"; 
    $thepos  = strpos($html, $crlf . $crlf, 0); 
    $headersString = substr($html, 0, $thepos); 
    $headerArr  = explode($crlf, $headersString); 
    $returnHeaders = $headerArr; 
    unset($headersString, $headerArr); 
    $htmlBody = substr($html, $thepos + 4); //should work on utf8/ascii headers... utf32? not so sure.. 
    unset($html); 
    //I REALLY HOPE THERE EXIST A BETTER WAY TO GET COOKIES.. good grief this looks ugly.. 
    //at least it's tested and seems to work perfectly... 
    $grabCookieName = function($str) 
    { 
     $ret = ""; 
     $i = 0; 
     for ($i = 0; $i < strlen($str); ++$i) { 
      if ($str[$i] === ' ') { 
       continue; 
      } 
      if ($str[$i] === '=') { 
       break; 
      } 
      $ret .= $str[$i]; 
     } 
     return urldecode($ret); 
    }; 
    foreach ($returnHeaders as $header) { 
     //Set-Cookie: crlfcoookielol=crlf+is%0D%0A+and+newline+is+%0D%0A+and+semicolon+is%3B+and+not+sure+what+else 
     /*Set-Cookie:ci_spill=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22305d3d67b8016ca9661c3b032d4319df%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%2285.164.158.128%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A109%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F43.0.2357.132+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1436874639%3B%7Dcab1dd09f4eca466660e8a767856d013; expires=Tue, 14-Jul-2015 13:50:39 GMT; path=/ 
     Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT; 
     //Cookie names cannot contain any of the following '=,; \t\r\n\013\014' 
     // 
     */ 
     if (stripos($header, "Set-Cookie:") !== 0) { 
      continue; 
      /**/ 
     } 
     $header = trim(substr($header, strlen("Set-Cookie:"))); 
     while (strlen($header) > 0) { 
      $cookiename     = $grabCookieName($header); 
      $returnCookies[$cookiename] = ''; 
      $header      = substr($header, strlen($cookiename) + 1); //also remove the = 
      if (strlen($header) < 1) { 
       break; 
      } 
      ; 
      $thepos = strpos($header, ';'); 
      if ($thepos === false) { //last cookie in this Set-Cookie. 
       $returnCookies[$cookiename] = urldecode($header); 
       break; 
      } 
      $returnCookies[$cookiename] = urldecode(substr($header, 0, $thepos)); 
      $header      = trim(substr($header, $thepos + 1)); //also remove the ; 
     } 
    } 
    unset($header, $cookiename, $thepos); 
    return $htmlBody; 
} 

function hhb_curl_exec($ch, $url) 
{ 
    static $hhb_curl_domainCache = ""; 
    //$hhb_curl_domainCache=&$this->hhb_curl_domainCache; 
    //$ch=&$this->curlh; 
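    // PHP 8+ returns a CurlHandle object from curl_init(); older versions return a resource.
    if (!($ch instanceof CurlHandle) && !(is_resource($ch) && get_resource_type($ch) === 'curl')) { 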
     throw new InvalidArgumentException('$ch must be a curl handle!'); 
    } 
    if (!is_string($url)) { 
     throw new InvalidArgumentException('$url must be a string!'); 
    } 

    $tmpvar = ""; 
    if (parse_url($url, PHP_URL_HOST) === null) { 
     if (substr($url, 0, 1) !== '/') { 
      $url = $hhb_curl_domainCache . '/' . $url; 
     } else { 
      $url = $hhb_curl_domainCache . $url; 
     } 
    } 
    ; 

    curl_setopt($ch, CURLOPT_URL, $url); 
    $html = curl_exec($ch); 
    if (curl_errno($ch)) { 
     throw new Exception('Curl error (curl_errno=' . curl_errno($ch) . ') on url ' . var_export($url, true) . ': ' . curl_error($ch)); 
     // echo 'Curl error: ' . curl_error($ch); 
    } 
    if ($html === '' && 204 != ($tmpvar = curl_getinfo($ch, CURLINFO_HTTP_CODE)) /*204 is "No Content": success, but no output*/) { 
     throw new Exception('Curl returned nothing for ' . var_export($url, true) . ' but HTTP_RESPONSE_CODE was ' . var_export($tmpvar, true)); 
    } 
    ; 
    //remember that curl (usually) auto-follows the "Location: " http redirects.. 
    $hhb_curl_domainCache = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), PHP_URL_HOST); 
    return $html; 
} 

// $ch is assumed to be an initialized curl handle with CURLOPT_RETURNTRANSFER enabled.
curl_setopt($ch, CURLOPT_HEADER, 1);
// Return everything
$res = curl_exec($ch);
// Split into lines
$lines = explode("\n", $res);
$headers = array();
$cookies = array();
$body = "";
foreach ($lines as $num => $line) {
    $l = str_replace("\r", "", $line);
    // Empty line indicates the start of the message body and end of headers
    if (trim($l) == "") {
        $headers = array_slice($lines, 0, $num);
        // Keep the whole body, not just its first line
        $body = implode("\n", array_slice($lines, $num + 1));
        // Pull only cookies out of the headers (header names are case-insensitive)
        $cookies = preg_grep('/^Set-Cookie:/i', $headers);
        break;
    }
}
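Scanning for the first blank line can misfire when the response contains more than one header block, for example after a followed redirect. A possible alternative is to ask curl how many bytes of headers it received, via `curl_getinfo($ch, CURLINFO_HEADER_SIZE)`, and cut the response at that offset. The helper name `splitResponse` and the sample response below are assumptions for illustration only.

```php
<?php
// Hypothetical helper: split a raw response given the header byte count.
function splitResponse(string $raw, int $headerSize): array
{
    $headerBlock = substr($raw, 0, $headerSize);
    $body = substr($raw, $headerSize);
    // Drop the empty strings left by the blank line(s) that end each header block.
    $headers = array_values(array_filter(explode("\r\n", $headerBlock), 'strlen'));
    $cookies = array_values(preg_grep('/^Set-Cookie:/i', $headers));
    return [$headers, $cookies, $body];
}

// With a live handle: $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
// Below, the size is taken from a hand-built sample instead.
$head = "HTTP/1.1 200 OK\r\nSet-Cookie: price=1\r\n\r\n";
$raw = $head . "line one\nline two";
[$headers, $cookies, $body] = splitResponse($raw, strlen($head));
print_r($cookies);
echo $body, "\n";
```

Because the body is taken by offset rather than by line, blank lines and binary content inside it are left untouched.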