Scraping a website that uses JavaScript cookies with C#
Question:
I want to scrape some content from the following website: http://www.conrad.nl/modelspoor.
This is my function:
public string SreenScrape(string urlBase, string urlPath)
{
    CookieContainer cookieContainer = new CookieContainer();
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(urlBase + urlPath);
    httpWebRequest.CookieContainer = cookieContainer;
    httpWebRequest.UserAgent = "Mozilla/6.0 (Windows; U; Windows NT 7.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.9 (.NET CLR 3.5.30729)";
    WebResponse webResponse = httpWebRequest.GetResponse();
    string result = new System.IO.StreamReader(webResponse.GetResponseStream(), Encoding.Default).ReadToEnd();
    webResponse.Close();
    if (result.Contains("<frame src="))
    {
        Regex metaregex = new Regex("http:[a-z:/._0-9!?=A-Z&]*", RegexOptions.Multiline);
        result = result.Replace("\r\n", "");
        Match m = metaregex.Match(result);
        string key = m.Groups[0].Value;
        foreach (Match match in metaregex.Matches(result))
        {
            HttpWebRequest redirectHttpWebRequest = (HttpWebRequest)WebRequest.Create(key);
            redirectHttpWebRequest.CookieContainer = cookieContainer;
            webResponse = redirectHttpWebRequest.GetResponse();
            string redirectResponse = new System.IO.StreamReader(webResponse.GetResponseStream(), Encoding.Default).ReadToEnd();
            webResponse.Close();
            return redirectResponse;
        }
    }
    return result;
}
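The function above detects a frameset by searching for `<frame src=` and then applies a broad `http:` regex to the whole page and takes the first match. That match can be any absolute URL on the page (a DOCTYPE URL, for instance) rather than the frame's `src`, and relative frame URLs are never matched at all. A minimal sketch of a narrower extraction, assuming the frame tags use quoted `src` attributes (`FrameLinks` and `Extract` are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FrameLinks
{
    // Matches <frame ... src="..."> or <frame ... src='...'>, case-insensitively.
    // "\b" after "frame" keeps <frameset> from matching.
    // Assumption: the src attribute value is quoted.
    private static readonly Regex FrameSrc = new Regex(
        @"<frame\b[^>]*\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns every frame URL, resolved against the page's own URL
    // so that relative paths such as "/nav.html" become absolute.
    public static List<Uri> Extract(Uri pageUri, string html)
    {
        var result = new List<Uri>();
        foreach (Match m in FrameSrc.Matches(html))
        {
            result.Add(new Uri(pageUri, m.Groups[1].Value));
        }
        return result;
    }
}
```

Each returned `Uri` can then be requested in turn, instead of reusing the first regex match (`key`) on every loop iteration.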
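But when I do this, I get the wrong string back, because the site uses JavaScript.
Does anyone know how to solve this?
Answer:
Using the approach from my blog post (Use C# to Scrape web pages), I was able to retrieve the page. Here is the code: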
using System;
using System.IO;
using System.Net;
using System.Text;
string target = @"http://www1.conrad.nl/modelspoor/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
using (StreamReader htmlStream = new StreamReader(responseStream, Encoding.UTF8))
{
    Console.WriteLine(htmlStream.ReadToEnd());
}
HTH
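`HttpWebRequest` does not run JavaScript, so a cookie that a page sets through `document.cookie` never reaches the `CookieContainer`, and any follow-up request is sent without it. If the script assigns a literal value, one workaround is to copy that value into the container yourself. A minimal sketch, assuming the cookie is set with a plain `document.cookie = "name=value; ..."` statement; `ScriptCookies` and `AddTo` are hypothetical names, and whether conrad.nl sets its cookie this way is not confirmed here:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class ScriptCookies
{
    // Assumption: the page contains a literal assignment such as
    //   document.cookie = "name=value; path=/";
    // Only the leading name=value pair is captured; attributes are ignored.
    // Commas, semicolons, quotes and whitespace are excluded from the value
    // because System.Net.Cookie rejects unquoted values containing them.
    private static readonly Regex CookieAssignment = new Regex(
        @"document\.cookie\s*=\s*[""']\s*([^=;,""'\s]+)\s*=\s*([^;,""'\s]*)",
        RegexOptions.IgnoreCase);

    // Adds every literal document.cookie assignment found in the HTML to the
    // container, scoped to the page's URL, and returns how many were added.
    public static int AddTo(CookieContainer jar, Uri pageUri, string html)
    {
        int added = 0;
        foreach (Match m in CookieAssignment.Matches(html))
        {
            jar.Add(pageUri, new Cookie(m.Groups[1].Value, m.Groups[2].Value, "/"));
            added++;
        }
        return added;
    }
}
```

Call it on the HTML of the first response, before issuing the next request with the same container. If the cookie value is computed by script rather than written literally, this approach will not work and a real browser engine would be needed.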
Note that you don't set redirectHttpWebRequest.UserAgent the way you do for your main HttpWebRequest. Maybe that is causing the problem? – 2010-04-11 01:09:14
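Following that suggestion, one way to avoid the mismatch is to build every request through a single helper, so the initial page and any frame or redirect target get the same `UserAgent` and the same `CookieContainer`. A minimal sketch; `ScrapeRequests` and `Create` are hypothetical names, and the User-Agent string is the one from the question:

```csharp
using System;
using System.Net;

public static class ScrapeRequests
{
    // User-Agent copied from the question; any browser-like value should do.
    public const string BrowserUserAgent =
        "Mozilla/6.0 (Windows; U; Windows NT 7.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.9 (.NET CLR 3.5.30729)";

    // Configures every request identically, so a follow-up request sends
    // the cookies collected by earlier responses and looks like the same browser.
    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        request.UserAgent = BrowserUserAgent;
        return request;
    }
}
```

In `SreenScrape`, both `WebRequest.Create` calls could go through this helper. On .NET 6 and later, `WebRequest.Create` produces an obsolete warning in favor of `HttpClient`, but it still compiles and works.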