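POST request gives an empty result
I have written some code in Python that uses a POST request to fetch specific data from a web page. However, when I run it, I get nothing but a blank console. I tried to fill in the request parameters accordingly, but perhaps I failed to notice which ones should be included. The page I'm dealing with contains several images in its right-hand panel. When an image is clicked, the request I'm talking about is sent to the server, which returns the result and displays new information about the flavors under it. My goal is to parse all the flavors connected to each image. I have tried to attach everything necessary to find out what I'm missing. Thanks in advance.
Here is what I collected from Chrome developer tools to prepare the POST request: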
===================================================================================
General:
Request URL:https://www.optigura.com/product/ajax/details.php
Request Method:POST
Status Code:200 OK
Response Headers:
Cache-Control:no-store, no-cache, must-revalidate
Cache-Control:max-age=0, no-cache, no-store, must-revalidate
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:782
Content-Type:text/html; charset=utf-8
Request Headers:
Accept:application/json, text/javascript, */*; q=0.01
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Content-Length:34
Content-Type:application/x-www-form-urlencoded
Cookie:OGSESSID=s1qqd0euokbfrdub9pf2efubh1; _ga=GA1.2.449310094.1501502802; _gid=GA1.2.791686763.1501502802; _gat=1; __atuvc=1%7C31; __atuvs=597f1d5241db0352000; beyable-TrackingId=499b4c5b-2939-479b-aaf0-e5cd79f078cc; aaaaaaaaa066e9a68e5654b829144016246e1a736=d5758131-71db-41e1-846d-6d719d381060.1501502805122.1501502805122.$bey$https%3a%2f%2fwww.optigura.com%2fuk%2fproduct%2fgold-standard-100-whey%2f$bey$1; aaaaaaaaa066e9a68e5654b829144016246e1a736_cs=; aaaaaaaaa066e9a68e5654b829144016246e1a736_v=1.1.0; checkloc-uk=n
Host:www.optigura.com
Origin:https://www.optigura.com
Referer:https://www.optigura.com/uk/product/gold-standard-100-whey/
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
X-Requested-With:XMLHttpRequest
Form Data:
opt:flavor
opt1:207
opt2:47
ip:105
=======================================================================================
Here is what I tried:
import requests
from lxml import html
payload = {"opt":"flavor","opt1":"207","opt2":"47","ip":"105"}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36'}
response = requests.post("https://www.optigura.com/product/ajax/details.php", params = payload, headers = headers).text
print(response)
Here is the link to the web page: https://www.optigura.com/uk/product/gold-standard-100-whey/
You should try the following request structure:
- Data to send:
data = {'opt': 'flavor', 'opt1': '207', 'opt2': '47', 'ip': 105}
- Headers:
headers = {'X-Requested-With': 'XMLHttpRequest'}
- URL:
url = 'https://www.optigura.com/product/ajax/details.php'
- You also need to get cookies, so requests.session() is required:
s = requests.session()
r = s.get('https://www.optigura.com/uk/product/gold-standard-100-whey/')
cookies = r.cookies
Complete request:
response = s.post(url, cookies=cookies, headers=headers, data=data)
Now you can get the required piece of HTML as
print(response.json()['info2'])
Output:
'<ul class="opt2"><li class="active">
<label>
<input type="radio" name="ipr" value="1360" data-opt-sel="47" checked="checked" /> Delicious Strawberry - <span class="green">In Stock</span></label>
</li><li>
<label>
<input type="radio" name="ipr" value="1356" data-opt-sel="15" /> Double Rich Chocolate - <span class="green">In Stock</span></label>
</li><li>
<label>
<input type="radio" name="ipr" value="1169" data-opt-sel="16" /> Vanilla Ice Cream - <span class="green">In Stock</span></label>
</li></ul>'
Then you can use lxml to scrape the flavor values:
from lxml import html
flavors = response.json()['info2']
source = html.fromstring(flavors)
for element in source.xpath('//label/text()[2]'):
    print(element.replace(' - ', '').strip())
Output:
Delicious Strawberry
Double Rich Chocolate
Vanilla Ice Cream
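If the option value and stock status are needed as well, the same fragment can be walked label by label. The following is only a minimal sketch, assuming the markup keeps the shape shown in the output above. parse_flavors and the dictionary keys are hypothetical names chosen for illustration, and the sample string is copied from that output rather than fetched from the site.

```python
from lxml import html

# Sample fragment copied from the 'info2' output shown above (not fetched live).
SAMPLE = '''<ul class="opt2"><li class="active">
<label>
<input type="radio" name="ipr" value="1360" data-opt-sel="47" checked="checked" /> Delicious Strawberry - <span class="green">In Stock</span></label>
</li><li>
<label>
<input type="radio" name="ipr" value="1356" data-opt-sel="15" /> Double Rich Chocolate - <span class="green">In Stock</span></label>
</li><li>
<label>
<input type="radio" name="ipr" value="1169" data-opt-sel="16" /> Vanilla Ice Cream - <span class="green">In Stock</span></label>
</li></ul>'''


def parse_flavors(fragment):
    """Hypothetical helper: return one dict per <label> in the fragment."""
    root = html.fromstring(fragment)
    rows = []
    for label in root.xpath('.//label'):
        radio = label.find('input')
        # The flavor name is the text that follows the <input> element.
        name = (radio.tail or '').replace(' - ', '').strip()
        rows.append({
            'name': name,
            'value': radio.get('value'),
            'opt_sel': radio.get('data-opt-sel'),
            'status': label.findtext('span', default='').strip(),
        })
    return rows


for row in parse_flavors(SAMPLE):
    print(row['name'], row['value'], row['status'])
```

Keeping the parsing in a function that takes a plain string makes it easy to try against saved output before pointing it at a live response.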
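Oh my god! What a thorough answer! It did exactly what I was after. Sir Martijn Pieters suggested the same thing, but I could not properly understand how it should be done. Thanks a lot, Mr. Andersson. You made my day. – SIM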
You are not sending the values in the POST body; params sets the URL query parameters. Use data instead:
response = requests.post(
    "https://www.optigura.com/product/ajax/details.php",
    data=payload,
    headers=headers)
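The difference between the two keyword arguments can be inspected without touching the network by preparing the request instead of sending it. This is a small sketch; the variable names with_params and with_data are made up for illustration, while the URL and payload are the ones from the question.

```python
import requests

url = 'https://www.optigura.com/product/ajax/details.php'
payload = {'opt': 'flavor', 'opt1': '207', 'opt2': '47', 'ip': '105'}

# Build both variants without sending anything.
with_params = requests.Request('POST', url, params=payload).prepare()
with_data = requests.Request('POST', url, data=payload).prepare()

# params= appends the values to the URL and leaves the body empty.
print(with_params.url)
print(with_params.body)

# data= keeps the URL unchanged and form-encodes the values into the body.
print(with_data.url)
print(with_data.body)
print(with_data.headers['Content-Type'])
```

With data=, the encoded body is 34 characters long, which matches the Content-Length:34 request header captured in the developer tools.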
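You may also need to set a Referer header (add 'Referer': 'https://www.optigura.com/uk/product/gold-standard-100-whey/' to your headers dictionary) and use a session object to capture and manage cookies (issue a GET request to https://www.optigura.com/uk/product/gold-standard-100-whey/ first).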
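To see what the session object contributes, the sketch below places a cookie in a session's jar by hand and prepares a POST without sending it. The cookie name OGSESSID is taken from the Cookie header in the question; its value here is a placeholder, and in real use the cookie would be stored by the initial GET request rather than set manually.

```python
import requests

url = 'https://www.optigura.com/product/ajax/details.php'
payload = {'opt': 'flavor', 'opt1': '207', 'opt2': '47', 'ip': '105'}

session = requests.Session()
# Placeholder value: normally the GET to the product page stores this cookie.
session.cookies.set('OGSESSID', 'placeholder-session-id',
                    domain='www.optigura.com', path='/')

request = requests.Request('POST', url, data=payload)
# prepare_request merges the session's cookies into the outgoing headers.
prepared = session.prepare_request(request)
print(prepared.headers.get('Cookie'))
```

Because the session attaches its stored cookies to every request it prepares, later calls such as session.post(...) carry them along without a cookies= argument.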
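With a little experimentation, I noticed that the site also requires the X-Requested-With header to be set before it responds with the content.
The following works: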
with requests.session() as session:
    # Load the product page first so the session picks up the site's cookies.
    session.get('https://www.optigura.com/uk/product/gold-standard-100-whey/')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36',
        'Referer': 'https://www.optigura.com/uk/product/gold-standard-100-whey/',
        'X-Requested-With': 'XMLHttpRequest'
    }
    # payload is the form-data dictionary defined in the question.
    response = session.post(
        "https://www.optigura.com/product/ajax/details.php",
        data=payload, headers=headers)
The response comes back as JSON data:
data = response.json()
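The steps can also be gathered into one function, which makes the order of the calls explicit and fails loudly on HTTP errors. This is a sketch under assumptions: fetch_flavor_html is a hypothetical name, the 'info2' key is the one used in the other answer on this page, and whether the site still answers this way has not been checked.

```python
import requests

PRODUCT_URL = 'https://www.optigura.com/uk/product/gold-standard-100-whey/'
AJAX_URL = 'https://www.optigura.com/product/ajax/details.php'


def fetch_flavor_html(session, payload,
                      product_url=PRODUCT_URL, ajax_url=AJAX_URL):
    """Hypothetical helper: return the 'info2' fragment, or None if absent."""
    # Visit the product page first so the session collects its cookies.
    session.get(product_url).raise_for_status()
    headers = {
        'Referer': product_url,
        'X-Requested-With': 'XMLHttpRequest',
    }
    # data= puts the values in the POST body, as the endpoint expects.
    response = session.post(ajax_url, data=payload, headers=headers)
    response.raise_for_status()
    return response.json().get('info2')


# Example call (performs real network requests, so it is left commented out):
# with requests.Session() as session:
#     payload = {'opt': 'flavor', 'opt1': '207', 'opt2': '47', 'ip': '105'}
#     print(fetch_flavor_html(session, payload))
```

Passing the session in as an argument keeps the function easy to exercise with a stand-in object when the site is not reachable.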
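Trying to cope with what you suggested, sir Martijn Pieters. It's a bit advanced for me, though! – SIM
You are not sending the values in the POST body; 'params' sets the URL query parameters. Use 'data' instead. –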