Scraping an AJAX-loaded website in Python
Problem description:
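I have been using Selenium to scrape the following website: https://www.eex-transparency.com/homepage/power/czech-republic/production/availability/non-usability/non-usability. I am scraping all of the table data. It works fine, but the script takes quite a long time to run. So I started looking for alternatives and found several threads here about sending requests to the server through its API, but after hours of trying and searching I gave up, because there are a few things I don't get: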
- How do I reverse-engineer the API so that I send the correct request?
- Which URL should I use?
This is what I came up with:
import json
import requests

url = "https://www.eex-transparency.com/ajax/en/navigation/ajaxGetNavi/12"
data = {
    "id": "16",
    "title": "Czech Republic",
    "url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic",
    "class": "country",
    "description": "",
    "children": [
        {
            "id": "649",
            "title": "Production",
            "url": False,
            "class": "",
            "description": "",
            "children": [
                {
                    "id": "650",
                    "title": "Capacity",
                    "url": False,
                    "class": "",
                    "description": "",
                    "children": [
                        {
                            "id": "651",
                            "title": "Installed Capacity",
                            "url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic\\/production\\/capacity\\/installed-capacity",
                            "class": "",
                            "description": ""
                        }
                    ]
                }
            ]
        }
    ]
}

response = requests.get(url, data=data)
file = response.json()
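More generally, maybe someone could explain which steps I should take to scrape a page like this. I am especially interested in how to find the right information in Chrome (-> Inspect -> Network -> XHR) and how to build the data variable (i.e. what I pass to requests) from that information.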
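To make the question more concrete, here is a rough sketch of the two pieces involved; it is not a tested solution for this site. It assumes that the dictionary above is really what the navigation endpoint returns (a JSON tree), not something that has to be sent. The names collect_urls and fetch_json are hypothetical helpers, and the X-Requested-With and Accept headers are assumed values that would have to be checked against what the XHR entry in the Network tab actually shows.

```python
import json


def collect_urls(node):
    """Recursively gather every non-empty "url" value from a navigation tree."""
    urls = []
    if node.get("url"):  # skips entries where url is false or missing
        urls.append(node["url"])
    for child in node.get("children", []):
        urls.extend(collect_urls(child))
    return urls


def fetch_json(url, params=None):
    """Hypothetical helper: repeat an XHR request seen in the Network tab.

    The header values are assumptions; copy the real ones from the browser.
    """
    import requests  # third-party; imported here so the parsing part runs without it

    headers = {
        "X-Requested-With": "XMLHttpRequest",  # assumed, often sent by AJAX calls
        "Accept": "application/json",          # assumed
    }
    response = requests.get(url, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


# Trimmed copy of the structure from the question, as raw JSON text.
raw = r'''
{"id": "16", "title": "Czech Republic",
 "url": "https:\/\/www.eex-transparency.com\/homepage\/power\/czech-republic",
 "children": [
   {"id": "649", "title": "Production", "url": false,
    "children": [
      {"id": "650", "title": "Capacity", "url": false,
       "children": [
         {"id": "651", "title": "Installed Capacity",
          "url": "https:\/\/www.eex-transparency.com\/homepage\/power\/czech-republic\/production\/capacity\/installed-capacity"}
       ]}
    ]}
 ]}
'''

tree = json.loads(raw)  # json.loads turns the escaped "\/" back into "/"
page_urls = collect_urls(tree)
print(page_urls)
```

If that assumption holds, a GET to the endpoint would need no data at all, and the table data would come from a different XHR request that still has to be identified in the Network tab.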
How did you come up with this? You haven't provided any details.. – Aertonas