How do I save my Python scraper's output to a JSON file?

Question:

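I recently started coding and learning Python, and I'm currently working on a web scraper. Right now it just prints the search results. What I want is for it to save the data to a JSON file instead.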

import requests
import json  # not used yet; needed once the results are written to a file
from bs4 import BeautifulSoup

url = "http://www.alternate.nl/html/product/listing.html?navId=11622&tk=7&lk=9419"
r = requests.get(url)
# Naming the parser avoids BeautifulSoup's "no parser was explicitly specified" warning
soup = BeautifulSoup(r.content, "html.parser")

# One <div class="listRow"> per product row
g_data = soup.find_all("div", {"class": "listRow"})
for item in g_data:
    try:
        print(item.find_all("span", {"class": "name"})[0].text)  # 1
        print(item.find_all("span", {"class": "additional"})[0].text)  # 2
        print(item.find_all("span", {"class": "info"})[0].text)  # 3
        print(item.find_all("span", {"class": "info"})[1].text)  # 4
        print(item.find_all("span", {"class": "info"})[2].text)  # 5
        print(item.find_all("span", {"class": "price right right10"})[0].text)  # 6
    except IndexError:
        # Stop printing this row if one of the expected spans is missing
        pass

This is what I want it to return:

{"product1":[{"1":"itemfindallresults1"},{"2":"itemfindallresults2"}]} and so on.

How do I do this? Thanks in advance.
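In Python terms, the format above is a dict whose values are lists of one-key dicts. A minimal sketch using the question's placeholder strings (not real scraped data): json.dumps turns such a dict into exactly that JSON text, and separators=(",", ":") removes the spaces json inserts by default.

```python
import json

# Placeholder values copied from the question's example, not real scraped data
data = {"product1": [{"1": "itemfindallresults1"}, {"2": "itemfindallresults2"}]}

# Compact separators reproduce the spacing of the example exactly
text = json.dumps(data, separators=(",", ":"))
print(text)
```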

First create 'my_data = {"product1": [...]}', then use 'json.dump(my_data, ...)'. – furas 2014-11-20 17:30:18
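The comment's suggestion, sketched under assumptions: build the whole dictionary inside the loop, then call json.dump once after it. The rows list is hypothetical stand-in data for the six find_all(...)[i].text values per product, and products.json is just an example filename.

```python
import json

# Hypothetical scraped rows: each inner list holds the six texts (#1 to #6)
rows = [
    ["Product A", "extra info", "info 1", "info 2", "info 3", "EUR 99"],
    ["Product B", "extra info", "info 1", "info 2", "info 3", "EUR 149"],
]

my_data = {}
for n, fields in enumerate(rows, start=1):
    # Build [{"1": ...}, {"2": ...}, ...] for each product, as in the question
    my_data["product{}".format(n)] = [{str(i): text} for i, text in enumerate(fields, start=1)]

# Write once, after the loop, so the file holds a single valid JSON document
with open("products.json", "w") as f:
    json.dump(my_data, f)
```

Calling json.dump inside the loop instead would append several JSON documents to the same file, which json.load cannot read back as one value.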

A simple way to use the json module:

import json

# just an example dictionary to be dumped into "filename"
output = {"stuff": [1, 2, 3]}
# open the file "filename" in write ("w") mode; "with" closes it automatically
with open("filename", "w") as f:
    # dumps "output" encoded in the JSON format into "filename"
    json.dump(output, f)

Hope this helps.
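To check the result, the file can be read back with json.load. A minimal sketch with placeholder data: indent=2 pretty-prints the file, and since product names on the (Dutch) example site may contain non-ASCII characters, ensure_ascii=False plus an explicit encoding="utf-8" keeps a character like "é" readable instead of writing it as a \u00e9 escape.

```python
import json

output = {"stuff": [1, 2, 3], "name": "café"}

# indent=2 pretty-prints; ensure_ascii=False writes "é" as-is, so open the file as UTF-8
with open("filename", "w", encoding="utf-8") as f:
    json.dump(output, f, indent=2, ensure_ascii=False)

# Read the file back to confirm it round-trips to the same dictionary
with open("filename", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == output)  # True
```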

A simple program that does what you asked:

import json

import requests
from bs4 import BeautifulSoup

url = "http://www.alternate.nl/html/product/listing.html?navId=11622&tk=7&lk=9419"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# A plain dict stands in for the undefined Product() class: json.dump can
# serialize dicts, lists and strings, but not arbitrary objects
products = {}

g_data = soup.find_all("div", {"class": "listRow"})
for item in g_data:
    try:
        info = item.find_all("span", {"class": "info"})
        fields = [
            item.find_all("span", {"class": "name"})[0].text,
            item.find_all("span", {"class": "additional"})[0].text,
            info[0].text,
            info[1].text,
            info[2].text,
            item.find_all("span", {"class": "price right right10"})[0].text,
        ]
    except IndexError:
        # Skip rows that are missing one of the expected spans
        continue
    # Same shape as in the question: {"product1": [{"1": ...}, {"2": ...}, ...]}
    key = "product{}".format(len(products) + 1)
    products[key] = [{str(i): text} for i, text in enumerate(fields, start=1)]

# Write all products once, after the loop
with open("filename", "w") as f:
    json.dump(products, f)
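If you would rather keep one object per product, as the Product() outline suggested, note that json.dump raises TypeError for instances of your own classes. One option, sketched here with a hypothetical Product class and illustrative field names, is to pass default=vars, which makes the encoder serialize such objects through their attribute dictionary.

```python
import json


class Product:
    """Hypothetical container for one scraped row; field names are illustrative."""

    def __init__(self, name, additional, price):
        self.name = name
        self.additional = additional
        self.price = price


products = [Product("Product A", "extra info", "EUR 99")]

# Without default=..., json.dumps raises TypeError for Product instances
try:
    json.dumps(products)
except TypeError as exc:
    print("plain dumps failed:", exc)

# default=vars converts each Product to its __dict__ before encoding
text = json.dumps({"products": products}, default=vars)
print(text)
```

The dict-based program is simpler when the data only ever goes to JSON; a class mainly pays off if the scraper does more with each product before saving it.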