Python BeautifulSoup: printing scraped information in CSV format
Problem description:
I can print the information I pull from the website without any problem. However, when I try to put the street names in one column and the zip codes in another, I run into problems with the CSV file: all I get in the CSV are the names, spread across the columns of each row. Here is my code. I am using Python 2.7.5 and Beautiful Soup 4.
from bs4 import BeautifulSoup
import csv
import urllib2
url="http://www.conakat.com/states/ohio/cities/defiance/road_maps/"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
f = csv.writer(open("Defiance Steets1.csv", "w"))
f.writerow(["Name", "ZipCodes"]) # Write column headers as the first line
links = soup.find_all(['i','a'])
for link in links:
    names = link.contents[0]
    print unicode(names)
    f.writerow(names)
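The stray columns almost certainly come from `f.writerow(names)`: `csv.writer.writerow` expects a sequence of fields, so passing a bare string makes it iterate the string character by character, one column per character. A minimal sketch with an in-memory buffer (no scraping involved) shows the difference:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# A bare string is iterated character by character -> one column per char.
writer.writerow("MAIN ST")

# Wrapping the fields in a list gives one column per field, as intended.
writer.writerow(["MAIN ST", "43512"])

print(buf.getvalue())
```

The first row comes out as `M,A,I,N, ,S,T`, the second as `MAIN ST,43512`.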
Answer:
The data you retrieve from that URL contains more i elements than a elements. You have to filter the a elements, and then use Python's zip builtin to build the pairs.
links = soup.find_all('a')
links = [link for link in links
         if link["href"].startswith("http://www.conakat.com/map/?p=")]
zips = soup.find_all('i')
for l, z in zip(links, zips):
    f.writerow((l.contents[0], z.contents[0]))
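The filter-then-zip step can be exercised without touching the network at all. A small sketch, using hypothetical stand-in lists for the scraped link texts and zip codes:

```python
import csv
import io

# Hypothetical stand-ins for the scraped <a> texts and <i> texts.
names = ["1ST ST", "E 1ST ST", "W 1ST ST"]
zips = ["(43512)", "(43512)", "(43512)"]

buf = io.StringIO()
f = csv.writer(buf)
f.writerow(["Name", "ZipCodes"])

# zip() pairs the i-th name with the i-th zip code, so the approach
# relies on the two filtered lists lining up one-to-one; zip stops
# silently at the end of the shorter list if they do not.
for name, zipcode in zip(names, zips):
    f.writerow([name, zipcode])

print(buf.getvalue())
```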
Output:
Name,ZipCodes
1ST ST,(43512)
E 1ST ST,(43512)
W 1ST ST,(43512)
2ND ST,(43512)
E 2ND ST,(43512)
W 2ND ST,(43512)
3 RIVERS CT,(43512)
3RD ST,(43512)
E 3RD ST,(43512)
...
This is exactly what I needed, thank you very much. – Codin
Answer:
Another approach (Python 3) is to take each <a> link, find its next sibling, check whether it is an <i> tag, and extract its value:
from bs4 import BeautifulSoup
import csv
import urllib.request as urllib2
url="http://www.conakat.com/states/ohio/cities/defiance/road_maps/"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
f = csv.writer(open("Defiance Steets1.csv", "w"))
f.writerow(["Name", "ZipCodes"]) # Write column headers as the first line
links = soup.find_all('a')
for link in links:
    i = link.find_next_sibling('i')
    if getattr(i, 'name', None):
        a, i = link.string, i.string
        f.writerow([a, i])
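The sibling-walking logic can be checked against a small inline snippet, assuming bs4 is installed. The markup below is hypothetical and merely mimics the page structure: each street <a> is followed by an <i> holding the zip code, while unrelated links have no such sibling (a plain `is not None` check is used here in place of the `getattr` test above, which does the same job for this case):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the page structure.
html = ('<a href="/map/?p=1">1ST ST</a> <i>(43512)</i> '
        '<a href="/map/?p=2">2ND ST</a> <i>(43512)</i> '
        '<a href="/about">About</a>')
soup = BeautifulSoup(html, "html.parser")

rows = []
for link in soup.find_all('a'):
    # find_next_sibling returns the first following sibling <i>,
    # or None when the link has no such sibling (e.g. "About").
    i = link.find_next_sibling('i')
    if i is not None:
        rows.append([link.string, i.string])

print(rows)
```

Only the two street links produce rows; the "About" link is skipped.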
It produces:
Name,ZipCodes
1ST ST,(43512)
E 1ST ST,(43512)
W 1ST ST,(43512)
2ND ST,(43512)
E 2ND ST,(43512)
W 2ND ST,(43512)
3 RIVERS CT,(43512)
3RD ST,(43512)
E 3RD ST,(43512)
W 3RD ST,(43512)
...
Your code doesn't show how you obtain the zip codes. Also, you aren't using f.writerow with the names inside the loop. – Vorsprung