Error when using Python 3
Problem description:
Extracting a link from a web page
Consider the following HTML:
<div class="more reviewdata">
<a onclick="bindreviewcontent('1660651',this,false,'I found this review of Star Health Insurance pretty useful',925075287,'.jpg','I found this review of Star Health Insurance pretty useful %23WriteShareWin','http://www.mouthshut.com/review/Star-Health-Insurance-review-toqnmqrlrrm','Star Health Insurance',' 2/5');" style="cursor:pointer">Read More</a>
</div>
From markup like the above, I want to extract only the http link, as follows:
http://www.mouthshut.com/review/Star-Health-Insurance-review-toqnmqrlrrm
To achieve this, I wrote some code using BeautifulSoup and regular expressions in Python. The code is as follows:
import urllib.request
import re
from bs4 import BeautifulSoup
page = urllib.request.urlopen('http://www.mouthshut.com/product-reviews/Star-Health-Insurance-reviews-925075287').read()
soup = BeautifulSoup(page, "html.parser")
required = soup.find_all("div", {"class": "more reviewdata"})
for link in re.findall('http://www.mouthshut.com/review/Star-Health-Insurance-review-[a-z]*', required):
    print(link)
When executed, the program throws the following error:
Traceback (most recent call last):
  File "E:/beautifulSoup20April2.py", line 11, in <module>
    for link in re.findall('http://www.mouthshut.com/review/Star-Health-Insurance-review-[a-z]*', required):
  File "C:\Program Files (x86)\Python35-32\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Can anyone suggest what should be done to extract only the URL without any error?
Answer
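First, you need to loop over required, because find_all returns a list-like result set rather than a single string. Second, each item in it is an object of <class 'bs4.element.Tag'>, and re.findall only accepts a string or bytes-like object (this is what Python is complaining about). So you need to get the HTML text out of each bs4 element before applying the regex, which can be done with prettify().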
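The cause of the error can be reproduced without any network access. The following is a minimal sketch, not the original script: the list of plain strings is an assumption standing in for the result of find_all, and find_links is a hypothetical helper name. It shows that re.findall rejects a list, and that iterating and converting each item with str() avoids the error.

```python
import re

PATTERN = r"http://www\.mouthshut\.com/review/Star-Health-Insurance-review-[a-z]*"

# Stand-in for the result of soup.find_all(...): a list with one element.
# Plain strings are used here (an assumption for illustration) instead of
# real bs4 Tag objects, so that the sketch needs only the standard library.
required = [
    "<a onclick=\"bindreviewcontent('http://www.mouthshut.com/review/"
    "Star-Health-Insurance-review-toqnmqrlrrm');\">Read More</a>"
]


def find_links(items):
    """Apply the pattern to each item separately, converting it to text first."""
    links = []
    for item in items:
        links.extend(re.findall(PATTERN, str(item)))
    return links


try:
    re.findall(PATTERN, required)  # a list is not a string, so this raises
except TypeError as exc:
    print("TypeError:", exc)

print(find_links(required))
```

With real Tag objects, str(item) also yields the element's markup as text, so the same loop shape should apply.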
Here is a working version:
import urllib.request
import re
from bs4 import BeautifulSoup
page = urllib.request.urlopen('http://www.mouthshut.com/product-reviews/Star-Health-Insurance-reviews-925075287').read()
soup = BeautifulSoup(page, "html.parser")
required = soup.find_all("div", {"class": "more reviewdata"})
for div in required:
    for link in re.findall(r'http://www\.mouthshut\.com/review/Star-Health-Insurance-review-[a-z]*', div.prettify()):
        print(link)
Output:
http://www.mouthshut.com/review/Star-Health-Insurance-review-ommmnmpmqtm
http://www.mouthshut.com/review/Star-Health-Insurance-review-rmqulrolqtm
http://www.mouthshut.com/review/Star-Health-Insurance-review-ooqrupoootm
http://www.mouthshut.com/review/Star-Health-Insurance-review-rlrnnuslotm
http://www.mouthshut.com/review/Star-Health-Insurance-review-umqsquttntm
...
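As a possible variation, the regex can be applied to the onclick attribute of each link instead of the prettified markup of the whole div. The sketch below is an assumption-laden alternative, not the approach from the answer: it runs on the HTML snippet from the question rather than the live page, extract_review_links and QUOTED_URL are hypothetical names, and the pattern assumes the URL always appears as a single-quoted argument inside onclick.

```python
import re
from bs4 import BeautifulSoup

# The snippet from the question, used offline instead of fetching the page.
HTML = """
<div class="more reviewdata">
<a onclick="bindreviewcontent('1660651',this,false,'I found this review of Star Health Insurance pretty useful',925075287,'.jpg','I found this review of Star Health Insurance pretty useful %23WriteShareWin','http://www.mouthshut.com/review/Star-Health-Insurance-review-toqnmqrlrrm','Star Health Insurance',' 2/5');" style="cursor:pointer">Read More</a>
</div>
"""

# Assumption: the link is passed as a single-quoted http(s) argument.
QUOTED_URL = re.compile(r"'(https?://[^']+)'")


def extract_review_links(html):
    """Collect quoted URLs from the onclick attribute of links in matching divs."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", {"class": "more reviewdata"}):
        # onclick=True keeps only <a> tags that have an onclick attribute
        for a in div.find_all("a", onclick=True):
            links.extend(QUOTED_URL.findall(a["onclick"]))
    return links


print(extract_review_links(HTML))
```

Reading the attribute value gives the regex a plain string directly and avoids hard-coding the product name in the pattern, at the cost of depending on how the site quotes its onclick arguments.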