Python Scrapy spider

Question:

I want to scrape data from the website http://www.quoka.de/immobilien/bueros-gewerbeflaechen with this filter applied:

<a class="t-bld" rel="nofollow" href="javascript:qsn.set('classtype','of',1);">nur Angebote</a>

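How do I set this filter with Scrapy?

Answer

One way is to submit a form request that carries the filter parameter and then parse the resulting response. See the following code example: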

import scrapy


class TestSpider(scrapy.Spider):
    name = 'quoka'
    start_urls = ['http://www.quoka.de/immobilien/bueros-gewerbeflaechen']

    def parse(self, response):
        # Re-submit the page's search form with the "nur Angebote" filter set.
        request = scrapy.FormRequest.from_response(
            response,
            formname='frmSearch',
            formdata={'classtype': 'of'},
            callback=self.parse_filtered,
        )
        # print(request.body)  # uncomment to inspect the submitted form data
        yield request

    def parse_filtered(self, response):
        # Each <li> under the result list is one listing.
        search_results = response.xpath('//div[@id="ResultListData"]/ul/li')
        for result in search_results:
            title = result.xpath('.//div[@class="q-col n2"]/a/@title').extract()
            print(title)
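
The formdata value in the spider above appears to come straight from the arguments of the JavaScript call in the link's href. The following is a minimal sketch of that mapping, assuming that qsn.set(name, value, flag) sets a search-form field called name to value (an assumption that is consistent with the answer's formdata, not something the site documents here). The names QSN_SET and formdata_from_href are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed shape of the link's JavaScript call: qsn.set('<field>', '<value>', <flag>);
QSN_SET = re.compile(r"qsn\.set\('([^']*)'\s*,\s*'([^']*)'")


def formdata_from_href(href):
    """Return {field: value} for a javascript:qsn.set(...) href, or {} if it does not match."""
    match = QSN_SET.search(href)
    return {match.group(1): match.group(2)} if match else {}


# The href of the "nur Angebote" link quoted in the question.
href = "javascript:qsn.set('classtype','of',1);"
print(formdata_from_href(href))  # {'classtype': 'of'}
```

The resulting dictionary could be passed as the formdata argument of scrapy.FormRequest.from_response, which is how other filter links of the same shape might be handled without hard-coding each value.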

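Please consider accepting my answer if it solved your problem. Please do not use comments to post new questions; instead, create a new question that shows your code along with what you expected and what you got. See [ask] for more information. –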

Answer

You can parse this site with BeautifulSoup and urllib2. Below is a Python implementation that extracts the data matching the filter you described.

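# Python 2 code: urllib2 and the BeautifulSoup 3 package are not available on
# Python 3 (use urllib.request and bs4 there).
from BeautifulSoup import BeautifulSoup
import urllib2


def main1(website):
    """Return the text of every <a rel="nofollow"> link on the page."""
    data_list = []
    web = urllib2.urlopen(website).read()
    soup = BeautifulSoup(web)
    description = soup.findAll('a', attrs={'rel': 'nofollow'})
    for de in description:
        data_list.append(de.text)
    return data_list


print(main1("http://www.quoka.de/immobilien/bueros-gewerbeflaechen"))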

If you want to parse other data, such as the description of each listing:

# Reuses the BeautifulSoup and urllib2 imports from the previous snippet.
def main(website):
    """Return the text of every <div class="description"> element on the page."""
    data_list = []
    web = urllib2.urlopen(website).read()
    soup = BeautifulSoup(web)
    description = soup.findAll('div', attrs={'class': 'description'})
    for de in description:
        data_list.append(de.text)
    return data_list


print(main("http://www.quoka.de/immobilien/bueros-gewerbeflaechen"))  # the description text of each listing
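
For readers on Python 3, where urllib2 and BeautifulSoup 3 are unavailable, the same idea can be sketched with only the standard library's html.parser. This is a swapped-in technique, not the answer's method: it matches any div whose class list contains "description", and it runs on an invented sample string rather than the live page. The names DescriptionCollector and extract_descriptions, and the sample markup, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class DescriptionCollector(HTMLParser):
    """Collect the text of every <div class="description"> element."""

    def __init__(self):
        super().__init__()
        self.descriptions = []  # finished description texts
        self._depth = 0         # div nesting depth inside a description; 0 means outside
        self._chunks = []       # text pieces of the description currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1    # a nested div inside a description
        elif "description" in (dict(attrs).get("class") or "").split():
            self._depth = 1     # entering a description div
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # the description div itself just closed
                self.descriptions.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_descriptions(html):
    """Return the text of each description div found in an HTML string."""
    parser = DescriptionCollector()
    parser.feed(html)
    parser.close()
    return parser.descriptions


# Invented sample markup; the real page structure may differ.
sample = (
    '<div class="description">Office, <b>120 m2</b></div>'
    '<div class="other">ignored</div>'
    '<div class="description"><div>Shop</div> in centre</div>'
)
print(extract_descriptions(sample))  # ['Office, 120 m2', 'Shop in centre']
```

To run this against the live site, the HTML string could be fetched with urllib.request.urlopen(url).read() and decoded before being passed to extract_descriptions.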

Thanks, but I can only use Scrapy. The only problem is that I don't know how to apply the filter with Scrapy. – Dmitriy
