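Retrieving Twitter data with Tweepy

Question:

I'm using Python with the Tweepy library to retrieve Twitter data for a specific hashtag, but I need the tweets from a specific time period, for example 30 June 2013 to 30 December 2013. How can I do that?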

# imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# setting up the keys
consumer_key = '……………….'
consumer_secret = '……………..'
access_token = '……………….'
access_secret = '……………..'

class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)

# printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())

t = u"#سوريا"
stream.filter(track=[t])
+1

You can't get that data; see e.g. http://*.com/a/1733360/3001761 – jonrsharpe 2014-11-02 16:07:44

+0

But I ran the code continuously for two days and it retrieved data. Is all this metadata only available for three weeks? – Hana 2014-11-02 16:29:59

+0

@Hana were you able to solve this problem? – user3378649 2014-11-02 23:32:05

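I'm still investigating why I can't get the same results using tweepy.Cursor(api.search, geocode=.., q=query, until=date); maybe it's because of this reason. However, I was able to retrieve Twitter data between two dates using Tweepy.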

First, I created a date generator that runs from the start date to the end date.

import datetime

def date_range(start, end):
    current = start
    while (end - current).days >= 0:
        yield current
        # Step size depends on your needs: you could step per day/minute/hour instead
        current = current + datetime.timedelta(seconds=1)
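To make the step size concrete, here is a minimal sketch of the same generator stepping one day at a time. The `step` parameter is a hypothetical addition for illustration and is not part of the code above.

```python
import datetime

def date_range(start, end, step=datetime.timedelta(days=1)):
    """Yield datetimes from start through end (inclusive), `step` apart."""
    current = start
    while current <= end:
        yield current
        current += step

# Collect the days from 30 June 2013 through 3 July 2013
days = list(date_range(datetime.datetime(2013, 6, 30),
                       datetime.datetime(2013, 7, 3)))
for day in days:
    print(day.date())
```

Passing `step=datetime.timedelta(hours=1)` or `seconds=1` would give the finer granularity mentioned in the comment.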

Then I created a Listener, so that I can tell whether a tweet falls on a specific day by accessing status.created_at.
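As a rough illustration of checking status.created_at against a date window, here is a small sketch. It assumes created_at is a naive datetime, and the names `START`, `STOP` and `in_window` are hypothetical, chosen only for this example. Keep in mind that a live stream only delivers tweets posted while it is running, so a window in the past will not match anything coming from the stream.

```python
import datetime

# Window from the question; adjust as needed
START = datetime.datetime(2013, 6, 30)
STOP = datetime.datetime(2013, 12, 30)

def in_window(created_at, start=START, stop=STOP):
    """Return True if created_at lies within [start, stop]."""
    return start <= created_at <= stop

# Inside a listener this could be used as:
#     if in_window(status.created_at):
#         print(status.text)
print(in_window(datetime.datetime(2013, 9, 1)))   # inside the window
print(in_window(datetime.datetime(2014, 11, 2)))  # outside the window
```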

Your code should look like this:

import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
import datetime

# Use your keys
consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_secret = '...'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

def date_range(start, end):
    # Yield every datetime from start through end, one second apart
    current = start
    while (end - current).days >= 0:
        yield current
        current = current + datetime.timedelta(seconds=1)

class TweetListener(StreamListener):
    def on_status(self, status):
        #api = tweepy.API(auth_handler=auth)
        #status.created_at += timedelta(hours=900)

        startDate = datetime.datetime(2013, 6, 30)  # no leading zero: 06 is a syntax error in Python 3
        stopDate = datetime.datetime(2013, 10, 30)
        for date in date_range(startDate, stopDate):
            # Note: this overwrites the timestamp Twitter reported for the tweet
            status.created_at = date
            print("tweet " + str(status.created_at) + "\n")
            print(status.text + "\n")
            # You can dump your tweets into a JSON file, or load them into your database

stream = Stream(auth, TweetListener(), secure=True)
t = u"#Syria" # You can use different hashtags
stream.filter(track=[t])

Output:

I'm only printing the dates to check (I don't want to spam with political tweets).

tweet 2013-06-30 00:00:01 

------------------- 

tweet 2013-06-30 00:00:02 

------------------- 

tweet 2013-06-30 00:00:03 

------------------- 

tweet 2013-06-30 00:00:04 

------------------- 

tweet 2013-06-30 00:00:05 

------------------- 

tweet 2013-06-30 00:00:06 

------------------- 

tweet 2013-06-30 00:00:07 

------------------- 

tweet 2013-06-30 00:00:08 

------------------- 

tweet 2013-06-30 00:00:09 

------------------- 
+0

Thanks Taha, I'll try that code once the system has finished retrieving data. – Hana 2014-11-02 18:51:03

+0

Sure, thanks a lot. – Hana 2014-11-02 19:38:59

+0

I've tried your code and it works, but I only get the tweet text and the tweet time, with no user ID! – Hana 2014-11-19 00:02:32