Scraping YouTube user information

Question:

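I am trying to scrape YouTube to retrieve information about a group of users (about 200 people). I am interested in the relationships between these users:

contacts
subscribers
subscriptions
what videos they commented on, etc.

I have managed to get the contact information with the following source: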

import gdata.youtube
import gdata.youtube.service
from gdata.service import RequestError
from pub_author import KEY, NAME_REGEX


def get_details(name):
    # Return the titles of the entries in the user's contact feed.
    yt_service = gdata.youtube.service.YouTubeService()
    yt_service.developer_key = KEY
    contact_feed = yt_service.GetYouTubeContactFeed(username=name)
    contacts = [e.title.text for e in contact_feed.entry]
    return contacts

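I can't seem to get the other information I need. The reference guide says that I can get an XML feed from http://gdata.youtube.com/feeds/api/users/username/subscriptions?v=2 (for some arbitrary user). However, if I try to get another user's subscriptions, I get a 403 error with the following message:

User must be logged in to access these subscriptions.

If I use the gdata API: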

sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
sub = [e.title.text for e in sub_feed.entry]

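then I get the same error.

How can I get these subscriptions without logging in? It should be possible, since you can access this information on the YouTube website without logging in.

Also, there seems to be no feed for the subscribers of a particular user. Is this information available through the API?

EDIT

So, it appears this can't be done through the API. I had to do it the quick and dirty way: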
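When crawling about 200 users, a single 403 should not abort the whole run. The following is a minimal sketch using only the standard library (Python 3) rather than gdata. It builds the feed URL quoted above and treats a 403 as "subscriptions not public". The names `subscriptions_url`, `fetch_feed`, and the injectable `opener` parameter are assumptions made for illustration, not part of any YouTube client library.

```python
import urllib.error
import urllib.parse
import urllib.request

# Feed URL pattern quoted in the question, with the username as a placeholder.
FEED_TEMPLATE = "http://gdata.youtube.com/feeds/api/users/{user}/subscriptions?v=2"


def subscriptions_url(username):
    """Build the subscriptions feed URL for one username (hypothetical helper)."""
    return FEED_TEMPLATE.format(user=urllib.parse.quote(username, safe=""))


def fetch_feed(username, opener=urllib.request.urlopen):
    """Return the raw feed bytes, or None when the server answers 403.

    `opener` is any callable that takes a URL and returns a file-like
    response; it is injectable so the function can be tested offline.
    """
    try:
        response = opener(subscriptions_url(username))
        return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # The user has not made their subscriptions public.
            return None
        raise
```

A caller can then loop over all usernames and simply skip those for which `fetch_feed` returns `None`.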

for f in `cat users.txt`; do wget "www.youtube.com/profile?user=$f&view=subscriptions" --output-document subscriptions/$f.html; done

Then I used this script to extract the usernames from the downloaded HTML files:

"""Extract usernames from a Youtube profile using regex""" 
import re
import sys


def main():
    with open(sys.argv[1]) as html_file:
        lines = html_file.read().split('\n')
    #
    # The HTML files have two <a href="..."> tags for each user: one for an
    # image thumbnail, and one for a text link, so a set removes duplicates.
    #
    users = set()
    for line in lines:
        match = re.search(r'<a href="/user/(?P<name>[^"]+)" onmousedown', line)
        if match:
            users.add(match.group('name'))
    print(sorted(users))


if __name__ == '__main__':
    main()
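To check the regular expression without downloading anything, the extraction step can be pulled into a function and run on a small sample. This is only a sketch: the `SAMPLE` markup is made up to resemble the two-links-per-user layout described in the script's comment, and `extract_usernames` is a hypothetical helper name.

```python
import re

# Same pattern as in the script, compiled once.
USER_LINK = re.compile(r'<a href="/user/(?P<name>[^"]+)" onmousedown')


def extract_usernames(html):
    """Return the sorted, de-duplicated usernames linked in an HTML string."""
    return sorted({m.group("name") for m in USER_LINK.finditer(html)})


# Invented sample: "bob" appears twice (thumbnail link and text link).
SAMPLE = '''
<a href="/user/bob" onmousedown="x"><img src="thumb.jpg"></a>
<a href="/user/bob" onmousedown="x">bob</a>
<a href="/user/alice" onmousedown="x">alice</a>
'''

print(extract_usernames(SAMPLE))  # prints ['alice', 'bob']
```

Unlike the line-by-line `re.search`, `finditer` also catches several user links that happen to share one line.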

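Answer

To access a user's subscriptions feed without that user being logged in, the user must check the "Subscribe to a channel" checkbox under their Account Sharing settings.

Currently, there is no direct way to get a channel's subscribers through the gdata API. In fact, there is an outstanding feature request for it that has been open for more than 3 years! See Retrieving a list of a user's subscribers?.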
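Since no subscribers feed exists, one workaround for a closed group of users is to invert the subscription lists that were already collected. The sketch below assumes a dictionary mapping each crawled username to the channels it subscribes to; the function name and the sample data are hypothetical. It can only find subscribers who are themselves in the crawled group, not all subscribers of a channel.

```python
from collections import defaultdict


def invert_subscriptions(subscriptions):
    """Map each channel to the sorted list of crawled users subscribed to it."""
    subscribers = defaultdict(set)
    for user, channels in subscriptions.items():
        for channel in channels:
            subscribers[channel].add(user)
    return {channel: sorted(users) for channel, users in subscribers.items()}


# Invented example data: user -> channels the user subscribes to.
crawled = {"alice": ["bob", "carol"], "bob": ["carol"]}
print(invert_subscriptions(crawled))
```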
