Python response decoding

Question:

Consider the following lines that use urllib:

import urllib.request
# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

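What kind of string does read() return? I have been trying to find out from the Python documentation, but it does not mention this at all. And why is there a decode? Does decode convert an object from UTF-8 or to UTF-8? From what format, and to what format? The decode documentation does not say either. Is the Python documentation really that bad, or is there some standard convention I do not understand?

I want to store the HTML in a UTF-8 file. Do I just do a normal write, or do I need to "encode" it back into something and write that?

Note: I know urllib is deprecated, but I cannot switch right now.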

Thanks for the downvote without a comment...? – darksky 2013-03-16 20:33:51

[How to stop the pain?](http://www.youtube.com/watch?v=sgHbC6udIqc) – root 2013-03-16 20:35:14

Awesome, thanks @root! – darksky 2013-03-16 20:38:11

Answer

Ask Python (the session below is Python 2):

>>> import urllib
>>> r=urllib.urlopen("http://google.com")
>>> a=r.read() 
>>> type(a) 
0: <type 'str'> 
>>> help(a.decode) 
Help on built-in function decode: 

decode(...) 
    S.decode([encoding[,errors]]) -> object 

    Decodes S using the codec registered for encoding. encoding defaults 
    to the default encoding. errors may be given to set a different error 
    handling scheme. Default is 'strict' meaning that encoding errors raise 
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' 
    as well as any other name registered with codecs.register_error that is 
    able to handle UnicodeDecodeErrors. 

>>> b = a.decode('utf8') 
>>> type(b) 
1: <type 'unicode'> 
>>>

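So it appears that read() returns a str, which in Python 2 is a string of raw bytes. .decode('utf8') interprets those bytes as UTF-8 and converts them into Python's unicode type.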
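
For Python 3, where the question's urllib.request lives, the same idea can be sketched as follows: read() returns bytes rather than str, and decode turns those bytes into text. The sample bytes here are a made-up stand-in for a response body, not real output from any server.

```python
# Hypothetical stand-in for what response.read() might return:
# raw bytes, here the UTF-8 encoding of a short HTML snippet.
raw = "<p>café</p>".encode("utf-8")

print(type(raw))            # bytes: plain byte values, no encoding attached
text = raw.decode("utf-8")  # interpret the bytes as UTF-8, producing text
print(type(text))           # str: a sequence of Unicode code points

# The byte count and the character count differ, because "é" takes
# two bytes in UTF-8 but is a single character.
print(len(raw), len(text))
```

In other words, decode goes from encoded bytes to text, and encode goes the other way.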

For some reason, the decode() doc page I was using is different. Thanks – darksky 2013-03-16 20:39:52

So str doesn't support all Unicode characters, hence the decode() chained after read()? – darksky 2013-03-16 20:41:49
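
On the original question of storing the HTML in a UTF-8 file, one possible approach in Python 3 is to open the file in text mode with an explicit encoding, so the decoded text is encoded back to UTF-8 as it is written. This is only a sketch: the save_html helper, the page.html file name, and the sample content are all hypothetical.

```python
import os
import tempfile

def save_html(text, path):
    """Write already-decoded text to path as UTF-8 (hypothetical helper)."""
    # Text mode with an explicit encoding lets Python encode the str
    # to UTF-8 bytes on the way out.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# Made-up decoded page content, for illustration only.
html = b"<p>caf\xc3\xa9</p>".decode("utf-8")
path = os.path.join(tempfile.mkdtemp(), "page.html")
save_html(html, path)

# Reading the file back in binary mode shows the UTF-8 bytes on disk.
with open(path, "rb") as f:
    on_disk = f.read()
```

If the server already sent UTF-8 and the text does not need to be processed, writing the undecoded bytes in binary mode ("wb") should give the same file without the decode and encode round trip.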
