Can't decode unicode from the Stack Exchange API
Question:
I was looking at this codegolf problem and decided to try taking the Python solution and using urllib instead. I modified some sample code for manipulating JSON with urllib:
import urllib.request
import json
res = urllib.request.urlopen('http://api.stackexchange.com/questions?sort=hot&site=codegolf')
res_body = res.read()
j = json.loads(res_body.decode("utf-8"))
This gives:
➜ codegolf python clickbait.py
Traceback (most recent call last):
  File "clickbait.py", line 7, in <module>
    j = json.loads(res_body.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
If you go to http://api.stackexchange.com/questions?sort=hot&site=codegolf and click "Headers", it says charset=utf-8. Why is urlopen giving me these strange results?
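Answer:
res_body is gzipped. I'm not sure that decompressing the response is something urllib takes care of by default. The byte 0x8b at position 1 in the error is the second byte of the gzip magic number (0x1f 0x8b), which is why decoding the raw body as UTF-8 fails.
If you decompress the response from the API server, you will get the data.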
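import urllib.request
import zlib
import json
with urllib.request.urlopen(
    'http://api.stackexchange.com/questions?sort=hot&site=codegolf'
) as res:
    # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressed_data = zlib.decompress(res.read(), 16 + zlib.MAX_WBITS)
# json.loads no longer accepts an encoding argument (removed in Python 3.9),
# so decode the bytes explicitly before parsing
j = json.loads(decompressed_data.decode('utf-8'))
print(j)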
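As a rough sketch of the same idea in a reusable form, the helper below un-gzips a body only when it looks compressed, using either the Content-Encoding value or the gzip magic bytes. The function name parse_json_body, the constant GZIP_MAGIC, and the sample payload are made up for illustration and are not part of urllib or the Stack Exchange API. The demonstration runs offline on locally compressed data.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_json_body(raw: bytes, content_encoding: str = "") -> object:
    """Decode a JSON response body, un-gzipping it first when needed.

    Hypothetical helper: trusts the Content-Encoding value if given,
    otherwise sniffs the gzip magic bytes at the start of the body.
    """
    if content_encoding.lower() == "gzip" or raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Offline demonstration with a made-up payload (not real API output).
sample = json.dumps({"items": [{"title": "hello"}]}).encode("utf-8")
compressed = gzip.compress(sample)

print(hex(compressed[1]))                   # the byte the traceback complained about
print(parse_json_body(compressed, "gzip"))  # compressed body, header known
print(parse_json_body(sample))              # plain body passes through unchanged
```

With urllib, the two arguments would presumably come from res.read() and res.headers.get("Content-Encoding", ""); sniffing the magic bytes is only a fallback for responses that omit the header.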