Reading a large JSON file from disk
Question:
I have a large JSON file, db.json (> 100 MB), with the following content:
{"sitters": [["9919.html", 3, 8, 19, 47, 120, 129, 359]], "yellow": [["9945.html", 791],
["9983.html", 1496], ["9984.html", 151]], "four": [["9971.html", 81, 403], ["9991.html", 37],
["9995.html", 45, 225, 337], ["9975.html", 15], ["9978.html", 100], ["9948.html", 381],
["9966.html", 228], ...
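where the keys are words and each value is a list of file names, each followed by the positions at which that word appears in the file. I want to query this JSON file for n words and retrieve their corresponding file names and positions. Any ideas on how to do this efficiently, given the size of the file? I have been looking at ijson, but I can't seem to get it to work. I tried: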
parser = parse("db.json")
for prefix, event, value in parser:
    if event == 'sitters':
        print value
but I probably don't understand how to use it correctly, because it gives me the following error:
Traceback (most recent call last):
  File "retriever.py", line 43, in <module>
    sys.exit(main())
  File "retriever.py", line 38, in main
    for prefix, event, value in parser:
  File "/usr/local/lib/python2.7/dist-packages/ijson/common.py", line 63, in parse
    for event, value in basic_events:
  File "/usr/local/lib/python2.7/dist-packages/ijson/backends/yajl2.py", line 90, in basic_parse
    buffer = f.read(buf_size)
AttributeError: 'str' object has no attribute 'read'
Any help is much appreciated!
Answer:
You are trying to parse the string 'db.json' rather than the file db.json in this line:
parser = parse("db.json")
As you can see in the error message, the line buffer = f.read(buf_size) raises this exception:
AttributeError: 'str' object has no attribute 'read'
The parse function expects a file object, not a file name:
f = open('db.json', 'r')
parser = parse(f)
Close it once you are done:
f.close()
You can also use the with statement to handle opening and closing the file for you:
with open('db.json') as f:
    parser = parse(f)
    # use your parser here; the file is closed automatically when this block ends
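To see why the string fails, here is a minimal sketch in Python 3 syntax that reproduces the error without ijson. The read_chunk helper is hypothetical: it only stands in for the f.read(buf_size) call shown in the traceback, and io.BytesIO stands in for an opened db.json.

```python
import io


def read_chunk(f, buf_size=64):
    # Hypothetical stand-in for what a streaming parser does internally:
    # it calls read() on whatever object it was given.
    return f.read(buf_size)


# Passing the file name: a str has no read() method, so this fails
# the same way the traceback in the question does.
try:
    read_chunk("db.json")
    error = None
except AttributeError as exc:
    error = str(exc)

# Passing a file-like object: anything with a read() method works.
# io.BytesIO is used here in place of open('db.json', 'rb').
fake_file = io.BytesIO(b'{"sitters": [["9919.html", 3, 8, 19]]}')
chunk = read_chunk(fake_file)

print(error)
print(chunk)
```

The same reasoning applies to any object you hand to the parser: what matters is that it exposes read(), not what it is called.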
JSON is not well suited to fast lookups. Consider converting the database to a more suitable format (e.g. MySQL). – 2013-05-11 07:51:29
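As a rough illustration of that suggestion, here is a sketch that uses SQLite from the Python standard library in place of MySQL, purely so that it runs without a server. The table name postings, the column names, and the helpers build_index and lookup are all assumptions made for this example. It also assumes a one-time conversion where the JSON can be loaded into memory; the small sample below is a shortened version of the data in the question.

```python
import json
import sqlite3


def build_index(conn, index):
    """Store one row per (word, filename, position) and index by word."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings "
        "(word TEXT, filename TEXT, position INTEGER)"
    )
    # Each entry looks like ["9919.html", 3, 8, 19]: a file name
    # followed by the positions of the word in that file.
    rows = (
        (word, entry[0], pos)
        for word, entries in index.items()
        for entry in entries
        for pos in entry[1:]
    )
    conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON postings (word)")
    conn.commit()


def lookup(conn, words):
    """Return (word, filename, position) rows for the given words."""
    words = list(words)
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    cur = conn.execute(
        "SELECT word, filename, position FROM postings "
        "WHERE word IN (%s) ORDER BY word, filename, position" % placeholders,
        words,
    )
    return cur.fetchall()


# Shortened sample in the same shape as db.json from the question.
sample = json.loads(
    '{"sitters": [["9919.html", 3, 8, 19]],'
    ' "yellow": [["9945.html", 791], ["9983.html", 1496]]}'
)

conn = sqlite3.connect(":memory:")  # use a file path to persist the index
build_index(conn, sample)
results = lookup(conn, ["yellow"])
print(results)
```

After the one-time conversion, each query goes through the index on word rather than re-reading the 100 MB file.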