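How can I extract multiple dictionaries from a string?
Problem description:
I want to extract multiple Python dictionaries from a string. The regular expression I am currently using fails because it also matches the data between the dictionaries. I also tried a non-greedy regular expression ({.+?}), but it breaks up nested dictionaries and treats them as separate matches. How can I extract every dictionary that occurs in the string?
Example string: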
mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'
Code:
>>> import re
>>> match_data = re.compile(r'({.+})')
>>> match_data.findall(mystring.strip())
['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}']
Expected output:
['{ "mydict": [{ "hello": "world"}, {"hello2":"world2"}]}', '{"dict2":{"world":"hello"}}']
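For reference, a minimal sketch of what the non-greedy pattern mentioned above does on the example string. Each match stops at the first closing brace it meets, so the nested dictionaries come back as fragments rather than whole dictionaries. The name `fragments` is only illustrative.

```python
import re

mystring = '(2017-05-29, { "mydict": [{ "hello": "world"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'

# Non-greedy: every match ends at the first "}" after its opening "{".
fragments = re.findall(r'\{.+?\}', mystring)
for fragment in fragments:
    print(fragment)
```

This prints three fragments, and the first and last of them have unbalanced braces.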
Answer
A regular expression is probably too simple a tool for this problem. One possible solution, however, is to match the braces:
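s = '{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'
brace_depth = 0    # current nesting level of braces
start_index = -1   # index of the opening brace of the current top-level dict
in_quotes = False  # True while scanning inside a quoted string
for i, c in enumerate(s):
    # Toggle the quote state on every quote character.
    # Note: escaped quotes and mixed quote styles are not handled.
    if c in ["'", '"']:
        in_quotes = not in_quotes
    # Braces inside a quoted string must not affect the nesting level.
    if in_quotes:
        continue
    if c == "{":
        brace_depth += 1
        if start_index == -1:
            start_index = i
    if c == "}":
        brace_depth -= 1
        # Back at depth 0: a complete top-level dictionary has been closed.
        if brace_depth == 0:
            print(s[start_index:i+1])
            start_index = -1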
Which results in:
{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]}
{"dict2":{"world":"hello"}}
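As an alternative to counting braces by hand, here is a minimal sketch that lets the standard library's JSON decoder find where each object ends. This is a different technique from the loop above, and it rests on an assumption: the dictionaries in the string must be valid JSON (double-quoted keys and strings), which holds for the examples in this thread but not for arbitrary Python dict literals. The function name `extract_json_objects` and the variable `sample` are made up for illustration.

```python
import json


def extract_json_objects(text):
    """Return (substring, parsed_object) pairs for each top-level JSON object."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        # Jump to the next candidate opening brace.
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns the value together with the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON from this brace: skip it and keep scanning.
            pos = start + 1
            continue
        results.append((text[start:end], obj))
        pos = end
    return results


sample = '{ "mydict": [{ "hello": "wo}}rld"}, {"hello2":"world2"}]};;/url/string, {"dict2":{"world":"hello"}}'
for substring, parsed in extract_json_objects(sample):
    print(substring)
```

Because the decoder understands string literals and escapes, braces inside quoted values such as "wo}}rld" do not confuse it, and each result is also available as a parsed Python object.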
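I think you will need to write a parser for Python dictionaries. – 0605002
Try re.findall(r'{.+?}', mystring). It won't give you exactly what you want, but you can easily parse the data from there. – Arun
Is the ";/url/string" data always in the same place? That is, between the two dictionaries? – DexJ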