Best way to parse a line into a dictionary in Python
Question:
I have a file with lines like this:
account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"
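There is no special delimiter. Each key's value is surrounded by double quotes if it is a string, but not if it is a number. There are no keys without values, although a value may be an empty string, written as "". There are no escaped quotes inside values, because none are needed.
I'd like to know a good way to parse such a line with Python and store the values as key-value pairs in a dictionary.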
Answer
We'll need a regular expression.
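import re
import decimal

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

# key, optional spaces, '=', optional spaces, then either a "quoted string" or a run of non-space characters
r = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')
d = {}
for k, v in r.findall(line):
    if v[:1] == '"':
        d[k] = v[1:-1]             # quoted: a string, strip the surrounding quotes
    else:
        d[k] = decimal.Decimal(v)  # unquoted: a number
>>> d
{'account': 'TEST1', 'Qty': Decimal('100'), 'price': Decimal('20.11'), 'subject': 'some value', 'values': '3=this, 4=that'}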
You can use float instead of Decimal if you prefer, but that is probably a bad idea if money is involved.
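To illustrate why floats are risky for money: binary floating point cannot represent most decimal fractions exactly, so sums drift, while `decimal.Decimal` values built from the original strings keep the exact digits. A small sketch (the two prices are made-up sample values, not from the question):

```python
from decimal import Decimal

# Summing prices as binary floats accumulates representation error.
float_total = sum([0.10, 0.20])
# Building Decimals from the original strings keeps the digits exact.
decimal_total = sum([Decimal('0.10'), Decimal('0.20')])

print(float_total)    # 0.30000000000000004
print(decimal_total)  # 0.30
```

This is why the answer builds each unquoted value with `decimal.Decimal(v)` directly from the matched text rather than going through `float` first.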
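Answer
A recursive variation of bobince's answer that also parses values containing embedded equals signs into nested dictionaries: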
>>> import re
>>> import pprint
>>>
>>> def parse_line(line):
...     d = {}
...     a = re.compile(r'\s*(\w+)\s*=\s*("[^"]*"|[^ ,]*),?')
...     float_re = re.compile(r'^\d+\.\d*$')
...     int_re = re.compile(r'^\d+$')
...     for k, v in a.findall(line):
...         if int_re.match(k):
...             k = int(k)
...         if v[:1] == '"':             # quoted value: strip the quotes (also safe when v is empty)
...             v = v[1:-1]
...         if '=' in v:
...             d[k] = parse_line(v)     # embedded key=value list becomes a nested dict
...         elif int_re.match(v):
...             d[k] = int(v)
...         elif float_re.match(v):
...             d[k] = float(v)
...         else:
...             d[k] = v
...     return d
...
>>> line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
>>> pprint.pprint(parse_line(line))
{'Qty': 100,
 'account': 'TEST1',
 'price': 20.11,
 'subject': 'some value',
 'values': {3: 'this', 4: 'that'}}
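The `int_re`/`float_re` checks above decide a value's type by pattern. An alternative, sketched here as a hypothetical helper `convert_value` (not part of the answer), is to simply try the conversions in order and fall back to the original string. It uses `Decimal` instead of `float`, following the earlier remark about money:

```python
from decimal import Decimal, InvalidOperation


def convert_value(text):
    """Hypothetical helper: return an int, else a Decimal, else the original string."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return Decimal(text)
    except InvalidOperation:  # Decimal('TEST1') or Decimal('') raises this
        return text


print(convert_value('100'))    # 100
print(convert_value('20.11'))  # 20.11
print(convert_value('TEST1'))  # TEST1
```

Like `parse_line`, this ignores whether the value was quoted, so a quoted "100" would also become a number. Note that `Decimal` also accepts words such as `NaN` and `Infinity`, so those would be converted too; the regex approach avoids that.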
Answer
If you don't want to use a regular expression, another option is to read the string one character at a time:
string = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
inside_quotes = False
key = None
value = ''
result = {}    # avoid shadowing the built-in name dict
for c in string:
    if c == '"':
        inside_quotes = not inside_quotes
    elif c == '=' and not inside_quotes:
        # everything collected so far is the key
        key = value
        value = ''
    elif c == ' ':
        if inside_quotes:
            value += ' '
        elif key and value:
            # an unquoted space ends a complete key=value pair
            result[key] = value
            key = None
            value = ''
    else:
        value += c
if key:
    result[key] = value    # store the final pair
print(result)
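As written, the loop above silently drops a pair whose value is the empty string "" (which the question says can occur): `key and value` is false for an empty value, so nothing is stored, and the next `=` overwrites the pending key. One way around this, sketched below as a hypothetical function `parse_chars`, is to remember that a value has started, even if it is an empty quoted one:

```python
def parse_chars(line):
    """Hypothetical variant of the character loop that keeps empty "" values."""
    inside_quotes = False
    key = None
    value = ''
    have_value = False  # True once a value (possibly an empty quoted one) has started
    result = {}
    for c in line:
        if c == '"':
            inside_quotes = not inside_quotes
            have_value = True  # a quote starts a value, even ""
        elif c == '=' and not inside_quotes:
            # everything collected so far is the key
            key, value, have_value = value, '', False
        elif c == ' ' and not inside_quotes:
            if key is not None and have_value:
                result[key] = value  # an unquoted space ends a complete pair
                key, value, have_value = None, '', False
        else:
            value += c  # includes spaces and '=' inside quotes
            if key is not None:
                have_value = True
    if key is not None and have_value:
        result[key] = value  # store the final pair
    return result


print(parse_chars('name="" Qty=100'))  # {'name': '', 'Qty': '100'}
```

The `key is not None` check in the `else` branch keeps the characters of a key (before its `=`) from being mistaken for a value, which is what lets `price = 20.11` with spaces around `=` still work.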
Answer
Perhaps a bit simpler to follow is a pyparsing version:
from pyparsing import *
# define basic elements - use re's for numerics, faster and easier than
# composing from pyparsing objects
integer = Regex(r'[+-]?\d+')
real = Regex(r'[+-]?\d+\.\d*')
ident = Word(alphanums)
value = real | integer | quotedString.setParseAction(removeQuotes)
# define a key-value pair, and a configline as one or more of these
# wrap configline in a Dict so that results are accessible by given keys
kvpair = Group(ident + Suppress('=') + value)
configline = Dict(OneOrMore(kvpair))
src = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" ' \
'values="3=this, 4=that"'
configitems = configline.parseString(src)
Now you can access the parsed items through the returned configitems ParseResults object:
>>> print configitems.asList()
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'],
['subject', 'some value'], ['values', '3=this, 4=that']]
>>> print configitems.asDict()
{'account': 'TEST1', 'Qty': '100', 'values': '3=this, 4=that',
'price': '20.11', 'subject': 'some value'}
>>> print configitems.dump()
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'],
['subject', 'some value'], ['values', '3=this, 4=that']]
- Qty: 100
- account: TEST1
- price: 20.11
- subject: some value
- values: 3=this, 4=that
>>> print configitems.keys()
['account', 'subject', 'values', 'price', 'Qty']
>>> print configitems.subject
some value
Could you explain this regular expression? – ash 2011-03-02 01:32:23
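For the pattern in the first answer, `([^ =]+) *= *("[^"]*"|[^ ]*)`, one way to explain it is to rewrite it with `re.VERBOSE` so each piece carries a comment. This is an equivalent rewrite of the same pattern, not something from the original answer; the check at the end confirms both versions find the same pairs:

```python
import re

# Same pattern as r'([^ =]+) *= *("[^"]*"|[^ ]*)', spelled out with re.VERBOSE.
# In verbose mode, whitespace outside a character class is ignored,
# so literal spaces are written as [ ].
pair_re = re.compile(r'''
    ([^ =]+)        # group 1, the key: one or more chars that are neither space nor =
    [ ]*            # optional spaces before =
    =               # the literal equals sign
    [ ]*            # optional spaces after =
    (               # group 2, the value, either...
        "[^"]*"     #   a double-quoted string (no escaped quotes, per the question)
      |             #   ...or...
        [^ ]*       #   a run of non-space characters (an unquoted number)
    )
''', re.VERBOSE)

compact_re = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')
line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
assert pair_re.findall(line) == compact_re.findall(line)

print(pair_re.findall(line))
# [('account', '"TEST1"'), ('Qty', '100'), ('price', '20.11'), ('subject', '"some value"'), ('values', '"3=this, 4=that"')]
```

The quotes stay in group 2 on purpose: the loop in the first answer checks `v[:1] == '"'` to decide whether a value is a string (quotes stripped) or a number. Because the quoted alternative is tried first, the `=` signs inside `"3=this, 4=that"` are consumed as part of the value rather than starting new pairs.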