Best way to parse a line into a dictionary in Python

Question:

I have a file with lines like this:

account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"

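There is no special delimiter. Each key has a value that is surrounded by double quotes if it is a string, but not if it is a number. There are no keys without values, although an empty string may appear, written as "". There are no escaped quote characters, because they are not needed.

I would like to know a good way to parse this kind of line with Python and store the values as key-value pairs in a dictionary.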

Answer

We'll need a regex for this.

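import re
import decimal

# Sample line from the question
line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

# A key, optional spaces around "=", then a quoted string or a bare token
r = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')

d = {}
for k, v in r.findall(line):
    if v[:1] == '"':
        # Quoted value: strip the surrounding quotes
        d[k] = v[1:-1]
    else:
        # Bare value: treat it as a number
        d[k] = decimal.Decimal(v)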

>>> d
{'account': 'TEST1', 'Qty': Decimal('100'), 'price': Decimal('20.11'), 'subject': 'some value', 'values': '3=this, 4=that'}

You can use float instead of Decimal if you prefer, but that is probably a bad idea if money is involved.
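
To illustrate the point about money, here is a minimal sketch using only the standard library. The 0.1 and 0.2 amounts are made-up examples, and the total is a hypothetical use of the price and Qty values from the sample line. Binary floats cannot represent most decimal fractions exactly, whereas a Decimal built from a string keeps the digits as written.

```python
from decimal import Decimal

# Floats are binary fractions, so simple decimal sums can be slightly off.
float_ok = (0.1 + 0.2 == 0.3)

# Decimal values built from strings keep the exact decimal digits.
decimal_ok = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))

# Hypothetical order total: the sample price times the sample quantity.
total = Decimal('20.11') * Decimal('100')

print(float_ok, decimal_ok, total)  # False True 2011.00
```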

Could you explain this regex? – ash 2011-03-02 01:32:23
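
For anyone with the same question, the pattern can be written out with re.VERBOSE so that each part carries its own comment. This is only a sketch of how the one-line pattern breaks down, not the answerer's own explanation, and the name PAIR is made up for the example. In verbose mode a literal space outside a character class has to be written as [ ], which is the only change from the original pattern.

```python
import re

PAIR = re.compile(r'''
    ([^ =]+)      # group 1, the key: one or more characters that are not a space or "="
    [ ]*=[ ]*     # the "=" sign, with optional spaces on either side
    (             # group 2, the value, which is either
      "[^"]*"     #   a double-quoted string (no escaped quotes to handle)
      |           #   or
      [^ ]*       #   a bare token that runs up to the next space
    )
''', re.VERBOSE)

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

# findall returns one (key, value) tuple per match; quoted values still
# include their quotes at this point.
pairs = PAIR.findall(line)
for key, value in pairs:
    print(key, value)
```

Because the quoted alternative is tried first, the "=" and the spaces inside "3=this, 4=that" stay part of a single value.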

Answer

A recursive variation of bobince's answer that parses values with embedded equals signs into dictionaries:

>>> import re
>>> import pprint
>>>
>>> def parse_line(line):
...     d = {}
...     a = re.compile(r'\s*(\w+)\s*=\s*("[^"]*"|[^ ,]*),?')
...     float_re = re.compile(r'^\d+\.\d*$')
...     int_re = re.compile(r'^\d+$')
...     for k, v in a.findall(line):
...         if int_re.match(k):
...             k = int(k)
...         if v[:1] == '"':
...             v = v[1:-1]
...         if '=' in v:
...             # a value with an embedded "=" is parsed recursively into a dict
...             d[k] = parse_line(v)
...         elif int_re.match(v):
...             d[k] = int(v)
...         elif float_re.match(v):
...             d[k] = float(v)
...         else:
...             d[k] = v
...     return d
...
>>> line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
>>> pprint.pprint(parse_line(line))
{'Qty': 100,
 'account': 'TEST1',
 'price': 20.11,
 'subject': 'some value',
 'values': {3: 'this', 4: 'that'}}

Answer

If you don't want to use a regex, another option is to read the string one character at a time:

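line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

inside_quotes = False
saw_quotes = False  # True once the current value was quoted, so "" still counts
key = None
value = ''
result = {}

for c in line:
    if c == '"':
        inside_quotes = not inside_quotes
        saw_quotes = True
    elif c == '=' and not inside_quotes:
        # everything collected so far was the key
        key = value
        value = ''
    elif c == ' ' and not inside_quotes:
        # a space ends the pair once we have both a key and a value
        if key and (value or saw_quotes):
            result[key] = value
            key, value, saw_quotes = None, '', False
    else:
        value += c

# store the last pair, which has no trailing space
if key:
    result[key] = value
print(result)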

Answer

Perhaps a bit simpler to follow is this pyparsing rendition:

from pyparsing import (Dict, Group, OneOrMore, Regex, Suppress, Word,
                       alphanums, quotedString, removeQuotes)

# define basic elements - use re's for numerics, faster and easier than
# composing from pyparsing objects
integer = Regex(r'[+-]?\d+')
real = Regex(r'[+-]?\d+\.\d*')
ident = Word(alphanums)
# real is listed before integer because alternatives are tried left to right
value = real | integer | quotedString.setParseAction(removeQuotes)

# define a key-value pair, and a configline as one or more of these
# wrap configline in a Dict so that results are accessible by given keys
kvpair = Group(ident + Suppress('=') + value)
configline = Dict(OneOrMore(kvpair))

src = ('account = "TEST1" Qty=100 price = 20.11 subject="some value" '
       'values="3=this, 4=that"')

configitems = configline.parseString(src)

You can now access the parsed pieces through the returned ParseResults object, configitems:

>>> print(configitems.asList())
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'],
['subject', 'some value'], ['values', '3=this, 4=that']]

>>> print(configitems.asDict())
{'account': 'TEST1', 'Qty': '100', 'values': '3=this, 4=that',
    'price': '20.11', 'subject': 'some value'}

>>> print(configitems.dump())
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'],
['subject', 'some value'], ['values', '3=this, 4=that']]
- Qty: 100
- account: TEST1
- price: 20.11
- subject: some value
- values: 3=this, 4=that

>>> print(list(configitems.keys()))
['account', 'subject', 'values', 'price', 'Qty']

>>> print(configitems.subject)
some value
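
Note that every value in the results above comes back as a string. If typed values are wanted, one option is a small post-processing step in plain Python. The sketch below reuses the integer and real patterns from the grammar; the helper name coerce and the dictionary raw (copied from the asDict() output) are hypothetical. It also assumes that a quoted value which looks like a number should be converted too, which may not suit every file.

```python
import re
from decimal import Decimal

# Same numeric patterns as the pyparsing grammar above.
INTEGER = re.compile(r'[+-]?\d+')
REAL = re.compile(r'[+-]?\d+\.\d*')

def coerce(text):
    """Return an int or Decimal when the whole string looks numeric, else the string."""
    if INTEGER.fullmatch(text):
        return int(text)
    if REAL.fullmatch(text):
        return Decimal(text)
    return text

# String-valued dictionary, as returned by asDict() in the session above.
raw = {'account': 'TEST1', 'Qty': '100', 'price': '20.11',
       'subject': 'some value', 'values': '3=this, 4=that'}

typed = {key: coerce(value) for key, value in raw.items()}
print(typed)
```

Decimal is used for reals here for the same reason given in the first answer: the sample data contains a price.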
