How to parse the xsd:dateTime format?

Question:

Values of type xsd:dateTime can take several forms, as described in RELAX NG.

How can I parse all of these forms into either a time or a datetime object?


Are you asking how to parse 'CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]' with 'datetime.datetime.strptime'? Or is there more that isn't on the referenced page?


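@S.Lott: Slightly more than that pattern, because the page also shows fractional seconds (microseconds) as valid input (toward the bottom). – 2010-02-06 01:07:15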

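It's actually a fairly restrictive format, especially compared with the whole of ISO 8601. Using a regular expression amounts to much the same thing as using strptime and handling the offset yourself (which strptime in Python 2 doesn't do).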

import datetime
import re

def parse_timestamp(s):
    """Returns (datetime, tz offset in minutes) or (None, None)."""
    # Raw string so "\." reaches the regex engine unchanged; re.X ignores the layout whitespace.
    m = re.match(r"""^
    (?P<year>-?[0-9]{4}) - (?P<month>[0-9]{2}) - (?P<day>[0-9]{2})
    T (?P<hour>[0-9]{2}) : (?P<minute>[0-9]{2}) : (?P<second>[0-9]{2})
    (?P<microsecond>\.[0-9]{1,6})?
    (?P<tz>
      Z | (?P<tz_hr>[-+][0-9]{2}) : (?P<tz_min>[0-9]{2})
    )?
    $ """, s, re.X)
    if m is not None:
        values = m.groupdict()
        if values["tz"] in ("Z", None):
            tz = 0
        else:
            # Apply the sign to the whole offset so "-02:30" gives -150, not -90.
            sign = -1 if values["tz_hr"].startswith("-") else 1
            tz = sign * (abs(int(values["tz_hr"])) * 60 + int(values["tz_min"]))
        if values["microsecond"] is None:
            values["microsecond"] = 0
        else:
            # Drop the leading "." and right-pad to six digits: ".12679" -> "126790".
            values["microsecond"] = values["microsecond"][1:].ljust(6, "0")
        values = {k: int(v) for k, v in values.items() if not k.startswith("tz")}
        try:
            return datetime.datetime(**values), tz
        except ValueError:
            # Out-of-range fields (e.g. hour 25) or years datetime can't hold (<= 0).
            pass
    return None, None

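This doesn't attach the timezone offset to the datetime, and negative years are a problem for datetime. Both issues would be solved by a different timestamp type that handles the full range xsd:dateTime requires.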
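On Python 3 the first limitation can be worked around with the standard library's fixed-offset tzinfo, datetime.timezone. Below is a minimal sketch; to_aware is a hypothetical helper, not part of the answer. It turns the (datetime, offset in minutes) pair returned by parse_timestamp into a timezone-aware datetime. It assumes you are content to treat a missing offset as UTC, since parse_timestamp already returns 0 for both "Z" and no offset. Negative years stay out of reach, because datetime.MINYEAR is 1.

```python
import datetime


def to_aware(dt, offset_minutes):
    """Hypothetical helper: attach a fixed UTC offset, given in minutes, to a naive datetime."""
    tz = datetime.timezone(datetime.timedelta(minutes=offset_minutes))
    # replace() keeps the wall-clock fields and only sets tzinfo.
    return dt.replace(tzinfo=tz)
```

For example, dt, offset = parse_timestamp("2001-10-26T21:32:52+02:00") followed by to_aware(dt, offset).isoformat() gives '2001-10-26T21:32:52+02:00'.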

valid = [
    "2001-10-26T21:32:52",
    "2001-10-26T21:32:52+02:00",
    "2001-10-26T19:32:52Z",
    "2001-10-26T19:32:52+00:00",
    # "-2001-10-26T21:32:52",  # negative years: datetime can't represent them
    "2001-10-26T21:32:52.12679",
]
for v in valid:
    print()
    print(v)
    r = parse_timestamp(v)
    assert all(x is not None for x in r), v

    # Quick and dirty, and slightly wrong
    # (doesn't distinguish +00:00 from Z, among other issues),
    # but it gets through the above cases.
    dt, offset = r
    if offset:
        sign = "+" if offset > 0 else "-"
        tz = sign + "%02d:%02d" % divmod(abs(offset), 60)
    else:
        tz = "Z"
    formatted = dt.isoformat() + tz

    print(formatted)
    assert formatted.startswith(v[:len("CCYY-MM-DDThh:mm:ss")]), v

print("---")
invalid = [
    "2001-10-26",
    "2001-10-26T21:32",
    "2001-10-26T25:32:52+02:00",
    "01-10-26T21:32",
]
for v in invalid:
    print(v)
    r = parse_timestamp(v)
    assert all(x is None for x in r), v
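The answer notes that strptime doesn't handle the offset. That was true of Python 2, but from Python 3.7 on, the %z directive of datetime.strptime accepts "+02:00"-style offsets as well as "Z", and %f accepts one to six fractional digits. Here is a minimal sketch, assuming Python 3.7+, that simply tries the four layouts in turn. parse_xsd_datetime and _XSD_FORMATS are hypothetical names. strptime is more lenient than the regex in places (it accepts single-digit months and days, for instance), so treat this as a parser, not a strict validator. Negative years are still rejected.

```python
from datetime import datetime

# Layouts tried in order. Assumes Python 3.7+, where %z accepts "+02:00" and "Z"
# and %f accepts one to six fractional-second digits.
_XSD_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%f",
)


def parse_xsd_datetime(s):
    """Return a datetime (timezone-aware when s has an offset), or None."""
    for fmt in _XSD_FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # this layout didn't match; try the next one
    return None
```

Unlike parse_timestamp, this returns an aware datetime whenever an offset is present, so "Z" and "+00:00" both become UTC while a string without an offset stays naive.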

Try the dateutil.parser module from python-dateutil. Or maybe isodate (I haven't used that one yet, but it looks interesting, and it is dedicated solely to parsing ISO 8601 formats).