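How can I process this text file and parse out what I need?
Question:
I am trying to parse the output of the Python doctest module and store it in an HTML file. How can I process this text file and parse out what I need?
I have output similar to this: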
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
[factorial(n) for n in range(6)]
Expected:
[0, 1, 2, 6, 24, 120]
Got:
[1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
factorial(30)
Expected:
25252859812191058636308480000000L
Got:
265252859812191058636308480000000L
**********************************************************************
1 items had failures:
2 of 8 in __main__.factorial
***Test Failed*** 2 failures.
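Each failure is preceded by a line of asterisks, which separates the test failures from one another.
What I would like to do is pull out the filename and method that failed, as well as the expected and actual results. Then I would like to create an HTML document from this (or store it in a text file and then do a second round of parsing).
How can I do this using just Python or some combination of UNIX shell utilities?
EDIT: I worked out the following shell script, which matches each block the way I want, but I am not sure how to redirect each sed match to its own file.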
python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`
Answer
Here is a quick and dirty script that parses the output into tuples containing the relevant information:
import sys
import re

# A line consisting only of asterisks separates one failure from the next.
stars_re = re.compile(r'^[*]+$', re.MULTILINE)
file_line_re = re.compile(r'^File "(.*?)", line (\d*), in (.*)$')

doctest_output = sys.stdin.read()
# Drop the text before the first separator and the summary after the last one.
chunks = stars_re.split(doctest_output)[1:-1]

for chunk in chunks:
    chunk_lines = chunk.strip().splitlines()
    m = file_line_re.match(chunk_lines[0])
    file, line, module = m.groups()
    # Assumes the example, the expected value, and the actual value
    # each fit on a single line.
    failed_example = chunk_lines[2].strip()
    expected = chunk_lines[4].strip()
    got = chunk_lines[6].strip()
    print((file, line, module, failed_example, expected, got))
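Since the goal is an HTML document, the tuples can be turned into a table. The following is only a minimal sketch under some assumptions: it targets Python 3 (for `html.escape`), the function names `parse_failures` and `failures_to_html` are made up for illustration, and it keeps the same single-line assumption as the script above.

```python
import html
import re

STARS_RE = re.compile(r'^[*]+$', re.MULTILINE)
FILE_LINE_RE = re.compile(r'^File "(.*?)", line (\d+), in (.*)$')


def parse_failures(doctest_output):
    """Return a list of (file, line, name, example, expected, got) tuples."""
    failures = []
    for chunk in STARS_RE.split(doctest_output):
        lines = chunk.strip().splitlines()
        if len(lines) < 7:
            continue  # empty chunk or the trailing summary
        m = FILE_LINE_RE.match(lines[0])
        if m is None:
            continue  # not a failure block
        failures.append(m.groups() + (lines[2].strip(),
                                      lines[4].strip(),
                                      lines[6].strip()))
    return failures


def failures_to_html(failures):
    """Render the tuples as a bare HTML table, escaping every field."""
    titles = ("File", "Line", "Test", "Example", "Expected", "Got")
    header = "<tr>%s</tr>" % "".join("<th>%s</th>" % t for t in titles)
    rows = []
    for failure in failures:
        cells = "".join("<td>%s</td>" % html.escape(field) for field in failure)
        rows.append("<tr>%s</tr>" % cells)
    return "<table>\n%s\n%s\n</table>" % (header, "\n".join(rows))
```

Escaping matters here because doctest examples often contain characters such as < or & that would otherwise be read as markup.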
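Answer
You could write a Python program to pick this apart, but a better approach may be to modify doctest so that it outputs the report you want in the first place. From the docs for doctest.DocTestRunner: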
... the display output
can be also customized by subclassing DocTestRunner, and
overriding the methods `report_start`, `report_success`,
`report_unexpected_exception`, and `report_failure`.
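As a rough illustration of that subclassing approach, the sketch below overrides `report_failure` to collect each failure as a dictionary instead of printing it. The class name `CollectingRunner`, the `collected` list, the sample `square` function, and the `example.py` filename are all made up for this example; only `DocTestRunner`, `DocTestParser.get_doctest`, and the `report_failure(out, test, example, got)` hook come from the doctest module.

```python
import doctest


class CollectingRunner(doctest.DocTestRunner):
    """Hypothetical runner that records failures instead of printing them."""

    def __init__(self, *args, **kwargs):
        doctest.DocTestRunner.__init__(self, *args, **kwargs)
        self.collected = []  # one dict per failed example

    def report_failure(self, out, test, example, got):
        # Called by doctest when an example's output differs from what
        # was expected; `out` is ignored so nothing is printed.
        self.collected.append({
            "file": test.filename,
            "name": test.name,
            "example": example.source.strip(),
            "expected": example.want.strip(),
            "got": got.strip(),
        })


def square(n):
    """
    >>> square(3)
    10
    >>> square(2)
    4
    """
    return n * n


def collect_failures():
    # Build the DocTest from the docstring by hand so the sketch does not
    # depend on how the enclosing module is discovered.
    test = doctest.DocTestParser().get_doctest(
        square.__doc__, {"square": square}, "square", "example.py", 0)
    runner = CollectingRunner(verbose=False)
    runner.run(test)
    return runner.collected
```

Calling `collect_failures()` returns the recorded dictionaries, which can then be written out in whatever format is needed, with no text parsing involved.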
I will definitely look into this! – samoz 2009-08-07 22:20:50
Answer
I wrote a quick parser in pyparsing to do this.
from pyparsing import *
str = """
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
[factorial(n) for n in range(6)]
Expected:
[0, 1, 2, 6, 24, 120]
Got:
[1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
factorial(30)
Expected:
25252859812191058636308480000000L
Got:
265252859812191058636308480000000L
**********************************************************************
"""
quote = Literal('"').suppress()
comma = Literal(',').suppress()
in_ = Keyword('in').suppress()
block = OneOrMore("**").suppress() + \
        Keyword("File").suppress() + \
        quote + Word(alphanums + ".") + quote + \
        comma + Keyword("line").suppress() + Word(nums) + comma + \
        in_ + Word(alphanums + "._") + \
        LineStart() + restOfLine.suppress() + \
        LineStart() + restOfLine + \
        LineStart() + restOfLine.suppress() + \
        LineStart() + restOfLine + \
        LineStart() + restOfLine.suppress() + \
        LineStart() + restOfLine
all = OneOrMore(Group(block))
result = all.parseString(str)
for section in result:
    print(section)
This prints:
['example.py', '16', '__main__.factorial', ' [factorial(n) for n in range(6)]', ' [0, 1, 2, 6, 24, 120]', ' [1, 1, 2, 6, 24, 120]']
['example.py', '20', '__main__.factorial', ' factorial(30)', ' 25252859812191058636308480000000L', ' 265252859812191058636308480000000L']
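Answer
This is probably one of the least elegant Python scripts I have ever written, but it should have the framework to do what you want without resorting to UNIX utilities and separate scripts to create the HTML. It is untested, but it should need only minor tweaking to work.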
import os

# Create a list of all files in the current directory.
dirList = os.listdir('.')

# The output directory must exist before files are written into it.
if not os.path.isdir('processed'):
    os.makedirs('processed')

for thisFile in dirList:
    # Ignore anything that isn't a .txt file.
    if thisFile.endswith(".txt"):
        # Read in the text, then split it into a list of lines.
        infile = open(thisFile, 'r')
        rawText = infile.read()
        infile.close()
        yourList = rawText.split('\n')

        # Strings
        compiledText = ''
        htmlText = ''

        for i in yourList:
            # Clunky way of seeing whether or not the current line
            # should be included in compiledText.
            if i.startswith("*****"):
                compiledText += "\n\n--- New Report ---\n"
            if i.startswith("File"):
                compiledText += i + '\n'
            if i.startswith("Fail"):
                compiledText += i + '\n'
            if i.startswith("Expe"):
                compiledText += i + '\n'
            if i.startswith("Got"):
                compiledText += i + '\n'
            if i.startswith(" "):
                compiledText += i + '\n'

        # Insert your HTML template below.
        htmlText = '<html>...\n <body> \n ' + compiledText + '</body>... </html>'

        # Write out to file.
        outfile = open('processed/' + thisFile + '.html', 'w')
        outfile.write(htmlText)
        outfile.close()
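One detail the template leaves open is escaping: doctest output can contain characters such as < and &, which a browser would treat as markup. A small sketch of one way to handle that, assuming Python 3; the helper name `wrap_report_in_html` and the default title are invented for illustration:

```python
import html


def wrap_report_in_html(report_text, title="doctest failures"):
    """Escape the report and wrap it in a minimal HTML page."""
    # <pre> keeps the line breaks and indentation of the original report.
    return ("<html>\n<head><title>%s</title></head>\n"
            "<body>\n<pre>%s</pre>\n</body>\n</html>\n"
            % (html.escape(title), html.escape(report_text)))
```

The result of this function could be written to the output file in place of the hand-built template string.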
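If you strip out the file, method, expected result, and actual result, what is left? – juanjux 2009-08-07 20:20:22
Well, I just can't parse them into separate pieces: so far I have only been able to grab a whole block at a time, not the individual fields. – samoz 2009-08-07 20:26:12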