Finding file names (from a directory) in a file
Problem description:
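I want to check whether all files of a certain type have been logged by my program. So basically, I have a log file that contains only file names, and I use a function that walks through the files to check whether each one is present in the log. The log is huge, and I've done this in a crude, brute-force way. Unfortunately, it doesn't work properly.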
import subprocess
import sys
import signal
import shutil
import os, fnmatch
#open file to read
f=open("logs", "r") #files are stored in this directory
o=open("all_output_logs","w")
e=open("missing_logs",'w')
def locate(pattern, root=os.curdir):
    '''Locate all files matching supplied filename pattern in and below
    supplied root directory.'''
    #ignore directories- ignore works, just uncomment.
    #ignored = ["0201", "0306"]
    for path, dirs, files in os.walk(os.path.abspath(root)):
        #for dir in ignored:
        #    if dir in dirs:
        #        dirs.remove(dir)
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)
#here i log all the files in the output file to search in
for line in f:
    if line.startswith("D:"):
        filename = line
        #print line
        o.write(filename)
f.close()
o.close()
r.close()
i=open("all_output_logs","r")
#primitive search.. going through each file in the directory to see if its there in the log file
for filename in locate("*.dll"):
    for line in i:
        if filename in i:
            count=count+1
            print count
        else:
            e.write(filename)
I don't see my dummy variable count being printed, and I only get one file name, which is one from the middle of the list.
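The behaviour described can be reproduced in isolation. The minimal sketch below uses `io.StringIO` as a stand-in for the opened log (an assumption made so the example needs no files on disk; the paths are hypothetical). It shows that a file-like object is a one-shot iterator that a second loop finds empty until `seek(0)` rewinds it, that each line keeps its trailing newline, and that `x in log` iterates over the stream itself and consumes what it reads.

```python
import io

# Stand-in for the opened "all_output_logs" file (hypothetical content).
log = io.StringIO(u"D:\\a.dll\nD:\\b.dll\n")

first_pass = [line for line in log]   # reads every line
second_pass = [line for line in log]  # the stream is exhausted: nothing left

log.seek(0)                           # rewind to the beginning
after_rewind = [line for line in log]

print(len(first_pass))              # 2
print(second_pass)                  # []
print(after_rewind == first_pass)   # True

# Lines keep their trailing newline, so an exact comparison needs strip().
print(u"D:\\a.dll" == first_pass[0])          # False
print(u"D:\\a.dll" == first_pass[0].strip())  # True

# "x in log" iterates the stream itself and consumes the lines it reads.
log.seek(0)
print(u"D:\\zzz.dll" in log)  # False, after reading every line
print(log.read() == u"")      # True: nothing is left to read
```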
Answer
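The problem is that a file's lines can only be read once, on the first pass, and a file object (`i` in your case) doesn't support the `in` operator the way you expect: `filename in i` does not search a stored list, it reads lines from `i` until one equals `filename` exactly, consuming them as it goes. Because each line still ends with a newline, that will almost never match, so the first `.dll` file uses up the whole log and gets written to `missing_logs`; for every later file the inner loop has nothing left to read. That is why you get a single file name and no count. You can change your code to something like this: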
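count = 0
with open("all_output_logs", "r") as log:
    lines = log.readlines()  # read once; the list can be scanned any number of times
for filename in locate("*.dll"):
    for line in lines:
        if filename in line:
            count = count + 1
            print(count)
            break  # found it, no need to look at the remaining lines
    else:
        # no line mentioned this file: record it once as missing
        e.write(filename + "\n")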
But this is still inefficient and somewhat awkward.
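One common way to remove the nested loop is to load the logged paths into a `set` once and then test each located file with a constant-time (on average) membership check. This sketch assumes that every relevant log line contains exactly one full path, as the `D:` filter in the question suggests; `find_missing` and its parameters are hypothetical names, and the paths are made up. Like `readlines()`, it keeps the whole log in memory.

```python
import os
import tempfile


def find_missing(log_path, paths):
    """Return the paths that do not appear as a whole line in log_path."""
    with open(log_path, "r") as log:
        # Strip the trailing newline (and surrounding whitespace) so each
        # entry is a bare path that can be compared exactly.
        logged = set(line.strip() for line in log)
    return [p for p in paths if p not in logged]


# Usage with a throwaway log file (hypothetical paths).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("D:\\libs\\a.dll\nD:\\libs\\b.dll\n")
    log_name = tmp.name

try:
    missing = find_missing(log_name, ["D:\\libs\\a.dll", "D:\\libs\\c.dll"])
    print(missing)  # only the c.dll path is reported
finally:
    os.remove(log_name)
```

With the question's code, the call would be along the lines of `find_missing("all_output_logs", locate("*.dll"))`.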
Since you say the log file is "huge", you probably don't want to read it all into memory, so you have to rewind the file for every lookup:
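count = 0
with open("all_output_logs", "r") as log:
    for filename in locate("*.dll"):
        log.seek(0)  # rewind so each lookup scans the log from the beginning
        for line in log:
            if filename in line:
                count = count + 1
                print(count)
                break
        else:
            # reached the end of the log without a match
            e.write(filename + "\n")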
I kept the `in` operator because you didn't say what each line of the log file contains. I would expect `filename == line.strip()` to be the right comparison.
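If the log really is too large to hold in memory, an alternative to rewinding it once per file is to invert the lookup: collect the located `.dll` paths into a set (this assumes there are far fewer of them than log lines, which is worth checking), stream the log exactly once, and discard every path that appears on a line. Whatever remains was never logged. This sketch uses the exact `line.strip()` comparison suggested above; `missing_after_scan` is a hypothetical name and the paths are made up.

```python
import io


def missing_after_scan(log_lines, paths):
    """Read log_lines once; return the paths never seen as a whole line."""
    remaining = set(paths)
    for line in log_lines:
        remaining.discard(line.strip())  # no-op if the line isn't a wanted path
        if not remaining:
            break  # every path has been found; stop reading early
    return sorted(remaining)


# A file object can be passed directly, since iterating over it reads one
# line at a time; io.StringIO stands in for the real log here.
log = io.StringIO(u"D:\\x\\one.dll\nD:\\x\\two.dll\n")
result = missing_after_scan(log, ["D:\\x\\two.dll", "D:\\x\\three.dll"])
print(result)  # only the three.dll path remains
```

With the question's files, `log_lines` would be the opened `all_output_logs` and `paths` would be `locate("*.dll")`; only the set of located paths is held in memory, not the log.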