Finding filenames (from a directory) in a file

Problem description:

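I want to check whether all files of a certain type have been logged by my program. Basically, I have a log file that contains only filenames, and I use a function to walk the directory tree and check whether each file appears in the log. The contents are huge, and I went about it in a crude way. Unfortunately, it doesn't work properly.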

import subprocess 
import sys 
import signal 
import shutil 
import os, fnmatch 


#open file to read 
f=open("logs", "r") #files are stored in this directory 
o=open("all_output_logs","w") 
e=open("missing_logs",'w') 


def locate(pattern, root=os.curdir):
    '''Locate all files matching supplied filename pattern in and below
    supplied root directory.'''
    #ignore directories- ignore works, just uncomment.
    #ignored = ["0201", "0306"]
    for path, dirs, files in os.walk(os.path.abspath(root)):
        #for dir in ignored:
        #    if dir in dirs:
        #        dirs.remove(dir)
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)



#here i log all the files in the output file to search in
for line in f:
    if line.startswith("D:"):
        filename = line
        #print line
        o.write(filename)

f.close() 
o.close() 
r.close() 

i=open("all_output_logs","r") 
#primitive search.. going through each file in the directory to see if its there in the log file 
for filename in locate("*.dll"):
    for line in i:
        if filename in i:
            count=count+1
            print count
        else:
            e.write(filename)

I never see my dummy variable count being printed, and I end up with only a single filename, one from the middle of the list.

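The problem is that the lines of a file can only be read on the first pass, and a file object (i in your case) doesn't support the in operator the way you expect. You can change your code to something like this: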

count = 0  # initialise the counter before the loops
lines = open("all_output_logs", "r").readlines()
for filename in locate("*.dll"):
    for line in lines:
        if filename in line:
            count = count + 1
            print(count)
        else:
            e.write(filename)

But that's still inefficient and a bit awkward.
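One way to avoid the nested loop, as a minimal sketch only: if the log fits in memory and each line holds exactly one full path (neither is confirmed by the question), the lines can be loaded into a set once, so each lookup is a single membership test. The helper name find_missing is hypothetical, and the code is written for Python 3.

```python
import fnmatch
import os


def locate(pattern, root=os.curdir):
    """Yield the full path of every file under root whose name matches pattern."""
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)


def find_missing(log_path, pattern, root=os.curdir):
    """Return (found, missing) lists of paths, checked against the log lines."""
    # Assumption: every log line is exactly one full path, nothing else.
    with open(log_path) as log:
        logged = {line.strip() for line in log}
    found, missing = [], []
    for filename in locate(pattern, root):
        # Each file lands in exactly one list, so nothing is reported twice.
        (found if filename in logged else missing).append(filename)
    return found, missing
```

Calling find_missing("all_output_logs", "*.dll") would then give the logged and the unlogged DLLs as two separate lists, and each missing file is reported once rather than once per non-matching log line.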

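Since you say the log file is "huge", you probably don't want to read all of it into memory, so you'll have to rewind the file for each lookup: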

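count = 0  # initialise the counter before the loops
f = open("all_output_logs", "r")
for filename in locate("*.dll"):
    f.seek(0)  # rewind so the inner loop sees every line again
    for line in f:
        if filename in line:
            count = count + 1
            print(count)
        else:
            e.write(filename)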
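To see why the rewind matters, here's a small self-contained demonstration. It uses a throwaway temporary file that is not part of the original code and is written for Python 3: a second pass over the same file object yields nothing until seek(0) is called.

```python
import os
import tempfile

# Hypothetical throwaway log file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("a.dll\nb.dll\n")

with open(tmp.name) as f:
    first_pass = [line.strip() for line in f]   # consumes the whole file
    second_pass = [line.strip() for line in f]  # nothing left to read
    f.seek(0)                                   # rewind to the beginning
    third_pass = [line.strip() for line in f]   # lines are available again

os.remove(tmp.name)

print(first_pass)   # ['a.dll', 'b.dll']
print(second_pass)  # []
print(third_pass)   # ['a.dll', 'b.dll']
```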

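I left the in operator in place because you didn't specify what each line of the log file contains. One would expect filename == line.strip() to be the right comparison.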
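A sketch of that stricter comparison, under the assumption that each log line holds exactly one path and nothing else. The helper names is_logged and split_by_logged are hypothetical, and the code is written for Python 3. It still scans the log once per filename, so memory use stays small, but each missing file is recorded only once.

```python
def is_logged(log, filename):
    """Return True if some line of the open log file equals filename exactly."""
    log.seek(0)  # rewind, since an earlier lookup may have consumed the file
    return any(line.strip() == filename for line in log)


def split_by_logged(log_path, filenames):
    """Return (found, missing) for filenames, scanning the log once per name."""
    found, missing = [], []
    with open(log_path) as log:
        for filename in filenames:
            (found if is_logged(log, filename) else missing).append(filename)
    return found, missing
```

With the locate generator from the question, split_by_logged("all_output_logs", locate("*.dll")) would return the two lists; the missing ones can then be written to missing_logs, one per line.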