Slicing lines and saving parameters to different files

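Problem description:

I have a g.out file (pasted below).

The file contains several FINAL OPTIMIZED GEOMETRY blocks that I want to extract.

For a given FINAL OPTIMIZED GEOMETRY block, the highlighted values are the ones I want to extract:

[Screenshot: g.out excerpt with the values to extract highlighted]

With the program below I have managed to extract the first three: VOLUME, A and B.

My code: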

import os 
import sys 
import re 

initial_pattern = '^ FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM  3$' 
middle_pattern = '^ CRYSTALLOGRAPHIC CELL ' 
end_pattern = '^ T = ATOM BELONGING TO THE ASYMMETRIC UNIT$' 


VOLUMES = [] 
P0 = [] 
P2 = [] 
atomic_number = [] 
coord_x = [] 
coord_y = [] 
coord_z = [] 

with open('g.out') as file: 
    for line in file: 
     if re.match(initial_pattern, line): 
      print file.next() 
      print file.next() 
      print file.next() 

      volume_line = file.next() 
      print volume_line 
      aux = volume_line.split() 
      each_volume = aux[7] 
      print each_volume 
      VOLUMES.append(each_volume) 

     if re.match(middle_pattern, line): 
      print line 

      print file.next() 
      parameters_line = file.next() 
      aux = parameters_line.split() 
      p0 = aux[0] 
      p1 = aux[1] 
      p2 = aux[2] 
      p3 = aux[3] 
      p4 = aux[4] 
      p5 = aux[5] # 

      print p0 
      print p2 

      P0.append(p0) 
      P2.append(p2) 

      print file.next() 
      print file.next() 
      print file.next() 
      print file.next() 

      first_coord_line = file.next() 
      print first_coord_line 

     if re.match(end_pattern, line): 
      end_pattern = line 
      print end_pattern 
      all_coordinates = [first_coord_line:end_pattern] 
      for line in all_coordinates: 
       del('F ')    # delete those that contain 'F ' 
       aux2 = line.split() 
       coords = [] 


sys.exit() 
#Template = 
""" 
some stuff 
other stuff 
p0  p2 
3 
A B  C   D 
E F  G   H 
I J  K   L 
other stuff 
some other stuff 
""" 

I am not able to extract the COORDINATES, because I cannot find a way to slice the lines from first_coord_line to end_pattern, as in this pseudocode:

if re.match(end_pattern, line): 
    end_pattern = line 
    print end_pattern 
    all_coordinates = [first_coord_line:end_pattern] 
    for line in all_coordinates: 
     del('F ')    # delete those that contain 'F ' 
     aux2 = line.split() # split lines 
     atomic_number = aux2[2] 
     coord_x = aux2[4] 
     coord_y = aux2[5] 
     coord_z = aux2[6] 

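Is there a way to implement this pseudocode?

In my code, VOLUMES, P0, P2, atomic_number, coord_x, coord_y and coord_z are initialized as lists before the for loop because I would like to save this information in different files, each named "VOLUME.inp", like this: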

#Template = 
""" 
some stuff 
other stuff 
p0  p2 
3 
A B  C   D 
E F  G   H 
I J  K   L 
other stuff 
some other stuff 
""" 

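where p0 and p2 are replaced by the values extracted in my code (the second and third highlighted values in the screenshot), and A - L by atomic_number, coord_x, coord_y and coord_z.

Is there a way to do this?

The g.out file: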

more lines 
more lines 
more lines 

FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM  3 
(NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500) 
******************************************************************************* 
LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - BOHR = 0.5291772083 ANGSTROM 
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 119.823364 - DENSITY 2.770 g/cm^3 
     A    B    C   ALPHA  BETA  GAMMA 
    6.28373604  6.28373604  6.28373604 46.646397 46.646397 46.646397 
******************************************************************************* 
ATOMS IN THE ASYMMETRIC UNIT 3 - ATOMS IN THE UNIT CELL: 10 
    ATOM     X/A     Y/B     Z/C 
******************************************************************************* 
     1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00 
     2 F 20 CA -5.000000000000E-01 -5.000000000000E-01 -5.000000000000E-01 
     3 T 6 C  2.500000000000E-01 2.500000000000E-01 2.500000000000E-01 
     4 F 6 C -2.500000000000E-01 -2.500000000000E-01 -2.500000000000E-01 
     5 T 8 O -4.924094276183E-01 -7.590572381674E-03 2.500000000000E-01 
     6 F 8 O  2.500000000000E-01 -4.924094276183E-01 -7.590572381674E-03 
     7 F 8 O -7.590572381674E-03 2.500000000000E-01 -4.924094276183E-01 
     8 F 8 O  4.924094276183E-01 7.590572381674E-03 -2.500000000000E-01 
     9 F 8 O -2.500000000000E-01 4.924094276183E-01 7.590572381674E-03 
    10 F 8 O  7.590572381674E-03 -2.500000000000E-01 4.924094276183E-01 

TRANSFORMATION MATRIX PRIMITIVE-CRYSTALLOGRAPHIC CELL 
    1.0000 0.0000 1.0000 -1.0000 1.0000 1.0000 0.0000 -1.0000 1.0000 

******************************************************************************* 
CRYSTALLOGRAPHIC CELL (VOLUME=  359.47009054) 
     A    B    C   ALPHA  BETA  GAMMA 
    4.97568007  4.97568007 16.76591397 90.000000 90.000000 120.000000 

COORDINATES IN THE CRYSTALLOGRAPHIC CELL 
    ATOM     X/A     Y/B     Z/C 
******************************************************************************* 
     1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00 
     2 F 20 CA -5.491739570355E-17 -2.745869785177E-17 -5.000000000000E-01 
     3 T 6 C  3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02 
     4 F 6 C -3.333333333333E-01 3.333333333333E-01 8.333333333333E-02 
     5 T 8 O -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02 
     6 F 8 O  3.333333333333E-01 -7.574276095166E-02 -8.333333333333E-02 
     7 F 8 O  7.574276095166E-02 4.090760942850E-01 -8.333333333333E-02 
     8 F 8 O  4.090760942850E-01 3.333333333333E-01 8.333333333333E-02 
     9 F 8 O -3.333333333333E-01 7.574276095166E-02 8.333333333333E-02 
    10 F 8 O -7.574276095166E-02 -4.090760942850E-01 8.333333333333E-02 

T = ATOM BELONGING TO THE ASYMMETRIC UNIT 
INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE 

more lines 
more lines 
more lines 

FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM  3 
(NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500) 
******************************************************************************* 
LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - BOHR = 0.5291772083 ANGSTROM 
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 121.143469 - DENSITY 2.740 g/cm^3 
     A    B    C   ALPHA  BETA  GAMMA 
    6.32229536  6.32229536  6.32229536 46.436583 46.436583 46.436583 
******************************************************************************* 
ATOMS IN THE ASYMMETRIC UNIT 3 - ATOMS IN THE UNIT CELL: 10 
    ATOM     X/A     Y/B     Z/C 
******************************************************************************* 
     1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00 
     2 F 20 CA 5.000000000000E-01 -5.000000000000E-01 -5.000000000000E-01 
     3 T 6 C  2.500000000000E-01 2.500000000000E-01 2.500000000000E-01 
     4 F 6 C -2.500000000000E-01 -2.500000000000E-01 -2.500000000000E-01 
     5 T 8 O -4.927088991116E-01 -7.291100888437E-03 2.500000000000E-01 
     6 F 8 O  2.500000000000E-01 -4.927088991116E-01 -7.291100888437E-03 
     7 F 8 O -7.291100888437E-03 2.500000000000E-01 -4.927088991116E-01 
     8 F 8 O  4.927088991116E-01 7.291100888437E-03 -2.500000000000E-01 
     9 F 8 O -2.500000000000E-01 4.927088991116E-01 7.291100888437E-03 
    10 F 8 O  7.291100888437E-03 -2.500000000000E-01 4.927088991116E-01 

TRANSFORMATION MATRIX PRIMITIVE-CRYSTALLOGRAPHIC CELL 
    1.0000 0.0000 1.0000 -1.0000 1.0000 1.0000 0.0000 -1.0000 1.0000 

******************************************************************************* 
CRYSTALLOGRAPHIC CELL (VOLUME=  363.43040599) 
     A    B    C   ALPHA  BETA  GAMMA 
    4.98494429  4.98494429 16.88768068 90.000000 90.000000 120.000000 

COORDINATES IN THE CRYSTALLOGRAPHIC CELL 
    ATOM     X/A     Y/B     Z/C 
******************************************************************************* 
     1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00 
     2 F 20 CA -5.471726358381E-17 -2.735863179191E-17 -5.000000000000E-01 
     3 T 6 C  3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02 
     4 F 6 C -3.333333333333E-01 3.333333333333E-01 8.333333333333E-02 
     5 T 8 O -4.093755657782E-01 -3.333333333333E-01 -8.333333333333E-02 
     6 F 8 O  3.333333333333E-01 -7.604223244490E-02 -8.333333333333E-02 
     7 F 8 O  7.604223244490E-02 4.093755657782E-01 -8.333333333333E-02 
     8 F 8 O  4.093755657782E-01 3.333333333333E-01 8.333333333333E-02 
     9 F 8 O -3.333333333333E-01 7.604223244490E-02 8.333333333333E-02 
    10 F 8 O -7.604223244490E-02 -4.093755657782E-01 8.333333333333E-02 

T = ATOM BELONGING TO THE ASYMMETRIC UNIT 
INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE 

more lines 
more lines 
more lines 

Updated code:

Based on @nos's flag approach, the code below is able to extract the information. VOLUMES is a list with 2 elements. The results are listed below:

VOLUMES = ['119.823364', '121.143469'] 
P0 = ['4.97568007', '4.98494429'] 
P2 = ['16.76591397', '16.88768068'] 
Xs = ['0.000000000000E+00', '3.333333333333E-01', '-4.090760942850E-01', '0.000000000000E+00', '3.333333333333E-01', '-4.093755657782E-01'] 
Ys = ['0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01', '0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01'] 
Zs = ['0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02', '0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02'] 
ATOMIC_NUMBERS = ['20', '6', '8', '20', '6', '8'] 

The second part of this post is about writing this information (P0, P2, ATOMIC_NUMBERS, Xs, Ys, Zs) into two VOLUME.inp files. In other words, like this:

V_119.823364.inp file:

some stuff 
other stuff 
4.97568007 4.98494429 
3 
20 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00 
6 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02 
8 -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02 
other stuff 

V_121.143469.inp file:

some stuff
other stuff
4.97568007 4.98494429
3
20 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
6 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
8 -4.093755657782E-01 -3.333333333333E-01 -8.333333333333E-02
other stuff

Following @nos's suggestion of atoms_per_frame and atoms_all_frames, I have tried the code below. I am having difficulty writing the elements one by one to the files:

import os 
import sys 
import re 
import glob 

initial_pattern = '^ FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM  3$' 
middle_pattern = '^ CRYSTALLOGRAPHIC CELL ' 
end_pattern = '^ T = ATOM BELONGING TO THE ASYMMETRIC UNIT$' 

global N_atom_irreducible_unit 
N_atom_irreducible_unit = 3 

VOLUMES = [] 
P0 = [] 
P2 = [] 
ATOMIC_NUMBERS = [] 
Xs = [] 
Ys = [] 
Zs = [] 

with open('g.out') as file: 
    passed_mid_point = False 
    for line in file: 
     if re.match(initial_pattern, line): 
      print file.next() 
      print file.next() 
      print file.next() 

      volume_line = file.next() 
      print volume_line 
      aux = volume_line.split() 
      each_volume = aux[7] 
      print each_volume 
      VOLUMES.append(each_volume) 

     if re.match(middle_pattern, line): 
      print line 

      print file.next() 
      parameters_line = file.next() 
      aux = parameters_line.split() 
      p0 = aux[0] 
      p1 = aux[1] 
      p2 = aux[2] 
      p3 = aux[3] 
      p4 = aux[4] 
      p5 = aux[5] # 

      print p0 
      print p2 

      P0.append(p0) 
      P2.append(p2) 

      print file.next() 
      print file.next() 
      print file.next() 
      print file.next() 

     if re.match(middle_pattern, line): 
      passed_mid_point = True 
      print 'line = ', line 

     if re.match(end_pattern, line): 
      passed_mid_point = False 

     elif passed_mid_point: 
      # parse the coordinates 
      print 'line2 =', line 
      terms = line.split() 
      print 'terms =', terms 

     if terms and terms[1] == 'T': 
      print terms[1] 
      atomic_number = terms[2] 
      print 'atomic_number = ', atomic_number 
      ATOMIC_NUMBERS.append(atomic_number) 

      x = terms[4] 
      print 'x =', x 
      Xs.append(x) 

      y = terms[5] 
      print 'y = ', y 
      Ys.append(y) 

      z = terms[6] 
      print 'z = ', z 
      Zs.append(z) 

print 'VOLUMES = ', VOLUMES 
print 'P0 = ', P0 
print 'P2 = ', P2 
print 'Xs = ', Xs 
print 'Ys = ', Ys 
print 'Zs = ', Zs 
print 'ATOMIC_NUMBERS = ', ATOMIC_NUMBERS 

# create the empty list of lists: 
atoms_all_frames = [[] for _ in xrange(len(VOLUMES))] 
print atoms_all_frames 

for index_vol in range(len(VOLUMES)):
    for index in range(len(ATOMIC_NUMBERS)):
        atoms_per_frame = [ATOMIC_NUMBERS[index], Xs[index], Ys[index], Zs[index]]
        atoms_all_frames[index_vol].append(atoms_per_frame)

# "atoms_all_frames" would be an appropriate list for looping 
print atoms_all_frames 

# Remove any existing V*.inp files, to clean first: 
for f in glob.glob("V*.inp"): 
    os.remove(f) 

# create the files: 
for V in VOLUMES: 
    filename = "V_{}.d12".format(V) 
    print filename 

    # open them: 
    with open(filename,"a") as f: 

    # the following is a pseudo-code, because I cannot manage to 
    # find the way to write element-wise each string to the files: 
    for p0, p2, atoms_all_frames: 

     f.write("""some stuff 
other stuff 
%s  %s 
%s 
%s %s  %s   %s 
%s %s  %s   %s 
%s %s  %s   %s 
other stuff 
some other stuff\n""" % p0 % p2 %N_atom_irreducible_unit %atoms_all_frames) 
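
Too much code and text...

I guess you are parsing results for each (time) frame, where each frame has a volume and possibly several atoms with their coordinates. In that case, first create a list (e.g. 'atoms_all_frames = []') to hold the results for all atoms. Then, while parsing the file, create a list of atom coordinates for each frame (e.g. 'atoms_per_frame = []') and append each atom's (x, y, z) coordinates to it. Then append 'atoms_per_frame' to 'atoms_all_frames'. This way your list of volumes and your list of coordinates will have the same size, namely the number of frames. – nos

@nos Thank you for the suggestion. I have adopted this approach, but I cannot manage to write the elements one by one to the files. Please see the updated post.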

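There are many ways to do this. The key is to tell whether middle_pattern has already been passed, because the same coordinate pattern appears both before and after it, and you only want the rows that come after it.

For example, you can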

  1. Set a flag so we know when middle_pattern has been matched, and branch out when end_pattern is matched:

    passed_mid_point = False
    ...
    if re.match(middle_pattern, line):
        passed_mid_point = True
        # do what you need
        ...
    if re.match(end_pattern, line):
        passed_mid_point = False # so you can process a new frame
        # do what you need after end pattern is matched
        ...
    elif passed_mid_point:
        # parse the coordinates
        terms = line.split()
        # the length check skips blank lines and short separator lines
        if len(terms) > 6 and terms[1] == 'T':
            x = float(terms[4])
            y = float(terms[5])
            z = float(terms[6])

  2. Alternatively, you can combine the flag with a pattern match, like this:

    passed_mid_point = False
    # any leading whitespace, the atom index, then the 'T' flag
    coord_pattern = r'\s*\d+\s+T\s'
    ...
    if re.match(middle_pattern, line):
        passed_mid_point = True
        # do what you need
        ...
    if re.match(end_pattern, line):
        passed_mid_point = False # so you can process a new frame
        # do what you need after end pattern is matched
        ...
    if passed_mid_point and re.match(coord_pattern, line):
        # parse the coordinates; the regex already ensures this is a 'T' row
        terms = line.split()
        x = float(terms[4])
        y = float(terms[5])
        z = float(terms[6])
    

    The coordinates can also be matched entirely with a regular expression:

    sci_num = r'-?\d+\.\d*E[+\-]\d+'
    coord_pattern = r'\s*\d+\s+T\s+\d+\s+[A-Z]+\s+(%s)\s+(%s)\s+(%s)' % (sci_num, sci_num, sci_num)
    coord_re = re.compile(coord_pattern)
    # match() returns a match object (or None); the groups live on that object
    m = coord_re.match(line)
    if m:
        x = float(m.group(1))
        y = float(m.group(2))
        z = float(m.group(3))
    

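    To record the data, it is better to keep track of which frame each set of atom coordinates belongs to. For example, you can create an atom_frames list at the start and keep appending lists of atom coordinates to it, where each list corresponds to one frame. Overall, it looks like this: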

    atom_frames = []
    for i in range(50): # here I assume 50 frames
        current_frame = []
        for a in atoms_in_this_frame: # placeholder: the atoms parsed for frame i
            current_frame.append(a) # a could be (x, y, z) of an atom
        atom_frames.append(current_frame)
    

    Here I simply loop over the number of frames. In your case, you can create current_frame = [] when you hit middle_pattern, and do atom_frames.append(current_frame) when you hit end_pattern. Hope that makes sense.
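
As a rough illustration of the flag-plus-frames idea, the sketch below collects one record per FINAL OPTIMIZED GEOMETRY block. It is written for Python 3, unlike the Python 2 code in the question. The names parse_frames and is_number and the dictionary keys are made up for this example. It assumes the lines look like the g.out excerpt in the question, and it recognises the markers by their leading text rather than by the question's regular expressions. Values are kept as strings so they can be written back unchanged.

```python
import re

# Scientific-notation number such as -4.090760942850E-01.
SCI = r'[-+]?\d+\.\d*E[-+]\d+'
# Atom row flagged 'T': index, 'T', atomic number, symbol, x, y, z.
ATOM_RE = re.compile(
    r'\s*\d+\s+T\s+(\d+)\s+[A-Z]+\s+(%s)\s+(%s)\s+(%s)' % (SCI, SCI, SCI))
VOLUME_RE = re.compile(r'VOLUME=\s*(\S+)')


def is_number(token):
    """Return True if the token can be read as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_frames(lines):
    """Return a list of dicts, one per FINAL OPTIMIZED GEOMETRY block."""
    frames = []
    current = None            # frame being filled, or None outside a block
    passed_mid_point = False  # True after the CRYSTALLOGRAPHIC CELL line
    for line in lines:
        text = line.strip()
        if text.startswith('FINAL OPTIMIZED GEOMETRY'):
            current = {'volume': None, 'p0': None, 'p2': None, 'atoms': []}
            passed_mid_point = False
        elif current is None:
            continue
        elif text.startswith('PRIMITIVE CELL'):
            found = VOLUME_RE.search(text)
            if found:
                current['volume'] = found.group(1)
        elif text.startswith('CRYSTALLOGRAPHIC CELL'):
            passed_mid_point = True
        elif text.startswith('T = ATOM BELONGING'):
            frames.append(current)  # the frame is complete
            current = None
            passed_mid_point = False
        elif passed_mid_point:
            tokens = text.split()
            atom = ATOM_RE.match(line)
            if atom:
                # (atomic_number, x, y, z) as strings
                current['atoms'].append(atom.groups())
            elif (current['p0'] is None and len(tokens) == 6
                  and all(is_number(t) for t in tokens)):
                # first all-numeric six-column row: A B C ALPHA BETA GAMMA
                current['p0'], current['p2'] = tokens[0], tokens[2]
    return frames
```

A file object can be passed directly, for example frames = parse_frames(open('g.out')), because the function only iterates over lines. Each frame then carries its own volume, cell parameters and atom list, so the lists cannot drift out of step.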

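    Thank you for your answer. This flag approach is very interesting, and I have applied it in my code. However, when the statement "if terms[1] == 'T':" is reached, there is a "list index out of range" error. Please see the **updated code** to reproduce the problem. The statement "if terms[1] == 'T':" looks fine to me, and I cannot see where the problem is.

    Oh, that is because there are blank lines; see the updated code. – nos

    Thanks for the clarification, and thanks again for your help. I am having some difficulty with the part that saves the information to files. Please see the **updated code**.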
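
For the file-writing part raised in the last comment, one possible sketch (Python 3) is shown below. The helper names group_atoms, render_input and write_inputs are invented for this example. It assumes the flat ATOMIC_NUMBERS, Xs, Ys and Zs lists from the question hold a fixed number of atoms per frame (N_atom_irreducible_unit, 3 in the question). It follows the question's template by writing each frame's own p0 and p2, and it names the files V_<volume>.inp.

```python
import os


def group_atoms(atomic_numbers, xs, ys, zs, atoms_per_frame):
    """Split the flat per-atom lists into one list of rows per frame."""
    rows = list(zip(atomic_numbers, xs, ys, zs))
    return [rows[i:i + atoms_per_frame]
            for i in range(0, len(rows), atoms_per_frame)]


def render_input(p0, p2, atoms):
    """Build the text of one input file from a single frame's values."""
    lines = ['some stuff',
             'other stuff',
             '%s  %s' % (p0, p2),
             str(len(atoms))]          # number of atom rows that follow
    for atomic_number, x, y, z in atoms:
        lines.append('%s %s %s %s' % (atomic_number, x, y, z))
    lines += ['other stuff', 'some other stuff']
    return '\n'.join(lines) + '\n'


def write_inputs(volumes, p0s, p2s, atoms_all_frames, directory='.'):
    """Write one V_<volume>.inp file per frame and return the paths."""
    paths = []
    # zip pairs the i-th volume with the i-th p0, p2 and atom list
    for volume, p0, p2, atoms in zip(volumes, p0s, p2s, atoms_all_frames):
        path = os.path.join(directory, 'V_%s.inp' % volume)
        with open(path, 'w') as handle:   # 'w' overwrites any older file
            handle.write(render_input(p0, p2, atoms))
        paths.append(path)
    return paths
```

With the lists from the question this would be called as write_inputs(VOLUMES, P0, P2, group_atoms(ATOMIC_NUMBERS, Xs, Ys, Zs, N_atom_irreducible_unit)). Opening each file in 'w' mode means the separate clean-up loop over glob.glob("V*.inp") is only needed for files left over from volumes that no longer occur.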