Slicing lines and saving parameters to different files
I have a g.out file (pasted below).
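The file contains several FINAL OPTIMIZED GEOMETRY blocks that I want to extract.
For a given FINAL OPTIMIZED GEOMETRY block, the highlighted values are the ones I want to extract.
With the program below I have managed to extract the first three: VOLUME, A and C.
My code: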
import os
import sys
import re

initial_pattern = '^ FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3$'
middle_pattern = '^ CRYSTALLOGRAPHIC CELL '
end_pattern = '^ T = ATOM BELONGING TO THE ASYMMETRIC UNIT$'

VOLUMES = []
P0 = []
P2 = []
atomic_number = []
coord_x = []
coord_y = []
coord_z = []

with open('g.out') as file:
    for line in file:
        if re.match(initial_pattern, line):
            print file.next()
            print file.next()
            print file.next()
            volume_line = file.next()
            print volume_line
            aux = volume_line.split()
            each_volume = aux[7]
            print each_volume
            VOLUMES.append(each_volume)

        if re.match(middle_pattern, line):
            print line
            print file.next()
            parameters_line = file.next()
            aux = parameters_line.split()
            p0 = aux[0]
            p1 = aux[1]
            p2 = aux[2]
            p3 = aux[3]
            p4 = aux[4]
            p5 = aux[5]
            print p0
            print p2
            P0.append(p0)
            P2.append(p2)
            print file.next()
            print file.next()
            print file.next()
            print file.next()
            first_coord_line = file.next()
            print first_coord_line

        if re.match(end_pattern, line):
            end_pattern = line
            print end_pattern
            all_coordinates = [first_coord_line:end_pattern]
            for line in all_coordinates:
                del('F ')  # delete those that contain 'F '
                aux2 = line.split()
                coords = []

sys.exit()
#Template =
"""
some stuff
other stuff
p0 p2
3
A B C D
E F G H
I J K L
other stuff
some other stuff
"""
I am not able to extract the COORDINATES, because I cannot find a way to slice from the first_coord_line line to the end_pattern line, as in this pseudo-code:
if re.match(end_pattern, line):
    end_pattern = line
    print end_pattern
    all_coordinates = [first_coord_line:end_pattern]
    for line in all_coordinates:
        del('F ')  # delete those that contain 'F '
        aux2 = line.split()  # split lines
        atomic_number = aux2[2]
        coord_x = aux2[4]
        coord_y = aux2[5]
        coord_z = aux2[6]
Is there a way to implement this pseudo-code?
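One way to get the slice the pseudo-code is after is to read the whole file into a list of lines first, so that ordinary list slicing (`lines[start:end]`) works. The sketch below assumes the file fits in memory; `coordinate_blocks`, `asymmetric_atoms` and the `START`/`END` patterns are hypothetical names, and `\s*` is used so the patterns match whether or not the leading blanks of the real output are present. Instead of deleting the 'F' rows, it keeps the rows whose second field is `T`.

```python
import re

# Hypothetical markers; \s* accepts lines with or without leading blanks.
START = re.compile(r'^\s*COORDINATES IN THE CRYSTALLOGRAPHIC CELL')
END = re.compile(r'^\s*T = ATOM BELONGING TO THE ASYMMETRIC UNIT')


def coordinate_blocks(lines):
    """Return, for every START line, the slice of lines up to the next END line."""
    blocks = []
    for s, line in enumerate(lines):
        if START.match(line):
            # Index of the first END after this START (or the end of the list).
            e = next((i for i in range(s + 1, len(lines)) if END.match(lines[i])),
                     len(lines))
            blocks.append(lines[s + 1:e])  # the slice the pseudo-code is after
    return blocks


def asymmetric_atoms(block):
    """Keep only the 'T' rows of a block as (atomic_number, x, y, z) strings."""
    atoms = []
    for line in block:
        terms = line.split()
        # The length check skips blank lines, the 'ATOM X/A ...' header and '****'.
        if len(terms) >= 7 and terms[1] == 'T':
            atoms.append((terms[2], terms[4], terms[5], terms[6]))
    return atoms
```

With the real file, `lines = open('g.out').read().splitlines()` followed by `[asymmetric_atoms(b) for b in coordinate_blocks(lines)]` should give one atom list per crystallographic coordinate block. If the output also prints such blocks for intermediate geometries, those would be returned as well.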
In my code, VOLUMES, P0, P2, atomic_number, coord_x, coord_y and coord_z are initialized as lists before the for loop because I want to save this information to different files, named "VOLUME.inp", like this:
#Template =
"""
some stuff
other stuff
p0 p2
3
A B C D
E F G H
I J K L
other stuff
some other stuff
"""
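where p0 and p2 are the values extracted in my code (the second and third highlighted values in the screenshot), and A to L are the atomic_number, coord_x, coord_y and coord_z values (one atom per row).
Is there a way to do this?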
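A minimal sketch of filling that template for one frame, assuming each atom is available as an `(atomic_number, x, y, z)` tuple of strings and that the `3` line of the template is the number of atoms (so `len(atoms)` is written there). `render_inp` is a hypothetical helper; building the atom rows with `join` means the same code works for any number of atoms.

```python
def render_inp(p0, p2, atoms):
    """Fill the question's template for one frame (hypothetical helper).

    atoms: list of (atomic_number, x, y, z) strings, one tuple per atom.
    """
    atom_lines = "\n".join(" ".join(atom) for atom in atoms)
    return ("some stuff\n"
            "other stuff\n"
            "{} {}\n"
            "{}\n"          # assumed to be the number of atoms
            "{}\n"          # one 'atomic_number x y z' row per atom
            "other stuff\n"
            "some other stuff\n").format(p0, p2, len(atoms), atom_lines)
```

For the first frame of the sample file this would be called as `render_inp('4.97568007', '16.76591397', atoms)`, with `atoms` holding the three asymmetric-unit rows.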
The g.out file:
more lines
more lines
more lines
FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3
(NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500)
*******************************************************************************
LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - BOHR = 0.5291772083 ANGSTROM
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 119.823364 - DENSITY 2.770 g/cm^3
A B C ALPHA BETA GAMMA
6.28373604 6.28373604 6.28373604 46.646397 46.646397 46.646397
*******************************************************************************
ATOMS IN THE ASYMMETRIC UNIT 3 - ATOMS IN THE UNIT CELL: 10
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.000000000000E-01 -5.000000000000E-01 -5.000000000000E-01
3 T 6 C 2.500000000000E-01 2.500000000000E-01 2.500000000000E-01
4 F 6 C -2.500000000000E-01 -2.500000000000E-01 -2.500000000000E-01
5 T 8 O -4.924094276183E-01 -7.590572381674E-03 2.500000000000E-01
6 F 8 O 2.500000000000E-01 -4.924094276183E-01 -7.590572381674E-03
7 F 8 O -7.590572381674E-03 2.500000000000E-01 -4.924094276183E-01
8 F 8 O 4.924094276183E-01 7.590572381674E-03 -2.500000000000E-01
9 F 8 O -2.500000000000E-01 4.924094276183E-01 7.590572381674E-03
10 F 8 O 7.590572381674E-03 -2.500000000000E-01 4.924094276183E-01
TRANSFORMATION MATRIX PRIMITIVE-CRYSTALLOGRAPHIC CELL
1.0000 0.0000 1.0000 -1.0000 1.0000 1.0000 0.0000 -1.0000 1.0000
*******************************************************************************
CRYSTALLOGRAPHIC CELL (VOLUME= 359.47009054)
A B C ALPHA BETA GAMMA
4.97568007 4.97568007 16.76591397 90.000000 90.000000 120.000000
COORDINATES IN THE CRYSTALLOGRAPHIC CELL
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.491739570355E-17 -2.745869785177E-17 -5.000000000000E-01
3 T 6 C 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
4 F 6 C -3.333333333333E-01 3.333333333333E-01 8.333333333333E-02
5 T 8 O -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02
6 F 8 O 3.333333333333E-01 -7.574276095166E-02 -8.333333333333E-02
7 F 8 O 7.574276095166E-02 4.090760942850E-01 -8.333333333333E-02
8 F 8 O 4.090760942850E-01 3.333333333333E-01 8.333333333333E-02
9 F 8 O -3.333333333333E-01 7.574276095166E-02 8.333333333333E-02
10 F 8 O -7.574276095166E-02 -4.090760942850E-01 8.333333333333E-02
T = ATOM BELONGING TO THE ASYMMETRIC UNIT
INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE
more lines
more lines
more lines
FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3
(NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500)
*******************************************************************************
LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - BOHR = 0.5291772083 ANGSTROM
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 121.143469 - DENSITY 2.740 g/cm^3
A B C ALPHA BETA GAMMA
6.32229536 6.32229536 6.32229536 46.436583 46.436583 46.436583
*******************************************************************************
ATOMS IN THE ASYMMETRIC UNIT 3 - ATOMS IN THE UNIT CELL: 10
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA 5.000000000000E-01 -5.000000000000E-01 -5.000000000000E-01
3 T 6 C 2.500000000000E-01 2.500000000000E-01 2.500000000000E-01
4 F 6 C -2.500000000000E-01 -2.500000000000E-01 -2.500000000000E-01
5 T 8 O -4.927088991116E-01 -7.291100888437E-03 2.500000000000E-01
6 F 8 O 2.500000000000E-01 -4.927088991116E-01 -7.291100888437E-03
7 F 8 O -7.291100888437E-03 2.500000000000E-01 -4.927088991116E-01
8 F 8 O 4.927088991116E-01 7.291100888437E-03 -2.500000000000E-01
9 F 8 O -2.500000000000E-01 4.927088991116E-01 7.291100888437E-03
10 F 8 O 7.291100888437E-03 -2.500000000000E-01 4.927088991116E-01
TRANSFORMATION MATRIX PRIMITIVE-CRYSTALLOGRAPHIC CELL
1.0000 0.0000 1.0000 -1.0000 1.0000 1.0000 0.0000 -1.0000 1.0000
*******************************************************************************
CRYSTALLOGRAPHIC CELL (VOLUME= 363.43040599)
A B C ALPHA BETA GAMMA
4.98494429 4.98494429 16.88768068 90.000000 90.000000 120.000000
COORDINATES IN THE CRYSTALLOGRAPHIC CELL
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.471726358381E-17 -2.735863179191E-17 -5.000000000000E-01
3 T 6 C 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
4 F 6 C -3.333333333333E-01 3.333333333333E-01 8.333333333333E-02
5 T 8 O -4.093755657782E-01 -3.333333333333E-01 -8.333333333333E-02
6 F 8 O 3.333333333333E-01 -7.604223244490E-02 -8.333333333333E-02
7 F 8 O 7.604223244490E-02 4.093755657782E-01 -8.333333333333E-02
8 F 8 O 4.093755657782E-01 3.333333333333E-01 8.333333333333E-02
9 F 8 O -3.333333333333E-01 7.604223244490E-02 8.333333333333E-02
10 F 8 O -7.604223244490E-02 -4.093755657782E-01 8.333333333333E-02
T = ATOM BELONGING TO THE ASYMMETRIC UNIT
INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE
more lines
more lines
more lines
Updated code: based on @nos's flag approach, the code below is able to extract the information. VOLUMES is a list with 2 elements. The results are listed below:
VOLUMES = ['119.823364', '121.143469']
P0 = ['4.97568007', '4.98494429']
P2 = ['16.76591397', '16.88768068']
Xs = ['0.000000000000E+00', '3.333333333333E-01', '-4.090760942850E-01', '0.000000000000E+00', '3.333333333333E-01', '-4.093755657782E-01']
Ys = ['0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01', '0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01']
Zs = ['0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02', '0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02']
ATOMIC_NUMBERS = ['20', '6', '8', '20', '6', '8']
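The flat lists above hold the atoms of both frames one after another, three per frame. Assuming every frame contributes the same number of atoms in file order, a possible way to regroup them is to zip the four lists into rows and cut the rows into chunks; `group_into_frames` is a hypothetical name. Chunk `i` then lines up with `VOLUMES[i]`, `P0[i]` and `P2[i]`, whereas looping over all of ATOMIC_NUMBERS for every volume (as the nested loop in the updated code below does) puts all six atoms into each frame.

```python
def group_into_frames(atomic_numbers, xs, ys, zs, atoms_per_frame):
    """Split flat per-atom lists into consecutive chunks of atoms_per_frame atoms.

    Hypothetical helper; assumes the atoms of each frame are contiguous.
    """
    rows = list(zip(atomic_numbers, xs, ys, zs))
    if len(rows) % atoms_per_frame:
        raise ValueError("atom count is not a multiple of atoms_per_frame")
    return [rows[i:i + atoms_per_frame] for i in range(0, len(rows), atoms_per_frame)]
```

With the lists above, `group_into_frames(ATOMIC_NUMBERS, Xs, Ys, Zs, 3)` gives two frames of three `(atomic_number, x, y, z)` tuples each.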
The second part of this post is to write this information (P0, P2, ATOMIC_NUMBERS, Xs, Ys, Zs) into two VOLUME.inp files. In other words, like this:
V_119.823364.inp file:
some stuff
other stuff
4.97568007 16.76591397
3
20 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
6 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
8 -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02
other stuff
V_121.143469.inp file:
some stuff
other stuff
4.98494429 16.88768068
3
20 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
6 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
8 -4.093755657782E-01 -3.333333333333E-01 -8.333333333333E-02
other stuff
Following @nos's suggestion of atoms_per_frame and atoms_all_frames, I tried the code below. I am having difficulty writing the elements to the files element-wise, for example:
import os
import sys
import re
import glob

initial_pattern = '^ FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3$'
middle_pattern = '^ CRYSTALLOGRAPHIC CELL '
end_pattern = '^ T = ATOM BELONGING TO THE ASYMMETRIC UNIT$'

global N_atom_irreducible_unit
N_atom_irreducible_unit = 3

VOLUMES = []
P0 = []
P2 = []
ATOMIC_NUMBERS = []
Xs = []
Ys = []
Zs = []

with open('g.out') as file:
    passed_mid_point = False
    for line in file:
        if re.match(initial_pattern, line):
            print file.next()
            print file.next()
            print file.next()
            volume_line = file.next()
            print volume_line
            aux = volume_line.split()
            each_volume = aux[7]
            print each_volume
            VOLUMES.append(each_volume)

        if re.match(middle_pattern, line):
            print line
            print file.next()
            parameters_line = file.next()
            aux = parameters_line.split()
            p0 = aux[0]
            p1 = aux[1]
            p2 = aux[2]
            p3 = aux[3]
            p4 = aux[4]
            p5 = aux[5]
            print p0
            print p2
            P0.append(p0)
            P2.append(p2)
            print file.next()
            print file.next()
            print file.next()
            print file.next()

        if re.match(middle_pattern, line):
            passed_mid_point = True
            print 'line = ', line

        if re.match(end_pattern, line):
            passed_mid_point = False
        elif passed_mid_point:
            # parse the coordinates
            print 'line2 =', line
            terms = line.split()
            print 'terms =', terms
            if terms and terms[1] == 'T':
                print terms[1]
                atomic_number = terms[2]
                print 'atomic_number = ', atomic_number
                ATOMIC_NUMBERS.append(atomic_number)
                x = terms[4]
                print 'x =', x
                Xs.append(x)
                y = terms[5]
                print 'y = ', y
                Ys.append(y)
                z = terms[6]
                print 'z = ', z
                Zs.append(z)

print 'VOLUMES = ', VOLUMES
print 'P0 = ', P0
print 'P2 = ', P2
print 'Xs = ', Xs
print 'Ys = ', Ys
print 'Zs = ', Zs
print 'ATOMIC_NUMBERS = ', ATOMIC_NUMBERS

# create the empty list of lists:
atoms_all_frames = [[] for _ in xrange(len(VOLUMES))]
print atoms_all_frames
for index_vol in range(len(VOLUMES)):
    for index in range(len(ATOMIC_NUMBERS)):
        atoms_per_frame = [ATOMIC_NUMBERS[index], Xs[index], Ys[index], Zs[index]]
        atoms_all_frames[index_vol].append(atoms_per_frame)
# "atoms_all_frames" would be an appropriate list for looping
print atoms_all_frames

# Remove any existing V*.inp files, to clean first:
for f in glob.glob("V*.inp"):
    os.remove(f)

# create the files:
for V in VOLUMES:
    filename = "V_{}.d12".format(V)
    print filename
    # open them:
    with open(filename,"a") as f:
        # the following is a pseudo-code, because I cannot manage to
        # find the way to write element-wise each string to the files:
        for p0, p2, atoms_all_frames:
            f.write("""some stuff
other stuff
%s %s
%s
%s %s %s %s
%s %s %s %s
%s %s %s %s
other stuff
some other stuff\n""" % p0 % p2 %N_atom_irreducible_unit %atoms_all_frames)
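In the pseudo-code above, `%` is chained (`% p0 % p2 % ...`), but printf-style formatting expects a single tuple holding one value per `%s`. A sketch, assuming `frames[i]` is the list of `(atomic_number, x, y, z)` tuples for `VOLUMES[i]` and that every frame has exactly 3 atoms, matching the three `%s %s %s %s` lines of the template (any other count makes `%` raise `TypeError`). `write_inp_files` is a hypothetical helper; `zip` walks the four lists in step, so each file gets its own p0, p2 and atoms.

```python
import os

# The question's template: one "%s %s %s %s" line per atom (exactly 3 atoms).
TEMPLATE = """some stuff
other stuff
%s %s
%s
%s %s %s %s
%s %s %s %s
%s %s %s %s
other stuff
some other stuff
"""


def write_inp_files(volumes, p0s, p2s, frames, directory="."):
    """Write one V_<volume>.inp file per frame and return the paths written.

    Hypothetical helper: frames[i] holds the (atomic_number, x, y, z) tuples
    of the frame whose volume is volumes[i].
    """
    paths = []
    for volume, p0, p2, atoms in zip(volumes, p0s, p2s, frames):
        # '%' needs ONE tuple with every value, in template order.
        values = (p0, p2, len(atoms)) + tuple(v for atom in atoms for v in atom)
        path = os.path.join(directory, "V_{}.inp".format(volume))
        with open(path, "w") as f:  # "w" replaces any file from an earlier run
            f.write(TEMPLATE % values)
        paths.append(path)
    return paths
```

Opening each file with `"w"` instead of `"a"` overwrites a file left over from a previous run, so the `glob` clean-up step is no longer needed. The files are named `.inp` here, as in the examples above, rather than `.d12`.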
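There are many ways to do this. The key is to distinguish whether you have passed mid_pattern, because the same coordinate pattern appears both before and after it, and you only want the coordinates that come after it.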
For example, you can
- set a flag, so we know whether mid_pattern has been matched before end_pattern is matched:
passed_mid_point = False
...
if re.match(middle_pattern, line):
    passed_mid_point = True
    # do what you need
...
if re.match(end_pattern, line):
    passed_mid_point = False  # so you can process a new frame
    # do what you need after end pattern is matched
...
elif passed_mid_point:
    # parse the coordinates
    terms = line.split()
    if terms and terms[1] == 'T':
        x = float(terms[4])
        y = float(terms[5])
        z = float(terms[6])
- or, alternatively, combine the flag with a pattern match on the coordinate lines, like this:
passed_mid_point = False
coord_pattern = r' \d+ T '
...
if re.match(middle_pattern, line):
    passed_mid_point = True
    # do what you need
...
if re.match(end_pattern, line):
    passed_mid_point = False  # so you can process a new frame
    # do what you need after end pattern is matched
...
if passed_mid_point and re.match(coord_pattern, line):
    # parse the coordinates
    terms = line.split()
    if terms and terms[1] == 'T':
        x = float(terms[4])
        y = float(terms[5])
        z = float(terms[6])
The coordinate matching can also be done entirely with a regular expression:
sci_num = r'-?\d+\.\d*E[+\-]\d+'
coord_pattern = r'\s+\d+\sT\s+\d+\s+[A-Z]+\s+(%s)\s+(%s)\s+(%s)' % (sci_num, sci_num, sci_num)
coord_re = re.compile(coord_pattern)
m = coord_re.match(line)  # the groups live on the match object, not on the compiled pattern
if m:
    x = float(m.group(1))
    y = float(m.group(2))
    z = float(m.group(3))
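The same regular expression, wrapped in a hypothetical `parse_t_row` helper that additionally captures the atomic number and returns `None` for any line that is not a 'T' row. The leading `\s+` is relaxed to `\s*` (and `\sT` to `\s+T`), an assumption made so that lines whose leading blanks were lost, as in the paste above, still match.

```python
import re

SCI_NUM = r'-?\d+\.\d*E[+\-]\d+'
# Relaxed variant of the pattern above, plus a group for the atomic number.
T_ROW = re.compile(r'\s*\d+\s+T\s+(\d+)\s+[A-Z]+\s+(%s)\s+(%s)\s+(%s)'
                   % (SCI_NUM, SCI_NUM, SCI_NUM))


def parse_t_row(line):
    """Return (atomic_number, x, y, z) for a 'T' row, or None for any other line."""
    m = T_ROW.match(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4))
```

Because the atom index must be followed by `T`, rows marked `F`, blank lines and the `T = ATOM BELONGING ...` line all return `None`, so no separate `terms` check is needed.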
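To record the data, it would be better to keep track of which frame the atom coordinates belong to. For example, you can create an atom_frames list at the beginning and keep appending lists of atom coordinates to it, where each list corresponds to one frame. Overall, it looks like this: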
atom_frames = []
for i in range(50):  # here I assume 50 frames
    current_frame = []
    for a in atoms_in_this_frame:
        current_frame.append(a)  # a could be (x, y, z) of an atom
    atom_frames.append(current_frame)
Here I just loop over a fixed number of frames. In your case, you can create current_frame = [] when you hit mid_pattern, and do atom_frames.append(current_frame) when you hit end_pattern. Hope it makes sense.
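A self-contained sketch of that idea applied to this file: `current_frame` is created when the CRYSTALLOGRAPHIC CELL line is hit and appended to `frames` when the end line is hit, together with the volume and the A and C parameters of the same block, so that all per-frame values stay aligned. The patterns, the `parse_frames` name and the assumption that the `A B C ALPHA BETA GAMMA` header and the parameter line directly follow the CRYSTALLOGRAPHIC CELL line (no blank lines in between) are illustrative; they may need adjusting for the real file.

```python
import re

# Illustrative patterns; \s* lets them match with or without leading blanks.
FINAL_RE = re.compile(r'^\s*FINAL OPTIMIZED GEOMETRY')
VOLUME_RE = re.compile(r'^\s*PRIMITIVE CELL.*VOLUME=\s*(\S+)')
MID_RE = re.compile(r'^\s*CRYSTALLOGRAPHIC CELL')
END_RE = re.compile(r'^\s*T = ATOM BELONGING TO THE ASYMMETRIC UNIT')


def parse_frames(lines):
    """Return one (volume, a, c, atoms) tuple per FINAL OPTIMIZED GEOMETRY block.

    atoms is the list of (atomic_number, x, y, z) strings of the 'T' rows
    found between the CRYSTALLOGRAPHIC CELL line and the end line.
    """
    frames = []
    in_final = False        # inside a FINAL OPTIMIZED GEOMETRY block?
    current_frame = None    # a list only between MID and END
    volume = a = c = None
    it = iter(lines)
    for line in it:
        if FINAL_RE.match(line):
            in_final = True
            continue
        if not in_final:
            continue                          # ignore everything else in the output
        m = VOLUME_RE.match(line)
        if m:
            volume = m.group(1)
        elif MID_RE.match(line):
            next(it)                          # skip 'A B C ALPHA BETA GAMMA'
            params = next(it).split()         # assumed to be the very next line
            a, c = params[0], params[2]
            current_frame = []                # mid_pattern hit: start a new frame
        elif END_RE.match(line):
            if current_frame is not None:     # end_pattern hit: store the frame
                frames.append((volume, a, c, current_frame))
            current_frame = None
            in_final = False
        elif current_frame is not None:
            terms = line.split()
            if len(terms) >= 7 and terms[1] == 'T':   # also skips blank lines
                current_frame.append((terms[2], terms[4], terms[5], terms[6]))
    return frames
```

For the file in the question, `parse_frames(open('g.out').read().splitlines())` would be expected to return two tuples, one per FINAL OPTIMIZED GEOMETRY block, each ready to be written to its own `V_<volume>.inp` file.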
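Thanks for your answer. This flag approach is very interesting, and I applied it in my code. However, when the 'if terms[1] == 'T':' statement is reached, I get a 'list index out of range' error. Please see the **updated code** to reproduce the issue. The 'if terms[1] == 'T':' statement looks fine to me, so I don't understand where the problem is –
Oh, that's because there are empty lines; see the updated code – nos
Thank you for the clarification, and thanks again for your help. I'm having some difficulty with the part that saves the information to files. Please see the **updated code**. –
Too much code and text... –
I guess you are parsing some results for each (time) frame, and each frame has a volume and possibly several atoms with their coordinates. In that case, first create a list (e.g. 'atoms_all_frames = []') to hold the results for all atoms. Then, while parsing the file, create a list of atom coordinates for each frame (e.g. 'atoms_per_frame = []') and append the (x, y, z) coordinates of each atom to it. Then append 'atoms_per_frame' to 'atoms_all_frames'. This way, your volume list and your coordinate list will have the same size, namely the number of frames. – nos
@nos Thank you for the suggestion. I adopted this approach, but I can't manage to write the elements to the file element-wise. Please see the updated post –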