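How to find a specific word in a list in Python
I have to find whether a word is in a list. If it is found in the list, the line should be written to the file with the label "1"; otherwise the line should be written with the label "0". My Python code is below. It fails with the error TypeError: can only concatenate list (not "str") to list.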
f2 = open("C:/Python26/Semantics.txt",'w')
sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"];
with open("C:/Python26/trigram.txt") as f:
    contents = f.readlines()
    for lines in contents:
        tokens = lines.split('$')
        for t in tokens:
            if t.strip() in sem:
                f2.write(tokens+"\t"+"1 \n");
            else:
                f2.write(tokens+"\t"+"0 \n");
f2.close()
My file looks like this:
IL-2$gene$expression$and
IL-2$gene$expression$and$NF-kappa
IL-2$gene$expression$and$NF-kappa$B
IL-2$gene$expression$and$NF-kappa$B$activation
gene$expression$and$NF-kappa$B$activation$through
expression$and$NF-kappa$B$activation$through$CD28
My desired output:
IL-2 gene expression and 1
IL-2 gene expression and NF-kappa 1
IL-2 gene expression and NF-kappa B 1
IL-2 gene expression and NF-kappa B activation 1
gene expression and NF-kappa B activation through 1
expression and NF-kappa B activation through CD28 0
Alternatively, I would like to produce output like this:
Token cells gene factor……. promoter
IL-2 gene expression and 0 1 0 ……… 0
IL-2 gene expression and NF-kappa 0 1 0 ……… 0
IL-2 gene expression and NF-kappa B 0 1 0 ……… 0
IL-2 gene expression and NF-kappa B activation 0 1 0 ……… 0
gene expression and NF-kappa B activation through 0 1 0 ……… 0
expression and NF-kappa B activation through CD28 0 0 0 ……… 0
I think this will only need a small change to the code.
Try this to get that output:
sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"]
with open("C:/Python26/trigram.txt") as f, open("C:/Python26/Semantics.txt",'w') as f2:
    for x in f:
        x = x.strip().split("$")
        print " ".join(x), len(set(sem) & set(x))
        f2.write("{} {}\n".format(" ".join(x), len(set(sem) & set(x))))
Or, to write to the file instead of printing to the console:
f2.write("{} {}\n".format(" ".join(x), len(set(sem) & set(x))))
Output:
IL-2 gene expression and 1
IL-2 gene expression and NF-kappa 1
IL-2 gene expression and NF-kappa B 1
IL-2 gene expression and NF-kappa B activation 1
gene expression and NF-kappa B activation through 1
expression and NF-kappa B activation through CD28 0
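Two caveats about the snippet above may be worth noting. First, len(set(sem) & set(x)) is a count of matching words, so a line containing both "cells" and "gene" would be labelled 2 rather than 1. Second, the auto-numbered "{}" fields and the two-file with statement need Python 2.7 or later, while the C:/Python26 paths suggest (this is an assumption) that Python 2.6 is in use. The TypeError in the question comes from adding a string to tokens, which is a list. Below is a minimal sketch of a strict 0/1 label that should also run on Python 2.6. The helper name label_line and the tab separator are illustrative choices, not part of the original answer.

```python
# Hypothetical helper: label_line is an illustrative name, not from the answer.
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]


def label_line(line, words=SEM):
    """Return 'tok1 tok2 ...<TAB>flag'; flag is 1 if any token is in words."""
    tokens = [t.strip() for t in line.strip().split("$")]
    # Joining the tokens into one string first avoids the list + str TypeError.
    flag = 1 if set(words) & set(tokens) else 0
    # Explicit field indexes ({0}, {1}) also work on Python 2.6.
    return "{0}\t{1}".format(" ".join(tokens), flag)


if __name__ == "__main__":
    sample = [
        "IL-2$gene$expression$and",
        "expression$and$NF-kappa$B$activation$through$CD28",
    ]
    for line in sample:
        print(label_line(line))
```

To write the result to a file, each returned string can be passed to f2.write with a trailing "\n" appended.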
Explanation of
" ".join(x), len(set(sem) & set(x))
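" ".join(x): this joins the elements of the list into a single string, separated by spaces.
len(set(sem) & set(x)): set() turns a list into a set. set(sem) & set(x) is the same as the mathematical
set AND (intersection) operation and gives you only the elements that appear in both lists; len then gives the number of those matching elements.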
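To make the two expressions concrete, the sketch below runs them on one sample line and then extends the same membership test to the second layout in the question, with one 0/1 column per word in sem. The function name indicator_row is hypothetical, and the column order is assumed to follow the order of sem.

```python
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]


def indicator_row(tokens, words=SEM):
    """Return one 0/1 flag per entry of words, in the order of words."""
    present = set(tokens)
    return [1 if w in present else 0 for w in words]


if __name__ == "__main__":
    x = "IL-2$gene$expression$and".split("$")
    print(" ".join(x))             # tokens joined by single spaces
    print(set(SEM) & set(x))       # elements common to both lists
    print(len(set(SEM) & set(x)))  # how many elements matched
    print(indicator_row(x))        # per-word flags, one per entry of SEM
```

A row for the output file could then be built by joining the tokens and the flags, for example with " ".join(x) followed by " ".join(str(v) for v in indicator_row(x)).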
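The error came from writing x = x.strip.split("$") instead of x = x.strip().split("$"). Thanks for the help :) – 2015-04-04 06:14:14
Can you explain the line print " ".join(x), len(set(sem) & set(x))? – 2015-04-04 06:26:58
I get ValueError: zero length field name in format when writing to the file. – 2015-04-04 06:30:51
Why do you end sem with a semicolon? There is no need for semicolons in Python. – Hackaholic 2015-04-04 06:07:26
After pasting code, select the whole block and then press Ctrl + K to indent all of it. Your program needs to run as shown, and as posted it has indentation errors. And why do some lines end with a semicolon while lines a few below them do not? – Anthon 2015-04-04 06:11:43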