Python rstrip and split

Question:

Hi, I want to strip the newline characters from my list and get the third word of each element. I am using split and rstrip. Here is my code:

# obtain just compound ids from list 
just_compound_id = [] 
for line in final_list: 
    split_file = line.split(' ') 
    split_file = line.rstrip() 
    just_compound_id.append(split_file[2]) 
    print(just_compound_id)

But I get a very strange output, something like this:

['I'] 
['I', 'I'] 
['I', 'I', 'I'] 
['I', 'I', 'I', 'I'] 
['I', 'I', 'I', 'I', 'I']

Edit:

Here is my input:

['UNIQUE-ID - ASN\n', 'UNIQUE-ID - D-GLT\n', 'UNIQUE-ID - 4-AMINO-BUTYRATE\n',
 'UNIQUE-ID - CPD-8569\n', 'UNIQUE-ID - CPD-17095\n', 'UNIQUE-ID - CPD-17880\n',
 'UNIQUE-ID - GLY\n', 'UNIQUE-ID - CPD-18298\n', 'UNIQUE-ID - D-SERINE\n',
 'UNIQUE-ID - ACETYLCHOLINE\n', 'UNIQUE-ID - DOPAMINE\n', 'UNIQUE-ID - SEROTONIN\n',
 'UNIQUE-ID - HISTAMINE\n', 'UNIQUE-ID - PHENYLETHYLAMINE\n', 'UNIQUE-ID - TYRAMINE\n',
 'UNIQUE-ID - CPD-58\n', 'UNIQUE-ID - 1-4-HYDROXYPHENYL-2-METHYLAMINOETHAN\n',
 'UNIQUE-ID - TRYPTAMINE\n']

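Please give an example of your input. Also, call rstrip() before you split.

Do you really mean to overwrite 'split_file'? :)

So I changed the order (by the way, why does strip have to come before split?). But I still don't get the \n removed from the list – StudentOIST

Answer

It should be split_file.split(' ') rather than line.split(' '), and you also need to call line.rstrip() before split_file.split(' '):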

just_compound_id = [] 
for line in final_list: 
    split_file = line.rstrip() 
    split_file = split_file.split(' ') 
    just_compound_id.append(split_file[2]) 
    print(just_compound_id)

The way you have it currently, the first assignment to split_file has no effect: you never use that value, and the next assignment overwrites it.
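To see why the order and the reassignment matter, here is a small sketch using one line from the input above (the variable names are only illustrative). Strings are immutable, so rstrip() returns a new string and leaves line unchanged; the stripped result has to be kept and then split:

```python
line = 'UNIQUE-ID - ASN\n'

# rstrip() does not modify line; it returns a stripped copy.
stripped = line.rstrip()

# Splitting the original line keeps the newline on the last field.
fields_with_newline = line.split(' ')

# Splitting the stripped copy gives clean fields.
fields = stripped.split(' ')

print(repr(line))            # 'UNIQUE-ID - ASN\n' (unchanged)
print(fields_with_newline)   # ['UNIQUE-ID', '-', 'ASN\n']
print(fields)                # ['UNIQUE-ID', '-', 'ASN']
```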

Answer

The output is wrong because the third character of each input string is I.
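A quick illustration of the difference, as a sketch with one sample line: indexing a string yields a single character, while indexing the list returned by split yields a whole word.

```python
line = 'UNIQUE-ID - ASN\n'

as_string = line.rstrip()        # a str: 'UNIQUE-ID - ASN'
as_list = as_string.split(' ')   # a list: ['UNIQUE-ID', '-', 'ASN']

print(as_string[2])  # prints I, the third character of the string
print(as_list[2])    # prints ASN, the third word
```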

You want the character in the middle, right?

So use this code instead:

# obtain just compound ids from list
just_compound_id = []
for line in final_list:
    split_file = line.rstrip()
    split_file = split_file.split(' ')  # split the stripped string, not the original line
    just_compound_id.append(split_file[len(split_file) // 2])  # EDIT HERE: integer division keeps the index an int

print(just_compound_id)

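With your code I got the following error: unsupported operand type(s) for /: 'str' and 'int'. Also, I don't want the middle character but the last word (e.g. for the first element in the list, I want to get ASN) – StudentOIST

@smvpfm Thanks. I made an edit.

Answer

You can use a list comprehension: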

[e.rstrip().split(' ')[2] for e in final_list]

@smvpfm What do you want?
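Since the comments clarify that the goal is the last word of each line, one more variant is to take the last whitespace-separated field. This is only a sketch, and it assumes the compound ID itself never contains a space. Calling split() with no arguments splits on runs of whitespace and ignores the trailing newline, so no separate rstrip() is needed here:

```python
final_list = ['UNIQUE-ID - ASN\n', 'UNIQUE-ID - D-GLT\n', 'UNIQUE-ID - 4-AMINO-BUTYRATE\n']

# split() with no arguments splits on whitespace and discards the trailing '\n';
# [-1] picks the last field of each line.
just_compound_id = [line.split()[-1] for line in final_list]

print(just_compound_id)  # ['ASN', 'D-GLT', '4-AMINO-BUTYRATE']
```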

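Answer

Another option is a regular expression, if you have more complex input. For the case above, I suggest splitting each line on - and setting the maxsplit parameter.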

final_list = ['UNIQUE-ID - ASN\n', 'UNIQUE-ID - D-GLT\n', 'UNIQUE-ID - 4-AMINO-BUTYRATE\n', 'UNIQUE-ID - CPD-8569\n', 'UNIQUE-ID - CPD-17095\n', 'UNIQUE-ID - CPD-17880\n', 'UNIQUE-ID - GLY\n', 'UNIQUE-ID - CPD-18298\n', 'UNIQUE-ID - D-SERINE\n', 'UNIQUE-ID - ACETYLCHOLINE\n', 
       'UNIQUE-ID - DOPAMINE\n', 'UNIQUE-ID - SEROTONIN\n', 'UNIQUE-ID - HISTAMINE\n', 'UNIQUE-ID - PHENYLETHYLAMINE\n', 'UNIQUE-ID - TYRAMINE\n', 'UNIQUE-ID - CPD-58\n', 'UNIQUE-ID - 1-4-HYDROXYPHENYL-2-METHYLAMINOETHAN\n', 'UNIQUE-ID - TRYPTAMINE\n'] 
just_compound_id = [] 
for line in final_list: 
    line = line.rstrip() 
    split_id = line.split('-', 2)[2].strip() 
    just_compound_id.append(split_id) 
print(just_compound_id)
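For the regular-expression route mentioned above, a minimal sketch might look like the following. The pattern and the helper name extract_compound_id are assumptions made for illustration; they presume every line has the form "UNIQUE-ID - <value>" and would need adjusting for other layouts.

```python
import re

# Hypothetical pattern: the literal key, a dash surrounded by whitespace,
# then the value up to the end of the line (trailing whitespace excluded).
LINE_PATTERN = re.compile(r'^UNIQUE-ID\s+-\s+(.+?)\s*$')

def extract_compound_id(line):
    """Return the compound ID from a line, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.group(1) if match else None

sample = ['UNIQUE-ID - ASN\n', 'UNIQUE-ID - 4-AMINO-BUTYRATE\n', 'not a matching line\n']
ids = [extract_compound_id(line) for line in sample]
print(ids)  # ['ASN', '4-AMINO-BUTYRATE', None]
```

Returning None for non-matching lines makes malformed input visible instead of raising an IndexError, which is one reason to prefer a pattern when the input is less regular.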
