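How can I assign the same ranks to a list of ids by comparing it with another data file in Linux?
I have a list of ids (column 2) ranked from 1 to 600 according to their values (column 3). I have a second list of the same ids, but in a different order because their values are different. How can I give the ids in file2 the ranks they have in file1? For example: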
file1:
rank list-of-ids values
1 HOUSAM69708729 0.4468
2 HOCANM106363549 0.4434
3 HOCANM10845509 0.4268
4 HOCANM11098662 0.4203
5 HOUSAM68571374 0.3896
6 HOUSAM69990251 0.3895
7 HONLDM716072164 0.3893
8 HOUSAM69756113 0.3656
9 HOCANM11098658 0.3593
10 HOUSAM66626020 0.3538
file2:
list-of-ids values
HOCANM106363549 0.4832
HOUSAM69708729 0.4199
HOCANM10845509 0.4143
HOUSAM69990251 0.3887
HOCANM11098662 0.3792
HOUSAM69756113 0.365
HOUSAM68571374 0.3649
HONLDM716072164 0.3600
HOUSAM66626020 0.3593
HOCANM11098658 0.3545
The output file should be file2 with the ranks taken from file1:
output:
rank list-of-ids values
2 HOCANM106363549 0.4832
1 HOUSAM69708729 0.4199
3 HOCANM10845509 0.4143
6 HOUSAM69990251 0.3887
4 HOCANM11098662 0.3792
8 HOUSAM69756113 0.365
5 HOUSAM68571374 0.3649
7 HONLDM716072164 0.3600
10 HOUSAM66626020 0.3593
9 HOCANM11098658 0.3545
Any suggestions, please? Note that the real data does not have any headers, so the output should not have headers either.
awk solution:
awk 'NR==FNR{ a[$2]=$1; next }{ print a[$1],$1,$2 }' file1 file2
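NR==FNR - true only while the first input file (i.e. file1) is being processed
a[$2]=$1 - stores the rank value (first field, $1) in array a, indexed by the list-of-ids value (second field, $2)
next - skips to the next record (of file1)
print a[$1],$1,$2 - prints the fields ($1, $2) of the second input file, file2, together with the corresponding rank value a[$1]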
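The lookup described above prints an empty rank when an id in file2 does not occur in file1. A minimal sketch of one way to make that case visible, assuming a placeholder is acceptable: the `NA` marker and the id `HOXXXM00000000` are hypothetical and used only for illustration, and the other rows are a subset of the sample data from the question.

```shell
# Sketch only: same array lookup as above, but ids missing from file1 get "NA".
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 HOUSAM69708729 0.4468
2 HOCANM106363549 0.4434
3 HOCANM10845509 0.4268
EOF
# HOXXXM00000000 is a made-up id that does not occur in file1.
cat > "$tmp/file2" <<'EOF'
HOCANM106363549 0.4832
HOUSAM69708729 0.4199
HOXXXM00000000 0.4000
EOF
result=$(awk '
  NR == FNR { rank[$2] = $1; next }          # first file: remember rank per id
  {
    r = ($1 in rank) ? rank[$1] : "NA"       # second file: look the id up
    print r, $1, $2
  }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Testing with `$1 in rank` rather than reading `rank[$1]` directly avoids creating empty array entries for unknown ids.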
Output:
2 HOCANM106363549 0.4832
1 HOUSAM69708729 0.4199
3 HOCANM10845509 0.4143
6 HOUSAM69990251 0.3887
4 HOCANM11098662 0.3792
8 HOUSAM69756113 0.365
5 HOUSAM68571374 0.3649
7 HONLDM716072164 0.3600
10 HOUSAM66626020 0.3593
9 HOCANM11098658 0.3545
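My real data does not have any column names. How can I remove "rank" as a column name? I mean I should not have the first line (rank list-of-ids values) in the output – zara
@zara, see my update – RomanPerekhrest
Thanks. Could you explain your script? I would like to understand it – zara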
Another option, using 'join':
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(sort -k 2 file1) <(sort -k 1 file2)
2 HOCANM106363549 0.4832
3 HOCANM10845509 0.4143
9 HOCANM11098658 0.3545
4 HOCANM11098662 0.3792
7 HONLDM716072164 0.3600
10 HOUSAM66626020 0.3593
5 HOUSAM68571374 0.3649
1 HOUSAM69708729 0.4199
8 HOUSAM69756113 0.365
6 HOUSAM69990251 0.3887
ranks list-of-ids values
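Admittedly, this does not handle the headers very cleanly. You have already accepted a solution, but I like this tool and not many people know about it ;)
Edit: if the source data does not have any headers, the command works well: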
$ cat file1
1 HOUSAM69708729 0.4468
2 HOCANM106363549 0.4434
3 HOCANM10845509 0.4268
4 HOCANM11098662 0.4203
5 HOUSAM68571374 0.3896
6 HOUSAM69990251 0.3895
7 HONLDM716072164 0.3893
8 HOUSAM69756113 0.3656
9 HOCANM11098658 0.3593
10 HOUSAM66626020 0.3538
$ cat file2
HOCANM106363549 0.4832
HOUSAM69708729 0.4199
HOCANM10845509 0.4143
HOUSAM69990251 0.3887
HOCANM11098662 0.3792
HOUSAM69756113 0.365
HOUSAM68571374 0.3649
HONLDM716072164 0.3600
HOUSAM66626020 0.3593
HOCANM11098658 0.3545
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(sort -k 2 file1) <(sort -k 1 file2)
2 HOCANM106363549 0.4832
3 HOCANM10845509 0.4143
9 HOCANM11098658 0.3545
4 HOCANM11098662 0.3792
7 HONLDM716072164 0.3600
10 HOUSAM66626020 0.3593
5 HOUSAM68571374 0.3649
1 HOUSAM69708729 0.4199
8 HOUSAM69756113 0.365
6 HOUSAM69990251 0.3887
If either file does contain a header, you can just grep it out before the 'sort':
$ cat file1
ranks list-of-ids values
1 HOUSAM69708729 0.4468
2 HOCANM106363549 0.4434
3 HOCANM10845509 0.4268
4 HOCANM11098662 0.4203
5 HOUSAM68571374 0.3896
6 HOUSAM69990251 0.3895
7 HONLDM716072164 0.3893
8 HOUSAM69756113 0.3656
9 HOCANM11098658 0.3593
10 HOUSAM66626020 0.3538
$ cat file2
list-of-ids values
HOCANM106363549 0.4832
HOUSAM69708729 0.4199
HOCANM10845509 0.4143
HOUSAM69990251 0.3887
HOCANM11098662 0.3792
HOUSAM69756113 0.365
HOUSAM68571374 0.3649
HONLDM716072164 0.3600
HOUSAM66626020 0.3593
HOCANM11098658 0.3545
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(grep -v "list-of-ids" file1 | sort -k 2) <(grep -v "list-of-ids" file2 | sort -k 1)
2 HOCANM106363549 0.4832
3 HOCANM10845509 0.4143
9 HOCANM11098658 0.3545
4 HOCANM11098662 0.3792
7 HONLDM716072164 0.3600
10 HOUSAM66626020 0.3593
5 HOUSAM68571374 0.3649
1 HOUSAM69708729 0.4199
8 HOUSAM69756113 0.365
6 HOUSAM69990251 0.3887
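Note that the join output above is ordered by id, not in the original order of file2 as in the requested output. A rough sketch of one way to restore that order, assuming header-less input with space-separated fields: each file2 row is tagged with its line number before joining, and the result is sorted back on that number. The temporary file names are arbitrary, and the rows are a subset of the sample data from the question.

```shell
# Sketch only: join on the id, then restore the original row order of file2.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 HOUSAM69708729 0.4468
2 HOCANM106363549 0.4434
3 HOCANM10845509 0.4268
EOF
cat > "$tmp/file2" <<'EOF'
HOCANM106363549 0.4832
HOUSAM69708729 0.4199
HOCANM10845509 0.4143
EOF
# Prefix each file2 row with its line number, then sort both files on the id.
awk '{ print NR, $0 }' "$tmp/file2" | LC_ALL=C sort -k2,2 > "$tmp/f2.sorted"
LC_ALL=C sort -k2,2 "$tmp/file1" > "$tmp/f1.sorted"
# Join on field 2 of both files; emit: line number, rank, id, file2 value.
# Then sort on the line number and cut it off again.
result=$(LC_ALL=C join -1 2 -2 2 -o 2.1,1.1,2.2,2.3 "$tmp/f1.sorted" "$tmp/f2.sorted" |
  sort -n -k1,1 | cut -d' ' -f2-)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Setting `LC_ALL=C` for both `sort` and `join` keeps the two tools in agreement about collation order, which `join` relies on.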
What do you mean by "the real data does not have any headers"? Could you please post what your actual data looks like in this example? –