How can I assign the same ranks to a list of ids by comparing them with another data file in Linux?

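Problem description:

I have a list of ids (column 2) ranked from 1 to 600 according to their values (column 3). I have another list of the same ids, but ranked differently because their values are different. How can I attach the ranks of the first id list in file1 to the second list of ids in file2? For example: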

file1: 
    rank list-of-ids values 
    1 HOUSAM69708729 0.4468 
    2 HOCANM106363549 0.4434 
    3 HOCANM10845509 0.4268 
    4 HOCANM11098662 0.4203 
    5 HOUSAM68571374 0.3896 
    6 HOUSAM69990251 0.3895 
    7 HONLDM716072164 0.3893 
    8 HOUSAM69756113 0.3656 
    9 HOCANM11098658 0.3593 
    10 HOUSAM66626020 0.3538 

file2: 
list-of-ids values 
HOCANM106363549 0.4832 
HOUSAM69708729 0.4199 
HOCANM10845509 0.4143 
HOUSAM69990251 0.3887 
HOCANM11098662 0.3792 
HOUSAM69756113 0.365 
HOUSAM68571374 0.3649 
HONLDM716072164 0.3600 
HOUSAM66626020 0.3593 
HOCANM11098658 0.3545 

The output file should be file2 with the ranks taken from file1:

output: 
rank list-of-ids values 
2 HOCANM106363549 0.4832 
1 HOUSAM69708729 0.4199 
3 HOCANM10845509 0.4143 
6 HOUSAM69990251 0.3887 
4 HOCANM11098662 0.3792 
8 HOUSAM69756113 0.365 
5 HOUSAM68571374 0.3649 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
9 HOCANM11098658 0.3545 

Any suggestions, please? Note that the real data does not have any header, so the output should not have a header either.


What do you mean by "the real data does not have any header"? Could you please post what your actual data looks like for this example? –

AWK solution:

awk 'NR==FNR{ a[$2]=$1; next }{ print a[$1],$1,$2 }' file1 file2 
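  • NR==FNR - true only while the first input file (i.e. file1) is being processed

  • a[$2]=$1 - stores the rank (first field, $1) in array a, indexed by the list-of-ids value (second field, $2)

  • next - skips to the next record (of file1)

  • print a[$1],$1,$2 - for each record of the second input file (file2), prints its fields ($1, $2) preceded by the corresponding rank a[$1]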
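
The one-liner above prints an empty rank when an id in file2 never appeared in file1. A minimal sketch of one way to make that case visible follows. It assumes a placeholder such as "NA" is acceptable; the sample files, the `rank` array name, and the id HOXXXM00000000 are made up for illustration.

```shell
# Hypothetical sample files, created only so this sketch runs on its own.
tmpdir=$(mktemp -d)
printf '%s\n' '1 HOUSAM69708729 0.4468' \
              '2 HOCANM106363549 0.4434' \
              '3 HOCANM10845509 0.4268' > "$tmpdir/file1"
printf '%s\n' 'HOCANM106363549 0.4832' \
              'HOUSAM69708729 0.4199' \
              'HOXXXM00000000 0.1000' > "$tmpdir/file2"   # last id is not in file1

# Pass 1 (file1): remember the rank of every id.
# Pass 2 (file2): print rank, id, value; use "NA" when the id is unknown.
ranked=$(awk 'NR==FNR { rank[$2] = $1; next }
              { r = ($1 in rank) ? rank[$1] : "NA"; print r, $1, $2 }' \
             "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$ranked"
rm -rf "$tmpdir"
```

With these three-line samples the last line comes out as `NA HOXXXM00000000 0.1000`, while the other two lines keep the ranks 2 and 1 from file1. The `$1 in rank` test is used instead of reading `rank[$1]` directly because a plain lookup would silently create an empty entry.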


Output:

2 HOCANM106363549 0.4832 
1 HOUSAM69708729 0.4199 
3 HOCANM10845509 0.4143 
6 HOUSAM69990251 0.3887 
4 HOCANM11098662 0.3792 
8 HOUSAM69756113 0.365 
5 HOUSAM68571374 0.3649 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
9 HOCANM11098658 0.3545 

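My real data does not have any column names. How can I remove "rank" as a column name? I mean I should not have the first line (rank list-of-ids values) in the output. – zara

@zara, see my update – RomanPerekhrest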


Thank you. Could you explain your script? I would like to understand it. – zara

Another option, using 'join':

$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(sort -k 2 file1) <(sort -k 1 file2) 
2 HOCANM106363549 0.4832 
3 HOCANM10845509 0.4143 
9 HOCANM11098658 0.3545 
4 HOCANM11098662 0.3792 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
5 HOUSAM68571374 0.3649 
1 HOUSAM69708729 0.4199 
8 HOUSAM69756113 0.365                   
6 HOUSAM69990251 0.3887                   
ranks list-of-ids values 

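Admittedly, this does not handle the header very cleanly. You have already accepted a solution, but I like this tool and not many people know about it ;)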


Edit: if the source data does not have any header, the command works great:

$ cat file1 
    1 HOUSAM69708729 0.4468 
    2 HOCANM106363549 0.4434                 
    3 HOCANM10845509 0.4268                 
    4 HOCANM11098662 0.4203                 
    5 HOUSAM68571374 0.3896 
    6 HOUSAM69990251 0.3895 
    7 HONLDM716072164 0.3893 
    8 HOUSAM69756113 0.3656 
    9 HOCANM11098658 0.3593 
    10 HOUSAM66626020 0.3538 
$ cat file2 
HOCANM106363549 0.4832 
HOUSAM69708729 0.4199 
HOCANM10845509 0.4143 
HOUSAM69990251 0.3887 
HOCANM11098662 0.3792 
HOUSAM69756113 0.365 
HOUSAM68571374 0.3649 
HONLDM716072164 0.3600 
HOUSAM66626020 0.3593 
HOCANM11098658 0.3545 
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(sort -k 2 file1) <(sort -k 1 file2) 
2 HOCANM106363549 0.4832 
3 HOCANM10845509 0.4143 
9 HOCANM11098658 0.3545 
4 HOCANM11098662 0.3792 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
5 HOUSAM68571374 0.3649 
1 HOUSAM69708729 0.4199 
8 HOUSAM69756113 0.365 
6 HOUSAM69990251 0.3887 
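
Note that the join output above is ordered by id, because join needs sorted input, whereas the requested output keeps file2's order. A hedged sketch of one way to get back to it follows. It assumes file2 is ordered by descending value, as in the example, so a numeric reverse sort on the third field restores that order. It uses temporary files instead of process substitution so that it also runs under a plain POSIX sh; the four-line sample files are made up for illustration.

```shell
# Hypothetical four-line samples, created only so the sketch runs on its own.
tmpdir=$(mktemp -d)
printf '%s\n' '1 HOUSAM69708729 0.4468' \
              '2 HOCANM106363549 0.4434' \
              '3 HOCANM10845509 0.4268' \
              '4 HOCANM11098662 0.4203' > "$tmpdir/file1"
printf '%s\n' 'HOCANM106363549 0.4832' \
              'HOUSAM69708729 0.4199' \
              'HOCANM10845509 0.4143' \
              'HOCANM11098662 0.3792' > "$tmpdir/file2"

# join needs both inputs sorted on the join field, using the same collation.
LC_ALL=C sort -k 2,2 "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort -k 1,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# Join on the id, then re-sort numerically by value (field 3), largest first.
joined=$(LC_ALL=C join -1 2 -2 1 -o 1.1,2.1,2.2 \
             "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" |
         LC_ALL=C sort -k 3,3nr)
printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

If file2 is not sorted by value in the real data, this extra sort would not reproduce its order, and the awk approach, which reads file2 line by line, is the simpler choice.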

If either file does contain a header, you can just grep it out before the 'sort':

$ cat file1 
ranks list-of-ids values 
    1 HOUSAM69708729 0.4468 
    2 HOCANM106363549 0.4434 
    3 HOCANM10845509 0.4268 
    4 HOCANM11098662 0.4203 
    5 HOUSAM68571374 0.3896 
    6 HOUSAM69990251 0.3895 
    7 HONLDM716072164 0.3893 
    8 HOUSAM69756113 0.3656 
    9 HOCANM11098658 0.3593 
    10 HOUSAM66626020 0.3538 
$ cat file2 
list-of-ids values 
HOCANM106363549 0.4832 
HOUSAM69708729 0.4199 
HOCANM10845509 0.4143 
HOUSAM69990251 0.3887 
HOCANM11098662 0.3792 
HOUSAM69756113 0.365 
HOUSAM68571374 0.3649 
HONLDM716072164 0.3600 
HOUSAM66626020 0.3593 
HOCANM11098658 0.3545 
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(grep -v "list-of-ids" file1 | sort -k 2) <(grep -v "list-of-ids" file2 | sort -k 1) 
2 HOCANM106363549 0.4832 
3 HOCANM10845509 0.4143 
9 HOCANM11098658 0.3545 
4 HOCANM11098662 0.3792 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
5 HOUSAM68571374 0.3649 
1 HOUSAM69708729 0.4199 
8 HOUSAM69756113 0.365 
6 HOUSAM69990251 0.3887
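
The grep -v "list-of-ids" filter above would also drop any data line that happens to contain that string. A possible alternative, sketched here under the assumption that each file has exactly one header line and that it is the first line, is to skip it by position with tail -n +2. The shortened sample files are made up for illustration, and temporary files stand in for process substitution so the sketch does not depend on bash.

```shell
# Hypothetical samples with a one-line header each, created for this sketch only.
tmpdir=$(mktemp -d)
printf '%s\n' 'ranks list-of-ids values' \
              '1 HOUSAM69708729 0.4468' \
              '2 HOCANM106363549 0.4434' \
              '3 HOCANM10845509 0.4268' > "$tmpdir/file1"
printf '%s\n' 'list-of-ids values' \
              'HOCANM106363549 0.4832' \
              'HOUSAM69708729 0.4199' \
              'HOCANM10845509 0.4143' > "$tmpdir/file2"

# tail -n +2 starts output at line 2, i.e. it drops only the first line.
tail -n +2 "$tmpdir/file1" | LC_ALL=C sort -k 2,2 > "$tmpdir/file1.sorted"
tail -n +2 "$tmpdir/file2" | LC_ALL=C sort -k 1,1 > "$tmpdir/file2.sorted"

# Same join as before: rank from file1, id and value from file2.
noheader=$(LC_ALL=C join -1 2 -2 1 -o 1.1,2.1,2.2 \
               "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$noheader"
rm -rf "$tmpdir"
```

As with the other join examples, the result is ordered by id, not by value.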