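How do I read a Hadoop sequence file?

Question:

I have a sequence file that is the output of a Hadoop map-reduce job. The data in this file is written as key/value pairs, and each value is itself a map. I want to read the value as a Map object so that I can process it further. How do I read a Hadoop sequence file?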

    Configuration config = new Configuration();
    Path path = new Path("D:\\OSP\\sample_data\\data\\part-00000");
    SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
    WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
    Writable value = (Writable) reader.getValueClass().newInstance();
    long position = reader.getPosition();

    while(reader.next(key,value))
    {
      System.out.println("Key is: "+textKey +" value is: "+val+"\n");
    }

Output of the program: Key is: this is key] value is: {ABC=839177, XYZ=548498, LMN=2, PQR=1}

Here I get the value as a String, but I want it as a Map object.

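Where does "val" come from? Also, Map is not a 'Writable'; which class did you use in your M/R job?

I only have the sequence file and don't know what was done in the map-reduce job. I was given the following information: "Each such file needs to be opened as a sequence file, and a decompression codec is needed; the sequence file class seems able to tell you which compression codec to use. I then believe each key and each value is encoded using TypedBytes." – samarth

Then you have to find out the key and value classes, otherwise you won't deserialize them correctly.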

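Answer

Check SequenceFile.Reader#next(Writable, Writable). The loop below prints textKey and val, which are not the objects that next() fills in:

    while(reader.next(key,value))
    {
         System.out.println("Key is: "+textKey +" value is: "+val+"\n");
    }

It should be replaced with: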

while(reader.next(key,value)) 
{ 
     System.out.println("Key is: "+key +" value is: "+value+"\n"); 
}

Also use SequenceFile.Reader#getValueClassName to find out the value type stored in the SequenceFile. A SequenceFile records its key/value types in the file header.
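To make the point about the file header concrete, here is a rough sketch that reads the two class names straight from the first bytes of a file, using only the JDK. It assumes the commonly documented layout (the magic bytes "SEQ", one version byte, then the key and value class names as length-prefixed strings) and assumes each name is short enough for a single-byte length. The class and helper names are made up for illustration, the key class in the demo is a guess, and the supported way is still SequenceFile.Reader#getKeyClassName / #getValueClassName.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: peeks at a sequence file header.
class SequenceFileHeaderPeek {

    /** Reads one length-prefixed string; only single-byte lengths (0..127) are handled. */
    static String readShortString(DataInputStream in) throws IOException {
        int length = in.readByte();
        if (length < 0) {
            throw new IOException("multi-byte length prefix is not handled in this sketch");
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns {keyClassName, valueClassName} read from the start of the stream. */
    static String[] peekClassNames(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a sequence file: missing SEQ magic bytes");
        }
        in.readByte(); // format version, ignored here
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        return new String[] { keyClass, valueClass };
    }

    /** Builds an in-memory header with the assumed layout, for demonstration only. */
    static byte[] sampleHeader(String keyClass, String valueClass) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] { 'S', 'E', 'Q', 6 });
        for (String name : new String[] { keyClass, valueClass }) {
            byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length); // single-byte length, valid for names up to 127 bytes
            out.write(bytes);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The key class is an assumption; the value class is the one reported in this thread.
        byte[] header = sampleHeader("org.apache.hadoop.io.Text",
                "org.apache.hadoop.typedbytes.TypedBytesWritable");
        String[] names = peekClassNames(new ByteArrayInputStream(header));
        System.out.println("key class:   " + names[0]);
        System.out.println("value class: " + names[1]);
    }
}
```

Against a real file you would pass a FileInputStream for the part file instead of the in-memory sample. This is only meant to show where the type information lives, not to replace the reader API.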

Thank you. The value class is "TypedBytesWritable". Can I get a Map object from this class? – samarth

[TypedBytesWritable#getValue](http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/typedbytes/TypedBytesWritable.html#getValue%28%29) should return the object.

Hey, it worked for me. Thank you so much, Praveen. – samarth
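For anyone who lands here later: getValue() returns a plain Object, so it still has to be turned into a typed map before further processing. Below is a minimal sketch of that step using only the JDK. It assumes the decoded value is a java.util.Map with string-like keys and numeric values, which is what the printed output in the question suggests but is not confirmed anywhere in this thread. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts the Object returned by getValue() into a typed map.
class TypedValueToMap {

    /** Copies a decoded map into Map<String, Long>, rejecting anything with a different shape. */
    static Map<String, Long> toCountMap(Object decoded) {
        if (!(decoded instanceof Map)) {
            String type = (decoded == null) ? "null" : decoded.getClass().getName();
            throw new IllegalArgumentException("expected a Map but got " + type);
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) decoded).entrySet()) {
            Object rawValue = entry.getValue();
            if (!(rawValue instanceof Number)) {
                throw new IllegalArgumentException("non-numeric value for key " + entry.getKey());
            }
            result.put(String.valueOf(entry.getKey()), ((Number) rawValue).longValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the decoded value; in the real loop this would be
        // ((TypedBytesWritable) value).getValue() after reader.next(key, value).
        Map<Object, Object> decoded = new LinkedHashMap<>();
        decoded.put("ABC", 839177);
        decoded.put("XYZ", 548498);
        decoded.put("LMN", 2);
        decoded.put("PQR", 1);

        Map<String, Long> counts = toCountMap(decoded);
        long total = 0;
        for (long count : counts.values()) {
            total += count;
        }
        System.out.println("entries: " + counts.size() + ", total: " + total);
    }
}
```

Copying into a fresh map with explicit checks avoids an unchecked cast that would only fail later, when an entry of an unexpected type is first used.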
