HDFS Illustrated, and Stream-Based File Operations
HDFS: Hadoop Distributed File System
Introduction: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project.
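In short: HDFS is a distributed file system designed to run on ordinary commodity hardware. It resembles existing distributed file systems in many ways, but its differences from them are significant. It is highly fault-tolerant and built for low-cost machines, it offers high-throughput access to application data, and it suits applications that work with large data sets. HDFS relaxes some POSIX requirements to allow streaming access to file system data. It was originally developed as infrastructure for the Apache Nutch web search engine project and is part of the Apache Hadoop Core project.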
1. HDFS read and write process (illustrated)
- Reading and writing HDFS files through streams
1) Reading data
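// Assumes fs is an already-initialized org.apache.hadoop.fs.FileSystem field.
public void downloadFile() throws IOException {
    // 1. Open an input stream on the HDFS file
    FSDataInputStream input = fs.open(new Path("/hadoop修炼笔记"));
    // 2. Open an output stream on the local file. The target must be a file path,
    //    not a bare directory such as "e:/"; the file name used here is an assumption.
    FileOutputStream output = new FileOutputStream(new File("e:/hadoop修炼笔记"));
    // 3. Copy the bytes with the Hadoop IOUtils helper, using a 4096-byte buffer
    IOUtils.copyBytes(input, output, 4096);
    // 4. Flush the output stream so buffered data is written to disk
    output.flush();
    // 5. Close both streams
    output.close();
    input.close();
}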
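To make step 3 more concrete, here is a minimal sketch of the kind of buffered copy loop that a helper such as IOUtils.copyBytes performs. It uses only java.io so it can run without a Hadoop cluster; the class StreamCopyDemo and the method copy are hypothetical names for illustration, not Hadoop APIs, and the real helper may differ in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not part of Hadoop.
class StreamCopyDemo {

    // Copies everything from in to out through a fixed-size buffer and
    // returns the number of bytes copied. Neither stream is closed here.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the input stream is exhausted
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the HDFS and local file streams
        byte[] data = "hadoop notes".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4096);
        System.out.println(copied + " bytes copied");
    }
}
```

Because the loop works on plain InputStream and OutputStream, the same pattern applies whether the source is an HDFS stream or a local file stream. The buffer size only affects how many bytes move per iteration, not the result.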
2) Writing data
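// Assumes fs is an already-initialized org.apache.hadoop.fs.FileSystem field.
public void uploadFile() throws IOException {
    // 1. Open an output stream on the HDFS target path
    FSDataOutputStream output = fs.create(new Path("/aaa/"));
    // 2. Open an input stream on the local file
    FileInputStream input = new FileInputStream("e:/hadoop修炼笔记");
    // 3. Copy the bytes with the Hadoop IOUtils helper, using a 4096-byte buffer
    IOUtils.copyBytes(input, output, 4096);
    // 4. Flush the output stream so buffered data is pushed out
    output.flush();
    // 5. Close both streams
    output.close();
    input.close();
}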
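Both methods above close their streams by hand at the end, so an exception thrown during the copy would skip the close calls. One common way to guard against that is Java's try-with-resources. The sketch below shows the idea with plain java.io streams; SafeCloseDemo, copyAndClose, TrackingOutput and FailingInput are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only; not part of Hadoop.
class SafeCloseDemo {

    // Output stream that records whether close() was called.
    static class TrackingOutput extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Input stream that always fails, to simulate an error in mid-copy.
    static class FailingInput extends InputStream {
        @Override
        public int read() throws IOException {
            throw new IOException("simulated read failure");
        }
    }

    // Copies source to target; try-with-resources closes both streams
    // whether the copy succeeds or throws.
    static void copyAndClose(InputStream source, OutputStream target) throws IOException {
        try (InputStream in = source; OutputStream out = target) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            out.flush();
        }
    }

    public static void main(String[] args) {
        TrackingOutput out = new TrackingOutput();
        try {
            copyAndClose(new FailingInput(), out);
        } catch (IOException e) {
            System.out.println("copy failed: " + e.getMessage());
        }
        System.out.println("output closed: " + out.closed);
    }
}
```

FSDataInputStream and FSDataOutputStream are ordinary Java streams and therefore Closeable, so the same try-with-resources form should carry over to the HDFS methods, assuming the surrounding code does not need the streams after the copy.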