Error reading a large file: "OutOfMemoryError" (Java)

Problem description:

Sorry for my English. I want to read a large file, but an OutOfMemoryError occurs while it is being read. I don't understand how memory is handled in the application. The following code doesn't work:

// try-with-resources closes the reader; the enclosing method must handle IOException
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {

    // Starts at 1000 chars and keeps growing (and copying) as data is appended
    StringBuilder fileData = new StringBuilder(1000);

    char[] buf = new char[8192];
    int bytesread = 0,
        bytesBuffered = 0;

    while ((bytesread = reader.read(buf)) > -1) {

        String readData = String.valueOf(buf, 0, bytesread);
        bytesBuffered += bytesread;

        fileData.append(readData); // OutOfMemoryError is thrown here

        if (bytesBuffered > 1024 * 1024) {
            bytesBuffered = 0;
        }
    }

    System.out.println(fileData.toString().toCharArray());
}

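What is the highest Java version you can use? Reading a file this way is quite outdated, unless you are stuck on Java 6 because of Android or some other reason. Otherwise, you should use the Java 8 Stream API. – Bevor 2015-02-07 14:40:46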

I use 1.7.0_71. I need to read the large file without readLine(), because the file (5 GB) may consist of a single line. – qazqwerty 2015-02-07 14:57:18

Answer

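You need to preallocate a large buffer so that the StringBuilder does not have to repeatedly grow and copy its internal array.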

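File file = ...;
// File has no size() method; length() returns the size in bytes as a long.
// The cast to int only works for files smaller than 2 GB (Integer.MAX_VALUE).
StringBuilder fileData = new StringBuilder((int) file.length());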

and run the JVM with a large maximum heap size:

java -Xmx2G
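A minimal sketch of this preallocate-and-append approach, assuming the whole file really does fit in the heap. `File.length()` is a `long` byte count while `StringBuilder` capacity is an `int`, so this only works for files below roughly 2 GB; for the 5-10 GB files discussed in this thread it cannot work whatever `-Xmx` is set to. The class and method names (`WholeFileLoader`, `readAll`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class WholeFileLoader {

    /** Reads a whole text file into memory; only sensible for files well under 2 GB. */
    public static String readAll(File file, Charset charset) throws IOException {
        long bytes = file.length();
        // StringBuilder capacity is an int; keep a small margin because some JVMs
        // cannot allocate arrays of exactly Integer.MAX_VALUE elements.
        if (bytes > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for one StringBuilder: " + bytes + " bytes");
        }
        // Decoding never yields more chars than input bytes for common charsets
        // such as UTF-8, so this capacity avoids reallocation while appending.
        StringBuilder fileData = new StringBuilder((int) bytes);
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                fileData.append(buf, 0, n);
            }
        }
        return fileData.toString();
    }
}
```

Note that `toString()` makes a second copy of the text, so peak memory is roughly twice the size of the decoded text.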

==== Update

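The while loop reuses its buffer and does not need much memory to run. Treat the input as a stream and match the search string against it as the characters arrive; this is a very simple state machine. If you need to search for several words, look for a trie-based implementation (such as Aho-Corasick) that supports streaming input.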

// the match state model 
...xxxxxxabxxxxxaxxxxxabcdexxxx... 
     ab  a  abcd 

    File file = new File("path_to_your_file");
    String yourSearchWord = "abcd";
    int matchIndex = 0;   // how many chars of yourSearchWord are matched so far
    long position = 0;    // offset (in chars) of the current char in the file
    // The enclosing method must handle IOException
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        int chr;
        while ((chr = reader.read()) != -1) {
            if (chr == yourSearchWord.charAt(matchIndex)) {
                matchIndex++;
            } else {
                // Mismatch: start over, but the current char may itself begin a new match.
                // This simple restart can miss words that repeat their own prefix (e.g. "aab").
                matchIndex = (chr == yourSearchWord.charAt(0)) ? 1 : 0;
            }
            if (matchIndex == yourSearchWord.length()) {
                // match!! print the offset where the word starts
                System.out.println("match at: " + (position - yourSearchWord.length() + 1));
                matchIndex = 0;
            }
            position++;
        }
    }
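The loop above restarts from scratch after a mismatch, which can miss matches when the search word repeats its own prefix (for example "aab" inside "aaab"). A sketch of the same streaming state machine made exact with the Knuth-Morris-Pratt failure table (a swapped-in technique, not part of the original answer); the class name `KmpStreamSearch` is hypothetical, and offsets are char offsets within the `Reader`, not byte offsets.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class KmpStreamSearch {

    /** failure[i] = length of the longest proper prefix of word[0..i] that is also its suffix. */
    private static int[] failureTable(String word) {
        int[] failure = new int[word.length()];
        int k = 0;
        for (int i = 1; i < word.length(); i++) {
            while (k > 0 && word.charAt(i) != word.charAt(k)) {
                k = failure[k - 1];
            }
            if (word.charAt(i) == word.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    /** Streams chars from reader and returns the start offset of every (possibly overlapping) match. */
    public static List<Long> search(Reader reader, String word) throws IOException {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("empty search word");
        }
        int[] failure = failureTable(word);
        List<Long> matches = new ArrayList<>();
        int matched = 0;   // state: number of chars of word currently matched
        long position = 0; // offset of the current char in the stream
        int chr;
        while ((chr = reader.read()) != -1) {
            // On mismatch, fall back to the longest prefix that is still a valid partial match
            while (matched > 0 && chr != word.charAt(matched)) {
                matched = failure[matched - 1];
            }
            if (chr == word.charAt(matched)) {
                matched++;
            }
            if (matched == word.length()) {
                matches.add(position - word.length() + 1);
                matched = failure[matched - 1]; // keep going to find overlapping matches
            }
            position++;
        }
        return matches;
    }
}
```

Memory use is constant apart from the result list; wrap the `FileReader` in a `BufferedReader` as above so that the per-char `read()` calls stay cheap.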

Thanks for the reply. Is it OK to use 'StringBuilder fileData = new StringBuilder(file.size());' for, say, a 10 GB file? – qazqwerty 2015-02-07 14:13:28

Can you describe what you do with the 10 GB file? The approach may differ depending on what you need. – javamonk 2015-02-07 14:17:38

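I need to search a large file (5-10 GB) for the strings that contain a given word. I don't know how to do it; maybe by reading it character by character, or in several chunks. It would be great to find an example. – qazqwerty 2015-02-07 14:21:41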
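The "read it in chunks" idea can be sketched as follows, assuming a plain substring search is enough. Each chunk is searched with `String.indexOf`, and the last `word.length() - 1` chars are carried into the next window so that a match split across two chunks is still found exactly once. The class name `ChunkedSearch` and the `chunkSize` parameter are hypothetical; offsets are char offsets within the `Reader`, whose charset the caller chooses.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class ChunkedSearch {

    /** Returns the start offset of every match of word; memory is O(chunkSize + word length). */
    public static List<Long> search(Reader reader, String word, int chunkSize) throws IOException {
        if (word.isEmpty() || chunkSize <= 0) {
            throw new IllegalArgumentException("word must be non-empty and chunkSize positive");
        }
        List<Long> matches = new ArrayList<>();
        char[] buf = new char[chunkSize];
        String carry = "";    // tail of the previous window that could start a match
        long windowStart = 0; // stream offset of the first char of the current window
        int n;
        while ((n = reader.read(buf)) != -1) {
            String window = carry + new String(buf, 0, n);
            int from = 0;
            int idx;
            while ((idx = window.indexOf(word, from)) != -1) {
                matches.add(windowStart + idx);
                from = idx + 1; // also report overlapping matches
            }
            // A carried tail shorter than word cannot contain a complete match,
            // so nothing is reported twice.
            int keep = Math.min(word.length() - 1, window.length());
            carry = window.substring(window.length() - keep);
            windowStart += window.length() - keep;
        }
        return matches;
    }
}
```

For a file, one would pass something like `new BufferedReader(new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))` with a chunk size of around 1 MB; this never needs more than one chunk in memory, even when the whole file is a single line.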

Answer

Try this. It may help:

try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    int c;
    // read() returns a single char as an int, or -1 at end of stream
    while ((c = reader.read()) != -1) {
        System.out.print((char) c);
    }
} catch (IOException e) {
    System.out.println("Error : " + e.getMessage());
}

Thanks for the answer. There is an error at 'txt = reader) != null': Unresolved compilation problem: Type mismatch: cannot convert from BufferedReader to String. – qazqwerty 2015-02-07 14:09:59

@qazqwerty Sorry, my bad... it's 'reader.read()'. See the modified code. – khandelwaldeval 2015-02-07 14:13:41

@qazqwerty If it helps you, don't forget to accept (tick) the answer. – khandelwaldeval 2015-02-07 14:14:51

Answer

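You shouldn't hold such a large file in memory because, as you've seen, you run out of it. Since you're on Java 7, you have to read the file as a stream yourself and check the content on the fly; on Java 8 you could use the Stream API instead. This is just an example. It works, but keep in mind that it reads bytes rather than characters, so the reported position is a byte offset and can differ from the character position depending on the encoding. This is not production code: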

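import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FileReader
{
    private static String wordToFind = "SEARCHED_WORD";
    private static File file = new File("YOUR_FILE");
    private static int currentMatchingPosition;
    // long, not int: an int counter overflows after 2 GB, well short of a 5-10 GB file
    private static long foundAtPosition = -1;
    private static long charsRead;

    public static void main(String[] args)
    {
        // BufferedInputStream avoids one system call per byte on a multi-GB file
        try (InputStream fis = new BufferedInputStream(new FileInputStream(file)))
        {
            // available() is only an int estimate and cannot represent a 5 GB size
            System.out.println("Total size to read (in bytes) : " + file.length());

            int c;
            while ((c = fis.read()) != -1)
            {
                charsRead++;
                checkContent(c);
            }

            // A prefix that was still being matched at end of file must not count as a hit
            if (currentMatchingPosition == wordToFind.length())
            {
                System.out.println("Found word at position: " + (foundAtPosition - wordToFind.length()));
            }
            else
            {
                System.out.println("Didn't find the word!");
            }

        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    // Compares raw bytes with chars, so it only works for ASCII search words.
    // The restart on mismatch is simple: words that repeat their own prefix can be missed.
    private static void checkContent(int c)
    {
        if (currentMatchingPosition >= wordToFind.length())
        {
            //already found....
            return;
        }

        if (wordToFind.charAt(currentMatchingPosition) == (char) c)
        {
            foundAtPosition = charsRead;
            currentMatchingPosition++;
        }
        else if (wordToFind.charAt(0) == (char) c)
        {
            // The mismatching byte may itself start a new match
            currentMatchingPosition = 1;
            foundAtPosition = charsRead;
        }
        else
        {
            currentMatchingPosition = 0;
            foundAtPosition = -1;
        }
    }
}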
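Since both this answer and the earlier comment mention the Java 8 Stream API, here is a sketch of what that looks like, assuming Java 8+ and a UTF-8 file; the class and method names (`LineGrep`, `linesContaining`) are hypothetical. It only helps when the file actually has line breaks: for the single-line 5 GB file described in the question it would still run out of memory, and a streaming matcher such as the one above is needed instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LineGrep {

    /** Returns every line of the file that contains word (Java 8+). */
    public static List<String> linesContaining(Path path, String word) throws IOException {
        // Files.lines reads lazily: lines are decoded one at a time as the stream is consumed.
        // A single huge line, however, is still built as one String and can exhaust the heap.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(word))
                        .collect(Collectors.toList());
        }
    }
}
```

Collecting the matching lines into a list is fine when hits are rare; if many lines match, processing them with `forEach` inside the same try block avoids holding them all at once.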

