
How to Read a Large File Efficiently In Java

1. Overview


In this tutorial, you'll learn how to read a large file in Java quickly and efficiently.

Before we get to the topic: every developer works with some technology such as Java, Python, C++ or Swift, building either web applications or mobile applications. User data has to be saved somewhere, whether in files, in databases, in memory or as images. The challenging part is doing this with performance in mind, and that is what this tutorial covers for reading files in Java.

This is also a popular interview question for Java developers at all levels. We will see the good and bad ways to read large files.


2. Reading a File In Memory


Java 7 introduced the java.nio.file package for file operations. Its Files class has a readAllLines() method that reads every line of a file into a list held in memory. (Files.lines() by itself returns a lazily populated stream, but collecting that stream into a list has the same effect.) For a large file this consumes a lot of memory and can kill the application.

If we process a 3GB file this way, the whole file occupies the heap once it is loaded. Eventually the application fails with an OutOfMemoryError and stops functioning properly. At that point we have to free up application memory or restart the application.
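
To make the memory concern concrete, here is a minimal sketch that compares a file's size with the maximum heap before deciding how to read it. The class name HeapCheck, the helper fitsInHeap and the safety factor of 4 are assumptions made for illustration; they are not part of any library, and the factor is not a measured value.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HeapCheck {

    // Assumed safety factor: decoded text plus per-line object overhead can
    // take several times the on-disk size. 4 is an illustrative guess.
    static final long SAFETY_FACTOR = 4;

    // Hypothetical helper: true when the file is small relative to the heap.
    static boolean fitsInHeap(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes < maxHeapBytes / SAFETY_FACTOR;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("files", "address.json");
        // maxMemory() reports the most heap the JVM will attempt to use
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println(fitsInHeap(Files.size(file), maxHeap)
                ? "small enough to load in memory"
                : "read it line by line instead");
    }
}
```

A check like this is only a rough guard. The heap is shared with the rest of the application, so a file that passes the check can still cause trouble under load.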

Example reading in memory:


The example below loads the address.json file into memory.
If you have not come across JSON before: JSON (JavaScript Object Notation) is a text format that stores data as key-value pairs.

address.json:

{ "name"   : "John Smith",
  "sku"    : "20223",
  "price"  : 23.95,
  "shipTo" : { "name" : "Jane Smith",
               "address" : "123 Maple Street",
               "city" : "Pretendville",
               "state" : "NY",
               "zip"   : "12345" },
  "billTo" : { "name" : "John Smith",
               "address" : "123 Maple Street",
               "city" : "Pretendville",
               "state" : "NY",
               "zip"   : "12345" }
}

Program:

package com.java.w3schools.blog.java12.files;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadInMemory {

 public static void main(String[] args) throws IOException {
  // readAllLines() loads every line of the file into a List held in memory
  List<String> fileContent = Files.readAllLines(Paths.get("files", "address.json"));
  for (String line : fileContent) {
   System.out.println(line);
  }
 }
}

Output:

{ "name"   : "John Smith",
  "sku"    : "20223",
  "price"  : 23.95,
  "shipTo" : { "name" : "Jane Smith",
               "address" : "123 Maple Street",
               "city" : "Pretendville",
               "state" : "NY",
               "zip"   : "12345" },
  "billTo" : { "name" : "John Smith",
               "address" : "123 Maple Street",
               "city" : "Pretendville",
               "state" : "NY",
               "zip"   : "12345" }
}

Here the file is small. If the file is several gigabytes, failures and performance problems become much more likely.

So this is not a recommended way to read large files.

3. Apache Commons and Guava API (In-Memory)


As the previous section showed, reading the whole file into memory does not work well for large files. Apache Commons IO and Guava have similar methods:

// Guava: com.google.common.io.Files
Files.readLines(new File(path), Charsets.UTF_8);

// Apache Commons IO: org.apache.commons.io.FileUtils
FileUtils.readLines(new File(path), StandardCharsets.UTF_8);

Avoid these methods on large files, because they load all the data into memory at once.

4. Reading file line by line


In this approach, you read only one line at a time. The lines are retrieved sequentially, one after another.

package com.java.w3schools.blog.files;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

public class ScannerExample {

 public static void main(String[] args) throws IOException {

  // try-with-resources closes the scanner first, then the stream
  try (FileInputStream inputStream = new FileInputStream("files/address.json");
    Scanner scanner = new Scanner(inputStream, "UTF-8")) {
   while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    System.out.println(line.toUpperCase());
   }
   // Scanner swallows IOExceptions, so rethrow any that occurred
   if (scanner.ioException() != null) {
    throw scanner.ioException();
   }
  }
 }

}

The loop iterates over every line in the file, so each line can be processed and then discarded without keeping a reference to it in memory.
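
Scanner is not the only JDK class that can read line by line. As a minimal sketch, assuming Java 8 or later, the same loop can be written with a BufferedReader obtained from Files.newBufferedReader(). The class name BufferedReaderExample and the helper countLines are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BufferedReaderExample {

    // Reads the file one line at a time and returns how many lines were seen.
    static long countLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            // readLine() returns null at the end of the file
            while (reader.readLine() != null) {
                // real per-line processing would go here
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get("files", "address.json")));
    }
}
```

Only the current line and the reader's internal buffer are held in memory at any time, which is what keeps this approach suitable for large files.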

5. Reading Efficiently with Apache Commons IO


The same approach is available in the Apache Commons IO library, which provides a custom line iterator.

Add the dependency below to the pom.xml file.


<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>


Code:

package com.java.w3schools.blog.files;

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class ApacheCommonsCustomIterator {

 public static void main(String[] args) throws IOException {
  // LineIterator is Closeable since Commons IO 2.6, so try-with-resources closes it
  try (LineIterator it = FileUtils.lineIterator(new File("files/address.json"), "UTF-8")) {
   while (it.hasNext()) {
    String line = it.nextLine();
    System.out.println(line.toLowerCase());
   }
  }
 }

}

In this approach and the previous one, the entire file is never loaded into memory, so memory is used efficiently.
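
The JDK alone can stream lines in a similar way. As a minimal sketch, assuming Java 8 or later: Files.lines() returns a lazily populated stream, so when the stream is consumed without being collected into a list, lines are read as it is traversed. The class name LazyLinesExample and the helper longestLine are made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class LazyLinesExample {

    // Returns the length of the longest line without collecting the lines.
    static int longestLine(Path file) throws IOException {
        // The stream holds an open file, so close it with try-with-resources
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.mapToInt(String::length).max().orElse(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(longestLine(Paths.get("files", "address.json")));
    }
}
```

The memory benefit is lost as soon as the stream is collected into a list, so keep the processing inside the stream pipeline.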

6. Split the File and Process in Parallel


Reading the file line by line is memory efficient, but it can take a long time. You should consider time as well: for high-traffic websites, processing time is critical to the business.

One option is to divide the file into chunks, similar to how Hadoop splits files when storing them in HDFS. This tutorial is about reading files effectively, so we will not go deep into Hadoop and HDFS. Just remember that Hadoop splits the file and runs the same logic on each split. Finally, the outputs of all splits are aggregated and the same logic is run on the aggregated result.

Example code to divide the file into parts of a given size in MB:


Constants declaration:

private static final String dir = "/tmp/";
private static final String suffix = ".splitPart";

Split files logic:

/**
 * Split a file into multiple files.
 *
 * @param fileName   Name of file to be split.
 * @param mBperSplit maximum number of MB per file.
 * @return paths of the part files that were written.
 * @throws IOException
 */
public static List<Path> splitFile(final String fileName, final int mBperSplit) throws IOException {

    if (mBperSplit <= 0) {
        throw new IllegalArgumentException("mBperSplit must be more than zero");
    }

    List<Path> partFiles = new ArrayList<>();
    final long sourceSize = Files.size(Paths.get(fileName));
    final long bytesPerSplit = 1024L * 1024L * mBperSplit;
    final long numSplits = sourceSize / bytesPerSplit;
    final long remainingBytes = sourceSize % bytesPerSplit;
    int position = 0;

    try (RandomAccessFile sourceFile = new RandomAccessFile(fileName, "r");
         FileChannel sourceChannel = sourceFile.getChannel()) {

        for (; position < numSplits; position++) {
            //write multipart files.
            writePartToFile(bytesPerSplit, position * bytesPerSplit, sourceChannel, partFiles);
        }

        if (remainingBytes > 0) {
            writePartToFile(remainingBytes, position * bytesPerSplit, sourceChannel, partFiles);
        }
    }
    return partFiles;
}

Write files example:

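private static void writePartToFile(long byteSize, long position, FileChannel sourceChannel, List<Path> partFiles) throws IOException {
    Path fileName = Paths.get(dir + UUID.randomUUID() + suffix);
    try (RandomAccessFile toFile = new RandomAccessFile(fileName.toFile(), "rw");
         FileChannel toChannel = toFile.getChannel()) {
        sourceChannel.position(position);
        // transferFrom may copy fewer bytes than requested, so loop until the part is complete
        long written = 0;
        while (written < byteSize) {
            long copied = toChannel.transferFrom(sourceChannel, written, byteSize - written);
            if (copied <= 0) {
                break;
            }
            written += copied;
        }
    }
    partFiles.add(fileName);
}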


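The code above divides the file into parts. Once the file is split, run the line-by-line reading logic on each part in parallel, which reduces the total processing time. Note that the split is made at byte offsets, so a line that crosses a boundary ends up divided between two parts.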
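
The parallel step is described above but not shown. The following is a minimal sketch of one way to do it, assuming the part files already exist and that the per-part work is simply counting lines. The class name ParallelPartReader, both method names and the fixed thread pool are choices made for this illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

class ParallelPartReader {

    // Per-part work: here, simply count the lines of one part file.
    static long countLines(Path part) {
        try (Stream<String> lines = Files.lines(part, StandardCharsets.UTF_8)) {
            return lines.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Submits one task per part to a fixed pool and sums the partial results.
    static long countLinesInParallel(List<Path> parts, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Path part : parts) {
                Callable<Long> task = () -> countLines(part);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until that part is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the splitFile method above, the call would look like countLinesInParallel(splitFile(fileName, 100), 4), where the part size of 100 MB and the 4 threads are arbitrary values. Because the parts are cut at byte offsets, a line that straddles a boundary is counted once in each part, and a multi-byte UTF-8 character cut at a boundary can make decoding fail, so real processing logic needs to account for part boundaries.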

7. Conclusion


In this article, we've seen how to read a file effectively.

Covered areas

How to load the entire file into memory.
The drawback of reading the whole file into memory.
How to read line by line using the traditional Java API.
Reading line by line using the Apache Commons IO API (recommended for big files when processing time is not important).
The drawbacks of reading line by line.
Splitting the file to process it faster.

If you have any questions, please leave a comment.



Author: Venkatesh - I love to learn and share the technical stuff.