How to read and write CSV files using Apache Commons CSV

Comma-Separated Values (CSV) is a popular file format for storing tabular data such as spreadsheets and databases in plain text. It uses a delimiter, such as a comma, to separate the values. Every line of the file is a data record. Every record consists of one or more fields, separated by commas.
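To make the record and field structure concrete, here is a minimal plain-Java sketch that splits one line on commas. The class and method names are hypothetical and not part of Apache Commons CSV. It deliberately ignores quoting, so a quoted field containing a comma is split incorrectly. Handling such cases is one reason to use a library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache Commons CSV.
class NaiveCsvSplit {

    // Splits one record into fields on every comma.
    // The -1 limit keeps trailing empty fields, e.g. "a,b," yields three fields.
    static List<String> splitRecord(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        // A simple record: four fields
        List<String> fields = splitRecord("1,Atta Shah,atta@example.com,PK");
        System.out.println(fields.size() + " fields: " + fields);

        // A quoted field with an embedded comma is wrongly split into two parts
        List<String> broken = splitRecord("5,\"Lee, Jovan\",jovan@example.com,FR");
        System.out.println(broken.size() + " fields: " + broken);
    }
}
```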

In this tutorial, you will learn how to read and write CSV files in Java using Apache Commons CSV.

Dependencies

You need to add the commons-csv dependency to your project. If you are using Gradle, add the following dependency to your build.gradle file:

implementation 'org.apache.commons:commons-csv:1.7'

For a Maven project, add the following to your pom.xml file:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.7</version>
</dependency>

Reading CSV Files

The Apache Commons CSV library provides several ways to read CSV files in different formats. If you are reading an Excel CSV file, it is likely to have a header.

However, if you use a CSV file as a simple text file to transfer the data from one server to another, the file may not include the header. The Apache Commons CSV library works in both cases.

Let us create two sample CSV files: one with a header and another without a header. We will read and parse these files in our examples. Each file contains user information: ID, name, email address, and country code.

users.csv

1,Atta Shah,atta@example.com,PK
2,Alex Jones,alex@example.com,DE
3,Jovan Lee,jovan@example.com,FR
4,Greg Hover,greg@example.com,US

users-with-header.csv

ID,Name,Email,Country
1,Atta Shah,atta@example.com,PK
2,Alex Jones,alex@example.com,DE
3,Jovan Lee,jovan@example.com,FR
4,Greg Hover,greg@example.com,US

Let us start with the first file that does not contain a header. There are three ways to read this file: by column index, with a manually defined header, and with an enum-based header. Each is explained below.

Reading a CSV file using column index

The simplest way to read a file through Apache Commons CSV is by using the column index to access the value of a record:

// imports used by the reading examples
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

// try-with-resources closes the reader even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users.csv"))) {

    // read csv file
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get(0));
        System.out.println("Name: " + record.get(1));
        System.out.println("Email: " + record.get(2));
        System.out.println("Country: " + record.get(3));
    }

} catch (IOException ex) {
    ex.printStackTrace();
}

The above code is largely self-explanatory. We create a BufferedReader and pass it to the parse() method of the predefined CSVFormat.DEFAULT format.

The CSVFormat class provides some commonly used CSV variants:

DEFAULT — Standard CSV format, similar to RFC4180 but allowing empty lines.
EXCEL — The Microsoft Excel CSV format.
MYSQL — The MySQL CSV format.
ORACLE — Default Oracle format used by the SQL Loader utility.
POSTGRESQL_CSV — Default PostgreSQL CSV format used by the COPY operation.
POSTGRESQL_TEXT — Default PostgreSQL text format used by the COPY operation.
RFC4180 — The format defined by RFC 4180.
TDF — A tab-delimited format.

The parse() method returns an instance of CSVParser that we can use to iterate over all the records using a loop. It reads and parses one record at a time from the CSV file. The getRecordNumber() method returns the number assigned to the record in the CSV file.

Alternatively, you can also use the getRecords() method from the CSVParser class to read all the records at once into memory:

// read all records into memory
List<CSVRecord> records = CSVFormat.DEFAULT.parse(reader).getRecords();

However, this approach is not suitable for very large CSV files. The getRecords() method loads every record into memory at once, which can exhaust the heap or severely degrade performance.
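The trade-off between reading everything at once and processing one record at a time is not specific to the library. As a rough plain-Java sketch (hypothetical class name, naive comma splitting without quote handling), the following counts matching records while holding only the current line in memory:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of Apache Commons CSV.
class StreamingCountSketch {

    // Counts records whose last field equals the given country code.
    // Only the current line is held in memory, never the whole file.
    static long countByCountry(Reader source, String country) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",", -1);
                if (fields[fields.length - 1].equals(country)) {
                    count++;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "1,Atta Shah,atta@example.com,PK\n"
                + "2,Alex Jones,alex@example.com,DE\n";
        System.out.println(countByCountry(new StringReader(data), "DE"));
    }
}
```

Iterating over the parser in a for loop, as in the first example, follows the same record-at-a-time pattern.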

Reading a CSV file using a manually defined header

Column indexes may not be the most intuitive way to access the record values for some people. For this purpose, it is possible to manually assign names to each column in the file and then retrieve the values using the assigned names.

Here is an example that manually defines a header and gets the values using the header names:

// try-with-resources closes the reader even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users.csv"))) {

    // read csv file, assigning a name to each column
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.withHeader("ID", "Name", "Email", "Country").parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get("ID"));
        System.out.println("Name: " + record.get("Name"));
        System.out.println("Email: " + record.get("Email"));
        System.out.println("Country: " + record.get("Country"));
    }

} catch (IOException ex) {
    ex.printStackTrace();
}

Note that the column values are still accessible using their index.
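One way to picture why both access styles keep working is to think of a header as a mapping from column name to column position. The following plain-Java sketch illustrates that idea with hypothetical class and method names. It is not the library's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache Commons CSV.
class HeaderIndexSketch {

    // Maps each header name to its zero-based column position.
    static Map<String, Integer> indexByName(String... header) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            index.put(header[i], i);
        }
        return index;
    }

    // Looks up a value by column name through the name-to-position map.
    static String get(String[] record, Map<String, Integer> index, String name) {
        Integer position = index.get(name);
        if (position == null) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return record[position];
    }

    public static void main(String[] args) {
        Map<String, Integer> index = indexByName("ID", "Name", "Email", "Country");
        String[] record = {"1", "Atta Shah", "atta@example.com", "PK"};

        // Access by name and by position return the same value
        System.out.println(get(record, index, "Email"));
        System.out.println(record[2]);
    }
}
```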

Reading a CSV file using an enum to define a header

Referencing column names with string literals throughout the code is error-prone, because a typo is only discovered at runtime. To avoid this, you can use an enum to specify the header names and then use the enum constants to access the values.

Let us define an enum first for the header names:

Headers.java

public enum Headers {
    ID,
    NAME,
    EMAIL,
    COUNTRY;
}

Here is an example that uses the above enum to specify the header names for the CSV file:

// try-with-resources closes the reader even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users.csv"))) {

    // read csv file, taking the header names from the enum constants
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.withHeader(Headers.class).parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get(Headers.ID));
        System.out.println("Name: " + record.get(Headers.NAME));
        System.out.println("Email: " + record.get(Headers.EMAIL));
        System.out.println("Country: " + record.get(Headers.COUNTRY));
    }

} catch (IOException ex) {
    ex.printStackTrace();
}

Again, the column values remain accessible by index and by string name (for example, "ID").

Reading a CSV file with header auto detection

Some CSV files, such as those exported from Excel, define the header names in their first record. When asked, the Apache Commons CSV library can detect the header names from that first record automatically.

Let us read the second sample file (users-with-header.csv) that defines a header through the header auto-detection method:

// try-with-resources closes the reader even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users-with-header.csv"))) {

    // read csv file, using the first record as the header
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
            .withIgnoreHeaderCase()
            .withTrim()
            .parse(reader);

    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get("ID"));
        System.out.println("Name: " + record.get("Name"));
        System.out.println("Email: " + record.get("Email"));
        System.out.println("Country: " + record.get("Country"));
    }

} catch (IOException ex) {
    ex.printStackTrace();
}

Apache Commons CSV uses the values from the first record as header names and skips the first record when iterating.

We have also specified two additional options: withIgnoreHeaderCase() and withTrim(). The withIgnoreHeaderCase() option makes header name lookups case-insensitive, and the withTrim() option trims leading and trailing whitespace from the column values.
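To illustrate what these two options amount to, here is a plain-Java sketch with hypothetical class and method names. It mimics the behavior with a case-insensitive map and String.trim(), and it should not be read as the library's actual implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Apache Commons CSV.
class HeaderOptionsSketch {

    // Builds a name-to-position map whose lookups ignore letter case.
    static Map<String, Integer> caseInsensitiveIndex(String... header) {
        Map<String, Integer> index = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i < header.length; i++) {
            index.put(header[i], i);
        }
        return index;
    }

    // Returns the named value with leading and trailing whitespace removed.
    static String get(String[] record, Map<String, Integer> index, String name) {
        Integer position = index.get(name);
        if (position == null) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return record[position].trim();
    }

    public static void main(String[] args) {
        Map<String, Integer> index = caseInsensitiveIndex("ID", "Name", "Email", "Country");
        String[] record = {"2", "  Alex Jones ", "alex@example.com", "DE"};

        // "name" matches the "Name" column, and the padding is removed
        System.out.println("[" + get(record, index, "name") + "]");
    }
}
```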

Writing CSV Files

The Apache Commons CSV library is commonly used for reading data from CSV files. But it can also be used to generate CSV files.

Let us create a simple CSV file using Apache Commons CSV:

// imports used by the writing examples
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

// try-with-resources flushes and closes the printer and the writer, even on failure
try (Writer writer = Files.newBufferedWriter(Paths.get("students.csv"));
     CSVPrinter printer = CSVFormat.DEFAULT.withHeader("ID", "Name", "Program", "University").print(writer)) {

    // write one record per call
    printer.printRecord(1, "John Mike", "Engineering", "MIT");
    printer.printRecord(2, "Jovan Krovoski", "Medical", "Harvard");
    printer.printRecord(3, "Lando Mata", "Computer Science", "TU Berlin");
    printer.printRecord(4, "Emma Ali", "Mathematics", "Oxford");

} catch (IOException ex) {
    ex.printStackTrace();
}

The above example will generate the following CSV file:

ID,Name,Program,University
1,John Mike,Engineering,MIT
2,Jovan Krovoski,Medical,Harvard
3,Lando Mata,Computer Science,TU Berlin
4,Emma Ali,Mathematics,Oxford
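None of the sample values contain a comma or a double quote, so the output above needs no quoting. As a rough plain-Java sketch of the kind of escaping a CSV writer has to perform, the following applies RFC 4180 style quoting. The class and method names are hypothetical, the second sample record is made up for illustration, and this is not the library's code.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Apache Commons CSV.
class CsvQuoteSketch {

    // Quotes a field if it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes (RFC 4180 style).
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the values of one record into a single CSV line.
    static String toLine(Object... values) {
        StringJoiner line = new StringJoiner(",");
        for (Object value : values) {
            line.add(escape(String.valueOf(value)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // No special characters: written as-is
        System.out.println(toLine(1, "John Mike", "Engineering", "MIT"));

        // Made-up record with an embedded comma: the field is quoted
        System.out.println(toLine(5, "Mata, Lando", "Computer Science", "TU Berlin"));
    }
}
```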

The CSVPrinter class also provides a printRecords() method that accepts a collection of records and writes each one to the file as a separate row.

Let us rewrite the above example to use this method:

// additional imports for this example
import java.util.ArrayList;
import java.util.List;

// try-with-resources flushes and closes the printer and the writer, even on failure
try (Writer writer = Files.newBufferedWriter(Paths.get("students.csv"));
     CSVPrinter printer = CSVFormat.DEFAULT.withHeader("ID", "Name", "Program", "University").print(writer)) {

    // create a list
    List<Object[]> data = new ArrayList<>();
    data.add(new Object[] {1, "John Mike", "Engineering", "MIT"});
    data.add(new Object[] {2, "Jovan Krovoski", "Medical", "Harvard"});
    data.add(new Object[] {3, "Lando Mata", "Computer Science", "TU Berlin"});
    data.add(new Object[] {4, "Emma Ali", "Mathematics", "Oxford"});

    // write list to file
    printer.printRecords(data);

} catch (IOException ex) {
    ex.printStackTrace();
}

Conclusion

That's all for reading and writing CSV files using Apache Commons CSV. This library provides a simple interface to read and write CSV files of various types.

The Apache Commons CSV library is well-maintained and updated regularly. Check out the official user guide to learn about more available options.
