Comma-Separated Values (CSV) is a popular file format for storing tabular data such as spreadsheets and databases in plain text. It uses a delimiter, such as a comma, to separate the values. Every line of the file is a data record. Every record consists of one or more fields, separated by commas.
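To make the record and field structure concrete, here is a minimal sketch in core Java that splits a single line into its fields. The class and method names are hypothetical, and the sketch assumes that no field is quoted or contains a comma:

```java
import java.util.Arrays;
import java.util.List;

public class CsvRecordSketch {

    // Splits one CSV line into fields on every comma.
    // The limit of -1 keeps trailing empty fields instead of dropping them.
    // Assumption: no field is quoted or contains a comma or a line break.
    public static List<String> splitFields(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        // One record with four fields: id, name, email, country code
        List<String> fields = splitFields("1,Atta Shah,atta@example.com,PK");
        System.out.println(fields.size() + " fields: " + fields);
    }
}
```

Naive splitting like this breaks as soon as a field is quoted (for example "Shah, Atta"), which is one reason to prefer a dedicated parser.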
In this tutorial, you will learn how to read and write CSV files in Java using Apache Commons CSV.
Dependencies
You need to add the Apache Commons CSV (commons-csv) dependency to your project. If you are using Gradle, add the following dependency to your build.gradle file:
implementation 'org.apache.commons:commons-csv:1.7'
For a Maven project, add the following to your pom.xml file:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.7</version>
</dependency>
Reading CSV Files
The Apache Commons CSV library provides several ways to read CSV files in different formats. If you are reading an Excel CSV file, it is likely to have a header.
However, if you use a CSV file as a simple text file to transfer the data from one server to another, the file may not include the header. The Apache Commons CSV library works in both cases.
Let us create two sample CSV files: one with a header and another without a header. We will read and parse these files in our examples. Each file contains user information: ID, name, email address, and country code.
users.csv
1,Atta Shah,atta@example.com,PK
2,Alex Jones,alex@example.com,DE
3,Jovan Lee,jovan@example.com,FR
4,Greg Hover,greg@example.com,US
users-with-header.csv
ID,Name,Email,Country
1,Atta Shah,atta@example.com,PK
2,Alex Jones,alex@example.com,DE
3,Jovan Lee,jovan@example.com,FR
4,Greg Hover,greg@example.com,US
Let us start with the first file that does not contain a header. There are three ways to read this file, which are explained below.
Reading a CSV file using column index
The simplest way to read a file through Apache Commons CSV is by using the column index to access the value of a record:
// needs java.io.IOException, java.io.Reader, java.nio.file.Files, java.nio.file.Paths,
// org.apache.commons.csv.CSVFormat and org.apache.commons.csv.CSVRecord
// create a reader; try-with-resources closes it even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users.csv"))) {
    // read csv file
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get(0));
        System.out.println("Name: " + record.get(1));
        System.out.println("Email: " + record.get(2));
        System.out.println("Country: " + record.get(3));
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
The above code is largely self-explanatory. We create a BufferedReader and pass it to the parse() method of the predefined CSVFormat.DEFAULT format.
The CSVFormat class provides several predefined CSV variants, including:
- DEFAULT: Standard CSV format, similar to RFC 4180 but allowing empty lines.
- EXCEL: The Microsoft Excel CSV format.
- MYSQL: The MySQL CSV format.
- ORACLE: Default Oracle format used by the SQL*Loader utility.
- POSTGRESQL_CSV: Default PostgreSQL CSV format used by the COPY operation.
- POSTGRESQL_TEXT: Default PostgreSQL text format used by the COPY operation.
- RFC4180: The format defined by RFC 4180.
- TDF: A tab-delimited format.
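Switching to another predefined format only changes the constant you start from. As a rough sketch, the following reads tab-delimited text with CSVFormat.TDF. The class name, the method name, and the in-memory sample data are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class TdfSketch {

    // Parses tab-delimited text and returns the second field of every record.
    public static List<String> secondFields(String text) {
        List<String> values = new ArrayList<>();
        // The parser is closed automatically when the try block ends
        try (CSVParser parser = CSVFormat.TDF.parse(new StringReader(text))) {
            for (CSVRecord record : parser) {
                values.add(record.get(1));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: fields are separated by tabs instead of commas
        String text = "1\tAtta Shah\tPK\n2\tAlex Jones\tDE\n";
        System.out.println(secondFields(text));
    }
}
```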
The parse() method returns an instance of CSVParser that we can iterate over with a loop. It reads and parses one record at a time from the CSV file. The getRecordNumber() method returns the sequence number of the record in the parsed file, starting at 1.
Alternatively, you can use the getRecords() method from the CSVParser class to read all the records into memory at once:
// read all records into memory
List<CSVRecord> records = CSVFormat.DEFAULT.parse(reader).getRecords();
However, this approach is not suitable for very large CSV files. Because getRecords() loads the entire file into memory, it can exhaust the heap or severely degrade performance.
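For large inputs, iterating over the parser keeps only the current record in scope. The sketch below counts records this way without building a list. The class and method names are hypothetical, and a StringReader stands in for a real file:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class StreamingCountSketch {

    // Counts records one at a time instead of collecting them with getRecords().
    public static long countRecords(Reader reader) {
        long count = 0;
        try (CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
            for (CSVRecord record : parser) {
                // Each record can be processed and then discarded here
                count++;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory sample standing in for a large file
        Reader reader = new StringReader("1,Atta Shah\n2,Alex Jones\n3,Jovan Lee\n");
        System.out.println("Records: " + countRecords(reader));
    }
}
```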
Reading a CSV file using a manually defined header
Column indexes may not be the most intuitive way to access the record values for some people. For this purpose, it is possible to manually assign names to each column in the file and then retrieve the values using the assigned names.
Here is an example that manually defines a header and gets the values using the header names:
// create a reader; try-with-resources closes it even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users.csv"))) {
    // read csv file
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.withHeader("ID", "Name", "Email", "Country").parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get("ID"));
        System.out.println("Name: " + record.get("Name"));
        System.out.println("Email: " + record.get("Email"));
        System.out.println("Country: " + record.get("Country"));
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
Note that the column values are still accessible using their index.
Reading a CSV file using an enum to define a header
Referencing column names with string literals throughout the code is error-prone, because a typo in a name is only detected at runtime. For this reason, it is possible to use an enum to specify the header names. Then we can use the enum constants to access the values.
Let us define an enum first for the header names:
Headers.java
public enum Headers {
    ID,
    NAME,
    EMAIL,
    COUNTRY;
}
Here is an example that uses the above enum to specify the header names for the CSV file:
// create a reader; try-with-resources closes it even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users.csv"))) {
    // read csv file
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.withHeader(Headers.class).parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get(Headers.ID));
        System.out.println("Name: " + record.get(Headers.NAME));
        System.out.println("Email: " + record.get(Headers.EMAIL));
        System.out.println("Country: " + record.get(Headers.COUNTRY));
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
Again, it is still possible to access column values by their index or by a string that matches the enum constant name (for example, ID).
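The three access styles can be compared side by side. The following is a small sketch with a hypothetical class name and a nested copy of the Headers enum so that it compiles on its own. It reads the name column of the first record by enum constant, by string, and by index:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class HeaderAccessSketch {

    // Nested copy of the Headers enum, repeated here to keep the sketch self-contained
    enum Headers { ID, NAME, EMAIL, COUNTRY }

    // Returns the name column of the first record, read in three different ways.
    // Assumption: the input contains at least one record with four fields.
    public static String[] nameThreeWays(String csv) {
        try (CSVParser parser = CSVFormat.DEFAULT
                .withHeader(Headers.class)
                .parse(new StringReader(csv))) {
            CSVRecord record = parser.iterator().next();
            return new String[] {
                record.get(Headers.NAME), // by enum constant
                record.get("NAME"),       // by string matching the constant name
                record.get(1)             // by zero-based column index
            };
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        for (String value : nameThreeWays("1,Atta Shah,atta@example.com,PK\n")) {
            System.out.println(value);
        }
    }
}
```

Note that the string lookup here uses NAME, not Name, because header names are case-sensitive unless the format is configured otherwise.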
Reading a CSV file with header auto detection
Some CSV files, such as those exported from Excel, define header names as their first record. If asked, the Apache Commons CSV library can auto-detect the header names from the first record.
Let us read the second sample file (users-with-header.csv), which defines a header, using header auto-detection:
// create a reader; try-with-resources closes it even if parsing fails
try (Reader reader = Files.newBufferedReader(Paths.get("users-with-header.csv"))) {
    // read csv file
    Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
            .withIgnoreHeaderCase()
            .withTrim()
            .parse(reader);
    for (CSVRecord record : records) {
        System.out.println("Record #: " + record.getRecordNumber());
        System.out.println("ID: " + record.get("ID"));
        System.out.println("Name: " + record.get("Name"));
        System.out.println("Email: " + record.get("Email"));
        System.out.println("Country: " + record.get("Country"));
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
Apache Commons CSV uses the values from the first record as header names and skips the first record when iterating.
We have also specified some additional configuration options: withIgnoreHeaderCase() and withTrim(). The withIgnoreHeaderCase() option makes header name lookups case-insensitive, and the withTrim() option trims leading and trailing blank spaces from the column values.
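The effect of these two options is easier to see with deliberately untidy input. This is a minimal sketch, assuming a hypothetical class name and in-memory sample data, that looks up the Name column using a differently cased key and padded values:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class HeaderOptionsSketch {

    // Returns the name column of the first data record.
    // Assumption: the input has a header record containing a "Name" column
    // followed by at least one data record.
    public static String nameOfFirstRecord(String csv) {
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .withIgnoreHeaderCase()
                .withTrim()
                .parse(new StringReader(csv))) {
            CSVRecord record = parser.iterator().next();
            // "NAME" matches the "Name" header because header case is ignored
            return record.get("NAME");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // The name value is padded with spaces that withTrim() removes
        String csv = "ID,Name,Email\n1,  Atta Shah  ,atta@example.com\n";
        System.out.println("[" + nameOfFirstRecord(csv) + "]");
    }
}
```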
Writing CSV Files
The Apache Commons CSV library is commonly used for reading data from CSV files. But it can also be used to generate CSV files.
Let us create a simple CSV file using Apache Commons CSV:
// needs java.io.IOException, java.io.Writer, java.nio.file.Files, java.nio.file.Paths,
// org.apache.commons.csv.CSVFormat and org.apache.commons.csv.CSVPrinter
// create a writer and a printer; try-with-resources closes both even if writing fails
try (Writer writer = Files.newBufferedWriter(Paths.get("students.csv"));
     CSVPrinter printer = CSVFormat.DEFAULT.withHeader("ID", "Name", "Program", "University").print(writer)) {
    // write CSV file
    printer.printRecord(1, "John Mike", "Engineering", "MIT");
    printer.printRecord(2, "Jovan Krovoski", "Medical", "Harvard");
    printer.printRecord(3, "Lando Mata", "Computer Science", "TU Berlin");
    printer.printRecord(4, "Emma Ali", "Mathematics", "Oxford");
    // flush the stream
    printer.flush();
} catch (IOException ex) {
    ex.printStackTrace();
}
The above example will generate the following CSV file:
ID,Name,Program,University
1,John Mike,Engineering,MIT
2,Jovan Krovoski,Medical,Harvard
3,Lando Mata,Computer Science,TU Berlin
4,Emma Ali,Mathematics,Oxford
The CSVPrinter class also provides a printRecords() method that accepts a collection of objects and writes them into the file.
Let us rewrite the above example to use this method:
// create a writer and a printer; try-with-resources closes both even if writing fails
try (Writer writer = Files.newBufferedWriter(Paths.get("students.csv"));
     CSVPrinter printer = CSVFormat.DEFAULT.withHeader("ID", "Name", "Program", "University").print(writer)) {
    // create a list
    List<Object[]> data = new ArrayList<>();
    data.add(new Object[] {1, "John Mike", "Engineering", "MIT"});
    data.add(new Object[] {2, "Jovan Krovoski", "Medical", "Harvard"});
    data.add(new Object[] {3, "Lando Mata", "Computer Science", "TU Berlin"});
    data.add(new Object[] {4, "Emma Ali", "Mathematics", "Oxford"});
    // write list to file
    printer.printRecords(data);
    // flush the stream
    printer.flush();
} catch (IOException ex) {
    ex.printStackTrace();
}
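The samples above contain no values with commas in them. As a rough sketch of how the printer handles that case, the following writes to an in-memory StringWriter instead of a file so the output can be inspected directly. The class name, method name, and sample rows are hypothetical:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

public class WriteToStringSketch {

    // Writes a two-column header followed by the given rows and returns the CSV text.
    public static String toCsv(List<Object[]> rows) {
        StringWriter out = new StringWriter();
        // Closing the printer also closes the underlying writer
        try (CSVPrinter printer = CSVFormat.DEFAULT.withHeader("ID", "Name").print(out)) {
            printer.printRecords(rows);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        // The first name contains a comma, so the printer wraps it in quotes
        rows.add(new Object[] {1, "Doe, John"});
        rows.add(new Object[] {2, "Alex Jones"});
        System.out.print(toCsv(rows));
    }
}
```

With the DEFAULT format, a value is quoted only when it needs to be, so the second name is written as-is.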
Conclusion
That's all for reading and writing CSV files using Apache Commons CSV. This library provides a simple interface to read and write CSV files of various types.
The Apache Commons CSV library is well-maintained and updated regularly. Check out the official user guide to learn about more available options.
Further Reading
If you enjoyed this article, you may also be interested in other CSV-related articles:
- Reading and writing CSV files using OpenCSV
- Reading and writing CSV files using core Java
- Export & Download Data as CSV File in Spring Boot