Properly handling backslashes using OpenCSV
OpenCSV is one of the popular Java libraries for handling CSV data. In this post I will discuss one specific issue that we recently faced with this library.
The Problem:
Here is a minimal code snippet for writing and reading CSV data using OpenCSV.
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

String dataValue = "test";
// writing
StringWriter writer = new StringWriter();
try (CSVWriter csvwriter = new CSVWriter(writer)) {
    String[] originalData = new String[2];
    originalData[0] = dataValue;
    originalData[1] = dataValue;
    System.out.println("Original data: " + originalData[0] + "," + originalData[1]);
    csvwriter.writeNext(originalData);
} catch (IOException e) {
    throw new RuntimeException(e);
}
System.out.println("Written data: " + writer.toString());
// reading
// (with OpenCSV 5.x, readNext() also declares CsvValidationException,
// which must be caught or declared as well)
try (CSVReader csvReader = new CSVReader(new StringReader(writer.toString()))) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}
The output of the above snippet:
Original data: test,test
Written data: "test","test"
Read data: test,test
This is as expected. Life is good with OpenCSV until you encounter a backslash character (‘\’) in your CSV data.
So let’s run the same snippet with a dataValue that contains a backslash character (the Java literal below is the five-character string t\est):
String dataValue = "t\\est";
Output:
Original data: t\est,t\est
Written data: "t\est","t\est"
Read data: test,test
Note that the backslash character is missing from the data read back.
The root cause:
By default, CSVReader uses a backslash (‘\’) as the escape character, whereas CSVWriter uses a double quote (‘"’) as its escape character. Because of this mismatch, backslash characters are not escaped when the data is written. When the data is read back, the CSVParser treats a lone backslash as an escape character and discards it.
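To make the mismatch concrete, here is a minimal, stdlib-only sketch that models both sides. It is a simplified model under stated assumptions, not OpenCSV's source code: the hypothetical class EscapeMismatchDemo quotes every field and doubles embedded quotes when writing (mirroring CSVWriter's defaults), and when reading, it treats a backslash inside quotes as an escape that keeps a following quote or backslash and is silently dropped before any other character (mirroring the default CSVParser behavior described above). Only single-line records are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the writer/reader escape mismatch (not OpenCSV's code).
public class EscapeMismatchDemo {

    // Writer side: quote every field; an embedded quote is escaped by doubling it,
    // so the quote character doubles as the escape character.
    public static String writeLine(String... fields) {
        List<String> quoted = new ArrayList<>();
        for (String f : fields) {
            quoted.add('"' + f.replace("\"", "\"\"") + '"');
        }
        return String.join(",", quoted);
    }

    // Reader side: inside quotes, a backslash escapes only a quote or another
    // backslash; a backslash before any other character is discarded.
    public static List<String> readLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\') {
                boolean escapable = inQuotes && i + 1 < line.length()
                        && (line.charAt(i + 1) == '"' || line.charAt(i + 1) == '\\');
                if (escapable) {
                    sb.append(line.charAt(++i)); // keep the escaped character
                }
                // otherwise the backslash is silently dropped
            } else if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    sb.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(sb.toString()); // field separator
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        fields.add(sb.toString());
        return fields;
    }

    public static void main(String[] args) {
        String written = writeLine("t\\est", "t\\est");
        System.out.println("Written data: " + written); // "t\est","t\est"
        System.out.println("Read data: " + readLine(written)); // [test, test]
    }
}
```

The backslash survives the writing step untouched and is only lost on the reading step, which is the same pattern as the OpenCSV output above.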
The Solution:
By default, CSVReader uses CSVParser for parsing CSV data. OpenCSV also provides another parser, RFC4180Parser, which strictly follows the RFC 4180 standard. With RFC4180Parser, CSVReader escapes only by doubling the double quote (‘"’), which is consistent with CSVWriter, and treats backslashes as ordinary characters.
To fix the problem, replace the reading part of the snippet above with the following code:
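import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.RFC4180Parser;
import com.opencsv.RFC4180ParserBuilder;

// reading with the RFC 4180 compliant parser instead of the default CSVParser
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(new StringReader(writer.toString()))
        .withCSVParser(rfc4180Parser);
try (CSVReader csvReader = csvReaderBuilder.build()) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}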
Output:
Original data: t\est,t\est
Written data: "t\est","t\est"
Read data: t\est,t\est
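For comparison, here is the same kind of simplified model with RFC 4180 rules on both sides: the only escape is a doubled double quote, and a backslash is an ordinary character. This is a hypothetical sketch (the class Rfc4180RoundTripDemo is illustrative only, handles single-line records, and is not OpenCSV's RFC4180Parser); it shows why values containing backslashes, quotes and commas come back unchanged once writer and reader agree on the escaping rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified RFC 4180-style round trip (illustrative, not OpenCSV's RFC4180Parser).
public class Rfc4180RoundTripDemo {

    // Quote every field; the only escape is doubling an embedded quote.
    public static String writeLine(List<String> fields) {
        List<String> quoted = new ArrayList<>();
        for (String f : fields) {
            quoted.add('"' + f.replace("\"", "\"\"") + '"');
        }
        return String.join(",", quoted);
    }

    // Backslash is an ordinary character; "" inside a quoted field is a literal quote.
    public static List<String> readLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    sb.append('"'); // doubled quote -> one literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(sb.toString()); // field separator
                sb.setLength(0);
            } else {
                sb.append(c); // everything else, including '\', is kept as-is
            }
        }
        fields.add(sb.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("t\\est", "say \"hi\", \\n");
        String written = writeLine(original);
        List<String> read = readLine(written);
        System.out.println("Written data: " + written);
        System.out.println("Round trip ok: " + read.equals(original)); // true
    }
}
```

Because the reader never assigns a special meaning to the backslash, there is nothing for the writer to escape, and the round trip is lossless.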
If you are open to switching libraries, Apache Commons CSV is a good alternative to OpenCSV.
If you have any suggestions or questions about this post, please start a discussion in the comment section below.
Library Version used for the code snippets:
References: