Properly handling backslashes using OpenCSV
OpenCSV is one of the popular Java libraries for handling CSV data. In this post I will discuss one specific issue that we recently faced with this library.
The Problem:
Here is a minimal code snippet for writing and reading CSV data using OpenCSV.
The output of the above snippet:
This is as expected. Life is good with OpenCSV until you encounter a backslash character (‘\’) in your CSV data.
So let’s try running the same snippet with dataValue containing a backslash character:
Output:
Note that the backslash character is missing from the CSV data that was read back.
The root cause:
By default, CSVReader uses the backslash (‘\’) as its escape character, whereas CSVWriter uses the double quote (‘"’) as its escape character.
Because of this, backslash characters are not escaped when the data is written. When the data is read back, a single backslash is dropped by the CSVParser because it is treated as the escape character.
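The mismatch can be reproduced without the library. The following is a simplified, stand-alone sketch and not OpenCSV's actual code: the class and method names (EscapeMismatchDemo, writeLine, readLineWithBackslashEscape) are hypothetical, and the reader models only the one behaviour discussed here, namely that a backslash is consumed as an escape character while the writer escapes only double quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeMismatchDemo {

    // Writer side (assumed behaviour): quote every field and escape embedded
    // double quotes by doubling them. Backslashes are written unchanged.
    static String writeLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(fields[i].replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    // Reader side (assumed behaviour): a backslash is an escape character,
    // so it is dropped and the character after it is taken literally.
    static List<String> readLineWithBackslashEscape(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // backslash itself is lost here
            } else if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String original = "C:\\temp\\file"; // the value C:\temp\file
        String csv = writeLine("path", original);
        System.out.println("written: " + csv);
        System.out.println("read   : " + readLineWithBackslashEscape(csv).get(1));
    }
}
```

In this sketch the written line still contains both backslashes, but the value read back is C:tempfile, which mirrors the data loss described above.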
The Solution:
By default, CSVReader uses CSVParser for parsing CSV data. OpenCSV provides another parser (RFC4180Parser) which strictly follows the RFC 4180 standard.
When used with RFC4180Parser, the CSVReader treats the double quote (‘"’) as the escape character, making it consistent with CSVWriter.
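To illustrate why RFC 4180 style parsing preserves backslashes, here is a minimal stand-alone sketch. It is not the RFC4180Parser implementation: the class name Rfc4180StyleReadDemo and the method readLine are hypothetical, and the sketch handles a single line with comma separators only. The only escape it recognises is a doubled double quote inside a quoted field, so a backslash is ordinary data.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180StyleReadDemo {

    // RFC 4180 style reading: inside a quoted field, "" means one literal
    // double quote; every other character, including '\', is kept as-is.
    static List<String> readLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote becomes one quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c); // backslashes land here untouched
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The line: "path","C:\temp\file","say ""hi"""
        String csv = "\"path\",\"C:\\temp\\file\",\"say \"\"hi\"\"\"";
        for (String field : readLine(csv)) {
            System.out.println(field);
        }
    }
}
```

Because writer and reader now agree that the double quote is the only escape character, the second field comes back as C:\temp\file with both backslashes intact.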
We need to replace the reading part of the above snippet with the following code:
Output:
If you are looking to change the library itself, Apache Commons CSV is a good alternative to OpenCSV.
If you have any suggestions and/or queries related to this post then please start a discussion in the comment section below.
Library Version used for the code snippets: