Properly handling backslashes using OpenCSV

Written on September 17, 2017 by Vatsal Mevada

OpenCSV is one of the popular Java libraries for handling CSV data. In this post I will discuss one specific issue that we recently faced with this library.

The Problem:

Here is a minimal code snippet for writing and reading CSV data using OpenCSV.

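import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

String dataValue = "test";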

//writing  
StringWriter writer = new StringWriter();

try (CSVWriter csvwriter = new CSVWriter(writer)) {
    String[] originalData = new String[2];
    originalData[0] = dataValue;
    originalData[1] = dataValue;
    System.out.println("Original data: " + originalData[0] + "," + originalData[1]);
    csvwriter.writeNext(originalData);
} catch (IOException e) {
    throw new RuntimeException(e);
}
System.out.println("Written data: " + writer.toString());

//reading
try (CSVReader csvReader = new CSVReader(new StringReader(writer.toString()))) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}

The output of the above snippet:

Original data: test,test
Written data: "test","test"

Read data: test,test

This is as expected. Life is good with OpenCSV until you encounter a backslash character (‘\’) in your CSV data.

So let’s run the same snippet with a dataValue that contains a backslash character:

String dataValue = "t\\est";

Output:

Original data: t\est,t\est
Written data: "t\est","t\est"

Read data: test,test

Note that the backslash character was written correctly but is missing from the data that was read back.

The Root Cause:

By default, CSVReader uses the backslash (‘\’) as its escape character, whereas CSVWriter uses the double quote (‘"’) as its escape character.

Because of this mismatch, the writer does not escape backslash characters and emits them as-is. At read time, CSVParser treats each single backslash as the escape character and drops it, so t\est is read back as test.
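To make the mismatch concrete, here is a minimal sketch in plain Java. It is a simplified model of escape handling and not OpenCSV's actual parser; the class name EscapeMismatchSketch and the helper readQuotedField are hypothetical names introduced only for this illustration.

```java
public class EscapeMismatchSketch {

    // Hypothetical helper (not OpenCSV code): reads the content of one quoted
    // field, treating escapeChar as "take the next character literally".
    static String readQuotedField(String content, char escapeChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == escapeChar && i + 1 < content.length()) {
                // Drop the escape character and keep the character after it.
                out.append(content.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The writer emitted the backslash without escaping it.
        System.out.println(readQuotedField("t\\est", '\\'));   // prints test
        // Had the writer doubled the backslash, it would have survived.
        System.out.println(readQuotedField("t\\\\est", '\\')); // prints t\est
    }
}
```

The second call shows that the data is only safe when both sides agree on the escape character: a reader that escapes with a backslash needs a writer that doubles every backslash.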

The Solution:

By default, CSVReader uses CSVParser to parse CSV data. OpenCSV provides another parser, RFC4180Parser, which strictly follows the RFC 4180 standard.

With RFC4180Parser, CSVReader uses the double quote (‘"’) as the escape character, which makes it consistent with CSVWriter. A backslash is then treated as an ordinary character.

We need to replace the reading part of the original snippet with the following code:

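import com.opencsv.CSVReaderBuilder;
import com.opencsv.RFC4180Parser;
import com.opencsv.RFC4180ParserBuilder;

// Parse with RFC 4180 rules, where a backslash is an ordinary character.
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(new StringReader(writer.toString()))
        .withCSVParser(rfc4180Parser);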
try (CSVReader csvReader = csvReaderBuilder.build()) {
    String[] readData = csvReader.readNext();
    System.out.println("Read data: " + readData[0] + "," + readData[1]);
} catch (IOException e) {
    throw new RuntimeException(e);
}

Output:

Original data: t\est,t\est
Written data: "t\est","t\est"

Read data: t\est,t\est
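The reason this works can be sketched in plain Java. Under the RFC 4180 convention, the only character that needs escaping inside a quoted field is the double quote itself, which is written twice. The class Rfc4180QuoteSketch and its quote and unquote helpers below are hypothetical names for this illustration and handle a single field only; they are not part of OpenCSV.

```java
public class Rfc4180QuoteSketch {

    // Writer side: wrap the value in quotes and double any embedded quote.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Reader side: strip the outer quotes and collapse doubled quotes.
    // Assumes the input is a single well-formed quoted field.
    static String unquote(String field) {
        String inner = field.substring(1, field.length() - 1);
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String[] samples = {"t\\est", "say \"hi\"", "C:\\temp\\a.csv"};
        for (String s : samples) {
            String field = quote(s);
            // Backslashes pass through untouched in both directions.
            System.out.println(field + " -> " + unquote(field));
        }
    }
}
```

Because the backslash has no special meaning on either side, a value such as t\est comes back unchanged.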

If you are open to changing the library itself, Apache Commons CSV is a good alternative to OpenCSV.


Library Version used for the code snippets:

OpenCSV 4.0

References:

sourceforge support request

RFC4180

OpenCSV official page