Java – How to read numeric strings in Excel cells as string (not numbers)

apache-poiexceljava

I have excel file with such contents:
- A1: SomeString
- A2: 2
All fields are set to String format.
When I read the file in java using POI, it tells that A2 is in numeric cell format.
The problem is that the value in A2 can be 2 or 2.0 (and I want to be able to distinguish them) so I can't just use .toString().

What can I do to read the value as string?

Best Answer

I had same problem. I did cell.setCellType(Cell.CELL_TYPE_STRING); before reading the string value, which solved the problem regardless of how the user formatted the cell.

Algorithm

To generate a random string, concatenate characters drawn randomly from the set of acceptable symbols until the string reaches the desired length.

Implementation

Here's some fairly simple and very flexible code for generating random identifiers. Read the information that follows for important application notes.

public class RandomString {

    /**
     * Generate a random string.
     */
    public String nextString() {
        for (int idx = 0; idx < buf.length; ++idx)
            buf[idx] = symbols[random.nextInt(symbols.length)];
        return new String(buf);
    }

    public static final String upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static final String lower = upper.toLowerCase(Locale.ROOT);

    public static final String digits = "0123456789";

    public static final String alphanum = upper + lower + digits;

    private final Random random;

    private final char[] symbols;

    private final char[] buf;

    public RandomString(int length, Random random, String symbols) {
        if (length < 1) throw new IllegalArgumentException();
        if (symbols.length() < 2) throw new IllegalArgumentException();
        this.random = Objects.requireNonNull(random);
        this.symbols = symbols.toCharArray();
        this.buf = new char[length];
    }

    /**
     * Create an alphanumeric string generator.
     */
    public RandomString(int length, Random random) {
        this(length, random, alphanum);
    }

    /**
     * Create an alphanumeric strings from a secure generator.
     */
    public RandomString(int length) {
        this(length, new SecureRandom());
    }

    /**
     * Create session identifiers.
     */
    public RandomString() {
        this(21);
    }

}

Usage examples

Create an insecure generator for 8-character identifiers:

RandomString gen = new RandomString(8, ThreadLocalRandom.current());

Create a secure generator for session identifiers:

RandomString session = new RandomString();

Create a generator with easy-to-read codes for printing. The strings are longer than full alphanumeric strings to compensate for using fewer symbols:

String easy = RandomString.digits + "ACEFGHJKLMNPQRUVWXYabcdefhijkprstuvwx";
RandomString tickets = new RandomString(23, new SecureRandom(), easy);

Use as session identifiers

Generating session identifiers that are likely to be unique is not good enough, or you could just use a simple counter. Attackers hijack sessions when predictable identifiers are used.

There is tension between length and security. Shorter identifiers are easier to guess, because there are fewer possibilities. But longer identifiers consume more storage and bandwidth. A larger set of symbols helps, but might cause encoding problems if identifiers are included in URLs or re-entered by hand.

The underlying source of randomness, or entropy, for session identifiers should come from a random number generator designed for cryptography. However, initializing these generators can sometimes be computationally expensive or slow, so effort should be made to re-use them when possible.

Use as object identifiers

Not every application requires security. Random assignment can be an efficient way for multiple entities to generate identifiers in a shared space without any coordination or partitioning. Coordination can be slow, especially in a clustered or distributed environment, and splitting up a space causes problems when entities end up with shares that are too small or too big.

Identifiers generated without taking measures to make them unpredictable should be protected by other means if an attacker might be able to view and manipulate them, as happens in most web applications. There should be a separate authorization system that protects objects whose identifier can be guessed by an attacker without access permission.

Care must be also be taken to use identifiers that are long enough to make collisions unlikely given the anticipated total number of identifiers. This is referred to as "the birthday paradox." The probability of a collision, p, is approximately n²/(2q^x), where n is the number of identifiers actually generated, q is the number of distinct symbols in the alphabet, and x is the length of the identifiers. This should be a very small number, like 2^‑50 or less.

Working this out shows that the chance of collision among 500k 15-character identifiers is about 2^‑52, which is probably less likely than undetected errors from cosmic rays, etc.

Comparison with UUIDs

According to their specification, UUIDs are not designed to be unpredictable, and should not be used as session identifiers.

UUIDs in their standard format take a lot of space: 36 characters for only 122 bits of entropy. (Not all bits of a "random" UUID are selected randomly.) A randomly chosen alphanumeric string packs more entropy in just 21 characters.

UUIDs are not flexible; they have a standardized structure and layout. This is their chief virtue as well as their main weakness. When collaborating with an outside party, the standardization offered by UUIDs may be helpful. For purely internal use, they can be inefficient.

Java – How to read / convert an InputStream into a String in Java

Summarize other answers I found 11 main ways to do this (see below). And I wrote some performance tests (see results below):

Ways to convert an InputStream to a String:

Using IOUtils.toString (Apache Utils)

 String result = IOUtils.toString(inputStream, StandardCharsets.UTF_8);

Using CharStreams (Guava)

 String result = CharStreams.toString(new InputStreamReader(
       inputStream, Charsets.UTF_8));

Using Scanner (JDK)

 Scanner s = new Scanner(inputStream).useDelimiter("\\A");
 String result = s.hasNext() ? s.next() : "";

Using Stream API (Java 8). Warning: This solution converts different line breaks (like \r\n) to \n.

 String result = new BufferedReader(new InputStreamReader(inputStream))
   .lines().collect(Collectors.joining("\n"));

Using parallel Stream API (Java 8). Warning: This solution converts different line breaks (like \r\n) to \n.

 String result = new BufferedReader(new InputStreamReader(inputStream))
    .lines().parallel().collect(Collectors.joining("\n"));

Using InputStreamReader and StringBuilder (JDK)

 int bufferSize = 1024;
 char[] buffer = new char[bufferSize];
 StringBuilder out = new StringBuilder();
 Reader in = new InputStreamReader(stream, StandardCharsets.UTF_8);
 for (int numRead; (numRead = in.read(buffer, 0, buffer.length)) > 0; ) {
     out.append(buffer, 0, numRead);
 }
 return out.toString();

Using StringWriter and IOUtils.copy (Apache Commons)

 StringWriter writer = new StringWriter();
 IOUtils.copy(inputStream, writer, "UTF-8");
 return writer.toString();

Using ByteArrayOutputStream and inputStream.read (JDK)

 ByteArrayOutputStream result = new ByteArrayOutputStream();
 byte[] buffer = new byte[1024];
 for (int length; (length = inputStream.read(buffer)) != -1; ) {
     result.write(buffer, 0, length);
 }
 // StandardCharsets.UTF_8.name() > JDK 7
 return result.toString("UTF-8");

Using BufferedReader (JDK). Warning: This solution converts different line breaks (like \n\r) to line.separator system property (for example, in Windows to "\r\n").

 String newLine = System.getProperty("line.separator");
 BufferedReader reader = new BufferedReader(
         new InputStreamReader(inputStream));
 StringBuilder result = new StringBuilder();
 for (String line; (line = reader.readLine()) != null; ) {
     if (result.length() > 0) {
         result.append(newLine);
     }
     result.append(line);
 }
 return result.toString();

Using BufferedInputStream and ByteArrayOutputStream (JDK)

BufferedInputStream bis = new BufferedInputStream(inputStream);
ByteArrayOutputStream buf = new ByteArrayOutputStream();
for (int result = bis.read(); result != -1; result = bis.read()) {
    buf.write((byte) result);
}
// StandardCharsets.UTF_8.name() > JDK 7
return buf.toString("UTF-8");

Using inputStream.read() and StringBuilder (JDK). Warning: This solution has problems with Unicode, for example with Russian text (works correctly only with non-Unicode text)
```
StringBuilder sb = new StringBuilder();
for (int ch; (ch = inputStream.read()) != -1; ) {
    sb.append((char) ch);
}
return sb.toString();
```

Warning:

Solutions 4, 5 and 9 convert different line breaks to one.
Solution 11 can't work correctly with Unicode text

Performance tests

Performance tests for small String (length = 175), url in github (mode = Average Time, system = Linux, score 1,343 is the best):

              Benchmark                         Mode  Cnt   Score   Error  Units
 8. ByteArrayOutputStream and read (JDK)        avgt   10   1,343 ± 0,028  us/op
 6. InputStreamReader and StringBuilder (JDK)   avgt   10   6,980 ± 0,404  us/op
10. BufferedInputStream, ByteArrayOutputStream  avgt   10   7,437 ± 0,735  us/op
11. InputStream.read() and StringBuilder (JDK)  avgt   10   8,977 ± 0,328  us/op
 7. StringWriter and IOUtils.copy (Apache)      avgt   10  10,613 ± 0,599  us/op
 1. IOUtils.toString (Apache Utils)             avgt   10  10,605 ± 0,527  us/op
 3. Scanner (JDK)                               avgt   10  12,083 ± 0,293  us/op
 2. CharStreams (guava)                         avgt   10  12,999 ± 0,514  us/op
 4. Stream Api (Java 8)                         avgt   10  15,811 ± 0,605  us/op
 9. BufferedReader (JDK)                        avgt   10  16,038 ± 0,711  us/op
 5. parallel Stream Api (Java 8)                avgt   10  21,544 ± 0,583  us/op

Performance tests for big String (length = 50100), url in github (mode = Average Time, system = Linux, score 200,715 is the best):

               Benchmark                        Mode  Cnt   Score        Error  Units
 8. ByteArrayOutputStream and read (JDK)        avgt   10   200,715 ±   18,103  us/op
 1. IOUtils.toString (Apache Utils)             avgt   10   300,019 ±    8,751  us/op
 6. InputStreamReader and StringBuilder (JDK)   avgt   10   347,616 ±  130,348  us/op
 7. StringWriter and IOUtils.copy (Apache)      avgt   10   352,791 ±  105,337  us/op
 2. CharStreams (guava)                         avgt   10   420,137 ±   59,877  us/op
 9. BufferedReader (JDK)                        avgt   10   632,028 ±   17,002  us/op
 5. parallel Stream Api (Java 8)                avgt   10   662,999 ±   46,199  us/op
 4. Stream Api (Java 8)                         avgt   10   701,269 ±   82,296  us/op
10. BufferedInputStream, ByteArrayOutputStream  avgt   10   740,837 ±    5,613  us/op
 3. Scanner (JDK)                               avgt   10   751,417 ±   62,026  us/op
11. InputStream.read() and StringBuilder (JDK)  avgt   10  2919,350 ± 1101,942  us/op

Graphs (performance tests depending on Input Stream length in Windows 7 system)

Performance test (Average Time) depending on Input Stream length in Windows 7 system:

 length  182    546     1092    3276    9828    29484   58968

 test8  0.38    0.938   1.868   4.448   13.412  36.459  72.708
 test4  2.362   3.609   5.573   12.769  40.74   81.415  159.864
 test5  3.881   5.075   6.904   14.123  50.258  129.937 166.162
 test9  2.237   3.493   5.422   11.977  45.98   89.336  177.39
 test6  1.261   2.12    4.38    10.698  31.821  86.106  186.636
 test7  1.601   2.391   3.646   8.367   38.196  110.221 211.016
 test1  1.529   2.381   3.527   8.411   40.551  105.16  212.573
 test3  3.035   3.934   8.606   20.858  61.571  118.744 235.428
 test2  3.136   6.238   10.508  33.48   43.532  118.044 239.481
 test10 1.593   4.736   7.527   20.557  59.856  162.907 323.147
 test11 3.913   11.506  23.26   68.644  207.591 600.444 1211.545