I want to read from and write to password protected Excel files. How can I do so using Apache POI API.
Java – How to access password protected Excel workbook in Java using POI api
apacheapache-poijava
Related Solutions
Data Storage:
Specify the
utf8mb4
character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly useutf8mb4
encoding if autf8mb4_*
collation is specified (without any explicit character set).In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use simply
utf8
, which only supports a subset of Unicode characters. I wish I were kidding.
Data Access:
In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to
utf8mb4
. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application and vice versa.Some drivers provide their own mechanism for configuring the connection character set, which both updates its own internal state and informs MySQL of the encoding to be used on the connection—this is usually the preferred approach. In PHP:
If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify
charset
in the DSN:$dbh = new PDO('mysql:charset=utf8mb4');
If you're using mysqli, you can call
set_charset()
:$mysqli->set_charset('utf8mb4'); // object oriented style mysqli_set_charset($link, 'utf8mb4'); // procedural style
If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call
mysql_set_charset
.
If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded:
SET NAMES 'utf8mb4'
.The same consideration regarding
utf8mb4
/utf8
applies as above.
Output:
If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).
In PHP, you can use the
default_charset
php.ini option, or manually issue theContent-Type
MIME header yourself, which is just more work but has the same effect.When encoding the output using
json_encode()
, addJSON_UNESCAPED_UNICODE
as a second parameter.
Input:
Unfortunately, you should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's
mb_check_encoding()
does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.From my reading of the current HTML spec, the following sub-bullets are not necessary or even valid anymore for modern HTML. My understanding is that browsers will work with and submit data in the character set specified for the document. However, if you're targeting older versions of HTML (XHTML, HTML4, etc.), these points may still be useful:
- For HTML before HTML5 only: you want all data sent to you by browsers to be in UTF-8. Unfortunately, if you go by the only way to reliably do this is add the
accept-charset
attribute to all your<form>
tags:<form ... accept-charset="UTF-8">
. - For HTML before HTML5 only: note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single
<form>
tag.
- For HTML before HTML5 only: you want all data sent to you by browsers to be in UTF-8. Unfortunately, if you go by the only way to reliably do this is add the
Other Code Considerations:
Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.
You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's
mbstring
extension.PHP's built-in string operations are not by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent
mbstring
function.To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
Summarize other answers I found 11 main ways to do this (see below). And I wrote some performance tests (see results below):
Ways to convert an InputStream to a String:
Using
IOUtils.toString
(Apache Utils)String result = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
Using
CharStreams
(Guava)String result = CharStreams.toString(new InputStreamReader( inputStream, Charsets.UTF_8));
Using
Scanner
(JDK)Scanner s = new Scanner(inputStream).useDelimiter("\\A"); String result = s.hasNext() ? s.next() : "";
Using Stream API (Java 8). Warning: This solution converts different line breaks (like
\r\n
) to\n
.String result = new BufferedReader(new InputStreamReader(inputStream)) .lines().collect(Collectors.joining("\n"));
Using parallel Stream API (Java 8). Warning: This solution converts different line breaks (like
\r\n
) to\n
.String result = new BufferedReader(new InputStreamReader(inputStream)) .lines().parallel().collect(Collectors.joining("\n"));
Using
InputStreamReader
andStringBuilder
(JDK)int bufferSize = 1024; char[] buffer = new char[bufferSize]; StringBuilder out = new StringBuilder(); Reader in = new InputStreamReader(stream, StandardCharsets.UTF_8); for (int numRead; (numRead = in.read(buffer, 0, buffer.length)) > 0; ) { out.append(buffer, 0, numRead); } return out.toString();
Using
StringWriter
andIOUtils.copy
(Apache Commons)StringWriter writer = new StringWriter(); IOUtils.copy(inputStream, writer, "UTF-8"); return writer.toString();
Using
ByteArrayOutputStream
andinputStream.read
(JDK)ByteArrayOutputStream result = new ByteArrayOutputStream(); byte[] buffer = new byte[1024]; for (int length; (length = inputStream.read(buffer)) != -1; ) { result.write(buffer, 0, length); } // StandardCharsets.UTF_8.name() > JDK 7 return result.toString("UTF-8");
Using
BufferedReader
(JDK). Warning: This solution converts different line breaks (like\n\r
) toline.separator
system property (for example, in Windows to "\r\n").String newLine = System.getProperty("line.separator"); BufferedReader reader = new BufferedReader( new InputStreamReader(inputStream)); StringBuilder result = new StringBuilder(); for (String line; (line = reader.readLine()) != null; ) { if (result.length() > 0) { result.append(newLine); } result.append(line); } return result.toString();
Using
BufferedInputStream
andByteArrayOutputStream
(JDK)BufferedInputStream bis = new BufferedInputStream(inputStream); ByteArrayOutputStream buf = new ByteArrayOutputStream(); for (int result = bis.read(); result != -1; result = bis.read()) { buf.write((byte) result); } // StandardCharsets.UTF_8.name() > JDK 7 return buf.toString("UTF-8");
Using
inputStream.read()
andStringBuilder
(JDK). Warning: This solution has problems with Unicode, for example with Russian text (works correctly only with non-Unicode text)StringBuilder sb = new StringBuilder(); for (int ch; (ch = inputStream.read()) != -1; ) { sb.append((char) ch); } return sb.toString();
Warning:
Solutions 4, 5 and 9 convert different line breaks to one.
Solution 11 can't work correctly with Unicode text
Performance tests
Performance tests for small String
(length = 175), url in github (mode = Average Time, system = Linux, score 1,343 is the best):
Benchmark Mode Cnt Score Error Units
8. ByteArrayOutputStream and read (JDK) avgt 10 1,343 ± 0,028 us/op
6. InputStreamReader and StringBuilder (JDK) avgt 10 6,980 ± 0,404 us/op
10. BufferedInputStream, ByteArrayOutputStream avgt 10 7,437 ± 0,735 us/op
11. InputStream.read() and StringBuilder (JDK) avgt 10 8,977 ± 0,328 us/op
7. StringWriter and IOUtils.copy (Apache) avgt 10 10,613 ± 0,599 us/op
1. IOUtils.toString (Apache Utils) avgt 10 10,605 ± 0,527 us/op
3. Scanner (JDK) avgt 10 12,083 ± 0,293 us/op
2. CharStreams (guava) avgt 10 12,999 ± 0,514 us/op
4. Stream Api (Java 8) avgt 10 15,811 ± 0,605 us/op
9. BufferedReader (JDK) avgt 10 16,038 ± 0,711 us/op
5. parallel Stream Api (Java 8) avgt 10 21,544 ± 0,583 us/op
Performance tests for big String
(length = 50100), url in github (mode = Average Time, system = Linux, score 200,715 is the best):
Benchmark Mode Cnt Score Error Units
8. ByteArrayOutputStream and read (JDK) avgt 10 200,715 ± 18,103 us/op
1. IOUtils.toString (Apache Utils) avgt 10 300,019 ± 8,751 us/op
6. InputStreamReader and StringBuilder (JDK) avgt 10 347,616 ± 130,348 us/op
7. StringWriter and IOUtils.copy (Apache) avgt 10 352,791 ± 105,337 us/op
2. CharStreams (guava) avgt 10 420,137 ± 59,877 us/op
9. BufferedReader (JDK) avgt 10 632,028 ± 17,002 us/op
5. parallel Stream Api (Java 8) avgt 10 662,999 ± 46,199 us/op
4. Stream Api (Java 8) avgt 10 701,269 ± 82,296 us/op
10. BufferedInputStream, ByteArrayOutputStream avgt 10 740,837 ± 5,613 us/op
3. Scanner (JDK) avgt 10 751,417 ± 62,026 us/op
11. InputStream.read() and StringBuilder (JDK) avgt 10 2919,350 ± 1101,942 us/op
Graphs (performance tests depending on Input Stream length in Windows 7 system)
Performance test (Average Time) depending on Input Stream length in Windows 7 system:
length 182 546 1092 3276 9828 29484 58968
test8 0.38 0.938 1.868 4.448 13.412 36.459 72.708
test4 2.362 3.609 5.573 12.769 40.74 81.415 159.864
test5 3.881 5.075 6.904 14.123 50.258 129.937 166.162
test9 2.237 3.493 5.422 11.977 45.98 89.336 177.39
test6 1.261 2.12 4.38 10.698 31.821 86.106 186.636
test7 1.601 2.391 3.646 8.367 38.196 110.221 211.016
test1 1.529 2.381 3.527 8.411 40.551 105.16 212.573
test3 3.035 3.934 8.606 20.858 61.571 118.744 235.428
test2 3.136 6.238 10.508 33.48 43.532 118.044 239.481
test10 1.593 4.736 7.527 20.557 59.856 162.907 323.147
test11 3.913 11.506 23.26 68.644 207.591 600.444 1211.545
Related Topic
- Java – How to create an executable JAR with dependencies using Maven
- Java – How to use java.net.URLConnection to fire and handle HTTP requests
- Java – How to avoid Java code in JSP files, using JSP 2
- Java – Reading a plain text file in Java
- Java – How to convert a String to an int in Java
- Java – How to read a large text file line by line using Java
- Java – How to create a memory leak in Java
Best Answer
POI should be able to open both protected xls files (using org.apache.poi.hssf.record.crypt) and protected xlsx files (using org.apache.poi.poifs.crypt). Have you tried these?
If you're using HSSF (for a xls file), you need to set the password before opening the file. You do this with a call to:
After that, HSSF should be able to open your file.
For XSSF, you want something like:
.
Alternately, in newer versions of Apache POI, WorkbookFactory supports supplying the password when opening, so you can just do something like:
That will work for both HSSF and XSSF, picking the right one based on the format, and passing in the given password in the appropriate way for the format.