What regex pattern would need I to pass to java.lang.String.split()
to split a String into an Array of substrings using all whitespace characters (' '
, '\t'
, '\n'
, etc.) as delimiters?
Java – How to split a string with any whitespace chars as delimiters
javasplitstringwhitespace
Related Solutions
I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second returns a new vector.
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template <typename Out>
void split(const std::string &s, char delim, Out result) {
std::istringstream iss(s);
std::string item;
while (std::getline(iss, item, delim)) {
*result++ = item;
}
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, std::back_inserter(elems));
return elems;
}
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
std::vector<std::string> x = split("one:two::three", ':');
Summarize other answers I found 11 main ways to do this (see below). And I wrote some performance tests (see results below):
Ways to convert an InputStream to a String:
Using
IOUtils.toString
(Apache Utils)String result = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
Using
CharStreams
(Guava)String result = CharStreams.toString(new InputStreamReader( inputStream, Charsets.UTF_8));
Using
Scanner
(JDK)Scanner s = new Scanner(inputStream).useDelimiter("\\A"); String result = s.hasNext() ? s.next() : "";
Using Stream API (Java 8). Warning: This solution converts different line breaks (like
\r\n
) to\n
.String result = new BufferedReader(new InputStreamReader(inputStream)) .lines().collect(Collectors.joining("\n"));
Using parallel Stream API (Java 8). Warning: This solution converts different line breaks (like
\r\n
) to\n
.String result = new BufferedReader(new InputStreamReader(inputStream)) .lines().parallel().collect(Collectors.joining("\n"));
Using
InputStreamReader
andStringBuilder
(JDK)int bufferSize = 1024; char[] buffer = new char[bufferSize]; StringBuilder out = new StringBuilder(); Reader in = new InputStreamReader(stream, StandardCharsets.UTF_8); for (int numRead; (numRead = in.read(buffer, 0, buffer.length)) > 0; ) { out.append(buffer, 0, numRead); } return out.toString();
Using
StringWriter
andIOUtils.copy
(Apache Commons)StringWriter writer = new StringWriter(); IOUtils.copy(inputStream, writer, "UTF-8"); return writer.toString();
Using
ByteArrayOutputStream
andinputStream.read
(JDK)ByteArrayOutputStream result = new ByteArrayOutputStream(); byte[] buffer = new byte[1024]; for (int length; (length = inputStream.read(buffer)) != -1; ) { result.write(buffer, 0, length); } // StandardCharsets.UTF_8.name() > JDK 7 return result.toString("UTF-8");
Using
BufferedReader
(JDK). Warning: This solution converts different line breaks (like\n\r
) toline.separator
system property (for example, in Windows to "\r\n").String newLine = System.getProperty("line.separator"); BufferedReader reader = new BufferedReader( new InputStreamReader(inputStream)); StringBuilder result = new StringBuilder(); for (String line; (line = reader.readLine()) != null; ) { if (result.length() > 0) { result.append(newLine); } result.append(line); } return result.toString();
Using
BufferedInputStream
andByteArrayOutputStream
(JDK)BufferedInputStream bis = new BufferedInputStream(inputStream); ByteArrayOutputStream buf = new ByteArrayOutputStream(); for (int result = bis.read(); result != -1; result = bis.read()) { buf.write((byte) result); } // StandardCharsets.UTF_8.name() > JDK 7 return buf.toString("UTF-8");
Using
inputStream.read()
andStringBuilder
(JDK). Warning: This solution has problems with Unicode, for example with Russian text (works correctly only with non-Unicode text)StringBuilder sb = new StringBuilder(); for (int ch; (ch = inputStream.read()) != -1; ) { sb.append((char) ch); } return sb.toString();
Warning:
Solutions 4, 5 and 9 convert different line breaks to one.
Solution 11 can't work correctly with Unicode text
Performance tests
Performance tests for small String
(length = 175), url in github (mode = Average Time, system = Linux, score 1,343 is the best):
Benchmark Mode Cnt Score Error Units
8. ByteArrayOutputStream and read (JDK) avgt 10 1,343 ± 0,028 us/op
6. InputStreamReader and StringBuilder (JDK) avgt 10 6,980 ± 0,404 us/op
10. BufferedInputStream, ByteArrayOutputStream avgt 10 7,437 ± 0,735 us/op
11. InputStream.read() and StringBuilder (JDK) avgt 10 8,977 ± 0,328 us/op
7. StringWriter and IOUtils.copy (Apache) avgt 10 10,613 ± 0,599 us/op
1. IOUtils.toString (Apache Utils) avgt 10 10,605 ± 0,527 us/op
3. Scanner (JDK) avgt 10 12,083 ± 0,293 us/op
2. CharStreams (guava) avgt 10 12,999 ± 0,514 us/op
4. Stream Api (Java 8) avgt 10 15,811 ± 0,605 us/op
9. BufferedReader (JDK) avgt 10 16,038 ± 0,711 us/op
5. parallel Stream Api (Java 8) avgt 10 21,544 ± 0,583 us/op
Performance tests for big String
(length = 50100), url in github (mode = Average Time, system = Linux, score 200,715 is the best):
Benchmark Mode Cnt Score Error Units
8. ByteArrayOutputStream and read (JDK) avgt 10 200,715 ± 18,103 us/op
1. IOUtils.toString (Apache Utils) avgt 10 300,019 ± 8,751 us/op
6. InputStreamReader and StringBuilder (JDK) avgt 10 347,616 ± 130,348 us/op
7. StringWriter and IOUtils.copy (Apache) avgt 10 352,791 ± 105,337 us/op
2. CharStreams (guava) avgt 10 420,137 ± 59,877 us/op
9. BufferedReader (JDK) avgt 10 632,028 ± 17,002 us/op
5. parallel Stream Api (Java 8) avgt 10 662,999 ± 46,199 us/op
4. Stream Api (Java 8) avgt 10 701,269 ± 82,296 us/op
10. BufferedInputStream, ByteArrayOutputStream avgt 10 740,837 ± 5,613 us/op
3. Scanner (JDK) avgt 10 751,417 ± 62,026 us/op
11. InputStream.read() and StringBuilder (JDK) avgt 10 2919,350 ± 1101,942 us/op
Graphs (performance tests depending on Input Stream length in Windows 7 system)
Performance test (Average Time) depending on Input Stream length in Windows 7 system:
length 182 546 1092 3276 9828 29484 58968
test8 0.38 0.938 1.868 4.448 13.412 36.459 72.708
test4 2.362 3.609 5.573 12.769 40.74 81.415 159.864
test5 3.881 5.075 6.904 14.123 50.258 129.937 166.162
test9 2.237 3.493 5.422 11.977 45.98 89.336 177.39
test6 1.261 2.12 4.38 10.698 31.821 86.106 186.636
test7 1.601 2.391 3.646 8.367 38.196 110.221 211.016
test1 1.529 2.381 3.527 8.411 40.551 105.16 212.573
test3 3.035 3.934 8.606 20.858 61.571 118.744 235.428
test2 3.136 6.238 10.508 33.48 43.532 118.044 239.481
test10 1.593 4.736 7.527 20.557 59.856 162.907 323.147
test11 3.913 11.506 23.26 68.644 207.591 600.444 1211.545
Related Topic
- Python – How to split a list into evenly sized chunks
- Bash – How to split a string on a delimiter in Bash
- Javascript – How to make the first letter of a string uppercase in JavaScript
- Javascript – How to replace all occurrences of a string in JavaScript
- C# – Easiest way to split a string on newlines in .NET
- Javascript – How to check whether a string contains a substring in JavaScript
- C# – Split a string by another string in C#
- Java – How to split a string in Java
Best Answer
Something in the lines of
This groups all white spaces as a delimiter.
So if I have the string:
This should yield the strings
"Hello"
and"World"
and omit the empty space between the[space]
and the[tab]
.As VonC pointed out, the backslash should be escaped, because Java would first try to escape the string to a special character, and send that to be parsed. What you want, is the literal
"\s"
, which means, you need to pass"\\s"
. It can get a bit confusing.The
\\s
is equivalent to[ \\t\\n\\x0B\\f\\r]
.