Java Strings – Why Is String Immutable in Java?

immutability java language-design programming-languages strings

I can't understand the reason for it. I use the String class all the time, like other developers, but whenever I modify its value, a new instance of String is created.

What might be the reason for the immutability of the String class in Java?

I know there are some alternatives like StringBuffer or StringBuilder. It's just curiosity.

Best Answer

Concurrency

Java was defined from the start with considerations of concurrency. As has often been mentioned, shared mutables are problematic. One thread can change a shared object behind the back of another thread without that thread being aware of it.

There are a host of multithreaded C++ bugs that have cropped up because of a shared string - where one module thought it was safe to change it when another module in the code had saved a pointer to it and expected it to stay the same.

The 'solution' to this is that every class makes a defensive copy of the mutable objects that are passed to it. For mutable strings, this is O(n) to make the copy. For immutable strings, making a copy is O(1) because it isn't a copy, it's the same object that can't change.

In a multithreaded environment, immutable objects can always be safely shared between threads. This leads to an overall reduction in memory usage and improves memory caching.
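As a rough illustration of that difference, here is a minimal sketch. The class names (MutableNameHolder, ImmutableNameHolder, DefensiveCopyDemo) are made up for this example and are not part of any library.

```java
import java.util.Objects;

// Hypothetical holder for mutable text: it has to copy to stay safe.
class MutableNameHolder {
    private final StringBuilder name;

    MutableNameHolder(StringBuilder name) {
        // Defensive copy: O(n) in the length of the text.
        this.name = new StringBuilder(name);
    }

    String name() {
        return name.toString();
    }
}

// Hypothetical holder for immutable text: keeping the reference is enough.
class ImmutableNameHolder {
    private final String name;

    ImmutableNameHolder(String name) {
        // No copy: storing the reference is O(1), and nobody can change the String.
        this.name = Objects.requireNonNull(name);
    }

    String name() {
        return name;
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("alice");
        MutableNameHolder m = new MutableNameHolder(sb);
        sb.append("-changed");              // the caller mutates its buffer later
        System.out.println(m.name());       // prints "alice" thanks to the copy

        String s = "alice";
        ImmutableNameHolder i = new ImmutableNameHolder(s);
        System.out.println(i.name() == s);  // prints "true": the same object is shared
    }
}
```

Without the copy in MutableNameHolder, the later append would have changed the holder's state as well.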

Security

Many times strings are passed around as arguments to constructors - network connections and protocols are the two that most easily come to mind. Being able to change this at an undetermined time later in the execution can lead to security issues (the function thought it was connecting to one machine, but was diverted to another, but everything in the object looks like it connected to the first... it's even the same string).

Java lets one use reflection - and the parameters for this are strings. The danger is that a string passed along to a method that reflects on it could be modified on the way there. This is very bad.
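A sketch of that "check, then use" problem, assuming a made-up host check. The classes UnsafeConnection and SafeConnection and the host names are hypothetical; no real networking API is involved.

```java
// Hypothetical connection that validates a mutable buffer but keeps the reference.
class UnsafeConnection {
    private final StringBuilder host;

    UnsafeConnection(StringBuilder host) {
        if (!host.toString().equals("trusted.example.com")) {
            throw new SecurityException("host not allowed: " + host);
        }
        this.host = host; // keeps the caller's mutable buffer
    }

    String target() {
        return host.toString();
    }
}

// Hypothetical connection that validates an immutable String.
class SafeConnection {
    private final String host;

    SafeConnection(String host) {
        if (!host.equals("trusted.example.com")) {
            throw new SecurityException("host not allowed: " + host);
        }
        this.host = host; // the checked value can never change
    }

    String target() {
        return host;
    }
}

class HostCheckDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("trusted.example.com");
        UnsafeConnection unsafe = new UnsafeConnection(sb); // passes the check
        sb.setLength(0);
        sb.append("evil.example.com");                      // changed after the check
        System.out.println(unsafe.target());                // prints "evil.example.com"

        String s = "trusted.example.com";
        SafeConnection safe = new SafeConnection(s);
        s = "evil.example.com";                             // only rebinds the local variable
        System.out.println(safe.target());                  // prints "trusted.example.com"
    }
}
```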

Keys to the Hash

The hash table is one of the most used data structures. The keys to the data structure are very often strings. Having immutable strings means that (as above) the hash table does not need to make a copy of the hash key each time. If strings were mutable and the hash table didn't make this copy, it would be possible for something to change the hash key at a distance.

The way Object works in Java is that everything has a hash code (accessed via the hashCode() method). Having an immutable string means that the hashCode can be cached. Considering how often Strings are used as keys to a hash, this provides a significant performance boost (rather than having to recalculate the hash code each time).
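To see what "changing the hash key at a distance" looks like, here is a small sketch that uses a mutable List<Character> as a stand-in for a mutable string. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Returns true if the entry can still be found after its key was mutated.
    static boolean foundAfterMutation() {
        Map<List<Character>, String> map = new HashMap<>();
        List<Character> key = new ArrayList<>();
        key.add('a');
        key.add('b');
        map.put(key, "value");

        key.add('c'); // the key's hash code changes while it sits in the map
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints "false": the entry is lost

        // A String key cannot be changed after insertion, so lookups keep working.
        Map<String, String> safe = new HashMap<>();
        safe.put("ab", "value");
        System.out.println(safe.containsKey("ab")); // prints "true"
    }
}
```

The entry is still in the first map (its size is 1), but it was filed under the old hash code, so neither the mutated key nor a fresh copy of the original key finds it.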
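The caching idea can be sketched with a small immutable class. CachedHashText is a hypothetical name, and this is only a simplified model of the approach, not the actual java.lang.String source.

```java
import java.util.Arrays;

// Hypothetical immutable text class that computes its hash code at most once
// (for non-empty text whose hash is not 0).
final class CachedHashText {
    private final char[] chars;
    private int hash; // 0 means "not computed yet"

    CachedHashText(String s) {
        this.chars = s.toCharArray(); // private copy, never modified afterwards
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && chars.length > 0) {
            // Same polynomial that String.hashCode() is documented to use.
            for (char c : chars) {
                h = 31 * h + c;
            }
            hash = h; // safe to cache because chars can never change
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashText
                && Arrays.equals(chars, ((CachedHashText) other).chars);
    }
}

class CachedHashDemo {
    public static void main(String[] args) {
        CachedHashText text = new CachedHashText("smiles");
        // First call computes the hash, second call returns the cached value.
        System.out.println(text.hashCode() == text.hashCode());     // prints "true"
        System.out.println(text.hashCode() == "smiles".hashCode()); // prints "true"
    }
}
```

If the characters could change, the cached value could go stale, and the class would have to recompute the hash on every call or track modifications.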

Substrings

By having the String be immutable, the underlying character array that backs the data structure is also immutable. This allows for certain optimizations on the substring method to be done (they aren't necessarily done - it also introduces the possibility of some memory leaks too).

If you do:

String foo = "smiles";
String bar = foo.substring(1,5);

The value of bar is 'mile'. However, both foo and bar can be backed by the same character array, reducing the instantiation of more character arrays or copying it - just using different start and end points within the string.

foo |    | (0, 6)
    v    v
    smiles
     ^  ^
bar  |  |  (1, 5)

Now, the downside of that (the memory leak) is that if one had a 1k long string and took the substring of the first and second character, it would also be backed by the 1k long character array. This array would remain in memory even if the original string that had a value of the entire character array was garbage collected.

One can see this in String from JDK 6b14 (the following code is from a GPL v2 source and used as an example)

   public String(char value[], int offset, int count) {
       if (offset < 0) {
           throw new StringIndexOutOfBoundsException(offset);
       }
       if (count < 0) {
           throw new StringIndexOutOfBoundsException(count);
       }
       // Note: offset or count might be near -1>>>1.
       if (offset > value.length - count) {
           throw new StringIndexOutOfBoundsException(offset + count);
       }
       this.offset = 0;
       this.count = count;
       this.value = Arrays.copyOfRange(value, offset, offset+count);
   }

   // Package private constructor which shares value array for speed.
   String(int offset, int count, char value[]) {
       this.value = value;
       this.offset = offset;
       this.count = count;
   }

   public String substring(int beginIndex, int endIndex) {
       if (beginIndex < 0) {
           throw new StringIndexOutOfBoundsException(beginIndex);
       }
       if (endIndex > count) {
           throw new StringIndexOutOfBoundsException(endIndex);
       }
       if (beginIndex > endIndex) {
           throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
       }
       return ((beginIndex == 0) && (endIndex == count)) ? this :
           new String(offset + beginIndex, endIndex - beginIndex, value);
   }

Note how the substring uses the package level String constructor that doesn't involve any copying of the array and would be much faster (at the expense of possibly keeping around some large arrays - though not duplicating large arrays either).

Do note that the above code is for Java 1.6. The way substring is implemented was changed in Java 1.7, as documented in "Changes to String internal representation made in Java 1.7.0_06" - the issue being the memory leak that I mentioned above. Java likely wasn't seen as being a language with lots of String manipulation, and so the performance boost for a substring was a good thing. Now, with huge XML documents stored in strings that are never collected, this becomes an issue... and thus the change: a substring no longer shares the underlying array with the original String, so that the larger character array may be collected more quickly.
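The trade-off between sharing and copying can be modeled with a small class. CharSlice and its methods (slice, compact, retainedChars) are hypothetical names for this sketch; it imitates the offset/count idea and is not the JDK implementation.

```java
import java.util.Arrays;

// Hypothetical immutable slice over a character array that may be shared.
final class CharSlice {
    private final char[] value;
    private final int offset;
    private final int count;

    CharSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private CharSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): shares the backing array, like the Java 1.6 substring.
    CharSlice slice(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("bad range");
        }
        return new CharSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    // O(n): copies only the visible characters, so the big array can be collected.
    CharSlice compact() {
        char[] copy = Arrays.copyOfRange(value, offset, offset + count);
        return new CharSlice(copy, 0, count);
    }

    // Size of the array this slice keeps alive.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class CharSliceDemo {
    public static void main(String[] args) {
        CharSlice foo = new CharSlice("smiles");
        CharSlice bar = foo.slice(1, 5);
        System.out.println(bar);                           // prints "mile"
        System.out.println(bar.retainedChars());           // prints "6": whole array kept alive
        System.out.println(bar.compact().retainedChars()); // prints "4": only what is needed
    }
}
```

Sharing is only safe here because nothing can modify the array after construction, which is the same property immutable Strings provide.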

Don't abuse the Stack

One could pass the value of the string around instead of the reference to the immutable string to avoid issues with mutability. However, with large strings, passing this on the stack would be... abusive to the system (putting entire xml documents as strings on the stack and then taking them off or continuing to pass them along...).

The possibility of deduplication

Granted, this wasn't an initial motivation for why Strings should be immutable, but when one is looking at the rationale for why immutable Strings are a good thing, this is certainly something to consider.

Anyone who has worked with Strings a bit knows that they can suck memory. This is especially true when you're doing things like pulling data from databases that sticks around for a while. Many times with these strings, they are the same string over and over again (once for each row).

Many large-scale Java applications are currently bottlenecked on memory. Measurements have shown that roughly 25% of the Java heap live data set in these types of applications is consumed by String objects. Further, roughly half of those String objects are duplicates, where duplicates means string1.equals(string2) is true. Having duplicate String objects on the heap is, essentially, just a waste of memory. ...

With Java 8 update 20, JEP 192 (motivation quoted above) was implemented to address this. Without getting into the details of how string deduplication works, it is essential that the Strings themselves are immutable. You can't deduplicate StringBuilders because they can change and you don't want someone changing something from under you. Immutable Strings (related to that String pool) mean that you can go through and, if you find two strings that are the same, point both String objects at a single backing character array and let the garbage collector consume the newly unused duplicate array.
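To make "duplicates" concrete, here is a small sketch showing two String objects that are equal but distinct. It uses String.intern(), which is a manual and different mechanism from the garbage-collector-based deduplication of JEP 192; it is shown only because it relies on the same immutability guarantee. The class name is made up.

```java
class DuplicateStringsDemo {
    public static void main(String[] args) {
        // Two separate objects with the same contents, as when the same
        // value is read once per database row.
        String a = new String("London");
        String b = new String("London");

        System.out.println(a.equals(b));              // prints "true": same contents
        System.out.println(a == b);                   // prints "false": two objects on the heap
        System.out.println(a.intern() == b.intern()); // prints "true": one canonical instance
    }
}
```

The collector-based deduplication itself is opt-in: with the G1 collector it is enabled with the -XX:+UseStringDeduplication JVM flag and needs no code changes.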

Other languages

Objective-C (which predates Java) has NSString and NSMutableString.

C# and .NET made the same design choice of the default string being immutable.

Lua strings are also immutable.

Python as well.

Historically, Lisp, Scheme, and Smalltalk all intern their symbols (string-like names) and thus have them be immutable. More modern dynamic languages often use strings in some way that requires that they be immutable (it may not be a String, but it is immutable).

Conclusion

These design considerations have been made again and again in a multitude of languages. It is the general consensus that immutable strings, for all of their awkwardness, are better than the alternatives and lead to better code (fewer bugs) and faster executables overall.
