Java Garbage Collection – Readability vs Performance in Data Structure Management

Tags: array, garbage-collection, java

I work with large HashMaps and ArrayLists. When I no longer need them in memory, I call myArray.clear(); to free up the memory.

When my colleague saw that line, he changed it to myArray = new ArrayList<>();. He agreed when I asked if he was doing it to let the garbage collector take care of it.
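A minimal sketch of the two variants side by side, assuming Java 9+ (for List.of) and String elements. The class and method names are hypothetical. The sketch also shows one practical consequence: reassigning the variable only makes the old list collectable if nothing else still refers to it.

```java
import java.util.ArrayList;
import java.util.List;

public class ClearVsReassign {
    /** Original variant: empty the existing list object in place. */
    static int aliasSizeAfterClear() {
        List<String> myArray = new ArrayList<>(List.of("a", "b", "c"));
        List<String> alias = myArray;   // a second reference to the same list object
        myArray.clear();
        return alias.size();            // 0: alias sees the same, now-empty list
    }

    /** Colleague's variant: point the variable at a brand-new list. */
    static int aliasSizeAfterReassign() {
        List<String> myArray = new ArrayList<>(List.of("a", "b", "c"));
        List<String> alias = myArray;
        myArray = new ArrayList<>();    // only this variable now refers to the new, empty list
        return alias.size();            // 3: the old list is still reachable via alias, so it cannot be collected yet
    }

    public static void main(String[] args) {
        System.out.println(aliasSizeAfterClear());
        System.out.println(aliasSizeAfterReassign());
    }
}
```

If the list is shared, the two lines are therefore not interchangeable: clear() affects every holder of the reference, while reassignment affects only the one variable.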

  1. Although I find it neat, I feel it decreases readability. clear() tells the maintainer that the array is being cleared, whereas at a cursory glance new ArrayList<>() might make a reader think the array is being initialized there for the first time.
  2. Is the performance improvement really worth it? I looked at the source code of ArrayList, and the fact that it iterates over the elements to assign null to each one made me wonder why it couldn't free the memory with a quicker technique.

Implementation of "clear":

    public void clear() {
        modCount++;

        // clear to let GC do its work
        for (int i = 0; i < size; i++) {
            elementData[i] = null;
        }

        size = 0;
    }
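To make the effect of that loop concrete, here is a deliberately simplified, hypothetical ToyArrayList. It is not the JDK class; its initial capacity of 10 and roughly 1.5x growth only loosely mirror ArrayList. Its clear() follows the same idea as the excerpt: the element slots are nulled so the elements can be collected, but the backing array keeps its length.

```java
import java.util.Arrays;

/** Simplified stand-in for java.util.ArrayList (hypothetical, for illustration only). */
public class ToyArrayList {
    private Object[] elementData = new Object[10];
    private int size = 0;

    public void add(Object o) {
        if (size == elementData.length) {
            // grow by about 1.5x, similar in spirit to ArrayList's growth policy
            elementData = Arrays.copyOf(elementData, elementData.length + (elementData.length >> 1));
        }
        elementData[size++] = o;
    }

    public void clear() {
        // same idea as the excerpt: drop references so the *elements* become collectable
        for (int i = 0; i < size; i++) {
            elementData[i] = null;
        }
        size = 0;
        // note: elementData itself is NOT shrunk, so its capacity is retained
    }

    public int size() { return size; }

    public int capacity() { return elementData.length; }

    public static void main(String[] args) {
        ToyArrayList list = new ToyArrayList();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        list.clear();
        System.out.println("size=" + list.size() + ", capacity=" + list.capacity());
    }
}
```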

The one downside I see to using new ArrayList<>() is that a new block of contiguous memory has to be allocated. Perhaps that would only be a problem if there is not enough memory left before the garbage collector can clean up the old list.

Best Answer

It seems you missed one important difference between myArray.clear() and myArray = new ArrayList<>(): the first one preserves the capacity of the list, so it does not free the memory of the backing array itself. Only the memory of the objects your elements were referencing will be freed (as long as the list held the only references to those objects).
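If you want to keep the same list object (for example because other code holds a reference to it) but still release the large backing array, ArrayList also offers trimToSize(), which is a different technique from the two discussed here. A minimal sketch; clearAndRelease is a hypothetical helper name, and trimToSize() is declared on ArrayList, not on the List interface.

```java
import java.util.ArrayList;

public class ClearAndTrim {
    /** Hypothetical helper: empties the list in place, then asks it to shrink its backing array. */
    static <T> void clearAndRelease(ArrayList<T> list) {
        list.clear();      // drops the references to the elements
        list.trimToSize(); // trims capacity to the current size (0), so the old backing array can be reclaimed
    }

    public static void main(String[] args) {
        ArrayList<Integer> myArray = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            myArray.add(i);
        }
        clearAndRelease(myArray);
        System.out.println(myArray.isEmpty());
        myArray.add(42); // the list is still usable and grows again on demand
        System.out.println(myArray.size());
    }
}
```

The trade-off is the same as with reassignment: if the list is refilled soon afterwards, it has to grow again from a small capacity.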

So if you want to let the GC free all of the memory, you are better off using myArray = new ArrayList<>(). Of course, the difference will probably be negligible if you immediately fill myArray with a similar number of elements as before.
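When you do refill with a similar number of elements, one middle ground is to replace the list but pass the old size as an initial capacity, so the old backing array can be collected while the new list does not have to grow step by step. A minimal sketch; freshListSizedLike is a hypothetical helper, and whether the capacity hint helps measurably depends on your workload.

```java
import java.util.ArrayList;
import java.util.List;

public class RefillWithCapacity {
    /**
     * Hypothetical helper: returns a new, empty ArrayList whose initial capacity
     * matches the old list's size. The old list becomes collectable once the
     * caller drops its reference to it.
     */
    static <T> List<T> freshListSizedLike(List<T> old) {
        return new ArrayList<>(old.size());
    }

    public static void main(String[] args) {
        List<Integer> myArray = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            myArray.add(i);
        }

        // Drop the old list (and its backing array) but keep a capacity hint for the refill.
        myArray = freshListSizedLike(myArray);
        for (int i = 0; i < 50_000; i++) {
            myArray.add(i); // with enough initial capacity, no grow-and-copy step should be needed
        }
        System.out.println(myArray.size());
    }
}
```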

Is the performance improvement really worth it?

Well, which performance improvement? It is not obvious which of the two approaches will be faster in your use case. The clear method contains a loop, but if you create a new ArrayList that grows over time, not having a preallocated capacity will cause repeated reallocation and copying, which might result in a measurable performance hit. So without measuring, you cannot tell beforehand which of the two approaches will be faster. For most real-world situations I would expect the difference to be irrelevant, but we do not know your use case; if performance is important for your case, profile to find where the bottleneck is, try different approaches, and measure and compare them.
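As a starting point for "measure and compare", here is a deliberately naive timing sketch using System.nanoTime(). The element count, round count and class name are arbitrary assumptions. Hand-written loops like this are easily distorted by JIT compilation and GC timing, so for a real decision a dedicated harness such as JMH, run against your actual data, is the more trustworthy tool.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveClearVsNewTiming {
    static final int ELEMENTS = 200_000; // assumed workload size
    static final int ROUNDS = 10;        // assumed number of fill/discard cycles

    /** Fills the list with ELEMENTS integers. */
    static void fill(List<Integer> list) {
        for (int i = 0; i < ELEMENTS; i++) {
            list.add(i);
        }
    }

    /** Repeatedly fills one list and empties it with clear(); returns elapsed nanoseconds. */
    static long timeClear() {
        List<Integer> myArray = new ArrayList<>();
        long start = System.nanoTime();
        for (int r = 0; r < ROUNDS; r++) {
            fill(myArray);
            myArray.clear(); // capacity from the previous round is reused
        }
        return System.nanoTime() - start;
    }

    /** Repeatedly fills a list and replaces it with a new one; returns elapsed nanoseconds. */
    static long timeReassign() {
        List<Integer> myArray = new ArrayList<>();
        long start = System.nanoTime();
        for (int r = 0; r < ROUNDS; r++) {
            fill(myArray);
            myArray = new ArrayList<>(); // each round starts from the default capacity and grows again
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // crude warm-up so the hot loops are likely JIT-compiled before the timed runs
        timeClear();
        timeReassign();
        System.out.printf("clear():         %d ms%n", timeClear() / 1_000_000);
        System.out.printf("new ArrayList(): %d ms%n", timeReassign() / 1_000_000);
    }
}
```

Treat any numbers it prints as a rough hint for your machine and JVM, not as a general verdict on either approach.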