I work with large HashMaps and ArrayLists. When I no longer need them in memory, I call myArray.clear(); to free up the memory.
When my colleague saw that line, he changed it to myArray = new ArrayList<>();. He agreed when I asked if he was doing it to let the garbage collector take care of it.
- Although I feel it's a fine change, it decreases readability. clear() tells the maintainer that the list is being emptied, whereas a cursory glance at new ArrayList<>() might make a person think a list is being initialized there for the first time.
- Is the performance improvement really worth it? I looked at the source code of ArrayList, and the fact that clear() iterates over the elements to assign null to them made me wonder why the memory couldn't be cleared by a quicker technique.
Implementation of clear():

public void clear() {
    modCount++;

    // clear to let GC do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;

    size = 0;
}
The one downside I see to using new ArrayList<>() is that a new backing array of contiguous memory would have to be allocated. Perhaps that would pose a problem only if there is not enough memory remaining before the garbage collector can reclaim the old one.
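To make the comparison concrete, here is a small, self-contained sketch of the two variants (the class name and the byte[] payloads are my own illustrative choices, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

public class ClearVsNew {
    public static void main(String[] args) {
        List<byte[]> myArray = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            myArray.add(new byte[1024]); // fill with some bulky elements
        }

        // Variant 1: clear() nulls each slot, so the byte[] objects
        // become collectible, but the list keeps its grown backing array.
        myArray.clear();
        System.out.println(myArray.size()); // 0

        // Variant 2: reassignment drops the whole old list, backing
        // array included, leaving all of it to the garbage collector.
        myArray = new ArrayList<>();
        System.out.println(myArray.isEmpty()); // true
    }
}
```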
Best Answer
It seems you missed one important difference between myArray.clear() and myArray = new ArrayList<>(): the first one preserves the capacity of the list, so the backing array itself is not freed. Only the objects the list's elements were referencing can be collected (as long as the list held the only references to those objects). So if you want to let the GC free the whole memory, better use myArray = new ArrayList<>(). Of course, the difference will probably be negligible if you immediately refill myArray with a similar number of elements as before.

Well, which performance improvement? It is not inherently clear which of the two approaches will be faster in your use case. The clear method may contain a loop, but if you create a new ArrayList that grows over time, not having a preallocated capacity will cause some reallocations, which might result in a measurable performance hit. So without measuring, you cannot tell beforehand which of the two approaches will be faster. For most real-world situations I would expect the difference to be irrelevant, but we do not know your use case; if performance is important for your case, profile where the bottleneck is, try both approaches, measure and compare them.
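In that spirit, a rough timing sketch might look like the following. This is a naive System.nanoTime() loop, not a proper JMH benchmark, so treat any numbers it prints as a crude indication only; the round and size constants are arbitrary assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseBenchmark {
    static final int ROUNDS = 200;
    static final int SIZE = 100_000;

    public static void main(String[] args) {
        // Reuse one list: clear() keeps the grown backing array,
        // so later rounds trigger no reallocation while refilling.
        List<Integer> reused = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int r = 0; r < ROUNDS; r++) {
            reused.clear();
            for (int i = 0; i < SIZE; i++) reused.add(i);
        }
        long reuseNanos = System.nanoTime() - t0;

        // Fresh list each round: starting from the default capacity
        // forces the backing array to be reallocated as it grows.
        List<Integer> fresh = null;
        long t1 = System.nanoTime();
        for (int r = 0; r < ROUNDS; r++) {
            fresh = new ArrayList<>();
            for (int i = 0; i < SIZE; i++) fresh.add(i);
        }
        long freshNanos = System.nanoTime() - t1;

        System.out.printf("clear()+refill: %d ms, new+refill: %d ms%n",
                reuseNanos / 1_000_000, freshNanos / 1_000_000);
    }
}
```

Which variant wins depends on the JVM, the element count, and GC pressure, which is exactly why measuring in your own environment is the only reliable answer.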