The *args and **kwargs syntax is a common idiom that lets a function accept an arbitrary number of arguments, as described in the section "More on Defining Functions" in the Python documentation.
The *args form collects all extra positional arguments into a tuple:
def foo(*args):
    for a in args:
        print(a)
foo(1)
# 1
foo(1,2,3)
# 1
# 2
# 3
The **kwargs form collects all keyword arguments that do not correspond to a formal parameter into a dictionary:
def bar(**kwargs):
    for a in kwargs:
        print(a, kwargs[a])
bar(name='one', age=27)
# name one
# age 27
Both idioms can be mixed with normal arguments to allow a combination of fixed and variable arguments:
def foo(kind, *args, **kwargs):
    pass
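For example (the call below is illustrative, not from the original), the fixed parameter binds first and the remaining arguments are split between args and kwargs:
def foo(kind, *args, **kwargs):
    print(kind, args, kwargs)

foo('demo', 1, 2, x=3)
# demo (1, 2) {'x': 3}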
It is also possible to use this the other way around:
def foo(a, b, c):
    print(a, b, c)

obj = {'b': 10, 'c': 'lee'}
foo(100, **obj)
# 100 10 lee
Another use of the *l idiom is to unpack argument lists when calling a function:
def foo(bar, lee):
    print(bar, lee)

l = [1, 2]
foo(*l)
# 1 2
In Python 3 it is possible to use *l on the left side of an assignment (Extended Iterable Unpacking), though it gives a list instead of a tuple in this context:
first, *rest = [1, 2, 3, 4]     # first = 1, rest = [2, 3, 4]
first, *l, last = [1, 2, 3, 4]  # first = 1, l = [2, 3], last = 4
Python 3 also adds new semantics (see PEP 3102):
def func(arg1, arg2, arg3, *, kwarg1, kwarg2):
    pass
Such a function accepts only three positional arguments; everything after * can only be passed as a keyword argument.
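For example, given the func defined above (the values here are illustrative):
func(1, 2, 3, kwarg1='a', kwarg2='b')  # OK
func(1, 2, 3, 'a', 'b')  # TypeError: func() takes 3 positional arguments but 5 were given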
Note:
- A Python dict, semantically used for keyword argument passing, is arbitrarily ordered. However, in Python 3.6, keyword arguments are guaranteed to remember insertion order: "The order of elements in **kwargs now corresponds to the order in which keyword arguments were passed to the function." - What's New In Python 3.6
- In fact, all dicts in CPython 3.6 remember insertion order as an implementation detail; this became standard in Python 3.7.
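A quick demonstration of this ordering guarantee (function and argument names are illustrative):
def show(**kwargs):
    print(list(kwargs))

show(b=1, a=2)
# ['b', 'a']  (insertion order preserved on Python 3.6+)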
This could be due to several reasons, but most likely you have a bottleneck when reading the training data. As your GPU processes a batch, it requires more data. Depending on your implementation, this can cause the GPU to wait for the CPU to load more data, resulting in lower GPU usage and a longer training time.
Try loading all the data into memory if it fits, or use a QueueRunner, which builds an input pipeline that reads data in the background. This reduces the time your GPU spends waiting for more data.
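As a rough illustration, here is a minimal sketch of a queue-based input pipeline using the TF 1.x API; the file name and feature spec are placeholder assumptions, not taken from the question:
import tensorflow as tf

# Placeholder file name and record layout -- substitute your own data.
filename_queue = tf.train.string_input_producer(['data_0.tfrecords'])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)
features = tf.parse_single_example(
    serialized,
    features={'image': tf.FixedLenFeature([], tf.string),
              'label': tf.FixedLenFeature([], tf.int64)})

# Batch in background threads so the GPU is not starved for data.
image_batch, label_batch = tf.train.shuffle_batch(
    [features['image'], features['label']],
    batch_size=32, capacity=2000, min_after_dequeue=1000)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # ... run training steps that consume image_batch / label_batch ...
    coord.request_stop()
    coord.join(threads)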
The Reading Data Guide on the TensorFlow website contains more information.
K.clear_session() is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you train adds nodes (potentially numbering in the thousands) to the graph. TensorFlow executes the entire graph whenever you (or Keras) call tf.Session.run() or tf.Tensor.eval(), so your models will become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing slowdown.
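A minimal sketch of this pattern (the toy model, data, and file names are assumptions for illustration):
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

x_train = np.random.rand(100, 10)  # toy stand-in data
y_train = np.random.rand(100, 1)

for units in [16, 32, 64]:  # e.g. a small hyperparameter search
    model = Sequential([Dense(units, activation='relu', input_shape=(10,)),
                        Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    model.fit(x_train, y_train, epochs=2, verbose=0)
    model.save('model_{}.h5'.format(units))
    K.clear_session()  # remove leftover graph nodes before building the next model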
Edit 21/06/19:

TensorFlow is lazy-evaluated by default. TensorFlow operations aren't evaluated immediately: creating a tensor or applying operations to it creates nodes in a dataflow graph. The results are calculated by evaluating the relevant parts of the graph in one go when you call tf.Session.run() or tf.Tensor.eval(). This is so TensorFlow can build an execution plan that allocates operations that can be performed in parallel to different devices. It can also fold adjacent nodes together or remove redundant ones (e.g. if you concatenated two tensors and later split them apart again unchanged). For more details, see https://www.tensorflow.org/guide/graphs

All of your TensorFlow models are stored in the graph as a series of tensors and tensor operations. The basic operation of machine learning is the tensor dot product: the output of a neural network is the dot product of the input matrix and the network weights. If you have a single-layer perceptron and 1,000 training samples, then each epoch creates at least 1,000 tensor operations. If you have 1,000 epochs, then your graph contains at least 1,000,000 nodes at the end, before taking into account preprocessing, postprocessing, and more complex models such as recurrent nets, encoder-decoders, attentional models, etc.
The problem was that eventually the graph would be too large to fit into video memory (6 GB in my case), so TF would shuttle parts of the graph from video to main memory and back. Eventually it would even get too large for main memory (12 GB) and start moving between main memory and the hard disk. Needless to say, this made things incredibly, and increasingly, slow as training went on. Before developing this save-model/clear-session/reload-model flow, I calculated that, at the per-epoch rate of slowdown I experienced, my model would have taken longer than the age of the universe to finish training.