The *args and **kwargs syntax is a common idiom that lets a function accept an arbitrary number of arguments, as described in the section "More on Defining Functions" in the Python documentation.
The *args form collects all extra positional arguments into a tuple:
def foo(*args):
    for a in args:
        print(a)
foo(1)
# 1
foo(1,2,3)
# 1
# 2
# 3
The **kwargs form collects all keyword arguments that do not correspond to a formal parameter into a dictionary:
def bar(**kwargs):
    for a in kwargs:
        print(a, kwargs[a])
bar(name='one', age=27)
# name one
# age 27
Both idioms can be mixed with normal arguments to allow a combination of fixed and variable arguments:
def foo(kind, *args, **kwargs):
    pass
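For example (the call below is illustrative, not from the original), the fixed parameter binds first and the remaining arguments are split between args and kwargs:
def foo(kind, *args, **kwargs):
    print(kind, args, kwargs)

foo('demo', 1, 2, x=3)
# demo (1, 2) {'x': 3}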
It is also possible to use this the other way around:
def foo(a, b, c):
    print(a, b, c)

obj = {'b': 10, 'c': 'lee'}
foo(100, **obj)
# 100 10 lee
Another use of the *l idiom is to unpack argument lists when calling a function:
def foo(bar, lee):
    print(bar, lee)

l = [1, 2]
foo(*l)
# 1 2
In Python 3 it is possible to use *l on the left side of an assignment (Extended Iterable Unpacking), though it gives a list instead of a tuple in this context:
first, *rest = [1, 2, 3, 4]     # first = 1, rest = [2, 3, 4]
first, *l, last = [1, 2, 3, 4]  # first = 1, l = [2, 3], last = 4
Python 3 also adds new semantics (see PEP 3102):
def func(arg1, arg2, arg3, *, kwarg1, kwarg2):
    pass
Such a function accepts only three positional arguments; everything after * can only be passed as a keyword argument.
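For example, given the func defined above (the values here are illustrative):
func(1, 2, 3, kwarg1='a', kwarg2='b')  # OK
func(1, 2, 3, 'a', 'b')  # TypeError: func() takes 3 positional arguments but 5 were given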
Note:
- A Python dict, semantically used for keyword argument passing, is arbitrarily ordered. However, in Python 3.6, keyword arguments are guaranteed to remember insertion order: "The order of elements in **kwargs now corresponds to the order in which keyword arguments were passed to the function." - What's New In Python 3.6
- In fact, all dicts in CPython 3.6 remember insertion order as an implementation detail; this became standard in Python 3.7.
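A quick demonstration of this ordering guarantee (function and argument names are illustrative):
def show(**kwargs):
    print(list(kwargs))

show(b=1, a=2)
# ['b', 'a']  (insertion order preserved on Python 3.6+)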
This could be due to several reasons, but most likely you have a bottleneck when reading the training data. As your GPU processes a batch, it requires more data. Depending on your implementation, this can cause the GPU to wait for the CPU to load more data, resulting in lower GPU usage and a longer training time.
Try loading all the data into memory if it fits, or use a QueueRunner, which builds an input pipeline that reads data in the background. This reduces the time your GPU spends waiting for more data.
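As a rough illustration, here is a minimal sketch of a queue-based input pipeline using the TF 1.x API; the file name and feature spec are placeholder assumptions, not taken from the question:
import tensorflow as tf

# Placeholder file name and record layout -- substitute your own data.
filename_queue = tf.train.string_input_producer(['data_0.tfrecords'])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)
features = tf.parse_single_example(
    serialized,
    features={'image': tf.FixedLenFeature([], tf.string),
              'label': tf.FixedLenFeature([], tf.int64)})

# Batch in background threads so the GPU is not starved for data.
image_batch, label_batch = tf.train.shuffle_batch(
    [features['image'], features['label']],
    batch_size=32, capacity=2000, min_after_dequeue=1000)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # ... run training steps that consume image_batch / label_batch ...
    coord.request_stop()
    coord.join(threads)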
The Reading Data Guide on the TensorFlow website contains more information.
K.clear_session() is useful when you're creating multiple models in succession, such as during hyperparameter search or cross-validation. Each model you train adds nodes (potentially numbering in the thousands) to the graph. TensorFlow executes the entire graph whenever you (or Keras) call tf.Session.run() or tf.Tensor.eval(), so your models will become slower and slower to train, and you may also run out of memory. Clearing the session removes all the nodes left over from previous models, freeing memory and preventing slowdown.
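A minimal sketch of this pattern (the toy model, data, and file names are assumptions for illustration):
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

x_train = np.random.rand(100, 10)  # toy stand-in data
y_train = np.random.rand(100, 1)

for units in [16, 32, 64]:  # e.g. a small hyperparameter search
    model = Sequential([Dense(units, activation='relu', input_shape=(10,)),
                        Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    model.fit(x_train, y_train, epochs=2, verbose=0)
    model.save('model_{}.h5'.format(units))
    K.clear_session()  # remove leftover graph nodes before building the next model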
Edit 21/06/19:

TensorFlow is lazy-evaluated by default. TensorFlow operations aren't evaluated immediately: creating a tensor or applying operations to it creates nodes in a dataflow graph. The results are calculated by evaluating the relevant parts of the graph in one go when you call tf.Session.run() or tf.Tensor.eval(). This is so TensorFlow can build an execution plan that allocates operations that can be performed in parallel to different devices. It can also fold adjacent nodes together or remove redundant ones (e.g. if you concatenated two tensors and later split them apart again unchanged). For more details, see https://www.tensorflow.org/guide/graphs

All of your TensorFlow models are stored in the graph as a series of tensors and tensor operations. The basic operation of machine learning is the tensor dot product: the output of a neural network is the dot product of the input matrix and the network weights. If you have a single-layer perceptron and 1,000 training samples, then each epoch creates at least 1,000 tensor operations. If you have 1,000 epochs, then your graph contains at least 1,000,000 nodes at the end, before taking into account preprocessing, postprocessing, and more complex models such as recurrent nets, encoder-decoders, attentional models, etc.
The problem was that eventually the graph would be too large to fit into video memory (6 GB in my case), so TF would shuttle parts of the graph from video to main memory and back. Eventually it would even get too large for main memory (12 GB) and start moving between main memory and the hard disk. Needless to say, this made things incredibly, and increasingly, slow as training went on. Before developing this save-model/clear-session/reload-model flow, I calculated that, at the per-epoch rate of slowdown I experienced, my model would have taken longer than the age of the universe to finish training.