Python – How to get reproducible results in keras

kerasnumpypythontheano

I get different results (test accuracy) every time I run the imdb_lstm.py example from Keras framework (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py)
The code contains np.random.seed(1337) in the top, before any keras imports. It should prevent it from generating different numbers for every run. What am I missing?

UPDATE: How to repro:

  1. Install Keras (http://keras.io/)
  2. Execute https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py a few times. It will train the model and output test accuracy.
    Expected result: Test accuracy is the same on every run.
    Actual result: Test accuracy is different on every run.

UPDATE2: I'm running it on Windows 8.1 with MinGW/msys, module versions:
theano 0.7.0
numpy 1.8.1
scipy 0.14.0c1

UPDATE3: I narrowed the problem down a bit. If I run the example with GPU (set theano flag device=gpu0) then I get different test accuracy every time, but if I run it on CPU then everything works as expected. My graphics card: NVIDIA GeForce GT 635)

Best Answer

You can find the answer at the Keras docs: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development.

In short, to be absolutely sure that you will get reproducible results with your python script on one computer's/laptop's CPU then you will have to do the following:

  1. Set the PYTHONHASHSEED environment variable at a fixed value
  2. Set the python built-in pseudo-random generator at a fixed value
  3. Set the numpy pseudo-random generator at a fixed value
  4. Set the tensorflow pseudo-random generator at a fixed value
  5. Configure a new global tensorflow session

Following the Keras link at the top, the source code I am using is the following:

# Seed value
# Apparently you may use different seed values at each stage
seed_value= 0

# 1. Set the `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set the `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set the `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.random.set_seed(seed_value)
# for later versions: 
# tf.compat.v1.set_random_seed(seed_value)

# 5. Configure a new global `tensorflow` session
from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# for later versions:
# session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
# sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
# tf.compat.v1.keras.backend.set_session(sess)

It is needless to say that you do not have to to specify any seed or random_state at the numpy, scikit-learn or tensorflow/keras functions that you are using in your python script exactly because with the source code above we set globally their pseudo-random generators at a fixed value.