Python – How to convert a dense layer to an equivalent convolutional layer in Keras

conv-neural-networkkerasneural-networkpython

I would like to do something similar to the Fully Convolutional Networks paper (https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf) using Keras. I have a network which ends up flattening the feature maps and runs them through several dense layers. I would like to load the weights from a network like this into one where the dense layers are replaced with equivalent convolutions.

The VGG16 network which comes with Keras could be used as an example, where the 7x7x512 output of the last MaxPooling2D() is flattened and then goes into a Dense(4096) layer. In this case the Dense(4096) would be replaced with a 7x7x4096 convolution.

My real network is slightly different, there is a GlobalAveragePooling2D() layer instead of MaxPooling2D() and Flatten(). The output of GlobalAveragePooling2D() is a 2D tensor, and there is no need to additionally flatten it, so all the dense layers including the first would be replaced with 1×1 convolutions.

I've seen this question: Python keras how to transform a dense layer into a convolutional layer which seems very similar if not identical. The trouble is I can't get the suggested solution to work, because (a) I'm using TensorFlow as the backend, so the weights rearrangement/filter "rotation" isn't right, and (b) I can't figure out how to load the weights. Loading the old weights file into the new network with model.load_weights(by_name=True) doesn't work, because the names don't match (and even if they did the dimensions differ).

What should the rearrangement be when using TensorFlow?

How do I load the weights? Do I create one of each model, call model.load_weights() on both to load the identical weights and then copy some of the extra weights that need rearrangement?

Best Answer

Based on the answer of hars, I created this function to transform an arbitrary cnn into a fcn:

from keras.models import Sequential
from keras.layers.convolutional import Convolution2D
from keras.engine import InputLayer
import keras

def to_fully_conv(model):

    new_model = Sequential()

    input_layer = InputLayer(input_shape=(None, None, 3), name="input_new")

    new_model.add(input_layer)

    for layer in model.layers:

        if "Flatten" in str(layer):
            flattened_ipt = True
            f_dim = layer.input_shape

        elif "Dense" in str(layer):

            input_shape = layer.input_shape
            output_dim =  layer.get_weights()[1].shape[0]
            W,b = layer.get_weights()

            if flattened_ipt:
                shape = (f_dim[1],f_dim[2],f_dim[3],output_dim)
                new_W = W.reshape(shape)
                new_layer = Convolution2D(output_dim,
                                          (f_dim[1],f_dim[2]),
                                          strides=(1,1),
                                          activation=layer.activation,
                                          padding='valid',
                                          weights=[new_W,b])
                flattened_ipt = False

            else:
                shape = (1,1,input_shape[1],output_dim)
                new_W = W.reshape(shape)
                new_layer = Convolution2D(output_dim,
                                          (1,1),
                                          strides=(1,1),
                                          activation=layer.activation,
                                          padding='valid',
                                          weights=[new_W,b])


        else:
            new_layer = layer

        new_model.add(new_layer)

    return new_model

you can test the function like this:

model = keras.applications.vgg16.VGG16()
new_model = to_fully_conv(model)

Related Solutions

Python – input dimensions to a one dimensional convolutional network in keras

The reason why it look like this is that Keras designer intended to make 1-dimensional convolutional framework to be interpreted as a framework to deal with sequences. To fully understand the difference - try to imagine that you have a sequence of a multiple feature vectors. Then your output will be at least two dimensional - where first dimension is connected with time and other dimensions are connected with features. 1-dimensional convolutional framework was designed to in some way bold this time dimension and try to find the reoccuring patterns in data - rather than performing a classical multidimensional convolutional transformation.

In your case you must simply reshape your data to have shape (dataset_size, 101, 1) - because you have only one feature. It could be easly done using numpy.reshape function. To understand what does a new step mean - you must understand that you are doing the convolution over time - so you change the temporal structure of your data - which lead to new time-connected structure. In order to get your data to a format which is suitable for dense / static layers use keras.layers.flatten layer - the same as in classic convolutional case.

UPDATE: As I mentioned before - the first dimension of input is connected with time. So the difference between (1, 101) and (101, 1) lies in that in first case you have one time step with 101 features and in second - 101 timesteps with 1 feature. The problem which you mentioned after your first change has its origin in making pooling with size 2 on such input. Having only one timestep - you cannot pool any value on a time window of size 2 - simply because there is not enough timesteps to do that.

Python – Add dropout layers between pretrained dense layers in keras

I found an answer myself by using Keras functional API

from keras.applications import VGG16
from keras.layers import Dropout
from keras.models import Model

model = VGG16(weights='imagenet')

# Store the fully connected layers
fc1 = model.layers[-3]
fc2 = model.layers[-2]
predictions = model.layers[-1]

# Create the dropout layers
dropout1 = Dropout(0.85)
dropout2 = Dropout(0.85)

# Reconnect the layers
x = dropout1(fc1.output)
x = fc2(x)
x = dropout2(x)
predictors = predictions(x)

# Create a new model
model2 = Model(input=model.input, output=predictors)

model2 has the dropout layers as I wanted

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 3, 224, 224)   0                                            
____________________________________________________________________________________________________
block1_conv1 (Convolution2D)     (None, 64, 224, 224)  1792        input_1[0][0]                    
____________________________________________________________________________________________________
block1_conv2 (Convolution2D)     (None, 64, 224, 224)  36928       block1_conv1[0][0]               
____________________________________________________________________________________________________
block1_pool (MaxPooling2D)       (None, 64, 112, 112)  0           block1_conv2[0][0]               
____________________________________________________________________________________________________
block2_conv1 (Convolution2D)     (None, 128, 112, 112) 73856       block1_pool[0][0]                
____________________________________________________________________________________________________
block2_conv2 (Convolution2D)     (None, 128, 112, 112) 147584      block2_conv1[0][0]               
____________________________________________________________________________________________________
block2_pool (MaxPooling2D)       (None, 128, 56, 56)   0           block2_conv2[0][0]               
____________________________________________________________________________________________________
block3_conv1 (Convolution2D)     (None, 256, 56, 56)   295168      block2_pool[0][0]                
____________________________________________________________________________________________________
block3_conv2 (Convolution2D)     (None, 256, 56, 56)   590080      block3_conv1[0][0]               
____________________________________________________________________________________________________
block3_conv3 (Convolution2D)     (None, 256, 56, 56)   590080      block3_conv2[0][0]               
____________________________________________________________________________________________________
block3_pool (MaxPooling2D)       (None, 256, 28, 28)   0           block3_conv3[0][0]               
____________________________________________________________________________________________________
block4_conv1 (Convolution2D)     (None, 512, 28, 28)   1180160     block3_pool[0][0]                
____________________________________________________________________________________________________
block4_conv2 (Convolution2D)     (None, 512, 28, 28)   2359808     block4_conv1[0][0]               
____________________________________________________________________________________________________
block4_conv3 (Convolution2D)     (None, 512, 28, 28)   2359808     block4_conv2[0][0]               
____________________________________________________________________________________________________
block4_pool (MaxPooling2D)       (None, 512, 14, 14)   0           block4_conv3[0][0]               
____________________________________________________________________________________________________
block5_conv1 (Convolution2D)     (None, 512, 14, 14)   2359808     block4_pool[0][0]                
____________________________________________________________________________________________________
block5_conv2 (Convolution2D)     (None, 512, 14, 14)   2359808     block5_conv1[0][0]               
____________________________________________________________________________________________________
block5_conv3 (Convolution2D)     (None, 512, 14, 14)   2359808     block5_conv2[0][0]               
____________________________________________________________________________________________________
block5_pool (MaxPooling2D)       (None, 512, 7, 7)     0           block5_conv3[0][0]               
____________________________________________________________________________________________________
flatten (Flatten)                (None, 25088)         0           block5_pool[0][0]                
____________________________________________________________________________________________________
fc1 (Dense)                      (None, 4096)          102764544   flatten[0][0]                    
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 4096)          0           fc1[0][0]                        
____________________________________________________________________________________________________
fc2 (Dense)                      (None, 4096)          16781312    dropout_1[0][0]                  
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 4096)          0           fc2[1][0]                        
____________________________________________________________________________________________________
predictions (Dense)              (None, 1000)          4097000     dropout_2[0][0]                  
====================================================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
____________________________________________________________________________________________________

Best Answer

Related Solutions

Python – input dimensions to a one dimensional convolutional network in keras

Python – Add dropout layers between pretrained dense layers in keras

Related Topic