I am relatively new to DL and Keras.
I am trying to implement perceptual loss using the pretrained VGG16 in Keras but am having some trouble. I already found that question but I am still struggling :/
A short explanation of what my network should do:
I have a CNN (subsequently called mainModel) that gets grayscale images as input (#TrainData, 512, 512, 1) and outputs grayscale images of the same size. The network should reduce artifacts in the images – but I think that is not so important for this question. Instead of using e.g. MSE as the loss function, I would like to implement the perceptual loss.
What I want to do (I hope I have properly understood the concept of perceptual loss):
I would like to append a lossModel (pretrained VGG16 with fixed params) to my mainModel. Then I would like to pass the output of the mainModel to the lossModel. In addition, I pass the label images (Y_train) to the lossModel. Finally, I compare the activations at a specific layer (e.g. block1_conv2) of the lossModel using e.g. MSE and use that as the loss function.
What I did so far:
Load in data and create the mainModel:
### Load data ###
import h5py

with h5py.File('./train_test_val.h5', 'r') as hf:
    X_train = hf['X_train'][:]
    Y_train = hf['Y_train'][:]
    X_test = hf['X_test'][:]
    Y_test = hf['Y_test'][:]
    X_val = hf['X_val'][:]
    Y_val = hf['Y_val'][:]
### Create Main Model ###
input_1 = Input((512, 512, 1))
conv0 = Conv2D(64, (3, 3), strides=(1, 1), activation='relu', use_bias=True, padding='same')(input_1)
...
mainModel = Model(inputs=input_1, outputs=output)
Create lossModel, append it to mainModel and fix params:
### Create Loss Model (VGG16) ###
lossModel = vgg16.VGG16(include_top=False, weights='imagenet', input_tensor=mainModel.output, input_shape=(512,512, 1))
lossModel.trainable=False
for layer in lossModel.layers:
    layer.trainable = False
Create new model including both networks and compile it
### Create new Model ###
fullModel = Model(inputs=mainModel.input, outputs=lossModel.output)
fullModel.compile(loss='mse', optimizer='adam',metrics=['mse','mae'])
fullModel.summary()
Adjust label images by passing them through the lossNetwork:
Y_train_lossModel = lossModel.predict(Y_train)
Fit the fullModel using the perceptual loss:
fullModel.fit(X_train, Y_train_lossModel, batch_size=32, epochs=5, validation_data=[X_val,Y_val])
Problems occurring:
- VGG16 wants to get inputs of shape (?, ?, 3) but my mainModel outputs a grayscale image (?, ?, 1)
- Some issue with appending the lossModel to the mainModel:
RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("conv2d_2/Relu:0", shape=(?, 512, 512, 3), dtype=float32) at layer "input_2". The following previous layers were accessed without issue: []
- How can I calculate the MSE at a specific layer's activation and not at the output of the lossModel?
Thank you so much for your help and sorry for the extremely long question 🙂
Best Answer
Number of channels
Well, the first problem is significant.
VGG models were made for color images with 3 channels... so, it's not quite the right model for your case. I'm not sure whether there are pretrained models for black & white images, but you should search for them.
A workaround for that, which I don't know if it will work well, is to make 3 copies of mainModel's output.
Graph disconnected
This means that nowhere in your code did you create a connection between the input and output of fullModel. You must connect the output of mainModel to the input of lossModel.
But first, let's prepare the VGG model for multiple outputs.
Preparing lossModel for multiple outputs
You must select which layers of the VGG model will be used to calculate the loss. If you use only the final output, it won't really be a good perceptual loss, because the final output is made more of concepts than of features.
So, after you select the layers, make a list of their indices or names:
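For example (these layer names are just an illustrative choice, not a prescription):

```python
# Names of the VGG16 layers whose activations will be compared.
# An illustrative selection; pick the layers that suit your task.
selected_layers = ['block1_conv2', 'block2_conv2', 'block3_conv3']
```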
Let's make a new model from VGG16, but with multiple outputs:
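A minimal sketch of such a multi-output model (written against tf.keras; with standalone Keras only the import paths differ, and the layer choice is again illustrative):

```python
from tensorflow.keras.applications import vgg16
from tensorflow.keras.models import Model

# Illustrative layer choice; use the layers you selected above
selected_layers = ['block1_conv2', 'block2_conv2']

vgg = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(512, 512, 3))

# One output per selected layer instead of only the final feature map
outputs = [vgg.get_layer(name).output for name in selected_layers]
lossModel = Model(inputs=vgg.input, outputs=outputs)

# The loss network stays frozen
lossModel.trainable = False
for layer in lossModel.layers:
    layer.trainable = False
```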
Joining the models
Now, here we create the connection between the two models.
We call the lossModel (as if it were a layer), taking the output of the mainModel as input. Then, with the graph entirely connected from the input of mainModel to the output of lossModel, we can create the fullModel.
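A sketch of the joining step, assuming tf.keras; a tiny one-layer stand-in replaces the real mainModel, and a 64x64 input keeps the example light (the real models use 512x512):

```python
from tensorflow.keras.applications import vgg16
from tensorflow.keras.layers import Concatenate, Conv2D, Input
from tensorflow.keras.models import Model

# Tiny stand-in for mainModel: grayscale in, grayscale out
inp = Input((64, 64, 1))
out = Conv2D(1, (3, 3), padding='same', activation='relu')(inp)
mainModel = Model(inp, out)

# Multi-output VGG loss model, built on its own independent Input
vgg = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(64, 64, 3))
selected_layers = ['block1_conv2', 'block2_conv2']  # illustrative choice
lossModel = Model(vgg.input, [vgg.get_layer(n).output for n in selected_layers])
lossModel.trainable = False

# Workaround for the channel mismatch: 3 copies of mainModel's output,
# then call lossModel as if it were a layer
rgb = Concatenate(axis=-1)([mainModel.output] * 3)
lossOut = lossModel(rgb)

fullModel = Model(inputs=mainModel.input, outputs=lossOut)
```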
Training
Take the predictions of this new lossModel, just as you did. But for the workaround, make the label images triple channel as well.
Make sure you make each layer of lossModel non-trainable before fullModel.compile().
If you want 'mse' for all outputs, just compile with loss='mse'. If you want a different loss for each layer, pass a list of losses instead.
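Putting the training steps together as one hedged sketch (tf.keras, toy 64x64 models and random dummy data standing in for the real 512x512 pipeline):

```python
import numpy as np
from tensorflow.keras.applications import vgg16
from tensorflow.keras.layers import Concatenate, Conv2D, Input
from tensorflow.keras.models import Model

# Rebuild the toy pipeline (64x64 stand-in for the real 512x512 models)
inp = Input((64, 64, 1))
mainModel = Model(inp, Conv2D(1, (3, 3), padding='same', activation='relu')(inp))

vgg = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(64, 64, 3))
selected_layers = ['block1_conv2', 'block2_conv2']  # illustrative choice

lossModel = Model(vgg.input, [vgg.get_layer(n).output for n in selected_layers])

# Freeze every VGG layer before compiling
lossModel.trainable = False
for layer in lossModel.layers:
    layer.trainable = False

rgb = Concatenate(axis=-1)([mainModel.output] * 3)
fullModel = Model(mainModel.input, lossModel(rgb))

# Same 'mse' for all outputs:
fullModel.compile(loss='mse', optimizer='adam')
# Or a different loss per output:
# fullModel.compile(loss=['mse', 'mae'], optimizer='adam')

# Targets: VGG activations of the (triple-channel) label images
X_train = np.random.rand(4, 64, 64, 1).astype('float32')  # dummy data
Y_train = np.random.rand(4, 64, 64, 1).astype('float32')
Y_train_lossModel = lossModel.predict(np.repeat(Y_train, 3, axis=-1))

fullModel.fit(X_train, Y_train_lossModel, batch_size=2, epochs=1, verbose=0)
```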
Additional considerations
Since VGG is supposed to work with images in the caffe format, you might want to add a few layers after mainModel to make the output suitable. It's not absolutely required, but it would get the best performance from VGG.
See how keras transforms an input image ranging from 0 to 255 into the caffe format here at line 15 or 44.
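For reference, the caffe-style transform mentioned above (RGB to BGR, plus ImageNet channel-mean subtraction) is what tf.keras's vgg16.preprocess_input applies; a quick sketch:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input

# Dummy batch of RGB images with values in [0, 255]
img = np.random.uniform(0, 255, (1, 64, 64, 3)).astype('float32')

# 'caffe' mode: flip RGB to BGR, then subtract the ImageNet channel means
pre = preprocess_input(img.copy())  # copy: preprocess_input may work in place
```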