Ramprakash

Neural Style Transfer

Most of us have experimented with photo editing, some professionally, and at the very least through mobile apps like Snapseed and PicsArt. There, we apply filters as effects to improve our images. Have you ever wondered what actually happens when we select one of those filters? In this post I will describe the mechanism and the process behind it, a method called Neural Style Transfer.

What is Neural Style Transfer?

Neural Style Transfer (NST) is an algorithmic technique that modifies a digital image or video so that it takes on the appearance, or visual style, of another image. NST algorithms use deep neural networks to perform this image transformation.

How are Convolutional Neural Networks used here?

CNNs are used to extract features from the style and content images, which are then combined to produce the final output. In essence, a CNN is made up of a series of convolutional operations that identify and separate the distinct aspects of an image for feature extraction and analysis. These operations form a network of convolutional and pooling layers that extract features and pass them to fully connected layers for classification.

Figure: basic structure of a CNN

VGG19 is used as the backbone in this algorithm. VGG19 is a pre-trained CNN with 19 weight layers (16 convolutional and 3 fully connected) that has been trained on more than a million images from the ImageNet dataset. The reason for using a pre-trained model is that it has already learned which features to extract from images, which in this case is useful for extracting features from the style and content images. The main goal of the algorithm is to produce an output image (G) that has the content of the content image (C) and the style of the style image (S). Therefore, the loss of the generated image (G) is calculated with respect to both the content image (C) and the style image (S).

With the above intuition in mind, let's define the loss of the final generated image.

Figure: NST working architecture

Content Loss

Content loss measures how different the generated image (G) is from the content image (C). It is calculated to make sure that the generated image keeps the same content as training progresses. Let C and G represent the content and generated images, and let C[l] and G[l] represent the feature maps extracted from those images at a chosen layer l. The content loss can be defined as

Content loss: L_content(C, G, l) = (1/2) Σᵢⱼ (C[l]ᵢⱼ − G[l]ᵢⱼ)², summed over all elements of the layer-l feature maps.
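A minimal NumPy sketch of the content loss above (the 1/2 factor follows the original formulation; implementations sometimes normalise differently):

```python
import numpy as np

def content_loss(content_feat, generated_feat):
    """Half the sum of squared differences between two feature maps
    taken from the same layer (any matching shape works)."""
    return 0.5 * np.sum((content_feat - generated_feat) ** 2)

# Identical features give zero loss; any difference is penalised quadratically.
print(content_loss(np.ones((2, 2)), np.ones((2, 2))))   # 0.0
print(content_loss(np.zeros((2, 2)), np.ones((2, 2))))  # 2.0
```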

Style Loss

Style loss measures how similar the style of the generated image is to that of the style image. The major difference from the content loss is that the style loss is calculated across several layers of the style and generated images rather than a single one. To find the style loss, we need to measure how the extracted features are correlated with each other, which is captured by the Gram matrix.

Gram matrix: Gram(F)ᵢⱼ = Σₖ Fᵢₖ Fⱼₖ, where F is a layer's feature map with each channel flattened into a row.

The Gram matrix represents the correlations between all the feature maps of a layer: each entry is the dot product between a pair of flattened feature maps. The style loss at a layer is then the normalised sum of squared differences between the Gram matrix of the style image and the Gram matrix of the generated image. This can be represented as,

Style loss at layer l: E_l = 1/(4 Nₗ² Mₗ²) Σᵢⱼ (Gram(G[l]) − Gram(S[l]))ᵢⱼ², where layer l has Nₗ feature maps of size Mₗ; the total style loss is a weighted sum of E_l over the chosen layers.
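The Gram matrix and the per-layer style loss can be sketched in NumPy as follows; the 1/(4·N²·M²) normalisation follows the Gatys et al. formulation, and the exact constants vary between implementations:

```python
import numpy as np

def gram_matrix(feat):
    """feat: (channels, height, width) feature map from one layer.
    Flatten each channel and take dot products between channel pairs."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T  # (c, c) matrix of channel correlations

def style_loss_layer(style_feat, gen_feat):
    """Normalised squared difference between the two Gram matrices."""
    c, h, w = style_feat.shape
    diff = gram_matrix(gen_feat) - gram_matrix(style_feat)
    return np.sum(diff ** 2) / (4 * c ** 2 * (h * w) ** 2)

# Identical features have identical Gram matrices, so the loss is zero.
f = np.random.rand(4, 8, 8)
print(style_loss_layer(f, f))  # 0.0
```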

Total Loss

Having defined the style and content losses, the total loss can be calculated as a weighted sum of the two,

Total loss: L(G) = α · L_content(C, G) + β · L_style(S, G)

where α and β are hyperparameters that define the weightage of the content and style costs in the generated image.
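As a one-line sketch, the weighted combination looks like this (the default α and β values below are arbitrary illustrations; in practice the α/β ratio is tuned per image pair):

```python
def total_loss(content_cost, style_cost, alpha=1.0, beta=1e3):
    """Weighted sum of the content and style costs."""
    return alpha * content_cost + beta * style_cost
```

A larger β/α ratio pushes the result toward the style image's textures; a smaller one preserves more of the content image.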

Once the total loss is defined, it is minimised by backpropagation: the gradient of the loss with respect to the generated image's pixels is computed, and the pixels are updated step by step, producing a mixture of the style and content images.
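To illustrate the minimisation step, here is a toy NumPy gradient-descent loop. Real NST implementations use automatic differentiation (typically Adam or L-BFGS on the VGG losses); this toy quadratic loss only shows that the generated image's pixels are the variable being updated.

```python
import numpy as np

target = np.array([1.0, 2.0, 3.0])  # stand-in for the "ideal" image
G = np.zeros(3)                     # generated "image", initialised blank
lr = 0.1                            # learning rate (illustrative value)

for _ in range(200):
    grad = G - target               # gradient of 0.5 * ||G - target||^2 w.r.t. G
    G -= lr * grad                  # gradient-descent update on the pixels

# After enough steps, G converges toward the target.
```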

Conclusion

In this post we saw that NST can be implemented with the VGG19 model. However, other models can also be used, provided the layers for feature extraction are chosen appropriately.

