TL;DR: We show how to make psychedelic animations from instabilities that are usually discarded in neural style transfer. This post builds on a remark from our recent paper Interactive Neural Style Transfer with Artists [KT], in which we questioned several simple evaluation aspects of neural style transfer methods. It is also our second series of interactive painting experiments in which style transfer outputs constantly influence a painter; see the other series here. See also our Medium post.

Written by Thomas Kerdreux and Louis Thiry.

The first frame of the video, a watercolor from my grandfather, is progressively stirred into a plethora of curvy and colorful patches. It then metamorphoses into a purplish phantasmal coral reef that is itself slowly submerged by an angry puce ocean. The water then calms down as the coral reef disappears, and the scene ends perfectly still. How is this psychedelic animation related to style transfer methods?

Neural style transfer methods are rendering techniques, mostly for images, that seek to stylize a content image with the style of another; see the figure below. More precisely, these algorithms are designed to extract a style representation from one image and a representation of the semantic content of another, and then cleverly construct a new picture from these.

Figure: Heard Island in Antarctica (content image), a painting by Maxime Maufra (style image), and the style transfer output using STROTSS [KS].

While designing new evaluation techniques for style transfer methods in [KT], we made a simple but crucial observation: a style transfer applied with the same image as both style and content should reasonably output that image itself. However, many style transfer algorithms do not satisfy this property, and no one ever bothered to enforce it. Here, we show how to leverage that instability to produce animations like the one above.

Figure: MST first, second, and third iterations.

Formally, a style transfer method is simply a function \(f\) that takes a style image \(s\) and a content image \(c\) and outputs a new image \(f(s,c)\). Our observation is that for some style transfer methods \(f\) and an initial image \(x_0\), the equality \(f(x_0, x_0) = x_0\) is not satisfied: the output adds a barely perceptible flicker, blur or blunder to the initial image \(x_0\). These instability patterns differ from one method to another but are experimentally the same when starting from different images \(x_0\).
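Concretely, one can test this stability property in a few lines. Below is a minimal sketch, assuming a hypothetical `style_transfer(style, content)` wrapper around whichever method is used (MST, WCT, ...), operating on \(H \times W \times 3\) float arrays in \([0, 1]\):

```python
import numpy as np

# Hypothetical wrapper around a neural style transfer method (MST, WCT, ...):
# it takes a style image and a content image as H x W x 3 float arrays in [0, 1]
# and returns the stylized output with the same shape.
from my_style_transfer import style_transfer  # assumed interface, not a real package

def fixed_point_gap(x0: np.ndarray) -> float:
    """Return the mean absolute difference between f(x0, x0) and x0."""
    out = style_transfer(style=x0, content=x0)
    return float(np.abs(out - x0).mean())
```

A perfectly stable method would return a gap of zero; the methods discussed here return a small but nonzero gap which, as we show next, compounds under iteration.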

Yet these effects are hardly perceptible. To better understand the phenomenon, we need to amplify them. We simply repeat the process: start from an initial image \(x_0\) and iterate the style transfer operation

\[x_{t+1} = f(x_t, x_t).\]
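In code, reusing the hypothetical `style_transfer` wrapper sketched above, this is a plain feedback loop that collects every iterate as a frame:

```python
def iterate_style_transfer(x0, n_steps=50):
    """Feed the output back as both style and content, n_steps times.

    Returns the trajectory [x_0, x_1, ..., x_{n_steps}], which will
    serve as the frames of the animation.
    """
    frames = [x0]
    x = x0
    for _ in range(n_steps):
        x = style_transfer(style=x, content=x)  # x_{t+1} = f(x_t, x_t)
        frames.append(x)
    return frames
```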

In the figure above, after a few iterations, the effects become perceptible and particularly stylish. For instance, with the MST style transfer method [MST] (with this code), the iterates become tessellated versions of the initial image: the instabilities amplify all the lines of the picture, and on portraits they reveal every wrinkle. With another algorithm like WCT [WCT] (with this code), the effects are different: the goblin is slowly dematerialized by the devilish style transfer instabilities, see the figure below.

Figure: WCT first, second, and fourth iterations.

So far, we have simply shown the outputs of the first few iterations. The animation above collects all the images of the sequence \((x_t)\). For many different pictures and methods, we observe this asymptotic type of divergence, which we name the psychedelic regime. Indeed, once the algorithm loses track of the initial image, it starts raving, feeding on its own slowly delusional outputs without ever going back to our reality! The raving differs from one method to another but experimentally seems not to depend on the initial image.
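To produce such an animation, it then suffices to write the collected trajectory to a GIF or video. A minimal sketch with imageio (the file name and frame rate are arbitrary choices):

```python
import imageio
import numpy as np

def save_animation(frames, path="psychedelic.gif"):
    """Write the trajectory (x_t) as an animated GIF."""
    # Convert float frames in [0, 1] to 8-bit images.
    frames_u8 = [np.clip(f * 255, 0, 255).astype(np.uint8) for f in frames]
    imageio.mimsave(path, frames_u8, duration=0.08)  # roughly 12 frames per second

# For instance: save_animation(iterate_style_transfer(x0, n_steps=200))
```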

This playfully shows what a machine can do when it forgets about the human inputs or the non-numerical reality. Metaphorically, this also happens in many practical uses of algorithms. For instance, collaborative-filtering recommender systems are trained on new data that come from humans interacting with the algorithm itself. We can no longer assess the choices humans would have made had they never been influenced by algorithms. We have lost this initial input!

[R] and [GJ] studied instabilities of style transfer methods in the case of real-time style transfer for videos: the style transfer output may differ significantly from one frame to the next even when the consecutive input frames are perceptibly the same. This results in an unpleasant flickering effect in style-transferred videos. As in the adversarial examples literature, the main focus there is to study the instabilities in order to detect, correct and remove them. Here we outlined instabilities stemming from another type of inconsistency, and took advantage of them.

Also, note that MST and WCT are feed-forward approaches to style transfer, i.e. the function \(f\) is a neural network [JA, GL, LW]. The first approach to neural style transfer, by contrast, was optimization-based [G]. In particular, when the same image is used as both style and content, that image is the global optimum of the loss, so the method satisfies \(f(x,x) = x\) if properly initialized. Actually, even when starting from a random initial image, we observed that the iterates converge to the initial image, i.e. to the global minimum of a non-convex loss. Note, though, that some optimization-based methods like STROTSS [KS] may still not satisfy this stability property because of randomization and of the re-parametrization of the image with its Laplacian pyramid.
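To see why, recall the schematic shape of the optimization-based loss of [G], where \(F^l\) denotes the feature maps at layer \(l\) of a pre-trained network and \(G^l\) the corresponding Gram matrices:

\[\mathcal{L}_{s,c}(x) = \alpha \sum_{l} \big\| F^l(x) - F^l(c) \big\|_2^2 + \beta \sum_{l} \big\| G^l(x) - G^l(s) \big\|_2^2.\]

When \(s = c = x_0\), both terms vanish at \(x = x_0\), so \(\mathcal{L}_{x_0,x_0}(x_0) = 0\) is the global minimum and the gradient there is zero: gradient descent initialized at \(x_0\) stays put, hence \(f(x_0, x_0) = x_0\).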

Finally, if you are interested in making your own psychedelic videos, the take-home message is that almost certainly any feed-forward neural style transfer approach will give a different psychedelic regime, following the recipe sketched above. Below we show one using the WCT method, and the first iterations obtained with the optimization-based STROTSS method (our code).

Figure: STROTSS first iteration, and two outputs several iterations later.

References

[CKT] Cabannes, V., Kerdreux, T., Thiry, L., Campana, T., & Ferrandes, C. (2019). Dialog on a Canvas with a Machine. Third Workshop of Creativity and Design at NeurIPS 2019. pdf

[JA] Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In European Conference on Computer Vision (pp. 694–711). Springer. pdf

[G] Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A Neural Algorithm of Artistic Style. pdf

[GL] Ghiasi, G., Lee, H., Kudlur, M., Dumoulin, V., & Shlens, J. (2017). Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylization Network. pdf

[GJ] Gupta, A., Johnson, J., Alahi, A., & Fei-Fei, L. (2017). Characterizing and Improving Stability in Neural Style Transfer. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4067–4076). pdf

[KT] Kerdreux, T., Thiry, L., & Kerdreux, E. (2020). Interactive Neural Style Transfer with Artists. pdf

[KS] Kolkin, N., Salavon, J., & Shakhnarovich, G. (2019). Style Transfer by Relaxed Optimal Transport and Self-Similarity. pdf

[LW] Li, C., & Wand, M. (2016). Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks. In European Conference on Computer Vision (pp. 702–716). Springer. pdf

[MST] Zhang, Y., Fang, C., Wang, Y., Wang, Z., Lin, Z., Fu, Y., & Yang, J. (2019). Multimodal Style Transfer via Graph Cuts. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5943–5951). pdf

[WCT] Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M. H. (2017). Universal Style Transfer via Feature Transforms. In Advances in Neural Information Processing Systems (pp. 386–396). pdf

[R] Risser, E., Wilmot, P., & Barnes, C. (2017). Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses. pdf