A few weeks back I decided I wanted a cartoon version of my profile picture.
Sadly, as I'm not much of an artist, drawing it myself was out of the question. So I set about the only other logical course of action...
I trained a neural network to do it for me!
Let's get this out of the way - if you'd like a cartoon version of your profile picture, here's how:
Tweet at me (@harrison_g_reid) and include "cartoonify me" somewhere in the text of the tweet.
Within a few minutes, my (hopefully?) dependable twitter bot will reply with a cartoon-ified version of your profile picture.
NOTE: Since this was posted, the twitter bot has been disabled because 💸, but I've put it online so you can try it out here
I should warn you, the results are... mixed. But it's funnier when it does a terrible job anyway, so 🤷‍♂️. Here's a GIF demo:
Read on if you want to learn how I built it...
Finding a Model (CartoonGAN):
The first thing I did after deciding that this would be a fun project was google around to see if there were any existing libraries I could use.
As usual, the open source world did not disappoint! I soon came across a tensorflow implementation of the CartoonGAN generative adversarial neural network.
mnicnc404 / CartoonGan-tensorflow
Generate your own cartoon-style images with CartoonGAN (CVPR 2018), powered by TensorFlow 2.0 Alpha.
The GitHub repo has some really cool examples of converting images and GIFs into anime-style cartoons - I recommend checking it out.
But this wasn't quite the style I was after. I wanted something a little more comic-book style - heavy black lines & flat colors. I wanted it to look like Archer!
Fortunately, the repo contains some pretty detailed instructions on how to train the network on your own training data.
So, I set about gathering a lot of images.
Gathering Data:
To train CartoonGAN, I would need two sets of images:
A large set of real life images of human faces.
An equally large set of cartoon faces (from Archer).
It was relatively easy to find a good dataset of human faces. I found the VGGFace2 face dataset, which is an enormous dataset, and far exceeded my needs.
Of course, there's no dataset of faces from Archer available, so I'd need to create my own.
Since I was aiming for a dataset of about 3500 images, there was no way I could realistically do this manually.
It took a little creativity, but I managed to mostly automate this. It basically ended up as a four-stage process:

1. Using ffmpeg, extract a frame for every 4 seconds of video, for every episode of the first season of Archer. (If you're interested, the ffmpeg command to do this for a single video is: `ffmpeg -i video.mov -r 0.25 video-images/%04d.png`.)
2. Detect the location of all the faces in every frame using facedetect. Yes, this works surprisingly well on cartoon faces!
3. Crop images for each located face using Jimp (a sketch of stages 2 and 3 follows this list).
4. Manually check the extracted images, and remove all the weird things that had incorrectly been identified as faces.
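Here's a rough sketch of how stages 2 and 3 can be glued together in Node. Treat it as illustrative only - in particular, it assumes facedetect prints one "x y width height" line per detected face, and that you're on the Jimp v0.x API:

```js
// Find faces with the facedetect CLI, then crop them out with Jimp.
const { execFile } = require('child_process');
const Jimp = require('jimp');

// Run facedetect and parse its output (assumed: one "x y w h" line per face)
function detectFaces(imagePath) {
  return new Promise((resolve, reject) => {
    execFile('facedetect', [imagePath], (err, stdout) => {
      if (err) return reject(err);
      const faces = stdout
        .trim()
        .split('\n')
        .filter(Boolean)
        .map((line) => line.split(/\s+/).map(Number)); // [x, y, w, h]
      resolve(faces);
    });
  });
}

// Crop each detected face to its own image file
async function cropFaces(imagePath, outDir) {
  const faces = await detectFaces(imagePath);
  for (const [i, [x, y, w, h]] of faces.entries()) {
    const image = await Jimp.read(imagePath);
    image.crop(x, y, w, h);
    await image.writeAsync(`${outDir}/face-${i}.png`);
  }
}
```

Loop that over every extracted frame and you end up with a directory full of (mostly) faces, ready for the manual cleanup pass in stage 4.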
The end result was a set of ~3700 images. Just about every face from the first season of Archer:
Now we're talking.
Training the Model:
This was the easy part - it basically involved cloning the CartoonGAN repo mentioned above, copying the images to the correct directory, and running the Python script as per the instructions in the repo.
It was a real workout for my computer though - it took several days of running the training in the background to make it through 30 epochs.
Here's a gif of the training progress over the first few epochs.
Running it on Node:
If you're a JavaScript developer and you haven't tried TensorFlow.js yet, get amongst it. You don't really need to know all that much about Machine Learning to make use of existing models, and build some cool stuff.
In any case, the Node APIs for TensorFlow.js let you directly load the model format output by the CartoonGAN training process (the SavedModel format).
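As a minimal sketch of what that looks like (the model path and the [-1, 1] pixel scaling here are assumptions on my part, so don't take them as gospel):

```js
// Minimal sketch: run the trained generator over one image with tfjs-node.
const tf = require('@tensorflow/tfjs-node');
const fs = require('fs');

async function cartoonify(inputPath, outputPath) {
  // Load the SavedModel produced by training - no conversion step required
  const model = await tf.node.loadSavedModel('./exported_model');

  const input = tf.node.decodeImage(fs.readFileSync(inputPath), 3)
    .toFloat()
    .div(127.5)
    .sub(1)          // scale [0, 255] -> [-1, 1] (assumed input range)
    .expandDims(0);  // add a batch dimension

  const output = model.predict(input);

  // Undo the scaling and write the result out as a PNG
  const result = output.add(1).mul(127.5).clipByValue(0, 255).toInt().squeeze();
  fs.writeFileSync(outputPath, await tf.node.encodePng(result));
}

cartoonify('profile.jpg', 'cartoon.png');
```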
Voilà! A cartoon generating neural network running on Node.
If you're interested in how I've deployed the model as a twitter bot, stay tuned! I'll provide a walkthrough in a future post.
Note: The code for this isn't yet available on my GitHub, but will be made available soon.
Top comments (14)
Is it temporally stable, i.e. can you turn video into reasonably good looking animation?
(Edit: the GIF isn't coming up in the comment, so I've added it to the post, near the top.) Just a short one, but this worked pretty well. Technically the source for this was a GIF, not a video, but it should work just as well for video. Again, it's pretty hit and miss though - sometimes it seems to barely do anything at all. Had to try a few to get one that turned out this well 😅
Good question - I haven't actually tried this yet, will give it a shot sometime today and post up the results. There are some demos on the CartoonGAN repo of some pretty good quality gifs that have been generated though, so I'm hopeful.
This is so freaking cool, it needs to be a webapp!
Thanks Waylon! I think I'll give this a shot - in theory it should be possible to convert it to run entirely in the browser with no need for a backend. May be slow though... 🤔
Now have it online running fully in browser at harrison.codes/cartoonify
Quite hackily put together and likely buggy, but working :)
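For reference, the rough shape of the in-browser version is below. It assumes the SavedModel has first been converted to the TF.js graph model format with the tensorflowjs_converter CLI - the paths, model URL, canvas id and [-1, 1] scaling are all hypothetical:

```js
// Convert once, from the shell (paths are hypothetical):
//   tensorflowjs_converter --input_format=tf_saved_model ./exported_model ./web_model
import * as tf from '@tensorflow/tfjs';

async function cartoonify(imgElement) {
  // Load the converted graph model over HTTP
  const model = await tf.loadGraphModel('/web_model/model.json');

  const input = tf.browser.fromPixels(imgElement)
    .toFloat()
    .div(127.5)
    .sub(1)          // scale to [-1, 1] (same assumption as the Node version)
    .expandDims(0);  // add a batch dimension

  const output = model.predict(input);

  // toPixels expects float values in [0, 1]
  const image = output.add(1).div(2).clipByValue(0, 1).squeeze();
  await tf.browser.toPixels(image, document.getElementById('output-canvas'));
}
```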
Really great to have the explanatory step by step here. Cool stuff Harrison.
Thanks Ben 😀
This is so cool!
Thanks Jess! 😀
Nine Nine!
This is impressive! Thanks for sharing.
Wow that's really awesome! Did you look into cloud services that can train the model for you? I've never used any but I've seen some ads for them.
I did! Although not specifically with the cartoon-ifying part...
Basically I wanted to see if I could also trim the background out of the image (so I could overlay the cartoon-ified person onto a flat colour background or something like that), so spent a bit of time playing around with AWS Sagemaker, using their inbuilt semantic segmentation model. Haven't made much progress with that yet though...