A few years ago a friend of mine showed me the hilarious Harry Potter and the Portrait of What Looked Like a Large Pile of Ash, created by Botnik, and I subsequently fell in love with the idea of training machines to write.
Since then I've been exploring the internet for funny/weird/awesome content generated by AIs. Here's a taste of what I've found so far:
- Sunspring - a short sci-fi film by Oscar Sharp and Ross Goodwin.
- Speedgate - a sport where the rules were written by an AI.
- Recipes that are written by an AI. There are lots of examples of these but my favourite was when Botnik created videos of people following the recipes.
I've since decided it'd be a great side project to look into myself.
So, how can you get started?
You might think this would be a long, complicated process, but trust me, it's not, thanks to the amazing work done by Max Woolf on his Python module textgenrnn. Textgenrnn is a Python 3 module built on top of Keras/TensorFlow for creating char-rnns. If none of that sentence made any sense, don't worry - it creates neural networks that learn from the input text you give them.
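To get a feel for what "learning from text one character at a time" means, here is a toy sketch. It is not a neural network and not how textgenrnn works internally: it swaps in a much simpler technique, a character-level Markov chain, which just records which character followed each two-character context in the training text. The function names (`build_char_table`, `generate`) and the three training lines are made up for this illustration.

```python
import random
from collections import defaultdict


def build_char_table(lines, order=2):
    """Map each `order`-character context to the characters seen after it."""
    table = defaultdict(list)
    for line in lines:
        padded = "^" * order + line + "$"  # ^ marks the start, $ marks the end
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            table[context].append(padded[i + order])
    return table


def generate(table, order=2, max_len=30, rng=None):
    """Emit one character at a time, each picked from what followed the context."""
    rng = rng or random.Random()
    context = "^" * order
    out = []
    while len(out) < max_len:
        next_char = rng.choice(table[context])
        if next_char == "$":  # the end marker was picked, so stop
            break
        out.append(next_char)
        context = context[1:] + next_char
    return "".join(out)


# Toy training text, chosen only for illustration.
training_lines = ["Lumos", "Lumos Maxima", "Alohomora"]
table = build_char_table(training_lines)
sample = generate(table, rng=random.Random(0))
print(sample)
```

A char-rnn replaces the lookup table with a trained network that can pick up much longer-range patterns, but the generation loop has the same shape: predict the next character, append it, repeat.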
First, we want to install Textgenrnn and TensorFlow using pip, like this:
pip3 install textgenrnn tensorflow
Now in your favourite Python IDE/text editor (I use Spyder) simply add the following 3 (yes only 3!) lines of Python:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.generate()
If you run the above, as per the Textgenrnn tutorial, you will see an output something like:
[Spoiler] Anyone else find this post and their person that was a little more than I really like the Star Wars in the fire or health and posting a personal house of the 2016 Letter for the game in a report of my backyard.
That didn't make much sense, and your output won't be the same as the above, because the text was written by an AI. That's right: the output came from a model that was pre-trained on data bundled with the Textgenrnn module. How awesome is that? Well done on running your first AI text generator!
Let's find some data
We've got an AI to generate output based on pre-loaded data. All we need to do now is train the model on a new text, and as it turns out this only takes two lines of code:
textgen.train_from_file('yourFileGoesHere.txt', num_epochs=1)
textgen.generate()
The hard part, as usual, is getting lots of data, and getting it in the correct format. That's why I chose Harry Potter: I knew the fans would have me covered, and they did, with spells.csv, a CSV of different spells from the books! Thank you @Gulsah Demiryurek, all credit to you! :)
Note: be careful that the data you find is clean and suitable for your use case. I tidied up the above (mainly removing " and additional ; from the values) and you can find that here.
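A minimal sketch of that kind of tidy-up, assuming the stray characters are double quotes anywhere in a value and semicolons at its start or end. The helper name `clean_value` and the messy sample values are made up for illustration; the real file may need different rules.

```python
def clean_value(raw):
    """Remove double quotes, then trim whitespace and leading/trailing semicolons."""
    return raw.replace('"', "").strip().strip(";").strip()


# Made-up messy values, only to show the effect of the helper.
messy = ['"Accio"', "Summons an object;", '  "Lumos";; ']
cleaned = [clean_value(value) for value in messy]
print(cleaned)
```

Semicolons inside a value are left alone here, since the file also uses ; as its column delimiter and those cases are better checked by eye.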
Now all we need to do is pass our CSV file to train_from_file, right? Unfortunately not. Textgenrnn requires our data to be in a specific format: one value per line. We need to pick out the spells and effects from the CSV and save them in separate text files like so:
import csv

# Get spells and effects from csv
with open('Spells.csv', newline='') as csvfile:
    spells = []
    effects = []
    spellsreader = csv.reader(csvfile, delimiter=';')
    next(spellsreader, None)  # skip the headers
    for row in spellsreader:
        # skip rows that have no usable spell name
        if not (row[1] == 'Unknown' or row[1] == ''):
            print(row[1])
            spells.append(row[1])
            effects.append(row[3])

# Write one value per line, ready for input to textgenrnn
with open("spells.txt", "w") as output:
    output.write('\n'.join(spells))
with open("effects.txt", "w") as output:
    output.write('\n'.join(effects))
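The loop above only filters on the spell name, so an empty effect or a value with a line break in it would still end up in the text file and break the one-value-per-line format. A small sketch of a guard you could run before joining, assuming that dropping blanks and flattening whitespace is acceptable for your data. The helper name `to_training_lines` and the sample values are hypothetical.

```python
def to_training_lines(values):
    """Return the values with blanks dropped and embedded line breaks flattened."""
    lines = []
    for value in values:
        # split() with no arguments splits on any whitespace, including newlines
        flattened = " ".join(value.split())
        if flattened:
            lines.append(flattened)
    return lines


# Made-up sample values, only to show the behaviour.
sample_effects = ["Mends target", "", "Conjures\nsparks", "  Opens   locks "]
training_text = "\n".join(to_training_lines(sample_effects))
print(training_text)
```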
Now we have our data!!
Putting it all together to Harry Potter-ify the output
To generate spells, we can simply do the following:
# Generate Spells
spellgen = textgenrnn()
spellgen.train_from_file('spells.txt', num_epochs=1)
generated_spells = spellgen.generate(5, return_as_list=True)
output:
Evpaborrra
Incendiarars
Lomomorrius
Flips
Glandiren Skullus
Similarly to generate spell effects:
# Generate Spell Effects
effectgen = textgenrnn()
effectgen.train_from_file('effects.txt', num_epochs=1)
generated_effects = effectgen.generate(5, return_as_list=True)
output:
Mends target
Conjures sparks
Turns water to splose
she nothing the juckers
Turns target to shee
These outputs are, um, interesting?
Generated content is never going to make perfect sense - we know that and it's part of the fun - but I'd like it more if the spells and effects were linked to one another. For example, "Incendiarars" should set something on fire, "Evpaborrra" should evaporate something and "Flips" should, well, flip something. Let's try to generate effects based on the spell name.
This is simple to do: instead of building separate spells and effects lists, we build a single list where each element is a spell name followed by its effect. Inside the loop, that looks like:
spell = "%s: %s" % (row[1],row[3])
print(spell)
spells.append(spell)
This way the data our model is trained on (and therefore the data our model outputs) will be a spell name followed by its corresponding effect. Now we only train from spells.txt. My output was the following:
Flippers: Turns target
Stuckus: Creates mess of things in the wand things
Incardimo: Reveals doors
Solor: Reveals objects of the flames
Victimous: Transimotions target
I'm not sure this fixed the problem, but it does look better to me: Solor is fire/light related and this time Flippers is flipping something - flipping brilliant!
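Because the model was trained on "name: effect" lines, most generated lines can be split back into their two parts, but nothing guarantees the model always emits a colon. A hedged sketch of how you might separate them and discard malformed lines. The helper name `split_spell` is made up, and the sample lines are copied from the outputs shown in this post.

```python
def split_spell(line):
    """Split 'Name: effect' into (name, effect), or return None if malformed."""
    name, sep, effect = line.partition(":")
    if not sep or not name.strip() or not effect.strip():
        return None
    return name.strip(), effect.strip()


generated = [
    "Flippers: Turns target",
    "Solor: Reveals objects of the flames",
    "she nothing the juckers",  # no colon, so it gets filtered out
]
pairs = [pair for pair in map(split_spell, generated) if pair is not None]
for name, effect in pairs:
    print(f"{name} -> {effect}")
```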
If you have any ideas on further improvements, I'd love to hear from you in the comments :)
Well, we did it! We generated new Harry Potter spells - though I'm not convinced J K Rowling will want to use them!
Where to go from here?
Hopefully, this example has shown how easy it is to generate text using Textgenrnn. From here, as long as you can find the data, you could generate anything you want. Kaggle Datasets are always a great source of data and are usually well documented. However, I would also encourage you to do something more personal/relevant to you, as this always helps with motivation. During a hackathon at my workplace, I generated Jira tickets based on all of the Jira tickets in our backlog (you can easily download a CSV of these). This was very amusing, but it also showed how poorly some of our tickets were written and how repetitive they could be. For example, "document project" was in a lot of our Jira ticket names, and so my neural network came up with "document document project document project" - a subtle reminder that we devs often neglect to document our work.
Please above all else have fun exploring the peculiar world of text generation!
Top comments (6)
While everyone else is social distracting you are productivity distracting, I love it!
Haha, thanks! I'm always looking for more time to do the things I enjoy!
AI generated Magic the Gathering cards are pretty amusing as well.
Thanks for the interesting read.
Oh cool, I'll have to look them up, I used to play a couple years back! Thanks for the suggestion
Superb! My son loved this, and I'm feeling fired up to do some experiments!
Thanks @terkwood! Glad to hear your son enjoyed the post. One thing I didn't get into in this post is the temperature variable in textgenrnn: it lets you generate anything between 'perfect grammar' and 'complete nonsense'. I was considering making a separate post about it. Have fun playing around and get in touch if you'd like a hand!