My dad loves good morning images: the ones with happy little animals, whimsical forests and a cheerful message to start the day. Every morning he receives a lot of them and forwards them to his friends and family.
The problem is that, when he does not receive any new images, he sometimes has to recycle ones from other days. Because of that, I want to send him fresh images every day, each with a unique scenery!
Since I am absolutely awful at image editing, what better way to accomplish that than generating images with Imagine, texts with ChatGPT and overlaying them in a Svelte app? Let's get to work!
I had never worked with AI tools before, so the first thing I did for this project was to understand how their APIs work. For the first version, I chose not to dive deep into the prompts and focused on integrating the tools instead. Since OpenAI gives $5.00 in credits, I use ChatGPT to create the texts. However, I did not use DALL-E to generate the images, because I would not be able to test the image prompts on their website for free. So I used Imagine instead.
On the Imagine website, we can test their prompts and modifiers. This helps a lot to understand what each option changes and how to use it later. By registering on their website, you receive 50 free credits every month.
Their API usage is quite straightforward. It is basically a POST request: you pass your token as a bearer token in the header and the parameters as multipart/form-data. Prompt and style are mandatory. For the style, we can use their playground to see which one fits best.
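The request described above could be sketched roughly like this, assuming a runtime with global `fetch` and `FormData` (Node 18+). The endpoint is passed in rather than hard-coded, and the field names `prompt` and `style_id` are assumptions that should be checked against the Imagine documentation.

```typescript
// The parts of the request we control: method, auth header and form body.
interface ImagineRequest {
  method: "POST";
  headers: { Authorization: string };
  body: FormData;
}

// Builds the POST request: bearer token in the header, parameters as
// multipart/form-data. The field names "prompt" and "style_id" are assumptions.
function buildImagineRequest(token: string, prompt: string, styleId: string): ImagineRequest {
  const form = new FormData();
  form.append("prompt", prompt);
  form.append("style_id", styleId);

  return {
    method: "POST",
    // Content-Type is not set by hand: fetch adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  };
}

// Sends the request and returns the generated image as a Blob.
// `endpoint` is the generation URL listed in the Imagine docs (not hard-coded here).
async function generateImage(
  endpoint: string,
  token: string,
  prompt: string,
  styleId: string,
): Promise<Blob> {
  const response = await fetch(endpoint, buildImagineRequest(token, prompt, styleId));
  if (!response.ok) {
    throw new Error(`Image generation failed with status ${response.status}`);
  }
  return response.blob();
}
```

Keeping the request builder separate from the network call makes it easy to inspect what will be sent before spending any credits.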
Using the parameters that I've found on the playground, I was able to quickly test the API and get similar results.
OpenAI also has very good documentation! Here we are also able to test the requests on their playground, which allowed me to understand how to craft a good prompt to create the message that will go on the image.
To generate something, you need to pick a model and add some modifiers to set the tone of the message. Similar to my approach with the images, I limited myself to getting the API going with a simple prompt first, and will study the prompts further in the future.
With the APIs figured out, the next step was to decide what I would build to integrate them. My idea was to create a simple page that lets the user generate the image with Imagine, while the tool asks ChatGPT for a motivational phrase to place over the image. My choice was to create some arrays of options to generate the prompt.
If you just want to generate something, ChatGPT's library makes that as easy as it can get. You send the prompt and the modifiers that you want, and it responds with an array of completions. In this case there will be only one response anyway, so we can just read the message from the first completion. I've greatly reduced the maximum number of tokens, so it would not use
Since the same prompt tends to produce nearly the same response, I've created a basic random prompt generator that sets the tone of the message. Positivity, blessings, motivational and friendship are some of the tones. I still need to improve that list so it does not get repetitive over time.
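A minimal sketch of that idea, not the project's actual code: one array of tones, a random pick, and a helper that reads the first completion. The prompt wording, the helper names and the `ChatCompletionLike` type are made up for illustration, and the response shape assumes the chat completions format (`choices[0].message.content`).

```typescript
// Tones mentioned in the article; extending this list reduces repetition.
const TONES = ["positivity", "blessings", "motivational", "friendship"] as const;
type Tone = (typeof TONES)[number];

// Picks one element. `rng` is injectable so the choice can be fixed in tests.
function pickRandom<T>(options: readonly T[], rng: () => number = Math.random): T {
  return options[Math.floor(rng() * options.length)];
}

// Hypothetical prompt wording, only meant to show where the tone is plugged in.
function buildMessagePrompt(tone: Tone): string {
  return `Write a short good morning message with a tone of ${tone}.`;
}

// Only the part of the response that we actually read (assumed shape).
interface ChatCompletionLike {
  choices: { message: { content: string | null } }[];
}

// Returns the text of the first completion, or an empty string if there is none.
function firstMessage(response: ChatCompletionLike): string {
  return response.choices[0]?.message.content?.trim() ?? "";
}
```

A call such as `buildMessagePrompt(pickRandom(TONES))` then gives a slightly different prompt on each run, which is what keeps the messages from repeating.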
For the image generation, Imagine Art is called via its API. Once the image is returned, I read it as a blob and save it into the static folder.
To place the text over the image, I used Jimp. It was quite easy to use. The only tricky part was that, to print text with a font that is not one of its defaults, it requires the font in bitmap format. They point to some good tools that perform that conversion, so I downloaded a font from Google Fonts and added the generated bitmap to the project.
Since a bitmap font has only one color, I used a trick: print the text over a transparent image and transform its color to obtain a sort of shadow effect, which improves readability.
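Saving the blob could look roughly like this on the server side. This is a sketch under assumptions: the `saveImage` helper, the directory argument and the file name are hypothetical, and it relies on SvelteKit serving files placed in `static/` from the site root.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Writes the blob to `<staticDir>/<fileName>` and returns the URL path that
// the frontend can use in an <img> tag.
async function saveImage(blob: Blob, staticDir: string, fileName: string): Promise<string> {
  // Make sure the target folder exists before writing.
  await mkdir(staticDir, { recursive: true });

  // A Blob has no direct file API in Node, so convert it to a Buffer first.
  const bytes = Buffer.from(await blob.arrayBuffer());
  await writeFile(join(staticDir, fileName), bytes);

  return `/${fileName}`;
}
```

The returned path is all the client needs, so the blob itself never has to cross from the server to the browser.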
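The order of operations behind that shadow trick can be sketched as below. To avoid depending on a specific Jimp version, the sketch is written against a small hypothetical interface: Jimp 0.x images have `print` and `composite` methods, but the exact signatures here are assumptions, and creating the transparent layer and recoloring it are left to the caller.

```typescript
// The few image methods this sketch relies on (assumed, Jimp-like signatures).
interface TextLayer<Font> {
  print(font: Font, x: number, y: number, text: string, maxWidth: number): unknown;
  composite(source: TextLayer<Font>, x: number, y: number): unknown;
}

interface ShadowOptions<Font> {
  font: Font; // single-color bitmap font
  text: string;
  x: number;
  y: number;
  maxWidth: number;
  shadowOffset: number; // how far the shadow is shifted, in pixels
}

function drawTextWithShadow<Font, Img extends TextLayer<Font>>(
  base: Img,
  newTransparentLayer: () => Img, // creates an empty, fully transparent image
  darken: (layer: Img) => void, // recolors the printed text, e.g. to black
  opts: ShadowOptions<Font>,
): Img {
  const { font, text, x, y, maxWidth, shadowOffset } = opts;

  // 1. Shadow: print on a transparent layer, recolor it, paste it slightly offset.
  const shadow = newTransparentLayer();
  shadow.print(font, 0, 0, text, maxWidth);
  darken(shadow);
  base.composite(shadow, x + shadowOffset, y + shadowOffset);

  // 2. Text: print in the font's own color on top of the shadow.
  base.print(font, x, y, text, maxWidth);
  return base;
}
```

With Jimp, `newTransparentLayer` would create an empty image the size of the base and `darken` would apply one of its color operations. Both are injected here because those calls differ between versions.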
With the landing page created and all the login set up, we are up and running!
The first image generated with this tool represents the objective very well, with the phrase "Have a good day full of possibilities".
Overall, the tools used were really good and had a nice interface. I struggled a bit with the image handling, since I was not familiar with it. I was trying to return the blob from the server side to client side and work with it there, but, ultimately, decided to save it to the static folder and handle it entirely on the server side, with the frontend just consuming the image from the static folder.
It was really good to find tools such as Jimp, and I can see it helping me in other situations in the future. Albeit simple, it has a lot of tweaks and hacks that allow you to create complex images.
The biggest improvement to this project would be in how the prompts are created. Currently, the most "WhatsApp-like" image that I could generate with simple prompts is some cute animals on a plant background. So, in the future, it would be nice to feed the model with other good morning messages and get better results.
I would also like to extend it to other occasions, such as good night, happy birthday, merry Christmas and other festive dates. For that, both the message and the image generation should be improved to take the context from the user's selection.
Let me know if you have any suggestions or tips! Drop a heart if you liked this article :D
Code for this project can be found on: https://github.com/FRicardi/good-morning-sunshine