On December 6th, 2023, Google unveiled Gemini, one of the most powerful AI models of recent times. Google describes it as its largest and most capable model, able to read and interpret multiple modalities, including video, audio and images.
Currently, there are two ways to use Gemini: 1) it is integrated into Google Bard, Google's AI chatbot, and 2) Google has made it commercially available to application developers, who can incorporate it into their applications via a REST API.
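To make the REST option concrete, here is a minimal sketch of calling the API with `fetch`. It assumes the `v1beta` `generateContent` endpoint and the `gemini-pro-vision` model name that the API documented at the time of writing. Both may have changed since, so check the current docs. The helper names are hypothetical, not taken from any official SDK.

```javascript
// Assumed endpoint shape (v1beta generateContent); verify against the current Gemini API docs.
const GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models";

// Hypothetical helper: build the generateContent URL for a model and API key.
function buildGenerateContentUrl(model, apiKey) {
  return `${GEMINI_BASE_URL}/${model}:generateContent?key=${encodeURIComponent(apiKey)}`;
}

// Hypothetical helper: POST a request body and return the parsed JSON response.
// "gemini-pro-vision" is an assumed default model name.
async function callGemini(apiKey, body, model = "gemini-pro-vision") {
  const response = await fetch(buildGenerateContentUrl(model, apiKey), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Gemini request failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

// Join the text parts of the first candidate; returns "" if none are present.
function extractText(result) {
  const parts = result?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}
```

In a Vite app, the key would typically come from an environment variable such as `import.meta.env.VITE_GEMINI_API_KEY` (an assumed name). Keep in mind that any key bundled into browser code is visible to users.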
Of these modalities, the Gemini API currently accepts only images (alongside text prompts), and images must be 4MB or smaller. Gemini itself comes in three sizes: Gemini Ultra, Gemini Pro and Gemini Nano.
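Because oversized images are rejected, it helps to check the size before sending a request. The sketch below assumes that "4MB" means 4 × 1024 × 1024 bytes of decoded image data and estimates that size from the base64 string the browser produces. Treat both the exact limit and the way it is measured as assumptions to confirm against the docs.

```javascript
// Assumed limit: 4MB interpreted as 4 MiB of raw (decoded) image bytes.
const MAX_IMAGE_BYTES = 4 * 1024 * 1024;

// Estimate the decoded byte length of a base64 string without decoding it.
// Every 4 base64 characters encode 3 bytes; each trailing "=" marks one missing byte.
function base64DecodedSize(base64) {
  const clean = base64.replace(/\s/g, "");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// Return true if the image fits within the (assumed) size limit.
function isWithinSizeLimit(base64, maxBytes = MAX_IMAGE_BYTES) {
  return base64DecodedSize(base64) <= maxBytes;
}
```

For a `File` from an `<input type="file">`, `file.size` already gives the byte count, so the same comparison can be made before the file is read at all.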
I recently developed Image Reader, a small image-to-text application built with React.js, Vite and Tailwind CSS. It consumes the Gemini API: it takes in images and prompts and responds with text based on those prompts.
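The core request pairs the user's prompt with one image. Here is a sketch of that request body. It assumes the REST field names (`contents`, `parts`, `inline_data`, `mime_type`) used in the Gemini API docs at the time of writing. `buildImagePromptBody` is a hypothetical helper, not a function from the Image Reader repository.

```javascript
// Hypothetical helper: build a generateContent body from a prompt and one image.
// Assumed REST field names: contents / parts / inline_data / mime_type.
// `image` is { mimeType, data }, where `data` is base64 without a data-URL prefix.
function buildImagePromptBody(prompt, image) {
  if (!prompt.trim()) {
    throw new Error("A prompt is required");
  }
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: image.mimeType, data: image.data } },
        ],
      },
    ],
  };
}
```

This body would be sent as the JSON payload of a POST to the `generateContent` endpoint. The generated text then comes back under `candidates[0].content.parts`.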
The web application lets users upload photos or capture pictures with their webcams, then asks them to describe what they would like to know about the photograph. Once the text response arrives, the user can copy it for later use.
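Both input paths can end in the same format. `FileReader.readAsDataURL` turns an upload into a data URL, and a webcam frame drawn onto a `<canvas>` can be exported the same way with `canvas.toDataURL`. A small helper can then split the data URL into the MIME type and base64 payload the API expects. The copy button can use the asynchronous Clipboard API. This is a sketch: the helper names are hypothetical, and the app's actual implementation may differ.

```javascript
// Hypothetical helper: split "data:image/png;base64,...." into { mimeType, data }.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64-encoded data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// Read an uploaded File (e.g. from <input type="file">) as a data URL. Browser only.
function readFileAsDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Copy Gemini's text response to the clipboard. Browser only; requires a secure context.
async function copyResponse(text) {
  await navigator.clipboard.writeText(text);
}
```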
Additionally, the application can compare images: upload two images and ask Gemini to compare them or point out their differences. The API is comprehensively documented in Google's official Gemini API docs.
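For comparison, both images can travel in a single request. One reasonable layout labels each image with a short text part so the prompt can refer to "Image 1" and "Image 2". This layout is an assumption, not the app's confirmed approach. The field names follow the same assumed REST shape (`inline_data`, `mime_type`), and `buildComparisonBody` is hypothetical.

```javascript
// Hypothetical helper: send two labelled images plus a question in one request.
// Each image is { mimeType, data } with base64 data (no data-URL prefix).
function buildComparisonBody(prompt, firstImage, secondImage) {
  // Wrap one image in the (assumed) inline_data part shape.
  const imagePart = (image) => ({
    inline_data: { mime_type: image.mimeType, data: image.data },
  });
  return {
    contents: [
      {
        parts: [
          { text: "Image 1:" },
          imagePart(firstImage),
          { text: "Image 2:" },
          imagePart(secondImage),
          { text: prompt },
        ],
      },
    ],
  };
}
```

A prompt such as "What are the differences between Image 1 and Image 2?" can then rely on the labels. The size limit may apply to the whole request rather than to each image, so two large photos could be rejected even if each is small enough alone.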
In conclusion, AI is becoming more and more accessible to web developers, both through APIs such as Gemini's and through tooling such as GitHub Copilot, CodiumAI and prompt-engineering chatbots (ChatGPT, Google Bard and Claude).
The source code is available on GitHub, and the app is hosted on Firebase.