Aishik Chatterjee

OpenAI Launches GPT-4o Mini: A Cost-Effective AI Model

OpenAI, led by Sam Altman, has unveiled its latest AI model, GPT-4o mini, marking a significant step in making artificial intelligence more accessible and affordable. The new model is designed to be cost-effective, catering to developers with limited resources who wish to integrate generative AI into their applications. Introduced on Thursday as OpenAI's latest small AI model, GPT-4o mini is cheaper and faster than the company's current cutting-edge models and is being released for developers, as well as through the ChatGPT web and mobile app for consumers, starting today. Enterprise users will gain access next week.

The company says GPT-4o mini outperforms industry-leading small AI models on reasoning tasks involving text and vision. As small AI models improve, they are becoming more popular with developers thanks to their speed and cost efficiency compared to larger models such as GPT-4 Omni or Claude 3.5 Sonnet. They are a useful option for high-volume, simple tasks that developers might repeatedly call on an AI model to perform. GPT-4o mini will replace GPT-3.5 Turbo as the smallest model OpenAI offers.
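Since GPT-4o mini slots in where GPT-3.5 Turbo used to sit, switching is mostly a matter of changing the model name in an existing Chat Completions call. Here is a minimal sketch using the official OpenAI Python SDK; the system message and prompt are illustrative placeholders:

```python
from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# Anywhere you previously passed "gpt-3.5-turbo", you can point at "gpt-4o-mini".
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this support ticket in one sentence: the app crashes on login."},
    ],
)
print(response.choices[0].message.content)
```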
The company claims its newest AI model scores 82% on MMLU, a benchmark that measures reasoning, compared to 79% for Gemini 1.5 Flash and 75% for Claude 3 Haiku, according to data from Artificial Analysis. On MGSM, which measures math reasoning, GPT-4o mini scored 87%, compared to 78% for Flash and 72% for Haiku. Further, OpenAI says GPT-4o mini is significantly more affordable to run than its previous frontier models, and more than 60% cheaper than GPT-3.5 Turbo.

Today, GPT-4o mini supports text and vision in the API, and OpenAI says the model will support video and audio capabilities in the future. "For every corner of the world to be empowered by AI, we need to make the models much more affordable," said OpenAI's head of Product API, Olivier Godement, in an interview with TechCrunch. "I think GPT-4o mini is a really big step forward in that direction."

For developers building on OpenAI's API, GPT-4o mini is priced at 15 cents per million input tokens and 60 cents per million output tokens. The model has a context window of 128,000 tokens, roughly the length of a book, and a knowledge cutoff of October 2023.
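To get a feel for what those prices mean in practice, here is a small back-of-the-envelope sketch based on the per-token rates quoted above (the helper function and token counts are illustrative, not part of the OpenAI SDK):

```python
# Prices quoted for GPT-4o mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single request at the quoted rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply costs about $0.0006.
print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

At that rate, a million such requests would come to roughly $600, which is what makes the model attractive for high-volume, simple tasks.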
OpenAI would not disclose exactly how large GPT-4o mini is, but said it is roughly in the same tier as other small AI models, such as Llama 3 8B, Claude Haiku and Gemini 1.5 Flash. However, the company claims GPT-4o mini is faster, more cost-efficient and smarter than industry-leading small models, based on pre-launch testing in the LMSYS.org chatbot arena. Early independent tests seem to confirm this. "Relative to comparable models, GPT-4o mini is very fast, with a median output speed of 202 tokens per second," said George Cameron, Co-Founder at Artificial Analysis, in an email to TechCrunch. "This is more than 2X faster than GPT-4o and GPT-3.5 Turbo and represents a compelling offering for speed-dependent use cases including many consumer applications and agentic approaches to using LLMs."
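For those speed-dependent, consumer-facing use cases, the usual pattern is to stream tokens as they are generated so the user starts seeing output almost immediately. A minimal streaming sketch with the OpenAI Python SDK (the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a two-sentence product description for a smart mug."}],
    stream=True,  # yield chunks as they are generated instead of waiting for the full reply
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```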
Separately, OpenAI announced new tools for enterprise customers on Thursday. In a blog post, it introduced the Enterprise Compliance API to help businesses in highly regulated industries such as finance, healthcare, legal services and government comply with logging and audit requirements.
OpenAI launched GPT-4o mini on 18th July 2024, taking the world by storm. There are several reasons for this. OpenAI has traditionally focused on large language models (LLMs), which take a lot of computing power and carry significant usage costs. With this release, however, the company is officially venturing into small language model (SLM) territory and competing against models like Llama 3, Gemma 2, and Mistral. While many official benchmark results and performance comparisons have been released, I thought of putting this model to the test against its predecessor, GPT-3.5 Turbo, and OpenAI's newest flagship model, GPT-4o, in a series of diverse tasks.

So, let's dive in and see more details about GPT-4o mini and its performance.
In this section, we will go through all the details of OpenAI's new GPT-4o mini model. Based on the recent announcement, the model has been released with a focus on making access to intelligent models more affordable. It offers low cost (more on this shortly) and low latency, which lets users build generative AI applications faster: processing large volumes of text thanks to its large context window, getting near-real-time responses, and parallelizing multiple API calls, as in the sketch below.
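Because individual requests are cheap and fast, fan-out patterns become practical. Here is a hedged sketch of parallelizing multiple calls with the SDK's async client and asyncio.gather; the documents and prompt are placeholders:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def summarize(text: str) -> str:
    # One small, independent request per document.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
    )
    return response.choices[0].message.content

async def main() -> None:
    docs = ["first document ...", "second document ...", "third document ..."]
    # Issue the requests concurrently instead of waiting for each one in turn.
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for summary in summaries:
        print(summary)

asyncio.run(main())
```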
GPT-4o mini, just like its predecessor GPT-4o, is a multimodal model with support for text, images, audio, and video. Right now, unfortunately, it only supports text and image inputs (the image case is sketched below), with the other input options to be released sometime in the future. The model has been trained on data up to October 2023 and has a massive input context window of 128K tokens and an output response limit of 16K tokens per request. It shares the same tokenizer as GPT-4o and hence gives improved responses for prompts in non-English languages.
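For the image side of things, a text prompt and an image reference can be sent in the same request by passing a list of content parts. A minimal sketch (the image URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                # Hypothetical image URL; a publicly reachable image or a base64 data URL works here.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=300,  # well within the 16K output limit mentioned above
)
print(response.choices[0].message.content)
```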
OpenAI has extensively tested GPT-4o mini's performance across a variety of standard benchmark datasets focusing on diverse tasks, comparing it with several other large language models (LLMs), including Gemini, Claude, and its predecessors, GPT-3.5 and GPT-4o. OpenAI claims that GPT-4o mini performs significantly better than GPT-3.5 Turbo and other models in textual intelligence, multimodal reasoning, math, and coding proficiency benchmarks. In OpenAI's published comparison charts, GPT-4o mini has been evaluated across several key benchmarks, including the MMLU and MGSM results mentioned above.

We also have detailed analysis and comparisons from Artificial Analysis, an independent organization that provides benchmarking and related information for various LLMs and SLMs. Their visualizations clearly show how GPT-4o mini focuses on providing quality responses at blazing-fast speeds compared to most other models.
Besides the quality of a model's results, there are a couple of factors we usually consider when choosing an LLM or SLM: response speed and cost. Considering these factors, we get a variety of comparisons, including the model's output speed, which focuses on the output tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API). These numbers are based on the median speed across all providers, and according to those observations, GPT-4o mini seems to have the highest output speed, which is pretty interesting; a rough way to measure this yourself is sketched below.
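If you want to sanity-check those numbers yourself, a deliberately crude approach is to stream a response, start the clock at the first received chunk, and estimate tokens from the character count (roughly four characters per token for English text):

```python
import time
from openai import OpenAI

client = OpenAI()

pieces = []
first_chunk_time = None

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain what a context window is in three sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_chunk_time is None:
            first_chunk_time = time.perf_counter()  # start timing once generation begins
        pieces.append(chunk.choices[0].delta.content)
end = time.perf_counter()

if first_chunk_time is not None and end > first_chunk_time:
    # Crude token estimate: roughly 4 characters per token.
    approx_tokens = len("".join(pieces)) / 4
    print(f"~{approx_tokens / (end - first_chunk_time):.0f} output tokens/s")
```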
We also get a detailed comparison from Artificial Analysis on the cost of using GPT-4o mini versus other popular models, with pricing shown for both input prompts and output responses in USD per 1M (million) tokens. GPT-4o mini is quite cheap, especially considering that you do not need to worry about hosting it, setting up your own GPU infrastructure, or maintaining it!

OpenAI also mentions that GPT-4o mini demonstrates strong performance in function and tool calling, which means you can get better results when using this model to build AI agents and complex agentic AI systems that fetch live data from the web, reason, observe, and take actions with external systems and tools.
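As a sketch of what that looks like in practice, the Chat Completions API lets you describe tools as JSON schemas and have the model decide when to call them; the weather function below is hypothetical and you would implement it yourself:

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical tool the model can choose to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Kolkata right now?"}],
    tools=tools,
)

# If the model decided to call the tool, its arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```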
GPT-4o mini also has improved long-context performance compared to GPT-3.5 Turbo and performs well in tasks like extracting structured data from receipts (sketched below) or generating high-quality email responses when provided with the full conversation history.
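For structured extraction, one common approach is to ask for a JSON object via the response_format parameter so the reply can be parsed directly; the receipt text below is made up:

```python
import json
from openai import OpenAI

client = OpenAI()

receipt_text = "Cafe Roma  2x Espresso $6.00  1x Croissant $3.50  Total $9.50"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    # JSON mode requires that the prompt itself mentions JSON.
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract the merchant, line items, and total from the receipt as JSON."},
        {"role": "user", "content": receipt_text},
    ],
)
print(json.loads(response.choices[0].message.content))
```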
For more information, visit OpenAI's official blog and stay tuned for updates on the latest advancements in AI technology.
