DEV Community

Govind.S.B

Posted on Dec 19, 2023

Mixtral, OpenAI and the race to bottom

Competition is good

A good competition in the market always benefits the end consumer. This has been time and time again proven
And right now in the AI/LLM space we are seeing something similar happening. A race to the bottom

See before this open source roar in the space there was only OpenAI with their models GPT3.5 and GPT4 and their pricing blew us away, GPT3.5 gives you a good enough AI for most general purpose applications we build (for the current tech). But the thing to consider is there was only them leading it , if you wanted to build an app cost effectively it was only them and they really enjoyed their time up in the ladder

But as google put it :
"We Have No Moat, And Neither Does OpenAI"

Open source LLMs

Open source models were lagging behind gpt3.5 in terms of performance and cost , they were expensive to run and returns dumb answers lol, But then gigachad mistral dropped mistral their 7B model that they call "tiny" , That made everyone lose their mind it was soo good and soon the community was flooded with mistral finetunes.

Mistral then recently dropped Mixtral, a new MoE model where they used their mistral models in a creative way. A mixture of experts, basically they instead of training a new single model had 8 7B models specialized in various tasks and then combined them, and the neat thing is due to sharing of weights among the models its smaller and while inferring it just used 13B parameters ( 2 experts are used while each token is generated ). So basically this thing can run on commercial hardware that people have ... And this thing performs as well as GPT3.5 or better ... This is important , an open source model that beats the current cheapest feasible closed source AI.

Open AI can charge their pricing cause its their model and they are the only providers so consider infra cost and their R&D and profits. With an open source model there is no moat, the model is free and out in the open. So this is what happened later, there are these services which provider inference APIs they host the model and give you the API just like open AI does but charge just for the infra. Its obvious that they are gonna undercut the pricing but by how much is just mind blowing

The price drop

GPT 3.5 costs $1 per million input tokens and $2 per million output tokens , on avg $1.5 per million tokens
Together AI, the leading AI infra provider put out their pricing as $0.6 per miliion tokens
Seeing this , Anyscale another such provider was like we will do you one better , $0.5 per million tokens
It doesnt stop there , DeepInfra dropped their pricing to $0.27 per million token

The cost drop is a staggering 82%

With all this happening openrouter came up , they are this service which autoroutes your request to the cheapest infra available provider for us to benefit from the race to bottom , they decided to host mixtral for free now. Yep free! I taked with their team on discord and they said they want to support a lot of models and was having them on beta for people to test ( or as i would say get used to their ecosystem )

One thing to note here is dont forget that lots of providers also are heavily funded by VC money to burn and THEY WILL burn it, you can see this in the form of heavily subsidized prices and free credits to capture market

Together AI gives you $25 in free credits once you sign up, thats about 41 million tokens, you are not running out of that if you are doing personal pet projects.

The race to the bottom is here and it is here to stay for a while imo, So profit while you can and build cool stuff
I have this repo where I wrote a general purpose function to interact with all these providers to use mixtral,check that out if you wanna jump in fast . Star the thing if you find it useful and thats it from me, thanks for reading.

BulletLaunch / Mixtral-Inference-APIs

a convenience script used internally having a collection of inference API providers with cheap infra cost

Mixtral Inference APIs

make the best out of the race to bottom

a convenience script we use internally having a collection of providers with cheap infra cost for LLM inference

if you like what we are doing

Please leave a star on the repo

Support us on buymeacoffee

Checkout our socials and follow us there

Usage

Rename the .env.template file to .env and add the corresponding credentials for the providers you want to use (check pricing and performance comparison below)
You can either run the script directly for testing the endpoints or use the inference function in your program logic

To directly use the inference function copy the .env file and llm_inference_script to your project and import the function

example :

from llm_inference_script import llm_inference
from dotenv import load_dotenv
import os
load_dotenv()
KEY = os.getenv("PROVIDER_API_KEY") # put correct provider name here
output = llm_inference

…

View on GitHub

TLDR ; GPT3.5 is dead and mixtral killed it , you can run it for free at this point. Checkout repo for my API endpoint collection to get into the hype fast

UPDATE : Openrouter is not free anymore

DEV Community

Mixtral, OpenAI and the race to bottom

Competition is good

Open source LLMs

The price drop

BulletLaunch / Mixtral-Inference-APIs

a convenience script used internally having a collection of inference API providers with cheap infra cost

Mixtral Inference APIs

Usage

Top comments (0)

Read next

Privacy-First Crypto Exchanges: Benefits and Challenges

React Strict Mode: Enhancing Code Quality and Preparing for the Future

Conditional Rendering in React: Dynamically Rendering UI Elements

[Boost]