In recent months, there has been huge excitement and interest around Generative AI, with a constant stream of announcements and new innovations! It has been great for the overall ecosystem, however, quite difficult for an individual dev to keep up with!
Okay! While diving into the field of Generative AI, one of the most commonly used terms is LLM. What are LLMs? (if you are new!)
Large Language Models
Large Language Models (LLMs) are a type of artificial intelligence (AI) model designed to understand and generate human-like text, trained on vast amounts of data. Think of an LLM as a large mathematical ball of information, compressed into one file and deployed on GPUs for inference.
Some of the most common LLMs are OpenAI's GPT-3, Anthropic's Claude, Google's Gemini, and the dev favourite, Meta's open-source Llama.
In this blog, we will discuss some recently launched LLMs.
Now the obvious question that comes to mind is: why should we keep up with the latest LLM trends?
Perhaps it is too long-winded to explain here, but Hemant Mohapatra, a DevTool and Enterprise SaaS VC, has perfectly summarised how the GenAI wave is playing out.
More and more players are commoditising intelligence, not just OpenAI, Anthropic, and Google. Every day, we see a new large language model.
Top LLMs Released This Month
Here is a list of five recently launched LLMs, along with a short introduction to each and what makes it useful.
Firefunction-v2
Recently, Firefunction-v2, an open-weights function-calling model, was released and downloaded over 140k times in a week. It is designed for real-world AI applications, balancing speed, cost, and performance. It combines function-calling capabilities with general chat and instruction following.
Key Features:
Enhanced Functionality: Firefunction-v2 can handle up to 30 different functions.
Real-World Optimization: Firefunction-v2 is designed to excel in real-world applications. It can handle multi-turn conversations and follow complex instructions.
Competitive Performance: Firefunction-v2 performs better than GPT-4o in terms of function calling capabilities, scoring 0.81 on various public benchmarks compared to GPT-4o's 0.80.
Cost-Effective and Fast: Firefunction-v2 is much more affordable than GPT-4o, costing only $0.9 per million output tokens compared to GPT-4o's $15.
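To make "function calling" concrete, here is a minimal sketch of what a model like Firefunction-v2 consumes and emits. The tool definition follows the widely used OpenAI-compatible JSON schema; the function name, parameters, and the example model reply are all invented for illustration, not taken from the Firefunction docs.

```python
import json

# A tool definition in the OpenAI-compatible schema that function-calling
# models are trained to consume. The "get_weather" function here is
# hypothetical, purely for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# The model replies with the function name plus JSON-encoded arguments;
# your application parses the arguments and dispatches to the real function.
example_tool_call = {
    "name": "get_weather",
    "arguments": '{"city": "Berlin", "unit": "celsius"}',
}
args = json.loads(example_tool_call["arguments"])
print(args["city"])  # the value your dispatcher would pass to get_weather()
```

The model never executes anything itself: it only picks a tool and fills in arguments, and your code stays in control of the actual call.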
DeepSeek-Coder-V2
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo on code-specific tasks.
- Excels in coding and math, beating GPT-4 Turbo, Claude 3 Opus, Gemini 1.5 Pro, and Codestral.
- Supports 338 programming languages and 128K context length.
- Fully open-sourced, in two sizes: 236B and 16B.
Meta Chameleon
Meta’s Fundamental AI Research (FAIR) team has recently published an AI model called Meta Chameleon. Chameleon is a unique family of models that can understand and generate both images and text simultaneously, handling both text-to-image and image-to-text generation.
Key Features:
- Chameleon is versatile, accepting a combination of text and images as input and generating a corresponding mix of text and images.
- It can be applied for text-guided and structure-guided image generation and editing, as well as for creating captions for images based on various prompts.
- Additionally, Chameleon supports object-to-image creation and segmentation-to-image creation.
Nvidia's Nemotron-4 340B
Nvidia has introduced NemoTron-4 340B, a family of models designed to generate synthetic data for training large language models (LLMs). This innovative approach not only broadens the variety of training materials but also tackles privacy concerns by minimizing the reliance on real-world data, which can often include sensitive information.
NemoTron-4 also promotes fairness in AI. It creates more inclusive datasets by incorporating content from underrepresented languages and dialects, ensuring a more equitable representation.
Another significant benefit of NemoTron-4 is its positive environmental impact. Generating synthetic data is more resource-efficient compared to traditional training methods.
Hermes Theta
Hermes-2-Theta-Llama-3-8B is a cutting-edge language model created by Nous Research. This model is a blend of the impressive Hermes 2 Pro and Meta's Llama-3 Instruct, resulting in a powerhouse that excels in general tasks, conversations, and even specialised functions like calling APIs and generating structured JSON data.
Hermes-2-Theta-Llama-3-8B excels in a wide range of tasks, whether you need help with general conversations, completing specific tasks, or handling specialised functions.
Key Features:
Conversational AI Agents: Create chatbots and virtual assistants for customer service, education, or entertainment.
Creative Content Generation: Write engaging stories, scripts, or other narrative content.
Detailed Analysis: Provide in-depth financial or technical analysis using structured data inputs.
Task Automation: Automate repetitive tasks with its function calling capabilities.
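Since a key selling point of Hermes-2-Theta is generating structured JSON, here is a small sketch of the consuming side: pulling a JSON object out of a model reply and checking it has the fields you asked for. The helper name and the example payload are invented for this illustration; they are not from the Nous Research docs.

```python
import json
import re

def parse_json_reply(reply: str, required_keys: set) -> dict:
    """Extract a JSON object from a model reply (which may include
    surrounding chatter) and verify the requested keys are present."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    return data

# A reply shaped like what a JSON-mode prompt might return
# (the payload itself is made up for this example).
reply = 'Sure! {"ticker": "ACME", "sentiment": "positive", "confidence": 0.87}'
record = parse_json_reply(reply, {"ticker", "sentiment", "confidence"})
print(record["sentiment"])
```

Even with a model tuned for structured output, validating the reply like this before using it downstream keeps occasional malformed generations from crashing your pipeline.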
Note: This list is never-ending, with Qwen 2 72B and more :D
Interestingly, I've been hearing about some more new models that are coming soon.
As developers and enterprises pick up Generative AI, I expect more solution-focused models in the ecosystem, and maybe more open-source ones too.
I'd love to see models which can do:
Smarter Conversations: LLMs getting better at understanding and responding to human language, holding semantic relationships throughout a conversation so that it is a pleasure to converse with them.
Personal Assistant: Future LLMs might be able to manage your schedule, remind you of important events, and even help you make decisions by providing useful information. We already see that trend with tool-calling models; if you watched the recent Apple WWDC, you can imagine what this kind of LLM usability could look like.
Learning and Education: LLMs will be a great addition to education by providing personalized learning experiences. Today, they are large intelligence hoarders.
To Conclude
As we have seen throughout this blog, these are really exciting times, with the launch of these five powerful language models.
Each one brings something unique, pushing the boundaries of what AI can do. Whether it's enhancing conversations, generating creative content, or providing detailed analysis, these models make a real impact.
At Portkey, we help developers building on LLMs with a blazing-fast AI Gateway that provides resiliency features like load balancing, fallbacks, and semantic caching. Drop us a star if you like it, or raise an issue if you have a feature to recommend!
Portkey-AI / gateway
A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.
Gateway streamlines requests to 200+ open & closed source models with a unified API. It is also production-ready with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimum latency.
✅ Blazing fast (9.9x faster) with a tiny footprint (~45kb installed)
✅ Load balance across multiple models, providers, and keys
✅ Fallbacks make sure your app stays resilient
✅ Automatic Retries with exponential backoff come by default
✅ Configurable Request Timeouts to easily handle unresponsive LLM requests
✅ Multimodal to support routing between Vision, TTS, STT, Image Gen, and more models
✅ Plug-in middleware as needed
✅ Battle tested over 300B tokens
✅ Enterprise-ready for enhanced security, scale, and custom deployments
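As a rough illustration of how features like fallbacks are wired up, a gateway config might look something like the sketch below. The field names follow my reading of the Portkey docs and the virtual key names are placeholders, so treat this as illustrative rather than authoritative; check the repository's README for the exact schema.

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "virtual_key": "openai-primary" },
    { "virtual_key": "anthropic-backup" }
  ]
}
```

The idea is that requests go to the first target, and if it errors or times out, the gateway transparently retries against the next one.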
How to Run the Gateway?
- Run it Locally for complete control & customization
- Hosted by Portkey for quick setup without…