Rise of Local LLMs?

Sarthak Sharma

In the not-so-distant past, dabbling in generative AI technology meant leaning heavily on proprietary models. The routine was straightforward: snag an OpenAI key, and you're off to the races, albeit tethered to a pay-as-you-go scheme. This barrier, however, started to crumble with the introduction of ChatGPT, flinging the doors wide open for any online user to experiment with the technology sans the financial gatekeeping.

Yet, 2023 marked a thrilling pivot as the digital landscape began to bristle with Open Source Models. Reflecting on an interview with Sam Altman, his forecast of a future dominated by a handful of models, with innovation primarily occurring through iterations on these giants, appears to have missed the mark. Despite GPT-4's continued reign at the apex of Large Language Models (LLMs), the Open Source community is rapidly gaining ground.

Among the burgeoning Open Source projects, one, in particular, has captured my attention: Ollama. Contrary to initial impressions, Ollama isn't just another model tossed into the ever-growing pile. Instead, it's a revolutionary application designed to empower users to download and run popular models directly on their local machines, democratizing access to cutting-edge AI technology.

Ollama

What's the Hype?

Okay, instead of just telling you, let me show you something.

Download Ollama onto your local machine.

Once that's done, try chatting with the model here:
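You can also chat with a model straight from the terminal before wiring up any code. A minimal example, assuming the llama2 model (any model from Ollama's library works the same way):

ollama run llama2

The first run downloads the model weights; after that, it drops you into an interactive chat session, fully offline.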

In case you encounter a CORS error in the browser, run this command:

OLLAMA_ORIGINS=* ollama serve

With that, you're now able to chat with the Llama2 model, right from this article, while it runs on your local machine.

Take a look at the code—it's just a simple fetch request:

fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "llama2",
        prompt: document.querySelector("input").value,
        stream: true,
      }),
    });
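With stream: true, Ollama answers with newline-delimited JSON, one object per chunk. Here's a rough sketch of how you might read that stream in the browser and build up the full reply (the variable names are just illustrative, and a production version would buffer partial lines across chunks):

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama2",
    prompt: document.querySelector("input").value,
    stream: true,
  }),
});

// Each line of the streamed body is a JSON object with a `response` fragment;
// the final line has `done: true` plus some timing stats.
const reader = response.body.getReader();
const decoder = new TextDecoder();
let output = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value).split("\n")) {
    if (!line.trim()) continue;
    output += JSON.parse(line).response ?? "";
  }
}

console.log(output); // the complete generated answer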

llama2 isn't the only model available. Ollama's library has a long list of models you can download and run locally. All you have to do is change the model name, model: "llama2", in the fetch request.

Fun, right? Now you have powerful models running on your computer, completely offline.

What's fascinating is that you can run 7B models on any machine with 8GB of RAM; bigger models need more.

How Can This Change Everything?

Major players like OpenAI and Cohere offer similar APIs, but there's always a cost associated with them, not to mention the privacy concerns.

Sure, hosting and running these models online is an option, but just imagine the possibilities that come with operating these models offline.

For instance, take a look at this plugin, Text Gen, which you can use in Obsidian. You can set Ollama as your LLM provider in the settings and specify the model to be used, like this:

Ollama and Obsidian

And voilà,

You now have auto-completion in Obsidian, just like that.

Obsidian and ollama ai

Alternatively, you can use Continue as a replacement for GitHub's Copilot. Simply download codellama with Ollama, select Ollama as your LLM provider, and voilà—you've got a local Copilot running on your computer.

Continue VScode Extension
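If you want to try this yourself, grabbing the model is a single command:

ollama pull codellama

From there, you point Continue's settings at your local Ollama instance; the exact configuration keys depend on the extension version, so check its docs.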

I mean, now you can do this...

OFFLINE LLMS

No internet needed. Zero cost.

Interested in more cool experiments built on Ollama? Here are 10:

And...

Imagine adding an Ollama provider to your project: if the user has Ollama installed, you can offer powerful AI features at no cost.
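Here's a hedged sketch of what that could look like: probe Ollama's default local endpoint, and only switch on the AI features if something answers (hasLocalOllama and enableAiFeatures are made-up names for illustration):

// Returns true if a local Ollama server is reachable on its default port.
async function hasLocalOllama() {
  try {
    const res = await fetch("http://localhost:11434/api/tags"); // lists installed models
    return res.ok;
  } catch {
    return false; // nothing listening, or the request was blocked
  }
}

if (await hasLocalOllama()) {
  enableAiFeatures(); // hypothetical: wire your UI to the local model
} else {
  console.log("Ollama not detected; AI features stay hidden.");
}

Remember the OLLAMA_ORIGINS note from earlier: browser requests to the local server are subject to CORS.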

Conclusion

Exciting times

This is, of course, just the start. The newest versions of Ollama feature some really cool models, such as Gemma, a family of lightweight, state-of-the-art open models built by Google DeepMind that were launched last week.

Or consider the LLava model, which can endow your app with computer vision capabilities.

LLava Computer Vision
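For multimodal models like llava, the same generate endpoint accepts base64-encoded images alongside the prompt. A rough sketch, assuming you already have the image as a base64 string in imageAsBase64:

fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llava",
    prompt: "What is in this picture?",
    images: [imageAsBase64], // raw base64, without a data: prefix
    stream: false,
  }),
})
  .then((res) => res.json())
  .then((data) => console.log(data.response)); // the model's description of the image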

Ollama is just the beginning, I would say, but it's not the only project embracing this philosophy. There's also an older project called WebLLM that downloads open-source models into your browser cache, allowing you to use them offline.

So, where does it go from here? No one really knows. But if I had to guess, I'd say these models will soon be part of operating systems. I mean, they almost have to be. It's possible that Apple is already working on one—who knows? Their machines certainly have the powerful chips needed. 🤷‍♂️

Offline LLMs (Large Language Models) will be the future, especially given concerns about privacy and security. Before long, even your mobile devices will be equipped with LLMs. The question is, how will we utilize them? Trust me, chatting is not the only use case; there will be many more. The latest mobile devices, like the Samsung Galaxy S24 Ultra, have already started leveraging AI-powered apps, and they're relatively well integrated into the OS as a whole. My guess is that as time progresses and local LLMs get more powerful, this will further empower our mobile devices in even more novel (and secure) ways.

I would encourage you to try your hand at this technology. You can visit Ollamahub to find different model files and prompt templates to give your local LLMs a twist.
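Those model files follow Ollama's Modelfile format. A minimal example that gives llama2 a custom persona (the writing-assistant name is just an illustration):

FROM llama2
PARAMETER temperature 0.7
SYSTEM You are a concise writing assistant that always answers in bullet points.

Build and run it with ollama create writing-assistant -f Modelfile, then ollama run writing-assistant.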

Consider using a library like LangChain to further build upon this technology, adding long-term memory and context awareness. You can even try a low-code setup locally with Flowise.
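For example, LangChain's JavaScript packages ship an Ollama integration. A rough sketch (the import path may differ depending on your LangChain version):

import { Ollama } from "@langchain/community/llms/ollama";

// Point LangChain at the local Ollama server (default port 11434).
const model = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "llama2",
});

const answer = await model.invoke(
  "Summarize why running LLMs locally matters, in two sentences."
);
console.log(answer);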

Isn't this exciting? I'd love to hear what you think! 🌟 Have any cool ideas started bubbling up in your mind? Please, don't hesitate to share your thoughts in the comments below—I'm all ears!

And if you're itching to chat about a potential idea or just need a bit of guidance, why not drop me a line on Twitter @sarthology? I can't wait to connect with you. See you around! 💬✨

Top comments (26)

Jason TC Chuang

I ran a survey (2024-02-20 to 2024-02-29) to see which local LLMs are most used in the Taiwanese AI community. Here are the top 5 results.

  1. Ollama (15 votes)
  2. Transformers (14 votes)
  3. Langchain (13 votes)
  4. llama.cpp (12 votes)
  5. LM Studio (7 votes)
Sarthak Sharma

Ollama on the rise

Marcos Issler

I have used llava to OCR some image documents, but without success. Sometimes it works, sometimes it says it's not possible, or complains about low resolution, among other answers. Does anyone know another model that can do image-to-text OCR of the text in an image, like a letter? Thanks!

Sarthak Sharma

There is a way (a hack, really) to download any of Hugging Face's open-source image-to-text models and run it locally.

youtu.be/fnvZJU5Fj3Q?si=YiiHdwRw90...

Additionally, you can also upscale an image using a different AI model and then feed it to llava for better results.

I’m sure the llava model will get better with time.

Marcos Issler

Thanks Sarthak, I will try it!

Ranjan Dailata

@isslerman Please do not rely on traditional LLMs for OCR purposes. They are not good at it. You should use dedicated OCR providers, for example an OCR API.

Marcos Issler

Thanks Ranjan, I will take a look at it. The project itself is more than OCR; we are using LLMs for other purposes, and OCR is one step. But yes, an OCR API could be a good solution too. I will check whether there is an open-source option I can use, or look at the providers, because the documents in this case are too sensitive to be shared.
Best, Marcos Issler

Ranjan Dailata

Sure, it's understandable. There are always pros and cons when it comes to open-source options. Please do keep the accuracy issues of open-source models in mind. However, since you mentioned OCR on sensitive documents, I would highly recommend considering public cloud options such as AWS or Azure. They provide a complete solution and are GDPR and SOC 2 compliant. Generally, we need to keep several things in mind: cost, accuracy, security, reliability, scalability, etc. If you can do it in-house with open source and think that works best, proceed with that. However, it's good to experiment and see what the best options are.

Utkarsh Talwar

Local LLMs integrated into the OS could be such a blessing for people like me. I've already noticed I use ChatGPT much more than Google Search when I need a concept explained quickly and succinctly. In fact, "define ______" and "what is ______?" were two of my most frequent search queries up until last year.

Now I just ask ChatGPT. When things don't make sense, I ask it to explain like I'm five or ten, and it works wonderfully 80% of the time. Having a local LLM capable of doing this at your fingertips will make this even smoother!

Sarthak Sharma

Exactly. Recently, the Arc Search app (a browser, actually) released a feature that lets you pinch a page and it will summarize it for you. These are simple use cases of generative AI, but they give you a glimpse of where we are heading.

Chris Dawkins

I've been using Ollama + ROCm with my fairly underpowered RX 580 (8 GB), and have had a lot of success with different models. I'm surprised at how well everything works, and I can see myself building a home server dedicated to AI workloads in the relatively near future.

Fatemeh Paghar

The rise of local Large Language Models (LLMs) is an absolute game-changer! Your exploration of Ollama and its potential applications is truly fascinating. The ability to run powerful AI models locally, offline, with no associated costs, opens up a realm of possibilities.

I love the practical examples you provided, from integrating Ollama with Obsidian for auto-completion to using it as a replacement for GitHub's Copilot in VSCode. The prospect of adding an Ollama provider to projects for free AI features is revolutionary.

The diversity of models available, from Gemma for lightweight open models to LLava for computer vision capabilities, showcases the versatility of this approach. It's exciting to think about the future integration of these models into operating systems, potentially even on mobile devices like the Samsung Galaxy S24 Ultra.

krlz

This was really inspiring, thanks for sharing!

aadi-2510

Got goosebumps just thinking about the infinite number of possibilities LLMs are going to open up in the not-so-distant future.
This 5-minute read really pushed all my brain cells to imagine what we are heading towards.

Would definitely love to read more such tech updates.

Amazing Article!!!

LinceMathew

Good Explorations into Local LLMs. Langchain is a marvellous invention

Muhammad Asadullah (Asad)

One of the great posts I have come across lately! Great work! 🙏

Preston Pham

Thanks for sharing this. Now I don't need to worry about an internet connection all the time when I'm coding while traveling.

Benoit COUETIL 💫

Great! Does the popularity of models on their website align with their performance, from your point of view? Is it the same for coding purposes?

Sarthak Sharma

Of course it does. With the right hardware, it can amaze you. Try this amazing experiment by Hugging Face: huggingface.co/chat/

Akshat Austin

I used this AI tool, and it's fantastic. Thank you for the guidance and the wonderful post.

Sarthak Sharma

Appreciate your kind words. 😊

Ahamd hassan

Ollama on the rise

Mandar Vaze

What configuration do you suggest for running local LLMs?
I have a Windows 10 machine with 16GB RAM and run Ollama inside WSL2.
I tried several 7B models, including codellama, and the responses are VERY slow.
Some 3B models are only slightly better.
This Windows PC does not have a GPU.

OTOH, my work MacBook Pro M2 with 16GB RAM has respectable response times.

Q sageHint

Thanks for your content.
I have some questions regarding OCR.
Which service do you think is best?
A past client of mine was considering AWS / Google Vision OCR, but was worried about getting the desired results.
