
Andrii Melashchenko

Originally published at blog.javatask.dev

AI Unleashed: Running Generative Models Locally. Introduction

Introduction

This article is the first in a series about running generative AI models locally on consumer-grade hardware, giving you a safe place to experiment and understand your use cases. I will try to answer the question of how to start experimenting safely with your real data.

Gartner states, "In theory, at least, this (Generative AI) will increase worker productivity". Another Gartner recommendation is to "Start Inside ... with ... Off-the-shelf products". But that is still theory.

In this series, I'll offer practical steps for implementing Gartner's recommendations in regulated (internally and externally) environments. The series promotes the use of open-source large language models (LLMs), which are the heart of Generative AI (GenAI).

Note: If you have no restrictions on processing your real-world data with major Public Cloud providers, such as AWS Bedrock, Azure AI, or Google Vertex AI, go for the Public Cloud GenAI offerings!

Key Definitions

Generative AI: This is a subfield of artificial intelligence that uses models and algorithms to generate content. It can create anything from written text to images or music, by learning patterns from existing data and producing new content that mimics it.

Large Language Models (LLMs): These are AI models trained on a vast amount of text data. They can generate human-like text by predicting the probability of a word given the previous words used in the text. Examples include OpenAI's GPT-3 and Google's Gemini.
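The "predict the next word" idea can be illustrated with a toy bigram model. This is a drastic simplification of a real LLM (which uses neural networks over tokens, not word counts), but the core mechanic is the same: estimate which word most probably follows the words seen so far.

```python
from collections import Counter, defaultdict

# Toy illustration, NOT a real LLM: count which word follows which
# in a tiny corpus, then predict the most probable next word.
def train_bigram(text):
    words = text.split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(model, word):
    # Return the most frequent word observed after `word`, if any.
    counts = model.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often
```

A real LLM does the same thing at vastly larger scale, assigning probabilities to every possible next token instead of just counting pairs.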

AI Agents: These are systems or software that can perform tasks or make decisions autonomously. They use Generative AI to "understand" their environment and perform actions to achieve specific goals.

CUDA: This stands for Compute Unified Device Architecture. It's a parallel computing platform and application programming interface model created by Nvidia. It allows software developers to use a CUDA-enabled graphics processing unit for general-purpose processing.

Chatbot: A chatbot is an AI-powered software designed to interact with humans in their natural languages. These interactions can occur in both text and voice formats.

Machine Customers: This is a term coined by Gartner to describe AI systems that can autonomously perform tasks or make purchasing decisions on behalf of human users or other systems.

Public Cloud GenAI offerings: These are Generative AI models or services offered by public cloud providers like Google, Amazon, and Microsoft. They provide pre-trained models and services which developers can use to integrate AI capabilities into their applications.

Steam Revolution is here

This article has a steam engine on its cover because steam and, later, electric engines changed the world. The same is happening now with AI, specifically with AI Agents.

According to LangChain: "The core idea of (AI) agents is to use a language model to choose a sequence of actions. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order."

So GenAI becomes the "brain", an orchestrator of the tools you may use to achieve a given goal or react to a situation. Such an agent can work 24/7, process enormous amounts of data, and be "objective". Gartner calls one type of such agent "Machine Customers".
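The agent loop described above can be sketched in a few lines. Here the "reasoning engine" is a stub function with hypothetical decision logic; in a real agent, that function would be an LLM call that chooses the next action from the toolbox.

```python
# Minimal agent-loop sketch: a reasoning engine repeatedly picks the
# next action from a toolbox until the goal is reached.
def reasoning_engine(state):
    # Hypothetical stand-in for the LLM "brain".
    if "data" not in state:
        return "fetch_data"
    if "report" not in state:
        return "write_report"
    return "done"

TOOLS = {
    "fetch_data": lambda s: {**s, "data": [1, 2, 3]},
    "write_report": lambda s: {**s, "report": f"sum={sum(s['data'])}"},
}

def run_agent(state):
    while (action := reasoning_engine(state)) != "done":
        state = TOOLS[action](state)  # the chosen tool updates the state
    return state

print(run_agent({})["report"])  # sum=6
```

Note how the sequence of actions is not hardcoded: the engine decides it at runtime based on the current state, which is exactly what distinguishes an agent from a chain.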

Application Covered by the Series

This series will have two big blocks:


  1. Setting up and running your "brain" (an LLM) locally:
    • With a CPU only
    • With a CPU and an Nvidia GPU (you need a GPU with CUDA support)
  2. Configuring apps that use the "brain" to deliver value.

Note: The diagram shows that Public Cloud providers offer LLM APIs, so you don't need to worry about hardware or other supporting software.
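Whether the "brain" runs locally or in the cloud, apps usually talk to it through the same kind of HTTP API. As a hedged sketch, here is what a chat-completion request to a local, OpenAI-compatible server might look like; the endpoint URL and model name are hypothetical placeholders, not part of any specific product in this series.

```python
import json
import urllib.request

# Hypothetical local endpoint; several local runtimes expose an
# OpenAI-compatible chat-completions API in this general shape.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt, model="local-model", temperature=0.2):
    # Construct the JSON body an OpenAI-compatible server expects.
    body = {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Summarize our Q3 sales data.")
print(json.loads(req.data)["messages"][0]["role"])  # user
```

Because the request shape is the same, an app built against a local server can later be pointed at a Public Cloud LLM API with minimal changes.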

Manage your expectations

The technology is here, but hardware is still evolving. Following my tutorials on setting up GenAI locally, you will soon notice that some models are "dumber" than GPT-4 (a state-of-the-art, closed model). They are also slower, because you may not have the latest and greatest CUDA-enabled GPU.

But local LLM technology is good enough for you to start experimenting with GenAI and getting value out of it. Remember, GenAI is a company-wide initiative, not just an IT initiative!

Conclusion

Setting up a local Generative AI model can be a game-changer, providing an avenue to explore, experiment, and build expertise in its use cases. While the technology is available, remember that hardware is still in a state of evolution. Despite some models being slower and less sophisticated than state-of-the-art models, leveraging these tools locally offers a valuable opportunity to experiment and identify the best use cases for Generative AI. This is not just an IT initiative, but a company-wide effort that can revolutionize productivity and efficiency. As we move forward, the fusion of AI with our daily tools and tasks will become increasingly integral to our work and lives.
