
Frank Noorloos

Run LLMs Locally with Ollama & Semantic Kernel in .NET: A Quick Start

Introduction

As AI becomes increasingly central to modern applications, developers in the .NET ecosystem are exploring ways to incorporate powerful language models with minimal friction. Ollama and Semantic Kernel provide a compelling approach for running generative AI applications locally, giving teams the flexibility to keep data on-premises, reduce network latency, and avoid recurring cloud costs.

In this post, you’ll learn how to:

  1. Set up Ollama on your machine.
  2. Pull and serve a local model (like llama3.2).
  3. Integrate it with Semantic Kernel in a .NET 9 project.

By the end, you’ll have a simple yet powerful local AI application — no cloud dependency required.


What is Ollama?

Ollama is a self-hosted platform for running language models locally, eliminating the need for external cloud services. Key benefits include:

  • Data Privacy: Your data never leaves your environment.
  • Lower Costs: Eliminate the “pay by API call” model of external services.
  • Ease of Setup: A quick and straightforward way to get up and running with local AI.

With Ollama, you pull the models you need (for example, llama3.2), serve them locally, and integrate them just like you would with a remote API—except it all stays on your machine or server.
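
Because Ollama exposes a plain HTTP API on localhost, you can talk to it from any client before wiring up Semantic Kernel. Here's a minimal C# sketch (not part of the setup below, just an illustration) that calls Ollama's /api/generate endpoint directly; it assumes the server is already running on the default port and that llama3.2 has been pulled:

// Minimal sketch: calling Ollama's REST API directly with HttpClient,
// no Semantic Kernel involved. Assumes `ollama serve` is running on the
// default port and that llama3.2 has already been pulled.
using System.Net.Http.Json;
using System.Text.Json;

using var http = new HttpClient();

var response = await http.PostAsJsonAsync("http://localhost:11434/api/generate", new
{
    model = "llama3.2",
    prompt = "Why is the sky blue?",
    stream = false // return one JSON object instead of a token stream
});

var json = await response.Content.ReadFromJsonAsync<JsonElement>();
Console.WriteLine(json.GetProperty("response").GetString());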


Introducing Semantic Kernel

Semantic Kernel is an open-source SDK from Microsoft that enables developers to seamlessly integrate AI capabilities into .NET applications. It allows you to combine AI models with APIs and existing application logic to build intelligent and context-aware solutions.

Key features include:

  • Function Orchestration: Efficiently manage and compose multiple AI functions, such as text generation, summarization, and Q&A, within your applications.
  • Extensibility: Support for a wide range of AI model providers, including OpenAI, Azure OpenAI, and local deployment options like Ollama.
  • Context Management: Retain and utilize contextual information, such as conversation history and user preferences, to create personalized and coherent AI-driven experiences.

By pairing Semantic Kernel with Ollama, you can explore powerful AI interactions entirely on your own hardware. This ensures you maintain full control over your data and resources, and eliminates the ongoing costs tied to cloud-based APIs. It’s a great way to experiment, prototype, or run offline—without sacrificing the advanced capabilities of AI.
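
To make that concrete, here is a minimal sketch of what the pairing looks like in code. It assumes the NuGet packages installed later in this post and uses the connector's AddOllamaChatCompletion builder extension:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

#pragma warning disable SKEXP0070 // the Ollama connector is still experimental

// Register a local Ollama model with the kernel builder
var kernel = Kernel.CreateBuilder()
    .AddOllamaChatCompletion("llama3.2", new Uri("http://localhost:11434"))
    .Build();

// Resolve the chat service and ask a single question
var chat = kernel.GetRequiredService<IChatCompletionService>();
var answer = await chat.GetChatMessageContentAsync("Name one classic sci-fi comic.");
Console.WriteLine(answer.Content);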


Prerequisites

  1. .NET 9 SDK

    Make sure you have the .NET 9 SDK installed on your system.

  2. Ollama

     Install Ollama on your machine, then:

   • Start the local server:

      ollama serve

   • Pull the desired model, for example:

      ollama pull llama3.2

Step-by-Step Integration

1. Create a New .NET 9 Project

Open your terminal or command prompt and run:

dotnet new console -n OllamaSemanticKernelDemo
cd OllamaSemanticKernelDemo

2. Add the Required NuGet Packages

Inside your project directory, add the Semantic Kernel package and the Ollama connector:

dotnet add package Microsoft.SemanticKernel --version 1.32.0
dotnet add package Microsoft.SemanticKernel.Connectors.Ollama --version 1.32.0-alpha
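
If you'd rather edit the project file directly, the same references can be declared in the .csproj (versions as of this writing; check NuGet for newer ones):

<ItemGroup>
  <PackageReference Include="Microsoft.SemanticKernel" Version="1.32.0" />
  <PackageReference Include="Microsoft.SemanticKernel.Connectors.Ollama" Version="1.32.0-alpha" />
</ItemGroup>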

3. Ensure Ollama is Running and Pull Your Model

In another terminal window:

ollama serve

Then pull the model you want to use:

ollama pull llama3.2

Ollama will listen on http://localhost:11434 by default.
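
Before wiring up the .NET code, you can sanity-check that the server is reachable and your model is present. A quick sketch that queries Ollama's /api/tags endpoint, which lists the locally pulled models:

using var http = new HttpClient();

// /api/tags lists the models currently available in your local Ollama library
var tags = await http.GetStringAsync("http://localhost:11434/api/tags");
Console.WriteLine(tags);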

Example Code

In your Program.cs (or any .cs file you designate as the entry point), use the following code. The OllamaApiClient type comes from OllamaSharp, which the Semantic Kernel Ollama connector builds on and pulls in as a dependency:

using System.Diagnostics.CodeAnalysis;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.Ollama;
using OllamaSharp;

public static class OllamaSemanticKernelDemo
{
    [Experimental("SKEXP0070")]
    public static async Task Main(string[] args)
    {
        // 1. Initialize the Ollama client (from OllamaSharp) with the
        //    local endpoint and the default model to use for inference
        using var ollamaClient = new OllamaApiClient(
            new Uri("http://localhost:11434"),
            "llama3.2");

        // 2. Create a Semantic Kernel chat service from the Ollama client
        var chatService = ollamaClient.AsChatCompletionService();

        // 3. Define the AI's role/behavior via a system message
        var chatHistory = new ChatHistory("You are an expert on comic books");

        // 4. User initiates the conversation
        chatHistory.AddUserMessage("Hi, I'm looking for comic suggestions");
        OutputLastMessage(chatHistory);

        // 5. Get the AI's reply and append it to the history
        var reply = await chatService.GetChatMessageContentAsync(chatHistory);
        chatHistory.Add(reply);
        OutputLastMessage(chatHistory);

        // 6. User follows up with more info
        chatHistory.AddUserMessage("I love sci-fi, I'd like to learn something new about the Galactic Empire, any suggestions?");
        OutputLastMessage(chatHistory);

        // 7. AI responds with more tailored suggestions
        reply = await chatService.GetChatMessageContentAsync(chatHistory);
        chatHistory.Add(reply);
        OutputLastMessage(chatHistory);
    }

    // Prints the most recent message in the conversation, prefixed with its role
    private static void OutputLastMessage(ChatHistory chatHistory)
    {
        var lastMessage = chatHistory.Last();
        Console.WriteLine($"{lastMessage.Role}: {lastMessage.Content}\n");
    }
}
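
Note: the Ollama connector is still marked experimental, which is why Main carries the [Experimental("SKEXP0070")] attribute; without it (or an equivalent #pragma warning disable SKEXP0070), the compiler raises an error for the experimental APIs. If you prefer a project-wide opt-in, one option is a NoWarn entry in the .csproj, sketched here:

<PropertyGroup>
  <NoWarn>$(NoWarn);SKEXP0070</NoWarn>
</PropertyGroup>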

Example output

user: Hi, I'm looking for comic suggestions

assistant: I'd be happy to help you find some great comics.

Before we get started, can you please tell me a bit more about what you're in the mood for? Here are a few questions to help me narrow down some recommendations:

1. What genre are you interested in? (e.g., superhero, fantasy, horror, romance, etc.)
2. Are there any specific characters or franchises that you enjoy?
3. Do you prefer classic comics from the past (e.g., 50s-90s), or more modern releases?
4. Is there a particular tone you're looking for? (e.g., light-hearted, dark and gritty, adventurous, etc.)
5. Are you looking for something new to read, or are you open to exploring older comics?

Let me know your answers to these questions, and I'll do my best to suggest some fantastic comics that fit your tastes!

user: I love sci-fi, I'd like to learn something new about the Galactic Empire, any suggestions?

assistant: Science fiction is an amazing genre.

If you're interested in learning more about the Galactic Empire from a comic book perspective, here are some suggestions:

**Must-read comics:**

1. **Darth Vader** (Marvel Comics, 2015-2019) - A solo series that explores Darth Vader's backstory and his fall to the dark side.
2. **Star Wars: Tarkin** (Dynamite Entertainment, 2016) - A graphic novel that delves into Grand Moff Tarkin's personality and motivations.
3. **Star Wars: Lando** (Marvel Comics, 2018-2020) - A series that explores the life of Lando Calrissian, a key character in the Galactic Empire.

**Recommended comics for Imperial insights:**

1. **Star Wars: Rebel Run** (Dark Horse Comics, 2009) - A miniseries that shows the inner workings of the Empire's security forces and their efforts to track down Rebel Alliance members.
2. **Star Wars: The Old Republic - Revan** (Dark Horse Comics, 2014-2015) - A limited series based on the popular video game, which explores the complexities of the Mandalorian Wars and the Imperial forces involved.

**Classic comics with Imperial connections:**

1. **Tales of the Jedi** (Dark Horse Comics, 1993-1996) - A comic book series that explores various events in the Star Wars universe, including some involving the Galactic Empire.
2. **Star Wars: X-Wing** (Dark Horse Comics, 1998-2000) - A comic book series based on the popular video game, which focuses on a group of Rebel Alliance pilots fighting against Imperial forces.

**Recent comics with Imperial themes:**

1. **Star Wars: The High Republic** (Marvel Comics, 2020-present) - An ongoing series that explores a new era in the Star Wars universe, including events and characters connected to the Galactic Empire.
2. **Star Wars: Resistance** (IDW Publishing, 2018-2020) - A comic book series set during the First Order's rise to power, which includes connections to the original trilogy.

These comics offer a mix of Imperial perspectives, character studies, and historical context that can help deepen your understanding of the Galactic Empire. May the Force be with you!



How It Works

  1. Ollama is listening on localhost:11434. All requests go to your local server.
  2. By specifying "llama3.2", you tell Ollama which model to use for inference.
  3. The ChatHistory class from Semantic Kernel records each turn in the conversation; the service created by AsChatCompletionService() sends those messages to Ollama, which generates the next reply. (If you'd rather print replies as they are generated, see the streaming sketch below.)
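
As a variant on step 5 of the example, here's a minimal streaming sketch; GetStreamingChatMessageContentsAsync is Semantic Kernel's streaming counterpart to GetChatMessageContentAsync:

// Streaming variant of step 5: print the reply as Ollama produces it
await foreach (var chunk in chatService.GetStreamingChatMessageContentsAsync(chatHistory))
{
    Console.Write(chunk.Content);
}
Console.WriteLine();
// Note: with streaming you must accumulate the chunks yourself if you
// want to add the full reply back into chatHistory.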

Running the Application

To run your newly created console application:

dotnet run

Watch the console as the local AI model responds to the scripted prompts, with no cloud calls required.
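
The sample plays back a fixed two-turn script. If you want a genuinely interactive session, a small chat loop, sketched here reusing chatService and chatHistory from the example (in place of steps 4 through 7), could look like this:

// Interactive chat loop: type a message, get a reply; empty input exits
while (true)
{
    Console.Write("you: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;

    chatHistory.AddUserMessage(input);
    var reply = await chatService.GetChatMessageContentAsync(chatHistory);
    chatHistory.Add(reply);
    Console.WriteLine($"assistant: {reply.Content}\n");
}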

Performance Considerations

  • Resource Usage: Larger models demand more CPU, RAM, and (if available) GPU memory. The llama3.2 model used here is comparatively small, at roughly a 2.0 GB download.
  • Latency: Local inference avoids network round-trips, but raw generation speed depends entirely on your hardware, so make sure it can handle the load.

Use Cases

  1. Prototyping: Experiment quickly without incurring cloud costs or dealing with rate limits.
  2. Internal Knowledge Bases: Keep information in-house for compliance and data privacy.
  3. Edge or Offline Applications: Perfect for scenarios with limited internet access or strict data governance.

Conclusion

By running AI models locally with Ollama and coordinating them through Semantic Kernel, you can build private, cost-effective, and high-performance .NET applications. After installing Ollama, serving your model, and adding the Semantic Kernel + Ollama NuGet packages, you can host generative AI experiences entirely on your own infrastructure. This setup is especially well-suited for local development, letting you experiment and prototype without relying on cloud services or external APIs. In doing so, you maintain complete control over your environment while reducing both complexity and recurring costs.

As local AI continues to evolve, the combination of Ollama and Semantic Kernel lays the groundwork for building robust, self-contained solutions that scale from simple prototypes to production-ready applications—without the hidden costs of cloud dependencies.

Next Steps

  • Experiment with different models in Ollama to find the best fit for your application.
  • Tweak your system and user prompts to see how they change the model’s responses.
  • Preview of What’s Next: In an upcoming post, we’ll dive into Retrieval Augmented Generation (RAG) to give your LLM context-aware responses sourced from your own data—still running entirely on local infrastructure.

Happy coding! 🎄


