Natalie Fagundo for Inductor

Originally published at inductor.ai

Open-sourcing Inductor LLM app starter templates: An out-of-the-box systematic approach for developing LLM applications

We are excited to announce that we’ve open-sourced Inductor’s first LLM application starter template (GitHub repo here). This template makes it easy to get started with a systematic, iterative development process for building and shipping a RAG-based LLM application. Most templates provide only the simple scaffolding for an LLM application; this template goes further, including an end-to-end developer workflow optimized for the iterative development required to confidently and efficiently build a production-grade LLM application. Key components of the integrated developer workflow include:

  • Test suites to systematically test the LLM application and ensure its quality.
  • Hyperparameters to automate experimentation and rapidly find the LLM app design that delivers the results that you need, across choice of model, prompt, retrieval augmentation, and more.
  • An auto-generated playground that can be instantly and securely shared for prototyping, and that integrates with test suites and hyperparameters.
  • Integrated logging for monitoring your live traffic in order to understand usage, resolve issues, facilitate A/B testing, and further improve the application.

The first template that we are now releasing is a getting started template for developing an LLM-powered documentation Q&A bot. It takes just minutes to get started, and you can easily integrate and configure the application to work with your own sources of documentation.

Why we’re building LLM application starter templates

As the demand for LLM-powered applications and product features grows, developers and their teams find themselves in need of a comprehensive and streamlined approach to their end-to-end development lifecycle. In the world of traditional application development, there is a well-established development lifecycle and a clear methodology for testing quality. Developers can rely on a structured process that guides them from concept to production, ensuring robust and reliable applications. However, when it comes to developing applications with large language models (LLMs), the path is far less straightforward. LLM application development requires a more experimental and iterative approach, where developers must continually refine and optimize their applications to achieve the desired performance.

This iterative nature presents several challenges, and developers need ways to:

  • Rapidly prototype with stakeholders
  • Systematically evaluate their LLM application or feature
  • Identify and implement improvements
  • Observe behavior in production and take appropriate action

Without the right tools and workflows, navigating this process is time-consuming and complex.

Our solution: jumpstart systematic LLM app development

Enter Inductor’s LLM application starter templates. Designed to address the unique challenges of LLM application development, each template is open-source and provides an easy path to get started quickly and efficiently. Each template includes the necessary scaffolding to facilitate rapid prototyping as well as streamline the progression from prototype to production.

Here’s what you can expect from each Inductor LLM app starter template:

  • Application scaffolding: A robust foundation for your LLM application, ensuring you have all the essential components to build upon.
  • Out-of-the-box UI for rapid prototyping: With a single CLI command, you can start an auto-generated and securely shareable user interface that enables you to quickly prototype and gather feedback from stakeholders, via Inductor playgrounds.
  • Test suite scaffolding for easy evaluation-driven development: Each template includes an Inductor test suite that can be customized for your particular use case.
  • Experimentation scaffolding for systematic improvement: Each template includes built-in touchpoints for rapid and automated experimentation, which can be used with Inductor to automate and orchestrate testing of multiple different app variants in order to further improve your app.
  • Production logging integration for easy observability: Pre-built logging integration to maintain visibility and monitor your application’s performance in a production environment.

The Inductor platform, used in conjunction with a starter template, provides the tools and systems needed to bring successful production-grade LLM applications to market:

[Image: the development workflow without Inductor vs. with Inductor]

Documentation Q&A bot starter template

The LLM-powered documentation Q&A bot (GitHub repo here) is a RAG-based LLM application that answers questions using one or more Markdown documents as its source of context. This starter template is intended for use cases, such as Q&A on developer documentation, in which you have one or more Markdown documents over which you would like to provide a question-answering capability.

App architecture

The application is implemented in Python and includes two main components:

  • An ETL (Extract, Transform, Load) process that parses, chunks, and embeds the relevant Markdown files and populates a vector database. (By default, the starter template uses an included sample Markdown file, and it can easily be configured to instead utilize your own Markdown files.)
  • The main application entrypoint, which takes a question as input, retrieves relevant content from the vector database, and uses an LLM to generate an answer to the question.

Specifically, the ETL process ingests one or more Markdown files, splits them into chunks by Markdown sections, and converts each section to an embedding using Sentence-Transformers' all-MiniLM-L6-v2 model (the default model for Chroma). The embeddings, along with their associated chunks and metadata, are stored locally in a Chroma vector database. The app can also easily be modified to instead utilize a different vector database.
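For concreteness, here is a minimal sketch of what this ETL step can look like, assuming Chroma’s Python client with its default embedding model (Sentence-Transformers’ all-MiniLM-L6-v2). The file name, collection name, and heading-based chunking regex here are illustrative assumptions, not the template’s actual code:

    import re
    import chromadb

    def chunk_markdown(text: str) -> list[str]:
        """Split a Markdown document into chunks at its section headings."""
        sections = re.split(r"(?m)^(?=#{1,6} )", text)
        return [s.strip() for s in sections if s.strip()]

    # Chroma stores embeddings locally on disk; its default embedding
    # function uses the all-MiniLM-L6-v2 model, as described above.
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_or_create_collection("docs")

    with open("sample_docs.md") as f:  # hypothetical Markdown source
        chunks = chunk_markdown(f.read())

    collection.add(
        ids=[str(i) for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"source": "sample_docs.md"} for _ in chunks],
    )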

The main application entrypoint consists of a function that takes a question as input, queries the vector database to retrieve the most relevant Markdown content based on the question, and then uses the OpenAI “gpt-4o” model to generate and return an answer to the question. The app can easily be modified to utilize a different LLM or LLM provider.
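For illustration, here is a hedged sketch of that entrypoint’s shape, using Chroma for retrieval and the OpenAI Python client; the prompt wording, collection name, and number of retrieved results are assumptions rather than the template’s exact implementation:

    import chromadb
    from openai import OpenAI

    collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("docs")
    llm = OpenAI()  # reads OPENAI_API_KEY from the environment

    def answer_question(question: str) -> str:
        # Retrieve the Markdown chunks most relevant to the question.
        results = collection.query(query_texts=[question], n_results=4)
        context = "\n\n".join(results["documents"][0])
        # Ask gpt-4o to answer using the retrieved context.
        response = llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Answer the user's question using only the "
                            "following documentation:\n\n" + context},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content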

Scaffolding for an effective development workflow: Inductor integration

Going from prototype to production with an LLM application for your particular use case requires iterative testing, experimentation, and collaboration, as well as live observability so that you are not flying blind when you ship. To enable you to do this rapidly and reliably, Inductor provides a platform for prototyping, evaluating, improving, and observing your LLM app. This starter template includes pre-built scaffolding that leverages Inductor’s capabilities to enable you to iterate quickly, ship reliably, and collaborate effectively.

Key features include:

  • Test Suites: Easily, rigorously, and continuously test your LLM application with Inductor’s test suites and CLI, to systematically find shortcomings in behavior, accuracy, or cost-effectiveness.
  • Hyperparameters: Dramatically accelerate your experimentation and optimization process with Inductor hyperparameters, to rapidly find the LLM app design that eliminates any shortcomings.
  • Playgrounds: Quickly prototype and collaborate using a playground that is instantly auto-generated for your LLM app, can be easily and securely shared, and integrates seamlessly with your test suites and hyperparameters.
  • Logging: Gain deep insights into usage, detect and resolve issues, and continuously improve your application with Inductor’s rich, automated production logging.

With these features, Inductor empowers you to build, refine, and deliver your LLM applications more effectively than ever before.

Test suites

An Inductor test suite is included alongside the documentation Q&A bot application to evaluate its performance and enable you to systematically test and improve. The included test suite consists of a set of test cases, each containing a set of input (i.e., argument) values for your LLM application and an example of an output value that should be considered high-quality or correct. The test suite also includes a set of quality measures specifying how to evaluate the output of your LLM program. Quality measures can be programmatic, human, or LLM-powered. Using Inductor test suites you can:

  • Rapidly customize quality evaluation for your use case
  • Auto-generate shareable UIs for human evals, and automate with rigorous LLM-powered evals
  • Construct, evolve, and share test cases
  • Automatically orchestrate test suite execution

Within the test suite that is part of this starter template, the included set of test cases can be split into the following categories:

  • Common questions, with examples of high-quality answers (target outputs)
  • Unanswerable questions
  • Out-of-scope questions
  • Malicious questions

Additionally, test cases can easily be added by modifying the test_cases.yaml file within the starter template.

Along with these test cases are quality measures. This template uses LLM-powered quality measures (one of which is sketched after the following list) to assess:

  • Can the question be answered with the provided context?
  • Is the target output contained in the answer provided?
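To make the second of these concrete, here is a minimal, hypothetical sketch of an LLM-powered quality measure; the template’s actual quality measures are defined alongside its test suite and may be structured differently:

    from openai import OpenAI

    llm = OpenAI()

    def target_contained_in_answer(answer: str, target_output: str) -> bool:
        """LLM-powered check: is the target output contained in the answer?"""
        response = llm.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": (
                    "Does the ANSWER contain the information in the TARGET? "
                    "Reply YES or NO.\n\n"
                    f"ANSWER: {answer}\n\nTARGET: {target_output}"
                ),
            }],
        )
        return response.choices[0].message.content.strip().upper().startswith("YES")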

Inductor makes it easy to run the test suite and view its results, an example of which is shown below:

[Screenshot: test suite results]

Hyperparameters

Improving an LLM application requires iterative experimentation: testing different variants of your app’s design to find the one that yields the desired behavior and quality. For example, this can include changing the prompt content, adjusting the prompt construction, selecting different models, tweaking model settings (e.g., temperature), and refining retrieval augmentation techniques. Inductor hyperparameters enable you to systematically and rapidly test and evaluate different configurations of your LLM application. This capability helps you assess the quality and cost-effectiveness of various setups, enabling rapid experimentation while maintaining organization and rigor.

In summary, hyperparameters enable you to:

  • Automate your experimentation in order to rapidly find the LLM app design that delivers the results that you need, across choice of model, prompt, retrieval augmentation, or anything else.
  • Automatically version and track all experiment results.

This starter template comes pre-built with two key hyperparameters (and you can also easily add more based on your needs):

Rephrasing vector database query: This hyperparameter controls whether to use the original user question to query the vector database or to rephrase the question to generate a more informative and relevant vector database query. Rephrasing can incorporate additional keywords and phrases to improve retrieval accuracy. However, this strategy may introduce trade-offs, such as increased latency and higher costs due to the additional LLM API call required for rephrasing. By using this hyperparameter, you can easily experiment with and evaluate the effectiveness of using the original versus the rephrased question.

Number of contextual results retrieved from the vector database: This hyperparameter sets the number of results to be retrieved from the vector database. Adjusting this setting allows you to control the breadth of information retrieved, which can impact the comprehensiveness and relevance of the responses provided by your documentation Q&A bot.
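To illustrate where these two knobs plug into the app, here is a hedged sketch of a retrieval step parameterized by both; the function and parameter names are illustrative, and the actual template wires these values through Inductor’s hyperparameter mechanism rather than plain function arguments:

    import chromadb

    collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("docs")

    def rephrase_question(question: str) -> str:
        # Placeholder for the extra LLM call described above, which would
        # rewrite the raw question as a keyword-rich search query.
        return question

    def retrieve_context(
        question: str,
        rephrase_query: bool = False,   # hyperparameter 1
        num_results: int = 4,           # hyperparameter 2
    ) -> list[str]:
        query = rephrase_question(question) if rephrase_query else question
        results = collection.query(query_texts=[query], n_results=num_results)
        return results["documents"][0]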

When you run the test suite included in the starter template, Inductor will automatically run and evaluate your LLM app on all included test cases for all combinations of hyperparameter values, so that you can easily and rapidly identify the best hyperparameter configuration for your app (by simply clicking the “Hparam summary” button seen in the screenshot above). Additionally, as seen in the next section, you can also interactively experiment with different hyperparameter values via a Custom Playground.

Therefore, by leveraging hyperparameters, you can rapidly evolve your LLM application to achieve the desired balance between performance, accuracy, and cost, ultimately enhancing the user experience of your documentation Q&A bot in a cost-efficient manner.

Playgrounds

Inductor Custom Playgrounds enable you to auto-generate a powerful, instantly shareable playground for your LLM app with a single CLI command, and to run it within your environment. Playgrounds provide a developer-first way to prototype and iterate on LLM programs fast, as well as to loop collaborators (including less-technical collaborators) into your development process and get their feedback early and often.

In particular, with Custom Playgrounds you can:

  • Auto-generate a custom playground UI for your LLM app
  • Run securely in your environment, with your code and data sources
  • Share instantly
  • Iteratively develop test suites for systematic evaluation and improvement

All interactive executions of your LLM program in your playground are automatically logged, so that you can easily replay them, and never lose your work.

Inductor enables you to start a playground for your documentation Q&A bot with a single CLI command. An example of such a playground is as follows:

[Screenshot: an auto-generated playground for the documentation Q&A bot]

Developers and domain experts can interactively experiment with different combinations of hyperparameters in playgrounds to rapidly prototype and collaborate:

[Screenshot: collaborators experimenting with hyperparameter combinations in a playground]

Inductor also makes it easy to add current and previous playground work to your test suites, enabling you to rapidly transition from prototyping to systematic evaluation, and ultimately to production, by improving your test suites and evaluating changes at essential points of testing and validation.

Live logging and monitoring

When your LLM program is running in production, live monitoring becomes essential for several reasons:

  • Ensuring intended behavior: Continuous monitoring helps confirm that your application is performing as expected and delivering useful, accurate results.
  • Issue detection and resolution: By keeping an eye on real-time operations, you can quickly identify and fix any emerging issues before they escalate.
  • Usage analysis for improvement: Understanding how users actually interact with your application enables you to gather valuable insights and make data-driven enhancements.
  • Feedback loop to development: Insights and issues identified via live monitoring can be fed back into the development process, enabling continuous improvement.

To facilitate this, the starter template includes Inductor’s logging decorator, which automatically logs multiple elements of the LLM application’s behavior. This enables you to view and analyze many aspects of the app’s behavior (illustrated in the sketch after this list), including:

  • User input: Captures the queries inputted by users.
  • LLM output: Logs the responses generated by the app.
  • Latency: Tracks response times to measure and ensure responsiveness.
  • RAG system elements: Monitors components of the retrieval-augmented generation (RAG) system, such as the set of text snippets retrieved from the vector database in order to respond to any given query.
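The Inductor decorator captures these signals automatically; purely as an illustration of the kind of information involved, a plain-Python stand-in might look like the following (the names here are hypothetical, not Inductor’s API):

    import functools
    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("docs_qa")

    def log_execution(fn):
        """Illustrative stand-in for the Inductor logging decorator: records
        the user input, the LLM output, and end-to-end latency."""
        @functools.wraps(fn)
        def wrapper(question: str) -> str:
            start = time.perf_counter()
            answer = fn(question)
            latency = time.perf_counter() - start
            log.info("input=%r output=%r latency=%.2fs", question, answer, latency)
            return answer
        return wrapper

    @log_execution
    def answer_question(question: str) -> str:
        return "stub answer"  # stands in for the entrypoint sketched earlier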

With this comprehensive live monitoring capability, you can maintain high standards of reliability, performance, and user satisfaction for your LLM applications.

What next?

To get started in minutes, visit the GitHub repo, clone the documentation Q&A starter template, and follow the simple steps provided to start systematically developing your LLM application.

You can also learn more about Inductor by visiting our documentation or booking a demo.
