Integrate Orquesta with Cohere using Python SDK

Olumide Shittu for Orquesta

Orquesta provides your product teams with no-code collaboration tooling to experiment, operate, and monitor LLMs and remote configurations within your SaaS. Using Orquesta, you can easily perform prompt engineering, prompt management, experiment in production, push new versions directly to production, and roll back instantly.

Cohere, on the other hand, is an API that offers language processing to any system. It trains massive language models and puts them behind a very simple API.

This article guides you through integrating your SaaS with Orquesta and Cohere using our Python SDK. By the end of the article, you'll know how to set up a prompt in Orquesta, perform prompt engineering, request a prompt variant using our SDK code generator, map the Orquesta response with Cohere, send a payload to Cohere, and report the response back to Orquesta for observability and monitoring.

Prerequisites

To follow along with this tutorial, you will need the following:

  • Jupyter Notebook (or any IDE of your choice).

  • Orquesta Python SDK.

Integration

Follow these steps to integrate the Python SDK with Cohere.

Step 1 - Install SDK and create a client instance

pip install orquesta-sdk
pip install cohere

To create a client instance, you need your Orquesta API key, which can be found in your workspace settings at https://my.orquesta.dev/<workspace-name>/settings/developers.

Copy it and add the following code to your notebook to initialize the Orquesta client.

import time
import cohere
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_cohere_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics


# Initialize the Orquesta client

api_key = "ORQUESTA-API-KEY"

options = OrquestaClientOptions(
    api_key=api_key,
    ttl=3600
)

client = OrquestaClient(options)

Explanation:

  • Import the time module to measure how long the completion request takes.

  • Import cohere to access the Cohere API.

  • Import the OrquestaClient and OrquestaClientOptions classes defined in the orquesta_sdk module.

  • The Orquesta SDK provides helper functions that map and interface between Orquesta and specific LLM providers. For this integration, we use the orquesta_cohere_parameters_mapper helper.

  • To log all interactions with Cohere, we use the OrquestaPromptMetrics class.

  • We create an instance of OrquestaClientOptions and configure it with the api_key and the ttl (time to live, in seconds) of the local cache; by default, it is 3600 seconds (1 hour).

Finally, an instance of the OrquestaClient class is created and initialized with the previously configured options object. This client instance can now interact with the Orquesta service using the provided API key for authentication.
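
As a side note, rather than hard-coding the key, it is common practice to read it from an environment variable. Here is a minimal sketch using Python's standard library; the variable name ORQUESTA_API_KEY is an example, not something the SDK requires:

import os

from orquesta_sdk import OrquestaClient, OrquestaClientOptions

# Read the key from an environment variable (ORQUESTA_API_KEY is an example name)
api_key = os.environ.get("ORQUESTA_API_KEY")
if api_key is None:
    raise RuntimeError("Set the ORQUESTA_API_KEY environment variable first")

options = OrquestaClientOptions(api_key=api_key, ttl=3600)
client = OrquestaClient(options)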

Step 2 - Enable Cohere models in Model Garden

Head over to Orquesta's Model Garden and enable the Cohere models you want to use.

Enable Cohere models in Model Garden

Step 3 - Set up a completion prompt and variants

The next step is to set up your completion prompt; make sure it is a completion prompt, not a chat prompt, since this integration uses Cohere's generate() endpoint.

To create a prompt, click Add Prompt, provide a prompt key and a Domain (optional), and select Completion.

Set up a completion prompt and variants

Once that is set up, create your first completion variant: give it a name, add all the necessary information, and click Save.

Set up a completion prompt and variants

Step 4 - Request a variant from Orquesta using the SDK

Our flexible configuration matrix allows you to define multiple prompt variants based on custom context, so you can use different prompts and hyperparameters per environment, country, locale, or user segment, for example. The Code Snippet Generator makes it easy to request a prompt variant.

Code Snippet Generator

Once you open the Code Snippet Generator, copy the code snippet and paste it into your editor.

Code Snippet Generator

# Query the prompt from Orquesta
prompt = client.prompts.query(
  key="data_completion",
  context={
    "environments": ["test"]
  },
  variables={  }
)
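
For illustration, a request with a richer context might look like the sketch below. The extra context keys (locale, user_segment) and the variable are hypothetical; they must match fields you have actually defined in your configuration matrix and prompt:

# Hypothetical sketch: context keys and variables must exist in your setup
prompt = client.prompts.query(
  key="data_completion",
  context={
    "environments": ["production"],   # hypothetical value
    "locale": ["en"],                 # hypothetical context key
    "user_segment": ["enterprise"]    # hypothetical context key
  },
  variables={ "topic": "sales" }      # hypothetical prompt variable
)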

Step 5 - Map the Orquesta response to Cohere using a Helper

As mentioned at the beginning of this tutorial, integrating these two technologies relies on a helper provided by Orquesta: orquesta_cohere_parameters_mapper. It maps the prompt configuration returned by Orquesta to the parameters Cohere expects.

# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}') 

co = cohere.Client('COHERE-API-KEY') # Insert your Cohere API key
completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
)

# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency} ms')


Explanation:

  • We record the start time using the time module.

  • An instance of the Cohere client is created.

  • The generate() endpoint generates realistic text conditioned on a given input.

  • The generate() endpoint also accepts other body parameters, such as the required prompt string, the model, num_generations, max_tokens, temperature, etc. For simplicity, we only set model and prompt; a sketch with a few of these parameters follows this list.

  • We record the end time and calculate the latency.
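
As referenced above, a call that sets a few of these optional body parameters explicitly could look like the following sketch; the values are illustrative only:

# Sketch: passing optional generate() parameters (values are illustrative)
completion = co.generate(
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
    num_generations=1,   # number of completions to return
    max_tokens=256,      # upper bound on generated tokens
    temperature=0.7,     # sampling temperature
)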

Step 6 - Report analytics back to Orquesta

After each query, Orquesta generates a log with a Trace ID. Using the add_metrics() method, you can attach extra information to that log, such as the llm_response, metadata, latency, and economics.

# Tokenize responses
prompt_tokenization = co.tokenize(prompt.value.get('prompt'))
completion_tokenization = co.tokenize(completion.generations[0].text)

prompt_tokens = len(prompt_tokenization.tokens)
completion_tokens = len(completion_tokenization.tokens)
total_tokens = prompt_tokens + completion_tokens

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    },
    llm_response=completion.generations[0].text,
    latency=latency,
    metadata={
        "finish_reason": completion.generations[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)

Conclusion

With these easy steps, you have successfully integrated Orquesta with Cohere. This is just the tip of the iceberg: at the time of writing, Orquesta only supports Cohere's generate() endpoint, but in the future you will be able to use the other endpoints, such as embed, classify, summarize, and detect-language.
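
For a taste of what that could look like, here is a minimal sketch of calling Cohere's embed endpoint directly; the model name is an example and the call is independent of Orquesta:

# Sketch: embedding texts with Cohere (model name is an example)
response = co.embed(
    texts=["Orquesta integrates with Cohere", "Prompt management made simple"],
    model="embed-english-v2.0",
)
print(len(response.embeddings))  # one embedding vector per input text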

Orquesta supports other SDKs such as Angular, Node.js, React, and TypeScript. Refer to our documentation for more information.

Full Code Example

import time
import cohere
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_cohere_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

# Initialize the Orquesta client

api_key = "ORQUESTA-API-KEY"

options = OrquestaClientOptions(
    api_key=api_key,
    ttl=3600
)

client = OrquestaClient(options)
co = cohere.Client('COHERE-API-KEY')  # Insert your Cohere API key

prompt = client.prompts.query(
  key="data_completion",
  context={
    "environments": ["test"]
  },
  variables={  },
  metadata={"user_id":45515}
)

# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}') 

completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
)

# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency} ms')

# Tokenize responses
prompt_tokenization = co.tokenize(prompt.value.get('prompt'))
completion_tokenization = co.tokenize(completion.generations[0].text)

prompt_tokens = len(prompt_tokenization.tokens)
completion_tokens = len(completion_tokenization.tokens)
total_tokens = prompt_tokens + completion_tokens

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    },
    llm_response=completion.generations[0].text,
    latency=latency,
    metadata={
        "finish_reason": completion.generations[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)
