Orquesta provides your product teams with no-code collaboration tooling to experiment, operate, and monitor LLMs and remote configurations within your SaaS. Using Orquesta, you can easily perform prompt engineering and prompt management, experiment in production, push new versions directly to production, and roll back instantly.
Cohere, on the other hand, is an API that offers language processing to any system. It trains massive language models and puts them behind a very simple API.
Source: Cohere.
This article guides you through integrating your SaaS with Orquesta and Cohere using our Python SDK. By the end of the article, you'll know how to set up a prompt in Orquesta, perform prompt engineering, request a prompt variant using our SDK code generator, map the Orquesta response with Cohere, send a payload to Cohere, and report the response back to Orquesta for observability and monitoring.
Prerequisites
To follow along with this tutorial, you will need the following:
Jupyter Notebook (or any IDE of your choice).
An Orquesta account and API key.
A Cohere account and API key.
Orquesta Python SDK.
Integration
Follow these steps to integrate the Python SDK with Cohere.
Step 1 - Install SDK and create a client instance
pip install orquesta-sdk
pip install cohere
To create a client instance, you need your Orquesta API key, which can be found in your workspace at https://my.orquesta.dev/<workspace-name>/settings/developers.
Copy it and add the following code to your notebook to initialize the Orquesta client.
import time

import cohere
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_cohere_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

# Initialize the Orquesta client
api_key = "ORQUESTA-API-KEY"
options = OrquestaClientOptions(
    api_key=api_key,
    ttl=3600,  # local cache TTL in seconds
)
client = OrquestaClient(options)
Explanation:
The time module is imported to measure how long the completion request takes.
cohere is imported so that we can call the Cohere API.
The OrquestaClient and OrquestaClientOptions classes, already defined in the orquesta_sdk module, are imported.
The Orquesta SDK provides helper functions that map and interface between Orquesta and specific LLM providers; for this integration, we make use of the orquesta_cohere_parameters_mapper helper.
To log all the interactions with Cohere, we use the OrquestaPromptMetrics class.
We create an instance of OrquestaClientOptions and configure it with the api_key and the ttl (Time to Live) in seconds for the local cache; by default, it is 3600 seconds (1 hour).
Finally, an instance of the OrquestaClient class is created and initialized with the previously configured options object. This client instance can now interact with the Orquesta service, using the provided API key for authentication.
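Hardcoding keys is fine for a quick test, but in practice you may prefer to read them from environment variables. A minimal sketch, assuming you have exported the (illustrative) variables ORQUESTA_API_KEY and COHERE_API_KEY in your shell:
import os

# Hypothetical environment variable names; export them before running
api_key = os.environ["ORQUESTA_API_KEY"]
cohere_api_key = os.environ["COHERE_API_KEY"]

options = OrquestaClientOptions(api_key=api_key, ttl=3600)
client = OrquestaClient(options)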
Step 2 - Enable Cohere models in Model Garden
Head over to Orquesta's Model Garden and enable the Cohere models you want to use.
Step 3 - Set up a completion prompt and variants
The next step is to set up your completion prompt; make sure it is a completion prompt and not a chat prompt, in order to use Cohere.
To create a prompt, click on Add Prompt, provide a prompt key, a Domain (optional) and select Completion.
Once that is set up, create your first completion prompt, give it a name, add all the necessary information, and click Save.
Step 4 - Request a variant from Orquesta using the SDK
Our flexible configuration matrix allows you to define multiple prompt variants based on custom context, so you can work with different prompts and hyperparameters per environment, country, locale, or user segment, for example. The Code Snippet Generator makes it easy to request a prompt variant.
Once you open the Code Snippet Generator, copy the code snippet and paste it into your editor.
# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="data_completion",
    context={
        "environments": ["test"]
    },
    variables={},
)
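If your configuration matrix branches on more than the environment, you pass the extra keys in the same context object. A sketch, assuming variants were also defined for country and locale (these keys and values are illustrative, not output of the snippet generator):
# Hypothetical context: keys must match the fields defined in your matrix
prompt = client.prompts.query(
    key="data_completion",
    context={
        "environments": ["production"],
        "country": ["NLD"],
        "locale": ["en"],
    },
    variables={},
)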
Step 5 - Map the Orquesta response to Cohere using a Helper
We have already established at the beginning of this tutorial that to integrate these two technologies, we make use of a helper provided by Orquesta: orquesta_cohere_parameters_mapper.
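Conceptually, the helper translates the prompt configuration stored in Orquesta into keyword arguments that Cohere's generate() endpoint accepts. A rough illustration of the idea, not the actual helper implementation (the parameter list is an assumption):
# Illustration only - NOT the real orquesta_cohere_parameters_mapper
def cohere_parameters_mapper_sketch(prompt_value: dict) -> dict:
    # Assumed subset of generate() parameters a prompt variant may carry
    allowed = ("max_tokens", "temperature", "k", "p",
               "frequency_penalty", "presence_penalty", "stop_sequences")
    return {name: prompt_value[name] for name in allowed if name in prompt_value}
With that picture in mind, here is the actual call: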
# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}')

co = cohere.Client('COHERE-API-KEY')  # Insert your Cohere API key

completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
)

# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency} ms')
Explanation:
We record the start time using the time module.
An instance of the Cohere client is created.
Using the generate() endpoint, we can generate realistic text conditioned on a given input.
The generate() endpoint also accepts other body parameters, such as the prompt (a required string), the model, num_generations, max_tokens, temperature, etc. For simplicity, we only work with model and prompt here.
We record the end time and calculate the latency; a more robust way to time the request is sketched below.
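As an aside, time.time() follows the system clock, which can jump if the clock is adjusted; Python's time.perf_counter() is monotonic and better suited to measuring elapsed time. A minimal sketch of the same measurement:
# time.perf_counter() is monotonic, so it cannot go backwards mid-measurement
start = time.perf_counter()
completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get("prompt"),
)
latency = (time.perf_counter() - start) * 1000  # milliseconds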
Step 6 - Report analytics back to Orquesta
After each query, Orquesta generates a log with a Trace ID. Using the add_metrics() method, you can add additional information, such as the llm_response, metadata, latency, and economics.
# Tokenize responses to count tokens
prompt_tokenization = co.tokenize(prompt.value.get('prompt'))
completion_tokenization = co.tokenize(completion.generations[0].text)

prompt_tokens = len(prompt_tokenization.tokens)
completion_tokens = len(completion_tokenization.tokens)
total_tokens = prompt_tokens + completion_tokens

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    },
    llm_response=completion.generations[0].text,
    latency=latency,
    metadata={
        "finish_reason": completion.generations[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)
Conclusion
With these easy steps, you have successfully integrated Orquesta with Cohere. And this is just the tip of the iceberg: as of the time of writing, Orquesta only supports the generate() endpoint, but in the future you will be able to use the other endpoints, such as embed, classify, summarize, detect-language, etc.
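For a taste of what that could look like, here is a minimal sketch of a direct call to Cohere's embed endpoint; the model name is illustrative, and this call is not routed through Orquesta:
# Direct Cohere call - not part of the Orquesta integration above
response = co.embed(
    texts=["Orquesta and Cohere integration"],
    model="embed-english-v2.0",  # assumed model name; check Cohere's docs
)
print(len(response.embeddings[0]))  # dimensionality of the first embedding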
Orquesta supports other SDKs such as Angular, Node.js, React, and TypeScript. Refer to our documentation for more information.
Full Code Example
import time

import cohere
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_cohere_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

# Initialize Orquesta client
api_key = "ORQUESTA-API-KEY"
options = OrquestaClientOptions(
    api_key=api_key,
    ttl=3600
)
client = OrquestaClient(options)
co = cohere.Client('COHERE-API-KEY')  # Insert your Cohere API key
# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="data_completion",
    context={
        "environments": ["test"]
    },
    variables={},
    metadata={"user_id": 45515}
)
# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}')
completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
)
# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')
# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency} ms')
# Tokenize responses
prompt_tokenization = co.tokenize(prompt.value.get('prompt'))
completion_tokenization = co.tokenize(completion.generations[0].text)
prompt_tokens = len(prompt_tokenization.tokens)
completion_tokens = len(completion_tokenization.tokens)
total_tokens = prompt_tokens + completion_tokens
# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    },
    llm_response=completion.generations[0].text,
    latency=latency,
    metadata={
        "finish_reason": completion.generations[0].finish_reason,
    },
)
prompt.add_metrics(metrics=metrics)