Andrew R. Freed

How to Improve your Assistant

Take 40% off Conversational AI by entering fccfreed into the discount code box at checkout at manning.com.

Imagine that you have been contracted to diagnose problems and offer solutions for a virtual assistant belonging to a company called FICTITIOUS INC.

FICTITIOUS INC has deployed their virtual assistant to production but isn’t achieving the success metrics they outlined for the solution. The virtual assistant was supposed to reduce the burden on other customer service channels, but those channels haven’t seen a significant reduction in user activity. FICTITIOUS INC knows how to troubleshoot their traditional applications but doesn’t know where to start troubleshooting their virtual assistant.

FICTITIOUS INC needs to quickly drill down into WHY their assistant isn’t performing well. They need to find out if their conversational flow doesn’t work for users, or if the intent mapping they have done doesn’t work, or if there’s some other core problem with their assistant.

FICTITIOUS INC is in good company. Deploying a virtual assistant to production isn’t the end – it’s only the beginning! Figure 1 illustrates the continuous improvement cycle in a virtual assistant’s lifecycle. Continuous improvement is broadly applicable in software projects, and virtual assistants are no exception.

Continuous Improvement cycle of Gather Data, Train, Test, Improve
Figure 1 Improvement is part of a continuous cycle in the life of a virtual assistant. This cycle continues even after an assistant is deployed to production!

This cycle doesn’t stop for FICTITIOUS INC when they deploy their assistant. The first improvement cycle after deploying to production is the most informative. This is where FICTITIOUS INC learns which of their assumptions were correct and which ones need to be revisited.

Deploying a virtual assistant to production isn’t the end – it’s only the beginning!

In this article, we learn how FICTITIOUS INC can identify where their assistant needs the most improvement. FICTITIOUS INC has chosen “successful containment” as their key success metric, and we use that to drive our investigation. Containment for virtual assistants is the percentage of conversations handled entirely by the virtual assistant. (A conversation that isn’t escalated to a human is “contained”.) FICTITIOUS INC’s “successful containment” tightens this definition: only contained conversations that complete the full process flow are “successfully contained”.

With successful containment in mind, we use a data-driven approach to evaluate their virtual assistant, including the dialog flows and intent identification. We conduct a single evaluation here, but FICTITIOUS INC will need to evaluate their virtual assistant many times over its lifetime. Let’s start by looking for the first improvement FICTITIOUS INC needs to make.

"Change is the only constant in life." Heraclitus, Greek philosopher

Using a success metric to determine where to start improvements

Analyzing a virtual assistant can feel like a daunting process. Many different types of analyses exist, so where should FICTITIOUS INC begin? Analysis should center on a success metric. This metric forms a guiding principle for all analysis and improvement: any potential analysis or improvement work should be prioritized based on how it impacts the success metric.

FICTITIOUS INC’s chosen success metric is “successful containment”, which aligns better with their users’ needs than plain “containment”. If a user quits a conversation before getting an answer, that conversation is contained, but FICTITIOUS INC doesn’t consider it a success. Table 1 contrasts containment and successful containment.

Table 1 Sample scenarios and how they are measured. FICTITIOUS INC uses "successful containment".
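
The distinction in Table 1 can be captured in a few lines of Python. This is a minimal sketch; the `Conversation` fields are hypothetical names, not part of any particular platform’s log schema:

```python
from dataclasses import dataclass

@dataclass
class Conversation:
    escalated_to_human: bool  # conversation was handed off to a live agent
    completed_flow: bool      # conversation reached the final node of its process flow

def is_contained(conv: Conversation) -> bool:
    # "Contained": the assistant handled the conversation without escalation.
    return not conv.escalated_to_human

def is_successfully_contained(conv: Conversation) -> bool:
    # "Successfully contained": contained AND the user completed the process flow.
    return is_contained(conv) and conv.completed_flow
```

Note that a user who simply abandons the conversation counts as contained but not successfully contained, which is exactly the gap FICTITIOUS INC cares about.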

FICTITIOUS INC uses three data points to start the analysis of their assistant: overall successful containment, volume by intent, and successful containment by intent. These data points enable analysis of each intent, and from them we can find which intents have the largest impact on overall successful containment.

To simplify the analysis, we only consider five of FICTITIOUS INC’s intents. These intents and their associated metrics are shown in Table 2. Based on this table, which intent would you explore first?

Table 2 FICTITIOUS INC's metrics for conversation volume and successful containment, broken down per intent.

#appointments has the lowest successful containment at thirty percent, but it’s a low-volume intent. And #reset_password is the largest source of uncontained conversations, comprising two-thirds of the total. If FICTITIOUS INC can fix what’s wrong in those two intents, their virtual assistant will have much higher containment and be more successful. Because #reset_password has the biggest problem, FICTITIOUS INC should start there.
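
This prioritization is easy to automate. The sketch below uses illustrative volumes and rates (not FICTITIOUS INC’s actual figures) to rank intents by how many unsuccessful conversations each one contributes:

```python
# Illustrative numbers only -- not FICTITIOUS INC's actual metrics.
intents = {
    # intent: (conversation volume, successful containment rate)
    "#reset_password": (600, 0.55),
    "#appointments":   (50,  0.30),
    "#store_hours":    (200, 0.90),
    "#order_status":   (150, 0.85),
    "#speak_to_agent": (100, 0.80),
}

# Rank intents by how many unsuccessful conversations they contribute overall.
impact = {
    intent: round(volume * (1 - rate))
    for intent, (volume, rate) in intents.items()
}
for intent, failures in sorted(impact.items(), key=lambda kv: -kv[1]):
    print(f"{intent}: {failures} unsuccessful conversations")
```

With these illustrative numbers, #reset_password dominates the failure count even though #appointments has the worse rate, mirroring the reasoning above.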

Improving the first flow to fix containment problems

Solving problems is easier when you know what the specific problems are. FICTITIOUS INC has identified the #reset_password flow as the biggest source of non-contained conversations. This is the most complex of FICTITIOUS INC’s process flows, and this is probably not a coincidence. Let’s reacquaint ourselves with FICTITIOUS INC’s #reset_password flow in Figure 2.

Password reset dialog flow
Figure 2 FICTITIOUS INC’s reset password conversational flow. Any conversation that visits P00 is counted as a password reset conversation. Only conversations that include P08 are successfully contained.

A password reset conversation always starts with dialog nodes P00 and P01. After that, the password reset flow has only one path to success: the path through dialog nodes P00, P01, P03, P05, and P08. These nodes form a conversion funnel, shown in Figure 3. Every conversation that includes P03 must necessarily include P01, but some conversations that include P01 don’t include P03. A conversation that includes P01 but not P03 has “drop-off” at P01. By measuring the drop-off between P01, P03, P05, and P08, FICTITIOUS INC can home in on why password reset conversations fail to complete.

Conversion funnel for password reset process
Figure 3 A successful password reset flow, visualized as a funnel. A conversation that completes each step in this funnel is successfully contained.

FICTITIOUS INC can build this conversion funnel from their virtual assistant logs by counting how many times each dialog node is invoked, then computing the drop-off at each step of the funnel. This high-level analysis illuminates which parts of the process flow require further analysis. The parts causing the most drop-off should be improved first.
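
A minimal sketch of this counting, assuming each logged conversation can be reduced to the list of dialog nodes it visited (a hypothetical format; real platforms differ):

```python
from collections import Counter

# Hypothetical log format: each conversation is the list of visited dialog node IDs.
conversations = [
    ["P00", "P01", "P03", "P05", "P08"],  # completed the full reset flow
    ["P00", "P01"],                       # dropped off after the user-ID question
    ["P00", "P01", "P03", "P05"],         # dropped off at the security question
]

funnel = ["P00", "P01", "P03", "P05", "P08"]

# Count how many conversations reached each funnel step.
counts = Counter()
for conv in conversations:
    for node in funnel:
        if node in conv:
            counts[node] += 1

# Compute the drop-off between consecutive funnel steps.
for prev, nxt in zip(funnel, funnel[1:]):
    reached, continued = counts[prev], counts[nxt]
    drop = (reached - continued) / reached if reached else 0.0
    print(f"{prev} -> {nxt}: {continued}/{reached} continued ({drop:.0%} drop-off)")
```

The same loop works unchanged as the funnel or the log grows; only the `funnel` list and the log source need to match your assistant.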

How can you run log analysis in your specific virtual assistant platform?
The analyses in this section can be performed in multiple ways, and the specific steps vary by virtual assistant platform. For instance, your platform may make it easy to find conversations that include one dialog node but not another.
The techniques in this article are purposely generic, even if somewhat inefficient. If your virtual assistant provider doesn’t include analytic capabilities, you can build the analyses described in this section yourself.

FICTITIOUS INC’s password reset conversion funnel metrics can be found in Table 3.

Table 3 Conversion funnel for FICTITIOUS INC's password reset dialog flow. This analysis shows a steep drop-off after asking for the user ID and the security question.

The conversion funnel tells FICTITIOUS INC that one-third of password reset conversations include the question “What is your user ID?” but not the question “What is your date of birth?”. The “What is your user ID?” question has a thirty-three percent drop-off rate. It’s also the largest source of drop-offs, causing containment failure in sixty-five conversations. The entire conversion funnel can be visualized as in Figure 4.

Conversion funnel with number of people in each part of reset password process
Figure 4 FICTITIOUS INC's password reset flow conversion funnel annotated with the number of conversations containing each step of the dialog. P01 and P05 cause most of the total drop-off.

The severe drop-offs between P01 and P03, as well as P05 and P08, are both detrimental to FICTITIOUS INC’s successful containment metric. The P05 to P08 drop-off is more severe in relative terms (38% vs. 33%), but the P01 to P03 drop-off affects more conversations in total. FICTITIOUS INC should focus on the P01 to P03 drop-off first.

Analyzing the first source of drop-off in the first intent

The first detailed analysis for the P01 to P03 drop-off is to find out what users are saying to the assistant between P01 and P03. Depending on their virtual assistant platform, FICTITIOUS INC can query for:

  • What users say immediately after P01
  • What users say immediately before P03
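
Assuming a hypothetical log format in which a conversation is a list of turns pairing the user’s utterance with the dialog nodes the assistant visited, the first of these queries might be sketched as:

```python
# Hypothetical log format: a conversation is a list of turns, where each turn
# records the user's utterance and the dialog nodes the assistant visited.
conversation = [
    {"user": "I need to reset my password", "nodes": ["P00", "P01"]},
    {"user": "don't know, that's why I called", "nodes": ["P02"]},
]

def responses_after(node_id, conversations):
    """Collect what users said immediately after a given dialog node."""
    responses = []
    for conv in conversations:
        for turn, next_turn in zip(conv, conv[1:]):
            if node_id in turn["nodes"]:
                responses.append(next_turn["user"])
    return responses

print(responses_after("P01", [conversation]))
```

The symmetric query ("what users say immediately before P03") is the same loop with the condition moved to `next_turn`.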

This query tells FICTITIOUS INC what users are saying in response to the “What is your User ID?” question. FICTITIOUS INC can inspect a small sample of these responses, perhaps ten or twenty, in their initial investigation. The query results are shown in Table 4. All valid FICTITIOUS INC user IDs follow the same format: four to twelve alphabetic characters followed by one to three numeric characters. Any other user ID string is invalid. Before reading ahead, see if you can classify the response patterns.

  • afreed1
  • don't know, that’s why I called
  • ebrown5
  • fjones8
  • hgarcia3
  • I don't know it
  • I'm not sure
  • jdoe3
  • mhill14
  • nmiller
  • no idea
  • pjohnson4
  • pdavis18
  • tsmith
  • vwilliams4

The analysis of these responses is shown in Figure 5. The analysis surfaces several patterns in the response utterances. The expected response to P01 is a valid user ID consisting of four to twelve letters followed by one to three numbers, but users don’t always provide one!
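
The user ID format rule translates directly into a regular expression, which can drive a rough classification of the responses above. The category labels here are my own, not FICTITIOUS INC’s:

```python
import re

# Valid FICTITIOUS INC user ID: 4-12 letters followed by 1-3 digits.
USER_ID = re.compile(r"[A-Za-z]{4,12}[0-9]{1,3}")

# Phrases signaling the user doesn't know their ID (a crude stand-in
# for a trained intent classifier).
DONT_KNOW_PHRASES = ("don't know", "not sure", "no idea")

def classify(response: str) -> str:
    if USER_ID.fullmatch(response):
        return "valid user ID"
    if any(phrase in response.lower() for phrase in DONT_KNOW_PHRASES):
        return "does not know user ID"
    return "invalid user ID format"

for r in ["afreed1", "I'm not sure", "tsmith", "nmiller"]:
    print(r, "->", classify(r))
```

Running this over the sample separates the valid IDs from the "don’t know" responses and the malformed IDs like `tsmith` (no trailing digits).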

Analysis of P01 responses
Figure 5 Patterns in user utterances given in response to FICTITIOUS INC's question P01: "What is your User ID?"

FICTITIOUS INC can transform these patterns into actionable insights that improve successful containment.

  • Insight #1: Many users don’t know their user ID. FICTITIOUS INC could build an intent for #i_dont_know. When a user is asked for their ID and responds with #i_dont_know, the assistant could provide the user with instructions on how to find their user ID. Or the assistant could be programmed to validate the user another way.
  • Insight #2: Many users provide their user ID incorrectly. This may be because they don’t know their user ID, or they may have entered it incorrectly. These users could be given another chance to enter their ID or guidance on what a valid user ID looks like.
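
Both insights could be combined into a single response handler for the user-ID question. The sketch below is hypothetical: the phrase list stands in for a trained #i_dont_know intent, and the prompt wording is invented:

```python
import re

# Valid FICTITIOUS INC user ID: 4-12 letters followed by 1-3 digits.
USER_ID = re.compile(r"[A-Za-z]{4,12}[0-9]{1,3}")

# Stand-in for an #i_dont_know intent (a real assistant would use a classifier).
I_DONT_KNOW = ("don't know", "not sure", "no idea")

def next_prompt(response: str) -> str:
    # Insight #1: the user doesn't know their ID -> help them find it.
    if any(phrase in response.lower() for phrase in I_DONT_KNOW):
        return "You can find your user ID in your welcome email. What is it?"
    # Expected path: a valid user ID -> continue the reset flow.
    if USER_ID.fullmatch(response):
        return "Thanks! What is your date of birth?"
    # Insight #2: malformed ID -> describe the format and re-prompt.
    return "User IDs are 4-12 letters followed by 1-3 numbers. Please try again."
```

The key design point is that the assistant now has a useful reply for every response pattern observed in the logs, instead of only the happy path.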

That’s all for this article. If you want to learn more about the book, check it out on Manning’s liveBook platform here.
