
Unity MR Part 10: Voice SDK

👀 Stumbled here by accident? Start with the introduction!

📚 The aim of this article is to incorporate the Meta - Voice SDK into our application, enabling it to respond to specific word sequences. This functionality will allow us to interact with the door we rendered in the previous article, providing a more immersive and interactive experience in our application.


ℹ️ If you find yourself facing any difficulties, remember that you can always refer to or download the code from our accompanying GitHub repository.


Log in to the Unity Asset Store and add Meta - Voice SDK - Immersive Voice Commands to your library.

Return to the Unity Editor and install the Meta - Voice SDK through the Package Manager. You can directly access My Assets via Window -> My Assets.

Adding the Meta - Voice SDK with the Package Manager

Before we begin utilizing the Meta - Voice SDK, it's necessary to create an account on Wit.ai. You can conveniently use your existing Meta developer account for this purpose by clicking on Continue with Meta on the Wit.ai landing page.

ℹ️ Wit.ai is a natural language processing (NLP) service owned by Meta (formerly Facebook). It enables developers to build applications that can understand human language by providing a powerful and easy-to-use API.

Once you've set up your Wit.ai account, you can create a new application at Wit.ai. If you're unsure about what to name the application, simply go with unity_example.

After your app is created, click on unity_example. Then, as illustrated in the upcoming screenshot, add an Utterance for the Intent open_door. This step is crucial for training your application to recognize and respond to specific user inputs related to the action of opening a door.

ℹ️ An Utterance, in the context of natural language processing (NLP), linguistics, and conversational AI, refers to a sequence of words or sounds produced by a speaker. It's essentially a unit of speech. In practical terms, an utterance can be as short as a single word (like a command or an exclamation) or as long as a complete sentence or several sentences.

ℹ️ In Wit.ai, an Intent represents the purpose or goal behind a user's input, typically a spoken or written phrase. It's a fundamental concept in natural language understanding (NLU) and is used to categorize user utterances into specific actions that the application should perform.

1. Fill in the phrase “open the door” as the Utterance and select the open_door intent in the Intent dropdown.

Adding the Utterance for the Intent open_door

2. In the Utterance input field select the word “open”. This will open the entity form. Fill in action as the entity and click on + Create Entity.

Creating the action entity

3. In the Utterance input field select the word “door”. This will open the entity form. Fill in entity as the entity and click on + Create Entity.

Creating the entity entity

4. After creating the entities, they will now be highlighted as seen in the next screenshot.

The Utterance view after creating our entities

Now, click on Train and Validate. Once the training process is complete (watch the indicator at the top left, next to the app name), return to the Unity Editor and navigate to Oculus → Voice SDK → Get Started. In the first dialog, enter the Wit Server Access Token. The access token can be found on the Wit.ai website under Management → Settings within your unity_example app.

Enter the Server Access Token

You will be prompted to choose a location for saving your Wit asset. Save it in your Assets/Settings folder and name it wit. Once you have saved the asset, the following screen will appear.

The Wit Configurations after entering the server token

Now, click on Specify Assemblies and uncheck everything except the first entry:

De-selecting unnecessary assemblies

Click on Generate Manifest, then close the Voice Hub for now.
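
Before moving on, you can optionally double-check what the trained Wit.ai app returns for our utterance by calling the Wit.ai HTTP API directly with the server access token from the previous step. The snippet below is a minimal sketch (WitMessageTester and its serverAccessToken field are illustrative and not part of the article's project; at runtime, the Voice SDK performs these requests for you):

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

namespace Taikonauten.Unity.ArticleSeries
{
    // Illustrative helper: sends a test utterance to the Wit.ai HTTP API
    // so you can inspect the raw response in the console.
    public class WitMessageTester : MonoBehaviour
    {
        // Paste your Server Access Token from Management → Settings here.
        [SerializeField] private string serverAccessToken;

        private IEnumerator Start()
        {
            // v is the API version date; any recent date works.
            string url = "https://api.wit.ai/message?v=20240101&q=" +
                         UnityWebRequest.EscapeURL("open the door");

            using UnityWebRequest request = UnityWebRequest.Get(url);
            request.SetRequestHeader("Authorization", "Bearer " + serverAccessToken);

            yield return request.SendWebRequest();

            if (request.result == UnityWebRequest.Result.Success)
            {
                // Expected (simplified) shape of the response:
                // { "intents": [{ "name": "open_door", ... }],
                //   "entities": { "action:action": [...], "entity:entity": [...] } }
                Debug.Log(request.downloadHandler.text);
            }
            else
            {
                Debug.LogError(request.error);
            }
        }
    }
}

Attach the script to any GameObject, enter your token, run the scene, and the raw JSON response, including the detected Intent and entities, is printed to the console. You can remove the script again afterwards.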

The next step is to respond to the Utterance “open the door”. Create a new Script under Assets/Scripts/VoiceSDK and name it OpenDoorConduit. Add the Script to the XR Origin (XR Rig) via Add Component.

Creating the OpenDoorConduit Script under Scripts/VoiceSDK

Adding the OpenDoorConduit Script to the XR Origin (XR Rig) GameObject

The Script looks as follows:

using System.Collections;
using System.Collections.Generic;
using Meta.WitAi;
using UnityEngine;

namespace Taikonauten.Unity.ArticleSeries
{
    public class OpenDoorConduit : MonoBehaviour
    {
        // Name of the Intent we trained on Wit.ai.
        private const string OPEN_DOOR_INTENT = "open_door";

        // Conduit calls this method whenever Wit resolves the open_door Intent.
        // The resolved entity values are passed in as strings.
        [MatchIntent(OPEN_DOOR_INTENT)]
        public void OpenDoor(string[] values)
        {
            Debug.Log("OpenDoorConduit -> OpenDoor()");

            // values[0] holds the action entity (e.g. "open"),
            // values[1] the entity entity (e.g. "door").
            string action = values[0];
            string entity = values[1];

            if (!string.IsNullOrEmpty(action) && !string.IsNullOrEmpty(entity))
            {
                Debug.Log("OpenDoorConduit -> OpenDoor(): match");
            }
        }
    }
}

As you can see, we are using the MatchIntent attribute with the open_door Intent we created in a previous step. The OpenDoor method is called automatically by the ConduitDispatcher.

ℹ️ The ConduitDispatcher routes successful Wit responses to the methods registered for the matching Intent (via the MatchIntent attribute) and passes along the resolved entity values as parameters.

Now, proceed by adding the App Voice Experience component to the XR Origin (XR Rig) GameObject. Once you have added this component, select the Wit configuration that you created during the Get Started process.

Adding App Voice Experience to the XR Origin (XR Rig) GameObject and selecting the Wit configuration

For the final step, we must add a Response Matcher. This component matches the open_door Intent and its entity values in the Wit response and, upon a successful match, triggers our OpenDoor method, which we defined earlier.

To do this, open the Understanding Viewer via Oculus → Understanding Viewer. Enter “open the door” in the Utterance field and click Send.

Now right-click value and select Add response matcher to XR Origin (XR Rig).

Adding the response matcher via the Understanding Viewer

Adding the value matcher via the Understanding Viewer

The result is shown in the following screenshot:

The XR Origin (XR Rig) after adding the response matcher and value matchers

Next, create an entry under On Multi Value Event and select the values as follows:

Selecting the OpenDoor method for the new On Multi Value Event

Android Setup

To ensure compatibility with the Voice SDK, some adjustments are needed for the Android build. Access the Project Settings by going to Edit → Project Settings. This step is crucial for configuring your project to work seamlessly with the Voice SDK on Android.

1. In the Project Settings, navigate to the Player section. There, change the Minimum API Level to Android 10.0 (API Level 29) and the Target API Level to Android 12L (API Level 32). These options are located under Other Settings.

Setting the Minimum and Target API level for Android

2. In the Project Settings, go to the Player section, find Application Entry Point under Other Settings and change the value from GameActivity to Activity.

Setting Activity instead of GameActivity as the Application Entry Point

ℹ️ You can find more information about application entry points in the Unity Manual under Android application entry points. As of the time of writing, it's important to note that GameActivity is not compatible with the Meta Voice SDK.

3. In the Project Settings, go to the Player section, open the Publishing Settings tab and enable Custom Main Manifest. After activating this option, you will find the manifest file located at Assets/Plugins/Android/AndroidManifest.xml. This step is essential for gaining direct control over the Android manifest file, allowing you to make specific customizations needed for your project.

Enable the Custom Main Manifest option for Android

Open the AndroidManifest file in your code editor and modify its content as follows.

<?xml version="1.0" encoding="utf-8"?>
<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools">
    <application>
        <activity android:name="com.unity3d.player.UnityPlayerActivity"
                  android:theme="@style/UnityThemeSelector">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
            <meta-data android:name="unityplayer.UnityActivity" android:value="true" />
            <meta-data android:name="unityplayer.SkipPermissionsDialog" android:value="false" />
        </activity>
    </application>
</manifest>

Enter fullscreen mode Exit fullscreen mode

Let's review the changes we've made. This will help us understand the adjustments and their impact on the application's functionality and compatibility.

  1. We have removed the GameActivity block, as only one Activity is permitted and we previously opted for Activity (UnityPlayerActivity) instead of GameActivity in the Project Settings.

  2. We included the unityplayer.SkipPermissionsDialog setting with a value of false to ensure that required permission dialogs, such as the microphone prompt, are not bypassed. This guarantees that the application appropriately asks users for the permissions it needs.
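
Because SkipPermissionsDialog stays at false, Unity shows the required permission dialogs, including the microphone prompt the Voice SDK relies on, automatically at startup. If you ever want to check or request the microphone permission from your own code instead, a minimal sketch using Unity's Android permission API could look like this (MicrophonePermissionCheck is a hypothetical helper, not part of the article's scripts):

using UnityEngine;
using UnityEngine.Android;

namespace Taikonauten.Unity.ArticleSeries
{
    // Hypothetical helper: verifies the microphone permission before the
    // voice service is activated.
    public class MicrophonePermissionCheck : MonoBehaviour
    {
        private void Start()
        {
#if UNITY_ANDROID && !UNITY_EDITOR
            if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
            {
                // Shows the standard Android permission dialog.
                Permission.RequestUserPermission(Permission.Microphone);
            }
#endif
        }
    }
}

This is purely optional here; with the manifest setting above, Unity already prompts the user for the microphone permission on startup.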


Adding some UI

Before we can test our implementation, it is essential to integrate some user interface elements into the scene. We will add a UI Label that becomes visible when the application starts listening to the user's voice. This label will then disappear once the recording ceases, triggered by a successful response from Wit. This UI component plays a crucial role in providing visual feedback to the user about the state of voice recognition within the application.

To set up the user interface for voice recognition feedback, follow these steps:

1. Create an empty GameObject in your scene and name it UI.
2. Within the UI GameObject, add another empty GameObject and name it VoiceSDK.
3. With the VoiceSDK GameObject selected in the hierarchy, attach the Lazy Follow script via Add Component. Configure the script as follows:

Configuring the Lazy Follow script on the VoiceSDK GameObject

4. Add a Canvas to the VoiceSDK GameObject by right-clicking it and navigating to UI -> Canvas.

5. Inside the Canvas, add a Text element by choosing UI → Text - TextMeshPro.

This setup creates a structured UI hierarchy in your scene, with the VoiceSDK GameObject serving as a container for the elements that will provide visual feedback for voice recognition. The Lazy Follow script will manage the positioning, and the TextMeshPro element will display the necessary information or status messages.
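
In this article the label text is set once in the Inspector, and our controller script will simply toggle the whole VoiceSDK GameObject. If you would rather drive the text from code as well, a small helper along these lines would work (VoiceStatusLabel is a hypothetical example, not part of the article's scripts):

using TMPro;
using UnityEngine;

namespace Taikonauten.Unity.ArticleSeries
{
    // Hypothetical helper: updates the status label text and toggles its visibility.
    public class VoiceStatusLabel : MonoBehaviour
    {
        [SerializeField] private TMP_Text label;

        public void Show(string message)
        {
            // e.g. Show("...Listening...") while the voice service is active.
            label.text = message;
            gameObject.SetActive(true);
        }

        public void Hide()
        {
            gameObject.SetActive(false);
        }
    }
}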

Your hierarchy should now look as follows:

Hierarchy after adding the UI

Select the EventSystem GameObject in your hierarchy and remove and add components as follows:

The UI EventSystem

Don’t forget to remove the Standalone Input Module if it is present.

Next, we need to configure our Canvas and Text (TMP) elements. Select the Canvas GameObject in your hierarchy and set it up as follows (add the components shown in the screenshot via Add Component).

ℹ️ We won't be delving into UI-related topics in this article series. For those who need assistance or guidance with Unity's UI system, I recommend checking out the Unity documentation. It provides comprehensive resources and tutorials that can help you understand and effectively use Unity's UI tools in your projects.

The complete Canvas configuration

The complete Text (TMP) configuration

Lastly, deactivate the VoiceSDK GameObject, as we won't be displaying it immediately. The visibility of the UI will be managed later through our script.

ℹ️ If you're not familiar with how to deactivate a GameObject in the Unity Inspector, I recommend consulting the Unity documentation: Unity - Manual: Deactivate GameObjects.


Updating our MRArticleSeriesController Script

For the final step in this article, we'll update our MRArticleSeriesController Script to enable and disable the Voice Service using the left Trigger. This modification will allow for straightforward control of the Voice Service directly through user input, enhancing the interactive capabilities of our application.

using System.Collections;
using System.Collections.Generic;
using Meta.WitAi;
using Meta.WitAi.Requests;
using UnityEngine;
using UnityEngine.InputSystem;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;
using UnityEngine.XR.Interaction.Toolkit;

namespace Taikonauten.Unity.ArticleSeries
{
    public class MRArticleSeriesController : MonoBehaviour
    {
        [SerializeField] private ARAnchorManager anchorManager;
        [SerializeField] private GameObject door;
        [SerializeField] private GameObject uI;
        [SerializeField] private InputActionReference buttonActionLeft;
        [SerializeField] private InputActionReference buttonActionRight;
        [SerializeField] private VoiceService voiceService;
        [SerializeField] private XRRayInteractor rayInteractor;
        private VoiceServiceRequest voiceServiceRequest;
        private VoiceServiceRequestEvents voiceServiceRequestEvents;

        void OnEnable()
        {
            Debug.Log("MRArticleSeriesController -> OnEnable()");

            buttonActionRight.action.performed += OnButtonPressedRightAsync;
            buttonActionLeft.action.performed += OnButtonPressedLeft;
        }

        void OnDisable()
        {
            Debug.Log("MRArticleSeriesController -> OnDisable()");
            buttonActionRight.action.performed -= OnButtonPressedRightAsync;
            buttonActionLeft.action.performed -= OnButtonPressedLeft;
        }

        private void ActivateVoiceService()
        {
            Debug.Log("MRArticleSeriesController -> ActivateVoiceService()");

            if (voiceServiceRequestEvents == null)
            {
                voiceServiceRequestEvents = new VoiceServiceRequestEvents();

                voiceServiceRequestEvents.OnInit.AddListener(OnInit);
                voiceServiceRequestEvents.OnComplete.AddListener(OnComplete);
            }

            voiceServiceRequest = voiceService.Activate(voiceServiceRequestEvents);
        }

        private void DeactivateVoiceService()
        {
            Debug.Log("MRArticleSeriesController -> DeactivateVoiceService()");
            voiceServiceRequest.DeactivateAudio();
        }

        private void OnInit(VoiceServiceRequest request)
        {
            uI.SetActive(true);
        }

        private void OnComplete(VoiceServiceRequest request)
        {
            uI.SetActive(false);
            DeactivateVoiceService();
        }

        private async void OnButtonPressedRightAsync(InputAction.CallbackContext context)
        {
            Debug.Log("MRArticleSeriesController -> OnButtonPressedRightAsync()");

            if (rayInteractor.TryGetCurrent3DRaycastHit(out RaycastHit hit))
            {
                Pose pose = new(hit.point, Quaternion.identity);
                Result<ARAnchor> result = await anchorManager.TryAddAnchorAsync(pose);

                result.TryGetResult(out ARAnchor anchor);

                if (anchor != null)
                {
                    // Instantiate the door Prefab
                    GameObject _door = Instantiate(door, hit.point, Quaternion.identity);

                    // Unity recommends parenting your content to the anchor.
                    _door.transform.parent = anchor.transform;
                }
            }
        }

        private void OnButtonPressedLeft(InputAction.CallbackContext context)
        {
            Debug.Log("MRArticleSeriesController -> OnButtonPressedLeft()");

            ActivateVoiceService();
        }
    }
}

Let's review the updates quickly:

  1. Added a field named uI to hold the GameObject that will be activated or deactivated based on the VoiceService state.
  2. Included a field called voiceService to reference the App Voice Experience component added to the XR Origin (XR Rig) GameObject.
  3. Introduced a field voiceServiceRequest to store the active request to the VoiceService.
  4. Added a voiceServiceRequestEvents field, which is passed to the VoiceService. This ensures that the OnInit and OnComplete methods are called by the VoiceService.
  5. In OnEnable and OnDisable we subscribe and unsubscribe the OnButtonPressedLeft handler, so we can respond to the left controller Trigger press.
  6. The ActivateVoiceService method activates the VoiceService, creates the VoiceServiceRequestEvents if they are not already initialized, and is called via OnButtonPressedLeft when the user presses the left Trigger.
  7. DeactivateVoiceService simply deactivates the recording of the VoiceService.
  8. OnInit is invoked when the VoiceService starts listening. It enables the UI GameObject to inform the user that the app is now listening.
  9. OnComplete is called when a voice request succeeds. It also invokes DeactivateVoiceService to stop the VoiceService from listening.
  10. OnButtonPressedLeft is triggered when the left Trigger is pressed and calls ActivateVoiceService.

Please be aware that we have also renamed some variables and methods. After saving these changes, remember to return to the Unity Editor and update the fields on the MRArticleSeriesController component accordingly.

Updating our values on the MRArticleSeriesController component

Testing the app

We are now prepared to test the app. Select Build and Run, and once the app is running, press the trigger on the left controller. You should see a label appear in front of you indicating "...Listening...". At this point, say the phrase "open the door". After a brief delay, the label should disappear. This process will allow you to verify the functionality of the voice recognition feature in your application.

View of the app after a left Trigger press

Additionally, you can verify in the console if the Intent was triggered, as shown in the following screenshot. This will provide a clear indication of whether the voice command was successfully recognized.

Console output after the voice command was successfully recognized

Next article

In our upcoming article, we will take an exciting step forward by leveraging the voice command functionality we've established to initiate an animation that opens the door. This integration represents a significant enhancement in our application, blending voice recognition with dynamic visual feedback.
