Dilek Karasoy for Picovoice

Speech Recognition with SwiftUI

On day 31, we'll work with SwiftUI to build a voice assistant that gets Siri to pour coffee, at least visually. Of course, you can take it one step further and connect it to an actual coffee maker.

Let's handle the speech recognition part.

The Speech framework offers cloud-dependent transcription, even for simple tasks like ordering a coffee. Rhino Speech-to-Intent, which runs on-device, is a better choice for context-aware spoken language understanding.

Let's start with a simple graphical user interface
Thankfully, SwiftUI has made creating visually appealing, stateful UIs really easy. In about half an hour you can mock up a GUI with a coffee maker image, some text prompts, and a collection of stateful buttons similar to this:
(Screenshot: the Siri Barista mock-up UI)
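
If you'd like a concrete starting point, here's a minimal sketch of such a screen. The asset name coffee_maker and the layout are placeholders, not the project's actual code:

import SwiftUI

struct BaristaMockView: View {
    var body: some View {
        VStack(spacing: 24) {
            // Bundled coffee maker image (hypothetical asset name)
            Image("coffee_maker")
                .resizable()
                .scaledToFit()

            Text("Say 'Hey Barista'!")
                .font(.headline)

            // Stateful capsule buttons for the drink sizes
            HStack {
                ForEach(["Small", "Medium", "Large"], id: \.self) { size in
                    Text(size)
                        .padding(.horizontal, 16)
                        .padding(.vertical, 8)
                        .overlay(Capsule().stroke())
                }
            }
        }
        .padding()
    }
}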

Add the Picovoice CocoaPod
CocoaPods is the standard dependency manager for iOS, making it easy to pull third-party libraries into your app.

To install the Picovoice pod, add the following to your Podfile:

source 'https://cdn.cocoapods.org/'
# ...
pod 'Picovoice-iOS'
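Then run pod install from the project directory and open the generated .xcworkspace file.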

Let's dive into Voice AI
The Picovoice Platform SDK combines two speech recognition engines: Porcupine Wake Word and Rhino Speech-to-Intent. Together, they enable voice interactions akin to Alexa and Siri, while keeping all voice processing on-device. For example, in a command like
Hey Siri, could I have a medium coffee?
"Hey Siri" is detected by Porcupine, and the rest is inferred by Rhino against a specialized context, without ever transcribing the audio to text.

When Rhino infers an utterance, it returns an instance of an Inference struct; for the sample phrase above, the struct will look like this:

isUnderstood: true,
intent: 'orderBeverage',
slots: {
  size: 'medium',
  beverage: 'coffee'
}

In order to initialize the voice AI, we'll need both a Porcupine keyword file (.ppn) and a Rhino context file (.rhn). Picovoice has made several pre-trained Porcupine and Rhino models available in the Picovoice GitHub repositories. For this Barista app, we're going to use the trigger phrase "Hey Barista" and the Coffee Maker context.

  • Download the hey barista_ios.ppn and coffee_maker_ios.rhn models.
  • Add them to the iOS project as a bundled resource.
  • Get your free Picovoice AccessKey from the Picovoice Console if you haven't already.

Now we can load the models at runtime. Let's initialize the Picovoice Platform:

import Picovoice

let accessKey = "..." // your Picovoice AccessKey
let contextPath = Bundle.main.path(forResource: "coffee_maker_ios", ofType: "rhn")
let keywordPath = Bundle.main.path(forResource: "hey barista_ios", ofType: "ppn")
var picovoiceManager: PicovoiceManager!

init() {
    do {
        picovoiceManager = PicovoiceManager(
            accessKey: accessKey,
            keywordPath: keywordPath!,
            onWakeWordDetection: {
              // wake word detected
            },
            contextPath: contextPath!,
            onInference: { inference in
              // inference result
            })

        try picovoiceManager.start()
    } catch {
        print("\(error)")
    }
}

The picovoiceManager.start() method starts audio capture and feeds the audio to the engines.
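
When you're done with voice commands (for example, when the view disappears), you can stop capture as well. A minimal sketch, assuming your SDK version's stop() throws like start() does (older versions may not):

do {
    try picovoiceManager.stop() // stops audio capture and releases the microphone
} catch {
    print("\(error)")
}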

To capture microphone audio, we must add the permission request to the Info.plist:

<key>NSMicrophoneUsageDescription</key> 
<string>To recognize voice commands</string>
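iOS shows this prompt automatically the first time the app starts recording. If you'd rather ask up front, you can request the permission explicitly with AVFoundation; a minimal sketch (note that Apple points to AVAudioApplication for this on iOS 17 and later):

import AVFoundation

// Optionally ask for microphone access before starting the voice AI;
// otherwise iOS prompts on first capture.
AVAudioSession.sharedInstance().requestRecordPermission { granted in
    if !granted {
        print("Microphone permission denied; voice commands won't work")
    }
}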

Integrate Voice Controls
To drive the SwiftUI views programmatically, we'll create a ViewModel and have the UI observe it. The required UI state is straightforward: 1. indicating that the wake word was detected, and 2. showing the drink order. We'll create a struct to represent the capsule buttons, along with state variables to show and hide text; since these properties are marked @Published, the UI stays bound to them. The ViewModel will look like this:

import SwiftUI
import Picovoice

struct CapsuleSelection: Codable, Identifiable {
    var title: String
    var id: String
    var isSelected: Bool

    init(title: String) {
        self.title = title
        self.id = title.lowercased()
        self.isSelected = false
    }
}

class ViewModel: ObservableObject {

    @Published var sizeSel = [CapsuleSelection(title: "Small"), 
                              CapsuleSelection(title: "Medium"), 
                              CapsuleSelection(title: "Large")]
    @Published var shotSel = [CapsuleSelection(title: "Single Shot"), 
                              CapsuleSelection(title: "Double Shot"), 
                              CapsuleSelection(title: "Triple Shot")]
    @Published var bevSel = [CapsuleSelection(title: "Americano"), 
                             CapsuleSelection(title: "Cappuccino"), 
                             CapsuleSelection(title: "Coffee"), 
                             CapsuleSelection(title: "Espresso"),
                             CapsuleSelection(title: "Latte"), 
                             CapsuleSelection(title: "Mocha")]    
    @Published var isListening = false
    @Published var missedCommand = false

    let accessKey = "..." // your Picovoice AccessKey
    let contextPath = Bundle.main.path(forResource: "coffee_maker_ios", ofType: "rhn")
    let keywordPath = Bundle.main.path(forResource: "hey barista_ios", ofType: "ppn")
    var picovoiceManager: PicovoiceManager!

    init() {
        do {
            picovoiceManager = PicovoiceManager(
                accessKey: accessKey,
                keywordPath: keywordPath!,                
                onWakeWordDetection: {
                    DispatchQueue.main.async {
                        self.isListening = true
                        self.missedCommand = false                        
                    }
                },
                contextPath: contextPath!,                
                onInference: { inference in
                    DispatchQueue.main.async {
                        if inference.isUnderstood {
                            if inference.intent == "orderBeverage" {       

                                // parse size
                                if let size = inference.slots["size"] {
                                    if let i = self.sizeSel.firstIndex(
                                      where: { $0.id == size }) {
                                        self.sizeSel[i].isSelected = true
                                    }
                                }

                                // repeat for 'numShots' and 'beverage'...
                            }
                        }
                        else {
                            self.missedCommand = true
                        }                        
                        self.isListening = false
                    }
                })

            try picovoiceManager.start()
        } catch {
            print("\(error)")
        }
    }
}
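
To close the loop, a SwiftUI view simply observes this model. Here's a minimal sketch of what that binding can look like, assuming iOS 14+ for @StateObject (use @ObservedObject on older targets); the styling is illustrative, not the actual Barista UI:

import SwiftUI

struct ContentView: View {
    @StateObject var viewModel = ViewModel()

    var body: some View {
        VStack(spacing: 16) {
            Text(viewModel.isListening ? "Listening..." : "Say 'Hey Barista'!")
                .font(.headline)

            if viewModel.missedCommand {
                Text("Sorry, I didn't understand that order.")
                    .foregroundColor(.red)
            }

            // Size capsules highlight automatically when the voice AI
            // flips isSelected on the matching element.
            HStack {
                ForEach(viewModel.sizeSel) { size in
                    Text(size.title)
                        .padding(.horizontal, 12)
                        .padding(.vertical, 6)
                        .background(Capsule().fill(size.isSelected ? Color.orange : Color.clear))
                        .overlay(Capsule().stroke())
                }
            }
        }
        .padding()
    }
}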

Finally, Siri understands how you want to get your coffee without connecting to the internet!

Below are some useful resources:
  • Open-source code
  • Picovoice Platform SDK
  • Picovoice website
