On day 31, we'll use SwiftUI to build a voice assistant that gets Siri to pour you a coffee, at least visually. Of course, you can take it one step further and connect it to a real coffee maker.
Let's handle the speech recognition part.
Apple's Speech framework relies on the cloud for transcription, even for simple requests like ordering a coffee. For context-aware spoken language understanding that stays on-device, Rhino Speech-to-Intent is a better fit.
Let's Start with a Simple Graphical User Interface
Thankfully, SwiftUI makes creating visually appealing, stateful UIs easy. In about half an hour, you can mock up a GUI with a coffee maker image, some text prompts, and a collection of stateful buttons similar to this:
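A minimal SwiftUI sketch of such a layout might look like the following; the view name, image asset name, and button titles are placeholders, not from the original project:

```swift
import SwiftUI

// Illustrative mock-up: an image, a prompt, and a row of stateful buttons.
struct MockupView: View {
    private let sizes = ["Small", "Medium", "Large"]
    @State private var selectedSize: String?

    var body: some View {
        VStack(spacing: 24) {
            Image("coffee_maker")   // placeholder bundled image asset
                .resizable()
                .scaledToFit()
            Text("Say \"Hey Barista!\"")
                .font(.headline)
            HStack {
                ForEach(sizes, id: \.self) { size in
                    Button(size) { selectedSize = size }
                        .padding(8)
                        .background(selectedSize == size
                                    ? Color.brown
                                    : Color.gray.opacity(0.3))
                        .clipShape(Capsule())
                }
            }
        }
        .padding()
    }
}
```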
Add the Picovoice Cocoapod
CocoaPods modernizes iOS package management, letting developers integrate libraries with minimal effort.
To install the Picovoice pod, add the following to your Podfile:
source 'https://cdn.cocoapods.org/'
# ...
pod 'Picovoice-iOS'
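Then install the dependency from the project directory (this assumes CocoaPods is already installed, and the workspace name below is a placeholder):

```shell
pod install
# From now on, open the generated workspace instead of the .xcodeproj
open MyBarista.xcworkspace
```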
Let's dive into Voice AI
The Picovoice Platform SDK combines two speech recognition engines: Porcupine Wake Word and Rhino Speech-to-Intent. Together, they enable voice interactions akin to Alexa and Siri while keeping all voice processing on-device. For example, in a command like
Hey Siri, could I have a medium coffee?
"Hey Siri" is detected by Porcupine, and the rest is inferred by Rhino through a specialized context without transcribing it to text.
When Rhino infers the utterance, it returns an instance of an Inference struct; for the sample phrase above, the struct will look like this:
isUnderstood: true,
intent: 'orderBeverage',
slots: {
    size: 'medium',
    beverage: 'coffee'
}
To initialize the voice AI, we'll need both a Porcupine (.ppn) and a Rhino (.rhn) model file. Picovoice has made several pre-trained Porcupine and Rhino models available on the Picovoice GitHub repositories. For this Barista app, we're going to use the trigger phrase Hey Barista and the Coffee Maker context.
- Download the hey barista_ios.ppn and coffee_maker_ios.rhn models.
- Add them to the iOS project as a bundled resource.
- Get your Picovoice AccessKey from the Picovoice Console for free if you haven't already.
Now, we can load models at runtime. Let's initialize the Picovoice Platform:
import Picovoice
let accessKey = "..." // your Picovoice AccessKey
let contextPath = Bundle.main.path(forResource: "coffee_maker_ios", ofType: "rhn")
let keywordPath = Bundle.main.path(forResource: "hey barista_ios", ofType: "ppn")
var picovoiceManager: PicovoiceManager!

init() {
    do {
        picovoiceManager = PicovoiceManager(
            accessKey: accessKey,
            keywordPath: keywordPath!,
            onWakeWordDetection: {
                // wake word detected
            },
            contextPath: contextPath!,
            onInference: { inference in
                // inference result
            })
        try picovoiceManager.start()
    } catch {
        print("\(error)")
    }
}
The picovoiceManager.start() method starts audio capture and passes audio frames to the engines.
To capture microphone audio, we must add a permission request to the Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>To recognize voice commands</string>
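The Info.plist entry only supplies the text iOS shows in the permission dialog; the system prompt itself appears when audio capture first starts. If you prefer to request (or check) the microphone permission explicitly before starting Picovoice, a sketch using AVFoundation could look like this (the function name is illustrative):

```swift
import AVFoundation

// Request microphone access up front; `completion` receives the user's choice.
func ensureMicPermission(completion: @escaping (Bool) -> Void) {
    switch AVAudioSession.sharedInstance().recordPermission {
    case .granted:
        completion(true)
    case .denied:
        completion(false)
    case .undetermined:
        AVAudioSession.sharedInstance().requestRecordPermission { granted in
            DispatchQueue.main.async { completion(granted) }
        }
    @unknown default:
        completion(false)
    }
}
```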
Integrate Voice Controls
To manage the SwiftUI views programmatically, we'll create a ViewModel and have the UI observe it. The required UI controls are straightforward: 1. indicate when the wake word has been detected, and 2. show the drink order. Create a struct to represent the buttons and state variables to show and hide text; the UI binds to these properties because they are marked with the @Published property wrapper. The ViewModel will look like this:
import SwiftUI
import Picovoice
struct CapsuleSelection: Codable, Identifiable {
    var title: String
    var id: String
    var isSelected: Bool

    init(title: String) {
        self.title = title
        self.id = title.lowercased()
        self.isSelected = false
    }
}

class ViewModel: ObservableObject {
    @Published var sizeSel = [CapsuleSelection(title: "Small"),
                              CapsuleSelection(title: "Medium"),
                              CapsuleSelection(title: "Large")]
    @Published var shotSel = [CapsuleSelection(title: "Single Shot"),
                              CapsuleSelection(title: "Double Shot"),
                              CapsuleSelection(title: "Triple Shot")]
    @Published var bevSel = [CapsuleSelection(title: "Americano"),
                             CapsuleSelection(title: "Cappuccino"),
                             CapsuleSelection(title: "Coffee"),
                             CapsuleSelection(title: "Espresso"),
                             CapsuleSelection(title: "Latte"),
                             CapsuleSelection(title: "Mocha")]
    @Published var isListening = false
    @Published var missedCommand = false

    let accessKey = "..." // your Picovoice AccessKey
    let contextPath = Bundle.main.path(forResource: "coffee_maker_ios", ofType: "rhn")
    let keywordPath = Bundle.main.path(forResource: "hey barista_ios", ofType: "ppn")

    var picovoiceManager: PicovoiceManager!

    init() {
        do {
            picovoiceManager = PicovoiceManager(
                accessKey: accessKey,
                keywordPath: keywordPath!,
                onWakeWordDetection: {
                    DispatchQueue.main.async {
                        self.isListening = true
                        self.missedCommand = false
                    }
                },
                contextPath: contextPath!,
                onInference: { inference in
                    DispatchQueue.main.async {
                        if inference.isUnderstood {
                            if inference.intent == "orderBeverage" {
                                // parse size
                                if let size = inference.slots["size"] {
                                    if let i = self.sizeSel.firstIndex(
                                        where: { $0.id == size }) {
                                        self.sizeSel[i].isSelected = true
                                    }
                                }
                                // repeat for 'numShots' and 'beverage'...
                            }
                        } else {
                            self.missedCommand = true
                        }
                        self.isListening = false
                    }
                })
            try picovoiceManager.start()
        } catch {
            print("\(error)")
        }
    }
}
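To close the loop, a view simply observes the ViewModel. The sketch below is illustrative (the view name and capsule styling are assumptions, not from the original project), but it shows how the @Published properties drive the UI:

```swift
import SwiftUI

// Illustrative view that observes the ViewModel above.
struct BaristaView: View {
    @StateObject private var viewModel = ViewModel()

    var body: some View {
        VStack(spacing: 16) {
            Text(viewModel.isListening ? "Listening..." : "Say \"Hey Barista!\"")
                .font(.headline)
            if viewModel.missedCommand {
                Text("Sorry, I didn't catch that.")
                    .foregroundColor(.red)
            }
            HStack {
                ForEach(viewModel.sizeSel) { size in
                    Text(size.title)
                        .padding(8)
                        .background(size.isSelected
                                    ? Color.brown
                                    : Color.gray.opacity(0.3))
                        .clipShape(Capsule())
                }
            }
            // bevSel and shotSel rows follow the same pattern...
        }
        .padding()
    }
}
```

Because the selection arrays are @Published, the capsules re-render automatically whenever an inference updates them.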
Finally, our barista understands how you want your coffee without ever connecting to the internet!
Below are some useful resources:
- Open-source code
- Picovoice Platform SDK
- Picovoice website