With the impending demise of Snips, I’ve been looking for a suitable replacement offline speech recognition solution. After some research, Rhasspy seems like a real winner. Besides supporting a variety of toolkits, it has good documentation, and can be easy to get working.
A Raspberry Pi3B with stock Debian really struggled with this at times. It might be possible to alleviate this by picking different services or adjusting other configuration, but you might be better off just using a more powerful device (like a Pi4 or Jetson Nano) or running it remotely.
Installation
Normally, I like to go through manual installation. But installing Pocketsphinx and OpenFST for Jasper was enough of a headache that I decided to go the container route.
Follow the Rhasspy installation docs. I’m running both Hass and Rhasspy on the same Raspberry Pi. From my PC I connect to the pi as pi3.local; adjust this based on the name of your device, or use the IP address. If working directly on the device, everything is localhost.
If you haven’t already, install docker using the convenience script:
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
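After adding yourself to the docker group you'll need to log out and back in (or reboot) for it to take effect. Then a quick sanity check that docker works without sudo:
# Verify docker is installed and usable by your user
docker run --rm hello-world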
You can run Rhasspy with the recommended:
docker run -d -p 12101:12101 \
--restart unless-stopped \
-v "$HOME/.config/rhasspy/profiles:/profiles" \
--device /dev/snd:/dev/snd \
synesthesiam/rhasspy-server:latest \
--user-profiles /profiles \
--profile en
Or, use docker-compose:
- Install docker-compose via alternative install options:
sudo pip install docker-compose
- Use the recommended docker-compose.yml:
rhasspy:
  image: "synesthesiam/rhasspy-server:latest"
  restart: unless-stopped
  volumes:
    - "$HOME/.config/rhasspy/profiles:/profiles"
  ports:
    - "12101:12101"
  devices:
    - "/dev/snd:/dev/snd"
  command: --user-profiles /profiles --profile en
Run:
docker-compose up
If docker-compose up fails with ImportError: No module named ssl_match_hostname, see this issue:
# Remove problematic `ssl-match-hostname`
sudo pip uninstall backports.ssl-match-hostname docker-compose
# Install alternative `ssl-match-hostname`
sudo apt-get install -y python-backports.ssl-match-hostname \
python-backports.shutil-get-terminal-size
# Reinstall docker-compose
sudo pip install docker-compose
Docker Shell
When running things with docker, it takes an extra step to have a shell in the context of the container.
- Show running containers with docker ps or docker container ls:
CONTAINER ID   IMAGE                                COMMAND                  CREATED        STATUS         PORTS                      NAMES
4181a2880c84   synesthesiam/rhasspy-server:latest   "/run.sh --user-prof…"   26 hours ago   Up 4 minutes   0.0.0.0:12101->12101/tcp   pi_rhasspy_1
- Get a shell to the container:
pc> docker exec -it pi_rhasspy_1 /bin/bash
# Now you're in the container
root@4181a2880c84:/#
Replace pi_rhasspy_1 with the “container id” or “name” of the appropriate container.
Configuration
Once docker outputs rhasspy_1 | Running on https://0.0.0.0:12101 (CTRL + C to quit), Rhasspy should be up and running. Ignore what it says and use http instead of https: point your browser at http://pi3.local:12101.
At this point I was able to configure everything via the Settings tab. Should that not cooperate, everything can also be done via json.
Audio
The first things to get working are audio input and output. Refer back to an earlier post about working with ALSA.
- Settings > Microphone (“Audio Recording”)
  - Use arecord directly (ALSA) (default is PyAudio)
  - Select appropriate Input Device
- Settings > Sounds (“Audio Playing”)
  - Use aplay directly (ALSA)
  - Select appropriate Output Device
To verify audio recording/playback works, from a docker shell use arecord and aplay.
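For example, something like the following (device names vary, so check what arecord -l and aplay -l report on your system):
# List capture and playback devices
arecord -l
aplay -l
# Record ~5 seconds from the default device, then play it back
arecord -f cd -d 5 /tmp/test.wav
aplay /tmp/test.wav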
If, instead of ALSA, you want to use PyAudio for input, it’s handy to see what PyAudio sees:
# Install pyaudio
sudo apt-get install -y python-pyaudio
# Launch python REPL
python
Then, run the following (from SO#1, SO#2):
import pyaudio
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    print p.get_device_info_by_index(i)
## OR
import pyaudio
p = pyaudio.PyAudio()
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(0, numdevices):
    if (p.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels')) > 0:
        print "Input Device id ", i, " - ", p.get_device_info_by_host_api_device_index(0, i).get('name')
Rhasspy TTS
Testing text-to-speech also seems to be the easiest way to validate your audio output is working.
- Settings > Text to Speech
  - eSpeak didn’t work for me, but both flite and pico-tts did
- Speech tab, in Sentence put hello and Speak
  - Check Log tab for FliteSentenceSpeaker lines to see e.g. command lines it’s using
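If you want to test the engines outside of Rhasspy, and assuming the flite and pico binaries are present in the container (Rhasspy appears to shell out to them, per the command lines in the Log tab), you can try something like:
# Speak directly with flite
flite -t "hello world"
# Or synthesize with pico and play the result
pico2wave -w /tmp/tts.wav "hello world" && aplay /tmp/tts.wav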
Intent Recognition
One way to validate audio input is to set up Rhasspy to recognize intents.
- Settings > Intent Recognition
  - Default OpenFST should work
- Sentences tab to configure recognized intents
  - Uses a simplified JSGF syntax (see the example after the output below)
- Speech tab, use Hold to Record or Tap to Record for mic input
- Saying what time is it should output:
{
  "intent":{
    "entities":{},
    "hass_event":{
      "event_data":{},
      "event_type": "rhasspy_GetTime"
    },
    "intent":{
      "confidence": 1,
      "name": "GetTime"
    },
    "raw_text": "what time is it",
  }
}
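For reference, the sentences that feed intent recognition live in sentences.ini. The default English profile defines the GetTime intent roughly like this (reproduced from memory, so treat it as approximate); each section name becomes an intent and each line a recognizable sentence:
[GetTime]
what time is it
tell me the time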
Wake word
Another way to validate audio input is to set up a phrase that triggers Rhasspy to recognize intents (e.g. hey siri, ok google, etc.)
- Settings > Wake Word
  - PocketSphinx is the only fully open/offline option
  - “Wake Keyphrase” is the trigger phrase
- Save Settings and wait for Rhasspy to restart
- Train (mentioned in the docs)
- Check Log for PocketsphinxWakeListener: Hotword detected
If your wake keyphrase contains a new word, the log will complain it’s not in dictionary.txt after you Save Settings:
[WARNING:955754080] PocketsphinxWakeListener: XXX not in dictionary
[DEBUG:3450672] PocketsphinxWakeListener: Loading wake decoder with hmm=/profiles/en/acoustic_model, dict=/profiles/en/dictionary.txt
It seems like either adding a custom word via the Words tab and/or hitting Train should fix this, but I haven’t yet figured out the correct incantation.
Hass Integration
Integrating with Home Assistant is accomplished by leveraging Hass’ REST API and POSTing to the /api/events endpoint.
- Hass: Create long-lived access token
  - Open Hass: http://pi3.local:8123/profile
  - Long-Lived Access Tokens > Create Token
  - Also read Hass authentication docs
- Rhasspy: Configure intent handling with Hass
  - Open Rhasspy: http://pi3.local:12101/
  - Settings > Intent Handling
  - Hass URL: http://172.17.0.1:8123 (the docker host; 172.17.0.2 is the container itself)
    - If not using docker you could instead use localhost
  - Access Token: the token from above
  - Save Settings > OK to restart
Check Hass REST API is working:
curl -X GET -H "Authorization: Bearer <ACCESS TOKEN>" -H "Content-Type: application/json" http://pi3.local:8123/api/
Should return:
{"message": "API running."}
Note that from within the container you can’t connect to services outside the container using localhost. There are a few different ways to deal with this, but that’s why we’re using 172.17.0.1 above:
# Shell into container
docker exec -it pi_rhasspy_1 /bin/bash
# Try Hass REST API to `localhost`
curl -X GET -H "Authorization: Bearer <ACCESS TOKEN>" -H "Content-Type: application/json" http://localhost:8123/api/
curl: (7) Failed to connect to localhost port 8123: Connection refused
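The 172.17.0.1 address is the default docker0 bridge on the host; to confirm what it is on your setup, check from the Pi itself:
# On the host (not in the container), show the docker bridge address
ip -4 addr show docker0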
Let’s test the Rhasspy->Hass connection:
- Open Hass: http://pi3.local:8123/profile
- Developer Tools > Events > Listen to events
  - rhasspy_GetTime and Start Listening
- Like for intent recognition, say “what time is it”
- Hass should output:
{
  "event_type": "rhasspy_GetTime",
  "data": {},
  "origin": "REMOTE",
  "time_fired": "2019-12-17T16:02:51.366090+00:00",
  "context": {
    "id": "012345678901234567890123456789",
    "parent_id": null,
    "user_id": "deadbeefdeadbeefdeadbeefdeadbeef"
  }
}
Let’s test Hass automation:
- Open Hass: http://pi3.local:8123/profile
- Configuration > Automation > +
- Create an Event trigger (see the YAML sketch after this list):
  - Triggers
    - Trigger type: Event
    - Event type: rhasspy_GetTime
  - Actions
    - Action type: Call service
    - Service: system_log.write
    - Service data: {message: 'Hello event'}
- Like for intent recognition, say “what time is it”
- In Hass, Developer Tools > Logs should show the message.
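The same automation expressed in YAML looks roughly like this (a sketch; the alias is made up and the exact layout depends on your Hass version):
# automations.yaml (sketch): fire on the Rhasspy event and write a log entry
- alias: "Rhasspy GetTime test"
  trigger:
    - platform: event
      event_type: rhasspy_GetTime
  action:
    - service: system_log.write
      data:
        message: "Hello event"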
Hass TTS
To use Rhasspy’s TTS we can leverage its REST API:
curl -X POST -d "hello world" http://pi3.local:12101/api/text-to-speech
To trigger this from Hass, we can use the RESTful Command integration. In configuration.yaml:
rest_command:
  tts:
    url: http://localhost:12101/api/text-to-speech
    method: POST
    payload: '{{ message }}'
The payload is a Jinja2 template that can be set by the caller.
Test the tts REST command:
- Open Hass: http://pi3.local:8123/profile
- Developer Tools > Services
- Specify the rest_command.tts service with data message: "hello"
- Call Service to trigger Rhasspy TTS
Let’s add it to our Hass automation:
- Configuration > Automation
- Edit the previous item (click the pencil ✎)
- Add Action:
  - Action type: Call service
  - Service: rest_command.tts (it should auto-complete for you)
  - Service data: {message: 'hello world'}
- Like for intent recognition, say “what time is it”
This should trigger a full loop:
speech -> Rhasspy -> intent -> Hass -> text -> Rhasspy -> speech
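In YAML terms, this just appends a second action to the earlier automation, something like:
# Additional entry in the automation's action list (sketch)
- service: rest_command.tts
  data:
    message: "hello world"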
Systemd
I’d like Rhasspy to auto-start similar to Hass.
It would seem that mixing docker with systemd is bad mojo, making me contemplate re-installing Hass via docker. Docker says little about starting containers with systemd other than don’t cross the streams with restarts. And so far google has turned up dubious results, mostly from several years ago, that don’t work with current versions of docker.
Create /etc/systemd/system/rhasspy@homeassistant.service:
[Unit]
Description=Rhasspy
Wants=home-assistant@homeassistant.service
Requires=docker.service
After=home-assistant@homeassistant.service docker.service
[Service]
Type=exec
ExecStart=docker run --rm \
--name rhasspy \
-p 12101:12101 \
-v "/home/homeassistant/.config/rhasspy/profiles:/profiles" \
--device /dev/snd:/dev/snd \
synesthesiam/rhasspy-server:latest \
--user-profiles /profiles --profile en
ExecStop=docker stop rhasspy
# Restart on failure
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
| Option | Notes | Docs |
|---|---|---|
| Wants/Requires/After | Docker must be running, and ideally Hass is (but we can start Rhasspy without it) | man |
| Type | Stronger requirement than simple, ensuring the process starts | man |
| ExecStart | Start the container | |
| ExecStop | Stop the container | |
For ExecStart, note a few differences from the original docker run:
| Option | Notes |
|---|---|
| --rm | Remove the container on exit. Otherwise we get “name taken” errors on restarts. |
| --name | Give it a predictable name to simplify ExecStop, and make it easier to open docker shells |
| -v | Docker defaults to creating files as root. /srv/ might be better, but I thought this would make the profiles easier to find. |
| --restart unless-stopped | Removed since systemd is managing the lifetime. |
Configure it to auto-start and start it:
sudo systemctl --system daemon-reload
sudo systemctl enable rhasspy@homeassistant
sudo systemctl start rhasspy@homeassistant
To debug:
# Check running containers
docker container ls
# Check log output
sudo journalctl -f -u rhasspy@homeassistant
# Open docker shell
docker exec -it rhasspy /bin/bash
Note, if you fail to remove $HOME from docker run it will fail with:
Dec 18 19:00:25 pi3 docker[4764]: /usr/bin/docker: invalid reference format.