( … this used to be one very long blog post, which I have since split up into 3 separate shorter posts … )
I wanted my voice interaction solution to be fairly independent from openHAB, in order to have the option to switch to a different home automation controller in the future. That drove my design decisions in how to couple Rhasspy and openHAB.
The solution also needed to be extensible: I expect to add more openHAB items for lights and other gadgets in the future, and I don’t want to have to manually edit lists of expected voice command sentences every time I do that.
I implemented three kinds of voice interaction, described in separate posts: voice announcements, voice commands and voice questions and answers.
My Rhasspy configuration
Rhasspy can be deployed in many different ways: on a PC, on a Raspberry Pi, in a Docker container running on either of these, etc. I chose to install Rhasspy on a Debian Linux VM (4 CPUs, 4 GB memory), running on a host that has an Intel Core i5-12400 processor and 16 GB memory. I decided to create a fairly “beefy” VM, because I noticed that the speech-to-text algorithms use quite a bit of memory and CPU time.
Also, Rhasspy has a very modular design, with many voice models and engines to choose from. I am happy with the following settings (sketched in the profile fragment after this list), YMMV:
- Speech to Text with Mozilla DeepSpeech
- Intent Recognition with Fsticuffs (the recommended setting)
- Text to Speech with Larynx, using the blizzard_lessac voice, high vocoder quality.
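For reference, these choices live under "system" keys in the profile’s profile.json; a minimal fragment might look roughly like this (I picked the Larynx voice and vocoder quality in the web interface, so those keys are omitted here):

{
  "speech_to_text": {
    "system": "deepspeech"
  },
  "intent": {
    "system": "fsticuffs"
  },
  "text_to_speech": {
    "system": "larynx"
  }
}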
The system is composed of multiple Rhasspy satellites dotted around the house, so I can have voice-based interaction in almost every room. The satellites are mostly Raspberry Pis, as described in this blog post. There are also two ESP32-based output-only satellites, as described here. I am also experimenting with an ESP32 box running speech recognition software from the Willow project.
The server running Rhasspy, named stt-server in my setup, also runs an MQTT broker, the popular Eclipse Mosquitto. This is known to openHAB via a broker definition in a .things file:
Bridge mqtt:broker:rh-mq "Rhasspy-Mosquitto"
[ host="stt-server", secure=false, clientID="oh3" ]
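Rhasspy talks to this broker using the Hermes MQTT protocol: recognized intents are published on hermes/intent/<intentName>, and speech output can be requested on hermes/tts/say. As a rough sketch (the Thing and channel names are my own invention), a Generic MQTT Thing for these two topics could be declared like this:

Thing mqtt:topic:rh-mq:rhasspy "Rhasspy" (mqtt:broker:rh-mq) {
    Channels:
        Type string : intent "Recognized intent" [ stateTopic="hermes/intent/#" ]
        Type string : say "Text to speech" [ commandTopic="hermes/tts/say" ]
}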
Voice announcements
With this feature, information flows in one direction only: from openHAB to me. No speech recognition is involved.
For details, see the blog post Rhasspy with openHAB, part 1: voice announcements.
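That post has the full implementation; the underlying mechanism, though, is simply an openHAB rule publishing a Hermes tts/say message to the broker. A minimal sketch, with a made-up item, message and siteId:

rule "Announce washing machine finished"
when
    Item WashingMachine_OpState changed to "FINISHED"
then
    val mqttActions = getActions("mqtt", "mqtt:broker:rh-mq")
    // the siteId field selects which satellite speaks the announcement
    mqttActions.publishMQTT("hermes/tts/say",
        '{"text": "The washing machine has finished.", "siteId": "kitchen"}')
end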
Voice commands
For this feature, openHAB items need to be marked as “voice controlled”, and that information must be transferred to Rhasspy. Then, whenever Rhasspy detects a voice command, it provides all relevant information to openHAB, which in turn performs the requested action.
For details, see the blog post Rhasspy with openHAB, part 2: voice commands.
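The shape of the mechanism: Rhasspy publishes each recognized intent as JSON on hermes/intent/<intentName>, and a rule on the openHAB side extracts the slots and sends the corresponding command. The sketch below assumes a String item linked to the intent channel shown earlier; the intent name and slot layout are purely illustrative:

// .items file: tracks the last recognized intent (the name is my choice)
String RhasspyIntent "Last Rhasspy intent" { channel="mqtt:topic:rh-mq:rhasspy:intent" }

// .rules file (requires the JSONPath transformation add-on)
rule "Execute voice command from Rhasspy"
when
    Item RhasspyIntent changed
then
    val json = RhasspyIntent.state.toString
    val intentName = transform("JSONPATH", "$.intent.intentName", json)
    if (intentName == "SwitchOnOff") {
        // assumed slot layout: slot 0 = openHAB item name, slot 1 = "on"/"off"
        val itemName = transform("JSONPATH", "$.slots[0].value.value", json)
        val command = transform("JSONPATH", "$.slots[1].value.value", json)
        sendCommand(itemName, command.toUpperCase)
    }
end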
Voice questions and answers
In addition to voice announcements and voice commands, I also implemented a “dialog” feature, where I can ask for information, and get an answer from the system.
For details, see the blog post Rhasspy with openHAB, part 3: questions and answers.
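Mechanically, this combines the two previous features: an intent arriving over MQTT triggers a rule, and the rule publishes its answer back to hermes/tts/say, reusing the siteId from the question so that the answer comes out of the satellite that heard it. A hypothetical sketch, with a made-up GetTemperature intent and temperature item:

rule "Answer a temperature question"
when
    Item RhasspyIntent changed
then
    val json = RhasspyIntent.state.toString
    if (transform("JSONPATH", "$.intent.intentName", json) == "GetTemperature") {
        val siteId = transform("JSONPATH", "$.siteId", json)
        val answer = "It is " + LivingRoom_Temperature.state.toString + " degrees."
        getActions("mqtt", "mqtt:broker:rh-mq").publishMQTT("hermes/tts/say",
            '{"text": "' + answer + '", "siteId": "' + siteId + '"}')
    }
end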