Listen! … voice interaction with home automation

I have used an openHAB-based home automation setup for several years now, and I am mostly happy with it … except in those very lazy moments when I just want to turn on a light and wish I didn’t have to pull my phone from my pocket, unlock it, open the openHAB app, navigate to the right UI page, and push a button. So … I decided it was time for voice-based interaction, Alexa style.

What I will describe here is a solution that works for me and satisfies my requirements defined below. I write this in the hope it will help others get started with voice control. I am not claiming that it is the only way, the best way, or the most sophisticated way of doing voice interaction with home automation.

Objectives

My functional requirements are:

  1. I want to control lights and fans via voice commands
  2. I want to ask for some basic information like temperature or time of day, and get a spoken response
  3. I want to get spoken alerts for some relevant events such as “the washer has finished”, “you left the freezer door open” or “your uncle is calling”.

My non-functional requirements are

  1. This needs to work in many rooms in the house, not just at my desk … so multiple microphones and speakers are needed.
  2. No cloud-based solution. While I like shopping at Amazon or searching with Google, I don’t want either of them listening in on every conversation I have in my house.
  3. The speech-to-text and text-to-speech solution should be independent of the home automation software I am using, if possible. Currently I use openHAB, and I am reasonably happy with it, but who knows which direction the developers will take next year; maybe I will want to switch to something else in the future.
  4. The solution needs to be extensible: I expect to add more openHAB items for lights and other gadgets in the future, and I don’t want to have to manually edit lists of expected voice command sentences every time I do that (see the sketch after this list).
  5. The system needs to understand and speak English.
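
To give an idea of how requirement 4 can be met, here is a minimal sketch, not necessarily the approach described in the later posts, that regenerates a Rhasspy slot file from whatever switchable items openHAB currently knows about. The host name, the slot file path and the “Lightbulb” tag are assumptions you would adapt to your own setup; it also assumes openHAB’s REST API is readable without authentication.

```python
# Hypothetical helper: rebuild a Rhasspy slot file from the current openHAB items,
# so new lights become usable in voice commands without editing sentence lists by hand.
# Assumptions: openHAB's /rest/items endpoint is readable without authentication,
# items to expose are Switch items tagged "Lightbulb", and Rhasspy loads slot values
# from a plain-text file under its profile's slots/ directory (one value per line,
# optionally written as "spoken form:substituted value").

import requests

OPENHAB_URL = "http://openhab.local:8080"        # hypothetical openHAB host
SLOT_FILE = "/profiles/en/slots/light_names"     # hypothetical Rhasspy slot file path

def main() -> None:
    items = requests.get(f"{OPENHAB_URL}/rest/items", timeout=10).json()

    lines = []
    for item in items:
        if item.get("type") == "Switch" and "Lightbulb" in item.get("tags", []):
            label = item.get("label") or item["name"]
            # Map the spoken label to the openHAB item name.
            lines.append(f"{label.lower()}:{item['name']}")

    with open(SLOT_FILE, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(lines)) + "\n")

if __name__ == "__main__":
    main()
```

After regenerating the file, Rhasspy still has to be retrained (for example with a POST to its /api/train endpoint) before the new names are recognized.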

Architecture

The solution I chose looks like this (detailed descriptions will follow in later posts):

  • The current home automation system uses openHAB 3.3, running in a Debian Linux VM (2 CPUs, 2 GB memory) on a Debian Linux host with an Intel Core i5-12400 processor and 16 GB memory.
  • The center of voice control is Rhasspy 2.5.11, running on another Debian Linux VM (4 CPUs, 4 GB memory) on the same VM host.
  • There are multiple types of satellites around the house:
    • Several Raspberry Pis (2B, 3B and Zero 2 W models), some of which were already in use before, e.g. as Kodi media players. They now have Rhasspy installed and configured as a satellite. For audio I/O, they have a ReSpeaker 2-Mics Pi HAT, which contains two microphones and has a small passive speaker attached. For a detailed description, see the separate blog post.
    • Some output-only satellites are built from a small PC speaker from Amazon, with an ESP32 placed inside (see my Basic Satellite blog post).
    • Some input-only satellites consist of an ESP32-S3-Box-Lite running the Willow voice recognition software. Those little boxes actually do all the speech recognition locally; they are independent of Rhasspy.
  • The interface between openHAB and the voice machinery is via MQTT and a few openHAB items backed by rules that run when one of those items changes. For details, see my blog post on Rhasspy with openHAB.
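
To make the last point a little more concrete, here is a minimal sketch of such glue code, written as a standalone script rather than the MQTT-backed items and rules I actually use. It assumes Rhasspy publishes recognized intents on the standard Hermes MQTT topics (hermes/intent/<intentName>) and that openHAB accepts item commands through its REST API; the intent name ChangeLightState, its slots (item, state) and the host names are hypothetical placeholders that depend on how the sentences are defined.

```python
# Minimal sketch of a Rhasspy-to-openHAB bridge: subscribe to a Hermes intent topic
# and forward the recognized command to openHAB's REST API. Intent and slot names
# (ChangeLightState, item, state) are hypothetical; host names are placeholders.

import json

import paho.mqtt.client as mqtt
import requests

OPENHAB_URL = "http://openhab.local:8080"   # hypothetical openHAB host
MQTT_HOST = "rhasspy.local"                 # hypothetical MQTT broker host

def on_connect(client, userdata, flags, rc):
    client.subscribe("hermes/intent/ChangeLightState")

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    # Hermes slots arrive as a list of objects; pick out slotName -> value.
    slots = {s["slotName"]: s["value"]["value"] for s in payload.get("slots", [])}
    item = slots.get("item")
    state = str(slots.get("state", "ON")).upper()
    if item:
        # openHAB accepts item commands as plain-text POSTs, e.g. "ON" or "OFF".
        requests.post(
            f"{OPENHAB_URL}/rest/items/{item}",
            data=state,
            headers={"Content-Type": "text/plain"},
            timeout=5,
        )

client = mqtt.Client()   # paho-mqtt 1.x style; with 2.x pass mqtt.CallbackAPIVersion.VERSION1
client.on_connect = on_connect
client.on_message = on_message
client.connect(MQTT_HOST, 1883)
client.loop_forever()
```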

Features

My installation supports the following scenarios

  • Lights and fans controlled by openHAB can be switched on or off with commands like “turn on left table lamp” or “switch the hallway light off, please”. When a command is recognized, the system performs the requested action and then acknowledges the command by repeating the spoken command as it was understood.
  • I can ask about a few topics, and the system answers with a spoken message. At this time, the only available topics are the time of day, temperature and humidity.
  • The system makes voice announcements for the following events: 1. the washer or dryer has finished, 2. an incoming telephone call, with the name of the caller if available, 3. a warning if the freezer door has been left ajar for too long, and 4. a personalized welcome message when my wife or I enter the apartment — the latter is just a gimmick, really …
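
As an illustration of how such an announcement can be triggered, here is a minimal sketch that publishes a message to Rhasspy’s Hermes text-to-speech topic (hermes/tts/say). The broker host and the siteId “kitchen” are hypothetical placeholders; the same message could just as well be published from an openHAB rule through the MQTT binding, which fits the MQTT-plus-rules interface described above.

```python
# Sketch of a spoken announcement via Rhasspy's Hermes TTS topic. The siteId selects
# which satellite speaks; "kitchen" and the broker host are hypothetical placeholders.

import json

import paho.mqtt.publish as publish

MQTT_HOST = "rhasspy.local"   # hypothetical MQTT broker host

def announce(text: str, site_id: str = "kitchen") -> None:
    payload = json.dumps({"text": text, "siteId": site_id})
    publish.single("hermes/tts/say", payload, hostname=MQTT_HOST)

if __name__ == "__main__":
    announce("The washing machine has finished.")
```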

Conclusions

It is 11 pm, I am sitting at my desk, and decide to call it a day. I need to turn off my PC and my desk lamp, and then walk to the office door. I don’t want to walk in the dark, so I look at the Raspberry Pi sitting on a bookshelf about 3 m away and say “Porcupine, turn on the left hallway light”. The hallway light turns on and shines through the glass door into my office, so I can walk out without tripping over anything. Luxury and laziness …

After 6 months of use, I still like the voice interaction solution I built. The speech recognition is reliable enough that I hardly ever have to repeat myself. The features it offers are not essential, but they are nice and convenient.

For the future, I’d like to add more functionality. When I’m in the kitchen cooking dinner, I’d like to say “Porcupine, set a timer for 12 minutes”, and get a spoken reminder when the pasta is done.

This has been a mixture of what I would call requirements-driven and capabilities-driven development. In part, I found it helpful to start by envisioning specific scenarios (what you might call requirements engineering) and then implement them with the architecture I had picked earlier. In part, I was also inspired by the capabilities the finished system offers to come up with additional useful features: I already had a way to detect when the freezer door is left ajar (see my blog post), and now I have a way to make spoken announcements, so let’s combine the two!
