Rhasspy with openHAB

I wanted my voice interaction solution to be fairly independent from openHAB, in order to have the option to switch to a different home automation controller in the future. That drove my design decisions in how to couple Rhasspy and openHAB.

The solution also needed to be extensible: I expect to add more openHAB items for lights and other gadgets in the future, and I don’t want to have to manually edit lists of expected voice command sentences every time I do that.

I implemented three kinds of voice interaction: voice announcements, voice commands and voice questions and answers.

My Rhasspy configuration

Rhasspy can be deployed in many different ways: on a PC, on a Raspberry Pi, in a Docker container running on either of these, etc. I chose to install Rhasspy on a Debian Linux VM (4 CPUs, 4 GB memory), running on a host that has an Intel Core i5-12400 processor and 16 GB memory. I decided to create a fairly “beefy” VM, because I noticed that the speech-to-text algorithms use quite a bit of memory and CPU time.

Also, Rhasspy has a very modular design, with many voice models and engines to choose from. I am happy with the following settings; YMMV:

  • Speech to Text with Mozilla DeepSpeech
  • Intent Recognition with Fsticuffs (the recommended setting)
  • Text to Speech with Larynx, using the blizzard_lessac voice, high vocoder quality.

The server running Rhasspy, named stt-server in my setup, also runs an MQTT broker, the popular Eclipse Mosquitto. This is known to openHAB via a broker definition in a .things file:

Bridge mqtt:broker:rh-mq "Rhasspy-Mosquitto"  
[ host="stt-server", secure=false, clientID="oh3" ]

Voice announcements

How openHAB communicates with the text-to-speech engine

To trigger a spoken announcement, openHAB publishes an MQTT message that contains all the relevant information: what to say, and on which satellite to say it. When the message is received by the master Rhasspy service, it converts the text to audio and then streams it to the specified satellite.

There is one Thing definition for all of this, with a separate Channel for each satellite, and a separate Item for each satellite capable of audio output. I have five satellites at the moment: three Raspberry Pis and two ESP32-based units. Assigning a text to one of these Items sends out the MQTT message that triggers Rhasspy to speak the text.
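The payload Rhasspy expects on hermes/tts/say is just a small JSON object with text and siteId fields. A minimal Python sketch of what openHAB’s formatBeforePublish produces (the helper name is mine, not part of any API):

```python
import json

def make_say_payload(text: str, site_id: str) -> str:
    """Build the JSON payload for Rhasspy's hermes/tts/say topic."""
    return json.dumps({"text": text, "siteId": site_id})

# The message that commanding Item say_raspi11 would publish:
payload = make_say_payload("the washer has finished", "raspi11")
print(payload)  # {"text": "the washer has finished", "siteId": "raspi11"}
```

Any MQTT client can publish such a message to trigger an announcement, which is what makes this setup independent of openHAB.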

In /etc/openhab/things/rhasspy.things we have (tested with openHAB 3.3)

Thing mqtt:topic:rh-mq:say (mqtt:broker:rh-mq) {
 Channels:
  Type string: espD    [ commandTopic="hermes/tts/say", 
               formatBeforePublish="{\"text\":\"%s\",\"siteId\":\"espD\"}" ]
  Type string: espG    [ commandTopic="hermes/tts/say", 
               formatBeforePublish="{\"text\":\"%s\",\"siteId\":\"espG\"}" ]
  Type string: raspi7  [ commandTopic="hermes/tts/say", 
               formatBeforePublish="{\"text\":\"%s\",\"siteId\":\"raspi7\"}" ]
  Type string: raspi11 [ commandTopic="hermes/tts/say", 
               formatBeforePublish="{\"text\":\"%s\",\"siteId\":\"raspi11\"}" ]
  Type string: raspi14 [ commandTopic="hermes/tts/say", 
               formatBeforePublish="{\"text\":\"%s\",\"siteId\":\"raspi14\"}" ]
}

In /etc/openhab/items/rhasspy.items we have (note how the names of the Items always start with “say_”, followed by the siteId of the satellite)

Group gSay "Group: TTS audio sinks"
String say_espD     (gSay) { channel="mqtt:topic:rh-mq:say:espD" }     
String say_espG     (gSay) { channel="mqtt:topic:rh-mq:say:espG" }     
String say_raspi7   (gSay) { channel="mqtt:topic:rh-mq:say:raspi7" }
String say_raspi11  (gSay) { channel="mqtt:topic:rh-mq:say:raspi11" }
String say_raspi14  (gSay) { channel="mqtt:topic:rh-mq:say:raspi14" }

All of those Items representing “speaking satellites” are members of the gSay group, which allows a rule to look up a specific member, or to do something for all members (see below).

How to make people pay attention to the announcement

Have you ever noticed that public announcements, e.g. at an airport or on an airplane, are always preceded by some chime or jingle, to catch your attention? Ding-dong, please fasten your seat belts. This is necessary to help you focus your attention on what is about to be announced. Rhasspy doesn’t offer that directly, as far as I know, so I had to make openHAB do it instead.

With Rhasspy, you can stream a WAV audio file directly to one of the satellites by publishing chunks of audio via MQTT. I created a small script /etc/openhab/dingdong.sh that can be called from openHAB rules. It expects the siteId (name) of the satellite as its sole argument.

#!/bin/bash
# Play an attention chime on the satellite whose siteId is given as $1.
if [ -z "$1" ] ; then
    echo "No argument supplied"
    exit 1
fi

uuid=$(uuidgen)
topic="hermes/audioServer/$1/playBytes/$uuid"
sound=/etc/openhab/sounds/dingdong.wav
mosquitto_pub -h stt-server -t "$topic" -s < "$sound"
sleep 2.5   # give the chime time to finish before any TTS announcement starts

The jingle is in /etc/openhab/sounds/dingdong.wav, a short mono WAV file at a 44100 Hz sample rate. Pick whatever sound pleases your ears …
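Rhasspy plays the WAV exactly as streamed, so it is worth checking the file’s format before wiring it in. A small Python sketch using the standard-library wave module (the expected values shown match my file; adjust to yours):

```python
import wave

def wav_info(path: str) -> dict:
    """Return channel count, sample rate and duration of a WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {"channels": w.getnchannels(),
                "rate_hz": rate,
                "duration_s": frames / rate}

# Usage: wav_info("/etc/openhab/sounds/dingdong.wav")
# For my chime this should report channels=1 and rate_hz=44100.
```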

Now we just need a dummy string Item Rhasspy_NotifySite, and a rule that is triggered when that Item is set to a new text.

In /etc/openhab/rules/rhasspy.rules we have

rule "notify site"
when 
    Item Rhasspy_NotifySite received update 
then 
    val theArgs = newState.toString.split(":")
    if (theArgs.size != 2) { return; }
    
    val String siteId = theArgs.get(0)
    val String theText = theArgs.get(1)
    val String theSatelliteName = 'say_' + siteId
    var theSatellite = gSay.members.findFirst[ t | t.name==theSatelliteName]
    logInfo('rhasspy',"at {} say '{}'", siteId, theText)

    if (theSatellite!==null) {
        executeCommandLine(Duration.ofSeconds(3), '/etc/openhab/dingdong.sh', siteId )
        theSatellite.sendCommand(theText)
    }
end

How to broadcast an announcement to all satellites

This just needs one more dummy Item, named Rhasspy_NotifyAllSites, and a rule that is triggered when you assign a text to the Item.

In /etc/openhab/rules/rhasspy.rules we have

rule "notify all sites"
when 
    Item Rhasspy_NotifyAllSites received update 
then 
    if (IsLive.state == OFF) { return; }   // skip announcements while the IsLive switch is OFF
    val String theText = newState.toString

    gSay.members.forEach[ i |
        val String site = i.name.replace("say_","")
        logInfo('rhasspy',"say '{}' at {}", theText, site)
        executeCommandLine(Duration.ofSeconds(3), '/etc/openhab/dingdong.sh', site )
        i.sendCommand(theText)
    ]
end 

How to use voice alerts in openHAB

To trigger an announcement in a rule, just assign a string like “siteId:what to say” to Item Rhasspy_NotifySite, using the postUpdate() or sendCommand() methods. If you set the Item to “raspi11:the washer has finished”, it will speak the announcement “the washer has finished” on the satellite named “raspi11”. Note that the rule splits on the colon, so the spoken text itself must not contain one. To speak an announcement on all satellites, just assign the plain text to Item Rhasspy_NotifyAllSites.
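The convention packs two values into one string. A Python sketch of the same parsing (the function name is mine); splitting on the first colon only, as shown here, would even tolerate colons in the spoken text, which the DSL rule as written does not:

```python
def parse_notify(value: str):
    """Split a 'siteId:text' string into its two parts.

    maxsplit=1 splits on the first colon only, so the spoken text
    may itself contain colons.
    """
    parts = value.split(":", 1)
    if len(parts) != 2:
        return None   # malformed, ignore
    site_id, text = parts
    return site_id.strip(), text.strip()

print(parse_notify("raspi11:the washer has finished"))
# ('raspi11', 'the washer has finished')
```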

Voice commands

How to tell Rhasspy what to expect

All openHAB Items which I want to control (switch on or off) via voice commands are assigned to a group named gVA. The script oh_items queries openHAB via its REST API for a list of all Items, and for every Item that is a member of group gVA it generates Rhasspy “slot” entries. This script is run by the Rhasspy slot_programs mechanism; for more details see the Rhasspy documentation.
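The oh_items script itself is not shown here, but the idea can be sketched in Python: fetch all Items from the openHAB REST API, keep the members of the requested group, and print one slot line per Item in Rhasspy’s “spoken text:value” substitution form. All names and the hostname below are illustrative; check the Rhasspy slot_programs documentation for the exact contract:

```python
import json
from urllib.request import urlopen

def fetch_items(base_url: str) -> list:
    """Fetch all Items from the openHAB REST API (hostname is illustrative)."""
    with urlopen(f"{base_url}/rest/items") as resp:
        return json.load(resp)

def slot_lines(items: list, group: str) -> list:
    """Turn openHAB Item records into Rhasspy slot lines 'spoken label:ItemName'."""
    lines = []
    for item in items:
        if group in item.get("groupNames", []) and item.get("label"):
            # the spoken text is the Item label; the slot value is the Item name
            lines.append(f"{item['label']}:{item['name']}")
    return sorted(lines)

# Example with inline data instead of a live server:
sample = [
    {"name": "Hallway_Left_Light", "label": "left hallway light", "groupNames": ["gVA"]},
    {"name": "Bathroom_Fan", "label": "main bathroom fan", "groupNames": ["gVA"]},
    {"name": "Unrelated", "label": "unrelated", "groupNames": []},
]
print("\n".join(slot_lines(sample, "gVA")))
# left hallway light:Hallway_Left_Light
# main bathroom fan:Bathroom_Fan
```

With this substitution form, Rhasspy listens for the human-readable label but reports the Item name in the intent, which is exactly what the rules below need.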

The Rhasspy sentences.ini configuration file contains a section called SetOneLight that defines what sentences to expect:

[SetOneLight]
(turn | switch) (on | off){state!upper} [the] ($oh_items,gVA){lightName} [please] 
(turn | switch) [the] ($oh_items,gVA){lightName} (on | off){state!upper} [please] 

With these definitions, the system will recognize voice commands like “turn the left hallway light on, please” or “switch off main bathroom fan” … provided you have defined openHAB Items with the label text “left hallway light” and “main bathroom fan”.

Every time you create new openHAB Items that you want controlled by voice commands, you need to revisit http://your-rhasspy-server-name:12101/ and click “Save Sentences”. This will re-run the scripts in slot_programs, and re-train the speech recognition engine.

How openHAB responds to a voice command

When one of those defined command sentences is recognized by Rhasspy, it publishes a lengthy JSON message via MQTT, which contains all the relevant information. This is captured by an openHAB Thing called SetOneLight and an associated Item Rhasspy_SetOneLight.
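For reference, the relevant parts of such an intent message look roughly like this (abridged; the field names follow the Hermes MQTT protocol, the concrete values are made up). Extracting them in Python mirrors the JSONPATH transforms used in the rule further down:

```python
import json

# Abridged, made-up example of a Rhasspy intent message
payload = json.dumps({
    "input": "turn on the left hallway light",
    "rawInput": "turn on the left hallway light",
    "siteId": "raspi7",
    "intent": {"intentName": "SetOneLight"},
    "slots": [
        {"entity": "state",    "value": {"value": "ON"}},
        {"entity": "oh_items", "value": {"value": "Hallway_Left_Light"}},
    ],
})

msg = json.loads(payload)
# index the slots by their entity name, like the JSONPATH filters do
slots = {s["entity"]: s["value"]["value"] for s in msg["slots"]}
print(msg["siteId"], slots["oh_items"], slots["state"])
# raspi7 Hallway_Left_Light ON
```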

In /etc/openhab/things/rhasspy.things we have

Thing mqtt:topic:rh-mq:SetOneLight (mqtt:broker:rh-mq) {
  Channels:
    Type string: Message [ stateTopic="hermes/intent/SetOneLight" ]
}

In /etc/openhab/items/rhasspy.items we have

String  Rhasspy_SetOneLight  "Rhasspy Message"  { channel="mqtt:topic:rh-mq:SetOneLight:Message" }

When this Item changes, i.e. when an MQTT message has been published by Rhasspy, an openHAB rule extracts all the relevant pieces of information from the JSON payload and acts as needed.

In /etc/openhab/rules/rhasspy.rules we have

rule "Rhasspy SetOneLight message"
when 
    Item Rhasspy_SetOneLight received update 
then 
    val String json = newState.toString
    val String rawInput = transform("JSONPATH","$.rawInput", json)
    val String itemName = transform("JSONPATH","$.slots[?(@.entity=='oh_items')].value.value", json)
    val String itemState = transform("JSONPATH","$.slots[?(@.entity=='state')].value.value", json).toUpperCase
    val String siteId = transform("JSONPATH","$.siteId", json)
    logInfo("voice","Site {} heard '{}'", siteId, rawInput )

    // set the item as requested
    val theItem = gVA.members.findFirst[ t | t.name==itemName]
    if (theItem !== null) {
        theItem.sendCommand(itemState)
    }
end

Aye, aye, sir — how to acknowledge a voice command

I wanted the voice interaction system to acknowledge that a voice command has been received by repeating the command, after the action has been performed. To do this, we just use the voice announcement functionality described above to repeat the text that the Rhasspy speech recognition had reported.

One little complication: some satellites are input-only, and others are output-only, so the acknowledgement may need to be spoken by a different satellite from the one that heard the voice command. To deal with that, a MAP transformation file maps input siteIds to the corresponding output siteIds.

In /etc/openhab/transform/source_to_sink.map , we have

# map voice recognition sites to notification sites
boxlite-A=espD
boxlite-B=raspi7
raspi7=raspi7
raspi11=raspi11
raspi14=raspi14
# espD=espD
-=undefined
NULL=NULL

In /etc/openhab/rules/rhasspy.rules , in addition to what was shown above, the rule has one more section

rule "Rhasspy SetOneLight message"
when 
    Item Rhasspy_SetOneLight received update 
then 
    ...see above ...

    // where should the command acknowledgement be heard?
    val sinkName = transform("MAP","source_to_sink.map",siteId)
    val sinkItem = gSay.members.findFirst[ t | t.name=="say_"+sinkName]
    if (sinkItem !== null) {
        sinkItem.sendCommand(rawInput)
    }
end 

Voice questions and answers

In addition to voice announcements and voice commands, I also implemented a “dialog” feature, where I can ask for information and get an answer from the system. Currently, this is very limited: I can only ask for the time of day, the temperature, and the humidity.

In /etc/openhab/things/rhasspy.things we have

Thing mqtt:topic:rh-mq:VoiceQuestion (mqtt:broker:rh-mq) {
  Channels:
    Type string: Message [ stateTopic="hermes/intent/VoiceQuestion" ]
}

In /etc/openhab/items/rhasspy.items we have

String Rhasspy_Question  "Rhasspy Question"  { channel="mqtt:topic:rh-mq:VoiceQuestion:Message" }

String vqTime "time" (gVQ)  // dummy items picked up by Rhasspy slot program

in other .items files, I have (the details are not relevant for what I am trying to explain here)

Number localCurrentTemperature "temperature [%.0f°C]" <temperature>  (gVQ)  {...some binding...}
Number localCurrentHumidity    "humidity [%.0f%%rH]"  <humidity>     (gVQ)  {...some binding...}

In /etc/openhab/rules/rhasspy.rules we have

rule "Rhasspy VoiceQuestion message"
when 
    Item Rhasspy_Question received update 
then 
    if (IsLive.state!=ON) { return; }   // skip answers unless the IsLive switch is ON

    val String json = newState.toString
    val String rawInput = transform("JSONPATH","$.rawInput", json)
    val String topic = transform("JSONPATH","$.slots[?(@.entity=='oh_items')].rawValue", json).toLowerCase
    val String siteId = transform("JSONPATH","$.siteId", json)
    logInfo("voice","Site {} heard '{}', ask about '{}'", siteId, rawInput, topic )

    var String answer = "I don't know"

    if (topic=="temperature") {
        answer = "the temperature is " 
        + String::format("%.0f", (localCurrentTemperature.state as DecimalType).floatValue )
        + " degrees outside and "
        + String::format("%.0f", (AZ_Temp.state as DecimalType).floatValue )
        + " degrees inside"
    } else if (topic=="humidity") {
        answer = "the " + topic + " is " 
        + String::format("%.0f", (localCurrentHumidity.state as DecimalType).floatValue )
        + " percent"
    } else if (topic == "time") {
        answer = "it is " + String.format("%1$tH:%1$tM", now)
    }

    // where should the answer be heard?
    val sinkName = transform("MAP","source_to_sink.map",siteId)
    val sinkItem = gSay.members.findFirst[ t | t.name=="say_"+sinkName]
    if (sinkItem !== null) {
        sinkItem.sendCommand(answer)
    }

    logInfo("voice","QUESTION '{}' to '{}', ANSWER '{}' to '{}'", topic, siteId, answer, sinkName)
end 

This part is a bit ugly, because of the long if … else if chain that identifies each question topic. There is room for improvement …
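One possible cleanup, sketched here in Python rather than the rules DSL: a dispatch table mapping each topic to a small answer function, so that adding a new question becomes one new table entry instead of another else-if branch. The handler functions are illustrative stand-ins for the Item-state lookups in the rule:

```python
from datetime import datetime

# Illustrative stand-ins for reading openHAB Item states
def answer_time(_state):
    return datetime.now().strftime("it is %H:%M")

def answer_temperature(state):
    return f"the temperature is {state:.0f} degrees"

def answer_humidity(state):
    return f"the humidity is {state:.0f} percent"

# dispatch table: topic -> answer function
ANSWERS = {
    "time": answer_time,
    "temperature": answer_temperature,
    "humidity": answer_humidity,
}

def answer(topic, state=None):
    handler = ANSWERS.get(topic)
    return handler(state) if handler else "I don't know"

print(answer("humidity", 47.6))  # the humidity is 48 percent
```

The same pattern could be approximated in the rules DSL with a map from topic to answer template, but lambdas as map values are clumsier there than in a general-purpose language.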
