Thanks!

There are at least two things here:

1) The ability to do speech-to-text on random speech. I'm going to stick by that description :). If you've ever watched a little kid play with Alexa, it's definitely what you would call "random speech" haha!

2) The ability to satisfy the request (intent) expressed in the text output, up to and including fetching current information via API, etc.

Our soon-to-be-released, highly optimized, open source inference server uses Whisper and is ridiculously fast and accurate. Based on our testing with nieces and nephews, we have "random speech" covered :). Our inference server also supports LLaMA, Vicuna, etc. and can chain together STT -> LLM/API/etc -> TTS, with the output simply played over the Willow speaker and/or displayed on the LCD.
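
As a rough illustration of that STT -> LLM/API/etc -> TTS chaining (this is not the actual inference server code; the Pipeline class and all three fake_* stages are made-up stand-ins, with plain functions where Whisper, an LLM or API call, and a TTS engine would go):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical three-stage chain: audio -> text -> reply text -> audio."""
    stt: Callable[[bytes], str]     # speech to text (e.g. Whisper in a real setup)
    handler: Callable[[str], str]   # LLM, API call, or intent handler
    tts: Callable[[str], bytes]     # text to speech

    def run(self, audio: bytes) -> bytes:
        text = self.stt(audio)
        reply = self.handler(text)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any models installed.
def fake_stt(audio: bytes) -> str:
    # Pretend the audio bytes are already the transcript.
    return audio.decode("utf-8")


def fake_handler(text: str) -> str:
    # Trivial keyword routing in place of real intent recognition.
    if "weather" in text.lower():
        return "Looking up the weather."
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    # Pretend the reply text is already synthesized audio.
    return text.encode("utf-8")


pipeline = Pipeline(stt=fake_stt, handler=fake_handler, tts=fake_tts)
```

Since each stage is just a callable, swapping in a different STT or TTS backend would only mean passing a different function.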

Our goal is to make a Willow Home Assistant component that assists with #2. There are plenty of HA integrations and components to do things like get weather in real time, in addition to handling user intent recognition. They have an entire platform for it[0]. Additionally, we will make our inference server implementation (that does truly unique things for Willow) available as just another TTS/STT integration option on top of the implementations they already support. That way you can use whatever you want, or send the audio output after wake to whatever you want, like Vosk, Cheetah, etc.

[0] - https://developers.home-assistant.io/docs/intent_index/