Hey HN!
Willow Inference Server (WIS) is a focused, highly optimized speech and language inference server. Our goal is to "automagically" enable performant, cost-effective self-hosting of released state-of-the-art/best-of-breed models for speech and language tasks:
Primarily targets CUDA (works on CPU too), with support for low-end (cheap) devices such as the Tesla P4, GTX 1060, and up. Don't worry - it screams on an RTX 4090 too! (See benchmarks on GitHub.)
Memory optimized - all three default Whisper models (base, medium, large-v2) load simultaneously, with TTS support, inside 6GB of VRAM. LLM support defaults to int4 quantization (conversion scripts included). ASR/STT + TTS + Vicuna 13B require roughly 18GB of VRAM - less for 7B, of course!
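As a rough back-of-the-envelope check on why int4 quantization matters for fitting a 13B model alongside ASR and TTS, here's a sketch of the weight-storage arithmetic (parameter counts are approximations; real usage also includes activations, KV cache, and framework overhead):

```python
# Rough VRAM estimate for model weights at different quantization levels.
# This is illustrative arithmetic, not a measurement of WIS itself.

def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate GB of VRAM needed just to hold the weights."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

fp16_13b = weight_vram_gb(13, 16)  # ~26 GB: weights alone overflow a 24GB card
int4_13b = weight_vram_gb(13, 4)   # ~6.5 GB: leaves headroom for ASR + TTS
print(f"13B weights - fp16: {fp16_13b:.1f} GB, int4: {int4_13b:.1f} GB")
```

The ~4x reduction is why a 13B LLM plus Whisper and TTS can plausibly share a single consumer GPU.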
ASR. Our heaviest emphasis: Whisper optimized for very high-quality, as-close-to-real-time-as-possible speech recognition via a variety of means (Willow, WebRTC, POSTing a file, integration with devices and client applications, etc). Results arrive in hundreds of milliseconds or less for most intended speech tasks. See the YouTube WebRTC demo[0].
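For the "POST a file" path, a client can be as simple as the stdlib sketch below. The URL, port, and content type here are illustrative assumptions, not the documented WIS API - check the project's API documentation for the actual endpoint and parameters:

```python
# Minimal sketch of POSTing an audio file to a WIS-style ASR endpoint.
# NOTE: the endpoint path and header below are hypothetical placeholders;
# consult the WIS API docs for the real route and expected request format.
import urllib.request


def transcribe_file(path: str, url: str = "http://localhost:19000/api/asr") -> str:
    """Send raw audio bytes to an ASR endpoint and return the response body."""
    with open(path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        url,
        data=audio,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like `print(transcribe_file("clip.wav"))` against a running server.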
TTS. Primarily provided for assistant tasks (like Willow!) and visually impaired users.
LLM. Optionally pass input through a provided/configured LLM for question answering, chatbot, and assistant tasks. Currently supports LLaMA derivatives, with a strong preference for Vicuna (I like 13B). Built-in support for int4 quantization to conserve GPU memory.
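To make "int4" concrete, here's a toy sketch (not WIS's or AutoGPTQ's actual code) of the storage idea: two 4-bit weight values pack into a single byte, which is where the memory saving over fp16 comes from. Real GPTQ-style quantization also stores per-group scales and zero-points; this only shows the packing layout:

```python
# Toy illustration of int4 packing: two 4-bit values share one byte.

def pack_int4(values):
    """Pack ints in [0, 15] into bytes, two values per byte."""
    assert len(values) % 2 == 0 and all(0 <= v <= 15 for v in values)
    return bytes((values[i] << 4) | values[i + 1] for i in range(0, len(values), 2))


def unpack_int4(packed):
    """Recover the 4-bit values from the packed bytes."""
    out = []
    for b in packed:
        out.extend([(b >> 4) & 0xF, b & 0xF])
    return out


weights = [3, 12, 0, 15]
packed = pack_int4(weights)          # 2 bytes instead of 8 bytes in fp16
assert unpack_int4(packed) == weights
```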
Support for a variety of transports: REST, WebRTC, and WebSockets (primarily for the LLM).
Performance and memory optimized. Leverages CTranslate2 for Whisper support and AutoGPTQ for LLMs.
Willow support. WIS powers the Tovera-hosted, best-effort example server that Willow users enjoy.
Support for WebRTC - stream audio in real time from browsers or WebRTC applications to optimize quality and response time. Heavily optimized for long-running sessions via WebRTC audio track management: leave your session open for days at a time and get self-hosted ASR transcription within hundreds of milliseconds, all while conserving network bandwidth and CPU!
Support for custom TTS voices. With relatively small audio recordings WIS can create and manage custom TTS voices. See API documentation for more information.
Much like the release of Willow[1] last week, this is an early release, but we had a great response from HN and are looking forward to hearing what everyone thinks!
[0] - https://www.youtube.com/watch?v=PxCO5eONqSQ
[1] - https://github.com/toverainc/willow
I used Amazon Echo devices during their first 6 months of public availability before I got sufficiently creeped out to pull the plug permanently. Since then, I've wished for something similar that wasn't a 'black box' doing unknown things with my data.
When you posted about Willow here on HN, I immediately purchased an ESP-BOX (glad I didn't wait, they sold out quickly!)
I have a bunch of unused Raspberry Pi CM4s that I stocked up on a few years back, so I loaded Home Assistant onto one of them and connected Willow to it. I didn't have anything to automate yet, so all I got was error messages about missing HA intents. Then finally, last night, some Zigbee gear was delivered, and now, after an 8-year hiatus, I have a voice assistant again - and it doesn't creep me out.
After a couple of hours last night messing around with HA and researching, I have some more stuff on the way. I'll be able to automate my window-mounted air conditioner using an IR device, and that same device includes an RF component so I can control my 433MHz ceiling fan (a Broadlink RM4 Pro, for any interested reader). I also have some temperature sensors on the way to assist with all that.
Home Assistant has a commercial side with a cloud offering that lets me control this setup from anywhere for about $60/year. It can even tie in to my phone to run automations based on when I leave or return. And all of this is open source, with none of my data going anywhere (except to Tovera's inference server, which I will shortly replace with my own).
I also saw your issue comment last night about a Willow Application Server. The idea is exciting, and I hope it happens - I'm very interested.
Thanks again for what you're doing. I hope you see success with this, the entire home computing/home automation ecosystem will benefit in the long term.