That might be closer than you think, I guess.
The thing is that the new Assist pipeline is fully customisable and can use other models as well. They already have a ChatGPT integration which can't control entities in HA, but at least you can have a spoken conversation with ChatGPT through HA.
So if you somehow spin up an LLM locally, create a HA Assist pipeline with it, and then use Willow (a future release of which should be able to leverage the new Assist feature) as a physical interface, then you are golden.
It may be hard or impossible today, but I think within months HA and Willow will mature to a state where the biggest problem will be training and running a good enough LLM locally. But I bet a good number of hackers are already hard at work on that part anyway.
I've been trying to adapt it to an offline LLM, probably a LLaMA-like one using the llm crate for Rust, or a ggml-based C implementation like llama.c.
It could even be fine-tuned or trained to perform better and always output only the JSON.
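Even a fine-tuned model occasionally wraps the JSON in extra chatter, so you'd probably want a small extraction/validation step anyway. Here's a minimal sketch in Python (the prompt keys, the `fake_model` stub, and the schema are purely illustrative, not from any real HA or Willow API):

```python
import json

# Hypothetical intent schema we ask the model to emit -- key names are
# illustrative, not from any real Home Assistant integration.
PROMPT_TEMPLATE = (
    'Turn the user\'s request into JSON with keys "service" and '
    '"entity_id". Output ONLY the JSON.\n'
    "Request: {request}\nJSON:"
)

def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of a model reply, tolerating any
    extra text the model wraps around it."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    return json.loads(raw[start:end + 1])

# Stand-in for a real local-model call (llm crate, ggml backend, etc.).
def fake_model(prompt: str) -> str:
    return 'Sure! {"service": "light.turn_on", "entity_id": "light.kitchen"}'

reply = fake_model(PROMPT_TEMPLATE.format(request="turn on the kitchen light"))
intent = extract_json(reply)
print(intent["service"])  # light.turn_on
```

The extractor is the safety net: fine-tuning gets you JSON-only output most of the time, and the find/rfind slice handles the rest.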
This could be a good fit with open-sourced Tovera when that is released.
I like the idea of supporting commands in natural language that don't have to follow a specific syntax.
It could also process general LLM requests, possibly using a third-party LLM like Bard for more up-to-date responses.