At some point you've got to get from language to action, yes. In my case, I use the LLM as a multi-stage classifier: it starts from a small set of high-level areas of capability and narrows down, layer by layer, to specific systems and capabilities. So the first layer of classification might say something like "this interaction was about <environmental control>", where <environmental control> is one of a finite set of possible systems. The next layer might say something like "this is about <lighting>", and by the layer after that there may be enough information to interrogate the input with a sufficiently specific prompt. That prompt can itself be generated from a capability definition: for example, "determine any physical location, an action, and any inputs regarding colour or brightness from the following input" can be built from the possible inputs of the capability you think you're addressing.
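
To make the layering concrete, here's a rough sketch of the descent in Python. It's my own illustration rather than the exact setup described above. The tree contents, the names `route`, `extraction_prompt`, `Capability`, and `Node`, and the keyword-matching `keyword_classifier` are all hypothetical. The keyword matcher only stands in for the LLM call (which in practice would be asked to pick one of the listed options), so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Capability:
    name: str
    inputs: List[str]  # what the final extraction prompt should ask for

@dataclass
class Node:
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    capability: Optional[Capability] = None  # set only on leaves

# A classifier picks one of `options` for `text`. In practice this would be an
# LLM call constrained to the listed options; here it is a pluggable function.
Classifier = Callable[[str, List[str]], str]

def leaf(name: str, cap: Capability) -> Node:
    return Node(name, capability=cap)

def branch(name: str, *kids: Node) -> Node:
    return Node(name, children={k.name: k for k in kids})

# Hypothetical capability tree, for illustration only.
TREE = branch(
    "root",
    branch(
        "environmental control",
        leaf("lighting", Capability("set_lighting", ["physical location", "action", "colour", "brightness"])),
        leaf("heating", Capability("set_heating", ["physical location", "target temperature"])),
    ),
    branch(
        "media",
        leaf("playback", Capability("start_playback", ["room", "artist or track"])),
    ),
)

def route(text: str, root: Node, classify: Classifier) -> Capability:
    """Descend the tree one classification per layer until a leaf is reached."""
    node = root
    while node.capability is None:
        options = list(node.children)
        choice = classify(text, options)
        if choice not in node.children:  # guard against off-menu LLM output
            raise ValueError(f"classifier returned unknown option {choice!r}")
        node = node.children[choice]
    return node.capability

def extraction_prompt(cap: Capability, text: str) -> str:
    """Build the final, capability-specific prompt from the capability's inputs."""
    return (
        f"Determine any of the following from the input: {', '.join(cap.inputs)}.\n"
        f"Input: {text}"
    )

# Stand-in for the LLM: keyword matching, purely so the sketch runs offline.
KEYWORDS = {
    "environmental control": ["light", "lamp", "heating", "temperature"],
    "lighting": ["light", "lamp"],
    "heating": ["heating", "temperature"],
    "media": ["play", "music"],
    "playback": ["play", "music"],
}

def keyword_classifier(text: str, options: List[str]) -> str:
    lowered = text.lower()
    for option in options:
        if any(k in lowered for k in KEYWORDS.get(option, [option])):
            return option
    raise ValueError(f"no option matched: {options}")

if __name__ == "__main__":
    request = "Set the kitchen lights to blue at half brightness"
    cap = route(request, TREE, keyword_classifier)
    print(cap.name)
    print(extraction_prompt(cap, request))
```

Each layer only ever sees a short, finite menu of options, which is what keeps the individual classification calls simple. Only the final step gets a prompt tailored to one capability.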

Of course this isn't foolproof, and there's still work to do defining the capabilities of each system, etc. (although AI can assist with those tasks too). But it's promising: "teaching" the system to do new things is relatively simple, and is more like describing capabilities than programming directly.
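
As a rough illustration of "describing rather than programming", here's a minimal sketch, assuming capabilities are stored as plain data with a path through the classification tree plus a list of inputs. The names `describe`, `options_at`, and `REGISTRY`, and the "blinds" capability, are hypothetical. The point is that the menu each classification layer is offered is derived from the descriptions, so adding a capability is just adding another description.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Capability:
    path: Tuple[str, ...]    # where it sits in the classification tree
    name: str
    inputs: Tuple[str, ...]  # what the final extraction prompt should ask for

REGISTRY: List[Capability] = []

def describe(path, name, inputs) -> None:
    """'Teach' the system a capability by describing it; no routing code changes."""
    REGISTRY.append(Capability(tuple(path), name, tuple(inputs)))

def options_at(prefix: Tuple[str, ...]) -> List[str]:
    """The menu a classifier would be offered at a given layer, derived from the registry."""
    depth = len(prefix)
    menu: List[str] = []
    for cap in REGISTRY:
        if cap.path[:depth] == prefix and len(cap.path) > depth:
            nxt = cap.path[depth]
            if nxt not in menu:  # keep first-seen order, no duplicates
                menu.append(nxt)
    return menu

describe(("environmental control", "lighting"), "set_lighting",
         ("physical location", "action", "colour", "brightness"))
print(options_at(("environmental control",)))  # ['lighting']

# Adding a new capability is just another description:
describe(("environmental control", "blinds"), "set_blinds",
         ("physical location", "position"))
print(options_at(("environmental control",)))  # ['lighting', 'blinds']
print(options_at(()))                          # ['environmental control']
```

The descriptions themselves (paths, input lists) are the kind of thing an LLM could help draft, which is where the AI assistance with defining capabilities would come in.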