llama.cpp is a high-performance open source solution capable of running inference on a large number of published LLMs, on both CPU and GPU. It's written in C++.
It's easy to download and build on Windows or Linux.
It can be used as a command-line tool, linked and used as a library from a variety of languages, including Python, or talked to through a simple REST server that is part of the same repo. It even ships a simple web frontend (built with React, I believe) that lets you have basic conversations with a model (no bells and whistles).
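For instance, once that bundled server is running you can talk to it from Python with nothing more than the requests library. This is just a minimal sketch under my assumptions: the model path, port, and prompt below are placeholders, and it assumes the server's OpenAI-compatible chat endpoint.

```python
# Minimal sketch: query a locally running llama.cpp server over HTTP.
# Assumes it was started with something like:
#   llama-server -m ./some-model.gguf --port 8080
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # OpenAI-compatible endpoint
    json={
        "messages": [
            {"role": "user", "content": "Explain what a GGUF file is in one sentence."}
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```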
And yet the author is using Ollama, which is itself a wrapper around llama.cpp (as most of these tools are), written in Go.