I write something similar for my own problems. Usually a single solver instance alone isn’t going to cut it, so we run a portfolio of solvers and treat it as a multi-armed bandit problem to choose which solver, with its specific hyperparameters, is finding solutions / counterexamples fastest. A pretty common trick when you do this is to dynamically resize your portfolio, introducing new solvers with a high likelihood of solving fast based on the parameters of other solvers that are already solving fast, a particle filter of sorts.
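A minimal sketch of the bandit part, under assumptions: UCB1 as the selection rule (the comment doesn’t name one), reward defined as 1/solve_time so fast solvers score high, and `respawn` standing in for the particle-filter-style cloning of a fast config with jittered hyperparameters. Config contents here are made up for illustration.

```python
import math
import random

class Portfolio:
    """UCB1 bandit over solver configs (hypothetical sketch)."""

    def __init__(self, configs):
        self.configs = list(configs)        # hyperparameter dicts per solver
        self.pulls = [0] * len(configs)
        self.value = [0.0] * len(configs)   # running mean reward per arm

    def pick(self):
        # Play each arm once before applying the UCB1 rule.
        for i, n in enumerate(self.pulls):
            if n == 0:
                return i
        total = sum(self.pulls)
        return max(range(len(self.configs)),
                   key=lambda i: self.value[i]
                   + math.sqrt(2 * math.log(total) / self.pulls[i]))

    def record(self, i, solve_time):
        # Reward = 1/solve_time: faster solves accumulate higher value.
        self.pulls[i] += 1
        r = 1.0 / solve_time
        self.value[i] += (r - self.value[i]) / self.pulls[i]

    def respawn(self, jitter=0.1):
        # Particle-filter-style move: clone the best-scoring config
        # with perturbed hyperparameters and add it as a fresh arm.
        best = max(range(len(self.configs)), key=lambda i: self.value[i])
        child = {k: v * (1 + random.uniform(-jitter, jitter))
                 for k, v in self.configs[best].items()}
        self.configs.append(child)
        self.pulls.append(0)
        self.value.append(0.0)
```

With simulated solve times the bandit quickly concentrates pulls on the faster config, and `respawn` grows the portfolio near it.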
This takes care of setting and handling individual solver timeouts, while allowing the user to specify not only a timeout for the whole portfolio (possibly lengthened when new solvers are introduced) but also a maximum delay between any two solutions.
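The deadline bookkeeping described above can be sketched as follows; the class and method names are my own invention, not from any particular library. The idea is that each solver’s budget is the tighter of two constraints: the global portfolio deadline (extendable) and the cap on the gap since the last solution.

```python
import time

class DeadlineManager:
    """Hypothetical sketch: global portfolio deadline + max solution gap."""

    def __init__(self, portfolio_timeout, max_solution_gap):
        self.start = time.monotonic()
        self.deadline = self.start + portfolio_timeout
        self.max_gap = max_solution_gap
        self.last_solution = self.start

    def extend(self, extra):
        # Lengthen the whole-portfolio deadline when new solvers are added.
        self.deadline += extra

    def note_solution(self):
        # Reset the inter-solution gap clock.
        self.last_solution = time.monotonic()

    def solver_budget(self):
        # A solver may run until the earlier of the global deadline and
        # the point where the max delay between solutions would be exceeded.
        now = time.monotonic()
        return max(0.0, min(self.deadline - now,
                            self.last_solution + self.max_gap - now))

    def expired(self):
        return self.solver_budget() <= 0.0
```

Individual solver timeouts then fall out for free: each launch just gets `solver_budget()` as its limit.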
Either way, you’re still forced to consider whether the solver can return sat, unsat, or a timeout. If it’s a timeout we don’t swap in the faster code; if it’s unsat, same deal; but if it’s sat we can get some new code to jump to.
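That three-way branch is small but easy to get wrong if timeout is conflated with unsat, so here it is spelled out. `install_patch` is a hypothetical stand-in for whatever actually swaps in the new code; only the sat branch carries a model to act on.

```python
from enum import Enum

class Result(Enum):
    SAT = "sat"
    UNSAT = "unsat"
    TIMEOUT = "timeout"

def maybe_swap(result, model, install_patch):
    # Only sat gives us a model we can turn into new code to jump to;
    # unsat and timeout both mean we keep the code we already have.
    if result is Result.SAT:
        install_patch(model)
        return True
    return False
```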
Right, the problem with SMT solvers (and the SAT solvers they're based on, although the problem there is slightly better) is that they're unpredictable and highly dependent on the form in which you pose the instance, often in very unintuitive ways (they can fail on an instance you'd think would be as easy as, or even easier than, another they do manage).
SAT/SMT hardness regression has come a long way, especially if you can insert profiling information for real instances.
If this is a production model for the cloud, the engineer should have put some thought into the encoding to express sat/unsat clearly, if possible.
In the general case, sure, to some extent one can construct an adversarial example just as with NNs (the random 3-SAT hardness cliff). Real-world formulations of problems are either nice or atrocious IME, and discoverable at dev time.
Oh, I don't mind adversarial examples. As a user of SMT solvers (for semi-automated program verification), I just find them unpredictable and frustrating. But I've never employed SMT solvers as a library in some production system, and it's good to hear that in that case the instances could be controlled well enough for the normal operation to be predictable.