It's true that averages can be misleading, but we encourage users to think of it instead as a percentage of inputs. In practice the error distribution is very bimodal, the two modes being "basically fine" (a few ulps of error) and "garbage" (usually 0 instead of some actual value).
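A toy illustration of the "garbage" mode (my own sketch, not one of Herbie's benchmarks): catastrophic cancellation can wipe out the entire result, returning 0 where the true answer is small but nonzero.

    /* For tiny x, (1 + x) - 1 should equal x, but in double precision
       1 + x rounds to exactly 1, so the subtraction returns 0. */
    #include <stdio.h>

    int main(void) {
        double x = 1e-17;
        double computed = (1.0 + x) - 1.0;   /* evaluates to 0.0 */
        printf("true: %g  computed: %g\n", x, computed);
        return 0;
    }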
Author here. The speedup is modeled throughput, though the model is relatively naive. It's possible to disable branches by turning off the regimes flag; see https://herbie.uwplse.org/doc/1.0/options.html
Author here. I've got a few papers about this problem (including one in submission), but it is very very hard to do, especially with acceptable overhead. The state of the art is maybe 100x overhead.
It is; there's a page in the documentation about how errors are defined. Let me also add: Herbie generally gives the most accurate option it found first, and then the other stuff might be useful for speed (0.5x is way faster than two square roots and a divide!) but it's not as accurate.
Author here! Yes, the float distribution isn't what you want in practice, but a distribution selector isn't really the right thing either, because a low-probability bad result can still be pretty bad! Hence the range selector; the float distribution is good at picking extreme values that trigger FP error.
We usually recommend looking for 90%+ accuracy or carefully examining the accuracy plot
In calculus the core issue is that the concept of a "function" was undefined but generally understood to be something like what we'd call today an "expression" in a programming language. So, for example, "x^2 + 1" was widely agreed to be a function, but "if x < 0 then x else 0" was controversial. What's nice about the "function as expression" idea is that, generally speaking, these functions are continuous, analytic [1], etc., and the set of such functions is closed under differentiation and integration [2]. There's a good chance that if you took AP Calculus you basically learned this definition.
The formal definition of "function" is totally different! This is typically a big confusion in Calculus 2 or 3! Today, a function is defined as literally any input→output mapping, and the "rule" by which this mapping is defined is irrelevant. This definition is much worse for basic calculus—most mappings are not continuous or differentiable. But it has benefits for more advanced calculus; the initial application was Fourier series. And it is generally much easier to formalize because it is "canonical" in a certain sense, it doesn't depend on questions like "which exact expressions are allowed".
This is exactly what the article is complaining about. The non-rigorous intuition preferred for basic calculus and the non-rigorous intuition required for more advanced calculus are different. If you formalize, you'll end up with one rigorous definition, which necessarily will have to incorporate a lot of complexity required for advanced calculus but confusing to beginners.
Programming languages are like this too. Compare C and Python. Some things must be written in C, but most things can be more easily written in Python. If the whole development must be one language, the more basic code will suffer. In programming we fix this by developing software as assemblages of different programs written in different languages, but mechanisms for this kind of modularity in formal systems are still under-studied and, today, come with significant untrusted pieces or annoying boilerplate, so this solution isn't yet available.
[1] Later it was discovered that not all such functions are in fact analytic, but that wasn't known for a long time.
[2] I am being imprecise; integrating and solving various differential equations often yields functions that are nice but aren't defined by combinations of named functions. The solution at the time was to name these newly discovered functions.
> If you formalize, you'll end up with one rigorous definition
Can't you just formalize both definitions and pick the one to work with based on what you want to do? Surely the only obstacle here is the time and effort it takes to write the formalization?
Or, alternatively, just because you've formalized the advanced calculus version doesn't mean you need to use the formalization when teaching basic calculus. The way we've proven something and the way we teach that something don't have to be the same.
> the concept of a "function" was undefined but generally understood to be something like what we'd call today an "expression" in a programming language. So, for example, "x^2 + 1" was widely agreed to be a function, but "if x < 0 then x else 0" was controversial
Good answer, but not the best example: in many programming languages, the latter is easily written as an expression.
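For instance, in C it's just a conditional expression (an illustration of my own; the original comment's example may have been in another language):

    /* "if x < 0 then x else 0", written as a single expression */
    double f(double x) {
        return x < 0 ? x : 0;
    }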
“a closed form expression or formula is one that is formed with constants, variables, and a set of functions considered as basic and connected by arithmetic operations (+, −, ×, /, and integer powers) and function composition. Commonly, the basic functions that are allowed in closed forms are nth root, exponential function, logarithm, and trigonometric functions”
and
“For example, if one adds polynomial roots to the basic functions, the functions that have a closed form are called elementary functions”
That would put the trigonometric functions in the basic set allowed in elementary functions.
They're cheap but not free, especially at the front end of the CPU, where it's just a lot more instructions to churn through. What the branch predictor gets you is that branches, which would normally cause a pipeline bubble, execute like straight-line code if they're predicted right. It's a bit like a tracing JIT. But you will still have a bunch of extra instructions to, like, compute the branch predicate.
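As a rough sketch (my own example; a compiler is free to use a conditional move here instead): even if the guard below is never taken and always predicted correctly, the compare that computes its predicate is still a real instruction the front end has to fetch and execute.

    /* Hypothetical never-taken guard: predicted not-taken every time,
       but the compare feeding the branch still runs. */
    double safe_div(double a, double b) {
        if (b == 0.0)
            return 0.0;
        return a / b;
    }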
Worse, IMO, is a never-taken branch taking up space in branch prediction buffers, which will cause worse predictions elsewhere (when this branch's IP collides with a legitimate IP). Unless I missed a subtlety and never-taken branches don't get assigned any resources until they are taken (which would be pretty smart, actually).
Horner's form is typically also more accurate; or at least, it's not bit-identical to the naive form, so the compiler won't do the rewrite unless you pass -funsafe-math-optimizations (or -ffast-math), and maybe not even then.
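A small sketch of the two forms for a cubic (my own example): because floating-point addition and multiplication aren't associative, the two functions can disagree in the last bits, which is exactly why the compiler treats the rewrite as unsafe.

    /* p(x) = a3*x^3 + a2*x^2 + a1*x + a0, evaluated two ways */
    double poly_naive(double x, const double a[4]) {
        return a[3]*x*x*x + a[2]*x*x + a[1]*x + a[0];
    }

    double poly_horner(double x, const double a[4]) {
        return ((a[3]*x + a[2])*x + a[1])*x + a[0];
    }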
Deterministic output is incompatible with batching, which in turn is critical to high utilization on GPUs, which in turn is necessary to keep costs low.
Batching doesn't mean the computation suddenly becomes non-deterministic. Ideally, it just means you perform the same computation on multiple token streams in the batch simultaneously, without the values interacting with each other. Vectorization, basically.
Batching leads to cross-contamination in practice because of things like MoE load-balancing within the batch, or supporting different batch sizes with different kernels that have different numerical behavior. But a careful implementation could avoid such issues while still benefiting from the higher efficiency of batching.
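A toy sketch of the underlying issue (not taken from any real inference stack): floating-point addition isn't associative, so two kernels that merely sum the same values in a different order can already return different results.

    #include <stdio.h>

    /* Left-to-right sum */
    double sum_seq(const double *v, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += v[i];
        return s;
    }

    /* Pairwise (tree) sum, as a parallel reduction might do */
    double sum_pairwise(const double *v, int n) {
        if (n == 1) return v[0];
        int h = n / 2;
        return sum_pairwise(v, h) + sum_pairwise(v + h, n - h);
    }

    int main(void) {
        double v[8] = {1e16, 1.0, -1e16, 1.0, 1e16, 1.0, -1e16, 1.0};
        /* prints different totals (1 vs 0) for the same inputs */
        printf("seq: %g  pairwise: %g\n", sum_seq(v, 8), sum_pairwise(v, 8));
        return 0;
    }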