
This is cool!

...but 1 second per time step is a lot. I wonder how fast it would've been if it wasn't in Python. I think we as a society are doing a whole lot of people (especially physicists) a disservice by mainly teaching them Python rather than languages which are literally hundreds of times faster in some cases. Python works well when you just want to glue together existing fast C or Fortran libraries with Python APIs, but it quickly proves limiting.

I've personally been caught by the Python trap: the easiest way to do something was to write a Python script, and it worked, but then I wanted to process more data or whatever, and suddenly Python became a huge limiting factor. I then spend more time parallelizing the Python code to make it run faster, and it becomes a beast that's hard to debug, maxes out 32 CPU cores, and is still 10x slower than what a single-threaded Rust program would've been. And I regret my choice of language.

EDIT: Also, this is in no way anti-Python; I think it's a nice language, and there are many uses for which it is wholly appropriate.



I believe it is using Numba, which compiles Python functions to machine code.

https://numba.pydata.org/
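For anyone unfamiliar, here's a minimal sketch of how Numba is typically used. The function and workload are made up for illustration, and the `try`/`except` fallback just lets the snippet run even where Numba isn't installed:

```python
import numpy as np

# Numba's @njit compiles the decorated function to machine code on first call.
# Fall back to a no-op decorator if Numba isn't available, so the sketch still runs.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func

@njit
def step(positions, velocities, dt):
    # A made-up explicit Euler update: the kind of tight numeric loop
    # over NumPy arrays that Numba handles well.
    for i in range(positions.shape[0]):
        positions[i] += velocities[i] * dt
    return positions

pos = np.zeros(4)
vel = np.ones(4)
step(pos, vel, 0.5)  # each position advanced by vel * dt = 0.5
```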


Right, and compiling Python to machine code does get rid of the overhead associated with opcode dispatch... but it's not magic: Python is still a wildly dynamic language. It's mainly that dynamism that makes it slow, not the fact that each opcode has to go through a switch statement in the CPython interpreter.

To get significantly better performance with a JIT, you need one which analyzes the code at runtime to detect patterns, such as "this function is always called with an integer argument" or "the dictionary passed to this function always has this shape", like what V8 does. AFAIK Numba doesn't do that.

(Though if I'm wrong and there are benchmarks which show Numba coming close to something like Rust on normal dynamic Python code, please do correct me! I haven't done much research on Numba specifically.)
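To make the "shape" point concrete, here's a hedged sketch (names made up) of the kind of polymorphic code a runtime-profiling JIT like V8 specializes on, and which ahead-of-time type inference struggles with:

```python
# A polymorphic call site: the same function is called with values of many
# different types, so every operation must dispatch on the runtime type.
def describe(value):
    return f"{type(value).__name__}: {value}"

# Dicts whose "shape" (set of keys and value types) varies per record.
# A V8-style JIT would observe the common shapes at runtime and emit
# specialized fast paths for them.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "tags": ["x"]},  # a different shape
]

labels = [describe(r["id"]) for r in records]
print(labels)                          # ['int: 1', 'int: 2']
print(describe(records[1]["tags"]))    # list: ['x']
```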


Your Python program maxing out 32 cores was x10 slower than a single threaded Rust program?

Are you exaggerating? If not, can you share a bit more?


It's slightly exaggerated; the Python program might not have been able to fully utilize all cores, and it's really 16 cores with hyperthreading. But it's not unreasonable: a 150x speed-up isn't unexpected when going from Python to C/Rust/C++ in number-crunching code, and 150/16 ≈ 9.4 (16 is based on the assumption that the gains from hyperthreading and the losses from imperfect parallelism more or less cancel out).

I don't think I have the code from those large-ish data processing experiments any more, but it would be fun to construct some toy problems with large amounts of data, write comparable Python and C implementations, and publish a blog post with the results.


Can we assume that you weren't able to use NumPy here, or at least that your inner loops weren't using it? It can be faster than hand-written C++ when you don't happen to know all the optimizations the NumPy library writers knew.


Yeah, I'm just talking about normal Python code here. If you're able to express your problem such that numpy or scipy or pytorch or NLTK or some other C/Fortran library does all the number crunching, Python's performance is less of an issue
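As a toy sketch of that point (function names made up), here's the same reduction written as a plain-Python loop and as NumPy array operations; the latter pushes the loop down into compiled C:

```python
import numpy as np

def sum_of_squares_loop(xs):
    # Plain-Python inner loop: every iteration pays interpreter
    # and dynamic-dispatch overhead.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_of_squares_numpy(xs):
    # The same reduction as a single NumPy call: the loop runs in C.
    a = np.asarray(xs, dtype=np.float64)
    return float(np.dot(a, a))

data = list(range(1000))
assert sum_of_squares_loop(data) == sum_of_squares_numpy(data)
```

On large inputs the NumPy version is typically orders of magnitude faster, which is exactly the "let the C library do the crunching" escape hatch described above.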




