
This is cool!

...but 1 second per time step is a lot. I wonder how fast it would've been if it wasn't in Python. I think we as a society are doing a whole lot of people (especially physicists) a disservice by mainly teaching them Python rather than languages which are literally hundreds of times faster in some cases. Python works well when you just want to glue together existing fast C or Fortran libraries with Python APIs, but it quickly proves limiting.

I've personally been caught by the Python trap: the easiest way to do something was to write a Python script, and it worked, but then I wanted to process more data or whatever, and suddenly Python became a huge limiting factor. I then spend more time parallelizing the Python code to make it run faster, and it becomes a beast that's hard to debug, maxes out 32 CPU cores, and is still 10x slower than what a single-threaded Rust program would've been. And I regret my choice of language.

EDIT: Also, this is in no way anti-Python; I think it's a nice language, and there are many uses for which it is wholly appropriate.



I believe it is using Numba, which compiles Python functions to machine code.

https://numba.pydata.org/
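For anyone unfamiliar, here's a minimal sketch of how Numba is typically used. The function and workload are made up for illustration, and the `try`/`except` fallback just lets the snippet run even where Numba isn't installed:

```python
import numpy as np

# Numba's @njit compiles the decorated function to machine code on first call.
# Fall back to a no-op decorator if Numba isn't available, so the sketch still runs.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func

@njit
def step(positions, velocities, dt):
    # A made-up explicit Euler update: the kind of tight numeric loop
    # over NumPy arrays that Numba handles well.
    for i in range(positions.shape[0]):
        positions[i] += velocities[i] * dt
    return positions

pos = np.zeros(4)
vel = np.ones(4)
step(pos, vel, 0.5)  # each position advanced by vel * dt = 0.5
```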


Right, and compiling Python to machine code does get rid of the overhead associated with opcode dispatch... but it's not magic: Python is still a wildly dynamic language. It's mainly that dynamism that makes it slow, not the fact that each opcode has to go through a switch statement in the CPython interpreter.

To get significantly better performance with a JIT, you need one which analyzes the code at runtime to detect patterns, such as "this function is always called with an integer argument" or "the dictionary passed to this function always has this shape", like what V8 does. AFAIK Numba doesn't do that.

(Though if I'm wrong and there are benchmarks which show Numba coming close to something like Rust on normal dynamic Python code, please do correct me! I haven't done much research on Numba specifically.)
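To make the "shape" point concrete, here's a hedged sketch (names made up) of the kind of polymorphic code a runtime-profiling JIT like V8 specializes on, and which ahead-of-time type inference struggles with:

```python
# A polymorphic call site: the same function is called with values of many
# different types, so every operation must dispatch on the runtime type.
def describe(value):
    return f"{type(value).__name__}: {value}"

# Dicts whose "shape" (set of keys and value types) varies per record.
# A V8-style JIT would observe the common shapes at runtime and emit
# specialized fast paths for them.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "tags": ["x"]},  # a different shape
]

labels = [describe(r["id"]) for r in records]
print(labels)                          # ['int: 1', 'int: 2']
print(describe(records[1]["tags"]))    # list: ['x']
```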


Your Python program maxing out 32 cores was x10 slower than a single threaded Rust program?

Are you exaggerating? If not, can you share a bit more?


It's slightly exaggerated; the Python program might not have been able to fully utilize all cores, and it's really 16 cores with hyperthreading. But it's not unreasonable: a 150x speed-up isn't unexpected when going from Python to C/Rust/C++ in number-crunching code, and 150/16 ≈ 9.4 (16 is based on the assumption that the gains from hyperthreading and the losses from imperfect parallelism more or less cancel out).

I don't think I have the code from those large-ish data processing experiments any more, but it would be fun to construct some toy problems with large amounts of data, write comparable Python and C implementations, and publish a blog post with the results.


Can we assume that you weren't able to use NumPy here, or at least that your inner loops weren't using it? It can be faster than hand-written C++ when you don't happen to know all the optimizations the NumPy library writers knew.


Yeah, I'm just talking about normal Python code here. If you're able to express your problem such that numpy or scipy or pytorch or NLTK or some other C/Fortran library does all the number crunching, Python's performance is less of an issue
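As a toy sketch of that point (function names made up), here's the same reduction written as a plain-Python loop and as NumPy array operations; the latter pushes the loop down into compiled C:

```python
import numpy as np

def sum_of_squares_loop(xs):
    # Plain-Python inner loop: every iteration pays interpreter
    # and dynamic-dispatch overhead.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_of_squares_numpy(xs):
    # The same reduction as a single NumPy call: the loop runs in C.
    a = np.asarray(xs, dtype=np.float64)
    return float(np.dot(a, a))

data = list(range(1000))
assert sum_of_squares_loop(data) == sum_of_squares_numpy(data)
```

On large inputs the NumPy version is typically orders of magnitude faster, which is exactly the "let the C library do the crunching" escape hatch described above.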




