If you're working on CPU-bound tasks with NumPy/SciPy using threads, then you have to think very hard to make sure the critical sections spend most of their time inside NumPy's C routines, which release the GIL. That's not a reliable way to program. The approach the author describes is basically the only pure-Python way of achieving parallelism for this kind of problem.
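To make the "think very hard" part concrete, here's a rough sketch (my own illustration, not from the article): the threaded version only scales while the hot path stays inside a single NumPy call that releases the GIL; the moment the loop drops back into Python bytecode, the threads serialize again.

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def numpy_heavy(a):
        # Matrix multiply runs in C/BLAS and releases the GIL,
        # so several of these can genuinely run in parallel on threads.
        return a @ a

    def python_heavy(a):
        # The same kind of arithmetic expressed as a Python loop holds
        # the GIL, so threads only interleave; there is no speedup.
        total = 0.0
        for x in a.ravel()[:10000]:
            total += float(x) * float(x)
        return total

    if __name__ == "__main__":
        arrays = [np.random.rand(1500, 1500) for _ in range(4)]
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(numpy_heavy, arrays))   # scales across cores
            list(pool.map(python_heavy, arrays))  # effectively serial

One refactor that moves work out of the NumPy call and into Python and your "parallel" program quietly becomes single-threaded, which is exactly why it's hard to rely on.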
If you're holding a global mutex that gets released and reacquired every 100 bytecode instructions while context-switching between CPU-bound tasks, then yes, the GIL does suck. There is a class of IO-bound problems where threading/evented models in Python can be used effectively, but that's not the class of problems the author is talking about here.
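For contrast, a quick sketch of the IO-bound case where threads do fine (again just an illustration, with placeholder URLs): the GIL is released while each thread blocks on the socket, so the requests overlap in wall-clock time even though no Python bytecode ever runs in parallel.

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URLS = [
        "https://example.com",
        "https://www.python.org",
        "https://httpbin.org/delay/1",
    ]  # placeholder URLs for the sketch

    def fetch(url):
        # urlopen blocks on the network with the GIL released,
        # so the fetches overlap across threads.
        with urlopen(url, timeout=10) as resp:
            return len(resp.read())

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
            print(list(pool.map(fetch, URLS)))

That's the kind of workload where Python threading earns its keep; the CPU-bound NumPy case above is the one the author is actually addressing.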