How to Retrieve 100k Objects with Python: Why We Prefer Threading to Asyncio

mrosett · on May 15, 2019

OP here. There are some good guides [0] on using Python's asyncio module (and libraries built on top of it), but I hadn't seen it compared to threading, which is what I prefer for client-side coding. The key difference is that using the asyncio module requires modifying every function in the call stack, while threading can be cleanly wrapped around existing code. So although threading has a larger memory footprint and feels less Pythonic, it's a much better option when working with third-party libraries.

[0]: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22...
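To make the "wrapped around existing code" point concrete, here is a minimal sketch using only the standard library. `fetch_object` is a hypothetical stand-in for whatever blocking call a third-party client exposes (it just sleeps to simulate latency), and `fetch_all` and the worker count are illustrative choices, not the code from the post.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_object(key):
    """Hypothetical stand-in for an existing blocking call in a third-party library."""
    time.sleep(0.01)  # simulate network latency
    return f"contents of {key}"


def fetch_all(keys, max_workers=32):
    """Run the unmodified blocking function across a pool of threads."""
    # Only this wrapper knows about threads; fetch_object stays a plain function.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input keys.
        return list(pool.map(fetch_object, keys))
```

With asyncio, `fetch_object` would have to become a coroutine and every caller above it would need to `await` it, which is not possible when the blocking call lives inside a library you don't control.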

mattbillenstein · on May 15, 2019

> The key difference is that using the asyncio module requires modifying every function in the call stack

This is why I still prefer gevent - it's easy to do async I/O using familiar patterns (gevent Pool works like multiprocessing Pool) while mostly writing blocking code.

badrequest · on May 15, 2019

Just wanted to say I appreciate how y'all made a bucket available to test the code on. More of this from technical blog posts, please!

mrosett · on May 15, 2019

I’m glad you appreciated that! I want results to be reproducible.

dekhn · on May 15, 2019

I use very simple code: multiprocessing wrapped around s3.download_file. I see gigabit+ throughput with a single core.
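A rough sketch of what that pattern might look like, assuming `s3` is a boto3 S3 client (the comment doesn't say). The helper names, the flattened local filenames, and the pool size are all made up for illustration.

```python
import os
from multiprocessing import Pool

_s3 = None  # one client per worker process, created on first use


def download_one(task):
    """Download a single object; runs inside a worker process."""
    global _s3
    bucket, key, dest = task
    if _s3 is None:
        import boto3  # third-party; assumed here, imported lazily in each worker
        _s3 = boto3.client("s3")
    _s3.download_file(bucket, key, dest)
    return key


def build_tasks(bucket, keys, dest_dir):
    """Pair each key with a local path (slashes in the key become underscores)."""
    return [(bucket, key, os.path.join(dest_dir, key.replace("/", "_")))
            for key in keys]


def download_all(tasks, processes=8):
    """Fan the downloads out over a process pool."""
    with Pool(processes) as pool:
        # imap_unordered yields keys in completion order, not input order.
        return list(pool.imap_unordered(download_one, tasks))
```

Usage would be something like `download_all(build_tasks(bucket, keys, "/tmp"))`, called from under an `if __name__ == "__main__":` guard so it also works on platforms that spawn worker processes rather than fork them.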