OK, let’s not, because i don’t know enough about it and far smarter people have written extensively about it already. If you do want to learn about the GIL and would like a headache, i’ve found Larry Hastings’ talk on removing the GIL interesting. Ultimately, every post on threading states that it is only useful if the work is I/O-bound. The ambiguity of the term “I/O-bound” will confound you for the rest of your life.
For my purposes, i’ve been benchmarking the most recent Gnocchi code to verify we didn’t introduce any performance regressions since the last release. As i lost my Ceph environment recently, i’ve been testing against a redis+file Gnocchi deployment rather than a pure Ceph deployment as i’ve done previously.
During my benchmarking, i discovered an issue when adjusting the parallel_operations option in Gnocchi, which executes certain I/O-related parts of the code in threads to improve performance. Suffice it to say, it did not improve performance but rather decreased it by 15%-30%. This was contrary to my previous Gnocchi benchmarks, where enabling threading gave 25% better performance.
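For reference, flipping that knob is a one-line change in gnocchi.conf. Something like this (a sketch; i’m recalling the section placement from memory, so check the docs for your release):

```ini
[DEFAULT]
# Number of operations to execute in parallel; values above 1 enable
# the thread pool discussed below (section placement is an assumption)
parallel_operations = 8
```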
So i decided to do what i normally don’t do: not give up. I wrote really trivial code to test what was happening. A minimal sketch of that test follows; it crudely simulates part of Gnocchi’s code by trying to read and write eight unique time-series, each to its own file/key/object (the helper names and constants here are illustrative):
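```python
import functools
import os
import random
import struct
import tempfile
import time
from concurrent import futures

NUM_SERIES = 8            # eight unique time-series, one file each
POINTS_PER_WRITE = 1000   # points appended per pass
ROUNDS = 100              # read/write passes per run


def process_series(index, data_dir):
    """Read the series' file (if any), append new points, write it back."""
    path = os.path.join(data_dir, "series-%d" % index)
    points = []
    if os.path.exists(path):
        with open(path, "rb") as f:
            raw = f.read()
        points = list(struct.unpack("%dd" % (len(raw) // 8), raw))
    points.extend(random.random() for _ in range(POINTS_PER_WRITE))
    with open(path, "wb") as f:
        f.write(struct.pack("%dd" % len(points), *points))


def run(executor=None):
    """Time ROUNDS passes over all series, serially or via a thread pool."""
    # fresh directory per run so both runs do the same amount of work
    work = functools.partial(process_series,
                             data_dir=tempfile.mkdtemp(prefix="ts-bench-"))
    start = time.time()
    for _ in range(ROUNDS):
        if executor is None:
            for i in range(NUM_SERIES):
                work(i)
        else:
            # list() forces the map to finish before the next round starts
            list(executor.map(work, range(NUM_SERIES)))
    return time.time() - start


if __name__ == "__main__":
    print("serial:   %.2fs" % run())
    with futures.ThreadPoolExecutor(max_workers=NUM_SERIES) as pool:
        print("threaded: %.2fs" % run(pool))
```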
Running the above code, i got the same result i saw while running my test script against Gnocchi: threading sucks and makes things worse.
When reading and writing to the local disk, performance occasionally dips by up to 8x:
A similar result happens when interacting with a local Redis. I/O in this case should be less significant, as it’s now interacting with an in-memory service:
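For these runs, the file I/O in the sketch above gets swapped for GET/SET against each series’ own key. Roughly, using redis-py (the host and port here are assumptions; pointing the client at another machine gives the remote runs below):

```python
import random
import struct

import redis

POINTS_PER_WRITE = 1000

# local for these runs; swap the host for the remote experiments
client = redis.Redis(host="localhost", port=6379)


def process_series(index):
    """Read the series' key (if any), append new points, write it back."""
    key = "series-%d" % index
    raw = client.get(key) or b""
    points = list(struct.unpack("%dd" % (len(raw) // 8), raw))
    points.extend(random.random() for _ in range(POINTS_PER_WRITE))
    client.set(key, struct.pack("%dd" % len(points), *points))
```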
Where this becomes arguably interesting is when i change the code to interact with remote targets. When writing to a machine with a ping of ~400µs:
In the above case, writing to a remote drive makes the threaded scenario perform almost 2x better. Similarly, when pushing to a Redis service on the same remote machine, threading performs better:
When pushing to a remote machine that is ~200µs away, it becomes less obvious whether to use a single thread or multiple threads. Pushing to Redis, single-threaded execution netted better performance in my environment, but when pushing to a remote disk, the inverse held true:
So when should you use threads? i have no idea. For reference, the fake workload i am creating in this scenario is minimal, so it’s definitely I/O-bound regardless of whether storage is local or remote. The only compute per series, per pass, is something like this (a sketch of the point generation from the test above):
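```python
import random

POINTS_PER_WRITE = 1000  # illustrative constant from the sketch above


def fake_workload():
    """The entire CPU-bound portion of each pass: a few random points.

    Microseconds of compute versus the milliseconds spent on I/O.
    """
    return [random.random() for _ in range(POINTS_PER_WRITE)]
```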
All i can conclude is that concurrency is not parallelism, threading in Python is concurrency, and concurrency in Python is a b!tch. That said, you probably don’t need threading if you’re doing anything locally… although it could be a big file… but how big a file… ARGGGHHH!