When it comes to improving the performance of your Python code, parallel processing is a powerful tool that can significantly reduce execution times for tasks that can be broken up into independent pieces. One module that makes it easy to implement parallel processing is concurrent.futures. In this blog, we will explore how to use this module with a particular focus on the ThreadPoolExecutor class.
What is concurrent.futures?
Introduced in Python 3.2, concurrent.futures is a high-level interface for asynchronously executing callables. It provides several classes for executing tasks, including ThreadPoolExecutor for thread-based parallelism and ProcessPoolExecutor for process-based parallelism. Because of Python's Global Interpreter Lock (GIL), threads are best suited to I/O-bound work, while processes can take advantage of multiple cores for CPU-bound work.
Code Example: Using ThreadPoolExecutor
Here’s a basic example of how to use ThreadPoolExecutor:
import concurrent.futures
import time

def some_function(n):
    """Simulate a long-running operation."""
    time.sleep(n)
    return f"slept for {n} second(s)"

def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # submit() schedules each call and immediately returns a Future.
        futures = [executor.submit(some_function, n) for n in (2, 1, 3)]
        # as_completed() yields futures as they finish, not in submit order.
        for future in concurrent.futures.as_completed(futures):
            print(future.result())

if __name__ == "__main__":
    main()
In this code, we define a task some_function that simulates a long-running operation. The executor.submit method schedules tasks to be executed and returns a Future object. We then use as_completed to yield these Future objects as they complete.
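A Future also carries error state, which is easy to overlook: if the task raised an exception, calling future.result() re-raises it in your code, while future.exception() returns the exception object instead. A minimal sketch of this behavior (the failing_task function here is our own illustrative example):

```python
import concurrent.futures

def failing_task():
    # Any exception raised in the worker is captured by the Future.
    raise ValueError("something went wrong")

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(failing_task)
    # exception() waits for the task and returns the exception (or None);
    # result() would re-raise it instead.
    err = future.exception()
    print(type(err).__name__, err)
```

Checking exceptions this way lets one failed task be handled without aborting the loop over the remaining futures.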
Collecting Results with executor.map()
If you want to collect all results and return them together, executor.map is a timesaver.
import concurrent.futures
import time

def some_function(n):
    """Simulate a long-running operation."""
    time.sleep(n)
    return n * 2

def main():
    inputs = [1, 2, 3]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() runs some_function for each input and returns the
        # results in the same order as the inputs.
        results = list(executor.map(some_function, inputs))
    print(results)

if __name__ == "__main__":
    main()
In this example, executor.map simplifies aggregating the results. It runs some_function for each input and returns the results in input order, even if some calls finish before others.
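That ordering guarantee is worth seeing in action: even when a later input finishes first, executor.map still yields results in input order. A quick sketch, with sleep times chosen so the first task finishes last (the slow_double function is our own example):

```python
import concurrent.futures
import time

def slow_double(n):
    # Earlier inputs sleep longer, so n=2 finishes before n=1.
    time.sleep(0.3 - n * 0.1)
    return n * 2

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(slow_double, [1, 2]))

print(results)  # input order is preserved: [2, 4]
```

If you want results as soon as they are ready instead, use submit() with as_completed(), as shown earlier.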
Using executor.map() with Multiple Parameters
If your function takes more than one argument, the map function can still be used by passing additional iterables.
import concurrent.futures

def multiply(x, y):
    return x * y

def main():
    xs = [1, 2, 3]
    ys = [10, 20, 30]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() pairs the iterables element-wise:
        # multiply(1, 10), multiply(2, 20), multiply(3, 30)
        results = list(executor.map(multiply, xs, ys))
    for result in results:
        print(result)

if __name__ == "__main__":
    main()
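An alternative for multi-argument functions is executor.submit, which accepts extra positional (and keyword) arguments directly; iterating over the futures list in order preserves input order. A minimal sketch using the same two-argument multiply function described above:

```python
import concurrent.futures

def multiply(x, y):
    return x * y

with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit() forwards the extra arguments to the callable.
    futures = [executor.submit(multiply, x, y) for x, y in [(2, 3), (4, 5)]]
    products = [f.result() for f in futures]

print(products)  # [6, 20]
```
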
Conclusion
Python’s concurrent.futures module, and the ThreadPoolExecutor class in particular, provide powerful tools for implementing parallel processing in your code. By understanding and utilizing these tools, you can dramatically improve the performance of your Python applications.