Advanced Python: I/O-Bound Tasks, CPU-Bound Tasks, Multithreading, and Multiprocessing

I/O-Bound Tasks vs. CPU-Bound Tasks

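An I/O-bound task is one in which disk I/O or network I/O takes up most of the time and very little computation is done, for example requesting web pages or reading and writing files. In Python we can simulate an I/O-bound task with time.sleep(), because a sleeping thread waits without using the CPU.
A CPU-bound task is one in which computation takes up most of the time and the CPU stays fully loaded, for example searching for an element in a very large list (not a sensible thing to do in practice, but convenient for a demo) or heavy arithmetic.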
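
To make the distinction concrete, here is a minimal sketch that compares wall-clock time with CPU time for the two kinds of task. The function names io_bound_task and cpu_bound_task are hypothetical and chosen only for illustration; the sleep stands in for real I/O, as described above.

```python
import time


def io_bound_task(delay=0.1):
    """Simulated I/O-bound task: the thread mostly waits and uses almost no CPU."""
    time.sleep(delay)  # stands in for a network request or a disk read
    return delay


def cpu_bound_task(n=100_000):
    """Simulated CPU-bound task: the CPU is busy for the whole duration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    for task in (io_bound_task, cpu_bound_task):
        wall_start = time.perf_counter()   # wall-clock time
        cpu_start = time.process_time()    # CPU time of this process (sleep excluded)
        task()
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        print(f"{task.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For the simulated I/O task the CPU time should stay close to zero while the wall-clock time is roughly the sleep duration; for the CPU task the two numbers should be close to each other.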

Multithreading vs. Multiprocessing

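The two most common approaches to concurrency in Python are multithreading and multiprocessing. Coroutines are a third option, but they are not covered here.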

1. Multithreading

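Multithreading means starting several threads inside one process to carry out tasks. In general, threads are a way to achieve parallelism, but because Python (more precisely, the CPython interpreter) uses a Global Interpreter Lock (GIL), only one thread executes Python code at any moment. Python threads therefore do not run in parallel; they take turns. (The original post illustrated this with a figure taken from the web.)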

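As a result, multithreading in Python suits I/O-bound tasks, where threads spend most of their time waiting, and does not suit CPU-bound tasks.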
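
As a small illustration of why threads help with waiting, the sketch below runs the same simulated I/O wait sequentially and in threads. The helper names wait_for_io, run_sequential, and run_threaded are hypothetical, and time.sleep() again stands in for real I/O.

```python
import threading
import time


def wait_for_io(delay):
    """Simulated I/O wait; a sleeping thread does not hold the GIL."""
    time.sleep(delay)


def run_sequential(count, delay):
    """Run the waits one after another and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for _ in range(count):
        wait_for_io(delay)
    return time.perf_counter() - start


def run_threaded(count, delay):
    """Run each wait in its own thread and return the elapsed wall-clock time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=wait_for_io, args=(delay,)) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", run_sequential(4, 0.5))
    print("threaded:  ", run_threaded(4, 0.5))
```

The sequential run should take about count times delay, while the threaded run should take roughly one delay, because the waits overlap.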

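Python provides two threading interfaces. The first is the _thread module (called thread in Python 2), which offers a low-level interface. The second is the threading module, which offers an easier, object-based interface: you can subclass Thread to implement a thread, and the module also provides other thread-related objects such as Timer and Lock.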
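
A minimal sketch of the object-based interface, assuming a shared counter that several threads increment. The class CounterThread and the function count_with_threads are hypothetical names used only for this example; Thread and Lock come from the threading module.

```python
import threading


class CounterThread(threading.Thread):
    """A thread, implemented by subclassing Thread, that increments a shared counter."""

    def __init__(self, counter, lock, increments):
        super().__init__()
        self.counter = counter        # shared dict holding the running total
        self.lock = lock              # protects the read-modify-write on the counter
        self.increments = increments

    def run(self):
        # run() is what executes in the new thread after start() is called.
        for _ in range(self.increments):
            with self.lock:
                self.counter["value"] += 1


def count_with_threads(num_threads=4, increments=1000):
    """Start several CounterThread instances and return the final count."""
    counter = {"value": 0}
    lock = threading.Lock()
    threads = [CounterThread(counter, lock, increments) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(count_with_threads())
```

The lock is there because the increment is a read followed by a write; without it, updates from different threads could be lost.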

2. Multiprocessing

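Because of the GIL, the better way to parallelize CPU-bound tasks in Python is to use multiple processes, which makes effective use of the CPU. The number of processes that can actually run at the same time is limited by the number of CPU cores on your machine.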

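Python's process module is multiprocessing, which provides many easy-to-use, object-based interfaces. It also provides ready-made pipes and queues for passing messages between processes, as well as a process pool object, Pool, for managing and controlling worker processes.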
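
The following is a small sketch of two of these pieces, Pool and Queue. The functions sum_of_squares and producer are hypothetical examples, not part of the original article's code; the None sentinel is one common convention for signalling "no more items", assumed here for illustration.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound work: sum i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def producer(queue, items):
    """Put each item on the queue, then a None sentinel to mark the end."""
    for item in items:
        queue.put(item)
    queue.put(None)


if __name__ == "__main__":
    # Pool: distribute CPU-bound calls across worker processes.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(sum_of_squares, [10, 100, 1000]))

    # Queue: receive messages sent by a child process.
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=producer, args=(queue, [1, 2, 3]))
    child.start()
    received = []
    while True:
        item = queue.get()
        if item is None:      # sentinel: the producer is done
            break
        received.append(item)
    child.join()
    print(received)
```

The worker functions are defined at module level and the process-starting code sits under the main guard, so the sketch should behave the same whether child processes are forked or spawned.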

A Worked Example: How Multithreading and Multiprocessing Handle I/O-Bound and CPU-Bound Tasks

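This article does not go into the detailed usage of Python's threading and multiprocessing modules; see the official documentation for that. Instead, it uses one example to show that multithreading suits I/O-bound tasks and multiprocessing suits CPU-bound tasks.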

First, define a queue and a function that initializes it:

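import multiprocessing
import threading
import time

# Global task queue shared by all workers.
# Note: child processes see the items in this queue only if they inherit it,
# i.e. under the "fork" start method. Under "spawn" (the default on Windows
# and macOS) each child re-imports this module and gets its own empty queue.
g_queue = multiprocessing.Queue()

def init_queue():
    """Drain the queue, then refill it with 10 task items (0 to 9)."""
    print("init g_queue start")
    while not g_queue.empty():
        g_queue.get()
    for _index in range(10):
        g_queue.put(_index)
    print("init g_queue end")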

Next, define an I/O-bound task and a CPU-bound task, each of which pulls its task data from the queue:

# An I/O-bound task: time.sleep() stands in for waiting on I/O
def task_io(task_id):
    print("IOTask[%s] start" % task_id)
    while not g_queue.empty():
        time.sleep(1)  # simulate one second of I/O wait per item
        try:
            data = g_queue.get(block=True, timeout=1)
            print("IOTask[%s] get data: %s" % (task_id, data))
        except Exception as excep:
            # Another worker may have taken the last item, so get() can time out
            print("IOTask[%s] error: %s" % (task_id, str(excep)))
    print("IOTask[%s] end" % task_id)
    return

g_search_list = list(range(10000))
# A CPU-bound task: arithmetic plus repeated linear searches in a list
def task_cpu(task_id):
    print("CPUTask[%s] start" % task_id)
    while not g_queue.empty():
        count = 0
        for i in range(10000):
            # "i in g_search_list" scans the list, which keeps the CPU busy
            count += pow(3*2, 3*2) if i in g_search_list else 0
        try:
            data = g_queue.get(block=True, timeout=1)
            print("CPUTask[%s] get data: %s" % (task_id, data))
        except Exception as excep:
            # Another worker may have taken the last item, so get() can time out
            print("CPUTask[%s] error: %s" % (task_id, str(excep)))
    print("CPUTask[%s] end" % task_id)
    return task_id

With the code above in place, run the experiment:

if __name__ == '__main__':
    print("cpu count:", multiprocessing.cpu_count(), "\n")

    print("========== Run the I/O-bound task directly ==========")
    init_queue()
    time_0 = time.time()
    task_io(0)
    print("elapsed:", time.time() - time_0, "\n")

    print("========== Run the I/O-bound task with multiple threads ==========")
    init_queue()
    time_0 = time.time()
    thread_list = [threading.Thread(target=task_io, args=(i,)) for i in range(5)]
    for t in thread_list:
        t.start()
    for t in thread_list:
        if t.is_alive():
            t.join()
    print("elapsed:", time.time() - time_0, "\n")

    print("========== Run the I/O-bound task with multiple processes ==========")
    init_queue()
    time_0 = time.time()
    process_list = [multiprocessing.Process(target=task_io, args=(i,)) for i in range(multiprocessing.cpu_count())]
    for p in process_list:
        p.start()
    for p in process_list:
        if p.is_alive():
            p.join()
    print("elapsed:", time.time() - time_0, "\n")

    print("========== Run the CPU-bound task directly ==========")
    init_queue()
    time_0 = time.time()
    task_cpu(0)
    print("elapsed:", time.time() - time_0, "\n")

    print("========== Run the CPU-bound task with multiple threads ==========")
    init_queue()
    time_0 = time.time()
    thread_list = [threading.Thread(target=task_cpu, args=(i,)) for i in range(5)]
    for t in thread_list:
        t.start()
    for t in thread_list:
        if t.is_alive():
            t.join()
    print("elapsed:", time.time() - time_0, "\n")

    print("========== Run the CPU-bound task with multiple processes ==========")
    init_queue()
    time_0 = time.time()
    process_list = [multiprocessing.Process(target=task_cpu, args=(i,)) for i in range(multiprocessing.cpu_count())]
    for p in process_list:
        p.start()
    for p in process_list:
        if p.is_alive():
            p.join()
    print("elapsed:", time.time() - time_0, "\n")

Results:

For the I/O-bound task:

Direct execution: 10.0333 s
Multithreaded execution: 4.0156 s
Multiprocess execution: 5.0182 s

This shows that multithreading suits I/O-bound tasks.

For the CPU-bound task:

Direct execution: 10.0273 s
Multithreaded execution: 13.247 s
Multiprocess execution: 6.8377 s

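This shows that multiprocessing suits CPU-bound tasks. Multithreading was actually slower than direct execution here, since the threads take turns under the GIL and add switching overhead.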

As usual, the code is on GitHub: xianhu/LearnPython

(Reposted) Advanced Python: I/O-Bound Tasks, CPU-Bound Tasks, Multithreading, and Multiprocessing - an article by 笑虎 on Zhihu...
