Hadoop Study Notes (2): Learning the Capacity Scheduler

Contents:

1. Characteristics of the Capacity Scheduler:

2. Reading the official documentation:

3. Configuring the ResourceManager's Capacity Scheduler in Ambari:

4. Results:

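1. Characteristics of the Capacity Scheduler:

(1) When a queue needs more than its configured capacity and other queues have idle resources, the scheduler may let it use that spare capacity; this is "queue elasticity". By default the Capacity Scheduler does not forcibly reclaim (preempt) Containers, so a queue that is short of resources can only pick up Containers as other queues release them. A maximum capacity can also be set on a queue so that it does not take too much from the other queues.

(2) A dedicated queue can be set aside for small jobs so that they start promptly. The cost is that this queue reserves part of the cluster up front, so large jobs finish later than they would under the FIFO scheduler.

(3) Within a single queue, applications are scheduled with a FIFO (first in, first out) policy by default.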
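The three points above can be illustrated with a small model. This is only a sketch under simplifying assumptions: resources are expressed as percentages of the cluster, and the function names `usable_share` and `allocate_fifo` are hypothetical helpers for this note, not part of YARN.

```python
def usable_share(capacity, maximum_capacity, demand, idle_elsewhere):
    """Percentage of the cluster a queue can use right now (simplified model).

    capacity         -- the queue's guaranteed share, in percent
    maximum_capacity -- the elastic upper bound for the queue, in percent
    demand           -- what the queue's applications are asking for
    idle_elsewhere   -- capacity currently left unused by other queues
    """
    # A queue may borrow idle capacity, but never beyond its maximum capacity.
    borrowable = max(min(idle_elsewhere, maximum_capacity - capacity), 0)
    return min(demand, capacity + borrowable)


def allocate_fifo(apps, available):
    """Hand out `available` resources to (name, demand) pairs in submission order."""
    grants = {}
    for name, demand in apps:
        grant = min(demand, available)  # earlier applications are served first
        grants[name] = grant
        available -= grant
    return grants


# A queue guaranteed 30% and capped at 50% wants 80% while 70% sits idle elsewhere.
share = usable_share(capacity=30, maximum_capacity=50, demand=80, idle_elsewhere=70)
grants = allocate_fifo([("job_a", 30), ("job_b", 30)], share)
print(share, grants)
```

In this model the queue stops at its maximum capacity even though more of the cluster is idle, and the second job in the queue only receives what the first one leaves over.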

2. Reading the official documentation:

CONFIGURING YARN CAPACITY SCHEDULER WITH APACHE AMBARI

In this tutorial we are going to explore how we can configure YARN Capacity Scheduler from Ambari.

YARN’s Capacity Scheduler is designed to run Hadoop applications in a shared, multi-tenant cluster while maximizing the throughput and the utilization of the cluster.

Traditionally each organization has its own private set of compute resources that have sufficient capacity to meet the organization’s SLA. This generally leads to poor average utilization. There is also heavy overhead in managing multiple independent clusters.

Sharing clusters between organizations allows economies of scale. However, organizations are concerned about sharing a cluster for fear of not getting enough of the resources that are critical to meeting their SLAs.

The Capacity Scheduler is designed to allow sharing a large cluster while giving each organization capacity guarantees. There is an added benefit that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.

Sharing clusters across organizations necessitates strong support for multi-tenancy, since each organization must be guaranteed capacity and safeguards to ensure the shared cluster is impervious to a single rogue application or user, or sets thereof. The Capacity Scheduler provides a stringent set of limits to ensure that a single application, user, or queue cannot consume a disproportionate amount of resources in the cluster. Also, the Capacity Scheduler provides limits on initialized/pending applications from a single user and queue to ensure fairness and stability of the cluster.

The primary abstraction provided by the Capacity Scheduler is the concept of queues. These queues are typically set up by administrators to reflect the economics of the shared cluster.

To provide further control and predictability on sharing of resources, the Capacity Scheduler supports hierarchical queues to ensure resources are shared among the sub-queues of an organization before other queues are allowed to use free resources, thereby providing affinity for sharing free resources among applications of a given organization.

The passage above, restated:

Configuring the YARN Capacity Scheduler in Apache Ambari

This tutorial explores how to configure YARN's Capacity Scheduler from Ambari.

YARN's Capacity Scheduler is designed to run Hadoop applications on a shared, multi-tenant cluster while maximizing the cluster's throughput and resource utilization.

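Traditionally, each organization has its own private pool of compute resources, sized to meet that organization's SLA (Service-Level Agreement). This usually leads to low average utilization, and managing several independent clusters is costly.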

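Sharing a cluster between organizations brings economies of scale. Organizations are nevertheless wary of sharing, because they fear the shared cluster will not leave them enough resources to meet their own SLAs.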

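The Capacity Scheduler is designed to let organizations share one large cluster while guaranteeing each of them a certain capacity. As an added benefit, an organization can use any spare capacity that others are not using, which gives organizations elasticity in a cost-effective way.

Sharing a cluster across organizations requires strong multi-tenancy support: each organization must have guaranteed capacity, and safeguards must ensure that the shared cluster is not affected by a single rogue application or user, or a group of them. The Capacity Scheduler enforces a strict set of limits so that no single application, user, or queue can consume a disproportionate amount of the cluster's resources. It also limits the number of initialized/pending applications from a single user and queue, to keep the cluster fair and stable.

The main abstraction the Capacity Scheduler provides is the queue. Queues are usually set up by administrators to reflect the economics of the shared cluster.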

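To give finer control and predictability over resource sharing, the Capacity Scheduler supports hierarchical queues: free resources are shared among the sub-queues of an organization first, before other queues are allowed to use them. Applications that belong to the same organization therefore get first call on that organization's spare resources.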
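The hierarchical queues described above can be pictured as a tree in which each queue's capacity is a percentage of its parent. The sketch below only computes the resulting share of the whole cluster for every queue; the queue names (`engineering`, `dev`, `prod`, `marketing`) and the helper `absolute_capacities` are made up for illustration and are not taken from the tutorial.

```python
def absolute_capacities(tree, parent_share=100.0, prefix="root"):
    """Flatten a queue tree into {queue path: percent of the whole cluster}.

    `tree` maps a queue name to (percent of its parent, dict of sub-queues).
    """
    result = {}
    for name, (percent, children) in tree.items():
        path = f"{prefix}.{name}"
        share = parent_share * percent / 100.0  # a child's share is relative to its parent
        result[path] = share
        result.update(absolute_capacities(children, share, path))
    return result


# Hypothetical layout: two organizations, one of which has two sub-queues.
queues = {
    "engineering": (60, {"dev": (50, {}), "prod": (50, {})}),
    "marketing": (40, {}),
}

caps = absolute_capacities(queues)
for path, share in caps.items():
    print(f"{path}: {share}%")
```

With this layout `root.engineering.dev` and `root.engineering.prod` each end up with 30% of the cluster. Under the rule quoted above, capacity left idle by `dev` would be offered to `prod` before `marketing` could use it.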

3. Configuring the ResourceManager's Capacity Scheduler in Ambari:

(1) In Ambari, open YARN -> Configs and find the Scheduler section. Set yarn.resourcemanager.scheduler.class there, then edit the Capacity Scheduler section.

Figure 3.1 Screenshot of the configuration items
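As a rough idea of what goes into those two settings, the sketch below builds a small set of `key=value` lines of the kind the Capacity Scheduler section holds, and checks that the capacities of the queues under `root` add up to 100. The property names and the scheduler class name are the standard YARN ones; the queue names `default` and `small`, their percentages, and the helpers `build_properties` and `render` are assumptions made for this example, not values from the screenshot.

```python
# Value for yarn.resourcemanager.scheduler.class when using the Capacity Scheduler.
SCHEDULER_CLASS = (
    "org.apache.hadoop.yarn.server.resourcemanager."
    "scheduler.capacity.CapacityScheduler"
)


def build_properties(queues):
    """Build capacity-scheduler properties for queues directly under root.

    `queues` maps a queue name to (capacity, maximum_capacity), both in percent.
    """
    total = sum(capacity for capacity, _ in queues.values())
    if total != 100:
        raise ValueError(f"queue capacities under root must sum to 100, got {total}")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, (capacity, maximum_capacity) in queues.items():
        base = f"yarn.scheduler.capacity.root.{name}"
        props[f"{base}.capacity"] = str(capacity)
        props[f"{base}.maximum-capacity"] = str(maximum_capacity)
    return props


def render(props):
    """Render properties as key=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical layout: a main queue plus a small-job queue capped at 50%.
props = build_properties({"default": (70, 100), "small": (30, 50)})
print(f"yarn.resourcemanager.scheduler.class={SCHEDULER_CLASS}")
print(render(props))
```

Checking the sum before pasting the values into Ambari is a cheap way to catch a layout whose sibling capacities do not add up to 100.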
