Ceph PG calculation

Original source: https://ceph.com/pgcalc/

Ceph PGs per Pool Calculator

 

Instructions

  1. Confirm your understanding of the fields by reading through the Key below.
  2. Select a "Ceph Use Case" from the drop down menu.
  3. Adjust the values in the "Green" shaded fields below.
    Tip: Headers can be clicked to change the value throughout the table.
  4. You will see the Suggested PG Count update based on your inputs.
  5. Click the "Add Pool" button to create a new line for a new pool.
  6. Click the icon to delete the specific Pool.
  7. For more details on the logic used and some important details, see the area below the table.
  8. Once all values have been adjusted, click the "Generate Commands" button to get the pool creation commands.

Key

Pool Name

Name of the pool in question. Typical pool names are included below.

Size

Number of replicas the pool will have. Default value of 3 is pre-filled.

OSD #

Number of OSDs which this Pool will have PGs in. Typically, this is the entire Cluster OSD count, but could be less based on CRUSH rules. (e.g. Separate SSD and SATA disk sets)

%Data

This value represents the approximate percentage of data which will be contained in this pool for that specific OSD set. Examples are pre-filled below for guidance.

Target PGs per OSD

This value should be populated based on the following guidance:

100

If the cluster OSD count is not expected to increase in the foreseeable future.

200

If the cluster OSD count is expected to increase (up to double the size) in the foreseeable future.

Notes

  • "Total Data Percentage" below table should be a multiple of 100%.
  • "Total PG Count" below table will be the count of Primary PG copies. However, when calculating total PGs per OSD average, you must include all copies.
  • It's also important to know that the PG count can be increased, but NEVER decreased without destroying / recreating the pool. However, increasing the PG Count of a pool is one of the most impactful events in a Ceph Cluster, and should be avoided for production clusters if possible.

Logic behind Suggested PG Count

( Target PGs per OSD ) x ( OSD # ) x ( %Data ) / ( Size )

  1. If the value of the above calculation is less than the value of ( OSD# ) / ( Size ), then the value is updated to the value of ( OSD# ) / ( Size ). This is to ensure even load / data distribution by allocating at least one Primary or Secondary PG to every OSD for every Pool.
  2. The output value is then rounded to the nearest power of 2.
    Tip: The nearest power of 2 provides a marginal improvement in efficiency of the CRUSH algorithm.
  3. If the nearest power of 2 is more than 25% below the original value, the next higher power of 2 is used.
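To make these three rules concrete, here is a minimal Python sketch of the logic described above (an illustration, not the calculator's actual code; %Data is passed as a fraction, e.g. 0.25 for 25%):

import math

def suggested_pg_count(target_pgs_per_osd, osd_count, pct_data, size):
    # Raw value: ( Target PGs per OSD ) x ( OSD # ) x ( %Data ) / ( Size )
    raw = target_pgs_per_osd * osd_count * pct_data / size
    # Rule 1: never drop below ( OSD # ) / ( Size ), so every OSD gets at
    # least one primary or secondary PG for this pool
    raw = max(raw, osd_count / size)
    # Rules 2-3: round to the nearest power of 2; if that power of 2 is
    # more than 25% below the raw value, use the next higher power of 2
    power = 2 ** round(math.log2(raw))
    if power < 0.75 * raw:
        power *= 2
    return int(power)

# Example: Target PGs per OSD = 100, 100 OSDs, a pool holding 25% of the data, Size = 3
print(suggested_pg_count(100, 100, 0.25, 3))   # 1024

Note that, per the Notes above, the PGs-per-OSD average must count all copies: in this example the pool alone contributes roughly 1024 x 3 / 100, or about 31 PG copies per OSD.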

Objective

  • The objective of this calculation and the target ranges noted in the "Key" section above are to ensure that there are sufficient Placement Groups for even data distribution throughout the cluster, while not going high enough on the PG per OSD ratio to cause problems during Recovery and/or Backfill operations.

Effects of empty or non-active pools:

  • Empty or otherwise non-active pools should not be considered helpful toward even data distribution throughout the cluster.
  • However, the PGs associated with these empty / non-active pools still consume memory and CPU overhead.

 

 

Formula for calculating the PG/PGP count for a pool (see the official calculator linked above):

Total PGs = ((Total_number_of_OSD * Target PGs per OSD) / max_replication_count) / pool_count

Target PGs per OSD is usually set to 100.
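A rough worked example of this formula, assuming 100 OSDs, a maximum replication count of 3 and 4 equally weighted pools (illustrative numbers only):

total_osds, target_pgs_per_osd = 100, 100
max_replication_count, pool_count = 3, 4
total_pgs = (total_osds * target_pgs_per_osd / max_replication_count) / pool_count
print(round(total_pgs))   # about 833 PGs per pool; rounded up to a power of 2 -> 1024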

 

http://ceph.sptty.com/rados/operations/placement-groups/


 

 

A preselection of pg_num

When creating a pool with this command:

ceph osd pool create {pool-name} pg_num

Choosing the value of pg_num is mandatory because it cannot be calculated automatically. Here are a few commonly used values:

  • Less than 5 OSDs: set pg_num to 128
  • Between 5 and 10 OSDs: set pg_num to 512
  • Between 10 and 50 OSDs: set pg_num to 4096
  • If you have more than 50 OSDs, you need to understand the tradeoffs and how to calculate the pg_num value yourself
  • For calculating the pg_num value yourself, the pgcalc tool can help

As the number of OSDs increases, choosing the right value for pg_num becomes more important, because it has a significant influence on the behavior of the cluster as well as the durability of the data when something goes wrong (that is, the probability that a catastrophic event leads to data loss).
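A minimal Python sketch of these rules of thumb (the thresholds come straight from the list above; the helper name is made up for illustration):

def rough_pg_num(osd_count):
    # Thresholds from the list above; beyond 50 OSDs there is no fixed
    # answer -- calculate pg_num yourself, e.g. with the pgcalc tool.
    if osd_count < 5:
        return 128
    if osd_count <= 10:
        return 512
    if osd_count <= 50:
        return 4096
    raise ValueError("more than 50 OSDs: calculate pg_num yourself (see pgcalc)")

print(rough_pg_num(8))   # 512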

 

 

A pool is a logical partition in which Ceph stores data; it acts as a namespace. Other distributed storage systems, such as MogileFS, Couchbase and Swift, have the same concept of a pool under a different name. Each pool contains a number of PGs, and the objects within those PGs are mapped to different OSDs, so a pool is spread across the entire cluster.

A pool can protect data in one of two ways: replication (replicas) or erasure coding (EC). EC support was introduced in the Firefly release. With EC, data is split into chunks, encoded, and then stored in a distributed fashion; thanks to its distributed architecture, Ceph handles EC very well. One of the two methods is chosen when the pool is created, but a pool cannot use both at the same time.

The default replica count for a pool is 3, and we can control the number of replicas ourselves. Ceph's replication is very flexible; this parameter can be changed at any time:

 Syntax:  ceph osd pool set {pool-name} size {num}
 Example: ceph osd pool set firstpool size 2

When data is written to a pool, it follows the pool's CRUSH rule set; that is, where the data is placed and how many replicas are made are governed by the rule set. The rule set is an important feature of a pool. For example, rule sets allow us to define one pool that uses SSD storage and another that uses SATA storage.

Pools also support snapshots. You can run the ceph osd pool mksnap command to create a snapshot of a pool and restore it when necessary. You can also set an owner attribute on a pool for access control.

The command for creating a Ceph pool is shown below; its arguments include the pool name and the PG and PGP counts:

$ ceph osd pool create mytest 128 128
pool 'mytest' created

There are several ways to list pools, for example:


$ rados lspools
data
metadata
rbd
mytest

$ rados df

pool name     category   KB   objects   clones   degraded   unfound   rd   rd KB   wr   wr KB
data          -          0    0         0        0          0         0    0       0    0
metadata      -          0    0         0        0          0         0    0       0    0
mytest        -          2    4         2        0          0         0    0       5    3
rbd           -          0    0         0        0          0         0    0       0    0
total used        15830040   4
total avail      109934580

$ ceph osd lspools
0 data,1 metadata,2 rbd,3 mytest,

$ ceph osd dump |grep pool
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 43 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 41 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 45 flags hashpspool stripe_width 0
pool 3 'mytest' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 58 flags hashpspool stripe_width 0


Without doubt, the output of ceph osd dump is the most detailed: it includes the pool ID, replica count (size), CRUSH rule set, PG and PGP counts, and so on.