Linux Cluster setup
http://oboguev.net/kernel-etc/linux-cluster-setup.html
Helpful reading:
https://alteeve.ca/w/AN!Cluster_Tutorial_2
https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial_-_Archive
RedHat 7 documentation:
    RHEL7 High Availability Add-On Administration
    High Availability Add-On Reference
    Global File System 2
    Load Balancer Administration
http://clusterlabs.org
http://clusterlabs.org/quickstart-redhat.html
http://clusterlabs.org/quickstart-ubuntu.html
http://clusterlabs.org/quickstart-suse.html
http://clusterlabs.org/doc
http://clusterlabs.org/faq.html
SUSE documentation
"Pro Linux High Availability Clustering" (Kindle)
"CentOS High Availability" (Kindle)
http://corosync.org
https://alteeve.ca/w/Corosync
google: corosync totem
google: OpenAIS
Older (CMAN-based) clusters included:
/etc/cluster/cluster.conf => corosync.conf + cib.xml
system-config-cluster or conga (luci + ricci) configuration UI => replaced by (still deficient) pcs-gui on port 2224
rgmanager => pacemaker
ccs => pcs

Set up a Corosync/Pacemaker cluster named vc composed of three nodes (vc1, vc2, vc3).
Based on Fedora Server 22.
Warning: a bug in the virt-manager Clone command may destroy the AppArmor profile on both the source and target virtual machines.
Replicate virtual machines manually, or at least back up the source machine profile (located in /etc/apparmor.d/libvirt).
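For example, a minimal precaution before cloning (a sketch; assumes the default profile location mentioned above and root access on the VM host):

    # back up libvirt AppArmor profiles before using virt-manager Clone
    mkdir -p /root/apparmor-libvirt-backup
    rsync -av /etc/apparmor.d/libvirt/ /root/apparmor-libvirt-backup/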
Network set-up:
It is desirable to set up separate network cards for general internet traffic, SAN traffic and cluster backchannel traffic.
Ideally, interfaces should be link-aggregated (bonded or teamed) pairs, with each link in a pair connected to separate stacked switches.
- backchannel/cluster network
    - can be two sub-nets (on separate interfaces) with a corosync redundant ring configured through them (see the corosync.conf sketch below)
    - however a bonded interface is easier to set up, more resilient to failures, and allows traffic for other components to be fail-safe too
    - it is also possible to bind multiple addresses to the bonded interface and set up a corosync redundant ring among them - but it does not make sense
- SAN network
    - can be two sub-nets (on separate interfaces), with iSCSI multi-pathing configured between them
    - however can also be bonded: either utilizing one sub-net for all SAN traffic (with disks dual-ported between iSCSI portals within the same sub-net, but different addresses), or binding multiple sub-nets to the bonded interface (with disks dual-ported between iSCSI portals located on different sub-nets)
- general network
    - better be bonded, so each node can be conveniently accessed by a single IP address
    - however a load balancer can instead be configured to use multiple addresses for a node
It makes sense to use dual-port network cards and scatter general/SAN/cluster traffic ports between them, so a card failure does not bring down the whole network category.
If interfaces are bonded or teamed (rather than configured for separate sub-nets), switches should allow cross-traffic, i.e. be either stackable (preferably) or have ISL/IST (inter-switch link/trunking, aka SMLT/DSMLT/R-SMLT). 802.1aq (Shortest Path Bridging) support may be desirable. See here.
Note that an IPMI (AMT/SOL) interface cannot be included in a bond or team without losing its IPMI capability, since it ceases to be individually addressable (having its own IP address).
Thus if IPMI is to be used for fencing or remote management, the IPMI port is to be left alone.
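For reference, a minimal corosync.conf sketch of the redundant-ring variant mentioned in the backchannel list above (a sketch only; the 10.0.1.0/24 and 10.0.2.0/24 sub-nets and node addresses are made-up examples, and corosync 2.x syntax is assumed):

    # /etc/corosync/corosync.conf (fragment)
    totem {
        version: 2
        cluster_name: vc
        transport: udpu
        rrp_mode: passive            # redundant ring protocol: use ring 1 only when ring 0 fails
    }
    nodelist {
        node {
            ring0_addr: 10.0.1.11    # vc1 on the first backchannel sub-net
            ring1_addr: 10.0.2.11    # vc1 on the second backchannel sub-net
            nodeid: 1
        }
        node {
            ring0_addr: 10.0.1.12
            ring1_addr: 10.0.2.12
            nodeid: 2
        }
        node {
            ring0_addr: 10.0.1.13
            ring1_addr: 10.0.2.13
            nodeid: 3
        }
    }
    quorum {
        provider: corosync_votequorum
    }

pcs can generate an equivalent configuration when nodes are given to "pcs cluster setup" as name,alternate-name pairs.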
For a real physical NIC, can identify port with
ethtool --identify ethX [10] => flashes LED 10 times
When hosting cluster nodes in KVM, create KVM macvtap interfaces (virtio/Bridge).
Bond interfaces:
Note that bonded/teamed interfaces in most setups do not provide increased data speed or increased bandwidth from one node to another. They provide failover and may provide an increased aggregate bandwidth for concurrent connections to multiple target hosts (but not to the same target host). However, see further down below.
Use network manager GUI:
"+" -> select Bond
Add -> Create -> Ethernet -> select eth0
Add -> Create -> Ethernet -> select eth1
Link Monitoring:
    MII => check media state
    ARP => use ARP to "ping" specified IP addresses (comma-separated),
           at least one responds -> link ok (can also configure to require all to respond)
Mode = 802.3ad => if linked to a real switch (802.3ad-compliant peer)
       Adaptive load balancing => otherwise (if connected directly or via a hub, not a switch)
Monitoring frequency = 100 ms
Or create files:
/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
NAME=bond0
TYPE=Bond
ONBOOT=yes
BONDING_MASTER=yes
BOOTPROTO=none
#DEFROUTE=yes
#IPV4_FAILURE_FATAL=no
#UUID=9d1c6d47-2246-4c74-9c62-adf260d3fcfc
#BONDING_OPTS="miimon=100 updelay=0 downdelay=0 mode=balance-rr"
BONDING_OPTS="miimon=100 updelay=0 downdelay=0 mode=balance-alb"
IPADDR=223.100.0.10
PREFIX=24
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_PEERDNS=yes
#IPV6_PEERROUTES=yes
#IPV6_PRIVACY=no
/etc/sysconfig/network-scripts/ifcfg-bond0_slave_1
HWADDR=52:54:00:9C:32:50
TYPE=Ethernet
NAME="bond0 slave 1"
#UUID=97b83c1b-de26-43f0-91e7-885ef758d0ec
ONBOOT=yes
MASTER=bond0
#MASTER=9d1c6d47-2246-4c74-9c62-adf260d3fcfc
SLAVE=yes
/etc/sysconfig/network-scripts/ifcfg-bond0_slave_2
HWADDR=52:54:00:CE:B6:91
TYPE=Ethernet
NAME="bond0 slave 2"
#UUID=2bf74af0-191a-4bf3-b9df-36b930e2cc2f
ONBOOT=yes
MASTER=bond0
#MASTER=9d1c6d47-2246-4c74-9c62-adf260d3fcfc
SLAVE=yes
Then apply:
nmcli device disconnect <ifname>
nmcli connection reload [ifname]
nmcli connection up <ifname>
route -n => must go to bond, not slaves
also make sure default route is present
if not, add to /etc/sysconfig/network: GATEWAY=xx.xx.xx.xx
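Alternatively, the bond can be created with nmcli rather than by editing ifcfg files (a sketch; assumes the eth0/eth1 slaves and the address used above, NetworkManager then writes the corresponding ifcfg files):

    nmcli connection add type bond con-name bond0 ifname bond0 mode balance-alb
    nmcli connection modify bond0 bond.options "mode=balance-alb,miimon=100"
    nmcli connection modify bond0 ipv4.method manual ipv4.addresses 223.100.0.10/24
    nmcli connection add type bond-slave con-name "bond0 slave 1" ifname eth0 master bond0
    nmcli connection add type bond-slave con-name "bond0 slave 2" ifname eth1 master bond0
    nmcli connection up bond0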
To team interfaces:
dnf install -y teamd NetworkManager-team
then configure the team interface with the NetworkManager GUI
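A team can also be created directly with nmcli (a sketch; the activebackup runner, ethtool link watcher and eth0/eth1 slaves are example choices):

    nmcli connection add type team con-name team0 ifname team0
    nmcli connection modify team0 team.config '{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'
    nmcli connection modify team0 ipv4.method manual ipv4.addresses 223.100.0.10/24
    nmcli connection add type team-slave con-name team0-slave-1 ifname eth0 master team0
    nmcli connection add type team-slave con-name team0-slave-2 ifname eth1 master team0
    nmcli connection up team0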
Bonded/teamed interfaces in most setups do not provide increased data speed or increased bandwidth from one node to another. They provide failover and may provide an increased aggregate bandwidth for concurrent connections to multiple target hosts (but not to the same target host). However, there are a couple of workarounds:

Option 1:
Use bonding mode=4 (802.3ad)
    lacp_rate=0
    xmit_hash_policy=layer3+4
The latter hashes using src-(ip,port) and dst-(ip,port).
Still not good for a single connection.
Option 2:
Create a separate VLAN for each port (on each of the nodes) and use bonding mode = Adaptive load balancing.
Then an LACP-compliant bridge will consider the links separate and won't try to correlate the traffic and direct it via a single link according to xmit_hash_policy.
However this will somewhat reduce failover capacity: for example if Node1.LinkVLAN1 and Node2.LinkVLAN2 both fail.
It also requires that all peer systems (such as iSCSI servers, iSNS, etc.) have their interfaces configured according to the same VLAN scheme (see the nmcli sketch below).
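A per-port VLAN of this kind can be created with nmcli (a sketch; the VLAN id 10, parent eth0 and address are made-up examples; repeat with a different id for the other port):

    nmcli connection add type vlan con-name eth0.10 ifname eth0.10 dev eth0 id 10
    nmcli connection modify eth0.10 ipv4.method manual ipv4.addresses 10.10.0.11/24
    nmcli connection up eth0.10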
Remember to enable jumbo frames: ifconfig ethX mtu 9000.
Prepare:
Names vc1, vc2 and vc3 below are for the cluster backchannel.
On each node:
# set node name
hostnamectl set-hostname vcx
# disable "captive portal" detection in Fedora
dnf install -y crudini
crudini --set /etc/NetworkManager/conf.d/21-connectivity-local.conf connectivity interval 0
systemctl restart NetworkManager
Cluster shells
Install
dnf install -y pdsh clustershell
To use pdsh:
#non-interactive:
pdsh -R exec -f 1 -w vc1,vc2,vc3 cmd | dshbak
pdsh -R exec -f 1 -w vc[1-3] cmd | dshbak
#interactive:
pdsh -R exec -f 1 -w vc1,vc2,vc3
pdsh -R exec -f 1 -w vc[1-3]
cmd substitution:
%h => remote host name
%u => remote user name
%n => 0, 1, 2, 3 ...
%% => %
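For example, the %h substitution lets the exec rcmd module wrap an arbitrary transport (a sketch; assumes the password-less ssh set up below):

    # run "uname -r" on every node over ssh, collating identical output
    pdsh -R exec -f 1 -w vc[1-3] ssh -o BatchMode=yes %h uname -r | dshbak -c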
To set up for clush, first enable password-less ssh.
Clumsy way:
ssh vc1
    ssh-keygen -t rsa
    ssh vc1 mkdir -p .ssh
    ssh vc2 mkdir -p .ssh
    ssh vc3 mkdir -p .ssh
    ssh vc1 chmod 700 .ssh
    ssh vc2 chmod 700 .ssh
    ssh vc3 chmod 700 .ssh
    cat .ssh/id_rsa.pub | ssh vc1 'cat >> .ssh/authorized_keys'
    cat .ssh/id_rsa.pub | ssh vc2 'cat >> .ssh/authorized_keys'
    cat .ssh/id_rsa.pub | ssh vc3 'cat >> .ssh/authorized_keys'
    Ctrl-D
ssh vc2
    ssh-keygen -t rsa
    cat .ssh/id_rsa.pub | ssh vc1 'cat >> .ssh/authorized_keys'
    cat .ssh/id_rsa.pub | ssh vc2 'cat >> .ssh/authorized_keys'
    cat .ssh/id_rsa.pub | ssh vc3 'cat >> .ssh/authorized_keys'
    Ctrl-D
ssh vc3
    ssh-keygen -t rsa
    cat .ssh/id_rsa.pub | ssh vc1 'cat >> .ssh/authorized_keys'
    cat .ssh/id_rsa.pub | ssh vc2 'cat >> .ssh/authorized_keys'
    cat .ssh/id_rsa.pub | ssh vc3 'cat >> .ssh/authorized_keys'
    Ctrl-D
Cleaner way:
Create id_rsa.pub, id_rsa and authorized_keys on one node,
then replicate them to the other nodes in the cluster.
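A sketch of that cleaner way (run on vc1 as root; the loop over vc2/vc3 matches the node names above):

    ssh-keygen -t rsa                              # accept defaults
    cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys    # the key authorizes itself
    for node in vc2 vc3; do
        rsync -av ~/.ssh/ ${node}:.ssh/            # replicate id_rsa, id_rsa.pub, authorized_keys
        ssh ${node} chmod 700 .ssh
    done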
To use clush:
clush -w vc1,vc2,vc3 -b [cmd]
clush -w vc[1-3] -b [cmd]
Basic cluster install:
On each node:
dnf install -y pcs fence-agents-all fence-agents-virsh resource-agents pacemaker
Optional: dnf install -y dlm lvm2-cluster gfs2-utils iscsi-initiator-utils lsscsi httpd wget

# either use firewalld and open the high-availability service:
systemctl start firewalld.service
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --add-service=high-availability
# or disable the firewall altogether:
systemctl stop firewalld.service
iptables --flush

## optionally disable SELinux:
#setenforce 0
#edit /etc/selinux/config and change SELINUX=enforcing => SELINUX=permissive

passwd hacluster
systemctl start pcsd.service
systemctl enable pcsd.service
# make sure no http_proxy is exported
pcs cluster auth vc1.example.com vc2.example.com vc3.example.com -u hacluster -p xxxxx --force
e.g. pcs cluster auth vc1 vc2 vc3 -u hacluster -p abc123 --force
# created auth data is stored in /var/lib/pcsd
On one node:
pcs cluster setup [--force] --name vc vc1.example.com vc2.example.com vc3.example.com
pcs cluster start --all
to stop: pcs cluster stop --all
On each node:
# to auto-start cluster on reboot
# alternatively can manually do "pcs cluster start" on each reboot
pcs cluster enable --all
to disable: pcs cluster disable --all
View status:
pcs status
pcs cluster status
pcs cluster pcsd-status
systemctl status corosync.service
journalctl -xe
cibadmin --query
pcs property list [--all] [--defaults]
corosync-quorumtool -oi [-i]
corosync-cpgtool
corosync-cmapctl [ | grep members]
corosync-cfgtool -s
pcs cluster cib
Verify current configuration
crm_verify --live --verbose
Start/stop node
pcs cluster stop vc2
pcs status
pcs cluster start vc2
pcs status
Disable/enable hosting resources on the node (standby state)
pcs cluster standby vc2
pcs status
pcs cluster unstandby vc2
pcs status
"Transactional"configuration:
pcs clustercib my.xml # get a copy ofCIB to my.xml
pcs -f my.xml ... change command ... #make changes of config in my.xml
crm_verify --verbose --xml-file=q.xml # verifyconfig
pcs cluster cib-push my.xml # push config from my.xml to CIB
Configure STONITH
fence_virsh - fences a machine via ssh to the VM host, executing sudo virsh destroy <vmid> or
sudo virsh reboot <vmid>
Alternative to virsh: fence_virt/fence_xvm
dnf install -y fence-virt
STONITH is needed:
- In resource (non-quorum) based clusters, for obvious reasons
- In two-node clusters without a quorum disk (a special case of the above), for obvious reasons
- In quorum-based clusters, because Linux clustering solutions including Corosync and CMAN run as user-level processes and are unable to interdict user-level and kernel-level activity on the node when a cluster node loses connection to the majority-votes partition. By comparison, in VMS CNXMAN is a kernel component which makes all CPUs spin in IOPOST by requeueing the request to the tail of the IOPOST queue until quorum is restored and the node re-joins the majority partition. During this time, no user-level processes can execute, and no new IO can be initiated, except the controlled IO to the quorum disk and SCS datagrams by CNXMAN. When connection to the majority partition is restored, mount verification is further executed, and all file system requests are held off until mount verification completes. If a node restores connection to the majority partition and detects a new incarnation of the cluster, the node executes a bugcheck to reboot.
Configure virsh STONITH
On the vm host:
define user stonithmgr
add it to sudoers as
    stonithmgr ALL=(ALL) NOPASSWD: ALL
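A sketch of that setup on the VM host (the /etc/sudoers.d file name is an example; a tighter entry could be restricted to the virsh binary only):

    useradd -m stonithmgr
    passwd stonithmgr
    echo 'stonithmgr ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/stonithmgr
    chmod 440 /etc/sudoers.d/stonithmgr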
On a cluster node:
pcs stonith list
pcs stonith describe fence_virsh
man fence_virsh
fence_virsh -h

# test
fence_virsh --ip=vc2-vmhost --username=stonithmgr --password=vc-cluster --verbose --plug=vc2 --action=metadata
fence_virsh --ip=vc2-vmhost --username=stonithmgr --password=vc-cluster --verbose --plug=vc2 --use-sudo --action=status
fence_virsh --ip=vc2-vmhost --username=stonithmgr --password=vc-cluster --verbose --plug=vc2 --use-sudo --action=list
fence_virsh --ip=vc2-vmhost --username=stonithmgr --password=vc-cluster --verbose --plug=vc2 --use-sudo --action=monitor
fence_virsh --ip=vc2-vmhost --username=stonithmgr --password=vc-cluster --verbose --plug=vc2 --use-sudo --action=off

create file /root/stonithmgr-passwd.sh as
    #!/bin/sh
    echo "vc-cluster-passwd"
chmod 755 /root/stonithmgr-passwd.sh
rsync -av /root/stonithmgr-passwd.sh vc2:/root
rsync -av /root/stonithmgr-passwd.sh vc3:/root

for node in vc1 vc2 vc3; do
    pcs stonith delete fence_${node}_virsh
    pcs stonith create fence_${node}_virsh \
        fence_virsh \
        priority=10 \
        ipaddr=${node}-vmhost \
        login=stonithmgr passwd_script="/root/stonithmgr-passwd.sh" sudo=1 \
        port=${node} \
        pcmk_host_list=${node}
done
pcmk_host_list => vc1.example.com
port => vm name in virsh
ipaddr => name of machine hosting vm
delay=15 => delay for execution of fencing action
STONITH commands:
pcs stonith show --full
pcs stonith fence vc2 --off
pcs stonith confirm vc2
pcs stonith delete fence_vc1_virsh
Reading:
https://alteeve.ca/w/Anvil!_Tutorial_3#Fencing_using_fence_virsh
https://www.centos.org/forums/viewtopic.php?f=48&t=50904
https://www.ibm.com/developerworks/community/blogs/mhhaque/entry/configure_two_node_highly_available_cluster_using_kvm_fencing_on_rhel7
http://www.hpuxtips.es/?q=content/part6-fencing-fencevirsh-my-study-notes-red-hat-certificate-expertise-clustering-and-storage
Management GUI:
https://vc1:2224
log in as hacluster
Management GUI, Hawk:
Essential files:
/etc/corosync/corosync.conf
/etc/corosync/corosync.xml
/etc/corosync/authkey
/var/lib/pacemaker/cib/cib.xml (do not edit manually)
/etc/sysconfig/corosync
/etc/sysconfig/corosync-inotifyd
/etc/sysconfig/pacemaker
/var/log/pacemaker.log
/var/log/corosync.log (but by default sent to syslog)
/var/log/pcsd/...
/var/log/cluster/...
/var/log/syslog
or on new Fedora:
journalctl --boot -x
journalctl --list-boots
journalctl --follow -x
journalctl --all -x
journalctl -xe
Man pages:
man corosync.conf
man corosync.xml
man corosync-xmlproc
man corosync_overview
man corosync
man corosync-cfgtool
man quorum_overview // quorum library
man votequorum_overview // ...
man votequorum // quorum configuration
man corosync-quorumtool
man cibadmin
man cmap_overview // corosync config registry
man cmap_keys
man corosync-cmapctl
man sam_overview // library to register processfor a restart on failure
man cpg_overview // closed group messaginglibrary w/virtual synchrony
man corosync-cpgtool
man corosync-blackbox // dump protocol"blackbox" data
man qb-blackbox
man ocf-tester
man crmadmin
man gfs2
man tunegfs2
Essential processes:
corosync      | totem, membership and quorum manager, messaging |
cib           | cluster information base |
stonithd      | fencing daemon |
crmd          | cluster resource management daemon |
lrmd          | local resource management daemon |
pengine       | policy engine |
attrd         | co-ordinates updates to cib, as an intermediary |
dlm_controld  | distributed lock manager |
clvmd         | clustered LVM daemon |
Alternatives to corosync: CMAN or CCM + HEARTBEAT
DC ≡ Designated Controller. One of the CRMd instances, elected to act as a master. Should the elected CRMd process or its node fail, a new master is elected. DC carries out PEngine's instructions by passing them to LRMd on the local node, or to CRMd peers on other nodes, which in turn pass them to their LRMds. Peers then report the results of execution to DC.
Resource categories:
LSB | Services from /etc/init.d |
Systemd | systemd units |
Upstart | upstart jobs |
OCF | Open Cluster Framework scripts |
Nagios | Nagios monitoring plugins |
STONITH | fence agents |
pcs resource standards
pcs resource providers
pcs resource agents ocf:heartbeat
pcs resource agents ocf:pacemaker
pcs resource agents systemd
pcs resource agents service
pcs resource agents lsb
pcs resource agents stonith
Resource constraints:
location   | Which nodes the resource can run on |
order      | The order in which the resource is launched |
colocation | Where the resource will be placed relative to other resources |
Connect to iSCSI drives:
See iSCSI page.
Briefly, on each cluster node:
Install the open-iscsi package. The package is also known as the Linux Open-iSCSI Initiator.
Ubuntu:
apt-get install open-iscsi lsscsi
gedit /etc/iscsi/iscsid.conf
/etc/init.d/open-iscsi restart
Fedora:
dnf install -y iscsi-initiator-utils lsscsi
systemctl enable iscsid.service
systemctl start iscsid.service
Display/edit initiator name, ensure it is unique in the landscape (especially if the system was cloned)
cat /etc/iscsi/initiatorname.iscsi
e.g.
InitiatorName=iqn.1994-05.com.redhat:cbf2ba2dff2 => iqn.1994-05.com.redhat:mynode1
InitiatorName=iqn.1993-08.org.debian:01:16c1be18eee8 => iqn.1993-08.org.debian:01:myhost2
Optional: edit configuration
gedit /etc/iscsi/iscsid.conf
restart the service
Discover the iSCSI targets on a specific host
iscsiadm -m discovery -t sendtargets -p qnap1x:3260 \
    --name discovery.sendtargets.auth.authmethod --value CHAP \
    --name discovery.sendtargets.auth.username --value sergey \
    --name discovery.sendtargets.auth.password --value abc123abc123
Check the available iSCSI node(s) to connect to.
iscsiadm -m node
Delete node(s) you don't want to connect to when the service is on with the following command:
iscsiadm -m node --op delete--targetname <target_iqn>
Configure authentication for the remaining targets:
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.username --value=sergey
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.password --value=abc123abc123
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c" -p qnap1x:3260 --login

iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.username --value=sergey
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.password --value=abc123abc123
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1x:3260 --login

iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.username --value=sergey
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c" -p qnap1x:3260 --op=update --name node.session.auth.password --value=abc123abc123
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c" -p qnap1x:3260 --login
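The same settings can be applied in a loop over the target names (a sketch using the example target names, portal and CHAP credentials above):

    for tgt in xs1 xs2 xs3; do
        iqn="iqn.2004-04.com.qnap:ts-569l:iscsi.${tgt}.e4cd7c"
        iscsiadm --mode node --targetname "$iqn" -p qnap1x:3260 --op=update --name node.session.auth.authmethod --value=CHAP
        iscsiadm --mode node --targetname "$iqn" -p qnap1x:3260 --op=update --name node.session.auth.username --value=sergey
        iscsiadm --mode node --targetname "$iqn" -p qnap1x:3260 --op=update --name node.session.auth.password --value=abc123abc123
        iscsiadm --mode node --targetname "$iqn" -p qnap1x:3260 --login
    done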
You should be able to see a login message like:
Login session [iface: default, target: iqn.2004-04.com:NAS:iSCSI.ForUbuntu.B9281B, portal: 10.8.12.31,3260] [ OK ]
Restart open-iscsi to log in to all of the available nodes.
Fedora: systemctl restart iscsid.service
Ubuntu: /etc/init.d/open-iscsi restart
Check the device status with dmesg.
dmesg | tail -30
List available devices:
lsscsi
lsscsi -s
lsscsi -dg
lsscsi -c
lsscsi -Lvl
iscsiadm -m session [-P 3] [-o show]
For multipathing, see a section below.
Format volume with cluster LVM
See RHEL7 LVM Administration, chapters 1.4, 3.1, 4.3.3, 4.3.8, 4.7, 5.5.
On each node:
lvmconf --enable-cluster
systemctl stop lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.service

To revert (if desired later):
    lvmconf --disable-cluster
    edit /etc/lvm/lvm.conf, change use_lvmetad to 1
    systemctl start lvm2-lvmetad.service
    systemctl enable lvm2-lvmetad.service
On one node (cluster must be running):
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm with_cmirrord=true op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone
pcs constraint show
pcs resource show
If clvmd was already configured earlier, but without cmirrord, can enable the latter with:
pcs resource update clvmdwith_cmirrord=true
Identify the drive
iscsiadm -m session -P 3 | grep Target
iscsiadm -m session -P 3 | grep scsi | grep Channel
lsscsi
tree /dev/disk
Partition the drive and create volume group
fdisk /dev/disk/by-path/ip-192.168.73.2:3260-iscsi-iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c-lun-0
    respond: n, p, ...., w, p, q
Refresh partition table view on all other nodes:
    partprobe
pvcreate /dev/disk/by-path/ip-192.168.73.2:3260-iscsi-iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c-lun-0-part1
vgcreate [--clustered y] vg1 /dev/disk/by-path/ip-192.168.73.2:3260-iscsi-iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c-lun-0-part1
vgdisplay vg1
pvdisplay
vgs
Create logical volume:
lvcreate vg1 --name lv1 --size 9500M
lvcreate vg1 --name lv1 --extents 2544      # find the number of free extents from vgdisplay
lvcreate vg1 --name lv1 --extents 100%FREE
lvdisplay
ls -l /dev/vg1/lv1
Multipathing
See here.
GFS2
RedHat GFS2 documentation
- File system name must be unique in a cluster (DLM lock names derive from it)
- File system hosts journal files. One journal is required per each cluster node that mounts this file system.
  Default journal size: 128 MB (per journal).
  Minimum journal size: 8 MB.
  For large file systems, increase to 256 MB.
  If the journal is too small, requests will have to wait for journal space, and performance will suffer.
- Do not use SELinux with GFS2.
  SELinux stores information in every file's extended attributes, which will cause significant GFS2 slowdown.
- If a GFS2 file system is mounted manually (rather than through a Pacemaker resource), unmount it manually.
  Otherwise the shutdown script will kill cluster processes and will then try to unmount the GFS2 file system,
  but without the processes the unmount will fail and the system will hang (and a hardware reboot will be required).
pcs property set no-quorum-policy=freeze
By default, the value of no-quorum-policy is set to stop, indicating that once quorum is lost, all the resources on the remaining (minority) partition will immediately be stopped. Typically this default is the safest and most optimal option, but unlike most resources, GFS2 and OCFS2 require quorum to function.
When quorum is lost, both the applications using the GFS2 mounts and the GFS2 mount itself cannot be correctly stopped in a partition that has become non-quorate. Any attempts to stop these resources without quorum will fail, which will ultimately result in the entire cluster being fenced every time quorum is lost.
To address this situation, set no-quorum-policy=freeze when GFS2 is in use. This means that when quorum is lost, the remaining (minority) partition will do nothing until quorum is regained.
If a majority partition remains, it will fence the minority partition.
Find out for sure: whether the majority partition can launch a failover replica of a service (that was running inside a minority partition) before fencing the minority partition, or will do it only after fencing the minority partition. If before, two replicas can conflict when no-quorum-policy is freeze (and even when it is stop).
Create file system and Pacemaker resource for it:
mkfs.gfs2 -j 3 -p lock_dlm -t vc:cfs1 /dev/vg1/lv1
    -j 3 => pre-create journals for three cluster nodes
    -t value => locking table name (must be ClusterName:FilesystemName)
    -O => do not ask for confirmation
    -J 256 => create journal with size of 256 MB (default: 128, min: 8)
    -r <mb> => size of allocation "resource group", usually 256 MB
# view settings
tunegfs2 /dev/vg1/lv1
# change label (note: the label is also the lock table name)
tunegfs2 -L vc:cfs1 /dev/vg1/lv1
# some other settings can also later be changed with tunegfs2
pcs resource create cfs1 Filesystem device="/dev/vg1/lv1" directory="/var/mnt/cfs1" fstype=gfs2 \
    options="noatime,nodiratime" run_fsck=no \
    op monitor interval=10s on-fail=fence clone interleave=none
pcs constraint order start clvmd-clone then cfs1-clone
pcs constraint colocation add cfs1-clone with clvmd-clone
mount | grep /var/mnt/cfs1
Mount options:
acl                     enable ACLs
discard                 when on SSD or SCSI devices, enable UNMAP function for blocks being freed
quota=on                enforce quota
quota=account           maintain quota, but do not enforce it
noatime                 disable update of access time
nodiratime              same for directories
lockproto=lock_nolock   => mounting out of cluster (no DLM)
To suspend write activity on file system (e.g. to create LVM snapshot)
dmsetup suspend /dev/vg1/lv1
[... use LVM to create a snapshot ...]
dmsetup resume /dev/vg1/lv1
To run fsck, stop the resource to unmount the file system from all the nodes:
pcs resource disable cfs1 [--wait=60]    # default wait time is 60 seconds
fsck.gfs2 -y /dev/vg1/lv1
pcs resource enable cfs1
To expand file system:
lvextend ... vg1/lv1
gfs2_grow /var/mnt/cfs1
When adding node to cluster, provide enough journals first:
# find out how many journals are available
# must unmount the file system first
pcs resource disable cfs1
gfs2_edit -p jindex /dev/vg1/lv1 | grep journal
pcs resource enable cfs1
# add one more journal, sized 128 MB
gfs2_jadd /var/mnt/cfs1
# add two more journals sized 256 MB
gfs2_jadd -j 2 -J 256 /var/mnt/cfs1
[... add the node ...]
Optional – Performance tuning – Increase DLM table sizes
echo 1024 >/sys/kernel/config/dlm/cluster/lkbtbl_size
echo 1024 > /sys/kernel/config/dlm/cluster/rsbtbl_size
echo 1024 > /sys/kernel/config/dlm/cluster/dirtbl_size
Optional – Performance tuning – Tune VFS
# percentage of system memory that can be filled with "dirty" pages before pdflush kicks in
sysctl -n vm.dirty_background_ratio      # default is 5-10
sysctl -w vm.dirty_background_ratio=20
# discard inodes and directory entries from cache more aggressively
sysctl -n vm.vfs_cache_pressure          # default is 100
sysctl -w vm.vfs_cache_pressure=500
# can be permanently changed in /etc/sysctl.conf
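To make these settings persistent, the corresponding /etc/sysctl.conf entries would be (a sketch using the example values above):

    vm.dirty_background_ratio = 20
    vm.vfs_cache_pressure = 500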
Optional – Tuning
/sys/fs/gfs2/vc:cfs1/tune/...
To enable data journaling on a file (default: disabled)
chattr +j /var/mnt/cfs1/path/file #enable
chattr -j /var/mnt/cfs1/path/file #disable
Program optimizations:
- preallocate file space – use fallocate(...) if possible
- flock(...) is faster than fcntl(...) with GFS2
- with fcntl(...), l_pid may refer to a process on a different node
echo 3 > /proc/sys/vm/drop_caches
View lock etc. status:
/sys/kernel/debug/gfs2/vc:cfs1/glocks    # decoded here
dlm_tool ls [-n] [-s] [-v] [-w]
dlm_tool plocks lockspace-name [options]
dlm_tool dump [options]
dlm_tool log_plock [options]
dlm_tool lockdump lockspace-name [options]
dlm_tool lockdebug lockspace-name [options]
tunegfs2 /dev/vg1/lv1
Quota manipulations:
mount with "quota=on"
to create quota files: quotacheck -cug /var/mnt/cfs1
to edit user quota: export EDITOR=`which nano` ; edquota username
to edit group quota: export EDITOR=`which nano` ; edquota -g groupname
grace periods: edquota -t
verify user quota: quota -u username
verify group quota: quota -g groupname
report quota: repquota /var/mnt/cfs1
synchronize quota data between nodes: quotasync -ug /var/mnt/cfs1
NFS over GFS2: see here
=========
### multipath: man mpathpersist https://www.suse.com/documentation/sles-12/stor_admin/data/sec_multipath_mpiotools.html
### LVM: fsfreeze
misc, iscsi: https://www.ibm.com/developerworks/community/blogs/mhhaque/entry/configure_two_node_highly_available_cluster_using_kvm_fencing_on_rhel7?lang=en
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_moving_resources_due_to_connectivity_changes.html
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPaddr2
### add node (also GFS2 journals)
### virtual-ip
### httpd
### nfs
### fence_scsi
### GFS2
### OCFS2
### DRBD
### interface bonding/teaming
### quorum disk, qdiskd, mkqdisk
### GlusterFS
### Lustre
### hawk GUI https://github.com/ClusterLabs/hawk
### http://www.spinics.net/lists/cluster/threads.html