Managing Linux Clusters

This page shows techniques for managing Linux clusters of any size or shape.

Using these common techniques, even a couple of more or less heterogeneous servers can, and should, be handled the same way as a large number of nodes forming a homogeneous cluster.

Although the methods and programs described here are used on SUSE Linux distributions such as SLES (current version 11.3) or openSUSE (12.3), they rely only on standard features and can therefore be adopted by, or even copied verbatim to, other Linux distributions as well.

PDSH
pdsh is short for "parallel distributed shell". It is intended to access arbitrary groups of hosts in parallel, or at least in non-sequential order, via a remote shell such as ssh, which is regarded as the standard means of accessing remote hosts.

Installation
pdsh is available as an ordinary RPM package. It is included in openSUSE's standard repositories, whose packages should also fit the corresponding SLES versions, where pdsh might not be available by default.

Currently, pdsh version pdsh-2.26-12.1 runs on SLES 11.1 through 11.3. For SLES 10.2, pdsh version pdsh-2.26-12.1 can be installed without further requirements, although this version does not support rpdcp.

As managing hosts means working as root, pdsh should be configured to run remote commands via ssh as user root without a password.

So a public/private ssh key pair needs to be created using ssh-keygen. The public key is added to root's authorized_keys file on every host, while the private key is kept on the pdsh master only, which has to be configured as shown below.
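A minimal sketch of this step, assuming an RSA key without passphrase stored under the file name id_rsa-for-pdsh that the profile files below refer to. The function add_pubkey is a hypothetical helper (not part of pdsh or OpenSSH) that appends the public key to an authorized_keys file only if it is not already there, so it can safely be re-run on every node.

```shell
#!/bin/sh
# Sketch: password-less root access for pdsh (helper name is hypothetical).
#
# On the pdsh master, create the key pair once (no passphrase):
#   mkdir -p /etc/pdsh/auth && chmod 700 /etc/pdsh/auth
#   ssh-keygen -t rsa -N '' -f /etc/pdsh/auth/id_rsa-for-pdsh
# The private key stays on the master; the .pub file goes to every node.

# add_pubkey PUBFILE AUTHFILE
# Append the single key in PUBFILE to AUTHFILE unless it is already present.
add_pubkey() {
    pub=$1
    auth=$2
    dir=$(dirname "$auth")
    mkdir -p "$dir" || return 1
    chmod 700 "$dir"            # sshd (StrictModes) distrusts writable key dirs
    touch "$auth" && chmod 600 "$auth" || return 1
    # -x whole line, -F fixed string, patterns read from the key file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
}

# On each node, as root:
#   add_pubkey /tmp/id_rsa-for-pdsh.pub /root/.ssh/authorized_keys
```

Where password logins are still possible, `ssh-copy-id -i /etc/pdsh/auth/id_rsa-for-pdsh.pub root@node` does the same job from the master.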

Note that pdsh clients do not need any further configuration; installing their pdsh package is all it takes.

So the following sections apply to the pdsh master only.

Configuration
As /etc/pdsh serves as pdsh's base configuration directory, we put the key file into /etc/pdsh/auth and reference it implicitly through the pdsh environment variable settings described below.

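Be sure to have each machine listed in /etc/ssh/ssh_known_hosts with its short name, fully qualified domain name (FQDN) and IP address. Otherwise ssh cannot verify the host keys without asking interactively, and pdsh will fail to establish a connection.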
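The entries can be generated instead of typed. A sketch, assuming OpenSSH's ssh-keyscan is available on the master and RSA host keys are used; the helper names known_hosts_entry and scan_host as well as the example host name and address are hypothetical. Each entry lists short name, FQDN and IP address separated by commas, followed by key type and key, which is the format of the system-wide known-hosts file /etc/ssh/ssh_known_hosts read by OpenSSH.

```shell
#!/bin/sh
# Sketch: build system-wide known_hosts entries (helper names are hypothetical).

# known_hosts_entry SHORT FQDN IP KEYTYPE KEY
# Print one entry: comma-separated names/addresses, then key type and key.
known_hosts_entry() {
    printf '%s,%s,%s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# scan_host SHORT FQDN IP
# Fetch the host's RSA key via ssh-keyscan and print the matching entry.
scan_host() {
    # ssh-keyscan prints "host keytype key"; keep fields 2 and 3 of line 1
    key=$(ssh-keyscan -t rsa "$3" 2>/dev/null | awk 'NR == 1 { print $2, $3 }')
    if [ -z "$key" ]; then
        echo "scan_host: no RSA key received from $3" >&2
        return 1
    fi
    # $key is split on purpose into KEYTYPE and KEY
    # shellcheck disable=SC2086
    known_hosts_entry "$1" "$2" "$3" $key
}

# Example (as root on the master):
#   scan_host host01 host01.example.com 192.168.1.1 >> /etc/ssh/ssh_known_hosts
```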

As some pdsh commands or options still look for configuration data in their old locations, we have to create some symbolic links to keep them working:

$ ln -s pdsh /etc/dsh
$ ln -s group/all /etc/pdsh/machines
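The directory layout and both links can be set up in one go. A minimal sketch; setup_pdsh_layout is a hypothetical helper, and its optional prefix argument (default /etc) only exists so the layout can first be tried out in a scratch directory. Existing files and links are left untouched.

```shell
#!/bin/sh
# Sketch: create the pdsh configuration layout (helper name is hypothetical).

# setup_pdsh_layout [PREFIX]    (PREFIX defaults to /etc)
setup_pdsh_layout() {
    prefix=${1:-/etc}
    mkdir -p "$prefix/pdsh/group" "$prefix/pdsh/auth" || return 1
    chmod 700 "$prefix/pdsh/auth"              # holds the private key
    # the group file "all" must exist for pdsh -a
    [ -e "$prefix/pdsh/group/all" ] || : > "$prefix/pdsh/group/all"
    # old location: /etc/dsh -> pdsh (relative link)
    [ -e "$prefix/dsh" ] || [ -L "$prefix/dsh" ] || ln -s pdsh "$prefix/dsh"
    # old machines file: /etc/pdsh/machines -> group/all (relative link)
    [ -e "$prefix/pdsh/machines" ] || [ -L "$prefix/pdsh/machines" ] ||
        ln -s group/all "$prefix/pdsh/machines"
}

# Example (as root, real /etc):
#   setup_pdsh_layout
```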

See below on how to create the necessary group definition files. At least /etc/pdsh/group/all must exist and contain all available host names if pdsh is called with the -a flag, which selects all these hosts.

Further configuration of pdsh is needed but easy to do. It consists of setting a few environment variables, which should be done using the standard profiles.

To do so, just create the files /etc/profile.d/pdsh.sh and /etc/profile.d/pdsh.csh, which might look like this:

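$ cat /etc/profile.d/pdsh.sh
# use ssh as the remote command method
PDSH_RCMD_TYPE=ssh
export PDSH_RCMD_TYPE
# SSH protocol 2, no agent/X11 forwarding, dedicated key, remote user root
PDSH_SSH_ARGS="-2 -a -x -i /etc/pdsh/auth/id_rsa-for-pdsh -l root %h "
export PDSH_SSH_ARGS
# pdsh with stderr merged, exit-code noise dropped, output grouped
pdshc() { pdsh "$@" 2>&1 | grep -v "ssh exited with exit code 1" | dshbak -c; }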

and for csh:

$ cat /etc/profile.d/pdsh.csh
setenv PDSH_RCMD_TYPE ssh
setenv PDSH_SSH_ARGS "-2 -a -x -i /etc/pdsh/auth/id_rsa-for-pdsh -l root %h "
alias pdshc "pdsh \!* |& grep -v 'ssh exited with exit code 1' | dshbak -c"

Here pdshc behaves like pdsh, but its output, including standard error, is grouped by dshbak -c.

Machine Definitions
Although pdsh may be called by simply supplying host names, patterns, or lists thereof, it is much more comfortable to define, store and use arbitrary host groups in predefined directories.

The pdsh host group directories contain files listing host names, one entry or pattern per line. The special file named all (also reachable as machines) is consulted by pdsh when the option -a is given to select all defined hosts.
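To see which hosts a pattern line stands for, the following sketch expands simple patterns such as server0[1-9]. It is a simplified illustration written for this page, not pdsh's own hostlist parser: it handles at most one bracket expression containing comma-separated numbers or ranges, and takes the zero-padding width from the lower bound. The function name expand_hostlist is hypothetical.

```shell
#!/bin/sh
# Sketch: expand a simple host pattern into host names, one per line.
# Not pdsh's parser; supports one [...] with items like 3, 1-9, 08-10.

# expand_hostlist PATTERN   (runs in a subshell, so IFS changes stay local)
expand_hostlist() (
    case $1 in
        *\[*\]*) ;;
        *) printf '%s\n' "$1"; exit 0 ;;     # plain host name
    esac
    prefix=${1%%\[*}                         # text before "["
    rest=${1#*\[}
    body=${rest%%\]*}                        # text inside "[...]"
    suffix=${rest#*\]}                       # text after "]"
    IFS=,
    for item in $body; do
        lo=${item%%-*}
        hi=${item##*-}                       # equals lo if there is no "-"
        # awk does decimal arithmetic, so "08" is 8 (no octal surprise)
        awk -v p="$prefix" -v s="$suffix" -v lo="$lo" -v hi="$hi" \
            -v w="${#lo}" 'BEGIN {
                fmt = "%s%0" w "d%s\n"
                for (i = lo + 0; i <= hi + 0; i++) printf fmt, p, i, s
            }'
    done
)
```

For instance, `expand_hostlist 'server1[0-6]'` lists server10 through server16. Quoting the pattern keeps the calling shell from treating the brackets as a glob.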

WCOLL is deliberately left unset, because a default host list could cause fatal misbehavior when an option is mistyped. Therefore machine information must always be supplied explicitly: pdsh will not work without the -w or -g option (or -a for all hosts).

Standard Application
...

Grouping Output
The package also installs the program dshbak, which can be used to group standard pdsh output to make it more compact and readable.

Plain dshbak collects the output lines of each host into one block under a host header, whereas dshbak -c additionally coalesces hosts with identical output into a single block:

$ pdsh -a "uname"
host01: Linux
host02: Linux
host03: Linux
host04: Linux

$ pdsh -a "uname" | dshbak -c
host[01-04] Linux

An extended implementation of dshbak could provide more detailed information by additionally showing the size of each host group, i.e. the number of its members.

Such a version does not exist yet; if it did, it might be called like this:

$ pdsh -a "uname" | dshbak -nc
host[01-04] (#4) Linux

Because output grouping is so convenient, the shell shortcut pdshc defined in the profile files above runs pdsh with dshbak -c automatically.

Examples
Given the hosts server01 to server16, the following host groups shall be defined: all servers, and the lower and upper half of all servers.

The resulting host group definition file /etc/pdsh/group/all for all servers could then look like this:

server0[1-9]
server1[0-6]

The files for the lower and upper half, respectively, could look like this:

Lower half:
server0[1-8]

Upper half:
server09
server1[0-6]
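The three example files can also be written by script. A sketch, assuming the group directory /etc/pdsh/group and the hypothetical group names lower and upper for the two halves (the page does not prescribe file names for them); write_group is likewise a hypothetical helper that writes one pattern per line.

```shell
#!/bin/sh
# Sketch: write pdsh host group files, one pattern per line.
# Group names "lower" and "upper" are hypothetical examples.

# write_group DIR NAME PATTERN...
write_group() {
    dir=$1
    name=$2
    shift 2
    mkdir -p "$dir" || return 1
    printf '%s\n' "$@" > "$dir/$name"
}

# Example for server01 .. server16 (as root on the pdsh master):
#   write_group /etc/pdsh/group all   'server0[1-9]' 'server1[0-6]'
#   write_group /etc/pdsh/group lower 'server0[1-8]'
#   write_group /etc/pdsh/group upper 'server09' 'server1[0-6]'
# Afterwards, e.g.:  pdsh -g upper uptime
```

The patterns are quoted so that the shell does not try to expand the brackets against file names in the current directory.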