Hello everyone,

I once ran into a problem with GlusterFS replicated (AFR) volumes. If one of your servers is not responding properly, for example the arbiter, the files waiting to be replicated pile up. Once there are a lot of them, CPU usage on your servers climbs to the maximum because they cannot cope with such a huge number of files.

It is therefore important to monitor the number of pending "heals". If it goes above a hundred (100), you should start asking questions about slowness or about the health of your GlusterFS.
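
As a rough illustration of that rule of thumb, the sketch below maps the pending-heal count of each brick onto a monitoring state, with the worst brick deciding the result. The function name `heal_state` and the thresholds passed to it are assumptions made for this example; they are not part of GlusterFS.

```shell
#!/bin/sh
# heal_state WARN CRIT COUNT... : print OK, WARNING or CRITICAL.
# Hypothetical helper: the thresholds are whatever you decide to alert on.
heal_state() {
    warn=$1
    crit=$2
    shift 2
    state=OK
    for count in "$@"; do
        if [ "$count" -gt "$crit" ]; then
            # One brick above the critical threshold is enough.
            state=CRITICAL
            break
        elif [ "$count" -gt "$warn" ]; then
            state=WARNING
        fi
    done
    echo "$state"
}

heal_state 100 500 0 12 3     # every brick below 100
heal_state 100 500 0 150 3    # one brick above 100
```

The first call prints `OK` and the second prints `WARNING`.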

You can also check this directly on the server with the command:

```bash
gluster volume heal VOLUME-NAME statistics heal-count
```
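
To give an idea of what that command returns and how it can be processed, here is a minimal sketch that extracts one "brick count" pair per line with awk. The sample text only approximates the shape of the real output; the volume name, host names and brick paths are invented, and `heal_counts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sample text in the shape printed by "statistics heal-count".
# Volume, hosts and brick paths are made up for the example.
sample_output='Gathering count of entries to be healed on volume myvol has been successful

Brick srv-01:/bricks/myvol
Number of entries: 0

Brick srv-02:/bricks/myvol
Number of entries: 42

Brick srv-03:/bricks/myvol
Number of entries: 7'

# Read heal-count output on stdin, print "brick count" pairs, one per line.
heal_counts() {
    awk '$1 == "Brick"  { brick = $2 }
         $1 == "Number" { print brick, $4 }'
}

printf '%s\n' "$sample_output" | heal_counts
```

On a real server you would pipe the output of the gluster command into `heal_counts` instead of the sample text.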

So here is a script to use with Centreon (and Nagios, Shinken, etc.):

check-rhel-gluster-heals.sh

```bash
#!/bin/bash
################
# Author : Bruno LEAL DE SOUSA
# Version : v1
# Configuration :
#   To simplify communication, first allow SSH connections between the
#   machines using SSH keys: copy the contents of .ssh/id_rsa.pub from the
#   SUPERVISION server into .ssh/authorized_keys on the remote server.
#############################################################################

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# All four arguments are required.
if [[ $# -lt 4 ]]
then
        echo "Missing parameters! Syntax: ./check_rhel_gluster_heals.sh SERVERNAME Volume WarningThreshold CriticalThreshold"
        echo "Ex: ./check_rhel_gluster_heals.sh SRV-01 SHARE-BUREAU 10 150"
        exit $STATE_UNKNOWN
fi

SRV=$1
VOLUME=$2   # name of the gluster volume to check
WARN=$3
CRIT=$4

# Turn the heal-count output into one "brick=count," entry per brick.
REQUEST_SSH="$(ssh "root@$SRV" gluster volume heal "$VOLUME" statistics heal-count | awk '$1=="Brick"{printf "%s=", $2};$1=="Number"{print $4","}')"

# The unquoted echo joins the entries on one line; fields are then split on
# "=" or "," plus any following spaces. The script assumes exactly three bricks.
SRV1="$(echo $REQUEST_SSH | awk -F '[=,] *' '{print $1}')"
SRV2="$(echo $REQUEST_SSH | awk -F '[=,] *' '{print $3}')"
SRV3="$(echo $REQUEST_SSH | awk -F '[=,] *' '{print $5}')"
PERFSRV1="$(echo $REQUEST_SSH | awk -F '[=,] *' '{print $2}')"
PERFSRV2="$(echo $REQUEST_SSH | awk -F '[=,] *' '{print $4}')"
PERFSRV3="$(echo $REQUEST_SSH | awk -F '[=,] *' '{print $6}')"

# Stop if a count is missing or not a number (SSH failure, unknown volume...).
for COUNT in "$PERFSRV1" "$PERFSRV2" "$PERFSRV3"
do
        if ! [[ $COUNT =~ ^[0-9]+$ ]]
        then
                echo "UNKNOWN - unable to read heal counts from $SRV"
                exit $STATE_UNKNOWN
        fi
done

# Performance data after the pipe is a space-separated list of label=value.
OUTPUT="Heals : $PERFSRV1,$PERFSRV2,$PERFSRV3 | $SRV1=$PERFSRV1 $SRV2=$PERFSRV2 $SRV3=$PERFSRV3"
#echo $OUTPUT

# The worst brick decides the state: test CRITICAL first, then WARNING.
if [ "$PERFSRV1" -gt "$CRIT" ] || [ "$PERFSRV2" -gt "$CRIT" ] || [ "$PERFSRV3" -gt "$CRIT" ]
then
  echo "CRITICAL - $OUTPUT"
  exit $STATE_CRITICAL
elif [ "$PERFSRV1" -gt "$WARN" ] || [ "$PERFSRV2" -gt "$WARN" ] || [ "$PERFSRV3" -gt "$WARN" ]
then
  echo "WARNING - $OUTPUT"
  exit $STATE_WARNING
else
  echo "OK - $OUTPUT"
  exit $STATE_OK
fi
```
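
Centreon and Nagios graph whatever follows the `|` in the plugin output, read as space-separated `label=value;warn;crit` pairs. If you want the thresholds to show up on the graphs as well, the string could be built along these lines; `perfdata` is a hypothetical helper name and the brick names are invented.

```shell
#!/bin/sh
# perfdata WARN CRIT BRICK COUNT [BRICK COUNT ...]
# Print Nagios-style performance data: 'label'=value;warn;crit ...
perfdata() {
    warn=$1
    crit=$2
    shift 2
    out=""
    while [ "$#" -ge 2 ]; do
        # ${out:+ } adds a separating space only once out is non-empty.
        out="$out${out:+ }'$1'=$2;$warn;$crit"
        shift 2
    done
    echo "$out"
}

perfdata 10 150 srv-01:/bricks/share 0 srv-02:/bricks/share 42
```

This prints `'srv-01:/bricks/share'=0;10;150 'srv-02:/bricks/share'=42;10;150`, which can be appended after the `|` of the status line.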

It has been tested and validated on Red Hat Gluster Storage 3.3 running on Red Hat Enterprise Linux 7.

See you soon.

—————

Bruno SOUSA