Converting a resource agent into a multi-state resource

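Once you are used to how quick and convenient cloned resources are,

you eventually hit a case where one of these otherwise identical clone instances has to be singled out to do something "different",

much like picking a representative from a group of people to take on a leadership role.

How can that be done?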

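Pacemaker provides the multi-state concept. On top of everything a clone offers, it adds master/slave roles: the resource agent can determine which role it currently holds and act differently depending on that role.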

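So how do you turn your own agent into a master/slave one?

The simplest approach: write a file on each node that records the role the instance currently holds.

With that idea in hand we can get started. What follows copies the approach of the pacemaker/Stateful resource agent wholesale.

Let's begin.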

1. Set $CRM_MASTER

    Add the following to the # Initialization: block:
    CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
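
To see which commands the agent ends up running, the sketch below swaps the real binary for `echo` (an assumption made purely for illustration, so that it runs on a machine without a cluster) and prints the three invocations used in the rest of this post. `-v` sets the promotion score, `-D` deletes it, and `-l reboot` makes the attribute transient, so it only lasts until the node restarts. The helper names and the scores are hypothetical.

```shell
#!/bin/sh
# Dry run: "echo" stands in for ${HA_SBIN_DIR}/crm_master (illustration only).
CRM_MASTER="echo crm_master -l reboot"

# Example scores; the agent sets its own defaults in step 8.
slave_score=5
master_score=10

# Hypothetical wrappers, named only to make the three calls easy to tell apart.
set_slave_score()  { $CRM_MASTER -v ${slave_score}; }
set_master_score() { $CRM_MASTER -v ${master_score}; }
clear_score()      { $CRM_MASTER -D; }

set_slave_score
set_master_score
clear_score
```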

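2. Update the usage function [optional]

The usage function is only there for humans to read, so leaving it untouched has no effect on functionality. You can add the following to describe demote and promote: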

    The 'promote' operation xxxxxx xxxxxx.
    The 'demote' operation xxx xxxxxx.

3. Update the meta_data function

Add a parameter for the state file:

<parameter name="state" unique="1">
<longdesc lang="en">
Location to store the resource state in
</longdesc>
<shortdesc lang="en">State file</shortdesc>
<content type="string" default="${HA_VARRUN}/Stateful-${OCF_RESOURCE_INSTANCE}.state" />
</parameter>

Add the monitor actions to the actions section:

<action name="monitor" depth="0"  timeout="20" interval="30" role="Master"/>
<action name="monitor" depth="0"  timeout="20" interval="30" role="Slave"/>

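Note: if both roles end up with the same monitor interval, Pacemaker treats the two monitors as a single operation, so the intervals have to be made different in the crm configuration.

This is how the Pacemaker documentation explains the need for different intervals: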

It is crucial that every monitor operation has a different interval!
This is because Pacemaker currently differentiates between operations only 
by resource and interval;
so if eg. a master/slave resource has the same monitor interval for both roles, 
Pacemaker would ignore the role when checking the status - which would 
cause unexpected return codes, and therefore unnecessary complications.
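
A small sketch of why identical intervals collide. The `op_key` helper is hypothetical, and the key layout (resource, action, interval in milliseconds) is a simplified assumption modelled on the quote above, not Pacemaker's actual implementation. The point is only that the role is not part of the key.

```shell
#!/bin/sh
# op_key RESOURCE ACTION INTERVAL_SECONDS
# Hypothetical helper: builds a key from resource, action and interval only.
# The role is deliberately left out, as described in the quote above.
op_key() {
    echo "${1}_${2}_$(( $3 * 1000 ))"
}

same_master=$(op_key testklwang monitor 30)
same_slave=$(op_key testklwang monitor 30)
diff_slave=$(op_key testklwang monitor 29)

# Same interval for both roles: the two monitors cannot be told apart.
[ "$same_master" = "$same_slave" ] && echo "collision: $same_master"
# Different intervals: each role gets its own operation.
[ "$same_master" != "$diff_slave" ] && echo "distinct:  $same_master vs $diff_slave"
```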

4. Add the demote and promote functions

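    # Record the current role ("master" or "slave") in the state file.
    stateful_update() {
        echo "$1" > "${OCF_RESKEY_state}"
    }

    # Return 0 when the recorded role equals $1.
    # Called without an argument, return 0 when no state file exists,
    # i.e. when the resource is not running.
    stateful_check_state() {
        target=$1
        if [ -f "${OCF_RESKEY_state}" ]; then
            state=`cat "${OCF_RESKEY_state}"`
            if [ "x$target" = "x$state" ]; then
                return 0
            fi

        else
            if [ "x$target" = "x" ]; then
                return 0
            fi
        fi

        return 1
    }

    stateful_demote() {
        # A resource that is not running cannot be demoted.
        stateful_check_state
        if [ $? = 0 ]; then
            # CRM Error - Should never happen
            return $OCF_NOT_RUNNING
        fi
        stateful_update "slave"
        $CRM_MASTER -v ${slave_score}
        return $OCF_SUCCESS
    }

    stateful_promote() {
        # A resource that is not running cannot be promoted.
        stateful_check_state
        if [ $? = 0 ]; then
            return $OCF_NOT_RUNNING
        fi
        stateful_update "master"
        $CRM_MASTER -v ${master_score}
        return $OCF_SUCCESS
    }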

A master/slave agent must provide both a promote and a demote function.
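
The functions can be tried outside a cluster with a small harness. This is only a sketch under a few assumptions: the OCF return codes are hard-coded instead of coming from the shell function library, `crm_master` is replaced by the no-op `:`, and the state file lives in a temporary location chosen for the demo.

```shell
#!/bin/sh
# Stand-alone harness: OCF codes are hard-coded and crm_master is stubbed out.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
CRM_MASTER=":"      # no-op stand-in for crm_master (assumption)
slave_score=5
master_score=10
OCF_RESKEY_state="${TMPDIR:-/tmp}/stateful-demo.$$.state"   # demo path only

stateful_update() { echo "$1" > "${OCF_RESKEY_state}"; }

stateful_check_state() {
    target=$1
    if [ -f "${OCF_RESKEY_state}" ]; then
        state=`cat "${OCF_RESKEY_state}"`
        [ "x$target" = "x$state" ] && return 0
    else
        [ "x$target" = "x" ] && return 0
    fi
    return 1
}

stateful_promote() {
    stateful_check_state && return $OCF_NOT_RUNNING
    stateful_update "master"
    $CRM_MASTER -v ${master_score}
    return $OCF_SUCCESS
}

stateful_demote() {
    stateful_check_state && return $OCF_NOT_RUNNING
    stateful_update "slave"
    $CRM_MASTER -v ${slave_score}
    return $OCF_SUCCESS
}

# Walk through: promote while stopped, start as slave, promote, demote.
rm -f "$OCF_RESKEY_state"
stateful_promote; echo "promote while stopped: rc=$?"
stateful_update "slave"
stateful_promote; echo "promote from slave:    rc=$? role=$(cat "$OCF_RESKEY_state")"
stateful_demote;  echo "demote from master:    rc=$? role=$(cat "$OCF_RESKEY_state")"
rm -f "$OCF_RESKEY_state"
```

Promoting a stopped resource is refused with $OCF_NOT_RUNNING, while promote and demote on a running one simply rewrite the state file.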

5. Modify the start function

    Before the function returns, add:
    stateful_update "slave"
    $CRM_MASTER -v ${slave_score}

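This marks the instance as a slave and, via crm_master, reports that to Pacemaker as well.

Note: a master/slave resource must always start in the slave role. Pacemaker then picks one of the slave nodes and promotes it to master.

6. Modify the stop function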

    $CRM_MASTER -D
    stateful_check_state "master"
    if [ $? = 0 ]; then
        # CRM Error - Should never happen
        return $OCF_RUNNING_MASTER
    fi
    if [ -f ${OCF_RESKEY_state} ]; then
        rm ${OCF_RESKEY_state}
    fi

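Pacemaker demotes a master before it sends the stop command, which is why stop should never find the instance still in the master role.

7. Modify the monitor function

Replace the line that returns $OCF_SUCCESS with: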

    stateful_check_state "master"
    if [ $? = 0 ]; then
        if [ $OCF_RESKEY_CRM_meta_interval = 0 ]; then
            # Restore the master setting during probes
            $CRM_MASTER -v ${master_score}
        fi
        return $OCF_RUNNING_MASTER
    fi

    stateful_check_state "slave"
    if [ $? = 0 ]; then
        if [ $OCF_RESKEY_CRM_meta_interval = 0 ]; then
            # Restore the master setting during probes
            $CRM_MASTER -v ${slave_score}
        fi
        return $OCF_SUCCESS
    fi

    echo "File '${OCF_RESKEY_state}' exists but contains unexpected contents"
    return $OCF_ERR_GENERIC

Pacemaker trusts the agent completely: its view of a resource's role comes entirely from the monitor function, so as soon as we return $OCF_RUNNING_MASTER it considers us the master.
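
The mapping from state file to return code can be summarised in a few lines. This is a sketch, not the agent's code: `monitor_rc` is a hypothetical helper that prints the code instead of returning it, and the OCF return codes are hard-coded so that it runs without the shell function library.

```shell
#!/bin/sh
# OCF return codes, hard-coded here so the sketch runs without ocf-shellfuncs.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8

# monitor_rc FILE
# Hypothetical helper: prints the code a monitor call would return
# for the given state file.
monitor_rc() {
    if [ ! -f "$1" ]; then
        echo $OCF_NOT_RUNNING          # no state file: cleanly stopped
        return
    fi
    case `cat "$1"` in
        master) echo $OCF_RUNNING_MASTER ;;   # running as master
        slave)  echo $OCF_SUCCESS ;;          # running as slave
        *)      echo $OCF_ERR_GENERIC ;;      # unexpected contents
    esac
}

demo_file="${TMPDIR:-/tmp}/monitor-demo.$$.state"   # demo path only
for content in master slave garbage; do
    echo "$content" > "$demo_file"
    echo "$content -> $(monitor_rc "$demo_file")"
done
rm -f "$demo_file"
echo "no file -> $(monitor_rc "$demo_file")"
```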

8. Add default values

: ${slave_score=5}
: ${master_score=10}

: ${OCF_RESKEY_CRM_meta_interval=0}
: ${OCF_RESKEY_CRM_meta_globally_unique:="true"}

if [ "x$OCF_RESKEY_state" = "x" ]; then
    if [ ${OCF_RESKEY_CRM_meta_globally_unique} = "false" ]; then
        state="${HA_VARRUN}/Stateful-${OCF_RESOURCE_INSTANCE}.state"

        # Strip off the trailing clone marker
        OCF_RESKEY_state=`echo "$state" | sed 's/:[0-9][0-9]*\.state$/.state/'`
    else
        OCF_RESKEY_state="${HA_VARRUN}/Stateful-${OCF_RESOURCE_INSTANCE}.state"
    fi
fi

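The steps above use slave_score and master_score several times; this is where they get their values. The one thing that matters is that master_score is greater than slave_score.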
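
Two idioms in this block are easy to misread, so here is a small sketch of both. `: ${var=value}` assigns only when the variable is unset, which lets a value set earlier win over the default. The sed expression removes the clone instance number from the file name when the clone is not globally unique. `strip_clone_marker` is a hypothetical helper, and the /var/run paths are example values only.

```shell
#!/bin/sh
# ": ${var=value}" assigns only when var is unset, so a preset value survives.
unset slave_score
master_score=100
: ${slave_score=5}
: ${master_score=10}
echo "slave_score=$slave_score master_score=$master_score"

# strip_clone_marker PATH
# Hypothetical helper: a quoted variant of the sed call used for the state file.
strip_clone_marker() {
    echo "$1" | sed 's/:[0-9][0-9]*\.state$/.state/'
}

# With a clone marker (":3") and without one; both end up as the same name.
strip_clone_marker "/var/run/Stateful-testklwang:3.state"
strip_clone_marker "/var/run/Stateful-testklwang.state"
```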

9. Finally, add the actions

case $1 in

    promote)        stateful_promote
            ;;
    demote)         stateful_demote
            ;;

That's it, the agent has been converted. Time to show it in action.

The Pacemaker configuration looks like this:

primitive testklwang ocf:test:klwang \
	op monitor interval="30" role="Master" \
	op monitor interval="29" role="Slave"
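
The crm_mon output further down lists a Master/Slave set named ms_klwang, so the primitive must also be wrapped in an ms definition that the configuration above does not show. A minimal sketch of what it might look like; the meta attribute values are assumptions chosen to match a five-node cluster with a single master, not the author's actual settings:

```
ms ms_klwang testklwang \
	meta master-max="1" master-node-max="1" clone-max="5" clone-node-max="1" notify="false"
```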

Here is the crm_mon output:

Last updated: Sat Jun  8 12:28:15 2013
Last change: Sat Jun  8 12:28:14 2013 via cibadmin on node1
Stack: cman
Current DC: cent1 - partition with quorum
Version: 1.1.8-7.el6-394e906
5 Nodes configured, 3 expected votes
32 Resources configured.



Online: [ node1 node2 node3 node4 node5 ]

 Master/Slave Set: ms_klwang [testklwang]
     Masters: [ node1 ]
     Slaves: [ node2 node3 node4 node5 ]

We've now achieved our goal: one node in the cluster has been chosen as the master. From here on we can do something different on it:

    stateful_check_state "master"
    if [ $? -eq 0  ]; then
        # put whatever you want to do here
        return $OCF_SUCCESS
    fi

The test:klwang agent in the demo above was adapted from the bundled Dummy example. Here is the full code for comparison:

#!/bin/sh
#
#
#	Dummy OCF RA. Does nothing but wait a few seconds, can be
#	configured to fail occasionally.
#
# Copyright (c) 2004 SUSE LINUX AG, Lars Marowsky-Brée
#                    All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#

#######################################################################
# Initialization:

: ${OCF_FUNCTIONS=${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs}
. ${OCF_FUNCTIONS}
: ${__OCF_ACTION=$1}
CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"

#######################################################################

meta_data() {
	cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="Dummy" version="1.0">
<version>1.0</version>

<longdesc lang="en">
This is a Dummy Resource Agent. It does absolutely nothing except 
keep track of whether its running or not.
Its purpose in life is for testing and to serve as a template for RA writers.

NB: Please pay attention to the timeouts specified in the actions
section below. They should be meaningful for the kind of resource
the agent manages. They should be the minimum advised timeouts,
but they shouldn't/cannot cover _all_ possible resource
instances. So, try to be neither overly generous nor too stingy,
but moderate. The minimum timeouts should never be below 10 seconds.
</longdesc>
<shortdesc lang="en">Example stateful resource agent</shortdesc>

<parameters>
<parameter name="state" unique="1">
<longdesc lang="en">
Location to store the resource state in.
</longdesc>
<shortdesc lang="en">State file</shortdesc>
<content type="string" default="${HA_VARRUN}/Dummy-${OCF_RESOURCE_INSTANCE}.state" />
</parameter>

<parameter name="fake" unique="0">
<longdesc lang="en">
Fake attribute that can be changed to cause a reload
</longdesc>
<shortdesc lang="en">Fake attribute that can be changed to cause a reload</shortdesc>
<content type="string" default="dummy" />
</parameter>

<parameter name="op_sleep" unique="1">
<longdesc lang="en">
Number of seconds to sleep during operations.  This can be used to test how
the cluster reacts to operation timeouts.
</longdesc>
<shortdesc lang="en">Operation sleep duration in seconds.</shortdesc>
<content type="string" default="0" />
</parameter>

<parameter name="stateful" unique="1">
<longdesc lang="en">
Location to store the resource stateful in
</longdesc>
<shortdesc lang="en">stateful file</shortdesc>
<content type="string" default="${HA_VARRUN}/Dummy-${OCF_RESOURCE_INSTANCE}.stateful" />
</parameter>

</parameters>

<actions>
<action name="start"        timeout="20" />
<action name="stop"         timeout="20" />
<action name="monitor"		depth="0"  timeout="20" interval="30" role="Master"/>
<action name="monitor"		depth="0"  timeout="20" interval="30" role="Slave"/>
<action name="promote"      timeout="20" />
<action name="demote"       timeout="20" />
<action name="reload"       timeout="20" />
<action name="migrate_to"   timeout="20" />
<action name="migrate_from" timeout="20" />
<action name="validate-all" timeout="20" />
<action name="meta-data"    timeout="5" />
</actions>
</resource-agent>
END
}

#######################################################################

# don't exit on TERM, to test that lrmd makes sure that we do exit
trap sigterm_handler TERM
sigterm_handler() {
	ocf_log info "They use TERM to bring us down. No such luck."
	return
}

dummy_usage() {
	cat <<END
usage: $0 {start|stop|promote|demote|monitor|migrate_to|migrate_from|validate-all|meta-data}

Expects to have a fully populated OCF RA-compliant environment set.
END
}

stateful_update() {
	echo $1 > ${OCF_RESKEY_state}
}

stateful_check_state() {
	target=$1
	if [ -f ${OCF_RESKEY_state} ]; then
		state=`cat ${OCF_RESKEY_state}`
		if [ "x$target" = "x$state" ]; then
		    return 0
		fi

	else
		if [ "x$target" = "x" ]; then
		    return 0
		fi
	fi

	return 1
}

stateful_demote() {
	stateful_check_state
	if [ $? = 0 ]; then
		# CRM Error - Should never happen
		return $OCF_NOT_RUNNING
	fi
	stateful_update "slave"
	$CRM_MASTER -v ${slave_score}
	return $OCF_SUCCESS
}

stateful_promote() {
	stateful_check_state
	if [ $? = 0 ]; then
		return $OCF_NOT_RUNNING
	fi
	stateful_update "master"
	$CRM_MASTER -v ${master_score}
	return $OCF_SUCCESS
}



dummy_start() {
    dummy_monitor
    if [ $? = $OCF_SUCCESS ]; then
        return $OCF_SUCCESS
    fi
    touch ${OCF_RESKEY_state}
    # Always come up as a slave; Pacemaker decides which instance to promote.
    stateful_update "slave"
    $CRM_MASTER -v ${slave_score}
    return $OCF_SUCCESS
}

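dummy_stop() {
    dummy_monitor
    # Clean up in every state except "cleanly stopped". Testing for
    # $OCF_SUCCESS alone would skip a master, whose monitor returns
    # $OCF_RUNNING_MASTER.
    if [ $? != $OCF_NOT_RUNNING ]; then
        $CRM_MASTER -D
        stateful_check_state "master"
        if [ $? = 0 ]; then
            # CRM Error - Should never happen
            return $OCF_RUNNING_MASTER
        fi
        # Remove the state file once; -f keeps this quiet if it is gone.
        rm -f ${OCF_RESKEY_state}
    fi
    return $OCF_SUCCESS
}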

dummy_monitor() {
	# Monitor _MUST!_ differentiate correctly between running
	# (SUCCESS), failed (ERROR) or _cleanly_ stopped (NOT RUNNING).
	# That is THREE states, not just yes/no.

	sleep ${OCF_RESKEY_op_sleep}
	
	if [ -f ${OCF_RESKEY_state} ]; then
		    stateful_check_state "master"
		    if [ $? = 0 ]; then
		        if [ $OCF_RESKEY_CRM_meta_interval = 0 ]; then
		            # Restore the master setting during probes
		            $CRM_MASTER -v ${master_score}
		        fi
		        return $OCF_RUNNING_MASTER
		    fi

		    stateful_check_state "slave"
		    if [ $? = 0 ]; then
		        if [ $OCF_RESKEY_CRM_meta_interval = 0 ]; then
		            # Restore the master setting during probes
		            $CRM_MASTER -v ${slave_score}
		        fi
		        return $OCF_SUCCESS
		    fi
		
		    echo "File '${OCF_RESKEY_state}' exists but contains unexpected contents"
		    return $OCF_ERR_GENERIC
	fi
	if false ; then
		return $OCF_ERR_GENERIC
	fi
	return $OCF_NOT_RUNNING
}

dummy_validate() {
    
    # Is the state directory writable? 
    state_dir=`dirname "$OCF_RESKEY_state"`
    touch "$state_dir/$$"
    if [ $? != 0 ]; then
	return $OCF_ERR_ARGS
    fi
    rm "$state_dir/$$"

    return $OCF_SUCCESS
}


: ${slave_score=5}
: ${master_score=10}
: ${OCF_RESKEY_fake=dummy}
: ${OCF_RESKEY_op_sleep=0}
: ${OCF_RESKEY_CRM_meta_interval=0}
: ${OCF_RESKEY_CRM_meta_globally_unique:="true"}

if [ "x$OCF_RESKEY_state" = "x" ]; then
    if [ ${OCF_RESKEY_CRM_meta_globally_unique} = "false" ]; then
	state="${HA_VARRUN}/Dummy-${OCF_RESOURCE_INSTANCE}.state"
	
	# Strip off the trailing clone marker
	OCF_RESKEY_state=`echo "$state" | sed 's/:[0-9][0-9]*\.state$/.state/'`
    else 
	OCF_RESKEY_state="${HA_VARRUN}/Dummy-${OCF_RESOURCE_INSTANCE}.state"
    fi
fi

case $__OCF_ACTION in
meta-data)	meta_data
		exit $OCF_SUCCESS
		;;
start)		dummy_start;;
stop)		dummy_stop;;
monitor)	dummy_monitor;;
migrate_to)	ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to ${OCF_RESKEY_CRM_meta_migrate_target}."
	        dummy_stop
		;;
migrate_from)	ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} from ${OCF_RESKEY_CRM_meta_migrate_source}."
	        dummy_start
		;;
promote)	stateful_promote
		;;
demote)		stateful_demote
		;;

reload)		ocf_log err "Reloading..."
	        dummy_start
		;;
validate-all)	dummy_validate;;
usage|help)	dummy_usage
		exit $OCF_SUCCESS
		;;
*)		dummy_usage
		exit $OCF_ERR_UNIMPLEMENTED
		;;
esac
rc=$?
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : $rc"
exit $rc

Disclaimer: much of the code in this post comes from ClusterLabs; please observe the applicable license terms when using it.

Permalink: http://klwang.info/change-your-resouce-angent-to-multistate/ | Database|Linux|Software Development

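This post was published by klwang on June 10, 2013 under HA, Linux. You are welcome to comment, and to quote it on your website or blog as long as the original URL and the author are retained.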