Redis Sentinel cluster setup and monitoring

Created: 2019-08-07 11:49

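Environment overview
Startup scripts for each Redis role
Sentinel configuration files
Notification script for cluster state changes
Useful Redis commands
Monitoring with prometheus + grafana + redis_exporter
References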

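Environment overview

The test cluster is installed on a single server using the redis Docker image,
with one master, two replicas (slaves), and three sentinels: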

ip	port	name	role
10.0.1.200	6479	master	master
10.0.1.200	6579	slave01	slave
10.0.1.200	6679	slave02	slave
10.0.1.200	26479	sen01	sentinel
10.0.1.200	26579	sen02	sentinel
10.0.1.200	26679	sen03	sentinel

Startup scripts for each Redis role

To avoid problems with the announced IP and port, the Docker containers use host networking directly.
1_run_redis_master.sh

#!/bin/bash
name=master
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --net 'host' --restart always \
-v /etc/localtime:/etc/localtime \
-v /data/redis_sentinel/master_data:/data redis:5.0.5 \
redis-server --port 6479

2_run_redis_slave01.sh

#!/bin/bash
name=slave01
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --net 'host' --restart always \
-v /etc/localtime:/etc/localtime \
-v /data/redis_sentinel/slave01_data:/data redis:5.0.5 \
redis-server --port 6579 --slaveof 10.0.1.200 6479

3_run_redis_slave02.sh

#!/bin/bash
name=slave02
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --net 'host' --restart always \
-v /etc/localtime:/etc/localtime \
-v /data/redis_sentinel/slave02_data:/data redis:5.0.5 \
redis-server --port 6679 --slaveof 10.0.1.200 6479

4_start_sen01.sh

#!/bin/bash
name="sen01"
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --net 'host' \
-v /data/redis_sentinel/conf:/conf \
redis:5.0.5 redis-server /conf/sentinel26479.conf --sentinel

5_start_sen02.sh

#!/bin/bash
name="sen02"
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --net 'host' \
-v /data/redis_sentinel/conf:/conf redis:5.0.5 \
redis-server /conf/sentinel26579.conf --sentinel

6_start_sen03.sh

#!/bin/bash
name="sen03"
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --net 'host' \
-v /data/redis_sentinel/conf:/conf redis:5.0.5 \
redis-server /conf/sentinel26679.conf --sentinel
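
The three data-node scripts above (master, slave01, slave02) differ only in the container name, the port, and the optional replication target. As a rough sketch, they could be folded into one helper. The function name redis_run_cmd is hypothetical, and it only prints the command instead of running it, so the result can be inspected first:

```shell
# redis_run_cmd NAME PORT [MASTER_IP MASTER_PORT]
# Prints (does not execute) the docker run command for one Redis data node.
# Paths and image tag are copied from the scripts above.
redis_run_cmd() {
  name=$1 port=$2 master_ip=$3 master_port=$4
  cmd="docker run -d --name ${name} --net host --restart always"
  cmd="${cmd} -v /etc/localtime:/etc/localtime"
  cmd="${cmd} -v /data/redis_sentinel/${name}_data:/data redis:5.0.5"
  cmd="${cmd} redis-server --port ${port}"
  # Only replicas get a replication target.
  if [ -n "${master_ip}" ]; then
    cmd="${cmd} --slaveof ${master_ip} ${master_port}"
  fi
  printf '%s\n' "${cmd}"
}

# Example (prints the command used by 2_run_redis_slave01.sh):
#   redis_run_cmd slave01 6579 10.0.1.200 6479
```

Once the printed command looks right, it can be pasted into a shell or piped to sh.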

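Sentinel configuration files

Fetch the Redis Sentinel configuration file from GitHub:
wget https://raw.githubusercontent.com/antirez/redis/unstable/sentinel.conf

The English comments on the configuration options are kept so that their meaning can be looked up quickly later. The modified configuration files are as follows: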
sentinel26479.conf

port 26479
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile ""

# dir <working-directory>
# Every long running process should have a well-defined working directory.
# Having Redis Sentinel chdir to /tmp at startup is the simplest way
# to keep the process from interfering with administrative tasks such as
# unmounting filesystems.
dir "/tmp"

# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Replicas are auto-discovered, so you don't need to specify replicas in
# any way. Sentinel itself will rewrite this configuration file adding
# the replicas using additional configuration options.
# Also note that the configuration file is rewritten when a
# replica is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".

sentinel monitor mymaster 10.0.1.200 6479 2

# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached replica or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.

sentinel down-after-milliseconds mymaster 5000

# sentinel parallel-syncs <master-name> <numreplicas>
#
# How many replicas we can reconfigure to point to the new master simultaneously
# during the failover. Use a low number if you use the replicas to serve query
# to avoid that all the replicas will be unreachable at about the same
# time while performing the synchronization with the master.

# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
#   already tried against the same master by a given Sentinel, is two
#   times the failover timeout.
#
# - The time needed for a replica replicating to a wrong master according
#   to a Sentinel current configuration, to be forced to replicate
#   with the right master, is exactly the failover timeout (counting since
#   the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
#   did not produce any configuration change (SLAVEOF NO ONE yet not
#   acknowledged by the promoted replica).
#
# - The maximum time a failover in progress waits for all the replicas to be
#   reconfigured as replicas of the new master. However even after this time
#   the replicas will be reconfigured by the Sentinels anyway, but not with
#   the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.

sentinel deny-scripts-reconfig yes

# NOTIFICATION SCRIPT
#
# sentinel notification-script <master-name> <script-path>
#
# Call the specified notification script for any sentinel event that is
# generated in the WARNING level (for instance -sdown, -odown, and so forth).
# This script should notify the system administrator via email, SMS, or any
# other messaging system, that there is something wrong with the monitored
# Redis systems.
#
# The script is called with just two arguments: the first is the event type
# and the second the event description.
#
# The script must exist and be executable in order for sentinel to start if
# this option is provided.

sentinel notification-script mymaster /conf/notify.sh
sentinel config-epoch mymaster 0
# Generated by CONFIG REWRITE
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.1.200 6679
sentinel known-replica mymaster 10.0.1.200 6579
sentinel current-epoch 0
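
The comment on sentinel monitor in the file above says two things: the master is marked O_DOWN once <quorum> Sentinels agree, and a failover additionally needs a Sentinel to be elected by a majority of the known Sentinels. The small sketch below only works through that arithmetic for this 3-Sentinel setup; the function names majority and can_failover are hypothetical helpers, not Redis commands:

```shell
# majority N: smallest number of Sentinels that forms a majority of N.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# can_failover TOTAL REACHABLE QUORUM
# Prints "yes" when the reachable Sentinels satisfy both the quorum (O_DOWN)
# and the majority needed to elect a failover leader, otherwise "no".
can_failover() {
  total=$1 reachable=$2 quorum=$3
  if [ "${reachable}" -ge "${quorum}" ] && \
     [ "${reachable}" -ge "$(majority "${total}")" ]; then
    echo yes
  else
    echo no
  fi
}

# Example for the configuration above (3 Sentinels, quorum 2):
#   can_failover 3 2 2   # one Sentinel lost, failover still possible
#   can_failover 3 1 2   # two Sentinels lost, no failover
```

With three Sentinels and a quorum of 2, the setup tolerates the loss of one Sentinel.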

sentinel26579.conf

port 26579
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile ""
dir "/tmp"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.1.200 6479 2
sentinel down-after-milliseconds mymaster 5000
sentinel notification-script mymaster /conf/notify.sh
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.1.200 6679
sentinel known-replica mymaster 10.0.1.200 6579
sentinel current-epoch 0

sentinel26679.conf

port 26679
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile ""
dir "/tmp"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.1.200 6479 2
sentinel down-after-milliseconds mymaster 5000
sentinel notification-script mymaster /conf/notify.sh
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.1.200 6579
sentinel known-replica mymaster 10.0.1.200 6679
sentinel current-epoch 0
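
The three Sentinel files differ only in the port line. A minimal sketch of generating them from one template follows; the function name sentinel_conf is hypothetical. It deliberately leaves out the config-epoch, leader-epoch, known-replica and current-epoch lines, on the assumption that Sentinel writes that state back into the file itself once it is running:

```shell
# sentinel_conf PORT: print a minimal Sentinel configuration for that port.
# All values other than the port are copied from the files above.
sentinel_conf() {
  cat <<EOF
port $1
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile ""
dir "/tmp"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.1.200 6479 2
sentinel down-after-milliseconds mymaster 5000
sentinel notification-script mymaster /conf/notify.sh
EOF
}

# Example: write all three files into the directory mounted at /conf.
#   for p in 26479 26579 26679; do
#     sentinel_conf "$p" > "/data/redis_sentinel/conf/sentinel${p}.conf"
#   done
```

Because Sentinel rewrites its configuration file, the generated files need to be writable by the Sentinel process.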

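Notification script for cluster state changes

The official redis image does not include curl, so I wrote a small Go program that POSTs a string over HTTP and compiled it into gohttppost.
It calls a DingTalk webhook robot to deliver the notification. For the Go source, see:
Sending an HTTP POST request with Go: https://gitrootid.github.io/2019/08/07/golang/go-http-post_/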

notify.sh

#!/bin/bash
# Run from the script's own directory so that ./gohttppost resolves.
cd "$(dirname "$0")" || exit 1

# Sentinel passes two arguments: the event type and the event description.
event_type=$1
event_description=$2

topic='redis-sentinel-notification'
content="topic:${topic},event:${event_type},description:${event_description}"
ding_talk_url='https://oapi.dingtalk.com/robot/send?access_token=XXX'
MSG='{"msgtype": "text","text": {"content": "'"${content}"'"}}'
./gohttppost -u "${ding_talk_url}" -b "${MSG}" -s true
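
notify.sh pastes the event description straight into a JSON string, so a description containing a double quote or a backslash would produce an invalid payload. The sketch below shows one way to escape those two characters before building the message. The function names json_escape and build_payload are hypothetical, and newlines or other control characters are not handled:

```shell
# json_escape STRING: escape backslashes and double quotes for use inside
# a JSON string literal.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload EVENT_TYPE EVENT_DESCRIPTION
# Prints the same DingTalk text message as notify.sh, with escaped content.
build_payload() {
  content="topic:redis-sentinel-notification,event:$1,description:$2"
  printf '{"msgtype": "text","text": {"content": "%s"}}' \
    "$(json_escape "${content}")"
}

# Example, as it might be used inside notify.sh:
#   MSG=$(build_payload "$1" "$2")
```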

The cluster setup is now complete.

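Useful Redis commands

1. info: view cluster-related information

docker exec -it sen01 /bin/bash
redis-cli -p 26479
>info

2. monitor: shows every command the Redis server is executing.
Note: because monitor returns every command the server processes, it carries some performance cost.
3. save: blocking data persistence
4. bgsave: non-blocking data persistence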
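
After a failover it is handy to check which node currently holds the master role. As a small sketch, the role can be pulled out of the info replication output with awk; the function name redis_role is hypothetical, and the carriage return is stripped on the assumption that the output uses CRLF line endings:

```shell
# redis_role: read "info replication" output on stdin and print the value
# of the role field (master or slave).
redis_role() {
  awk -F: '$1 == "role" { sub(/\r$/, "", $2); print $2 }'
}

# Example, run on the host or inside any of the containers:
#   redis-cli -p 6479 info replication | redis_role
#   redis-cli -p 6579 info replication | redis_role
```

Sentinel can also report the current master directly with: redis-cli -p 26479 sentinel get-master-addr-by-name mymaster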

Monitoring with prometheus + grafana + redis_exporter

Reference: https://github.com/oliver006/redis_exporter
Edit the Prometheus configuration file:

# my global config
global:
  scrape_interval:     15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).

scrape_configs:

- job_name: redis_irs_cs
  static_configs:
    - targets:
      - redis://10.0.1.200:6479
      - redis://10.0.1.200:6579
      - redis://10.0.1.200:6679
  metrics_path: /scrape
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: addr
    - target_label: __address__
      replacement: 10.0.1.200:9121
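
The relabel_configs above copy each target address into the target URL parameter and the addr label, then point the scrape at the exporter on 10.0.1.200:9121. In effect, Prometheus asks the exporter's /scrape endpoint for one Redis instance at a time. The sketch below just prints the resulting URL so it can be tried by hand; the function name scrape_url is hypothetical, and Prometheus itself URL-encodes the parameter:

```shell
# scrape_url EXPORTER TARGET: print the URL requested for one Redis target
# after relabeling (shown unencoded for readability).
scrape_url() {
  printf 'http://%s/scrape?target=%s\n' "$1" "$2"
}

# Example: check by hand that the exporter can reach the master.
#   curl -s "$(scrape_url 10.0.1.200:9121 redis://10.0.1.200:6479)" | grep '^redis_up'
```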

Run prometheus from its Docker image:

#!/bin/bash
name=prometheus
docker stop ${name} && docker rm ${name}
docker run --name ${name} -d -p 9090:9090 \
-v /data/project/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml \
-v /data/project/prometheus/data:/prometheus \
-v /etc/localtime:/etc/localtime prom/prometheus

Run redis_exporter from its Docker image.
Version 1.0 supports running redis_exporter directly as a black box, so the exporter itself needs no configuration:

#!/bin/bash
name=redis_exporter
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --restart always -p 9121:9121 oliver006/redis_exporter

Run grafana from its Docker image:

#!/bin/bash
name=grafana
docker stop ${name} && docker rm ${name}
docker run -d --name ${name} --restart always \
-v /data/project/grafana:/var/lib/grafana \
-p 3000:3000 grafana/grafana

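Default username and password:
admin/admin

After logging in, import the Redis dashboard downloaded from the Grafana website: https://grafana.com/api/dashboards/2751/revisions/1/download
Then configure the data source under Configuration and open the prometheus-redis dashboard to see the metrics laid out.

Prometheus alerting rules will be added later.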