Redis Sentinel

Redis's high-availability strategies include persistence, replication, sentinel, and cluster. This article covers Redis Sentinel.

Introduction

Sentinel is a key component of a distributed Redis architecture. Its main functions are cluster monitoring, notification, and automatic failover.

Core concepts

The sdown and odown transition mechanism

  • sdown means subjectively down: if a sentinel's PING to a master goes unanswered for more than the number of milliseconds specified by down-after-milliseconds, that sentinel subjectively considers the master down
  • odown means objectively down: if, within the given time, a sentinel learns that at least the number of sentinels specified by quorum also consider the master sdown, the master is considered odown
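The two states above can be sketched as a small Python simulation (illustrative only: the function names and arguments are mine, not Sentinel internals, and real sentinels exchange their sdown reports over the network):

```python
# Minimal simulation of the sdown -> odown decision (illustrative).

def is_sdown(last_pong_ms: int, now_ms: int, down_after_ms: int) -> bool:
    # Subjectively down: no valid PING reply within down-after-milliseconds.
    return now_ms - last_pong_ms > down_after_ms

def is_odown(sdown_reports: int, quorum: int) -> bool:
    # Objectively down: at least `quorum` sentinels report sdown.
    return sdown_reports >= quorum

print(is_sdown(last_pong_ms=0, now_ms=31_000, down_after_ms=30_000))  # True
print(is_odown(sdown_reports=2, quorum=2))  # True
```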

Automatic discovery within the sentinel cluster

Sentinel auto-discovery works through pub/sub. Every two seconds, each sentinel publishes a message containing its own identity to the __sentinel__:hello channel of every cluster it monitors.
Each sentinel also subscribes to this __sentinel__:hello channel, learning of the other sentinels' existence, exchanging its monitoring configuration for the master with them, and keeping those configurations in sync.
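A toy simulation of this discovery loop (a hedged sketch: a plain list stands in for the __sentinel__:hello pub/sub channel, and publish_hello/discover are illustrative names, not a Redis API):

```python
# Toy simulation of sentinel auto-discovery via the __sentinel__:hello channel.
channel = []  # stands in for the pub/sub channel

def publish_hello(run_id, ip, port):
    # Every ~2 seconds each sentinel publishes its own identity.
    channel.append({"run_id": run_id, "ip": ip, "port": port})

def discover(my_run_id):
    # Every subscriber sees the messages and learns of the other sentinels.
    return {m["run_id"] for m in channel if m["run_id"] != my_run_id}

publish_hello("s1", "172.17.0.2", 26379)
publish_hello("s2", "172.17.0.3", 26379)
publish_hello("s3", "172.17.0.4", 26379)
print(sorted(discover("s1")))  # ['s2', 's3']
```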

Automatic correction of slave configuration

Sentinels automatically correct certain slave settings. For example, they ensure that a slave about to become master is replicating the current master's data, and after a failover they ensure every slave connects to the correct master.

Master election algorithm

When a master is judged odown and a majority of the sentinels have authorized the failover, a new master is elected from among the slaves.

Steps for choosing a slave to become the new master:

  1. First rule out unsuitable slaves: any slave disconnected from the master for longer than (down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state cannot be selected as the new master
  2. Select by slave priority: the lower the slave priority value, the higher the priority
  3. If slave priorities are equal, compare replica offsets: the slave that has replicated more data has higher priority
  4. If both slave priority and replica offset are equal, pick the slave with the smallest run id
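The four steps above reduce to a filter plus a sort. A hedged Python sketch (field names such as priority, repl_offset, and run_id mirror the concepts above, not Sentinel's actual data structures):

```python
# Sketch of slave selection: drop slaves disconnected too long, then sort by
# (priority ascending, replication offset descending, run id ascending).

def select_new_master(slaves, down_after_ms, sdown_ms):
    max_disconnect = down_after_ms * 10 + sdown_ms
    candidates = [s for s in slaves if s["disconnect_ms"] <= max_disconnect]
    if not candidates:
        return None
    candidates.sort(key=lambda s: (s["priority"], -s["repl_offset"], s["run_id"]))
    return candidates[0]

slaves = [
    {"run_id": "b", "priority": 100, "repl_offset": 500, "disconnect_ms": 1_000},
    {"run_id": "a", "priority": 100, "repl_offset": 800, "disconnect_ms": 2_000},
    {"run_id": "c", "priority": 100, "repl_offset": 900, "disconnect_ms": 999_999},
]
print(select_new_master(slaves, down_after_ms=30_000, sdown_ms=5_000)["run_id"])  # a
```

Slave "c" has the largest offset but is excluded by the disconnection filter, so "a" wins on offset among the remaining candidates.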

quorum and majority

Before a failover, at least quorum sentinels must consider the master odown. A leader sentinel is then elected to perform the switch, and that sentinel must also be authorized by a majority of the sentinels before it can actually execute it.

If quorum < majority, say 5 sentinels with majority = 3 and quorum = 2, then 3 authorizing sentinels are enough to execute the switch.

If quorum >= majority, then at least quorum sentinels must authorize it. For example, with 5 sentinels and quorum = 5, all 5 sentinels must agree before the switch can execute.
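In other words, the number of authorizations required is max(quorum, majority). A one-line check makes both cases concrete (illustrative arithmetic, not a Sentinel API):

```python
# Votes needed to authorize a failover: the larger of quorum and majority.
def votes_needed(num_sentinels: int, quorum: int) -> int:
    majority = num_sentinels // 2 + 1
    return max(quorum, majority)

print(votes_needed(5, 2))  # 3 (quorum < majority: majority decides)
print(votes_needed(5, 5))  # 5 (quorum >= majority: quorum decides)
```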

configuration epoch

The sentinels monitor each master+slave set with a corresponding monitoring configuration. The sentinel that performs the switch obtains a configuration epoch from the configuration of the new master it is switching to; this acts as a version number, and each switch must use a unique version.
If the first elected sentinel fails to complete the switch, the other sentinels wait for failover-timeout and then retry, obtaining a fresh configuration epoch as the new version number.

Configuration propagation

After a sentinel completes the switch, it generates the latest master configuration locally, including the new version, and propagates it to the other sentinels via pub/sub. Each sentinel compares version numbers to decide whether to update its own master configuration.
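The version comparison can be sketched as follows (a hedged simulation; in reality the configuration travels inside the hello messages described earlier):

```python
# Sketch: a sentinel adopts a received master config only if its epoch is newer.

def maybe_update(local, received):
    if received["epoch"] > local["epoch"]:
        return dict(received)  # adopt the newer configuration
    return local               # keep the current configuration

local = {"epoch": 1, "master": ("172.17.0.2", 6379)}
received = {"epoch": 2, "master": ("172.17.0.4", 6379)}
print(maybe_update(local, received)["master"])  # ('172.17.0.4', 6379)
```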

Data loss problems

During failover, when the master/slave switch happens, data loss can occur.

Data loss caused by asynchronous replication

Redis replication is asynchronous, so if the master crashes while data is still being replicated, that data is lost.

Data loss caused by split-brain

Sometimes the master temporarily loses network connectivity to its slaves. The sentinels may then mistakenly conclude the master is down and start a failover, leaving two masters in the cluster (split-brain). Because the old master keeps accepting requests, inconsistency and data loss follow.

Mitigating data loss

Configuring the following two options reduces the data loss caused by asynchronous replication and split-brain:

# Require at least 1 slave whose replication lag does not exceed 10 seconds;
# if every slave lags by more than 10 seconds, the master stops accepting writes
min-slaves-to-write 1
min-slaves-max-lag 10

In detail:

min-slaves-max-lag ensures that once a slave's replication lag grows too long (meaning a master crash at that moment would lose a large amount of unreplicated data), the master starts rejecting write requests, keeping the loss from asynchronous replication to a minimum.

When the master is split-brained, it gradually loses its connections to the slaves. min-slaves-to-write sets a threshold on the number of connected slaves; once the count drops below it, the master rejects client writes outright.
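The combined effect of the two options can be sketched as a write gate (an illustrative simplification of the master's decision, not the actual Redis source):

```python
# Simplified check a master applies before accepting a write, based on
# min-slaves-to-write / min-slaves-max-lag.

def accept_write(slave_lags_s, min_slaves_to_write, min_slaves_max_lag):
    good = [lag for lag in slave_lags_s if lag <= min_slaves_max_lag]
    return len(good) >= min_slaves_to_write

print(accept_write([3], 1, 10))       # True: one slave within 10s of lag
print(accept_write([15, 22], 1, 10))  # False: every slave lags over 10s
print(accept_write([], 1, 10))        # False: e.g. a split-brained master
```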

Hands-on practice

Configuration options

Option                                  Description
sentinel monitor                        specifies the master the sentinels should monitor
sentinel down-after-milliseconds 1000   timeout threshold (ms) between a sentinel and a Redis node
sentinel failover-timeout 180000        timeout for executing a failover
sentinel parallel-syncs 1               number of slaves that sync with the new master simultaneously after the switch

Complete configuration example

Prepare three Redis nodes

One master and two slaves; see the Redis replication article.

IP          Role
172.17.0.2  master
172.17.0.3  slave
172.17.0.4  slave

Edit the sentinel configuration file

All three nodes share the same configuration, /etc/redis/sentinel.conf:

sentinel monitor mymaster 172.17.0.2 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
Start the three sentinels in turn
redis-sentinel /etc/redis/sentinel.conf
Log in with a client and check the status
zj@zj-pc:~$ redis-cli -h 172.17.0.2 -p 26379 
172.17.0.2:26379> sentinel get-master-addr-by-name mymaster
1) "172.17.0.2"
2) "6379"

Other status commands:

sentinel master mymaster
sentinel slaves mymaster
sentinel sentinels mymaster

Failover drill

Check the current master node
172.17.0.2:26379> sentinel get-master-addr-by-name mymaster
1) "172.17.0.2"
2) "6379"
Manually bring down the master
Find the PID
root@507b5013f669:~# ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 65508 5288 ? Ss Jul16 0:00 /usr/sbin/sshd -D
root 47 0.0 0.0 65508 6308 ? Rs Jul16 0:00 sshd: root@pts/0
root 49 0.0 0.0 18320 3324 pts/0 Ss Jul16 0:00 -bash
root 94 0.3 0.0 39552 4936 pts/0 Sl 06:13 0:41 redis-server 0.0.0.0:6379
root 113 0.4 0.0 38412 4004 pts/0 Sl 06:31 0:50 redis-sentinel 0.0.0.0:26379 [sentinel]
root 118 0.0 0.0 34424 2892 pts/0 R+ 09:56 0:00 ps -aux
Kill the process
root@507b5013f669:~# kill -9 94
Delete the PID file
rm /var/run/redis_6379.pid
Check the logs of the three sentinels
# Node 172.17.0.2
113:X 17 Jul 09:57:04.675 # +sdown master mymaster 172.17.0.2 6379
113:X 17 Jul 09:57:04.711 # +new-epoch 1
113:X 17 Jul 09:57:04.719 # +vote-for-leader b9932b720e70c34059617068763213c5459516aa 1
113:X 17 Jul 09:57:04.758 # +odown master mymaster 172.17.0.2 6379 #quorum 3/2
113:X 17 Jul 09:57:04.758 # Next failover delay: I will not start a failover before Tue Jul 17 10:03:05 2018
113:X 17 Jul 09:57:05.552 # +config-update-from sentinel b9932b720e70c34059617068763213c5459516aa 172.17.0.4 26379 @ mymaster 172.17.0.2 6379
113:X 17 Jul 09:57:05.552 # +switch-master mymaster 172.17.0.2 6379 172.17.0.4 6379
113:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.4 6379
113:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379

# Node 172.17.0.3
152:S 17 Jul 09:56:34.576 # Connection with master lost.
152:S 17 Jul 09:56:34.576 * Caching the disconnected master state.
152:S 17 Jul 09:56:35.379 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:56:35.379 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:56:35.379 # Error condition on socket for SYNC: Connection refused
152:S 17 Jul 09:56:36.382 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:56:36.382 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:56:36.382 # Error condition on socket for SYNC: Connection refused
152:S 17 Jul 09:56:37.386 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:56:37.386 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:56:37.386 # Error condition on socket for SYNC: Connection refused
......
161:X 17 Jul 09:57:04.639 # +sdown master mymaster 172.17.0.2 6379
161:X 17 Jul 09:57:04.710 # +new-epoch 1
161:X 17 Jul 09:57:04.719 # +vote-for-leader b9932b720e70c34059617068763213c5459516aa 1
152:S 17 Jul 09:57:05.460 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:57:05.461 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:57:05.461 # Error condition on socket for SYNC: Connection refused
152:S 17 Jul 09:57:05.552 * SLAVE OF 172.17.0.4:6379 enabled (user request from 'id=15 addr=172.17.0.4:32787 fd=14 name=sentinel-b9932b72-cmd age=12159 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=135 qbuf-free=32633 obl=36 oll=0 omem=0 events=r cmd=exec')
161:X 17 Jul 09:57:05.552 # +config-update-from sentinel b9932b720e70c34059617068763213c5459516aa 172.17.0.4 26379 @ mymaster 172.17.0.2 6379
161:X 17 Jul 09:57:05.552 # +switch-master mymaster 172.17.0.2 6379 172.17.0.4 6379
161:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.4 6379
161:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
152:S 17 Jul 09:57:05.554 # CONFIG REWRITE executed with success.
152:S 17 Jul 09:57:06.464 * Connecting to MASTER 172.17.0.4:6379
152:S 17 Jul 09:57:06.465 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:57:06.465 * Non blocking connect for SYNC fired the event.
152:S 17 Jul 09:57:06.465 * Master replied to PING, replication can continue...
152:S 17 Jul 09:57:06.465 * Trying a partial resynchronization (request 19c4a1290245b212de71a98c36fab894d23be166:2412576).
152:S 17 Jul 09:57:06.466 * Successful partial resynchronization with master.
152:S 17 Jul 09:57:06.466 # Master replication ID changed to 350c1de8bccc6d29411e343d029b5254edc68330
152:S 17 Jul 09:57:06.466 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

# Node 172.17.0.4

85:X 17 Jul 09:57:04.641 # +sdown master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.700 # +odown master mymaster 172.17.0.2 6379 #quorum 2/2
85:X 17 Jul 09:57:04.700 # +new-epoch 1
85:X 17 Jul 09:57:04.700 # +try-failover master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.702 # +vote-for-leader b9932b720e70c34059617068763213c5459516aa 1
85:X 17 Jul 09:57:04.719 # 2ded228e7ac4b38c47d8947cc243375a14c72061 voted for b9932b720e70c34059617068763213c5459516aa 1
85:X 17 Jul 09:57:04.719 # 9ec47b87ed2e9ef90a82d8b2752429fa22708ae1 voted for b9932b720e70c34059617068763213c5459516aa 1
85:X 17 Jul 09:57:04.754 # +elected-leader master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.754 # +failover-state-select-slave master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.838 # +selected-slave slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.838 * +failover-state-send-slaveof-noone slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.896 * +failover-state-wait-promotion slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.503 # +promoted-slave slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.503 # +failover-state-reconf-slaves master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.551 * +slave-reconf-sent slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.842 # -odown master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.543 * +slave-reconf-inprog slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.543 * +slave-reconf-done slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.596 # +failover-end master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.596 # +switch-master mymaster 172.17.0.2 6379 172.17.0.4 6379
85:X 17 Jul 09:57:06.596 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.4 6379
85:X 17 Jul 09:57:06.596 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
85:X 17 Jul 09:57:36.644 # +sdown slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379

From the logs we can see:

  1. All three sentinels mark the master as subjectively down (+sdown master mymaster 172.17.0.2 6379)
  2. Once at least quorum sentinels report sdown, the state becomes odown (+odown master mymaster 172.17.0.2 6379 #quorum 2/2)
  3. The sentinels bump the configuration version (+new-epoch 1)
  4. The sentinel on node 172.17.0.4 attempts the failover (+try-failover master mymaster 172.17.0.2 6379)
  5. The sentinels vote for a leader sentinel to perform the switch (+vote-for-leader b9932b720e70c34059617068763213c5459516aa 1)
  6. The selected slave receives slaveof no one and stops being a slave; the old master is no longer the master
  7. The sentinels rewrite the configuration of each Redis node
  8. The old master (172.17.0.2) is reconfigured as a slave; since it is still down, the sentinels mark it sdown
Check the current master node again

The master has switched from 172.17.0.2 to 172.17.0.4:

172.17.0.2:26379> sentinel get-master-addr-by-name mymaster 
1) "172.17.0.4"
2) "6379"

Log in to 172.17.0.4 and check its replication info:

zj@zj-pc:~$ redis-cli -h 172.17.0.4
172.17.0.4:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.17.0.3,port=6379,state=online,offset=2468205,lag=0
master_replid:350c1de8bccc6d29411e343d029b5254edc68330
master_replid2:19c4a1290245b212de71a98c36fab894d23be166
master_repl_offset:2468205
second_repl_offset:2412576
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1419630
repl_backlog_histlen:1048576

The cluster currently has only one slave, 172.17.0.3; 172.17.0.2 is still down.

Failure recovery

Start the Redis process on node 172.17.0.2

redis-server /etc/redis/6379.conf &

Check the log:

113:X 17 Jul 10:02:24.631 # -sdown slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
120:S 17 Jul 10:02:33.886 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
120:S 17 Jul 10:02:33.886 * SLAVE OF 172.17.0.4:6379 enabled (user request from 'id=2 addr=172.17.0.3:39643 fd=7 name=sentinel-9ec47b87-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
120:S 17 Jul 10:02:33.889 # CONFIG REWRITE executed with success.
120:S 17 Jul 10:02:34.755 * Connecting to MASTER 172.17.0.4:6379
120:S 17 Jul 10:02:34.756 * MASTER <-> SLAVE sync started
120:S 17 Jul 10:02:34.756 * Non blocking connect for SYNC fired the event.
120:S 17 Jul 10:02:34.756 * Master replied to PING, replication can continue...
120:S 17 Jul 10:02:34.756 * Trying a partial resynchronization (request c491477a647e8351f61a5edd36dc896049eaf86f:1).
120:S 17 Jul 10:02:34.758 * Full resync from master: 350c1de8bccc6d29411e343d029b5254edc68330:2478400
120:S 17 Jul 10:02:34.758 * Discarding previously cached master state.
120:S 17 Jul 10:02:34.790 * MASTER <-> SLAVE sync: receiving 188 bytes from master
120:S 17 Jul 10:02:34.790 * MASTER <-> SLAVE sync: Flushing old data
120:S 17 Jul 10:02:34.791 * MASTER <-> SLAVE sync: Loading DB in memory
120:S 17 Jul 10:02:34.791 * MASTER <-> SLAVE sync: Finished with success

The restarted Redis rewrote its configuration, connected to the new master, and completed replication, in that order.

Check the master's replication info again
172.17.0.4:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.0.3,port=6379,state=online,offset=2480192,lag=0
slave1:ip=172.17.0.2,port=6379,state=online,offset=2480057,lag=1
master_replid:350c1de8bccc6d29411e343d029b5254edc68330
master_replid2:19c4a1290245b212de71a98c36fab894d23be166
master_repl_offset:2480192
second_repl_offset:2412576
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1431617
repl_backlog_histlen:1048576

The old master (172.17.0.2) that went down has successfully become a slave of the new master (172.17.0.4).

Other operations

Adding and removing sentinel nodes

A newly added sentinel node is discovered automatically. Removing a sentinel takes these steps:

  • Stop the sentinel process
  • Run SENTINEL RESET * on all the other sentinels
  • Run SENTINEL MASTER <mastername> on all the other sentinels to check that they agree on the current set of sentinels

Permanently removing a slave

Run on every sentinel:

SENTINEL RESET <mastername>

Title: Redis Sentinel

Word count: 3.7k

Author: Waterandair

Published: 2018-03-16, 23:38:23

Last updated: 2019-12-28, 14:03:59

Original link: https://waterandair.github.io/2018-03-16-redis-sentinel.html

License: CC BY-NC-SA 4.0. Please keep the original link and author attribution when reposting.
