Redis's high-availability toolbox includes persistence, replication, Sentinel, and Cluster. This article covers Redis Sentinel.
Introduction
Sentinel is a key component of a distributed Redis deployment. Its main responsibilities are monitoring the cluster, sending notifications, and performing failover.
Core concepts
How sdown becomes odown
- sdown means subjectively down: a sentinel pings the master, and if no valid reply arrives within the number of milliseconds set by `down-after-milliseconds`, that sentinel subjectively considers the master down.
- odown means objectively down: if, within the configured time, the sentinel learns that at least `quorum` sentinels also consider the master sdown, the master is considered odown.
Automatic discovery within the sentinel cluster
Sentinels discover each other via pub/sub: every two seconds, each sentinel publishes a message containing its own information to the `__sentinel__:hello` channel of the cluster it monitors. Every sentinel also subscribes to `__sentinel__:hello`, which is how it learns of the other sentinels' existence; through the same channel the sentinels exchange their monitoring configuration for the master and keep it synchronized with one another.
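You can watch these hello messages yourself by subscribing to the channel on any monitored node (the address below is taken from the example deployment later in this article):

```
redis-cli -h 172.17.0.2 -p 6379 SUBSCRIBE __sentinel__:hello
```

Every two seconds per sentinel, a message should appear containing, among other fields, that sentinel's ip, port, and run id.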
Automatic correction of slave configuration
Sentinels also automatically correct slave configuration. For example, a sentinel makes sure that a slave about to be promoted to master is actually replicating the current master's data, and after a failover it makes sure every slave is connected to the correct master.
Master election algorithm
When a master is judged odown and a majority of the sentinels have authorized the failover, a new master is elected from among the slaves.
The steps for selecting the slave:
- First, unsuitable slaves are filtered out: a slave that has been disconnected from the master for longer than `(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state` cannot be elected as the new master.
- The remaining slaves are ranked by slave priority; the lower the `slave-priority` value, the higher the priority.
- If the slave priorities are equal, the replica offsets are compared; the slave that has replicated the most data wins.
- If both slave priority and replica offset are equal, the slave with the smallest run id (compared lexicographically) is chosen.
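The election can be biased toward a particular node through the `slave-priority` directive in that node's redis.conf (the value 10 below is just an illustrative choice):

```
# In redis.conf on the preferred failover target.
# A lower value makes promotion more likely; 0 means this slave is never promoted.
slave-priority 10
```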
quorum and majority
Before a master/slave switch can happen, `quorum` sentinels must first consider the master odown; a sentinel is then elected to perform the switch, and that sentinel must additionally be authorized by a majority of the sentinels before it may actually execute it.
If quorum < majority, a majority of sentinels must grant authorization. For example, with 5 sentinels the majority is 3, so with quorum set to 2, 3 authorizing sentinels are enough to perform the switch.
If quorum >= majority, then quorum sentinels must all grant authorization. For example, with 5 sentinels and quorum set to 5, all 5 sentinels must agree before the switch can be executed.
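Whether the currently reachable sentinels are enough both to reach the quorum and to authorize a failover can be checked with the `SENTINEL CKQUORUM` command (the address below comes from the example deployment later in this article):

```
redis-cli -h 172.17.0.2 -p 26379 SENTINEL CKQUORUM mymaster
```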
configuration epoch
The sentinels monitor the master and its slaves using the corresponding monitoring configuration. The sentinel that performs the switch obtains a configuration epoch from the new master it is switching to. This is a version number, and the version number of every switch must be unique.
If the first elected sentinel fails to complete the switch, the other sentinels wait for `failover-timeout` and then retry the switch, obtaining a new configuration epoch as the new version number.
Configuration propagation
After a sentinel completes the switch, it generates the latest master configuration locally, including the new version number, and then propagates it to the other sentinels via the pub/sub message mechanism. Each of the other sentinels compares version numbers to decide whether to update its own master configuration.
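The epoch a sentinel currently holds for a master can be inspected with `SENTINEL master <mastername>`; its reply lists the master's attributes, which include a `config-epoch` field:

```
redis-cli -h 172.17.0.2 -p 26379 SENTINEL master mymaster
# look for the config-epoch field in the reply
```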
Data loss problems
Data can be lost during a failover, i.e. a master/slave switch.
Data loss caused by asynchronous replication
Redis replication is asynchronous, so if the master goes down while data is still being replicated, that data is lost.
Data loss caused by split brain
A master can temporarily lose its connections to its slaves because of a network problem; the sentinels may then mistakenly conclude that the master is down and start a failover, leaving the sentinel-managed cluster with two masters. Because the old master is still accepting requests as normal, the writes it accepts during this window are lost when it is later reconfigured as a slave, producing data inconsistency and data loss.
Mitigating the data loss problems
Configuring the following two options reduces the data loss caused by asynchronous replication and by split brain:

```
# Require at least 1 slave whose replication lag is no more than 10 seconds;
# if every slave lags by more than 10 seconds, the master stops accepting writes.
min-slaves-to-write 1
min-slaves-max-lag 10
```
In detail:
`min-slaves-max-lag` ensures that once a slave's replication lag grows too long, the master assumes that going down now would lose a large amount of data and starts rejecting write requests; this confines the data loss caused by asynchronous replication to a minimal window.
When a master suffers a split brain, it gradually loses its connections to the slaves. `min-slaves-to-write` sets a threshold on the number of currently connected slaves; if the count drops below the configured value, the master rejects client write requests outright.
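Both options can also be applied at runtime on the master, without a restart (the address below comes from the example deployment later in this article):

```
redis-cli -h 172.17.0.2 -p 6379 CONFIG SET min-slaves-to-write 1
redis-cli -h 172.17.0.2 -p 6379 CONFIG SET min-slaves-max-lag 10
```

Note that while the constraints are not satisfied the master answers writes with an error, so clients must be prepared to handle rejected writes, e.g. by retrying later.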
Hands-on practice
Configuration options

| Option | Description |
| --- | --- |
| `sentinel monitor` | Tells the sentinel which master to monitor |
| `sentinel down-after-milliseconds 1000` | Timeout threshold between the sentinel and a Redis node |
| `sentinel failover-timeout 180000` | Timeout for executing a failover |
| `sentinel parallel-syncs 1` | Number of slaves that sync with the new master simultaneously after a slave is promoted |
Full configuration example
Prepare three Redis nodes
One master and two slaves; see the Redis replication article for setup.

| ip | role |
| --- | --- |
| 172.17.0.2 | master |
| 172.17.0.3 | slave |
| 172.17.0.4 | slave |
Edit the sentinel configuration file
All three nodes use the same configuration in /etc/redis/sentinel.conf:

```
sentinel monitor mymaster 172.17.0.2 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
```
Start the three sentinels one by one

```
redis-sentinel /etc/redis/sentinel.conf
```
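Equivalently, a sentinel can be started through the normal server binary with the `--sentinel` flag:

```
redis-server /etc/redis/sentinel.conf --sentinel
```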
Log in with the client and check the status
```
zj@zj-pc:~$ redis-cli -h 172.17.0.2 -p 26379
172.17.0.2:26379> sentinel get-master-addr-by-name mymaster
1) "172.17.0.2"
2) "6379"
```
Other status commands:
```
sentinel master mymaster
sentinel slaves mymaster
sentinel sentinels mymaster
```
Failover drill
Check the current master node
1 2 3
| 172.17.0.2:26379> sentinel get-master-addr-by-name mymaster 1) "172.17.0.2" 2) "6379"
|
Bring the master down manually
Find the pid:
```
root@507b5013f669:~# ps -aux
USER  PID %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
root    1  0.0  0.0 65508 5288 ?     Ss   Jul16 0:00 /usr/sbin/sshd -D
root   47  0.0  0.0 65508 6308 ?     Rs   Jul16 0:00 sshd: root@pts/0
root   49  0.0  0.0 18320 3324 pts/0 Ss   Jul16 0:00 -bash
root   94  0.3  0.0 39552 4936 pts/0 Sl   06:13 0:41 redis-server 0.0.0.0:6379
root  113  0.4  0.0 38412 4004 pts/0 Sl   06:31 0:50 redis-sentinel 0.0.0.0:26379 [sentinel]
root  118  0.0  0.0 34424 2892 pts/0 R+   09:56 0:00 ps -aux
```
Kill the process:
```
root@507b5013f669:~# kill -9 94
```
Remove the pid file:
```
rm /var/run/redis_6379.pid
```
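Alternatively, instead of killing the process, a switch can be triggered deliberately with the `SENTINEL FAILOVER` command, which forces a failover without asking the other sentinels for agreement:

```
redis-cli -h 172.17.0.2 -p 26379 SENTINEL FAILOVER mymaster
```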
Check the logs of the three sentinels
```
# node 172.17.0.2
113:X 17 Jul 09:57:04.675 # +sdown master mymaster 172.17.0.2 6379
113:X 17 Jul 09:57:04.711 # +new-epoch 1
113:X 17 Jul 09:57:04.719 # +vote-for-leader b9932b720e70c34059617068763213c5459516aa 1
113:X 17 Jul 09:57:04.758 # +odown master mymaster 172.17.0.2 6379 #quorum 3/2
113:X 17 Jul 09:57:04.758 # Next failover delay: I will not start a failover before Tue Jul 17 10:03:05 2018
113:X 17 Jul 09:57:05.552 # +config-update-from sentinel b9932b720e70c34059617068763213c5459516aa 172.17.0.4 26379 @ mymaster 172.17.0.2 6379
113:X 17 Jul 09:57:05.552 # +switch-master mymaster 172.17.0.2 6379 172.17.0.4 6379
113:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.4 6379
113:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379

# node 172.17.0.3
152:S 17 Jul 09:56:34.576 # Connection with master lost.
152:S 17 Jul 09:56:34.576 * Caching the disconnected master state.
152:S 17 Jul 09:56:35.379 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:56:35.379 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:56:35.379 # Error condition on socket for SYNC: Connection refused
152:S 17 Jul 09:56:36.382 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:56:36.382 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:56:36.382 # Error condition on socket for SYNC: Connection refused
152:S 17 Jul 09:56:37.386 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:56:37.386 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:56:37.386 # Error condition on socket for SYNC: Connection refused
......
161:X 17 Jul 09:57:04.639 # +sdown master mymaster 172.17.0.2 6379
161:X 17 Jul 09:57:04.710 # +new-epoch 1
161:X 17 Jul 09:57:04.719 # +vote-for-leader b9932b720e70c34059617068763213c5459516aa 1
152:S 17 Jul 09:57:05.460 * Connecting to MASTER 172.17.0.2:6379
152:S 17 Jul 09:57:05.461 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:57:05.461 # Error condition on socket for SYNC: Connection refused
152:S 17 Jul 09:57:05.552 * SLAVE OF 172.17.0.4:6379 enabled (user request from 'id=15 addr=172.17.0.4:32787 fd=14 name=sentinel-b9932b72-cmd age=12159 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=135 qbuf-free=32633 obl=36 oll=0 omem=0 events=r cmd=exec')
161:X 17 Jul 09:57:05.552 # +config-update-from sentinel b9932b720e70c34059617068763213c5459516aa 172.17.0.4 26379 @ mymaster 172.17.0.2 6379
161:X 17 Jul 09:57:05.552 # +switch-master mymaster 172.17.0.2 6379 172.17.0.4 6379
161:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.4 6379
161:X 17 Jul 09:57:05.552 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
152:S 17 Jul 09:57:05.554 # CONFIG REWRITE executed with success.
152:S 17 Jul 09:57:06.464 * Connecting to MASTER 172.17.0.4:6379
152:S 17 Jul 09:57:06.465 * MASTER <-> SLAVE sync started
152:S 17 Jul 09:57:06.465 * Non blocking connect for SYNC fired the event.
152:S 17 Jul 09:57:06.465 * Master replied to PING, replication can continue...
152:S 17 Jul 09:57:06.465 * Trying a partial resynchronization (request 19c4a1290245b212de71a98c36fab894d23be166:2412576).
152:S 17 Jul 09:57:06.466 * Successful partial resynchronization with master.
152:S 17 Jul 09:57:06.466 # Master replication ID changed to 350c1de8bccc6d29411e343d029b5254edc68330
152:S 17 Jul 09:57:06.466 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

# node 172.17.0.4
85:X 17 Jul 09:57:04.641 # +sdown master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.700 # +odown master mymaster 172.17.0.2 6379 #quorum 2/2
85:X 17 Jul 09:57:04.700 # +new-epoch 1
85:X 17 Jul 09:57:04.700 # +try-failover master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.702 # +vote-for-leader b9932b720e70c34059617068763213c5459516aa 1
85:X 17 Jul 09:57:04.719 # 2ded228e7ac4b38c47d8947cc243375a14c72061 voted for b9932b720e70c34059617068763213c5459516aa 1
85:X 17 Jul 09:57:04.719 # 9ec47b87ed2e9ef90a82d8b2752429fa22708ae1 voted for b9932b720e70c34059617068763213c5459516aa 1
85:X 17 Jul 09:57:04.754 # +elected-leader master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.754 # +failover-state-select-slave master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.838 # +selected-slave slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.838 * +failover-state-send-slaveof-noone slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:04.896 * +failover-state-wait-promotion slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.503 # +promoted-slave slave 172.17.0.4:6379 172.17.0.4 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.503 # +failover-state-reconf-slaves master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.551 * +slave-reconf-sent slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:05.842 # -odown master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.543 * +slave-reconf-inprog slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.543 * +slave-reconf-done slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.596 # +failover-end master mymaster 172.17.0.2 6379
85:X 17 Jul 09:57:06.596 # +switch-master mymaster 172.17.0.2 6379 172.17.0.4 6379
85:X 17 Jul 09:57:06.596 * +slave slave 172.17.0.3:6379 172.17.0.3 6379 @ mymaster 172.17.0.4 6379
85:X 17 Jul 09:57:06.596 * +slave slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
85:X 17 Jul 09:57:36.644 # +sdown slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
```
The logs show the following sequence:
- First, all three sentinels decide the master is sdown (`+sdown master mymaster 172.17.0.2 6379`)
- Once at least `quorum` sentinels agree on sdown, the state becomes odown (`+odown master mymaster 172.17.0.2 6379 #quorum 2/2`)
- The sentinels move to a new configuration version (`+new-epoch 1`)
- The sentinel on node 172.17.0.4 attempts to perform the failover (`+try-failover master mymaster 172.17.0.2 6379`)
- A leader sentinel is elected to carry out the switch (`+vote-for-leader b9932b720e70c34059617068763213c5459516aa 1`); note the vote is for a sentinel's run id, not for the node that will become master
- The selected slave is told to stop being a slave via `slaveof no one` (`+failover-state-send-slaveof-noone`), and the old master is no longer the master
- The sentinels begin rewriting the configuration of each Redis node
- The old master (172.17.0.2) is reconfigured as a slave; since it is still down, the sentinels mark it sdown
Check the current master node again
The master has switched from 172.17.0.2 to 172.17.0.4:

```
172.17.0.2:26379> sentinel get-master-addr-by-name mymaster
1) "172.17.0.4"
2) "6379"
```
Log in to 172.17.0.4 and check the replication info:

```
zj@zj-pc:~$ redis-cli -h 172.17.0.4
172.17.0.4:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.17.0.3,port=6379,state=online,offset=2468205,lag=0
master_replid:350c1de8bccc6d29411e343d029b5254edc68330
master_replid2:19c4a1290245b212de71a98c36fab894d23be166
master_repl_offset:2468205
second_repl_offset:2412576
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1419630
repl_backlog_histlen:1048576
```

The cluster currently has a single slave, 172.17.0.3, while 172.17.0.2 is still down.
Recovering from the failure
Start the Redis process on node 172.17.0.2:

```
redis-server /etc/redis/6379.conf &
```

Check the logs:

```
113:X 17 Jul 10:02:24.631 # -sdown slave 172.17.0.2:6379 172.17.0.2 6379 @ mymaster 172.17.0.4 6379
120:S 17 Jul 10:02:33.886 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
120:S 17 Jul 10:02:33.886 * SLAVE OF 172.17.0.4:6379 enabled (user request from 'id=2 addr=172.17.0.3:39643 fd=7 name=sentinel-9ec47b87-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
120:S 17 Jul 10:02:33.889 # CONFIG REWRITE executed with success.
120:S 17 Jul 10:02:34.755 * Connecting to MASTER 172.17.0.4:6379
120:S 17 Jul 10:02:34.756 * MASTER <-> SLAVE sync started
120:S 17 Jul 10:02:34.756 * Non blocking connect for SYNC fired the event.
120:S 17 Jul 10:02:34.756 * Master replied to PING, replication can continue...
120:S 17 Jul 10:02:34.756 * Trying a partial resynchronization (request c491477a647e8351f61a5edd36dc896049eaf86f:1).
120:S 17 Jul 10:02:34.758 * Full resync from master: 350c1de8bccc6d29411e343d029b5254edc68330:2478400
120:S 17 Jul 10:02:34.758 * Discarding previously cached master state.
120:S 17 Jul 10:02:34.790 * MASTER <-> SLAVE sync: receiving 188 bytes from master
120:S 17 Jul 10:02:34.790 * MASTER <-> SLAVE sync: Flushing old data
120:S 17 Jul 10:02:34.791 * MASTER <-> SLAVE sync: Loading DB in memory
120:S 17 Jul 10:02:34.791 * MASTER <-> SLAVE sync: Finished with success
```
The restarted Redis instance went through rewriting its configuration, connecting to the master, and replicating from it, in that order.
Check the master's replication info again:

```
172.17.0.4:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.0.3,port=6379,state=online,offset=2480192,lag=0
slave1:ip=172.17.0.2,port=6379,state=online,offset=2480057,lag=1
master_replid:350c1de8bccc6d29411e343d029b5254edc68330
master_replid2:19c4a1290245b212de71a98c36fab894d23be166
master_repl_offset:2480192
second_repl_offset:2412576
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1431617
repl_backlog_histlen:1048576
```

The old master (172.17.0.2), which had gone down, has successfully become a slave of the new master (172.17.0.4).
Other operations
Adding and removing sentinel nodes
Newly added sentinel nodes are discovered automatically. Removing a sentinel takes the following steps (a sketch of the commands follows this list):
- Stop the sentinel process
- Run `SENTINEL RESET *` on every other sentinel
- Run `SENTINEL MASTER <mastername>` on every other sentinel and check that they all agree on the number of sentinels
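A minimal sketch of the last two steps, assuming the default sentinel port 26379; the `num-other-sentinels` field in the `SENTINEL MASTER` reply is the count that should have dropped after the removal:

```
# run on every remaining sentinel (quote * so the shell does not expand it)
redis-cli -p 26379 SENTINEL RESET '*'
redis-cli -p 26379 SENTINEL MASTER mymaster   # check the num-other-sentinels field
```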
Permanently removing a slave
Run on every sentinel:

```
SENTINEL RESET <mastername>
```