When running MGR we often see errors like the following, which roughly mean: "due to a network failure, the member was expelled from the MGR cluster and its status was changed to ERROR." This is caused by a heartbeat timeout between MGR cluster nodes, after which the minority side is expelled.
2022-07-31t13:07:30.464239-00:00 0 [warning] [my-011498] [repl] plugin group_replication reported: 'the member has resumed contact with a majority of the members in the group. regular operation is restored and transactions are unblocked.'
2022-07-31t13:07:37.458761-00:00 0 [error] [my-011505] [repl] plugin group_replication reported: 'member was expelled from the group due to network failures, changing member status to error.'
Below, using two concrete cases, we analyze how the following three parameters affect MGR's behavior under network failures, and walk through MGR's failure-detection flow.
Cluster topology: single-primary mode
Node   IP address     Hostname         Role
node1  172.16.86.61   mysqlwjhtest01   PRIMARY
node2  172.16.86.62   mysqlwjhtest02   SECONDARY
node3  172.16.86.63   mysqlwjhtest03   SECONDARY
Three test parameters
#applies to 8.0.21 and later
group_replication_member_expel_timeout=30 #a suspect member is expelled from the group after 30 seconds
group_replication_unreachable_majority_timeout=0 #how long a minority member waits for an unreachable majority after a network partition
group_replication_autorejoin_tries=3 #number of rejoin attempts for a member in ERROR state
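For reference, the same settings can be applied at runtime on each member (a sketch; assumes a user with the privileges to run SET GLOBAL):

```sql
-- Runtime equivalent of the settings above (MySQL 8.0.21 and later)
SET GLOBAL group_replication_member_expel_timeout = 30;
SET GLOBAL group_replication_unreachable_majority_timeout = 0;
SET GLOBAL group_replication_autorejoin_tries = 3;
```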
The test has two main steps:
- Simulate a network partition and observe its effect on each node.
- Restore network connectivity and see how each node reacts.
Case 1: simulating a network outage on a SECONDARY node
1.1 Simulate the network partition on node3 (currently a SECONDARY node) by using iptables to cut the connections between node3 and node1/node2.
iptables -A INPUT -p tcp -s 172.16.86.61 -j DROP
iptables -A OUTPUT -p tcp -d 172.16.86.61 -j DROP
iptables -A INPUT -p tcp -s 172.16.86.62 -j DROP
iptables -A OUTPUT -p tcp -d 172.16.86.62 -j DROP
5 seconds after the commands complete (the MGR heartbeat interval, a fixed value that cannot be changed), the nodes begin to react.
a. First, check the logs and group status on the majority nodes (node1, node2). node3 shows as UNREACHABLE.
[warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 has become unreachable.'
root@localhost: 19:38: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | UNREACHABLE  | SECONDARY   | 8.0.28         |
| mysqlwjhtest02 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | ONLINE       | PRIMARY     | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
The log shows that once node3 was detected as unreachable, after 30 seconds (controlled by group_replication_member_expel_timeout=30) node3 was expelled from the group. The membership view is down to 2 nodes.
2023-05-24t19:38:39.413051 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 has become unreachable.'
2023-05-24t19:39:09.413491 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom_client_remove_node: try to push xcom_client_remove_node to xcom'
2023-05-24t19:39:09.468173 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-24t19:39:09.468299 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.61:13307'
2023-05-24t19:39:09.468320 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.62:13307'
2023-05-24t19:39:09.468336 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 29726 0}, boot key synode is {88e4dbe0 29715 0}, configured event horizon=10, my node identifier is 0'
2023-05-24t19:39:10.399352 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] a configuration change was detected. sending a global view message to all nodes. my node identifier is 0 and my address is 172.16.86.61:13307'
2023-05-24t19:39:10.404044 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] group is able to support up to communication protocol version 8.0.27'
2023-05-24t19:39:10.404139 08:00 0 [warning] [my-011499] [repl] plugin group_replication reported: 'members removed from the group: mysqlwjhtest03:3307'
2023-05-24t19:39:10.404219 08:00 0 [system] [my-011503] [repl] plugin group_replication reported: 'group membership changed to mysqlwjhtest02:3307, mysqlwjhtest01:3307 on view 16849117016333239:4.'
2023-05-24t19:39:20.951668 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] failure reading from fd=-1 n=18446744073709551615 from 172.16.86.63:13307'
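The 30-second expel window can be read straight off the two timestamps above ("has become unreachable" at 19:38:39, node removal at 19:39:09); a quick check with GNU date:

```shell
# Difference between the "has become unreachable" and "xcom_client_remove_node"
# log timestamps -- this should equal group_replication_member_expel_timeout.
t_unreachable=$(date -d "2023-05-24 19:38:39" +%s)
t_removed=$(date -d "2023-05-24 19:39:09" +%s)
echo $((t_removed - t_unreachable))   # 30
```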
At this point, the group status on the majority nodes (node1, node2) no longer shows node3.
root@localhost: 20:02: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest02 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | ONLINE       | PRIMARY     | 8.0.28         |
+----------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
b. Next, check the log and group status on the minority node (node3): node1 and node2 show as UNREACHABLE.
[warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest01:3307 has become unreachable.'
[warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 has become unreachable.'
[error] [my-011495] [repl] plugin group_replication reported: 'this server is not able to reach a majority of members in the group. this server will now block all updates. the server will remain blocked until contact with the majority is restored. it is possible to use group_replication_force_members to force a new group membership.'
Moreover, throughout the whole process the group status view kept showing node1 and node2 as UNREACHABLE. (This is because group_replication_unreachable_majority_timeout=0; we will come back to this parameter later.)
root@localhost: 19:54: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest02 | UNREACHABLE  | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | UNREACHABLE  | PRIMARY     | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
1.2 Restoring network connectivity
Next we restore the network between node3 and node1/node2.
iptables -F
a. Check the log on the former minority node (node3). It shows heartbeats detecting that node1 and node2 are back, then auto-rejoin re-adding the node to the group: GCS communication recovers, the xcom cache is applied, CHANGE MASTER is updated, and so on.
2023-05-24t20:20:52.475071 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.61:13307'
2023-05-24t20:20:53.655956 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.62:13307'
2023-05-24t20:21:03.214971 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest01:3307 is reachable again.'
2023-05-24t20:21:03.215075 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 is reachable again.'
2023-05-24t20:21:03.215100 08:00 0 [warning] [my-011498] [repl] plugin group_replication reported: 'the member has resumed contact with a majority of the members in the group. regular operation is restored and transactions are unblocked.'
2023-05-24t20:21:19.209028 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] a configuration change was detected. sending a global view message to all nodes. my node identifier is 2 and my address is 172.16.86.63:13307'
2023-05-24t20:21:20.208934 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest01:3307 has become unreachable.'
2023-05-24t20:21:20.209098 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 has become unreachable.'
2023-05-24t20:21:20.209131 08:00 0 [error] [my-011495] [repl] plugin group_replication reported: 'this server is not able to reach a majority of members in the group. this server will now block all updates. the server will remain blocked until contact with the majority is restored. it is possible to use group_replication_force_members to force a new group membership.'
2023-05-24t20:21:23.097685 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest01:3307 is reachable again.'
2023-05-24t20:21:23.097871 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 is reachable again.'
2023-05-24t20:21:23.097916 08:00 0 [warning] [my-011498] [repl] plugin group_replication reported: 'the member has resumed contact with a majority of the members in the group. regular operation is restored and transactions are unblocked.'
2023-05-24t20:21:23.107732 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-24t20:21:23.107836 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.61:13307'
2023-05-24t20:21:23.107854 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.62:13307'
2023-05-24t20:21:23.107872 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 29726 0}, boot key synode is {88e4dbe0 29715 0}, configured event horizon=10, my node identifier is 4294967295'
2023-05-24t20:21:26.125137 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sending a request to a remote xcom failed. please check the remote node log for more details.'
2023-05-24t20:21:26.125323 08:00 0 [error] [my-011505] [repl] plugin group_replication reported: 'member was expelled from the group due to network failures, changing member status to error.'
2023-05-24t20:21:26.125534 08:00 0 [error] [my-011712] [repl] plugin group_replication reported: 'the server was automatically set into read only mode after an error was detected.'
2023-05-24t20:21:26.126663 08:00 5344 [system] [my-013373] [repl] plugin group_replication reported: 'started auto-rejoin procedure attempt 1 of 3'
2023-05-24t20:21:26.127111 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom_client_remove_node: try to push xcom_client_remove_node to xcom'
2023-05-24t20:21:26.127169 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom_client_remove_node: failed to push into xcom.'
2023-05-24t20:21:26.291624 08:00 0 [system] [my-011504] [repl] plugin group_replication reported: 'group membership changed: this member has left the group.'
2023-05-24t20:21:26.293673 08:00 13 [note] [my-010596] [repl] error reading relay log event for channel 'group_replication_applier': slave sql thread was killed
2023-05-24t20:21:26.294631 08:00 13 [note] [my-010587] [repl] slave sql thread for channel 'group_replication_applier' exiting, replication stopped in log 'first' at position 0
2023-05-24t20:21:26.294877 08:00 11 [note] [my-011444] [repl] plugin group_replication reported: 'the group replication applier thread was killed.'
2023-05-24t20:21:26.296001 08:00 5344 [note] [my-011673] [repl] plugin group_replication reported: 'group communication ssl configuration: group_replication_ssl_mode: "disabled"'
2023-05-24t20:21:26.299638 08:00 5344 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] debug messages will be sent to: asynchronous::/data/mysql/data/gcs_debug_trace'
2023-05-24t20:21:26.300015 08:00 5344 [warning] [my-011735] [repl] plugin group_replication reported: '[gcs] automatically adding ipv4 localhost address to the allowlist. it is mandatory that it is added.'
2023-05-24t20:21:26.300062 08:00 5344 [warning] [my-011735] [repl] plugin group_replication reported: '[gcs] automatically adding ipv6 localhost address to the allowlist. it is mandatory that it is added.'
2023-05-24t20:21:26.300132 08:00 5344 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] ssl was not enabled'
2023-05-24t20:21:26.301794 08:00 5346 [system] [my-010597] [repl] 'change master to for channel 'group_replication_applier' executed'. previous state master_host='<null>', master_port= 0, master_log_file='', master_log_pos= 3942, master_bind=''. new state master_host='<null>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2023-05-24t20:21:26.331516 08:00 5344 [note] [my-011670] [repl] plugin group_replication reported: 'group replication applier module successfully initialized!'
2023-05-24t20:21:26.331583 08:00 5348 [note] [my-010581] [repl] slave sql thread for channel 'group_replication_applier' initialized, starting replication in log 'first' at position 0, relay log '/data/mysql/relaylog/mysql-relay-bin-group_replication_applier.000002' position: 4213
2023-05-24t20:21:26.473164 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using xcom as communication stack for xcom'
2023-05-24t20:21:26.473594 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom initialized and ready to accept incoming connections on port 13307'
2023-05-24t20:21:26.473792 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] successfully connected to the local xcom via anonymous pipe'
2023-05-24t20:21:26.474619 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] tcp_nodelay already set'
2023-05-24t20:21:26.474690 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully connected to peer 172.16.86.61:13307. sending a request to be added to the group'
At this point the group status on the former minority node (node3) is back to normal as well.
root@localhost: 20:20: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest02 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | ONLINE       | PRIMARY     | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
b. Check the logs on the former majority nodes (node1, node2): in short, they establish connections with node3 and the group recovers.
2023-05-24t20:21:26.124082 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] node has already been removed: 172.16.86.63:13307 16849122691765501.'
2023-05-24t20:21:26.522275 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] adding new node to the configuration: 172.16.86.63:13307'
2023-05-24t20:21:26.522513 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-24t20:21:26.522574 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.61:13307'
2023-05-24t20:21:26.522609 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.62:13307'
2023-05-24t20:21:26.522624 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 2 host 172.16.86.63:13307'
2023-05-24t20:21:26.522641 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 32296 0}, boot key synode is {88e4dbe0 32285 0}, configured event horizon=10, my node identifier is 0'
2023-05-24t20:21:26.522899 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sender task disconnected from 172.16.86.63:13307'
2023-05-24t20:21:26.522931 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connecting to 172.16.86.63:13307'
2023-05-24t20:21:26.523349 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.63:13307'
2023-05-24t20:21:27.452375 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] a configuration change was detected. sending a global view message to all nodes. my node identifier is 0 and my address is 172.16.86.61:13307'
2023-05-24t20:21:27.453687 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] received an xcom snapshot request from 172.16.86.63:13307'
That was how the MGR cluster changed during a network failure on a MySQL 8.0 MGR SECONDARY node. In the end, once the network recovered the cluster healed itself; the whole process needed no DBA intervention.
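During recovery, node3's log showed "started auto-rejoin procedure attempt 1 of 3". Per the MySQL 8.0 manual, whether such an attempt is currently running can be checked from the stage events:

```sql
-- Returns a row while the auto-rejoin procedure is in progress
SELECT * FROM performance_schema.events_stages_current
 WHERE EVENT_NAME LIKE '%auto-rejoin%';
```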
Case 2: simulating a network outage on the PRIMARY node
2.1 Continue the partition experiment, this time on node1 (the current PRIMARY node), by using iptables to cut the connections between node1 and node2/node3.
iptables -A INPUT -p tcp -s 172.16.86.62 -j DROP
iptables -A OUTPUT -p tcp -d 172.16.86.62 -j DROP
iptables -A INPUT -p tcp -s 172.16.86.63 -j DROP
iptables -A OUTPUT -p tcp -d 172.16.86.63 -j DROP
date
Sat May 27 09:29:46 CST 2023
5 seconds after the commands complete, the nodes begin to react.
a. First, check the logs and group status on the majority nodes (node2, node3).
A failure of the PRIMARY triggers a primary failover. Let's look at the timeline in detail:
- 09:29:46 network cut
- 09:29:51 5 seconds later, the heartbeat timeout fires
- 09:30:21 35 seconds after the cut, node3 is promoted to primary
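The 35 seconds is just the fixed 5-second heartbeat detection window plus the 30-second expel timeout:

```shell
heartbeat=5        # fixed detection window, not tunable
expel_timeout=30   # group_replication_member_expel_timeout
echo $((heartbeat + expel_timeout))   # 35 -> seconds from network cut to new primary
```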
2023-05-27t09:29:51.229977 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest01:3307 has become unreachable.'
2023-05-27t09:30:20.947030 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-27t09:30:20.947163 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.62:13307'
2023-05-27t09:30:20.947191 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.63:13307'
2023-05-27t09:30:20.947217 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 309739 1}, boot key synode is {88e4dbe0 309728 1}, configured event horizon=10, my node identifier is 1'
2023-05-27t09:30:21.689907 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] group is able to support up to communication protocol version 8.0.27'
2023-05-27t09:30:21.689994 08:00 0 [warning] [my-011499] [repl] plugin group_replication reported: 'members removed from the group: mysqlwjhtest01:3307'
2023-05-27t09:30:21.690006 08:00 0 [system] [my-011500] [repl] plugin group_replication reported: 'primary server with address mysqlwjhtest01:3307 left the group. electing new primary.'
2023-05-27t09:30:21.690097 08:00 0 [note] [my-013519] [repl] plugin group_replication reported: 'elected primary member gtid_executed: debbaad4-1fd7-4652-80a4-5c87cfde0db6:1-16'
2023-05-27t09:30:21.690108 08:00 0 [note] [my-013519] [repl] plugin group_replication reported: 'elected primary member applier channel received_transaction_set: debbaad4-1fd7-4652-80a4-5c87cfde0db6:1-16'
2023-05-27t09:30:21.690123 08:00 0 [system] [my-011507] [repl] plugin group_replication reported: 'a new primary with address mysqlwjhtest03:3307 was elected. the new primary will execute all previous group transactions before allowing writes.'
2023-05-27t09:30:21.690509 08:00 0 [system] [my-011503] [repl] plugin group_replication reported: 'group membership changed to mysqlwjhtest03:3307, mysqlwjhtest02:3307 on view 16849117016333239:10.'
2023-05-27t09:30:21.691815 08:00 6189 [system] [my-013731] [repl] plugin group_replication reported: 'the member action "mysql_disable_super_read_only_if_primary" for event "after_primary_election" with priority "1" will be run.'
2023-05-27t09:30:21.692135 08:00 6189 [system] [my-011566] [repl] plugin group_replication reported: 'setting super_read_only=off.'
2023-05-27t09:30:21.692186 08:00 6189 [system] [my-013731] [repl] plugin group_replication reported: 'the member action "mysql_start_failover_channels_if_primary" for event "after_primary_election" with priority "10" will be run.'
2023-05-27t09:30:21.693182 08:00 6181 [note] [my-011485] [repl] plugin group_replication reported: 'primary had applied all relay logs, disabled conflict detection.'
2023-05-27t09:30:21.693200 08:00 68187 [system] [my-011510] [repl] plugin group_replication reported: 'this server is working as primary member.'
2023-05-27t09:30:31.213720 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] failure reading from fd=-1 n=18446744073709551615 from 172.16.86.61:13307'
The group status on the majority nodes (node2, node3) shows that node3 has been elected as the new primary.
#State as seen on node2/node3
root@localhost: 09:30: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | ONLINE       | PRIMARY     | 8.0.28         |
| mysqlwjhtest02 | ONLINE       | SECONDARY   | 8.0.28         |
+----------------+--------------+-------------+----------------+
2 rows in set (0.01 sec)
b. Check the log on the minority node (node1).
node1's log says little: after the heartbeat timeout it blocked all writes, with no mention of how long the blocking would last.
2023-05-27t09:29:51.277851 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 has become unreachable.'
2023-05-27t09:29:51.277990 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 has become unreachable.'
2023-05-27t09:29:51.278007 08:00 0 [error] [my-011495] [repl] plugin group_replication reported: 'this server is not able to reach a majority of members in the group. this server will now block all updates. the server will remain blocked until contact with the majority is restored. it is possible to use group_replication_force_members to force a new group membership.'
Checking node1's group status, we find that after the network partition the minority node (node1) still shows itself as "ONLINE, PRIMARY".
#State as seen on node1
root@localhost: 09:30: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | UNREACHABLE  | SECONDARY   | 8.0.28         |
| mysqlwjhtest02 | UNREACHABLE  | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | ONLINE       | PRIMARY     | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
So, comparing with the majority nodes' view: after the network partition, node1 and node3 both report an "ONLINE, PRIMARY" member, but the minority node (node1) blocks all updates in its sessions, i.e. node1 is effectively read-only. If MGR VIP failover logic relies solely on the "ONLINE, PRIMARY" state, this is a problem.
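A VIP health check therefore needs more than the member state. A hypothetical shell sketch (the vip_ok helper and its inputs are invented for illustration; in practice the counts would come from performance_schema.replication_group_members): hold the VIP only if the local node is an ONLINE PRIMARY *and* still sees a majority of members ONLINE.

```shell
# vip_ok MY_STATE MY_ROLE ONLINE_COUNT TOTAL_MEMBERS
# Succeeds only for an ONLINE PRIMARY that still sees a majority ONLINE.
vip_ok() {
  local my_state="$1" my_role="$2" online="$3" total="$4"
  [ "$my_state" = "ONLINE" ] && [ "$my_role" = "PRIMARY" ] \
    && [ "$online" -gt $((total / 2)) ]
}

vip_ok ONLINE PRIMARY 3 3 && echo keep || echo release   # healthy primary -> keep
vip_ok ONLINE PRIMARY 1 3 && echo keep || echo release   # partitioned node1 -> release
```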
Now we try to write data on node1 and find the statement blocks. After the partition node1 is alone, a minority that cannot satisfy Group Replication's majority requirement, so this is expected.
root@localhost: 10:10: [(none)]> create database db1;
...... hangs here indefinitely
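The indefinite blocking on the minority side is the group_replication_unreachable_majority_timeout=0 behavior. With a non-zero value (the 10 below is only an example value), a minority member instead waits that many seconds, rolls back its pending transactions, and moves to ERROR state (then runs auto-rejoin if configured):

```sql
-- Example value only: after 10s without a majority, error out instead of blocking
SET GLOBAL group_replication_unreachable_majority_timeout = 10;
```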
2.2 Restoring network connectivity
Next we restore the network between node1 and node2/node3.
iptables -F
a. Compare the group status changes on the minority node (node1) and the majority nodes (node2, node3).
Here I noticed something interesting. First, the UNREACHABLE members flip back to ONLINE on every node; yet at that moment the minority node's (node1) view still shows node1 itself as "ONLINE, PRIMARY". Only two or three seconds later does node1's view move the primary from node1 to node3. Per the logs, after the network recovers the members go from UNREACHABLE back to ONLINE; then an xcom-layer configuration change finally produces an error and the node rejoins the group.
Here is the group status change I observed on the former minority node (node1):
#Right after the network recovers, the primary is still node1
root@localhost: 10:17: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest02 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | ONLINE       | PRIMARY     | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
#About two or three seconds later, the primary becomes node3.
root@localhost: 10:24: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | ONLINE       | PRIMARY     | 8.0.28         |
| mysqlwjhtest02 | ONLINE       | SECONDARY   | 8.0.28         |
| mysqlwjhtest01 | ONLINE       | SECONDARY   | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
Log from the former minority node (node1) during network recovery:
2023-05-27t10:17:07.490608 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.63:13307'
2023-05-27t10:17:08.491033 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.62:13307'
2023-05-27t10:17:08.586063 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] a configuration change was detected. sending a global view message to all nodes. my node identifier is 0 and my address is 172.16.86.61:13307'
2023-05-27t10:17:08.586486 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 is reachable again.'
2023-05-27t10:17:08.586546 08:00 0 [warning] [my-011498] [repl] plugin group_replication reported: 'the member has resumed contact with a majority of the members in the group. regular operation is restored and transactions are unblocked.'
2023-05-27t10:17:09.202537 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 is reachable again.'
2023-05-27t10:17:20.197898 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 has become unreachable.'
2023-05-27t10:17:20.198008 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 has become unreachable.'
2023-05-27t10:17:20.198026 08:00 0 [error] [my-011495] [repl] plugin group_replication reported: 'this server is not able to reach a majority of members in the group. this server will now block all updates. the server will remain blocked until contact with the majority is restored. it is possible to use group_replication_force_members to force a new group membership.'
2023-05-27t10:17:20.815474 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] a configuration change was detected. sending a global view message to all nodes. my node identifier is 0 and my address is 172.16.86.61:13307'
2023-05-27t10:17:20.815730 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 is reachable again.'
2023-05-27t10:17:38.361435 08:00 0 [warning] [my-011498] [repl] plugin group_replication reported: 'the member has resumed contact with a majority of the members in the group. regular operation is restored and transactions are unblocked.'
2023-05-27t10:17:38.361659 08:00 0 [warning] [my-011494] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 is reachable again.'
2023-05-27t10:17:40.489269 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-27t10:17:40.489357 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.62:13307'
2023-05-27t10:17:40.489370 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.63:13307'
2023-05-27t10:17:40.489382 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 309739 1}, boot key synode is {88e4dbe0 309728 1}, configured event horizon=10, my node identifier is 4294967295'
2023-05-27t10:17:43.526534 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sending a request to a remote xcom failed. please check the remote node log for more details.'
2023-05-27t10:17:43.526695 08:00 0 [error] [my-011505] [repl] plugin group_replication reported: 'member was expelled from the group due to network failures, changing member status to error.'
2023-05-27t10:17:43.526856 08:00 0 [warning] [my-011630] [repl] plugin group_replication reported: 'due to a plugin error, some transactions were unable to be certified and will now rollback.'
2023-05-27t10:17:43.526929 08:00 0 [error] [my-011712] [repl] plugin group_replication reported: 'the server was automatically set into read only mode after an error was detected.'
2023-05-27t10:17:43.527329 08:00 53521 [error] [my-011615] [repl] plugin group_replication reported: 'error while waiting for conflict detection procedure to finish on session 53521'
2023-05-27t10:17:43.527405 08:00 53521 [error] [my-010207] [repl] run function 'before_commit' in plugin 'group_replication' failed
2023-05-27t10:17:43.533122 08:00 53521 [note] [my-010914] [server] aborted connection 53521 to db: 'unconnected' user: 'root' host: 'localhost' (error on observer while running replication hook 'before_commit').
2023-05-27t10:17:43.533134 08:00 53627 [system] [my-011565] [repl] plugin group_replication reported: 'setting super_read_only=on.'
2023-05-27t10:17:43.533523 08:00 53628 [system] [my-013373] [repl] plugin group_replication reported: 'started auto-rejoin procedure attempt 1 of 3'
2023-05-27t10:17:43.533650 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom_client_remove_node: try to push xcom_client_remove_node to xcom'
2023-05-27t10:17:43.533682 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom_client_remove_node: failed to push into xcom.'
2023-05-27t10:17:44.837043 08:00 0 [system] [my-011504] [repl] plugin group_replication reported: 'group membership changed: this member has left the group.'
2023-05-27t10:17:44.838114 08:00 13 [note] [my-010596] [repl] error reading relay log event for channel 'group_replication_applier': slave sql thread was killed
2023-05-27t10:17:44.838433 08:00 13 [note] [my-010587] [repl] slave sql thread for channel 'group_replication_applier' exiting, replication stopped in log 'first' at position 0
2023-05-27t10:17:44.838544 08:00 11 [note] [my-011444] [repl] plugin group_replication reported: 'the group replication applier thread was killed.'
2023-05-27t10:17:44.839024 08:00 53628 [note] [my-011673] [repl] plugin group_replication reported: 'group communication ssl configuration: group_replication_ssl_mode: "disabled"'
2023-05-27t10:17:44.840989 08:00 53628 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] debug messages will be sent to: asynchronous::/data/mysql/data/gcs_debug_trace'
2023-05-27t10:17:44.841282 08:00 53628 [warning] [my-011735] [repl] plugin group_replication reported: '[gcs] automatically adding ipv4 localhost address to the allowlist. it is mandatory that it is added.'
2023-05-27t10:17:44.841313 08:00 53628 [warning] [my-011735] [repl] plugin group_replication reported: '[gcs] automatically adding ipv6 localhost address to the allowlist. it is mandatory that it is added.'
2023-05-27t10:17:44.841357 08:00 53628 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] ssl was not enabled'
2023-05-27t10:17:44.842115 08:00 53630 [system] [my-010597] [repl] 'change master to for channel 'group_replication_applier' executed'. previous state master_host='<null>', master_port= 0, master_log_file='', master_log_pos= 2046, master_bind=''. new state master_host='<null>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2023-05-27t10:17:44.860923 08:00 53628 [note] [my-011670] [repl] plugin group_replication reported: 'group replication applier module successfully initialized!'
2023-05-27t10:17:44.861050 08:00 53632 [note] [my-010581] [repl] slave sql thread for channel 'group_replication_applier' initialized, starting replication in log 'first' at position 0, relay log '/data/mysql/relaylog/mysql-relay-bin-group_replication_applier.000002' position: 2317
2023-05-27t10:17:45.006131 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using xcom as communication stack for xcom'
2023-05-27t10:17:45.006450 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom initialized and ready to accept incoming connections on port 13307'
2023-05-27t10:17:45.006546 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] successfully connected to the local xcom via anonymous pipe'
2023-05-27t10:17:45.007262 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] tcp_nodelay already set'
2023-05-27t10:17:45.007298 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully connected to peer 172.16.86.62:13307. sending a request to be added to the group'
2023-05-27t10:17:45.007314 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sending add_node request to a peer xcom node'
2023-05-27t10:17:47.166146 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] node has not booted. requesting an xcom snapshot from node number 1 in the current configuration'
2023-05-27t10:17:47.167161 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] installing requested snapshot. importing all incoming configurations.'
2023-05-27t10:17:47.167358 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-27t10:17:47.167393 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] creating new server node 0 host 172.16.86.61:13307'
2023-05-27t10:17:47.167502 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] creating new server node 1 host 172.16.86.62:13307'
2023-05-27t10:17:47.167613 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 36985 0}, boot key synode is {88e4dbe0 36974 0}, configured event horizon=10, my node identifier is 0'
2023-05-27t10:17:47.167699 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-27t10:17:47.167723 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.61:13307'
2023-05-27t10:17:47.167747 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.62:13307'
2023-05-27t10:17:47.167761 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] creating new server node 2 host 172.16.86.63:13307'
2023-05-27t10:17:47.167866 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 37375 0}, boot key synode is {88e4dbe0 37364 0}, configured event horizon=10, my node identifier is 0'
2023-05-27t10:17:47.167964 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-27t10:17:47.167980 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.62:13307'
2023-05-27t10:17:47.167993 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.63:13307'
2023-05-27t10:17:47.168009 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 309739 1}, boot key synode is {88e4dbe0 309728 1}, configured event horizon=10, my node identifier is 4294967295'
2023-05-27t10:17:47.168089 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] updating physical connections to other servers'
2023-05-27t10:17:47.168104 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 0 host 172.16.86.62:13307'
2023-05-27t10:17:47.168117 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 1 host 172.16.86.63:13307'
2023-05-27t10:17:47.168131 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] using existing server node 2 host 172.16.86.61:13307'
2023-05-27t10:17:47.168146 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] sucessfully installed new site definition. start synode for this configuration is {88e4dbe0 312618 0}, boot key synode is {88e4dbe0 312607 0}, configured event horizon=10, my node identifier is 2'
2023-05-27t10:17:47.168161 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] finished snapshot installation. my node number is 2'
2023-05-27t10:17:47.168269 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] the member has joined the group. local port: 13307'
2023-05-27t10:17:47.168610 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connecting to 172.16.86.62:13307'
2023-05-27t10:17:47.168910 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.62:13307'
2023-05-27t10:17:47.168950 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connecting to 172.16.86.63:13307'
2023-05-27t10:17:47.169306 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] connected to 172.16.86.63:13307'
2023-05-27t10:17:47.658374 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] this server adjusted its communication protocol to 8.0.27 in order to join the group.'
2023-05-27t10:17:47.658436 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] group is able to support up to communication protocol version 8.0.27'
2023-05-27t10:17:47.660220 08:00 53628 [system] [my-013375] [repl] plugin group_replication reported: 'auto-rejoin procedure attempt 1 of 3 finished. member was able to join the group.'
2023-05-27t10:17:48.661179 08:00 0 [system] [my-013471] [repl] plugin group_replication reported: 'distributed recovery will transfer data using: incremental recovery from a group donor'
2023-05-27t10:17:48.661739 08:00 53647 [note] [my-011576] [repl] plugin group_replication reported: 'establishing group recovery connection with a possible donor. attempt 1/10'
2023-05-27t10:17:48.661780 08:00 0 [system] [my-011503] [repl] plugin group_replication reported: 'group membership changed to mysqlwjhtest03:3307, mysqlwjhtest02:3307, mysqlwjhtest01:3307 on view 16849117016333239:11.'
2023-05-27t10:17:48.693433 08:00 53647 [system] [my-010597] [repl] 'change master to for channel 'group_replication_recovery' executed'. previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. new state master_host='mysqlwjhtest02', master_port= 3307, master_log_file='', master_log_pos= 4, master_bind=''.
2023-05-27t10:17:48.727090 08:00 53647 [note] [my-011580] [repl] plugin group_replication reported: 'establishing connection to a group replication recovery donor 62a5b927-fa01-11ed-82ce-0050569c05d6 at mysqlwjhtest02 port: 3307.'
2023-05-27t10:17:48.727967 08:00 53648 [warning] [my-010897] [repl] storing mysql user name or password information in the master info repository is not secure and is therefore not recommended. please consider using the user and password connection options for start slave; see the 'start slave syntax' in the mysql manual for more information.
2023-05-27t10:17:48.730254 08:00 53648 [system] [my-010562] [repl] slave i/o thread for channel 'group_replication_recovery': connected to master 'repl@mysqlwjhtest02:3307',replication started in log 'first' at position 4
2023-05-27t10:17:48.733307 08:00 53649 [note] [my-010581] [repl] slave sql thread for channel 'group_replication_recovery' initialized, starting replication in log 'first' at position 0, relay log '/data/mysql/relaylog/mysql-relay-bin-group_replication_recovery.000001' position: 4
2023-05-27t10:17:48.757900 08:00 53647 [note] [my-011585] [repl] plugin group_replication reported: 'terminating existing group replication donor connection and purging the corresponding logs.'
2023-05-27t10:17:48.758896 08:00 53649 [note] [my-010587] [repl] slave sql thread for channel 'group_replication_recovery' exiting, replication stopped in log 'mysql-bin.000001' at position 6217
2023-05-27t10:17:48.759258 08:00 53648 [note] [my-011026] [repl] slave i/o thread killed while reading event for channel 'group_replication_recovery'.
2023-05-27t10:17:48.759324 08:00 53648 [note] [my-010570] [repl] slave i/o thread exiting for channel 'group_replication_recovery', read up to log 'mysql-bin.000001', position 6217
2023-05-27t10:17:48.796509 08:00 53647 [system] [my-010597] [repl] 'change master to for channel 'group_replication_recovery' executed'. previous state master_host='mysqlwjhtest02', master_port= 3307, master_log_file='', master_log_pos= 4, master_bind=''. new state master_host='<null>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2023-05-27t10:17:48.831253 08:00 0 [system] [my-011490] [repl] plugin group_replication reported: 'this server was declared online within the replication group.'
2023-05-27t10:29:27.705642 08:00 53768 [note] [my-013730] [server] 'wait_timeout' period of 300 seconds was exceeded for `root`@`localhost`. the idle time since last command was too long.
2023-05-27t10:29:27.705830 08:00 53768 [note] [my-010914] [server] aborted connection 53768 to db: 'unconnected' user: 'root' host: 'localhost' (the client was disconnected by the server because of inactivity.).
In the two cases above, I set the majority-unreachable timeout (group_replication_unreachable_majority_timeout) to 0, so after heartbeat detection timed out, the suspect node was only marked unreachable. Because the expel timeout was set to 30 seconds (group_replication_member_expel_timeout=30), the suspect node was expelled from the majority partition 35 seconds after the network was cut, while the suspect nodes seen by the minority partition stayed unreachable indefinitely.
How group_replication_unreachable_majority_timeout affects the cluster
To make the behavior easier to observe, I set it to 80 seconds and repeated the steps of case 2.
group_replication_member_expel_timeout=30 #unchanged from the cases above: 30 seconds
group_replication_unreachable_majority_timeout=80 #set to 80 seconds
Note: group_replication_unreachable_majority_timeout is the timeout for a member that cannot reach a majority of the group. After a network partition it therefore only affects the minority partition.
At this point the primary is node3, so cut the network between node3 and the other two nodes.
iptables -A INPUT -p tcp -s 172.16.86.62 -j DROP
iptables -A OUTPUT -p tcp -d 172.16.86.62 -j DROP
iptables -A INPUT -p tcp -s 172.16.86.61 -j DROP
iptables -A OUTPUT -p tcp -d 172.16.86.61 -j DROP
date
#to restore the network: iptables -F
Observe the cluster state on each node while the primary's network is down:
- 13:15:56 network outage simulated.
- 13:16:01 (5s in) heartbeat detection notices the outage. The minority node (node3) logs that all updates will be blocked for 85 seconds and marks the other two nodes unreachable; meanwhile the majority nodes (node1, node2) mark node3 unreachable in their group view.
- 13:16:31 (35s in) the majority nodes' (node1, node2) logs show node3 expelled from the group and a new primary elected. The minority node logs nothing; in node3's own view the primary is still itself (but all updates remain blocked).
- 13:17:26 (85s in) the minority node (node3) changes its own state in its group view from online/primary to error. The majority logs nothing.
- 13:17:56 (115s in) the minority node (node3) removes node1 and node2 from its group view.
5 seconds in, heartbeat detection times out: every node detects the unreachable peers and updates its group view.
#minority node (node3)
2023-05-27t13:16:01.716803 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest01:3307 has become unreachable.'
2023-05-27t13:16:01.716934 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest02:3307 has become unreachable.'
2023-05-27t13:16:01.716951 08:00 0 [error] [my-011496] [repl] plugin group_replication reported: 'this server is not able to reach a majority of members in the group. this server will now block all updates. the server will remain blocked for the next 85 seconds. unless contact with the majority is restored, after this time the member will error out and leave the group. it is possible to use group_replication_force_members to force a new group membership.'
2023-05-27t13:17:26.724015 08:00 71231 [error] [my-011711] [repl] plugin group_replication reported: 'this member could not reach a majority of the members for more than 85 seconds. the member will now leave the group as instructed by the group_replication_unreachable_majority_timeout option.'
#majority nodes (node1, node2)
2023-05-27t13:16:01.719345 08:00 0 [warning] [my-011493] [repl] plugin group_replication reported: 'member with address mysqlwjhtest03:3307 has become unreachable.'
35 seconds in, the majority nodes (node1, node2) expel node3 from the group, elect node2 as the new primary, and update their views. The faulty node logs nothing.
2023-05-27t13:16:32.706137 08:00 0 [warning] [my-011499] [repl] plugin group_replication reported: 'members removed from the group: mysqlwjhtest03:3307'
2023-05-27t13:16:32.706150 08:00 0 [system] [my-011500] [repl] plugin group_replication reported: 'primary server with address mysqlwjhtest03:3307 left the group. electing new primary.'
2023-05-27t13:16:32.706259 08:00 0 [note] [my-013519] [repl] plugin group_replication reported: 'elected primary member gtid_executed: debbaad4-1fd7-4652-80a4-5c87cfde0db6:1-19'
2023-05-27t13:16:32.706272 08:00 0 [note] [my-013519] [repl] plugin group_replication reported: 'elected primary member applier channel received_transaction_set: debbaad4-1fd7-4652-80a4-5c87cfde0db6:1-19'
2023-05-27t13:16:32.706284 08:00 0 [system] [my-011507] [repl] plugin group_replication reported: 'a new primary with address mysqlwjhtest02:3307 was elected. the new primary will execute all previous group transactions before allowing writes.'
85 seconds in, the minority node (node3) changes its own state from online to error. The majority nodes log nothing.
2023-05-27t13:17:26.724015 08:00 71231 [error] [my-011711] [repl] plugin group_replication reported: 'this member could not reach a majority of the members for more than 85 seconds. the member will now leave the group as instructed by the group_replication_unreachable_majority_timeout option.'
2023-05-27t13:17:26.724483 08:00 0 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] xcom_client_remove_node: try to push xcom_client_remove_node to xcom'
2023-05-27t13:17:26.724560 08:00 71231 [error] [my-011712] [repl] plugin group_replication reported: 'the server was automatically set into read only mode after an error was detected.'
2023-05-27t13:17:26.725179 08:00 71231 [system] [my-011565] [repl] plugin group_replication reported: 'setting super_read_only=on.'
2023-05-27t13:17:26.725340 08:00 71231 [note] [my-011647] [repl] plugin group_replication reported: 'going to wait for view modification'
115 seconds in, the minority node (node3) removes node1 and node2 from its own group view, and auto-rejoin then attempts to re-join the group (if the network has not recovered within the 85s-115s window, the attempt fails).
2023-05-27t13:17:56.296998 08:00 0 [error] [my-011735] [repl] plugin group_replication reported: '[gcs] timeout while waiting for the group communication engine to exit!'
2023-05-27t13:17:56.297075 08:00 0 [error] [my-011735] [repl] plugin group_replication reported: '[gcs] the member has failed to gracefully leave the group.'
2023-05-27t13:17:56.396287 08:00 0 [system] [my-011504] [repl] plugin group_replication reported: 'group membership changed: this member has left the group.'
2023-05-27t13:17:56.396530 08:00 72720 [system] [my-013373] [repl] plugin group_replication reported: 'started auto-rejoin procedure attempt 1 of 3'
2023-05-27t13:17:56.397353 08:00 71913 [note] [my-010596] [repl] error reading relay log event for channel 'group_replication_applier': slave sql thread was killed
2023-05-27t13:17:56.397914 08:00 71913 [note] [my-010587] [repl] slave sql thread for channel 'group_replication_applier' exiting, replication stopped in log 'first' at position 0
2023-05-27t13:17:56.398188 08:00 71911 [note] [my-011444] [repl] plugin group_replication reported: 'the group replication applier thread was killed.'
2023-05-27t13:17:56.398830 08:00 72720 [note] [my-011673] [repl] plugin group_replication reported: 'group communication ssl configuration: group_replication_ssl_mode: "disabled"'
2023-05-27t13:17:56.400717 08:00 72720 [note] [my-011735] [repl] plugin group_replication reported: '[gcs] debug messages will be sent to: asynchronous::/data/mysql/data/gcs_debug_trace'
2023-05-27t13:17:56.400984 08:00 72720 [warning] [my-011735] [repl] plugin group_replication reported: '[gcs] automatically adding ipv4 localhost address to the allowlist. it is mandatory that it is added.'
2023-05-27t13:17:56.401014 08:00 72720 [warning] [my-011735] [repl] plugin group_replication reported: '[gcs] automatically adding ipv6 localhost address to the allowlist. it is mandatory that it is added.'
Group view changes on the minority node (node3)
#5 seconds in
root@localhost: 14:05: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;system date;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest01 | unreachable  | secondary   | 8.0.28         |
| mysqlwjhtest03 | online       | primary     | 8.0.28         |
| mysqlwjhtest02 | unreachable  | secondary   | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
#85 seconds in
root@localhost: 14:07: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;system date;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest01 | unreachable  | secondary   | 8.0.28         |
| mysqlwjhtest03 | error        |             | 8.0.28         |
| mysqlwjhtest02 | unreachable  | secondary   | 8.0.28         |
+----------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
#115 seconds in
root@localhost: 14:08: [(none)]> select member_host,member_state,member_role,member_version from performance_schema.replication_group_members;system date;
+----------------+--------------+-------------+----------------+
| member_host    | member_state | member_role | member_version |
+----------------+--------------+-------------+----------------+
| mysqlwjhtest03 | error        |             | 8.0.28         |
+----------------+--------------+-------------+----------------+
1 row in set (0.01 sec)
Test result analysis
Scenario 1: a secondary node fails
- 5s after the outage: heartbeat detection times out; both the minority and the majority mark the suspect nodes unreachable.
- 35s: the majority expels the suspect node from the group.
- 85s: the minority (faulty) node changes its own state to error. (If the network recovers before the 85s mark, the cluster recovers automatically.)
- 115s: the minority node removes the two lost nodes from its view. (If the network recovers before the 115s mark, then at 115s MGR first expels the suspect nodes, after which auto-rejoin brings the error-state node back into the group successfully.)
Scenario 2: the primary node fails
- 5s after the outage: heartbeat detection times out; both the minority and the majority mark the suspect nodes unreachable.
- 35s: the majority expels the suspect node from the group and elects a new primary.
- 85s: the minority (faulty) node changes its own state to error. (If the network recovers before the 85s mark, the cluster recovers automatically.)
- 115s: the minority node removes the two lost nodes from its view. (If the network recovers before the 115s mark, then at 115s MGR first expels the suspect nodes, after which auto-rejoin brings the error-state node back into the group successfully.)
How the key offsets are derived
35s = 5s heartbeat detection + group_replication_member_expel_timeout (30s)
85s = 5s heartbeat detection + group_replication_unreachable_majority_timeout (80s)
115s = 5s heartbeat detection + group_replication_unreachable_majority_timeout (80s) + the hard-coded 30s timeout of the leave attempt (waiting_time = 30s)
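These offsets can be checked with a few lines of Python (an illustrative helper, not part of MGR; the 5s heartbeat window and the 30s leave wait are the fixed values discussed above):

```python
HEARTBEAT_TIMEOUT = 5   # fixed 5s suspicion window, not configurable
LEAVE_WAIT = 30         # hard-coded xcom_wait_exit timeout (waiting_time = 30)

def mgr_timeline(expel_timeout, unreachable_majority_timeout):
    """Return the three wall-clock offsets (seconds) after the network cut."""
    expelled_by_majority = HEARTBEAT_TIMEOUT + expel_timeout
    minority_goes_error = HEARTBEAT_TIMEOUT + unreachable_majority_timeout
    minority_leaves_view = minority_goes_error + LEAVE_WAIT
    return expelled_by_majority, minority_goes_error, minority_leaves_view

print(mgr_timeline(30, 80))  # the test configuration: (35, 85, 115)
```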
To sum up, the three parameters relate as follows.
- group_replication_member_expel_timeout is how long the majority waits before expelling an unreachable node, i.e. how many seconds of network trouble it tolerates in the hope of recovery. It also determines when the majority elects a new primary, since an election follows immediately after the expulsion.
- The minority node first honors group_replication_unreachable_majority_timeout and enters the error state when it expires. It then tries to leave the group and notify the other members, but because it cannot communicate, this also times out after a hard-coded 30 seconds. The rough flow is as follows.
Auto-rejoin retries (group_replication_autorejoin_tries) are then triggered. With the unreachable-majority timeout enabled, the partition thread executes:
group_partition_handling::partition_thread_handler
-> leave_group_on_failure::leave
-> gcs_operations::leave
-> gcs_xcom_control::leave
-> gcs_xcom_notification *notification = new control_notification(do_function_leave, this); bool scheduled = m_gcs_engine->push(notification); (hands the leave operation to the xcom engine; TODO: xcom engine internals forgotten, it has been a while)
-> gcs_xcom_control::do_leave
-> m_xcom_proxy->xcom_remove_node(*m_local_node_info, m_gid_hash)
-> gcs_xcom_proxy_base::xcom_remove_nodes
-> xcom_client_remove_node
-> xcom_send_cfg_wait (notifies the other nodes of the configuration change)
-> int is_xcom_exit = m_xcom_proxy->xcom_wait_exit() (waits for xcom to finish applying the new configuration, with a 30-second timeout: /* time is defined in seconds. */ static const unsigned int waiting_time = 30;)
- The value of group_replication_unreachable_majority_timeout must be greater than group_replication_member_expel_timeout; otherwise the faulty node would set itself to error before the majority's tolerance window has even expired.
It follows that when group_replication_member_expel_timeout and group_replication_unreachable_majority_timeout are set to the same value, the minority node enters the error state just as the majority elects a new primary, which avoids a dual-primary situation.
Failure detection flow
Combining the cases above, let's walk through group replication's failure detection flow.
Every node in the group sends a heartbeat to the other members once per second. If a node receives no heartbeat from a peer within 5 seconds (a fixed value with no tuning parameter), it marks that peer as a suspect and sets its state to unreachable. If half or more of the group's members show as unreachable, the group can no longer serve writes.
If a suspect node in the majority partition recovers within group_replication_member_expel_timeout (default 5 seconds since MySQL 8.0.21; maximum 3600 seconds, i.e. one hour), it simply applies the messages buffered in the xcom cache, whose size is controlled by group_replication_message_cache_size (default 1G).
If the suspect node does not recover within group_replication_member_expel_timeout, it is expelled from the group.
A suspect node in the minority partition does not leave the group on its own:
- If group_replication_unreachable_majority_timeout is not set (default 0), it stays unreachable until the network recovers.
- Once group_replication_unreachable_majority_timeout is reached, it automatically sets itself to error.
If group_replication_autorejoin_tries is non-zero, a node in the error state automatically retries joining the group (auto-rejoin).
If group_replication_autorejoin_tries is 0 or every retry fails, the action specified by group_replication_exit_state_action is executed. The options are:
read_only: read-only mode; super_read_only is set to on. This is the default.
offline_mode: offline mode; offline_mode and super_read_only are both set to on, so only users with the connection_admin (or super) privilege can log in; ordinary users cannot.
abort_server: shut down the instance.
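The minority-side flow above can be sketched as a small state function (a simplified illustration only; real MGR drives this inside the GCS/XCom layer, and the function name and return strings here are made up):

```python
def minority_state(seconds_since_outage, unreachable_majority_timeout,
                   autorejoin_tries, exit_state_action="read_only"):
    """Phase of a minority-partition member t seconds after the network cut.

    Follows the 8.0 flow described above; the auto-rejoin outcome itself
    is not simulated.
    """
    HEARTBEAT_TIMEOUT = 5  # fixed suspicion window
    if seconds_since_outage < HEARTBEAT_TIMEOUT:
        return "online"
    if unreachable_majority_timeout == 0:
        return "unreachable"  # default: blocked until the network heals
    if seconds_since_outage < HEARTBEAT_TIMEOUT + unreachable_majority_timeout:
        return "unreachable"
    # timeout expired: the node errors out, then auto-rejoin or the exit action
    if autorejoin_tries > 0:
        return "error (auto-rejoin)"
    return f"error ({exit_state_action})"
```

For example, with the test configuration (timeout 80s, 3 rejoin tries), minority_state(90, 80, 3) reports the error/auto-rejoin phase, matching the 85s mark observed in the logs.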
xcom cache
The xcom cache is the message cache used by xcom to buffer messages exchanged between group members; these cached messages are part of the consensus protocol. On an unstable network, a node may lose contact with the group.
If the node recovers within a certain time (determined by group_replication_member_expel_timeout), it first applies the messages in the xcom cache. If the cache no longer holds all the messages it needs, the node is expelled from the group. After expulsion, if group_replication_autorejoin_tries is non-zero, it re-joins the group (auto-rejoin).
Re-joining uses distributed recovery to catch up on the missing data. Compared with applying messages directly from the xcom cache, joining via distributed recovery takes longer, is more complex, and affects cluster performance.
Therefore, when sizing the xcom cache, estimate the memory needed to buffer group_replication_member_expel_timeout + 5s worth of messages.
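As a back-of-the-envelope sizing check (illustrative only; real memory use also includes per-message overhead, and the write-rate figure is an assumption you must measure on your own workload):

```python
def xcom_cache_needed_mb(write_mb_per_sec, expel_timeout, heartbeat=5):
    """Rough lower bound (MB) for group_replication_message_cache_size:
    enough to hold (expel_timeout + heartbeat) seconds of group traffic."""
    return write_mb_per_sec * (expel_timeout + heartbeat)

# e.g. 20 MB/s of group traffic with expel_timeout=30 -> at least 700 MB
print(xcom_cache_needed_mb(20, 30))
```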
Recommended production configuration
#for MySQL 8.0.21 and later
group_replication_member_expel_timeout=30 #only affects suspect nodes in the majority partition; the suspect node is expelled after 30 seconds (default 5 seconds)
group_replication_unreachable_majority_timeout=35 #only affects the minority; the node enters the error state once this timeout expires (default 0, meaning it stays unreachable forever)
group_replication_message_cache_size=5G #xcom message cache; default 1G, 5G recommended
group_replication_autorejoin_tries=3 #number of auto-rejoin attempts for an error-state node
Note: group_replication_unreachable_majority_timeout means the node cannot reach a majority; after a network partition it only affects the minority nodes.
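A quick sanity check of the constraint discussed earlier, that the unreachable-majority timeout should exceed the expel timeout (hypothetical helper, not part of MySQL):

```python
def timeouts_are_safe(expel_timeout, unreachable_majority_timeout):
    """True if the minority node will not error out before the majority's
    tolerance window (expel timeout) has expired."""
    return unreachable_majority_timeout > expel_timeout

print(timeouts_are_safe(30, 35))  # recommended production values -> True
```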
Our VIP failover script currently keys off the old primary's state: when the old primary goes from online/primary to error, the VIP switch is triggered. We therefore set the first two parameters to matching values, which guarantees that at the moment the VIP switch fires, the MGR majority has just elected a new primary, and a dual-primary situation is avoided.
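The switch condition described here could be sketched as follows (a hypothetical helper; a real script would also handle connection errors, flapping, and split-brain checks):

```python
def should_failover(prev_state, prev_role, cur_state):
    """Trigger VIP failover only when the old primary drops
    from online to error, per the policy described above."""
    return (prev_role == "primary"
            and prev_state == "online"
            and cur_state == "error")
```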
Extension: MySQL 5.7 failure flow and configuration advice
Primary-node failure flow (assuming group_replication_unreachable_majority_timeout=30)
- 5s after the outage: heartbeat detection times out; the minority and the majority mark the suspect nodes unreachable, and the majority immediately expels the suspect node and elects a new primary.
- 35s: the minority (faulty) node changes its own state to error.
- 65s: the minority node removes the two lost nodes from its view. (hard-coded waiting_time = 30s)
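Under the same assumptions as the 8.0 sketch, the 5.7 offsets collapse because expulsion and the election happen right at heartbeat timeout (illustrative only):

```python
HEARTBEAT = 5    # fixed suspicion window
LEAVE_WAIT = 30  # hard-coded leave timeout (waiting_time = 30)

def mgr57_timeline(unreachable_majority_timeout):
    """Offsets (seconds) after the network cut under the 5.7 behavior."""
    expelled_and_new_primary = HEARTBEAT  # expel + election at heartbeat timeout
    minority_error = HEARTBEAT + unreachable_majority_timeout
    minority_leaves_view = minority_error + LEAVE_WAIT
    return expelled_and_new_primary, minority_error, minority_leaves_view

print(mgr57_timeline(30))  # (5, 35, 65)
```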
Note: the biggest difference from MySQL 8.0 is that as soon as heartbeat detection (hard-coded 5 seconds, not adjustable) times out, the majority elects a new primary immediately. Even if the network recovers afterwards, the minority node still ends up in the error state, so for cluster availability group_replication_unreachable_majority_timeout buys nothing.
#for MySQL 5.7
group_replication_unreachable_majority_timeout=5 #only affects the minority; the node enters the error state after 10s (including the 5s heartbeat window)
In MySQL 5.7 MGR's failure handling is less robust and there is little to tune; MySQL 8.0 is recommended.