mysql从库异常宕机-m6米乐安卓版下载

原创冯刚 2022-10-18

552

前言

上一篇文章介绍了因为bug导致主从数据不一致的案例

mysql主从数据不一致-bug：https://www.modb.pro/db/518430

此案例介绍碰到的另外一个bug：因为所在主机宕机，mysql从库i/o和sql线程启动报错问题。

1 环境信息

自建mysql：5.7.22

os 版本：centos linux release 7.6

2 错误描述

2022-08-17 06:45:04 从库所在主机告警，提示宕机，并伴随mysql异常告警；

2022-08-17 06:45:53 宕机告警恢复；

2022-08-17 06:46:52 提示mysql i/o和sql线程停止；

3 排查

3.1 登录实例，查看从库同步状态

此时i/o和sql线程都是"no"。

mysql> show slave status\g
*************************** 1. row ***************************
               slave_io_state:
                  master_host: xxx.xxx.xxx.xxx
                  master_user: xxxxxx
                  master_port: 3306
                connect_retry: 60
              master_log_file: mysql-bin.003334
          read_master_log_pos: 102990348
               relay_log_file: relay.011586
                relay_log_pos: 111435792
        relay_master_log_file: mysql-bin.003334
             slave_io_running: no
            slave_sql_running: no
              replicate_do_db:
          replicate_ignore_db:
           replicate_do_table:
       replicate_ignore_table:
      replicate_wild_do_table:
  replicate_wild_ignore_table:
                   last_errno: 1872
                   last_error: slave failed to initialize relay log info structure from the repository
                 skip_counter: 0
          exec_master_log_pos: 111435579
              relay_log_space: 0
              until_condition: none
               until_log_file:
                until_log_pos: 0
           master_ssl_allowed: no
           master_ssl_ca_file:
           master_ssl_ca_path:
              master_ssl_cert:
            master_ssl_cipher:
               master_ssl_key:
        seconds_behind_master: null
master_ssl_verify_server_cert: no
                last_io_errno: 0
                last_io_error:
               last_sql_errno: 1872
               last_sql_error: slave failed to initialize relay log info structure from the repository
  replicate_ignore_server_ids:
             master_server_id: 0
                  master_uuid: acd7aa9e-cc1c-11eb-8e2e-0c42a1474c26
             master_info_file: mysql.slave_master_info
                    sql_delay: 0
          sql_remaining_delay: null
      slave_sql_running_state:
           master_retry_count: 86400
                  master_bind:
      last_io_error_timestamp:
     last_sql_error_timestamp: 220817 09:17:54
               master_ssl_crl:
           master_ssl_crlpath:
           retrieved_gtid_set:
            executed_gtid_set: 3f6aeb45-13b5-11e9-be3e-7cd30ae4123c:1-154276204,
5157c677-13b5-11e9-9a24-7cd30ae43356:1-2847256991,
f537f1cc-a575-11ea-9d39-52540014ff9c:1-6
                auto_position: 1
         replicate_rewrite_db:
                 channel_name:
           master_tls_version:
1 row in set (0.00 sec)

3.2 查看错误日志

## 下面日志是主机启动后，mysql实例的启动日志：2022-08-17t06:45:17.529616 08:00 0 [warning] 'no_zero_date', 'no_zero_in_date' and 'error_for_division_by_zero' sql modes should be used with strict mode. they will be merged wi
th strict mode in a future release.
2022-08-17t06:45:17.532546 08:00 0 [note] /usr/local/mysql57/sbin/mysqld (mysqld 5.7.22-log) starting as process 2277 ...
2022-08-17t06:45:17.550221 08:00 0 [warning] option 'innodb-sort-buffer-size': unsigned value 134217728 adjusted to 67108864
2022-08-17t06:45:17.552817 08:00 0 [note] innodb: punch hole support available
2022-08-17t06:45:17.552844 08:00 0 [note] innodb: mutexes and rw_locks use gcc atomic builtins
2022-08-17t06:45:17.552850 08:00 0 [note] innodb: uses event mutexes
2022-08-17t06:45:17.552852 08:00 0 [note] innodb: gcc builtin __atomic_thread_fence() is used for memory barrier
2022-08-17t06:45:17.552855 08:00 0 [note] innodb: compressed tables use zlib 1.2.3
2022-08-17t06:45:17.552877 08:00 0 [note] innodb: using linux native aio
2022-08-17t06:45:17.555735 08:00 0 [note] innodb: number of pools: 1
2022-08-17t06:45:17.557545 08:00 0 [note] innodb: using cpu crc32 instructions
2022-08-17t06:45:17.561520 08:00 0 [note] innodb: initializing buffer pool, total size = 8g, instances = 16, chunk size = 128m
2022-08-17t06:45:17.561541 08:00 0 [note] innodb: setting numa memory policy to mpol_interleave
2022-08-17t06:45:18.001568 08:00 0 [note] innodb: setting numa memory policy to mpol_default
2022-08-17t06:45:18.001601 08:00 0 [note] innodb: completed initialization of buffer pool
2022-08-17t06:45:18.054769 08:00 0 [note] innodb: if the mysqld execution user is authorized, page cleaner thread priority can be changed. see the man page of setpriority().
2022-08-17t06:45:18.070395 08:00 0 [note] innodb: highest supported file format is barracuda.
2022-08-17t06:45:18.134991 08:00 0 [note] innodb: log scan progressed past the checkpoint lsn 13018018506772
2022-08-17t06:45:18.197995 08:00 0 [note] innodb: doing recovery: scanned up to log sequence number 13018019491644
2022-08-17t06:45:18.198032 08:00 0 [note] innodb: database was not shutdown normally!
2022-08-17t06:45:18.198036 08:00 0 [note] innodb: starting crash recovery.
2022-08-17t06:45:18.307661 08:00 0 [note] innodb: transaction 10209047171 was in the xa prepared state.
2022-08-17t06:45:18.310909 08:00 0 [note] innodb: transaction 10209047171 was in the xa prepared state.
2022-08-17t06:45:18.311385 08:00 0 [note] innodb: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2022-08-17t06:45:18.311399 08:00 0 [note] innodb: trx id counter is 10209047552
2022-08-17t06:45:18.311601 08:00 0 [note] innodb: starting an apply batch of log records to the database...
innodb: progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
2022-08-17t06:45:19.168448 08:00 0 [note] innodb: apply batch completed
2022-08-17t06:45:19.168500 08:00 0 [note] innodb: last mysql binlog file position 0 358270216, file name mysql_bin.003520
2022-08-17t06:45:19.332107 08:00 0 [note] innodb: starting in background the rollback of uncommitted transactions
2022-08-17t06:45:19.332133 08:00 0 [note] innodb: rollback of non-prepared transactions completed
2022-08-17t06:45:19.332171 08:00 0 [note] innodb: removed temporary tablespace data file: "ibtmp1"
2022-08-17t06:45:19.332190 08:00 0 [note] innodb: creating shared tablespace for temporary tables
2022-08-17t06:45:19.332240 08:00 0 [note] innodb: setting file './ibtmp1' size to 12 mb. physically writing the file full; please wait ...
2022-08-17t06:45:19.344599 08:00 0 [note] innodb: file './ibtmp1' size is now 12 mb.
2022-08-17t06:45:19.345375 08:00 0 [note] innodb: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2022-08-17t06:45:19.345396 08:00 0 [note] innodb: 32 non-redo rollback segment(s) are active.
2022-08-17t06:45:19.345772 08:00 0 [note] innodb: 5.7.22 started; log sequence number 13018019491644
2022-08-17t06:45:19.346378 08:00 0 [note] innodb: loading buffer pool(s) from /data/mysql/mysql57_3307/data/ib_buffer_pool
2022-08-17t06:45:19.347415 08:00 0 [note] plugin 'federated' is disabled.
220817  6:45:19 server_audit: incl_dml_users and excl_dml_users specified simultaneously - both set to empty
220817  6:45:19 server_audit: server_audit_incl_users set to ''.
220817  6:45:19 server_audit: mariadb audit plugin version 1.4.4 started.
220817  6:45:19 server_audit: could not create file '/xxxxx/auditlog/audit.log'.
2022-08-17t06:45:19.354790 08:00 0 [note] recovering after a crash using /xxxxx/binlog/mysql_bin
2022-08-17t06:45:21.272713 08:00 0 [note] innodb: buffer pool(s) load completed at 220817  6:45:21
2022-08-17t06:45:24.365162 08:00 0 [note] starting crash recovery...
2022-08-17t06:45:24.365218 08:00 0 [note] innodb: starting recovery for xa transactions...
2022-08-17t06:45:24.365224 08:00 0 [note] innodb: transaction 10209047171 in prepared state after recovery
2022-08-17t06:45:24.365227 08:00 0 [note] innodb: transaction contains changes to 2 rows
2022-08-17t06:45:24.365230 08:00 0 [note] innodb: 1 transactions in prepared state after recovery
2022-08-17t06:45:24.365232 08:00 0 [note] found 1 prepared transaction(s) in innodb
2022-08-17t06:45:24.365654 08:00 0 [note] crash recovery finished.
2022-08-17t06:45:25.533770 08:00 0 [warning] failed to set up ssl because of the following ssl library error: ssl context is not usable without certificate and private key
2022-08-17t06:45:25.533805 08:00 0 [note] server hostname (bind-address): '*'; port: 3307
2022-08-17t06:45:25.533832 08:00 0 [note] ipv6 is available.
2022-08-17t06:45:25.533837 08:00 0 [note]   - '::' resolves to '::';
2022-08-17t06:45:25.533851 08:00 0 [note] server socket created on ip: '::'.
2022-08-17t06:45:25.541233 08:00 0 [warning] 'user' entry 'dba-docp@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.541250 08:00 0 [warning] 'user' entry 'mysql.session@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.541256 08:00 0 [warning] 'user' entry 'zabbix_monitor@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.541291 08:00 0 [warning] 'db' entry 'performance_schema mysql.session@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.541295 08:00 0 [warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.541304 08:00 0 [warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.552222 08:00 0 [warning] 'tables_priv' entry 'user mysql.session@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.552241 08:00 0 [warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.
2022-08-17t06:45:25.559579 08:00 0 [note] slave: mts group recovery relay log info based on worker-id 0, group_relay_log_name /xxxxx/relaylog/relay.011626, group_relay_log_pos 4217146 group_master_log_name mysql-bin.005810, group_master_log_pos 4216933## 实例启动后，slave进程的报错
2022-08-17t06:45:25.559872 08:00 0 [error] error looking for file after /xxxxx/relaylog/relay.011627.
2022-08-17t06:45:25.560208 08:00 0 [error] slave: failed to initialize the master info structure for channel ''; its record may still be present in 'mysql.slave_master_info' table, consider deleting it.
2022-08-17t06:45:25.560215 08:00 0 [error] failed to create or recover replication info repositories.
2022-08-17t06:45:25.560221 08:00 0 [error] slave sql for channel '': slave failed to initialize relay log info structure from the repository, error_code: 1872
2022-08-17t06:45:25.560224 08:00 0 [error] /usr/local/mysql57/sbin/mysqld: slave failed to initialize relay log info structure from the repository
2022-08-17t06:45:25.560225 08:00 0 [error] failed to start slave threads for channel ''
2022-08-17t06:45:25.560227 08:00 0 [note] some of the channels are not created/initialized properly. check for additional messages above. you will not be able to start replication on those channels until the issue is resolved and the server restarted.
2022-08-17t06:45:25.581309 08:00 0 [note] event scheduler: loaded 0 events
2022-08-17t06:45:25.581497 08:00 0 [note] /usr/local/mysql57/sbin/mysqld: ready for connections.
version: '5.7.22-log'  socket: '/xxxxx/socketdir/mysql.sock'  port: 3307  mysql community server (gpl)

3.3 尝试手动启动slave

报错：error 1872 (hy000): slave failed to initialize relay log info structure from the repository

mysql> start slave;
error 1872 (hy000): slave failed to initialize relay log info structure from the repository

2022-08-17t09:02:30.689881 08:00 551 [error] slave sql for channel '': slave failed to initialize relay log info structure from the repository, error_code: 1872
2022-08-17t09:02:33.382601 08:00 551 [error] slave sql for channel '': slave failed to initialize relay log info structure from the repository, error_code: 1872
2022-08-17t09:03:31.630680 08:00 551 [error] slave sql for channel '': slave failed to initialize relay log info structure from the repository, error_code: 1872
2022-08-17t09:05:21.955292 08:00 564 [warning] storing mysql user name or password information in the master info repository is not secure and is therefore not recommended. please consider using the user and password connection options for start slave; see the 'start slave syntax' in the mysql manual for more information.

4 原因

根据报错，推测是mysql bug导致。

4.1 bug名称

bug #83713 : slave failed to initialize relay log info after os crash when use mts and gtid

mysql bug链接：https://bugs.mysql.com/bug.php?id=83713

4.2 bug描述

description:
when i tried to configurate a crash safe slave with mts and gtid based replication, but after a os crash
replication failed to be start.

4.3 参数配置

[root@localhost] {18:09:49} ((none)) [5]> show variables where variable_name in('log_slave_updates', 'gtid_mode', 'enforce_gtid_consistency', 'master_info_repository', 'relay_log_info_repository', 'relay_log_recovery', 'sync_binlog', 'innodb_flush_log_at_trx_commit', 'slave_parallel_workers','sync_master_info', 'sync_relay_log', 'sync_relay_log_info'); 
 -------------------------------- ------- 
| variable_name                  | value |
 -------------------------------- ------- 
| enforce_gtid_consistency       | on    |
| gtid_mode                      | on    |
| innodb_flush_log_at_trx_commit | 1     |
| log_slave_updates              | on    |
| master_info_repository         | table |
| relay_log_info_repository      | table |
| relay_log_recovery             | on    |
| slave_parallel_workers         | 16    |
| sync_binlog                    | 1     |
| sync_master_info               | 10000 |
| sync_relay_log                 | 10000 |
| sync_relay_log_info            | 10000 |
 -------------------------------- ------- 
12 rows in set (0.01 sec)

4.4 复现

step1:
run a script to execute update sql on the master continuously
step2:
use "kill -9" kill the kvm process of the slave in the physical machine
step3:  
start mysql of the slave

按照上面方法，没有复现。下面是具体步骤

步骤1 主库会话持续插入数据

create table `t2` (
  `id` bigint(11) unsigned not null auto_increment,
  `name` varchar(50) not null,
  `sex` char(1) not null,
  `score` char(2) not null,
  primary key (`id`)
) engine=innodb auto_increment=1 default charset=utf8mb4;
drop procedure p_1;
delimiter //
create procedure p_1(in num int)
begin
	declare i int default 0;
	declare sex char(1);
	declare score char(2);
	set score= floor(40   (rand() *60));
	set sex=floor(0   (rand() * 2));
	while i < 1000000
	do
		insert into t2(name,sex,score) values (concat('f',i),sex,score);
		set i = i   1;
	end while;
end
//
delimiter ;
call p_1(1000000);

步骤2 从库主机层面kill实例进程

[ root@mysql10-dba:~ ]# kill -9 53372

步骤3 主库启动mysql实例，启动slave

[ root@mysql10-dba:~ ]# systemctl start mysql57@7777

4.5 原因/建议

suggested fix:
the error ocuurs when mts_recovery_groups() reading the corrupted relay log files in mts group recovery.
but with "gtid_mode=on" and "relay_log_recovery = 1", the relay log files will be discard later, and so, at the beginning should skip the relay log read.
当mts_recovery_groups()读取mts组恢复中损坏的中继日志文件时发生错误。
但是对于"gtid_mode=on"和"relay_log_recovery = 1"，中继日志文件稍后将被丢弃，因此，在开始时应该跳过中继日志读取。
workaround:
the following steps can start slave successfully
    reset slave;
    start slave io_thread;
    stop slave io_thread;
    reset slave;
    start slave;

5 m6米乐安卓版下载的解决方案

[root@localhost:] {09:18:17} ((none)) [1]> reset slave;
query ok, 0 rows affected (0.01 sec)
[root@localhost:] {09:18:36} ((none)) [3]> start slave;
query ok, 0 rows affected (0.04 sec)

6 补充

reset slave干的那些事：

删除slave_master_info ，slave_relay_log_info两个表中数据；
删除所有relay log文件，并重新创建新的relay log文件；
不会改变gtid_executed 或者 gtid_purged的值；

官方文档链接描述：

reset slave makes the replica forget its replication position in the source's binary log. this statement is meant to be used for a clean start: it clears the replication metadata repositories, deletes
all the relay log files, and starts a new relay log file. it also resets to 0 the replication delay specified with the master_delay option to change master to.
all relay log files are deleted, even if they have not been completely executed by the replication sql thread. (this is a condition likely to exist on a replica if
you have issued a stop slave statement or if the replica is highly loaded.)
for a server where gtids are in use (gtid_mode is on), issuing reset slave has no effect on the gtid execution history. the statement does not change the values of gtid_executed or
gtid_purged, or the mysql.gtid_executed table. if you need to reset the gtid execution history, use reset master, even if the gtid-enabled server is a replica where binary logging is disabled.
reset slave requires the reload privilege.

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者