
Comparing MogDB/openGauss with PostgreSQL's repmgr

Original article by Yan Shuli, 2021-12-02

When it comes to PostgreSQL's repmgr, most readers are probably familiar with it: it is a popular open-source suite for managing replication and failover across a cluster of PostgreSQL servers, which is to say a cluster-management/HA tool. The common PostgreSQL high-availability options today include keepalived, pgpool, repmgr, pacemaker+corosync, etcd+patroni, and so on, with etcd+patroni and repmgr being the most widely used. Patroni needs at least three (an odd number of) etcd nodes, and most of its parameters have to be changed by editing keys in etcd, whereas repmgr is comparatively simple to configure and makes adding nodes straightforward.

Since openGauss went open source I have been working with MogDB/openGauss for about a year, and my impression is that MogDB/openGauss combined with MogHA, the high-availability tool developed in-house by Enmotech (云和恩墨), is quite similar to PostgreSQL's repmgr-based HA solution.

As is well known, MogDB/openGauss grew out of PostgreSQL, so many of its features and tools are quite similar.

On the tooling side, commonly used counterparts include gs_basebackup/pg_basebackup, gs_probackup/pg_probackup, pg_resetxlog, and so on; I will not go into the tools in detail today. What follows is my comparison of MogDB/openGauss cluster management and the MogHA high-availability software with the corresponding repmgr functionality. If you have only worked with one of MogDB/openGauss or repmgr, reading through the other half should make it fairly quick to get up to speed.

1. Checking cluster status
2. MogDB/openGauss full build & repmgr clone
3. MogDB/openGauss incremental build & repmgr node rejoin --force-rewind
4. switchover
5. MogDB/openGauss HA tool MogHA & repmgr failover

I. Checking cluster status

1. MogDB/openGauss:

The cluster status of a MogDB/openGauss database can be checked with the om tool.


    [omm@node1 ~]$ gs_om -t status --detail
    [   Cluster State   ]
    
    cluster_state   : Normal
    redistributing  : No
    current_az      : AZ_ALL
    
    [  Datanode State   ]
    
    node     node_ip         instance                  state            | node     node_ip         instance                  state
    
    1  node1 172.20.10.7     6001 /gaussdb/data/dn1 P Primary Normal | 2  node2 172.20.10.8     6002 /gaussdb/data/dn1 S Standby Normal

2. PostgreSQL repmgr

    /home/postgres/repmgr-5.1.0/repmgr -f /home/postgres/repmgr.conf cluster show
    
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.2 port=6000 fallback_application_name=repmgr"
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.3 port=6000 fallback_application_name=repmgr"
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
     id | name        | role    | status    | upstream    | location | priority | timeline | connection string
    ---- ------------- --------- ----------- ------------- ---------- ---------- ---------- ------------------------------------------------------------------------
     1  | enmo_6001 | primary | * running |             | default  | 100      | 1        | host=172.20.10.1 port=6000 user=repmgr dbname=repmgr connect_timeout=2
     2  | enmo_6002 | standby |   running | enmo_6001 | default  | 100      | 1        | host=172.20.10.2 port=6000 user=repmgr dbname=repmgr connect_timeout=2
     3  | enmo_6003 | standby |   running | enmo_6001 | default  | 100      | 1        | host=172.20.10.3 port=6000 user=repmgr dbname=repmgr connect_timeout=2
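
The cluster show output above is driven by the metadata each node registers from its repmgr.conf, the file every repmgr command in this post points at with -f. For readers who have not used repmgr, here is a minimal sketch of such a file; the node id, name, conninfo and paths below are illustrative rather than the exact values of this cluster:

    # /home/postgres/repmgr.conf -- minimal sketch, values are illustrative
    node_id=1
    node_name='enmo_6001'
    conninfo='host=172.20.10.1 port=6000 user=repmgr dbname=repmgr connect_timeout=2'
    data_directory='/home/postgres/data'
    pg_bindir='/usr/pgsql-11/bin'
    log_level=DEBUG        # why the debug: lines appear in the output above

Each node carries its own copy of this file, registered with repmgr, which is what cluster show reads back.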

II. MogDB/openGauss full build & repmgr clone

Of these two approaches, the MogDB/openGauss build is generally used when the primary and standby have become inconsistent and the standby can no longer catch up from the WAL: the standby pulls a fresh copy of the data directory from the primary, clearing its own data directory before the transfer. repmgr clone is generally used when rebuilding a primary/standby pair from scratch, when the primary and standby are inconsistent and the WAL can no longer bridge the gap, or when there are timeline problems. The clone must be done before the standby is registered (see the ordering sketch at the end of the repmgr example below).

1. MogDB/openGauss

gs_ctl build -D /gaussdb/data/dn1/ -b full

A MogDB/openGauss full build resynchronizes the standby by taking a complete copy of the primary's data directory.

    [omm@node2 ~]$ gs_ctl build -D /gaussdb/data/dn1/ -b full
    [2021-09-30 07:27:48.355][18280][][gs_ctl]: gs_ctl full build ,datadir is /gaussdb/data/dn1
    waiting for server to shut down.... done
    server stopped
    [2021-09-30 07:27:49.375][18280][][gs_ctl]: current workdir is (/home/omm).
    [2021-09-30 07:27:49.375][18280][][gs_ctl]:  fopen build pid file "/gaussdb/data/dn1/gs_build.pid" success
    [2021-09-30 07:27:49.375][18280][][gs_ctl]:  fprintf build pid file "/gaussdb/data/dn1/gs_build.pid" success
    [2021-09-30 07:27:49.376][18280][][gs_ctl]:  fsync build pid file "/gaussdb/data/dn1/gs_build.pid" success
    [2021-09-30 07:27:49.376][18280][][gs_ctl]: set gaussdb state file when full build:db state(building_state), server mode(standby_mode), build mode(full_build).
    [2021-09-30 07:27:49.381][18280][dn_6001_6002][gs_ctl]: connect to server success, build started.
    [2021-09-30 07:27:49.381][18280][dn_6001_6002][gs_ctl]: create build tag file success
    [2021-09-30 07:27:49.806][18280][dn_6001_6002][gs_ctl]: clear old target dir success
    [2021-09-30 07:27:49.806][18280][dn_6001_6002][gs_ctl]: create build tag file again success
    [2021-09-30 07:27:49.806][18280][dn_6001_6002][gs_ctl]: get system identifier success
    [2021-09-30 07:27:49.806][18280][dn_6001_6002][gs_ctl]: receiving and unpacking files...
    [2021-09-30 07:27:49.806][18280][dn_6001_6002][gs_ctl]: create backup label success
    [2021-09-30 07:27:50.634][18280][dn_6001_6002][gs_ctl]: xlog start point: 0/3a000028
    [2021-09-30 07:27:50.634][18280][dn_6001_6002][gs_ctl]: begin build tablespace list
    [2021-09-30 07:27:50.635][18280][dn_6001_6002][gs_ctl]: finish build tablespace list
    [2021-09-30 07:27:50.635][18280][dn_6001_6002][gs_ctl]: begin get xlog by xlogstream
    [2021-09-30 07:27:50.635][18280][dn_6001_6002][gs_ctl]: starting background wal receiver
    [2021-09-30 07:27:50.635][18280][dn_6001_6002][gs_ctl]: starting walreceiver
    [2021-09-30 07:27:50.635][18280][dn_6001_6002][gs_ctl]: begin receive tar files
    [2021-09-30 07:27:50.636][18280][dn_6001_6002][gs_ctl]: receiving and unpacking files...
    [2021-09-30 07:27:50.656][18280][dn_6001_6002][gs_ctl]:  check identify system success
    [2021-09-30 07:27:50.657][18280][dn_6001_6002][gs_ctl]:  send start_replication 0/3a000000 success
    [2021-09-30 07:28:01.151][18280][dn_6001_6002][gs_ctl]: finish receive tar files
    [2021-09-30 07:28:01.151][18280][dn_6001_6002][gs_ctl]: xlog end point: 0/3a000170
    [2021-09-30 07:28:01.151][18280][dn_6001_6002][gs_ctl]: fetching mot checkpoint
    [2021-09-30 07:28:01.152][18280][dn_6001_6002][gs_ctl]: waiting for background process to finish streaming...
    [2021-09-30 07:28:05.805][18280][dn_6001_6002][gs_ctl]: build dummy dw file success
    [2021-09-30 07:28:05.805][18280][dn_6001_6002][gs_ctl]: rename build status file success
    [2021-09-30 07:28:05.805][18280][dn_6001_6002][gs_ctl]: build completed(/gaussdb/data/dn1).
    [2021-09-30 07:28:06.157][18280][dn_6001_6002][gs_ctl]: waiting for server to start...
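
While a full build like the one above is running, its progress can be checked from a second session; a small sketch, assuming the querybuild subcommand is available in your MogDB/openGauss version:

    # run from another session on the standby while the build is in progress
    [omm@node2 ~]$ gs_ctl querybuild -D /gaussdb/data/dn1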

2. PostgreSQL repmgr

repmgr -h 172.20.10.7 -p 6000 -U repmgr -d repmgr -f /home/postgres/repmgr.conf standby clone

repmgr, by default, clones via pg_basebackup, which is printed in the log below, and a look at the source code shows it can also clone from Barman. In the openGauss source I have not yet found code that makes the full build call gs_basebackup (I may simply have missed it; corrections and discussion are welcome).

    [postgres@enmo-02 ~]$ repmgr -h 172.20.10.1 -p 6000 -U repmgr -d repmgr -f /home/postgres/repmgr.conf standby clone
    notice: destination directory "/home/postgres/data" provided
    info: connecting to source node
    detail: connection string is: host=172.20.10.1 port=6000 user=repmgr dbname=repmgr
    detail: current installation size is 42 mb
    debug: 1 node records returned by source node
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=172.20.10.1 port=6000 fallback_application_name=repmgr"
    debug: upstream_node_id determined as 11
    notice: checking for available walsenders on the source node (2 required)
    notice: checking replication connections can be made to the source server (2 required)
    info: checking and correcting permissions on existing directory "/home/postgres/data"
    notice: starting backup (using pg_basebackup)...
    hint: this may take some time; consider using the -c/--fast-checkpoint option
    info: executing:
      /usr/pgsql-11/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/data -h 172.20.10.7 -p 6000 -U repmgr -X stream
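
As mentioned earlier, the clone has to happen before the standby is registered. Putting the ordering in one place, a typical repmgr standby bootstrap looks roughly like this (a sketch; hosts and paths follow the examples above):

    # on the new standby, with PostgreSQL stopped and repmgr.conf in place
    repmgr -h 172.20.10.1 -p 6000 -U repmgr -d repmgr \
           -f /home/postgres/repmgr.conf standby clone
    # start the freshly cloned instance
    /usr/pgsql-11/bin/pg_ctl -D /home/postgres/data start
    # register it only once it is up, so that it shows in cluster show
    repmgr -f /home/postgres/repmgr.conf standby register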


Beyond that, there is a difference in how the target directory is treated during a full build or clone: MogDB/openGauss empties the data directory first, whereas repmgr's clone overwrites it in place.

Of course, even with -F/--force, repmgr will not overwrite an active data directory.

III. MogDB/openGauss incremental build & repmgr node rejoin --force-rewind

MogDB/openGauss:

As the log below shows, when MogDB/openGauss performs an incremental build with -b increment it relies on gs_rewind; repmgr can achieve much the same thing with repmgr node rejoin --force-rewind.

   [omm@node2 ~]$ gs_ctl build -D /gaussdb/data/dn1/ -b increment
   [2021-09-30 07:54:12.879][27818][][gs_ctl]: gs_ctl incremental build ,datadir is /gaussdb/data/dn1
   waiting for server to shut down.... done
   server stopped
   [2021-09-30 07:54:13.897][27818][][gs_ctl]:  fopen build pid file "/gaussdb/data/dn1/gs_build.pid" success
   [2021-09-30 07:54:13.897][27818][][gs_ctl]:  fprintf build pid file "/gaussdb/data/dn1/gs_build.pid" success
   [2021-09-30 07:54:13.897][27818][][gs_ctl]:  fsync build pid file "/gaussdb/data/dn1/gs_build.pid" success
   [2021-09-30 07:54:13.902][27818][dn_6001_6002][gs_rewind]: set gaussdb state file when rewind:db state(building_state), server mode(standby_mode), build mode(inc_build).
   [2021-09-30 07:54:13.965][27818][dn_6001_6002][gs_rewind]: connected to server: host=172.20.10.7 port=26001 dbname=postgres application_name=gs_rewind connect_timeout=5
   [2021-09-30 07:54:13.970][27818][dn_6001_6002][gs_rewind]: connect to primary success
   [2021-09-30 07:54:13.971][27818][dn_6001_6002][gs_rewind]: get pg_control success
   [2021-09-30 07:54:13.972][27818][dn_6001_6002][gs_rewind]: target server was interrupted in mode 2.
   [2021-09-30 07:54:13.972][27818][dn_6001_6002][gs_rewind]: sanitychecks success
   [2021-09-30 07:54:13.972][27818][dn_6001_6002][gs_rewind]: find last checkpoint at 0/3b3ce868 and checkpoint redo at 0/3b3ce7e8 from source control file
   [2021-09-30 07:54:13.972][27818][dn_6001_6002][gs_rewind]: find last checkpoint at 0/3b3ce868 and checkpoint redo at 0/3b3ce7e8 from target control file
   [2021-09-30 07:54:13.974][27818][dn_6001_6002][gs_rewind]: find max lsn success, find max lsn rec (0/3b3ce868) success.
   
   [2021-09-30 07:54:13.979][27818][dn_6001_6002][gs_rewind]: request lsn is 0/3b3ce868 and its crc(source, target):[3849081485, 3849081485]
   [2021-09-30 07:54:13.979][27818][dn_6001_6002][gs_rewind]: find common checkpoint 0/3b3ce868
   [2021-09-30 07:54:13.979][27818][dn_6001_6002][gs_rewind]: find diverge point success
   [2021-09-30 07:54:13.979][27818][dn_6001_6002][gs_rewind]: read checkpoint redo (0/3b3ce7e8) success before rewinding.
   [2021-09-30 07:54:13.979][27818][dn_6001_6002][gs_rewind]: rewinding from checkpoint redo point at 0/3b3ce7e8 on timeline 1
   [2021-09-30 07:54:13.979][27818][dn_6001_6002][gs_rewind]: diverge xlogfile is 00000001000000000000003b, older ones will not be copied or removed.
   [2021-09-30 07:54:13.980][27818][dn_6001_6002][gs_rewind]: targetfilestatthread success pid 139740617094912.
   [2021-09-30 07:54:13.980][27818][dn_6001_6002][gs_rewind]: reading source file list
   [2021-09-30 07:54:13.980][27818][dn_6001_6002][gs_rewind]: traverse_datadir start.
   [2021-09-30 07:54:13.983][27818][dn_6001_6002][gs_rewind]: filemap_list_to_array start.
   [2021-09-30 07:54:13.983][27818][dn_6001_6002][gs_rewind]: filemap_list_to_array end sort start. length is 2704
   [2021-09-30 07:54:13.983][27818][dn_6001_6002][gs_rewind]: sort end.
   [2021-09-30 07:54:13.990][27818][dn_6001_6002][gs_rewind]: targetfilestatthread return success.
   [2021-09-30 07:54:14.001][27818][dn_6001_6002][gs_rewind]: reading target file list
   [2021-09-30 07:54:14.005][27818][dn_6001_6002][gs_rewind]: traverse target datadir success
   [2021-09-30 07:54:14.005][27818][dn_6001_6002][gs_rewind]: reading wal in target
   [2021-09-30 07:54:14.005][27818][dn_6001_6002][gs_rewind]: could not read wal record at 0/3b3ce900: invalid record length at 0/3b3ce900: wanted 32, got 0
   [2021-09-30 07:54:14.006][27818][dn_6001_6002][gs_rewind]: calculate totals rewind success
   [2021-09-30 07:54:14.006][27818][dn_6001_6002][gs_rewind]: need to copy 17mb (total source directory size is 540mb)
   [2021-09-30 07:54:14.006][27818][dn_6001_6002][gs_rewind]: starting background wal receiver
   [2021-09-30 07:54:14.006][27818][dn_6001_6002][gs_rewind]: starting copy xlog, start point: 0/3b3ce7e8
   [2021-09-30 07:54:14.006][27818][dn_6001_6002][gs_rewind]: in gs_rewind proecess,so no need remove.
   [2021-09-30 07:54:14.012][27818][dn_6001_6002][gs_rewind]:  check identify system success
   [2021-09-30 07:54:14.012][27818][dn_6001_6002][gs_rewind]:  send start_replication 0/3b000000 success
   [2021-09-30 07:54:14.047][27818][dn_6001_6002][gs_rewind]: receiving and unpacking files...
   [2021-09-30 07:54:14.173][27818][dn_6001_6002][gs_rewind]: execute file map success
   [2021-09-30 07:54:14.174][27818][dn_6001_6002][gs_rewind]: find minrecoverypoint success from xlog insert location 0/3b3d2d80
   [2021-09-30 07:54:14.174][27818][dn_6001_6002][gs_rewind]: update pg_control file success, minrecoverypoint: 0/3b3d2d80, ckploc:0/3b3ce868, ckpredo:0/3b3ce7e8, preckp:0/3b3ce750
   [2021-09-30 07:54:14.176][27818][dn_6001_6002][gs_rewind]: update pg_dw file success
   [2021-09-30 07:54:14.177][27818][dn_6001_6002][gs_rewind]: xlog end point: 0/3b3d2d80
   [2021-09-30 07:54:14.177][27818][dn_6001_6002][gs_rewind]: waiting for background process to finish streaming...
   [2021-09-30 07:54:19.086][27818][dn_6001_6002][gs_rewind]: creating backup label and updating control file
   [2021-09-30 07:54:19.086][27818][dn_6001_6002][gs_rewind]: create backup label success
   [2021-09-30 07:54:19.086][27818][dn_6001_6002][gs_rewind]: read checkpoint redo (0/3b3ce7e8) success.
   [2021-09-30 07:54:19.086][27818][dn_6001_6002][gs_rewind]: read checkpoint rec (0/3b3ce868) success.
   [2021-09-30 07:54:19.086][27818][dn_6001_6002][gs_rewind]: dn incremental build completed.
   [2021-09-30 07:54:19.090][27818][dn_6001_6002][gs_rewind]: fetching mot checkpoint
   [2021-09-30 07:54:19.201][27818][dn_6001_6002][gs_ctl]: waiting for server to start...

repmgr:

The command is:

repmgr node rejoin -d 'host=172.20.10.1 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
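
node rejoin is run on the node being re-attached (for example a former primary) while its instance is shut down; with --force-rewind, repmgr falls back to pg_rewind when the timelines have diverged, which in turn requires wal_log_hints=on or data checksums on the cluster. A rough sketch of the sequence, reusing the connection string above (the pg_ctl path and data directory are assumptions):

    # on the node being re-attached, make sure the instance is stopped
    /usr/pgsql-11/bin/pg_ctl -D /home/postgres/data stop -m fast
    # rejoin the cluster, letting repmgr invoke pg_rewind if necessary
    repmgr node rejoin -f /home/postgres/repmgr.conf \
           -d 'host=172.20.10.1 user=repmgr dbname=repmgr connect_timeout=2' \
           --force-rewind --verbose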

IV. switchover

Both are executed on the standby: the local node is promoted to primary, while the old primary is demoted to become a standby of the new primary.

MogDB/openGauss:

gs_ctl switchover -D /gaussdb/data/dn1

    [omm@node2 ~]$ gs_ctl switchover -D /gaussdb/data/dn1
    [2021-09-30 08:02:09.192][30742][][gs_ctl]: gs_ctl switchover ,datadir is /gaussdb/data/dn1
    [2021-09-30 08:02:09.192][30742][][gs_ctl]: switchover term (1)
    [2021-09-30 08:02:09.196][30742][][gs_ctl]: waiting for server to switchover..........
    [2021-09-30 08:02:16.327][30742][][gs_ctl]: done
    [2021-09-30 08:02:16.327][30742][][gs_ctl]: switchover completed (/gaussdb/data/dn1)

repmgr:

repmgr -f /home/postgres/repmgr.conf standby switchover -U repmgr --verbose

    [postgres@enmo-02 ~]$ repmgr -f /home/postgres/repmgr.conf standby switchover -U repmgr --verbose
    notice: using provided configuration file "/home/postgres/repmgr.conf"
    warning: following problems with command line parameters detected:
      database connection parameters not required when executing standby switchover
    debug: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=10.28.3.134 port=6000 fallback_application_name=repmgr"
    debug: set_config():
      set synchronous_commit to 'local'
    debug: get_node_record():
      select n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' as upstream_node_name, null as attached   from repmgr.nodes n  where n.node_id = 2
    notice: executing switchover on node "falcon_6002" (id: 2)
    debug: get_recovery_type(): select pg_catalog.pg_is_in_recovery()
    info: searching for primary node
    debug: get_primary_connection():
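
One practical note: repmgr's standby switchover has to restart the demoted primary remotely, so it assumes passwordless SSH between the two nodes and a way to stop and start the instance. The relevant repmgr.conf entries look roughly like this (a sketch; the pg_ctl path and data directory are assumptions):

    # repmgr.conf entries consulted by standby switchover (sketch)
    service_start_command='/usr/pgsql-11/bin/pg_ctl -D /home/postgres/data start'
    service_stop_command='/usr/pgsql-11/bin/pg_ctl -D /home/postgres/data stop -m fast'
    service_restart_command='/usr/pgsql-11/bin/pg_ctl -D /home/postgres/data restart'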

V. MogDB/openGauss HA tool MogHA & repmgr failover

On this front, the cluster-management tool on the MogDB/openGauss side is gs_om: gs_om -t status --detail or gs_om -t query shows the cluster state, but the om tool itself does not perform automatic high-availability switchover. To fill that gap, Enmotech developed the MogHA high-availability tool. It checks the database and the primary/standby nodes along several dimensions, including the network, primary/standby role state, isolation checks, and heartbeat checks, to keep a MogDB/openGauss cluster running stably over long periods and to minimize the impact of failover on the business. It also supports split-brain detection, handling of false primaries, and automatic VIP failover to the new primary, making it a convenient and trustworthy HA tool.

repmgr, for its part, can achieve automatic failover here by writing repmgr_promote.sh and repmgr_follow.sh scripts. Compared with the MogDB/openGauss MogHA + om stack, it lacks MogHA's post-switchover check of node roles and the floating of the VIP over to the new primary, which is perhaps its one shortcoming.
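
For reference, such scripts are normally wired into the repmgrd monitoring daemon through repmgr.conf; a rough sketch, where the script paths and their contents are illustrative rather than a fixed convention:

    # repmgr.conf -- automatic failover handled by repmgrd (sketch)
    failover=automatic
    promote_command='/home/postgres/repmgr_promote.sh'
    follow_command='/home/postgres/repmgr_follow.sh %n'

    # repmgr_promote.sh can simply wrap the built-in command, e.g.
    #   repmgr -f /home/postgres/repmgr.conf standby promote
    # repmgr_follow.sh can wrap
    #   repmgr -f /home/postgres/repmgr.conf standby follow --upstream-node-id=$1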
