Server Admin Log

From Wikitech

2025-02-20

  • 18:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test operations in mixed opensearch/elasticsearch cluster - bking@cumin2002 - T380752
  • 18:42 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test operations in mixed opensearch/elasticsearch cluster - bking@cumin2002 - T380752
  • 18:18 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:18 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:11 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:51 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
  • 17:47 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Upgrading to Cassandra 4.1.7 — T380420 - eevans@cumin1002
  • 17:37 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Upgrading to Cassandra 4.1.8 — T385819 - eevans@cumin1002
  • 17:29 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Upgrading to Cassandra 4.1.7 — T380420 - eevans@cumin1002
  • 17:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Upgrade to Cassandra 4.1.8 — T385819 - eevans@cumin1002
  • 17:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Upgrade to Cassandra 4.1.8 — T385819 - eevans@cumin1002
  • 17:19 rzl@deploy2002: Finished scap sync-world: T385520 (duration: 09m 01s)
  • 17:13 rzl@deploy2002: rzl: Continuing with sync
  • 17:12 rzl@deploy2002: rzl: T385520 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:10 rzl@deploy2002: Started scap sync-world: T385520
  • 17:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:05 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 17:04 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 16:58 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:56 mutante: phab1004 (phabricator) - systemctl stop phabricator_stats_job_mfa_check timer and service; systemctl (gerrit:1117489)
  • 16:55 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:49 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:49 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:35 dancy@deploy2002: Installation of scap version "4.137.0" completed for 204 hosts
  • 16:31 dancy@deploy2002: Installing scap version "4.137.0" for 204 host(s)
  • 16:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:13 elukey@puppetserver1001: conftool action : set/pooled=inactive:weight=5; selector: name=wikikube-worker1004.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:12 elukey@puppetserver1001: conftool action : set/pooled=inactive:weight=5; selector: name=wikikube-worker2003.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:10 vgutierrez: updating liberica to version 0.10 in ulsfo load balancers
  • 16:03 vgutierrez: upload liberica 0.9 to apt.wm.o (bookworm-wikimedia)
  • 15:56 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove `tmpEnableMulLanguageCode` setting (T330217) (duration: 10m 43s)
  • 15:51 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker2003.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 15:51 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test operations in mixed opensearch/elasticsearch cluster - bking@cumin2002 - T380752
  • 15:51 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test operations in mixed opensearch/elasticsearch cluster - bking@cumin2002 - T380752
  • 15:49 lucaswerkmeister-wmde@deploy2002: arthurtaylor, lucaswerkmeister-wmde: Continuing with sync
  • 15:48 lucaswerkmeister-wmde@deploy2002: arthurtaylor, lucaswerkmeister-wmde: Backport for Remove `tmpEnableMulLanguageCode` setting (T330217) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:46 arturo: update k9s in bookworm-wikimedia thirdparty/k9s to 0.40.5
  • 15:45 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove `tmpEnableMulLanguageCode` setting (T330217)
  • 15:29 ihurbain: UTC afternoon deploys done
  • 15:28 ihurbain@deploy2002: Finished scap sync-world: Backport for Restore "Add configuration options and global preference for the SUL3 rolllout" (T386836), Restore "Add configuration options and global preference for the SUL3 rolllout" (T386836), SharedDomainUtils: Avoid early instantiation of NamespaceInfo (T386836), [[gerrit:1121332|SharedDomainUtils: Avoid early i
  • 15:21 ihurbain@deploy2002: tgr, ihurbain: Continuing with sync
  • 15:20 inflatador: bking@apt1002:~/pkg$ sudo -E reprepro -C component/opensearch13 include bullseye-wikimedia $HOME/pkg/wmf-opensearch-search-plugins_1.3.20-1_amd64.changes (again) T380752
  • 15:20 inflatador: bking@apt1002:~/pkg$ sudo -E reprepro -C component/opensearch13 remove bullseye-wikimedia wmf-opensearch-search-plugins T380752
  • 15:19 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2088.codfw.wmnet
  • 15:17 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker2002.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
  • 15:02 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker2001.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:58 elukey@puppetserver1001: conftool action : set/pooled=inactive:weight=5; selector: name=wikikube-worker2001.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:55 vgutierrez: testing liberica 0.8 in lvs1013
  • 14:55 ihurbain@deploy2002: tgr, ihurbain: Backport for Restore "Add configuration options and global preference for the SUL3 rolllout" (T386836), Restore "Add configuration options and global preference for the SUL3 rolllout" (T386836), SharedDomainUtils: Avoid early instantiation of NamespaceInfo (T386836), [[gerrit:1121332|SharedDomainUtils: Avoid early instantiatio
  • 14:53 vgutierrez: upload liberica 0.8 to apt.wm.o (bookworm-wikimedia)
  • 14:52 ihurbain@deploy2002: Started scap sync-world: Backport for Restore "Add configuration options and global preference for the SUL3 rolllout" (T386836), Restore "Add configuration options and global preference for the SUL3 rolllout" (T386836), SharedDomainUtils: Avoid early instantiation of NamespaceInfo (T386836), [[gerrit:1121332|SharedDomainUtils: Avoid early in
  • 14:50 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker1004.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:50 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker1003.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:47 inflatador: bking@apt1002:~/pkg$ sudo -E reprepro -C component/opensearch13 include bullseye-wikimedia $HOME/pkg/wmf-opensearch-search-plugins_1.3.20-1_amd64.changes T380752
  • 14:37 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker1003.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:36 ihurbain@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Read Views for 27 wiktionaries (T386762) (duration: 12m 00s)
  • 14:29 ihurbain@deploy2002: arlolra, ihurbain: Continuing with sync
  • 14:27 ihurbain@deploy2002: arlolra, ihurbain: Backport for Turn on Parsoid Read Views for 27 wiktionaries (T386762) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:24 ihurbain@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read Views for 27 wiktionaries (T386762)
  • 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 14:21 ihurbain@deploy2002: Finished scap sync-world: Backport for Enable $wgCampaignEventsEnableEventInvitation on most wikis (T383800) (duration: 12m 02s)
  • 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 14:14 ihurbain@deploy2002: daimona, ihurbain: Continuing with sync
  • 14:12 ihurbain@deploy2002: daimona, ihurbain: Backport for Enable $wgCampaignEventsEnableEventInvitation on most wikis (T383800) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 ihurbain@deploy2002: Started scap sync-world: Backport for Enable $wgCampaignEventsEnableEventInvitation on most wikis (T383800)
  • 14:01 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 14:01 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 14:01 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 14:00 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 13:37 ladsgroup@deploy2002: Finished scap sync-world: Backport for Take 2: Footer: Wikimedia icon should collapse at lower resolutions"" (T384619) (duration: 11m 54s)
  • 13:30 ladsgroup@deploy2002: ladsgroup, jdlrobson: Continuing with sync
  • 13:28 ladsgroup@deploy2002: ladsgroup, jdlrobson: Backport for Take 2: Footer: Wikimedia icon should collapse at lower resolutions"" (T384619) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 ladsgroup@deploy2002: Started scap sync-world: Backport for Take 2: Footer: Wikimedia icon should collapse at lower resolutions"" (T384619)
  • 13:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 13:10 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 13:03 sgimeno@deploy2002: Finished scap sync-world: Backport for LevelingUp: Schema migration for GELevelingUpKeepGoingNotificationThresholds. (T369551), Revert "LevelingUp: Schema migration for GELevelingUpKeepGoingNotificationThresholds." (duration: 10m 43s)
  • 12:56 sgimeno@deploy2002: sgimeno: Continuing with sync
  • 12:55 sgimeno@deploy2002: sgimeno: Backport for LevelingUp: Schema migration for GELevelingUpKeepGoingNotificationThresholds. (T369551), Revert "LevelingUp: Schema migration for GELevelingUpKeepGoingNotificationThresholds." synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:52 sgimeno@deploy2002: Started scap sync-world: Backport for LevelingUp: Schema migration for GELevelingUpKeepGoingNotificationThresholds. (T369551), Revert "LevelingUp: Schema migration for GELevelingUpKeepGoingNotificationThresholds."
  • 11:51 vgutierrez: restarting pybal on lvs1019, effectively switching swift-fe@eqiad to maglev - T385564
  • 11:47 vgutierrez: restarting pybal on lvs1020 - T385564
  • 11:43 vgutierrez: restarting pybal on lvs2013, effectively switching swift-fe@codfw to maglev - T385564
  • 11:41 vgutierrez: restarting pybal on lvs2014 - T385564
  • 11:10 vgutierrez: restarting pybal on lvs1019, effectively enabling IPIP encapsulation for swift-fe@eqiad - T385564
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1026.eqiad.wmnet
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 11:08 vgutierrez: restarting pybal on lvs1020 - T385564
  • 11:02 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker1003.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 10:54 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on netflow1002.eqiad.wmnet with reason: keeping gnmic running in debug mode to observe performance change
  • 10:44 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] enwiki: Release Add Link to 15% of newcomers (T386029) (duration: 09m 50s)
  • 10:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1036.eqiad.wmnet to cluster eqiad and group B
  • 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1036.eqiad.wmnet to cluster eqiad and group B
  • 10:41 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1026.eqiad.wmnet
  • 10:37 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 10:37 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Release Add Link to 15% of newcomers (T386029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:34 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] enwiki: Release Add Link to 15% of newcomers (T386029)
  • 10:34 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1026.eqiad.wmnet with reason: remove from cluster for reimage
  • 10:27 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1002.eqiad.wmnet with reason: disabling gnmic in systemd
  • 10:24 vgutierrez: restarting pybal on lvs2013, effectively enabling IPIP encapsulation for swift-fe@codfw - T385564
  • 10:23 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) virt.cloudgw.eqiad1.wikimediacloud.org on all recursors
  • 10:22 aborrero@cumin1002: START - Cookbook sre.dns.wipe-cache virt.cloudgw.eqiad1.wikimediacloud.org on all recursors
  • 10:22 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 10:22 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Release Add Link to 15% of newcomers (T386029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:19 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] enwiki: Release Add Link to 15% of newcomers (T386029)
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 10:16 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker1003.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:16 vgutierrez: restarting pybal on lvs2014 - T385564
  • 10:14 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:14 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
  • 10:14 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
  • 10:12 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 10:11 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 10:11 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 10:10 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 10:10 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
  • 10:10 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
  • 09:59 aborrero@cumin1002: START - Cookbook sre.dns.netbox
  • 09:51 vgutierrez: enabling IPIP encapsulation for swift-fe@codfw - T385564
  • 09:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1036.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1036.eqiad.wmnet with OS bookworm
  • 08:44 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=5; selector: name=wikikube-worker1002.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 08:42 vgutierrez: uploaded haproxy 3.1.3 to thirdparty/haproxy31 - T386796
  • 08:42 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker1002*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 08:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 08:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 08:38 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 08:37 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 08:37 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
  • 08:37 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
  • 08:28 moritzm: installing ruby2.7 security updates
  • 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 08:20 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
  • 08:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
  • 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1036.eqiad.wmnet with OS bookworm
  • 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1036.eqiad.wmnet
  • 07:29 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1025.eqiad.wmnet to cluster eqiad and group A
  • 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1025.eqiad.wmnet to cluster eqiad and group A
  • 07:18 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1036.eqiad.wmnet
  • 07:05 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1036.eqiad.wmnet with reason: remove from cluster for reimage
  • 05:34 kart_: Updated cxserver to 2025-02-20-032928-production (T386677, T386464)
  • 05:33 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:33 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:31 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:31 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 01:12 eileen: civicrm upgraded from 944ad623 to bdcc7de1

2025-02-19

  • 23:13 eileen: civicrm upgraded from afe59b16 to 944ad623
  • 22:36 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 22:35 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 22:35 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 22:34 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 22:31 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 22:30 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 38s)
  • 22:30 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 22:30 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 22:05 cjming@deploy2002: Started scap sync-world: Backport for Revert parsoid read views on frwiktionary (T356718 T386272)
  • 22:04 cjming@deploy2002: Finished scap sync-world: Backport for Lazy image loading Grade C fallback is broken (T386400) (duration: 11m 41s)
  • 21:57 cjming@deploy2002: cjming, jdlrobson: Continuing with sync
  • 21:55 cjming@deploy2002: cjming, jdlrobson: Backport for Lazy image loading Grade C fallback is broken (T386400) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:52 cjming@deploy2002: Started scap sync-world: Backport for Lazy image loading Grade C fallback is broken (T386400)
  • 21:52 cjming@deploy2002: Finished scap sync-world: Backport for Revert "Footer: Wikimedia icon should collapse at lower resolutions" (duration: 09m 48s)
  • 21:47 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 01m 19s)
  • 21:46 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 21:45 cjming@deploy2002: trainbranchbot, cjming: Continuing with sync
  • 21:45 cjming@deploy2002: trainbranchbot, cjming: Backport for Revert "Footer: Wikimedia icon should collapse at lower resolutions" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:42 cjming@deploy2002: Started scap sync-world: Backport for Revert "Footer: Wikimedia icon should collapse at lower resolutions"
  • 21:39 cjming@deploy2002: Finished scap sync-world: Backport for Footer: Wikimedia icon should collapse at lower resolutions (T384619), Update Search AB test config, increase bucketing/sampling rates for eu/ca, deploy to testwiki (T386734) (duration: 11m 47s)
  • 21:33 cjming@deploy2002: jdlrobson, cjming, bwang: Continuing with sync
  • 21:31 cjming@deploy2002: jdlrobson, cjming, bwang: Backport for Footer: Wikimedia icon should collapse at lower resolutions (T384619), Update Search AB test config, increase bucketing/sampling rates for eu/ca, deploy to testwiki (T386734) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 ejegg: payments-wiki upgraded from 7f66dea6 to 028cc28a
  • 21:28 cjming@deploy2002: Started scap sync-world: Backport for Footer: Wikimedia icon should collapse at lower resolutions (T384619), Update Search AB test config, increase bucketing/sampling rates for eu/ca, deploy to testwiki (T386734)
  • 21:17 cjming@deploy2002: Finished scap sync-world: Backport for NewUserMessage: Enable on test2wiki (duration: 10m 52s)
  • 21:10 cjming@deploy2002: tgr, cjming: Continuing with sync
  • 21:09 cjming@deploy2002: tgr, cjming: Backport for NewUserMessage: Enable on test2wiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 cjming@deploy2002: Started scap sync-world: Backport for NewUserMessage: Enable on test2wiki
  • 21:00 eileen: civicrm upgraded from 4ffa9c7c to afe59b16
  • 20:06 dduvall: jenkins successfully restarted via `systemctl restart jenkins`
  • 20:03 dduvall: restarting jenkins via systemctl due to crash
  • 20:03 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 40s)
  • 20:03 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 20:01 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 11s)
  • 20:01 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 19:59 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 10s)
  • 19:59 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 19:58 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 10s)
  • 19:58 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 19:58 dduvall: cancelling queued castor builds to unblock completed builds and jenkins restart
  • 19:57 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 10s)
  • 19:57 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 19:35 dduvall: restarting jenkins to fix git related issues following java update (T386755)
  • 19:28 fab@deploy2002: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 46s)
  • 19:27 fab@deploy2002: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
  • 19:18 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.17 refs T382368
  • 18:54 fab@deploy2002: Finished deploy [airflow-dags/research@95b14c7]: (no justification provided) (duration: 00m 11s)
  • 18:54 fab@deploy2002: Started deploy [airflow-dags/research@95b14c7]: (no justification provided)
  • 16:52 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker200.*.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:52 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:38 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker200.*.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:33 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 16:32 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 16:32 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 16:32 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 16:32 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 16:31 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 16:31 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 16:31 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 16:30 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 16:30 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
  • 16:30 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
  • 16:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 16:29 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:15 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Add configuration options and global preference for the SUL3 rolllout", Revert "Add configuration options and global preference for the SUL3 rolllout" (duration: 12m 43s)
  • 16:09 tgr@deploy2002: tgr, trainbranchbot: Continuing with sync
  • 16:06 tgr@deploy2002: tgr, trainbranchbot: Backport for Revert "Add configuration options and global preference for the SUL3 rolllout", Revert "Add configuration options and global preference for the SUL3 rolllout" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:03 tgr@deploy2002: Started scap sync-world: Backport for Revert "Add configuration options and global preference for the SUL3 rolllout", Revert "Add configuration options and global preference for the SUL3 rolllout"
  • 15:32 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker2001.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 15:29 tgr@deploy2002: Started scap sync-world: Backport for Add configuration options and global preference for the SUL3 rolllout (T384549 T377144 T384552 T384215), Add configuration options and global preference for the SUL3 rolllout (T384549 T377144 T384552 T384215)
  • 15:26 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:25 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:25 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:24 gengh@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:24 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:23 gengh@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:23 gengh@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:22 gengh@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:21 gengh@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:20 gengh@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:18 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Apply JDK 11 update - eevans@cumin1002
  • 15:14 gengh@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:13 gengh@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:13 gengh@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:12 gengh@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:11 gengh@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Growth: increase minimum tasks per topic on idwiki; ruwiki => default (T385343), fix(Surfacing): make instrumentation platform-aware (T386490), feat(Surfacing): track performance metrics with statslib (T386490), fix(surfacing): add dependency for link-icon in popup header
  • 15:10 gengh@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:04 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:04 ottomata: upgrading eventgate-analytics in eqiad to node20 - T383814
  • 15:03 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, migr: Continuing with sync
  • 15:01 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, migr: Backport for Growth: increase minimum tasks per topic on idwiki; ruwiki => default (T385343), fix(Surfacing): make instrumentation platform-aware (T386490), feat(Surfacing): track performance metrics with statslib (T386490), fix(surfacing): add dependency for link-icon in popup header
  • 14:58 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Growth: increase minimum tasks per topic on idwiki; ruwiki => default (T385343), fix(Surfacing): make instrumentation platform-aware (T386490), feat(Surfacing): track performance metrics with statslib (T386490), fix(surfacing): add dependency for link-icon in popup header
  • 14:55 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for knwiki, knwikisource, tcywikisource: add confirmed user usergroup (T386781) (duration: 11m 41s)
  • 14:49 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Continuing with sync
  • 14:46 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Backport for knwiki, knwikisource, tcywikisource: add confirmed user usergroup (T386781) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:43 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for knwiki, knwikisource, tcywikisource: add confirmed user usergroup (T386781)
  • 14:42 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
  • 14:42 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Apply JDK 11 update - eevans@cumin1002
  • 14:42 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Lift IP cap for edit-a-thon on 2025-02-26 (T386793) (duration: 10m 24s)
  • 14:36 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
  • 14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Continuing with sync
  • 14:34 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Backport for Lift IP cap for edit-a-thon on 2025-02-26 (T386793) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:31 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Lift IP cap for edit-a-thon on 2025-02-26 (T386793)
  • 14:28 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker200.*.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:28 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:27 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:27 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:26 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:26 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:26 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Introduce config setting to disable default event-organizer group (T386290), enwiki, mswikt: Enable the CampaignEvents extension (T386290 T386538) (duration: 16m 58s)
  • 14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 14:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Introduce config setting to disable default event-organizer group (T386290), enwiki, mswikt: Enable the CampaignEvents extension (T386290 T386538) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Introduce config setting to disable default event-organizer group (T386290), enwiki, mswikt: Enable the CampaignEvents extension (T386290 T386538)
  • 14:08 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 14:00 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:59 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 13:42 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:42 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudgw1002.eqiad.wmnet
  • 13:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 13:41 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:41 moritzm: installing libtasn1-6 security updates
  • 13:41 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:41 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 13:37 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 13:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudgw1002.eqiad.wmnet
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 13:31 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw1001.eqiad.wmnet
  • 13:31 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:31 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 13:30 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 13:26 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1025.eqiad.wmnet with OS bookworm
  • 13:11 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vlan1120.cloudgw1003.eqiad1.wikimediacloud.org on all recursors
  • 13:11 aborrero@cumin1002: START - Cookbook sre.dns.wipe-cache vlan1120.cloudgw1003.eqiad1.wikimediacloud.org on all recursors
  • 13:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1003.eqiad.wmnet with OS bullseye
  • 12:54 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
  • 12:54 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1025.eqiad.wmnet with reason: host reimage
  • 12:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1025.eqiad.wmnet with reason: host reimage
  • 12:50 aborrero@cumin1002: START - Cookbook sre.dns.netbox
  • 12:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1003.eqiad.wmnet with reason: host reimage
  • 12:46 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1003.eqiad.wmnet with reason: host reimage
  • 12:46 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1025.eqiad.wmnet with OS bookworm
  • 12:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudgw1003.eqiad.wmnet with OS bullseye
  • 12:29 arnaudb@dns1004: END - running authdns-update
  • 12:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1001.eqiad.wmnet with OS bookworm
  • 12:27 arnaudb@dns1004: START - running authdns-update
  • 12:27 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw1003.eqiad.wmnet with OS bullseye
  • 12:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 12:10 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 12:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1004.eqiad.wmnet to plain
  • 12:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1004.eqiad.wmnet to plain
  • 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 12:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1004.eqiad.wmnet to drbd
  • 12:06 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: host reimage
  • 12:01 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1025.eqiad.wmnet
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 11:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1004.eqiad.wmnet to drbd
  • 11:49 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudgw1001.eqiad.wmnet with OS bookworm
  • 11:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 11:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 11:48 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudgw1003.eqiad.wmnet with OS bullseye
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 11:46 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 11:34 ladsgroup@deploy2002: Finished scap sync-world: Backport for ChangeTagsStore: Lengthen cache times (T384921), ChangeTagsStore: Lengthen cache times (T384921) (duration: 10m 16s)
  • 11:33 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1025.eqiad.wmnet
  • 11:29 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1025.eqiad.wmnet with reason: remove from cluster for reimage
  • 11:28 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 11:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:27 ladsgroup@deploy2002: ladsgroup: Backport for ChangeTagsStore: Lengthen cache times (T384921), ChangeTagsStore: Lengthen cache times (T384921) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 11:24 ladsgroup@deploy2002: Started scap sync-world: Backport for ChangeTagsStore: Lengthen cache times (T384921), ChangeTagsStore: Lengthen cache times (T384921)
  • 11:09 fabfur: upgrading haproxykafka to 0.3.5 on all DCs (T374128)
  • 10:53 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:53 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1002.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:52 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker1002.eqiad.wmnet.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:48 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:45 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 10:45 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 10:42 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 10:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 10:24 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 10:10 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=30; selector: name=wikikube-worker1002.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:09 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=30; selector: name=wikikube-worker1138.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:09 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=30; selector: name=wikikube-worker1138.codfw.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 10:00 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 10:00 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 10:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp
  • 09:58 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp
  • 09:51 klausman@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1002
  • 09:37 Emperor: restart envoy/swift on ms-fe201[2-4]
  • 09:36 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp
  • 09:36 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp
  • 09:35 Emperor: restart envoy/swift on ms-fe1013
  • 09:33 klausman@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1002
  • 09:33 klausman@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1002
  • 09:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
  • 09:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
  • 09:18 dcausse: closing the UTC morning backport window
  • 09:17 dcausse@deploy2002: Finished scap sync-world: Backport for Do not update the search index if the assessment did not change, Do not update the search index if the assessment did not change (duration: 09m 51s)
  • 09:16 klausman@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1002
  • 09:10 dcausse@deploy2002: dcausse: Continuing with sync
  • 09:10 dcausse@deploy2002: dcausse: Backport for Do not update the search index if the assessment did not change, Do not update the search index if the assessment did not change synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:10 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 09:09 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=wikikube-worker200.*.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 09:07 dcausse@deploy2002: Started scap sync-world: Backport for Do not update the search index if the assessment did not change, Do not update the search index if the assessment did not change
  • 09:07 fabfur: upgrading haproxykafka to 0.3.5 on ulsfo (T374128)
  • 09:02 elukey: elukey@cumin1002:~$ sudo cumin --m async 'aux-k8s-etcd*' 'systemctl stop etcd-backup.timer etcd-backup.service' 'rm /lib/systemd/system/etcd-backup.service /lib/systemd/system/etcd-backup.timer' 'systemctl daemon-reload' - T385727
  • 09:01 dcausse@deploy2002: Finished scap sync-world: Backport for testwiki: enable surfacing structured task experiment (T386739) (duration: 10m 41s)
  • 08:58 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
  • 08:58 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
  • 08:54 dcausse@deploy2002: migr, dcausse: Continuing with sync
  • 08:53 dcausse@deploy2002: migr, dcausse: Backport for testwiki: enable surfacing structured task experiment (T386739) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:50 dcausse@deploy2002: Started scap sync-world: Backport for testwiki: enable surfacing structured task experiment (T386739)
  • 08:47 dcausse@deploy2002: Finished scap sync-world: Backport for satwiktionary: add sitename, timezone, projectnamespace (T386631), madwiki: add namespace aliases (T382087), uzwikiquote: add logos (T386569) (duration: 13m 00s)
  • 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts moscovium.eqiad.wmnet
  • 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moscovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:46 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moscovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:45 fabfur: upgrading haproxykafka to 0.3.5 on cp4037 to test new feature (T374128)
  • 08:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and A:cp
  • 08:42 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 08:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and A:cp
  • 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:40 fabfur: upgrading haproxykafka package on apt repo to 0.3.5 (T374128)
  • 08:40 dcausse@deploy2002: dcausse, anzx: Continuing with sync
  • 08:37 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts moscovium.eqiad.wmnet
  • 08:37 dcausse@deploy2002: dcausse, anzx: Backport for satwiktionary: add sitename, timezone, projectnamespace (T386631), madwiki: add namespace aliases (T382087), uzwikiquote: add logos (T386569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:33 dcausse@deploy2002: Started scap sync-world: Backport for satwiktionary: add sitename, timezone, projectnamespace (T386631), madwiki: add namespace aliases (T382087), uzwikiquote: add logos (T386569)
  • 08:19 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and A:cp
  • 08:19 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and A:cp
  • 08:15 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and not P{cp4044.*} and A:cp
  • 08:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and not P{cp4052.*} and A:cp
  • 08:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1033.eqiad.wmnet to cluster eqiad and group D
  • 08:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1033.eqiad.wmnet to cluster eqiad and group D
  • 08:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1023.eqiad.wmnet to cluster eqiad and group A
  • 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1023.eqiad.wmnet to cluster eqiad and group A
  • 07:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 07:54 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and not P{cp4044.*} and A:cp
  • 07:53 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and not P{cp4052.*} and A:cp
  • 07:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 07:49 moritzm: installing openjdk-11 security updates
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS bookworm
  • 07:34 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4052.ulsfo.wmnet,cp4044.ulsfo.wmnet} and A:cp
  • 07:29 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4052.ulsfo.wmnet,cp4044.ulsfo.wmnet} and A:cp
  • 07:24 vgutierrez: upload haproxy 2.8.14 to apt.wm.o (bullseye-wikimedia) - T386751
  • 07:17 kart_: Updated MinT to 2025-02-05-115716-production (T383750, T385552)
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: host reimage
  • 07:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: host reimage
  • 07:05 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS bookworm
  • 06:50 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:28 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:12 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:06 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 02:21 zabe@deploy2002: Finished scap sync-world: T386619 (duration: 09m 44s)
  • 02:11 zabe@deploy2002: Started scap sync-world: T386619
  • 02:07 zabe@deploy2002: Finished scap sync-world: Backport for Activate satwiktionary (T386619), Increase revision-slots cache expiry back to default for 3 wikis (T183490) (duration: 10m 55s)
  • 02:01 zabe@deploy2002: zabe: Continuing with sync
  • 01:59 zabe@deploy2002: zabe: Backport for Activate satwiktionary (T386619), Increase revision-slots cache expiry back to default for 3 wikis (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:56 zabe@deploy2002: Started scap sync-world: Backport for Activate satwiktionary (T386619), Increase revision-slots cache expiry back to default for 3 wikis (T183490)
  • 01:51 zabe@deploy2002: Finished scap sync-world: Backport for Prepare satwiktionary (T386619) (duration: 09m 45s)
  • 01:44 zabe@deploy2002: zabe: Continuing with sync
  • 01:44 zabe@deploy2002: zabe: Backport for Prepare satwiktionary (T386619) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:41 zabe@deploy2002: Started scap sync-world: Backport for Prepare satwiktionary (T386619)

2025-02-18

  • 23:51 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=(cdn|ats-be) [reason: repooling; resolved service errors]
  • 23:48 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply JDK 11 update - eevans@cumin1002
  • 23:47 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet,service=(cdn|ats-be)
  • 22:51 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 22:50 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 22:50 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:49 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:48 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 22:48 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 22:47 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:47 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 22:46 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:46 eileen: civicrm upgraded from 73758e67 to 4ffa9c7c
  • 22:46 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 22:45 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:45 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:43 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:43 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 22:31 jdrewniak@deploy2002: Finished scap sync-world: Backport for Fix session tick logging (T386229) (duration: 11m 13s)
  • 22:24 jdrewniak@deploy2002: jdrewniak, jdlrobson: Continuing with sync
  • 22:23 jdrewniak@deploy2002: jdrewniak, jdlrobson: Backport for Fix session tick logging (T386229) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:20 jdrewniak@deploy2002: Started scap sync-world: Backport for Fix session tick logging (T386229)
  • 22:00 cjming@deploy2002: Finished scap sync-world: Backport for Add "suppressredirect" to "editor" on Russian Wikisource (T386367) (duration: 09m 51s)
  • 21:53 cjming@deploy2002: nmw03, cjming: Continuing with sync
  • 21:53 cjming@deploy2002: nmw03, cjming: Backport for Add "suppressredirect" to "editor" on Russian Wikisource (T386367) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 cjming@deploy2002: Started scap sync-world: Backport for Add "suppressredirect" to "editor" on Russian Wikisource (T386367)
  • 21:48 cjming@deploy2002: Finished scap sync-world: Backport for Allow sysops to add/remove "confirmed" on English Wikivoyage (T386313) (duration: 09m 48s)
  • 21:41 cjming@deploy2002: cjming, nmw03: Continuing with sync
  • 21:41 cjming@deploy2002: cjming, nmw03: Backport for Allow sysops to add/remove "confirmed" on English Wikivoyage (T386313) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:40 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply JDK 11 update - eevans@cumin1002
  • 21:38 cjming@deploy2002: Started scap sync-world: Backport for Allow sysops to add/remove "confirmed" on English Wikivoyage (T386313)
  • 21:37 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply JDK 11 update - eevans@cumin1002
  • 21:35 cjming@deploy2002: Finished scap sync-world: Backport for Follow-up Iccb97796: Remove ru.wiki from DiscussionTools visual enhancements deployment (T379102) (duration: 11m 31s)
  • 21:28 cjming@deploy2002: esanders, cjming: Continuing with sync
  • 21:26 cjming@deploy2002: esanders, cjming: Backport for Follow-up Iccb97796: Remove ru.wiki from DiscussionTools visual enhancements deployment (T379102) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:23 cjming@deploy2002: Started scap sync-world: Backport for Follow-up Iccb97796: Remove ru.wiki from DiscussionTools visual enhancements deployment (T379102)
  • 21:20 cjming@deploy2002: Finished scap sync-world: Backport for Optimize CirrusSearch index update to trigger only when necessary, Optimize CirrusSearch index update to trigger only when necessary (duration: 09m 36s)
  • 21:13 cjming@deploy2002: dcausse, cjming: Continuing with sync
  • 21:13 cjming@deploy2002: dcausse, cjming: Backport for Optimize CirrusSearch index update to trigger only when necessary, Optimize CirrusSearch index update to trigger only when necessary synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:10 cjming@deploy2002: Started scap sync-world: Backport for Optimize CirrusSearch index update to trigger only when necessary, Optimize CirrusSearch index update to trigger only when necessary
  • 20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2243.codfw.wmnet with OS bookworm
  • 20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:59 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.17 refs T382368
  • 19:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2243.codfw.wmnet with reason: host reimage
  • 19:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2243.codfw.wmnet with reason: host reimage
  • 19:26 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply JDK 11 update - eevans@cumin1002
  • 19:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2243.codfw.wmnet with OS bookworm
  • 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:14 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Apply JDK 11 update - eevans@cumin1002
  • 18:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:36 mutante: LDAP/mwmaint1002: changed email address for LDAP user jonkolbert (T386473)
  • 17:30 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:30 ottomata: upgrading eventgate-analytics in codfw to node20 (will let this simmer for a day before proceeding to eqiad) - T383814
  • 17:29 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 17:25 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:25 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:07 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker200.*.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 17:07 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:45 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Apply JDK 11 update - eevans@cumin1002
  • 16:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply JDK 11 update - eevans@cumin1002
  • 16:30 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker200.*.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:29 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker200.*.eqiad.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:29 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:09 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker100.*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:07 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker100*.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 16:07 brennen@deploy2002: Finished deploy [phabricator/deployment@c1262ac]: deploy phab1004 for T386522 (duration: 01m 17s)
  • 16:06 brennen@deploy2002: Started deploy [phabricator/deployment@c1262ac]: deploy phab1004 for T386522
  • 16:05 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phab deploy
  • 16:05 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker2001.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:05 brennen@deploy2002: Finished deploy [phabricator/deployment@c1262ac]: deploy phab2002 for T386522 (duration: 00m 28s)
  • 16:05 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:05 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker2004.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:05 brennen@deploy2002: Started deploy [phabricator/deployment@c1262ac]: deploy phab2002 for T386522
  • 16:05 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phab deploy
  • 16:00 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker2003.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 16:00 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=20; selector: name=wikikube-worker2002.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 15:51 tappof@dns1004: END - running authdns-update
  • 15:49 tappof@dns1004: START - running authdns-update
  • 15:47 sukhe: unarchive debs/dnsdist repository on Gerrit
  • 15:46 tappof@dns1004: END - running authdns-update
  • 15:45 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:45 Ammar: T386711 Ran mwscript-k8s --comment="T386711" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=nlwiktionary --logwiki=metawiki 'イム乙ノの' 'Renamed user 19841400c4049534bc11b1ec9a011fb8'
  • 15:44 tappof@dns1004: START - running authdns-update
  • 15:41 tappof: performing grafana failback (grafana1002 is becoming the new active host) T385282
  • 15:35 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:28 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:27 otto@deploy2002: Finished scap sync-world: Backport for mediawiki.org/beacon/event - don't raise error on failure (T383939 T353817) (duration: 12m 12s)
  • 15:20 otto@deploy2002: otto: Continuing with sync
  • 15:20 otto@deploy2002: otto: Backport for mediawiki.org/beacon/event - don't raise error on failure (T383939 T353817) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:17 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:17 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 15:15 otto@deploy2002: Started scap sync-world: Backport for mediawiki.org/beacon/event - don't raise error on failure (T383939 T353817)
  • 15:15 ottomata: deploying mediawiki.org/beacon/event - don't raise error on failure
  • 15:11 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for i18n: Split hak.json system messages (T371883) (duration: 15m 08s)
  • 15:04 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 15:04 lucaswerkmeister-wmde@deploy2002: wsung, lucaswerkmeister-wmde: Continuing with sync
  • 15:02 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply JDK 11 update - eevans@cumin1002
  • 15:01 sukhe: cumin A:wikidough 'run-puppet-agent'
  • 15:00 lucaswerkmeister-wmde@deploy2002: wsung, lucaswerkmeister-wmde: Backport for i18n: Split hak.json system messages (T371883) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:55 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for i18n: Split hak.json system messages (T371883)
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1033.eqiad.wmnet with OS bookworm
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1023.eqiad.wmnet
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 14:38 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:38 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:37 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:37 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker2004.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:35 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Rename the `tmpEnableMulLanguageCode` flag to `enableMulLanguageCode` (T330217) (duration: 12m 50s)
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1033.eqiad.wmnet with reason: host reimage
  • 14:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1033.eqiad.wmnet with reason: host reimage
  • 14:29 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Rename the `tmpEnableMulLanguageCode` flag to `enableMulLanguageCode` (T330217) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 14:22 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Rename the `tmpEnableMulLanguageCode` flag to `enableMulLanguageCode` (T330217)
  • 14:21 urbanecm@deploy2002: Finished scap sync-world: Backport for Deploy DiscussionTools visual enhancements to top 10 wikis (exc. enwiki, ruwiki & zhwiki) (T379102), cswiki beta: A/B test setup for surfacing structured tasks (T385903) (duration: 16m 29s)
  • 14:14 urbanecm@deploy2002: urbanecm, esanders, sgimeno: Continuing with sync
  • 14:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1033.eqiad.wmnet with OS bookworm
  • 14:10 urbanecm@deploy2002: urbanecm, esanders, sgimeno: Backport for Deploy DiscussionTools visual enhancements to top 10 wikis (exc. enwiki, ruwiki & zhwiki) (T379102), cswiki beta: A/B test setup for surfacing structured tasks (T385903) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:05 urbanecm@deploy2002: Started scap sync-world: Backport for Deploy DiscussionTools visual enhancements to top 10 wikis (exc. enwiki, ruwiki & zhwiki) (T379102), cswiki beta: A/B test setup for surfacing structured tasks (T385903)
  • 14:02 tappof@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on grafana1002.eqiad.wmnet with reason: expand the root partition and fs on grafana1002
  • 13:56 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1023.eqiad.wmnet
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java sec updates - jmm@cumin2002
  • 13:21 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java sec updates - jmm@cumin2002
  • 13:18 taavi@deploy2002: Finished scap sync-world: Backport for wikitech: Remove useless conditional (duration: 12m 15s)
  • 13:15 moritzm: installing openjdk-11 security updates
  • 13:11 taavi@deploy2002: taavi: Continuing with sync
  • 13:11 taavi@deploy2002: taavi: Backport for wikitech: Remove useless conditional synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:06 taavi@deploy2002: Started scap sync-world: Backport for wikitech: Remove useless conditional
  • 12:41 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1023.eqiad.wmnet with reason: remove from cluster for reimage
  • 12:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1033.eqiad.wmnet with reason: remove from cluster for reimage
  • 12:01 moritzm: installing openjdk-17 security updates
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:01 taavi@deploy2002: Finished scap sync-world: Backport for wikitech: Unset $wgEnableCreativeCommonsRdf (duration: 15m 45s)
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 10:51 taavi@deploy2002: taavi: Continuing with sync
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 10:51 taavi@deploy2002: taavi: Backport for wikitech: Unset $wgEnableCreativeCommonsRdf synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:50 tappof@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on grafana1002.eqiad.wmnet with reason: expand the root partition and fs on grafana1002
  • 10:45 taavi@deploy2002: Started scap sync-world: Backport for wikitech: Unset $wgEnableCreativeCommonsRdf
  • 10:24 moritzm: installing libpgjava security updates
  • 10:14 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 10:14 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:14 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:09 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:59 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:58 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:58 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:58 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:58 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:58 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:57 tappof@dns1004: END - running authdns-update
  • 09:55 tappof@dns1004: START - running authdns-update
  • 09:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 09:47 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 09:42 tappof: performing grafana failover (grafana2001 is becoming the new active host) T385282
  • 09:42 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1123.eqiad.wmnet
  • 09:42 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1123.eqiad.wmnet
  • 09:35 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 09:32 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1148-1153].eqiad.wmnet
  • 09:32 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1148-1153].eqiad.wmnet
  • 09:26 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 08:57 urbanecm@deploy2002: Finished scap sync-world: Backport for Rename global variable from the WikimediaIncubator extension (duration: 24m 32s)
  • 08:50 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:50 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:48 urbanecm@deploy2002: urbanecm, jhsoby: Continuing with sync
  • 08:39 urbanecm@deploy2002: urbanecm, jhsoby: Backport for Rename global variable from the WikimediaIncubator extension synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:33 urbanecm@deploy2002: Started scap sync-world: Backport for Rename global variable from the WikimediaIncubator extension
  • 07:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 07:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 05:02 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.14 (duration: 02m 56s)
  • 04:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.17 refs T382368 (duration: 48m 21s)
  • 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.17 refs T382368

2025-02-17

  • away: UTC late deploys done
  • 21:48 tgr@deploy2002: Finished scap sync-world: Backport for auth: Log actual error message for action=login, Lower log level of SUL3 start/end events (T377261) (duration: 15m 50s)
  • 21:41 tgr@deploy2002: tgr: Continuing with sync
  • 21:35 tgr@deploy2002: tgr: Backport for auth: Log actual error message for action=login, Lower log level of SUL3 start/end events (T377261) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:32 tgr@deploy2002: Started scap sync-world: Backport for auth: Log actual error message for action=login, Lower log level of SUL3 start/end events (T377261)
  • 21:14 urbanecm@deploy2002: Finished scap sync-world: Backport for Restrict unfuzzy on Commons (T386561), Re-enable test experiment for testwiki for upcoming demos (T383801) (duration: 11m 59s)
  • 21:07 urbanecm@deploy2002: urbanecm, pppery, cjming: Continuing with sync
  • 21:06 urbanecm@deploy2002: urbanecm, pppery, cjming: Backport for Restrict unfuzzy on Commons (T386561), Re-enable test experiment for testwiki for upcoming demos (T383801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for Restrict unfuzzy on Commons (T386561), Re-enable test experiment for testwiki for upcoming demos (T383801)
  • 21:01 urbanecm@deploy2002: Finished scap sync-world: Backport for Growth: increase minimum tasks per topic for 4 more wikis (T386248) (duration: 11m 05s)
  • 20:50 urbanecm@deploy2002: Started scap sync-world: Backport for Growth: increase minimum tasks per topic for 4 more wikis (T386248)
  • 19:58 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:58 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:58 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:56 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:25 zabe: zabe@mwmaint2002:~$ cat /home/zabe/group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/text_table_cleanup/{} --sleep 0.3" # T183490
  • 16:57 fabfur: removed systemd override for haproxykafka on cp4037 (T378758)
  • 16:49 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2003.codfw.wmnet
  • 16:44 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubestage2003.codfw.wmnet
  • 16:11 krinkle@deploy2002: Finished scap sync-world: Backport for docroot: Add experimental assetlinks.json from and to various domains (T385520) (duration: 12m 53s)
  • 16:04 krinkle@deploy2002: krinkle: Continuing with sync
  • 16:02 krinkle@deploy2002: krinkle: Backport for docroot: Add experimental assetlinks.json from and to various domains (T385520) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:59 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on relforge1004.eqiad.wmnet with reason: T380752
  • 15:59 krinkle@deploy2002: Started scap sync-world: Backport for docroot: Add experimental assetlinks.json from and to various domains (T385520)
  • 15:41 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
  • 15:31 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
  • 15:13 elukey: restart all kartotherian services on maps1* - high unavailability
  • 15:07 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:02 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable fixed Wikibase RDF on Test Wikidata (T384344) (duration: 12m 24s)
  • 14:56 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:56 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:56 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:56 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:55 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:54 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Enable fixed Wikibase RDF on Test Wikidata (T384344) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:50 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable fixed Wikibase RDF on Test Wikidata (T384344)
  • 14:49 tgr@deploy2002: Finished scap sync-world: Backport for Suppress login audit hook in local leg of SUL3 authentication (T385574 T385572) (duration: 27m 43s)
  • 14:42 tgr@deploy2002: tgr: Continuing with sync
  • 14:26 tappof@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on grafana2001.codfw.wmnet with reason: expand the root partition and fs on grafana2001
  • 14:26 tgr@deploy2002: tgr: Backport for Suppress login audit hook in local leg of SUL3 authentication (T385574 T385572) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 tgr@deploy2002: Started scap sync-world: Backport for Suppress login audit hook in local leg of SUL3 authentication (T385574 T385572)
  • 14:15 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:15 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 14:15 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 14:12 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 14:12 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 14:09 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 14:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 12:51 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1004.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 12:51 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1003.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 12:50 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker2003.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 12:49 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker2002.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 12:35 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] enwiki: Enable mentorship for 100% of new accounts (T384505) (duration: 21m 37s)
  • 12:14 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] enwiki: Enable mentorship for 100% of new accounts (T384505)
  • 12:14 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:13 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:59 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:58 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:09 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:09 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 09:04 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:04 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:01 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:56 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 08:52 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:51 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:51 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:50 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:49 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:48 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:48 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 08:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.

2025-02-15

  • 03:39 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cr2-magru with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002

2025-02-14

  • 18:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade gitlab
  • 16:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 16:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 16:04 ottomata: roll restart eventgate-main in codfw for T386138 -- the previous command roll restarted in eqiad.
  • 16:02 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 16:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 16:00 ottomata: roll restart eventgate-main in codfw for T386138
  • 15:55 logmsgbot: Roses are red / Violets are blue / If you hack on MediaWiki / Wikimedians <3 you! #ilovefs #wmhack
  • 14:32 arnaudb@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade gitlab
  • 14:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade gitlab
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade gitlab
  • 14:23 arnaudb@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade gitlab
  • 14:19 arnaudb@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade gitlab
  • 14:18 arnaudb@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=93) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade gitlab
  • 14:18 arnaudb@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade gitlab

2025-02-13

  • 22:44 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on relforge[1003-1007].eqiad.wmnet with reason: T386357
  • 22:38 zabe: zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php ttwiki --skip /home/zabe/text_table_cleanup/ttwiki --dump /home/zabe/text_table_dump/ttwiki --sleep 0.5 --start 867501 # T183490
  • 22:15 rzl: rzl@idp2004:~$ sudo systemctl restart tomcat10
  • away: UTC late deploys done
  • 22:06 tgr@deploy2002: Finished scap sync-world: Backport for auth: Use POST trxProfiler expectations during return/reauth (T385566), Track the number of started / finished SUL3 flows (T377261), Do not preserve 'sul3-action' when restarting authentication (T364866) (duration: 15m 03s)
  • 22:01 zabe: zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php diqwiki --skip /home/zabe/text_table_cleanup/diqwiki --dump /home/zabe/text_table_dump/diqwiki --sleep 0.5 --start 318769 # T183490
  • 22:00 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 22:00 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 21:59 tgr@deploy2002: tgr: Continuing with sync
  • 21:53 tgr@deploy2002: tgr: Backport for auth: Use POST trxProfiler expectations during return/reauth (T385566), Track the number of started / finished SUL3 flows (T377261), Do not preserve 'sul3-action' when restarting authentication (T364866) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 tgr@deploy2002: Started scap sync-world: Backport for auth: Use POST trxProfiler expectations during return/reauth (T385566), Track the number of started / finished SUL3 flows (T377261), Do not preserve 'sul3-action' when restarting authentication (T364866)
  • 21:47 tgr@deploy2002: Finished scap sync-world: Backport for Fix name of ABTestEnrollment configuration (T384019) (duration: 19m 24s)
  • 21:41 tgr@deploy2002: jdlrobson, tgr: Continuing with sync
  • 21:37 eileen: civicrm upgraded from a62ed046 to 0cbf8b0a
  • 21:31 tgr@deploy2002: jdlrobson, tgr: Backport for Fix name of ABTestEnrollment configuration (T384019) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:28 tgr@deploy2002: Started scap sync-world: Backport for Fix name of ABTestEnrollment configuration (T384019)
  • 21:26 tgr@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Read Views for 33 wiktionaries (T386272), Turn on Parsoid Read Views for mobile wiktionary (T386272) (duration: 12m 08s)
  • 21:19 tgr@deploy2002: tgr, cscott: Continuing with sync
  • 21:16 tgr@deploy2002: tgr, cscott: Backport for Turn on Parsoid Read Views for 33 wiktionaries (T386272), Turn on Parsoid Read Views for mobile wiktionary (T386272) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:14 tgr@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read Views for 33 wiktionaries (T386272), Turn on Parsoid Read Views for mobile wiktionary (T386272)
  • 21:12 tgr@deploy2002: Sync cancelled.
  • 21:06 tgr@deploy2002: cscott, tgr: Backport for Turn on Parsoid Read Views for 33 wiktionaries (T386272) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 tgr@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read Views for 33 wiktionaries (T386272)
  • 21:01 inflatador: bking@cephosd1001:~$ sudo radosgw-admin quota set --quota-scope=user --uid=research --max-size=4T
  • 20:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:56 inflatador: bking@cephosd1001:~$ sudo radosgw-admin user create --uid=research --display-name="research"
  • 20:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:24 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@092b9d3]: deploy latest DAGs to analytics Airflow instance. T386114. (duration: 00m 33s)
  • 20:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:23 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@092b9d3]: deploy latest DAGs to analytics Airflow instance. T386114.
  • 20:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:07 urbanecm: mwscript-k8s --attach extensions/Translate/scripts/moveTranslatableBundle.php -- --wiki=metawiki 'Wiki_Movement_Brazil_User_Group' 'Wikimedia Brasil' 'Martin Urbanec' --reason='per request (phab:T386402)' # T386402
  • 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73475 and previous config saved to /var/cache/conftool/dbconfig/20250213-195454-root.json
  • 19:44 rzl@deploy2002: Finished scap sync-world: T383952, T384137 (duration: 11m 16s)
  • 19:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73474 and previous config saved to /var/cache/conftool/dbconfig/20250213-193949-root.json
  • 19:38 rzl@deploy2002: rzl: Continuing with sync
  • 19:37 rzl@deploy2002: rzl: T383952, T384137 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:35 rzl@deploy2002: Started scap sync-world: T383952, T384137
  • 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73473 and previous config saved to /var/cache/conftool/dbconfig/20250213-192444-root.json
  • 19:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73472 and previous config saved to /var/cache/conftool/dbconfig/20250213-192047-root.json
  • 19:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1006.eqiad.wmnet with OS bullseye
  • 19:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1007.eqiad.wmnet with OS bullseye
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73471 and previous config saved to /var/cache/conftool/dbconfig/20250213-190938-root.json
  • 19:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73470 and previous config saved to /var/cache/conftool/dbconfig/20250213-190542-root.json
  • 19:01 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 19:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1006.eqiad.wmnet with reason: host reimage
  • 19:00 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 18:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1007.eqiad.wmnet with reason: host reimage
  • 18:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1006.eqiad.wmnet with reason: host reimage
  • 18:54 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1007.eqiad.wmnet with reason: host reimage
  • 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73469 and previous config saved to /var/cache/conftool/dbconfig/20250213-185433-root.json
  • 18:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73467 and previous config saved to /var/cache/conftool/dbconfig/20250213-185036-root.json
  • 18:40 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 18:39 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1006.eqiad.wmnet with OS bullseye
  • 18:39 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1007.eqiad.wmnet with OS bullseye
  • 18:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73466 and previous config saved to /var/cache/conftool/dbconfig/20250213-183531-root.json
  • 18:28 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host relforge1007.eqiad.wmnet with OS bullseye
  • 18:28 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host relforge1006.eqiad.wmnet with OS bullseye
  • 18:22 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 18:22 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1009.eqiad.wmnet with reason: host reimage
  • 18:21 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 18:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73465 and previous config saved to /var/cache/conftool/dbconfig/20250213-182026-root.json
  • 18:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1005.eqiad.wmnet with OS bullseye
  • 18:18 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1009.eqiad.wmnet with reason: host reimage
  • 18:05 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 18:05 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 18:04 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1005.eqiad.wmnet with reason: host reimage
  • 17:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1005.eqiad.wmnet with reason: host reimage
  • 17:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1006.eqiad.wmnet with OS bullseye
  • 17:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1007.eqiad.wmnet with OS bullseye
  • 17:53 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1008.eqiad.wmnet with OS bookworm
  • 17:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1005.eqiad.wmnet with OS bullseye
  • 17:35 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host relforge1007.eqiad.wmnet with OS bullseye
  • 17:35 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:35 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host relforge1006.eqiad.wmnet with OS bullseye
  • 17:34 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 17:34 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host relforge1005.eqiad.wmnet with OS bullseye
  • 17:34 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage
  • 17:31 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage
  • 17:13 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bookworm
  • 17:04 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:04 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 17:03 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:03 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 17:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1007.eqiad.wmnet with OS bookworm
  • 16:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1153.eqiad.wmnet with reason: maintenance
  • 16:53 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
  • 16:53 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
  • 16:42 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 16:39 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 16:38 cgoubert@deploy2002: Finished deploy [restbase/deploy@511b3a4]: Add kncwiki (T385186) (duration: 15m 54s)
  • 16:37 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1007.eqiad.wmnet with OS bullseye
  • 16:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1006.eqiad.wmnet with OS bullseye
  • 16:35 bking@cumin2002: START - Cookbook sre.hosts.reimage for host relforge1005.eqiad.wmnet with OS bullseye
  • 16:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1106 to relforge1007
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host relforge1007
  • 16:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host relforge1007
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1106 to relforge1007 - bking@cumin2002"
  • 16:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1106 to relforge1007 - bking@cumin2002"
  • 16:29 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:29 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1106 to relforge1007
  • 16:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1105 to relforge1006
  • 16:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host relforge1006
  • 16:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host relforge1006
  • 16:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1105 to relforge1006 - bking@cumin2002"
  • 16:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1105 to relforge1006 - bking@cumin2002"
  • 16:23 cgoubert@deploy2002: Started deploy [restbase/deploy@511b3a4]: Add kncwiki (T385186)
  • 16:22 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:22 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1105 to relforge1006
  • 16:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1104 to relforge1005
  • 16:20 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bookworm
  • 16:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host relforge1005
  • 16:20 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host relforge1005
  • 16:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1104 to relforge1005 - bking@cumin2002"
  • 16:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1104 to relforge1005 - bking@cumin2002"
  • 16:16 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:15 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1104 to relforge1005
  • 16:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:29 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Add config option to make somevalue hashes use URI (T384344), Make somevalue hashes use URI in tests (T384344), Add config option to fix s:, ref:, v: namespace prefix (T384344), Fix s:, ref:, v: namespace prefix in tests (T384344) (duration: 11m 19s)
  • 15:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 15:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Add config option to make somevalue hashes use URI (T384344), Make somevalue hashes use URI in tests (T384344), Add config option to fix s:, ref:, v: namespace prefix (T384344), Fix s:, ref:, v: namespace prefix in tests (T384344) synced to the testservers (https://wikitech.
  • 15:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: maintenance
  • 15:19 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1104-1106].eqiad.wmnet with reason: T386357
  • 15:18 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2146.codfw.wmnet
  • 15:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Add config option to make somevalue hashes use URI (T384344), Make somevalue hashes use URI in tests (T384344), Add config option to fix s:, ref:, v: namespace prefix (T384344), Fix s:, ref:, v: namespace prefix in tests (T384344)
  • 15:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:15 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1006.eqiad.wmnet with OS bookworm
  • 15:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Index rebuild
  • 15:13 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1219.eqiad.wmnet
  • 15:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:12 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2146.codfw.wmnet
  • 15:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2146', diff saved to https://phabricator.wikimedia.org/P73463 and previous config saved to /var/cache/conftool/dbconfig/20250213-151117-marostegui.json
  • 15:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:07 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1219.eqiad.wmnet
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1219', diff saved to https://phabricator.wikimedia.org/P73462 and previous config saved to /var/cache/conftool/dbconfig/20250213-150715-marostegui.json
  • 15:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1104*,elastic1105*,elastic1106* for ban hosts prior to reimage/repurpose - bking@cumin2002 - T386357
  • 15:03 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104*,elastic1105*,elastic1106* for ban hosts prior to reimage/repurpose - bking@cumin2002 - T386357
  • 15:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:57 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:53 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:35 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bookworm
  • 14:16 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1104*,elastic1005*,elastic1006* for ban hosts prior to reimage/repurpose - bking@cumin2002 - T386357
  • 14:16 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104*,elastic1005*,elastic1006* for ban hosts prior to reimage/repurpose - bking@cumin2002 - T386357
  • 14:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet with OS bookworm
  • 14:00 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:59 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:48 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 13:48 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:46 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:45 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 13:27 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bookworm
  • 13:04 kart_: Updated Cxserver to 2025-02-13-102531-production (T381943, T386231)
  • 13:03 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 13:02 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 13:02 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 13:01 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 13:01 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:56 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:56 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:45 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 12:41 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 12:25 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 12:19 ladsgroup@deploy2002: Finished deploy [dumps/dumps@2e0a7a5]: Stop producing Yahoo! abstract dumps (T382069) (duration: 00m 07s)
  • 12:19 ladsgroup@deploy2002: Started deploy [dumps/dumps@2e0a7a5]: Stop producing Yahoo! abstract dumps (T382069)
  • 12:18 stevemunene: draining dse-k8s-worker1004 ready for reimage to bookworm and containerd for T377875
  • 12:04 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kncwiki (T385188)
  • 11:38 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kncwiki (T385188)
  • 11:08 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet with OS bookworm
  • 10:50 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 10:49 aklapper@deploy2002: Finished scap sync-world: Backport for ApiPageTriageList: Check that $user is defined before using it (T386332) (duration: 10m 47s)
  • 10:48 joal@deploy2002: Finished deploy [analytics/refinery@08b2bd2] (hadoop-test): Analytics one-off deploy - TEST [analytics/refinery@08b2bd2e] (duration: 00m 44s)
  • 10:47 joal@deploy2002: Started deploy [analytics/refinery@08b2bd2] (hadoop-test): Analytics one-off deploy - TEST [analytics/refinery@08b2bd2e]
  • 10:47 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 10:47 joal@deploy2002: Finished deploy [analytics/refinery@08b2bd2] (thin): Analytics one-off deploy -THIN [analytics/refinery@08b2bd2e] (duration: 00m 46s)
  • 10:46 joal@deploy2002: Started deploy [analytics/refinery@08b2bd2] (thin): Analytics one-off deploy -THIN [analytics/refinery@08b2bd2e]
  • 10:46 joal@deploy2002: Finished deploy [analytics/refinery@08b2bd2]: Analytics one-off deploy [analytics/refinery@08b2bd2e] (duration: 02m 07s)
  • 10:44 joal@deploy2002: Started deploy [analytics/refinery@08b2bd2]: Analytics one-off deploy [analytics/refinery@08b2bd2e]
  • 10:42 aklapper@deploy2002: kharlan, aklapper: Continuing with sync
  • 10:41 aklapper@deploy2002: kharlan, aklapper: Backport for ApiPageTriageList: Check that $user is defined before using it (T386332) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:38 aklapper@deploy2002: Started scap sync-world: Backport for ApiPageTriageList: Check that $user is defined before using it (T386332)
  • 10:14 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1003.eqiad.wmnet with OS bookworm
  • 09:40 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.16 refs T382367
  • 09:03 dcausse: closing UTC morning backport window
  • 09:01 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: enable mlr-2025 for select wikis (T385972) (duration: 19m 06s)
  • 08:54 dcausse@deploy2002: dcausse, gmodena: Continuing with sync
  • 08:45 dcausse@deploy2002: dcausse, gmodena: Backport for cirrus: enable mlr-2025 for select wikis (T385972) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:42 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: enable mlr-2025 for select wikis (T385972)
  • 08:36 dcausse@deploy2002: Finished scap sync-world: Backport for Lift IP cap for edit-a-thon on 2025-02-17 & 2025-03-10 (T386126) (duration: 09m 45s)
  • 08:27 dcausse@deploy2002: Started scap sync-world: Backport for Lift IP cap for edit-a-thon on 2025-02-17 & 2025-03-10 (T386126)
  • 08:23 dcausse@deploy2002: Finished scap sync-world: Backport for Revert "zhwiki: Add 2025 CNY celebration logos" (duration: 13m 24s)
  • 08:17 dcausse@deploy2002: stang, dcausse: Continuing with sync
  • 08:13 dcausse@deploy2002: stang, dcausse: Backport for Revert "zhwiki: Add 2025 CNY celebration logos" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:10 dcausse@deploy2002: Started scap sync-world: Backport for Revert "zhwiki: Add 2025 CNY celebration logos"
  • 04:45 kart_: Updated cxserver to 2025-02-12-075258-production (T381943)
  • 04:41 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 04:40 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 04:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 04:37 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:28 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 04:28 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:40 tchin@deploy2002: Finished deploy [airflow-dags/analytics@aaba3ff]: Deploying airflow for T306896 (duration: 01m 07s)
  • 03:39 tchin@deploy2002: Started deploy [airflow-dags/analytics@aaba3ff]: Deploying airflow for T306896
  • 03:36 eileen: civicrm upgraded from c52e87d6 to a62ed046
  • 01:48 zabe: zabe@deploy2002:~$ mwscript-k8s --comment="T386292" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'Nebuls' 'Renamed user 9b7b870ac2b7d3f071232203ec1030d1'
  • 01:48 zabe: zabe@deploy2002:~$ mwscript-k8s --comment="T386292" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'Sofia Baldelli' 'AnonymWikiuser 245'
  • 01:35 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=OGPawlis --overwrite /tmp/uploads # T382976
  • 00:15 zabe@deploy2002: Finished scap sync-world: Backport for Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki (T183490) (duration: 10m 39s)
  • 00:08 zabe@deploy2002: zabe: Continuing with sync
  • 00:07 zabe@deploy2002: zabe: Backport for Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:04 zabe@deploy2002: Started scap sync-world: Backport for Reduce revision-slots cache expiry to 60s on diqwiki and ttwiki (T183490)
  • 00:03 eileen: civicrm upgraded from 454e0ccd to c52e87d6
  • 00:00 toyofuku@deploy2002: Finished scap sync-world: Backport for Lazy Load Images (T366402), Lazy Load Images (T366402) (duration: 31m 40s)

2025-02-12

  • 23:51 toyofuku@deploy2002: toyofuku: Continuing with sync
  • 23:34 toyofuku@deploy2002: toyofuku: Backport for Lazy Load Images (T366402), Lazy Load Images (T366402) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:28 toyofuku@deploy2002: Started scap sync-world: Backport for Lazy Load Images (T366402), Lazy Load Images (T366402)
  • 22:23 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host relforge1004.eqiad.wmnet
  • 22:07 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 22:06 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 22:06 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 22:05 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 22:04 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 22:04 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 22:02 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:31 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1124-1128].eqiad.wmnet
  • 21:31 kamila@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1124-1128].eqiad.wmnet
  • 21:14 cjming@deploy2002: Finished scap sync-world: Backport for [arwiki] Set noindex for namespace user talk (T371470) (duration: 11m 05s)
  • 21:07 cjming@deploy2002: cjming, gergesshamon: Continuing with sync
  • 21:06 cjming@deploy2002: cjming, gergesshamon: Backport for [arwiki] Set noindex for namespace user talk (T371470) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 cjming@deploy2002: Started scap sync-world: Backport for [arwiki] Set noindex for namespace user talk (T371470)
  • 20:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2243.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73456 and previous config saved to /var/cache/conftool/dbconfig/20250212-201424-root.json
  • 20:01 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on relforge1004.eqiad.wmnet with reason: T380752
  • 19:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73455 and previous config saved to /var/cache/conftool/dbconfig/20250212-195919-root.json
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73454 and previous config saved to /var/cache/conftool/dbconfig/20250212-194414-root.json
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73453 and previous config saved to /var/cache/conftool/dbconfig/20250212-192909-root.json
  • 19:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73452 and previous config saved to /var/cache/conftool/dbconfig/20250212-192700-root.json
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73451 and previous config saved to /var/cache/conftool/dbconfig/20250212-191404-root.json
  • 19:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73450 and previous config saved to /var/cache/conftool/dbconfig/20250212-191155-root.json
  • 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73449 and previous config saved to /var/cache/conftool/dbconfig/20250212-185649-root.json
  • 18:53 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:52 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 18:51 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:51 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:50 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:50 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:47 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:47 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:47 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:46 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:45 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:45 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:44 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:44 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73448 and previous config saved to /var/cache/conftool/dbconfig/20250212-184143-root.json
  • 18:35 bking@cumin2002: START - Cookbook sre.hosts.dhcp for host relforge1004.eqiad.wmnet
  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73447 and previous config saved to /var/cache/conftool/dbconfig/20250212-182637-root.json
  • 17:20 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet with OS bookworm
  • 17:14 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:14 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 17:13 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:12 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 17:10 marostegui: Install 10.6.21 on db2230 T385678
  • 17:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2230.codfw.wmnet with reason: maintenance
  • 17:02 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 16:59 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 16:43 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bookworm
  • 16:34 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1002.eqiad.wmnet with OS bookworm
  • 16:32 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:31 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:31 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:30 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:29 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:29 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:26 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:26 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:15 claime: Halving mw-api-int staging replicas to free pod ip blocks - T386107
  • 16:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-api-int: apply
  • 16:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/mw-api-int: apply
  • 16:09 claime: Deleting benthos, changeprop, changeprop-jobqueue from staging to free pod ip blocks - T386107
  • 16:07 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cr2-magru with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002
  • 15:37 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Index rebuild
  • 15:36 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2176.codfw.wmnet
  • 15:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bookworm
  • 15:31 stevemunene@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1002.eqiad.wmnet with OS bookworm
  • 15:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: maintenance
  • 15:29 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2176.codfw.wmnet
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2176 T385561', diff saved to https://phabricator.wikimedia.org/P73446 and previous config saved to /var/cache/conftool/dbconfig/20250212-152738-marostegui.json
  • 15:25 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:23 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1232.eqiad.wmnet with reason: Index rebuild
  • 15:22 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1232.eqiad.wmnet
  • 15:18 Lucas_WMDE: UTC backport+config window done
  • 15:18 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bookworm
  • 15:17 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:16 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Let sysops add/remove the event-organizer group by default (T376822) (duration: 12m 53s)
  • 15:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1232.eqiad.wmnet with reason: maintenance
  • 15:15 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1232.eqiad.wmnet
  • 15:15 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1232', diff saved to https://phabricator.wikimedia.org/P73445 and previous config saved to /var/cache/conftool/dbconfig/20250212-151533-marostegui.json
  • 15:14 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Continuing with sync
  • 15:07 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Backport for Let sysops add/remove the event-organizer group by default (T376822) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:04 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Let sysops add/remove the event-organizer group by default (T376822)
  • 14:59 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 14:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for refactor(AddLink): Make eval steps more legible, feat(AddLink): store null if there is no recommendation (T382270) (duration: 11m 47s)
  • 14:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, migr: Continuing with sync
  • 14:39 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, migr: Backport for refactor(AddLink): Make eval steps more legible, feat(AddLink): store null if there is no recommendation (T382270) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for refactor(AddLink): Make eval steps more legible, feat(AddLink): store null if there is no recommendation (T382270)
  • 14:29 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for viwiki: Restrict the "changetags" permission to the sysop and bot groups (T385960), beta: fix typo in GEApiQueryGrowthTasksLookaheadSize variable (duration: 10m 59s)
  • 14:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dragoniez, sgimeno: Continuing with sync
  • 14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dragoniez, sgimeno: Backport for viwiki: Restrict the "changetags" permission to the sysop and bot groups (T385960), beta: fix typo in GEApiQueryGrowthTasksLookaheadSize variable synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for viwiki: Restrict the "changetags" permission to the sysop and bot groups (T385960), beta: fix typo in GEApiQueryGrowthTasksLookaheadSize variable
  • 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable fixed Wikibase RDF on Beta (T384344) (duration: 10m 35s)
  • 14:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Enable fixed Wikibase RDF on Beta (T384344) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:06 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable fixed Wikibase RDF on Beta (T384344)
  • 13:49 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet with OS bookworm
  • 13:40 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.16 refs T382367
  • 13:31 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 13:25 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 13:19 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:18 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 13:17 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:16 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 13:14 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:13 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:09 mszabo@deploy2002: Finished scap sync-world: Backport for Use original connection handle in onTransactionPreCommitOrIdle() (T386171) (duration: 11m 27s)
  • 13:09 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bookworm
  • 13:08 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:08 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 13:07 stevemunene@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1001.eqiad.wmnet with OS bookworm
  • 13:06 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:06 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 13:04 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:03 mszabo@deploy2002: mszabo: Continuing with sync
  • 13:02 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:01 mszabo@deploy2002: mszabo: Backport for Use original connection handle in onTransactionPreCommitOrIdle() (T386171) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:58 mszabo@deploy2002: Started scap sync-world: Backport for Use original connection handle in onTransactionPreCommitOrIdle() (T386171)
  • 12:40 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bookworm
  • 12:27 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 12:27 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 12:26 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 12:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 12:23 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 12:22 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 12:14 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 12:14 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 12:13 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 12:13 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 12:09 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 12:08 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:30 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.16 refs T382367
  • 09:27 brouberol@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test rolling-operation cookbook - brouberol@cumin2002
  • 09:27 brouberol@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test rolling-operation cookbook - brouberol@cumin2002
  • 09:24 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.16 refs T382367
  • 09:18 brouberol@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test rolling-operation cookbook - brouberol@cumin2002
  • 09:18 brouberol@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test rolling-operation cookbook - brouberol@cumin2002
  • 09:16 brouberol@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test rolling-operation cookbook - brouberol@cumin2002
  • 09:16 brouberol@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test rolling-operation cookbook - brouberol@cumin2002
  • 08:49 dcausse: closing the UTC morning backport window
  • 08:42 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: update ltr model on enwiki (T385972) (duration: 13m 10s)
  • 08:35 dcausse@deploy2002: gmodena, dcausse: Continuing with sync
  • 08:31 dcausse@deploy2002: gmodena, dcausse: Backport for cirrus: update ltr model on enwiki (T385972) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:28 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: update ltr model on enwiki (T385972)
  • 08:25 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: create buckets for mlr 2025 experiment (T385972), cirrus: deploy new mlr models (T385972) (duration: 17m 03s)
  • 08:18 dcausse@deploy2002: dcausse, gmodena: Continuing with sync
  • 08:11 dcausse@deploy2002: dcausse, gmodena: Backport for cirrus: create buckets for mlr 2025 experiment (T385972), cirrus: deploy new mlr models (T385972) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:08 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: create buckets for mlr 2025 experiment (T385972), cirrus: deploy new mlr models (T385972)
  • 04:55 eileen: civicrm upgraded from 7ceb3ee9 to 454e0ccd
  • 02:02 zabe@deploy2002: Finished scap sync-world: Backport for MCR Stage 4: Reduce dewiktionary revision-slots cache expiry (duration: 11m 46s)
  • 01:55 zabe@deploy2002: zabe: Continuing with sync
  • 01:55 zabe@deploy2002: zabe: Backport for MCR Stage 4: Reduce dewiktionary revision-slots cache expiry synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:50 zabe@deploy2002: Started scap sync-world: Backport for MCR Stage 4: Reduce dewiktionary revision-slots cache expiry
  • 00:23 eileen: civicrm upgraded from 00b560e4 to 7ceb3ee9
  • 00:18 eileen: civicrm upgraded from d027bc7b to 00b560e4
  • 00:12 zabe: zabe@mwmaint2002:~$ cat /srv/mediawiki-staging/dblists/group1.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/text_table_cleanup/{} --sleep 0.3" # T183490

2025-02-11

2025-02-10

2025-02-09

  • 13:52 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cr2-magru with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T384592)', diff saved to https://phabricator.wikimedia.org/P73430 and previous config saved to /var/cache/conftool/dbconfig/20250209-013642-marostegui.json
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P73429 and previous config saved to /var/cache/conftool/dbconfig/20250209-012135-marostegui.json
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P73428 and previous config saved to /var/cache/conftool/dbconfig/20250209-010628-marostegui.json
  • 00:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T384592)', diff saved to https://phabricator.wikimedia.org/P73427 and previous config saved to /var/cache/conftool/dbconfig/20250209-005121-marostegui.json

2025-02-08

  • 19:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T384592)', diff saved to https://phabricator.wikimedia.org/P73426 and previous config saved to /var/cache/conftool/dbconfig/20250208-193620-marostegui.json
  • 19:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 15:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T384592)', diff saved to https://phabricator.wikimedia.org/P73425 and previous config saved to /var/cache/conftool/dbconfig/20250208-153144-marostegui.json
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P73424 and previous config saved to /var/cache/conftool/dbconfig/20250208-151636-marostegui.json
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P73423 and previous config saved to /var/cache/conftool/dbconfig/20250208-150130-marostegui.json
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T384592)', diff saved to https://phabricator.wikimedia.org/P73422 and previous config saved to /var/cache/conftool/dbconfig/20250208-144623-marostegui.json
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T384592)', diff saved to https://phabricator.wikimedia.org/P73421 and previous config saved to /var/cache/conftool/dbconfig/20250208-091745-marostegui.json
  • 09:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T384592)', diff saved to https://phabricator.wikimedia.org/P73420 and previous config saved to /var/cache/conftool/dbconfig/20250208-091721-marostegui.json
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P73419 and previous config saved to /var/cache/conftool/dbconfig/20250208-090214-marostegui.json
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P73418 and previous config saved to /var/cache/conftool/dbconfig/20250208-084707-marostegui.json
  • 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T384592)', diff saved to https://phabricator.wikimedia.org/P73417 and previous config saved to /var/cache/conftool/dbconfig/20250208-083201-marostegui.json
  • 03:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T384592)', diff saved to https://phabricator.wikimedia.org/P73416 and previous config saved to /var/cache/conftool/dbconfig/20250208-034038-marostegui.json
  • 03:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 03:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T384592)', diff saved to https://phabricator.wikimedia.org/P73415 and previous config saved to /var/cache/conftool/dbconfig/20250208-034015-marostegui.json
  • 03:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P73414 and previous config saved to /var/cache/conftool/dbconfig/20250208-032508-marostegui.json
  • 03:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P73413 and previous config saved to /var/cache/conftool/dbconfig/20250208-031000-marostegui.json
  • 02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T384592)', diff saved to https://phabricator.wikimedia.org/P73412 and previous config saved to /var/cache/conftool/dbconfig/20250208-025453-marostegui.json
  • 00:31 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1025.eqiad.wmnet
  • 00:30 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1026.eqiad.wmnet

2025-02-07

  • 23:11 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1025.eqiad.wmnet
  • 23:11 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1026.eqiad.wmnet
  • 23:06 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2021.codfw.wmnet
  • 23:01 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1022.eqiad.wmnet
  • 23:00 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1021.eqiad.wmnet
  • 22:58 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2020.codfw.wmnet
  • 22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T384592)', diff saved to https://phabricator.wikimedia.org/P73410 and previous config saved to /var/cache/conftool/dbconfig/20250207-220433-marostegui.json
  • 22:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T384592)', diff saved to https://phabricator.wikimedia.org/P73409 and previous config saved to /var/cache/conftool/dbconfig/20250207-220411-marostegui.json
  • 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P73408 and previous config saved to /var/cache/conftool/dbconfig/20250207-214904-marostegui.json
  • 21:40 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1022.eqiad.wmnet
  • 21:39 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1021.eqiad.wmnet
  • 21:38 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2021.codfw.wmnet
  • 21:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P73407 and previous config saved to /var/cache/conftool/dbconfig/20250207-213357-marostegui.json
  • 21:33 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2019.codfw.wmnet
  • 21:28 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2020.codfw.wmnet
  • 21:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T384592)', diff saved to https://phabricator.wikimedia.org/P73406 and previous config saved to /var/cache/conftool/dbconfig/20250207-211851-marostegui.json
  • 21:16 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2018.codfw.wmnet
  • 20:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73405 and previous config saved to /var/cache/conftool/dbconfig/20250207-203816-root.json
  • 20:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73404 and previous config saved to /var/cache/conftool/dbconfig/20250207-202311-root.json
  • 20:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73403 and previous config saved to /var/cache/conftool/dbconfig/20250207-200805-root.json
  • 20:06 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2019.codfw.wmnet
  • 19:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73402 and previous config saved to /var/cache/conftool/dbconfig/20250207-195300-root.json
  • 19:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2018.codfw.wmnet
  • 19:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73401 and previous config saved to /var/cache/conftool/dbconfig/20250207-193754-root.json
  • 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1256.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:32 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1256
  • 18:12 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host db1256
  • 18:07 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1256 - vriley@cumin1002"
  • 18:07 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1256 - vriley@cumin1002"
  • 18:04 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1118003 (duration: 12m 54s)
  • 18:03 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:01 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1255.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:59 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1255
  • 17:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 17:58 rzl@deploy2002: rzl: Continuing with sync
  • 17:58 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host db1255
  • 17:58 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1255 - vriley@cumin1002"
  • 17:58 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1255 - vriley@cumin1002"
  • 17:57 rzl@deploy2002: rzl: https://gerrit.wikimedia.org/r/1118003 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:55 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1118003
  • 17:52 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:38 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 17:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add mgmt dns names for test nokia switches - cmooney@cumin1002"
  • 17:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add mgmt dns names for test nokia switches - cmooney@cumin1002"
  • 16:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
  • 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
  • 16:34 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
  • 16:34 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
  • 16:33 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
  • 16:33 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
  • 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
  • 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T384592)', diff saved to https://phabricator.wikimedia.org/P73400 and previous config saved to /var/cache/conftool/dbconfig/20250207-161646-marostegui.json
  • 16:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T384592)', diff saved to https://phabricator.wikimedia.org/P73399 and previous config saved to /var/cache/conftool/dbconfig/20250207-161624-marostegui.json
  • 16:16 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 16:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 16:13 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P73398 and previous config saved to /var/cache/conftool/dbconfig/20250207-160117-marostegui.json
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P73397 and previous config saved to /var/cache/conftool/dbconfig/20250207-154610-marostegui.json
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T384592)', diff saved to https://phabricator.wikimedia.org/P73396 and previous config saved to /var/cache/conftool/dbconfig/20250207-153103-marostegui.json
  • 15:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: maintenance
  • 15:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: maintenance
  • 15:03 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Index rebuild
  • 15:02 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1170.eqiad.wmnet
  • 14:56 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1170.eqiad.wmnet
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1170', diff saved to https://phabricator.wikimedia.org/P73395 and previous config saved to /var/cache/conftool/dbconfig/20250207-145547-marostegui.json
  • 14:50 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 14:50 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 14:36 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 14:36 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 14:36 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Rebooting clouddb1020 T384946
  • 14:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 14:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 14:22 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Rebooting clouddb1019 T384946
  • 14:21 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 14:21 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 14:20 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1015.eqiad.wmnet
  • 14:20 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for clouddb1015.eqiad.wmnet
  • 14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 14:11 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Rebooting clouddb1015 T384946
  • 14:10 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
  • 14:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: maintenance
  • 14:03 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
  • 14:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1014.eqiad.wmnet with reason: maintenance
  • 14:02 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 14:02 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73394 and previous config saved to /var/cache/conftool/dbconfig/20250207-122645-root.json
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73393 and previous config saved to /var/cache/conftool/dbconfig/20250207-121140-root.json
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73392 and previous config saved to /var/cache/conftool/dbconfig/20250207-115634-root.json
  • 11:50 jmm@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:42 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 11:42 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73391 and previous config saved to /var/cache/conftool/dbconfig/20250207-114129-root.json
  • 11:40 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:40 jmm@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:35 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:35 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:35 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:35 jmm@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:29 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 11:29 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 11:28 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Rebooting clouddb1018 T384946
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73390 and previous config saved to /var/cache/conftool/dbconfig/20250207-112624-root.json
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73389 and previous config saved to /var/cache/conftool/dbconfig/20250207-111619-root.json
  • 11:14 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1033.eqiad.wmnet
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73388 and previous config saved to /var/cache/conftool/dbconfig/20250207-110114-root.json
  • 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73387 and previous config saved to /var/cache/conftool/dbconfig/20250207-104818-root.json
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73386 and previous config saved to /var/cache/conftool/dbconfig/20250207-104609-root.json
  • 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73385 and previous config saved to /var/cache/conftool/dbconfig/20250207-103710-root.json
  • 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73384 and previous config saved to /var/cache/conftool/dbconfig/20250207-103312-root.json
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73383 and previous config saved to /var/cache/conftool/dbconfig/20250207-103104-root.json
  • 10:30 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1014.eqiad.wmnet
  • 10:30 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for clouddb1014.eqiad.wmnet
  • 10:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 10:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73382 and previous config saved to /var/cache/conftool/dbconfig/20250207-102205-root.json
  • 10:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73381 and previous config saved to /var/cache/conftool/dbconfig/20250207-101807-root.json
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73380 and previous config saved to /var/cache/conftool/dbconfig/20250207-101559-root.json
  • 10:08 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Rebooting clouddb1014 T384946
  • 10:07 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73379 and previous config saved to /var/cache/conftool/dbconfig/20250207-100700-root.json
  • 10:07 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73378 and previous config saved to /var/cache/conftool/dbconfig/20250207-100302-root.json
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73377 and previous config saved to /var/cache/conftool/dbconfig/20250207-095154-root.json
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73376 and previous config saved to /var/cache/conftool/dbconfig/20250207-094756-root.json
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73375 and previous config saved to /var/cache/conftool/dbconfig/20250207-093649-root.json
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T384592)', diff saved to https://phabricator.wikimedia.org/P73374 and previous config saved to /var/cache/conftool/dbconfig/20250207-091459-marostegui.json
  • 09:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73373 and previous config saved to /var/cache/conftool/dbconfig/20250207-080638-root.json
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73372 and previous config saved to /var/cache/conftool/dbconfig/20250207-080218-root.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73371 and previous config saved to /var/cache/conftool/dbconfig/20250207-075132-root.json
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73370 and previous config saved to /var/cache/conftool/dbconfig/20250207-074712-root.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73369 and previous config saved to /var/cache/conftool/dbconfig/20250207-073627-root.json
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73368 and previous config saved to /var/cache/conftool/dbconfig/20250207-073207-root.json
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73367 and previous config saved to /var/cache/conftool/dbconfig/20250207-072122-root.json
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73366 and previous config saved to /var/cache/conftool/dbconfig/20250207-071702-root.json
  • 07:13 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:12 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:08 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73365 and previous config saved to /var/cache/conftool/dbconfig/20250207-070617-root.json
  • 07:06 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es1030.eqiad.wmnet
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73364 and previous config saved to /var/cache/conftool/dbconfig/20250207-070156-root.json
  • 07:01 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es1027.eqiad.wmnet
  • 06:57 root@cumin1002: START - Cookbook sre.mysql.upgrade for es1030.eqiad.wmnet
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1030', diff saved to https://phabricator.wikimedia.org/P73363 and previous config saved to /var/cache/conftool/dbconfig/20250207-065730-marostegui.json
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1026 to es2 master', diff saved to https://phabricator.wikimedia.org/P73362 and previous config saved to /var/cache/conftool/dbconfig/20250207-065700-root.json
  • 06:56 root@cumin1002: START - Cookbook sre.mysql.upgrade for es1027.eqiad.wmnet
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1027', diff saved to https://phabricator.wikimedia.org/P73361 and previous config saved to /var/cache/conftool/dbconfig/20250207-065600-marostegui.json
  • 06:55 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1029 to es1 master', diff saved to https://phabricator.wikimedia.org/P73360 and previous config saved to /var/cache/conftool/dbconfig/20250207-065546-root.json
  • 06:36 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1234.eqiad.wmnet with reason: Index rebuild
  • 06:36 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Index rebuild
  • 06:36 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2145.codfw.wmnet
  • 06:35 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1234.eqiad.wmnet
  • 06:35 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Index rebuild
  • 06:35 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Index rebuild
  • 06:34 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2150.codfw.wmnet
  • 06:34 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1174.eqiad.wmnet
  • 06:29 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1234.eqiad.wmnet
  • 06:29 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2145.codfw.wmnet
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1234 db2145', diff saved to https://phabricator.wikimedia.org/P73359 and previous config saved to /var/cache/conftool/dbconfig/20250207-062857-marostegui.json
  • 06:28 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1174.eqiad.wmnet
  • 06:28 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2150.codfw.wmnet
  • 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174 db2150', diff saved to https://phabricator.wikimedia.org/P73358 and previous config saved to /var/cache/conftool/dbconfig/20250207-062745-marostegui.json
  • 03:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 03:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T384592)', diff saved to https://phabricator.wikimedia.org/P73357 and previous config saved to /var/cache/conftool/dbconfig/20250207-034149-marostegui.json
  • 03:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P73356 and previous config saved to /var/cache/conftool/dbconfig/20250207-032642-marostegui.json
  • 03:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P73355 and previous config saved to /var/cache/conftool/dbconfig/20250207-031134-marostegui.json
  • 02:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T384592)', diff saved to https://phabricator.wikimedia.org/P73354 and previous config saved to /var/cache/conftool/dbconfig/20250207-025628-marostegui.json
  • 02:00 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 01:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1041.eqiad.wmnet
  • 01:54 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 01:49 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:49 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1041.eqiad.wmnet
  • 01:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:41 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART

2025-02-06

  • 23:48 cstone: payments-wiki upgraded from d266fdf9 to 793998c0
  • 23:07 swfrench-wmf: ran cumin 'A:cp-text' 'run-puppet-agent -e "merging ATS Lua config change - T383845"' at 21:58:47 (retroactive)
  • 21:48 swfrench-wmf: ran cumin 'A:cp-text' 'disable-puppet "merging ATS Lua config change - T383845"'
  • 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T384592)', diff saved to https://phabricator.wikimedia.org/P73352 and previous config saved to /var/cache/conftool/dbconfig/20250206-212719-marostegui.json
  • 21:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T384592)', diff saved to https://phabricator.wikimedia.org/P73351 and previous config saved to /var/cache/conftool/dbconfig/20250206-212656-marostegui.json
  • 21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P73350 and previous config saved to /var/cache/conftool/dbconfig/20250206-211149-marostegui.json
  • 20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P73349 and previous config saved to /var/cache/conftool/dbconfig/20250206-205642-marostegui.json
  • 20:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73348 and previous config saved to /var/cache/conftool/dbconfig/20250206-205437-root.json
  • 20:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T384592)', diff saved to https://phabricator.wikimedia.org/P73347 and previous config saved to /var/cache/conftool/dbconfig/20250206-204135-marostegui.json
  • 20:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73346 and previous config saved to /var/cache/conftool/dbconfig/20250206-203932-root.json
  • 20:27 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73345 and previous config saved to /var/cache/conftool/dbconfig/20250206-202426-root.json
  • 20:21 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73344 and previous config saved to /var/cache/conftool/dbconfig/20250206-200921-root.json
  • 19:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73343 and previous config saved to /var/cache/conftool/dbconfig/20250206-195417-root.json
  • 19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213', diff saved to https://phabricator.wikimedia.org/P73342 and previous config saved to /var/cache/conftool/dbconfig/20250206-195250-marostegui.json
  • 19:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: maintenance
  • 19:32 sukhe: sudo cumin 'A:cumin' 'run-puppet-agent'
  • 19:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 18:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73339 and previous config saved to /var/cache/conftool/dbconfig/20250206-184451-root.json
  • 18:42 cdanis@deploy2002: Finished scap sync-world: Backport for Route PHP8 Excimer profiles to separate ArcLamp sinks (T383845 T385395 T385199) (duration: 10m 58s)
  • 18:34 cdanis@deploy2002: cdanis: Continuing with sync
  • 18:33 cdanis@deploy2002: cdanis: Backport for Route PHP8 Excimer profiles to separate ArcLamp sinks (T383845 T385395 T385199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:31 cdanis@deploy2002: Started scap sync-world: Backport for Route PHP8 Excimer profiles to separate ArcLamp sinks (T383845 T385395 T385199)
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73338 and previous config saved to /var/cache/conftool/dbconfig/20250206-182946-root.json
  • 18:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2008-dev.codfw.wmnet
  • 18:22 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2008-dev.codfw.wmnet
  • 18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2007-dev.codfw.wmnet
  • 18:15 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2007-dev.codfw.wmnet
  • 18:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2006-dev.codfw.wmnet
  • 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73337 and previous config saved to /var/cache/conftool/dbconfig/20250206-181441-root.json
  • 18:08 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2006-dev.codfw.wmnet
  • 18:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2005-dev.codfw.wmnet
  • 18:01 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2005-dev.codfw.wmnet
  • 17:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73336 and previous config saved to /var/cache/conftool/dbconfig/20250206-175936-root.json
  • 17:55 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.codfw.wmnet
  • 17:48 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.codfw.wmnet
  • 17:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
  • 17:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73335 and previous config saved to /var/cache/conftool/dbconfig/20250206-174431-root.json
  • 17:41 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
  • 17:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73334 and previous config saved to /var/cache/conftool/dbconfig/20250206-171835-root.json
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73333 and previous config saved to /var/cache/conftool/dbconfig/20250206-171626-root.json
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73332 and previous config saved to /var/cache/conftool/dbconfig/20250206-171601-root.json
  • 17:15 swfrench-wmf: mw-api-int mw-jobrunner mw-parsoid reverted to 100% PHP 7.4 as of 17:03 - T383845
  • 17:03 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73331 and previous config saved to /var/cache/conftool/dbconfig/20250206-170330-root.json
  • 17:03 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:03 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:02 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:02 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:02 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:01 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:01 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:01 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73330 and previous config saved to /var/cache/conftool/dbconfig/20250206-170121-root.json
  • 17:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:00 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73329 and previous config saved to /var/cache/conftool/dbconfig/20250206-170055-root.json
  • 16:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:58 swfrench@deploy2002: Finished scap sync-world: Backport for Disable cookie-based enrollment in 8.1 (T383845) (duration: 10m 03s)
  • 16:52 swfrench@deploy2002: swfrench: Continuing with sync
  • 16:51 swfrench@deploy2002: swfrench: Backport for Disable cookie-based enrollment in 8.1 (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:48 swfrench@deploy2002: Started scap sync-world: Backport for Disable cookie-based enrollment in 8.1 (T383845)
  • 16:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73328 and previous config saved to /var/cache/conftool/dbconfig/20250206-164825-root.json
  • 16:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73327 and previous config saved to /var/cache/conftool/dbconfig/20250206-164615-root.json
  • 16:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73326 and previous config saved to /var/cache/conftool/dbconfig/20250206-164550-root.json
  • 16:42 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 16:34 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 16:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73325 and previous config saved to /var/cache/conftool/dbconfig/20250206-163320-root.json
  • 16:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73324 and previous config saved to /var/cache/conftool/dbconfig/20250206-163109-root.json
  • 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73323 and previous config saved to /var/cache/conftool/dbconfig/20250206-163044-root.json
  • 16:18 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73322 and previous config saved to /var/cache/conftool/dbconfig/20250206-161814-root.json
  • 16:18 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:17 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 16:17 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73321 and previous config saved to /var/cache/conftool/dbconfig/20250206-161604-root.json
  • 16:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73320 and previous config saved to /var/cache/conftool/dbconfig/20250206-161539-root.json
  • 16:12 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:12 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:12 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2051-2056].codfw.wmnet
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2051-2056].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 16:09 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2051-2056].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 16:05 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:01 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 15:40 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2051-2056].codfw.wmnet
  • 15:38 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host fransc1001
  • 15:38 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host fransc1001
  • 15:36 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:36 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
  • 15:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
  • 15:34 godog: systemctl restart thanos-query on titan1*
  • 15:32 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T384592)', diff saved to https://phabricator.wikimedia.org/P73319 and previous config saved to /var/cache/conftool/dbconfig/20250206-150702-marostegui.json
  • 15:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T384592)', diff saved to https://phabricator.wikimedia.org/P73318 and previous config saved to /var/cache/conftool/dbconfig/20250206-150624-marostegui.json
  • 15:01 Lucas_WMDE: elukey@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [re-log due to stashbot issue, originally logged 14:58 UTC]
  • 14:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P73317 and previous config saved to /var/cache/conftool/dbconfig/20250206-145117-marostegui.json
  • 14:51 urbanecm@deploy2002: Finished scap sync-world: Backport for temp accounts: Enable IP reveal rights for local groups on meta (T356294) (duration: 13m 28s)
  • 14:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1003.eqiad.wmnet to plain
  • 14:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1003.eqiad.wmnet to plain
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1003.eqiad.wmnet to drbd
  • 14:44 urbanecm@deploy2002: tchanders, urbanecm: Continuing with sync
  • 14:40 urbanecm@deploy2002: tchanders, urbanecm: Backport for temp accounts: Enable IP reveal rights for local groups on meta (T356294) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:37 urbanecm@deploy2002: Started scap sync-world: Backport for temp accounts: Enable IP reveal rights for local groups on meta (T356294)
  • 14:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P73316 and previous config saved to /var/cache/conftool/dbconfig/20250206-143609-marostegui.json
  • 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1003.eqiad.wmnet to drbd
  • 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for Disable new WebAuthn credentials creation (T378402 T354701) (duration: 14m 00s)
  • 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T384592)', diff saved to https://phabricator.wikimedia.org/P73314 and previous config saved to /var/cache/conftool/dbconfig/20250206-142102-marostegui.json
  • 14:16 urbanecm@deploy2002: pmiazga, urbanecm: Continuing with sync
  • 14:11 urbanecm@deploy2002: pmiazga, urbanecm: Backport for Disable new WebAuthn credentials creation (T378402 T354701) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:08 urbanecm@deploy2002: Started scap sync-world: Backport for Disable new WebAuthn credentials creation (T378402 T354701)
  • 14:04 urbanecm@deploy2002: Finished scap sync-world: Backport for Babel: Remove config that is now in community configuration (T385239), Babel: Do not use a wmg variable for BabelDefaultLevel (T119117), Babel: Merge back into InitialiseSettings.php (T385239) (duration: 10m 36s)
  • 13:58 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1038.eqiad.wmnet to cluster eqiad and group D
  • 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1038.eqiad.wmnet to cluster eqiad and group D
  • 13:57 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:56 urbanecm@deploy2002: urbanecm: Backport for Babel: Remove config that is now in community configuration (T385239), Babel: Do not use a wmg variable for BabelDefaultLevel (T119117), Babel: Merge back into InitialiseSettings.php (T385239) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
  • 13:53 urbanecm@deploy2002: Started scap sync-world: Backport for Babel: Remove config that is now in community configuration (T385239), Babel: Do not use a wmg variable for BabelDefaultLevel (T119117), Babel: Merge back into InitialiseSettings.php (T385239)
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1038.eqiad.wmnet with OS bookworm
  • 13:34 cgoubert@deploy2002: Finished scap sync-world: no-op deploy to clean up diff (duration: 02m 59s)
  • 13:32 cgoubert@deploy2002: Started scap sync-world: no-op deploy to clean up diff
  • 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
  • 13:21 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: sync
  • 13:21 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: sync
  • 13:19 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1235.eqiad.wmnet with reason: Index rebuild
  • 13:18 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1235.eqiad.wmnet
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
  • 13:18 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2188.codfw.wmnet with reason: Index rebuild
  • 13:18 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2188.codfw.wmnet
  • 13:13 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2188.codfw.wmnet
  • 13:13 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1235.eqiad.wmnet
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1235 db2188 T385561', diff saved to https://phabricator.wikimedia.org/P73313 and previous config saved to /var/cache/conftool/dbconfig/20250206-131300-marostegui.json
  • 13:04 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Index rebuild
  • 13:04 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Index rebuild
  • 13:04 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2159.codfw.wmnet
  • 13:04 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1191.eqiad.wmnet
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1038.eqiad.wmnet with OS bookworm
  • 12:58 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:57 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:57 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1191.eqiad.wmnet
  • 12:57 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2159.codfw.wmnet
  • 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2159 db1191 T385550', diff saved to https://phabricator.wikimedia.org/P73312 and previous config saved to /var/cache/conftool/dbconfig/20250206-125713-marostegui.json
  • 12:56 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 12:55 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 12:45 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:45 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:44 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1038.eqiad.wmnet with reason: remove from cluster for reimage
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 12:40 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:40 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 12:39 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:30 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set categorylinks to write both everywhere except commonswiki (T385164) (duration: 11m 50s)
  • 12:27 moritzm: installing openjpeg2 security updates
  • 12:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:22 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:21 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:21 ladsgroup@deploy2002: ladsgroup: Backport for Set categorylinks to write both everywhere except commonswiki (T385164) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:18 ladsgroup@deploy2002: Started scap sync-world: Backport for Set categorylinks to write both everywhere except commonswiki (T385164)
  • 12:11 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:11 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 12:06 moritzm: installing bind9 security updates (client-side libs/tools only)
  • 11:58 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73311 and previous config saved to /var/cache/conftool/dbconfig/20250206-115556-root.json
  • 11:53 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:53 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:53 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:52 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:51 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
  • 11:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:51 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:50 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:50 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 11:49 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:49 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:49 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:48 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1016.eqiad.wmnet
  • 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73310 and previous config saved to /var/cache/conftool/dbconfig/20250206-114051-root.json
  • 11:40 moritzm: installing iperf3 security updates
  • 11:34 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Rebooting clouddb1016 T384946
  • 11:32 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 11:32 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73309 and previous config saved to /var/cache/conftool/dbconfig/20250206-112546-root.json
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73308 and previous config saved to /var/cache/conftool/dbconfig/20250206-111559-root.json
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73307 and previous config saved to /var/cache/conftool/dbconfig/20250206-111041-root.json
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73306 and previous config saved to /var/cache/conftool/dbconfig/20250206-110054-root.json
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73303 and previous config saved to /var/cache/conftool/dbconfig/20250206-105536-root.json
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73301 and previous config saved to /var/cache/conftool/dbconfig/20250206-104549-root.json
  • 10:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73300 and previous config saved to /var/cache/conftool/dbconfig/20250206-103044-root.json
  • 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73299 and previous config saved to /var/cache/conftool/dbconfig/20250206-101538-root.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73298 and previous config saved to /var/cache/conftool/dbconfig/20250206-095515-root.json
  • 09:52 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73297 and previous config saved to /var/cache/conftool/dbconfig/20250206-094724-root.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73296 and previous config saved to /var/cache/conftool/dbconfig/20250206-094009-root.json
  • 09:33 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.15 refs T382366
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73295 and previous config saved to /var/cache/conftool/dbconfig/20250206-093218-root.json
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73294 and previous config saved to /var/cache/conftool/dbconfig/20250206-092504-root.json
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73293 and previous config saved to /var/cache/conftool/dbconfig/20250206-092139-root.json
  • 09:19 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1002.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 09:19 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker2001.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73291 and previous config saved to /var/cache/conftool/dbconfig/20250206-091713-root.json
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73290 and previous config saved to /var/cache/conftool/dbconfig/20250206-090959-root.json
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73289 and previous config saved to /var/cache/conftool/dbconfig/20250206-090634-root.json
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73288 and previous config saved to /var/cache/conftool/dbconfig/20250206-090208-root.json
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73287 and previous config saved to /var/cache/conftool/dbconfig/20250206-085454-root.json
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73286 and previous config saved to /var/cache/conftool/dbconfig/20250206-085129-root.json
  • 08:51 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2236.codfw.wmnet
  • 08:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: maintenance
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73285 and previous config saved to /var/cache/conftool/dbconfig/20250206-084703-root.json
  • 08:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: maintenance
  • 08:43 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es1040.eqiad.wmnet
  • 08:38 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2236.codfw.wmnet
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2236', diff saved to https://phabricator.wikimedia.org/P73284 and previous config saved to /var/cache/conftool/dbconfig/20250206-083758-marostegui.json
  • 08:37 root@cumin1002: START - Cookbook sre.mysql.upgrade for es1040.eqiad.wmnet
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1040', diff saved to https://phabricator.wikimedia.org/P73283 and previous config saved to /var/cache/conftool/dbconfig/20250206-083654-marostegui.json
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73282 and previous config saved to /var/cache/conftool/dbconfig/20250206-083623-root.json
  • 08:36 kartik@deploy2002: Finished scap sync-world: Backport for Enable section translation on Kanuri Wikipedia (T385185) (duration: 12m 25s)
  • 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T384592)', diff saved to https://phabricator.wikimedia.org/P73281 and previous config saved to /var/cache/conftool/dbconfig/20250206-083145-marostegui.json
  • 08:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:30 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: name=wikikube-worker2001.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
  • 08:29 kartik@deploy2002: kartik, pppery: Continuing with sync
  • 08:28 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: name=wikikube-worker1002.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
  • 08:26 kartik@deploy2002: kartik, pppery: Backport for Enable section translation on Kanuri Wikipedia (T385185) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:24 moritzm: rebalance codfw/B following OS updates T382508
  • 08:23 kartik@deploy2002: Started scap sync-world: Backport for Enable section translation on Kanuri Wikipedia (T385185)
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73280 and previous config saved to /var/cache/conftool/dbconfig/20250206-082117-root.json
  • 08:18 kartik@deploy2002: Finished scap sync-world: Backport for Make MT limit more strict by 10 Percentage Point in Bhojpuri Wikipedia (T383789) (duration: 13m 34s)
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 08:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 08:12 Ammar: T385770 Ran mwscript-k8s extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=dawiki --logwiki=metawiki 'Sprucecopse' 'Renamed user 7cf752558fab818efdcacff8255d91ca'
  • 08:11 kartik@deploy2002: kartik: Continuing with sync
  • 08:09 kartik@deploy2002: kartik: Backport for Make MT limit more strict by 10 Percentage Point in Bhojpuri Wikipedia (T383789) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:05 kartik@deploy2002: Started scap sync-world: Backport for Make MT limit more strict by 10 Percentage Point in Bhojpuri Wikipedia (T383789)
  • 07:28 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Index rebuild
  • 07:28 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2213.codfw.wmnet
  • 07:23 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2213.codfw.wmnet
  • 07:21 marostegui@dns1006: END - running authdns-update
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 T385148', diff saved to https://phabricator.wikimedia.org/P73279 and previous config saved to /var/cache/conftool/dbconfig/20250206-072020-marostegui.json
  • 07:19 marostegui@dns1006: START - running authdns-update
  • 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2192 to s5 primary and set section read-write T385148', diff saved to https://phabricator.wikimedia.org/P73278 and previous config saved to /var/cache/conftool/dbconfig/20250206-071902-root.json
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 codfw as read-only for maintenance - T385148', diff saved to https://phabricator.wikimedia.org/P73277 and previous config saved to /var/cache/conftool/dbconfig/20250206-071836-root.json
  • 07:18 marostegui: Starting s5 codfw failover from db2213 to db2192 - T385148
  • 07:07 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Index rebuild
  • 07:07 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2208.codfw.wmnet with reason: Index rebuild
  • 07:05 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1194.eqiad.wmnet
  • 07:02 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2208.codfw.wmnet
  • 06:59 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T385148
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2192 with weight 0 T385148', diff saved to https://phabricator.wikimedia.org/P73276 and previous config saved to /var/cache/conftool/dbconfig/20250206-065925-root.json
  • 06:58 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2208.codfw.wmnet
  • 06:58 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1194.eqiad.wmnet
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2208 db1194 T385550', diff saved to https://phabricator.wikimedia.org/P73275 and previous config saved to /var/cache/conftool/dbconfig/20250206-065759-marostegui.json
  • 04:55 ejegg: payments-wiki upgraded from MW 1.39 to MW 1.43 (needs db update)
  • 04:01 ejegg: upgraded payments-wiki-staging from 7eeb643 to 4cdd67b
  • 03:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T384592)', diff saved to https://phabricator.wikimedia.org/P73274 and previous config saved to /var/cache/conftool/dbconfig/20250206-032148-marostegui.json
  • 03:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P73273 and previous config saved to /var/cache/conftool/dbconfig/20250206-030641-marostegui.json
  • 02:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P73272 and previous config saved to /var/cache/conftool/dbconfig/20250206-025134-marostegui.json
  • 02:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T384592)', diff saved to https://phabricator.wikimedia.org/P73271 and previous config saved to /var/cache/conftool/dbconfig/20250206-023626-marostegui.json

2025-02-05

  • 23:50 jdrewniak@deploy2002: Finished scap sync-world: Backport for Speed tests: Add HTML files for touch action (T118509) (duration: 11m 10s)
  • 23:44 jdrewniak@deploy2002: jdlrobson, jdrewniak: Continuing with sync
  • 23:42 jdrewniak@deploy2002: jdlrobson, jdrewniak: Backport for Speed tests: Add HTML files for touch action (T118509) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:39 jdrewniak@deploy2002: Started scap sync-world: Backport for Speed tests: Add HTML files for touch action (T118509)
  • 23:35 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2026.codfw.wmnet
  • 23:35 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:35 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 23:35 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 23:31 jdrewniak@deploy2002: Finished scap sync-world: Backport for Deploy dark mode to anonymous users for certain projects (February 2025) (T383451) (duration: 12m 27s)
  • 23:30 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 23:25 jdrewniak@deploy2002: jdrewniak, jdlrobson: Continuing with sync
  • 23:22 jdrewniak@deploy2002: jdrewniak, jdlrobson: Backport for Deploy dark mode to anonymous users for certain projects (February 2025) (T383451) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:19 jdrewniak@deploy2002: Started scap sync-world: Backport for Deploy dark mode to anonymous users for certain projects (February 2025) (T383451)
  • 23:19 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2026.codfw.wmnet
  • 23:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2027.codfw.wmnet
  • 23:07 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:07 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 23:06 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 23:00 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:55 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2027.codfw.wmnet
  • 22:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2028.codfw.wmnet
  • 22:54 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2028.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:51 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2028.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:45 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:41 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2028.codfw.wmnet
  • 22:40 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2029.codfw.wmnet
  • 22:40 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:40 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:40 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:36 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:31 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2029.codfw.wmnet
  • 21:42 jdrewniak@deploy2002: Finished scap sync-world: Backport for Enable $wgAllowAuthenticatedCrossOrigin on testwiki (T322944) (duration: 13m 50s)
  • 21:35 jdrewniak@deploy2002: lucaswerkmeister, jdrewniak: Continuing with sync
  • 21:31 jdrewniak@deploy2002: lucaswerkmeister, jdrewniak: Backport for Enable $wgAllowAuthenticatedCrossOrigin on testwiki (T322944) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:28 jdrewniak@deploy2002: Started scap sync-world: Backport for Enable $wgAllowAuthenticatedCrossOrigin on testwiki (T322944)
  • 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T384592)', diff saved to https://phabricator.wikimedia.org/P73270 and previous config saved to /var/cache/conftool/dbconfig/20250205-212751-marostegui.json
  • 21:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T384592)', diff saved to https://phabricator.wikimedia.org/P73269 and previous config saved to /var/cache/conftool/dbconfig/20250205-212729-marostegui.json
  • 21:21 cdanis: upgraded python3-conftool-requestctl and friends on puppetservers/puppetmasters
  • 21:14 cdanis: released new conftool 5.0.2 for all distros to apt.wm.o
  • 21:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P73268 and previous config saved to /var/cache/conftool/dbconfig/20250205-211222-marostegui.json
  • 20:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P73267 and previous config saved to /var/cache/conftool/dbconfig/20250205-205715-marostegui.json
  • 20:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T384592)', diff saved to https://phabricator.wikimedia.org/P73266 and previous config saved to /var/cache/conftool/dbconfig/20250205-204208-marostegui.json
  • 20:10 sukhe: granting brett member,reader role on beta
  • 18:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73264 and previous config saved to /var/cache/conftool/dbconfig/20250205-183318-root.json
  • 18:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73263 and previous config saved to /var/cache/conftool/dbconfig/20250205-183306-root.json
  • 18:31 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 50% of client sessions in PHP 8.1 (T383845) (duration: 12m 57s)
  • 18:24 swfrench@deploy2002: swfrench: Continuing with sync
  • 18:22 swfrench@deploy2002: swfrench: Backport for Enroll 50% of client sessions in PHP 8.1 (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73262 and previous config saved to /var/cache/conftool/dbconfig/20250205-181813-root.json
  • 18:18 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 50% of client sessions in PHP 8.1 (T383845)
  • 18:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73261 and previous config saved to /var/cache/conftool/dbconfig/20250205-181801-root.json
  • 18:11 swfrench-wmf: mw-api-int to ~ 5% of traffic on PHP 8.1 - T383845
  • 18:11 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:07 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:04 swfrench-wmf: scaled mw-api-ext and mw-web next releases to 25% of main - T383845
  • 18:04 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 18:03 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 18:03 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 18:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73259 and previous config saved to /var/cache/conftool/dbconfig/20250205-180307-root.json
  • 18:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 18:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73258 and previous config saved to /var/cache/conftool/dbconfig/20250205-180256-root.json
  • 18:01 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 18:01 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 18:00 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73257 and previous config saved to /var/cache/conftool/dbconfig/20250205-174802-root.json
  • 17:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73256 and previous config saved to /var/cache/conftool/dbconfig/20250205-174750-root.json
  • 17:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73255 and previous config saved to /var/cache/conftool/dbconfig/20250205-173257-root.json
  • 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73254 and previous config saved to /var/cache/conftool/dbconfig/20250205-173245-root.json
  • 17:30 mutante: phab1004 - rm /lib/systemd/system/phabricator_stats_job_mfa_check.* for gerrit:1117489 T299403
  • 16:33 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:32 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:31 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:30 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:00 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 15:59 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73252 and previous config saved to /var/cache/conftool/dbconfig/20250205-155456-root.json
  • 15:51 swfrench-wmf: finished deploying conftool 5.0.1-1 - T383324
  • 15:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73251 and previous config saved to /var/cache/conftool/dbconfig/20250205-153951-root.json
  • 15:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73250 and previous config saved to /var/cache/conftool/dbconfig/20250205-152445-root.json
  • 15:20 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:19 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:19 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:18 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:15 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73247 and previous config saved to /var/cache/conftool/dbconfig/20250205-145647-root.json
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73246 and previous config saved to /var/cache/conftool/dbconfig/20250205-145434-root.json
  • 14:53 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for kywiki: create draft namespace (T385593) (duration: 10m 54s)
  • 14:46 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cr2-magru with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002
  • 14:46 lucaswerkmeister-wmde@deploy2002: anzx, lucaswerkmeister-wmde: Continuing with sync
  • 14:45 lucaswerkmeister-wmde@deploy2002: anzx, lucaswerkmeister-wmde: Backport for kywiki: create draft namespace (T385593) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:43 jynus: deploy new grants to analytics_meta T385565
  • 14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
  • 14:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for kywiki: create draft namespace (T385593)
  • 14:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73245 and previous config saved to /var/cache/conftool/dbconfig/20250205-144141-root.json
  • 14:40 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Add sourceswiki to $wgImportSources for all Wikisources (T385591) (duration: 29m 00s)
  • 14:39 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 14:39 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 14:33 lucaswerkmeister-wmde@deploy2002: jhsoby, lucaswerkmeister-wmde: Continuing with sync
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73244 and previous config saved to /var/cache/conftool/dbconfig/20250205-142636-root.json
  • 14:15 lucaswerkmeister-wmde@deploy2002: jhsoby, lucaswerkmeister-wmde: Backport for Add sourceswiki to $wgImportSources for all Wikisources (T385591) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73243 and previous config saved to /var/cache/conftool/dbconfig/20250205-141131-root.json
  • 14:11 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Add sourceswiki to $wgImportSources for all Wikisources (T385591)
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T384592)', diff saved to https://phabricator.wikimedia.org/P73241 and previous config saved to /var/cache/conftool/dbconfig/20250205-140039-marostegui.json
  • 14:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T384592)', diff saved to https://phabricator.wikimedia.org/P73240 and previous config saved to /var/cache/conftool/dbconfig/20250205-140017-marostegui.json
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73238 and previous config saved to /var/cache/conftool/dbconfig/20250205-135320-root.json
  • 13:49 jynus: deploy removal of old hosts for the m1 dbbackups backup user T383871
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P73237 and previous config saved to /var/cache/conftool/dbconfig/20250205-134510-marostegui.json
  • 13:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73236 and previous config saved to /var/cache/conftool/dbconfig/20250205-133815-root.json
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P73235 and previous config saved to /var/cache/conftool/dbconfig/20250205-133003-marostegui.json
  • 13:25 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:24 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:24 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 13:24 klausman@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 100%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73234 and previous config saved to /var/cache/conftool/dbconfig/20250205-132319-fceratto.json
  • 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73233 and previous config saved to /var/cache/conftool/dbconfig/20250205-132309-root.json
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T384592)', diff saved to https://phabricator.wikimedia.org/P73232 and previous config saved to /var/cache/conftool/dbconfig/20250205-131456-marostegui.json
  • 13:08 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 75%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73231 and previous config saved to /var/cache/conftool/dbconfig/20250205-130813-fceratto.json
  • 13:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73230 and previous config saved to /var/cache/conftool/dbconfig/20250205-130804-root.json
  • 12:53 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 50%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73228 and previous config saved to /var/cache/conftool/dbconfig/20250205-125308-fceratto.json
  • 12:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73227 and previous config saved to /var/cache/conftool/dbconfig/20250205-125259-root.json
  • 12:50 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
  • 12:46 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
  • 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 35%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73226 and previous config saved to /var/cache/conftool/dbconfig/20250205-123803-fceratto.json
  • 12:22 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 30%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73225 and previous config saved to /var/cache/conftool/dbconfig/20250205-122257-fceratto.json
  • 12:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 12:12 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 12:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 12:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 12:09 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 12:09 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 12:07 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 25%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73224 and previous config saved to /var/cache/conftool/dbconfig/20250205-120752-fceratto.json
  • 12:06 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 12:05 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 12:03 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 12:03 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 12:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 12:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 12:00 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 12:00 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 11:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Rebuild tables
  • 11:52 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 20%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73223 and previous config saved to /var/cache/conftool/dbconfig/20250205-115247-fceratto.json
  • 11:42 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 11:42 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 11:41 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1017.eqiad.wmnet
  • 11:38 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
  • 11:37 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 15%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73222 and previous config saved to /var/cache/conftool/dbconfig/20250205-113741-fceratto.json
  • 11:34 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 11:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb1014.eqiad.wmnet with reason: Rebuild tables
  • 11:34 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb1018.eqiad.wmnet with reason: Rebuild tables
  • 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: Rebuild tables
  • 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuild tables
  • 11:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Rebuild tables
  • 11:31 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Rebooting clouddb1017 T384946
  • 11:31 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 11:31 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=31
  • 11:31 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 11:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 11:28 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 11:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 11:27 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 11:26 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:25 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:22 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 10%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73221 and previous config saved to /var/cache/conftool/dbconfig/20250205-112236-fceratto.json
  • 11:11 godog: bounce thanos-query on titan1002
  • 11:07 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 7%: Pooling in', diff saved to https://phabricator.wikimedia.org/P73220 and previous config saved to /var/cache/conftool/dbconfig/20250205-110731-fceratto.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P73219 and previous config saved to /var/cache/conftool/dbconfig/20250205-110628-marostegui.json
  • 11:03 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Index rebuild
  • 11:03 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2221.codfw.wmnet with reason: Index rebuild
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73218 and previous config saved to /var/cache/conftool/dbconfig/20250205-105928-root.json
  • 10:51 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1237.eqiad.wmnet onto db1179.eqiad.wmnet
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1237', diff saved to https://phabricator.wikimedia.org/P73217 and previous config saved to /var/cache/conftool/dbconfig/20250205-104742-marostegui.json
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73216 and previous config saved to /var/cache/conftool/dbconfig/20250205-104732-fceratto.json
  • 10:45 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 5%: Pooling host to 5%', diff saved to https://phabricator.wikimedia.org/P73215 and previous config saved to /var/cache/conftool/dbconfig/20250205-104543-fceratto.json
  • 10:45 urbanecm@deploy2002: Finished scap sync-world: Backport for fix(AddLink): button should show after link preview (T385542) (duration: 12m 15s)
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73214 and previous config saved to /var/cache/conftool/dbconfig/20250205-104423-root.json
  • 10:43 marostegui: Set x1 to SBR for a bit T385645
  • 10:39 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P73213 and previous config saved to /var/cache/conftool/dbconfig/20250205-103738-marostegui.json
  • 10:36 urbanecm@deploy2002: urbanecm: Backport for fix(AddLink): button should show after link preview (T385542) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:33 urbanecm@deploy2002: Started scap sync-world: Backport for fix(AddLink): button should show after link preview (T385542)
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73212 and previous config saved to /var/cache/conftool/dbconfig/20250205-103227-fceratto.json
  • 10:30 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:29 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T385645)', diff saved to https://phabricator.wikimedia.org/P73211 and previous config saved to /var/cache/conftool/dbconfig/20250205-102758-marostegui.json
  • 10:27 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1202.eqiad.wmnet
  • 10:27 klausman: pushing Changeprop patch (k8s values) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1117063
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1179 (T385645)', diff saved to https://phabricator.wikimedia.org/P73210 and previous config saved to /var/cache/conftool/dbconfig/20250205-102650-marostegui.json
  • 10:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 10:26 fceratto@cumin1002: dbctl commit (dc=all): 'db1251 (re)pooling @ 1%: Pooling in new host', diff saved to https://phabricator.wikimedia.org/P73209 and previous config saved to /var/cache/conftool/dbconfig/20250205-102614-fceratto.json
  • 10:25 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2221.codfw.wmnet
  • 10:20 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1251.eqiad.wmnet
  • 10:20 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1251.eqiad.wmnet
  • 10:20 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1202.eqiad.wmnet
  • 10:20 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2221.codfw.wmnet
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1202, db2221 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73208 and previous config saved to /var/cache/conftool/dbconfig/20250205-102012-marostegui.json
  • 10:18 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:17 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73207 and previous config saved to /var/cache/conftool/dbconfig/20250205-101721-fceratto.json
  • 10:14 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 10:02 fceratto@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73205 and previous config saved to /var/cache/conftool/dbconfig/20250205-100216-fceratto.json
  • 09:58 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be2051.codfw.wmnet with reason: disk failed, due decom soon
  • 09:56 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ms-be2075.codfw.wmnet with reason: hardware broken awaiting vendor action
  • 09:55 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 09:52 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 09:47 fceratto@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling after cloning db1251', diff saved to https://phabricator.wikimedia.org/P73203 and previous config saved to /var/cache/conftool/dbconfig/20250205-094711-fceratto.json
  • 09:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: Rebuild tables
  • 09:39 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Index rebuild
  • 09:38 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1156.eqiad.wmnet
  • 09:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuild tables
  • 09:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1014.eqiad.wmnet with reason: Rebuild tables
  • 09:32 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1156.eqiad.wmnet
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1156 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73202 and previous config saved to /var/cache/conftool/dbconfig/20250205-093152-marostegui.json
  • 09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1155-1156].eqiad.wmnet with reason: Rebuild tables
  • 09:13 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.15 refs T382366
  • 06:53 eileen: civicrm upgraded from 5e01bd21 to d027bc7b
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T384592)', diff saved to https://phabricator.wikimedia.org/P73201 and previous config saved to /var/cache/conftool/dbconfig/20250205-063911-marostegui.json
  • 06:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 05:50 kart_: Updated cxserver to 2025-02-03-095815-production (T377966, T385185)
  • 05:49 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:49 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:31 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:31 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:04 eileen: config revision changed from f6bc2c51 to f1416f7a
  • 02:45 eileen: civicrm upgraded from ab392bd2 to 5e01bd21
  • 02:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 02:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T384592)', diff saved to https://phabricator.wikimedia.org/P73200 and previous config saved to /var/cache/conftool/dbconfig/20250205-023428-marostegui.json
  • 02:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P73199 and previous config saved to /var/cache/conftool/dbconfig/20250205-021921-marostegui.json
  • 02:13 eileen: civicrm upgraded from b869d0c3 to ab392bd2
  • 02:12 wfan: donorwiki revision changed from a039cd50 to 98027151
  • 02:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P73198 and previous config saved to /var/cache/conftool/dbconfig/20250205-020414-marostegui.json
  • 02:02 eileen: config revision changed from dbf6e86a to f6bc2c51
  • 01:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T384592)', diff saved to https://phabricator.wikimedia.org/P73197 and previous config saved to /var/cache/conftool/dbconfig/20250205-014907-marostegui.json
  • 01:28 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Dyolf77 /tmp/uploads # T385642
  • 00:30 eileen: civicrm upgraded from abe0fc61 to b869d0c3
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T371742)', diff saved to https://phabricator.wikimedia.org/P73196 and previous config saved to /var/cache/conftool/dbconfig/20250205-001309-ladsgroup.json

2025-02-04

  • 23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P73195 and previous config saved to /var/cache/conftool/dbconfig/20250204-235802-ladsgroup.json
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P73194 and previous config saved to /var/cache/conftool/dbconfig/20250204-234255-ladsgroup.json
  • 23:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T371742)', diff saved to https://phabricator.wikimedia.org/P73193 and previous config saved to /var/cache/conftool/dbconfig/20250204-232748-ladsgroup.json
  • 22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T371742)', diff saved to https://phabricator.wikimedia.org/P73192 and previous config saved to /var/cache/conftool/dbconfig/20250204-223744-ladsgroup.json
  • 22:37 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 22:35 ladsgroup@deploy2002: Synchronized portals: Bump portals to HEAD (duration: 03m 12s)
  • 22:32 ladsgroup@deploy2002: Synchronized portals/wikipedia.org/assets: Bump portals to HEAD (T368221 T373204) (duration: 09m 30s)
  • 22:18 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set categorylinks to write both in group0 (T385164) (duration: 13m 20s)
  • 22:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 22:10 ladsgroup@deploy2002: ladsgroup: Backport for Set categorylinks to write both in group0 (T385164) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:05 ladsgroup@deploy2002: Started scap sync-world: Backport for Set categorylinks to write both in group0 (T385164)
  • 22:01 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set file migration to write both everywhere except commons and enwiki (T384481) (duration: 11m 01s)
  • 21:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 21:53 ladsgroup@deploy2002: ladsgroup: Backport for Set file migration to write both everywhere except commons and enwiki (T384481) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 ladsgroup@deploy2002: Started scap sync-world: Backport for Set file migration to write both everywhere except commons and enwiki (T384481)
  • 21:48 jforrester@deploy2002: Finished scap sync-world: Backport for Drop old wikifunctions.ui event stream, replaced by ….wikifunctions_ui (T369949) (duration: 17m 43s)
  • 21:42 jforrester@deploy2002: jforrester: Continuing with sync
  • 21:36 jforrester@deploy2002: jforrester: Backport for Drop old wikifunctions.ui event stream, replaced by ….wikifunctions_ui (T369949) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 jforrester@deploy2002: Started scap sync-world: Backport for Drop old wikifunctions.ui event stream, replaced by ….wikifunctions_ui (T369949)
  • 21:29 jforrester@deploy2002: Finished scap sync-world: Backport for Parsoid fragment support: fix handling of 'nowiki' and 'general' strip markers (duration: 16m 39s)
  • 21:22 jforrester@deploy2002: cscott, jforrester: Continuing with sync
  • 21:17 jforrester@deploy2002: cscott, jforrester: Backport for Parsoid fragment support: fix handling of 'nowiki' and 'general' strip markers synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:12 jforrester@deploy2002: Started scap sync-world: Backport for Parsoid fragment support: fix handling of 'nowiki' and 'general' strip markers
  • 21:09 jforrester@deploy2002: Finished scap sync-world: Backport for [wikifunctionswiki] Set flags for repo mode (on) and client (off) (duration: 09m 56s)
  • 21:03 jforrester@deploy2002: jforrester: Continuing with sync
  • 21:02 jforrester@deploy2002: jforrester: Backport for [wikifunctionswiki] Set flags for repo mode (on) and client (off) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:59 jforrester@deploy2002: Started scap sync-world: Backport for [wikifunctionswiki] Set flags for repo mode (on) and client (off)
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T384592)', diff saved to https://phabricator.wikimedia.org/P73191 and previous config saved to /var/cache/conftool/dbconfig/20250204-203754-marostegui.json
  • 20:37 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T384592)', diff saved to https://phabricator.wikimedia.org/P73190 and previous config saved to /var/cache/conftool/dbconfig/20250204-203732-marostegui.json
  • 20:36 swfrench-wmf: finished running puppet on A:cp-text after merging https://gerrit.wikimedia.org/r/1084247
  • 20:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P73189 and previous config saved to /var/cache/conftool/dbconfig/20250204-202225-marostegui.json
  • 20:09 swfrench-wmf: running puppet on A:cp-text after merging https://gerrit.wikimedia.org/r/1084247
  • 20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P73188 and previous config saved to /var/cache/conftool/dbconfig/20250204-200718-marostegui.json
  • 20:07 swfrench-wmf: verified behavior of https://gerrit.wikimedia.org/r/1084247 on cp4040
  • 19:59 swfrench-wmf: disabled puppet on A:cp-text before merging https://gerrit.wikimedia.org/r/1084247
  • 19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T384592)', diff saved to https://phabricator.wikimedia.org/P73187 and previous config saved to /var/cache/conftool/dbconfig/20250204-195211-marostegui.json
  • 18:42 swfrench-wmf: mw-api-int to ~ 2% of traffic on PHP 8.1 - T383845
  • 18:40 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:39 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:31 cwhite@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:31 cwhite@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:30 cwhite@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 18:30 cwhite@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 18:20 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 25% of client sessions in PHP 8.1 (T383845) (duration: 11m 25s)
  • 18:13 swfrench@deploy2002: swfrench: Continuing with sync
  • 18:12 swfrench@deploy2002: swfrench: Backport for Enroll 25% of client sessions in PHP 8.1 (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:09 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 25% of client sessions in PHP 8.1 (T383845)
  • 18:05 swfrench-wmf: scaled mw-api-ext next to 15% of main release - T383845
  • 18:04 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 18:04 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 18:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 18:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 18:02 swfrench-wmf: scaled mw-web next to 15% of main release - T383845
  • 18:01 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 18:01 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 18:00 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 18:00 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:53 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:51 vgutierrez: repooling lvs4008 - T384477
  • 17:51 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 17:47 mutante: codesearch.wmflabs.org - hard reboot instance for needed mass reboots in cloud VPS
  • 17:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:40 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
  • 17:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 17:17 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:16 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 17:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73186 and previous config saved to /var/cache/conftool/dbconfig/20250204-171415-root.json
  • 17:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73185 and previous config saved to /var/cache/conftool/dbconfig/20250204-165909-root.json
  • 16:58 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
  • 16:56 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs4008.ulsfo.wmnet with OS bookworm
  • 16:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:49 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73184 and previous config saved to /var/cache/conftool/dbconfig/20250204-164802-root.json
  • 16:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73183 and previous config saved to /var/cache/conftool/dbconfig/20250204-164405-root.json
  • 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73182 and previous config saved to /var/cache/conftool/dbconfig/20250204-163256-root.json
  • 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73181 and previous config saved to /var/cache/conftool/dbconfig/20250204-162900-root.json
  • 16:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73180 and previous config saved to /var/cache/conftool/dbconfig/20250204-161751-root.json
  • 16:17 topranks: disable et-0/0/0 on cr3-ulsfo to prep for optic replacement T384288
  • 16:16 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: replace faulty optic et-0/0/0
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73179 and previous config saved to /var/cache/conftool/dbconfig/20250204-161355-root.json
  • 16:12 reedy@deploy2002: Finished scap sync-world: Backport for Poem: Null coalescence $in (T385588), Poem: Null coalescence $in (T385588), Hooks: Check for null option in onSpecialMuteModifyFormFields (T385169), Hooks: Check for null option in onSpecialMuteModifyFormFields (T385169) (duration: 09m 50s)
  • 16:06 reedy@deploy2002: reedy: Continuing with sync
  • 16:06 reedy@deploy2002: reedy: Backport for Poem: Null coalescence $in (T385588), Poem: Null coalescence $in (T385588), Hooks: Check for null option in onSpecialMuteModifyFormFields (T385169), Hooks: Check for null option in onSpecialMuteModifyFormFields (T385169) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:03 reedy@deploy2002: Started scap sync-world: Backport for Poem: Null coalescence $in (T385588), Poem: Null coalescence $in (T385588), Hooks: Check for null option in onSpecialMuteModifyFormFields (T385169), Hooks: Check for null option in onSpecialMuteModifyFormFields (T385169)
  • 16:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73178 and previous config saved to /var/cache/conftool/dbconfig/20250204-160246-root.json
  • 16:00 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-server-ssl._tcp.aux-k8s-etcd.codfw.wmnet on all recursors
  • 16:00 herron@cumin1002: START - Cookbook sre.dns.wipe-cache _etcd-server-ssl._tcp.aux-k8s-etcd.codfw.wmnet on all recursors
  • 15:56 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 15:53 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73177 and previous config saved to /var/cache/conftool/dbconfig/20250204-154740-root.json
  • 15:44 herron@dns1004: END - running authdns-update
  • 15:44 mszabo@deploy2002: Finished scap sync-world: Backport for Remove flag wgSecurePollSingleTransferableVoteEnabled (T376930) (duration: 11m 21s)
  • 15:42 herron@dns1004: START - running authdns-update
  • 15:37 mszabo@deploy2002: mimurawil, mszabo: Continuing with sync
  • 15:36 mszabo@deploy2002: mimurawil, mszabo: Backport for Remove flag wgSecurePollSingleTransferableVoteEnabled (T376930) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:35 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
  • 15:33 mszabo@deploy2002: Started scap sync-world: Backport for Remove flag wgSecurePollSingleTransferableVoteEnabled (T376930)
  • 15:32 vgutierrez: reimaging lvs4008 as a liberica LB - T384477
  • 15:29 mszabo@deploy2002: Finished scap sync-world: Backport for Remove flag $wgSecurePollSingleTransferableVoteEnabled (T376930), Remove flag $wgSecurePollSingleTransferableVoteEnabled (T376930) (duration: 13m 46s)
  • 15:23 mszabo@deploy2002: mszabo: Continuing with sync
  • 15:20 mszabo@deploy2002: mszabo: Backport for Remove flag $wgSecurePollSingleTransferableVoteEnabled (T376930), Remove flag $wgSecurePollSingleTransferableVoteEnabled (T376930) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:16 mszabo@deploy2002: Started scap sync-world: Backport for Remove flag $wgSecurePollSingleTransferableVoteEnabled (T376930), Remove flag $wgSecurePollSingleTransferableVoteEnabled (T376930)
  • 15:14 herron@dns1004: END - running authdns-update
  • 15:12 herron@dns1004: START - running authdns-update
  • 15:06 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Avoid PHP Notice on missing entityschema-meta-tags (T385272), Avoid PHP Notice on missing entityschema-meta-tags (T385272) (duration: 10m 56s)
  • 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Avoid PHP Notice on missing entityschema-meta-tags (T385272), Avoid PHP Notice on missing entityschema-meta-tags (T385272) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:55 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Avoid PHP Notice on missing entityschema-meta-tags (T385272), Avoid PHP Notice on missing entityschema-meta-tags (T385272)
  • 14:54 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for EventStreamConfig: Add mediawiki.article_country_prediction_change stream (T382295) (duration: 16m 23s)
  • 14:46 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kevinbazira: Continuing with sync
  • 14:43 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kevinbazira: Backport for EventStreamConfig: Add mediawiki.article_country_prediction_change stream (T382295) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:38 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for EventStreamConfig: Add mediawiki.article_country_prediction_change stream (T382295)
  • 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for kowikisource: Add Draft namespace (T385162) (duration: 29m 05s)
  • 14:26 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, revi: Continuing with sync
  • 14:25 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, revi: Backport for kowikisource: Add Draft namespace (T385162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:10 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 14:10 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 14:09 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for kowikisource: Add Draft namespace (T385162) # re-log from 14:07 UTC
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73176 and previous config saved to /var/cache/conftool/dbconfig/20250204-134646-root.json
  • 13:44 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1004.eqiad.wmnet with OS bullseye
  • 13:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1002.eqiad.wmnet with OS bookworm
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73175 and previous config saved to /var/cache/conftool/dbconfig/20250204-133141-root.json
  • 13:27 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1004.eqiad.wmnet with reason: host reimage
  • 13:23 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1004.eqiad.wmnet with reason: host reimage
  • 13:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73174 and previous config saved to /var/cache/conftool/dbconfig/20250204-131636-root.json
  • 13:14 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: host reimage
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73173 and previous config saved to /var/cache/conftool/dbconfig/20250204-131118-root.json
  • 13:09 godog: upgrade poolcounter-prometheus-exporter to 0.1.2 - T333947
  • 13:07 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudgw1004.eqiad.wmnet with OS bullseye
  • 13:04 aborrero@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudgw1004.eqiad.wmnet with OS bookworm
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73168 and previous config saved to /var/cache/conftool/dbconfig/20250204-124625-root.json
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P73167 and previous config saved to /var/cache/conftool/dbconfig/20250204-124345-marostegui.json
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73166 and previous config saved to /var/cache/conftool/dbconfig/20250204-124107-root.json
  • 12:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:38 jynus: deploying new backup grants for ES hosts T383902
  • 12:33 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 12:32 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 12:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2011.codfw.wmnet,lvs6001.drmrs.wmnet,lvs1017.eqiad.wmnet,lvs3008.esams.wmnet,lvs7001.magru.wmnet} and A:lvs (T373027)
  • 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P73165 and previous config saved to /var/cache/conftool/dbconfig/20250204-122838-marostegui.json
  • 12:27 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2011.codfw.wmnet,lvs6001.drmrs.wmnet,lvs1017.eqiad.wmnet,lvs3008.esams.wmnet,lvs7001.magru.wmnet} and A:lvs (T373027)
  • 12:26 vgutierrez: upgrading pybal on high-traffic1 load balancers - T373027
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73164 and previous config saved to /var/cache/conftool/dbconfig/20250204-122602-root.json
  • 12:25 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2012.codfw.wmnet,lvs6002.drmrs.wmnet,lvs1018.eqiad.wmnet,lvs3009.esams.wmnet,lvs7002.magru.wmnet} and A:lvs (T373027)
  • 12:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2012.codfw.wmnet,lvs6002.drmrs.wmnet,lvs1018.eqiad.wmnet,lvs3009.esams.wmnet,lvs7002.magru.wmnet} and A:lvs (T373027)
  • 12:23 vgutierrez: upgrading pybal on high-traffic2 load balancers - T373027
  • 12:23 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2222.codfw.wmnet with reason: Index rebuild
  • 12:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic (T373027)
  • 12:20 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2222.codfw.wmnet
  • 12:20 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic (T373027)
  • 12:18 vgutierrez: upgrading pybal on low-traffic load balancers - T373027
  • 12:17 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.codfw.wmnet,lvs6003.drmrs.wmnet,lvs1020.eqiad.wmnet,lvs3010.esams.wmnet,lvs7003.magru.wmnet} and A:lvs (T373027)
  • 12:15 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.codfw.wmnet,lvs6003.drmrs.wmnet,lvs1020.eqiad.wmnet,lvs3010.esams.wmnet,lvs7003.magru.wmnet} and A:lvs (T373027)
  • 12:15 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2222.codfw.wmnet
  • 12:15 vgutierrez: upgrading pybal on secondary load balancers - T373027
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2222 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73163 and previous config saved to /var/cache/conftool/dbconfig/20250204-121450-marostegui.json
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73162 and previous config saved to /var/cache/conftool/dbconfig/20250204-121400-root.json
  • 12:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T384592)', diff saved to https://phabricator.wikimedia.org/P73161 and previous config saved to /var/cache/conftool/dbconfig/20250204-121331-marostegui.json
  • 12:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs500[4-5]*} and A:lvs (T373027)
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73160 and previous config saved to /var/cache/conftool/dbconfig/20250204-121056-root.json
  • 12:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs500[4-5]*} and A:lvs (T373027)
  • 12:07 elukey: manually executed docker-system-prune-dangling.service on build2001
  • 12:04 elukey: manually dropped 2.5.1rocm6.2-1-20250202 on build2001 - T385531
  • 12:03 vgutierrez: upgrading pybal on eqsin - T373027
  • 11:59 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:59 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73158 and previous config saved to /var/cache/conftool/dbconfig/20250204-115855-root.json
  • 11:54 vgutierrez: uploaded pybal 1.15.15 to apt.wm.o (bullseye-wikimedia) T373027
  • 11:54 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Index rebuild
  • 11:54 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1227.eqiad.wmnet
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73157 and previous config saved to /var/cache/conftool/dbconfig/20250204-115323-root.json
  • 11:48 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1227.eqiad.wmnet
  • 11:48 jynus: deploying new backup grants for matomo and analytics_meta T383902
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1227 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73156 and previous config saved to /var/cache/conftool/dbconfig/20250204-114808-marostegui.json
  • 11:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73155 and previous config saved to /var/cache/conftool/dbconfig/20250204-114350-root.json
  • 11:41 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 11:39 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73154 and previous config saved to /var/cache/conftool/dbconfig/20250204-113818-root.json
  • 11:34 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 11:33 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 11:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 11:31 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73153 and previous config saved to /var/cache/conftool/dbconfig/20250204-112844-root.json
  • 11:28 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 11:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 11:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73152 and previous config saved to /var/cache/conftool/dbconfig/20250204-112313-root.json
  • 11:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:22 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:20 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:20 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:18 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:17 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:17 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73151 and previous config saved to /var/cache/conftool/dbconfig/20250204-111337-root.json
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73150 and previous config saved to /var/cache/conftool/dbconfig/20250204-110830-root.json
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73149 and previous config saved to /var/cache/conftool/dbconfig/20250204-110808-root.json
  • 11:03 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Index rebuild
  • 11:01 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1229.eqiad.wmnet
  • 10:59 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2040.codfw.wmnet
  • 10:56 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1229.eqiad.wmnet
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1229 for index rebuild', diff saved to https://phabricator.wikimedia.org/P73148 and previous config saved to /var/cache/conftool/dbconfig/20250204-105546-marostegui.json
  • 10:54 root@cumin1002: START - Cookbook sre.mysql.upgrade for es2040.codfw.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2040 for kernel reboot', diff saved to https://phabricator.wikimedia.org/P73147 and previous config saved to /var/cache/conftool/dbconfig/20250204-105411-marostegui.json
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73146 and previous config saved to /var/cache/conftool/dbconfig/20250204-105323-root.json
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73145 and previous config saved to /var/cache/conftool/dbconfig/20250204-105302-root.json
  • 10:44 Amir1: foreachwiki sql.php /srv/mediawiki/php-1.44.0-wmf.14/sql/mysql/patch-collation.sql (T384592)
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73144 and previous config saved to /var/cache/conftool/dbconfig/20250204-103818-root.json
  • 10:32 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:32 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:24 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73143 and previous config saved to /var/cache/conftool/dbconfig/20250204-102313-root.json
  • 10:22 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:15 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: apply
  • 10:15 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
  • 10:13 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
  • 10:13 elukey: depool maps1006 from all services to run perf tests
  • 10:13 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 10:13 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73141 and previous config saved to /var/cache/conftool/dbconfig/20250204-100807-root.json
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73140 and previous config saved to /var/cache/conftool/dbconfig/20250204-094344-root.json
  • 09:41 slyngshede@dns1004: END - running authdns-update
  • 09:39 slyngshede@dns1004: START - running authdns-update
  • 09:39 slyngshede@dns1004: START - running authdns-update
  • 09:39 slyngshede@dns1004: START - running authdns-update
  • 09:38 urbanecm: mwmaint2002: Kill `mediawiki_job_growthexperiments-refreshLinkRecommendations-s6[6640]` to pick new config (T378527)
  • 09:34 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 09:33 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73139 and previous config saved to /var/cache/conftool/dbconfig/20250204-092838-root.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73138 and previous config saved to /var/cache/conftool/dbconfig/20250204-092759-root.json
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73137 and previous config saved to /var/cache/conftool/dbconfig/20250204-092232-root.json
  • 09:18 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.15 refs T382366
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73136 and previous config saved to /var/cache/conftool/dbconfig/20250204-091334-root.json
  • 09:13 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73135 and previous config saved to /var/cache/conftool/dbconfig/20250204-091254-root.json
  • 09:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73134 and previous config saved to /var/cache/conftool/dbconfig/20250204-090726-root.json
  • 09:04 urbanecm@deploy2002: Finished scap sync-world: Backport for Move link recommendation minimum tasks per topic to PHP configuration (T383714) (duration: 17m 28s)
  • 09:03 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1169.eqiad.wmnet
  • 09:03 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1169.eqiad.wmnet
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73132 and previous config saved to /var/cache/conftool/dbconfig/20250204-085828-root.json
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73131 and previous config saved to /var/cache/conftool/dbconfig/20250204-085749-root.json
  • 08:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Repooling after clone - T383760
  • 08:56 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 08:53 urbanecm@deploy2002: urbanecm: Backport for Move link recommendation minimum tasks per topic to PHP configuration (T383714) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:52 XioNoX: push pfw policies T384885
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73130 and previous config saved to /var/cache/conftool/dbconfig/20250204-085221-root.json
  • 08:46 urbanecm@deploy2002: Started scap sync-world: Backport for Move link recommendation minimum tasks per topic to PHP configuration (T383714)
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73129 and previous config saved to /var/cache/conftool/dbconfig/20250204-084244-root.json
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73128 and previous config saved to /var/cache/conftool/dbconfig/20250204-084052-root.json
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73127 and previous config saved to /var/cache/conftool/dbconfig/20250204-083716-root.json
  • 08:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2039.codfw.wmnet with reason: Rebuild tables
  • 08:34 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2039.codfw.wmnet
  • 08:30 urbanecm@deploy2002: Finished scap sync-world: Backport for Add configurable MinimumTasksPerTopic (T383714), [Growth] Increase minimum tasks per topic to 2000 for eswiki, frwiki (T378527) (duration: 25m 56s)
  • 08:29 root@cumin1002: START - Cookbook sre.mysql.upgrade for es2039.codfw.wmnet
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2039 for kernel reboot', diff saved to https://phabricator.wikimedia.org/P73125 and previous config saved to /var/cache/conftool/dbconfig/20250204-082912-marostegui.json
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73124 and previous config saved to /var/cache/conftool/dbconfig/20250204-082738-root.json
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73123 and previous config saved to /var/cache/conftool/dbconfig/20250204-082210-root.json
  • 08:19 urbanecm@deploy2002: urbanecm, cyndywikime: Continuing with sync
  • 08:18 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Index rebuild
  • 08:18 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Index rebuild
  • 08:18 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2220.codfw.wmnet
  • 08:17 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1236.eqiad.wmnet
  • 08:15 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2227.codfw.wmnet onto db2209.codfw.wmnet
  • 08:13 urbanecm@deploy2002: urbanecm, cyndywikime: Backport for Add configurable MinimumTasksPerTopic (T383714), [Growth] Increase minimum tasks per topic to 2000 for eswiki, frwiki (T378527) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:12 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1236.eqiad.wmnet
  • 08:12 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2220.codfw.wmnet
  • 08:12 root@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db2220.codfw.wmnet with reason: Index rebuild
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2220', diff saved to https://phabricator.wikimedia.org/P73122 and previous config saved to /var/cache/conftool/dbconfig/20250204-081151-marostegui.json
  • 08:11 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Index rebuild
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236', diff saved to https://phabricator.wikimedia.org/P73121 and previous config saved to /var/cache/conftool/dbconfig/20250204-081056-marostegui.json
  • 08:04 urbanecm@deploy2002: Started scap sync-world: Backport for Add configurable MinimumTasksPerTopic (T383714), [Growth] Increase minimum tasks per topic to 2000 for eswiki, frwiki (T378527)
  • 08:02 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Index rebuild
  • 08:01 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1197.eqiad.wmnet
  • 07:54 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1197.eqiad.wmnet
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1197', diff saved to https://phabricator.wikimedia.org/P73120 and previous config saved to /var/cache/conftool/dbconfig/20250204-075440-marostegui.json
  • 07:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuild tables
  • 07:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1013.eqiad.wmnet with reason: Rebuild tables
  • 06:45 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2227.codfw.wmnet onto db2209.codfw.wmnet
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2227 to clone db2209', diff saved to https://phabricator.wikimedia.org/P73119 and previous config saved to /var/cache/conftool/dbconfig/20250204-064425-marostegui.json
  • 06:26 kartik@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T384592)', diff saved to https://phabricator.wikimedia.org/P73118 and previous config saved to /var/cache/conftool/dbconfig/20250204-054505-marostegui.json
  • 05:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T384592)', diff saved to https://phabricator.wikimedia.org/P73117 and previous config saved to /var/cache/conftool/dbconfig/20250204-054443-marostegui.json
  • 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P73116 and previous config saved to /var/cache/conftool/dbconfig/20250204-052936-marostegui.json
  • 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P73115 and previous config saved to /var/cache/conftool/dbconfig/20250204-051429-marostegui.json
  • 05:04 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.12 (duration: 04m 49s)
  • 05:01 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.15 refs T382366 (duration: 58m 53s)
  • 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T384592)', diff saved to https://phabricator.wikimedia.org/P73114 and previous config saved to /var/cache/conftool/dbconfig/20250204-045922-marostegui.json
  • 04:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.15 refs T382366
  • 03:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1054
  • 03:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1054
  • 03:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1053
  • 02:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1053
  • 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1053
  • 02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1053
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 02:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1054
  • 02:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1054
  • 02:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1053
  • 02:02 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1053
  • 02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns entries for new frack nodes - pt1979@cumin2002"
  • 01:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns entries for new frack nodes - pt1979@cumin2002"
  • 01:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T384592)', diff saved to https://phabricator.wikimedia.org/P73112 and previous config saved to /var/cache/conftool/dbconfig/20250204-000010-marostegui.json
  • 00:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance

2025-02-03

  • 23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T384592)', diff saved to https://phabricator.wikimedia.org/P73111 and previous config saved to /var/cache/conftool/dbconfig/20250203-235947-marostegui.json
  • 23:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P73110 and previous config saved to /var/cache/conftool/dbconfig/20250203-234440-marostegui.json
  • 23:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P73109 and previous config saved to /var/cache/conftool/dbconfig/20250203-232933-marostegui.json
  • 23:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T384592)', diff saved to https://phabricator.wikimedia.org/P73108 and previous config saved to /var/cache/conftool/dbconfig/20250203-231428-marostegui.json
  • 23:10 dwisehaupt@dns1004: END - running authdns-update
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt franio1002 - vriley@cumin1002"
  • 23:09 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt franio1002 - vriley@cumin1002"
  • 23:08 dwisehaupt@dns1004: START - running authdns-update
  • 23:01 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 22:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:44 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt franio1001 - vriley@cumin1002"
  • 22:43 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt franio1001 - vriley@cumin1002"
  • 22:39 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:15 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] enwiki: Enable mentorship for 75% of new accounts (T384505) (duration: 10m 22s)
  • 21:08 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 21:08 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Enable mentorship for 75% of new accounts (T384505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] enwiki: Enable mentorship for 75% of new accounts (T384505)
  • 20:33 rzl@deploy2002: Finished scap sync-world: T383952, T384137 (duration: 06m 10s)
  • 20:32 rzl@deploy2002: rzl: Continuing with sync
  • 20:31 rzl@deploy2002: rzl: T383952, T384137 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:29 rzl@deploy2002: Started scap sync-world: T383952, T384137
  • 19:47 rzl@deploy2002: Started scap sync-world: T383952, T384137
  • 19:40 swfrench-wmf: ran reprepro include mercurius 1.1.0-1 - T385225
  • 19:36 ejegg: fundraising civicrm upgraded from 3e566467 to abe0fc61
  • 19:00 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
  • 19:00 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
  • 18:55 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1169.eqiad.wmnet onto db1251.eqiad.wmnet
  • 18:46 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:45 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:41 swfrench-wmf: mw-api-int to ~ 1% of traffic on PHP 8.1 in codfw - T383845
  • 18:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:28 swfrench-wmf: mw-api-int to ~ 1% of traffic on PHP 8.1 in eqiad - T383845
  • 18:27 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:13 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 10% of client sessions in PHP 8.1 (T383845) (duration: 11m 13s)
  • 18:07 swfrench@deploy2002: swfrench: Continuing with sync
  • 18:06 swfrench@deploy2002: swfrench: Backport for Enroll 10% of client sessions in PHP 8.1 (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:02 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 10% of client sessions in PHP 8.1 (T383845)
  • 18:01 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=newiki --logwiki=metawiki 'Tarasssst' 'TR101' # T385503
  • 17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T384592)', diff saved to https://phabricator.wikimedia.org/P73107 and previous config saved to /var/cache/conftool/dbconfig/20250203-175904-marostegui.json
  • 17:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T384592)', diff saved to https://phabricator.wikimedia.org/P73106 and previous config saved to /var/cache/conftool/dbconfig/20250203-175843-marostegui.json
  • 17:58 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=newiki --logwiki=metawiki 'JOestby' 'Johannesoestby' # T385503
  • 17:47 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 17:46 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 17:46 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 17:46 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P73105 and previous config saved to /var/cache/conftool/dbconfig/20250203-174336-marostegui.json
  • 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73104 and previous config saved to /var/cache/conftool/dbconfig/20250203-173748-root.json
  • 17:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P73103 and previous config saved to /var/cache/conftool/dbconfig/20250203-172829-marostegui.json
  • 17:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73102 and previous config saved to /var/cache/conftool/dbconfig/20250203-172243-root.json
  • 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T384592)', diff saved to https://phabricator.wikimedia.org/P73101 and previous config saved to /var/cache/conftool/dbconfig/20250203-171322-marostegui.json
  • 17:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73100 and previous config saved to /var/cache/conftool/dbconfig/20250203-170737-root.json
  • 16:58 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
  • 16:57 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
  • 16:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73099 and previous config saved to /var/cache/conftool/dbconfig/20250203-165232-root.json
  • 16:44 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1169.eqiad.wmnet onto db1251.eqiad.wmnet
  • 16:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73097 and previous config saved to /var/cache/conftool/dbconfig/20250203-163727-root.json
  • 16:37 fceratto@cumin1002: dbctl commit (dc=all): 'Add db1251.eqiad.wmnet T385141', diff saved to https://phabricator.wikimedia.org/P73096 and previous config saved to /var/cache/conftool/dbconfig/20250203-163722-fceratto.json
  • 15:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1251.eqiad.wmnet with reason: provisioning - T385141
  • 15:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: provisioning - T385141
  • 15:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db1169.eqiad.wmnet T385141', diff saved to https://phabricator.wikimedia.org/P73093 and previous config saved to /var/cache/conftool/dbconfig/20250203-153755-fceratto.json
  • 14:42 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:33 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable VisualEditor EditCheck on dewiki (T385205) (duration: 10m 43s)
  • 14:26 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Index rebuild
  • 14:26 lucaswerkmeister-wmde@deploy2002: kemayo, lucaswerkmeister-wmde: Continuing with sync
  • 14:26 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1188.eqiad.wmnet
  • 14:26 lucaswerkmeister-wmde@deploy2002: kemayo, lucaswerkmeister-wmde: Backport for Enable VisualEditor EditCheck on dewiki (T385205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:22 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable VisualEditor EditCheck on dewiki (T385205)
  • 14:20 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Change "$wgUploadMissingFileUrl" for svwiktionary (T383452) (duration: 14m 42s)
  • 14:19 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1188.eqiad.wmnet
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1188 T385084', diff saved to https://phabricator.wikimedia.org/P73091 and previous config saved to /var/cache/conftool/dbconfig/20250203-141939-marostegui.json
  • 14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
  • 14:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamrimmer: Backport for Change "$wgUploadMissingFileUrl" for svwiktionary (T383452) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Change "$wgUploadMissingFileUrl" for svwiktionary (T383452)
  • 13:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73088 and previous config saved to /var/cache/conftool/dbconfig/20250203-134742-root.json
  • 13:43 reedy@deploy2002: Finished scap sync-world: Backport for Add missing array_values for PHP 7 compatibility (T385255), SpecialMathWikibase: Null-coalescence getDescription() call (T385170), SpecialMathWikibase: Null-coalescence $par (T385269), ApiQueryContentTranslationSuggestions: Set default value for to and from parameters (T385267) (duration
  • 13:34 reedy@deploy2002: reedy: Continuing with sync
  • 13:34 reedy@deploy2002: reedy: Backport for Add missing array_values for PHP 7 compatibility (T385255), SpecialMathWikibase: Null-coalescence getDescription() call (T385170), SpecialMathWikibase: Null-coalescence $par (T385269), ApiQueryContentTranslationSuggestions: Set default value for to and from parameters (T385267) synced to the testservers (h
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73087 and previous config saved to /var/cache/conftool/dbconfig/20250203-133237-root.json
  • 13:28 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Index rebuild
  • 13:27 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2209.codfw.wmnet
  • 13:27 reedy@deploy2002: Started scap sync-world: Backport for Add missing array_values for PHP 7 compatibility (T385255), SpecialMathWikibase: Null-coalescence getDescription() call (T385170), SpecialMathWikibase: Null-coalescence $par (T385269), ApiQueryContentTranslationSuggestions: Set default value for to and from parameters (T385267)
  • 13:23 marostegui@dns1006: END - running authdns-update
  • 13:23 root@cumin1002: START - Cookbook sre.mysql.upgrade for db2209.codfw.wmnet
  • 13:22 marostegui@dns1006: START - running authdns-update
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73085 and previous config saved to /var/cache/conftool/dbconfig/20250203-131732-root.json
  • 13:17 cgoubert@deploy2002: Unlocked for deployment [MediaWiki]: Emergency s3 switchover T385457 (duration: 07m 36s)
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2209 T385457', diff saved to https://phabricator.wikimedia.org/P73084 and previous config saved to /var/cache/conftool/dbconfig/20250203-131631-marostegui.json
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2205 to s3 primary and set section read-write T385457', diff saved to https://phabricator.wikimedia.org/P73083 and previous config saved to /var/cache/conftool/dbconfig/20250203-131542-root.json
  • 13:15 jebe@deploy2002: Finished deploy [airflow-dags/analytics_product@ce1f0f6]: (no justification provided) (duration: 00m 36s)
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T385457', diff saved to https://phabricator.wikimedia.org/P73082 and previous config saved to /var/cache/conftool/dbconfig/20250203-131452-root.json
  • 13:14 jebe@deploy2002: Started deploy [airflow-dags/analytics_product@ce1f0f6]: (no justification provided)
  • 13:09 cgoubert@deploy2002: Locking from deployment [MediaWiki]: Emergency s3 switchover T385457
  • 13:07 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 13:07 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 13:07 cgoubert@deploy2002: Stopping before sync operations
  • 13:06 marostegui: Emergency s3 switchover T385457
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2205 with weight 0 T385457', diff saved to https://phabricator.wikimedia.org/P73081 and previous config saved to /var/cache/conftool/dbconfig/20250203-130248-root.json
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73080 and previous config saved to /var/cache/conftool/dbconfig/20250203-130226-root.json
  • 13:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3
  • 12:55 cgoubert@deploy2002: Started scap sync-world: Rebuild image and release file for mw-cron
  • 12:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 12:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 12:53 cgoubert@deploy2002: Finished scap sync-world: Testing scap deployment of mw-cron (duration: 02m 46s)
  • 12:51 cgoubert@deploy2002: Started scap sync-world: Testing scap deployment of mw-cron
  • 12:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73079 and previous config saved to /var/cache/conftool/dbconfig/20250203-124721-root.json
  • 12:25 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 12:25 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 12:19 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 12:19 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73078 and previous config saved to /var/cache/conftool/dbconfig/20250203-121113-root.json
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73077 and previous config saved to /var/cache/conftool/dbconfig/20250203-115608-root.json
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73076 and previous config saved to /var/cache/conftool/dbconfig/20250203-114103-root.json
  • 11:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb1014.eqiad.wmnet with reason: Kernel reboot
  • 11:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb1018.eqiad.wmnet with reason: Kernel reboot
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73075 and previous config saved to /var/cache/conftool/dbconfig/20250203-112558-root.json
  • 11:24 marostegui: Reboot and upgrade db1155
  • 11:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1155.eqiad.wmnet with reason: Kernel reboot
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73074 and previous config saved to /var/cache/conftool/dbconfig/20250203-111052-root.json
  • 11:10 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2027.codfw.wmnet
  • 11:04 marostegui@dns1006: END - running authdns-update
  • 11:02 marostegui@dns1006: START - running authdns-update
  • 11:00 root@cumin1002: START - Cookbook sre.mysql.upgrade for es2027.codfw.wmnet
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2027 for kernel reboot', diff saved to https://phabricator.wikimedia.org/P73073 and previous config saved to /var/cache/conftool/dbconfig/20250203-105935-marostegui.json
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2034 to es3 codfw master dbtmaint T376905', diff saved to https://phabricator.wikimedia.org/P73072 and previous config saved to /var/cache/conftool/dbconfig/20250203-105915-root.json
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73071 and previous config saved to /var/cache/conftool/dbconfig/20250203-103649-root.json
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73070 and previous config saved to /var/cache/conftool/dbconfig/20250203-103634-root.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73069 and previous config saved to /var/cache/conftool/dbconfig/20250203-102144-root.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73068 and previous config saved to /var/cache/conftool/dbconfig/20250203-102129-root.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73067 and previous config saved to /var/cache/conftool/dbconfig/20250203-100638-root.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73066 and previous config saved to /var/cache/conftool/dbconfig/20250203-100623-root.json
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T384592)', diff saved to https://phabricator.wikimedia.org/P73065 and previous config saved to /var/cache/conftool/dbconfig/20250203-100300-marostegui.json
  • 10:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T384592)', diff saved to https://phabricator.wikimedia.org/P73064 and previous config saved to /var/cache/conftool/dbconfig/20250203-100221-marostegui.json
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73063 and previous config saved to /var/cache/conftool/dbconfig/20250203-095133-root.json
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73062 and previous config saved to /var/cache/conftool/dbconfig/20250203-095118-root.json
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P73061 and previous config saved to /var/cache/conftool/dbconfig/20250203-094714-marostegui.json
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73060 and previous config saved to /var/cache/conftool/dbconfig/20250203-093628-root.json
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73059 and previous config saved to /var/cache/conftool/dbconfig/20250203-093613-root.json
  • 09:36 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2026.codfw.wmnet
  • 09:35 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2037.codfw.wmnet
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P73058 and previous config saved to /var/cache/conftool/dbconfig/20250203-093207-marostegui.json
  • 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2031 to es2 codfw master dbtmaint T376905', diff saved to https://phabricator.wikimedia.org/P73053 and previous config saved to /var/cache/conftool/dbconfig/20250203-091450-root.json
  • 09:13 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Index rebuild
  • 09:12 root@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1182.eqiad.wmnet
  • 09:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Index rebuild + upgrade
  • 09:06 root@cumin1002: START - Cookbook sre.mysql.upgrade for db1182.eqiad.wmnet
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1182 T385084', diff saved to https://phabricator.wikimedia.org/P73052 and previous config saved to /var/cache/conftool/dbconfig/20250203-090558-marostegui.json
  • 08:37 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:37 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 08:37 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:36 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 08:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:34 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 08:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics: apply
  • 08:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply
  • 08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T384592)', diff saved to https://phabricator.wikimedia.org/P73051 and previous config saved to /var/cache/conftool/dbconfig/20250203-025443-marostegui.json
  • 02:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T384592)', diff saved to https://phabricator.wikimedia.org/P73050 and previous config saved to /var/cache/conftool/dbconfig/20250203-025421-marostegui.json
  • 02:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P73049 and previous config saved to /var/cache/conftool/dbconfig/20250203-023914-marostegui.json
  • 02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P73048 and previous config saved to /var/cache/conftool/dbconfig/20250203-022407-marostegui.json
  • 02:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T384592)', diff saved to https://phabricator.wikimedia.org/P73047 and previous config saved to /var/cache/conftool/dbconfig/20250203-020900-marostegui.json

2025-02-02

  • 20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T384592)', diff saved to https://phabricator.wikimedia.org/P73046 and previous config saved to /var/cache/conftool/dbconfig/20250202-200724-marostegui.json
  • 20:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T384592)', diff saved to https://phabricator.wikimedia.org/P73045 and previous config saved to /var/cache/conftool/dbconfig/20250202-085551-marostegui.json
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P73044 and previous config saved to /var/cache/conftool/dbconfig/20250202-084044-marostegui.json
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P73043 and previous config saved to /var/cache/conftool/dbconfig/20250202-082537-marostegui.json
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T384592)', diff saved to https://phabricator.wikimedia.org/P73042 and previous config saved to /var/cache/conftool/dbconfig/20250202-081030-marostegui.json
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T384592)', diff saved to https://phabricator.wikimedia.org/P73041 and previous config saved to /var/cache/conftool/dbconfig/20250202-071137-marostegui.json
  • 07:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T384592)', diff saved to https://phabricator.wikimedia.org/P73040 and previous config saved to /var/cache/conftool/dbconfig/20250202-071115-marostegui.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P73039 and previous config saved to /var/cache/conftool/dbconfig/20250202-065608-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P73038 and previous config saved to /var/cache/conftool/dbconfig/20250202-064101-marostegui.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T384592)', diff saved to https://phabricator.wikimedia.org/P73037 and previous config saved to /var/cache/conftool/dbconfig/20250202-062554-marostegui.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T384592)', diff saved to https://phabricator.wikimedia.org/P73036 and previous config saved to /var/cache/conftool/dbconfig/20250202-052741-marostegui.json
  • 05:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 04:37 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T384592)', diff saved to https://phabricator.wikimedia.org/P73035 and previous config saved to /var/cache/conftool/dbconfig/20250202-043646-marostegui.json
  • 04:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P73034 and previous config saved to /var/cache/conftool/dbconfig/20250202-042139-marostegui.json
  • 04:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P73033 and previous config saved to /var/cache/conftool/dbconfig/20250202-040632-marostegui.json
  • 03:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T384592)', diff saved to https://phabricator.wikimedia.org/P73032 and previous config saved to /var/cache/conftool/dbconfig/20250202-035125-marostegui.json
  • 02:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T384592)', diff saved to https://phabricator.wikimedia.org/P73031 and previous config saved to /var/cache/conftool/dbconfig/20250202-025237-marostegui.json
  • 02:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 02:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T384592)', diff saved to https://phabricator.wikimedia.org/P73030 and previous config saved to /var/cache/conftool/dbconfig/20250202-025215-marostegui.json
  • 02:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P73029 and previous config saved to /var/cache/conftool/dbconfig/20250202-023708-marostegui.json
  • 02:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P73028 and previous config saved to /var/cache/conftool/dbconfig/20250202-022201-marostegui.json
  • 02:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T384592)', diff saved to https://phabricator.wikimedia.org/P73027 and previous config saved to /var/cache/conftool/dbconfig/20250202-020654-marostegui.json
  • 00:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T384592)', diff saved to https://phabricator.wikimedia.org/P73026 and previous config saved to /var/cache/conftool/dbconfig/20250202-005259-marostegui.json
  • 00:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T384592)', diff saved to https://phabricator.wikimedia.org/P73025 and previous config saved to /var/cache/conftool/dbconfig/20250202-005236-marostegui.json
  • 00:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P73024 and previous config saved to /var/cache/conftool/dbconfig/20250202-003730-marostegui.json
  • 00:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P73023 and previous config saved to /var/cache/conftool/dbconfig/20250202-002223-marostegui.json
  • 00:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T384592)', diff saved to https://phabricator.wikimedia.org/P73022 and previous config saved to /var/cache/conftool/dbconfig/20250202-000716-marostegui.json

2025-02-01

  • 22:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T384592)', diff saved to https://phabricator.wikimedia.org/P73021 and previous config saved to /var/cache/conftool/dbconfig/20250201-225519-marostegui.json
  • 22:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T384592)', diff saved to https://phabricator.wikimedia.org/P73020 and previous config saved to /var/cache/conftool/dbconfig/20250201-225456-marostegui.json
  • 22:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P73019 and previous config saved to /var/cache/conftool/dbconfig/20250201-223949-marostegui.json
  • 22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P73018 and previous config saved to /var/cache/conftool/dbconfig/20250201-222442-marostegui.json
  • 20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T384592)', diff saved to https://phabricator.wikimedia.org/P73016 and previous config saved to /var/cache/conftool/dbconfig/20250201-205602-marostegui.json
  • 20:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 20:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T384592)', diff saved to https://phabricator.wikimedia.org/P73015 and previous config saved to /var/cache/conftool/dbconfig/20250201-205525-marostegui.json
  • 20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P73014 and previous config saved to /var/cache/conftool/dbconfig/20250201-204018-marostegui.json
  • 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P73013 and previous config saved to /var/cache/conftool/dbconfig/20250201-202511-marostegui.json
  • 20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T384592)', diff saved to https://phabricator.wikimedia.org/P73012 and previous config saved to /var/cache/conftool/dbconfig/20250201-201004-marostegui.json
  • 19:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T384592)', diff saved to https://phabricator.wikimedia.org/P73011 and previous config saved to /var/cache/conftool/dbconfig/20250201-190526-marostegui.json
  • 19:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T384592)', diff saved to https://phabricator.wikimedia.org/P73010 and previous config saved to /var/cache/conftool/dbconfig/20250201-190504-marostegui.json
  • 18:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P73009 and previous config saved to /var/cache/conftool/dbconfig/20250201-184957-marostegui.json
  • 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P73008 and previous config saved to /var/cache/conftool/dbconfig/20250201-183450-marostegui.json
  • 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T384592)', diff saved to https://phabricator.wikimedia.org/P73007 and previous config saved to /var/cache/conftool/dbconfig/20250201-181943-marostegui.json
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T384592)', diff saved to https://phabricator.wikimedia.org/P73006 and previous config saved to /var/cache/conftool/dbconfig/20250201-170624-marostegui.json
  • 17:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T384592)', diff saved to https://phabricator.wikimedia.org/P73005 and previous config saved to /var/cache/conftool/dbconfig/20250201-170602-marostegui.json
  • 16:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P73004 and previous config saved to /var/cache/conftool/dbconfig/20250201-165055-marostegui.json
  • 16:41 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cr2-magru with reason: IBGP instability from cr1 to cr2 in magru causing ping failures from alert1002
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P73003 and previous config saved to /var/cache/conftool/dbconfig/20250201-163548-marostegui.json
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T384592)', diff saved to https://phabricator.wikimedia.org/P73002 and previous config saved to /var/cache/conftool/dbconfig/20250201-162041-marostegui.json
  • 15:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T384592)', diff saved to https://phabricator.wikimedia.org/P73001 and previous config saved to /var/cache/conftool/dbconfig/20250201-151709-marostegui.json
  • 15:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T384592)', diff saved to https://phabricator.wikimedia.org/P73000 and previous config saved to /var/cache/conftool/dbconfig/20250201-151646-marostegui.json
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P72999 and previous config saved to /var/cache/conftool/dbconfig/20250201-150139-marostegui.json
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P72998 and previous config saved to /var/cache/conftool/dbconfig/20250201-144632-marostegui.json
  • 14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T384592)', diff saved to https://phabricator.wikimedia.org/P72997 and previous config saved to /var/cache/conftool/dbconfig/20250201-143125-marostegui.json
  • 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T384592)', diff saved to https://phabricator.wikimedia.org/P72996 and previous config saved to /var/cache/conftool/dbconfig/20250201-131925-marostegui.json
  • 13:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 09:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T384592)', diff saved to https://phabricator.wikimedia.org/P72995 and previous config saved to /var/cache/conftool/dbconfig/20250201-092349-marostegui.json
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P72994 and previous config saved to /var/cache/conftool/dbconfig/20250201-090842-marostegui.json
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P72993 and previous config saved to /var/cache/conftool/dbconfig/20250201-085335-marostegui.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T384592)', diff saved to https://phabricator.wikimedia.org/P72992 and previous config saved to /var/cache/conftool/dbconfig/20250201-083827-marostegui.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T384592)', diff saved to https://phabricator.wikimedia.org/P72991 and previous config saved to /var/cache/conftool/dbconfig/20250201-073139-marostegui.json
  • 07:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T384592)', diff saved to https://phabricator.wikimedia.org/P72990 and previous config saved to /var/cache/conftool/dbconfig/20250201-073116-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P72989 and previous config saved to /var/cache/conftool/dbconfig/20250201-071609-marostegui.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P72988 and previous config saved to /var/cache/conftool/dbconfig/20250201-070103-marostegui.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T384592)', diff saved to https://phabricator.wikimedia.org/P72987 and previous config saved to /var/cache/conftool/dbconfig/20250201-064555-marostegui.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T384592)', diff saved to https://phabricator.wikimedia.org/P72986 and previous config saved to /var/cache/conftool/dbconfig/20250201-053027-marostegui.json
  • 05:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T384592)', diff saved to https://phabricator.wikimedia.org/P72985 and previous config saved to /var/cache/conftool/dbconfig/20250201-053005-marostegui.json
  • 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P72984 and previous config saved to /var/cache/conftool/dbconfig/20250201-051458-marostegui.json
  • 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P72983 and previous config saved to /var/cache/conftool/dbconfig/20250201-045951-marostegui.json
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T384592)', diff saved to https://phabricator.wikimedia.org/P72982 and previous config saved to /var/cache/conftool/dbconfig/20250201-044444-marostegui.json
  • 03:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T384592)', diff saved to https://phabricator.wikimedia.org/P72981 and previous config saved to /var/cache/conftool/dbconfig/20250201-033412-marostegui.json
  • 03:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 03:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T384592)', diff saved to https://phabricator.wikimedia.org/P72980 and previous config saved to /var/cache/conftool/dbconfig/20250201-033350-marostegui.json
  • 03:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P72979 and previous config saved to /var/cache/conftool/dbconfig/20250201-031843-marostegui.json
  • 03:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P72978 and previous config saved to /var/cache/conftool/dbconfig/20250201-030337-marostegui.json
  • 02:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T384592)', diff saved to https://phabricator.wikimedia.org/P72977 and previous config saved to /var/cache/conftool/dbconfig/20250201-024829-marostegui.json
  • 01:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T384592)', diff saved to https://phabricator.wikimedia.org/P72976 and previous config saved to /var/cache/conftool/dbconfig/20250201-013748-marostegui.json
  • 01:37 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 01:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T384592)', diff saved to https://phabricator.wikimedia.org/P72975 and previous config saved to /var/cache/conftool/dbconfig/20250201-013726-marostegui.json
  • 01:25 brett: import ncmonitor 1.3.1 into bookworm-wikimedia
  • 01:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P72974 and previous config saved to /var/cache/conftool/dbconfig/20250201-012219-marostegui.json
  • 01:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P72973 and previous config saved to /var/cache/conftool/dbconfig/20250201-010712-marostegui.json
  • 00:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T384592)', diff saved to https://phabricator.wikimedia.org/P72971 and previous config saved to /var/cache/conftool/dbconfig/20250201-005205-marostegui.json

Archives

See Server Admin Log/Archives.