Server Admin Log

From Wikitech

2025-04-04

  • 21:18 inflatador: bking@apt1002 publish-wmf-opensearch-search-plugins_1.3.20-4 to component/opensearch13 bullseye-wikimedia 1134285
  • 20:22 urandom: starting `nodetool garbage collect -j 2`, sessionstore Cassandra
  • 19:03 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 18:57 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 18:56 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 18:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
  • 18:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2046.codfw.wmnet with OS bookworm
  • 18:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:04 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:03 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:01 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:45 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:35 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:48 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:11 tchin@deploy1003: Finished deploy [airflow-dags/analytics@bece0a7]: (no justification provided) (duration: 00m 34s)
  • 15:11 tchin@deploy1003: Started deploy [airflow-dags/analytics@bece0a7]: (no justification provided)
  • 15:05 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:04 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:03 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:03 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:03 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:03 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:00 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:59 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:55 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef] (thin): THIN [analytics/refinery@c4ab9efd] (duration: 00m 59s)
  • 14:54 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef] (thin): THIN [analytics/refinery@c4ab9efd]
  • 14:53 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef]: [analytics/refinery@c4ab9efd] (duration: 02m 54s)
  • 14:50 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef]: [analytics/refinery@c4ab9efd]
  • 14:49 tchin@deploy1003: Finished deploy [analytics/refinery@c4ab9ef] (hadoop-test): TEST [analytics/refinery@c4ab9efd] (duration: 03m 01s)
  • 14:46 tchin@deploy1003: Started deploy [analytics/refinery@c4ab9ef] (hadoop-test): TEST [analytics/refinery@c4ab9efd]
  • 14:45 tchin: Deploying refinery for T389162
  • 14:43 claime: Extending root vg on mwmaint1002 by 20GB
  • 13:11 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:01 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949) (duration: 15m 04s)
  • 10:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 10:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:46 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Add Item and CustomItem classes as properties to `$.ui.ooMenu` (T390949)
  • 10:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1070.eqiad.wmnet
  • 10:38 mvernon@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1070.eqiad.wmnet
  • 10:02 Emperor: bulk-VACUUM of container dbs ms-be1070 T377827
  • 10:02 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1070.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 09:57 moritzm: installing vim security updates
  • 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:44 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:29 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 06:45 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d6ad899]: Update artifacts for analytics_test (duration: 00m 15s)
  • 06:45 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d6ad899]: Update artifacts for analytics_test
  • 06:44 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d6ad899]: Update artifacts for analytics (duration: 00m 35s)
  • 06:44 aqu@deploy1003: Started deploy [airflow-dags/analytics@d6ad899]: Update artifacts for analytics
  • 05:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db2186.codfw.wmnet with reason: Maintenance in sanitarium
  • 05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db1154.eqiad.wmnet with reason: Maintenance in sanitarium
  • 05:02 TimStarling: on mwmaint1002 ran cleanupBlocks.php on all wikis
  • 00:51 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:41 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:23 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:11 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 00:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2025-04-03

  • 23:45 tstarling@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121) (duration: 15m 25s)
  • 23:38 tstarling@deploy1003: hmonroy, tstarling: Continuing with sync
  • 23:35 tstarling@deploy1003: hmonroy, tstarling: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:30 tstarling@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in German and Italian wiki (T377121)
  • 21:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2056.codfw.wmnet with OS bullseye
  • 21:37 James_F: Backport deploy done.
  • 21:36 jforrester@deploy1003: Finished scap sync-world: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias" (duration: 15m 28s)
  • 21:29 jforrester@deploy1003: jforrester: Continuing with sync
  • 21:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 21:29 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 21:28 jforrester@deploy1003: jforrester: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:21 jforrester@deploy1003: Started scap sync-world: Backport for Revert "VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias"
  • 21:19 jforrester@deploy1003: Sync cancelled.
  • 21:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2056.codfw.wmnet with reason: host reimage
  • 21:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2056.codfw.wmnet with reason: host reimage
  • 21:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2056
  • 21:06 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2056
  • 21:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2056.codfw.wmnet with OS bullseye
  • 21:06 jforrester@deploy1003: esanders, jforrester: Backport for Mobile insert menu: Exclude media and signature tools (T385851), VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias (T388604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:00 jforrester@deploy1003: Started scap sync-world: Backport for Mobile insert menu: Exclude media and signature tools (T385851), VE: Enable mobile insert menu everywhere except top 20 mobile VE wikipedias (T388604)
  • 20:27 jforrester@deploy1003: esanders, jforrester: Backport for wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase, Hide "Insert graph" tool in VE when graphs are disabled (T387501), Enable DiscussionTools visual enhancements on zhwiki (T379264), Revert "End EmailAuth enforcement group 2 test" synced to the testservers (https://wi
  • 20:23 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.10-1wm1_amd64.changes: T379797
  • 20:18 jforrester@deploy1003: Started scap sync-world: Backport for wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase, Hide "Insert graph" tool in VE when graphs are disabled (T387501), Enable DiscussionTools visual enhancements on zhwiki (T379264), Revert "End EmailAuth enforcement group 2 test"
  • 20:13 jforrester@deploy1003: sync-world aborted: Backport for End EmailAuth enforcement group 2 test (T390662), wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase (duration: 00m 33s)
  • 20:12 jforrester@deploy1003: Started scap sync-world: Backport for End EmailAuth enforcement group 2 test (T390662), wikifunctionswiki: Disable 'mathml' mode for Maths, requires RESTbase
  • 19:34 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:34 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:33 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:33 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:32 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:32 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2056
  • 19:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2056
  • 19:19 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2056
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2056.codfw.wmnet 181.0.192.10.in-addr.arpa 1.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:19 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2056.codfw.wmnet 181.0.192.10.in-addr.arpa 1.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2056 - bking@cumin2002"
  • 19:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2056 - bking@cumin2002"
  • 19:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 19:15 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:14 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 19:14 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 19:14 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 19:13 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 19:13 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 19:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:13 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:13 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2056
  • 19:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2056.codfw.wmnet with OS bullseye
  • 19:13 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:13 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:12 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2055*,cirrussearch2056* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 19:06 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2055*,cirrussearch2056* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 19:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 19:02 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch* for ban cirrus nodes to prevent replication problems - bking@cumin2002 - T388610
  • 18:21 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:20 dancy@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.23 refs T386218
  • 18:12 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:08 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 18:04 dancy@deploy1003: Installation of scap version "4.149.0" completed for 2 hosts
  • 18:03 dancy@deploy1003: Installing scap version "4.149.0" for 2 host(s)
  • 17:57 reedy@deploy1003: Finished scap sync-world: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config (duration: 17m 43s)
  • 17:48 reedy@deploy1003: reedy: Continuing with sync
  • 17:47 reedy@deploy1003: reedy: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:39 reedy@deploy1003: Started scap sync-world: Backport for Banner: More reading from primary... (T390956), CommonSettings-labs: Update BounceHandler config
  • 17:38 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up new PHP 8.1 production images (duration: 28m 57s)
  • 17:32 dzahn@dns1004: END - running authdns-update
  • 17:30 dzahn@dns1004: START - running authdns-update
  • 17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:10 swfrench@deploy1003: Started scap sync-world: Deployment to pick up new PHP 8.1 production images
  • 17:02 sukhe@dns1004: END - running authdns-update
  • 17:02 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:02 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:00 sukhe@dns1004: START - running authdns-update
  • 16:58 reedy@deploy1003: Finished scap sync-world: Backport for Banner: While saving, do exists() against primary (T390956) (duration: 21m 33s)
  • 16:54 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:54 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:51 reedy@deploy1003: reedy: Continuing with sync
  • 16:44 reedy@deploy1003: reedy: Backport for Banner: While saving, do exists() against primary (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:37 reedy@deploy1003: Started scap sync-world: Backport for Banner: While saving, do exists() against primary (T390956)
  • 16:37 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:37 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:36 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:29 reedy@deploy1003: Finished scap sync-world: Backport for Banner: Conditionally check for banner existence from primary db (T390956) (duration: 15m 13s)
  • 16:22 hnowlan: decommissioning all but 1 eqiad jobrunner node in confctl
  • 16:22 reedy@deploy1003: reedy: Continuing with sync
  • 16:21 reedy@deploy1003: reedy: Backport for Banner: Conditionally check for banner existence from primary db (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 16:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 16:14 reedy@deploy1003: Started scap sync-world: Backport for Banner: Conditionally check for banner existence from primary db (T390956)
  • 16:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1166-1168].eqiad.wmnet
  • 16:06 hnowlan@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1166-1168].eqiad.wmnet
  • 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662) (duration: 14m 15s)
  • 15:58 hnowlan: running homer 'cr*eqiad*' commit for new wikikube workers
  • 15:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1168.eqiad.wmnet with OS bookworm
  • 15:53 ladsgroup@deploy1003: tgr, ladsgroup: Continuing with sync
  • 15:52 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on elastic2056.codfw.wmnet with reason: adding net-new role
  • 15:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1167.eqiad.wmnet with OS bookworm
  • 15:52 ladsgroup@deploy1003: tgr, ladsgroup: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (#2) (T390662)
  • 15:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1166.eqiad.wmnet with OS bookworm
  • 15:40 reedy@deploy1003: Finished scap sync-world: Backport for Remove catching of db exception (T390956) (duration: 17m 28s)
  • 15:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1168.eqiad.wmnet with reason: host reimage
  • 15:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1167.eqiad.wmnet with reason: host reimage
  • 15:33 reedy@deploy1003: reedy: Continuing with sync
  • 15:32 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1168.eqiad.wmnet with reason: host reimage
  • 15:31 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1167.eqiad.wmnet with reason: host reimage
  • 15:30 reedy@deploy1003: reedy: Backport for Remove catching of db exception (T390956) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:24 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1166.eqiad.wmnet with reason: host reimage
  • 15:22 reedy@deploy1003: Started scap sync-world: Backport for Remove catching of db exception (T390956)
  • 15:21 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1166.eqiad.wmnet with reason: host reimage
  • 15:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1168
  • 15:17 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1168
  • 15:17 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1168.eqiad.wmnet with OS bookworm
  • 15:16 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1167
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1167
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1167.eqiad.wmnet with OS bookworm
  • 15:16 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1166.eqiad.wmnet wikikube-worker1167.eqiad.wmnet wikikube-worker1168.eqiad.wmnet on all recursors
  • 15:16 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1166.eqiad.wmnet wikikube-worker1167.eqiad.wmnet wikikube-worker1168.eqiad.wmnet on all recursors
  • 15:15 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1438 to wikikube-worker1168
  • 15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1168
  • 15:14 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:13 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1168
  • 15:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1438 to wikikube-worker1168 - hnowlan@cumin1002"
  • 15:13 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1438 to wikikube-worker1168 - hnowlan@cumin1002"
  • 15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1437 to wikikube-worker1167
  • 15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1167
  • 15:10 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:09 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:09 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1167
  • 15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1437 to wikikube-worker1167 - hnowlan@cumin1002"
  • 15:08 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:08 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1437 to wikikube-worker1167 - hnowlan@cumin1002"
  • 15:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1166
  • 15:06 hnowlan@cumin1002: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1166
  • 15:06 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1166.eqiad.wmnet with OS bookworm
  • 15:04 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1438 to wikikube-worker1168
  • 15:03 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:03 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1437 to wikikube-worker1167
  • 14:49 tgr@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662) (duration: 16m 18s)
  • 14:42 tgr@deploy1003: tgr: Continuing with sync
  • 14:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2056* for ban node before reimaging - bking@cumin2002 - T388610
  • 14:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2056* for ban node before reimaging - bking@cumin2002 - T388610
  • 14:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2056 for ban node before reimaging - bking@cumin2002 - T388610
  • 14:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2056 for ban node before reimaging - bking@cumin2002 - T388610
  • 14:39 tgr@deploy1003: tgr: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:33 tgr@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 2 for short test (T390662)
  • 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 14:22 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test one - bking@cumin2002 - T388610
  • 14:18 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 14:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 14:12 taavi@deploy1003: Finished scap sync-world: re-syncing 1133581 (duration: 08m 58s)
  • 14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1420 to wikikube-worker1166
  • 14:05 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1166
  • 14:03 taavi@deploy1003: Started scap sync-world: re-syncing 1133581
  • 14:03 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:03 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:02 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1166
  • 14:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1420 to wikikube-worker1166 - hnowlan@cumin1002"
  • 14:02 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1420 to wikikube-worker1166 - hnowlan@cumin1002"
  • 13:57 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
  • 13:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 13:55 taavi@deploy1003: scap failed: <CalledProcessError> Command '['helmfile', '-e', 'eqiad', '--selector', 'name=main', 'write-values', '--output-file-template', '/tmp/tmp1ws3xaaw']' returned non-zero exit status 1. (scap version: 4.148.0) (duration: 16m 20s)
  • 13:54 taavi@deploy1003: cscott, taavi: Continuing with sync
  • 13:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 13:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1420 to wikikube-worker1166
  • 13:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 13:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
  • 13:46 taavi@deploy1003: cscott, taavi: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:45 moritzm: imported imposm3 0.14.1-1 to apt.wikimedia.org for bookworm-wikimedia T389780 T381565
  • 13:39 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
  • 13:38 taavi: install1004: kill a dead `/usr/bin/apt-mark showmanual` process holding puppet runs
  • 13:34 taavi@deploy1003: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.22,1.44.0-wmf.23 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/
  • 13:32 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
  • 13:30 taavi@deploy1003: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.44.0-wmf.22,1.44.0-wmf.23 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/
  • 13:28 taavi@deploy1003: Started scap sync-world: Backport for Parsoid Fragment Support v3: make mStripExtTags a persistent Parser property (T390420)
  • 13:28 akosiaris@dns1004: END - running authdns-update
  • 13:27 taavi@deploy1003: Finished scap sync-world: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002) (duration: 19m 40s)
  • 13:25 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 13:25 akosiaris@dns1004: START - running authdns-update
  • 13:25 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 13:20 taavi@deploy1003: ihurbain, taavi: Continuing with sync
  • 13:17 taavi@deploy1003: ihurbain, taavi: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 13:07 taavi@deploy1003: Started scap sync-world: Backport for Enable Parsoid Read Views on 13 wiktionaries (T390680), Enable Parsoid Read Views to incubator and dagwiki mobile frontend (T380768 T381002)
  • 13:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:05 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 13:04 jmm@cumin2002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe
  • 13:02 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 12:56 moritzm: prune now obsolete nginx packages from testreduce1002 T329529
  • 12:55 godog: move k8s instances from prometheus1006 to prometheus1008 - T383232
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 12:54 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:53 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 12:53 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:48 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:47 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:42 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 12:28 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 12:25 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:24 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
  • 12:21 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
  • 12:16 moritzm: installing libxslt security updates
  • 11:58 moritzm: installing Intel microcode security updates
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:46 moritzm: installing Django security updates on Bullseye
  • 11:37 moritzm: installing Python 3.9 security updates
  • 11:33 topranks: reboot cr2-eqord to complete JunOS upgrade T364092
  • 11:31 topranks: disable EBGP sessions to internet peers on cr2-eqord to prep for JunOS upgrade T364092
  • 11:30 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-eqiad,cr2-eqord,cr2-eqord IPv6,cr3-ulsfo with reason: Upgrade cr2-eqord JunOS
  • 11:07 moritzm: installing nodejs security updates
  • 11:06 topranks: pre-pend as paths announced to codfw/eqiad from eqord to prep for JunOS upgrade T364092
  • 11:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 65% (T360589) (duration: 16m 34s)
  • 10:55 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe2003.codfw.wmnet with OS bookworm
  • 10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 10:53 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 65% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:51 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 10:50 topranks: drain transport circuits to eqord (Chicago network pop) to prep for Junos upgrade cr2-eqord T364092
  • 10:48 moritzm: remove nodejs from aqs* hosts, no longer used/needed and spares us needless security rollouts T350143
  • 10:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 65% (T360589)
  • 10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage
  • 10:27 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe2003.codfw.wmnet with reason: host reimage
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:22 akosiaris@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:21 akosiaris@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:21 akosiaris@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:20 akosiaris@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:20 akosiaris@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:18 akosiaris@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 akosiaris@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:17 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:16 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:14 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:14 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:10 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
  • 10:02 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
  • 09:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:59 fabfur: disable puppet on A:cp-eqsin
  • 09:59 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133850 to use TLS on tmpfs on A:cp-eqsin (T384227)
  • 09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:54 akosiaris: deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1133745 in all k8s ingresses to stop ingressgateway from forcefully setting the HTTP server header in the responses to "istio-envoy"
  • 09:52 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:52 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:52 godog: lvextend --resizefs --size +1TB vg0/srv on mwlog[12]002
  • 09:52 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:51 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 09:03 fabfur: secure deleting certificates in /etc/ssl/private from A:cp-ulsfo (T384227)
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 08:53 fabfur: secure deleting certificates in /etc/ssl/private from A:cp-magru (T384227)
  • 08:48 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 01m 03s)
  • 08:47 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
  • 08:46 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 00m 54s)
  • 08:45 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
  • 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 08:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 08:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:22 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 08:21 hashar: Upgrading CI Jenkins
  • 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 08:20 slyngshede@dns1004: END - running authdns-update
  • 08:18 slyngshede@dns1004: START - running authdns-update
  • 08:12 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:12 slyngshede@dns1004: START - running authdns-update
  • 08:06 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 07:54 moritzm: failover ganeti masters in esams to ganeti3007/3008
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 07:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
  • 07:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 07:38 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 07:38 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 07:36 moritzm: added spiderpig-access LDAP group T390338
  • 07:31 fabfur: applying patch to use TLS on tmpfs on A:cp-ulsfo (T384227)
  • 07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
  • 07:27 fabfur: disabling puppet on A:cp-ulsfo to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133405 (T384227)
  • 07:22 elukey: restart docker on deploy1003 to pick up max-concurrent-uploads=1 - T390251
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 07:07 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:07 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 06:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 00:39 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2006
  • 00:16 tstarling@deploy1003: Finished scap sync-world: Backport for Temporarily disable Lua profiler (T389734) (duration: 15m 04s)
  • 00:15 zabe: zabe@mwmaint1002:~$ cat group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php {} --deletedump /home/zabe/afl_text_table_deletedump/{} --dump /home/zabe/afl_text_table_dump/{} --sleep 0.4" # T381599
  • 00:09 tstarling@deploy1003: tstarling: Continuing with sync
  • 00:08 tstarling@deploy1003: tstarling: Backport for Temporarily disable Lua profiler (T389734) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:01 tstarling@deploy1003: Started scap sync-world: Backport for Temporarily disable Lua profiler (T389734)

2025-04-02

  • 23:32 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1006
  • 23:28 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2005
  • 22:38 jhathaway: puppet private repo changes completed, T385995
  • 22:01 brett: Import ncmonitor 1.3.3 into bookworm-wikimedia
  • 22:00 dreamyjazz@deploy1003: Finished scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) (duration: 25m 19s)
  • 21:55 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
  • 21:41 dreamyjazz@deploy1003: dreamyjazz: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:37 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2004
  • 21:35 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1005
  • 21:35 dreamyjazz@deploy1003: Started scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904)
  • 21:31 reedy@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) (duration: 15m 42s)
  • 21:23 reedy@deploy1003: reedy, tgr: Continuing with sync
  • 21:21 reedy@deploy1003: reedy, tgr: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:15 reedy@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662)
  • 21:07 reedy@deploy1003: Finished scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 21:00 reedy@deploy1003: d3r1ck01, matmarex, reedy: Continuing with sync
  • 20:52 reedy@deploy1003: d3r1ck01, matmarex, reedy: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 20:47 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 20:14 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
  • 19:54 jhathaway: rolling out a change to private repo, 1127150, please let me know if any issues arise when merging patches
  • 18:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
  • 18:35 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.23 refs T386218
  • 18:35 cstone: SmashPig upgraded from b9310c06 to 642ae816
  • 18:00 reedy@deploy1003: reedy: Continuing with sync
  • 18:00 reedy@deploy1003: reedy: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[gerrit:1133471|EmailA
  • 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
  • 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:47 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
  • 17:41 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:40 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:34 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:34 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:27 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:27 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:25 urandom: starting `nodetool garbagecollect` on sessionstore1004
  • 17:17 urandom: updating Cassandra/sessionstore `gc_grace_seconds` to 259200 (from 864000)
  • 17:13 brett: reloading varnish-frontend on A:cp and not A:cp-text_drmrs and not A:cp-text_codfw
  • 17:08 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cirrussearch2055.codfw.wmnet with reason: adding net-new role
  • 16:52 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
  • 16:27 vgutierrez: reload varnish on text@codfw to discard stale VCLs - T390846
  • 16:26 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499 (duration: 03m 21s)
  • 16:25 swfrench@deploy1003: swfrench: Continuing with sync
  • 16:24 swfrench@deploy1003: swfrench: Deployment to pick up change in mediawiki-deployments.yaml - T389499 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:23 vgutierrez: reload varnish on text@drmrs to discard stale VCLs - T390846
  • 16:23 swfrench@deploy1003: Started scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499
  • 16:10 swfrench-wmf: run-puppet-agent on deploy1003 to pick up mediawiki-deployments.yaml changes - T389499
  • 15:28 arnaudb@dns1004: END - running authdns-update
  • 15:19 arnaudb@dns1004: START - running authdns-update
  • 15:16 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 15:15 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1003.wikimedia.org with reason: maintenance
  • 15:07 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
  • 14:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
  • 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
  • 14:35 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 14:18 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
  • 14:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
  • 14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
  • 14:05 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:01 volans: upgrading homer to version 0.8.0 to cumin hosts
  • 14:01 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:00 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
  • 13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
  • 13:52 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 13:49 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
  • 13:49 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:43 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:41 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
  • 13:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:37 akosiaris: depool cp3066 for debugging T390854
  • 13:37 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 13:35 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
  • 13:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 13:24 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile (duration: 16m 55s)
  • 13:19 moritzm: installing gnutls28 security updates on Bookworm
  • 13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:14 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Continuing with sync
  • 13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 13:11 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile
  • 12:58 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:58 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:58 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:57 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:57 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:57 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74582 and previous config saved to /var/cache/conftool/dbconfig/20250402-124139-root.json
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
  • 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74581 and previous config saved to /var/cache/conftool/dbconfig/20250402-123029-root.json
  • 12:28 jmm@dns1004: END - running authdns-update
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74580 and previous config saved to /var/cache/conftool/dbconfig/20250402-122634-root.json
  • 12:26 jmm@dns1004: START - running authdns-update
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
  • 12:18 akosiaris@dns1004: END - running authdns-update
  • 12:16 akosiaris@dns1004: START - running authdns-update
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74579 and previous config saved to /var/cache/conftool/dbconfig/20250402-121524-root.json
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74578 and previous config saved to /var/cache/conftool/dbconfig/20250402-121128-root.json
  • 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
  • 12:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74577 and previous config saved to /var/cache/conftool/dbconfig/20250402-120018-root.json
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74576 and previous config saved to /var/cache/conftool/dbconfig/20250402-115623-root.json
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74575 and previous config saved to /var/cache/conftool/dbconfig/20250402-114512-root.json
  • 11:44 fabfur: securely erase certificates from A:cp-magru and provide symlink for acmecerts (T384227)
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74574 and previous config saved to /var/cache/conftool/dbconfig/20250402-114117-root.json
  • 11:40 vgutierrez: restart varnish on cp6016 - T390846
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74573 and previous config saved to /var/cache/conftool/dbconfig/20250402-113007-root.json
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74572 and previous config saved to /var/cache/conftool/dbconfig/20250402-112611-root.json
  • 11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:16 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 11:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 11:16 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:15 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74571 and previous config saved to /var/cache/conftool/dbconfig/20250402-111501-root.json
  • 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74570 and previous config saved to /var/cache/conftool/dbconfig/20250402-111106-root.json
  • 11:10 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
  • 11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 11:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 60% (T360589) (duration: 15m 11s)
  • 11:04 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:03 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:02 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:01 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:00 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
  • 11:00 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 60% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74569 and previous config saved to /var/cache/conftool/dbconfig/20250402-105956-root.json
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74568 and previous config saved to /var/cache/conftool/dbconfig/20250402-105601-root.json
  • 10:53 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 60% (T360589)
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74567 and previous config saved to /var/cache/conftool/dbconfig/20250402-104450-root.json
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74566 and previous config saved to /var/cache/conftool/dbconfig/20250402-104055-root.json
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74564 and previous config saved to /var/cache/conftool/dbconfig/20250402-102944-root.json
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74563 and previous config saved to /var/cache/conftool/dbconfig/20250402-102549-root.json
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74561 and previous config saved to /var/cache/conftool/dbconfig/20250402-101439-root.json
  • 10:13 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:13 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74560 and previous config saved to /var/cache/conftool/dbconfig/20250402-101044-root.json
  • 10:10 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:09 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 10:09 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:09 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74559 and previous config saved to /var/cache/conftool/dbconfig/20250402-095933-root.json
  • 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74558 and previous config saved to /var/cache/conftool/dbconfig/20250402-095538-root.json
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2243 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74557 and previous config saved to /var/cache/conftool/dbconfig/20250402-095213-marostegui.json
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74556 and previous config saved to /var/cache/conftool/dbconfig/20250402-094428-root.json
  • 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1257 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74555 and previous config saved to /var/cache/conftool/dbconfig/20250402-094109-marostegui.json
  • 09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 09:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
  • 09:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
  • 09:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
  • 09:24 XioNoX: rebooting mr1-ulsfo - T390052
  • 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
  • 09:23 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mr1-ulsfo with reason: reboot
  • 09:21 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:21 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
  • 09:19 akosiaris@dns1004: END - running authdns-update
  • 09:18 akosiaris: create mw-wikifunctions-ingress.discovery.wmnet and .svc records to facilitate the migration to ingress
  • 09:17 moritzm: failover ganeti masters in drmrs to ganeti6001/6002
  • 09:16 akosiaris@dns1004: START - running authdns-update
  • 09:16 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 08:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 08:48 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 08:48 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 08:48 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:47 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 08:47 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 08:47 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:46 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:46 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:45 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 08:41 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:38 jmm@dns1004: END - running authdns-update
  • 08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:36 jmm@dns1004: START - running authdns-update
  • 08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:32 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 08:32 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:31 XioNoX: trunk sandbox vlan to eqiad row B ganeti - T385560
  • 08:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:26 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:18 fabfur: repooled cp7001 (T384227)
  • 08:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:49 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
  • 07:46 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
  • 07:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:36 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
  • 07:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
  • 07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:29 fabfur: depool cp7001 to fix stale ocsp alert (T384227)
  • 07:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:18 jmm@dns1004: END - running authdns-update
  • 07:16 jmm@dns1004: START - running authdns-update
  • 07:02 jmm@dns1004: END - running authdns-update
  • 06:59 jmm@dns1004: START - running authdns-update
  • 06:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet

2025-04-01

  • 23:43 reedy@deploy1003: rebuilt and synchronized wikiversions files: pihwiki to .23
  • 23:40 ladsgroup@dns1004: END - running authdns-update
  • 23:38 ladsgroup@dns1004: START - running authdns-update
  • 23:34 ladsgroup@dns1004: END - running authdns-update
  • 23:32 ladsgroup@dns1004: START - running authdns-update
  • 23:27 ladsgroup@dns1004: END - running authdns-update
  • 23:25 ladsgroup@dns1004: START - running authdns-update
  • 23:20 ladsgroup@dns1004: END - running authdns-update
  • 23:18 ladsgroup@dns1004: START - running authdns-update
  • 23:03 ladsgroup@dns1004: END - running authdns-update
  • 23:00 ladsgroup@dns1004: START - running authdns-update
  • 22:04 bking@cumin2002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for cirrussearch2055.codfw.wmnet: Renew puppet certificate - bking@cumin2002
  • 21:41 mutante: deploy1003 sudo -u mwdeploy /usr/local/bin/mwscript-cleanup --debug eqiad
  • 20:46 taavi@deploy1003: Finished scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) (duration: 16m 59s)
  • 20:39 taavi@deploy1003: migr, taavi: Continuing with sync
  • 20:37 taavi@deploy1003: migr, taavi: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2006.codfw.wmnet with OS bullseye
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2016.codfw.wmnet with OS bullseye
  • 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:30 taavi@deploy1003: Started scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642)
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2015.codfw.wmnet with OS bullseye
  • 20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2007.codfw.wmnet with OS bullseye
  • 20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2005.codfw.wmnet with OS bullseye
  • 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:21 taavi@deploy1003: Finished scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) (duration: 14m 18s)
  • 20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:14 taavi@deploy1003: superpes, taavi: Continuing with sync
  • 20:13 taavi@deploy1003: superpes, taavi: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
  • 20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
  • 20:07 taavi@deploy1003: Started scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732)
  • 20:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
  • 20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
  • 19:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
  • 19:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
  • 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
  • 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
  • 19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
  • 19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2016.codfw.wmnet with OS bullseye
  • 19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2007.codfw.wmnet with OS bullseye
  • 19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2015.codfw.wmnet with OS bullseye
  • 19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2006.codfw.wmnet with OS bullseye
  • 19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2005.codfw.wmnet with OS bullseye
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['apus-fe2003']
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2016']
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2015']
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2006']
  • 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2005']
  • 19:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2015']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2016']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-fe2003']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2006']
  • 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2005']
  • 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2003
  • 19:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2003
  • 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2016
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2016
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2015
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2015
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2007
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2007
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2006
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2006
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2005
  • 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2005
  • 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
  • 19:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
  • 19:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:50 cstone: payments-wiki upgraded from 19b1c505 to e090b97b
  • 18:25 bking@cumin2002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cirrussearch2055.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
  • 18:20 dzahn@dns1004: END - running authdns-update
  • 18:19 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 18:19 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 18:17 dzahn@dns1004: START - running authdns-update
  • 18:15 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.23 refs T386218
  • 18:11 dancy@deploy1003: Testing. Disregard
  • 17:58 herron@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=eqiad
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=codfw
  • 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=eqiad
  • 17:41 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 17:25 brett: importing varnishkafka 1.2.0-1 into bullseye-wikimedia main (T378737)
  • 17:25 brett: importing libvmod-re2/varnish-re2 2.0.0-2~bpo11+wmf2 into bullseye-wikimedia main (T378737)
  • 17:24 brett: importing libvmod-querysort 0.4-3 into bullseye-wikimedia main (T378737)
  • 17:24 brett: importing libvmod-netmapper 1.9.1-1 into bullseye-wikimedia main (T378737)
  • 17:23 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 17:23 brett: importing varnish-modules 0.20.0-2~bpo11 into bullseye-wikimedia main (T378737)
  • 17:23 fabfur: repool cp7001, no certs removed (T384227)
  • 17:22 brett: importing varnish 7.1.1-1.1~bpo11+wmf1 into bullseye-wikimedia main (T378737)
  • 16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
  • 16:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
  • 16:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 15:45 topranks: removing et-0/0/0 from ae0 bundle on cr3-ulsfo and cr4-ulsfo T390731
  • 15:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on 27 hosts with reason: Maintenance in s2
  • 15:27 dzahn@dns1004: END - running authdns-update
  • 15:25 mutante: DNS - new project language 'nup' - Nupe (also known as Anufe, Nupenci, Nyinfe, and Tapa) is a Volta–Niger language of the Nupoid branch primarily spoken by the Nupe people of the North Central region of Nigeria.
  • 15:24 dzahn@dns1004: START - running authdns-update
  • 15:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:18 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:11 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737 (duration: 00m 36s)
  • 15:10 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737
  • 15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737 (duration: 00m 39s)
  • 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737
  • 15:05 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phabricator deploy
  • 15:04 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phabricator deploy
  • 14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:51 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet [reason: finished T390658]
  • 14:50 fabfur: depooled cp7001 to test secure removal of unused certificates (T384227)
  • 14:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm
  • 14:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
  • 14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
  • 14:46 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:46 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
  • 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
  • 14:42 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:41 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:41 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 14:40 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 14:40 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
  • 14:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
  • 14:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2055 to cirrussearch2055
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
  • 14:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
  • 14:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
  • 14:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for CommonSettings: Remove old BounceHandler DB config (duration: 15m 28s)
  • 14:32 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2055 to cirrussearch2055
  • 14:28 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:27 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
  • 14:26 ladsgroup@deploy1003: reedy, ladsgroup: Continuing with sync
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74547 and previous config saved to /var/cache/conftool/dbconfig/20250401-142516-ladsgroup.json
  • 14:24 ladsgroup@deploy1003: reedy, ladsgroup: Backport for CommonSettings: Remove old BounceHandler DB config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
  • 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74546 and previous config saved to /var/cache/conftool/dbconfig/20250401-142228-ladsgroup.json
  • 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for CommonSettings: Remove old BounceHandler DB config
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74545 and previous config saved to /var/cache/conftool/dbconfig/20250401-141008-ladsgroup.json
  • 14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74544 and previous config saved to /var/cache/conftool/dbconfig/20250401-140721-ladsgroup.json
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 14:05 elukey: roll restart nginx on registry* to remove debug logging - too much data, filling up the root partition
  • 14:02 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host registry2005.codfw.wmnet
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74543 and previous config saved to /var/cache/conftool/dbconfig/20250401-135501-ladsgroup.json
  • 13:53 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74542 and previous config saved to /var/cache/conftool/dbconfig/20250401-135215-ladsgroup.json
  • 13:48 elukey: depool registry2005 to investigate some nginx logging issue
  • 13:44 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2035.codfw.wmnet [reason: T390658]
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74540 and previous config saved to /var/cache/conftool/dbconfig/20250401-133954-ladsgroup.json
  • 13:39 elukey: restart nginx on registry2005 - stuck writing error logs
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm
  • 13:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 13:37 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74539 and previous config saved to /var/cache/conftool/dbconfig/20250401-133707-ladsgroup.json
  • 13:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 13:35 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
  • 13:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development (duration: 18m 04s)
  • 13:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 13:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74537 and previous config saved to /var/cache/conftool/dbconfig/20250401-132407-ladsgroup.json
  • 13:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:21 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Continuing with sync
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74536 and previous config saved to /var/cache/conftool/dbconfig/20250401-132059-ladsgroup.json
  • 13:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
  • 13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 13:18 moritzm: installing python-cryptography security updates
  • 13:18 moritzm: installing python-cryptography security updates
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
  • 13:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74534 and previous config saved to /var/cache/conftool/dbconfig/20250401-131530-ladsgroup.json
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development
  • 13:05 elukey: restart nginx on registry* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133112 - debug logs to /var/log/nginx/debug.log - T390251
  • 13:04 XioNoX: msw2-eqiad> restart jsd gracefully - T390052
  • 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74533 and previous config saved to /var/cache/conftool/dbconfig/20250401-130023-ladsgroup.json
  • 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
  • 12:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm
  • 12:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:47 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74530 and previous config saved to /var/cache/conftool/dbconfig/20250401-124516-ladsgroup.json
  • 12:44 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:44 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:43 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
  • 12:42 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
  • 12:41 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
  • 12:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
  • 12:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4008.ulsfo.wmnet
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
  • 12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74529 and previous config saved to /var/cache/conftool/dbconfig/20250401-123009-ladsgroup.json
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
  • 12:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
  • 12:23 moritzm: installing PHP 7.4 security updates (as shipped in Debian, not our internal build running on a few remaining edge cases)
  • 12:12 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
  • 12:08 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
  • 12:08 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
  • 12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
  • 12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74528 and previous config saved to /var/cache/conftool/dbconfig/20250401-115935-ladsgroup.json
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
  • 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74527 and previous config saved to /var/cache/conftool/dbconfig/20250401-114428-ladsgroup.json
  • 11:34 Lucas_WMDE: Deployed patch for T389369
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74526 and previous config saved to /var/cache/conftool/dbconfig/20250401-112921-ladsgroup.json
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
  • 11:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
  • 11:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
  • 11:24 moritzm: installing squid security updates
  • 11:22 hashar: Restarting Gerrit
  • 11:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
  • 11:18 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
  • 11:16 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4008.ulsfo.wmnet
  • 11:16 topranks: reboot cr4-ulsfo to upgrade JunOS T364092
  • 11:15 hashar: Restarted Gerrit replica on gerrit2002 to raise heap from 32G to 64G | T387223
  • 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74525 and previous config saved to /var/cache/conftool/dbconfig/20250401-111415-ladsgroup.json
  • 11:13 volans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1002.eqiad.wmnet with reason: Test
  • 11:12 moritzm: restarting FPM on phab1004 to pick up security update
  • 11:10 volans: upgrading spicerack to v10.0.0 on cumin1002
  • 11:10 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Upgrade cr4-ulsfo JunOS
  • 11:06 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4008.ulsfo.wmnet with reason: remove from cluster for reimage
  • 11:06 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
  • 11:04 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 11:04 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
  • 11:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 10:58 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 55% (T360589) (duration: 22m 03s)
  • 10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
  • 10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
  • 10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
  • 10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
  • 10:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74523 and previous config saved to /var/cache/conftool/dbconfig/20250401-105425-ladsgroup.json
  • 10:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
  • 10:48 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74522 and previous config saved to /var/cache/conftool/dbconfig/20250401-104659-ladsgroup.json
  • 10:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:46 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 55% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:45 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
  • 10:43 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 10:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 55% (T360589)
  • 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
  • 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
  • 10:27 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 10:26 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
  • 10:25 akosiaris@deploy1003: Finished scap sync-world: Backport for typos: Add wnmet as a typo (duration: 29m 34s)
  • 10:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
  • 10:19 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp2004.codfw.wmnet
  • 10:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 10:19 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics (duration: 00m 59s)
  • 10:18 aqu@deploy1003: Started deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics
  • 10:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
  • 10:17 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test (duration: 00m 12s)
  • 10:17 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test
  • 10:17 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp1004.eqiad.wmnet
  • 10:16 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
  • 10:09 akosiaris@deploy1003: akosiaris: Continuing with sync
  • 10:08 akosiaris@deploy1003: akosiaris: Backport for typos: Add wnmet as a typo synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:00 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089] (duration: 00m 40s)
  • 09:59 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089]
  • 09:59 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089] (duration: 00m 55s)
  • 09:58 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089]
  • 09:57 joal@deploy1003: Finished deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089] (duration: 02m 24s)
  • 09:57 moritzm: installing freetype security updates
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
  • 09:55 akosiaris@deploy1003: Started scap sync-world: Backport for typos: Add wnmet as a typo
  • 09:55 akosiaris: scap backport a noop change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1133069 for T390251
  • 09:55 joal@deploy1003: Started deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089]
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
  • 09:50 elukey: restart nginx on registry* to pick up the debug changes
  • 09:42 volans@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: test
  • 09:39 gmodena@deploy1003: Finished deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz (duration: 01m 29s)
  • 09:38 gmodena@deploy1003: Started deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
  • 09:27 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
  • 09:26 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:58 dcausse@deploy1003: Finished deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow (duration: 12m 15s)
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:50 hashar@deploy1003: Finished deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1 (duration: 00m 09s)
  • 08:50 hashar@deploy1003: Started deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1
  • 08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-eqiad
  • 08:46 topranks: Drain Lumen cct from codfw to ulsfo due to instability T390660
  • 08:46 dcausse@deploy1003: Started deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow
  • 08:45 volans: upgrading spicerack to v10.0.0 on cumin2002
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-eqiad
  • 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-eqiad
  • 08:38 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
  • 08:36 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-eqiad
  • 08:36 moritzm: failover ganeti master in ulsfo to ganeti4005 T382511
  • 08:35 volans: temporarily disable puppet on cumin1002 for the spicerack upgrade to v10.0.0
  • 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-codfw
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4007
  • 08:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4007
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-codfw
  • 08:29 elukey: set debug logging for registry*'s nginx - T390251
  • 08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-codfw
  • 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:27 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-codfw
  • 08:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqiad
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 08:18 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqiad
  • 08:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqsin
  • 08:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:16 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:14 dcausse: T390665: restart blazegraph on wdqs2017
  • 08:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 08:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:12 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:11 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqsin
  • 08:11 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:11 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-esams
  • 08:05 dcausse: restarting blazegraph on wdqs2016
  • 08:04 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bookworm
  • 08:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 07:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-esams
  • 07:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-drmrs
  • 07:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-drmrs
  • 07:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-magru
  • 07:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:46 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:44 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-magru
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-codfw
  • 07:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
  • 07:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-codfw
  • 07:34 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:31 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
  • 07:30 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 07:28 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
  • 07:28 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-ulsfo
  • 07:24 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bookworm
  • 07:19 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
  • 07:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1b-eqiad
  • 07:17 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1b-eqiad
  • 06:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
  • 05:33 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@557a834]: 0.3.155 (duration: 12m 49s)
  • 05:22 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.155` on canary `wdqs1015`; proceeding to rest of fleet
  • 05:20 ryankemper@deploy1003: Started deploy [wdqs/wdqs@557a834]: 0.3.155
  • 05:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.155`. Pre-deploy tests passing on canary `wdqs1016`
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.20 (duration: 04m 34s)
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.23 refs T386218


Archives

See Server Admin Log/Archives.