Server Admin Log
2025-04-03
- 00:39 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2006
- 00:16 tstarling@deploy1003: Finished scap sync-world: Backport for Temporarily disable Lua profiler (T389734) (duration: 15m 04s)
- 00:15 zabe: zabe@mwmaint1002:~$ cat group2.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php {} --deletedump /home/zabe/afl_text_table_deletedump/{} --dump /home/zabe/afl_text_table_dump/{} --sleep 0.4" # T381599
- 00:09 tstarling@deploy1003: tstarling: Continuing with sync
- 00:08 tstarling@deploy1003: tstarling: Backport for Temporarily disable Lua profiler (T389734) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:01 tstarling@deploy1003: Started scap sync-world: Backport for Temporarily disable Lua profiler (T389734)
2025-04-02
- 23:32 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1006
- 23:28 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2005
- 22:38 jhathaway: puppet private repo changes completed, T385995
- 22:01 brett: Import ncmonitor 1.3.3 into bookworm-wikimedia
- 22:00 dreamyjazz@deploy1003: Finished scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) (duration: 25m 19s)
- 21:55 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:53 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 21:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:53 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: test only - bking@cumin2002 - T388610
- 21:41 dreamyjazz@deploy1003: dreamyjazz: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:37 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore2004
- 21:35 urandom: starting `nodetool garbagecollect` on Cassandra/sessionstore1005
- 21:35 dreamyjazz@deploy1003: Started scap sync-world: Backport for AbuseLogger: properly distinguish between global filters and central DB (T390904)
- 21:31 reedy@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) (duration: 15m 42s)
- 21:23 reedy@deploy1003: reedy, tgr: Continuing with sync
- 21:21 reedy@deploy1003: reedy, tgr: Backport for Enable EmailAuth enforcement on group 0/1 (T390662) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:15 reedy@deploy1003: Started scap sync-world: Backport for Enable EmailAuth enforcement on group 0/1 (T390662)
- 21:07 reedy@deploy1003: Finished scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
- 21:00 reedy@deploy1003: d3r1ck01, matmarex, reedy: Continuing with sync
- 20:52 reedy@deploy1003: d3r1ck01, matmarex, reedy: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager (gerrit:1133504)
- 20:47 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
- 20:14 reedy@deploy1003: Started scap sync-world: Backport for SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), SUL3: Fix user ID mismatch during login (immediately after creation) (T388177), Remove redundant WaitConditionLoop from CentralAuthTokenManager, Remove redundant WaitConditionLoop from CentralAuthTokenManager
- 19:54 jhathaway: rolling out a change to private repo, 1127150, please let me know if any issues arise when merging patches
- 18:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
- 18:35 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.23 refs T386218
- 18:35 cstone: SmashPig upgraded from b9310c06 to 642ae816
- 18:00 reedy@deploy1003: reedy: Continuing with sync
- 18:00 reedy@deploy1003: reedy: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[gerrit:1133471|EmailA
- 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
- 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:47 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
- 17:41 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:40 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:34 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 17:34 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 17:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 17:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 17:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 17:27 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 17:27 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 17:25 urandom: starting `nodetool garbagecollect` on sessionstore1004
- 17:17 urandom: updating Cassandra/sessionstore `gc_grace_seconds` to 259200 (from 864000)
- 17:13 brett: reloading varnish-frontend on A:cp and not A:cp-text_drmrs and not A:cp-text_codfw
- 17:08 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cirrussearch2055.codfw.wmnet with reason: adding net-new role
- 16:52 reedy@deploy1003: Started scap sync-world: Backport for EmailAuth: Allow forceEmailAuth test check without extension dependencies (T390437), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), EmailAuthHooks: Exclude bot users from email auth check (T390662), EmailAuth: Add tests for EmailAuthRequireToken handler (T390437), [[ger
- 16:27 vgutierrez: reload varnish on text@codfw to discard stale VCLs - T390846
- 16:26 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499 (duration: 03m 21s)
- 16:25 swfrench@deploy1003: swfrench: Continuing with sync
- 16:24 swfrench@deploy1003: swfrench: Deployment to pick up change in mediawiki-deployments.yaml - T389499 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:23 vgutierrez: reload varnish on text@drmrs to discard stale VCLs - T390846
- 16:23 swfrench@deploy1003: Started scap sync-world: Deployment to pick up change in mediawiki-deployments.yaml - T389499
- 16:10 swfrench-wmf: run-puppet-agent on deploy1003 to pick up mediawiki-deployments.yaml changes - T389499
- 15:28 arnaudb@dns1004: END - running authdns-update
- 15:19 arnaudb@dns1004: START - running authdns-update
- 15:16 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: maintenance
- 15:15 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1003.wikimedia.org with reason: maintenance
- 15:07 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
- 15:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
- 14:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
- 14:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
- 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2003.codfw.wmnet with OS bookworm
- 14:35 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
- 14:18 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
- 14:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
- 14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:07 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:06 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:06 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
- 14:05 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:04 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:01 volans: upgrading homer to version 0.8.0 to cumin hosts
- 14:01 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:00 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.8.0 - volans@cumin1002
- 13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
- 13:52 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 13:49 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
- 13:49 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 13:43 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
- 13:41 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
- 13:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
- 13:37 akosiaris: depool cp3066 for debugging T390854
- 13:37 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
- 13:35 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
- 13:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 13:24 Lucas_WMDE: UTC afternoon backport+config window done
- 13:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile (duration: 16m 55s)
- 13:19 moritzm: installing gnutls28 security updates on Bookworm
- 13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 13:14 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Continuing with sync
- 13:14 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 13:11 lucaswerkmeister-wmde@deploy1003: jakob, hashar, lucaswerkmeister-wmde: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Configure virtual terms db for wikidata prod & test (T389190), Use wikidata familly in $wgCirrusSearchSimilarityProfile
- 12:58 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:58 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 12:58 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:57 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 12:57 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:57 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
- 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74582 and previous config saved to /var/cache/conftool/dbconfig/20250402-124139-root.json
- 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
- 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
- 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74581 and previous config saved to /var/cache/conftool/dbconfig/20250402-123029-root.json
- 12:28 jmm@dns1004: END - running authdns-update
- 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
- 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74580 and previous config saved to /var/cache/conftool/dbconfig/20250402-122634-root.json
- 12:26 jmm@dns1004: START - running authdns-update
- 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
- 12:18 akosiaris@dns1004: END - running authdns-update
- 12:16 akosiaris@dns1004: START - running authdns-update
- 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74579 and previous config saved to /var/cache/conftool/dbconfig/20250402-121524-root.json
- 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
- 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74578 and previous config saved to /var/cache/conftool/dbconfig/20250402-121128-root.json
- 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
- 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
- 12:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
- 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P74577 and previous config saved to /var/cache/conftool/dbconfig/20250402-120018-root.json
- 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74576 and previous config saved to /var/cache/conftool/dbconfig/20250402-115623-root.json
- 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74575 and previous config saved to /var/cache/conftool/dbconfig/20250402-114512-root.json
- 11:44 fabfur: securely erase certificates from A:cp-magru and provide symlink for acmecerts (T384227)
- 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74574 and previous config saved to /var/cache/conftool/dbconfig/20250402-114117-root.json
- 11:40 vgutierrez: restart varnish on cp6016 - T390846
- 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P74573 and previous config saved to /var/cache/conftool/dbconfig/20250402-113007-root.json
- 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74572 and previous config saved to /var/cache/conftool/dbconfig/20250402-112611-root.json
- 11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 11:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 11:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 11:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 11:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 11:17 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 11:16 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
- 11:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
- 11:16 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:15 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
- 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P74571 and previous config saved to /var/cache/conftool/dbconfig/20250402-111501-root.json
- 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74570 and previous config saved to /var/cache/conftool/dbconfig/20250402-111106-root.json
- 11:10 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
- 11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
- 11:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
- 11:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 60% (T360589) (duration: 15m 11s)
- 11:04 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:03 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:03 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:03 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:02 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:01 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 11:00 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
- 11:00 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 60% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74569 and previous config saved to /var/cache/conftool/dbconfig/20250402-105956-root.json
- 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74568 and previous config saved to /var/cache/conftool/dbconfig/20250402-105601-root.json
- 10:53 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 60% (T360589)
- 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P74567 and previous config saved to /var/cache/conftool/dbconfig/20250402-104450-root.json
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74566 and previous config saved to /var/cache/conftool/dbconfig/20250402-104055-root.json
- 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74564 and previous config saved to /var/cache/conftool/dbconfig/20250402-102944-root.json
- 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74563 and previous config saved to /var/cache/conftool/dbconfig/20250402-102549-root.json
- 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
- 10:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
- 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74561 and previous config saved to /var/cache/conftool/dbconfig/20250402-101439-root.json
- 10:13 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 10:13 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
- 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74560 and previous config saved to /var/cache/conftool/dbconfig/20250402-101044-root.json
- 10:10 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 10:09 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 10:09 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 10:09 jelto@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P74559 and previous config saved to /var/cache/conftool/dbconfig/20250402-095933-root.json
- 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
- 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
- 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74558 and previous config saved to /var/cache/conftool/dbconfig/20250402-095538-root.json
- 09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
- 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2243 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74557 and previous config saved to /var/cache/conftool/dbconfig/20250402-095213-marostegui.json
- 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74556 and previous config saved to /var/cache/conftool/dbconfig/20250402-094428-root.json
- 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1257 to dbctl depooled T381475', diff saved to https://phabricator.wikimedia.org/P74555 and previous config saved to /var/cache/conftool/dbconfig/20250402-094109-marostegui.json
- 09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
- 09:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
- 09:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
- 09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
- 09:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
- 09:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 09:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
- 09:24 XioNoX: rebooting mr1-ulsfo - T390052
- 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
- 09:23 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mr1-ulsfo with reason: reboot
- 09:21 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 09:21 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
- 09:19 akosiaris@dns1004: END - running authdns-update
- 09:18 akosiaris: create mw-wikifunctions-ingress.discovery.wmnet and .svc records to facilitate the migration to ingress
- 09:17 moritzm: failover ganeti masters in drmrs to ganeti6001/6002
- 09:16 akosiaris@dns1004: START - running authdns-update
- 09:16 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
- 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
- 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
- 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
- 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
- 08:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6001.drmrs.wmnet
- 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
- 08:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
- 08:48 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
- 08:48 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 08:48 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 08:47 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 08:47 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
- 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
- 08:47 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 08:46 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:46 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 08:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 08:45 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
- 08:41 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 08:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 08:38 jmm@dns1004: END - running authdns-update
- 08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:36 jmm@dns1004: START - running authdns-update
- 08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:32 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
- 08:32 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:31 XioNoX: trunk sandbox vlan to eqiad row B ganeti - T385560
- 08:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:26 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:18 fabfur: repooled cp7001 (T384227)
- 08:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:49 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
- 07:46 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2013.*,lvs1019.*} and A:lvs
- 07:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:36 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
- 07:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.*,lvs1020.*} and A:lvs
- 07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:29 fabfur: depool cp7001 to fix stale ocsp alert (T384227)
- 07:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:18 jmm@dns1004: END - running authdns-update
- 07:16 jmm@dns1004: START - running authdns-update
- 07:02 jmm@dns1004: END - running authdns-update
- 06:59 jmm@dns1004: START - running authdns-update
- 06:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
2025-04-01
- 23:43 reedy@deploy1003: rebuilt and synchronized wikiversions files: pihwiki to .23
- 23:40 ladsgroup@dns1004: END - running authdns-update
- 23:38 ladsgroup@dns1004: START - running authdns-update
- 23:34 ladsgroup@dns1004: END - running authdns-update
- 23:32 ladsgroup@dns1004: START - running authdns-update
- 23:27 ladsgroup@dns1004: END - running authdns-update
- 23:25 ladsgroup@dns1004: START - running authdns-update
- 23:20 ladsgroup@dns1004: END - running authdns-update
- 23:18 ladsgroup@dns1004: START - running authdns-update
- 23:03 ladsgroup@dns1004: END - running authdns-update
- 23:00 ladsgroup@dns1004: START - running authdns-update
- 22:04 bking@cumin2002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for cirrussearch2055.codfw.wmnet: Renew puppet certificate - bking@cumin2002
- 21:41 mutante: deploy1003 sudo -u mwdeploy /usr/local/bin/mwscript-cleanup --debug eqiad
- 20:46 taavi@deploy1003: Finished scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) (duration: 16m 59s)
- 20:39 taavi@deploy1003: migr, taavi: Continuing with sync
- 20:37 taavi@deploy1003: migr, taavi: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2006.codfw.wmnet with OS bullseye
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2016.codfw.wmnet with OS bullseye
- 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:30 taavi@deploy1003: Started scap sync-world: Backport for homepage: Add `homepage_transfersize_bytes_total` metric (T382003), homepage: Add `homepage_transfersize_bytes_total` metric (T382003), Don't add WikiLove icon to Minerva (T390642)
- 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2015.codfw.wmnet with OS bullseye
- 20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2007.codfw.wmnet with OS bullseye
- 20:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2005.codfw.wmnet with OS bullseye
- 20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:21 taavi@deploy1003: Finished scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) (duration: 14m 18s)
- 20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:14 taavi@deploy1003: superpes, taavi: Continuing with sync
- 20:13 taavi@deploy1003: superpes, taavi: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
- 20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
- 20:07 taavi@deploy1003: Started scap sync-world: Backport for [plwiki] Allow bureaucrats to remove users from sysop usergroup (T389829), Close pihwiki (T390732)
- 20:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
- 20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
- 19:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
- 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2016.codfw.wmnet with reason: host reimage
- 19:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2007.codfw.wmnet with reason: host reimage
- 19:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2015.codfw.wmnet with reason: host reimage
- 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2006.codfw.wmnet with reason: host reimage
- 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2005.codfw.wmnet with reason: host reimage
- 19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2003.codfw.wmnet with OS bookworm
- 19:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2016.codfw.wmnet with OS bullseye
- 19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2007.codfw.wmnet with OS bullseye
- 19:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2015.codfw.wmnet with OS bullseye
- 19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2006.codfw.wmnet with OS bullseye
- 19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2005.codfw.wmnet with OS bullseye
- 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['apus-fe2003']
- 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2016']
- 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe2015']
- 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2007']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
- 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2006']
- 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe2005']
- 19:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['thanos-fe2007']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2015']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2016']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-fe2003']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2007']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2006']
- 19:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2005']
- 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2003
- 19:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2003
- 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2016
- 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2016
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2015
- 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2015
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2007
- 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2007
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2006
- 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2006
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe2005
- 19:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe2005
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
- 19:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-fe2005-7, ms-fe2015-6, and apus-fe2003 to codfw - jhancock@cumin2002"
- 19:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 18:50 cstone: payments-wiki upgraded from 19b1c505 to e090b97b
- 18:25 bking@cumin2002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cirrussearch2055.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
- 18:25 bking@cumin2002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cirrussearch2055.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
- 18:20 dzahn@dns1004: END - running authdns-update
- 18:19 mforns@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 18:19 mforns@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 18:17 dzahn@dns1004: START - running authdns-update
- 18:15 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.23 refs T386218
- 18:11 dancy@deploy1003: Testing. Disregard
- 17:58 herron@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
- 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=eqiad
- 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-rw,name=codfw
- 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=codfw
- 17:48 herron@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux-ro,name=eqiad
- 17:41 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
- 17:25 brett: importing varnishkafka 1.2.0-1 into bullseye-wikimedia main (T378737)
- 17:25 brett: importing libvmod-re2/varnish-re2 2.0.0-2~bpo11+wmf2 into bullseye-wikimedia main (T378737)
- 17:24 brett: importing libvmod-querysort 0.4-3 into bullseye-wikimedia main (T378737)
- 17:24 brett: importing libvmod-netmapper 1.9.1-1 into bullseye-wikimedia main (T378737)
- 17:23 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 17:23 brett: importing varnish-modules 0.20.0-2~bpo11 into bullseye-wikimedia main (T378737)
- 17:23 fabfur: repool cp7001, no certs removed (T384227)
- 17:22 brett: importing varnish 7.1.1-1.1~bpo11+wmf1 into bullseye-wikimedia main (T378737)
- 16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
- 16:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
- 16:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
- 16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 16:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 16:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch2055.codfw.wmnet with OS bullseye
- 15:45 topranks: removing et-0/0/0 from ae0 bundle on cr3-ulsfo and cr4-ulsfo T390731
- 15:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on 27 hosts with reason: Maintenance in s2
- 15:27 dzahn@dns1004: END - running authdns-update
- 15:25 mutante: DNS - new project language 'nup' - Nupe (also known as Anufe, Nupenci, Nyinfe, and Tapa) is a Volta–Niger language of the Nupoid branch primarily spoken by the Nupe people of the North Central region of Nigeria.
- 15:24 dzahn@dns1004: START - running authdns-update
- 15:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:18 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:11 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737 (duration: 00m 36s)
- 15:10 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: deploy phab1004 for T390737
- 15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737 (duration: 00m 39s)
- 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@53fcaf8]: test deploy phab2002 for T390737
- 15:05 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phabricator deploy
- 15:04 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phabricator deploy
- 14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:51 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet [reason: finished T390658]
- 14:50 fabfur: depooled cp7001 to test secure removal of unused certificates (T384227)
- 14:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm
- 14:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch2055
- 14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
- 14:46 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
- 14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:46 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch2055.codfw.wmnet 180.0.192.10.in-addr.arpa 0.8.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:46 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
- 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cirrussearch2055 - bking@cumin2002"
- 14:42 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:41 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 14:41 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 14:40 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 14:40 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 14:40 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch2055
- 14:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2055.codfw.wmnet with OS bullseye
- 14:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic2055 to cirrussearch2055
- 14:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch2055
- 14:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch2055
- 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
- 14:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic2055 to cirrussearch2055 - bking@cumin2002"
- 14:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for CommonSettings: Remove old BounceHandler DB config (duration: 15m 28s)
- 14:32 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic2055 to cirrussearch2055
- 14:28 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 14:27 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
- 14:26 ladsgroup@deploy1003: reedy, ladsgroup: Continuing with sync
- 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74547 and previous config saved to /var/cache/conftool/dbconfig/20250401-142516-ladsgroup.json
- 14:24 ladsgroup@deploy1003: reedy, ladsgroup: Backport for CommonSettings: Remove old BounceHandler DB config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
- 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74546 and previous config saved to /var/cache/conftool/dbconfig/20250401-142228-ladsgroup.json
- 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for CommonSettings: Remove old BounceHandler DB config
- 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
- 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
- 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
- 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74545 and previous config saved to /var/cache/conftool/dbconfig/20250401-141008-ladsgroup.json
- 14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74544 and previous config saved to /var/cache/conftool/dbconfig/20250401-140721-ladsgroup.json
- 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
- 14:05 elukey: roll restart nginx on registry* to remove debug logging - too much data, filling up the root partition
- 14:02 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host registry2005.codfw.wmnet
- 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm
- 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74543 and previous config saved to /var/cache/conftool/dbconfig/20250401-135501-ladsgroup.json
- 13:53 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
- 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74542 and previous config saved to /var/cache/conftool/dbconfig/20250401-135215-ladsgroup.json
- 13:48 elukey: depool registry2005 to investigate some nginx logging issue
- 13:44 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2035.codfw.wmnet [reason: T390658]
- 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74540 and previous config saved to /var/cache/conftool/dbconfig/20250401-133954-ladsgroup.json
- 13:39 elukey: restart nginx on registry2005 - stuck writing error logs
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm
- 13:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
- 13:37 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74539 and previous config saved to /var/cache/conftool/dbconfig/20250401-133707-ladsgroup.json
- 13:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
- 13:35 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
- 13:29 Lucas_WMDE: UTC afternoon backport+config window done
- 13:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development (duration: 18m 04s)
- 13:27 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
- 13:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
- 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T370903)', diff saved to https://phabricator.wikimedia.org/P74537 and previous config saved to /var/cache/conftool/dbconfig/20250401-132407-ladsgroup.json
- 13:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 13:21 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Continuing with sync
- 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T370903)', diff saved to https://phabricator.wikimedia.org/P74536 and previous config saved to /var/cache/conftool/dbconfig/20250401-132059-ladsgroup.json
- 13:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
- 13:20 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
- 13:18 moritzm: installing python-cryptography security updates
- 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
- 13:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, cjming, matmarex: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74534 and previous config saved to /var/cache/conftool/dbconfig/20250401-131530-ladsgroup.json
- 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
- 13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
- 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
- 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Remove 'exception-json' logging channel, Disable experiment-related config during active development
- 13:05 elukey: restart nginx on registry* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1133112 - debug logs to /var/log/nginx/debug.log - T390251
- 13:04 XioNoX: msw2-eqiad> restart jsd gracefully - T390052
- 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74533 and previous config saved to /var/cache/conftool/dbconfig/20250401-130023-ladsgroup.json
- 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
- 12:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
- 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm
- 12:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
- 12:47 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P74530 and previous config saved to /var/cache/conftool/dbconfig/20250401-124516-ladsgroup.json
- 12:44 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
- 12:44 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 12:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
- 12:43 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 12:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.zarcillo (exit_code=0)
- 12:42 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 12:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
- 12:41 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
- 12:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.zarcillo (exit_code=99)
- 12:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
- 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.zarcillo
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4008.ulsfo.wmnet
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
- 12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
- 12:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
- 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74529 and previous config saved to /var/cache/conftool/dbconfig/20250401-123009-ladsgroup.json
- 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
- 12:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
- 12:23 moritzm: installing PHP 7.4 security updates (as shipped in Debian, not our internal build running on a few remaining edge cases)
- 12:12 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 12:11 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 12:11 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 12:11 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
- 12:08 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 12:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
- 12:08 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
- 12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
- 12:02 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 12:02 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
- 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74528 and previous config saved to /var/cache/conftool/dbconfig/20250401-115935-ladsgroup.json
- 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
- 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74527 and previous config saved to /var/cache/conftool/dbconfig/20250401-114428-ladsgroup.json
- 11:34 Lucas_WMDE: Deployed patch for T389369
- 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
- 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P74526 and previous config saved to /var/cache/conftool/dbconfig/20250401-112921-ladsgroup.json
- 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
- 11:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
- 11:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
- 11:24 moritzm: installing squid security updates
- 11:22 hashar: Restarting Gerrit
- 11:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
- 11:18 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
- 11:16 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4008.ulsfo.wmnet
- 11:16 topranks: reboot cr4-ulsfo to upgrade JunOS T364092
- 11:15 hashar: Restarted Gerrit replica on gerrit2002 to raise heap from 32G to 64G | T387223
- 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74525 and previous config saved to /var/cache/conftool/dbconfig/20250401-111415-ladsgroup.json
- 11:13 volans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1002.eqiad.wmnet with reason: Test
- 11:12 moritzm: restarting FPM on phab1004 to pick up security update
- 11:10 volans: upgrading spicerack to v10.0.0 on cumin1002
- 11:10 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Upgrade cr4-ulsfo JunOS
- 11:06 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4008.ulsfo.wmnet with reason: remove from cluster for reimage
- 11:06 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
- 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
- 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
- 11:04 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 11:04 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
- 11:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 10:58 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 55% (T360589) (duration: 22m 03s)
- 10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
- 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
- 10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
- 10:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
- 10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
- 10:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
- 10:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T371742)', diff saved to https://phabricator.wikimedia.org/P74523 and previous config saved to /var/cache/conftool/dbconfig/20250401-105425-ladsgroup.json
- 10:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
- 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
- 10:48 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T371742)', diff saved to https://phabricator.wikimedia.org/P74522 and previous config saved to /var/cache/conftool/dbconfig/20250401-104659-ladsgroup.json
- 10:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 10:46 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 55% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:45 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
- 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-test
- 10:43 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-test
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
- 10:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
- 10:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 55% (T360589)
- 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
- 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
- 10:27 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
- 10:26 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
- 10:25 akosiaris@deploy1003: Finished scap sync-world: Backport for typos: Add wnmet as a typo (duration: 29m 34s)
- 10:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
- 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
- 10:19 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp2004.codfw.wmnet
- 10:19 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
- 10:19 aqu@deploy1003: Finished deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics (duration: 00m 59s)
- 10:18 aqu@deploy1003: Started deploy [airflow-dags/analytics@d96f732]: Update artifacts for analytics
- 10:17 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
- 10:17 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test (duration: 00m 12s)
- 10:17 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@d96f732]: Update artifacts for analytics_test
- 10:17 jiji@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-gp1004.eqiad.wmnet
- 10:16 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
- 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
- 10:09 akosiaris@deploy1003: akosiaris: Continuing with sync
- 10:08 akosiaris@deploy1003: akosiaris: Backport for typos: Add wnmet as a typo synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:00 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089] (duration: 00m 40s)
- 09:59 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (hadoop-test): Analytics webrequest migration TEST [analytics/refinery@efc48089]
- 09:59 joal@deploy1003: Finished deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089] (duration: 00m 55s)
- 09:58 joal@deploy1003: Started deploy [analytics/refinery@efc4808] (thin): Analytics webrequest migration THIN [analytics/refinery@efc48089]
- 09:57 joal@deploy1003: Finished deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089] (duration: 02m 24s)
- 09:57 moritzm: installing freetype security updates
- 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
- 09:55 akosiaris@deploy1003: Started scap sync-world: Backport for typos: Add wnmet as a typo
- 09:55 akosiaris: scap backport a noop change https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1133069 for T390251
- 09:55 joal@deploy1003: Started deploy [analytics/refinery@efc4808]: Analytics webrequest migration [analytics/refinery@efc48089]
- 09:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
- 09:50 elukey: restart nginx on registry* to pick up the debug changes
- 09:42 volans@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: test
- 09:39 gmodena@deploy1003: Finished deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz (duration: 01m 29s)
- 09:38 gmodena@deploy1003: Started deploy [airflow-dags/search@ed0fc78]: Deploy mjolnir-2.7.0.dev.conda.tgz
- 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
- 09:27 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
- 09:26 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
- 09:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
- 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
- 08:58 dcausse@deploy1003: Finished deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow (duration: 12m 15s)
- 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
- 08:50 hashar@deploy1003: Finished deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1 (duration: 00m 09s)
- 08:50 hashar@deploy1003: Started deploy [integration/docroot@5256e19]: build: Updating eslint-config-wikimedia to 0.29.1
- 08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-eqiad
- 08:46 topranks: Drain Lumen cct from codfw to ulsfo due to instability T390660
- 08:46 dcausse@deploy1003: Started deploy [wdqs/wdqs@354b5ac]: revert T326311, deletion query way too slow
- 08:45 volans: upgrading spicerack to v10.0.0 on cumin2002
- 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-eqiad
- 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-eqiad
- 08:38 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1211 slowly with 10 steps - Pool db1211.eqiad.wmnet in after cloning
- 08:36 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-eqiad
- 08:36 moritzm: failover ganeti master in ulsfo to ganeti4005 T382511
- 08:35 volans: temporarily disable puppet on cumin1002 for the spicerack upgrade to v10.0.0
- 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw1-codfw
- 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4007
- 08:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4007
- 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
- 08:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw1-codfw
- 08:29 elukey: set debug logging for registry*'s nginx - T390251
- 08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device msw2-codfw
- 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo and group 1
- 08:27 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device msw2-codfw
- 08:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqiad
- 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
- 08:18 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqiad
- 08:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-eqsin
- 08:17 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 08:16 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 08:14 dcausse: T390665: restart blazegraph on wdqs2017
- 08:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
- 08:12 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 08:12 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 08:11 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-eqsin
- 08:11 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 08:11 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
- 08:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-esams
- 08:05 dcausse: restarting blazegraph on wdqs2016
- 08:04 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 08:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bookworm
- 08:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 07:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 07:59 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-esams
- 07:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-drmrs
- 07:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-drmrs
- 07:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-magru
- 07:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 07:46 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 07:44 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-magru
- 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
- 07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-codfw
- 07:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
- 07:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-codfw
- 07:34 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 07:31 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
- 07:30 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 07:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
- 07:28 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device mr1-ulsfo
- 07:28 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
- 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device mr1-ulsfo
- 07:24 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bookworm
- 07:19 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device mr1-ulsfo
- 07:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1b-eqiad
- 07:17 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1b-eqiad
- 06:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1257.eqiad.wmnet
- 05:33 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@557a834]: 0.3.155 (duration: 12m 49s)
- 05:22 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.155` on canary `wdqs1015`; proceeding to rest of fleet
- 05:20 ryankemper@deploy1003: Started deploy [wdqs/wdqs@557a834]: 0.3.155
- 05:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.155`. Pre-deploy tests passing on canary `wdqs1016`
- 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.20 (duration: 04m 34s)
- 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.23 refs T386218