Server Admin Log/Archive 86

2024-10-31

23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70827 and previous config saved to /var/cache/conftool/dbconfig/20241031-234959-ladsgroup.json
23:41 urbanecm: Run extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php for several wikis (T376749; wiki list is on task)
23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70809 and previous config saved to /var/cache/conftool/dbconfig/20241031-234030-ladsgroup.json
23:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70808 and previous config saved to /var/cache/conftool/dbconfig/20241031-234003-ladsgroup.json
23:37 swfrench@deploy2002: Finished scap sync-world: Deployment to clear noop chart diff from 1085491 - T372604 T377040 (duration: 01m 49s)
23:35 swfrench@deploy2002: Started scap sync-world: Deployment to clear noop chart diff from 1085491 - T372604 T377040
23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70807 and previous config saved to /var/cache/conftool/dbconfig/20241031-232456-ladsgroup.json
23:15 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
23:13 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
23:12 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70806 and previous config saved to /var/cache/conftool/dbconfig/20241031-230949-ladsgroup.json
22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70805 and previous config saved to /var/cache/conftool/dbconfig/20241031-225442-ladsgroup.json
22:48 dancy@deploy2002: Finished scap sync-world: Backport for Dummy commit for testing (duration: 07m 28s)
22:46 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1019.eqiad.wmnet with OS bullseye
22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70804 and previous config saved to /var/cache/conftool/dbconfig/20241031-224513-ladsgroup.json
22:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
22:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T376905)', diff saved to https://phabricator.wikimedia.org/P70803 and previous config saved to /var/cache/conftool/dbconfig/20241031-224446-ladsgroup.json
22:43 dancy@deploy2002: dancy: Continuing with sync
22:43 dancy@deploy2002: dancy: Backport for Dummy commit for testing synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:40 dancy@deploy2002: Started scap sync-world: Backport for Dummy commit for testing
22:30 dancy@deploy2002: Installation of scap version "4.119.4" completed for 1 hosts
22:29 dancy@deploy2002: Installing scap version "4.119.4" for 1 hosts
22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70802 and previous config saved to /var/cache/conftool/dbconfig/20241031-222939-ladsgroup.json
22:21 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70801 and previous config saved to /var/cache/conftool/dbconfig/20241031-221432-ladsgroup.json
21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T376905)', diff saved to https://phabricator.wikimedia.org/P70800 and previous config saved to /var/cache/conftool/dbconfig/20241031-215925-ladsgroup.json
21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T376905)', diff saved to https://phabricator.wikimedia.org/P70799 and previous config saved to /var/cache/conftool/dbconfig/20241031-215056-ladsgroup.json
21:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
21:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T376905)', diff saved to https://phabricator.wikimedia.org/P70798 and previous config saved to /var/cache/conftool/dbconfig/20241031-215025-ladsgroup.json
21:50 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1019.eqiad.wmnet with OS bullseye
21:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
21:40 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
21:37 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
21:37 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
21:35 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
21:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70797 and previous config saved to /var/cache/conftool/dbconfig/20241031-213518-ladsgroup.json
21:35 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
21:22 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
21:22 urandom: Bootstrapping Cassandra/aqs1022-b — T378725
21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70796 and previous config saved to /var/cache/conftool/dbconfig/20241031-212011-ladsgroup.json
21:19 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1019.eqiad.wmnet with OS bullseye
21:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
21:18 dancy@deploy2002: Installing scap version "4.119.3" for 210 hosts
21:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T376905)', diff saved to https://phabricator.wikimedia.org/P70795 and previous config saved to /var/cache/conftool/dbconfig/20241031-210504-ladsgroup.json
20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T376905)', diff saved to https://phabricator.wikimedia.org/P70794 and previous config saved to /var/cache/conftool/dbconfig/20241031-205631-ladsgroup.json
20:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T376905)', diff saved to https://phabricator.wikimedia.org/P70793 and previous config saved to /var/cache/conftool/dbconfig/20241031-205604-ladsgroup.json
20:55 jsn@deploy2002: Finished scap sync-world: Backport for Translations for configuration for same-user-same-page reverts in Automoderator (T370795), Add follow-up message (T372476) (duration: 27m 10s)
20:46 jsn@deploy2002: jsn: Continuing with sync
20:46 jsn@deploy2002: jsn: Backport for Translations for configuration for same-user-same-page reverts in Automoderator (T370795), Add follow-up message (T372476) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70792 and previous config saved to /var/cache/conftool/dbconfig/20241031-204057-ladsgroup.json
20:28 jsn@deploy2002: Started scap sync-world: Backport for Translations for configuration for same-user-same-page reverts in Automoderator (T370795), Add follow-up message (T372476)
20:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70791 and previous config saved to /var/cache/conftool/dbconfig/20241031-202549-ladsgroup.json
20:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
20:15 dancy@deploy2002: Finished scap sync-world: Backport for tcywikisource: fix typo of author namespace (T378555) (duration: 07m 46s)
20:10 dancy@deploy2002: dancy, anzx: Continuing with sync
20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T376905)', diff saved to https://phabricator.wikimedia.org/P70790 and previous config saved to /var/cache/conftool/dbconfig/20241031-201042-ladsgroup.json
20:10 dancy@deploy2002: dancy, anzx: Backport for tcywikisource: fix typo of author namespace (T378555) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:07 dancy@deploy2002: Started scap sync-world: Backport for tcywikisource: fix typo of author namespace (T378555)
20:03 dancy@deploy2002: Installation of scap version "4.119.2" completed for 210 hosts
20:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T376905)', diff saved to https://phabricator.wikimedia.org/P70789 and previous config saved to /var/cache/conftool/dbconfig/20241031-200214-ladsgroup.json
20:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
20:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T376905)', diff saved to https://phabricator.wikimedia.org/P70788 and previous config saved to /var/cache/conftool/dbconfig/20241031-200148-ladsgroup.json
19:58 dancy@deploy2002: Installing scap version "4.119.2" for 210 hosts
19:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70787 and previous config saved to /var/cache/conftool/dbconfig/20241031-194640-ladsgroup.json
19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70786 and previous config saved to /var/cache/conftool/dbconfig/20241031-193133-ladsgroup.json
19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T376905)', diff saved to https://phabricator.wikimedia.org/P70785 and previous config saved to /var/cache/conftool/dbconfig/20241031-191626-ladsgroup.json
19:15 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.1 refs T375660
19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T376905)', diff saved to https://phabricator.wikimedia.org/P70784 and previous config saved to /var/cache/conftool/dbconfig/20241031-190648-ladsgroup.json
19:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
19:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T376905)', diff saved to https://phabricator.wikimedia.org/P70783 and previous config saved to /var/cache/conftool/dbconfig/20241031-190622-ladsgroup.json
19:06 swfrench@deploy2002: Finished scap sync-world: Backport for TimedMediaHandler: revert commonswiki changes due to capacity issues (duration: 07m 38s)
19:01 swfrench@deploy2002: swfrench, hnowlan: Continuing with sync
19:01 swfrench@deploy2002: swfrench, hnowlan: Backport for TimedMediaHandler: revert commonswiki changes due to capacity issues synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:58 swfrench@deploy2002: Started scap sync-world: Backport for TimedMediaHandler: revert commonswiki changes due to capacity issues
18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70782 and previous config saved to /var/cache/conftool/dbconfig/20241031-185115-ladsgroup.json
18:47 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70781 and previous config saved to /var/cache/conftool/dbconfig/20241031-183608-ladsgroup.json
18:26 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
18:26 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
18:24 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
18:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
18:23 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
18:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
18:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
18:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T376905)', diff saved to https://phabricator.wikimedia.org/P70780 and previous config saved to /var/cache/conftool/dbconfig/20241031-182101-ladsgroup.json
18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T376905)', diff saved to https://phabricator.wikimedia.org/P70779 and previous config saved to /var/cache/conftool/dbconfig/20241031-181225-ladsgroup.json
18:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
18:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T376905)', diff saved to https://phabricator.wikimedia.org/P70778 and previous config saved to /var/cache/conftool/dbconfig/20241031-181158-ladsgroup.json
18:05 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P70777 and previous config saved to /var/cache/conftool/dbconfig/20241031-175651-ladsgroup.json
17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P70776 and previous config saved to /var/cache/conftool/dbconfig/20241031-174144-ladsgroup.json
17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T376905)', diff saved to https://phabricator.wikimedia.org/P70775 and previous config saved to /var/cache/conftool/dbconfig/20241031-172637-ladsgroup.json
17:26 volans: uploaded spicerack_8.15.2 to apt.wikimedia.org bullseye-wikimedia
17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T376905)', diff saved to https://phabricator.wikimedia.org/P70774 and previous config saved to /var/cache/conftool/dbconfig/20241031-171824-ladsgroup.json
17:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
17:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
17:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
17:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
17:13 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up PHP version parameterization - T372604 T377040 (duration: 01m 52s)
17:11 swfrench@deploy2002: Started scap sync-world: Deployment to pick up PHP version parameterization - T372604 T377040
17:01 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
17:00 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye
16:57 Emperor: set mgr mgr/prometheus/scrape_interval 15.0 in both apus clusters
16:56 urandom: Bootstrapping Cassandra/aqs1022-a — T378725
16:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1022.eqiad.wmnet with reason: Bootstrapping — T378725
16:52 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1022.eqiad.wmnet with reason: Bootstrapping — T378725
16:45 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
16:37 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
16:27 taavi@deploy2002: Finished scap sync-world: Backport for Drop 'nonglobal' dblist (duration: 08m 44s)
16:23 taavi@deploy2002: taavi: Continuing with sync
16:21 taavi@deploy2002: taavi: Backport for Drop 'nonglobal' dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:19 taavi@deploy2002: Started scap sync-world: Backport for Drop 'nonglobal' dblist
16:16 taavi@deploy2002: Finished scap sync-world: Backport for Drop labtestwiki config (T378260) (duration: 09m 39s)
16:12 taavi@deploy2002: taavi: Continuing with sync
16:09 taavi@deploy2002: taavi: Backport for Drop labtestwiki config (T378260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:07 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — aqs1022 - eevans@cumin1002"
16:07 taavi@deploy2002: Started scap sync-world: Backport for Drop labtestwiki config (T378260)
16:06 ryankemper: [archiva] Freed up space on `archiva1002.wikimedia.org` like so: `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. We're down to 31% usage now
16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70772 and previous config saved to /var/cache/conftool/dbconfig/20241031-160542-arnaudb.json
16:04 dancy@deploy2002: scap failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/home/dancy/src/venvs/scap/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.28/cache/l10n/*.tmp.*']' returned non-zero exit status 1. (scap version: 4.118.0) (duration: 00m 01s)
16:04 dancy@deploy2002: Started scap sync-world: Backport for Drop labtestwiki config (T378260)
16:03 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — aqs1022 - eevans@cumin1002"
15:59 eevans@cumin1002: START - Cookbook sre.dns.netbox
15:55 samtar@deploy2002: Finished scap sync-world: Backport for [CommunityRequests] disable wgCommunityRequestsEnable by default (T366194) (duration: 07m 51s)
15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70770 and previous config saved to /var/cache/conftool/dbconfig/20241031-155037-arnaudb.json
15:50 samtar@deploy2002: samtar, musikanimal: Continuing with sync
15:49 samtar@deploy2002: samtar, musikanimal: Backport for [CommunityRequests] disable wgCommunityRequestsEnable by default (T366194) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:47 samtar@deploy2002: Started scap sync-world: Backport for [CommunityRequests] disable wgCommunityRequestsEnable by default (T366194)
15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2190']
15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2190']
15:35 eevans@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70769 and previous config saved to /var/cache/conftool/dbconfig/20241031-153531-arnaudb.json
15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70768 and previous config saved to /var/cache/conftool/dbconfig/20241031-152220-arnaudb.json
15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70767 and previous config saved to /var/cache/conftool/dbconfig/20241031-152026-arnaudb.json
15:15 eevans@cumin1002: START - Cookbook sre.dns.netbox
15:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:08 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:07 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add tooltips to expressions - oblivian@cumin1002"
15:07 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips to expressions - oblivian@cumin1002
15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70766 and previous config saved to /var/cache/conftool/dbconfig/20241031-150714-arnaudb.json
15:06 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips to expressions - oblivian@cumin1002
15:06 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add tooltips to expressions - oblivian@cumin1002"
15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70765 and previous config saved to /var/cache/conftool/dbconfig/20241031-150521-arnaudb.json
15:00 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
14:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70764 and previous config saved to /var/cache/conftool/dbconfig/20241031-145209-arnaudb.json
14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 5%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70763 and previous config saved to /var/cache/conftool/dbconfig/20241031-145015-arnaudb.json
14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70762 and previous config saved to /var/cache/conftool/dbconfig/20241031-144902-arnaudb.json
14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2190.codfw.wmnet with reason: host has hardware issues T378628
14:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2190.codfw.wmnet with reason: host has hardware issues T378628
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70761 and previous config saved to /var/cache/conftool/dbconfig/20241031-143704-arnaudb.json
14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 4%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70760 and previous config saved to /var/cache/conftool/dbconfig/20241031-143510-arnaudb.json
14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70759 and previous config saved to /var/cache/conftool/dbconfig/20241031-143356-arnaudb.json
14:24 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tcywikisource (T378469)
14:23 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database tcywikisource (T378469)
14:22 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tcywiktionary (T378462)
14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70758 and previous config saved to /var/cache/conftool/dbconfig/20241031-142158-arnaudb.json
14:21 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database tcywiktionary (T378462)
14:21 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database ibawiki (T376571)
14:21 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database ibawiki (T376571)
14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 2%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70757 and previous config saved to /var/cache/conftool/dbconfig/20241031-142004-arnaudb.json
14:19 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database bclwikisource (T377087)
14:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database bclwikisource (T377087)
14:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70756 and previous config saved to /var/cache/conftool/dbconfig/20241031-141851-arnaudb.json
14:14 sergi0: Running `foreachwiki userOptions.php --delete --old=sectionlevelimages growthexperiments-homepage-variant` (T375753)
14:11 sergi0: eswiki, arwiki, cswiki, frwiki running `mwscript userOptions.php --wiki=frwiki --delete-defaults growthexperiments-homepage-variant` (T374664)
14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70755 and previous config saved to /var/cache/conftool/dbconfig/20241031-140653-arnaudb.json
14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 1%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70754 and previous config saved to /var/cache/conftool/dbconfig/20241031-140459-arnaudb.json
14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70753 and previous config saved to /var/cache/conftool/dbconfig/20241031-140345-arnaudb.json
13:50 urbanecm@deploy2002: Finished scap sync-world: Backport for tcywikisource: add logo (T378555) (duration: 08m 56s)
13:46 urbanecm@deploy2002: urbanecm, anzx: Continuing with sync
13:44 urbanecm@deploy2002: urbanecm, anzx: Backport for tcywikisource: add logo (T378555) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:41 urbanecm@deploy2002: Started scap sync-world: Backport for tcywikisource: add logo (T378555)
13:38 urbanecm@deploy2002: Finished scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, HomepageHooks: do not store assigned variant on account creation (T377713), SpecialHomepage: show community update… (gerrit:1085347)
13:34 urbanecm@deploy2002: hnowlan, sgimeno, urbanecm: Continuing with sync
13:30 urbanecm@deploy2002: hnowlan, sgimeno, urbanecm: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, HomepageHooks: do not store assigned variant on account creation (T377713), SpecialHomepage: show community upda… (gerrit:1085347)
13:28 urbanecm@deploy2002: Started scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, HomepageHooks: do not store assigned variant on account creation (T377713), SpecialHomepage: show community update… (gerrit:1085347)
13:25 urbanecm@deploy2002: Finished scap sync-world: Backport for tcywikisource: Add namespaces, SITENAME and timezone (T378555), tcywiktionary: add SITENAME and timezone (T378556), tcywiktionary: add logo (T378556) (duration: 09m 39s)
13:20 urbanecm@deploy2002: anzx, urbanecm: Continuing with sync
13:19 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
13:18 urbanecm@deploy2002: anzx, urbanecm: Backport for tcywikisource: Add namespaces, SITENAME and timezone (T378555), tcywiktionary: add SITENAME and timezone (T378556), tcywiktionary: add logo (T378556) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
13:15 urbanecm@deploy2002: Started scap sync-world: Backport for tcywikisource: Add namespaces, SITENAME and timezone (T378555), tcywiktionary: add SITENAME and timezone (T378556), tcywiktionary: add logo (T378556)
13:14 urbanecm@deploy2002: Finished scap sync-world: Backport for TimedMediaHandler: use shellbox globally (T357309), Remove RunSingleJobStdin script (T369048) (duration: 09m 43s)
13:09 urbanecm@deploy2002: urbanecm, hnowlan: Continuing with sync
13:08 urbanecm@deploy2002: urbanecm, hnowlan: Backport for TimedMediaHandler: use shellbox globally (T357309), Remove RunSingleJobStdin script (T369048) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 urbanecm@deploy2002: Started scap sync-world: Backport for TimedMediaHandler: use shellbox globally (T357309), Remove RunSingleJobStdin script (T369048)
12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70752 and previous config saved to /var/cache/conftool/dbconfig/20241031-122719-ladsgroup.json
12:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P70751 and previous config saved to /var/cache/conftool/dbconfig/20241031-121212-ladsgroup.json
12:06 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1002.eqiad.wmnet
12:06 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1002.eqiad.wmnet
12:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database annwiki (T377118)
12:01 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database annwiki (T377118)
12:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tddwiki (T375016)
12:00 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database tddwiki (T375016)
12:00 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database rskwiki (T375016)
11:59 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database rskwiki (T375016)
11:59 fnegri@cumin1002: END (ERROR) - Cookbook sre.wikireplicas.add-wiki (exit_code=97) for database rskwiki (T375016)
11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P70750 and previous config saved to /var/cache/conftool/dbconfig/20241031-115705-ladsgroup.json
11:54 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database rskwiki (T375016)
11:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1232.eqiad.wmnet onto db1234.eqiad.wmnet
11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70747 and previous config saved to /var/cache/conftool/dbconfig/20241031-114158-ladsgroup.json
11:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1002.eqiad.wmnet with OS bookworm
11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70746 and previous config saved to /var/cache/conftool/dbconfig/20241031-113456-ladsgroup.json
11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70744 and previous config saved to /var/cache/conftool/dbconfig/20241031-112924-ladsgroup.json
11:26 fabfur: reverted previous action (T378578)
11:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
11:17 fabfur: install haproxykafka on cp4037 and cp3066 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1085308) (T378578)
11:17 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P70743 and previous config saved to /var/cache/conftool/dbconfig/20241031-111417-ladsgroup.json
11:02 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1002.eqiad.wmnet with OS bookworm
11:01 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1002.eqiad.wmnet
11:00 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1002.eqiad.wmnet
10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P70742 and previous config saved to /var/cache/conftool/dbconfig/20241031-105910-ladsgroup.json
10:58 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1002.eqiad.wmnet
10:58 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1002.eqiad.wmnet
10:56 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-ctrl1003.eqiad.wmnet
10:56 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-ctrl1003.eqiad.wmnet
10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[1232,1234].eqiad.wmnet with reason: hosts in cloning, avoiding alerts
10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[1232,1234].eqiad.wmnet with reason: hosts in cloning, avoiding alerts
10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70741 and previous config saved to /var/cache/conftool/dbconfig/20241031-104404-ladsgroup.json
10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70740 and previous config saved to /var/cache/conftool/dbconfig/20241031-103406-ladsgroup.json
10:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
10:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T376905)', diff saved to https://phabricator.wikimedia.org/P70739 and previous config saved to /var/cache/conftool/dbconfig/20241031-102835-ladsgroup.json
10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P70738 and previous config saved to /var/cache/conftool/dbconfig/20241031-101328-ladsgroup.json
10:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl1003.eqiad.wmnet with OS bookworm
10:04 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1232.eqiad.wmnet onto db1234.eqiad.wmnet
10:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db1232 in db1234 for T378267', diff saved to https://phabricator.wikimedia.org/P70737 and previous config saved to /var/cache/conftool/dbconfig/20241031-100301-arnaudb.json
09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P70736 and previous config saved to /var/cache/conftool/dbconfig/20241031-095821-ladsgroup.json
09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1003.eqiad.wmnet with reason: host reimage
09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1003.eqiad.wmnet with reason: host reimage
09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T376905)', diff saved to https://phabricator.wikimedia.org/P70735 and previous config saved to /var/cache/conftool/dbconfig/20241031-094314-ladsgroup.json
09:35 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl1003.eqiad.wmnet with OS bookworm
09:35 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-ctrl1003.eqiad.wmnet
09:35 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-ctrl1003.eqiad.wmnet
09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1179 (T376905)', diff saved to https://phabricator.wikimedia.org/P70734 and previous config saved to /var/cache/conftool/dbconfig/20241031-093446-ladsgroup.json
09:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
09:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
09:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
09:32 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
09:07 fabfur: importing haproxykafka 0.3 package into apt repository (T377613)
08:23 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
08:23 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
08:21 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1019.eqiad.wmnet with OS bullseye
08:13 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 56258
08:12 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 56258
08:01 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
04:54 eileen: civicrm upgraded from 0eb881ca to 31f5cbdb
01:45 krinkle@deploy2002: Finished deploy [integration/docroot@0b03488]: (no justification provided) (duration: 00m 10s)
01:45 krinkle@deploy2002: Started deploy [integration/docroot@0b03488]: (no justification provided)
01:42 Krinkle: krinkle@mwmaint2001$ Purge https://doc.wikimedia.org/lib/wmui-page.css via `mwscript extensions/WikimediaMaintenance/purgeUrls.php`, T257188 T378542
01:38 krinkle@deploy2002: Finished deploy [integration/docroot@a2c044c]: T378542 (duration: 00m 23s)
01:38 krinkle@deploy2002: Started deploy [integration/docroot@a2c044c]: T378542
00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215 (T376905)', diff saved to https://phabricator.wikimedia.org/P70733 and previous config saved to /var/cache/conftool/dbconfig/20241031-003014-ladsgroup.json
00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215', diff saved to https://phabricator.wikimedia.org/P70732 and previous config saved to /var/cache/conftool/dbconfig/20241031-001507-ladsgroup.json
00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215', diff saved to https://phabricator.wikimedia.org/P70731 and previous config saved to /var/cache/conftool/dbconfig/20241031-000000-ladsgroup.json

2024-10-30

23:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
23:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215 (T376905)', diff saved to https://phabricator.wikimedia.org/P70730 and previous config saved to /var/cache/conftool/dbconfig/20241030-234453-ladsgroup.json
23:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2215 (T376905)', diff saved to https://phabricator.wikimedia.org/P70729 and previous config saved to /var/cache/conftool/dbconfig/20241030-225520-ladsgroup.json
22:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2215.codfw.wmnet with reason: Maintenance
22:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2215.codfw.wmnet with reason: Maintenance
22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T376905)', diff saved to https://phabricator.wikimedia.org/P70728 and previous config saved to /var/cache/conftool/dbconfig/20241030-225449-ladsgroup.json
22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P70727 and previous config saved to /var/cache/conftool/dbconfig/20241030-223942-ladsgroup.json
22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
22:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
22:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P70726 and previous config saved to /var/cache/conftool/dbconfig/20241030-222435-ladsgroup.json
22:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T376905)', diff saved to https://phabricator.wikimedia.org/P70725 and previous config saved to /var/cache/conftool/dbconfig/20241030-220928-ladsgroup.json
22:03 brett: Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
21:26 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) (duration: 07m 22s)
21:21 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
21:21 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:18 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563)
21:17 tgr@deploy2002: Finished scap sync-world: Backport for GrowthExperiments: enable community updates module in pilot wikis (T374664) (duration: 10m 10s)
21:12 tgr@deploy2002: tgr, sgimeno: Continuing with sync
21:09 tgr@deploy2002: tgr, sgimeno: Backport for GrowthExperiments: enable community updates module in pilot wikis (T374664) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2191 (T376905)', diff saved to https://phabricator.wikimedia.org/P70724 and previous config saved to /var/cache/conftool/dbconfig/20241030-210902-ladsgroup.json
21:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
21:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T376905)', diff saved to https://phabricator.wikimedia.org/P70723 and previous config saved to /var/cache/conftool/dbconfig/20241030-210836-ladsgroup.json
21:07 tgr@deploy2002: Started scap sync-world: Backport for GrowthExperiments: enable community updates module in pilot wikis (T374664)
21:01 tgr@deploy2002: Finished scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, build: Suppress phan issue with null for Message::numParams, [[gerrit:1084181|HomepageHooks: do not store assigned variant on account cr]]
20:57 tgr@deploy2002: sgimeno, umherirrender, tgr: Continuing with sync
20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P70722 and previous config saved to /var/cache/conftool/dbconfig/20241030-205329-ladsgroup.json
20:51 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:51 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:45 tgr@deploy2002: sgimeno, umherirrender, tgr: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, build: Suppress phan issue with null for Message::numParams, [[gerrit:1084181|HomepageHooks: do not store assigned variant on account]]
20:43 tgr@deploy2002: Started scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, build: Suppress phan issue with null for Message::numParams, [[gerrit:1084181|HomepageHooks: do not store assigned variant on account cre]]
20:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P70721 and previous config saved to /var/cache/conftool/dbconfig/20241030-203822-ladsgroup.json
20:24 tgr@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only on nowiki (T377990) (duration: 13m 21s)
20:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T376905)', diff saved to https://phabricator.wikimedia.org/P70720 and previous config saved to /var/cache/conftool/dbconfig/20241030-202315-ladsgroup.json
20:20 tgr@deploy2002: esanders, tgr: Continuing with sync
20:16 tgr@deploy2002: esanders, tgr: Backport for Set Flow to read-only on nowiki (T377990) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2131 (T376905)', diff saved to https://phabricator.wikimedia.org/P70719 and previous config saved to /var/cache/conftool/dbconfig/20241030-201331-ladsgroup.json
20:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
20:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
20:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T376905)', diff saved to https://phabricator.wikimedia.org/P70718 and previous config saved to /var/cache/conftool/dbconfig/20241030-201305-ladsgroup.json
20:11 tgr@deploy2002: Started scap sync-world: Backport for Set Flow to read-only on nowiki (T377990)
19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P70717 and previous config saved to /var/cache/conftool/dbconfig/20241030-195758-ladsgroup.json
19:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P70716 and previous config saved to /var/cache/conftool/dbconfig/20241030-194251-ladsgroup.json
19:40 swfrench-wmf: all shellbox instances updated to shellbox 2024-10-15-214239 - T375243
19:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
19:39 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
19:37 mutante: gitlab - deleting user "jfk" on main server and both replicas T376936
19:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
19:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
19:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
19:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
19:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
19:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
19:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
19:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
19:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T376905)', diff saved to https://phabricator.wikimedia.org/P70715 and previous config saved to /var/cache/conftool/dbconfig/20241030-192744-ladsgroup.json
19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2115 (T376905)', diff saved to https://phabricator.wikimedia.org/P70714 and previous config saved to /var/cache/conftool/dbconfig/20241030-192011-ladsgroup.json
19:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
19:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
19:17 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.1 refs T375660
18:40 dduvall@deploy2002: Finished scap sync-world: Backport for Revert "Use array instead of string for class list" (T378531) (duration: 19m 04s)
18:39 inflatador: bking@stat1008,stat1009,stat1010.mgmt racadm jobqueue delete -i $job T376813
18:36 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nrwiki (T375101)
18:35 dduvall@deploy2002: ammarpad, dduvall: Continuing with sync
18:35 dduvall: error is still occurring following backport deployment of https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/1084759 (T378531)
18:27 dduvall: monitoring testwiki error rates for a few minutes to see if the error related to T378531 subsides (current rate is 23 errors in the last 15 minutes)
18:23 dduvall@deploy2002: ammarpad, dduvall: Backport for Revert "Use array instead of string for class list" (T378531) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:21 dduvall@deploy2002: Started scap sync-world: Backport for Revert "Use array instead of string for class list" (T378531)
18:10 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nrwiki (T375101)
17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
17:31 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
17:26 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
17:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
17:23 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
17:21 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
17:21 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
17:20 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
17:20 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
17:19 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
17:18 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
17:11 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
17:11 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
17:03 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
17:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
17:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
17:00 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
16:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
16:58 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
16:58 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
16:57 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
16:54 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
16:53 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
16:44 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1017.eqiad.wmnet with OS bullseye
16:39 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
16:39 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
16:39 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:39 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:38 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:38 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
16:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
16:38 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
16:37 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
16:37 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
16:37 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
16:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:11 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
16:09 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) (duration: 07m 06s)
16:08 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:08 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
16:07 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:07 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:07 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:06 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:06 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:06 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:04 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
16:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:04 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:02 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:02 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:01 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563)
16:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
15:59 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:57 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
15:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:56 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:55 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:55 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:54 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:50 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:47 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:43 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bullseye
15:39 moritzm: re-enable Puppet fleet-wide after puppetserver2001 maintenance
15:39 moritzm: re-enable Puppet fleet-wide for puppetserver2001 maintenance
15:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:36 ejegg: Standalone SmashPig upgraded from eaa176f7 to be47dddd
15:35 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:35 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver2001.codfw.wmnet with reason: puppetserver2001 maintenance
15:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver2001.codfw.wmnet with reason: puppetserver2001 maintenance
15:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:26 moritzm: disable Puppet fleet-wide for puppetserver2001 maintenance
15:25 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
15:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
15:23 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bullseye
15:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bullseye
15:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on an-presto1020.eqiad.wmnet with reason: reimaging the hosts to bullseye
15:06 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on an-presto1020.eqiad.wmnet with reason: reimaging the hosts to bullseye
15:05 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on an-presto[1017-1019].eqiad.wmnet with reason: reimaging the hosts to bullseye
15:05 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on an-presto[1017-1019].eqiad.wmnet with reason: reimaging the hosts to bullseye
15:02 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
15:01 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-ctrl1002.eqiad.wmnet
15:00 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-ctrl1002.eqiad.wmnet
14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver2003.codfw.wmnet with reason: RAM expansion
14:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver2003.codfw.wmnet with reason: RAM expansion
14:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl1002.eqiad.wmnet with OS bookworm
14:56 fabfur: importing haproxykafka 0.2 package into apt repository (T377613)
14:43 joal@deploy2002: Finished deploy [airflow-dags/analytics@ec02629]: Regular analytics weekly train SECOND [airflow-dags/analytics@ec02629d] (duration: 00m 55s)
14:42 joal@deploy2002: Started deploy [airflow-dags/analytics@ec02629]: Regular analytics weekly train SECOND [airflow-dags/analytics@ec02629d]
14:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
14:37 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
14:37 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
14:37 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
14:34 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
14:34 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70712 and previous config saved to /var/cache/conftool/dbconfig/20241030-143303-ladsgroup.json
14:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
14:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
14:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70711 and previous config saved to /var/cache/conftool/dbconfig/20241030-143236-ladsgroup.json
14:30 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:30 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:28 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), [GlobalBlocking] Enable global autoblocks on all WMF wikis (T377760) (duration: 09m 10s)
14:23 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
14:23 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl1002.eqiad.wmnet with OS bookworm
14:22 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1002.eqiad.wmnet
14:22 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1002.eqiad.wmnet
14:21 dreamyjazz@deploy2002: dreamyjazz: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), [GlobalBlocking] Enable global autoblocks on all WMF wikis (T377760) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver2002.codfw.wmnet with reason: RAM expansion
14:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver2002.codfw.wmnet with reason: RAM expansion
14:19 dreamyjazz@deploy2002: Started scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), [GlobalBlocking] Enable global autoblocks on all WMF wikis (T377760)
14:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P70710 and previous config saved to /var/cache/conftool/dbconfig/20241030-141729-ladsgroup.json
14:11 urbanecm: mwmaint2002: kill all running instances of `refreshLinkRecommendations.php` (T377150)
14:06 urbanecm@deploy2002: Finished scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), CirrusSearch: Enable offloading weighted tags via EventBus (T377150), cswiki: Add celebration logo (T378597) (duration: 15m 30s)
14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P70709 and previous config saved to /var/cache/conftool/dbconfig/20241030-140222-ladsgroup.json
14:01 urbanecm@deploy2002: dreamyjazz, pfischer, urbanecm: Continuing with sync
13:58 joal@deploy2002: Finished deploy [airflow-dags/analytics@ec4746b]: Regular analytics weekly train [airflow-dags/analytics@ec4746b5] (duration: 00m 41s)
13:57 joal@deploy2002: Started deploy [airflow-dags/analytics@ec4746b]: Regular analytics weekly train [airflow-dags/analytics@ec4746b5]
13:53 urbanecm@deploy2002: dreamyjazz, pfischer, urbanecm: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), CirrusSearch: Enable offloading weighted tags via EventBus (T377150), cswiki: Add celebration logo (T378597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:50 urbanecm@deploy2002: Started scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), CirrusSearch: Enable offloading weighted tags via EventBus (T377150), cswiki: Add celebration logo (T378597)
13:48 urbanecm@deploy2002: Finished scap sync-world: Backport for Growth [test2wiki]: enable community updates module (T376952), [Growth] beta: configure the A/B test experiment variants (T377233) (duration: 29m 00s)
13:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70707 and previous config saved to /var/cache/conftool/dbconfig/20241030-134715-ladsgroup.json
13:43 urbanecm@deploy2002: sgimeno, urbanecm: Continuing with sync
13:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P70704 and previous config saved to /var/cache/conftool/dbconfig/20241030-132204-ladsgroup.json
13:20 moritzm: upgrade PHP 7.4 on mwdebug* to 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 T378173
13:19 urbanecm@deploy2002: Started scap sync-world: Backport for Growth [test2wiki]: enable community updates module (T376952), [Growth] beta: configure the A/B test experiment variants (T377233)
13:18 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@ec4746b]: (no justification provided) (duration: 00m 07s)
13:18 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@ec4746b]: (no justification provided)
13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P70703 and previous config saved to /var/cache/conftool/dbconfig/20241030-130657-ladsgroup.json
12:55 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@ec4746b]: (no justification provided) (duration: 00m 11s)
12:54 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@ec4746b]: (no justification provided)
12:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T376905)', diff saved to https://phabricator.wikimedia.org/P70702 and previous config saved to /var/cache/conftool/dbconfig/20241030-125150-ladsgroup.json
12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T376905)', diff saved to https://phabricator.wikimedia.org/P70701 and previous config saved to /var/cache/conftool/dbconfig/20241030-124316-ladsgroup.json
12:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
12:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
12:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
12:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T376905)', diff saved to https://phabricator.wikimedia.org/P70700 and previous config saved to /var/cache/conftool/dbconfig/20241030-124256-ladsgroup.json
12:30 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), globalblocks API: Hide autoblocks when target param has username and IP (T377855) (duration: 10m 28s)
12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P70699 and previous config saved to /var/cache/conftool/dbconfig/20241030-122749-ladsgroup.json
12:25 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
12:22 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
12:22 dreamyjazz@deploy2002: dreamyjazz: Backport for Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), globalblocks API: Hide autoblocks when target param has username and IP (T377855) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:22 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
12:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
12:21 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
12:20 dreamyjazz@deploy2002: Started scap sync-world: Backport for Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), globalblocks API: Hide autoblocks when target param has username and IP (T377855)
12:19 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
12:19 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
12:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
12:17 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
12:17 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:16 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
12:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P70698 and previous config saved to /var/cache/conftool/dbconfig/20241030-121242-ladsgroup.json
12:12 moritzm: installing podman security updates
12:11 joal@deploy2002: Finished deploy [analytics/refinery@0855ce2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0855ce28] (duration: 03m 41s)
12:07 joal@deploy2002: Started deploy [analytics/refinery@0855ce2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0855ce28]
12:04 joal@deploy2002: Finished deploy [analytics/refinery@0855ce2] (thin): Regular analytics weekly train THIN [analytics/refinery@0855ce28] (duration: 06m 54s)
11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T376905)', diff saved to https://phabricator.wikimedia.org/P70697 and previous config saved to /var/cache/conftool/dbconfig/20241030-115735-ladsgroup.json
11:57 joal@deploy2002: Started deploy [analytics/refinery@0855ce2] (thin): Regular analytics weekly train THIN [analytics/refinery@0855ce28]
11:55 joal@deploy2002: Finished deploy [analytics/refinery@0855ce2]: Regular analytics weekly train [analytics/refinery@0855ce28] (duration: 08m 14s)
11:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T376905)', diff saved to https://phabricator.wikimedia.org/P70696 and previous config saved to /var/cache/conftool/dbconfig/20241030-114808-ladsgroup.json
11:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
11:47 joal@deploy2002: Started deploy [analytics/refinery@0855ce2]: Regular analytics weekly train [analytics/refinery@0855ce28]
11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
11:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
11:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
11:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2003.codfw.wmnet to plain
11:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2003.codfw.wmnet to plain
11:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
11:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2003.codfw.wmnet to drbd
11:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2003.codfw.wmnet to drbd
11:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
11:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
11:19 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
11:17 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
11:14 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:06 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2044.codfw.wmnet to cluster codfw and group D
11:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2044.codfw.wmnet to cluster codfw and group D
10:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16347
10:40 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 16347
10:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16347
10:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 16347
10:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 852
10:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 852
10:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
10:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 14593
10:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6461
10:18 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6461
10:04 moritzm: installing python-idna security updates
09:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70694 and previous config saved to /var/cache/conftool/dbconfig/20241030-095904-arnaudb.json
09:50 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling reboot on A:docker-registry
09:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70693 and previous config saved to /var/cache/conftool/dbconfig/20241030-094357-arnaudb.json
09:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40676
09:40 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40676
09:38 fabfur: importing haproxykafka package into apt repository (T377613)
09:33 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling reboot on A:docker-registry
09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70692 and previous config saved to /var/cache/conftool/dbconfig/20241030-092850-arnaudb.json
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1002.eqiad.wmnet
09:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
09:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1002.eqiad.wmnet
09:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70691 and previous config saved to /var/cache/conftool/dbconfig/20241030-091343-arnaudb.json
09:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70690 and previous config saved to /var/cache/conftool/dbconfig/20241030-091131-arnaudb.json
09:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
09:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
09:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70689 and previous config saved to /var/cache/conftool/dbconfig/20241030-091108-arnaudb.json
09:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2043.codfw.wmnet to cluster codfw and group D
09:07 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2043.codfw.wmnet to cluster codfw and group D
09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
09:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 100%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70688 and previous config saved to /var/cache/conftool/dbconfig/20241030-090002-arnaudb.json
08:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70687 and previous config saved to /var/cache/conftool/dbconfig/20241030-085601-arnaudb.json
08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 75%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70685 and previous config saved to /var/cache/conftool/dbconfig/20241030-084457-arnaudb.json
08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70684 and previous config saved to /var/cache/conftool/dbconfig/20241030-084054-arnaudb.json
08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 50%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70683 and previous config saved to /var/cache/conftool/dbconfig/20241030-082952-arnaudb.json
08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: host in preparation
08:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: host in preparation
08:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70682 and previous config saved to /var/cache/conftool/dbconfig/20241030-082547-arnaudb.json
08:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 25%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70680 and previous config saved to /var/cache/conftool/dbconfig/20241030-081446-arnaudb.json
07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 10%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70678 and previous config saved to /var/cache/conftool/dbconfig/20241030-075941-arnaudb.json
07:57 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
07:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
07:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 5%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70677 and previous config saved to /var/cache/conftool/dbconfig/20241030-074436-arnaudb.json
07:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 4%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70676 and previous config saved to /var/cache/conftool/dbconfig/20241030-072930-arnaudb.json
07:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70675 and previous config saved to /var/cache/conftool/dbconfig/20241030-072520-arnaudb.json
07:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
07:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
07:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
07:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 2%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70674 and previous config saved to /var/cache/conftool/dbconfig/20241030-071425-arnaudb.json
06:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 1%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70673 and previous config saved to /var/cache/conftool/dbconfig/20241030-065920-arnaudb.json
06:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Managing PII for wikis tcywikisource, tcywiktionary in section s5
06:47 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Managing PII for wikis tcywikisource, tcywiktionary in section s5
06:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Checking PII for wikis tcywikisource in section s5
06:46 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Checking PII for wikis tcywikisource in section s5
00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T376905)', diff saved to https://phabricator.wikimedia.org/P70672 and previous config saved to /var/cache/conftool/dbconfig/20241030-003847-ladsgroup.json
00:28 zabe@deploy2002: Finished scap sync-world: update interwiki cache (duration: 09m 01s)
00:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70671 and previous config saved to /var/cache/conftool/dbconfig/20241030-002340-ladsgroup.json
00:19 zabe@deploy2002: Started scap sync-world: update interwiki cache
00:14 zabe: zabe@mwmaint2002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=tcywikisource --cluster=all 2>&1 | tee /tmp/tcywikisource.UpdateSearchIndexConfig.log # T377919
00:11 zabe@deploy2002: Finished scap sync-world: Creating tcywikisource (T377919) (duration: 08m 13s)
00:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70670 and previous config saved to /var/cache/conftool/dbconfig/20241030-000833-ladsgroup.json
00:03 zabe@deploy2002: Started scap sync-world: Creating tcywikisource (T377919)

2024-10-29

23:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T376905)', diff saved to https://phabricator.wikimedia.org/P70669 and previous config saved to /var/cache/conftool/dbconfig/20241029-235326-ladsgroup.json
23:53 zabe: zabe@mwmaint2002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=tcywiktionary --cluster=all 2>&1 | tee /tmp/tcywiktionary.UpdateSearchIndexConfig.log # T377922
23:48 zabe@deploy2002: Finished scap sync-world: Creating tcywiktionary (T377922) (duration: 07m 26s)
23:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T376905)', diff saved to https://phabricator.wikimedia.org/P70668 and previous config saved to /var/cache/conftool/dbconfig/20241029-234608-ladsgroup.json
23:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
23:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70667 and previous config saved to /var/cache/conftool/dbconfig/20241029-234541-ladsgroup.json
23:41 zabe@deploy2002: Started scap sync-world: Creating tcywiktionary (T377922)
23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70666 and previous config saved to /var/cache/conftool/dbconfig/20241029-233034-ladsgroup.json
23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70665 and previous config saved to /var/cache/conftool/dbconfig/20241029-231527-ladsgroup.json
23:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70664 and previous config saved to /var/cache/conftool/dbconfig/20241029-230020-ladsgroup.json
22:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
22:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
22:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T376905)', diff saved to https://phabricator.wikimedia.org/P70662 and previous config saved to /var/cache/conftool/dbconfig/20241029-224717-ladsgroup.json
22:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70661 and previous config saved to /var/cache/conftool/dbconfig/20241029-223210-ladsgroup.json
22:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70660 and previous config saved to /var/cache/conftool/dbconfig/20241029-221703-ladsgroup.json
22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T376905)', diff saved to https://phabricator.wikimedia.org/P70659 and previous config saved to /var/cache/conftool/dbconfig/20241029-220156-ladsgroup.json
21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T376905)', diff saved to https://phabricator.wikimedia.org/P70658 and previous config saved to /var/cache/conftool/dbconfig/20241029-215443-ladsgroup.json
21:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
21:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T376905)', diff saved to https://phabricator.wikimedia.org/P70657 and previous config saved to /var/cache/conftool/dbconfig/20241029-215417-ladsgroup.json
21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70656 and previous config saved to /var/cache/conftool/dbconfig/20241029-213910-ladsgroup.json
21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70655 and previous config saved to /var/cache/conftool/dbconfig/20241029-212402-ladsgroup.json
21:09 eileen: civicrm upgraded from 0b7f3b47 to 0eb881ca
21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T376905)', diff saved to https://phabricator.wikimedia.org/P70654 and previous config saved to /var/cache/conftool/dbconfig/20241029-210855-ladsgroup.json
20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T376905)', diff saved to https://phabricator.wikimedia.org/P70653 and previous config saved to /var/cache/conftool/dbconfig/20241029-205718-ladsgroup.json
20:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
20:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T376905)', diff saved to https://phabricator.wikimedia.org/P70652 and previous config saved to /var/cache/conftool/dbconfig/20241029-205652-ladsgroup.json
20:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P70651 and previous config saved to /var/cache/conftool/dbconfig/20241029-204145-ladsgroup.json
20:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P70650 and previous config saved to /var/cache/conftool/dbconfig/20241029-202638-ladsgroup.json
20:14 kostajh: UTC late deploys done
20:12 kharlan@deploy2002: Finished scap sync-world: Backport for QuickSurveys: Undeploy safety survey (T376517), Missing.php: redirect wikisources to localized main page (duration: 09m 16s)
20:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T376905)', diff saved to https://phabricator.wikimedia.org/P70649 and previous config saved to /var/cache/conftool/dbconfig/20241029-201131-ladsgroup.json
20:08 kharlan@deploy2002: pppery, kharlan: Continuing with sync
20:05 kharlan@deploy2002: pppery, kharlan: Backport for QuickSurveys: Undeploy safety survey (T376517), Missing.php: redirect wikisources to localized main page synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 kharlan@deploy2002: Started scap sync-world: Backport for QuickSurveys: Undeploy safety survey (T376517), Missing.php: redirect wikisources to localized main page
20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T376905)', diff saved to https://phabricator.wikimedia.org/P70648 and previous config saved to /var/cache/conftool/dbconfig/20241029-200056-ladsgroup.json
20:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
20:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T376905)', diff saved to https://phabricator.wikimedia.org/P70647 and previous config saved to /var/cache/conftool/dbconfig/20241029-200029-ladsgroup.json
19:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on an-worker1165.eqiad.wmnet with reason: T378454
19:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on an-worker1165.eqiad.wmnet with reason: T378454
19:48 eileen: civicrm upgraded from 8f5c8b33 to 0b7f3b47
19:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P70646 and previous config saved to /var/cache/conftool/dbconfig/20241029-194522-ladsgroup.json
19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P70645 and previous config saved to /var/cache/conftool/dbconfig/20241029-193015-ladsgroup.json
19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T376905)', diff saved to https://phabricator.wikimedia.org/P70644 and previous config saved to /var/cache/conftool/dbconfig/20241029-191508-ladsgroup.json
19:05 eileen: civicrm upgraded from 1c6c4e08 to 8f5c8b33
19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T376905)', diff saved to https://phabricator.wikimedia.org/P70643 and previous config saved to /var/cache/conftool/dbconfig/20241029-190442-ladsgroup.json
19:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T376905)', diff saved to https://phabricator.wikimedia.org/P70642 and previous config saved to /var/cache/conftool/dbconfig/20241029-190359-ladsgroup.json
18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P70641 and previous config saved to /var/cache/conftool/dbconfig/20241029-184852-ladsgroup.json
18:37 swfrench-wmf: shellbox-syntaxhighlight updated to shellbox 2024-10-15-214239 - T375243
18:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P70640 and previous config saved to /var/cache/conftool/dbconfig/20241029-183345-ladsgroup.json
18:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
18:31 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
18:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T376905)', diff saved to https://phabricator.wikimedia.org/P70639 and previous config saved to /var/cache/conftool/dbconfig/20241029-181838-ladsgroup.json
18:10 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.1 refs T375660
18:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
18:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
18:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T376905)', diff saved to https://phabricator.wikimedia.org/P70638 and previous config saved to /var/cache/conftool/dbconfig/20241029-180816-ladsgroup.json
18:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
18:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T376905)', diff saved to https://phabricator.wikimedia.org/P70637 and previous config saved to /var/cache/conftool/dbconfig/20241029-180750-ladsgroup.json
17:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P70636 and previous config saved to /var/cache/conftool/dbconfig/20241029-175243-ladsgroup.json
17:51 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
17:50 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
17:49 brett: Remove RSA cert support from Icinga, librenms (T375569)
17:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P70635 and previous config saved to /var/cache/conftool/dbconfig/20241029-173735-ladsgroup.json
17:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
17:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
17:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
17:31 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
17:30 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
17:30 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
17:29 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
17:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T376905)', diff saved to https://phabricator.wikimedia.org/P70634 and previous config saved to /var/cache/conftool/dbconfig/20241029-172228-ladsgroup.json
17:17 sergi0: Running `foreachwiki userOptions.php --delete --old=A --old=D --old=C --old=null --old=imagerecommendation --old=linkrecommendation growthexperiments-homepage-variant`
17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T376905)', diff saved to https://phabricator.wikimedia.org/P70633 and previous config saved to /var/cache/conftool/dbconfig/20241029-171258-ladsgroup.json
17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
17:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
17:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
17:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T376905)', diff saved to https://phabricator.wikimedia.org/P70632 and previous config saved to /var/cache/conftool/dbconfig/20241029-170657-ladsgroup.json
17:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
17:00 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:58 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
16:58 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
16:57 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
16:56 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
16:55 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
16:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:54 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
16:54 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
16:53 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P70631 and previous config saved to /var/cache/conftool/dbconfig/20241029-165150-ladsgroup.json
16:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:47 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
16:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P70630 and previous config saved to /var/cache/conftool/dbconfig/20241029-163643-ladsgroup.json
16:35 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:26 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
16:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:26 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:25 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T376905)', diff saved to https://phabricator.wikimedia.org/P70629 and previous config saved to /var/cache/conftool/dbconfig/20241029-162136-ladsgroup.json
16:21 rzl@deploy2002: Finished scap sync-world: 1079056 T376923 (duration: 11m 47s)
16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:16 rzl@deploy2002: rzl: Continuing with sync
16:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:15 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:14 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:13 rzl@deploy2002: rzl: 1079056 T376923 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:12 rzl@deploy2002: Started scap sync-world: 1079056 T376923
16:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T376905)', diff saved to https://phabricator.wikimedia.org/P70627 and previous config saved to /var/cache/conftool/dbconfig/20241029-161103-ladsgroup.json
16:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
16:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
16:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
16:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T376905)', diff saved to https://phabricator.wikimedia.org/P70626 and previous config saved to /var/cache/conftool/dbconfig/20241029-160607-ladsgroup.json
16:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2041.codfw.wmnet
16:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2041.codfw.wmnet
16:03 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:02 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:56 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
15:56 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
15:55 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
15:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:54 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
15:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P70625 and previous config saved to /var/cache/conftool/dbconfig/20241029-155101-ladsgroup.json
15:47 moritzm: installing libheif security updates
15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P70624 and previous config saved to /var/cache/conftool/dbconfig/20241029-153554-ladsgroup.json
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2040.codfw.wmnet
15:25 XioNoX: test preferring lumen-ATT path in eqiad
15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2040.codfw.wmnet
15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T376905)', diff saved to https://phabricator.wikimedia.org/P70623 and previous config saved to /var/cache/conftool/dbconfig/20241029-152047-ladsgroup.json
15:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1003.eqiad.wmnet with OS bookworm
15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:10 claime: Running `/usr/bin/systemd-cat -t "import-wikitech.sh" /wikitech-static/wikitechsync/import-wikitech.sh &` on wikitech-static - T348503
15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T376905)', diff saved to https://phabricator.wikimedia.org/P70622 and previous config saved to /var/cache/conftool/dbconfig/20241029-150953-ladsgroup.json
15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T376905)', diff saved to https://phabricator.wikimedia.org/P70621 and previous config saved to /var/cache/conftool/dbconfig/20241029-150926-ladsgroup.json
15:08 claime: Running `find /srv/mediawiki/images/wikitech/archive -type f | xargs rm` on wikitech-static - T374114 T348503
15:00 claime: Running php maintenance/deleteArchivedFiles.php --delete on wikitech-static - T374114
14:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2039.codfw.wmnet
14:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1003.eqiad.wmnet with reason: host reimage
14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P70619 and previous config saved to /var/cache/conftool/dbconfig/20241029-145419-ladsgroup.json
14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2039.codfw.wmnet
14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1003.eqiad.wmnet with reason: host reimage
14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:47 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:40 reedy@deploy2002: Finished scap sync-world: 1.44.0-wmf.1 backports to fix deprecated logspam T375660 T377521 (duration: 07m 21s)
14:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
14:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P70616 and previous config saved to /var/cache/conftool/dbconfig/20241029-143912-ladsgroup.json
14:39 herron: centrallog1002:~# systemctl restart rsyslogd
14:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
14:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:34 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1003.eqiad.wmnet with OS bookworm
14:32 reedy@deploy2002: Started scap sync-world: 1.44.0-wmf.1 backports to fix deprecated logspam T375660 T377521
14:29 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:25 MichaelG_WMF: T372337 clearing dangling database-records for link suggestions by running `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --db-table --force`
14:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T376905)', diff saved to https://phabricator.wikimedia.org/P70615 and previous config saved to /var/cache/conftool/dbconfig/20241029-142405-ladsgroup.json
14:20 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:19 elukey: restart rsyslog on centrallog1002 - connection errors, failing prometheus probes
14:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
14:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
14:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T376905)', diff saved to https://phabricator.wikimedia.org/P70614 and previous config saved to /var/cache/conftool/dbconfig/20241029-141532-ladsgroup.json
14:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
14:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
14:14 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:06 kostajh: UTC afternoon deploys done
14:05 kharlan@deploy2002: Finished scap sync-world: Backport for AuthManagerStatsdHandler: Add label for wiki (T375505), AuthManagerStatsdHandler: Add label for wiki (T375505) (duration: 07m 53s)
14:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:00 kharlan@deploy2002: kharlan: Continuing with sync
13:59 kharlan@deploy2002: kharlan: Backport for AuthManagerStatsdHandler: Add label for wiki (T375505), AuthManagerStatsdHandler: Add label for wiki (T375505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
13:57 kharlan@deploy2002: Started scap sync-world: Backport for AuthManagerStatsdHandler: Add label for wiki (T375505), AuthManagerStatsdHandler: Add label for wiki (T375505)
13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
13:48 jforrester@deploy2002: Finished scap sync-world: Backport for fix ibawiki's tagline svg path (duration: 07m 41s)
13:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16347
13:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 16347
13:45 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16347
13:45 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16347
13:43 jforrester@deploy2002: jforrester, hamishz: Continuing with sync
13:42 jforrester@deploy2002: jforrester, hamishz: Backport for fix ibawiki's tagline svg path synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:42 moritzm: installing ghostscript security updates
13:40 jforrester@deploy2002: Started scap sync-world: Backport for fix ibawiki's tagline svg path
13:38 jforrester@deploy2002: Finished scap sync-world: Backport for Allow admins on testwiki to grant and remove upwizcampeditors (T378067), nlwiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T377930) (duration: 08m 03s)
13:34 jforrester@deploy2002: dreamrimmer, superzerocool, jforrester: Continuing with sync
13:33 jforrester@deploy2002: dreamrimmer, superzerocool, jforrester: Backport for Allow admins on testwiki to grant and remove upwizcampeditors (T378067), nlwiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T377930) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:31 jforrester@deploy2002: Started scap sync-world: Backport for Allow admins on testwiki to grant and remove upwizcampeditors (T378067), nlwiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T377930)
13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 100%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70612 and previous config saved to /var/cache/conftool/dbconfig/20241029-132956-arnaudb.json
13:30 mszabo@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
13:30 mszabo@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
13:29 mszabo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
13:28 jforrester@deploy2002: Finished scap sync-world: Backport for annwiki: Add logo (T377535), kgewiki: Add logo (T377075), shnwikinews: Add logo (T377543), gorwikiquote: Add logo (T377542), moswiki: Add logo (T377539), ibawiki: Add logo (T377538), rskwiki: Add logo (T377536), [[gerrit:10840
13:28 mszabo@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
13:27 mszabo@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
13:26 mszabo@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
13:23 jforrester@deploy2002: jforrester, hamishz: Continuing with sync
13:22 jforrester@deploy2002: jforrester, hamishz: Backport for annwiki: Add logo (T377535), kgewiki: Add logo (T377075), shnwikinews: Add logo (T377543), gorwikiquote: Add logo (T377542), moswiki: Add logo (T377539), ibawiki: Add logo (T377538), rskwiki: Add logo (T377536), [[gerrit:1084079|td
13:20 jforrester@deploy2002: Started scap sync-world: Backport for annwiki: Add logo (T377535), kgewiki: Add logo (T377075), shnwikinews: Add logo (T377543), gorwikiquote: Add logo (T377542), moswiki: Add logo (T377539), ibawiki: Add logo (T377538), rskwiki: Add logo (T377536), [[gerrit:108407
13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
13:17 jforrester@deploy2002: Finished scap sync-world: Backport for ExtensionDistributor: Mark 1.43 as beta (T372322), ExtensionDistributor: Remove EOL 1.40 (T364989), enwiktionary: Enable mobile page tabs for non logged in users (T377648) (duration: 12m 41s)
13:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 75%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70610 and previous config saved to /var/cache/conftool/dbconfig/20241029-131451-arnaudb.json
13:11 jforrester@deploy2002: zabe, macfan4000, hamishz, jforrester: Continuing with sync
13:10 jforrester@deploy2002: zabe, macfan4000, hamishz, jforrester: Backport for ExtensionDistributor: Mark 1.43 as beta (T372322), ExtensionDistributor: Remove EOL 1.40 (T364989), enwiktionary: Enable mobile page tabs for non logged in users (T377648) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
13:05 jforrester@deploy2002: Started scap sync-world: Backport for ExtensionDistributor: Mark 1.43 as beta (T372322), ExtensionDistributor: Remove EOL 1.40 (T364989), enwiktionary: Enable mobile page tabs for non logged in users (T377648)
12:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 50%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70607 and previous config saved to /var/cache/conftool/dbconfig/20241029-125945-arnaudb.json
12:50 moritzm: installing Apache security updates
12:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 25%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70606 and previous config saved to /var/cache/conftool/dbconfig/20241029-124440-arnaudb.json
12:43 claime: Manually relaunched import-wikitech.sh on wikitech-static - T374114
12:42 claime: Killed dead and stacked import-wikitech.sh processes on wikitech-static - T374114
12:28 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@d85a93c]: (no justification provided) (duration: 00m 30s)
12:27 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@d85a93c]: (no justification provided)
12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
12:04 cgoubert@deploy2002: Finished scap sync-world: T377958 - full mediawiki image rebuild and deployment to add helper scripts for mwcron, mwscript (duration: 29m 44s)
11:39 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:35 cgoubert@deploy2002: Started scap sync-world: T377958 - full mediawiki image rebuild and deployment to add helper scripts for mwcron, mwscript
11:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:34 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:32 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:30 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:29 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:29 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:28 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:27 claime: Rebuilding php{7.4,8.1}-fpm-multiversion-base - T377958
11:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:22 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:18 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:18 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:16 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:11 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:09 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:07 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:05 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:05 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:53 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2042.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2211.codfw.wmnet onto db2223.codfw.wmnet
10:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2042.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
10:08 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
10:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
10:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
09:56 moritzm: installing wireshark security updates
09:41 kostajh: UTC morning deploys done
09:23 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:23 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:22 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:21 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:20 kharlan@deploy2002: Finished scap sync-world: Backport for temp accounts: Enable temp account autocreation on five pilot wikis (T378334), beta: enable "Surfacing structured tasks" for an early beta-wiki (T376677) (duration: 24m 42s)
09:20 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:20 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:16 kharlan@deploy2002: migr, kharlan: Continuing with sync
09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance, host is not pooled
09:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance, host is not pooled
09:07 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
09:07 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
08:58 kharlan@deploy2002: migr, kharlan: Backport for temp accounts: Enable temp account autocreation on five pilot wikis (T378334), beta: enable "Surfacing structured tasks" for an early beta-wiki (T376677) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:56 kharlan@deploy2002: Started scap sync-world: Backport for temp accounts: Enable temp account autocreation on five pilot wikis (T378334), beta: enable "Surfacing structured tasks" for an early beta-wiki (T376677)
08:55 moritzm: upgrade irc.wikimedia.org to ircstream 1.0+wmf12u1 T376014
08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: T378320', diff saved to https://phabricator.wikimedia.org/P70604 and previous config saved to /var/cache/conftool/dbconfig/20241029-085507-arnaudb.json
08:53 kharlan@deploy2002: Finished scap sync-world: Backport for Unblock CI (T377947), StatsLib: Set label for wiki ID (T375496) (duration: 13m 06s)
08:52 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
08:52 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
08:51 moritzm: uploaded ircstream 1.0+wmf12u1 to apt.wikimedia.org T376014
08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56258
08:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 56258
08:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264567
08:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264567
08:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
08:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16591
08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 200478
08:46 kharlan@deploy2002: kharlan: Continuing with sync
08:45 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 200478
08:45 kharlan@deploy2002: kharlan: Backport for Unblock CI (T377947), StatsLib: Set label for wiki ID (T375496) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56258
08:44 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 56258
08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
08:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 8966
08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
08:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9038
08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
08:41 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
08:41 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
08:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16347
08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: T378320', diff saved to https://phabricator.wikimedia.org/P70603 and previous config saved to /var/cache/conftool/dbconfig/20241029-084002-arnaudb.json
08:40 kharlan@deploy2002: Started scap sync-world: Backport for Unblock CI (T377947), StatsLib: Set label for wiki ID (T375496)
08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2211 in db2223 for T373579', diff saved to https://phabricator.wikimedia.org/P70602 and previous config saved to /var/cache/conftool/dbconfig/20241029-083035-arnaudb.json
08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28306
08:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2211 - depooling db2211 to clone on db2223
08:29 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2211 - depooling db2211 to clone on db2223
08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'depool preshot db2211', diff saved to https://phabricator.wikimedia.org/P70601 and previous config saved to /var/cache/conftool/dbconfig/20241029-082903-arnaudb.json
08:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2223.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
08:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2223.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
08:28 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 28306
08:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
08:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: T378320', diff saved to https://phabricator.wikimedia.org/P70600 and previous config saved to /var/cache/conftool/dbconfig/20241029-082456-arnaudb.json
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2004.wikimedia.org
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:09 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2004.wikimedia.org
08:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: T378320', diff saved to https://phabricator.wikimedia.org/P70599 and previous config saved to /var/cache/conftool/dbconfig/20241029-080951-arnaudb.json
08:08 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1169 quickly with 2 steps - index rebuilt
08:08 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1169 quickly with 2 steps - index rebuilt
08:08 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1169 gradually with 4 steps - index rebuilt
08:08 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - index rebuilt
08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1004.wikimedia.org
08:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:06 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1169 gradually with 4 steps - index rebuilt
08:06 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - index rebuilt
08:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:03 moritzm: installing qemu security updates
07:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:53 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1004.wikimedia.org
04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.26 (duration: 01m 04s)
03:53 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.1 refs T375660 (duration: 49m 51s)
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.1 refs T375660

2024-10-28

23:08 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
23:08 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
23:06 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
23:06 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
23:05 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
23:05 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
23:04 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
23:03 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
23:03 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
23:02 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
23:01 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
23:01 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
22:28 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@d85a93c]: add missing comma (duration: 00m 36s)
22:27 ebernhardson@deploy2002: Started deploy [airflow-dags/search@d85a93c]: add missing comma
22:10 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@99eb6f3]: T375387: update discolytics to 0.27.0 (duration: 00m 50s)
22:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@99eb6f3]: T375387: update discolytics to 0.27.0
22:00 ryankemper: T372074 `sudo requestctl delete ipblock abuse/wdqs` && `sudo requestctl delete pattern ua/wdqs_sparql` to clean up objects removed in commit `d26fc1e910579d33d33ec3d5a192d137045eba4b` ( <-- this occurred before the requestctl commit; i just missed making the irc log)
21:48 ryankemper: T372074 `sudo requestctl commit`
21:29 kostajh: UTC late deploys done, for real
21:26 ryankemper: T372074 `sudo requestctl delete action cache-text/T372074` && `sudo requestctl delete action cache-text/T372074_wdqs_codfw_flap`
21:26 kharlan@deploy2002: Finished scap sync-world: Backport for GlobalContributionsPager: Make article link redirect to the page (T378155) (duration: 09m 01s)
21:21 kharlan@deploy2002: kharlan: Continuing with sync
21:19 kharlan@deploy2002: kharlan: Backport for GlobalContributionsPager: Make article link redirect to the page (T378155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:17 kharlan@deploy2002: Started scap sync-world: Backport for GlobalContributionsPager: Make article link redirect to the page (T378155)
20:44 kostajh: UTC late deploys done
20:42 kharlan@deploy2002: Finished scap sync-world: Backport for Partial Revert "Make sure contributor's name is on its line" (T378142), Restore missing second argument to "mapState" in QuickView.vue (T378204), GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155), gerrit:1083886 (GlobalContributionsPager: Don't display external namespace in…)
20:37 kharlan@deploy2002: jdlrobson, kharlan: Continuing with sync
20:33 kharlan@deploy2002: jdlrobson, kharlan: Backport for Partial Revert "Make sure contributor's name is on its line" (T378142), Restore missing second argument to "mapState" in QuickView.vue (T378204), GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155), gerrit:1083886 (GlobalContributionsPager: Don't display external namespace in artic…)
20:30 kharlan@deploy2002: Started scap sync-world: Backport for Partial Revert "Make sure contributor's name is on its line" (T378142), Restore missing second argument to "mapState" in QuickView.vue (T378204), GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155), gerrit:1083886 (GlobalContributionsPager: Don't display external namespace in…)
19:52 brett: Removed RSA certificate support from tlsproxy (T375569)
19:33 brett: Removed RSA certificate support from mirrors, dumps (T375569)
19:27 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:26 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
19:24 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:24 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
19:23 brett: Removed RSA certificate support from ldap, archiva, durum
19:21 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:21 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:17 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:15 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
19:15 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
19:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:13 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
17:43 jiawang@deploy2002: Finished deploy [airflow-dags/analytics_product@a7456f9]: deploy tsp pipelines (duration: 01m 33s)
17:42 jiawang@deploy2002: Started deploy [airflow-dags/analytics_product@a7456f9]: deploy tsp pipelines
17:04 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
17:03 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
16:55 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
16:50 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
16:50 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
16:44 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
16:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
16:38 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
16:38 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
16:32 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
16:26 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
16:20 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
16:20 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
16:16 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
15:51 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 25s)
15:49 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 35s)
15:48 XioNoX: re-enable IX BGP sessions in eqiad
15:30 jan_drewniak: starting portals deployment
15:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:51 MichaelG_WMF: T372337 - run `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index` to fix the remaining ca. 10K dangling search index records
14:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2042.codfw.wmnet to cluster codfw and group D
14:36 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2042.codfw.wmnet to cluster codfw and group D
14:08 urbanecm@deploy2002: Finished scap sync-world: Backport for knwiktionary: update logo, wordmark (T360022), hewikisource: add project namespace alias (T378303), Add config for testing T375264 on beta (T377988) (duration: 10m 43s)
14:04 urbanecm@deploy2002: anzx, cparle, urbanecm: Continuing with sync
14:01 urbanecm@deploy2002: anzx, cparle, urbanecm: Backport for knwiktionary: update logo, wordmark (T360022), hewikisource: add project namespace alias (T378303), Add config for testing T375264 on beta (T377988) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:58 urbanecm@deploy2002: Started scap sync-world: Backport for knwiktionary: update logo, wordmark (T360022), hewikisource: add project namespace alias (T378303), Add config for testing T375264 on beta (T377988)
13:57 urbanecm@deploy2002: Sync cancelled.
13:54 urbanecm@deploy2002: anzx, urbanecm: Backport for knwiktionary: update logo, wordmark (T360022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:52 urbanecm@deploy2002: Started scap sync-world: Backport for knwiktionary: update logo, wordmark (T360022)
13:49 arnaudb@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2211 quickly with 2 steps - test fast pool
13:41 urbanecm@deploy2002: Finished scap sync-world: Backport for Enable CampaignEvents collaboration list by default (T375141), beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442), prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442) (duration: 13m 43s)
13:38 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2041.codfw.wmnet to cluster codfw and group D
13:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2041.codfw.wmnet to cluster codfw and group D
13:36 urbanecm@deploy2002: urbanecm, daimona: Continuing with sync
13:33 arnaudb@cumin2002: START - Cookbook sre.mysql.pool db2211 quickly with 2 steps - test fast pool
13:31 arnaudb@cumin2002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2211 - test depool
13:31 arnaudb@cumin2002: START - Cookbook sre.mysql.depool db2211 - test depool
13:29 urbanecm@deploy2002: urbanecm, daimona: Backport for Enable CampaignEvents collaboration list by default (T375141), beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442), prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:27 urbanecm@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list by default (T375141), beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442), prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)
13:16 moritzm: installing bash/zsh updates from bookworm point release
12:12 moritzm: upgrade irc.wikimedia.org to ircstream 0.13.0+wmf12u3 T376014
12:06 _joe_: uploaded conftool 4.0.0-1 to reprepro T376877
11:30 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085) (duration: 07m 44s)
11:25 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
11:25 dreamyjazz@deploy2002: dreamyjazz: Backport for Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:23 dreamyjazz@deploy2002: Started scap sync-world: Backport for Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085)
11:05 volans: updated spicerack to v8.15.1 on cumin1002
10:58 Dreamy_Jazz: Ran `DROP TABLE /*_*/globalblocks` on all beta wikis (excluding the centralauth DB) - T377742
10:51 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
10:50 elukey: elukey@puppetmaster1001:~$ sudo puppet cert destroy puppetboard.discovery.wmnet
10:46 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
10:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
10:39 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
10:36 dcausse: T378227: rebuilding dewiki_titlesuggest
10:35 moritzm: uploaded ircstream 0.13.0+wmf12u3 to apt.wikimedia.org (includes a fix which should hopefully reduce connection errors with bots using smart4irc)
10:34 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
10:34 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
10:34 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
10:29 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
10:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
10:12 volans: updated spicerack to v8.15.1 on cumin2002
09:21 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
09:11 hashar: Restarted CI Jenkins for plugin update - T378327
08:42 dcausse: T378227: deleting broken cirrus titlesuggest index dewiki_titlesuggest_1729824440
08:38 kostajh: UTC morning deploys done
08:38 kharlan@deploy2002: Finished scap sync-world: Backport for ContributionsPager: Fix getTemplateParams() parameter (T378132), Fix getTemplateParams() $classes parameter (T378132) (duration: 09m 38s)
08:33 kharlan@deploy2002: kharlan: Continuing with sync
08:31 kharlan@deploy2002: kharlan: Backport for ContributionsPager: Fix getTemplateParams() parameter (T378132), Fix getTemplateParams() $classes parameter (T378132) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:28 kharlan@deploy2002: Started scap sync-world: Backport for ContributionsPager: Fix getTemplateParams() parameter (T378132), Fix getTemplateParams() $classes parameter (T378132)
08:27 hashar: Pushed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1083591 for wmf/1.43.0-wmf.28 / T378132 due to a dependency loop
08:24 hashar: Pushed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 for wmf/1.43.0-wmf.28 / T378132 due to a dependency loop
08:19 hashar: Changed UTC morning backport window from 00:00 SF to 09:00 CET (aka 08:00 UTC) | UTC morning backport window
08:07 kartik@deploy2002: Finished scap sync-world: Backport for Disable MT in Content Translation on Lithuanian Wikipedia (T364073) (duration: 22m 24s)
08:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: maintenance T378267
08:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: maintenance T378267
08:01 hashar: Restarted CI Jenkins to update the Collapsible Sections plugin | T378327
07:57 kartik@deploy2002: kartik: Continuing with sync
07:56 kartik@deploy2002: kartik: Backport for Disable MT in Content Translation on Lithuanian Wikipedia (T364073) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:45 kartik@deploy2002: Started scap sync-world: Backport for Disable MT in Content Translation on Lithuanian Wikipedia (T364073)
07:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[1169,1234].eqiad.wmnet with reason: maintenance
07:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[1169,1234].eqiad.wmnet with reason: maintenance
06:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320
06:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320
06:03 taavi@cumin1002: dbctl commit (dc=all): 'depool db1169', diff saved to https://phabricator.wikimedia.org/P70590 and previous config saved to /var/cache/conftool/dbconfig/20241028-060327-taavi.json

2024-10-27

13:41 Dreamy_Jazz: Starting MediaModeration scanning on group1 wikis
13:37 Dreamy_Jazz: Starting MediaModeration scanning on group2 wikis

2024-10-26

16:29 mvernon@cumin1002: dbctl commit (dc=all): 'Depool db1234', diff saved to https://phabricator.wikimedia.org/P70589 and previous config saved to /var/cache/conftool/dbconfig/20241026-162946-mvernon.json
16:29 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1234.eqiad.wmnet with reason: spontaneous reboot, depooling 'til Monday
16:28 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1234.eqiad.wmnet with reason: spontaneous reboot, depooling 'til Monday
02:03 tzatziki: removing 9 files for legal compliance

2024-10-25

18:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2012.codfw.wmnet with OS bookworm
18:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2012.codfw.wmnet with reason: host reimage
17:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2012.codfw.wmnet with reason: host reimage
16:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2012.codfw.wmnet with OS bookworm
16:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:28 JustHannah: T378170 Ran mwscript-k8s extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=trwiki --logwiki=metawiki 'Peter.kerepesi' 'Peakbagger77' @ 11:57:19 UTC
15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2012
15:52 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2012
15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2012 to codfw - jhancock@cumin2002"
15:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2012 to codfw - jhancock@cumin2002"
15:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:07 dancy@deploy2002: Installation of scap version "4.118.0" completed for 209 hosts
15:03 dancy@deploy2002: Installing scap version "4.118.0" for 209 hosts
14:31 herron: alert1002: manually killed stunnel4 process to clear puppet failure T375143
14:02 sukhe: running authdns-update for CR 1082548
10:31 arnaudb@cumin1002: dbctl commit (dc=all): 'maintenance', diff saved to https://phabricator.wikimedia.org/P70588 and previous config saved to /var/cache/conftool/dbconfig/20241025-103157-arnaudb.json
10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:18 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2012.codfw.wmnet
09:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:06 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2012.codfw.wmnet
09:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2011.codfw.wmnet
09:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2011.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2011.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:54 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
08:53 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
08:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:42 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2011.codfw.wmnet
08:27 moritzm: installing wireshark security updates
08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
08:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to plain
08:22 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to plain
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
08:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
08:11 moritzm: imported openjdk-8 8u422-b05-1~deb12u1 to component/jdk for bookworm-wikimedia
08:04 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
08:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to plain
08:01 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to plain
07:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
07:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
06:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
06:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
06:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
06:27 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
06:19 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org

2024-10-24

23:09 tzatziki: removing 3 files for legal compliance
22:27 zabe@deploy2002: Finished scap sync-world: Backport for s8: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 03s)
22:23 zabe@deploy2002: zabe: Continuing with sync
22:23 zabe@deploy2002: zabe: Backport for s8: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:20 zabe@deploy2002: Started scap sync-world: Backport for s8: Reduce revision-slots cache expiry to 60 seconds (T183490)
21:37 legoktm@deploy2002: Finished scap sync-world: Backport for Update interwiki cache (duration: 07m 51s)
21:32 legoktm@deploy2002: legoktm: Continuing with sync
21:31 legoktm@deploy2002: legoktm: Backport for Update interwiki cache synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:29 legoktm@deploy2002: Started scap sync-world: Backport for Update interwiki cache
21:25 tzatziki: removing 1 file for legal compliance
21:24 Dreamy_Jazz: Ran `foreachwiki emptyUserGroup.php checkuser-temporary-account-viewer` on the beta wikis.
21:14 thcipriani@deploy2002: Finished scap sync-world: Backport for Enable edit check on nlwiki (T377551) (duration: 09m 07s)
21:09 thcipriani@deploy2002: thcipriani, kemayo: Continuing with sync
21:07 thcipriani@deploy2002: thcipriani, kemayo: Backport for Enable edit check on nlwiki (T377551) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:05 thcipriani@deploy2002: Started scap sync-world: Backport for Enable edit check on nlwiki (T377551)
21:02 thcipriani@deploy2002: Finished scap sync-world: Backport for chore: Move authevents logging into AuthManager (T341650 T375510 T375505), chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505) (duration: 18m 10s)
20:58 thcipriani@deploy2002: tgr, thcipriani: Continuing with sync
20:53 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
20:46 thcipriani@deploy2002: tgr, thcipriani: Backport for chore: Move authevents logging into AuthManager (T341650 T375510 T375505), chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:44 thcipriani@deploy2002: Started scap sync-world: Backport for chore: Move authevents logging into AuthManager (T341650 T375510 T375505), chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505)
20:40 thcipriani@deploy2002: Finished scap sync-world: Backport for Configure settings for annwiki, nrwiki, mywikisource (T375102 T377160 T363270) (duration: 11m 09s)
20:35 thcipriani@deploy2002: thcipriani, pppery: Continuing with sync
20:31 thcipriani@deploy2002: thcipriani, pppery: Backport for Configure settings for annwiki, nrwiki, mywikisource (T375102 T377160 T363270) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:29 thcipriani@deploy2002: Started scap sync-world: Backport for Configure settings for annwiki, nrwiki, mywikisource (T375102 T377160 T363270)
20:24 thcipriani@deploy2002: Finished scap sync-world: Backport for Deploy missing.php redirects for Alemannic German (T376923) (duration: 14m 08s)
20:20 thcipriani@deploy2002: thcipriani, pppery: Continuing with sync
20:13 thcipriani@deploy2002: thcipriani, pppery: Backport for Deploy missing.php redirects for Alemannic German (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:10 thcipriani@deploy2002: Started scap sync-world: Backport for Deploy missing.php redirects for Alemannic German (T376923)
19:34 dancy@deploy2002: Finished scap sync-world: Backport for Use SpecialPage::getRobotPolicy to set robot policy (T378108) (duration: 07m 08s)
19:29 dancy@deploy2002: dancy: Continuing with sync
19:29 dancy@deploy2002: dancy: Backport for Use SpecialPage::getRobotPolicy to set robot policy (T378108) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:26 dancy@deploy2002: Started scap sync-world: Backport for Use SpecialPage::getRobotPolicy to set robot policy (T378108)
18:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
18:09 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.28 refs T375659
17:42 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:42 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:42 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:41 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:13 dancy@deploy2002: Finished scap sync-world: Backport for AbuseLogPager: Fix passing `false` as message parameter (T377917) (duration: 07m 18s)
17:09 dancy@deploy2002: dancy: Continuing with sync
17:09 dancy@deploy2002: dancy: Backport for AbuseLogPager: Fix passing `false` as message parameter (T377917) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
17:06 dancy@deploy2002: Started scap sync-world: Backport for AbuseLogPager: Fix passing `false` as message parameter (T377917)
17:04 urbanecm: `mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=nowiki` (running as `mw-script.codfw.ui7285yu`; T376749)
16:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:45 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix encoding of usernames with non-ascii letters - oblivian@cumin1002"
16:44 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix encoding of usernames with non-ascii letters - oblivian@cumin1002
16:43 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix encoding of usernames with non-ascii letters - oblivian@cumin1002
16:43 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix encoding of usernames with non-ascii letters - oblivian@cumin1002"
16:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2087
16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2088
16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2088
16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2087
16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2087 to codfw - jhancock@cumin2002"
16:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2087 to codfw - jhancock@cumin2002"
16:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:05 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.28 refs T375659
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:51 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2016.codfw.wmnet
15:51 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2016.codfw.wmnet
15:50 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
15:48 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
15:47 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@325d943]: Deploy latest DAGs to analytics Airflow instance. T377999. (duration: 01m 07s)
15:46 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2016.codfw.wmnet
15:46 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2016.codfw.wmnet
15:46 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2015.codfw.wmnet
15:46 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2015.codfw.wmnet
15:45 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
15:45 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@325d943]: Deploy latest DAGs to analytics Airflow instance. T377999.
15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:43 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2086
15:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2086
15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2086 to codfw - jhancock@cumin2002"
15:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2086 to codfw - jhancock@cumin2002"
15:41 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2015.codfw.wmnet
15:41 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2015.codfw.wmnet
15:41 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2006.codfw.wmnet
15:40 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2006.codfw.wmnet
15:40 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
15:38 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
15:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:37 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2006.codfw.wmnet
15:37 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2006.codfw.wmnet
15:36 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2005.codfw.wmnet
15:36 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2005.codfw.wmnet
15:35 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
15:34 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1016.eqiad.wmnet
15:34 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1016.eqiad.wmnet
15:33 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1016.eqiad.wmnet
15:32 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
15:31 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1016.eqiad.wmnet
15:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2005.codfw.wmnet
15:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2005.codfw.wmnet
15:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1016.eqiad.wmnet
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:29 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1016.eqiad.wmnet
15:29 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1015.eqiad.wmnet
15:29 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1015.eqiad.wmnet
15:28 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1015.eqiad.wmnet
15:26 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1015.eqiad.wmnet
15:25 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1015.eqiad.wmnet
15:24 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1015.eqiad.wmnet
15:24 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1006.eqiad.wmnet
15:23 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1006.eqiad.wmnet
15:23 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1006.eqiad.wmnet
15:21 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1006.eqiad.wmnet
15:19 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1006.eqiad.wmnet
15:18 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1006.eqiad.wmnet
15:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:16 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1005.eqiad.wmnet
15:16 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1005.eqiad.wmnet
15:16 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
15:15 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
15:15 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
15:13 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
15:13 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1005.eqiad.wmnet
15:13 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1005.eqiad.wmnet
15:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:09 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1005.eqiad.wmnet
15:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1005.eqiad.wmnet
15:08 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
15:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:04 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
15:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1005.eqiad.wmnet
15:02 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1005.eqiad.wmnet
14:54 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
14:53 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
14:50 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:48 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
14:42 hashar: Restarting CI Jenkins
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2085
14:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2085
14:32 gmodena@deploy2002: Finished deploy [analytics/refinery@413e5d9] (hadoop-test): 2024-10-24 refinery hotfix deployment TEST [analytics/refinery@413e5d91] (duration: 04m 03s)
14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2085 to codfw - jhancock@cumin2002"
14:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2085 to codfw - jhancock@cumin2002"
14:27 gmodena@deploy2002: Started deploy [analytics/refinery@413e5d9] (hadoop-test): 2024-10-24 refinery hotfix deployment TEST [analytics/refinery@413e5d91]
14:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:24 gmodena@deploy2002: Finished deploy [analytics/refinery@413e5d9] (thin): 2024-10-24 refinery hotfix deployment THIN [analytics/refinery@413e5d91] (duration: 04m 59s)
14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for Add maintenance script to move all flow boards on a wiki to a subpage (T371738), Add maintenance script to move all flow boards on a wiki to a subpage (T371738) (duration: 07m 28s)
14:22 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
14:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
14:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
14:19 gmodena@deploy2002: Started deploy [analytics/refinery@413e5d9] (thin): 2024-10-24 refinery hotfix deployment THIN [analytics/refinery@413e5d91]
14:18 sukhe: running authdns-update for CR 1042919
14:16 gmodena@deploy2002: Finished deploy [analytics/refinery@413e5d9]: 2024-10-24 refinery hotfix deployment [analytics/refinery@413e5d91] (duration: 07m 48s)
14:15 urbanecm@deploy2002: Started scap sync-world: Backport for Add maintenance script to move all flow boards on a wiki to a subpage (T371738), Add maintenance script to move all flow boards on a wiki to a subpage (T371738)
14:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
14:08 gmodena@deploy2002: Started deploy [analytics/refinery@413e5d9]: 2024-10-24 refinery hotfix deployment [analytics/refinery@413e5d91]
14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
14:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1066.eqiad.wmnet
13:59 mvernon@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1066.eqiad.wmnet
13:57 Emperor: restarting swift after vacuum on ms-be1066 T377827
13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
13:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum an overlarge container db
13:52 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum an overlarge container db
13:49 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm
13:47 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm
13:45 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool mw-web-ro in codfw: maintenance
13:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm
13:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
13:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm
13:40 oblivian@cumin2002: START - Cookbook sre.discovery.service-route pool mw-web-ro in codfw: maintenance
13:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
13:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
13:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
13:23 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:22 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:20 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
13:19 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
13:18 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
13:18 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
13:16 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
13:15 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:14 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
12:59 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm
12:59 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm
12:58 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm
12:57 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm
12:55 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-main-eqiad cluster: Roll restart of jvm daemons.
12:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
12:45 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-main-eqiad cluster: Roll restart of jvm daemons.
12:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm
12:38 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm
12:38 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
12:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
12:37 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
12:34 moritzm: bump qemu migration speed to 1000 for esams, ulsfo, eqsin, drmrs, magru Ganeti clusters
12:34 moritzm: bump qemu migration speed to 1000 for esams, ulsfo, eqsin, drmrs, magru clusters
12:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm
12:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
12:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm
12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
12:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
12:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1001.eqiad.wmnet
12:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
12:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
12:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
12:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
12:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
12:15 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host moss-be1001.eqiad.wmnet
12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
12:14 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
12:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
12:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
12:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
12:08 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
12:08 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
12:07 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
12:07 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
11:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm
11:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm
11:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm
11:47 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm
11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
11:23 oblivian@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mw-web-ro in codfw: maintenance
11:18 oblivian@cumin1002: START - Cookbook sre.discovery.service-route depool mw-web-ro in codfw: maintenance
11:14 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
11:07 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
11:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
10:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
10:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bookworm
10:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:30 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet
10:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm
10:26 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: stopped being the active one, stopping replication
10:26 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: stopped being the active one, stopping replication
10:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
10:22 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm
10:22 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
10:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
10:21 Emperor: reboot apus frontends T376800
10:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm
10:18 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
10:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm
10:11 jynus@cumin1002: dbctl commit (dc=all): 'promoting pc1014 as the master of pc5 T378068', diff saved to https://phabricator.wikimedia.org/P70584 and previous config saved to /var/cache/conftool/dbconfig/20241024-101150-jynus.json
10:08 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
10:03 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
10:03 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: moved pc number
10:03 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: moved pc number
10:00 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
09:59 jynus: restart pc1014 T378068
09:57 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
09:57 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
09:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
09:54 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
09:54 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
09:37 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm
09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm
09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm
09:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm
09:28 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bookworm
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1003.eqiad.wmnet
09:25 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
09:23 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
09:22 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
09:22 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
09:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1003.eqiad.wmnet
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2003.codfw.wmnet
09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2003.codfw.wmnet
09:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on pc[1014,1017].eqiad.wmnet with reason: pc maintenance T378068
09:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on pc[1014,1017].eqiad.wmnet with reason: pc maintenance T378068
08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70582 and previous config saved to /var/cache/conftool/dbconfig/20241024-083027-arnaudb.json
08:30 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
08:27 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
08:23 moritzm: installing bash/zsh updates from bookworm point release
08:23 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70581 and previous config saved to /var/cache/conftool/dbconfig/20241024-081520-arnaudb.json
08:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
08:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
08:13 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
08:01 moritzm: installing libssh2 security updates
08:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70580 and previous config saved to /var/cache/conftool/dbconfig/20241024-080013-arnaudb.json
08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
07:59 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
07:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
07:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
07:56 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
07:55 elukey: restart ircstream on irc.wikimedia.org to remove a performance experiment
07:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70579 and previous config saved to /var/cache/conftool/dbconfig/20241024-074506-arnaudb.json
07:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Checking PII for wikis annwiki in section s5
07:33 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Checking PII for wikis annwiki in section s5
07:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Setting up permissions and view database PII for wikis annwiki in section s5
07:32 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Setting up permissions and view database PII for wikis annwiki in section s5
07:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2039.codfw.wmnet
07:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70578 and previous config saved to /var/cache/conftool/dbconfig/20241024-064440-arnaudb.json
06:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
06:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
06:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70577 and previous config saved to /var/cache/conftool/dbconfig/20241024-064418-arnaudb.json
06:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70576 and previous config saved to /var/cache/conftool/dbconfig/20241024-062910-arnaudb.json
06:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70575 and previous config saved to /var/cache/conftool/dbconfig/20241024-061403-arnaudb.json
05:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70574 and previous config saved to /var/cache/conftool/dbconfig/20241024-055856-arnaudb.json
04:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70573 and previous config saved to /var/cache/conftool/dbconfig/20241024-045830-arnaudb.json
04:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
03:57 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
03:57 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
03:57 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
03:56 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
03:56 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
03:55 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
03:55 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
03:55 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
03:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
03:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
03:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
03:53 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
02:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
01:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: in setup and T338470
00:44 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: in setup and T338470
00:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit2003.wikimedia.org with reason: reboot
00:26 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit2003.wikimedia.org with reason: reboot
00:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:22 mutante: gerrit2003 rebooting for T338470
00:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:05 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20241023

2024-10-23

23:47 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
23:46 reedy@deploy2002: Finished scap sync-world: T378006 (duration: 07m 09s)
23:39 reedy@deploy2002: Started scap sync-world: T378006
22:21 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
22:08 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
21:59 urbanecm@deploy2002: Finished scap sync-world: Backport for throttle: Add exemption for WikiArabia (T377957) (duration: 07m 06s)
21:52 urbanecm@deploy2002: Started scap sync-world: Backport for throttle: Add exemption for WikiArabia (T377957)
21:22 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20241023
21:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
away: UTC late deploys done
21:05 tgr@deploy2002: Finished scap sync-world: Backport for SessionManager: Add more logging when unpersisting invalid sessions (T372702), Log unexpected central session lookup misses (T372702) (duration: 15m 07s)
21:02 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241023
21:00 tgr@deploy2002: tgr: Continuing with sync
20:55 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241023
20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241023
20:52 tgr@deploy2002: tgr: Backport for SessionManager: Add more logging when unpersisting invalid sessions (T372702), Log unexpected central session lookup misses (T372702) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:50 tgr@deploy2002: Started scap sync-world: Backport for SessionManager: Add more logging when unpersisting invalid sessions (T372702), Log unexpected central session lookup misses (T372702)
20:46 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241023
20:41 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
20:40 eileen: civicrm upgraded from e787e5f2 to 1c6c4e08
20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
19:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:35 eileen: civicrm upgraded from ce44ce45 to e787e5f2
19:18 dancy@deploy2002: Finished scap sync-world: Backport for Adjust return type documentation on SuggestedEdits (T378003) (duration: 13m 20s)
19:13 dancy@deploy2002: dancy: Continuing with sync
19:13 dancy@deploy2002: dancy: Backport for Adjust return type documentation on SuggestedEdits (T378003) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:09 sukhe: dummy authdns-update run
19:04 dancy@deploy2002: Started scap sync-world: Backport for Adjust return type documentation on SuggestedEdits (T378003)
18:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
18:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:26 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.28 refs T375659
18:09 sukhe: running agent on A:dnsbox
17:43 urbanecm@deploy2002: Finished scap sync-world: Backport for StructuredTaskMobileArticleTarget: Fix history hacks to avoid firing events (T377907) (duration: 11m 56s)
17:38 urbanecm@deploy2002: urbanecm: Continuing with sync
17:33 urbanecm@deploy2002: urbanecm: Backport for StructuredTaskMobileArticleTarget: Fix history hacks to avoid firing events (T377907) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:31 urbanecm@deploy2002: Started scap sync-world: Backport for StructuredTaskMobileArticleTarget: Fix history hacks to avoid firing events (T377907)
17:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:52 sukhe: restart ircecho on alerting hosts
16:35 sukhe: sudo cumin 'O:alerting_host or O:dnsbox' 'run-puppet-agent'
16:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: sessionstore mesh migration T363996
16:25 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: sessionstore mesh migration T363996
16:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
16:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
16:21 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
16:20 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
16:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: sessionstore mesh migration T363996
16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:09 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: sessionstore mesh migration T363996
15:57 btullis@deploy2002: Finished deploy [airflow-dags/analytics_product@ba61f77]: T351388 (duration: 01m 15s)
15:56 btullis@deploy2002: Started deploy [airflow-dags/analytics_product@ba61f77]: T351388
15:55 btullis@deploy2002: Finished deploy [airflow-dags/platform_eng@ba61f77]: T351388 (duration: 00m 31s)
15:55 btullis@deploy2002: Started deploy [airflow-dags/platform_eng@ba61f77]: T351388
15:55 btullis@deploy2002: Finished deploy [airflow-dags/research@ba61f77]: T351388 (duration: 00m 45s)
15:54 btullis@deploy2002: Started deploy [airflow-dags/research@ba61f77]: T351388
15:53 btullis@deploy2002: Finished deploy [airflow-dags/search@ba61f77]: T351388 (duration: 00m 29s)
15:53 btullis@deploy2002: Started deploy [airflow-dags/search@ba61f77]: T351388
15:52 btullis@deploy2002: Finished deploy [airflow-dags/analytics@ba61f77]: T351388 (duration: 01m 08s)
15:51 btullis@deploy2002: Started deploy [airflow-dags/analytics@ba61f77]: T351388
15:51 btullis@deploy2002: Finished deploy [airflow-dags/analytics_test@ba61f77]: T351388 (duration: 00m 31s)
15:51 btullis@deploy2002: Started deploy [airflow-dags/analytics_test@ba61f77]: T351388
15:42 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@e1c56d1] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/95 (duration: 00m 53s)
15:42 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@e1c56d1] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/95
15:35 hashar: Restarted CI Jenkins
15:28 moritzm: uploaded openjdk-8 8u422-b05-1~deb12u0 for component/jdk for bookworm-wikimedia (bootstrap build since openjdk-8 needs openjdk-8 to build)
15:20 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@d8e345f] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/94 (duration: 01m 05s)
15:19 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@d8e345f] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/94
15:17 Lucas_WMDE: UTC afternoon backport+config window done
15:15 logmsgbot: lucaswerkmeister-wmde Deployed security patch for T377912
15:10 volans: uploaded spicerack_8.15.1 to apt.wikimedia.org bullseye-wikimedia
15:04 stran@deploy2002: Finished scap sync-world: Backport for Support template overrides in ContributionsPager (T356292), Add source wiki to contributions on Special:GlobalContributions (T356292) (duration: 10m 53s)
14:59 stran@deploy2002: stran: Continuing with sync
14:55 stran@deploy2002: stran: Backport for Support template overrides in ContributionsPager (T356292), Add source wiki to contributions on Special:GlobalContributions (T356292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
14:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
14:53 stran@deploy2002: Started scap sync-world: Backport for Support template overrides in ContributionsPager (T356292), Add source wiki to contributions on Special:GlobalContributions (T356292)
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:43 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (v2) (T376055) (duration: 17m 47s)
14:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:39 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
14:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:28 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (v2) (T376055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:25 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (v2) (T376055)
14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:21 tgr@deploy2002: Finished scap sync-world: Backport for Auth: pass accountType to authevents log stream (T341650 T375510 T375505), Auth: pass accountType to authevents log stream (T341650 T375510 T375505) (duration: 13m 23s)
14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:18 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:16 tgr@deploy2002: tgr: Continuing with sync
14:14 sukhe: sudo cumin 'A:dnsbox' 'run-puppet-agent'
14:10 tgr@deploy2002: tgr: Backport for Auth: pass accountType to authevents log stream (T341650 T375510 T375505), Auth: pass accountType to authevents log stream (T341650 T375510 T375505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:07 tgr@deploy2002: Started scap sync-world: Backport for Auth: pass accountType to authevents log stream (T341650 T375510 T375505), Auth: pass accountType to authevents log stream (T341650 T375510 T375505)
13:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
13:54 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs and not A:ulsfo and A:lvs
13:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:37 moritzm: installing gdk-pixbuf security updates
13:34 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
13:34 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
13:34 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:33 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
13:33 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
13:33 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
13:32 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
13:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for WikiProjectIDLookup: use SparqlClient and make endpoint configurable (T377746) (duration: 07m 15s)
13:27 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Continuing with sync
13:27 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Backport for WikiProjectIDLookup: use SparqlClient and make endpoint configurable (T377746) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:26 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs and not A:ulsfo and A:lvs
13:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for WikiProjectIDLookup: use SparqlClient and make endpoint configurable (T377746)
13:18 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs and A:ulsfo and A:lvs
13:15 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs and A:ulsfo and A:lvs
13:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:12 moritzm: installing qemu security updates
13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:09 sukhe: running agent on A:lvs to roll out CR 1082238
13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2040.codfw.wmnet to cluster codfw and group C
12:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2040.codfw.wmnet to cluster codfw and group C
12:26 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2039.codfw.wmnet to cluster codfw and group C
12:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2039.codfw.wmnet to cluster codfw and group C
12:16 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2039.codfw.wmnet to cluster codfw and group C
12:16 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2039.codfw.wmnet to cluster codfw and group C
11:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for recentchanges: Use current time for imported revision category changes (T377932) (duration: 07m 26s)
11:44 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
11:44 dreamyjazz@deploy2002: dreamyjazz: Backport for recentchanges: Use current time for imported revision category changes (T377932) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:41 dreamyjazz@deploy2002: Started scap sync-world: Backport for recentchanges: Use current time for imported revision category changes (T377932)
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
11:11 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:11 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:09 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
11:09 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
11:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
10:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
10:51 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
10:45 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
10:45 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
10:43 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:43 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:14 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:13 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:13 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:13 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:12 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:12 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
10:05 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:03 Dreamy_Jazz: Restarted MediaModeration scanning script for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration
09:59 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
09:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
09:42 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
09:34 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
09:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
09:30 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
09:29 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:29 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
09:29 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
09:24 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
09:24 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
09:09 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
09:02 Tran: UTC morning deploys done
08:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:32 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
08:29 moritzm: installing Java 11 security updates
08:28 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: new JDK - jmm@cumin2002
08:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
08:12 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2004
08:12 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2004
08:12 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2004
08:12 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2004.codfw.wmnet 71.48.192.10.in-addr.arpa 1.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
08:12 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2004.codfw.wmnet 71.48.192.10.in-addr.arpa 1.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
08:12 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:11 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2004 - jelto@cumin1002"
08:11 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2004 - jelto@cumin1002"
08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
08:08 jelto@cumin1002: START - Cookbook sre.dns.netbox
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
08:07 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2004
08:07 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
08:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: new JDK - jmm@cumin2002
08:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboot
07:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboot
07:52 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2003.codfw.wmnet with OS bullseye
07:35 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
07:33 moritzm: installing perf updates on bookworm nodes
07:32 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
07:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2002.codfw.wmnet to plain
07:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2002.codfw.wmnet to plain
07:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
07:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2002.codfw.wmnet to drbd
07:15 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2003
07:15 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2003
07:15 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2003
07:15 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2003.codfw.wmnet 93.32.192.10.in-addr.arpa 3.9.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
07:15 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2003.codfw.wmnet 93.32.192.10.in-addr.arpa 3.9.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
07:15 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:15 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2003 - jelto@cumin1002"
07:15 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2003 - jelto@cumin1002"
07:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2002.codfw.wmnet to drbd
07:11 jelto@cumin1002: START - Cookbook sre.dns.netbox
07:11 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2003
07:10 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2003.codfw.wmnet with OS bullseye
06:48 kart_: Updated cxserver to 2024-10-23-055433-production
06:47 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:47 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:45 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:44 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
06:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
05:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
05:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
04:18 eileen: civicrm upgraded from de642bea to ce44ce45
00:01 ejegg: fundraising civicrm upgraded from 5463f37b to de642bea

2024-10-22

23:32 ejegg: fundraising civicrm upgraded from d9e85c3d to 5463f37b
22:59 ejegg: fundraising civicrm upgraded from 36660cb3 to d9e85c3d
22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70562 and previous config saved to /var/cache/conftool/dbconfig/20241022-223858-ladsgroup.json
22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70561 and previous config saved to /var/cache/conftool/dbconfig/20241022-222352-ladsgroup.json
22:11 zabe@deploy2002: Finished scap sync-world: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 17s)
22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70560 and previous config saved to /var/cache/conftool/dbconfig/20241022-220847-ladsgroup.json
22:07 zabe@deploy2002: zabe: Continuing with sync
22:06 zabe@deploy2002: zabe: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:03 zabe@deploy2002: Started scap sync-world: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490)
21:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P70559 and previous config saved to /var/cache/conftool/dbconfig/20241022-215137-ladsgroup.json
21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet
21:44 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet
21:44 dancy@deploy2002: Installation of scap version "4.117.0" completed for 209 hosts
21:40 dancy@deploy2002: Installing scap version "4.117.0" for 209 hosts
21:01 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@b08d130] (releasing): Deploying changes to single-version MediaWiki image build (duration: 01m 44s)
21:00 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@b08d130] (releasing): Deploying changes to single-version MediaWiki image build
20:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
20:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
20:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
20:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70558 and previous config saved to /var/cache/conftool/dbconfig/20241022-202717-ladsgroup.json
20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P70557 and previous config saved to /var/cache/conftool/dbconfig/20241022-201210-ladsgroup.json
19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P70556 and previous config saved to /var/cache/conftool/dbconfig/20241022-195703-ladsgroup.json
19:54 swfrench-wmf: running puppet on A:cp-text (-b11) after validating ATS Lua changes on cp4040 - T372605
19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70555 and previous config saved to /var/cache/conftool/dbconfig/20241022-194156-ladsgroup.json
19:40 swfrench-wmf: disabling puppet on A:cp-text before merging ATS Lua changes - T372605
19:39 ladsgroup@deploy2002: Finished scap sync-world: Backport for Fix duplicated key in wgVectorNightMode (duration: 07m 51s)
19:36 ladsgroup@deploy2002: ladsgroup, ebrahim: Continuing with sync
19:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70554 and previous config saved to /var/cache/conftool/dbconfig/20241022-193352-ladsgroup.json
19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@deploy2002: ladsgroup, ebrahim: Backport for Fix duplicated key in wgVectorNightMode synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70553 and previous config saved to /var/cache/conftool/dbconfig/20241022-193327-ladsgroup.json
19:31 ladsgroup@deploy2002: Started scap sync-world: Backport for Fix duplicated key in wgVectorNightMode
19:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P70552 and previous config saved to /var/cache/conftool/dbconfig/20241022-191820-ladsgroup.json
19:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P70551 and previous config saved to /var/cache/conftool/dbconfig/20241022-190313-ladsgroup.json
19:00 dduvall@deploy2002: Installation of scap version "4.116.0" completed for 209 hosts
18:56 dduvall@deploy2002: Installing scap version "4.116.0" for 209 hosts
18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70550 and previous config saved to /var/cache/conftool/dbconfig/20241022-184946-arnaudb.json
18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70549 and previous config saved to /var/cache/conftool/dbconfig/20241022-184806-ladsgroup.json
18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70548 and previous config saved to /var/cache/conftool/dbconfig/20241022-183955-ladsgroup.json
18:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70547 and previous config saved to /var/cache/conftool/dbconfig/20241022-183930-ladsgroup.json
18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70546 and previous config saved to /var/cache/conftool/dbconfig/20241022-183440-arnaudb.json
18:26 dancy@deploy2002: sync-world aborted: Refreshing (duration: 01m 33s)
18:24 dancy@deploy2002: Started scap sync-world: Refreshing
18:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P70544 and previous config saved to /var/cache/conftool/dbconfig/20241022-182423-ladsgroup.json
18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70543 and previous config saved to /var/cache/conftool/dbconfig/20241022-181933-arnaudb.json
18:17 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.28 refs T375659
18:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P70542 and previous config saved to /var/cache/conftool/dbconfig/20241022-180916-ladsgroup.json
18:09 dancy@deploy2002: Finished scap sync-world: Backport for Prevent blocked users from being able to review/unreview articles (T366991) (duration: 07m 26s)
18:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70541 and previous config saved to /var/cache/conftool/dbconfig/20241022-180426-arnaudb.json
18:04 dancy@deploy2002: dancy, sbassett: Continuing with sync
18:04 dancy@deploy2002: dancy, sbassett: Backport for Prevent blocked users from being able to review/unreview articles (T366991) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:01 dancy@deploy2002: Started scap sync-world: Backport for Prevent blocked users from being able to review/unreview articles (T366991)
17:54 sukhe: sudo cumin -b4 "A:cp-upload" 'run-puppet-agent --enable "merging CR 1078994"': T375761
17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70540 and previous config saved to /var/cache/conftool/dbconfig/20241022-175409-ladsgroup.json
17:50 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@16eb792] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/90 (duration: 01m 21s)
17:49 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@16eb792] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/90
17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70539 and previous config saved to /var/cache/conftool/dbconfig/20241022-174555-ladsgroup.json
17:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
17:45 sukhe: sudo cumin "A:cp-upload" 'disable-puppet "merging CR 1078994"': T375761
17:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70538 and previous config saved to /var/cache/conftool/dbconfig/20241022-174530-ladsgroup.json
17:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P70537 and previous config saved to /var/cache/conftool/dbconfig/20241022-173022-ladsgroup.json
17:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
17:23 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
17:18 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2014.codfw.wmnet with reason: rebooting to test changes rolled out in CR 1006063
17:17 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs2014.codfw.wmnet with reason: rebooting to test changes rolled out in CR 1006063
17:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P70536 and previous config saved to /var/cache/conftool/dbconfig/20241022-171515-ladsgroup.json
17:14 sukhe: re-enable Puppet on A:lvs [change merged on lvs2014]: T358260
17:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: repooling sessionstore post mesh migration T363996
17:04 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: repooling sessionstore post mesh migration T363996
17:04 sukhe: disable Puppet on A:lvs to merge 1006063: T358260
17:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70535 and previous config saved to /var/cache/conftool/dbconfig/20241022-170400-arnaudb.json
17:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
17:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70534 and previous config saved to /var/cache/conftool/dbconfig/20241022-170337-arnaudb.json
17:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70533 and previous config saved to /var/cache/conftool/dbconfig/20241022-170008-ladsgroup.json
16:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70532 and previous config saved to /var/cache/conftool/dbconfig/20241022-165211-ladsgroup.json
16:52 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1176.eqiad.wmnet
16:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70531 and previous config saved to /var/cache/conftool/dbconfig/20241022-165147-ladsgroup.json
16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70530 and previous config saved to /var/cache/conftool/dbconfig/20241022-164830-arnaudb.json
16:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
16:46 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
16:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
16:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
16:44 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
16:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P70529 and previous config saved to /var/cache/conftool/dbconfig/20241022-163639-ladsgroup.json
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70528 and previous config saved to /var/cache/conftool/dbconfig/20241022-163323-arnaudb.json
16:31 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1176.eqiad.wmnet
16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P70527 and previous config saved to /var/cache/conftool/dbconfig/20241022-162132-ladsgroup.json
16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70526 and previous config saved to /var/cache/conftool/dbconfig/20241022-161816-arnaudb.json
16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70525 and previous config saved to /var/cache/conftool/dbconfig/20241022-161604-arnaudb.json
16:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
16:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70524 and previous config saved to /var/cache/conftool/dbconfig/20241022-161552-arnaudb.json
16:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: testing sessionstore mesh migration
16:08 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: testing sessionstore mesh migration
16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70523 and previous config saved to /var/cache/conftool/dbconfig/20241022-160625-ladsgroup.json
16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70522 and previous config saved to /var/cache/conftool/dbconfig/20241022-160045-arnaudb.json
15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70521 and previous config saved to /var/cache/conftool/dbconfig/20241022-155824-ladsgroup.json
15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70520 and previous config saved to /var/cache/conftool/dbconfig/20241022-155759-ladsgroup.json
15:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
15:53 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
15:53 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check sessionstore: maintenance
15:53 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route check sessionstore: maintenance
15:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70519 and previous config saved to /var/cache/conftool/dbconfig/20241022-154538-arnaudb.json
15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P70518 and previous config saved to /var/cache/conftool/dbconfig/20241022-154251-ladsgroup.json
15:39 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:38 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:38 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:38 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:38 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
15:38 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
15:37 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
15:37 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:36 sbassett@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:36 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
15:36 sbassett@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:35 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
15:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
15:31 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
15:30 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
15:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70517 and previous config saved to /var/cache/conftool/dbconfig/20241022-153031-arnaudb.json
15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P70516 and previous config saved to /var/cache/conftool/dbconfig/20241022-152743-ladsgroup.json
15:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
15:19 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
15:18 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
15:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
15:15 aqu: Deployed refinery using scap, then deployed onto hdfs
15:14 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestagemaster2003.codfw.wmnet
15:14 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestagemaster2003.codfw.wmnet
15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70515 and previous config saved to /var/cache/conftool/dbconfig/20241022-151237-ladsgroup.json
15:11 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@7c2d65f]: DPE 2024-10-22 deployment train (duration: 01m 16s)
15:10 gmodena@deploy2002: Started deploy [airflow-dags/analytics@7c2d65f]: DPE 2024-10-22 deployment train
15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@582cde5]: deploy phab1004 for T377850 (duration: 01m 04s)
15:08 brennen@deploy2002: Started deploy [phabricator/deployment@582cde5]: deploy phab1004 for T377850
15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@582cde5]: test deploy phab2002 for T377850 (may fail, expected) (duration: 00m 24s)
15:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:07 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator deployment
15:07 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator deployment
15:07 brennen@deploy2002: Started deploy [phabricator/deployment@582cde5]: test deploy phab2002 for T377850 (may fail, expected)
15:06 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: Phabricator deployment
15:06 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: Phabricator deployment
15:06 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deployment
15:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:06 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deployment
15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70514 and previous config saved to /var/cache/conftool/dbconfig/20241022-150435-ladsgroup.json
15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70513 and previous config saved to /var/cache/conftool/dbconfig/20241022-150409-ladsgroup.json
14:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70512 and previous config saved to /var/cache/conftool/dbconfig/20241022-145653-arnaudb.json
14:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:52 hashar@deploy2002: Finished deploy [gerrit/gerrit@30691f2]: Update patch demo to recognize both legacy and new URLs - T374954 (duration: 00m 10s)
14:52 hashar@deploy2002: Started deploy [gerrit/gerrit@30691f2]: Update patch demo to recognize both legacy and new URLs - T374954
14:50 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P70511 and previous config saved to /var/cache/conftool/dbconfig/20241022-144902-ladsgroup.json
14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70510 and previous config saved to /var/cache/conftool/dbconfig/20241022-144148-arnaudb.json
14:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
14:40 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2084 to codfw - jhancock@cumin2002"
14:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2084 to codfw - jhancock@cumin2002"
14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70509 and previous config saved to /var/cache/conftool/dbconfig/20241022-143628-arnaudb.json
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
14:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
14:34 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P70507 and previous config saved to /var/cache/conftool/dbconfig/20241022-143355-ladsgroup.json
14:32 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix performer link on Special:GlobalBlockList (T377398) (duration: 07m 43s)
14:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70506 and previous config saved to /var/cache/conftool/dbconfig/20241022-143005-arnaudb.json
14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:27 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
14:27 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix performer link on Special:GlobalBlockList (T377398) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70505 and previous config saved to /var/cache/conftool/dbconfig/20241022-142642-arnaudb.json
14:24 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix performer link on Special:GlobalBlockList (T377398)
14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70504 and previous config saved to /var/cache/conftool/dbconfig/20241022-142123-arnaudb.json
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70503 and previous config saved to /var/cache/conftool/dbconfig/20241022-141848-ladsgroup.json
14:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70502 and previous config saved to /var/cache/conftool/dbconfig/20241022-141137-arnaudb.json
14:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
14:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2011.codfw.wmnet
14:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70501 and previous config saved to /var/cache/conftool/dbconfig/20241022-140956-ladsgroup.json
14:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:09 ejegg: payments-wiki upgraded from 7ae3479f to a039cd50
14:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70500 and previous config saved to /var/cache/conftool/dbconfig/20241022-140931-ladsgroup.json
14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70499 and previous config saved to /var/cache/conftool/dbconfig/20241022-140617-arnaudb.json
14:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
13:59 moritzm: rebalance ganeti clusters in magru following reboots
13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70498 and previous config saved to /var/cache/conftool/dbconfig/20241022-135631-arnaudb.json
13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P70497 and previous config saved to /var/cache/conftool/dbconfig/20241022-135424-ladsgroup.json
13:52 Lucas_WMDE: UTC afternoon backport+window done (a further GlobalBlocking fix will be backported out-of-window soon)
13:51 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (hadoop-test): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 03m 17s)
13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70496 and previous config saved to /var/cache/conftool/dbconfig/20241022-135112-arnaudb.json
13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
13:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
13:48 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (hadoop-test): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
13:48 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 00m 07s)
13:48 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
13:47 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 00m 57s)
13:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:46 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
13:45 aqu@deploy2002: deploy aborted: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 03m 50s)
13:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:44 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315) (duration: 08m 40s)
13:41 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70495 and previous config saved to /var/cache/conftool/dbconfig/20241022-134126-arnaudb.json
13:40 lucaswerkmeister-wmde@deploy2002: joelyrookewmde, lucaswerkmeister-wmde: Continuing with sync
13:39 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2002.codfw.wmnet with OS bullseye
13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P70494 and previous config saved to /var/cache/conftool/dbconfig/20241022-133916-ladsgroup.json
13:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:37 lucaswerkmeister-wmde@deploy2002: joelyrookewmde, lucaswerkmeister-wmde: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:35 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315)
13:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2227.codfw.wmnet
13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
13:32 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398) (duration: 15m 27s)
13:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:29 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70493 and previous config saved to /var/cache/conftool/dbconfig/20241022-132745-arnaudb.json
13:25 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70492 and previous config saved to /var/cache/conftool/dbconfig/20241022-132409-ladsgroup.json
13:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
13:19 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamyjazz: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
13:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398)
13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70491 and previous config saved to /var/cache/conftool/dbconfig/20241022-131448-ladsgroup.json
13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:14 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Release CampaignEvents to eswiki (T376786) (duration: 09m 35s)
13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70490 and previous config saved to /var/cache/conftool/dbconfig/20241022-131415-ladsgroup.json
13:14 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a]: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 19m 41s)
13:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70489 and previous config saved to /var/cache/conftool/dbconfig/20241022-131239-arnaudb.json
13:09 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync
13:07 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for Release CampaignEvents to eswiki (T376786) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Release CampaignEvents to eswiki (T376786)
13:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P70488 and previous config saved to /var/cache/conftool/dbconfig/20241022-125908-ladsgroup.json
12:58 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
12:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70487 and previous config saved to /var/cache/conftool/dbconfig/20241022-125734-arnaudb.json
12:55 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2089.codfw.wmnet with OS bookworm
12:54 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a]: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
12:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2086.codfw.wmnet with OS bookworm
12:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2085.codfw.wmnet with OS bookworm
12:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2088.codfw.wmnet with OS bookworm
12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P70486 and previous config saved to /var/cache/conftool/dbconfig/20241022-124401-ladsgroup.json
12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70485 and previous config saved to /var/cache/conftool/dbconfig/20241022-124228-arnaudb.json
12:42 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2002
12:42 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2002
12:41 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2002
12:41 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2002.codfw.wmnet 161.16.192.10.in-addr.arpa 1.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:41 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2002.codfw.wmnet 161.16.192.10.in-addr.arpa 1.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:41 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:41 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2002 - jelto@cumin1002"
12:41 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2002 - jelto@cumin1002"
12:37 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
12:37 jelto@cumin1002: START - Cookbook sre.dns.netbox
12:36 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2002
12:36 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2002.codfw.wmnet with OS bullseye
12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:34 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2086.codfw.wmnet with reason: host reimage
12:34 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:31 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2085.codfw.wmnet with reason: host reimage
12:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70484 and previous config saved to /var/cache/conftool/dbconfig/20241022-122854-ladsgroup.json
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2010.codfw.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2010.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2088.codfw.wmnet with reason: host reimage
12:27 Dreamy_Jazz: Running MediaModeration scan on all group2 wikis
12:27 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2086.codfw.wmnet with reason: host reimage
12:27 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2085.codfw.wmnet with reason: host reimage
12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70483 and previous config saved to /var/cache/conftool/dbconfig/20241022-122723-arnaudb.json
12:27 Dreamy_Jazz: Stopped MediaModeration scan on all group1 wikis
12:24 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2088.codfw.wmnet with reason: host reimage
12:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2010.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:20 Dreamy_Jazz: Running MediaModeration scan on all group1 wikis
12:20 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin2002
12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70482 and previous config saved to /var/cache/conftool/dbconfig/20241022-121928-ladsgroup.json
12:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
12:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70481 and previous config saved to /var/cache/conftool/dbconfig/20241022-121903-ladsgroup.json
12:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:12 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2227.codfw.wmnet
12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70480 and previous config saved to /var/cache/conftool/dbconfig/20241022-121218-arnaudb.json
12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2010.codfw.wmnet
12:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2089.codfw.wmnet with OS bookworm
12:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2149,2227].codfw.wmnet with reason: maintenance
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2009.codfw.wmnet
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2149,2227].codfw.wmnet with reason: maintenance
12:08 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2088.codfw.wmnet with OS bookworm
12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 and db2227 - T377718', diff saved to https://phabricator.wikimedia.org/P70479 and previous config saved to /var/cache/conftool/dbconfig/20241022-120753-arnaudb.json
12:06 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2086.codfw.wmnet with OS bookworm
12:06 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2085.codfw.wmnet with OS bookworm
12:05 Dreamy_Jazz: Running MediaModeration scan on all group0 wikis
12:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P70478 and previous config saved to /var/cache/conftool/dbconfig/20241022-120356-ladsgroup.json
12:03 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778) (duration: 15m 27s)
12:02 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin2002
11:57 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
11:57 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin2002
11:56 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
11:55 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
11:55 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P70477 and previous config saved to /var/cache/conftool/dbconfig/20241022-114849-ladsgroup.json
11:48 kart_: Updated cxserver to 2024-10-22-112806-production (T357950)
11:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778)
11:47 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
11:46 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
11:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
11:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
11:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
11:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2009.codfw.wmnet
11:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
11:43 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host wikikube-worker2085.codfw.wmnet
11:43 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2085.codfw.wmnet
11:41 akosiaris: remove faidon from WMCS projects maps, visualeditor, swift, testlabs per his request. Keep the bastion project. cc paravoid
11:39 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin2002
11:34 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestagemaster2005.codfw.wmnet
11:34 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestagemaster2005.codfw.wmnet
11:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70476 and previous config saved to /var/cache/conftool/dbconfig/20241022-113342-ladsgroup.json
11:27 moritzm: installing Java 11 security updates
11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70475 and previous config saved to /var/cache/conftool/dbconfig/20241022-112408-ladsgroup.json
11:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70474 and previous config saved to /var/cache/conftool/dbconfig/20241022-112343-ladsgroup.json
11:21 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: sync
11:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: sync
11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P70473 and previous config saved to /var/cache/conftool/dbconfig/20241022-110836-ladsgroup.json
11:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70472 and previous config saved to /var/cache/conftool/dbconfig/20241022-110744-arnaudb.json
10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P70471 and previous config saved to /var/cache/conftool/dbconfig/20241022-105329-ladsgroup.json
10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70470 and previous config saved to /var/cache/conftool/dbconfig/20241022-105238-arnaudb.json
10:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70469 and previous config saved to /var/cache/conftool/dbconfig/20241022-103822-ladsgroup.json
10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70468 and previous config saved to /var/cache/conftool/dbconfig/20241022-103733-arnaudb.json
10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70467 and previous config saved to /var/cache/conftool/dbconfig/20241022-102907-ladsgroup.json
10:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70466 and previous config saved to /var/cache/conftool/dbconfig/20241022-102843-ladsgroup.json
10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70465 and previous config saved to /var/cache/conftool/dbconfig/20241022-102227-arnaudb.json
10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P70464 and previous config saved to /var/cache/conftool/dbconfig/20241022-101336-ladsgroup.json
10:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
10:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:04 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: sync
10:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:03 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: sync
10:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2205.codfw.wmnet
10:03 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@dcf019d]: (no justification provided) (duration: 00m 11s)
10:02 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@dcf019d]: (no justification provided)
09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P70463 and previous config saved to /var/cache/conftool/dbconfig/20241022-095829-ladsgroup.json
09:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70461 and previous config saved to /var/cache/conftool/dbconfig/20241022-094322-ladsgroup.json
09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
09:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70460 and previous config saved to /var/cache/conftool/dbconfig/20241022-093345-ladsgroup.json
09:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
09:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:22 hashar: Restarting CI Jenkins
09:06 hashar: Restarting Gerrit
08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
08:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
08:37 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2205.codfw.wmnet
08:35 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:33 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70459 and previous config saved to /var/cache/conftool/dbconfig/20241022-082545-arnaudb.json
08:24 moritzm: irc.wikimedia.org has been switched to ircstream T376014
08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70457 and previous config saved to /var/cache/conftool/dbconfig/20241022-081040-arnaudb.json
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
08:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2149,2205].codfw.wmnet with reason: db2205 reclone
07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2149,2205].codfw.wmnet with reason: db2205 reclone
07:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'T377718', diff saved to https://phabricator.wikimedia.org/P70456 and previous config saved to /var/cache/conftool/dbconfig/20241022-075830-arnaudb.json
07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70455 and previous config saved to /var/cache/conftool/dbconfig/20241022-075534-arnaudb.json
07:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 28%: post clone', diff saved to https://phabricator.wikimedia.org/P70454 and previous config saved to /var/cache/conftool/dbconfig/20241022-074029-arnaudb.json
07:28 moritzm: installing Java 17 security updates
07:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 27%: post clone', diff saved to https://phabricator.wikimedia.org/P70453 and previous config saved to /var/cache/conftool/dbconfig/20241022-072523-arnaudb.json
07:23 moritzm: rearm keyholder on netmon1003
07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 26%: post clone', diff saved to https://phabricator.wikimedia.org/P70452 and previous config saved to /var/cache/conftool/dbconfig/20241022-071018-arnaudb.json
07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6003.wikimedia.org
07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6003.wikimedia.org
06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70451 and previous config saved to /var/cache/conftool/dbconfig/20241022-065513-arnaudb.json
06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
05:41 kart_: Remove servicerunner dependency for cxserver (T357950, T373777)
05:31 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:30 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:25 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:24 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.25 (duration: 00m 58s)
03:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.28 refs T375659 (duration: 49m 37s)
03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.28 refs T375659
01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70450 and previous config saved to /var/cache/conftool/dbconfig/20241022-010820-ladsgroup.json
00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P70449 and previous config saved to /var/cache/conftool/dbconfig/20241022-005313-ladsgroup.json
00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P70448 and previous config saved to /var/cache/conftool/dbconfig/20241022-003807-ladsgroup.json
00:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70447 and previous config saved to /var/cache/conftool/dbconfig/20241022-002259-ladsgroup.json
00:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70446 and previous config saved to /var/cache/conftool/dbconfig/20241022-001606-ladsgroup.json
00:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70445 and previous config saved to /var/cache/conftool/dbconfig/20241022-001539-ladsgroup.json
00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P70444 and previous config saved to /var/cache/conftool/dbconfig/20241022-000032-ladsgroup.json

2024-10-21

23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P70443 and previous config saved to /var/cache/conftool/dbconfig/20241021-234525-ladsgroup.json
23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70442 and previous config saved to /var/cache/conftool/dbconfig/20241021-233018-ladsgroup.json
23:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70441 and previous config saved to /var/cache/conftool/dbconfig/20241021-222952-ladsgroup.json
22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70440 and previous config saved to /var/cache/conftool/dbconfig/20241021-222926-ladsgroup.json
22:21 eileen: config revision changed from a1c7759c to 3bbf553d
22:18 zabe@deploy2002: Finished scap sync-world: Backport for group0: Increase revision-slots cache expiry back to default (T183490) (duration: 06m 58s)
22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P70439 and previous config saved to /var/cache/conftool/dbconfig/20241021-221419-ladsgroup.json
22:13 zabe@deploy2002: zabe: Continuing with sync
22:13 zabe@deploy2002: zabe: Backport for group0: Increase revision-slots cache expiry back to default (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:11 zabe@deploy2002: Started scap sync-world: Backport for group0: Increase revision-slots cache expiry back to default (T183490)
21:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P70438 and previous config saved to /var/cache/conftool/dbconfig/20241021-215912-ladsgroup.json
21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70437 and previous config saved to /var/cache/conftool/dbconfig/20241021-214405-ladsgroup.json
21:43 eileen: config revision changed from d240bcfb to a1c7759c
21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70436 and previous config saved to /var/cache/conftool/dbconfig/20241021-213801-ladsgroup.json
21:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
21:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70435 and previous config saved to /var/cache/conftool/dbconfig/20241021-213733-ladsgroup.json
21:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P70434 and previous config saved to /var/cache/conftool/dbconfig/20241021-212226-ladsgroup.json
21:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
21:16 swfrench-wmf: ran authdns-update to pick up mw-(web|api-ext)-next discovery records - T377040
21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P70433 and previous config saved to /var/cache/conftool/dbconfig/20241021-210718-ladsgroup.json
21:00 sukhe: running authdns-update for CR 1081371
away: UTC late deploys done
20:56 tgr@deploy2002: Finished scap sync-world: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476) (duration: 18m 43s)
20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70431 and previous config saved to /var/cache/conftool/dbconfig/20241021-205211-ladsgroup.json
20:52 tgr@deploy2002: tgr: Continuing with sync
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70430 and previous config saved to /var/cache/conftool/dbconfig/20241021-204603-ladsgroup.json
20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70429 and previous config saved to /var/cache/conftool/dbconfig/20241021-204536-ladsgroup.json
20:40 tgr@deploy2002: tgr: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:37 tgr@deploy2002: Started scap sync-world: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476)
20:32 tgr@deploy2002: Finished scap sync-world: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337) (duration: 08m 19s)
20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P70428 and previous config saved to /var/cache/conftool/dbconfig/20241021-203029-ladsgroup.json
20:28 tgr@deploy2002: migr, tgr: Continuing with sync
20:26 tgr@deploy2002: migr, tgr: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:24 tgr@deploy2002: Started scap sync-world: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337)
20:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
20:21 tgr@deploy2002: Finished scap sync-world: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483) (duration: 09m 48s)
20:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
20:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
20:16 tgr@deploy2002: matmarex, tgr: Continuing with sync
20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P70427 and previous config saved to /var/cache/conftool/dbconfig/20241021-201522-ladsgroup.json
20:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
20:13 tgr@deploy2002: matmarex, tgr: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:11 tgr@deploy2002: Started scap sync-world: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483)
20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70426 and previous config saved to /var/cache/conftool/dbconfig/20241021-200015-ladsgroup.json
19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70425 and previous config saved to /var/cache/conftool/dbconfig/20241021-195300-ladsgroup.json
19:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
19:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
19:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70424 and previous config saved to /var/cache/conftool/dbconfig/20241021-195233-ladsgroup.json
19:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P70423 and previous config saved to /var/cache/conftool/dbconfig/20241021-193726-ladsgroup.json
19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=eqiad [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]
19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=codfw [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]
19:36 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@b75c4aa] (releasing): Deploying changes to MediaWiki branch and publish WMF single-version image job (duration: 01m 20s)
19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=eqiad [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]
19:35 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=codfw [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]
19:34 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@b75c4aa] (releasing): Deploying changes to MediaWiki branch and publish WMF single-version image job
19:31 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next,name=codfw [reason: preparing mw-api-ext-next (a/p) for discovery - T377040]
19:30 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next,name=codfw [reason: preparing mw-web-next (a/p) for discovery - T377040]
19:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P70422 and previous config saved to /var/cache/conftool/dbconfig/20241021-192219-ladsgroup.json
19:11 ejegg: re-enabled fundraising thank you mailer
19:10 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70421 and previous config saved to /var/cache/conftool/dbconfig/20241021-190712-ladsgroup.json
19:04 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
19:02 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
19:02 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
19:01 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040
19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70420 and previous config saved to /var/cache/conftool/dbconfig/20241021-185957-ladsgroup.json
19:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
19:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70419 and previous config saved to /var/cache/conftool/dbconfig/20241021-185931-ladsgroup.json
18:58 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
18:52 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
18:51 zabe@deploy2002: Finished scap sync-world: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 16m 09s)
18:51 ejegg: fundraising civicrm upgraded from cfb0def0 to 36660cb3
18:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
18:45 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P70418 and previous config saved to /var/cache/conftool/dbconfig/20241021-184424-ladsgroup.json
18:43 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
18:42 zabe@deploy2002: zabe: Continuing with sync
18:42 zabe@deploy2002: zabe: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:37 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
18:37 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
18:36 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040
18:35 zabe@deploy2002: Started scap sync-world: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490)
18:32 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040
18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P70417 and previous config saved to /var/cache/conftool/dbconfig/20241021-182916-ladsgroup.json
18:23 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
18:22 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
18:20 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
18:19 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
18:19 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040
18:15 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
18:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
18:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70416 and previous config saved to /var/cache/conftool/dbconfig/20241021-181410-ladsgroup.json
18:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
18:09 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
18:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70415 and previous config saved to /var/cache/conftool/dbconfig/20241021-180654-ladsgroup.json
18:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70414 and previous config saved to /var/cache/conftool/dbconfig/20241021-180612-ladsgroup.json
18:06 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
18:05 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
18:04 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040
17:59 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040
17:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
17:53 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm
17:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
17:52 dduvall@deploy2002: Installing scap version "4.115.0" for 209 hosts
17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P70413 and previous config saved to /var/cache/conftool/dbconfig/20241021-175105-ladsgroup.json
17:50 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@671896c]: Deploy T375402. (duration: 01m 04s)
17:48 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@671896c]: Deploy T375402.
17:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
17:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
17:42 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
17:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P70412 and previous config saved to /var/cache/conftool/dbconfig/20241021-173558-ladsgroup.json
17:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70411 and previous config saved to /var/cache/conftool/dbconfig/20241021-172051-ladsgroup.json
17:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70410 and previous config saved to /var/cache/conftool/dbconfig/20241021-171138-ladsgroup.json
17:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
17:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70409 and previous config saved to /var/cache/conftool/dbconfig/20241021-171046-ladsgroup.json
16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70408 and previous config saved to /var/cache/conftool/dbconfig/20241021-165624-arnaudb.json
16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P70407 and previous config saved to /var/cache/conftool/dbconfig/20241021-165539-ladsgroup.json
16:44 herron@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
16:43 herron@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70406 and previous config saved to /var/cache/conftool/dbconfig/20241021-164119-arnaudb.json
16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P70405 and previous config saved to /var/cache/conftool/dbconfig/20241021-164032-ladsgroup.json
16:33 volans@cumin1002: dbctl commit (dc=all): 'Fix db1185 weight', diff saved to https://phabricator.wikimedia.org/P70404 and previous config saved to /var/cache/conftool/dbconfig/20241021-163355-volans.json
16:32 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 quickly with 2 steps - Testing new cookbook
16:29 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 quickly with 2 steps - Testing new cookbook
16:29 volans@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1185 quickly with 2 steps - Testing new cookbook
16:28 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 quickly with 2 steps - Testing new cookbook
16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70401 and previous config saved to /var/cache/conftool/dbconfig/20241021-162613-arnaudb.json
16:27 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
16:26 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70399 and previous config saved to /var/cache/conftool/dbconfig/20241021-162525-ladsgroup.json
16:22 volans@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1185 - Testing new cookbook
16:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:18 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:17 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70398 and previous config saved to /var/cache/conftool/dbconfig/20241021-161701-ladsgroup.json
16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
16:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70397 and previous config saved to /var/cache/conftool/dbconfig/20241021-161634-ladsgroup.json
16:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70396 and previous config saved to /var/cache/conftool/dbconfig/20241021-161108-arnaudb.json
16:04 ejegg: disabled fundraising Thank You mail send jobs
16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P70395 and previous config saved to /var/cache/conftool/dbconfig/20241021-160127-ladsgroup.json
15:58 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:55 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
15:53 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
15:53 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
15:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P70389 and previous config saved to /var/cache/conftool/dbconfig/20241021-154620-ladsgroup.json
15:39 Dreamy_Jazz: Starting MediaModeration scanning script for 12 hrs on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration
15:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
15:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2172.codfw.wmnet onto db2240.codfw.wmnet
15:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
15:32 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
15:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70388 and previous config saved to /var/cache/conftool/dbconfig/20241021-153113-ladsgroup.json
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70387 and previous config saved to /var/cache/conftool/dbconfig/20241021-152408-ladsgroup.json
15:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
15:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
15:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70386 and previous config saved to /var/cache/conftool/dbconfig/20241021-152339-ladsgroup.json
15:20 moritzm: rearm keyholder on netmon2002
15:20 stran@deploy2002: Finished scap sync-world: Backport for Disable local IP view right group on meta (T377584) (duration: 20m 29s)
15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P70385 and previous config saved to /var/cache/conftool/dbconfig/20241021-150832-ladsgroup.json
15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
15:02 stran@deploy2002: stran: Continuing with sync
15:01 stran@deploy2002: stran: Backport for Disable local IP view right group on meta (T377584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:59 stran@deploy2002: Started scap sync-world: Backport for Disable local IP view right group on meta (T377584)
14:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P70384 and previous config saved to /var/cache/conftool/dbconfig/20241021-145325-ladsgroup.json
14:53 ejegg: disabled failing CiviCRM contact dedupe job
14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70383 and previous config saved to /var/cache/conftool/dbconfig/20241021-143818-ladsgroup.json
14:33 herron@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:32 herron@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
14:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70382 and previous config saved to /var/cache/conftool/dbconfig/20241021-143108-ladsgroup.json
14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70381 and previous config saved to /var/cache/conftool/dbconfig/20241021-143042-ladsgroup.json
14:29 moritzm: installing PHP 8.2 security updates
14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P70380 and previous config saved to /var/cache/conftool/dbconfig/20241021-141535-ladsgroup.json
14:15 Lucas_WMDE: UTC afternoon backport+config window done
14:10 stran@deploy2002: Finished scap sync-world: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132) (duration: 14m 55s)
14:05 stran@deploy2002: stran, kharlan: Continuing with sync
14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P70379 and previous config saved to /var/cache/conftool/dbconfig/20241021-140028-ladsgroup.json
13:57 stran@deploy2002: stran, kharlan: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:55 stran@deploy2002: Started scap sync-world: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132)
13:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:50 stran@deploy2002: Finished scap sync-world: Backport for Apply wmf-specific protected vars rights access (T369610) (duration: 08m 53s)
13:45 stran@deploy2002: stran: Continuing with sync
13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70378 and previous config saved to /var/cache/conftool/dbconfig/20241021-134521-ladsgroup.json
13:43 stran@deploy2002: stran: Backport for Apply wmf-specific protected vars rights access (T369610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:41 stran@deploy2002: Started scap sync-world: Backport for Apply wmf-specific protected vars rights access (T369610)
13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70377 and previous config saved to /var/cache/conftool/dbconfig/20241021-133619-ladsgroup.json
13:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
13:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
13:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70376 and previous config saved to /var/cache/conftool/dbconfig/20241021-133552-ladsgroup.json
13:35 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
13:34 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki" (duration: 08m 20s)
13:33 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:33 inflatador: bking@stat1009,stat1010.mgmt racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled && racadm jobqueue create BIOS.Setup.1-1 T376813
13:32 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:30 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2172.codfw.wmnet onto db2240.codfw.wmnet
13:29 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:29 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, trainbranchbot: Continuing with sync
13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:28 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, trainbranchbot: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:27 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:26 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki"
13:25 inflatador: bking@stat1008.mgmt racadm>>racadm jobqueue create BIOS.Setup.1-1
13:24 inflatador: bking@stat1008.mgmt racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813
13:24 lucaswerkmeister-wmde@deploy2002: Sync cancelled.
13:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2172 in db2240 for T373579', diff saved to https://phabricator.wikimedia.org/P70375 and previous config saved to /var/cache/conftool/dbconfig/20241021-132351-arnaudb.json
13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: provisioning db2240.codfw.wmnet - T373579
13:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: provisioning db2240.codfw.wmnet - T373579
13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: provisioning db2240.codfw.wmnet - T373579
13:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: provisioning db2240.codfw.wmnet - T373579
13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P70374 and previous config saved to /var/cache/conftool/dbconfig/20241021-132045-ladsgroup.json
13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2172 to clone on db2240 T373579', diff saved to https://phabricator.wikimedia.org/P70373 and previous config saved to /var/cache/conftool/dbconfig/20241021-131750-arnaudb.json
13:12 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test Ide32aa with dummy upgrade
13:11 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test Ide32aa with dummy upgrade
13:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (T376055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P70372 and previous config saved to /var/cache/conftool/dbconfig/20241021-130538-ladsgroup.json
13:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (T376055)
12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
12:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70371 and previous config saved to /var/cache/conftool/dbconfig/20241021-125029-ladsgroup.json
12:45 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1002.eqiad.wmnet with OS bookworm
12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70370 and previous config saved to /var/cache/conftool/dbconfig/20241021-124217-ladsgroup.json
12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70369 and previous config saved to /var/cache/conftool/dbconfig/20241021-124151-ladsgroup.json
12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P70368 and previous config saved to /var/cache/conftool/dbconfig/20241021-122644-ladsgroup.json
12:24 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
12:21 klausman@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
12:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P70367 and previous config saved to /var/cache/conftool/dbconfig/20241021-121136-ladsgroup.json
12:09 klausman@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:00 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
11:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70366 and previous config saved to /var/cache/conftool/dbconfig/20241021-115629-ladsgroup.json
11:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
11:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
11:52 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
11:51 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70365 and previous config saved to /var/cache/conftool/dbconfig/20241021-114723-ladsgroup.json
11:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70364 and previous config saved to /var/cache/conftool/dbconfig/20241021-114657-ladsgroup.json
11:40 moritzm: installing python-idna security updates
11:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P70363 and previous config saved to /var/cache/conftool/dbconfig/20241021-113150-ladsgroup.json
11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P70362 and previous config saved to /var/cache/conftool/dbconfig/20241021-111643-ladsgroup.json
11:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70361 and previous config saved to /var/cache/conftool/dbconfig/20241021-110136-ladsgroup.json
10:59 moritzm: installing curl security updates
10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
10:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70360 and previous config saved to /var/cache/conftool/dbconfig/20241021-105223-ladsgroup.json
10:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
10:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1029.eqiad.wmnet
10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2038.codfw.wmnet to cluster codfw and group C
10:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2038.codfw.wmnet to cluster codfw and group C
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
10:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: testing depool/repool
10:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1185.eqiad.wmnet with reason: testing depool/repool
10:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: testing depool/repool
10:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: testing depool/repool
10:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1245.eqiad.wmnet with reason: testing depool/repool
10:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1245.eqiad.wmnet with reason: testing depool/repool
10:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1006.eqiad.wmnet
10:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-eqiad: containerd migration
10:08 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1005.eqiad.wmnet with OS bookworm
10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1006.eqiad.wmnet
09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:52 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2037.codfw.wmnet to cluster codfw and group C
09:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2037.codfw.wmnet to cluster codfw and group C
09:47 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
09:46 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
09:45 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
09:42 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
09:40 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
09:39 dcausse@deploy2002: Finished scap sync-world: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715) (duration: 23m 26s)
09:36 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1011.eqiad.wmnet
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
09:32 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1011.eqiad.wmnet
09:31 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1010.eqiad.wmnet
09:29 dcausse@deploy2002: dcausse: Continuing with sync
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
09:27 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1005.eqiad.wmnet with OS bookworm
09:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1004.eqiad.wmnet with OS bookworm
09:27 dcausse@deploy2002: dcausse: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:26 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1010.eqiad.wmnet
09:24 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1009.eqiad.wmnet
09:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
09:19 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1009.eqiad.wmnet
09:18 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
09:16 dcausse@deploy2002: Started scap sync-world: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715)
09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
09:11 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
09:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:10 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
09:06 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
09:03 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
09:02 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
08:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
08:57 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
08:53 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@d176c47]: (no justification provided) (duration: 00m 11s)
08:53 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@d176c47]: (no justification provided)
08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
08:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1004.eqiad.wmnet with OS bookworm
08:47 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1003.eqiad.wmnet with OS bookworm
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
08:44 jnuche@deploy2002: Installing scap version "4.114.0" for 210 hosts
08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
08:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:23 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
08:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1003.eqiad.wmnet with OS bookworm
08:09 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-eqiad: containerd migration
07:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bookworm
07:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:23 moritzm: installing python-reportlab security updates
07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast7001.wikimedia.org
07:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast7001.wikimedia.org
07:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
07:09 kartik@deploy2002: scap failed: <CalledProcessError> Command '['/usr/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.27/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.113.0) (duration: 00m 01s)
07:09 kartik@deploy2002: Started scap sync-world: Backport for Enable Special:Contribute on bnwiki
07:05 kartik@deploy2002: scap failed: <CalledProcessError> Command '['/usr/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.27/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.113.0) (duration: 00m 01s)
07:05 kartik@deploy2002: Started scap sync-world: Backport for Enable Special:Contribute on bnwiki
06:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 153087
06:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 153087
06:58 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 153087
06:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 153087
06:56 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bookworm
06:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
06:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
06:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P70359 and previous config saved to /var/cache/conftool/dbconfig/20241021-000434-ladsgroup.json
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance

2024-10-20

21:19 eileen: civicrm upgraded from 77ea54bc to cfb0def0
09:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70358 and previous config saved to /var/cache/conftool/dbconfig/20241020-095904-ladsgroup.json
09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70357 and previous config saved to /var/cache/conftool/dbconfig/20241020-094357-ladsgroup.json
09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70356 and previous config saved to /var/cache/conftool/dbconfig/20241020-092850-ladsgroup.json
09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70355 and previous config saved to /var/cache/conftool/dbconfig/20241020-091344-ladsgroup.json

2024-10-19

00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART

2024-10-18

22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:45 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@8c1070f] (releasing): deploying changes to publishMWSingleVersion job (duration: 01m 06s)
21:44 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@8c1070f] (releasing): deploying changes to publishMWSingleVersion job
20:23 dduvall: deployed scap release 4.113.0 to releases{1003,2003} hosts
20:22 dduvall@deploy2002: Installing scap version "4.113.0" for 2 hosts
20:21 dduvall@deploy2002: install-world aborted: (no justification provided) (duration: 00m 52s)
20:20 dduvall@deploy2002: Installing scap version "latest" for 2 hosts
19:09 tzatziki: removing 3 files for legal compliance
18:56 tzatziki: removing 1 file for legal compliance
16:54 dzahn@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:54 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: containerd migration
16:54 dzahn@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:54 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
16:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
16:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
16:10 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
16:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2004.codfw.wmnet with OS bookworm
15:46 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
15:43 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
15:26 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2004.codfw.wmnet with OS bookworm
15:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bookworm
15:02 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
14:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
14:57 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
14:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removal of old mx records and api.svc records - akosiaris@cumin1002"
14:52 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removal of old mx records and api.svc records - akosiaris@cumin1002"
14:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@e44bacc]: Deploying updated dumps reconciliation (duration: 00m 31s)
14:47 milimetric@deploy2002: Started deploy [airflow-dags/analytics@e44bacc]: Deploying updated dumps reconciliation
14:39 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bookworm
14:38 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
14:37 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
14:37 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
14:25 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
14:09 sergi0: Running `foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant` (T374544, T375753)
13:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
13:46 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
13:32 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Hardware replacement
13:31 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Hardware replacement
13:22 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@f020959]: Deploying updated dumps reconciliation (duration: 00m 31s)
13:22 milimetric@deploy2002: Started deploy [airflow-dags/analytics@f020959]: Deploying updated dumps reconciliation
13:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
12:22 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
12:22 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
12:22 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
12:21 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
11:43 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: containerd migration
11:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
11:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1009.eqiad.wmnet
11:31 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for dbstore1009.eqiad.wmnet
11:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
11:17 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
11:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
11:00 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
10:59 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
10:59 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
10:58 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2005.codfw.wmnet
10:39 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster staging-codfw: containerd migration
10:38 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
10:37 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2005.codfw.wmnet
10:37 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubestagemaster2005.codfw.wmnet
10:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
10:26 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
09:47 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
09:45 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
09:45 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
09:43 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
09:42 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
09:41 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
09:36 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync
09:35 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync
09:35 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
09:33 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync
09:33 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync
09:33 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync
09:14 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
09:11 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
09:10 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
08:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70348 and previous config saved to /var/cache/conftool/dbconfig/20241018-080343-ladsgroup.json
08:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
08:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70347 and previous config saved to /var/cache/conftool/dbconfig/20241018-015152-ladsgroup.json
01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P70346 and previous config saved to /var/cache/conftool/dbconfig/20241018-013645-ladsgroup.json
01:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P70345 and previous config saved to /var/cache/conftool/dbconfig/20241018-012138-ladsgroup.json
01:16 eileen: civicrm upgraded from b0508a22 to 77ea54bc
01:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70344 and previous config saved to /var/cache/conftool/dbconfig/20241018-010631-ladsgroup.json
00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70343 and previous config saved to /var/cache/conftool/dbconfig/20241018-005819-ladsgroup.json
00:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
00:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
00:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70342 and previous config saved to /var/cache/conftool/dbconfig/20241018-005752-ladsgroup.json
00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove mgmt DNS entries for old frack switches - pt1979@cumin2002"
00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P70341 and previous config saved to /var/cache/conftool/dbconfig/20241018-004245-ladsgroup.json
00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove mgmt DNS entries for old frack switches - pt1979@cumin2002"
00:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P70340 and previous config saved to /var/cache/conftool/dbconfig/20241018-002738-ladsgroup.json
00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70339 and previous config saved to /var/cache/conftool/dbconfig/20241018-001231-ladsgroup.json
00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70338 and previous config saved to /var/cache/conftool/dbconfig/20241018-000422-ladsgroup.json
00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70337 and previous config saved to /var/cache/conftool/dbconfig/20241018-000356-ladsgroup.json

2024-10-17

23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P70336 and previous config saved to /var/cache/conftool/dbconfig/20241017-234849-ladsgroup.json
23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P70335 and previous config saved to /var/cache/conftool/dbconfig/20241017-233342-ladsgroup.json
23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70334 and previous config saved to /var/cache/conftool/dbconfig/20241017-231835-ladsgroup.json
23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70333 and previous config saved to /var/cache/conftool/dbconfig/20241017-231037-ladsgroup.json
23:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
23:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
23:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70332 and previous config saved to /var/cache/conftool/dbconfig/20241017-230457-ladsgroup.json
22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P70331 and previous config saved to /var/cache/conftool/dbconfig/20241017-224950-ladsgroup.json
22:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
22:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70330 and previous config saved to /var/cache/conftool/dbconfig/20241017-224209-ladsgroup.json
22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P70329 and previous config saved to /var/cache/conftool/dbconfig/20241017-223443-ladsgroup.json
22:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P70328 and previous config saved to /var/cache/conftool/dbconfig/20241017-222702-ladsgroup.json
22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70327 and previous config saved to /var/cache/conftool/dbconfig/20241017-221936-ladsgroup.json
22:15 eileen: civicrm upgraded from f980ace9 to b0508a22
22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P70326 and previous config saved to /var/cache/conftool/dbconfig/20241017-221155-ladsgroup.json
22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70325 and previous config saved to /var/cache/conftool/dbconfig/20241017-221123-ladsgroup.json
22:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
22:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70324 and previous config saved to /var/cache/conftool/dbconfig/20241017-221057-ladsgroup.json
21:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70323 and previous config saved to /var/cache/conftool/dbconfig/20241017-215648-ladsgroup.json
21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P70322 and previous config saved to /var/cache/conftool/dbconfig/20241017-215550-ladsgroup.json
21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70321 and previous config saved to /var/cache/conftool/dbconfig/20241017-215014-ladsgroup.json
21:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
21:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70320 and previous config saved to /var/cache/conftool/dbconfig/20241017-214949-ladsgroup.json
21:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P70319 and previous config saved to /var/cache/conftool/dbconfig/20241017-214043-ladsgroup.json
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P70318 and previous config saved to /var/cache/conftool/dbconfig/20241017-213442-ladsgroup.json
21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70317 and previous config saved to /var/cache/conftool/dbconfig/20241017-212536-ladsgroup.json
21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P70316 and previous config saved to /var/cache/conftool/dbconfig/20241017-211935-ladsgroup.json
21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70315 and previous config saved to /var/cache/conftool/dbconfig/20241017-211458-ladsgroup.json
21:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
21:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70314 and previous config saved to /var/cache/conftool/dbconfig/20241017-211432-ladsgroup.json
21:11 kindrobot: UTC late backport window finished <3
21:08 kindrobot: results of de-duping: https://phabricator.wikimedia.org/P70313
21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70312 and previous config saved to /var/cache/conftool/dbconfig/20241017-210428-ladsgroup.json
21:01 kindrobot: ran mwscript-k8s -f --comment="https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1080078/comments/02a9334e_cd3e7a0e" -- namespaceDupes.php on: bclwikisource, bewwiki, gorwikiquote, iglwiki, kaawiktionary, kgewiki, kuswiki, madwiktionary, moswiki, nrwiki, rskwiki, shnwikinews, and tddwiki
20:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P70311 and previous config saved to /var/cache/conftool/dbconfig/20241017-205925-ladsgroup.json
20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70310 and previous config saved to /var/cache/conftool/dbconfig/20241017-205655-ladsgroup.json
20:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
20:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70309 and previous config saved to /var/cache/conftool/dbconfig/20241017-205612-ladsgroup.json
20:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:50 eileen: config revision changed from 150b02a9 to 0d019da0
20:50 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:50 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:49 eileen: config revision changed from 3b3e5cad to 0d019da0
20:48 kindrobot@deploy2002: Finished scap sync-world: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310) (duration: 31m 15s)
20:46 eileen: config revision changed from bf02494d to 3b3e5cad
20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P70308 and previous config saved to /var/cache/conftool/dbconfig/20241017-204418-ladsgroup.json
20:43 kindrobot@deploy2002: pppery, kindrobot: Continuing with sync
20:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P70307 and previous config saved to /var/cache/conftool/dbconfig/20241017-204105-ladsgroup.json
20:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70306 and previous config saved to /var/cache/conftool/dbconfig/20241017-202911-ladsgroup.json
20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P70305 and previous config saved to /var/cache/conftool/dbconfig/20241017-202558-ladsgroup.json
20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70304 and previous config saved to /var/cache/conftool/dbconfig/20241017-201944-ladsgroup.json
20:20 kindrobot@deploy2002: pppery, kindrobot: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70303 and previous config saved to /var/cache/conftool/dbconfig/20241017-201919-ladsgroup.json
20:17 kindrobot@deploy2002: Started scap sync-world: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310)
20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70302 and previous config saved to /var/cache/conftool/dbconfig/20241017-201051-ladsgroup.json
20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P70301 and previous config saved to /var/cache/conftool/dbconfig/20241017-200412-ladsgroup.json
20:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70300 and previous config saved to /var/cache/conftool/dbconfig/20241017-200147-ladsgroup.json
20:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
20:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70299 and previous config saved to /var/cache/conftool/dbconfig/20241017-200122-ladsgroup.json
19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P70298 and previous config saved to /var/cache/conftool/dbconfig/20241017-194905-ladsgroup.json
19:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70297 and previous config saved to /var/cache/conftool/dbconfig/20241017-194615-ladsgroup.json
19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70296 and previous config saved to /var/cache/conftool/dbconfig/20241017-193358-ladsgroup.json
19:33 swfrench-wmf: ran authdns-update to pick up records for mw-(web|api-ext)-next in svc - T377040
19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70295 and previous config saved to /var/cache/conftool/dbconfig/20241017-193108-ladsgroup.json
19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70294 and previous config saved to /var/cache/conftool/dbconfig/20241017-192424-ladsgroup.json
19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
19:18 dancy@deploy2002: Finished scap sync-world: testing https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484 (duration: 02m 46s)
19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70293 and previous config saved to /var/cache/conftool/dbconfig/20241017-191601-ladsgroup.json
19:15 dancy@deploy2002: Started scap sync-world: testing https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484
19:13 dancy@deploy2002: Installing scap version "4.112.0" for 1 hosts
19:07 dancy@deploy2002: Installing scap version "4.112.0" for 210 hosts
19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70292 and previous config saved to /var/cache/conftool/dbconfig/20241017-190655-ladsgroup.json
19:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
19:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
18:54 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
18:53 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
18:49 dancy@deploy2002: Finished scap sync-world: testing scap 4.111.0 (duration: 02m 44s)
18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
18:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
18:48 urbanecm: mwscript-k8s --comment=T377360 -f -- extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=wikidatawiki # T377360
18:47 dancy@deploy2002: Started scap sync-world: testing scap 4.111.0
18:45 dancy@deploy2002: Installation of scap version "4.111.0" completed for 210 hosts
18:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70291 and previous config saved to /var/cache/conftool/dbconfig/20241017-184402-arnaudb.json
18:41 dancy@deploy2002: Installing scap version "4.111.0" for 210 hosts
18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70290 and previous config saved to /var/cache/conftool/dbconfig/20241017-182855-arnaudb.json
18:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
18:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
18:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.27 refs T375658
18:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70289 and previous config saved to /var/cache/conftool/dbconfig/20241017-181348-arnaudb.json
17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70288 and previous config saved to /var/cache/conftool/dbconfig/20241017-175841-arnaudb.json
17:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:34 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-api-ext / mw-web next release bring up - T377040 (duration: 02m 54s)
17:31 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-api-ext / mw-web next release bring up - T377040
17:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70287 and previous config saved to /var/cache/conftool/dbconfig/20241017-171844-ladsgroup.json
17:18 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:17 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:17 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:15 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:15 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:14 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:14 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
17:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70286 and previous config saved to /var/cache/conftool/dbconfig/20241017-170337-ladsgroup.json
16:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70285 and previous config saved to /var/cache/conftool/dbconfig/20241017-165814-arnaudb.json
16:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
16:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70284 and previous config saved to /var/cache/conftool/dbconfig/20241017-165803-arnaudb.json
16:55 mutante: phab2002 T377396 - reboot | in addition to /etc/passwd also fix aphlict GID in /etc/group | fixed puppet run which can now create group vcs. now equivalent to prod server phab1004.
16:53 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:52 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
16:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:51 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
16:50 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
16:49 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
16:49 mutante: phab2002 T377396 - fix UIDs/GIDs for phab-related system users: vcs: uid 496 -> 497 | aphlict: uid 497 -> uid 496, gid 497 -> gid 496 | chown aphlict:aphlict /var/log/aphlict | chown aphlict:aphlict /run/aphlict
16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70283 and previous config saved to /var/cache/conftool/dbconfig/20241017-164830-ladsgroup.json
16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70282 and previous config saved to /var/cache/conftool/dbconfig/20241017-164256-arnaudb.json
16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70281 and previous config saved to /var/cache/conftool/dbconfig/20241017-163324-ladsgroup.json
16:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70280 and previous config saved to /var/cache/conftool/dbconfig/20241017-162749-arnaudb.json
16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70279 and previous config saved to /var/cache/conftool/dbconfig/20241017-161242-arnaudb.json
16:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
16:01 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
16:00 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
16:00 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
15:59 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
15:59 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
15:59 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
15:58 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
15:58 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:58 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:57 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
15:57 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
15:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
15:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
15:52 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
15:51 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
15:51 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
15:50 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
15:48 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
15:48 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
15:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
15:45 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:45 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:44 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:44 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
15:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
15:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70278 and previous config saved to /var/cache/conftool/dbconfig/20241017-153546-ladsgroup.json
15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70277 and previous config saved to /var/cache/conftool/dbconfig/20241017-153257-ladsgroup.json
15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70276 and previous config saved to /var/cache/conftool/dbconfig/20241017-153238-ladsgroup.json
15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70275 and previous config saved to /var/cache/conftool/dbconfig/20241017-152040-ladsgroup.json
15:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70274 and previous config saved to /var/cache/conftool/dbconfig/20241017-151731-ladsgroup.json
15:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70273 and previous config saved to /var/cache/conftool/dbconfig/20241017-151216-arnaudb.json
15:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70272 and previous config saved to /var/cache/conftool/dbconfig/20241017-151204-arnaudb.json
15:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70271 and previous config saved to /var/cache/conftool/dbconfig/20241017-150535-ladsgroup.json
15:05 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:05 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
15:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
15:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70270 and previous config saved to /var/cache/conftool/dbconfig/20241017-150224-ladsgroup.json
15:01 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
15:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
15:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70269 and previous config saved to /var/cache/conftool/dbconfig/20241017-145657-arnaudb.json
14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70268 and previous config saved to /var/cache/conftool/dbconfig/20241017-145030-ladsgroup.json
14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:50 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:48 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:47 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70267 and previous config saved to /var/cache/conftool/dbconfig/20241017-144717-ladsgroup.json
14:43 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:43 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70266 and previous config saved to /var/cache/conftool/dbconfig/20241017-144150-arnaudb.json
14:41 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:38 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:38 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:31 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
14:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70265 and previous config saved to /var/cache/conftool/dbconfig/20241017-142643-arnaudb.json
14:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
14:08 urbanecm@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287) (duration: 09m 41s)
14:03 urbanecm@deploy2002: cscott, urbanecm: Continuing with sync
14:00 urbanecm@deploy2002: cscott, urbanecm: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:58 urbanecm@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287)
13:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70264 and previous config saved to /var/cache/conftool/dbconfig/20241017-134651-ladsgroup.json
13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
13:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70263 and previous config saved to /var/cache/conftool/dbconfig/20241017-134636-ladsgroup.json
13:35 urbanecm@deploy2002: Finished scap sync-world: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been required (T377391 T357535) (duration: 08m 07s)
13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70261 and previous config saved to /var/cache/conftool/dbconfig/20241017-133129-ladsgroup.json
13:30 urbanecm@deploy2002: cscott, urbanecm, matmarex: Continuing with sync
13:29 urbanecm@deploy2002: cscott, urbanecm, matmarex: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been required (T377391 T357535) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:29 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript updateCollation.php --wiki=cswikivoyage --previous-collation=uppercase # T377446
13:27 urbanecm@deploy2002: Started scap sync-world: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been required (T377391 T357535)
13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70260 and previous config saved to /var/cache/conftool/dbconfig/20241017-132617-arnaudb.json
13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
13:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
13:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
13:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:18 inflatador: bking@wdqs1015 depooling to catch up on lag
13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70258 and previous config saved to /var/cache/conftool/dbconfig/20241017-131622-ladsgroup.json
13:14 urbanecm@deploy2002: Finished scap sync-world: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517) (duration: 07m 23s)
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70257 and previous config saved to /var/cache/conftool/dbconfig/20241017-131012-ladsgroup.json
13:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
13:10 urbanecm@deploy2002: kharlan, urbanecm: Continuing with sync
13:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
13:09 urbanecm@deploy2002: kharlan, urbanecm: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70256 and previous config saved to /var/cache/conftool/dbconfig/20241017-130947-ladsgroup.json
13:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:07 urbanecm@deploy2002: Started scap sync-world: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517)
13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70255 and previous config saved to /var/cache/conftool/dbconfig/20241017-130115-ladsgroup.json
13:00 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
12:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
12:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P70254 and previous config saved to /var/cache/conftool/dbconfig/20241017-125440-ladsgroup.json
12:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P70253 and previous config saved to /var/cache/conftool/dbconfig/20241017-123932-ladsgroup.json
12:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70252 and previous config saved to /var/cache/conftool/dbconfig/20241017-122425-ladsgroup.json
12:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70251 and previous config saved to /var/cache/conftool/dbconfig/20241017-121525-ladsgroup.json
12:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
12:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
12:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70250 and previous config saved to /var/cache/conftool/dbconfig/20241017-120049-ladsgroup.json
12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70249 and previous config saved to /var/cache/conftool/dbconfig/20241017-120029-ladsgroup.json
11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70248 and previous config saved to /var/cache/conftool/dbconfig/20241017-114522-ladsgroup.json
11:39 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1177.eqiad.wmnet
11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70247 and previous config saved to /var/cache/conftool/dbconfig/20241017-113014-ladsgroup.json
11:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70246 and previous config saved to /var/cache/conftool/dbconfig/20241017-111507-ladsgroup.json
11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70245 and previous config saved to /var/cache/conftool/dbconfig/20241017-110527-ladsgroup.json
11:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
11:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
10:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
10:17 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
09:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host phab2002.codfw.wmnet with OS bullseye
09:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:09 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add support for read-only users - oblivian@cumin1002"
09:09 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add support for read-only users - oblivian@cumin1002
09:08 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add support for read-only users - oblivian@cumin1002
09:08 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add support for read-only users - oblivian@cumin1002"
09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70243 and previous config saved to /var/cache/conftool/dbconfig/20241017-090731-arnaudb.json
08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70242 and previous config saved to /var/cache/conftool/dbconfig/20241017-085226-arnaudb.json
08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70241 and previous config saved to /var/cache/conftool/dbconfig/20241017-083721-arnaudb.json
08:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70240 and previous config saved to /var/cache/conftool/dbconfig/20241017-082215-arnaudb.json
08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 to reclone on db2205 - T377276', diff saved to https://phabricator.wikimedia.org/P70239 and previous config saved to /var/cache/conftool/dbconfig/20241017-081822-arnaudb.json
08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70238 and previous config saved to /var/cache/conftool/dbconfig/20241017-081802-arnaudb.json
08:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2205.codfw.wmnet
08:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
08:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
07:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:55 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:51 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
07:48 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
07:37 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
07:37 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
07:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:28 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
07:19 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: cleanup removed label_count field on next re-index (T377226) (duration: 10m 40s)
07:18 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubestagemaster2005.codfw.wmnet
07:14 dcausse@deploy2002: dcausse: Continuing with sync
07:13 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
07:13 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
07:13 dcausse@deploy2002: dcausse: Backport for cirrus: cleanup removed label_count field on next re-index (T377226) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:08 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: cleanup removed label_count field on next re-index (T377226)
07:00 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2205.codfw.wmnet
07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 to reclone on db2205 - T377276', diff saved to https://phabricator.wikimedia.org/P70237 and previous config saved to /var/cache/conftool/dbconfig/20241017-070015-arnaudb.json
06:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2205.codfw.wmnet with OS bookworm
06:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: T367781', diff saved to https://phabricator.wikimedia.org/P70236 and previous config saved to /var/cache/conftool/dbconfig/20241017-063238-arnaudb.json
06:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
06:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: T367781', diff saved to https://phabricator.wikimedia.org/P70235 and previous config saved to /var/cache/conftool/dbconfig/20241017-061732-arnaudb.json
06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2205.codfw.wmnet with OS bookworm
06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: T367781', diff saved to https://phabricator.wikimedia.org/P70234 and previous config saved to /var/cache/conftool/dbconfig/20241017-060227-arnaudb.json
05:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: T367781', diff saved to https://phabricator.wikimedia.org/P70233 and previous config saved to /var/cache/conftool/dbconfig/20241017-054722-arnaudb.json
05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70231 and previous config saved to /var/cache/conftool/dbconfig/20241017-051700-ladsgroup.json
05:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P70230 and previous config saved to /var/cache/conftool/dbconfig/20241017-050153-ladsgroup.json
04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P70229 and previous config saved to /var/cache/conftool/dbconfig/20241017-044646-ladsgroup.json
04:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70228 and previous config saved to /var/cache/conftool/dbconfig/20241017-043139-ladsgroup.json
04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70227 and previous config saved to /var/cache/conftool/dbconfig/20241017-042440-ladsgroup.json
04:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70226 and previous config saved to /var/cache/conftool/dbconfig/20241017-042413-ladsgroup.json
04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P70225 and previous config saved to /var/cache/conftool/dbconfig/20241017-040906-ladsgroup.json
03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P70224 and previous config saved to /var/cache/conftool/dbconfig/20241017-035359-ladsgroup.json
03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70223 and previous config saved to /var/cache/conftool/dbconfig/20241017-033852-ladsgroup.json
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70222 and previous config saved to /var/cache/conftool/dbconfig/20241017-033144-ladsgroup.json
03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70221 and previous config saved to /var/cache/conftool/dbconfig/20241017-033118-ladsgroup.json
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P70220 and previous config saved to /var/cache/conftool/dbconfig/20241017-031611-ladsgroup.json
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P70219 and previous config saved to /var/cache/conftool/dbconfig/20241017-030104-ladsgroup.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70218 and previous config saved to /var/cache/conftool/dbconfig/20241017-024557-ladsgroup.json
02:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70217 and previous config saved to /var/cache/conftool/dbconfig/20241017-023857-ladsgroup.json
02:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
02:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
02:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70216 and previous config saved to /var/cache/conftool/dbconfig/20241017-023831-ladsgroup.json
02:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P70215 and previous config saved to /var/cache/conftool/dbconfig/20241017-022324-ladsgroup.json
02:18 tstarling@deploy2002: Synchronized wmf-config/InitialiseSettings.php: T4085 Enable en on Commons and Meta (duration: 06m 34s)
02:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P70214 and previous config saved to /var/cache/conftool/dbconfig/20241017-020817-ladsgroup.json
01:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70213 and previous config saved to /var/cache/conftool/dbconfig/20241017-015310-ladsgroup.json
01:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70212 and previous config saved to /var/cache/conftool/dbconfig/20241017-014500-ladsgroup.json
01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
01:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70211 and previous config saved to /var/cache/conftool/dbconfig/20241017-013926-ladsgroup.json
01:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P70210 and previous config saved to /var/cache/conftool/dbconfig/20241017-012419-ladsgroup.json
01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P70209 and previous config saved to /var/cache/conftool/dbconfig/20241017-010912-ladsgroup.json
00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70208 and previous config saved to /var/cache/conftool/dbconfig/20241017-005405-ladsgroup.json
00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70207 and previous config saved to /var/cache/conftool/dbconfig/20241017-004537-ladsgroup.json
00:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
00:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70206 and previous config saved to /var/cache/conftool/dbconfig/20241017-004511-ladsgroup.json
00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P70204 and previous config saved to /var/cache/conftool/dbconfig/20241017-003004-ladsgroup.json
00:26 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
00:25 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
00:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P70203 and previous config saved to /var/cache/conftool/dbconfig/20241017-001457-ladsgroup.json

2024-10-16

23:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70202 and previous config saved to /var/cache/conftool/dbconfig/20241016-235950-ladsgroup.json
23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70201 and previous config saved to /var/cache/conftool/dbconfig/20241016-235129-ladsgroup.json
23:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
23:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70200 and previous config saved to /var/cache/conftool/dbconfig/20241016-235102-ladsgroup.json
23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70199 and previous config saved to /var/cache/conftool/dbconfig/20241016-233555-ladsgroup.json
23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70198 and previous config saved to /var/cache/conftool/dbconfig/20241016-232048-ladsgroup.json
23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70197 and previous config saved to /var/cache/conftool/dbconfig/20241016-230541-ladsgroup.json
22:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70196 and previous config saved to /var/cache/conftool/dbconfig/20241016-225716-ladsgroup.json
22:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
22:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
22:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70195 and previous config saved to /var/cache/conftool/dbconfig/20241016-225646-ladsgroup.json
22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70194 and previous config saved to /var/cache/conftool/dbconfig/20241016-224139-ladsgroup.json
22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70193 and previous config saved to /var/cache/conftool/dbconfig/20241016-222632-ladsgroup.json
22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70192 and previous config saved to /var/cache/conftool/dbconfig/20241016-221125-ladsgroup.json
22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70191 and previous config saved to /var/cache/conftool/dbconfig/20241016-220053-ladsgroup.json
22:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
22:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
21:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
21:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
21:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
21:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
20:44 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
20:44 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
20:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
20:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
20:39 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:39 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:37 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:37 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:31 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 00m 08s)
20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70189 and previous config saved to /var/cache/conftool/dbconfig/20241016-203034-ladsgroup.json
20:30 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374
20:29 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:29 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:26 jhuneidi@deploy2002: Finished scap sync-world: Backport for Make wikitech a target for CentralNotice banners (T377030) (duration: 10m 02s)
20:21 jhuneidi@deploy2002: ejegg, jhuneidi: Continuing with sync
20:18 jhuneidi@deploy2002: ejegg, jhuneidi: Backport for Make wikitech a target for CentralNotice banners (T377030) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:18 mutante: phab2002 - ln -s /var/lib/scap/scap/bin/scap /usr/bin/scap
20:17 mutante: phab2002 - after manually running bootstrap-scap-target.sh and "Scap from local bullseye wheels successfully installed at /var/lib/scap/scap" still "cannot open `/usr/bin/scap' (No such file or directory)" though. T303559 T310740 T377374
20:17 jhuneidi@deploy2002: Started scap sync-world: Backport for Make wikitech a target for CentralNotice banners (T377030)
20:16 mutante: phab2002 - manually bootstrapping scap since puppet did not do it due to dependency cycles: sudo -u scap /usr/local/bin/bootstrap-scap-target.sh deploy2002.codfw.wmnet /var/lib/scap T303559 T310740 T377374
20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70188 and previous config saved to /var/cache/conftool/dbconfig/20241016-201527-ladsgroup.json
20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70187 and previous config saved to /var/cache/conftool/dbconfig/20241016-200020-ladsgroup.json
19:54 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out1001.wikimedia.org
19:50 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out1001.wikimedia.org
19:49 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out2001.wikimedia.org
19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
19:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70186 and previous config saved to /var/cache/conftool/dbconfig/20241016-194513-ladsgroup.json
19:47 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
19:46 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
19:45 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1001.wikimedia.org
19:44 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:44 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
19:43 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
19:42 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
19:42 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
19:40 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
19:36 jhathaway@cumin1002: START - Cookbook sre.hosts.decommission for hosts mx1001.wikimedia.org
19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70185 and previous config saved to /var/cache/conftool/dbconfig/20241016-193500-ladsgroup.json
19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70184 and previous config saved to /var/cache/conftool/dbconfig/20241016-193433-ladsgroup.json
19:30 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 10m 42s)
19:19 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374
19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70183 and previous config saved to /var/cache/conftool/dbconfig/20241016-191926-ladsgroup.json
19:16 inflatador: bking@stat1011 racadm>>racadm jobqueue create BIOS.Setup.1-1 Commit JID = JID_291241139935 T376813
19:14 inflatador: bking@stat1011 racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813
19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70182 and previous config saved to /var/cache/conftool/dbconfig/20241016-190419-ladsgroup.json
18:54 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70181 and previous config saved to /var/cache/conftool/dbconfig/20241016-184912-ladsgroup.json
18:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2001.wikimedia.org
18:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:46 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
18:45 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
18:43 papaul: maintenance on mr1-ulsfo complete
18:41 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
18:36 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
18:35 jhathaway@cumin1002: START - Cookbook sre.hosts.decommission for hosts mx2001.wikimedia.org
18:33 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
18:32 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
18:32 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
18:31 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
18:31 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
18:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
18:27 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:27 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
18:21 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:20 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
18:17 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.27 refs T375658
18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host phab2002
18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host phab2002
18:13 dzahn@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host phab2002
18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
18:12 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002"
18:11 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:11 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002"
18:11 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:06 dzahn@cumin2002: START - Cookbook sre.dns.netbox
18:05 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:04 cdanis@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:01 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
18:00 papaul: ongoing maintenance on mr1-ulsfo
18:00 cdanis@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:58 dzahn@cumin2002: START - Cookbook sre.hosts.move-vlan for host phab2002
17:58 cdanis@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
17:57 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
17:56 cdanis@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:55 cdanis@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70179 and previous config saved to /var/cache/conftool/dbconfig/20241016-174847-ladsgroup.json
17:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
17:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70178 and previous config saved to /var/cache/conftool/dbconfig/20241016-174821-ladsgroup.json
17:48 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:48 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002"
17:41 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002"
17:39 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:38 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:37 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
17:37 swfrench@cumin2002: START - Cookbook sre.dns.netbox
17:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70177 and previous config saved to /var/cache/conftool/dbconfig/20241016-173314-ladsgroup.json
17:20 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70176 and previous config saved to /var/cache/conftool/dbconfig/20241016-171807-ladsgroup.json
17:16 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a] (duration: 03m 44s)
17:13 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a]
17:12 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a] (duration: 05m 11s)
17:06 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a]
17:06 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a] (duration: 08m 54s)
17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70175 and previous config saved to /var/cache/conftool/dbconfig/20241016-170300-ladsgroup.json
16:57 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a]
16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70174 and previous config saved to /var/cache/conftool/dbconfig/20241016-165343-ladsgroup.json
16:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70173 and previous config saved to /var/cache/conftool/dbconfig/20241016-165317-ladsgroup.json
16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70172 and previous config saved to /var/cache/conftool/dbconfig/20241016-163810-ladsgroup.json
16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70171 and previous config saved to /var/cache/conftool/dbconfig/20241016-162303-ladsgroup.json
16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70170 and previous config saved to /var/cache/conftool/dbconfig/20241016-160756-ladsgroup.json
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70169 and previous config saved to /var/cache/conftool/dbconfig/20241016-155948-ladsgroup.json
15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
15:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70168 and previous config saved to /var/cache/conftool/dbconfig/20241016-155450-ladsgroup.json
15:52 papaul: maintenance on mr1-eqsin complete
15:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70167 and previous config saved to /var/cache/conftool/dbconfig/20241016-153943-ladsgroup.json
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70166 and previous config saved to /var/cache/conftool/dbconfig/20241016-152436-ladsgroup.json
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70165 and previous config saved to /var/cache/conftool/dbconfig/20241016-150928-ladsgroup.json
15:05 papaul: ongoing maintenance on mr1-eqsin
14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:41 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] beta: Lower batch size for reassignMenteesJob (T376124) (duration: 06m 46s)
14:35 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] beta: Lower batch size for reassignMenteesJob (T376124)
14:25 Lucas_WMDE: UTC afternoon backport+config window done
14:24 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)]], Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226) (duration: 11m 36s)
14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
14:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
14:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002
14:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
14:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70164 and previous config saved to /var/cache/conftool/dbconfig/20241016-141819-ladsgroup.json
14:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002
14:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
14:17 oblivian@cumin1002: END (FAIL) - Cookbook sre.deploy.hiddenparma (exit_code=99) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
14:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
14:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226)
14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70163 and previous config saved to /var/cache/conftool/dbconfig/20241016-140902-ladsgroup.json
14:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
14:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
14:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70162 and previous config saved to /var/cache/conftool/dbconfig/20241016-140835-ladsgroup.json
14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70161 and previous config saved to /var/cache/conftool/dbconfig/20241016-140312-ladsgroup.json
13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70160 and previous config saved to /var/cache/conftool/dbconfig/20241016-135328-ladsgroup.json
13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70159 and previous config saved to /var/cache/conftool/dbconfig/20241016-134805-ladsgroup.json
13:43 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
13:41 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
13:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70158 and previous config saved to /var/cache/conftool/dbconfig/20241016-133821-ladsgroup.json
13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70157 and previous config saved to /var/cache/conftool/dbconfig/20241016-133257-ladsgroup.json
13:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Update Z669x references to Z609x (duration: 08m 23s)
13:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70156 and previous config saved to /var/cache/conftool/dbconfig/20241016-132314-ladsgroup.json
13:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jforrester: Continuing with sync
13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jforrester: Backport for Update Z669x references to Z609x synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Update Z669x references to Z609x
13:16 Dreamy_Jazz: Started time limited scan on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration
13:16 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove wgGEUseNewImpactModule config (T350077) (duration: 11m 35s)
13:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cyndywikime: Continuing with sync
13:07 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cyndywikime: Backport for Remove wgGEUseNewImpactModule config (T350077) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove wgGEUseNewImpactModule config (T350077)
12:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
12:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
12:52 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
12:47 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
12:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
12:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:43 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
12:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1177
12:35 stevemunene@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1177
12:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1176
12:34 stevemunene@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1176
12:33 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
12:32 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:32 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002"
12:32 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002"
12:28 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70155 and previous config saved to /var/cache/conftool/dbconfig/20241016-122248-ladsgroup.json
12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70154 and previous config saved to /var/cache/conftool/dbconfig/20241016-122206-ladsgroup.json
12:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70153 and previous config saved to /var/cache/conftool/dbconfig/20241016-120659-ladsgroup.json
11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70152 and previous config saved to /var/cache/conftool/dbconfig/20241016-115152-ladsgroup.json
11:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70150 and previous config saved to /var/cache/conftool/dbconfig/20241016-113645-ladsgroup.json
11:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70149 and previous config saved to /var/cache/conftool/dbconfig/20241016-113639-ladsgroup.json
11:29 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:28 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:26 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:22 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70148 and previous config saved to /var/cache/conftool/dbconfig/20241016-112132-ladsgroup.json
11:21 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70147 and previous config saved to /var/cache/conftool/dbconfig/20241016-110625-ladsgroup.json
10:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70146 and previous config saved to /var/cache/conftool/dbconfig/20241016-105118-ladsgroup.json
10:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70145 and previous config saved to /var/cache/conftool/dbconfig/20241016-103620-ladsgroup.json
10:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70144 and previous config saved to /var/cache/conftool/dbconfig/20241016-103553-ladsgroup.json
10:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70143 and previous config saved to /var/cache/conftool/dbconfig/20241016-102046-ladsgroup.json
10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70142 and previous config saved to /var/cache/conftool/dbconfig/20241016-100539-ladsgroup.json
09:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70141 and previous config saved to /var/cache/conftool/dbconfig/20241016-095032-ladsgroup.json
09:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70140 and previous config saved to /var/cache/conftool/dbconfig/20241016-093852-ladsgroup.json
09:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
09:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
09:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70139 and previous config saved to /var/cache/conftool/dbconfig/20241016-093147-ladsgroup.json
09:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70138 and previous config saved to /var/cache/conftool/dbconfig/20241016-092219-ladsgroup.json
09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
09:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
09:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70137 and previous config saved to /var/cache/conftool/dbconfig/20241016-092157-ladsgroup.json
09:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70136 and previous config saved to /var/cache/conftool/dbconfig/20241016-091640-ladsgroup.json
09:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70134 and previous config saved to /var/cache/conftool/dbconfig/20241016-090650-ladsgroup.json
09:04 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70133 and previous config saved to /var/cache/conftool/dbconfig/20241016-090133-ladsgroup.json
08:57 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
08:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70132 and previous config saved to /var/cache/conftool/dbconfig/20241016-085143-ladsgroup.json
08:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70131 and previous config saved to /var/cache/conftool/dbconfig/20241016-084626-ladsgroup.json
08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70130 and previous config saved to /var/cache/conftool/dbconfig/20241016-083651-ladsgroup.json
08:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70129 and previous config saved to /var/cache/conftool/dbconfig/20241016-083636-ladsgroup.json
08:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
08:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:07 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:05 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:04 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:03 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:02 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:01 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:41 awight: UTC morning deployments done
07:40 awight@deploy2002: Finished scap sync-world: Backport for zhwiki: Revise contact page deprecated usage (duration: 09m 07s)
07:35 awight@deploy2002: awight, hamishz: Continuing with sync
07:34 awight@deploy2002: awight, hamishz: Backport for zhwiki: Revise contact page deprecated usage synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:31 awight@deploy2002: Started scap sync-world: Backport for zhwiki: Revise contact page deprecated usage
07:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70128 and previous config saved to /var/cache/conftool/dbconfig/20241016-072501-ladsgroup.json
07:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70127 and previous config saved to /var/cache/conftool/dbconfig/20241016-070954-ladsgroup.json
07:09 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
07:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70126 and previous config saved to /var/cache/conftool/dbconfig/20241016-070246-ladsgroup.json
07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70125 and previous config saved to /var/cache/conftool/dbconfig/20241016-070224-ladsgroup.json
06:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70124 and previous config saved to /var/cache/conftool/dbconfig/20241016-065447-ladsgroup.json
06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70123 and previous config saved to /var/cache/conftool/dbconfig/20241016-064717-ladsgroup.json
06:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70122 and previous config saved to /var/cache/conftool/dbconfig/20241016-063940-ladsgroup.json
06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70121 and previous config saved to /var/cache/conftool/dbconfig/20241016-063210-ladsgroup.json
06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70120 and previous config saved to /var/cache/conftool/dbconfig/20241016-063132-ladsgroup.json
06:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
06:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70119 and previous config saved to /var/cache/conftool/dbconfig/20241016-063107-ladsgroup.json
06:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70118 and previous config saved to /var/cache/conftool/dbconfig/20241016-061703-ladsgroup.json
06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70117 and previous config saved to /var/cache/conftool/dbconfig/20241016-061558-ladsgroup.json
06:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70116 and previous config saved to /var/cache/conftool/dbconfig/20241016-060051-ladsgroup.json
05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70115 and previous config saved to /var/cache/conftool/dbconfig/20241016-054544-ladsgroup.json
05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70114 and previous config saved to /var/cache/conftool/dbconfig/20241016-053943-ladsgroup.json
05:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
05:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70113 and previous config saved to /var/cache/conftool/dbconfig/20241016-053918-ladsgroup.json
05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70112 and previous config saved to /var/cache/conftool/dbconfig/20241016-052411-ladsgroup.json
05:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70111 and previous config saved to /var/cache/conftool/dbconfig/20241016-050904-ladsgroup.json
04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70110 and previous config saved to /var/cache/conftool/dbconfig/20241016-045356-ladsgroup.json
04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70109 and previous config saved to /var/cache/conftool/dbconfig/20241016-044657-ladsgroup.json
04:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
04:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
04:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
04:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70108 and previous config saved to /var/cache/conftool/dbconfig/20241016-044204-ladsgroup.json
04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70107 and previous config saved to /var/cache/conftool/dbconfig/20241016-043757-ladsgroup.json
04:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70106 and previous config saved to /var/cache/conftool/dbconfig/20241016-043734-ladsgroup.json
04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70105 and previous config saved to /var/cache/conftool/dbconfig/20241016-042657-ladsgroup.json
04:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70104 and previous config saved to /var/cache/conftool/dbconfig/20241016-042227-ladsgroup.json
04:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
04:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
04:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
04:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70103 and previous config saved to /var/cache/conftool/dbconfig/20241016-041150-ladsgroup.json
04:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70102 and previous config saved to /var/cache/conftool/dbconfig/20241016-040721-ladsgroup.json
04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
04:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
04:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
03:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70101 and previous config saved to /var/cache/conftool/dbconfig/20241016-035643-ladsgroup.json
03:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70100 and previous config saved to /var/cache/conftool/dbconfig/20241016-035214-ladsgroup.json
03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70099 and previous config saved to /var/cache/conftool/dbconfig/20241016-034932-ladsgroup.json
03:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
03:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70098 and previous config saved to /var/cache/conftool/dbconfig/20241016-034907-ladsgroup.json
03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70097 and previous config saved to /var/cache/conftool/dbconfig/20241016-033400-ladsgroup.json
03:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70096 and previous config saved to /var/cache/conftool/dbconfig/20241016-031852-ladsgroup.json
03:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70095 and previous config saved to /var/cache/conftool/dbconfig/20241016-030345-ladsgroup.json
02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70094 and previous config saved to /var/cache/conftool/dbconfig/20241016-025633-ladsgroup.json
02:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
02:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70093 and previous config saved to /var/cache/conftool/dbconfig/20241016-025608-ladsgroup.json
02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70092 and previous config saved to /var/cache/conftool/dbconfig/20241016-024101-ladsgroup.json
02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70091 and previous config saved to /var/cache/conftool/dbconfig/20241016-022554-ladsgroup.json
02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70090 and previous config saved to /var/cache/conftool/dbconfig/20241016-021358-ladsgroup.json
02:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
02:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
02:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70089 and previous config saved to /var/cache/conftool/dbconfig/20241016-021347-ladsgroup.json
02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70088 and previous config saved to /var/cache/conftool/dbconfig/20241016-021047-ladsgroup.json
02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70087 and previous config saved to /var/cache/conftool/dbconfig/20241016-020333-ladsgroup.json
02:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
02:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70086 and previous config saved to /var/cache/conftool/dbconfig/20241016-020308-ladsgroup.json
01:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70085 and previous config saved to /var/cache/conftool/dbconfig/20241016-015840-ladsgroup.json
01:50 eileen: tools upgraded from 62f2d170 to 68f64e43
01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70084 and previous config saved to /var/cache/conftool/dbconfig/20241016-014801-ladsgroup.json
01:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70083 and previous config saved to /var/cache/conftool/dbconfig/20241016-014333-ladsgroup.json
01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70082 and previous config saved to /var/cache/conftool/dbconfig/20241016-013254-ladsgroup.json
01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70081 and previous config saved to /var/cache/conftool/dbconfig/20241016-012826-ladsgroup.json
01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70080 and previous config saved to /var/cache/conftool/dbconfig/20241016-011747-ladsgroup.json
01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70079 and previous config saved to /var/cache/conftool/dbconfig/20241016-011036-ladsgroup.json
01:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
01:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70078 and previous config saved to /var/cache/conftool/dbconfig/20241016-011010-ladsgroup.json
00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70077 and previous config saved to /var/cache/conftool/dbconfig/20241016-005500-ladsgroup.json
00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70076 and previous config saved to /var/cache/conftool/dbconfig/20241016-003953-ladsgroup.json
00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70075 and previous config saved to /var/cache/conftool/dbconfig/20241016-002446-ladsgroup.json
00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70074 and previous config saved to /var/cache/conftool/dbconfig/20241016-001629-ladsgroup.json
00:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70073 and previous config saved to /var/cache/conftool/dbconfig/20241016-001604-ladsgroup.json
00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70072 and previous config saved to /var/cache/conftool/dbconfig/20241016-000057-ladsgroup.json

2024-10-15

23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70071 and previous config saved to /var/cache/conftool/dbconfig/20241015-235055-ladsgroup.json
23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70070 and previous config saved to /var/cache/conftool/dbconfig/20241015-235017-ladsgroup.json
23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70069 and previous config saved to /var/cache/conftool/dbconfig/20241015-234550-ladsgroup.json
23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70068 and previous config saved to /var/cache/conftool/dbconfig/20241015-233510-ladsgroup.json
23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70067 and previous config saved to /var/cache/conftool/dbconfig/20241015-233043-ladsgroup.json
23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70066 and previous config saved to /var/cache/conftool/dbconfig/20241015-232456-ladsgroup.json
23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70065 and previous config saved to /var/cache/conftool/dbconfig/20241015-232423-ladsgroup.json
23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70064 and previous config saved to /var/cache/conftool/dbconfig/20241015-232003-ladsgroup.json
23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70063 and previous config saved to /var/cache/conftool/dbconfig/20241015-230916-ladsgroup.json
23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70062 and previous config saved to /var/cache/conftool/dbconfig/20241015-230456-ladsgroup.json
22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70061 and previous config saved to /var/cache/conftool/dbconfig/20241015-225409-ladsgroup.json
22:48 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70060 and previous config saved to /var/cache/conftool/dbconfig/20241015-223902-ladsgroup.json
22:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70059 and previous config saved to /var/cache/conftool/dbconfig/20241015-222936-ladsgroup.json
22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70058 and previous config saved to /var/cache/conftool/dbconfig/20241015-222911-ladsgroup.json
22:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
22:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70057 and previous config saved to /var/cache/conftool/dbconfig/20241015-221404-ladsgroup.json
22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70056 and previous config saved to /var/cache/conftool/dbconfig/20241015-221356-ladsgroup.json
22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70055 and previous config saved to /var/cache/conftool/dbconfig/20241015-220316-ladsgroup.json
21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70054 and previous config saved to /var/cache/conftool/dbconfig/20241015-215857-ladsgroup.json
21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70053 and previous config saved to /var/cache/conftool/dbconfig/20241015-215849-ladsgroup.json
21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
21:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70052 and previous config saved to /var/cache/conftool/dbconfig/20241015-214811-ladsgroup.json
21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70051 and previous config saved to /var/cache/conftool/dbconfig/20241015-214350-ladsgroup.json
21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70050 and previous config saved to /var/cache/conftool/dbconfig/20241015-214342-ladsgroup.json
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70049 and previous config saved to /var/cache/conftool/dbconfig/20241015-213423-ladsgroup.json
21:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
21:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
21:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70048 and previous config saved to /var/cache/conftool/dbconfig/20241015-213305-ladsgroup.json
21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70047 and previous config saved to /var/cache/conftool/dbconfig/20241015-213227-ladsgroup.json
21:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70046 and previous config saved to /var/cache/conftool/dbconfig/20241015-213203-ladsgroup.json
21:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
21:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70045 and previous config saved to /var/cache/conftool/dbconfig/20241015-212835-ladsgroup.json
21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70044 and previous config saved to /var/cache/conftool/dbconfig/20241015-212431-ladsgroup.json
21:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
21:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70043 and previous config saved to /var/cache/conftool/dbconfig/20241015-211800-ladsgroup.json
21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70042 and previous config saved to /var/cache/conftool/dbconfig/20241015-211656-ladsgroup.json
21:04 cjming: end of UTC late backport window
21:04 cjming@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 06m 51s)
21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70041 and previous config saved to /var/cache/conftool/dbconfig/20241015-210149-ladsgroup.json
20:59 cjming@deploy2002: cjming, matmarex: Continuing with sync
20:59 cjming@deploy2002: cjming, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:57 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2194.codfw.wmnet onto db2205.codfw.wmnet
20:57 cjming@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
20:56 cjming@deploy2002: Finished scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) (duration: 12m 33s)
20:51 cjming@deploy2002: cjming, pppery: Continuing with sync
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70040 and previous config saved to /var/cache/conftool/dbconfig/20241015-204642-ladsgroup.json
20:46 cjming@deploy2002: cjming, pppery: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:43 cjming@deploy2002: Started scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923)
20:42 cjming@deploy2002: Finished scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) (duration: 08m 50s)
20:37 cjming@deploy2002: cjming, pppery: Continuing with sync
20:35 cjming@deploy2002: cjming, pppery: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:33 cjming@deploy2002: Started scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538)
20:31 cjming@deploy2002: Finished scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php (duration: 10m 56s)
20:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
20:27 cjming@deploy2002: ammarpad, cjming: Continuing with sync
20:23 cjming@deploy2002: ammarpad, cjming: Backport for contactpages: Move stewards contactpage to MetaContactPages.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:20 cjming@deploy2002: Started scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php
20:16 cjming@deploy2002: Finished scap sync-world: Backport for Remove legacy UI actions tracking (T376065) (duration: 12m 28s)
20:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
20:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
20:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
20:11 cjming@deploy2002: ksarabia, cjming: Continuing with sync
20:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
20:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
20:10 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
20:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
20:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
20:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
20:06 cjming@deploy2002: ksarabia, cjming: Backport for Remove legacy UI actions tracking (T376065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
20:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
20:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:03 cjming@deploy2002: Started scap sync-world: Backport for Remove legacy UI actions tracking (T376065)
20:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:02 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
20:01 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
20:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
19:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
19:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:16 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.27 refs T375658
19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70039 and previous config saved to /var/cache/conftool/dbconfig/20241015-191345-ladsgroup.json
19:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
19:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
19:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70038 and previous config saved to /var/cache/conftool/dbconfig/20241015-191322-ladsgroup.json
19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70037 and previous config saved to /var/cache/conftool/dbconfig/20241015-190231-arnaudb.json
18:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70036 and previous config saved to /var/cache/conftool/dbconfig/20241015-185814-ladsgroup.json
18:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
18:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70035 and previous config saved to /var/cache/conftool/dbconfig/20241015-184724-arnaudb.json
18:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70034 and previous config saved to /var/cache/conftool/dbconfig/20241015-184307-ladsgroup.json
18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:39 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2082
18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2081
18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2083
18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2083
18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2082
18:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2081
18:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70033 and previous config saved to /var/cache/conftool/dbconfig/20241015-183218-arnaudb.json
18:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70032 and previous config saved to /var/cache/conftool/dbconfig/20241015-182800-ladsgroup.json
18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70031 and previous config saved to /var/cache/conftool/dbconfig/20241015-181930-ladsgroup.json
18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70030 and previous config saved to /var/cache/conftool/dbconfig/20241015-181711-arnaudb.json
18:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70029 and previous config saved to /var/cache/conftool/dbconfig/20241015-181455-arnaudb.json
18:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
18:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70028 and previous config saved to /var/cache/conftool/dbconfig/20241015-181433-arnaudb.json
18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70027 and previous config saved to /var/cache/conftool/dbconfig/20241015-180423-ladsgroup.json
17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70026 and previous config saved to /var/cache/conftool/dbconfig/20241015-175926-arnaudb.json
17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70025 and previous config saved to /var/cache/conftool/dbconfig/20241015-174916-ladsgroup.json
17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70024 and previous config saved to /var/cache/conftool/dbconfig/20241015-174419-arnaudb.json
17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70023 and previous config saved to /var/cache/conftool/dbconfig/20241015-173409-ladsgroup.json
17:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70022 and previous config saved to /var/cache/conftool/dbconfig/20241015-172912-arnaudb.json
17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70021 and previous config saved to /var/cache/conftool/dbconfig/20241015-172714-ladsgroup.json
17:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70020 and previous config saved to /var/cache/conftool/dbconfig/20241015-172657-arnaudb.json
17:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
17:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
17:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70019 and previous config saved to /var/cache/conftool/dbconfig/20241015-172648-ladsgroup.json
17:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
17:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
17:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
17:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70018 and previous config saved to /var/cache/conftool/dbconfig/20241015-172610-arnaudb.json
17:13 swfrench@deploy2002: Finished scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934 (duration: 02m 47s)
17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70017 and previous config saved to /var/cache/conftool/dbconfig/20241015-171141-ladsgroup.json
17:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70016 and previous config saved to /var/cache/conftool/dbconfig/20241015-171103-arnaudb.json
17:10 swfrench@deploy2002: Started scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934
16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70015 and previous config saved to /var/cache/conftool/dbconfig/20241015-165634-ladsgroup.json
16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70014 and previous config saved to /var/cache/conftool/dbconfig/20241015-165608-ladsgroup.json
16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70013 and previous config saved to /var/cache/conftool/dbconfig/20241015-165556-arnaudb.json
16:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70012 and previous config saved to /var/cache/conftool/dbconfig/20241015-165539-ladsgroup.json
16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70011 and previous config saved to /var/cache/conftool/dbconfig/20241015-164127-ladsgroup.json
16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70010 and previous config saved to /var/cache/conftool/dbconfig/20241015-164050-arnaudb.json
16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70009 and previous config saved to /var/cache/conftool/dbconfig/20241015-164032-ladsgroup.json
16:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70008 and previous config saved to /var/cache/conftool/dbconfig/20241015-163834-arnaudb.json
16:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
16:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
16:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P70007 and previous config saved to /var/cache/conftool/dbconfig/20241015-163812-arnaudb.json
16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70006 and previous config saved to /var/cache/conftool/dbconfig/20241015-163419-ladsgroup.json
16:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
16:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P70005 and previous config saved to /var/cache/conftool/dbconfig/20241015-163404-ladsgroup.json
16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70004 and previous config saved to /var/cache/conftool/dbconfig/20241015-162525-ladsgroup.json
16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70003 and previous config saved to /var/cache/conftool/dbconfig/20241015-162305-arnaudb.json
16:21 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2205.codfw.wmnet
16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P70002 and previous config saved to /var/cache/conftool/dbconfig/20241015-161934-ladsgroup.json
16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P70001 and previous config saved to /var/cache/conftool/dbconfig/20241015-161858-ladsgroup.json
16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70000 and previous config saved to /var/cache/conftool/dbconfig/20241015-161018-ladsgroup.json
16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P69999 and previous config saved to /var/cache/conftool/dbconfig/20241015-160758-arnaudb.json
16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P69998 and previous config saved to /var/cache/conftool/dbconfig/20241015-160351-ladsgroup.json
16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2205 T377164', diff saved to https://phabricator.wikimedia.org/P69997 and previous config saved to /var/cache/conftool/dbconfig/20241015-160106-ladsgroup.json
15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69996 and previous config saved to /var/cache/conftool/dbconfig/20241015-155251-arnaudb.json
15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote db2209 to s3 primary and set section read-write T377164', diff saved to https://phabricator.wikimedia.org/P69995 and previous config saved to /var/cache/conftool/dbconfig/20241015-155240-ladsgroup.json
15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69994 and previous config saved to /var/cache/conftool/dbconfig/20241015-154844-ladsgroup.json
15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T377164', diff saved to https://phabricator.wikimedia.org/P69993 and previous config saved to /var/cache/conftool/dbconfig/20241015-154834-ladsgroup.json
15:48 Amir1: Starting s3 codfw failover from db2205 to db2209 - T377164
15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69992 and previous config saved to /var/cache/conftool/dbconfig/20241015-154318-arnaudb.json
15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69991 and previous config saved to /var/cache/conftool/dbconfig/20241015-154256-arnaudb.json
15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T377164', diff saved to https://phabricator.wikimedia.org/P69990 and previous config saved to /var/cache/conftool/dbconfig/20241015-154228-ladsgroup.json
15:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
15:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69989 and previous config saved to /var/cache/conftool/dbconfig/20241015-154027-ladsgroup.json
15:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69988 and previous config saved to /var/cache/conftool/dbconfig/20241015-154002-ladsgroup.json
15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69987 and previous config saved to /var/cache/conftool/dbconfig/20241015-152749-arnaudb.json
15:26 akosiaris: run gnt-cluster verify-disks after ganeti1034 forceful reboot
15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69986 and previous config saved to /var/cache/conftool/dbconfig/20241015-152456-ladsgroup.json
15:22 volans: force-rebooting ganeti1034 stuck due to drbd traces via mgmt
15:19 akosiaris@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet
15:17 akosiaris: drain ganeti1034 of VMs, hardware might be misbehaving
15:16 akosiaris@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69985 and previous config saved to /var/cache/conftool/dbconfig/20241015-151243-arnaudb.json
15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69984 and previous config saved to /var/cache/conftool/dbconfig/20241015-150948-ladsgroup.json
14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69983 and previous config saved to /var/cache/conftool/dbconfig/20241015-145734-arnaudb.json
14:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69982 and previous config saved to /var/cache/conftool/dbconfig/20241015-145517-arnaudb.json
14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69981 and previous config saved to /var/cache/conftool/dbconfig/20241015-145453-arnaudb.json
14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69980 and previous config saved to /var/cache/conftool/dbconfig/20241015-145441-ladsgroup.json
14:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
14:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69979 and previous config saved to /var/cache/conftool/dbconfig/20241015-144631-ladsgroup.json
14:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
14:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69978 and previous config saved to /var/cache/conftool/dbconfig/20241015-144606-ladsgroup.json
14:45 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 24s)
14:43 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 46s)
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69977 and previous config saved to /var/cache/conftool/dbconfig/20241015-143946-arnaudb.json
14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P69976 and previous config saved to /var/cache/conftool/dbconfig/20241015-143803-ladsgroup.json
14:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
14:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
14:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69975 and previous config saved to /var/cache/conftool/dbconfig/20241015-143740-ladsgroup.json
14:36 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
14:35 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
14:33 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
14:31 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69974 and previous config saved to /var/cache/conftool/dbconfig/20241015-143059-ladsgroup.json
14:29 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:28 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
14:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:27 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:26 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69973 and previous config saved to /var/cache/conftool/dbconfig/20241015-142439-arnaudb.json
14:24 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
14:24 urbanecm@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 33m 23s)
14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69972 and previous config saved to /var/cache/conftool/dbconfig/20241015-142233-ladsgroup.json
14:21 btullis@cumin1002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema
14:19 urbanecm@deploy2002: urbanecm, matmarex: Continuing with sync
14:17 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
14:16 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69971 and previous config saved to /var/cache/conftool/dbconfig/20241015-141552-ladsgroup.json
14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69970 and previous config saved to /var/cache/conftool/dbconfig/20241015-140932-arnaudb.json
14:09 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69969 and previous config saved to /var/cache/conftool/dbconfig/20241015-140726-ladsgroup.json
14:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69968 and previous config saved to /var/cache/conftool/dbconfig/20241015-140716-arnaudb.json
14:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:08 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1020.eqiad.wmnet
14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69967 and previous config saved to /var/cache/conftool/dbconfig/20241015-140638-arnaudb.json
14:05 btullis@cumin1002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema
14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69966 and previous config saved to /var/cache/conftool/dbconfig/20241015-140045-ladsgroup.json
14:00 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1020.eqiad.wmnet
13:57 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1019.eqiad.wmnet
13:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
13:54 urbanecm@deploy2002: urbanecm, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69965 and previous config saved to /var/cache/conftool/dbconfig/20241015-135234-ladsgroup.json
13:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69964 and previous config saved to /var/cache/conftool/dbconfig/20241015-135213-ladsgroup.json
13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69963 and previous config saved to /var/cache/conftool/dbconfig/20241015-135208-ladsgroup.json
13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69962 and previous config saved to /var/cache/conftool/dbconfig/20241015-135131-arnaudb.json
13:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1019.eqiad.wmnet
13:50 urbanecm@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
13:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69961 and previous config saved to /var/cache/conftool/dbconfig/20241015-133701-ladsgroup.json
13:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69960 and previous config saved to /var/cache/conftool/dbconfig/20241015-133624-arnaudb.json
13:32 urbanecm@deploy2002: Finished scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 44s)
13:27 urbanecm@deploy2002: migr, urbanecm, zabe: Continuing with sync
13:26 urbanecm@deploy2002: migr, urbanecm, zabe: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:24 urbanecm@deploy2002: Started scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490)
13:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) (duration: 19m 25s)
13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69959 and previous config saved to /var/cache/conftool/dbconfig/20241015-132154-ladsgroup.json
13:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69958 and previous config saved to /var/cache/conftool/dbconfig/20241015-132117-arnaudb.json
13:19 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1018.eqiad.wmnet
13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69957 and previous config saved to /var/cache/conftool/dbconfig/20241015-131901-arnaudb.json
13:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69956 and previous config saved to /var/cache/conftool/dbconfig/20241015-131839-arnaudb.json
13:16 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Continuing with sync
13:12 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1018.eqiad.wmnet
13:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69955 and previous config saved to /var/cache/conftool/dbconfig/20241015-131122-ladsgroup.json
13:11 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1017.eqiad.wmnet
13:11 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69954 and previous config saved to /var/cache/conftool/dbconfig/20241015-130647-ladsgroup.json
13:04 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1017.eqiad.wmnet
13:04 urbanecm@deploy2002: Started scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833)
13:03 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1016.eqiad.wmnet
13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69953 and previous config saved to /var/cache/conftool/dbconfig/20241015-130332-arnaudb.json
12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69952 and previous config saved to /var/cache/conftool/dbconfig/20241015-125748-ladsgroup.json
12:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
12:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
12:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1016.eqiad.wmnet
12:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69951 and previous config saved to /var/cache/conftool/dbconfig/20241015-125615-ladsgroup.json
12:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69950 and previous config saved to /var/cache/conftool/dbconfig/20241015-125203-ladsgroup.json
12:50 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
12:50 elukey: destroy old certs from puppetmaster1001's CA (parsoid.svc.{eqiad,codfw}.wmnet, debmonitor.discovery.wmnet)
12:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69949 and previous config saved to /var/cache/conftool/dbconfig/20241015-124825-arnaudb.json
12:46 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69948 and previous config saved to /var/cache/conftool/dbconfig/20241015-124108-ladsgroup.json
12:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69947 and previous config saved to /var/cache/conftool/dbconfig/20241015-123656-ladsgroup.json
12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69946 and previous config saved to /var/cache/conftool/dbconfig/20241015-123318-arnaudb.json
12:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69945 and previous config saved to /var/cache/conftool/dbconfig/20241015-123101-arnaudb.json
12:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
12:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69944 and previous config saved to /var/cache/conftool/dbconfig/20241015-123039-arnaudb.json
12:30 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
12:29 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69943 and previous config saved to /var/cache/conftool/dbconfig/20241015-122601-ladsgroup.json
12:24 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
12:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69942 and previous config saved to /var/cache/conftool/dbconfig/20241015-122251-ladsgroup.json
12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69941 and previous config saved to /var/cache/conftool/dbconfig/20241015-122149-ladsgroup.json
12:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69940 and previous config saved to /var/cache/conftool/dbconfig/20241015-121706-ladsgroup.json
12:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
12:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69939 and previous config saved to /var/cache/conftool/dbconfig/20241015-121532-arnaudb.json
12:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69938 and previous config saved to /var/cache/conftool/dbconfig/20241015-121349-ladsgroup.json
12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69937 and previous config saved to /var/cache/conftool/dbconfig/20241015-120642-ladsgroup.json
12:03 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69936 and previous config saved to /var/cache/conftool/dbconfig/20241015-120025-arnaudb.json
11:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69935 and previous config saved to /var/cache/conftool/dbconfig/20241015-115842-ladsgroup.json
11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69934 and previous config saved to /var/cache/conftool/dbconfig/20241015-115630-ladsgroup.json
11:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69933 and previous config saved to /var/cache/conftool/dbconfig/20241015-115606-ladsgroup.json
11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69932 and previous config saved to /var/cache/conftool/dbconfig/20241015-114518-arnaudb.json
11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69931 and previous config saved to /var/cache/conftool/dbconfig/20241015-114336-ladsgroup.json
11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69930 and previous config saved to /var/cache/conftool/dbconfig/20241015-114302-arnaudb.json
11:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69929 and previous config saved to /var/cache/conftool/dbconfig/20241015-114240-arnaudb.json
11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69927 and previous config saved to /var/cache/conftool/dbconfig/20241015-114059-ladsgroup.json
11:34 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69926 and previous config saved to /var/cache/conftool/dbconfig/20241015-112829-ladsgroup.json
11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69925 and previous config saved to /var/cache/conftool/dbconfig/20241015-112733-arnaudb.json
11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69924 and previous config saved to /var/cache/conftool/dbconfig/20241015-112551-ladsgroup.json
11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69923 and previous config saved to /var/cache/conftool/dbconfig/20241015-111226-arnaudb.json
11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69922 and previous config saved to /var/cache/conftool/dbconfig/20241015-111045-ladsgroup.json
11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69921 and previous config saved to /var/cache/conftool/dbconfig/20241015-110741-ladsgroup.json
11:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69920 and previous config saved to /var/cache/conftool/dbconfig/20241015-110132-ladsgroup.json
11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69919 and previous config saved to /var/cache/conftool/dbconfig/20241015-105719-arnaudb.json
10:53 tappof: expand LVs on prometheus instances (k8s-mlserve and k8s-staging) T377196
10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69918 and previous config saved to /var/cache/conftool/dbconfig/20241015-105301-arnaudb.json
10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69917 and previous config saved to /var/cache/conftool/dbconfig/20241015-105213-arnaudb.json
10:38 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
10:38 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69915 and previous config saved to /var/cache/conftool/dbconfig/20241015-103706-arnaudb.json
10:34 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
10:30 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
10:26 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
10:25 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
10:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69914 and previous config saved to /var/cache/conftool/dbconfig/20241015-102159-arnaudb.json
10:21 brouberol@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
10:14 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
10:11 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
10:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69913 and previous config saved to /var/cache/conftool/dbconfig/20241015-100652-arnaudb.json
10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69912 and previous config saved to /var/cache/conftool/dbconfig/20241015-100435-arnaudb.json
10:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
10:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69911 and previous config saved to /var/cache/conftool/dbconfig/20241015-100413-arnaudb.json
09:57 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
09:55 brouberol@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker
09:52 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69910 and previous config saved to /var/cache/conftool/dbconfig/20241015-094906-arnaudb.json
09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69909 and previous config saved to /var/cache/conftool/dbconfig/20241015-093359-arnaudb.json
09:26 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69908 and previous config saved to /var/cache/conftool/dbconfig/20241015-091852-arnaudb.json
09:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69907 and previous config saved to /var/cache/conftool/dbconfig/20241015-091635-arnaudb.json
09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
09:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69906 and previous config saved to /var/cache/conftool/dbconfig/20241015-091502-arnaudb.json
09:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69905 and previous config saved to /var/cache/conftool/dbconfig/20241015-085955-arnaudb.json
08:47 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
08:46 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69903 and previous config saved to /var/cache/conftool/dbconfig/20241015-084448-arnaudb.json
08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69902 and previous config saved to /var/cache/conftool/dbconfig/20241015-082941-arnaudb.json
08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
08:27 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69901 and previous config saved to /var/cache/conftool/dbconfig/20241015-082727-arnaudb.json
08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69900 and previous config saved to /var/cache/conftool/dbconfig/20241015-082704-arnaudb.json
08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69899 and previous config saved to /var/cache/conftool/dbconfig/20241015-081157-arnaudb.json
07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69898 and previous config saved to /var/cache/conftool/dbconfig/20241015-075650-arnaudb.json
07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69897 and previous config saved to /var/cache/conftool/dbconfig/20241015-074843-arnaudb.json
07:47 hashar: Restarted Gerrit - T373897
07:46 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897 (duration: 00m 09s)
07:46 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897
07:42 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897 (duration: 00m 07s)
07:42 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897
07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69896 and previous config saved to /var/cache/conftool/dbconfig/20241015-074143-arnaudb.json
07:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897 (duration: 00m 07s)
07:40 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897
07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69895 and previous config saved to /var/cache/conftool/dbconfig/20241015-073928-arnaudb.json
07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69894 and previous config saved to /var/cache/conftool/dbconfig/20241015-073906-arnaudb.json
07:38 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
07:38 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
07:35 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69893 and previous config saved to /var/cache/conftool/dbconfig/20241015-073338-arnaudb.json
07:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69892 and previous config saved to /var/cache/conftool/dbconfig/20241015-072359-arnaudb.json
07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69891 and previous config saved to /var/cache/conftool/dbconfig/20241015-071833-arnaudb.json
07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69890 and previous config saved to /var/cache/conftool/dbconfig/20241015-070852-arnaudb.json
07:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69889 and previous config saved to /var/cache/conftool/dbconfig/20241015-070327-arnaudb.json
06:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69888 and previous config saved to /var/cache/conftool/dbconfig/20241015-065345-arnaudb.json
06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69887 and previous config saved to /var/cache/conftool/dbconfig/20241015-065130-arnaudb.json
06:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
06:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
06:30 kart_: Updated MinT to 2024-10-11-113932-production
06:27 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:16 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:08 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:38 _joe_: restart tomcat on idp1004
05:35 _joe_: restart tomcat on idp2004
05:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:00 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.24 (duration: 00m 56s)
03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658 (duration: 48m 30s)
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658
02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69885 and previous config saved to /var/cache/conftool/dbconfig/20241015-024037-ladsgroup.json
02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69884 and previous config saved to /var/cache/conftool/dbconfig/20241015-022530-ladsgroup.json
02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69883 and previous config saved to /var/cache/conftool/dbconfig/20241015-021023-ladsgroup.json
01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69882 and previous config saved to /var/cache/conftool/dbconfig/20241015-015516-ladsgroup.json
01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69881 and previous config saved to /var/cache/conftool/dbconfig/20241015-014831-ladsgroup.json
01:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
01:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69880 and previous config saved to /var/cache/conftool/dbconfig/20241015-014803-ladsgroup.json
01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69879 and previous config saved to /var/cache/conftool/dbconfig/20241015-013257-ladsgroup.json
01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69878 and previous config saved to /var/cache/conftool/dbconfig/20241015-011749-ladsgroup.json
01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69877 and previous config saved to /var/cache/conftool/dbconfig/20241015-010242-ladsgroup.json
00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69876 and previous config saved to /var/cache/conftool/dbconfig/20241015-005551-ladsgroup.json
00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69875 and previous config saved to /var/cache/conftool/dbconfig/20241015-005546-ladsgroup.json
00:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69874 and previous config saved to /var/cache/conftool/dbconfig/20241015-005525-ladsgroup.json
00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69873 and previous config saved to /var/cache/conftool/dbconfig/20241015-004039-ladsgroup.json
00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69872 and previous config saved to /var/cache/conftool/dbconfig/20241015-004018-ladsgroup.json
00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69871 and previous config saved to /var/cache/conftool/dbconfig/20241015-002531-ladsgroup.json
00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69870 and previous config saved to /var/cache/conftool/dbconfig/20241015-002511-ladsgroup.json
00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69869 and previous config saved to /var/cache/conftool/dbconfig/20241015-001024-ladsgroup.json
00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69868 and previous config saved to /var/cache/conftool/dbconfig/20241015-001004-ladsgroup.json
00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69867 and previous config saved to /var/cache/conftool/dbconfig/20241015-000304-ladsgroup.json
00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69866 and previous config saved to /var/cache/conftool/dbconfig/20241015-000236-ladsgroup.json

2024-10-14

23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69865 and previous config saved to /var/cache/conftool/dbconfig/20241014-234729-ladsgroup.json
23:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69864 and previous config saved to /var/cache/conftool/dbconfig/20241014-233222-ladsgroup.json
23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69863 and previous config saved to /var/cache/conftool/dbconfig/20241014-232857-ladsgroup.json
23:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
23:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69862 and previous config saved to /var/cache/conftool/dbconfig/20241014-232835-ladsgroup.json
23:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69861 and previous config saved to /var/cache/conftool/dbconfig/20241014-231715-ladsgroup.json
23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69860 and previous config saved to /var/cache/conftool/dbconfig/20241014-231328-ladsgroup.json
23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69859 and previous config saved to /var/cache/conftool/dbconfig/20241014-230903-ladsgroup.json
23:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
23:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
23:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69858 and previous config saved to /var/cache/conftool/dbconfig/20241014-230838-ladsgroup.json
22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69857 and previous config saved to /var/cache/conftool/dbconfig/20241014-225818-ladsgroup.json
22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69856 and previous config saved to /var/cache/conftool/dbconfig/20241014-225528-ladsgroup.json
22:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69855 and previous config saved to /var/cache/conftool/dbconfig/20241014-225331-ladsgroup.json
22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69854 and previous config saved to /var/cache/conftool/dbconfig/20241014-224311-ladsgroup.json
22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69853 and previous config saved to /var/cache/conftool/dbconfig/20241014-224022-ladsgroup.json
22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69852 and previous config saved to /var/cache/conftool/dbconfig/20241014-223824-ladsgroup.json
22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69851 and previous config saved to /var/cache/conftool/dbconfig/20241014-222515-ladsgroup.json
22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69850 and previous config saved to /var/cache/conftool/dbconfig/20241014-222317-ladsgroup.json
22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69849 and previous config saved to /var/cache/conftool/dbconfig/20241014-222009-ladsgroup.json
22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69848 and previous config saved to /var/cache/conftool/dbconfig/20241014-221508-ladsgroup.json
22:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
22:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69847 and previous config saved to /var/cache/conftool/dbconfig/20241014-221443-ladsgroup.json
22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69846 and previous config saved to /var/cache/conftool/dbconfig/20241014-221008-ladsgroup.json
22:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69845 and previous config saved to /var/cache/conftool/dbconfig/20241014-220504-ladsgroup.json
22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69844 and previous config saved to /var/cache/conftool/dbconfig/20241014-220134-ladsgroup.json
22:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
22:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69843 and previous config saved to /var/cache/conftool/dbconfig/20241014-215936-ladsgroup.json
21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69842 and previous config saved to /var/cache/conftool/dbconfig/20241014-214958-ladsgroup.json
21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69841 and previous config saved to /var/cache/conftool/dbconfig/20241014-214515-ladsgroup.json
21:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
21:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69840 and previous config saved to /var/cache/conftool/dbconfig/20241014-214429-ladsgroup.json
21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P69839 and previous config saved to /var/cache/conftool/dbconfig/20241014-213902-ladsgroup.json
21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69838 and previous config saved to /var/cache/conftool/dbconfig/20241014-213453-ladsgroup.json
21:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69837 and previous config saved to /var/cache/conftool/dbconfig/20241014-212922-ladsgroup.json
21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69836 and previous config saved to /var/cache/conftool/dbconfig/20241014-212001-ladsgroup.json
21:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
21:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69835 and previous config saved to /var/cache/conftool/dbconfig/20241014-211937-ladsgroup.json
21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69834 and previous config saved to /var/cache/conftool/dbconfig/20241014-210430-ladsgroup.json
20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69833 and previous config saved to /var/cache/conftool/dbconfig/20241014-204923-ladsgroup.json
20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69832 and previous config saved to /var/cache/conftool/dbconfig/20241014-203416-ladsgroup.json
20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69831 and previous config saved to /var/cache/conftool/dbconfig/20241014-202504-ladsgroup.json
20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69830 and previous config saved to /var/cache/conftool/dbconfig/20241014-202439-ladsgroup.json
20:21 TheresNoTime: UTC late backport window done
20:18 samtar@deploy2002: Finished scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) (duration: 08m 14s)
20:14 samtar@deploy2002: samtar, pppery: Continuing with sync
20:12 samtar@deploy2002: samtar, pppery: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:10 samtar@deploy2002: Started scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)
20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69829 and previous config saved to /var/cache/conftool/dbconfig/20241014-200932-ladsgroup.json
19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69828 and previous config saved to /var/cache/conftool/dbconfig/20241014-195425-ladsgroup.json
19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69827 and previous config saved to /var/cache/conftool/dbconfig/20241014-193918-ladsgroup.json
19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69826 and previous config saved to /var/cache/conftool/dbconfig/20241014-192956-ladsgroup.json
19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
18:57 aqu@deploy2002: Finished deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8] (duration: 00m 29s)
18:57 aqu@deploy2002: Started deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8]
18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
18:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69825 and previous config saved to /var/cache/conftool/dbconfig/20241014-185225-ladsgroup.json
18:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8] (duration: 00m 13s)
18:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8]
18:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69824 and previous config saved to /var/cache/conftool/dbconfig/20241014-183718-ladsgroup.json
18:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69823 and previous config saved to /var/cache/conftool/dbconfig/20241014-182211-ladsgroup.json
18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69822 and previous config saved to /var/cache/conftool/dbconfig/20241014-180704-ladsgroup.json
17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69821 and previous config saved to /var/cache/conftool/dbconfig/20241014-170647-ladsgroup.json
17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
17:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69820 and previous config saved to /var/cache/conftool/dbconfig/20241014-170123-ladsgroup.json
16:51 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
16:50 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69819 and previous config saved to /var/cache/conftool/dbconfig/20241014-164616-ladsgroup.json
16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69818 and previous config saved to /var/cache/conftool/dbconfig/20241014-163109-ladsgroup.json
16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69817 and previous config saved to /var/cache/conftool/dbconfig/20241014-161602-ladsgroup.json
16:03 sergi0: Running `sgimeno@mwmaint2002:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
15:52 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:46 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:16 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69816 and previous config saved to /var/cache/conftool/dbconfig/20241014-151546-ladsgroup.json
15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:15 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69815 and previous config saved to /var/cache/conftool/dbconfig/20241014-151521-ladsgroup.json
15:07 elukey@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:06 elukey@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
15:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69814 and previous config saved to /var/cache/conftool/dbconfig/20241014-150014-ladsgroup.json
14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69813 and previous config saved to /var/cache/conftool/dbconfig/20241014-144507-ladsgroup.json
14:43 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
14:43 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:41 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:41 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:39 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69812 and previous config saved to /var/cache/conftool/dbconfig/20241014-143000-ladsgroup.json
14:16 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1177.eqiad.wmnet
14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
14:16 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
14:12 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
14:12 Lucas_WMDE: UTC afternoon backport+config window done
14:10 Lucas_WMDE: [untruncated duration: 06m 48s]
14:09 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) (duration: 0
14:07 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1177.eqiad.wmnet
14:07 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1176.eqiad.wmnet
14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
14:06 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync
14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) synced to
14:03 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
14:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176)
13:58 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1176.eqiad.wmnet
13:46 ladsgroup@deploy2002: Finished scap sync-world: Backport for Update interwiki.php (duration: 07m 00s)
13:45 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@fbcf880]: T375480 (duration: 01m 07s)
13:44 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@fbcf880]: T375480
13:41 ladsgroup@deploy2002: ladsgroup: Continuing with sync
13:41 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Update interwiki.php
13:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1002.eqiad.wmnet
13:35 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
13:34 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
13:31 elukey@cumin1002: START - Cookbook sre.dns.netbox
13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69811 and previous config saved to /var/cache/conftool/dbconfig/20241014-132944-ladsgroup.json
13:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
13:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69810 and previous config saved to /var/cache/conftool/dbconfig/20241014-132918-ladsgroup.json
13:26 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1002.eqiad.wmnet
13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1001.eqiad.wmnet
13:26 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:26 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
13:26 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69809 and previous config saved to /var/cache/conftool/dbconfig/20241014-132409-ladsgroup.json
13:22 elukey@cumin1002: START - Cookbook sre.dns.netbox
13:18 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1001.eqiad.wmnet
13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
13:16 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
13:15 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
13:15 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69808 and previous config saved to /var/cache/conftool/dbconfig/20241014-131411-ladsgroup.json
13:13 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) (duration: 10m 19s)
13:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69807 and previous config saved to /var/cache/conftool/dbconfig/20241014-130904-ladsgroup.json
13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695)
12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69806 and previous config saved to /var/cache/conftool/dbconfig/20241014-125904-ladsgroup.json
12:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69805 and previous config saved to /var/cache/conftool/dbconfig/20241014-125358-ladsgroup.json
12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69804 and previous config saved to /var/cache/conftool/dbconfig/20241014-124554-arnaudb.json
12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
12:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69803 and previous config saved to /var/cache/conftool/dbconfig/20241014-124532-arnaudb.json
12:44 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 12s)
12:44 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69802 and previous config saved to /var/cache/conftool/dbconfig/20241014-124357-ladsgroup.json
12:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-worker1001.eqiad.wmnet
12:43 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
12:41 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
12:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69801 and previous config saved to /var/cache/conftool/dbconfig/20241014-123853-ladsgroup.json
12:37 elukey@cumin1002: START - Cookbook sre.dns.netbox
12:32 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-worker1001.eqiad.wmnet
12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-ctrl1001.eqiad.wmnet
12:32 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:32 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
12:32 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
12:30 hnowlan: removed all aqsv1 service components from aqs* hosts
12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69800 and previous config saved to /var/cache/conftool/dbconfig/20241014-123025-arnaudb.json
12:28 elukey@cumin1002: START - Cookbook sre.dns.netbox
12:23 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-ctrl1001.eqiad.wmnet
12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1001.eqiad.wmnet
12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1001.eqiad.wmnet
12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69799 and previous config saved to /var/cache/conftool/dbconfig/20241014-121518-arnaudb.json
12:09 elukey: increase etcd k8s aux cluster from 3 -> 5 - T344230
12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69798 and previous config saved to /var/cache/conftool/dbconfig/20241014-120011-arnaudb.json
11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private address - aborrero@cumin1002"
11:59 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private address - aborrero@cumin1002"
11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69797 and previous config saved to /var/cache/conftool/dbconfig/20241014-115755-arnaudb.json
11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69796 and previous config saved to /var/cache/conftool/dbconfig/20241014-115732-arnaudb.json
11:56 Dreamy_Jazz: Started time-limited scan on enwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
11:56 aborrero@cumin1002: START - Cookbook sre.dns.netbox
11:52 btullis@cumin1002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2194.codfw.wmnet onto db2227.codfw.wmnet
11:50 btullis@cumin1002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
11:50 hnowlan@deploy2002: Finished deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761) (duration: 15m 38s)
11:45 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69794 and previous config saved to /var/cache/conftool/dbconfig/20241014-114341-ladsgroup.json
11:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69793 and previous config saved to /var/cache/conftool/dbconfig/20241014-114316-ladsgroup.json
11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69792 and previous config saved to /var/cache/conftool/dbconfig/20241014-114225-arnaudb.json
11:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69791 and previous config saved to /var/cache/conftool/dbconfig/20241014-113941-arnaudb.json
11:34 hnowlan@deploy2002: Started deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761)
11:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 08s)
11:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69790 and previous config saved to /var/cache/conftool/dbconfig/20241014-112809-ladsgroup.json
11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69789 and previous config saved to /var/cache/conftool/dbconfig/20241014-112719-arnaudb.json
11:26 claime: Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
11:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69788 and previous config saved to /var/cache/conftool/dbconfig/20241014-112434-arnaudb.json
11:16 ladsgroup@deploy2002: Finished scap sync-world: Creating bclwikisource (T377084) (duration: 06m 49s)
11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69787 and previous config saved to /var/cache/conftool/dbconfig/20241014-111302-ladsgroup.json
11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69786 and previous config saved to /var/cache/conftool/dbconfig/20241014-111211-arnaudb.json
11:10 ladsgroup@deploy2002: Started scap sync-world: Creating bclwikisource (T377084)
11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69785 and previous config saved to /var/cache/conftool/dbconfig/20241014-110956-arnaudb.json
11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69784 and previous config saved to /var/cache/conftool/dbconfig/20241014-110933-arnaudb.json
11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69783 and previous config saved to /var/cache/conftool/dbconfig/20241014-110927-arnaudb.json
11:07 ladsgroup@deploy2002: Finished scap sync-world: Creating ibawiki (T376568) (duration: 06m 45s)
11:05 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
11:01 ladsgroup@deploy2002: Started scap sync-world: Creating ibawiki (T376568)
11:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
10:58 ladsgroup@deploy2002: Finished scap sync-world: Creating annwiki (T376332) (duration: 06m 45s)
10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69782 and previous config saved to /var/cache/conftool/dbconfig/20241014-105755-ladsgroup.json
10:55 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69781 and previous config saved to /var/cache/conftool/dbconfig/20241014-105426-arnaudb.json
10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69780 and previous config saved to /var/cache/conftool/dbconfig/20241014-105421-arnaudb.json
10:52 ladsgroup@deploy2002: Started scap sync-world: Creating annwiki (T376332)
10:51 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69779 and previous config saved to /var/cache/conftool/dbconfig/20241014-104941-ladsgroup.json
10:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69778 and previous config saved to /var/cache/conftool/dbconfig/20241014-104916-ladsgroup.json
10:48 ladsgroup@deploy2002: Finished scap sync-world: Creating tddwiki (T375422) (duration: 06m 46s)
10:44 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
10:44 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
10:42 ladsgroup@deploy2002: Started scap sync-world: Creating tddwiki (T375422)
10:40 ladsgroup@deploy2002: Finished scap sync-world: Creating nrwiki (T375087) (duration: 06m 54s)
10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69777 and previous config saved to /var/cache/conftool/dbconfig/20241014-103919-arnaudb.json
10:35 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
10:35 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69776 and previous config saved to /var/cache/conftool/dbconfig/20241014-103410-ladsgroup.json
10:33 ladsgroup@deploy2002: Started scap sync-world: Creating nrwiki (T375087)
10:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) (duration: 06m 45s)
10:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:27 ladsgroup@deploy2002: ladsgroup: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:25 ladsgroup@deploy2002: Started scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421)
10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69775 and previous config saved to /var/cache/conftool/dbconfig/20241014-102412-arnaudb.json
10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69774 and previous config saved to /var/cache/conftool/dbconfig/20241014-102256-arnaudb.json
10:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69773 and previous config saved to /var/cache/conftool/dbconfig/20241014-102234-arnaudb.json
10:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69772 and previous config saved to /var/cache/conftool/dbconfig/20241014-101903-ladsgroup.json
10:17 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2227.codfw.wmnet
10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69771 and previous config saved to /var/cache/conftool/dbconfig/20241014-101354-ladsgroup.json
10:13 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1004.wikimedia.org
10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69770 and previous config saved to /var/cache/conftool/dbconfig/20241014-101246-ladsgroup.json
10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69769 and previous config saved to /var/cache/conftool/dbconfig/20241014-100727-arnaudb.json
10:06 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1004.wikimedia.org
10:06 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.wikimedia.org
10:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69768 and previous config saved to /var/cache/conftool/dbconfig/20241014-100356-ladsgroup.json
10:00 akosiaris: powercycle rdb1014 T376961
10:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.wikimedia.org
10:00 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
10:00 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
10:00 ladsgroup@deploy2002: Finished scap sync-world: Creating rskwiki (T374963) (duration: 18m 38s)
09:59 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
09:59 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69767 and previous config saved to /var/cache/conftool/dbconfig/20241014-095354-arnaudb.json
09:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
09:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69766 and previous config saved to /var/cache/conftool/dbconfig/20241014-095331-arnaudb.json
09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69765 and previous config saved to /var/cache/conftool/dbconfig/20241014-095220-arnaudb.json
09:41 ladsgroup@deploy2002: Started scap sync-world: Creating rskwiki (T374963)
09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69764 and previous config saved to /var/cache/conftool/dbconfig/20241014-093824-arnaudb.json
09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69763 and previous config saved to /var/cache/conftool/dbconfig/20241014-093713-arnaudb.json
09:36 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69762 and previous config saved to /var/cache/conftool/dbconfig/20241014-093459-arnaudb.json
09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69761 and previous config saved to /var/cache/conftool/dbconfig/20241014-093418-arnaudb.json
09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69760 and previous config saved to /var/cache/conftool/dbconfig/20241014-092317-arnaudb.json
09:21 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69759 and previous config saved to /var/cache/conftool/dbconfig/20241014-091911-arnaudb.json
09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69758 and previous config saved to /var/cache/conftool/dbconfig/20241014-090810-arnaudb.json
09:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69757 and previous config saved to /var/cache/conftool/dbconfig/20241014-090403-arnaudb.json
09:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69756 and previous config saved to /var/cache/conftool/dbconfig/20241014-090340-ladsgroup.json
09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
09:01 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
08:58 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
08:55 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
08:55 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
08:49 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
08:49 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
08:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69755 and previous config saved to /var/cache/conftool/dbconfig/20241014-084856-arnaudb.json
08:48 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69754 and previous config saved to /var/cache/conftool/dbconfig/20241014-084643-arnaudb.json
08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69753 and previous config saved to /var/cache/conftool/dbconfig/20241014-084620-arnaudb.json
08:43 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
08:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
08:40 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
08:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69752 and previous config saved to /var/cache/conftool/dbconfig/20241014-083113-arnaudb.json
08:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69751 and previous config saved to /var/cache/conftool/dbconfig/20241014-081606-arnaudb.json
08:13 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
08:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:12 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:10 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
08:08 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69750 and previous config saved to /var/cache/conftool/dbconfig/20241014-080744-arnaudb.json
08:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
08:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69749 and previous config saved to /var/cache/conftool/dbconfig/20241014-080721-arnaudb.json
08:07 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
08:02 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
08:01 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69748 and previous config saved to /var/cache/conftool/dbconfig/20241014-080059-arnaudb.json
08:00 jayme@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM kubestagemaster2005.codfw.wmnet
08:00 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69747 and previous config saved to /var/cache/conftool/dbconfig/20241014-075845-arnaudb.json
07:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
07:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69746 and previous config saved to /var/cache/conftool/dbconfig/20241014-075823-arnaudb.json
07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69745 and previous config saved to /var/cache/conftool/dbconfig/20241014-075214-arnaudb.json
07:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69744 and previous config saved to /var/cache/conftool/dbconfig/20241014-074317-arnaudb.json
07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69743 and previous config saved to /var/cache/conftool/dbconfig/20241014-073707-arnaudb.json
07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69742 and previous config saved to /var/cache/conftool/dbconfig/20241014-072810-arnaudb.json
07:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69741 and previous config saved to /var/cache/conftool/dbconfig/20241014-072201-arnaudb.json
07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69740 and previous config saved to /var/cache/conftool/dbconfig/20241014-071302-arnaudb.json
07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69739 and previous config saved to /var/cache/conftool/dbconfig/20241014-071048-arnaudb.json
07:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69738 and previous config saved to /var/cache/conftool/dbconfig/20241014-071026-arnaudb.json
06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69737 and previous config saved to /var/cache/conftool/dbconfig/20241014-065519-arnaudb.json
06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69736 and previous config saved to /var/cache/conftool/dbconfig/20241014-064012-arnaudb.json
06:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69735 and previous config saved to /var/cache/conftool/dbconfig/20241014-062505-arnaudb.json
06:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69734 and previous config saved to /var/cache/conftool/dbconfig/20241014-062249-arnaudb.json
06:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
06:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
06:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69733 and previous config saved to /var/cache/conftool/dbconfig/20241014-062135-arnaudb.json
06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
06:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
06:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
04:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
04:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69732 and previous config saved to /var/cache/conftool/dbconfig/20241014-042443-ladsgroup.json
04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69731 and previous config saved to /var/cache/conftool/dbconfig/20241014-040936-ladsgroup.json
03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69730 and previous config saved to /var/cache/conftool/dbconfig/20241014-035429-ladsgroup.json
03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69729 and previous config saved to /var/cache/conftool/dbconfig/20241014-033922-ladsgroup.json
03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69728 and previous config saved to /var/cache/conftool/dbconfig/20241014-033237-ladsgroup.json
03:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
03:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
03:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
03:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
03:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69727 and previous config saved to /var/cache/conftool/dbconfig/20241014-032710-ladsgroup.json
03:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69726 and previous config saved to /var/cache/conftool/dbconfig/20241014-031203-ladsgroup.json
02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69725 and previous config saved to /var/cache/conftool/dbconfig/20241014-025656-ladsgroup.json
02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69724 and previous config saved to /var/cache/conftool/dbconfig/20241014-024149-ladsgroup.json
02:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69723 and previous config saved to /var/cache/conftool/dbconfig/20241014-023616-ladsgroup.json
02:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
02:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69722 and previous config saved to /var/cache/conftool/dbconfig/20241014-023551-ladsgroup.json
02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69721 and previous config saved to /var/cache/conftool/dbconfig/20241014-022044-ladsgroup.json
02:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69720 and previous config saved to /var/cache/conftool/dbconfig/20241014-020537-ladsgroup.json
01:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69719 and previous config saved to /var/cache/conftool/dbconfig/20241014-015030-ladsgroup.json
01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69718 and previous config saved to /var/cache/conftool/dbconfig/20241014-014435-ladsgroup.json
01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69717 and previous config saved to /var/cache/conftool/dbconfig/20241014-014410-ladsgroup.json
01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69716 and previous config saved to /var/cache/conftool/dbconfig/20241014-012903-ladsgroup.json
01:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69715 and previous config saved to /var/cache/conftool/dbconfig/20241014-011356-ladsgroup.json
00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69714 and previous config saved to /var/cache/conftool/dbconfig/20241014-005849-ladsgroup.json
00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69713 and previous config saved to /var/cache/conftool/dbconfig/20241014-005056-ladsgroup.json
00:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69712 and previous config saved to /var/cache/conftool/dbconfig/20241014-005042-ladsgroup.json
00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69711 and previous config saved to /var/cache/conftool/dbconfig/20241014-003534-ladsgroup.json
00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69710 and previous config saved to /var/cache/conftool/dbconfig/20241014-002027-ladsgroup.json
00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69709 and previous config saved to /var/cache/conftool/dbconfig/20241014-000520-ladsgroup.json

2024-10-13

23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69708 and previous config saved to /var/cache/conftool/dbconfig/20241013-235726-ladsgroup.json
23:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
23:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69707 and previous config saved to /var/cache/conftool/dbconfig/20241013-235701-ladsgroup.json
23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69706 and previous config saved to /var/cache/conftool/dbconfig/20241013-234154-ladsgroup.json
23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69705 and previous config saved to /var/cache/conftool/dbconfig/20241013-232647-ladsgroup.json
23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69704 and previous config saved to /var/cache/conftool/dbconfig/20241013-231140-ladsgroup.json
23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69703 and previous config saved to /var/cache/conftool/dbconfig/20241013-230403-ladsgroup.json
23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
12:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
12:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2147', diff saved to https://phabricator.wikimedia.org/P69702 and previous config saved to /var/cache/conftool/dbconfig/20241013-121154-arnaudb.json
10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69701 and previous config saved to /var/cache/conftool/dbconfig/20241013-102205-ladsgroup.json
10:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69700 and previous config saved to /var/cache/conftool/dbconfig/20241013-100658-ladsgroup.json
09:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69699 and previous config saved to /var/cache/conftool/dbconfig/20241013-095151-ladsgroup.json
09:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69698 and previous config saved to /var/cache/conftool/dbconfig/20241013-093644-ladsgroup.json

2024-10-11

22:18 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
21:38 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
21:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
21:26 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
21:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
21:14 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:49 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
16:40 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 00m 42s)
16:39 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
16:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 01m 06s)
16:38 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
16:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
16:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
16:11 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@1fb69c4]: T376456 (duration: 01m 15s)
16:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:10 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@1fb69c4]: T376456
15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:40 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
15:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:48 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
14:48 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
14:47 urandom: upgrading data-gateway to v1.0.10
14:46 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
14:46 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
14:39 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
14:38 eevans@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
14:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 25s)
14:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
13:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: T376988', diff saved to https://phabricator.wikimedia.org/P69695 and previous config saved to /var/cache/conftool/dbconfig/20241011-135903-arnaudb.json
13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: T376988', diff saved to https://phabricator.wikimedia.org/P69694 and previous config saved to /var/cache/conftool/dbconfig/20241011-134357-arnaudb.json
13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: T376988', diff saved to https://phabricator.wikimedia.org/P69693 and previous config saved to /var/cache/conftool/dbconfig/20241011-132852-arnaudb.json
13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: T376988', diff saved to https://phabricator.wikimedia.org/P69692 and previous config saved to /var/cache/conftool/dbconfig/20241011-131347-arnaudb.json
13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
13:12 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
13:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: T376988', diff saved to https://phabricator.wikimedia.org/P69691 and previous config saved to /var/cache/conftool/dbconfig/20241011-125841-arnaudb.json
12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: T376988', diff saved to https://phabricator.wikimedia.org/P69690 and previous config saved to /var/cache/conftool/dbconfig/20241011-124336-arnaudb.json
12:37 hashar: Restarting Gerrit
12:34 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scandium.eqiad.wmnet
12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
12:34 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 2%: T376988', diff saved to https://phabricator.wikimedia.org/P69688 and previous config saved to /var/cache/conftool/dbconfig/20241011-122830-arnaudb.json
12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: T376988', diff saved to https://phabricator.wikimedia.org/P69687 and previous config saved to /var/cache/conftool/dbconfig/20241011-121325-arnaudb.json
11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69686 and previous config saved to /var/cache/conftool/dbconfig/20241011-114446-ladsgroup.json
11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
11:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69685 and previous config saved to /var/cache/conftool/dbconfig/20241011-114424-ladsgroup.json
11:36 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69684 and previous config saved to /var/cache/conftool/dbconfig/20241011-112917-ladsgroup.json
11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2092.codfw.wmnet
11:27 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2092.codfw.wmnet
11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2092.codfw.wmnet
11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2092.codfw.wmnet
11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2092.codfw.wmnet with OS bullseye
11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69683 and previous config saved to /var/cache/conftool/dbconfig/20241011-111410-ladsgroup.json
11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69682 and previous config saved to /var/cache/conftool/dbconfig/20241011-105903-ladsgroup.json
10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
10:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
10:56 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
10:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
10:53 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
10:50 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
10:50 fabfur: enabled puppet on R:acme_chief::cert for T376800
10:50 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
10:47 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host acmechief2002.codfw.wmnet
10:44 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief2002.codfw.wmnet
10:44 fabfur: rebooting acmechief1002|2002 (sequentially) (T376800)
10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1002.eqiad.wmnet
10:37 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief1002.eqiad.wmnet
10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2092.codfw.wmnet with OS bullseye
10:34 fabfur: disabled puppet on acmechief1002 (T376800)
10:33 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
10:33 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
10:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:27 jynus@cumin1002: dbctl commit (dc=all): 'depool db2175', diff saved to https://phabricator.wikimedia.org/P69680 and previous config saved to /var/cache/conftool/dbconfig/20241011-102706-jynus.json
10:26 fabfur: disabling puppet on R:acme_chief::cert for T376800
10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69678 and previous config saved to /var/cache/conftool/dbconfig/20241011-095847-ladsgroup.json
09:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
09:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69677 and previous config saved to /var/cache/conftool/dbconfig/20241011-095826-ladsgroup.json
09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69676 and previous config saved to /var/cache/conftool/dbconfig/20241011-094319-ladsgroup.json
09:41 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
09:38 akosiaris@cumin1002: START - Cookbook sre.hosts.decommission for hosts scandium.eqiad.wmnet
09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69675 and previous config saved to /var/cache/conftool/dbconfig/20241011-092812-ladsgroup.json
09:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
09:18 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69674 and previous config saved to /var/cache/conftool/dbconfig/20241011-091305-ladsgroup.json
08:19 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
08:12 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
08:10 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
08:10 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
08:02 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
08:00 moritzm: upload ircstream 0.13.0+wmf12u2 to apt.wikimedia.org (sync to latest git and the async_broadcast feature branch) T376014
07:59 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
07:56 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
02:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69673 and previous config saved to /var/cache/conftool/dbconfig/20241011-021156-arnaudb.json
01:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69672 and previous config saved to /var/cache/conftool/dbconfig/20241011-015649-arnaudb.json
01:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69671 and previous config saved to /var/cache/conftool/dbconfig/20241011-014142-arnaudb.json
01:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69670 and previous config saved to /var/cache/conftool/dbconfig/20241011-012635-arnaudb.json
01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69669 and previous config saved to /var/cache/conftool/dbconfig/20241011-012424-arnaudb.json
01:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
01:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69668 and previous config saved to /var/cache/conftool/dbconfig/20241011-012401-arnaudb.json
01:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69667 and previous config saved to /var/cache/conftool/dbconfig/20241011-010854-arnaudb.json
00:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69666 and previous config saved to /var/cache/conftool/dbconfig/20241011-005347-arnaudb.json
00:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69665 and previous config saved to /var/cache/conftool/dbconfig/20241011-003840-arnaudb.json

2024-10-10

23:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69664 and previous config saved to /var/cache/conftool/dbconfig/20241010-233814-arnaudb.json
23:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
23:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69663 and previous config saved to /var/cache/conftool/dbconfig/20241010-233752-arnaudb.json
23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69662 and previous config saved to /var/cache/conftool/dbconfig/20241010-232245-arnaudb.json
23:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69661 and previous config saved to /var/cache/conftool/dbconfig/20241010-230738-arnaudb.json
22:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69660 and previous config saved to /var/cache/conftool/dbconfig/20241010-225231-arnaudb.json
22:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69659 and previous config saved to /var/cache/conftool/dbconfig/20241010-225019-arnaudb.json
22:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
22:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69658 and previous config saved to /var/cache/conftool/dbconfig/20241010-224957-arnaudb.json
22:37 cstone: payments-wiki upgraded from ebb42c67 to 40e4a592
22:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69657 and previous config saved to /var/cache/conftool/dbconfig/20241010-223450-arnaudb.json
22:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69656 and previous config saved to /var/cache/conftool/dbconfig/20241010-221943-arnaudb.json
22:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69655 and previous config saved to /var/cache/conftool/dbconfig/20241010-220437-arnaudb.json
22:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69654 and previous config saved to /var/cache/conftool/dbconfig/20241010-220125-arnaudb.json
22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
22:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
22:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
22:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69653 and previous config saved to /var/cache/conftool/dbconfig/20241010-220043-arnaudb.json
21:52 jforrester@deploy2002: Finished deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939 (duration: 00m 08s)
21:52 jforrester@deploy2002: Started deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939
21:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69652 and previous config saved to /var/cache/conftool/dbconfig/20241010-214536-arnaudb.json
21:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69651 and previous config saved to /var/cache/conftool/dbconfig/20241010-213029-arnaudb.json
21:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69650 and previous config saved to /var/cache/conftool/dbconfig/20241010-211522-arnaudb.json
21:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e] (duration: 00m 51s)
21:04 aqu@deploy2002: Started deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e]
21:04 thcipriani@deploy2002: Finished scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901) (duration: 08m 56s)
20:59 thcipriani@deploy2002: jforrester, thcipriani: Continuing with sync
20:57 thcipriani@deploy2002: jforrester, thcipriani: Backport for Update VE core submodule to master (c98f3a542) (T376901) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:55 thcipriani@deploy2002: Started scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901)
20:27 eileen: config revision changed from 150b02a9 to 3c6d2054
20:23 thcipriani@deploy2002: Finished scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) (duration: 08m 34s)
20:18 thcipriani@deploy2002: bpirkle, thcipriani: Continuing with sync
20:16 thcipriani@deploy2002: bpirkle, thcipriani: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69649 and previous config saved to /var/cache/conftool/dbconfig/20241010-201456-arnaudb.json
20:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
20:14 thcipriani@deploy2002: Started scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512)
20:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69648 and previous config saved to /var/cache/conftool/dbconfig/20241010-201433-arnaudb.json
20:05 eileen: civicrm upgraded from 07dee21c to ff3144dd
19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69647 and previous config saved to /var/cache/conftool/dbconfig/20241010-195926-arnaudb.json
19:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69646 and previous config saved to /var/cache/conftool/dbconfig/20241010-194419-arnaudb.json
19:43 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
19:43 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503]
19:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69645 and previous config saved to /var/cache/conftool/dbconfig/20241010-192912-arnaudb.json
19:23 rzl@deploy2002: Finished scap sync-world: chart version bump for 1078720 (duration: 02m 09s)
19:21 rzl@deploy2002: Started scap sync-world: chart version bump for 1078720
19:06 eileen: config revision changed from ae4a5be9 to 150b02a9
18:50 papaul: maintenance on mr1-eqiad complete
18:44 eileen: tools upgraded from 632bf430 to 62f2d170
18:29 eileen: tools upgraded from e9c05e30 to 632bf430
18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69644 and previous config saved to /var/cache/conftool/dbconfig/20241010-182846-arnaudb.json
18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69643 and previous config saved to /var/cache/conftool/dbconfig/20241010-182808-arnaudb.json
18:14 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69642 and previous config saved to /var/cache/conftool/dbconfig/20241010-181301-arnaudb.json
18:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
18:00 papaul: ongoing maintenance on mr1-eqiad
17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69641 and previous config saved to /var/cache/conftool/dbconfig/20241010-175754-arnaudb.json
17:57 root@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
17:54 root@cumin1002: START - Cookbook sre.puppet.renew-cert for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
17:47 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69640 and previous config saved to /var/cache/conftool/dbconfig/20241010-174247-arnaudb.json
17:42 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
17:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
17:38 swfrench-wmf: removing echostore eqiad deployment (depooled) to unblock breaking change - T376766
17:34 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:34 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:34 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:33 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:33 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:32 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:25 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
17:20 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
17:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:04 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
16:59 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
16:58 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:51 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
16:51 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
16:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
16:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
16:49 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
16:47 swfrench-wmf: removing echostore codfw deployment (depooled) to unblock breaking change - T376766
16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69639 and previous config saved to /var/cache/conftool/dbconfig/20241010-164221-arnaudb.json
16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
16:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69638 and previous config saved to /var/cache/conftool/dbconfig/20241010-164159-arnaudb.json
16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
16:30 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69637 and previous config saved to /var/cache/conftool/dbconfig/20241010-162652-arnaudb.json
16:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
16:23 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
16:21 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
16:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
16:13 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69636 and previous config saved to /var/cache/conftool/dbconfig/20241010-161145-arnaudb.json
16:04 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
16:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
16:02 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69635 and previous config saved to /var/cache/conftool/dbconfig/20241010-155638-arnaudb.json
15:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69634 and previous config saved to /var/cache/conftool/dbconfig/20241010-155426-arnaudb.json
15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
15:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
15:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69633 and previous config saved to /var/cache/conftool/dbconfig/20241010-155345-arnaudb.json
15:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
15:40 papaul: mr1-drmrs maintenance complete
15:39 dancy@deploy2002: Installation of scap version "4.110.0" completed for 211 hosts
15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69632 and previous config saved to /var/cache/conftool/dbconfig/20241010-153838-arnaudb.json
15:35 dancy@deploy2002: Installing scap version "4.110.0" for 211 hosts
15:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:28 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
15:25 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
15:23 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69631 and previous config saved to /var/cache/conftool/dbconfig/20241010-152331-arnaudb.json
15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69630 and previous config saved to /var/cache/conftool/dbconfig/20241010-150824-arnaudb.json
15:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69629 and previous config saved to /var/cache/conftool/dbconfig/20241010-150512-arnaudb.json
15:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
15:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69628 and previous config saved to /var/cache/conftool/dbconfig/20241010-150433-arnaudb.json
15:02 papaul: ongoing maintenance on mr1-drmrs
14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
14:56 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503]
14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69626 and previous config saved to /var/cache/conftool/dbconfig/20241010-144926-arnaudb.json
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69625 and previous config saved to /var/cache/conftool/dbconfig/20241010-143713-arnaudb.json
14:34 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
14:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69624 and previous config saved to /var/cache/conftool/dbconfig/20241010-143419-arnaudb.json
14:28 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69623 and previous config saved to /var/cache/conftool/dbconfig/20241010-142206-arnaudb.json
14:19 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
14:19 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69622 and previous config saved to /var/cache/conftool/dbconfig/20241010-141912-arnaudb.json
14:18 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
14:18 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69621 and previous config saved to /var/cache/conftool/dbconfig/20241010-141704-arnaudb.json
14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
14:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
14:16 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69620 and previous config saved to /var/cache/conftool/dbconfig/20241010-141642-arnaudb.json
14:16 moritzm: failover Ganeti masters in magru to secondary node
14:12 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69619 and previous config saved to /var/cache/conftool/dbconfig/20241010-140659-arnaudb.json
14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69618 and previous config saved to /var/cache/conftool/dbconfig/20241010-140135-arnaudb.json
13:59 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
13:59 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69617 and previous config saved to /var/cache/conftool/dbconfig/20241010-135152-arnaudb.json
13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69616 and previous config saved to /var/cache/conftool/dbconfig/20241010-134926-arnaudb.json
13:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
13:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
13:48 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
13:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69615 and previous config saved to /var/cache/conftool/dbconfig/20241010-134628-arnaudb.json
13:46 Lucas_WMDE: UTC afternoon backport+config window done
13:45 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245) (duration: 07m 16s)
13:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Continuing with sync
13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Backport for Use ?? instead of default value in getRawVal() (T376245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245)
13:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) (duration: 16m 09s)
13:36 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
13:35 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns4003.wikimedia.org
13:35 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns4003.wikimedia.org
13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Continuing with sync
13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69613 and previous config saved to /var/cache/conftool/dbconfig/20241010-133121-arnaudb.json
13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69612 and previous config saved to /var/cache/conftool/dbconfig/20241010-133113-arnaudb.json
13:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
13:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69611 and previous config saved to /var/cache/conftool/dbconfig/20241010-133049-arnaudb.json
13:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
13:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433)
13:17 dreamyjazz@deploy2002: Finished scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) (duration: 09m 12s)
13:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
13:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
13:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69610 and previous config saved to /var/cache/conftool/dbconfig/20241010-131542-arnaudb.json
13:12 dreamyjazz@deploy2002: dreamyjazz, kharlan: Continuing with sync
13:11 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1004.eqiad.wmnet
13:11 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1004.eqiad.wmnet
13:10 dreamyjazz@deploy2002: dreamyjazz, kharlan: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
13:08 dreamyjazz@deploy2002: Started scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517)
13:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
13:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69609 and previous config saved to /var/cache/conftool/dbconfig/20241010-130035-arnaudb.json
12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
12:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69608 and previous config saved to /var/cache/conftool/dbconfig/20241010-124528-arnaudb.json
12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69607 and previous config saved to /var/cache/conftool/dbconfig/20241010-124319-arnaudb.json
12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
12:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
12:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
12:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69606 and previous config saved to /var/cache/conftool/dbconfig/20241010-124241-arnaudb.json
12:38 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bookworm
12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69605 and previous config saved to /var/cache/conftool/dbconfig/20241010-122734-arnaudb.json
12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
12:16 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69604 and previous config saved to /var/cache/conftool/dbconfig/20241010-121227-arnaudb.json
12:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bookworm
11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69603 and previous config saved to /var/cache/conftool/dbconfig/20241010-115720-arnaudb.json
11:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69599 and previous config saved to /var/cache/conftool/dbconfig/20241010-114042-arnaudb.json
11:34 zabe@deploy2002: Finished scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 06m 58s)
11:29 zabe@deploy2002: zabe: Continuing with sync
11:29 zabe@deploy2002: zabe: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:27 zabe@deploy2002: Started scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490)
11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69598 and previous config saved to /var/cache/conftool/dbconfig/20241010-112535-arnaudb.json
11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7001.magru.wmnet
11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7001.magru.wmnet
11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69597 and previous config saved to /var/cache/conftool/dbconfig/20241010-111028-arnaudb.json
11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69596 and previous config saved to /var/cache/conftool/dbconfig/20241010-110920-arnaudb.json
11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
11:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69595 and previous config saved to /var/cache/conftool/dbconfig/20241010-110857-arnaudb.json
11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
10:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69594 and previous config saved to /var/cache/conftool/dbconfig/20241010-105350-arnaudb.json
10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
10:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69593 and previous config saved to /var/cache/conftool/dbconfig/20241010-103843-arnaudb.json
10:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69592 and previous config saved to /var/cache/conftool/dbconfig/20241010-102336-arnaudb.json
10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69591 and previous config saved to /var/cache/conftool/dbconfig/20241010-102127-arnaudb.json
10:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
10:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69590 and previous config saved to /var/cache/conftool/dbconfig/20241010-102104-arnaudb.json
10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69589 and previous config saved to /var/cache/conftool/dbconfig/20241010-100557-arnaudb.json
09:54 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubestage1004.eqiad.wmnet
09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1002.wikimedia.org
09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1004.eqiad.wmnet
09:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
09:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69587 and previous config saved to /var/cache/conftool/dbconfig/20241010-095050-arnaudb.json
09:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
09:49 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1002.wikimedia.org
09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69586 and previous config saved to /var/cache/conftool/dbconfig/20241010-093544-arnaudb.json
09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69585 and previous config saved to /var/cache/conftool/dbconfig/20241010-093335-arnaudb.json
09:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
09:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69584 and previous config saved to /var/cache/conftool/dbconfig/20241010-093313-arnaudb.json
09:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
09:30 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69583 and previous config saved to /var/cache/conftool/dbconfig/20241010-092735-arnaudb.json
09:21 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.26 refs T375657
09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69582 and previous config saved to /var/cache/conftool/dbconfig/20241010-091806-arnaudb.json
09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
09:14 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69581 and previous config saved to /var/cache/conftool/dbconfig/20241010-091228-arnaudb.json
09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
09:10 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
09:10 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
09:07 aklapper@deploy2002: Finished scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) (duration: 12m 09s)
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
09:03 aklapper@deploy2002: hashar, aklapper: Continuing with sync
09:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69580 and previous config saved to /var/cache/conftool/dbconfig/20241010-090259-arnaudb.json
09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
08:57 aklapper@deploy2002: hashar, aklapper: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69579 and previous config saved to /var/cache/conftool/dbconfig/20241010-085721-arnaudb.json
08:55 aklapper@deploy2002: Started scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)
08:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69578 and previous config saved to /var/cache/conftool/dbconfig/20241010-084752-arnaudb.json
08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69577 and previous config saved to /var/cache/conftool/dbconfig/20241010-084543-arnaudb.json
08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69576 and previous config saved to /var/cache/conftool/dbconfig/20241010-084521-arnaudb.json
08:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69575 and previous config saved to /var/cache/conftool/dbconfig/20241010-084214-arnaudb.json
08:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
08:41 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69574 and previous config saved to /var/cache/conftool/dbconfig/20241010-084003-arnaudb.json
08:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
08:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
08:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: T376868', diff saved to https://phabricator.wikimedia.org/P69573 and previous config saved to /var/cache/conftool/dbconfig/20241010-083347-arnaudb.json
08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69572 and previous config saved to /var/cache/conftool/dbconfig/20241010-083013-arnaudb.json
08:21 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
08:21 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: T376868', diff saved to https://phabricator.wikimedia.org/P69571 and previous config saved to /var/cache/conftool/dbconfig/20241010-081841-arnaudb.json
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69570 and previous config saved to /var/cache/conftool/dbconfig/20241010-081506-arnaudb.json
08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: T376867', diff saved to https://phabricator.wikimedia.org/P69569 and previous config saved to /var/cache/conftool/dbconfig/20241010-080711-arnaudb.json
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1002.eqiad.wmnet
08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: T376868', diff saved to https://phabricator.wikimedia.org/P69568 and previous config saved to /var/cache/conftool/dbconfig/20241010-080336-arnaudb.json
08:02 moritzm: irc.wikimedia.org now directs to the ircstream implementation on irc1003.wikimedia.org T376014
08:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69567 and previous config saved to /var/cache/conftool/dbconfig/20241010-075959-arnaudb.json
07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69566 and previous config saved to /var/cache/conftool/dbconfig/20241010-075951-arnaudb.json
07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69565 and previous config saved to /var/cache/conftool/dbconfig/20241010-075911-arnaudb.json
07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: T376867', diff saved to https://phabricator.wikimedia.org/P69564 and previous config saved to /var/cache/conftool/dbconfig/20241010-075206-arnaudb.json
07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: T376868', diff saved to https://phabricator.wikimedia.org/P69563 and previous config saved to /var/cache/conftool/dbconfig/20241010-074831-arnaudb.json
07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
07:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69562 and previous config saved to /var/cache/conftool/dbconfig/20241010-074404-arnaudb.json
07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: T376867', diff saved to https://phabricator.wikimedia.org/P69561 and previous config saved to /var/cache/conftool/dbconfig/20241010-073700-arnaudb.json
07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudidm2001-dev.codfw.wmnet
07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
07:33 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: T376868', diff saved to https://phabricator.wikimedia.org/P69560 and previous config saved to /var/cache/conftool/dbconfig/20241010-073326-arnaudb.json
07:33 awight: UTC morning deployments done.
07:32 hashar: Stopped gerrit service on gerrit2003.codfw.wmnet since it is not starting up properly | T372804
07:32 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:31 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:30 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69559 and previous config saved to /var/cache/conftool/dbconfig/20241010-072857-arnaudb.json
07:28 awight@deploy2002: Finished scap sync-world: Backport for [config] Rename moved gadget name setting (T362771) (duration: 09m 22s)
07:25 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
07:23 awight@deploy2002: awight, wmde-fisch: Continuing with sync
07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: T376867', diff saved to https://phabricator.wikimedia.org/P69558 and previous config saved to /var/cache/conftool/dbconfig/20241010-072155-arnaudb.json
07:21 awight@deploy2002: awight, wmde-fisch: Backport for [config] Rename moved gadget name setting (T362771) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
07:18 awight@deploy2002: Started scap sync-world: Backport for [config] Rename moved gadget name setting (T362771)
07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: T376868', diff saved to https://phabricator.wikimedia.org/P69557 and previous config saved to /var/cache/conftool/dbconfig/20241010-071820-arnaudb.json
07:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1236 T376868', diff saved to https://phabricator.wikimedia.org/P69556 and previous config saved to /var/cache/conftool/dbconfig/20241010-071721-arnaudb.json
07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
07:15 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cloudidm2001-dev.codfw.wmnet
07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
07:15 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary T376868', diff saved to https://phabricator.wikimedia.org/P69555 and previous config saved to /var/cache/conftool/dbconfig/20241010-071453-arnaudb.json
07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
07:14 arnaudb: Starting s7 eqiad failover from db1236 to db1181 - T376868
07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69554 and previous config saved to /var/cache/conftool/dbconfig/20241010-071350-arnaudb.json
07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69553 and previous config saved to /var/cache/conftool/dbconfig/20241010-071242-arnaudb.json
07:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
07:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69552 and previous config saved to /var/cache/conftool/dbconfig/20241010-071219-arnaudb.json
07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T376868', diff saved to https://phabricator.wikimedia.org/P69551 and previous config saved to /var/cache/conftool/dbconfig/20241010-070843-arnaudb.json
07:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
07:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
07:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: T376867', diff saved to https://phabricator.wikimedia.org/P69550 and previous config saved to /var/cache/conftool/dbconfig/20241010-070650-arnaudb.json
06:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69549 and previous config saved to /var/cache/conftool/dbconfig/20241010-065712-arnaudb.json
06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: T376867', diff saved to https://phabricator.wikimedia.org/P69548 and previous config saved to /var/cache/conftool/dbconfig/20241010-065145-arnaudb.json
06:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1230 T376867', diff saved to https://phabricator.wikimedia.org/P69547 and previous config saved to /var/cache/conftool/dbconfig/20241010-065048-arnaudb.json
06:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary T376867', diff saved to https://phabricator.wikimedia.org/P69546 and previous config saved to /var/cache/conftool/dbconfig/20241010-064827-arnaudb.json
06:47 arnaudb: Starting s5 eqiad failover from db1230 to db1183 - T376867
06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T376867', diff saved to https://phabricator.wikimedia.org/P69545 and previous config saved to /var/cache/conftool/dbconfig/20241010-064219-arnaudb.json
06:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69544 and previous config saved to /var/cache/conftool/dbconfig/20241010-064206-arnaudb.json
06:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
06:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69543 and previous config saved to /var/cache/conftool/dbconfig/20241010-062659-arnaudb.json
06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69542 and previous config saved to /var/cache/conftool/dbconfig/20241010-062450-arnaudb.json
06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
06:10 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
06:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
06:03 XioNoX: cr2-eqsin> request vmhost snapshot - T375961
03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69541 and previous config saved to /var/cache/conftool/dbconfig/20241010-031553-ladsgroup.json
03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69540 and previous config saved to /var/cache/conftool/dbconfig/20241010-031531-ladsgroup.json
03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69539 and previous config saved to /var/cache/conftool/dbconfig/20241010-030048-ladsgroup.json
03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69538 and previous config saved to /var/cache/conftool/dbconfig/20241010-030025-ladsgroup.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69537 and previous config saved to /var/cache/conftool/dbconfig/20241010-024543-ladsgroup.json
02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69536 and previous config saved to /var/cache/conftool/dbconfig/20241010-024519-ladsgroup.json
02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69535 and previous config saved to /var/cache/conftool/dbconfig/20241010-023037-ladsgroup.json
02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69534 and previous config saved to /var/cache/conftool/dbconfig/20241010-023014-ladsgroup.json
02:02 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
02:02 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
01:50 sukhe: restart bird on doh5001 and dns5003 to resolve flapping BFD session after cr2-eqsin junos upgrade
01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
00:46 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1006.eqiad.wmnet
00:41 eileen: civicrm upgraded from 3b6a7cbb to 07dee21c
00:27 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
00:26 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
00:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
00:19 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus2005.codfw.wmnet
00:02 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
00:02 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet

2024-10-09

23:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
23:51 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
23:49 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2003.wikimedia.org
23:43 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
23:41 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1005.eqiad.wmnet
23:26 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
23:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
23:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
23:07 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
23:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
22:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
22:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:51 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69532 and previous config saved to /var/cache/conftool/dbconfig/20241009-225055-ladsgroup.json
22:40 eileen: civicrm upgraded from cc7c7744 to 3b6a7cbb
22:35 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
22:30 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
22:28 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
22:28 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
22:01 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
22:00 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
21:57 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
21:57 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
21:55 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
21:54 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
21:48 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
21:47 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
21:45 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
21:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:44 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:42 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:42 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69531 and previous config saved to /var/cache/conftool/dbconfig/20241009-214117-ladsgroup.json
21:41 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:32 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
21:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
21:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69530 and previous config saved to /var/cache/conftool/dbconfig/20241009-212612-ladsgroup.json
21:22 mutante: [apt1002:~] $ sudo -i reprepro --component thirdparty/gitlab-bullseye update bullseye-wikimedia
21:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
21:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69529 and previous config saved to /var/cache/conftool/dbconfig/20241009-211107-ladsgroup.json
21:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
20:56 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69528 and previous config saved to /var/cache/conftool/dbconfig/20241009-205601-ladsgroup.json
20:44 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
20:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
20:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
20:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
20:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
20:12 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
20:12 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
20:08 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and A:dnsbox
20:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
19:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
19:55 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and A:dnsbox
19:55 swfrench-wmf: removing echostore staging deployment to unblock breaking change - T376766
19:46 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and A:dnsbox
19:46 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
19:38 mforns@deploy2002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
19:38 mforns@deploy2002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
19:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
19:35 mforns@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
19:35 mforns@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2002.codfw.wmnet with OS bookworm
19:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:27 mforns@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
19:27 mforns@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
19:20 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
19:05 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
19:04 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and A:dnsbox
19:04 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
19:04 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
18:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
18:45 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
18:41 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
18:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
18:35 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
18:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
18:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:29 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
18:29 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
18:28 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
18:26 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
18:26 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus5002.eqsin.wmnet
18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69527 and previous config saved to /var/cache/conftool/dbconfig/20241009-182632-ladsgroup.json
18:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:24 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
18:24 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
18:19 eileen: config revision changed from 739e8794 to ae4a5be9
18:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
18:16 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
18:16 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
18:15 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs[5004-5006].eqsin.wmnet
18:15 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs[5004-5006].eqsin.wmnet
18:15 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
18:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
18:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
18:08 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
18:06 eileen: civicrm upgraded from ae54bd5e to cc7c7744
18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
18:01 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
18:01 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
17:58 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s5.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
17:51 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
17:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69526 and previous config saved to /var/cache/conftool/dbconfig/20241009-174501-ladsgroup.json
17:44 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
17:41 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
17:40 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
17:38 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
17:34 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
17:31 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
17:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69525 and previous config saved to /var/cache/conftool/dbconfig/20241009-172956-ladsgroup.json
17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
17:23 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
17:21 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
17:13 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
17:12 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
17:12 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
16:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69523 and previous config saved to /var/cache/conftool/dbconfig/20241009-165944-ladsgroup.json
16:50 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
16:50 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
16:50 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
16:50 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
16:50 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:50 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
16:48 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:48 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facin cloudsw - cmooney@cumin1002"
16:44 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facin cloudsw - cmooney@cumin1002"
16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
16:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
16:32 bvibber: starting requeueTranscodes on old school mwmaint2002 after the k8s blowup last night
16:23 sukhe: running authdns-update to fix broken zone files on dns2004
16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
16:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
16:21 sukhe: forcing commit 95858ba through sre.dns.netbox
16:20 sukhe@cumin1002: START - Cookbook sre.dns.netbox
16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 sukhe@cumin1002: START - Cookbook sre.dns.netbox
16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2005.wikimedia.org
15:58 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns2005.wikimedia.org
15:54 sukhe@cumin1002: END (ERROR) - Cookbook sre.dns.roll-reboot (exit_code=97) rolling reboot on A:dnsbox
15:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:53 sukhe: running authdns-update
15:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:52 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in2001.wikimedia.org
15:49 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
15:49 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
15:48 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in2001.wikimedia.org
15:48 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in1001.wikimedia.org
15:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
15:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in1001.wikimedia.org
15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:43 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
15:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp.wikimedia.org on all recursors
15:26 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache idp.wikimedia.org on all recursors
15:25 fabfur: eqsin depooled for T375961
15:24 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
15:24 fabfur@cumin1002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
15:23 mutante: stewards* - rebooting machines - T351202
15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
15:22 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
15:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
15:20 sukhe: running dummy authdns-update
15:19 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:17 mutante: planet.wikimedia.org - rebooting backends
15:09 mutante: people.wikimedia.org - rebooting backends
15:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns1006.wikimedia.org
15:07 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns1006.wikimedia.org
15:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host crm2001.codfw.wmnet
15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqsin with reason: router replacement
15:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
15:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr2-eqsin with reason: router replacement
15:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host crm2001.codfw.wmnet
14:59 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
14:58 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
14:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
14:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
14:51 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:51 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
14:50 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:50 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
14:47 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
14:47 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
14:47 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
14:45 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:44 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
14:44 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum
14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
14:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
14:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:31 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:29 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69522 and previous config saved to /var/cache/conftool/dbconfig/20241009-142848-ladsgroup.json
14:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
14:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69521 and previous config saved to /var/cache/conftool/dbconfig/20241009-142826-ladsgroup.json
14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69520 and previous config saved to /var/cache/conftool/dbconfig/20241009-142404-ladsgroup.json
14:23 moritzm: failover master for ganeti/routed to ganeti2033
14:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
14:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
14:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
14:21 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
14:21 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
14:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
14:20 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
14:20 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
14:18 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
14:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
14:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69519 and previous config saved to /var/cache/conftool/dbconfig/20241009-141319-ladsgroup.json
14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
14:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:11 moritzm: installing Apache security updates
14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
14:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
14:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2004.wikimedia.org
14:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
14:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
14:03 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp2004.wikimedia.org
14:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:01 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69517 and previous config saved to /var/cache/conftool/dbconfig/20241009-135812-ladsgroup.json
13:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
13:56 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
13:55 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
13:54 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
13:53 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:53 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:53 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
13:52 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox
13:52 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
13:51 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
13:51 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
13:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2004.wikimedia.org
13:50 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
13:50 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
13:50 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
13:49 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
13:48 Lucas_WMDE: UTC afternoon backport+config window done
13:48 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2004.wikimedia.org
13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) (duration: 07m 04s)
13:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1004.wikimedia.org
13:45 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
13:45 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host flink-zk1001.eqiad.wmnet
13:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
13:44 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Continuing with sync
13:44 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp1004.wikimedia.org
13:43 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1004.wikimedia.org
13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69516 and previous config saved to /var/cache/conftool/dbconfig/20241009-134305-ladsgroup.json
13:42 brouberol@cumin1002: END (ERROR) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=97) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
13:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1028.eqiad.wmnet
13:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747)
13:41 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
13:39 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1004.wikimedia.org
13:39 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum
13:39 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 $ printf 'https://en.wikipedia.org/static/images/%s\n' 'project-logos/sdwiki.png' 'project-logos/sdwiki-1.5x.png' 'project-logos/sdwiki-2x.png' 'mobile/copyright/wikipedia-wordmark-sd.svg' 'mobile/copyright/wikipedia-tagline-sd.svg' | mwscript-k8s --attach -- purgeList.php # T376536
13:35 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536) (duration: 19m 34s)
13:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
13:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gerrit2003.wikimedia.org
13:31 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
13:30 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Continuing with sync
13:30 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
13:28 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
13:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
13:23 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
13:22 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1004.eqiad.wmnet
13:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Backport for sdwiki: Add new logo and tagline (T376536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad1004.eqiad.wmnet
13:16 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad2002.codfw.wmnet
13:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536)
13:14 kharlan@deploy2002: Finished scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) (duration: 10m 37s)
13:12 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad2002.codfw.wmnet
13:09 kharlan@deploy2002: kharlan: Continuing with sync
13:06 kharlan@deploy2002: kharlan: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 kharlan@deploy2002: Started scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517)
12:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki2002.codfw.wmnet
12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
12:41 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
12:38 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
12:33 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts rpki2002.codfw.wmnet
12:24 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:24 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:23 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:23 jelto@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 moritzm: installing initramfs-tools bugfix updates from Bookworm point release
12:16 jelto@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:15 jelto@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:15 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:15 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:54 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b2c30ad]: T375153 (duration: 02m 32s)
11:52 jynus: ran systemctl start wmf_auto_restart_routinator.service on rpki2003
11:52 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b2c30ad]: T375153
11:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69513 and previous config saved to /var/cache/conftool/dbconfig/20241009-111154-ladsgroup.json
11:04 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
11:00 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
11:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69511 and previous config saved to /var/cache/conftool/dbconfig/20241009-105647-ladsgroup.json
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
10:44 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
10:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69507 and previous config saved to /var/cache/conftool/dbconfig/20241009-104142-ladsgroup.json
10:35 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
10:28 elukey: roll restart swift-proxy on ms-fe* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1078380
10:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1027.eqiad.wmnet
10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69506 and previous config saved to /var/cache/conftool/dbconfig/20241009-102636-ladsgroup.json
10:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
10:11 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
09:42 Dreamy_Jazz: Started time-limited MediaModeration scan on enwiki for 16 hrs to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
09:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1026.eqiad.wmnet
08:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:51 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:48 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
08:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:23 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1005.eqiad.wmnet
08:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1005.eqiad.wmnet
08:12 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.26 refs T375657
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
08:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1021.eqiad.wmnet
08:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
07:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1011.eqiad.wmnet
07:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
07:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:22 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:13 moritzm: remove ganeti2010 from active nodes T376594
06:37 eileen: civicrm upgraded from 251e958f to ae54bd5e
06:08 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
06:06 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
03:36 eileen: civicrm upgraded from 61718eae to 251e958f
01:26 eileen: tools upgraded from 3f7b238d to e9c05e30
00:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm

2024-10-08

22:36 tzatziki: removing 1 file for legal compliance
22:32 tzatziki: removing 3 files for legal compliance
22:16 tzatziki: removing 1 file for legal compliance
22:11 tzatziki: removing 3 files for legal compliance
21:59 tzatziki: removing 3 files for legal compliance
21:41 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
21:41 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
21:35 bvibber: running requeueTranscodes in k8s maint to clean up ios video transcodes (T363966)
21:34 mutante: gerrit2003 - sudo -u gerrit-deploy /usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False (for some reason this fails in puppet but works manually) T372804 T257317 T317412
21:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1022.eqiad.wmnet with OS bullseye
21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:06 eileen: config revision changed from 9ba217d2 to c84a1354
21:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
20:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
20:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
20:54 cjming: end of UTC late backport window
20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
20:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
20:54 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:52 cjming@deploy2002: Finished scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) (duration: 07m 39s)
20:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
20:48 cjming@deploy2002: bvibber, cjming: Continuing with sync
20:47 cjming@deploy2002: bvibber, cjming: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:45 cjming@deploy2002: Started scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966)
20:42 cjming@deploy2002: Finished scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit (duration: 09m 58s)
20:37 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
20:34 cjming@deploy2002: jdlrobson, cjming: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:32 cjming@deploy2002: Started scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit
20:29 cjming@deploy2002: Finished scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549) (duration: 19m 28s)
20:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
20:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
20:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
20:26 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
20:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:24 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:12 cjming@deploy2002: jdlrobson, cjming: Backport for Expand Vector 2022 roll out and support local variants (T375549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1012
20:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host backup1012
20:10 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
20:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
20:10 cjming@deploy2002: Started scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549)
20:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
18:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:54 swfrench-wmf: ran authdns-update on dns1004 to pick up mwdebug-next record - T372604
18:50 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mwdebug-next,name=codfw [reason: pooling mwdebug-next in codfw to match mwdebug - T372604]
18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
18:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
18:43 cdanis: 💔cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin -b1 -s120 A:dnsbox 'run-puppet-agent --enable "cdanis rolling out T344171 Ie7d5091bca40"'
18:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:40 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin A:dnsbox 'disable-puppet "cdanis rolling out T344171 Ie7d5091bca40"'
18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:34 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:45 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
17:39 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
17:35 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
17:35 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
17:34 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:codfw' - T372604
17:27 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
17:21 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
17:17 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
17:12 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
17:09 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:eqiad' - T372604
17:04 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T372604
16:57 moritzm: enable Puppet fleet-wide for puppetmaster1001 hardware maintenance
16:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused (duration: 06m 50s)
16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
16:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
16:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
16:44 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
16:44 dreamyjazz@deploy2002: dreamyjazz: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
16:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
16:42 dreamyjazz@deploy2002: Started scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused
16:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
16:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
16:37 moritzm: disable Puppet fleet-wide for puppetmaster1001 hardware maintenance
16:28 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
16:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad
16:25 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad
16:24 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad
16:23 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad
16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
16:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
16:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
16:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
16:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:41 papaul: mr1-magru end of maintenance
15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
15:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
15:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
15:26 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
15:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720 (duration: 01m 07s)
15:04 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720
15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720 (duration: 00m 26s)
15:03 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720
15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
15:02 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
15:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
15:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
14:58 papaul: mr1-magru ongoing maintenance
14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:47 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
14:41 moritzm: installing python-aiosmtpd security updates
14:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
14:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
14:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1010.eqiad.wmnet
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
14:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
14:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
14:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
14:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
14:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
14:15 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
14:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
14:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1009.eqiad.wmnet
14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
14:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:59 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) (duration: 07m 03s)
13:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:49 zabe@deploy2002: zabe: Continuing with sync
13:48 zabe@deploy2002: zabe: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:46 zabe@deploy2002: Started scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180)
13:46 zabe@deploy2002: Finished scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 10s)
13:41 zabe@deploy2002: zabe: Continuing with sync
13:41 zabe@deploy2002: zabe: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 zabe@deploy2002: Started scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490)
13:33 Lucas_WMDE: UTC afternoon backport+config window done
13:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) (duration: 06m 56s)
13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Continuing with sync
13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795)
13:24 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
13:24 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
13:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) (duration: 08m 17s)
13:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
13:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
13:07 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Continuing with sync
13:06 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049)
12:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:57 Amir1: dropping povwatch_log on all.dblist (T54924 and T376627)
12:55 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti2036.codfw.wmnet
12:53 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Remove flow from techconductwiki (T332022) (duration: 09m 27s)
12:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:45 moritzm: installing lua5.4 bugfix updates
12:44 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:42 ladsgroup@deploy2002: ladsgroup: Backport for Remove flow from techconductwiki (T332022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Remove flow from techconductwiki (T332022)
12:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
12:29 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
12:26 moritzm: remove ganeti2009 from active nodes T376594
12:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
12:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1008.eqiad.wmnet
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
12:01 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
11:56 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
11:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1007.eqiad.wmnet
11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1006.eqiad.wmnet
11:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
11:33 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
11:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2002.codfw.wmnet
11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2002.codfw.wmnet
11:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1006.eqiad.wmnet
11:28 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
11:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
11:06 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
10:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2009.codfw.wmnet
10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
10:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
10:49 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
10:45 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
10:36 jayme: updated kubernetes 1.23.14-3 -> 1.23.14-4 on P:kubernetes::node - T362408
10:27 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:26 jayme: re-enable puppet on all P:kubernetes::node
10:26 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
10:09 jayme: disabled puppet on all P:kubernetes::node
10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
10:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
09:52 moritzm: installing freetype bugfix updates from Bookworm point update
09:48 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
09:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
09:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:25 jayme: imported kubernetes 1.23.14-4 to component/kubernetes123 (buster, bullseye, bookworm) - T362408
09:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1005.eqiad.wmnet
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2036.codfw.wmnet to cluster codfw and group C
09:12 Dreamy_Jazz: Maintenance script for T376340 finished
09:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2036.codfw.wmnet to cluster codfw and group C
09:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:06 Dreamy_Jazz: Ran `mwscript-k8s --comment="T376340" -- extensions/GlobalBlocking/maintenance/UpdateAutoBlockParentIdColumn.php --wiki=aawikibooks`
09:01 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
08:55 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
08:55 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
08:54 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
08:53 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
08:53 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
08:20 dcausse: repooling wdqs1013
08:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
08:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
08:19 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.26 refs T375657
08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69498 and previous config saved to /var/cache/conftool/dbconfig/20241008-081620-arnaudb.json
08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69497 and previous config saved to /var/cache/conftool/dbconfig/20241008-080115-arnaudb.json
07:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69496 and previous config saved to /var/cache/conftool/dbconfig/20241008-074609-arnaudb.json
07:44 vgutierrez: uploaded golang-github-jvgutierrez-go-etcd-harness 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
07:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69495 and previous config saved to /var/cache/conftool/dbconfig/20241008-073104-arnaudb.json
07:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: T374215', diff saved to https://phabricator.wikimedia.org/P69494 and previous config saved to /var/cache/conftool/dbconfig/20241008-071559-arnaudb.json
07:10 dcausse: depooling wdqs1013 (lag)
07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69493 and previous config saved to /var/cache/conftool/dbconfig/20241008-070053-arnaudb.json
06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69492 and previous config saved to /var/cache/conftool/dbconfig/20241008-064548-arnaudb.json
04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.23 (duration: 00m 58s)
03:50 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657 (duration: 47m 44s)
03:16 eileen: civicrm upgraded from 8b13ef22 to 61718eae
03:15 eileen: config revision changed from 6e649356 to 9ba217d2
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657
00:55 eileen: config revision changed from 856e4d99 to 6e649356
00:30 eileen: config revision changed from 856e4d99 to 4ab498d2 - disable process control to load triggers

2024-10-07

22:33 eileen: civicrm upgraded from f2095695 to 8b13ef22
22:09 eileen: config revision changed from a2ba4a8d to 856e4d99
21:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
20:20 urbanecm@deploy2002: Finished scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) (duration: 07m 41s)
20:16 urbanecm@deploy2002: esanders, derenrich, urbanecm: Continuing with sync
20:14 urbanecm@deploy2002: esanders, derenrich, urbanecm: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 urbanecm@deploy2002: Started scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022)
20:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
19:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
18:22 swfrench-wmf: running `git restore helmfile.d/services/thumbor/values.yaml` on deploy1003 to unblock git-pull timer
18:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
18:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:29 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-debug next bring-up - T372604 (duration: 02m 45s)
17:26 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-debug next bring-up - T372604
17:12 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:26 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
16:24 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
16:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
15:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
15:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
15:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
15:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
15:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
15:00 papaul: ongoing maintenance on mr1-esams
14:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
14:40 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
14:18 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
14:16 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
14:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69489 and previous config saved to /var/cache/conftool/dbconfig/20241007-134950-ladsgroup.json
13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69488 and previous config saved to /var/cache/conftool/dbconfig/20241007-134929-ladsgroup.json
13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
13:37 vgutierrez: switching to digicert-2024 certificates on esams, eqsin, drmrs and magru
13:36 Lucas_WMDE: UTC afternoon backport+config window done
13:35 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) (duration: 06m 49s)
13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69487 and previous config saved to /var/cache/conftool/dbconfig/20241007-133422-ladsgroup.json
13:31 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
13:30 dreamyjazz@deploy2002: dreamyjazz: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:28 dreamyjazz@deploy2002: Started scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052)
13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69486 and previous config saved to /var/cache/conftool/dbconfig/20241007-131915-ladsgroup.json
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
13:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402) (duration: 07m 14s)
13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Continuing with sync
13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Backport for scandium is being replaced by parsoidtest1001 (T363402) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69485 and previous config saved to /var/cache/conftool/dbconfig/20241007-130409-ladsgroup.json
13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402)
13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
13:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
12:53 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/%s\n' 'mobile/copyright/wikimaniawiki-wordmark.svg' 'project-logos/wikimaniawiki-1.5x.png' 'project-logos/wikimaniawiki-2x.png' 'project-logos/wikimaniawiki.png' 'icons/wikimaniawiki.svg' | mwscript-k8s --attach -- purgeList enwiki # T376292
12:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
12:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
11:29 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:29 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:25 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:25 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:16 vgutierrez: uploaded golang-github-mtchavez-jenkins 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
11:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69484 and previous config saved to /var/cache/conftool/dbconfig/20241007-110430-arnaudb.json
10:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2002.codfw.wmnet
10:50 Dreamy_Jazz: Started 2-day scan on enwiki for MediaModeration to catch up with the monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
10:49 Dreamy_Jazz: Started the MediaModeration scanning script for commonswiki after it crashed - https://wikitech.wikimedia.org/wiki/MediaModeration
10:49 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2002.codfw.wmnet
10:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69483 and previous config saved to /var/cache/conftool/dbconfig/20241007-104925-arnaudb.json
10:47 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
10:47 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69482 and previous config saved to /var/cache/conftool/dbconfig/20241007-103420-arnaudb.json
10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69481 and previous config saved to /var/cache/conftool/dbconfig/20241007-101914-arnaudb.json
10:17 vgutierrez: uploaded golang-github-cloudflare-ipvs 0.10.2 to apt.wm.o (bookworm-wikimedia) - T376600
10:13 moritzm: installing Linux 6.1.112 on Bookworm systems
10:11 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
10:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69480 and previous config saved to /var/cache/conftool/dbconfig/20241007-100410-arnaudb.json
10:00 vgutierrez: uploaded golang-github-flyingmutant-rapid 1.1.0 to apt.wm.o (bookworm-wikimedia) - T376600
09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69478 and previous config saved to /var/cache/conftool/dbconfig/20241007-094904-arnaudb.json
09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 2%: T374215', diff saved to https://phabricator.wikimedia.org/P69477 and previous config saved to /var/cache/conftool/dbconfig/20241007-093359-arnaudb.json
09:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
09:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'missing commit', diff saved to https://phabricator.wikimedia.org/P69476 and previous config saved to /var/cache/conftool/dbconfig/20241007-092714-arnaudb.json
09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69474 and previous config saved to /var/cache/conftool/dbconfig/20241007-091953-arnaudb.json
09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69473 and previous config saved to /var/cache/conftool/dbconfig/20241007-091854-arnaudb.json
09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
08:37 aqu@deploy2002: Finished deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f] (duration: 04m 43s)
08:32 aqu@deploy2002: Started deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f]
08:24 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
08:24 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
08:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 18s)
08:02 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
08:02 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
08:02 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
08:01 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
08:01 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
08:00 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
07:57 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
07:56 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T374215 db1233 depool as clone source for db1246', diff saved to https://phabricator.wikimedia.org/P69471 and previous config saved to /var/cache/conftool/dbconfig/20241007-075611-arnaudb.json
07:56 hashar: UTC morning backport window completed
07:54 hashar@deploy2002: Finished scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) (duration: 11m 19s)
07:49 hashar@deploy2002: ammarpad, hashar: Continuing with sync
07:45 hashar@deploy2002: ammarpad, hashar: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:43 hashar@deploy2002: Started scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049)
07:42 hashar@deploy2002: Finished scap sync-world: Backport for Revert "wikimaniawiki: Update logos to 2024" (duration: 21m 40s)
07:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
07:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64315
07:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 64315
07:04 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply

2024-10-06

22:59 eileen: civicrm upgraded from 45855ff4 to f2095695

2024-10-05

19:43 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:41 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:36 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:36 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69470 and previous config saved to /var/cache/conftool/dbconfig/20241005-133058-ladsgroup.json
13:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
13:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
13:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69469 and previous config saved to /var/cache/conftool/dbconfig/20241005-133036-ladsgroup.json
13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69468 and previous config saved to /var/cache/conftool/dbconfig/20241005-131529-ladsgroup.json
13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69467 and previous config saved to /var/cache/conftool/dbconfig/20241005-130022-ladsgroup.json
12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69466 and previous config saved to /var/cache/conftool/dbconfig/20241005-124515-ladsgroup.json

2024-10-04

17:48 ejegg: fundraising civicrm upgraded from 90199f62 to 45855ff4
16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
16:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
14:29 mforns@deploy2002: Finished deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist (duration: 01m 48s)
14:28 mforns@deploy2002: Started deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist
13:54 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
13:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.categories-reload (exit_code=97) reloading categories to wdqs-categories1001.eqiad.wmnet
13:32 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
13:19 ayounsi@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
12:00 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 01m 13s)
11:59 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
11:47 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 00m 47s)
11:46 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1004.wikimedia.org
10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1004.wikimedia.org
10:07 moritzm: upload ircstream 0.13.0+sse12u1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
09:43 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database shnwikinews (T375432)
09:35 moritzm: upload ircstream 0.13.0+wmf12u1 to apt.wikimedia.org T376014
09:18 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database shnwikinews (T375432)
09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kgewiki (T374814)
09:17 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kgewiki (T374814)
09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database gorwikiquote (T375094)
09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database gorwikiquote (T375094)
09:16 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database madwiktionary (T375023)
09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database madwiktionary (T375023)
09:15 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database moswiki (T375568)
09:15 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database moswiki (T375568)
09:09 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
08:58 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
07:51 oblivian@puppetserver1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
07:51 oblivian@puppetserver1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
07:30 hashar: upgrading Jenkins on CI Jenkins
07:04 moritzm: import jenkins 2.462.3 to thirdparty/ci T376449
01:45 ejegg: payments-wiki upgraded from e88750e6 to ed2d78b3

2024-10-03

22:37 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) (duration: 07m 04s)
22:33 brennen@deploy2002: brennen: Continuing with sync
22:32 brennen@deploy2002: brennen: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:30 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
22:18 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
22:18 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
22:15 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
22:15 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
21:39 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
21:39 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
21:28 brennen: end of UTC late backport & config window
21:28 brennen@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713) (duration: 15m 30s)
21:23 brennen@deploy2002: cscott, brennen: Continuing with sync
21:15 brennen@deploy2002: cscott, brennen: Backport for Turn on Parsoid Selective Update metrics (T371713) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 brennen@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713)
21:11 brennen@deploy2002: Finished scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) (duration: 10m 09s)
21:06 brennen@deploy2002: cscott, brennen: Continuing with sync
21:02 brennen@deploy2002: cscott, brennen: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:00 brennen@deploy2002: Started scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2)
20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
20:44 brennen@deploy2002: Finished scap sync-world: Backport for Update jquery.ime from upstream (duration: 09m 25s)
20:39 brennen@deploy2002: brennen, amire80: Continuing with sync
20:37 brennen@deploy2002: brennen, amire80: Backport for Update jquery.ime from upstream synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:34 brennen@deploy2002: Started scap sync-world: Backport for Update jquery.ime from upstream
20:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
20:02 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
19:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
19:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
19:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
19:50 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
19:49 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
19:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
19:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
19:36 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
19:35 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
19:28 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS (duration: 03m 02s)
19:25 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS
19:25 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
19:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
19:22 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93]: 0.3.148 (duration: 08m 42s)
19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:14 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.148` on canary `wdqs1016`; proceeding to rest of fleet
19:14 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93]: 0.3.148
19:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.148`. Pre-deploy tests passing on canary `wdqs1016`
19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
19:05 dduvall@deploy2002: Installing scap version "4.109.0" for 210 hosts
18:51 cmooney@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: testing T344171]
18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
18:31 cstone: SmashPig upgraded from df2a9c42 to eaa176f7
18:28 sukhe: depool dns1005 for all services for testing T344171
18:00 mutante: codesearch - ran out of disk due to 11G /var/log/account/pacct file - manually ran /etc/cron.daily/acct to rotate it, then deleted old file, back to 39% disk usage
17:41 mutante: codesearch was broken - VM was down - rebooted - restarting all the indices is a bit slow but mostly back up now
17:13 swfrench@deploy2002: Finished scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934 (duration: 02m 50s)
17:11 swfrench@deploy2002: Started scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934
15:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 59.75.192.10.in-addr.arpa on all recursors
15:53 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 59.75.192.10.in-addr.arpa on all recursors
15:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:50 topranks: merging patch to add k8s pod IP range reverse delegations to dns T376291
15:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:47 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:46 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:45 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
15:36 papaul: Junos upgrade on mr1-codfw complete
15:00 papaul: ongoing Junos upgrade on mr1-codfw
14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402 (duration: 03m 33s)
14:52 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402
14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:30 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1022
14:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
14:29 jclark@cumin1002: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host aqs1022
14:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
14:28 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
14:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
14:23 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
13:42 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
13:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2004.wikimedia.org
13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2004.wikimedia.org with OS bookworm
13:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
13:31 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
13:30 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
13:23 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
13:10 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2004.wikimedia.org with OS bookworm
13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
13:09 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2004.wikimedia.org on all recursors
13:09 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2004.wikimedia.org on all recursors
13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
13:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
13:00 elukey@cumin1002: START - Cookbook sre.dns.netbox
13:00 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2004.wikimedia.org
12:20 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124) (duration: 06m 47s)
12:14 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
12:13 urbanecm@deploy2002: scap failed: <UnboundLocalError> local variable 'e' referenced before assignment (scap version: 4.108.0-1) (duration: 08m 02s)
12:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:05 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
12:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
12:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69458 and previous config saved to /var/cache/conftool/dbconfig/20241003-111544-ladsgroup.json
11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69457 and previous config saved to /var/cache/conftool/dbconfig/20241003-111522-ladsgroup.json
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69456 and previous config saved to /var/cache/conftool/dbconfig/20241003-110015-ladsgroup.json
10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69454 and previous config saved to /var/cache/conftool/dbconfig/20241003-104508-ladsgroup.json
10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69453 and previous config saved to /var/cache/conftool/dbconfig/20241003-103001-ladsgroup.json
10:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124) (duration: 06m 54s)
10:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:22 urbanecm@deploy2002: Started scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124)
10:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1004.wikimedia.org
10:00 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b715af7]: T375153 (duration: 02m 44s)
10:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM irc1004.wikimedia.org
09:58 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b715af7]: T375153
09:42 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
09:41 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
09:38 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
09:38 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
09:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
09:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
08:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.25 refs T375656
08:25 hashar@deploy2002: Finished scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) (duration: 07m 07s)
08:20 hashar@deploy2002: hashar, cscott: Continuing with sync
08:20 hashar@deploy2002: hashar, cscott: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:18 hashar@deploy2002: Started scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323)
08:14 hashar@deploy2002: Finished scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) (duration: 08m 37s)
08:09 hashar@deploy2002: hashar, hamishz: Continuing with sync
08:07 hashar@deploy2002: hashar, hamishz: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:05 hashar@deploy2002: Started scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898)
08:03 hashar: Ran `mwscript resetAuthenticationThrottle.php --signup --ip 14.139.82.6` for `metawiki`, `mediawikiwiki` and `wikidatawiki` # T375794
07:59 hashar@deploy2002: Finished scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) (duration: 08m 41s)
07:54 hashar@deploy2002: anzx, hamishz, hashar: Continuing with sync
07:53 hashar@deploy2002: anzx, hamishz, hashar: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:50 hashar@deploy2002: Started scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794)
07:17 kartik@deploy2002: Finished scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) (duration: 10m 39s)
07:12 kartik@deploy2002: kartik: Continuing with sync
07:08 kartik@deploy2002: kartik: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:06 kartik@deploy2002: Started scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644)
06:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
06:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2024-10-02

23:47 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124) (duration: 07m 07s)
23:39 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124)
22:35 urbanecm@deploy2002: Finished scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124) (duration: 06m 52s)
22:28 urbanecm@deploy2002: Started scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124)
21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
21:54 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
21:24 mutante: phab1004 - link=$(/usr/bin/readlink -f /srv/phab) ; /usr/bin/git config -f /etc/gitconfig.d/10-phab-deploy-safedir.gitconfig --add safe.directory $link ; /bin/cat /etc/gitconfig.d/*.gitconfig > /etc/gitconfig - T360756
20:57 eileen: civicrm upgraded from 28fd5e3b to 90199f62
20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1001.eqiad.wmnet with OS bookworm
20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1002.eqiad.wmnet with OS bookworm
19:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
19:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
19:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1002.eqiad.wmnet with OS bookworm
19:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1001.eqiad.wmnet with OS bookworm
19:23 cstone: SmashPig upgraded from 715e91fa to df2a9c42
19:21 brett: cumin -b11 "A:cp" "run-puppet-agent --enable 'rolling out 1038884'"
19:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
19:15 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
19:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
19:06 brett@cumin2002: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
18:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:21 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256 (duration: 00m 12s)
18:21 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256
18:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.25 refs T375656
18:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:22 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
17:20 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
17:02 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1003.eqiad.wmnet
17:02 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
17:01 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
17:00 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) (duration: 14m 42s)
16:58 btullis@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
16:56 urbanecm@deploy2002: urbanecm: Continuing with sync
16:50 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts alert[1001,2001].wikimedia.org
16:50 denisse@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:50 denisse@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
16:49 denisse@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
16:48 urbanecm@deploy2002: urbanecm: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:46 denisse@cumin2002: START - Cookbook sre.dns.netbox
16:46 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124)
16:38 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
16:33 taavi: start extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php on labswiki to backfill global usage information
16:31 taavi@deploy2002: Finished scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php (duration: 07m 13s)
16:31 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
16:27 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
16:27 denisse: Running the sre.hosts.decommission cookbook on the alert1001 and alert2001 hosts - T372607
16:27 taavi@deploy2002: matmarex, taavi: Continuing with sync
16:26 taavi@deploy2002: matmarex, taavi: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:24 taavi@deploy2002: Started scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php
16:16 taavi@deploy2002: Finished scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) (duration: 07m 01s)
16:11 taavi@deploy2002: zabe, taavi: Continuing with sync
16:11 taavi@deploy2002: zabe, taavi: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:09 taavi@deploy2002: Started scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707)
16:03 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
16:03 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host wdqs-categories1001.eqiad.wmnet
16:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs-categories1001.eqiad.wmnet with OS bullseye
15:46 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
15:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
15:43 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:43 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
15:41 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
15:41 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
15:38 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:38 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:37 cdanis@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:36 cdanis@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:36 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:35 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:35 cdanis@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:34 cdanis@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
15:31 cdanis@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:31 cdanis@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:30 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3a7901e]: T375153 (duration: 01m 59s)
15:28 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
15:28 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
15:28 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3a7901e]: T375153
15:27 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - T370962
15:26 dancy@deploy2002: Finished scap sync-world: Testing T370934 (duration: 03m 19s)
15:24 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
15:23 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
15:22 dancy@deploy2002: Started scap sync-world: Testing T370934
15:18 dancy@deploy2002: Installation of scap version "4.108.0" completed for 210 hosts
15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
15:14 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
15:13 dancy@deploy2002: Installing scap version "4.108.0" for 210 hosts
15:12 cdanis@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:12 cdanis@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:07 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - T370962
15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:00 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
15:00 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:56 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs-categories1001.eqiad.wmnet with OS bullseye
14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-categories1001.eqiad.wmnet on all recursors
14:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-categories1001.eqiad.wmnet on all recursors
14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
14:44 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
14:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1004.wikimedia.org
14:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc1004.wikimedia.org with OS bookworm
14:30 bking@cumin2002: START - Cookbook sre.dns.netbox
14:30 bking@cumin2002: START - Cookbook sre.ganeti.makevm for new host wdqs-categories1001.eqiad.wmnet
14:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
14:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
14:22 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
14:21 urbanecm@deploy2002: Finished scap sync-world: Backport for labswiki: Disallow account autocreation (T161859) (duration: 07m 38s)
14:17 urbanecm@deploy2002: urbanecm: Continuing with sync
14:16 urbanecm@deploy2002: urbanecm: Backport for labswiki: Disallow account autocreation (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:14 urbanecm@deploy2002: Started scap sync-world: Backport for labswiki: Disallow account autocreation (T161859)
14:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc1004.wikimedia.org with OS bookworm
14:11 hashar@deploy2002: Finished scap sync-world: Backport for Remove Maintenance check (T376255) (duration: 07m 27s)
14:08 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
14:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1004.wikimedia.org on all recursors
14:07 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc1004.wikimedia.org on all recursors
14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:07 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
14:07 hashar@deploy2002: hashar: Continuing with sync
14:06 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
14:04 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
14:03 hashar@deploy2002: Sync cancelled.
14:03 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 elukey@cumin1002: START - Cookbook sre.dns.netbox
14:03 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc1004.wikimedia.org
14:01 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
13:31 Lucas_WMDE: UTC afternoon backport+config window done
13:28 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242) (duration: 10m 32s)
13:24 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Continuing with sync
13:20 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Backport for Improve sub-ref check to avoid false positives (T376242) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242)
13:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) (duration: 14m 45s)
13:16 moritzm: upload ircstream 0.13.0~dev+wmf1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
13:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
13:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821)
12:59 moritzm: upload python3-aiohttp-sse-client 0.2.1-0 to apt.wikimedia.org bookworm/ircstream-sse component (needed by the eventstream feature branch of ircstream) T376014
12:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
12:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
12:49 hashar@deploy2002: Finished scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) (duration: 07m 01s)
12:45 hashar@deploy2002: hashar, zabe: Continuing with sync
12:45 hashar@deploy2002: hashar, zabe: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:42 hashar@deploy2002: Started scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255)
12:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
12:35 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:14 zabe@deploy2002: Finished scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) (duration: 08m 50s)
12:13 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
12:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:11 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
12:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:09 zabe@deploy2002: zabe: Continuing with sync
12:09 zabe@deploy2002: zabe: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
12:05 zabe@deploy2002: Started scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129)
12:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:57 _joe_: restarted rsyslog on kubernetes1045
10:46 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1005.eqiad.wmnet
10:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
10:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
10:17 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
10:13 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1005.eqiad.wmnet on all recursors
10:13 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1005.eqiad.wmnet on all recursors
10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
10:11 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
10:04 elukey@cumin1002: START - Cookbook sre.dns.netbox
10:04 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1005.eqiad.wmnet
10:03 elukey@deploy2002: Finished scap sync-world: Backport for Add irc2003 to the irc settings (T376014) (duration: 07m 11s)
10:03 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1004.eqiad.wmnet
10:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
09:59 elukey@deploy2002: elukey: Continuing with sync
09:58 elukey@deploy2002: elukey: Backport for Add irc2003 to the irc settings (T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:56 elukey@deploy2002: Started scap sync-world: Backport for Add irc2003 to the irc settings (T376014)
09:54 elukey@deploy2002: Finished scap sync-world: Add irc2003 to the network policies (duration: 02m 15s)
09:53 elukey@deploy2002: Started scap sync-world: Add irc2003 to the network policies
09:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
09:44 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:44 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:43 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:43 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
09:42 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:42 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
09:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
09:31 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to [php-1.43.0-wmf.24]" - T375656
09:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships" "Zabe" --reason "per request T376246"
09:23 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
09:23 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1004.eqiad.wmnet on all recursors
09:22 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1004.eqiad.wmnet on all recursors
09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:22 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
09:21 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
09:17 elukey@cumin1002: START - Cookbook sre.dns.netbox
09:17 jynus@cumin1002: dbctl commit (dc=all): 'Set es2024 to weight 10 as the rest of es-rw hosts T376249', diff saved to https://phabricator.wikimedia.org/P69443 and previous config saved to /var/cache/conftool/dbconfig/20241002-091754-jynus.json
09:17 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1004.eqiad.wmnet
09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-ctrl1004.eqiad.wmnet
09:16 elukey@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
09:16 elukey@cumin1002: START - Cookbook sre.dns.netbox
09:16 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1004.eqiad.wmnet
09:13 vgutierrez: repooling cp3071 and cp3072 after HW maintenance - T374986
09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[3071-3072].esams.wmnet
09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp[3071-3072].esams.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1001.eqiad.wmnet
08:57 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1001.eqiad.wmnet
08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1001.eqiad.wmnet
08:55 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1001.eqiad.wmnet
08:55 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided) (duration: 00m 52s)
08:54 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided)
08:36 jayme: removed the label node-role.kubernetes.io/master and the taint node-role.kubernetes.io/master:NoSchedule from all k8s apiservers - T334234
08:32 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to all k8s apiservers - T334234
08:29 hashar: Restarted stashbot based on instructions at https://wikitech.wikimedia.org/wiki/Tool:Stashbot
08:20 hashar@deploy2002: Finished scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967) (duration: 10m 27s)
08:16 hashar@deploy2002: hashar, sfaci: Continuing with sync
08:12 hashar@deploy2002: hashar, sfaci: Backport for Metrics Platform monotable: Base stream configuration (T373967) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:10 hashar@deploy2002: Started scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967)
07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
07:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
07:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
06:50 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1497 hosts
06:49 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1497 hosts
06:48 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 706 hosts
06:48 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 706 hosts
02:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
01:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2005.codfw.wmnet with OS bookworm

2024-10-01

23:42 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s3.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
20:34 hashar: UTC late backport window completed
20:28 hashar: mwscript purgeList.php --wiki=tlywiki --namespace=4 # T367009
20:12 hashar@deploy2002: Finished scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009) (duration: 07m 21s)
20:07 hashar@deploy2002: nmw03, hashar: Continuing with sync
20:06 hashar@deploy2002: nmw03, hashar: Backport for Update wgMetaNamespace for tlywiki (T367009) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:04 hashar@deploy2002: Started scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009)
20:02 hashar: Restarting CI Jenkins
19:48 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
19:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
17:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140) (duration: 09m 03s)
17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
17:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:53 ladsgroup@deploy2002: ladsgroup: Backport for Allow storing of passwords for local users in wikitech (T376140) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:50 ladsgroup@deploy2002: Started scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140)
17:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
16:00 ladsgroup@deploy2002: taavi, ladsgroup: Continuing with sync
15:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:58 ladsgroup@deploy2002: taavi, ladsgroup: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
15:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374)
15:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
15:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:39 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149 (duration: 01m 07s)
15:04 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149
15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149 (duration: 00m 30s)
15:03 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:03 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149
15:02 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:01 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
14:45 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to wikikube staging apiservers - T334234
14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:15 jayme: added the label node-role.kubernetes.io/control-plane= to all k8s apiservers - T334234
14:10 moritzm: installing cups security updates
13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1003.eqiad.wmnet
13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
13:32 elukey@puppetserver1001: conftool action : set/weight=1; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
13:32 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
13:31 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1003.eqiad.wmnet
13:31 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
13:21 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
12:28 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859) (duration: 07m 51s)
12:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:23 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=labswiki --undo /home/ladsgroup/T376129.undo.sql DB cluster31 (T376129)
12:22 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Allow 'crats to rename local users (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:20 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859)
12:17 ladsgroup@deploy2002: Finished scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129) (duration: 09m 53s)
12:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:09 ladsgroup@deploy2002: ladsgroup: Backport for Wikitech: Connect wikitech to external storage (T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:07 ladsgroup@deploy2002: Started scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129)
12:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859) (duration: 09m 53s)
11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:54 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Soft connect wikitech to SUL (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:52 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859)
11:51 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
11:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Drop wikitech.php (T371592 T371374) (duration: 07m 32s)
11:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:44 ladsgroup@deploy2002: ladsgroup: Backport for Drop wikitech.php (T371592 T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:42 ladsgroup@deploy2002: Started scap sync-world: Backport for Drop wikitech.php (T371592 T371374)
11:28 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2003.wikimedia.org
11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2003.wikimedia.org with OS bookworm
11:16 effie: Switching wikitech to k8s - T292707
11:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
11:09 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
11:01 jiji@deploy2002: Finished scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) (duration: 08m 23s)
10:56 jiji@deploy2002: jiji: Continuing with sync
10:55 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:52 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
10:48 jiji@deploy2002: Sync cancelled.
10:44 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:42 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:42 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
10:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:40 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2003.wikimedia.org with OS bookworm
10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
10:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2003.wikimedia.org on all recursors
10:15 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2003.wikimedia.org on all recursors
10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
10:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
10:11 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2003.wikimedia.org
10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
10:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
09:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:24 jmm@deploy2002: Finished scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) (duration: 08m 07s)
09:19 jmm@deploy2002: jmm: Continuing with sync
09:19 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
09:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
09:18 jmm@deploy2002: jmm: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:16 jmm@deploy2002: Started scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014)
09:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69437 and previous config saved to /var/cache/conftool/dbconfig/20241001-090708-ladsgroup.json
09:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
08:58 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.25 refs T375656
08:46 urbanecm@deploy2002: Finished scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784) (duration: 06m 58s)
08:39 urbanecm@deploy2002: Started scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)
07:58 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
07:54 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
07:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
07:39 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
07:34 kartik@deploy2002: Finished scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979) (duration: 10m 05s)
07:30 kartik@deploy2002: kartik, melos: Continuing with sync
07:26 kartik@deploy2002: kartik, melos: Backport for Add namespace aliases for scn.wikipedia (T375979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:24 kartik@deploy2002: Started scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979)
07:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460) (duration: 18m 15s)
07:14 kartik@deploy2002: kartik, abi: Continuing with sync
07:09 kartik@deploy2002: kartik, abi: Backport for Enable translation settings banner for Test wikipedia (T372460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:03 kartik@deploy2002: Started scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460)
06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 705 hosts
06:47 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 705 hosts
06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 1497 hosts
06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 1497 hosts
06:44 XioNoX: cr3-ulsfo> request vmhost snapshot - T375345
04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.22 (duration: 00m 58s)
03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656 (duration: 48m 36s)
03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656
02:47 eileen: civicrm upgraded from cf27c789 to 28fd5e3b
02:17 ejegg: email preference center upgraded from 8ff002ef to e88750e6
02:16 ejegg: payments-wiki upgraded from 8d3b8e94 to e88750e6