Server Admin Log/Archive 86
2024-10-31
- 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70827 and previous config saved to /var/cache/conftool/dbconfig/20241031-234959-ladsgroup.json
- 23:41 urbanecm: Run extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php for several wikis (T376749; wiki list is on task)
- 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70809 and previous config saved to /var/cache/conftool/dbconfig/20241031-234030-ladsgroup.json
- 23:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 23:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70808 and previous config saved to /var/cache/conftool/dbconfig/20241031-234003-ladsgroup.json
- 23:37 swfrench@deploy2002: Finished scap sync-world: Deployment to clear noop chart diff from 1085491 - T372604 T377040 (duration: 01m 49s)
- 23:35 swfrench@deploy2002: Started scap sync-world: Deployment to clear noop chart diff from 1085491 - T372604 T377040
- 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70807 and previous config saved to /var/cache/conftool/dbconfig/20241031-232456-ladsgroup.json
- 23:15 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 23:13 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 23:12 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70806 and previous config saved to /var/cache/conftool/dbconfig/20241031-230949-ladsgroup.json
- 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70805 and previous config saved to /var/cache/conftool/dbconfig/20241031-225442-ladsgroup.json
- 22:48 dancy@deploy2002: Finished scap sync-world: Backport for Dummy commit for testing (duration: 07m 28s)
- 22:46 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1019.eqiad.wmnet with OS bullseye
- 22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70804 and previous config saved to /var/cache/conftool/dbconfig/20241031-224513-ladsgroup.json
- 22:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 22:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T376905)', diff saved to https://phabricator.wikimedia.org/P70803 and previous config saved to /var/cache/conftool/dbconfig/20241031-224446-ladsgroup.json
- 22:43 dancy@deploy2002: dancy: Continuing with sync
- 22:43 dancy@deploy2002: dancy: Backport for Dummy commit for testing synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:40 dancy@deploy2002: Started scap sync-world: Backport for Dummy commit for testing
- 22:30 dancy@deploy2002: Installation of scap version "4.119.4" completed for 1 hosts
- 22:29 dancy@deploy2002: Installing scap version "4.119.4" for 1 hosts
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70802 and previous config saved to /var/cache/conftool/dbconfig/20241031-222939-ladsgroup.json
- 22:21 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
- 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70801 and previous config saved to /var/cache/conftool/dbconfig/20241031-221432-ladsgroup.json
- 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T376905)', diff saved to https://phabricator.wikimedia.org/P70800 and previous config saved to /var/cache/conftool/dbconfig/20241031-215925-ladsgroup.json
- 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T376905)', diff saved to https://phabricator.wikimedia.org/P70799 and previous config saved to /var/cache/conftool/dbconfig/20241031-215056-ladsgroup.json
- 21:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 21:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 21:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 21:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T376905)', diff saved to https://phabricator.wikimedia.org/P70798 and previous config saved to /var/cache/conftool/dbconfig/20241031-215025-ladsgroup.json
- 21:50 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1019.eqiad.wmnet with OS bullseye
- 21:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
- 21:40 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 21:37 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 21:37 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 21:35 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 21:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70797 and previous config saved to /var/cache/conftool/dbconfig/20241031-213518-ladsgroup.json
- 21:35 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 21:22 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
- 21:22 urandom: Bootstrapping Cassandra/aqs1022-b — T378725
- 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70796 and previous config saved to /var/cache/conftool/dbconfig/20241031-212011-ladsgroup.json
- 21:19 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1019.eqiad.wmnet with OS bullseye
- 21:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
- 21:18 dancy@deploy2002: Installing scap version "4.119.3" for 210 hosts
- 21:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T376905)', diff saved to https://phabricator.wikimedia.org/P70795 and previous config saved to /var/cache/conftool/dbconfig/20241031-210504-ladsgroup.json
- 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T376905)', diff saved to https://phabricator.wikimedia.org/P70794 and previous config saved to /var/cache/conftool/dbconfig/20241031-205631-ladsgroup.json
- 20:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 20:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T376905)', diff saved to https://phabricator.wikimedia.org/P70793 and previous config saved to /var/cache/conftool/dbconfig/20241031-205604-ladsgroup.json
- 20:55 jsn@deploy2002: Finished scap sync-world: Backport for Translations for configuration for same-user-same-page reverts in Automoderator (T370795), Add follow-up message (T372476) (duration: 27m 10s)
- 20:46 jsn@deploy2002: jsn: Continuing with sync
- 20:46 jsn@deploy2002: jsn: Backport for Translations for configuration for same-user-same-page reverts in Automoderator (T370795), Add follow-up message (T372476) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70792 and previous config saved to /var/cache/conftool/dbconfig/20241031-204057-ladsgroup.json
- 20:28 jsn@deploy2002: Started scap sync-world: Backport for Translations for configuration for same-user-same-page reverts in Automoderator (T370795), Add follow-up message (T372476)
- 20:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70791 and previous config saved to /var/cache/conftool/dbconfig/20241031-202549-ladsgroup.json
- 20:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 20:15 dancy@deploy2002: Finished scap sync-world: Backport for tcywikisource: fix typo of author namespace (T378555) (duration: 07m 46s)
- 20:10 dancy@deploy2002: dancy, anzx: Continuing with sync
- 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T376905)', diff saved to https://phabricator.wikimedia.org/P70790 and previous config saved to /var/cache/conftool/dbconfig/20241031-201042-ladsgroup.json
- 20:10 dancy@deploy2002: dancy, anzx: Backport for tcywikisource: fix typo of author namespace (T378555) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:07 dancy@deploy2002: Started scap sync-world: Backport for tcywikisource: fix typo of author namespace (T378555)
- 20:03 dancy@deploy2002: Installation of scap version "4.119.2" completed for 210 hosts
- 20:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T376905)', diff saved to https://phabricator.wikimedia.org/P70789 and previous config saved to /var/cache/conftool/dbconfig/20241031-200214-ladsgroup.json
- 20:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 20:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T376905)', diff saved to https://phabricator.wikimedia.org/P70788 and previous config saved to /var/cache/conftool/dbconfig/20241031-200148-ladsgroup.json
- 19:58 dancy@deploy2002: Installing scap version "4.119.2" for 210 hosts
- 19:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70787 and previous config saved to /var/cache/conftool/dbconfig/20241031-194640-ladsgroup.json
- 19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70786 and previous config saved to /var/cache/conftool/dbconfig/20241031-193133-ladsgroup.json
- 19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T376905)', diff saved to https://phabricator.wikimedia.org/P70785 and previous config saved to /var/cache/conftool/dbconfig/20241031-191626-ladsgroup.json
- 19:15 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.1 refs T375660
- 19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T376905)', diff saved to https://phabricator.wikimedia.org/P70784 and previous config saved to /var/cache/conftool/dbconfig/20241031-190648-ladsgroup.json
- 19:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 19:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T376905)', diff saved to https://phabricator.wikimedia.org/P70783 and previous config saved to /var/cache/conftool/dbconfig/20241031-190622-ladsgroup.json
- 19:06 swfrench@deploy2002: Finished scap sync-world: Backport for TimedMediaHandler: revert commonswiki changes due to capacity issues (duration: 07m 38s)
- 19:01 swfrench@deploy2002: swfrench, hnowlan: Continuing with sync
- 19:01 swfrench@deploy2002: swfrench, hnowlan: Backport for TimedMediaHandler: revert commonswiki changes due to capacity issues synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:58 swfrench@deploy2002: Started scap sync-world: Backport for TimedMediaHandler: revert commonswiki changes due to capacity issues
- 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70782 and previous config saved to /var/cache/conftool/dbconfig/20241031-185115-ladsgroup.json
- 18:47 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70781 and previous config saved to /var/cache/conftool/dbconfig/20241031-183608-ladsgroup.json
- 18:26 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 18:26 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 18:24 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 18:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 18:23 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 18:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 18:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 18:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 18:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 18:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T376905)', diff saved to https://phabricator.wikimedia.org/P70780 and previous config saved to /var/cache/conftool/dbconfig/20241031-182101-ladsgroup.json
- 18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T376905)', diff saved to https://phabricator.wikimedia.org/P70779 and previous config saved to /var/cache/conftool/dbconfig/20241031-181225-ladsgroup.json
- 18:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 18:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 18:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T376905)', diff saved to https://phabricator.wikimedia.org/P70778 and previous config saved to /var/cache/conftool/dbconfig/20241031-181158-ladsgroup.json
- 18:05 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P70777 and previous config saved to /var/cache/conftool/dbconfig/20241031-175651-ladsgroup.json
- 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P70776 and previous config saved to /var/cache/conftool/dbconfig/20241031-174144-ladsgroup.json
- 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T376905)', diff saved to https://phabricator.wikimedia.org/P70775 and previous config saved to /var/cache/conftool/dbconfig/20241031-172637-ladsgroup.json
- 17:26 volans: uploaded spicerack_8.15.2 to apt.wikimedia.org bullseye-wikimedia
- 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T376905)', diff saved to https://phabricator.wikimedia.org/P70774 and previous config saved to /var/cache/conftool/dbconfig/20241031-171824-ladsgroup.json
- 17:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 17:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 17:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 17:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 17:13 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up PHP version parameterization - T372604 T377040 (duration: 01m 52s)
- 17:11 swfrench@deploy2002: Started scap sync-world: Deployment to pick up PHP version parameterization - T372604 T377040
- 17:01 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
- 17:00 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye
- 16:57 Emperor: set mgr mgr/prometheus/scrape_interval 15.0 in both apus clusters
- 16:56 urandom: Bootstrapping Cassandra/aqs1022-a — T378725
- 16:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1022.eqiad.wmnet with reason: Bootstrapping — T378725
- 16:52 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1022.eqiad.wmnet with reason: Bootstrapping — T378725
- 16:45 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
- 16:37 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
- 16:27 taavi@deploy2002: Finished scap sync-world: Backport for Drop 'nonglobal' dblist (duration: 08m 44s)
- 16:23 taavi@deploy2002: taavi: Continuing with sync
- 16:21 taavi@deploy2002: taavi: Backport for Drop 'nonglobal' dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:19 taavi@deploy2002: Started scap sync-world: Backport for Drop 'nonglobal' dblist
- 16:16 taavi@deploy2002: Finished scap sync-world: Backport for Drop labtestwiki config (T378260) (duration: 09m 39s)
- 16:12 taavi@deploy2002: taavi: Continuing with sync
- 16:09 taavi@deploy2002: taavi: Backport for Drop labtestwiki config (T378260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:07 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:07 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — aqs1022 - eevans@cumin1002"
- 16:07 taavi@deploy2002: Started scap sync-world: Backport for Drop labtestwiki config (T378260)
- 16:06 ryankemper: [archiva] Freed up space on `archiva1002.wikimedia.org` like so: `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. We're down to 31% usage now
- 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70772 and previous config saved to /var/cache/conftool/dbconfig/20241031-160542-arnaudb.json
- 16:04 dancy@deploy2002: scap failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/home/dancy/src/venvs/scap/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.28/cache/l10n/*.tmp.*']' returned non-zero exit status 1. (scap version: 4.118.0) (duration: 00m 01s)
- 16:04 dancy@deploy2002: Started scap sync-world: Backport for Drop labtestwiki config (T378260)
- 16:03 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — aqs1022 - eevans@cumin1002"
- 15:59 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 15:55 samtar@deploy2002: Finished scap sync-world: Backport for [CommunityRequests] disable wgCommunityRequestsEnable by default (T366194) (duration: 07m 51s)
- 15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70770 and previous config saved to /var/cache/conftool/dbconfig/20241031-155037-arnaudb.json
- 15:50 samtar@deploy2002: samtar, musikanimal: Continuing with sync
- 15:49 samtar@deploy2002: samtar, musikanimal: Backport for [CommunityRequests] disable wgCommunityRequestsEnable by default (T366194) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:47 samtar@deploy2002: Started scap sync-world: Backport for [CommunityRequests] disable wgCommunityRequestsEnable by default (T366194)
- 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2190']
- 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2190']
- 15:35 eevans@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70769 and previous config saved to /var/cache/conftool/dbconfig/20241031-153531-arnaudb.json
- 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70768 and previous config saved to /var/cache/conftool/dbconfig/20241031-152220-arnaudb.json
- 15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70767 and previous config saved to /var/cache/conftool/dbconfig/20241031-152026-arnaudb.json
- 15:15 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 15:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 15:08 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 15:07 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add tooltips to expressions - oblivian@cumin1002"
- 15:07 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips to expressions - oblivian@cumin1002
- 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70766 and previous config saved to /var/cache/conftool/dbconfig/20241031-150714-arnaudb.json
- 15:06 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips to expressions - oblivian@cumin1002
- 15:06 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add tooltips to expressions - oblivian@cumin1002"
- 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70765 and previous config saved to /var/cache/conftool/dbconfig/20241031-150521-arnaudb.json
- 15:00 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 14:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70764 and previous config saved to /var/cache/conftool/dbconfig/20241031-145209-arnaudb.json
- 14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 5%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70763 and previous config saved to /var/cache/conftool/dbconfig/20241031-145015-arnaudb.json
- 14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70762 and previous config saved to /var/cache/conftool/dbconfig/20241031-144902-arnaudb.json
- 14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2190.codfw.wmnet with reason: host has hardware issues T378628
- 14:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2190.codfw.wmnet with reason: host has hardware issues T378628
- 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70761 and previous config saved to /var/cache/conftool/dbconfig/20241031-143704-arnaudb.json
- 14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 4%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70760 and previous config saved to /var/cache/conftool/dbconfig/20241031-143510-arnaudb.json
- 14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70759 and previous config saved to /var/cache/conftool/dbconfig/20241031-143356-arnaudb.json
- 14:24 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tcywikisource (T378469)
- 14:23 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database tcywikisource (T378469)
- 14:22 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tcywiktionary (T378462)
- 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70758 and previous config saved to /var/cache/conftool/dbconfig/20241031-142158-arnaudb.json
- 14:21 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database tcywiktionary (T378462)
- 14:21 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database ibawiki (T376571)
- 14:21 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database ibawiki (T376571)
- 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 2%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70757 and previous config saved to /var/cache/conftool/dbconfig/20241031-142004-arnaudb.json
- 14:19 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database bclwikisource (T377087)
- 14:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database bclwikisource (T377087)
- 14:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70756 and previous config saved to /var/cache/conftool/dbconfig/20241031-141851-arnaudb.json
- 14:14 sergi0: Running `foreachwiki userOptions.php --delete --old=sectionlevelimages growthexperiments-homepage-variant` (T375753)
- 14:11 sergi0: eswiki, arwiki, cswiki, frwiki running `mwscript userOptions.php --wiki=frwiki --delete-defaults growthexperiments-homepage-variant` (T374664)
- 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: post maintenance', diff saved to https://phabricator.wikimedia.org/P70755 and previous config saved to /var/cache/conftool/dbconfig/20241031-140653-arnaudb.json
- 14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1234 (re)pooling @ 1%: post T378267 reclone', diff saved to https://phabricator.wikimedia.org/P70754 and previous config saved to /var/cache/conftool/dbconfig/20241031-140459-arnaudb.json
- 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: post db1234.eqiad.wmnet clone', diff saved to https://phabricator.wikimedia.org/P70753 and previous config saved to /var/cache/conftool/dbconfig/20241031-140345-arnaudb.json
- 13:50 urbanecm@deploy2002: Finished scap sync-world: Backport for tcywikisource: add logo (T378555) (duration: 08m 56s)
- 13:46 urbanecm@deploy2002: urbanecm, anzx: Continuing with sync
- 13:44 urbanecm@deploy2002: urbanecm, anzx: Backport for tcywikisource: add logo (T378555) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:41 urbanecm@deploy2002: Started scap sync-world: Backport for tcywikisource: add logo (T378555)
- 13:38 urbanecm@deploy2002: Finished scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, HomepageHooks: do not store assigned variant on account creation (T377713), [[gerrit:1085347|SpecialHomepage: show community update
- 13:34 urbanecm@deploy2002: hnowlan, sgimeno, urbanecm: Continuing with sync
- 13:30 urbanecm@deploy2002: hnowlan, sgimeno, urbanecm: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, HomepageHooks: do not store assigned variant on account creation (T377713), [[gerrit:1085347|SpecialHomepage: show community upda
- 13:28 urbanecm@deploy2002: Started scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, HomepageHooks: do not store assigned variant on account creation (T377713), [[gerrit:1085347|SpecialHomepage: show community update
- 13:25 urbanecm@deploy2002: Finished scap sync-world: Backport for tcywikisource: Add namespaces, SITENAME and timezone (T378555), tcywiktionary: add SITENAME and timezone (T378556), tcywiktionary: add logo (T378556) (duration: 09m 39s)
- 13:20 urbanecm@deploy2002: anzx, urbanecm: Continuing with sync
- 13:19 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
- 13:18 urbanecm@deploy2002: anzx, urbanecm: Backport for tcywikisource: Add namespaces, SITENAME and timezone (T378555), tcywiktionary: add SITENAME and timezone (T378556), tcywiktionary: add logo (T378556) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
- 13:15 urbanecm@deploy2002: Started scap sync-world: Backport for tcywikisource: Add namespaces, SITENAME and timezone (T378555), tcywiktionary: add SITENAME and timezone (T378556), tcywiktionary: add logo (T378556)
- 13:14 urbanecm@deploy2002: Finished scap sync-world: Backport for TimedMediaHandler: use shellbox globally (T357309), Remove RunSingleJobStdin script (T369048) (duration: 09m 43s)
- 13:09 urbanecm@deploy2002: urbanecm, hnowlan: Continuing with sync
- 13:08 urbanecm@deploy2002: urbanecm, hnowlan: Backport for TimedMediaHandler: use shellbox globally (T357309), Remove RunSingleJobStdin script (T369048) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 urbanecm@deploy2002: Started scap sync-world: Backport for TimedMediaHandler: use shellbox globally (T357309), Remove RunSingleJobStdin script (T369048)
- 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70752 and previous config saved to /var/cache/conftool/dbconfig/20241031-122719-ladsgroup.json
- 12:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P70751 and previous config saved to /var/cache/conftool/dbconfig/20241031-121212-ladsgroup.json
- 12:06 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1002.eqiad.wmnet
- 12:06 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1002.eqiad.wmnet
- 12:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database annwiki (T377118)
- 12:01 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database annwiki (T377118)
- 12:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tddwiki (T375016)
- 12:00 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database tddwiki (T375016)
- 12:00 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database rskwiki (T375016)
- 11:59 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database rskwiki (T375016)
- 11:59 fnegri@cumin1002: END (ERROR) - Cookbook sre.wikireplicas.add-wiki (exit_code=97) for database rskwiki (T375016)
- 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P70750 and previous config saved to /var/cache/conftool/dbconfig/20241031-115705-ladsgroup.json
- 11:54 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database rskwiki (T375016)
- 11:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1232.eqiad.wmnet onto db1234.eqiad.wmnet
- 11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70747 and previous config saved to /var/cache/conftool/dbconfig/20241031-114158-ladsgroup.json
- 11:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1002.eqiad.wmnet with OS bookworm
- 11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70746 and previous config saved to /var/cache/conftool/dbconfig/20241031-113456-ladsgroup.json
- 11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
- 11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
- 11:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 11:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70744 and previous config saved to /var/cache/conftool/dbconfig/20241031-112924-ladsgroup.json
- 11:26 fabfur: reverted previous action (T378578)
- 11:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
- 11:17 fabfur: install haproxykafka on cp4037 and cp3066 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1085308) (T378578)
- 11:17 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
- 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P70743 and previous config saved to /var/cache/conftool/dbconfig/20241031-111417-ladsgroup.json
- 11:02 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1002.eqiad.wmnet with OS bookworm
- 11:01 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1002.eqiad.wmnet
- 11:00 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1002.eqiad.wmnet
- 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P70742 and previous config saved to /var/cache/conftool/dbconfig/20241031-105910-ladsgroup.json
- 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1002.eqiad.wmnet
- 10:58 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1002.eqiad.wmnet
- 10:56 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-ctrl1003.eqiad.wmnet
- 10:56 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-ctrl1003.eqiad.wmnet
- 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[1232,1234].eqiad.wmnet with reason: hosts in cloning, avoiding alerts
- 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[1232,1234].eqiad.wmnet with reason: hosts in cloning, avoiding alerts
- 10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70741 and previous config saved to /var/cache/conftool/dbconfig/20241031-104404-ladsgroup.json
- 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70740 and previous config saved to /var/cache/conftool/dbconfig/20241031-103406-ladsgroup.json
- 10:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 10:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 10:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 10:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T376905)', diff saved to https://phabricator.wikimedia.org/P70739 and previous config saved to /var/cache/conftool/dbconfig/20241031-102835-ladsgroup.json
- 10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P70738 and previous config saved to /var/cache/conftool/dbconfig/20241031-101328-ladsgroup.json
- 10:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl1003.eqiad.wmnet with OS bookworm
- 10:04 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1232.eqiad.wmnet onto db1234.eqiad.wmnet
- 10:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db1232 in db1234 for T378267', diff saved to https://phabricator.wikimedia.org/P70737 and previous config saved to /var/cache/conftool/dbconfig/20241031-100301-arnaudb.json
- 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P70736 and previous config saved to /var/cache/conftool/dbconfig/20241031-095821-ladsgroup.json
- 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1003.eqiad.wmnet with reason: host reimage
- 09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1003.eqiad.wmnet with reason: host reimage
- 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T376905)', diff saved to https://phabricator.wikimedia.org/P70735 and previous config saved to /var/cache/conftool/dbconfig/20241031-094314-ladsgroup.json
- 09:35 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl1003.eqiad.wmnet with OS bookworm
- 09:35 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-ctrl1003.eqiad.wmnet
- 09:35 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-ctrl1003.eqiad.wmnet
- 09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1179 (T376905)', diff saved to https://phabricator.wikimedia.org/P70734 and previous config saved to /var/cache/conftool/dbconfig/20241031-093446-ladsgroup.json
- 09:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 09:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 09:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 09:32 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 09:07 fabfur: importing haproxykafka 0.3 package into apt repository (T377613)
- 08:23 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
- 08:23 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
- 08:21 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1019.eqiad.wmnet with OS bullseye
- 08:13 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 56258
- 08:12 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 56258
- 08:01 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
- 04:54 eileen: civicrm upgraded from 0eb881ca to 31f5cbdb
- 01:45 krinkle@deploy2002: Finished deploy [integration/docroot@0b03488]: (no justification provided) (duration: 00m 10s)
- 01:45 krinkle@deploy2002: Started deploy [integration/docroot@0b03488]: (no justification provided)
- 01:42 Krinkle: krinkle@mwmaint2001$ Purge https://doc.wikimedia.org/lib/wmui-page.css via `mwscript extensions/WikimediaMaintenance/purgeUrls.php`, T257188 T378542
- 01:38 krinkle@deploy2002: Finished deploy [integration/docroot@a2c044c]: T378542 (duration: 00m 23s)
- 01:38 krinkle@deploy2002: Started deploy [integration/docroot@a2c044c]: T378542
- 00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215 (T376905)', diff saved to https://phabricator.wikimedia.org/P70733 and previous config saved to /var/cache/conftool/dbconfig/20241031-003014-ladsgroup.json
- 00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215', diff saved to https://phabricator.wikimedia.org/P70732 and previous config saved to /var/cache/conftool/dbconfig/20241031-001507-ladsgroup.json
- 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215', diff saved to https://phabricator.wikimedia.org/P70731 and previous config saved to /var/cache/conftool/dbconfig/20241031-000000-ladsgroup.json
2024-10-30
- 23:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
- 23:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2215 (T376905)', diff saved to https://phabricator.wikimedia.org/P70730 and previous config saved to /var/cache/conftool/dbconfig/20241030-234453-ladsgroup.json
- 23:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2215 (T376905)', diff saved to https://phabricator.wikimedia.org/P70729 and previous config saved to /var/cache/conftool/dbconfig/20241030-225520-ladsgroup.json
- 22:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2215.codfw.wmnet with reason: Maintenance
- 22:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2215.codfw.wmnet with reason: Maintenance
- 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T376905)', diff saved to https://phabricator.wikimedia.org/P70728 and previous config saved to /var/cache/conftool/dbconfig/20241030-225449-ladsgroup.json
- 22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P70727 and previous config saved to /var/cache/conftool/dbconfig/20241030-223942-ladsgroup.json
- 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 22:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 22:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P70726 and previous config saved to /var/cache/conftool/dbconfig/20241030-222435-ladsgroup.json
- 22:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T376905)', diff saved to https://phabricator.wikimedia.org/P70725 and previous config saved to /var/cache/conftool/dbconfig/20241030-220928-ladsgroup.json
- 22:03 brett: Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
- 21:26 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) (duration: 07m 22s)
- 21:21 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 21:21 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:18 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563)
- 21:17 tgr@deploy2002: Finished scap sync-world: Backport for GrowthExperiments: enable community updates module in pilot wikis (T374664) (duration: 10m 10s)
- 21:12 tgr@deploy2002: tgr, sgimeno: Continuing with sync
- 21:09 tgr@deploy2002: tgr, sgimeno: Backport for GrowthExperiments: enable community updates module in pilot wikis (T374664) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2191 (T376905)', diff saved to https://phabricator.wikimedia.org/P70724 and previous config saved to /var/cache/conftool/dbconfig/20241030-210902-ladsgroup.json
- 21:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
- 21:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
- 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T376905)', diff saved to https://phabricator.wikimedia.org/P70723 and previous config saved to /var/cache/conftool/dbconfig/20241030-210836-ladsgroup.json
- 21:07 tgr@deploy2002: Started scap sync-world: Backport for GrowthExperiments: enable community updates module in pilot wikis (T374664)
- 21:01 tgr@deploy2002: Finished scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, build: Suppress phan issue with null for Message::numParams, [[gerrit:1084181|HomepageHooks: do not store assigned variant on account cr
- 20:57 tgr@deploy2002: sgimeno, umherirrender, tgr: Continuing with sync
- 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P70722 and previous config saved to /var/cache/conftool/dbconfig/20241030-205329-ladsgroup.json
- 20:51 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:51 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:45 tgr@deploy2002: sgimeno, umherirrender, tgr: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, build: Suppress phan issue with null for Message::numParams, [[gerrit:1084181|HomepageHooks: do not store assigned variant on account
- 20:43 tgr@deploy2002: Started scap sync-world: Backport for Set username in user mock and reset state after test (T378573), Fix and re-enable selenium test (T378581), Fix selenium test loading the wrong talk page, build: Suppress phan issue with null for Message::numParams, [[gerrit:1084181|HomepageHooks: do not store assigned variant on account cre
- 20:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P70721 and previous config saved to /var/cache/conftool/dbconfig/20241030-203822-ladsgroup.json
- 20:24 tgr@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only on nowiki (T377990) (duration: 13m 21s)
- 20:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T376905)', diff saved to https://phabricator.wikimedia.org/P70720 and previous config saved to /var/cache/conftool/dbconfig/20241030-202315-ladsgroup.json
- 20:20 tgr@deploy2002: esanders, tgr: Continuing with sync
- 20:16 tgr@deploy2002: esanders, tgr: Backport for Set Flow to read-only on nowiki (T377990) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2131 (T376905)', diff saved to https://phabricator.wikimedia.org/P70719 and previous config saved to /var/cache/conftool/dbconfig/20241030-201331-ladsgroup.json
- 20:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
- 20:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
- 20:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T376905)', diff saved to https://phabricator.wikimedia.org/P70718 and previous config saved to /var/cache/conftool/dbconfig/20241030-201305-ladsgroup.json
- 20:11 tgr@deploy2002: Started scap sync-world: Backport for Set Flow to read-only on nowiki (T377990)
- 19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P70717 and previous config saved to /var/cache/conftool/dbconfig/20241030-195758-ladsgroup.json
- 19:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P70716 and previous config saved to /var/cache/conftool/dbconfig/20241030-194251-ladsgroup.json
- 19:40 swfrench-wmf: all shellbox instances updated to shellbox 2024-10-15-214239 - T375243
- 19:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 19:39 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 19:37 mutante: gitlab - deleting user "jfk" on main server and both replicas T376936
- 19:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 19:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 19:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 19:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 19:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 19:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 19:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 19:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 19:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T376905)', diff saved to https://phabricator.wikimedia.org/P70715 and previous config saved to /var/cache/conftool/dbconfig/20241030-192744-ladsgroup.json
- 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2115 (T376905)', diff saved to https://phabricator.wikimedia.org/P70714 and previous config saved to /var/cache/conftool/dbconfig/20241030-192011-ladsgroup.json
- 19:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
- 19:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
- 19:17 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.1 refs T375660
- 18:40 dduvall@deploy2002: Finished scap sync-world: Backport for Revert "Use array instead of string for class list" (T378531) (duration: 19m 04s)
- 18:39 inflatador: bking@stat1008,stat1009,stat1010.mgmt racadm jobqueue delete -i $job T376813
- 18:36 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nrwiki (T375101)
- 18:35 dduvall@deploy2002: ammarpad, dduvall: Continuing with sync
- 18:35 dduvall: error is still occurring following backport deployment of https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/1084759 (T378531)
- 18:27 dduvall: monitoring testwiki error rates for a few minutes to see if the error related to T378531 subsides (current rate is 23 errors in the last 15 minutes)
- 18:23 dduvall@deploy2002: ammarpad, dduvall: Backport for Revert "Use array instead of string for class list" (T378531) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:21 dduvall@deploy2002: Started scap sync-world: Backport for Revert "Use array instead of string for class list" (T378531)
- 18:10 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nrwiki (T375101)
- 17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
- 17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
- 17:31 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
- 17:26 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
- 17:24 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
- 17:23 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
- 17:21 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
- 17:21 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 17:20 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
- 17:20 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 17:19 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
- 17:18 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
- 17:11 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
- 17:11 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
- 17:03 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 17:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 17:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 17:00 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 16:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 16:58 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 16:58 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 16:57 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 16:54 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 16:53 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 16:44 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1017.eqiad.wmnet with OS bullseye
- 16:39 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 16:39 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 16:39 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 16:39 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 16:38 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:38 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 16:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 16:38 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 16:37 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 16:37 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 16:37 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 16:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:11 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
- 16:09 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) (duration: 07m 06s)
- 16:08 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:08 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:07 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:07 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:07 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:06 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:06 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:06 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:04 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 16:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:04 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:02 pfischer@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:02 pfischer@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:01 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix bug in BlockManager::getUniqueBlocks (T378563)
- 16:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
- 15:59 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:57 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1017.eqiad.wmnet with reason: host reimage
- 15:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:56 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:55 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:55 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:54 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:50 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:47 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:43 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bullseye
- 15:39 moritzm: re-enable Puppet fleet-wide after puppetserver2001 maintenance
- 15:39 moritzm: re-enable Puppet fleet-wide for puppetserver2001 maintenance
- 15:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:36 ejegg: Standalone SmashPig upgraded from eaa176f7 to be47dddd
- 15:35 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:35 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver2001.codfw.wmnet with reason: puppetserver2001 maintenance
- 15:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver2001.codfw.wmnet with reason: puppetserver2001 maintenance
- 15:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:26 moritzm: disable Puppet fleet-wide for puppetserver2001 maintenance
- 15:25 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
- 15:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:23 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1017.eqiad.wmnet with OS bullseye
- 15:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1017.eqiad.wmnet with OS bullseye
- 15:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on an-presto1020.eqiad.wmnet with reason: reimaging the hosts to bullseye
- 15:06 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on an-presto1020.eqiad.wmnet with reason: reimaging the hosts to bullseye
- 15:05 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on an-presto[1017-1019].eqiad.wmnet with reason: reimaging the hosts to bullseye
- 15:05 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on an-presto[1017-1019].eqiad.wmnet with reason: reimaging the hosts to bullseye
- 15:02 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
- 15:01 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-ctrl1002.eqiad.wmnet
- 15:00 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-ctrl1002.eqiad.wmnet
- 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver2003.codfw.wmnet with reason: RAM expansion
- 14:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver2003.codfw.wmnet with reason: RAM expansion
- 14:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl1002.eqiad.wmnet with OS bookworm
- 14:56 fabfur: importing haproxykafka 0.2 package into apt repository (T377613)
- 14:43 joal@deploy2002: Finished deploy [airflow-dags/analytics@ec02629]: Regular analytics weekly train SECOND [airflow-dags/analytics@ec02629d] (duration: 00m 55s)
- 14:42 joal@deploy2002: Started deploy [airflow-dags/analytics@ec02629]: Regular analytics weekly train SECOND [airflow-dags/analytics@ec02629d]
- 14:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
- 14:37 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:37 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 14:37 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 14:34 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:34 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70712 and previous config saved to /var/cache/conftool/dbconfig/20241030-143303-ladsgroup.json
- 14:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 14:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 14:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70711 and previous config saved to /var/cache/conftool/dbconfig/20241030-143236-ladsgroup.json
- 14:30 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:30 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 14:28 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), [GlobalBlocking] Enable global autoblocks on all WMF wikis (T377760) (duration: 09m 10s)
- 14:23 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 14:23 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl1002.eqiad.wmnet with OS bookworm
- 14:22 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1002.eqiad.wmnet
- 14:22 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1002.eqiad.wmnet
- 14:21 dreamyjazz@deploy2002: dreamyjazz: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), [GlobalBlocking] Enable global autoblocks on all WMF wikis (T377760) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver2002.codfw.wmnet with reason: RAM expansion
- 14:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver2002.codfw.wmnet with reason: RAM expansion
- 14:19 dreamyjazz@deploy2002: Started scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), [GlobalBlocking] Enable global autoblocks on all WMF wikis (T377760)
- 14:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P70710 and previous config saved to /var/cache/conftool/dbconfig/20241030-141729-ladsgroup.json
- 14:11 urbanecm: mwmaint2002: kill all running instances of `refreshLinkRecommendations.php` (T377150)
- 14:06 urbanecm@deploy2002: Finished scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), CirrusSearch: Enable offloading weighted tags via EventBus (T377150), cswiki: Add celebration logo (T378597) (duration: 15m 30s)
- 14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P70709 and previous config saved to /var/cache/conftool/dbconfig/20241030-140222-ladsgroup.json
- 14:01 urbanecm@deploy2002: dreamyjazz, pfischer, urbanecm: Continuing with sync
- 13:58 joal@deploy2002: Finished deploy [airflow-dags/analytics@ec4746b]: Regular analytics weekly train [airflow-dags/analytics@ec4746b5] (duration: 00m 41s)
- 13:57 joal@deploy2002: Started deploy [airflow-dags/analytics@ec4746b]: Regular analytics weekly train [airflow-dags/analytics@ec4746b5]
- 13:53 urbanecm@deploy2002: dreamyjazz, pfischer, urbanecm: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), CirrusSearch: Enable offloading weighted tags via EventBus (T377150), cswiki: Add celebration logo (T378597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:50 urbanecm@deploy2002: Started scap sync-world: Backport for [BlockManager] Don't assume autoblocks have ::getParentBlockId (T378563), CirrusSearch: Enable offloading weighted tags via EventBus (T377150), cswiki: Add celebration logo (T378597)
- 13:48 urbanecm@deploy2002: Finished scap sync-world: Backport for Growth [test2wiki]: enable community updates module (T376952), [Growth] beta: configure the A/B test experiment variants (T377233) (duration: 29m 00s)
- 13:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70707 and previous config saved to /var/cache/conftool/dbconfig/20241030-134715-ladsgroup.json
- 13:43 urbanecm@deploy2002: sgimeno, urbanecm: Continuing with sync
- 13:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P70704 and previous config saved to /var/cache/conftool/dbconfig/20241030-132204-ladsgroup.json
- 13:20 moritzm: upgrade PHP 7.4 on mwdebug* to 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 T378173
- 13:19 urbanecm@deploy2002: Started scap sync-world: Backport for Growth [test2wiki]: enable community updates module (T376952), [Growth] beta: configure the A/B test experiment variants (T377233)
- 13:18 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@ec4746b]: (no justification provided) (duration: 00m 07s)
- 13:18 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@ec4746b]: (no justification provided)
- 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P70703 and previous config saved to /var/cache/conftool/dbconfig/20241030-130657-ladsgroup.json
- 12:55 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@ec4746b]: (no justification provided) (duration: 00m 11s)
- 12:54 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@ec4746b]: (no justification provided)
- 12:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T376905)', diff saved to https://phabricator.wikimedia.org/P70702 and previous config saved to /var/cache/conftool/dbconfig/20241030-125150-ladsgroup.json
- 12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T376905)', diff saved to https://phabricator.wikimedia.org/P70701 and previous config saved to /var/cache/conftool/dbconfig/20241030-124316-ladsgroup.json
- 12:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 12:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 12:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 12:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T376905)', diff saved to https://phabricator.wikimedia.org/P70700 and previous config saved to /var/cache/conftool/dbconfig/20241030-124256-ladsgroup.json
- 12:30 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), globalblocks API: Hide autoblocks when target param has username and IP (T377855) (duration: 10m 28s)
- 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P70699 and previous config saved to /var/cache/conftool/dbconfig/20241030-122749-ladsgroup.json
- 12:25 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 12:22 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 12:22 dreamyjazz@deploy2002: dreamyjazz: Backport for Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), globalblocks API: Hide autoblocks when target param has username and IP (T377855) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:22 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 12:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 12:21 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 12:20 dreamyjazz@deploy2002: Started scap sync-world: Backport for Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), Handle a missing parent block in GlobalBlockLookup::getUserBlock (T378447), globalblocks API: Hide autoblocks when target param has username and IP (T377855)
- 12:19 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 12:19 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 12:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 12:17 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 12:17 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:16 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P70698 and previous config saved to /var/cache/conftool/dbconfig/20241030-121242-ladsgroup.json
- 12:12 moritzm: installing podman security updates
- 12:11 joal@deploy2002: Finished deploy [analytics/refinery@0855ce2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0855ce28] (duration: 03m 41s)
- 12:07 joal@deploy2002: Started deploy [analytics/refinery@0855ce2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0855ce28]
- 12:04 joal@deploy2002: Finished deploy [analytics/refinery@0855ce2] (thin): Regular analytics weekly train THIN [analytics/refinery@0855ce28] (duration: 06m 54s)
- 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T376905)', diff saved to https://phabricator.wikimedia.org/P70697 and previous config saved to /var/cache/conftool/dbconfig/20241030-115735-ladsgroup.json
- 11:57 joal@deploy2002: Started deploy [analytics/refinery@0855ce2] (thin): Regular analytics weekly train THIN [analytics/refinery@0855ce28]
- 11:55 joal@deploy2002: Finished deploy [analytics/refinery@0855ce2]: Regular analytics weekly train [analytics/refinery@0855ce28] (duration: 08m 14s)
- 11:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T376905)', diff saved to https://phabricator.wikimedia.org/P70696 and previous config saved to /var/cache/conftool/dbconfig/20241030-114808-ladsgroup.json
- 11:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 11:47 joal@deploy2002: Started deploy [analytics/refinery@0855ce2]: Regular analytics weekly train [analytics/refinery@0855ce28]
- 11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 11:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 11:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 11:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 11:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
- 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2003.codfw.wmnet to plain
- 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2003.codfw.wmnet to plain
- 11:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
- 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
- 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2003.codfw.wmnet to drbd
- 11:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2003.codfw.wmnet to drbd
- 11:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
- 11:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
- 11:19 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
- 11:17 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
- 11:14 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:06 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
- 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2044.codfw.wmnet to cluster codfw and group D
- 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2044.codfw.wmnet to cluster codfw and group D
- 10:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16347
- 10:40 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 16347
- 10:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16347
- 10:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 16347
- 10:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 852
- 10:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 852
- 10:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
- 10:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 14593
- 10:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6461
- 10:18 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6461
- 10:04 moritzm: installing python-idna security updates
- 09:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70694 and previous config saved to /var/cache/conftool/dbconfig/20241030-095904-arnaudb.json
- 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling reboot on A:docker-registry
- 09:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70693 and previous config saved to /var/cache/conftool/dbconfig/20241030-094357-arnaudb.json
- 09:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40676
- 09:40 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40676
- 09:38 fabfur: importing haproxykafka package into apt repository (T377613)
- 09:33 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling reboot on A:docker-registry
- 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
- 09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70692 and previous config saved to /var/cache/conftool/dbconfig/20241030-092850-arnaudb.json
- 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1002.eqiad.wmnet
- 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
- 09:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1002.eqiad.wmnet
- 09:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70691 and previous config saved to /var/cache/conftool/dbconfig/20241030-091343-arnaudb.json
- 09:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70690 and previous config saved to /var/cache/conftool/dbconfig/20241030-091131-arnaudb.json
- 09:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 09:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 09:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70689 and previous config saved to /var/cache/conftool/dbconfig/20241030-091108-arnaudb.json
- 09:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2043.codfw.wmnet to cluster codfw and group D
- 09:07 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2043.codfw.wmnet to cluster codfw and group D
- 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
- 09:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 100%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70688 and previous config saved to /var/cache/conftool/dbconfig/20241030-090002-arnaudb.json
- 08:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
- 08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70687 and previous config saved to /var/cache/conftool/dbconfig/20241030-085601-arnaudb.json
- 08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 75%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70685 and previous config saved to /var/cache/conftool/dbconfig/20241030-084457-arnaudb.json
- 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70684 and previous config saved to /var/cache/conftool/dbconfig/20241030-084054-arnaudb.json
- 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 50%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70683 and previous config saved to /var/cache/conftool/dbconfig/20241030-082952-arnaudb.json
- 08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: host in preparation
- 08:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: host in preparation
- 08:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70682 and previous config saved to /var/cache/conftool/dbconfig/20241030-082547-arnaudb.json
- 08:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 25%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70680 and previous config saved to /var/cache/conftool/dbconfig/20241030-081446-arnaudb.json
- 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 10%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70678 and previous config saved to /var/cache/conftool/dbconfig/20241030-075941-arnaudb.json
- 07:57 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 07:52 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 07:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 5%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70677 and previous config saved to /var/cache/conftool/dbconfig/20241030-074436-arnaudb.json
- 07:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 4%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70676 and previous config saved to /var/cache/conftool/dbconfig/20241030-072930-arnaudb.json
- 07:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70675 and previous config saved to /var/cache/conftool/dbconfig/20241030-072520-arnaudb.json
- 07:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 07:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 07:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 07:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 2%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70674 and previous config saved to /var/cache/conftool/dbconfig/20241030-071425-arnaudb.json
- 06:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 1%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70673 and previous config saved to /var/cache/conftool/dbconfig/20241030-065920-arnaudb.json
- 06:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Managing PII for wikis tcywikisource, tcywiktionary in section s5
- 06:47 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Managing PII for wikis tcywikisource, tcywiktionary in section s5
- 06:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Checking PII for wikis tcywikisource in section s5
- 06:46 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Checking PII for wikis tcywikisource in section s5
- 00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T376905)', diff saved to https://phabricator.wikimedia.org/P70672 and previous config saved to /var/cache/conftool/dbconfig/20241030-003847-ladsgroup.json
- 00:28 zabe@deploy2002: Finished scap sync-world: update interwiki cache (duration: 09m 01s)
- 00:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70671 and previous config saved to /var/cache/conftool/dbconfig/20241030-002340-ladsgroup.json
- 00:19 zabe@deploy2002: Started scap sync-world: update interwiki cache
- 00:14 zabe: zabe@mwmaint2002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=tcywikisource --cluster=all 2>&1 | tee /tmp/tcywikisource.UpdateSearchIndexConfig.log # T377919
- 00:11 zabe@deploy2002: Finished scap sync-world: Creating tcywikisource (T377919) (duration: 08m 13s)
- 00:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70670 and previous config saved to /var/cache/conftool/dbconfig/20241030-000833-ladsgroup.json
- 00:03 zabe@deploy2002: Started scap sync-world: Creating tcywikisource (T377919)
2024-10-29
- 23:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T376905)', diff saved to https://phabricator.wikimedia.org/P70669 and previous config saved to /var/cache/conftool/dbconfig/20241029-235326-ladsgroup.json
- 23:53 zabe: zabe@mwmaint2002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=tcywiktionary --cluster=all 2>&1 | tee /tmp/tcywiktionary.UpdateSearchIndexConfig.log # T377922
- 23:48 zabe@deploy2002: Finished scap sync-world: Creating tcywiktionary (T377922) (duration: 07m 26s)
- 23:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T376905)', diff saved to https://phabricator.wikimedia.org/P70668 and previous config saved to /var/cache/conftool/dbconfig/20241029-234608-ladsgroup.json
- 23:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 23:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70667 and previous config saved to /var/cache/conftool/dbconfig/20241029-234541-ladsgroup.json
- 23:41 zabe@deploy2002: Started scap sync-world: Creating tcywiktionary (T377922)
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70666 and previous config saved to /var/cache/conftool/dbconfig/20241029-233034-ladsgroup.json
- 23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70665 and previous config saved to /var/cache/conftool/dbconfig/20241029-231527-ladsgroup.json
- 23:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70664 and previous config saved to /var/cache/conftool/dbconfig/20241029-230020-ladsgroup.json
- 22:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 22:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 22:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T376905)', diff saved to https://phabricator.wikimedia.org/P70662 and previous config saved to /var/cache/conftool/dbconfig/20241029-224717-ladsgroup.json
- 22:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70661 and previous config saved to /var/cache/conftool/dbconfig/20241029-223210-ladsgroup.json
- 22:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70660 and previous config saved to /var/cache/conftool/dbconfig/20241029-221703-ladsgroup.json
- 22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T376905)', diff saved to https://phabricator.wikimedia.org/P70659 and previous config saved to /var/cache/conftool/dbconfig/20241029-220156-ladsgroup.json
- 21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T376905)', diff saved to https://phabricator.wikimedia.org/P70658 and previous config saved to /var/cache/conftool/dbconfig/20241029-215443-ladsgroup.json
- 21:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 21:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T376905)', diff saved to https://phabricator.wikimedia.org/P70657 and previous config saved to /var/cache/conftool/dbconfig/20241029-215417-ladsgroup.json
- 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70656 and previous config saved to /var/cache/conftool/dbconfig/20241029-213910-ladsgroup.json
- 21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70655 and previous config saved to /var/cache/conftool/dbconfig/20241029-212402-ladsgroup.json
- 21:09 eileen: civicrm upgraded from 0b7f3b47 to 0eb881ca
- 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T376905)', diff saved to https://phabricator.wikimedia.org/P70654 and previous config saved to /var/cache/conftool/dbconfig/20241029-210855-ladsgroup.json
- 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T376905)', diff saved to https://phabricator.wikimedia.org/P70653 and previous config saved to /var/cache/conftool/dbconfig/20241029-205718-ladsgroup.json
- 20:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 20:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T376905)', diff saved to https://phabricator.wikimedia.org/P70652 and previous config saved to /var/cache/conftool/dbconfig/20241029-205652-ladsgroup.json
- 20:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P70651 and previous config saved to /var/cache/conftool/dbconfig/20241029-204145-ladsgroup.json
- 20:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P70650 and previous config saved to /var/cache/conftool/dbconfig/20241029-202638-ladsgroup.json
- 20:14 kostajh: UTC late deploys done
- 20:12 kharlan@deploy2002: Finished scap sync-world: Backport for QuickSurveys: Undeploy safety survey (T376517), Missing.php: redirect wikisources to localized main page (duration: 09m 16s)
- 20:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T376905)', diff saved to https://phabricator.wikimedia.org/P70649 and previous config saved to /var/cache/conftool/dbconfig/20241029-201131-ladsgroup.json
- 20:08 kharlan@deploy2002: pppery, kharlan: Continuing with sync
- 20:05 kharlan@deploy2002: pppery, kharlan: Backport for QuickSurveys: Undeploy safety survey (T376517), Missing.php: redirect wikisources to localized main page synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:03 kharlan@deploy2002: Started scap sync-world: Backport for QuickSurveys: Undeploy safety survey (T376517), Missing.php: redirect wikisources to localized main page
- 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T376905)', diff saved to https://phabricator.wikimedia.org/P70648 and previous config saved to /var/cache/conftool/dbconfig/20241029-200056-ladsgroup.json
- 20:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 20:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T376905)', diff saved to https://phabricator.wikimedia.org/P70647 and previous config saved to /var/cache/conftool/dbconfig/20241029-200029-ladsgroup.json
- 19:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on an-worker1165.eqiad.wmnet with reason: T378454
- 19:55 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on an-worker1165.eqiad.wmnet with reason: T378454
- 19:48 eileen: civicrm upgraded from 8f5c8b33 to 0b7f3b47
- 19:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P70646 and previous config saved to /var/cache/conftool/dbconfig/20241029-194522-ladsgroup.json
- 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P70645 and previous config saved to /var/cache/conftool/dbconfig/20241029-193015-ladsgroup.json
- 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T376905)', diff saved to https://phabricator.wikimedia.org/P70644 and previous config saved to /var/cache/conftool/dbconfig/20241029-191508-ladsgroup.json
- 19:05 eileen: civicrm upgraded from 1c6c4e08 to 8f5c8b33
- 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T376905)', diff saved to https://phabricator.wikimedia.org/P70643 and previous config saved to /var/cache/conftool/dbconfig/20241029-190442-ladsgroup.json
- 19:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T376905)', diff saved to https://phabricator.wikimedia.org/P70642 and previous config saved to /var/cache/conftool/dbconfig/20241029-190359-ladsgroup.json
- 18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P70641 and previous config saved to /var/cache/conftool/dbconfig/20241029-184852-ladsgroup.json
- 18:37 swfrench-wmf: shellbox-syntaxhighlight updated to shellbox 2024-10-15-214239 - T375243
- 18:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P70640 and previous config saved to /var/cache/conftool/dbconfig/20241029-183345-ladsgroup.json
- 18:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:31 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T376905)', diff saved to https://phabricator.wikimedia.org/P70639 and previous config saved to /var/cache/conftool/dbconfig/20241029-181838-ladsgroup.json
- 18:10 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.1 refs T375660
- 18:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T376905)', diff saved to https://phabricator.wikimedia.org/P70638 and previous config saved to /var/cache/conftool/dbconfig/20241029-180816-ladsgroup.json
- 18:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 18:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T376905)', diff saved to https://phabricator.wikimedia.org/P70637 and previous config saved to /var/cache/conftool/dbconfig/20241029-180750-ladsgroup.json
- 17:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P70636 and previous config saved to /var/cache/conftool/dbconfig/20241029-175243-ladsgroup.json
- 17:51 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 17:50 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 17:49 brett: Remove RSA cert support from Icinga, librenms (T375569)
- 17:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P70635 and previous config saved to /var/cache/conftool/dbconfig/20241029-173735-ladsgroup.json
- 17:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 17:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 17:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 17:31 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 17:30 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 17:30 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 17:29 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 17:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T376905)', diff saved to https://phabricator.wikimedia.org/P70634 and previous config saved to /var/cache/conftool/dbconfig/20241029-172228-ladsgroup.json
- 17:17 sergi0: Running `foreachwiki userOptions.php --delete --old=A --old=D --old=C --old=null --old=imagerecommendation --old=linkrecommendation growthexperiments-homepage-variant`
- 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T376905)', diff saved to https://phabricator.wikimedia.org/P70633 and previous config saved to /var/cache/conftool/dbconfig/20241029-171258-ladsgroup.json
- 17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 17:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 17:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 17:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T376905)', diff saved to https://phabricator.wikimedia.org/P70632 and previous config saved to /var/cache/conftool/dbconfig/20241029-170657-ladsgroup.json
- 17:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:00 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:58 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 16:58 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 16:57 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 16:56 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 16:55 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 16:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:54 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 16:54 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 16:53 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P70631 and previous config saved to /var/cache/conftool/dbconfig/20241029-165150-ladsgroup.json
- 16:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:47 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
- 16:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P70630 and previous config saved to /var/cache/conftool/dbconfig/20241029-163643-ladsgroup.json
- 16:35 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:26 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:26 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:25 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T376905)', diff saved to https://phabricator.wikimedia.org/P70629 and previous config saved to /var/cache/conftool/dbconfig/20241029-162136-ladsgroup.json
- 16:21 rzl@deploy2002: Finished scap sync-world: 1079056 T376923 (duration: 11m 47s)
- 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:16 rzl@deploy2002: rzl: Continuing with sync
- 16:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:15 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:14 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:13 rzl@deploy2002: rzl: 1079056 T376923 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:12 rzl@deploy2002: Started scap sync-world: 1079056 T376923
- 16:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T376905)', diff saved to https://phabricator.wikimedia.org/P70627 and previous config saved to /var/cache/conftool/dbconfig/20241029-161103-ladsgroup.json
- 16:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 16:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 16:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 16:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T376905)', diff saved to https://phabricator.wikimedia.org/P70626 and previous config saved to /var/cache/conftool/dbconfig/20241029-160607-ladsgroup.json
- 16:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2041.codfw.wmnet
- 16:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2041.codfw.wmnet
- 16:03 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:02 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:56 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 15:56 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 15:55 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 15:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:54 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 15:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P70625 and previous config saved to /var/cache/conftool/dbconfig/20241029-155101-ladsgroup.json
- 15:47 moritzm: installing libheif security updates
- 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P70624 and previous config saved to /var/cache/conftool/dbconfig/20241029-153554-ladsgroup.json
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2040.codfw.wmnet
- 15:25 XioNoX: test preferring lumen-ATT path in eqiad
- 15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2040.codfw.wmnet
- 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T376905)', diff saved to https://phabricator.wikimedia.org/P70623 and previous config saved to /var/cache/conftool/dbconfig/20241029-152047-ladsgroup.json
- 15:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1003.eqiad.wmnet with OS bookworm
- 15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:10 claime: Running `/usr/bin/systemd-cat -t "import-wikitech.sh" /wikitech-static/wikitechsync/import-wikitech.sh &` on wikitech-static - T348503
- 15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T376905)', diff saved to https://phabricator.wikimedia.org/P70622 and previous config saved to /var/cache/conftool/dbconfig/20241029-150953-ladsgroup.json
- 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 15:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T376905)', diff saved to https://phabricator.wikimedia.org/P70621 and previous config saved to /var/cache/conftool/dbconfig/20241029-150926-ladsgroup.json
- 15:08 claime: Running `find /srv/mediawiki/images/wikitech/archive -type f | xargs rm` on wikitech-static - T374114 T348503
- 15:00 claime: Running php maintenance/deleteArchivedFiles.php --delete on wikitech-static - T374114
- 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2039.codfw.wmnet
- 14:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P70619 and previous config saved to /var/cache/conftool/dbconfig/20241029-145419-ladsgroup.json
- 14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2039.codfw.wmnet
- 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 14:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:47 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:40 reedy@deploy2002: Finished scap sync-world: 1.44.0-wmf.1 backports to fix deprecated logspam T375660 T377521 (duration: 07m 21s)
- 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
- 14:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P70616 and previous config saved to /var/cache/conftool/dbconfig/20241029-143912-ladsgroup.json
- 14:39 herron: centrallog1002:~# systemctl restart rsyslogd
- 14:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
- 14:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:34 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1003.eqiad.wmnet with OS bookworm
- 14:32 reedy@deploy2002: Started scap sync-world: 1.44.0-wmf.1 backports to fix deprecated logspam T375660 T377521
- 14:29 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:25 MichaelG_WMF: T372337 clearing dangling database-records for link suggestions by running `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --db-table --force`
- 14:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T376905)', diff saved to https://phabricator.wikimedia.org/P70615 and previous config saved to /var/cache/conftool/dbconfig/20241029-142405-ladsgroup.json
- 14:20 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:19 elukey: restart rsyslog on centrallog1002 - connection errors, failing prometheus probes
- 14:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
- 14:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
- 14:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T376905)', diff saved to https://phabricator.wikimedia.org/P70614 and previous config saved to /var/cache/conftool/dbconfig/20241029-141532-ladsgroup.json
- 14:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 14:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 14:14 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:06 kostajh: UTC afternoon deploys done
- 14:05 kharlan@deploy2002: Finished scap sync-world: Backport for AuthManagerStatsdHandler: Add label for wiki (T375505), AuthManagerStatsdHandler: Add label for wiki (T375505) (duration: 07m 53s)
- 14:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:00 kharlan@deploy2002: kharlan: Continuing with sync
- 13:59 kharlan@deploy2002: kharlan: Backport for AuthManagerStatsdHandler: Add label for wiki (T375505), AuthManagerStatsdHandler: Add label for wiki (T375505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 13:57 kharlan@deploy2002: Started scap sync-world: Backport for AuthManagerStatsdHandler: Add label for wiki (T375505), AuthManagerStatsdHandler: Add label for wiki (T375505)
- 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 13:48 jforrester@deploy2002: Finished scap sync-world: Backport for fix ibawiki's tagline svg path (duration: 07m 41s)
- 13:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16347
- 13:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 16347
- 13:45 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16347
- 13:45 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16347
- 13:43 jforrester@deploy2002: jforrester, hamishz: Continuing with sync
- 13:42 jforrester@deploy2002: jforrester, hamishz: Backport for fix ibawiki's tagline svg path synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:42 moritzm: installing ghostscript security updates
- 13:40 jforrester@deploy2002: Started scap sync-world: Backport for fix ibawiki's tagline svg path
- 13:38 jforrester@deploy2002: Finished scap sync-world: Backport for Allow admins on testwiki to grant and remove upwizcampeditors (T378067), nlwiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T377930) (duration: 08m 03s)
- 13:34 jforrester@deploy2002: dreamrimmer, superzerocool, jforrester: Continuing with sync
- 13:33 jforrester@deploy2002: dreamrimmer, superzerocool, jforrester: Backport for Allow admins on testwiki to grant and remove upwizcampeditors (T378067), nlwiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T377930) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:31 jforrester@deploy2002: Started scap sync-world: Backport for Allow admins on testwiki to grant and remove upwizcampeditors (T378067), nlwiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T377930)
- 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 100%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70612 and previous config saved to /var/cache/conftool/dbconfig/20241029-132956-arnaudb.json
- 13:30 mszabo@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 13:30 mszabo@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 13:29 mszabo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 13:28 jforrester@deploy2002: Finished scap sync-world: Backport for annwiki: Add logo (T377535), kgewiki: Add logo (T377075), shnwikinews: Add logo (T377543), gorwikiquote: Add logo (T377542), moswiki: Add logo (T377539), ibawiki: Add logo (T377538), rskwiki: Add logo (T377536), [[gerrit:10840
- 13:28 mszabo@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 13:27 mszabo@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 13:26 mszabo@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 13:23 jforrester@deploy2002: jforrester, hamishz: Continuing with sync
- 13:22 jforrester@deploy2002: jforrester, hamishz: Backport for annwiki: Add logo (T377535), kgewiki: Add logo (T377075), shnwikinews: Add logo (T377543), gorwikiquote: Add logo (T377542), moswiki: Add logo (T377539), ibawiki: Add logo (T377538), rskwiki: Add logo (T377536), [[gerrit:1084079|td
- 13:20 jforrester@deploy2002: Started scap sync-world: Backport for annwiki: Add logo (T377535), kgewiki: Add logo (T377075), shnwikinews: Add logo (T377543), gorwikiquote: Add logo (T377542), moswiki: Add logo (T377539), ibawiki: Add logo (T377538), rskwiki: Add logo (T377536), [[gerrit:108407
- 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 13:17 jforrester@deploy2002: Finished scap sync-world: Backport for ExtensionDistributor: Mark 1.43 as beta (T372322), ExtensionDistributor: Remove EOL 1.40 (T364989), enwiktionary: Enable mobile page tabs for non logged in users (T377648) (duration: 12m 41s)
- 13:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 75%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70610 and previous config saved to /var/cache/conftool/dbconfig/20241029-131451-arnaudb.json
- 13:11 jforrester@deploy2002: zabe, macfan4000, hamishz, jforrester: Continuing with sync
- 13:10 jforrester@deploy2002: zabe, macfan4000, hamishz, jforrester: Backport for ExtensionDistributor: Mark 1.43 as beta (T372322), ExtensionDistributor: Remove EOL 1.40 (T364989), enwiktionary: Enable mobile page tabs for non logged in users (T377648) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
- 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
- 13:05 jforrester@deploy2002: Started scap sync-world: Backport for ExtensionDistributor: Mark 1.43 as beta (T372322), ExtensionDistributor: Remove EOL 1.40 (T364989), enwiktionary: Enable mobile page tabs for non logged in users (T377648)
- 12:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 50%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70607 and previous config saved to /var/cache/conftool/dbconfig/20241029-125945-arnaudb.json
- 12:50 moritzm: installing Apache security updates
- 12:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 25%: post clone repool', diff saved to https://phabricator.wikimedia.org/P70606 and previous config saved to /var/cache/conftool/dbconfig/20241029-124440-arnaudb.json
- 12:43 claime: Manually relaunched import-wikitech.sh on wikitech-static - T374114
- 12:42 claime: Killed dead and stacked import-wikitech.sh processes on wikitech-static - T374114
- 12:28 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@d85a93c]: (no justification provided) (duration: 00m 30s)
- 12:27 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@d85a93c]: (no justification provided)
- 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
- 12:04 cgoubert@deploy2002: Finished scap sync-world: T377958 - full mediawiki image rebuild and deployment to add helper scripts for mwcron, mwscript (duration: 29m 44s)
- 11:39 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2041.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2040.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:35 cgoubert@deploy2002: Started scap sync-world: T377958 - full mediawiki image rebuild and deployment to add helper scripts for mwcron, mwscript
- 11:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2039.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:34 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2038.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2037.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:32 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2036.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:30 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:29 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:29 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:28 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:27 claime: Rebuilding php{7.4,8.1}-fpm-multiversion-base - T377958
- 11:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:22 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:18 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:18 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:16 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:11 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:07 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:05 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:05 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2044.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:53 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2042.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 10:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2042.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
- 10:08 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 10:07 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
- 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
- 10:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
- 09:56 moritzm: installing wireshark security updates
- 09:41 kostajh: UTC morning deploys done
- 09:23 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:23 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:22 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:21 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:20 kharlan@deploy2002: Finished scap sync-world: Backport for temp accounts: Enable temp account autocreation on five pilot wikis (T378334), beta: enable "Surfacing structured tasks" for an early beta-wiki (T376677) (duration: 24m 42s)
- 09:20 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:20 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:16 kharlan@deploy2002: migr, kharlan: Continuing with sync
- 09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance, host is not pooled
- 09:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance, host is not pooled
- 09:07 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 09:07 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 08:58 kharlan@deploy2002: migr, kharlan: Backport for temp accounts: Enable temp account autocreation on five pilot wikis (T378334), beta: enable "Surfacing structured tasks" for an early beta-wiki (T376677) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:56 kharlan@deploy2002: Started scap sync-world: Backport for temp accounts: Enable temp account autocreation on five pilot wikis (T378334), beta: enable "Surfacing structured tasks" for an early beta-wiki (T376677)
- 08:55 moritzm: upgrade irc.wikimedia.org to ircstream 1.0+wmf12u1 T376014
- 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: T378320', diff saved to https://phabricator.wikimedia.org/P70604 and previous config saved to /var/cache/conftool/dbconfig/20241029-085507-arnaudb.json
- 08:53 kharlan@deploy2002: Finished scap sync-world: Backport for Unblock CI (T377947), StatsLib: Set label for wiki ID (T375496) (duration: 13m 06s)
- 08:52 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 08:52 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 08:51 moritzm: uploaded ircstream 1.0+wmf12u1 to apt.wikimedia.org T376014
- 08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56258
- 08:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 56258
- 08:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264567
- 08:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264567
- 08:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
- 08:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16591
- 08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 200478
- 08:46 kharlan@deploy2002: kharlan: Continuing with sync
- 08:45 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 200478
- 08:45 kharlan@deploy2002: kharlan: Backport for Unblock CI (T377947), StatsLib: Set label for wiki ID (T375496) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56258
- 08:44 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 56258
- 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
- 08:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 8966
- 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
- 08:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9038
- 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
- 08:41 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 08:41 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2211.codfw.wmnet onto db2223.codfw.wmnet
- 08:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16347
- 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: T378320', diff saved to https://phabricator.wikimedia.org/P70603 and previous config saved to /var/cache/conftool/dbconfig/20241029-084002-arnaudb.json
- 08:40 kharlan@deploy2002: Started scap sync-world: Backport for Unblock CI (T377947), StatsLib: Set label for wiki ID (T375496)
- 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2211 in db2223 for T373579', diff saved to https://phabricator.wikimedia.org/P70602 and previous config saved to /var/cache/conftool/dbconfig/20241029-083035-arnaudb.json
- 08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28306
- 08:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2211 - depooling db2211 to clone on db2223
- 08:29 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2211 - depooling db2211 to clone on db2223
- 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'depool preshot db2211', diff saved to https://phabricator.wikimedia.org/P70601 and previous config saved to /var/cache/conftool/dbconfig/20241029-082903-arnaudb.json
- 08:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2223.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
- 08:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2223.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
- 08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
- 08:28 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 28306
- 08:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: provisionning db2223.codfw.wmnet - T373579
- 08:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: T378320', diff saved to https://phabricator.wikimedia.org/P70600 and previous config saved to /var/cache/conftool/dbconfig/20241029-082456-arnaudb.json
- 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2004.wikimedia.org
- 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:09 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2004.wikimedia.org
- 08:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: T378320', diff saved to https://phabricator.wikimedia.org/P70599 and previous config saved to /var/cache/conftool/dbconfig/20241029-080951-arnaudb.json
- 08:08 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1169 quickly with 2 steps - index rebuilt
- 08:08 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1169 quickly with 2 steps - index rebuilt
- 08:08 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1169 gradually with 4 steps - index rebuilt
- 08:08 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - index rebuilt
- 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1004.wikimedia.org
- 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:06 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1169 gradually with 4 steps - index rebuilt
- 08:06 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - index rebuilt
- 08:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:03 moritzm: installing qemu security updates
- 07:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 07:53 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1004.wikimedia.org
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.26 (duration: 01m 04s)
- 03:53 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.1 refs T375660 (duration: 49m 51s)
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.1 refs T375660
2024-10-28
- 23:08 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 23:08 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 23:06 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 23:06 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 23:05 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 23:05 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 23:04 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 23:03 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 23:03 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 23:02 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 23:01 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 23:01 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 22:28 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@d85a93c]: add missing comma (duration: 00m 36s)
- 22:27 ebernhardson@deploy2002: Started deploy [airflow-dags/search@d85a93c]: add missing comma
- 22:10 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@99eb6f3]: T375387: update discolytics to 0.27.0 (duration: 00m 50s)
- 22:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@99eb6f3]: T375387: update discolytics to 0.27.0
- 22:00 ryankemper: T372074 `sudo requestctl delete ipblock abuse/wdqs` && `sudo requestctl delete pattern ua/wdqs_sparql` to clean up objects removed in commit `d26fc1e910579d33d33ec3d5a192d137045eba4b` ( <-- this occurred before the requestctl commit; i just missed making the irc log)
- 21:48 ryankemper: T372074 `sudo requestctl commit`
- 21:29 kostajh: UTC late deploys done, for real
- 21:26 ryankemper: T372074 `sudo requestctl delete action cache-text/T372074` && `sudo requestctl delete action cache-text/T372074_wdqs_codfw_flap`
- 21:26 kharlan@deploy2002: Finished scap sync-world: Backport for GlobalContributionsPager: Make article link redirect to the page (T378155) (duration: 09m 01s)
- 21:21 kharlan@deploy2002: kharlan: Continuing with sync
- 21:19 kharlan@deploy2002: kharlan: Backport for GlobalContributionsPager: Make article link redirect to the page (T378155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:17 kharlan@deploy2002: Started scap sync-world: Backport for GlobalContributionsPager: Make article link redirect to the page (T378155)
- 20:44 kostajh: UTC late deploys done
- 20:42 kharlan@deploy2002: Finished scap sync-world: Backport for Partial Revert "Make sure contributor's name is on its line" (T378142), Restore missing second argument to "mapState" in QuickView.vue (T378204), GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155), [[gerrit:1083886|GlobalContributionsPager: Don't display external namespace in
- 20:37 kharlan@deploy2002: jdlrobson, kharlan: Continuing with sync
- 20:33 kharlan@deploy2002: jdlrobson, kharlan: Backport for Partial Revert "Make sure contributor's name is on its line" (T378142), Restore missing second argument to "mapState" in QuickView.vue (T378204), GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155), [[gerrit:1083886|GlobalContributionsPager: Don't display external namespace in artic
- 20:30 kharlan@deploy2002: Started scap sync-world: Backport for Partial Revert "Make sure contributor's name is on its line" (T378142), Restore missing second argument to "mapState" in QuickView.vue (T378204), GlobalContributionsPager: Use Special:PermanentLink to construct link (T378155), [[gerrit:1083886|GlobalContributionsPager: Don't display external namespace in
- 19:52 brett: Removed RSA certificate support from tlsproxy (T375569)
- 19:33 brett: Removed RSA certificate support from mirrors, dumps (T375569)
- 19:27 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 19:26 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 19:24 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 19:24 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 19:23 brett: Removed RSA certificate support from ldap, archiva, durum
- 19:21 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 19:21 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:17 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:15 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 19:15 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 19:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:13 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 17:43 jiawang@deploy2002: Finished deploy [airflow-dags/analytics_product@a7456f9]: deploy tsp pipelines (duration: 01m 33s)
- 17:42 jiawang@deploy2002: Started deploy [airflow-dags/analytics_product@a7456f9]: deploy tsp pipelines
- 17:04 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 17:03 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 16:55 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
- 16:50 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
- 16:50 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
- 16:44 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
- 16:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
- 16:38 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
- 16:38 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
- 16:32 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
- 16:26 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
- 16:20 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 16:20 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
- 16:16 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
- 15:51 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 25s)
- 15:49 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 35s)
- 15:48 XioNoX: re-enable IX BGP sessions in eqiad
- 15:30 jan_drewniak: starting portals deployment
- 15:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:51 MichaelG_WMF: T372337 - run `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index` to fix the remaining ca. 10K dangling search index records
- 14:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2042.codfw.wmnet to cluster codfw and group D
- 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2042.codfw.wmnet to cluster codfw and group D
- 14:08 urbanecm@deploy2002: Finished scap sync-world: Backport for knwiktionary: update logo, wordmark (T360022), hewikisource: add project namespace alias (T378303), Add config for testing T375264 on beta (T377988) (duration: 10m 43s)
- 14:04 urbanecm@deploy2002: anzx, cparle, urbanecm: Continuing with sync
- 14:01 urbanecm@deploy2002: anzx, cparle, urbanecm: Backport for knwiktionary: update logo, wordmark (T360022), hewikisource: add project namespace alias (T378303), Add config for testing T375264 on beta (T377988) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:58 urbanecm@deploy2002: Started scap sync-world: Backport for knwiktionary: update logo, wordmark (T360022), hewikisource: add project namespace alias (T378303), Add config for testing T375264 on beta (T377988)
- 13:57 urbanecm@deploy2002: Sync cancelled.
- 13:54 urbanecm@deploy2002: anzx, urbanecm: Backport for knwiktionary: update logo, wordmark (T360022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:52 urbanecm@deploy2002: Started scap sync-world: Backport for knwiktionary: update logo, wordmark (T360022)
- 13:49 arnaudb@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2211 quickly with 2 steps - test fast pool
- 13:41 urbanecm@deploy2002: Finished scap sync-world: Backport for Enable CampaignEvents collaboration list by default (T375141), beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442), prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442) (duration: 13m 43s)
- 13:38 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2041.codfw.wmnet to cluster codfw and group D
- 13:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2041.codfw.wmnet to cluster codfw and group D
- 13:36 urbanecm@deploy2002: urbanecm, daimona: Continuing with sync
- 13:33 arnaudb@cumin2002: START - Cookbook sre.mysql.pool db2211 quickly with 2 steps - test fast pool
- 13:31 arnaudb@cumin2002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2211 - test depool
- 13:31 arnaudb@cumin2002: START - Cookbook sre.mysql.depool db2211 - test depool
- 13:29 urbanecm@deploy2002: urbanecm, daimona: Backport for Enable CampaignEvents collaboration list by default (T375141), beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442), prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:27 urbanecm@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list by default (T375141), beta: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442), prod: Drop $wgCampaignEventsShowEventInvitationSpecialPages (T373442)
- 13:16 moritzm: installing bash/zsh updates from bookworm point release
- 12:12 moritzm: upgrade irc.wikimedia.org to ircstream 0.13.0+wmf12u3 T376014
- 12:06 _joe_: uploaded conftool 4.0.0-1 to reprepro T376877
- 11:30 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085) (duration: 07m 44s)
- 11:25 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 11:25 dreamyjazz@deploy2002: dreamyjazz: Backport for Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:23 dreamyjazz@deploy2002: Started scap sync-world: Backport for Specify wiki ID to ::getId call in GlobalBlockingHandler (T378085)
- 11:05 volans: updated spicerack to v8.15.1 on cumin1002
- 10:58 Dreamy_Jazz: Ran `DROP TABLE /*_*/globalblocks` on all beta wikis (excluding the centralauth DB) - T377742
- 10:51 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
- 10:50 elukey: elukey@puppetmaster1001:~$ sudo puppet cert destroy puppetboard.discovery.wmnet
- 10:46 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
- 10:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
- 10:39 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
- 10:36 dcausse: T378227: rebuilding dewiki_titlesuggest
- 10:35 moritzm: uploaded ircstream 0.13.0+wmf12u3 to apt.wikimedia.org (includes a fix which should hopefully reduce connection errors with bots using smart4irc)
- 10:34 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
- 10:34 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
- 10:34 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
- 10:29 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
- 10:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
- 10:12 volans: updated spicerack to v8.15.1 on cumin2002
- 09:21 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
- 09:11 hashar: Restarted CI Jenkins for plugin update - T378327
- 08:42 dcausse: T378227: deleting broken cirrus titlesuggest index dewiki_titlesuggest_1729824440
- 08:38 kostajh: UTC morning deploys done
- 08:38 kharlan@deploy2002: Finished scap sync-world: Backport for ContributionsPager: Fix getTemplateParams() parameter (T378132), Fix getTemplateParams() $classes parameter (T378132) (duration: 09m 38s)
- 08:33 kharlan@deploy2002: kharlan: Continuing with sync
- 08:31 kharlan@deploy2002: kharlan: Backport for ContributionsPager: Fix getTemplateParams() parameter (T378132), Fix getTemplateParams() $classes parameter (T378132) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:28 kharlan@deploy2002: Started scap sync-world: Backport for ContributionsPager: Fix getTemplateParams() parameter (T378132), Fix getTemplateParams() $classes parameter (T378132)
- 08:27 hashar: Pushed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1083591 for wmf/1.43.0-wmf.28 / T378132 due to a dependency loop
- 08:24 hashar: Pushed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/1083592 for wmf/1.43.0-wmf.28 / T378132 due to a dependency loop
- 08:19 hashar: Changed UTC morning backport window from 00:00 SF to 09:00 CET (aka 08:00 UTC) | UTC morning backport window
- 08:07 kartik@deploy2002: Finished scap sync-world: Backport for Disable MT in Content Translation on Lithuanian Wikipedia (T364073) (duration: 22m 24s)
- 08:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: maintenance T378267
- 08:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: maintenance T378267
- 08:01 hashar: Restarted CI Jenkins to update the Collapsible Sections plugin | T378327
- 07:57 kartik@deploy2002: kartik: Continuing with sync
- 07:56 kartik@deploy2002: kartik: Backport for Disable MT in Content Translation on Lithuanian Wikipedia (T364073) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:45 kartik@deploy2002: Started scap sync-world: Backport for Disable MT in Content Translation on Lithuanian Wikipedia (T364073)
- 07:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[1169,1234].eqiad.wmnet with reason: maintenance
- 07:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[1169,1234].eqiad.wmnet with reason: maintenance
- 06:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320
- 06:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320
- 06:03 taavi@cumin1002: dbctl commit (dc=all): 'depool db1169', diff saved to https://phabricator.wikimedia.org/P70590 and previous config saved to /var/cache/conftool/dbconfig/20241028-060327-taavi.json
2024-10-27
- 13:41 Dreamy_Jazz: Starting MediaModeration scanning on group1 wikis
- 13:37 Dreamy_Jazz: Starting MediaModeration scanning on group2 wikis
2024-10-26
- 16:29 mvernon@cumin1002: dbctl commit (dc=all): 'Depool db1234', diff saved to https://phabricator.wikimedia.org/P70589 and previous config saved to /var/cache/conftool/dbconfig/20241026-162946-mvernon.json
- 16:29 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1234.eqiad.wmnet with reason: spontaneous reboot, depooling 'til Monday
- 16:28 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1234.eqiad.wmnet with reason: spontaneous reboot, depooling 'til Monday
- 02:03 tzatziki: removing 9 files for legal compliance
2024-10-25
- 18:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2012.codfw.wmnet with OS bookworm
- 18:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2012.codfw.wmnet with reason: host reimage
- 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2012.codfw.wmnet with reason: host reimage
- 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2012.codfw.wmnet with OS bookworm
- 16:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:28 JustHannah: T378170 Ran mwscript-k8s extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=trwiki --logwiki=metawiki 'Peter.kerepesi' 'Peakbagger77' @ 11:57:19 UTC
- 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2012
- 15:52 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2012
- 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2012 to codfw - jhancock@cumin2002"
- 15:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2012 to codfw - jhancock@cumin2002"
- 15:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:07 dancy@deploy2002: Installation of scap version "4.118.0" completed for 209 hosts
- 15:03 dancy@deploy2002: Installing scap version "4.118.0" for 209 hosts
- 14:31 herron: alert1002: manually killed stunnel4 process to clear puppet failure T375143
- 14:02 sukhe: running authdns-update for CR 1082548
- 10:31 arnaudb@cumin1002: dbctl commit (dc=all): 'maintenance', diff saved to https://phabricator.wikimedia.org/P70588 and previous config saved to /var/cache/conftool/dbconfig/20241025-103157-arnaudb.json
- 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:18 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 10:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
- 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2012.codfw.wmnet
- 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:06 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2012.codfw.wmnet
- 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2011.codfw.wmnet
- 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2011.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2011.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:54 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:53 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 08:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:42 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2011.codfw.wmnet
- 08:27 moritzm: installing wireshark security updates
- 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
- 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to plain
- 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to plain
- 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
- 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
- 08:11 moritzm: imported openjdk-8 8u422-b05-1~deb12u1 to component/jdk for bookworm-wikimedia
- 08:04 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
- 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
- 08:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
- 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to plain
- 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to plain
- 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
- 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2004.codfw.wmnet to drbd
- 06:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
- 06:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
- 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
- 06:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
- 06:27 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
- 06:19 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
2024-10-24
- 23:09 tzatziki: removing 3 files for legal compliance
- 22:27 zabe@deploy2002: Finished scap sync-world: Backport for s8: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 03s)
- 22:23 zabe@deploy2002: zabe: Continuing with sync
- 22:23 zabe@deploy2002: zabe: Backport for s8: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:20 zabe@deploy2002: Started scap sync-world: Backport for s8: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 21:37 legoktm@deploy2002: Finished scap sync-world: Backport for Update interwiki cache (duration: 07m 51s)
- 21:32 legoktm@deploy2002: legoktm: Continuing with sync
- 21:31 legoktm@deploy2002: legoktm: Backport for Update interwiki cache synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:29 legoktm@deploy2002: Started scap sync-world: Backport for Update interwiki cache
- 21:25 tzatziki: removing 1 file for legal compliance
- 21:24 Dreamy_Jazz: Ran `foreachwiki emptyUserGroup.php checkuser-temporary-account-viewer` on the beta wikis.
- 21:14 thcipriani@deploy2002: Finished scap sync-world: Backport for Enable edit check on nlwiki (T377551) (duration: 09m 07s)
- 21:09 thcipriani@deploy2002: thcipriani, kemayo: Continuing with sync
- 21:07 thcipriani@deploy2002: thcipriani, kemayo: Backport for Enable edit check on nlwiki (T377551) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:05 thcipriani@deploy2002: Started scap sync-world: Backport for Enable edit check on nlwiki (T377551)
- 21:02 thcipriani@deploy2002: Finished scap sync-world: Backport for chore: Move authevents logging into AuthManager (T341650 T375510 T375505), chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505) (duration: 18m 10s)
- 20:58 thcipriani@deploy2002: tgr, thcipriani: Continuing with sync
- 20:53 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 20:46 thcipriani@deploy2002: tgr, thcipriani: Backport for chore: Move authevents logging into AuthManager (T341650 T375510 T375505), chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:44 thcipriani@deploy2002: Started scap sync-world: Backport for chore: Move authevents logging into AuthManager (T341650 T375510 T375505), chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505)
- 20:40 thcipriani@deploy2002: Finished scap sync-world: Backport for Configure settings for annwiki, nrwiki, mywikisource (T375102 T377160 T363270) (duration: 11m 09s)
- 20:35 thcipriani@deploy2002: thcipriani, pppery: Continuing with sync
- 20:31 thcipriani@deploy2002: thcipriani, pppery: Backport for Configure settings for annwiki, nrwiki, mywikisource (T375102 T377160 T363270) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:29 thcipriani@deploy2002: Started scap sync-world: Backport for Configure settings for annwiki, nrwiki, mywikisource (T375102 T377160 T363270)
- 20:24 thcipriani@deploy2002: Finished scap sync-world: Backport for Deploy missing.php redirects for Allemanic German (T376923) (duration: 14m 08s)
- 20:20 thcipriani@deploy2002: thcipriani, pppery: Continuing with sync
- 20:13 thcipriani@deploy2002: thcipriani, pppery: Backport for Deploy missing.php redirects for Allemanic German (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:10 thcipriani@deploy2002: Started scap sync-world: Backport for Deploy missing.php redirects for Allemanic German (T376923)
- 19:34 dancy@deploy2002: Finished scap sync-world: Backport for Use SpecialPage::getRobotPolicy to set robot policy (T378108) (duration: 07m 08s)
- 19:29 dancy@deploy2002: dancy: Continuing with sync
- 19:29 dancy@deploy2002: dancy: Backport for Use SpecialPage::getRobotPolicy to set robot policy (T378108) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:26 dancy@deploy2002: Started scap sync-world: Backport for Use SpecialPage::getRobotPolicy to set robot policy (T378108)
- 18:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 18:09 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.28 refs T375659
- 17:42 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:42 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:42 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:41 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:13 dancy@deploy2002: Finished scap sync-world: Backport for AbuseLogPager: Fix passing `false` as message parameter (T377917) (duration: 07m 18s)
- 17:09 dancy@deploy2002: dancy: Continuing with sync
- 17:09 dancy@deploy2002: dancy: Backport for AbuseLogPager: Fix passing `false` as message parameter (T377917) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
- 17:06 dancy@deploy2002: Started scap sync-world: Backport for AbuseLogPager: Fix passing `false` as message parameter (T377917)
- 17:04 urbanecm: `mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=nowiki` (running as `mw-script.codfw.ui7285yu`; T376749)
- 16:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:45 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix encoding of usernames with non-ascii letters - oblivian@cumin1002"
- 16:44 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix encoding of usernames with non-ascii letters - oblivian@cumin1002
- 16:43 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix encoding of usernames with non-ascii letters - oblivian@cumin1002
- 16:43 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix encoding of usernames with non-ascii letters - oblivian@cumin1002"
- 16:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2087
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2088
- 16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2088
- 16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2087
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2087 to codfw - jhancock@cumin2002"
- 16:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2087 to codfw - jhancock@cumin2002"
- 16:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:05 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.28 refs T375659
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:51 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2016.codfw.wmnet
- 15:51 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2016.codfw.wmnet
- 15:50 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
- 15:48 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
- 15:47 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@325d943]: Deploy latest DAGs to analytics Airflow instance. T377999. (duration: 01m 07s)
- 15:46 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2016.codfw.wmnet
- 15:46 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2016.codfw.wmnet
- 15:46 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2015.codfw.wmnet
- 15:46 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2015.codfw.wmnet
- 15:45 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
- 15:45 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@325d943]: Deploy latest DAGs to analytics Airflow instance. T377999.
- 15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:43 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
- 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2086
- 15:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2086
- 15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2086 to codfw - jhancock@cumin2002"
- 15:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2086 to codfw - jhancock@cumin2002"
- 15:41 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2015.codfw.wmnet
- 15:41 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2015.codfw.wmnet
- 15:41 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2006.codfw.wmnet
- 15:40 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2006.codfw.wmnet
- 15:40 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
- 15:38 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
- 15:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:37 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2006.codfw.wmnet
- 15:37 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2006.codfw.wmnet
- 15:36 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes2005.codfw.wmnet
- 15:36 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes2005.codfw.wmnet
- 15:35 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
- 15:34 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1016.eqiad.wmnet
- 15:34 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1016.eqiad.wmnet
- 15:33 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1016.eqiad.wmnet
- 15:32 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
- 15:31 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1016.eqiad.wmnet
- 15:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2005.codfw.wmnet
- 15:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2005.codfw.wmnet
- 15:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1016.eqiad.wmnet
- 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:29 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1016.eqiad.wmnet
- 15:29 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1015.eqiad.wmnet
- 15:29 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1015.eqiad.wmnet
- 15:28 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1015.eqiad.wmnet
- 15:26 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1015.eqiad.wmnet
- 15:25 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1015.eqiad.wmnet
- 15:24 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1015.eqiad.wmnet
- 15:24 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1006.eqiad.wmnet
- 15:23 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1006.eqiad.wmnet
- 15:23 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1006.eqiad.wmnet
- 15:21 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1006.eqiad.wmnet
- 15:19 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1006.eqiad.wmnet
- 15:18 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1006.eqiad.wmnet
- 15:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:16 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1005.eqiad.wmnet
- 15:16 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1005.eqiad.wmnet
- 15:16 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
- 15:15 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:15 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:13 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
- 15:13 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1005.eqiad.wmnet
- 15:13 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1005.eqiad.wmnet
- 15:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:09 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubernetes1005.eqiad.wmnet
- 15:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubernetes1005.eqiad.wmnet
- 15:08 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes1005.eqiad.wmnet
- 15:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:04 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes1005.eqiad.wmnet
- 15:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes1005.eqiad.wmnet
- 15:02 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes1005.eqiad.wmnet
- 14:54 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
- 14:53 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
- 14:50 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:48 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 14:42 hashar: Restarting CI Jenkins
- 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2085
- 14:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2085
- 14:32 gmodena@deploy2002: Finished deploy [analytics/refinery@413e5d9] (hadoop-test): 2024-10-24 refinery hotfix deployment TEST [analytics/refinery@413e5d91] (duration: 04m 03s)
- 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2085 to codfw - jhancock@cumin2002"
- 14:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2085 to codfw - jhancock@cumin2002"
- 14:27 gmodena@deploy2002: Started deploy [analytics/refinery@413e5d9] (hadoop-test): 2024-10-24 refinery hotfix deployment TEST [analytics/refinery@413e5d91]
- 14:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:24 gmodena@deploy2002: Finished deploy [analytics/refinery@413e5d9] (thin): 2024-10-24 refinery hotfix deployment THIN [analytics/refinery@413e5d91] (duration: 04m 59s)
- 14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for Add maintenance script to move all flow boards on a wiki to a subpage (T371738), Add maintenance script to move all flow boards on a wiki to a subpage (T371738) (duration: 07m 28s)
- 14:22 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 14:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
- 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
- 14:19 gmodena@deploy2002: Started deploy [analytics/refinery@413e5d9] (thin): 2024-10-24 refinery hotfix deployment THIN [analytics/refinery@413e5d91]
- 14:18 sukhe: running authdns-update for CR 1042919
- 14:16 gmodena@deploy2002: Finished deploy [analytics/refinery@413e5d9]: 2024-10-24 refinery hotfix deployment [analytics/refinery@413e5d91] (duration: 07m 48s)
- 14:15 urbanecm@deploy2002: Started scap sync-world: Backport for Add maintenance script to move all flow boards on a wiki to a subpage (T371738), Add maintenance script to move all flow boards on a wiki to a subpage (T371738)
- 14:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
- 14:08 gmodena@deploy2002: Started deploy [analytics/refinery@413e5d9]: 2024-10-24 refinery hotfix deployment [analytics/refinery@413e5d91]
- 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
- 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
- 14:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1066.eqiad.wmnet
- 13:59 mvernon@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1066.eqiad.wmnet
- 13:57 Emperor: restarting swift after vacuum on ms-be1066 T377827
- 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
- 13:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum an overlarge container db
- 13:52 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum an overlarge container db
- 13:49 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm
- 13:47 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm
- 13:45 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool mw-web-ro in codfw: maintenance
- 13:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm
- 13:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
- 13:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm
- 13:40 oblivian@cumin2002: START - Cookbook sre.discovery.service-route pool mw-web-ro in codfw: maintenance
- 13:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
- 13:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
- 13:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
- 13:23 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:22 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 13:20 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
- 13:19 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
- 13:18 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
- 13:18 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
- 13:16 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
- 13:15 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:14 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 13:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 12:59 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm
- 12:59 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm
- 12:58 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm
- 12:57 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm
- 12:55 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-main-eqiad cluster: Roll restart of jvm daemons.
- 12:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
- 12:45 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-main-eqiad cluster: Roll restart of jvm daemons.
- 12:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm
- 12:38 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm
- 12:38 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
- 12:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
- 12:37 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
- 12:34 moritzm: bump qemu migration speed to 1000 for esams, ulsfo, eqsin, drmrs, magru Ganeti clusters
- 12:34 moritzm: bump qemu migration speed to 1000 for esams, ulsfo, eqsin, drmrs, magru clusters
- 12:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm
- 12:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
- 12:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm
- 12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
- 12:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
- 12:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1001.eqiad.wmnet
- 12:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
- 12:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
- 12:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
- 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 12:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
- 12:15 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host moss-be1001.eqiad.wmnet
- 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 12:14 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
- 12:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
- 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 12:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
- 12:08 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
- 12:08 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
- 12:07 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
- 12:07 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
- 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
- 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
- 11:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm
- 11:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm
- 11:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm
- 11:47 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm
- 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
- 11:23 oblivian@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mw-web-ro in codfw: maintenance
- 11:18 oblivian@cumin1002: START - Cookbook sre.discovery.service-route depool mw-web-ro in codfw: maintenance
- 11:14 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
- 11:07 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
- 11:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
- 10:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
- 10:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bookworm
- 10:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 10:30 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet
- 10:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1285.eqiad.wmnet with OS bookworm
- 10:26 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: stopped being the active one, stopping replication
- 10:26 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: stopped being the active one, stopping replication
- 10:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
- 10:22 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1288.eqiad.wmnet with OS bookworm
- 10:22 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
- 10:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
- 10:21 Emperor: reboot apus frontends T376800
- 10:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1289.eqiad.wmnet with OS bookworm
- 10:18 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
- 10:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1286.eqiad.wmnet with OS bookworm
- 10:11 jynus@cumin1002: dbctl commit (dc=all): 'promoting pc1014 as the master of pc5 T378068', diff saved to https://phabricator.wikimedia.org/P70584 and previous config saved to /var/cache/conftool/dbconfig/20241024-101150-jynus.json
- 10:08 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
- 10:03 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
- 10:03 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: moved pc number
- 10:03 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: moved pc number
- 10:00 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
- 09:59 jynus: restart pc1014 T378068
- 09:57 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
- 09:57 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1289.eqiad.wmnet with reason: host reimage
- 09:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1288.eqiad.wmnet with reason: host reimage
- 09:54 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1285.eqiad.wmnet with reason: host reimage
- 09:54 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1286.eqiad.wmnet with reason: host reimage
- 09:37 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1289.eqiad.wmnet with OS bookworm
- 09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1288.eqiad.wmnet with OS bookworm
- 09:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1286.eqiad.wmnet with OS bookworm
- 09:34 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1285.eqiad.wmnet with OS bookworm
- 09:28 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bookworm
- 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1003.eqiad.wmnet
- 09:25 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
- 09:23 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:22 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 09:22 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1285-1286,1288-1289].eqiad.wmnet
- 09:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1003.eqiad.wmnet
- 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2003.codfw.wmnet
- 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2003.codfw.wmnet
- 09:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on pc[1014,1017].eqiad.wmnet with reason: pc maintenance T378068
- 09:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on pc[1014,1017].eqiad.wmnet with reason: pc maintenance T378068
- 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70582 and previous config saved to /var/cache/conftool/dbconfig/20241024-083027-arnaudb.json
- 08:30 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 08:27 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 08:23 moritzm: installing bash/zsh updates from bookworm point release
- 08:23 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master
- 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
- 08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
- 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70581 and previous config saved to /var/cache/conftool/dbconfig/20241024-081520-arnaudb.json
- 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
- 08:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
- 08:13 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master
- 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
- 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 08:01 moritzm: installing libssh2 security updates
- 08:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70580 and previous config saved to /var/cache/conftool/dbconfig/20241024-080013-arnaudb.json
- 08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 07:59 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 07:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
- 07:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 07:56 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 07:55 elukey: restart ircstream on irc.wikimedia.org to remove a performance experiment
- 07:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70579 and previous config saved to /var/cache/conftool/dbconfig/20241024-074506-arnaudb.json
- 07:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Checking PII for wikis annwiki in section s5
- 07:33 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Checking PII for wikis annwiki in section s5
- 07:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-pii (exit_code=0) Setting up permissions and view database PII for wikis annwiki in section s5
- 07:32 arnaudb@cumin1002: START - Cookbook sre.mysql.sanitize-pii Setting up permissions and view database PII for wikis annwiki in section s5
- 07:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2039.codfw.wmnet
- 07:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
- 06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70578 and previous config saved to /var/cache/conftool/dbconfig/20241024-064440-arnaudb.json
- 06:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 06:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 06:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70577 and previous config saved to /var/cache/conftool/dbconfig/20241024-064418-arnaudb.json
- 06:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70576 and previous config saved to /var/cache/conftool/dbconfig/20241024-062910-arnaudb.json
- 06:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70575 and previous config saved to /var/cache/conftool/dbconfig/20241024-061403-arnaudb.json
- 05:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70574 and previous config saved to /var/cache/conftool/dbconfig/20241024-055856-arnaudb.json
- 04:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70573 and previous config saved to /var/cache/conftool/dbconfig/20241024-045830-arnaudb.json
- 04:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 04:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 04:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 04:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 03:57 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 03:57 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 03:57 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 03:56 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 03:56 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 03:55 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 03:55 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 03:55 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 03:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 03:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 03:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 03:53 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 02:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 01:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: in setup and T338470
- 00:44 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: in setup and T338470
- 00:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit2003.wikimedia.org with reason: reboot
- 00:26 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit2003.wikimedia.org with reason: reboot
- 00:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:22 mutante: gerrit2003 rebooting for T338470
- 00:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:05 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20241023
2024-10-23
- 23:47 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 23:46 reedy@deploy2002: Finished scap sync-world: T378006 (duration: 07m 09s)
- 23:39 reedy@deploy2002: Started scap sync-world: T378006
- 22:21 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 22:08 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 21:59 urbanecm@deploy2002: Finished scap sync-world: Backport for throttle: Add exemption for WikiArabia (T377957) (duration: 07m 06s)
- 21:52 urbanecm@deploy2002: Started scap sync-world: Backport for throttle: Add exemption for WikiArabia (T377957)
- 21:22 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20241023
- 21:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- away: UTC late deploys done
- 21:05 tgr@deploy2002: Finished scap sync-world: Backport for SessionManager: Add more logging when unpersisting invalid sessions (T372702), Log unexpected central session lookup misses (T372702) (duration: 15m 07s)
- 21:02 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241023
- 21:00 tgr@deploy2002: tgr: Continuing with sync
- 20:55 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241023
- 20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241023
- 20:52 tgr@deploy2002: tgr: Backport for SessionManager: Add more logging when unpersisting invalid sessions (T372702), Log unexpected central session lookup misses (T372702) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:50 tgr@deploy2002: Started scap sync-world: Backport for SessionManager: Add more logging when unpersisting invalid sessions (T372702), Log unexpected central session lookup misses (T372702)
- 20:46 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241023
- 20:41 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
- 20:40 eileen: civicrm upgraded from e787e5f2 to 1c6c4e08
- 20:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 19:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:35 eileen: civicrm upgraded from ce44ce45 to e787e5f2
- 19:18 dancy@deploy2002: Finished scap sync-world: Backport for Adjust return type documentation on SuggestedEdits (T378003) (duration: 13m 20s)
- 19:13 dancy@deploy2002: dancy: Continuing with sync
- 19:13 dancy@deploy2002: dancy: Backport for Adjust return type documentation on SuggestedEdits (T378003) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:09 sukhe: dummy authdns-update run
- 19:04 dancy@deploy2002: Started scap sync-world: Backport for Adjust return type documentation on SuggestedEdits (T378003)
- 18:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 18:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:26 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.28 refs T375659
- 18:09 sukhe: running agent on A:dnsbox
- 17:43 urbanecm@deploy2002: Finished scap sync-world: Backport for StructuredTaskMobileArticleTarget: Fix history hacks to avoid firing events (T377907) (duration: 11m 56s)
- 17:38 urbanecm@deploy2002: urbanecm: Continuing with sync
- 17:33 urbanecm@deploy2002: urbanecm: Backport for StructuredTaskMobileArticleTarget: Fix history hacks to avoid firing events (T377907) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:31 urbanecm@deploy2002: Started scap sync-world: Backport for StructuredTaskMobileArticleTarget: Fix history hacks to avoid firing events (T377907)
- 17:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:52 sukhe: restart ircecho on alerting hosts
- 16:35 sukhe: sudo cumin 'O:alerting_host or O:dnsbox' 'run-puppet-agent'
- 16:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: sessionstore mesh migration T363996
- 16:25 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: sessionstore mesh migration T363996
- 16:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
- 16:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
- 16:21 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
- 16:20 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
- 16:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: sessionstore mesh migration T363996
- 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:09 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: sessionstore mesh migration T363996
- 15:57 btullis@deploy2002: Finished deploy [airflow-dags/analytics_product@ba61f77]: T351388 (duration: 01m 15s)
- 15:56 btullis@deploy2002: Started deploy [airflow-dags/analytics_product@ba61f77]: T351388
- 15:55 btullis@deploy2002: Finished deploy [airflow-dags/platform_eng@ba61f77]: T351388 (duration: 00m 31s)
- 15:55 btullis@deploy2002: Started deploy [airflow-dags/platform_eng@ba61f77]: T351388
- 15:55 btullis@deploy2002: Finished deploy [airflow-dags/research@ba61f77]: T351388 (duration: 00m 45s)
- 15:54 btullis@deploy2002: Started deploy [airflow-dags/research@ba61f77]: T351388
- 15:53 btullis@deploy2002: Finished deploy [airflow-dags/search@ba61f77]: T351388 (duration: 00m 29s)
- 15:53 btullis@deploy2002: Started deploy [airflow-dags/search@ba61f77]: T351388
- 15:52 btullis@deploy2002: Finished deploy [airflow-dags/analytics@ba61f77]: T351388 (duration: 01m 08s)
- 15:51 btullis@deploy2002: Started deploy [airflow-dags/analytics@ba61f77]: T351388
- 15:51 btullis@deploy2002: Finished deploy [airflow-dags/analytics_test@ba61f77]: T351388 (duration: 00m 31s)
- 15:51 btullis@deploy2002: Started deploy [airflow-dags/analytics_test@ba61f77]: T351388
- 15:42 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@e1c56d1] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/95 (duration: 00m 53s)
- 15:42 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@e1c56d1] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/95
- 15:35 hashar: Restarted CI Jenkins
- 15:28 moritzm: uploaded openjdk-8 8u422-b05-1~deb12u0 for component/jdk for bookworm-wikimedia (bootstrap build since openjdk-8 needs openjdk-8 to build)
- 15:20 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@d8e345f] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/94 (duration: 01m 05s)
- 15:19 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@d8e345f] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/94
- 15:17 Lucas_WMDE: UTC afternoon backport+config window done
- 15:15 logmsgbot: lucaswerkmeister-wmde Deployed security patch for T377912
- 15:10 volans: uploaded spicerack_8.15.1 to apt.wikimedia.org bullseye-wikimedia
- 15:04 stran@deploy2002: Finished scap sync-world: Backport for Support template overrides in ContributionsPager (T356292), Add source wiki to contributions on Special:GlobalContributions (T356292) (duration: 10m 53s)
- 14:59 stran@deploy2002: stran: Continuing with sync
- 14:55 stran@deploy2002: stran: Backport for Support template overrides in ContributionsPager (T356292), Add source wiki to contributions on Special:GlobalContributions (T356292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
- 14:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
- 14:53 stran@deploy2002: Started scap sync-world: Backport for Support template overrides in ContributionsPager (T356292), Add source wiki to contributions on Special:GlobalContributions (T356292)
- 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:43 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (v2) (T376055) (duration: 17m 47s)
- 14:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:39 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
- 14:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:28 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (v2) (T376055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:25 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (v2) (T376055)
- 14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:21 tgr@deploy2002: Finished scap sync-world: Backport for Auth: pass accountType to authevents log stream (T341650 T375510 T375505), Auth: pass accountType to authevents log stream (T341650 T375510 T375505) (duration: 13m 23s)
- 14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:18 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
- 14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:16 tgr@deploy2002: tgr: Continuing with sync
- 14:14 sukhe: sudo cumin 'A:dnsbox' 'run-puppet-agent'
- 14:10 tgr@deploy2002: tgr: Backport for Auth: pass accountType to authevents log stream (T341650 T375510 T375505), Auth: pass accountType to authevents log stream (T341650 T375510 T375505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:07 tgr@deploy2002: Started scap sync-world: Backport for Auth: pass accountType to authevents log stream (T341650 T375510 T375505), Auth: pass accountType to authevents log stream (T341650 T375510 T375505)
- 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
- 13:54 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs and not A:ulsfo and A:lvs
- 13:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
- 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:37 moritzm: installing gdk-pixbuf security updates
- 13:34 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 13:34 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 13:34 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 13:33 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 13:33 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 13:33 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 13:32 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 13:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for WikiProjectIDLookup: use SparqlClient and make endpoint configurable (T377746) (duration: 07m 15s)
- 13:27 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Continuing with sync
- 13:27 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Backport for WikiProjectIDLookup: use SparqlClient and make endpoint configurable (T377746) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:26 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs and not A:ulsfo and A:lvs
- 13:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for WikiProjectIDLookup: use SparqlClient and make endpoint configurable (T377746)
- 13:18 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs and A:ulsfo and A:lvs
- 13:15 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs and A:ulsfo and A:lvs
- 13:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:12 moritzm: installing qemu security updates
- 13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:09 sukhe: running agent on A:lvs to roll out CR 1082238
- 13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2040.codfw.wmnet to cluster codfw and group C
- 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2040.codfw.wmnet to cluster codfw and group C
- 12:26 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2039.codfw.wmnet to cluster codfw and group C
- 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2039.codfw.wmnet to cluster codfw and group C
- 12:16 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2039.codfw.wmnet to cluster codfw and group C
- 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2039.codfw.wmnet to cluster codfw and group C
- 11:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for recentchanges: Use current time for imported revision category changes (T377932) (duration: 07m 26s)
- 11:44 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 11:44 dreamyjazz@deploy2002: dreamyjazz: Backport for recentchanges: Use current time for imported revision category changes (T377932) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:41 dreamyjazz@deploy2002: Started scap sync-world: Backport for recentchanges: Use current time for imported revision category changes (T377932)
- 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
- 11:11 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 11:11 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 11:09 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 11:09 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 11:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 11:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 10:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 10:51 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 10:45 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 10:45 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 10:43 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 10:43 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:14 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:13 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:13 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 10:13 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:12 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:12 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
- 10:05 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:03 Dreamy_Jazz: Restarted MediaModeration scanning script for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:59 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
- 09:42 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
- 09:34 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
- 09:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
- 09:30 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
- 09:29 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 09:29 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
- 09:29 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
- 09:24 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
- 09:24 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
- 09:09 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
- 09:02 Tran: UTC morning deploys done
- 08:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:32 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
- 08:29 moritzm: installing Java 11 security updates
- 08:28 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
- 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
- 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: new JDK - jmm@cumin2002
- 08:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
- 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
- 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
- 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
- 08:12 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2004
- 08:12 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2004
- 08:12 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2004
- 08:12 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2004.codfw.wmnet 71.48.192.10.in-addr.arpa 1.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:12 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2004.codfw.wmnet 71.48.192.10.in-addr.arpa 1.7.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:12 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:11 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2004 - jelto@cumin1002"
- 08:11 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2004 - jelto@cumin1002"
- 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
- 08:08 jelto@cumin1002: START - Cookbook sre.dns.netbox
- 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
- 08:07 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2004
- 08:07 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
- 08:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: new JDK - jmm@cumin2002
- 08:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
- 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: reboot
- 07:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: reboot
- 07:52 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2003.codfw.wmnet with OS bullseye
- 07:35 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
- 07:33 moritzm: installing perf updates on bookworm nodes
- 07:32 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
- 07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
- 07:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2002.codfw.wmnet to plain
- 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2002.codfw.wmnet to plain
- 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
- 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
- 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd2002.codfw.wmnet to drbd
- 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2003
- 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2003
- 07:15 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2003
- 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2003.codfw.wmnet 93.32.192.10.in-addr.arpa 3.9.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 07:15 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2003.codfw.wmnet 93.32.192.10.in-addr.arpa 3.9.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2003 - jelto@cumin1002"
- 07:15 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2003 - jelto@cumin1002"
- 07:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd2002.codfw.wmnet to drbd
- 07:11 jelto@cumin1002: START - Cookbook sre.dns.netbox
- 07:11 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2003
- 07:10 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2003.codfw.wmnet with OS bullseye
- 06:48 kart_: Updated cxserver to 2024-10-23-055433-production
- 06:47 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:47 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:45 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 06:44 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
- 06:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
- 05:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
- 05:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
- 04:18 eileen: civicrm upgraded from de642bea to ce44ce45
- 00:01 ejegg: fundraising civicrm upgraded from 5463f37b to de642bea
2024-10-22
- 23:32 ejegg: fundraising civicrm upgraded from d9e85c3d to 5463f37b
- 22:59 ejegg: fundraising civicrm upgraded from 36660cb3 to d9e85c3d
- 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70562 and previous config saved to /var/cache/conftool/dbconfig/20241022-223858-ladsgroup.json
- 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70561 and previous config saved to /var/cache/conftool/dbconfig/20241022-222352-ladsgroup.json
- 22:11 zabe@deploy2002: Finished scap sync-world: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 17s)
- 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70560 and previous config saved to /var/cache/conftool/dbconfig/20241022-220847-ladsgroup.json
- 22:07 zabe@deploy2002: zabe: Continuing with sync
- 22:06 zabe@deploy2002: zabe: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:03 zabe@deploy2002: Started scap sync-world: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 21:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P70559 and previous config saved to /var/cache/conftool/dbconfig/20241022-215137-ladsgroup.json
- 21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet
- 21:44 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet
- 21:44 dancy@deploy2002: Installation of scap version "4.117.0" completed for 209 hosts
- 21:40 dancy@deploy2002: Installing scap version "4.117.0" for 209 hosts
- 21:01 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@b08d130] (releasing): Deploying changes to single-version MediaWiki image build (duration: 01m 44s)
- 21:00 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@b08d130] (releasing): Deploying changes to single-version MediaWiki image build
- 20:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 20:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 20:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 20:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70558 and previous config saved to /var/cache/conftool/dbconfig/20241022-202717-ladsgroup.json
- 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P70557 and previous config saved to /var/cache/conftool/dbconfig/20241022-201210-ladsgroup.json
- 19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P70556 and previous config saved to /var/cache/conftool/dbconfig/20241022-195703-ladsgroup.json
- 19:54 swfrench-wmf: running puppet on A:cp-text (-b11) after validating ATS Lua changes on cp4040 - T372605
- 19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70555 and previous config saved to /var/cache/conftool/dbconfig/20241022-194156-ladsgroup.json
- 19:40 swfrench-wmf: disabling puppet on A:cp-text before merging ATS Lua changes - T372605
- 19:39 ladsgroup@deploy2002: Finished scap sync-world: Backport for Fix duplicated key in wgVectorNightMode (duration: 07m 51s)
- 19:36 ladsgroup@deploy2002: ladsgroup, ebrahim: Continuing with sync
- 19:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70554 and previous config saved to /var/cache/conftool/dbconfig/20241022-193352-ladsgroup.json
- 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 19:34 ladsgroup@deploy2002: ladsgroup, ebrahim: Backport for Fix duplicated key in wgVectorNightMode synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70553 and previous config saved to /var/cache/conftool/dbconfig/20241022-193327-ladsgroup.json
- 19:31 ladsgroup@deploy2002: Started scap sync-world: Backport for Fix duplicated key in wgVectorNightMode
- 19:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P70552 and previous config saved to /var/cache/conftool/dbconfig/20241022-191820-ladsgroup.json
- 19:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P70551 and previous config saved to /var/cache/conftool/dbconfig/20241022-190313-ladsgroup.json
- 19:00 dduvall@deploy2002: Installation of scap version "4.116.0" completed for 209 hosts
- 18:56 dduvall@deploy2002: Installing scap version "4.116.0" for 209 hosts
- 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70550 and previous config saved to /var/cache/conftool/dbconfig/20241022-184946-arnaudb.json
- 18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70549 and previous config saved to /var/cache/conftool/dbconfig/20241022-184806-ladsgroup.json
- 18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70548 and previous config saved to /var/cache/conftool/dbconfig/20241022-183955-ladsgroup.json
- 18:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 18:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70547 and previous config saved to /var/cache/conftool/dbconfig/20241022-183930-ladsgroup.json
- 18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70546 and previous config saved to /var/cache/conftool/dbconfig/20241022-183440-arnaudb.json
- 18:26 dancy@deploy2002: sync-world aborted: Refreshing (duration: 01m 33s)
- 18:24 dancy@deploy2002: Started scap sync-world: Refreshing
- 18:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P70544 and previous config saved to /var/cache/conftool/dbconfig/20241022-182423-ladsgroup.json
- 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70543 and previous config saved to /var/cache/conftool/dbconfig/20241022-181933-arnaudb.json
- 18:17 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.28 refs T375659
- 18:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P70542 and previous config saved to /var/cache/conftool/dbconfig/20241022-180916-ladsgroup.json
- 18:09 dancy@deploy2002: Finished scap sync-world: Backport for Prevent blocked users from being able to review/unreview articles (T366991) (duration: 07m 26s)
- 18:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70541 and previous config saved to /var/cache/conftool/dbconfig/20241022-180426-arnaudb.json
- 18:04 dancy@deploy2002: dancy, sbassett: Continuing with sync
- 18:04 dancy@deploy2002: dancy, sbassett: Backport for Prevent blocked users from being able to review/unreview articles (T366991) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:01 dancy@deploy2002: Started scap sync-world: Backport for Prevent blocked users from being able to review/unreview articles (T366991)
- 17:54 sukhe: sudo cumin -b4 "A:cp-upload" 'run-puppet-agent --enable "merging CR 1078994"': T375761
- 17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70540 and previous config saved to /var/cache/conftool/dbconfig/20241022-175409-ladsgroup.json
- 17:50 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@16eb792] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/90 (duration: 01m 21s)
- 17:49 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@16eb792] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/90
- 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70539 and previous config saved to /var/cache/conftool/dbconfig/20241022-174555-ladsgroup.json
- 17:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 17:45 sukhe: sudo cumin "A:cp-upload" 'disable-puppet "merging CR 1078994"': T375761
- 17:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70538 and previous config saved to /var/cache/conftool/dbconfig/20241022-174530-ladsgroup.json
- 17:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P70537 and previous config saved to /var/cache/conftool/dbconfig/20241022-173022-ladsgroup.json
- 17:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
- 17:23 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
- 17:18 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2014.codfw.wmnet with reason: rebooting to test changes rolled out in CR 1006063
- 17:17 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs2014.codfw.wmnet with reason: rebooting to test changes rolled out in CR 1006063
- 17:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P70536 and previous config saved to /var/cache/conftool/dbconfig/20241022-171515-ladsgroup.json
- 17:14 sukhe: re-enable Puppet on A:lvs [change merged on lvs2014]: T358260
- 17:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: repooling sessionstore post mesh migration T363996
- 17:04 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: repooling sessionstore post mesh migration T363996
- 17:04 sukhe: disable Puppet on A:lvs to merge 1006063: T358260
- 17:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70535 and previous config saved to /var/cache/conftool/dbconfig/20241022-170400-arnaudb.json
- 17:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 17:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70534 and previous config saved to /var/cache/conftool/dbconfig/20241022-170337-arnaudb.json
- 17:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70533 and previous config saved to /var/cache/conftool/dbconfig/20241022-170008-ladsgroup.json
- 16:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70532 and previous config saved to /var/cache/conftool/dbconfig/20241022-165211-ladsgroup.json
- 16:52 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1176.eqiad.wmnet
- 16:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 16:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70531 and previous config saved to /var/cache/conftool/dbconfig/20241022-165147-ladsgroup.json
- 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70530 and previous config saved to /var/cache/conftool/dbconfig/20241022-164830-arnaudb.json
- 16:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
- 16:46 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
- 16:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 16:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
- 16:44 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
- 16:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P70529 and previous config saved to /var/cache/conftool/dbconfig/20241022-163639-ladsgroup.json
- 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70528 and previous config saved to /var/cache/conftool/dbconfig/20241022-163323-arnaudb.json
- 16:31 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1176.eqiad.wmnet
- 16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P70527 and previous config saved to /var/cache/conftool/dbconfig/20241022-162132-ladsgroup.json
- 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70526 and previous config saved to /var/cache/conftool/dbconfig/20241022-161816-arnaudb.json
- 16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70525 and previous config saved to /var/cache/conftool/dbconfig/20241022-161604-arnaudb.json
- 16:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 16:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70524 and previous config saved to /var/cache/conftool/dbconfig/20241022-161552-arnaudb.json
- 16:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: testing sessionstore mesh migration
- 16:08 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: testing sessionstore mesh migration
- 16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70523 and previous config saved to /var/cache/conftool/dbconfig/20241022-160625-ladsgroup.json
- 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70522 and previous config saved to /var/cache/conftool/dbconfig/20241022-160045-arnaudb.json
- 15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
- 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70521 and previous config saved to /var/cache/conftool/dbconfig/20241022-155824-ladsgroup.json
- 15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70520 and previous config saved to /var/cache/conftool/dbconfig/20241022-155759-ladsgroup.json
- 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
- 15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
- 15:53 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 15:53 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check sessionstore: maintenance
- 15:53 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route check sessionstore: maintenance
- 15:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70519 and previous config saved to /var/cache/conftool/dbconfig/20241022-154538-arnaudb.json
- 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P70518 and previous config saved to /var/cache/conftool/dbconfig/20241022-154251-ladsgroup.json
- 15:39 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 15:38 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 15:38 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 15:38 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 15:38 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 15:38 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 15:37 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 15:37 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 15:36 sbassett@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 15:36 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 15:36 sbassett@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:35 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 15:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 15:31 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:30 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70517 and previous config saved to /var/cache/conftool/dbconfig/20241022-153031-arnaudb.json
- 15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P70516 and previous config saved to /var/cache/conftool/dbconfig/20241022-152743-ladsgroup.json
- 15:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:19 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:18 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:15 aqu: Deployed refinery using scap, then deployed onto hdfs
- 15:14 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestagemaster2003.codfw.wmnet
- 15:14 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestagemaster2003.codfw.wmnet
- 15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70515 and previous config saved to /var/cache/conftool/dbconfig/20241022-151237-ladsgroup.json
- 15:11 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@7c2d65f]: DPE 2024-10-22 deployment train (duration: 01m 16s)
- 15:10 gmodena@deploy2002: Started deploy [airflow-dags/analytics@7c2d65f]: DPE 2024-10-22 deployment train
- 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@582cde5]: deploy phab1004 for T377850 (duration: 01m 04s)
- 15:08 brennen@deploy2002: Started deploy [phabricator/deployment@582cde5]: deploy phab1004 for T377850
- 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@582cde5]: test deploy phab2002 for T377850 (may fail, expected) (duration: 00m 24s)
- 15:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:07 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator deployment
- 15:07 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator deployment
- 15:07 brennen@deploy2002: Started deploy [phabricator/deployment@582cde5]: test deploy phab2002 for T377850 (may fail, expected)
- 15:06 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: Phabricator deployment
- 15:06 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: Phabricator deployment
- 15:06 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deployment
- 15:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:06 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deployment
- 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70514 and previous config saved to /var/cache/conftool/dbconfig/20241022-150435-ladsgroup.json
- 15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70513 and previous config saved to /var/cache/conftool/dbconfig/20241022-150409-ladsgroup.json
- 14:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70512 and previous config saved to /var/cache/conftool/dbconfig/20241022-145653-arnaudb.json
- 14:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:52 hashar@deploy2002: Finished deploy [gerrit/gerrit@30691f2]: Update patch demo to recognize both legacy and new URLs - T374954 (duration: 00m 10s)
- 14:52 hashar@deploy2002: Started deploy [gerrit/gerrit@30691f2]: Update patch demo to recognize both legacy and new URLs - T374954
- 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
- 14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P70511 and previous config saved to /var/cache/conftool/dbconfig/20241022-144902-ladsgroup.json
- 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70510 and previous config saved to /var/cache/conftool/dbconfig/20241022-144148-arnaudb.json
- 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 14:40 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2084 to codfw - jhancock@cumin2002"
- 14:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2084 to codfw - jhancock@cumin2002"
- 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70509 and previous config saved to /var/cache/conftool/dbconfig/20241022-143628-arnaudb.json
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 14:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 14:34 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
- 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P70507 and previous config saved to /var/cache/conftool/dbconfig/20241022-143355-ladsgroup.json
- 14:32 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix performer link on Special:GlobalBlockList (T377398) (duration: 07m 43s)
- 14:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70506 and previous config saved to /var/cache/conftool/dbconfig/20241022-143005-arnaudb.json
- 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 14:27 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 14:27 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix performer link on Special:GlobalBlockList (T377398) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70505 and previous config saved to /var/cache/conftool/dbconfig/20241022-142642-arnaudb.json
- 14:24 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix performer link on Special:GlobalBlockList (T377398)
- 14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70504 and previous config saved to /var/cache/conftool/dbconfig/20241022-142123-arnaudb.json
- 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70503 and previous config saved to /var/cache/conftool/dbconfig/20241022-141848-ladsgroup.json
- 14:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70502 and previous config saved to /var/cache/conftool/dbconfig/20241022-141137-arnaudb.json
- 14:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
- 14:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2011.codfw.wmnet
- 14:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
- 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70501 and previous config saved to /var/cache/conftool/dbconfig/20241022-140956-ladsgroup.json
- 14:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 14:09 ejegg: payments-wiki upgraded from 7ae3479f to a039cd50
- 14:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70500 and previous config saved to /var/cache/conftool/dbconfig/20241022-140931-ladsgroup.json
- 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
- 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70499 and previous config saved to /var/cache/conftool/dbconfig/20241022-140617-arnaudb.json
- 14:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
- 13:59 moritzm: rebalance ganeti clusters in magru following reboots
- 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
- 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70498 and previous config saved to /var/cache/conftool/dbconfig/20241022-135631-arnaudb.json
- 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P70497 and previous config saved to /var/cache/conftool/dbconfig/20241022-135424-ladsgroup.json
- 13:52 Lucas_WMDE: UTC afternoon backport+window done (a further GlobalBlocking fix will be backported out-of-window soon)
- 13:51 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (hadoop-test): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 03m 17s)
- 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70496 and previous config saved to /var/cache/conftool/dbconfig/20241022-135112-arnaudb.json
- 13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
- 13:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
- 13:48 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (hadoop-test): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
- 13:48 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 00m 07s)
- 13:48 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
- 13:47 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 00m 57s)
- 13:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:46 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
- 13:45 aqu@deploy2002: deploy aborted: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 03m 50s)
- 13:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:44 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315) (duration: 08m 40s)
- 13:41 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
- 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70495 and previous config saved to /var/cache/conftool/dbconfig/20241022-134126-arnaudb.json
- 13:40 lucaswerkmeister-wmde@deploy2002: joelyrookewmde, lucaswerkmeister-wmde: Continuing with sync
- 13:39 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2002.codfw.wmnet with OS bullseye
- 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P70494 and previous config saved to /var/cache/conftool/dbconfig/20241022-133916-ladsgroup.json
- 13:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:37 lucaswerkmeister-wmde@deploy2002: joelyrookewmde, lucaswerkmeister-wmde: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:35 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315)
- 13:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2227.codfw.wmnet
- 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 13:32 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398) (duration: 15m 27s)
- 13:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:29 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70493 and previous config saved to /var/cache/conftool/dbconfig/20241022-132745-arnaudb.json
- 13:25 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
- 13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70492 and previous config saved to /var/cache/conftool/dbconfig/20241022-132409-ladsgroup.json
- 13:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
- 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
- 13:19 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
- 13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamyjazz: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:19 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
- 13:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398)
- 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70491 and previous config saved to /var/cache/conftool/dbconfig/20241022-131448-ladsgroup.json
- 13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 13:14 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Release CampaignEvents to eswiki (T376786) (duration: 09m 35s)
- 13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70490 and previous config saved to /var/cache/conftool/dbconfig/20241022-131415-ladsgroup.json
- 13:14 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a]: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 19m 41s)
- 13:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70489 and previous config saved to /var/cache/conftool/dbconfig/20241022-131239-arnaudb.json
- 13:09 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync
- 13:07 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for Release CampaignEvents to eswiki (T376786) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Release CampaignEvents to eswiki (T376786)
- 13:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
- 12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P70488 and previous config saved to /var/cache/conftool/dbconfig/20241022-125908-ladsgroup.json
- 12:58 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
- 12:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70487 and previous config saved to /var/cache/conftool/dbconfig/20241022-125734-arnaudb.json
- 12:55 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2089.codfw.wmnet with OS bookworm
- 12:54 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a]: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
- 12:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2086.codfw.wmnet with OS bookworm
- 12:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2085.codfw.wmnet with OS bookworm
- 12:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2088.codfw.wmnet with OS bookworm
- 12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P70486 and previous config saved to /var/cache/conftool/dbconfig/20241022-124401-ladsgroup.json
- 12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70485 and previous config saved to /var/cache/conftool/dbconfig/20241022-124228-arnaudb.json
- 12:42 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2002
- 12:42 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2002
- 12:41 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2002
- 12:41 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2002.codfw.wmnet 161.16.192.10.in-addr.arpa 1.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 12:41 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2002.codfw.wmnet 161.16.192.10.in-addr.arpa 1.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 12:41 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:41 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2002 - jelto@cumin1002"
- 12:41 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2002 - jelto@cumin1002"
- 12:37 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
- 12:37 jelto@cumin1002: START - Cookbook sre.dns.netbox
- 12:36 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2002
- 12:36 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2002.codfw.wmnet with OS bullseye
- 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:34 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2086.codfw.wmnet with reason: host reimage
- 12:34 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
- 12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:31 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2085.codfw.wmnet with reason: host reimage
- 12:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70484 and previous config saved to /var/cache/conftool/dbconfig/20241022-122854-ladsgroup.json
- 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2010.codfw.wmnet
- 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2010.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2088.codfw.wmnet with reason: host reimage
- 12:27 Dreamy_Jazz: Running MediaModeration scan on all group2 wikis
- 12:27 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2086.codfw.wmnet with reason: host reimage
- 12:27 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2085.codfw.wmnet with reason: host reimage
- 12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70483 and previous config saved to /var/cache/conftool/dbconfig/20241022-122723-arnaudb.json
- 12:27 Dreamy_Jazz: Stopped MediaModeration scan on all group1 wikis
- 12:24 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2088.codfw.wmnet with reason: host reimage
- 12:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2010.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:20 Dreamy_Jazz: Running MediaModeration scan on all group1 wikis
- 12:20 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin2002
- 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70482 and previous config saved to /var/cache/conftool/dbconfig/20241022-121928-ladsgroup.json
- 12:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 12:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70481 and previous config saved to /var/cache/conftool/dbconfig/20241022-121903-ladsgroup.json
- 12:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:12 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2227.codfw.wmnet
- 12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70480 and previous config saved to /var/cache/conftool/dbconfig/20241022-121218-arnaudb.json
- 12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2010.codfw.wmnet
- 12:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2089.codfw.wmnet with OS bookworm
- 12:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2149,2227].codfw.wmnet with reason: maintenance
- 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2009.codfw.wmnet
- 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2149,2227].codfw.wmnet with reason: maintenance
- 12:08 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2088.codfw.wmnet with OS bookworm
- 12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 and db2227 - T377718', diff saved to https://phabricator.wikimedia.org/P70479 and previous config saved to /var/cache/conftool/dbconfig/20241022-120753-arnaudb.json
- 12:06 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2086.codfw.wmnet with OS bookworm
- 12:06 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2085.codfw.wmnet with OS bookworm
- 12:05 Dreamy_Jazz: Running MediaModeration scan on all group0 wikis
- 12:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P70478 and previous config saved to /var/cache/conftool/dbconfig/20241022-120356-ladsgroup.json
- 12:03 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778) (duration: 15m 27s)
- 12:02 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin2002
- 11:57 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
- 11:57 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin2002
- 11:56 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 11:55 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
- 11:55 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P70477 and previous config saved to /var/cache/conftool/dbconfig/20241022-114849-ladsgroup.json
- 11:48 kart_: Updated cxserver to 2024-10-22-112806-production (T357950)
- 11:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778)
- 11:47 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 11:46 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 11:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 11:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 11:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 11:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2009.codfw.wmnet
- 11:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 11:43 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host wikikube-worker2085.codfw.wmnet
- 11:43 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2085.codfw.wmnet
- 11:41 akosiaris: remove faidon from WMCS projects maps, visualeditor, swift, testlabs per his request. Keep the bastion project. cc paravoid
- 11:39 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin2002
- 11:34 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestagemaster2005.codfw.wmnet
- 11:34 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestagemaster2005.codfw.wmnet
- 11:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70476 and previous config saved to /var/cache/conftool/dbconfig/20241022-113342-ladsgroup.json
- 11:27 moritzm: installing Java 11 security updates
- 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70475 and previous config saved to /var/cache/conftool/dbconfig/20241022-112408-ladsgroup.json
- 11:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 11:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70474 and previous config saved to /var/cache/conftool/dbconfig/20241022-112343-ladsgroup.json
- 11:21 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: sync
- 11:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: sync
- 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P70473 and previous config saved to /var/cache/conftool/dbconfig/20241022-110836-ladsgroup.json
- 11:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70472 and previous config saved to /var/cache/conftool/dbconfig/20241022-110744-arnaudb.json
- 10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P70471 and previous config saved to /var/cache/conftool/dbconfig/20241022-105329-ladsgroup.json
- 10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70470 and previous config saved to /var/cache/conftool/dbconfig/20241022-105238-arnaudb.json
- 10:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70469 and previous config saved to /var/cache/conftool/dbconfig/20241022-103822-ladsgroup.json
- 10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70468 and previous config saved to /var/cache/conftool/dbconfig/20241022-103733-arnaudb.json
- 10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70467 and previous config saved to /var/cache/conftool/dbconfig/20241022-102907-ladsgroup.json
- 10:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 10:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70466 and previous config saved to /var/cache/conftool/dbconfig/20241022-102843-ladsgroup.json
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70465 and previous config saved to /var/cache/conftool/dbconfig/20241022-102227-arnaudb.json
- 10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P70464 and previous config saved to /var/cache/conftool/dbconfig/20241022-101336-ladsgroup.json
- 10:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 10:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 10:04 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: sync
- 10:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 10:03 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: sync
- 10:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2205.codfw.wmnet
- 10:03 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@dcf019d]: (no justification provided) (duration: 00m 11s)
- 10:02 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@dcf019d]: (no justification provided)
- 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P70463 and previous config saved to /var/cache/conftool/dbconfig/20241022-095829-ladsgroup.json
- 09:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70461 and previous config saved to /var/cache/conftool/dbconfig/20241022-094322-ladsgroup.json
- 09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 09:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70460 and previous config saved to /var/cache/conftool/dbconfig/20241022-093345-ladsgroup.json
- 09:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 09:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 09:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:22 hashar: Restarting CI Jenkins
- 09:06 hashar: Restarting Gerrit
- 08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:37 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2205.codfw.wmnet
- 08:35 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:33 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70459 and previous config saved to /var/cache/conftool/dbconfig/20241022-082545-arnaudb.json
- 08:24 moritzm: irc.wikimedia.org has been switched to ircstream T376014
- 08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70457 and previous config saved to /var/cache/conftool/dbconfig/20241022-081040-arnaudb.json
- 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
- 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
- 08:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2149,2205].codfw.wmnet with reason: db2205 reclone
- 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2149,2205].codfw.wmnet with reason: db2205 reclone
- 07:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'T377718', diff saved to https://phabricator.wikimedia.org/P70456 and previous config saved to /var/cache/conftool/dbconfig/20241022-075830-arnaudb.json
- 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70455 and previous config saved to /var/cache/conftool/dbconfig/20241022-075534-arnaudb.json
- 07:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 28%: post clone', diff saved to https://phabricator.wikimedia.org/P70454 and previous config saved to /var/cache/conftool/dbconfig/20241022-074029-arnaudb.json
- 07:28 moritzm: installing Java 17 security updates
- 07:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 27%: post clone', diff saved to https://phabricator.wikimedia.org/P70453 and previous config saved to /var/cache/conftool/dbconfig/20241022-072523-arnaudb.json
- 07:23 moritzm: rearm keyholder on netmon1003
- 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 26%: post clone', diff saved to https://phabricator.wikimedia.org/P70452 and previous config saved to /var/cache/conftool/dbconfig/20241022-071018-arnaudb.json
- 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6003.wikimedia.org
- 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
- 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6003.wikimedia.org
- 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70451 and previous config saved to /var/cache/conftool/dbconfig/20241022-065513-arnaudb.json
- 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 05:41 kart_: Remove servicerunner dependency for cxserver (T357950, T373777)
- 05:31 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 05:30 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:25 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 05:24 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.25 (duration: 00m 58s)
- 03:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.28 refs T375659 (duration: 49m 37s)
- 03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.28 refs T375659
- 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70450 and previous config saved to /var/cache/conftool/dbconfig/20241022-010820-ladsgroup.json
- 00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P70449 and previous config saved to /var/cache/conftool/dbconfig/20241022-005313-ladsgroup.json
- 00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P70448 and previous config saved to /var/cache/conftool/dbconfig/20241022-003807-ladsgroup.json
- 00:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70447 and previous config saved to /var/cache/conftool/dbconfig/20241022-002259-ladsgroup.json
- 00:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70446 and previous config saved to /var/cache/conftool/dbconfig/20241022-001606-ladsgroup.json
- 00:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70445 and previous config saved to /var/cache/conftool/dbconfig/20241022-001539-ladsgroup.json
- 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P70444 and previous config saved to /var/cache/conftool/dbconfig/20241022-000032-ladsgroup.json
2024-10-21
- 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P70443 and previous config saved to /var/cache/conftool/dbconfig/20241021-234525-ladsgroup.json
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70442 and previous config saved to /var/cache/conftool/dbconfig/20241021-233018-ladsgroup.json
- 23:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70441 and previous config saved to /var/cache/conftool/dbconfig/20241021-222952-ladsgroup.json
- 22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70440 and previous config saved to /var/cache/conftool/dbconfig/20241021-222926-ladsgroup.json
- 22:21 eileen: config revision changed from a1c7759c to 3bbf553d
- 22:18 zabe@deploy2002: Finished scap sync-world: Backport for group0: Increase revision-slots cache expiry back to default (T183490) (duration: 06m 58s)
- 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P70439 and previous config saved to /var/cache/conftool/dbconfig/20241021-221419-ladsgroup.json
- 22:13 zabe@deploy2002: zabe: Continuing with sync
- 22:13 zabe@deploy2002: zabe: Backport for group0: Increase revision-slots cache expiry back to default (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:11 zabe@deploy2002: Started scap sync-world: Backport for group0: Increase revision-slots cache expiry back to default (T183490)
- 21:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P70438 and previous config saved to /var/cache/conftool/dbconfig/20241021-215912-ladsgroup.json
- 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70437 and previous config saved to /var/cache/conftool/dbconfig/20241021-214405-ladsgroup.json
- 21:43 eileen: config revision changed from d240bcfb to a1c7759c
- 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70436 and previous config saved to /var/cache/conftool/dbconfig/20241021-213801-ladsgroup.json
- 21:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 21:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70435 and previous config saved to /var/cache/conftool/dbconfig/20241021-213733-ladsgroup.json
- 21:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P70434 and previous config saved to /var/cache/conftool/dbconfig/20241021-212226-ladsgroup.json
- 21:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 21:16 swfrench-wmf: ran authdns-update to pick up mw-(web|api-ext)-next discovery records - T377040
- 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P70433 and previous config saved to /var/cache/conftool/dbconfig/20241021-210718-ladsgroup.json
- 21:00 sukhe: running authdns-update for CR 1081371
- away: UTC late deploys done
- 20:56 tgr@deploy2002: Finished scap sync-world: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476) (duration: 18m 43s)
- 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70431 and previous config saved to /var/cache/conftool/dbconfig/20241021-205211-ladsgroup.json
- 20:52 tgr@deploy2002: tgr: Continuing with sync
- 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70430 and previous config saved to /var/cache/conftool/dbconfig/20241021-204603-ladsgroup.json
- 20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70429 and previous config saved to /var/cache/conftool/dbconfig/20241021-204536-ladsgroup.json
- 20:40 tgr@deploy2002: tgr: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:37 tgr@deploy2002: Started scap sync-world: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476)
- 20:32 tgr@deploy2002: Finished scap sync-world: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337) (duration: 08m 19s)
- 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P70428 and previous config saved to /var/cache/conftool/dbconfig/20241021-203029-ladsgroup.json
- 20:28 tgr@deploy2002: migr, tgr: Continuing with sync
- 20:26 tgr@deploy2002: migr, tgr: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:24 tgr@deploy2002: Started scap sync-world: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337)
- 20:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 20:21 tgr@deploy2002: Finished scap sync-world: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483) (duration: 09m 48s)
- 20:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 20:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 20:16 tgr@deploy2002: matmarex, tgr: Continuing with sync
- 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P70427 and previous config saved to /var/cache/conftool/dbconfig/20241021-201522-ladsgroup.json
- 20:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 20:13 tgr@deploy2002: matmarex, tgr: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:11 tgr@deploy2002: Started scap sync-world: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483)
- 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70426 and previous config saved to /var/cache/conftool/dbconfig/20241021-200015-ladsgroup.json
- 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70425 and previous config saved to /var/cache/conftool/dbconfig/20241021-195300-ladsgroup.json
- 19:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 19:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 19:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70424 and previous config saved to /var/cache/conftool/dbconfig/20241021-195233-ladsgroup.json
- 19:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P70423 and previous config saved to /var/cache/conftool/dbconfig/20241021-193726-ladsgroup.json
- 19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=eqiad [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]
- 19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=codfw [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]
- 19:36 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@b75c4aa] (releasing): Deploying changes to MediaWiki branch and publish WMF single-version image job (duration: 01m 20s)
- 19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=eqiad [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]
- 19:35 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=codfw [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]
- 19:34 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@b75c4aa] (releasing): Deploying changes to MediaWiki branch and publish WMF single-version image job
- 19:31 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next,name=codfw [reason: preparing mw-api-ext-next (a/p) for discovery - T377040]
- 19:30 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next,name=codfw [reason: preparing mw-web-next (a/p) for discovery - T377040]
- 19:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P70422 and previous config saved to /var/cache/conftool/dbconfig/20241021-192219-ladsgroup.json
- 19:11 ejegg: re-enabled fundraising thank you mailer
- 19:10 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
- 19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70421 and previous config saved to /var/cache/conftool/dbconfig/20241021-190712-ladsgroup.json
- 19:04 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
- 19:02 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
- 19:02 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
- 19:01 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040
- 19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70420 and previous config saved to /var/cache/conftool/dbconfig/20241021-185957-ladsgroup.json
- 19:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 19:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70419 and previous config saved to /var/cache/conftool/dbconfig/20241021-185931-ladsgroup.json
- 18:58 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
- 18:52 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
- 18:51 zabe@deploy2002: Finished scap sync-world: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 16m 09s)
- 18:51 ejegg: fundraising civicrm upgraded from cfb0def0 to 36660cb3
- 18:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
- 18:45 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P70418 and previous config saved to /var/cache/conftool/dbconfig/20241021-184424-ladsgroup.json
- 18:43 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
- 18:42 zabe@deploy2002: zabe: Continuing with sync
- 18:42 zabe@deploy2002: zabe: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:37 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 18:37 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
- 18:36 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040
- 18:35 zabe@deploy2002: Started scap sync-world: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 18:32 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040
- 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P70417 and previous config saved to /var/cache/conftool/dbconfig/20241021-182916-ladsgroup.json
- 18:23 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
- 18:22 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
- 18:20 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
- 18:19 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
- 18:19 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040
- 18:15 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
- 18:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
- 18:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70416 and previous config saved to /var/cache/conftool/dbconfig/20241021-181410-ladsgroup.json
- 18:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
- 18:09 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
- 18:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70415 and previous config saved to /var/cache/conftool/dbconfig/20241021-180654-ladsgroup.json
- 18:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 18:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 18:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 18:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70414 and previous config saved to /var/cache/conftool/dbconfig/20241021-180612-ladsgroup.json
- 18:06 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
- 18:05 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
- 18:04 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040
- 17:59 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040
- 17:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
- 17:53 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm
- 17:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
- 17:52 dduvall@deploy2002: Installing scap version "4.115.0" for 209 hosts
- 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P70413 and previous config saved to /var/cache/conftool/dbconfig/20241021-175105-ladsgroup.json
- 17:50 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@671896c]: Deploy T375402. (duration: 01m 04s)
- 17:48 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@671896c]: Deploy T375402.
- 17:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 17:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 17:42 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 17:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P70412 and previous config saved to /var/cache/conftool/dbconfig/20241021-173558-ladsgroup.json
- 17:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70411 and previous config saved to /var/cache/conftool/dbconfig/20241021-172051-ladsgroup.json
- 17:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70410 and previous config saved to /var/cache/conftool/dbconfig/20241021-171138-ladsgroup.json
- 17:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 17:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70409 and previous config saved to /var/cache/conftool/dbconfig/20241021-171046-ladsgroup.json
- 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70408 and previous config saved to /var/cache/conftool/dbconfig/20241021-165624-arnaudb.json
- 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P70407 and previous config saved to /var/cache/conftool/dbconfig/20241021-165539-ladsgroup.json
- 16:44 herron@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 16:43 herron@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70406 and previous config saved to /var/cache/conftool/dbconfig/20241021-164119-arnaudb.json
- 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P70405 and previous config saved to /var/cache/conftool/dbconfig/20241021-164032-ladsgroup.json
- 16:33 volans@cumin1002: dbctl commit (dc=all): 'Fix db1185 weight', diff saved to https://phabricator.wikimedia.org/P70404 and previous config saved to /var/cache/conftool/dbconfig/20241021-163355-volans.json
- 16:32 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 quickly with 2 steps - Testing new cookbook
- 16:29 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 quickly with 2 steps - Testing new cookbook
- 16:29 volans@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1185 quickly with 2 steps - Testing new cookbook
- 16:28 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 quickly with 2 steps - Testing new cookbook
- 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70401 and previous config saved to /var/cache/conftool/dbconfig/20241021-162613-arnaudb.json
- 16:27 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
- 16:26 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
- 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70399 and previous config saved to /var/cache/conftool/dbconfig/20241021-162525-ladsgroup.json
- 16:22 volans@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1185 - Testing new cookbook
- 16:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 16:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:18 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
- 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:17 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70398 and previous config saved to /var/cache/conftool/dbconfig/20241021-161701-ladsgroup.json
- 16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 16:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70397 and previous config saved to /var/cache/conftool/dbconfig/20241021-161634-ladsgroup.json
- 16:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70396 and previous config saved to /var/cache/conftool/dbconfig/20241021-161108-arnaudb.json
- 16:04 ejegg: disabled fundraising Thank You mail send jobs
- 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P70395 and previous config saved to /var/cache/conftool/dbconfig/20241021-160127-ladsgroup.json
- 15:58 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
- 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:55 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
- 15:53 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
- 15:53 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
- 15:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P70389 and previous config saved to /var/cache/conftool/dbconfig/20241021-154620-ladsgroup.json
- 15:39 Dreamy_Jazz: Starting MediaModeration scanning script for 12 hrs on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration
- 15:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 15:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2172.codfw.wmnet onto db2240.codfw.wmnet
- 15:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 15:32 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 15:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70388 and previous config saved to /var/cache/conftool/dbconfig/20241021-153113-ladsgroup.json
- 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70387 and previous config saved to /var/cache/conftool/dbconfig/20241021-152408-ladsgroup.json
- 15:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 15:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 15:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70386 and previous config saved to /var/cache/conftool/dbconfig/20241021-152339-ladsgroup.json
- 15:20 moritzm: rearm keyholder on netmon2002
- 15:20 stran@deploy2002: Finished scap sync-world: Backport for Disable local IP view right group on meta (T377584) (duration: 20m 29s)
- 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P70385 and previous config saved to /var/cache/conftool/dbconfig/20241021-150832-ladsgroup.json
- 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 15:02 stran@deploy2002: stran: Continuing with sync
- 15:01 stran@deploy2002: stran: Backport for Disable local IP view right group on meta (T377584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:59 stran@deploy2002: Started scap sync-world: Backport for Disable local IP view right group on meta (T377584)
- 14:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P70384 and previous config saved to /var/cache/conftool/dbconfig/20241021-145325-ladsgroup.json
- 14:53 ejegg: disabled failing CiviCRM contact dedupe job
- 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70383 and previous config saved to /var/cache/conftool/dbconfig/20241021-143818-ladsgroup.json
- 14:33 herron@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 14:32 herron@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 14:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70382 and previous config saved to /var/cache/conftool/dbconfig/20241021-143108-ladsgroup.json
- 14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70381 and previous config saved to /var/cache/conftool/dbconfig/20241021-143042-ladsgroup.json
- 14:29 moritzm: installing PHP 8.2 security updates
- 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P70380 and previous config saved to /var/cache/conftool/dbconfig/20241021-141535-ladsgroup.json
- 14:15 Lucas_WMDE: UTC afternoon backport+config window done
- 14:10 stran@deploy2002: Finished scap sync-world: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132) (duration: 14m 55s)
- 14:05 stran@deploy2002: stran, kharlan: Continuing with sync
- 14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P70379 and previous config saved to /var/cache/conftool/dbconfig/20241021-140028-ladsgroup.json
- 13:57 stran@deploy2002: stran, kharlan: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:55 stran@deploy2002: Started scap sync-world: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132)
- 13:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:50 stran@deploy2002: Finished scap sync-world: Backport for Apply wmf-specific protected vars rights access (T369610) (duration: 08m 53s)
- 13:45 stran@deploy2002: stran: Continuing with sync
- 13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70378 and previous config saved to /var/cache/conftool/dbconfig/20241021-134521-ladsgroup.json
- 13:43 stran@deploy2002: stran: Backport for Apply wmf-specific protected vars rights access (T369610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:41 stran@deploy2002: Started scap sync-world: Backport for Apply wmf-specific protected vars rights access (T369610)
- 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70377 and previous config saved to /var/cache/conftool/dbconfig/20241021-133619-ladsgroup.json
- 13:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 13:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 13:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70376 and previous config saved to /var/cache/conftool/dbconfig/20241021-133552-ladsgroup.json
- 13:35 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
- 13:34 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki" (duration: 08m 20s)
- 13:33 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 13:33 inflatador: bking@stat1009,stat1010.mgmt racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled && racadm jobqueue create BIOS.Setup.1-1 T376813
- 13:32 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 13:30 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2172.codfw.wmnet onto db2240.codfw.wmnet
- 13:29 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 13:29 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, trainbranchbot: Continuing with sync
- 13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 13:28 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, trainbranchbot: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:27 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 13:26 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki"
- 13:25 inflatador: bking@stat1008.mgmt racadm>>racadm jobqueue create BIOS.Setup.1-1
- 13:24 inflatador: bking@stat1008.mgmt racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813
- 13:24 lucaswerkmeister-wmde@deploy2002: Sync cancelled.
- 13:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2172 in db2240 for T373579', diff saved to https://phabricator.wikimedia.org/P70375 and previous config saved to /var/cache/conftool/dbconfig/20241021-132351-arnaudb.json
- 13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
- 13:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
- 13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
- 13:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
- 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P70374 and previous config saved to /var/cache/conftool/dbconfig/20241021-132045-ladsgroup.json
- 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2172 to clone on db2240 T373579', diff saved to https://phabricator.wikimedia.org/P70373 and previous config saved to /var/cache/conftool/dbconfig/20241021-131750-arnaudb.json
- 13:12 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test Ide32aa with dummy upgrade
- 13:11 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test Ide32aa with dummy upgrade
- 13:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (T376055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P70372 and previous config saved to /var/cache/conftool/dbconfig/20241021-130538-ladsgroup.json
- 13:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (T376055)
- 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
- 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
- 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70371 and previous config saved to /var/cache/conftool/dbconfig/20241021-125029-ladsgroup.json
- 12:45 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1002.eqiad.wmnet with OS bookworm
- 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70370 and previous config saved to /var/cache/conftool/dbconfig/20241021-124217-ladsgroup.json
- 12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70369 and previous config saved to /var/cache/conftool/dbconfig/20241021-124151-ladsgroup.json
- 12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P70368 and previous config saved to /var/cache/conftool/dbconfig/20241021-122644-ladsgroup.json
- 12:24 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
- 12:21 klausman@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
- 12:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P70367 and previous config saved to /var/cache/conftool/dbconfig/20241021-121136-ladsgroup.json
- 12:09 klausman@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
- 12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:00 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 11:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
- 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70366 and previous config saved to /var/cache/conftool/dbconfig/20241021-115629-ladsgroup.json
- 11:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 11:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 11:52 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 11:51 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70365 and previous config saved to /var/cache/conftool/dbconfig/20241021-114723-ladsgroup.json
- 11:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70364 and previous config saved to /var/cache/conftool/dbconfig/20241021-114657-ladsgroup.json
- 11:40 moritzm: installing python-idna security updates
- 11:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P70363 and previous config saved to /var/cache/conftool/dbconfig/20241021-113150-ladsgroup.json
- 11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P70362 and previous config saved to /var/cache/conftool/dbconfig/20241021-111643-ladsgroup.json
- 11:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70361 and previous config saved to /var/cache/conftool/dbconfig/20241021-110136-ladsgroup.json
- 10:59 moritzm: installing curl security updates
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
- 10:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70360 and previous config saved to /var/cache/conftool/dbconfig/20241021-105223-ladsgroup.json
- 10:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 10:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 10:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 10:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 10:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1029.eqiad.wmnet
- 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2038.codfw.wmnet to cluster codfw and group C
- 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2038.codfw.wmnet to cluster codfw and group C
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
- 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
- 10:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: testing depool/repool
- 10:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1185.eqiad.wmnet with reason: testing depool/repool
- 10:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: testing depool/repool
- 10:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: testing depool/repool
- 10:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1245.eqiad.wmnet with reason: testing depool/repool
- 10:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1245.eqiad.wmnet with reason: testing depool/repool
- 10:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1006.eqiad.wmnet
- 10:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-eqiad: containerd migration
- 10:08 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1005.eqiad.wmnet with OS bookworm
- 10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1006.eqiad.wmnet
- 09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:52 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
- 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2037.codfw.wmnet to cluster codfw and group C
- 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2037.codfw.wmnet to cluster codfw and group C
- 09:47 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
- 09:46 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
- 09:45 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
- 09:42 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
- 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
- 09:40 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
- 09:39 dcausse@deploy2002: Finished scap sync-world: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715) (duration: 23m 26s)
- 09:36 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1011.eqiad.wmnet
- 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
- 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
- 09:32 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1011.eqiad.wmnet
- 09:31 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1010.eqiad.wmnet
- 09:29 dcausse@deploy2002: dcausse: Continuing with sync
- 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
- 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
- 09:27 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1005.eqiad.wmnet with OS bookworm
- 09:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1004.eqiad.wmnet with OS bookworm
- 09:27 dcausse@deploy2002: dcausse: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:26 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1010.eqiad.wmnet
- 09:24 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1009.eqiad.wmnet
- 09:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
- 09:19 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1009.eqiad.wmnet
- 09:18 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
- 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
- 09:16 dcausse@deploy2002: Started scap sync-world: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715)
- 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
- 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
- 09:11 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
- 09:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:10 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
- 09:06 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
- 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
- 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
- 09:03 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
- 09:02 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
- 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
- 08:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
- 08:57 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
- 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
- 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
- 08:53 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@d176c47]: (no justification provided) (duration: 00m 11s)
- 08:53 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@d176c47]: (no justification provided)
- 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
- 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
- 08:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1004.eqiad.wmnet with OS bookworm
- 08:47 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1003.eqiad.wmnet with OS bookworm
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
- 08:44 jnuche@deploy2002: Installing scap version "4.114.0" for 210 hosts
- 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
- 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
- 08:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
- 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:23 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
- 08:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 08:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1003.eqiad.wmnet with OS bookworm
- 08:09 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-eqiad: containerd migration
- 07:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bookworm
- 07:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:23 moritzm: installing python-reportlab security updates
- 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast7001.wikimedia.org
- 07:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast7001.wikimedia.org
- 07:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
- 07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
- 07:09 kartik@deploy2002: scap failed: <CalledProcessError> Command '['/usr/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.27/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.113.0) (duration: 00m 01s)
- 07:09 kartik@deploy2002: Started scap sync-world: Backport for Enable Special:Contribute on bnwiki
- 07:05 kartik@deploy2002: scap failed: <CalledProcessError> Command '['/usr/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.27/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.113.0) (duration: 00m 01s)
- 07:05 kartik@deploy2002: Started scap sync-world: Backport for Enable Special:Contribute on bnwiki
- 06:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 153087
- 06:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 153087
- 06:58 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 153087
- 06:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 153087
- 06:56 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bookworm
- 06:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 06:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 06:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P70359 and previous config saved to /var/cache/conftool/dbconfig/20241021-000434-ladsgroup.json
- 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
2024-10-20
- 21:19 eileen: civicrm upgraded from 77ea54bc to cfb0def0
- 09:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70358 and previous config saved to /var/cache/conftool/dbconfig/20241020-095904-ladsgroup.json
- 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70357 and previous config saved to /var/cache/conftool/dbconfig/20241020-094357-ladsgroup.json
- 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70356 and previous config saved to /var/cache/conftool/dbconfig/20241020-092850-ladsgroup.json
- 09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70355 and previous config saved to /var/cache/conftool/dbconfig/20241020-091344-ladsgroup.json
2024-10-19
- 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 00:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
2024-10-18
- 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 21:45 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@8c1070f] (releasing): deploying changes to publishMWSingleVersion job (duration: 01m 06s)
- 21:44 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@8c1070f] (releasing): deploying changes to publishMWSingleVersion job
- 20:23 dduvall: deployed scap release 4.113.0 to releases{1003,2003} hosts
- 20:22 dduvall@deploy2002: Installing scap version "4.113.0" for 2 hosts
- 20:21 dduvall@deploy2002: install-world aborted: (no justification provided) (duration: 00m 52s)
- 20:20 dduvall@deploy2002: Installing scap version "latest" for 2 hosts
- 19:09 tzatziki: removing 3 files for legal compliance
- 18:56 tzatziki: removing 1 file for legal compliance
- 16:54 dzahn@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:54 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: containerd migration
- 16:54 dzahn@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:54 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 16:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 16:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 16:10 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 16:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2004.codfw.wmnet with OS bookworm
- 15:46 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
- 15:43 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
- 15:26 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2004.codfw.wmnet with OS bookworm
- 15:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bookworm
- 15:02 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
- 14:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
- 14:57 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
- 14:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 14:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removal of old mx records and api.svc records - akosiaris@cumin1002"
- 14:52 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removal of old mx records and api.svc records - akosiaris@cumin1002"
- 14:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@e44bacc]: Deploying updated dumps reconciliation (duration: 00m 31s)
- 14:47 milimetric@deploy2002: Started deploy [airflow-dags/analytics@e44bacc]: Deploying updated dumps reconciliation
- 14:39 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bookworm
- 14:38 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
- 14:37 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
- 14:37 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
- 14:25 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
- 14:09 sergi0: Running `foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant` (T374544, T375753)
- 13:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 13:46 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 13:32 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Hardware replacement
- 13:31 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Hardware replacement
- 13:22 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@f020959]: Deploying updated dumps reconciliation (duration: 00m 31s)
- 13:22 milimetric@deploy2002: Started deploy [airflow-dags/analytics@f020959]: Deploying updated dumps reconciliation
- 13:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 12:22 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:22 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 12:22 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:21 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 11:43 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: containerd migration
- 11:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 11:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1009.eqiad.wmnet
- 11:31 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for dbstore1009.eqiad.wmnet
- 11:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 11:17 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 11:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 11:00 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 10:59 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 10:59 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
- 10:58 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2005.codfw.wmnet
- 10:39 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster staging-codfw: containerd migration
- 10:38 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
- 10:37 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2005.codfw.wmnet
- 10:37 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubestagemaster2005.codfw.wmnet
- 10:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 10:26 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 09:47 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 09:45 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 09:45 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 09:43 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 09:42 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 09:41 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 09:36 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync
- 09:35 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync
- 09:35 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
- 09:33 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync
- 09:33 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync
- 09:33 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync
- 09:14 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:11 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 09:10 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 08:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70348 and previous config saved to /var/cache/conftool/dbconfig/20241018-080343-ladsgroup.json
- 08:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 08:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70347 and previous config saved to /var/cache/conftool/dbconfig/20241018-015152-ladsgroup.json
- 01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P70346 and previous config saved to /var/cache/conftool/dbconfig/20241018-013645-ladsgroup.json
- 01:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P70345 and previous config saved to /var/cache/conftool/dbconfig/20241018-012138-ladsgroup.json
- 01:16 eileen: civicrm upgraded from b0508a22 to 77ea54bc
- 01:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70344 and previous config saved to /var/cache/conftool/dbconfig/20241018-010631-ladsgroup.json
- 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70343 and previous config saved to /var/cache/conftool/dbconfig/20241018-005819-ladsgroup.json
- 00:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 00:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 00:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70342 and previous config saved to /var/cache/conftool/dbconfig/20241018-005752-ladsgroup.json
- 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove mgmt DNS entries for old frack switches - pt1979@cumin2002"
- 00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P70341 and previous config saved to /var/cache/conftool/dbconfig/20241018-004245-ladsgroup.json
- 00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove mgmt DNS entries for old frack switches - pt1979@cumin2002"
- 00:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 00:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P70340 and previous config saved to /var/cache/conftool/dbconfig/20241018-002738-ladsgroup.json
- 00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70339 and previous config saved to /var/cache/conftool/dbconfig/20241018-001231-ladsgroup.json
- 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70338 and previous config saved to /var/cache/conftool/dbconfig/20241018-000422-ladsgroup.json
- 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70337 and previous config saved to /var/cache/conftool/dbconfig/20241018-000356-ladsgroup.json
2024-10-17
- 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P70336 and previous config saved to /var/cache/conftool/dbconfig/20241017-234849-ladsgroup.json
- 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P70335 and previous config saved to /var/cache/conftool/dbconfig/20241017-233342-ladsgroup.json
- 23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70334 and previous config saved to /var/cache/conftool/dbconfig/20241017-231835-ladsgroup.json
- 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70333 and previous config saved to /var/cache/conftool/dbconfig/20241017-231037-ladsgroup.json
- 23:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 23:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 23:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 23:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70332 and previous config saved to /var/cache/conftool/dbconfig/20241017-230457-ladsgroup.json
- 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P70331 and previous config saved to /var/cache/conftool/dbconfig/20241017-224950-ladsgroup.json
- 22:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 22:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70330 and previous config saved to /var/cache/conftool/dbconfig/20241017-224209-ladsgroup.json
- 22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P70329 and previous config saved to /var/cache/conftool/dbconfig/20241017-223443-ladsgroup.json
- 22:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P70328 and previous config saved to /var/cache/conftool/dbconfig/20241017-222702-ladsgroup.json
- 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70327 and previous config saved to /var/cache/conftool/dbconfig/20241017-221936-ladsgroup.json
- 22:15 eileen: civicrm upgraded from f980ace9 to b0508a22
- 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P70326 and previous config saved to /var/cache/conftool/dbconfig/20241017-221155-ladsgroup.json
- 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70325 and previous config saved to /var/cache/conftool/dbconfig/20241017-221123-ladsgroup.json
- 22:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 22:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70324 and previous config saved to /var/cache/conftool/dbconfig/20241017-221057-ladsgroup.json
- 21:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70323 and previous config saved to /var/cache/conftool/dbconfig/20241017-215648-ladsgroup.json
- 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P70322 and previous config saved to /var/cache/conftool/dbconfig/20241017-215550-ladsgroup.json
- 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70321 and previous config saved to /var/cache/conftool/dbconfig/20241017-215014-ladsgroup.json
- 21:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 21:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70320 and previous config saved to /var/cache/conftool/dbconfig/20241017-214949-ladsgroup.json
- 21:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P70319 and previous config saved to /var/cache/conftool/dbconfig/20241017-214043-ladsgroup.json
- 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P70318 and previous config saved to /var/cache/conftool/dbconfig/20241017-213442-ladsgroup.json
- 21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70317 and previous config saved to /var/cache/conftool/dbconfig/20241017-212536-ladsgroup.json
- 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P70316 and previous config saved to /var/cache/conftool/dbconfig/20241017-211935-ladsgroup.json
- 21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70315 and previous config saved to /var/cache/conftool/dbconfig/20241017-211458-ladsgroup.json
- 21:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 21:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70314 and previous config saved to /var/cache/conftool/dbconfig/20241017-211432-ladsgroup.json
- 21:11 kindrobot: UTC late backport window finished <3
- 21:08 kindrobot: results of de-duping: https://phabricator.wikimedia.org/P70313
- 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70312 and previous config saved to /var/cache/conftool/dbconfig/20241017-210428-ladsgroup.json
- 21:01 kindrobot: ran mwscript-k8s -f --comment="https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1080078/comments/02a9334e_cd3e7a0e" -- namespaceDupes.php on: bclwikisource, bewwiki, gorwikiquote, iglwiki, kaawiktionary, kgewiki, kuswiki, madwiktionary, moswiki, nrwiki, rskwiki, shnwikinews, and tddwiki
- 20:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P70311 and previous config saved to /var/cache/conftool/dbconfig/20241017-205925-ladsgroup.json
- 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70310 and previous config saved to /var/cache/conftool/dbconfig/20241017-205655-ladsgroup.json
- 20:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 20:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 20:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 20:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70309 and previous config saved to /var/cache/conftool/dbconfig/20241017-205612-ladsgroup.json
- 20:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:50 eileen: config revision changed from 150b02a9 to 0d019da0
- 20:50 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:50 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:49 eileen: config revision changed from 3b3e5cad to 0d019da0
- 20:48 kindrobot@deploy2002: Finished scap sync-world: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310) (duration: 31m 15s)
- 20:46 eileen: config revision changed from bf02494d to 3b3e5cad
- 20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P70308 and previous config saved to /var/cache/conftool/dbconfig/20241017-204418-ladsgroup.json
- 20:43 kindrobot@deploy2002: pppery, kindrobot: Continuing with sync
- 20:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P70307 and previous config saved to /var/cache/conftool/dbconfig/20241017-204105-ladsgroup.json
- 20:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70306 and previous config saved to /var/cache/conftool/dbconfig/20241017-202911-ladsgroup.json
- 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P70305 and previous config saved to /var/cache/conftool/dbconfig/20241017-202558-ladsgroup.json
- 20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70304 and previous config saved to /var/cache/conftool/dbconfig/20241017-201944-ladsgroup.json
- 20:20 kindrobot@deploy2002: pppery, kindrobot: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70303 and previous config saved to /var/cache/conftool/dbconfig/20241017-201919-ladsgroup.json
- 20:17 kindrobot@deploy2002: Started scap sync-world: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310)
- 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70302 and previous config saved to /var/cache/conftool/dbconfig/20241017-201051-ladsgroup.json
- 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P70301 and previous config saved to /var/cache/conftool/dbconfig/20241017-200412-ladsgroup.json
- 20:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70300 and previous config saved to /var/cache/conftool/dbconfig/20241017-200147-ladsgroup.json
- 20:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 20:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70299 and previous config saved to /var/cache/conftool/dbconfig/20241017-200122-ladsgroup.json
- 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P70298 and previous config saved to /var/cache/conftool/dbconfig/20241017-194905-ladsgroup.json
- 19:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70297 and previous config saved to /var/cache/conftool/dbconfig/20241017-194615-ladsgroup.json
- 19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70296 and previous config saved to /var/cache/conftool/dbconfig/20241017-193358-ladsgroup.json
- 19:33 swfrench-wmf: ran authdns-update to pick up records for mw-(web|api-ext)-next in svc - T377040
- 19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70295 and previous config saved to /var/cache/conftool/dbconfig/20241017-193108-ladsgroup.json
- 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70294 and previous config saved to /var/cache/conftool/dbconfig/20241017-192424-ladsgroup.json
- 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
- 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
- 19:18 dancy@deploy2002: Finished scap sync-world: testing https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484 (duration: 02m 46s)
- 19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70293 and previous config saved to /var/cache/conftool/dbconfig/20241017-191601-ladsgroup.json
- 19:15 dancy@deploy2002: Started scap sync-world: testing https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484
- 19:13 dancy@deploy2002: Installing scap version "4.112.0" for 1 hosts
- 19:07 dancy@deploy2002: Installing scap version "4.112.0" for 210 hosts
- 19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70292 and previous config saved to /var/cache/conftool/dbconfig/20241017-190655-ladsgroup.json
- 19:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 19:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 19:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 19:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 18:54 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 18:53 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 18:49 dancy@deploy2002: Finished scap sync-world: testing scap 4.111.0 (duration: 02m 44s)
- 18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 18:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 18:48 urbanecm: mwscript-k8s --comment=T377360 -f -- extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=wikidatawiki # T377360
- 18:47 dancy@deploy2002: Started scap sync-world: testing scap 4.111.0
- 18:45 dancy@deploy2002: Installation of scap version "4.111.0" completed for 210 hosts
- 18:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70291 and previous config saved to /var/cache/conftool/dbconfig/20241017-184402-arnaudb.json
- 18:41 dancy@deploy2002: Installing scap version "4.111.0" for 210 hosts
- 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70290 and previous config saved to /var/cache/conftool/dbconfig/20241017-182855-arnaudb.json
- 18:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 18:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 18:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.27 refs T375658
- 18:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
- 18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70289 and previous config saved to /var/cache/conftool/dbconfig/20241017-181348-arnaudb.json
- 17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70288 and previous config saved to /var/cache/conftool/dbconfig/20241017-175841-arnaudb.json
- 17:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 17:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 17:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 17:34 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-api-ext / mw-web next release bring up - T377040 (duration: 02m 54s)
- 17:31 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-api-ext / mw-web next release bring up - T377040
- 17:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70287 and previous config saved to /var/cache/conftool/dbconfig/20241017-171844-ladsgroup.json
- 17:18 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:17 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:17 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:15 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:15 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:14 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:14 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 17:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70286 and previous config saved to /var/cache/conftool/dbconfig/20241017-170337-ladsgroup.json
- 16:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70285 and previous config saved to /var/cache/conftool/dbconfig/20241017-165814-arnaudb.json
- 16:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 16:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70284 and previous config saved to /var/cache/conftool/dbconfig/20241017-165803-arnaudb.json
- 16:55 mutante: phab2002 T377396 - reboot | in addition to /etc/passwd, also fixed aphlict GID in /etc/group | fixed puppet run, which can now create group vcs. Now equivalent to prod server phab1004.
- 16:53 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 16:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 16:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:52 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 16:51 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 16:50 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 16:49 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 16:49 mutante: phab2002 T377396 - fix UIDs/GIDs for phab-related system users: vcs: uid 496 -> 497 | aphlict: uid 497 -> uid 496, gid 497 -> gid 496 | chown aphlict:aphlict /var/log/aphlict | chown aphlict:aphlict /run/aphlict
- 16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70283 and previous config saved to /var/cache/conftool/dbconfig/20241017-164830-ladsgroup.json
- 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70282 and previous config saved to /var/cache/conftool/dbconfig/20241017-164256-arnaudb.json
- 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 16:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 16:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70281 and previous config saved to /var/cache/conftool/dbconfig/20241017-163324-ladsgroup.json
- 16:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 16:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70280 and previous config saved to /var/cache/conftool/dbconfig/20241017-162749-arnaudb.json
- 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70279 and previous config saved to /var/cache/conftool/dbconfig/20241017-161242-arnaudb.json
- 16:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
- 16:01 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
- 16:00 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 16:00 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 15:59 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:59 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
- 15:59 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 15:58 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
- 15:58 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:58 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:57 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
- 15:57 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
- 15:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
- 15:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
- 15:52 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
- 15:51 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
- 15:51 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
- 15:50 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
- 15:48 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
- 15:48 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
- 15:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 15:45 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:45 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:44 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:44 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70278 and previous config saved to /var/cache/conftool/dbconfig/20241017-153546-ladsgroup.json
- 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70277 and previous config saved to /var/cache/conftool/dbconfig/20241017-153257-ladsgroup.json
- 15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 15:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70276 and previous config saved to /var/cache/conftool/dbconfig/20241017-153238-ladsgroup.json
- 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70275 and previous config saved to /var/cache/conftool/dbconfig/20241017-152040-ladsgroup.json
- 15:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70274 and previous config saved to /var/cache/conftool/dbconfig/20241017-151731-ladsgroup.json
- 15:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70273 and previous config saved to /var/cache/conftool/dbconfig/20241017-151216-arnaudb.json
- 15:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 15:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70272 and previous config saved to /var/cache/conftool/dbconfig/20241017-151204-arnaudb.json
- 15:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70271 and previous config saved to /var/cache/conftool/dbconfig/20241017-150535-ladsgroup.json
- 15:05 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:05 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70270 and previous config saved to /var/cache/conftool/dbconfig/20241017-150224-ladsgroup.json
- 15:01 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70269 and previous config saved to /var/cache/conftool/dbconfig/20241017-145657-arnaudb.json
- 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70268 and previous config saved to /var/cache/conftool/dbconfig/20241017-145030-ladsgroup.json
- 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:50 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:48 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:47 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70267 and previous config saved to /var/cache/conftool/dbconfig/20241017-144717-ladsgroup.json
- 14:43 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:43 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70266 and previous config saved to /var/cache/conftool/dbconfig/20241017-144150-arnaudb.json
- 14:41 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:38 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:38 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 14:31 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 14:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70265 and previous config saved to /var/cache/conftool/dbconfig/20241017-142643-arnaudb.json
- 14:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 14:08 urbanecm@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287) (duration: 09m 41s)
- 14:03 urbanecm@deploy2002: cscott, urbanecm: Continuing with sync
- 14:00 urbanecm@deploy2002: cscott, urbanecm: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:58 urbanecm@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287)
- 13:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70264 and previous config saved to /var/cache/conftool/dbconfig/20241017-134651-ladsgroup.json
- 13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 13:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70263 and previous config saved to /var/cache/conftool/dbconfig/20241017-134636-ladsgroup.json
- 13:35 urbanecm@deploy2002: Finished scap sync-world: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been requierd (T377391 T357535) (duration: 08m 07s)
- 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70261 and previous config saved to /var/cache/conftool/dbconfig/20241017-133129-ladsgroup.json
- 13:30 urbanecm@deploy2002: cscott, urbanecm, matmarex: Continuing with sync
- 13:29 urbanecm@deploy2002: cscott, urbanecm, matmarex: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been requierd (T377391 T357535) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:29 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript updateCollation.php --wiki=cswikivoyage --previous-collation=uppercase # T377446
- 13:27 urbanecm@deploy2002: Started scap sync-world: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been requierd (T377391 T357535)
- 13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70260 and previous config saved to /var/cache/conftool/dbconfig/20241017-132617-arnaudb.json
- 13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 13:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 13:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 13:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 13:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 13:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:18 inflatador: bking@wdqs1015 depooling to catch up on lag
- 13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70258 and previous config saved to /var/cache/conftool/dbconfig/20241017-131622-ladsgroup.json
- 13:14 urbanecm@deploy2002: Finished scap sync-world: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517) (duration: 07m 23s)
- 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70257 and previous config saved to /var/cache/conftool/dbconfig/20241017-131012-ladsgroup.json
- 13:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 13:10 urbanecm@deploy2002: kharlan, urbanecm: Continuing with sync
- 13:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 13:09 urbanecm@deploy2002: kharlan, urbanecm: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70256 and previous config saved to /var/cache/conftool/dbconfig/20241017-130947-ladsgroup.json
- 13:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:07 urbanecm@deploy2002: Started scap sync-world: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517)
- 13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70255 and previous config saved to /var/cache/conftool/dbconfig/20241017-130115-ladsgroup.json
- 13:00 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 12:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 12:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P70254 and previous config saved to /var/cache/conftool/dbconfig/20241017-125440-ladsgroup.json
- 12:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P70253 and previous config saved to /var/cache/conftool/dbconfig/20241017-123932-ladsgroup.json
- 12:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70252 and previous config saved to /var/cache/conftool/dbconfig/20241017-122425-ladsgroup.json
- 12:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70251 and previous config saved to /var/cache/conftool/dbconfig/20241017-121525-ladsgroup.json
- 12:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 12:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 12:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 12:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 12:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70250 and previous config saved to /var/cache/conftool/dbconfig/20241017-120049-ladsgroup.json
- 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70249 and previous config saved to /var/cache/conftool/dbconfig/20241017-120029-ladsgroup.json
- 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70248 and previous config saved to /var/cache/conftool/dbconfig/20241017-114522-ladsgroup.json
- 11:39 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1177.eqiad.wmnet
- 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70247 and previous config saved to /var/cache/conftool/dbconfig/20241017-113014-ladsgroup.json
- 11:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70246 and previous config saved to /var/cache/conftool/dbconfig/20241017-111507-ladsgroup.json
- 11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70245 and previous config saved to /var/cache/conftool/dbconfig/20241017-110527-ladsgroup.json
- 11:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 11:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 10:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 10:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 10:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
- 10:17 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
- 09:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host phab2002.codfw.wmnet with OS bullseye
- 09:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:09 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add support for read-only users - oblivian@cumin1002"
- 09:09 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add support for read-only users - oblivian@cumin1002
- 09:08 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add support for read-only users - oblivian@cumin1002
- 09:08 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add support for read-only users - oblivian@cumin1002"
- 09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70243 and previous config saved to /var/cache/conftool/dbconfig/20241017-090731-arnaudb.json
- 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70242 and previous config saved to /var/cache/conftool/dbconfig/20241017-085226-arnaudb.json
- 08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70241 and previous config saved to /var/cache/conftool/dbconfig/20241017-083721-arnaudb.json
- 08:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70240 and previous config saved to /var/cache/conftool/dbconfig/20241017-082215-arnaudb.json
- 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 to reclone on db2205 - T377276', diff saved to https://phabricator.wikimedia.org/P70239 and previous config saved to /var/cache/conftool/dbconfig/20241017-081822-arnaudb.json
- 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70238 and previous config saved to /var/cache/conftool/dbconfig/20241017-081802-arnaudb.json
- 08:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2205.codfw.wmnet
- 08:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
- 08:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
- 07:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:55 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:51 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 07:48 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 07:37 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 07:37 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 07:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:28 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
- 07:19 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: cleanup removed label_count field on next re-index (T377226) (duration: 10m 40s)
- 07:18 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubestagemaster2005.codfw.wmnet
- 07:14 dcausse@deploy2002: dcausse: Continuing with sync
- 07:13 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
- 07:13 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
- 07:13 dcausse@deploy2002: dcausse: Backport for cirrus: cleanup removed label_count field on next re-index (T377226) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:08 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: cleanup removed label_count field on next re-index (T377226)
- 07:00 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2205.codfw.wmnet
- 07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 to reclone on db2205 - T377276', diff saved to https://phabricator.wikimedia.org/P70237 and previous config saved to /var/cache/conftool/dbconfig/20241017-070015-arnaudb.json
- 06:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2205.codfw.wmnet with OS bookworm
- 06:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: T367781', diff saved to https://phabricator.wikimedia.org/P70236 and previous config saved to /var/cache/conftool/dbconfig/20241017-063238-arnaudb.json
- 06:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
- 06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
- 06:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: T367781', diff saved to https://phabricator.wikimedia.org/P70235 and previous config saved to /var/cache/conftool/dbconfig/20241017-061732-arnaudb.json
- 06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2205.codfw.wmnet with OS bookworm
- 06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: T367781', diff saved to https://phabricator.wikimedia.org/P70234 and previous config saved to /var/cache/conftool/dbconfig/20241017-060227-arnaudb.json
- 05:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: T367781', diff saved to https://phabricator.wikimedia.org/P70233 and previous config saved to /var/cache/conftool/dbconfig/20241017-054722-arnaudb.json
- 05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70231 and previous config saved to /var/cache/conftool/dbconfig/20241017-051700-ladsgroup.json
- 05:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P70230 and previous config saved to /var/cache/conftool/dbconfig/20241017-050153-ladsgroup.json
- 04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P70229 and previous config saved to /var/cache/conftool/dbconfig/20241017-044646-ladsgroup.json
- 04:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70228 and previous config saved to /var/cache/conftool/dbconfig/20241017-043139-ladsgroup.json
- 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70227 and previous config saved to /var/cache/conftool/dbconfig/20241017-042440-ladsgroup.json
- 04:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
- 04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
- 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70226 and previous config saved to /var/cache/conftool/dbconfig/20241017-042413-ladsgroup.json
- 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P70225 and previous config saved to /var/cache/conftool/dbconfig/20241017-040906-ladsgroup.json
- 03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P70224 and previous config saved to /var/cache/conftool/dbconfig/20241017-035359-ladsgroup.json
- 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70223 and previous config saved to /var/cache/conftool/dbconfig/20241017-033852-ladsgroup.json
- 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70222 and previous config saved to /var/cache/conftool/dbconfig/20241017-033144-ladsgroup.json
- 03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
- 03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
- 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70221 and previous config saved to /var/cache/conftool/dbconfig/20241017-033118-ladsgroup.json
- 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P70220 and previous config saved to /var/cache/conftool/dbconfig/20241017-031611-ladsgroup.json
- 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P70219 and previous config saved to /var/cache/conftool/dbconfig/20241017-030104-ladsgroup.json
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70218 and previous config saved to /var/cache/conftool/dbconfig/20241017-024557-ladsgroup.json
- 02:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70217 and previous config saved to /var/cache/conftool/dbconfig/20241017-023857-ladsgroup.json
- 02:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 02:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 02:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70216 and previous config saved to /var/cache/conftool/dbconfig/20241017-023831-ladsgroup.json
- 02:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P70215 and previous config saved to /var/cache/conftool/dbconfig/20241017-022324-ladsgroup.json
- 02:18 tstarling@deploy2002: Synchronized wmf-config/InitialiseSettings.php: T4085 Enable en on Commons and Meta (duration: 06m 34s)
- 02:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P70214 and previous config saved to /var/cache/conftool/dbconfig/20241017-020817-ladsgroup.json
- 01:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70213 and previous config saved to /var/cache/conftool/dbconfig/20241017-015310-ladsgroup.json
- 01:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70212 and previous config saved to /var/cache/conftool/dbconfig/20241017-014500-ladsgroup.json
- 01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 01:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70211 and previous config saved to /var/cache/conftool/dbconfig/20241017-013926-ladsgroup.json
- 01:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P70210 and previous config saved to /var/cache/conftool/dbconfig/20241017-012419-ladsgroup.json
- 01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P70209 and previous config saved to /var/cache/conftool/dbconfig/20241017-010912-ladsgroup.json
- 00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70208 and previous config saved to /var/cache/conftool/dbconfig/20241017-005405-ladsgroup.json
- 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70207 and previous config saved to /var/cache/conftool/dbconfig/20241017-004537-ladsgroup.json
- 00:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 00:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70206 and previous config saved to /var/cache/conftool/dbconfig/20241017-004511-ladsgroup.json
- 00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P70204 and previous config saved to /var/cache/conftool/dbconfig/20241017-003004-ladsgroup.json
- 00:26 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 00:25 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 00:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P70203 and previous config saved to /var/cache/conftool/dbconfig/20241017-001457-ladsgroup.json
2024-10-16
- 23:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70202 and previous config saved to /var/cache/conftool/dbconfig/20241016-235950-ladsgroup.json
- 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70201 and previous config saved to /var/cache/conftool/dbconfig/20241016-235129-ladsgroup.json
- 23:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 23:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70200 and previous config saved to /var/cache/conftool/dbconfig/20241016-235102-ladsgroup.json
- 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70199 and previous config saved to /var/cache/conftool/dbconfig/20241016-233555-ladsgroup.json
- 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70198 and previous config saved to /var/cache/conftool/dbconfig/20241016-232048-ladsgroup.json
- 23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70197 and previous config saved to /var/cache/conftool/dbconfig/20241016-230541-ladsgroup.json
- 22:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70196 and previous config saved to /var/cache/conftool/dbconfig/20241016-225716-ladsgroup.json
- 22:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 22:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 22:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 22:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70195 and previous config saved to /var/cache/conftool/dbconfig/20241016-225646-ladsgroup.json
- 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70194 and previous config saved to /var/cache/conftool/dbconfig/20241016-224139-ladsgroup.json
- 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70193 and previous config saved to /var/cache/conftool/dbconfig/20241016-222632-ladsgroup.json
- 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70192 and previous config saved to /var/cache/conftool/dbconfig/20241016-221125-ladsgroup.json
- 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70191 and previous config saved to /var/cache/conftool/dbconfig/20241016-220053-ladsgroup.json
- 22:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 22:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 21:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 21:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 21:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 21:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 20:44 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 20:44 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 20:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 20:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 20:39 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:39 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:37 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:37 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:31 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 00m 08s)
- 20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70189 and previous config saved to /var/cache/conftool/dbconfig/20241016-203034-ladsgroup.json
- 20:30 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374
- 20:29 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 20:29 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 20:26 jhuneidi@deploy2002: Finished scap sync-world: Backport for Make wikitech a target for CentralNotice banners (T377030) (duration: 10m 02s)
- 20:21 jhuneidi@deploy2002: ejegg, jhuneidi: Continuing with sync
- 20:18 jhuneidi@deploy2002: ejegg, jhuneidi: Backport for Make wikitech a target for CentralNotice banners (T377030) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:18 mutante: phab2002 - ln -s /var/lib/scap/scap/bin/scap /usr/bin/scap
- 20:17 mutante: phab2002 - after manually running bootstrap-scap-target.sh and getting "Scap from local bullseye wheels successfully installed at /var/lib/scap/scap", there is still "cannot open `/usr/bin/scap' (No such file or directory)". T303559 T310740 T377374
- 20:17 jhuneidi@deploy2002: Started scap sync-world: Backport for Make wikitech a target for CentralNotice banners (T377030)
- 20:16 mutante: phab2002 - manually bootstrapping scap since puppet did not do it due to dependency cycles: sudo -u scap /usr/local/bin/bootstrap-scap-target.sh deploy2002.codfw.wmnet /var/lib/scap T303559 T310740 T377374
- 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70188 and previous config saved to /var/cache/conftool/dbconfig/20241016-201527-ladsgroup.json
- 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70187 and previous config saved to /var/cache/conftool/dbconfig/20241016-200020-ladsgroup.json
- 19:54 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out1001.wikimedia.org
- 19:50 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out1001.wikimedia.org
- 19:49 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out2001.wikimedia.org
- 19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
- 19:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70186 and previous config saved to /var/cache/conftool/dbconfig/20241016-194513-ladsgroup.json
- 19:47 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
- 19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
- 19:46 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
- 19:45 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
- 19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1001.wikimedia.org
- 19:44 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:44 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
- 19:43 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
- 19:42 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
- 19:42 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
- 19:40 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
- 19:36 jhathaway@cumin1002: START - Cookbook sre.hosts.decommission for hosts mx1001.wikimedia.org
- 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70185 and previous config saved to /var/cache/conftool/dbconfig/20241016-193500-ladsgroup.json
- 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70184 and previous config saved to /var/cache/conftool/dbconfig/20241016-193433-ladsgroup.json
- 19:30 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 10m 42s)
- 19:19 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374
- 19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70183 and previous config saved to /var/cache/conftool/dbconfig/20241016-191926-ladsgroup.json
- 19:16 inflatador: bking@stat1011 racadm>>racadm jobqueue create BIOS.Setup.1-1 Commit JID = JID_291241139935 T376813
- 19:14 inflatador: bking@stat1011 racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813
- 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70182 and previous config saved to /var/cache/conftool/dbconfig/20241016-190419-ladsgroup.json
- 18:54 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70181 and previous config saved to /var/cache/conftool/dbconfig/20241016-184912-ladsgroup.json
- 18:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2001.wikimedia.org
- 18:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:46 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
- 18:45 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
- 18:43 papaul: maintenance on mr1-ulsfo complete
- 18:41 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
- 18:36 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 18:35 jhathaway@cumin1002: START - Cookbook sre.hosts.decommission for hosts mx2001.wikimedia.org
- 18:33 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
- 18:32 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 18:32 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 18:31 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 18:31 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
- 18:27 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 18:27 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 18:21 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 18:20 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 18:17 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.27 refs T375658
- 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host phab2002
- 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host phab2002
- 18:13 dzahn@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host phab2002
- 18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 18:12 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002"
- 18:11 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 18:11 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002"
- 18:11 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 18:06 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 18:05 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 18:04 cdanis@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 18:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 18:01 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 18:00 papaul: ongoing maintenance on mr1-ulsfo
- 18:00 cdanis@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:58 dzahn@cumin2002: START - Cookbook sre.hosts.move-vlan for host phab2002
- 17:58 cdanis@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
- 17:56 cdanis@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:55 cdanis@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70179 and previous config saved to /var/cache/conftool/dbconfig/20241016-174847-ladsgroup.json
- 17:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 17:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70178 and previous config saved to /var/cache/conftool/dbconfig/20241016-174821-ladsgroup.json
- 17:48 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:48 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002"
- 17:41 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002"
- 17:39 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 17:38 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 17:37 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 17:37 swfrench@cumin2002: START - Cookbook sre.dns.netbox
- 17:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70177 and previous config saved to /var/cache/conftool/dbconfig/20241016-173314-ladsgroup.json
- 17:20 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70176 and previous config saved to /var/cache/conftool/dbconfig/20241016-171807-ladsgroup.json
- 17:16 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a] (duration: 03m 44s)
- 17:13 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a]
- 17:12 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a] (duration: 05m 11s)
- 17:06 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a]
- 17:06 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a] (duration: 08m 54s)
- 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70175 and previous config saved to /var/cache/conftool/dbconfig/20241016-170300-ladsgroup.json
- 16:57 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a]
- 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70174 and previous config saved to /var/cache/conftool/dbconfig/20241016-165343-ladsgroup.json
- 16:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70173 and previous config saved to /var/cache/conftool/dbconfig/20241016-165317-ladsgroup.json
- 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70172 and previous config saved to /var/cache/conftool/dbconfig/20241016-163810-ladsgroup.json
- 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70171 and previous config saved to /var/cache/conftool/dbconfig/20241016-162303-ladsgroup.json
- 16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70170 and previous config saved to /var/cache/conftool/dbconfig/20241016-160756-ladsgroup.json
- 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70169 and previous config saved to /var/cache/conftool/dbconfig/20241016-155948-ladsgroup.json
- 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 15:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 15:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 15:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70168 and previous config saved to /var/cache/conftool/dbconfig/20241016-155450-ladsgroup.json
- 15:52 papaul: maintenance on mr1-eqsin complete
- 15:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70167 and previous config saved to /var/cache/conftool/dbconfig/20241016-153943-ladsgroup.json
- 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70166 and previous config saved to /var/cache/conftool/dbconfig/20241016-152436-ladsgroup.json
- 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70165 and previous config saved to /var/cache/conftool/dbconfig/20241016-150928-ladsgroup.json
- 15:05 papaul: ongoing maintenance on mr1-eqsin
- 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:41 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] beta: Lower batch size for reassignMenteesJob (T376124) (duration: 06m 46s)
- 14:35 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] beta: Lower batch size for reassignMenteesJob (T376124)
- 14:25 Lucas_WMDE: UTC afternoon backport+config window done
- 14:25 Lucas_WMDE: [cont.] 7)]], Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226) (duration: 11m 36s)
- 14:24 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T37719
- 14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 14:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
- 14:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002
- 14:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 14:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70164 and previous config saved to /var/cache/conftool/dbconfig/20241016-141819-ladsgroup.json
- 14:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002
- 14:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
- 14:17 oblivian@cumin1002: END (FAIL) - Cookbook sre.deploy.hiddenparma (exit_code=99) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
- 14:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
- 14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 14:15 Lucas_WMDE: [cont.] ], Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)]
- 14:13 Lucas_WMDE: [cont.] ), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226)
- 14:13 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197
- 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70163 and previous config saved to /var/cache/conftool/dbconfig/20241016-140902-ladsgroup.json
- 14:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 14:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 14:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70162 and previous config saved to /var/cache/conftool/dbconfig/20241016-140835-ladsgroup.json
- 14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70161 and previous config saved to /var/cache/conftool/dbconfig/20241016-140312-ladsgroup.json
- 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70160 and previous config saved to /var/cache/conftool/dbconfig/20241016-135328-ladsgroup.json
- 13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70159 and previous config saved to /var/cache/conftool/dbconfig/20241016-134805-ladsgroup.json
- 13:43 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 13:41 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70158 and previous config saved to /var/cache/conftool/dbconfig/20241016-133821-ladsgroup.json
- 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70157 and previous config saved to /var/cache/conftool/dbconfig/20241016-133257-ladsgroup.json
- 13:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Update Z669x references to Z609x (duration: 08m 23s)
- 13:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70156 and previous config saved to /var/cache/conftool/dbconfig/20241016-132314-ladsgroup.json
- 13:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jforrester: Continuing with sync
- 13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jforrester: Backport for Update Z669x references to Z609x synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Update Z669x references to Z609x
- 13:16 Dreamy_Jazz: Started time limited scan on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration
- 13:16 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove wgGEUseNewImpactModule config (T350077) (duration: 11m 35s)
- 13:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cyndywikime: Continuing with sync
- 13:07 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cyndywikime: Backport for Remove wgGEUseNewImpactModule config (T350077) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove wgGEUseNewImpactModule config (T350077)
- 12:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 12:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 12:52 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 12:47 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 12:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 12:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:43 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 12:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1177
- 12:35 stevemunene@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1177
- 12:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1176
- 12:34 stevemunene@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1176
- 12:33 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 12:32 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:32 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002"
- 12:32 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002"
- 12:28 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
- 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70155 and previous config saved to /var/cache/conftool/dbconfig/20241016-122248-ladsgroup.json
- 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70154 and previous config saved to /var/cache/conftool/dbconfig/20241016-122206-ladsgroup.json
- 12:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 12:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70153 and previous config saved to /var/cache/conftool/dbconfig/20241016-120659-ladsgroup.json
- 11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70152 and previous config saved to /var/cache/conftool/dbconfig/20241016-115152-ladsgroup.json
- 11:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70150 and previous config saved to /var/cache/conftool/dbconfig/20241016-113645-ladsgroup.json
- 11:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70149 and previous config saved to /var/cache/conftool/dbconfig/20241016-113639-ladsgroup.json
- 11:29 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:28 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:26 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:22 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70148 and previous config saved to /var/cache/conftool/dbconfig/20241016-112132-ladsgroup.json
- 11:21 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70147 and previous config saved to /var/cache/conftool/dbconfig/20241016-110625-ladsgroup.json
- 10:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70146 and previous config saved to /var/cache/conftool/dbconfig/20241016-105118-ladsgroup.json
- 10:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70145 and previous config saved to /var/cache/conftool/dbconfig/20241016-103620-ladsgroup.json
- 10:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 10:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 10:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70144 and previous config saved to /var/cache/conftool/dbconfig/20241016-103553-ladsgroup.json
- 10:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70143 and previous config saved to /var/cache/conftool/dbconfig/20241016-102046-ladsgroup.json
- 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70142 and previous config saved to /var/cache/conftool/dbconfig/20241016-100539-ladsgroup.json
- 09:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70141 and previous config saved to /var/cache/conftool/dbconfig/20241016-095032-ladsgroup.json
- 09:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70140 and previous config saved to /var/cache/conftool/dbconfig/20241016-093852-ladsgroup.json
- 09:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 09:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 09:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 09:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 09:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70139 and previous config saved to /var/cache/conftool/dbconfig/20241016-093147-ladsgroup.json
- 09:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70138 and previous config saved to /var/cache/conftool/dbconfig/20241016-092219-ladsgroup.json
- 09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 09:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 09:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70137 and previous config saved to /var/cache/conftool/dbconfig/20241016-092157-ladsgroup.json
- 09:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70136 and previous config saved to /var/cache/conftool/dbconfig/20241016-091640-ladsgroup.json
- 09:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70134 and previous config saved to /var/cache/conftool/dbconfig/20241016-090650-ladsgroup.json
- 09:04 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70133 and previous config saved to /var/cache/conftool/dbconfig/20241016-090133-ladsgroup.json
- 08:57 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 08:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70132 and previous config saved to /var/cache/conftool/dbconfig/20241016-085143-ladsgroup.json
- 08:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70131 and previous config saved to /var/cache/conftool/dbconfig/20241016-084626-ladsgroup.json
- 08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70130 and previous config saved to /var/cache/conftool/dbconfig/20241016-083651-ladsgroup.json
- 08:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70129 and previous config saved to /var/cache/conftool/dbconfig/20241016-083636-ladsgroup.json
- 08:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 08:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:07 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:05 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 08:04 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:03 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:02 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 08:01 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 08:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:41 awight: UTC morning deployments done
- 07:40 awight@deploy2002: Finished scap sync-world: Backport for zhwiki: Revise contact page deprecated usage (duration: 09m 07s)
- 07:35 awight@deploy2002: awight, hamishz: Continuing with sync
- 07:34 awight@deploy2002: awight, hamishz: Backport for zhwiki: Revise contact page deprecated usage synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:31 awight@deploy2002: Started scap sync-world: Backport for zhwiki: Revise contact page deprecated usage
- 07:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70128 and previous config saved to /var/cache/conftool/dbconfig/20241016-072501-ladsgroup.json
- 07:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70127 and previous config saved to /var/cache/conftool/dbconfig/20241016-070954-ladsgroup.json
- 07:09 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 07:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70126 and previous config saved to /var/cache/conftool/dbconfig/20241016-070246-ladsgroup.json
- 07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70125 and previous config saved to /var/cache/conftool/dbconfig/20241016-070224-ladsgroup.json
- 06:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70124 and previous config saved to /var/cache/conftool/dbconfig/20241016-065447-ladsgroup.json
- 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70123 and previous config saved to /var/cache/conftool/dbconfig/20241016-064717-ladsgroup.json
- 06:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70122 and previous config saved to /var/cache/conftool/dbconfig/20241016-063940-ladsgroup.json
- 06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70121 and previous config saved to /var/cache/conftool/dbconfig/20241016-063210-ladsgroup.json
- 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70120 and previous config saved to /var/cache/conftool/dbconfig/20241016-063132-ladsgroup.json
- 06:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 06:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70119 and previous config saved to /var/cache/conftool/dbconfig/20241016-063107-ladsgroup.json
- 06:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70118 and previous config saved to /var/cache/conftool/dbconfig/20241016-061703-ladsgroup.json
- 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70117 and previous config saved to /var/cache/conftool/dbconfig/20241016-061558-ladsgroup.json
- 06:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70116 and previous config saved to /var/cache/conftool/dbconfig/20241016-060051-ladsgroup.json
- 05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70115 and previous config saved to /var/cache/conftool/dbconfig/20241016-054544-ladsgroup.json
- 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70114 and previous config saved to /var/cache/conftool/dbconfig/20241016-053943-ladsgroup.json
- 05:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 05:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70113 and previous config saved to /var/cache/conftool/dbconfig/20241016-053918-ladsgroup.json
- 05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70112 and previous config saved to /var/cache/conftool/dbconfig/20241016-052411-ladsgroup.json
- 05:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70111 and previous config saved to /var/cache/conftool/dbconfig/20241016-050904-ladsgroup.json
- 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70110 and previous config saved to /var/cache/conftool/dbconfig/20241016-045356-ladsgroup.json
- 04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70109 and previous config saved to /var/cache/conftool/dbconfig/20241016-044657-ladsgroup.json
- 04:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 04:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 04:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70108 and previous config saved to /var/cache/conftool/dbconfig/20241016-044204-ladsgroup.json
- 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70107 and previous config saved to /var/cache/conftool/dbconfig/20241016-043757-ladsgroup.json
- 04:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 04:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70106 and previous config saved to /var/cache/conftool/dbconfig/20241016-043734-ladsgroup.json
- 04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70105 and previous config saved to /var/cache/conftool/dbconfig/20241016-042657-ladsgroup.json
- 04:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70104 and previous config saved to /var/cache/conftool/dbconfig/20241016-042227-ladsgroup.json
- 04:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70103 and previous config saved to /var/cache/conftool/dbconfig/20241016-041150-ladsgroup.json
- 04:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70102 and previous config saved to /var/cache/conftool/dbconfig/20241016-040721-ladsgroup.json
- 04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
- 04:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 03:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70101 and previous config saved to /var/cache/conftool/dbconfig/20241016-035643-ladsgroup.json
- 03:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70100 and previous config saved to /var/cache/conftool/dbconfig/20241016-035214-ladsgroup.json
- 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70099 and previous config saved to /var/cache/conftool/dbconfig/20241016-034932-ladsgroup.json
- 03:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 03:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70098 and previous config saved to /var/cache/conftool/dbconfig/20241016-034907-ladsgroup.json
- 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70097 and previous config saved to /var/cache/conftool/dbconfig/20241016-033400-ladsgroup.json
- 03:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70096 and previous config saved to /var/cache/conftool/dbconfig/20241016-031852-ladsgroup.json
- 03:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70095 and previous config saved to /var/cache/conftool/dbconfig/20241016-030345-ladsgroup.json
- 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70094 and previous config saved to /var/cache/conftool/dbconfig/20241016-025633-ladsgroup.json
- 02:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 02:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70093 and previous config saved to /var/cache/conftool/dbconfig/20241016-025608-ladsgroup.json
- 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70092 and previous config saved to /var/cache/conftool/dbconfig/20241016-024101-ladsgroup.json
- 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70091 and previous config saved to /var/cache/conftool/dbconfig/20241016-022554-ladsgroup.json
- 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70090 and previous config saved to /var/cache/conftool/dbconfig/20241016-021358-ladsgroup.json
- 02:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 02:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 02:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70089 and previous config saved to /var/cache/conftool/dbconfig/20241016-021347-ladsgroup.json
- 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70088 and previous config saved to /var/cache/conftool/dbconfig/20241016-021047-ladsgroup.json
- 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70087 and previous config saved to /var/cache/conftool/dbconfig/20241016-020333-ladsgroup.json
- 02:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 02:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70086 and previous config saved to /var/cache/conftool/dbconfig/20241016-020308-ladsgroup.json
- 01:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70085 and previous config saved to /var/cache/conftool/dbconfig/20241016-015840-ladsgroup.json
- 01:50 eileen: tools upgraded from 62f2d170 to 68f64e43
- 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70084 and previous config saved to /var/cache/conftool/dbconfig/20241016-014801-ladsgroup.json
- 01:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70083 and previous config saved to /var/cache/conftool/dbconfig/20241016-014333-ladsgroup.json
- 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70082 and previous config saved to /var/cache/conftool/dbconfig/20241016-013254-ladsgroup.json
- 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70081 and previous config saved to /var/cache/conftool/dbconfig/20241016-012826-ladsgroup.json
- 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70080 and previous config saved to /var/cache/conftool/dbconfig/20241016-011747-ladsgroup.json
- 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70079 and previous config saved to /var/cache/conftool/dbconfig/20241016-011036-ladsgroup.json
- 01:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 01:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70078 and previous config saved to /var/cache/conftool/dbconfig/20241016-011010-ladsgroup.json
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70077 and previous config saved to /var/cache/conftool/dbconfig/20241016-005500-ladsgroup.json
- 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70076 and previous config saved to /var/cache/conftool/dbconfig/20241016-003953-ladsgroup.json
- 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70075 and previous config saved to /var/cache/conftool/dbconfig/20241016-002446-ladsgroup.json
- 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70074 and previous config saved to /var/cache/conftool/dbconfig/20241016-001629-ladsgroup.json
- 00:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70073 and previous config saved to /var/cache/conftool/dbconfig/20241016-001604-ladsgroup.json
- 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70072 and previous config saved to /var/cache/conftool/dbconfig/20241016-000057-ladsgroup.json
2024-10-15
- 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70071 and previous config saved to /var/cache/conftool/dbconfig/20241015-235055-ladsgroup.json
- 23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70070 and previous config saved to /var/cache/conftool/dbconfig/20241015-235017-ladsgroup.json
- 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70069 and previous config saved to /var/cache/conftool/dbconfig/20241015-234550-ladsgroup.json
- 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70068 and previous config saved to /var/cache/conftool/dbconfig/20241015-233510-ladsgroup.json
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70067 and previous config saved to /var/cache/conftool/dbconfig/20241015-233043-ladsgroup.json
- 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70066 and previous config saved to /var/cache/conftool/dbconfig/20241015-232456-ladsgroup.json
- 23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70065 and previous config saved to /var/cache/conftool/dbconfig/20241015-232423-ladsgroup.json
- 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70064 and previous config saved to /var/cache/conftool/dbconfig/20241015-232003-ladsgroup.json
- 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70063 and previous config saved to /var/cache/conftool/dbconfig/20241015-230916-ladsgroup.json
- 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70062 and previous config saved to /var/cache/conftool/dbconfig/20241015-230456-ladsgroup.json
- 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70061 and previous config saved to /var/cache/conftool/dbconfig/20241015-225409-ladsgroup.json
- 22:48 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70060 and previous config saved to /var/cache/conftool/dbconfig/20241015-223902-ladsgroup.json
- 22:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70059 and previous config saved to /var/cache/conftool/dbconfig/20241015-222936-ladsgroup.json
- 22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70058 and previous config saved to /var/cache/conftool/dbconfig/20241015-222911-ladsgroup.json
- 22:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 22:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70057 and previous config saved to /var/cache/conftool/dbconfig/20241015-221404-ladsgroup.json
- 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70056 and previous config saved to /var/cache/conftool/dbconfig/20241015-221356-ladsgroup.json
- 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70055 and previous config saved to /var/cache/conftool/dbconfig/20241015-220316-ladsgroup.json
- 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70054 and previous config saved to /var/cache/conftool/dbconfig/20241015-215857-ladsgroup.json
- 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70053 and previous config saved to /var/cache/conftool/dbconfig/20241015-215849-ladsgroup.json
- 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 21:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70052 and previous config saved to /var/cache/conftool/dbconfig/20241015-214811-ladsgroup.json
- 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70051 and previous config saved to /var/cache/conftool/dbconfig/20241015-214350-ladsgroup.json
- 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70050 and previous config saved to /var/cache/conftool/dbconfig/20241015-214342-ladsgroup.json
- 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70049 and previous config saved to /var/cache/conftool/dbconfig/20241015-213423-ladsgroup.json
- 21:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 21:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 21:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70048 and previous config saved to /var/cache/conftool/dbconfig/20241015-213305-ladsgroup.json
- 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70047 and previous config saved to /var/cache/conftool/dbconfig/20241015-213227-ladsgroup.json
- 21:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 21:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70046 and previous config saved to /var/cache/conftool/dbconfig/20241015-213203-ladsgroup.json
- 21:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 21:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70045 and previous config saved to /var/cache/conftool/dbconfig/20241015-212835-ladsgroup.json
- 21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
- 21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
- 21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70044 and previous config saved to /var/cache/conftool/dbconfig/20241015-212431-ladsgroup.json
- 21:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 21:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70043 and previous config saved to /var/cache/conftool/dbconfig/20241015-211800-ladsgroup.json
- 21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70042 and previous config saved to /var/cache/conftool/dbconfig/20241015-211656-ladsgroup.json
- 21:04 cjming: end of UTC late backport window
- 21:04 cjming@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 06m 51s)
- 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70041 and previous config saved to /var/cache/conftool/dbconfig/20241015-210149-ladsgroup.json
- 20:59 cjming@deploy2002: cjming, matmarex: Continuing with sync
- 20:59 cjming@deploy2002: cjming, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:57 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2194.codfw.wmnet onto db2205.codfw.wmnet
- 20:57 cjming@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
- 20:56 cjming@deploy2002: Finished scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) (duration: 12m 33s)
- 20:51 cjming@deploy2002: cjming, pppery: Continuing with sync
- 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70040 and previous config saved to /var/cache/conftool/dbconfig/20241015-204642-ladsgroup.json
- 20:46 cjming@deploy2002: cjming, pppery: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:43 cjming@deploy2002: Started scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923)
- 20:42 cjming@deploy2002: Finished scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) (duration: 08m 50s)
- 20:37 cjming@deploy2002: cjming, pppery: Continuing with sync
- 20:35 cjming@deploy2002: cjming, pppery: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:33 cjming@deploy2002: Started scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538)
- 20:31 cjming@deploy2002: Finished scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php (duration: 10m 56s)
- 20:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 20:27 cjming@deploy2002: ammarpad, cjming: Continuing with sync
- 20:23 cjming@deploy2002: ammarpad, cjming: Backport for contactpages: Move stewards contactpage to MetaContactPages.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:20 cjming@deploy2002: Started scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php
- 20:16 cjming@deploy2002: Finished scap sync-world: Backport for Remove legacy UI actions tracking (T376065) (duration: 12m 28s)
- 20:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 20:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 20:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 20:11 cjming@deploy2002: ksarabia, cjming: Continuing with sync
- 20:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:10 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 20:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 20:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
- 20:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
- 20:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 20:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 20:06 cjming@deploy2002: ksarabia, cjming: Backport for Remove legacy UI actions tracking (T376065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 20:03 cjming@deploy2002: Started scap sync-world: Backport for Remove legacy UI actions tracking (T376065)
- 20:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 20:02 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:01 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 19:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 19:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 19:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 19:16 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.27 refs T375658
- 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70039 and previous config saved to /var/cache/conftool/dbconfig/20241015-191345-ladsgroup.json
- 19:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 19:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 19:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70038 and previous config saved to /var/cache/conftool/dbconfig/20241015-191322-ladsgroup.json
- 19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70037 and previous config saved to /var/cache/conftool/dbconfig/20241015-190231-arnaudb.json
- 18:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70036 and previous config saved to /var/cache/conftool/dbconfig/20241015-185814-ladsgroup.json
- 18:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 18:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70035 and previous config saved to /var/cache/conftool/dbconfig/20241015-184724-arnaudb.json
- 18:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70034 and previous config saved to /var/cache/conftool/dbconfig/20241015-184307-ladsgroup.json
- 18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:39 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2082
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2081
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2083
- 18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2083
- 18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2082
- 18:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2081
- 18:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
- 18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
- 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70033 and previous config saved to /var/cache/conftool/dbconfig/20241015-183218-arnaudb.json
- 18:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70032 and previous config saved to /var/cache/conftool/dbconfig/20241015-182800-ladsgroup.json
- 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70031 and previous config saved to /var/cache/conftool/dbconfig/20241015-181930-ladsgroup.json
- 18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70030 and previous config saved to /var/cache/conftool/dbconfig/20241015-181711-arnaudb.json
- 18:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70029 and previous config saved to /var/cache/conftool/dbconfig/20241015-181455-arnaudb.json
- 18:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 18:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70028 and previous config saved to /var/cache/conftool/dbconfig/20241015-181433-arnaudb.json
- 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70027 and previous config saved to /var/cache/conftool/dbconfig/20241015-180423-ladsgroup.json
- 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70026 and previous config saved to /var/cache/conftool/dbconfig/20241015-175926-arnaudb.json
- 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70025 and previous config saved to /var/cache/conftool/dbconfig/20241015-174916-ladsgroup.json
- 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70024 and previous config saved to /var/cache/conftool/dbconfig/20241015-174419-arnaudb.json
- 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70023 and previous config saved to /var/cache/conftool/dbconfig/20241015-173409-ladsgroup.json
- 17:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70022 and previous config saved to /var/cache/conftool/dbconfig/20241015-172912-arnaudb.json
- 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70021 and previous config saved to /var/cache/conftool/dbconfig/20241015-172714-ladsgroup.json
- 17:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 17:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70020 and previous config saved to /var/cache/conftool/dbconfig/20241015-172657-arnaudb.json
- 17:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 17:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 17:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70019 and previous config saved to /var/cache/conftool/dbconfig/20241015-172648-ladsgroup.json
- 17:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 17:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 17:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 17:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70018 and previous config saved to /var/cache/conftool/dbconfig/20241015-172610-arnaudb.json
- 17:13 swfrench@deploy2002: Finished scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934 (duration: 02m 47s)
- 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70017 and previous config saved to /var/cache/conftool/dbconfig/20241015-171141-ladsgroup.json
- 17:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70016 and previous config saved to /var/cache/conftool/dbconfig/20241015-171103-arnaudb.json
- 17:10 swfrench@deploy2002: Started scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934
- 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70015 and previous config saved to /var/cache/conftool/dbconfig/20241015-165634-ladsgroup.json
- 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70014 and previous config saved to /var/cache/conftool/dbconfig/20241015-165608-ladsgroup.json
- 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70013 and previous config saved to /var/cache/conftool/dbconfig/20241015-165556-arnaudb.json
- 16:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 16:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70012 and previous config saved to /var/cache/conftool/dbconfig/20241015-165539-ladsgroup.json
- 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70011 and previous config saved to /var/cache/conftool/dbconfig/20241015-164127-ladsgroup.json
- 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70010 and previous config saved to /var/cache/conftool/dbconfig/20241015-164050-arnaudb.json
- 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70009 and previous config saved to /var/cache/conftool/dbconfig/20241015-164032-ladsgroup.json
- 16:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70008 and previous config saved to /var/cache/conftool/dbconfig/20241015-163834-arnaudb.json
- 16:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 16:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 16:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P70007 and previous config saved to /var/cache/conftool/dbconfig/20241015-163812-arnaudb.json
- 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70006 and previous config saved to /var/cache/conftool/dbconfig/20241015-163419-ladsgroup.json
- 16:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 16:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P70005 and previous config saved to /var/cache/conftool/dbconfig/20241015-163404-ladsgroup.json
- 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70004 and previous config saved to /var/cache/conftool/dbconfig/20241015-162525-ladsgroup.json
- 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70003 and previous config saved to /var/cache/conftool/dbconfig/20241015-162305-arnaudb.json
- 16:21 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2205.codfw.wmnet
- 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P70002 and previous config saved to /var/cache/conftool/dbconfig/20241015-161934-ladsgroup.json
- 16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P70001 and previous config saved to /var/cache/conftool/dbconfig/20241015-161858-ladsgroup.json
- 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70000 and previous config saved to /var/cache/conftool/dbconfig/20241015-161018-ladsgroup.json
- 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P69999 and previous config saved to /var/cache/conftool/dbconfig/20241015-160758-arnaudb.json
- 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P69998 and previous config saved to /var/cache/conftool/dbconfig/20241015-160351-ladsgroup.json
- 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2205 T377164', diff saved to https://phabricator.wikimedia.org/P69997 and previous config saved to /var/cache/conftool/dbconfig/20241015-160106-ladsgroup.json
- 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69996 and previous config saved to /var/cache/conftool/dbconfig/20241015-155251-arnaudb.json
- 15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote db2209 to s3 primary and set section read-write T377164', diff saved to https://phabricator.wikimedia.org/P69995 and previous config saved to /var/cache/conftool/dbconfig/20241015-155240-ladsgroup.json
- 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69994 and previous config saved to /var/cache/conftool/dbconfig/20241015-154844-ladsgroup.json
- 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T377164', diff saved to https://phabricator.wikimedia.org/P69993 and previous config saved to /var/cache/conftool/dbconfig/20241015-154834-ladsgroup.json
- 15:48 Amir1: Starting s3 codfw failover from db2205 to db2209 - T377164
- 15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69992 and previous config saved to /var/cache/conftool/dbconfig/20241015-154318-arnaudb.json
- 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69991 and previous config saved to /var/cache/conftool/dbconfig/20241015-154256-arnaudb.json
- 15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T377164', diff saved to https://phabricator.wikimedia.org/P69990 and previous config saved to /var/cache/conftool/dbconfig/20241015-154228-ladsgroup.json
- 15:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
- 15:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
- 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69989 and previous config saved to /var/cache/conftool/dbconfig/20241015-154027-ladsgroup.json
- 15:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69988 and previous config saved to /var/cache/conftool/dbconfig/20241015-154002-ladsgroup.json
- 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69987 and previous config saved to /var/cache/conftool/dbconfig/20241015-152749-arnaudb.json
- 15:26 akosiaris: run gnt-cluster verify-disks after ganeti1034 forceful reboot
- 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69986 and previous config saved to /var/cache/conftool/dbconfig/20241015-152456-ladsgroup.json
- 15:22 volans: force-rebooting ganeti1034 stuck due to drbd traces via mgmt
- 15:19 akosiaris@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet
- 15:17 akosiaris: drain ganeti1034 of VMs, hardware might be misbehaving
- 15:16 akosiaris@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69985 and previous config saved to /var/cache/conftool/dbconfig/20241015-151243-arnaudb.json
- 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69984 and previous config saved to /var/cache/conftool/dbconfig/20241015-150948-ladsgroup.json
- 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69983 and previous config saved to /var/cache/conftool/dbconfig/20241015-145734-arnaudb.json
- 14:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
- 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69982 and previous config saved to /var/cache/conftool/dbconfig/20241015-145517-arnaudb.json
- 14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69981 and previous config saved to /var/cache/conftool/dbconfig/20241015-145453-arnaudb.json
- 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69980 and previous config saved to /var/cache/conftool/dbconfig/20241015-145441-ladsgroup.json
- 14:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
- 14:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
- 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69979 and previous config saved to /var/cache/conftool/dbconfig/20241015-144631-ladsgroup.json
- 14:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 14:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69978 and previous config saved to /var/cache/conftool/dbconfig/20241015-144606-ladsgroup.json
- 14:45 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 24s)
- 14:43 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 46s)
- 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69977 and previous config saved to /var/cache/conftool/dbconfig/20241015-143946-arnaudb.json
- 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P69976 and previous config saved to /var/cache/conftool/dbconfig/20241015-143803-ladsgroup.json
- 14:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 14:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 14:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69975 and previous config saved to /var/cache/conftool/dbconfig/20241015-143740-ladsgroup.json
- 14:36 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
- 14:35 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
- 14:33 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
- 14:31 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
- 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69974 and previous config saved to /var/cache/conftool/dbconfig/20241015-143059-ladsgroup.json
- 14:29 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 14:28 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
- 14:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 14:27 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 14:26 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69973 and previous config saved to /var/cache/conftool/dbconfig/20241015-142439-arnaudb.json
- 14:24 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
- 14:24 urbanecm@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 33m 23s)
- 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69972 and previous config saved to /var/cache/conftool/dbconfig/20241015-142233-ladsgroup.json
- 14:21 btullis@cumin1002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema
- 14:19 urbanecm@deploy2002: urbanecm, matmarex: Continuing with sync
- 14:17 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
- 14:16 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
- 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69971 and previous config saved to /var/cache/conftool/dbconfig/20241015-141552-ladsgroup.json
- 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69970 and previous config saved to /var/cache/conftool/dbconfig/20241015-140932-arnaudb.json
- 14:09 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
- 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69969 and previous config saved to /var/cache/conftool/dbconfig/20241015-140726-ladsgroup.json
- 14:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69968 and previous config saved to /var/cache/conftool/dbconfig/20241015-140716-arnaudb.json
- 14:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 14:08 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1020.eqiad.wmnet
- 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69967 and previous config saved to /var/cache/conftool/dbconfig/20241015-140638-arnaudb.json
- 14:05 btullis@cumin1002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema
- 14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69966 and previous config saved to /var/cache/conftool/dbconfig/20241015-140045-ladsgroup.json
- 14:00 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1020.eqiad.wmnet
- 13:57 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1019.eqiad.wmnet
- 13:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
- 13:54 urbanecm@deploy2002: urbanecm, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69965 and previous config saved to /var/cache/conftool/dbconfig/20241015-135234-ladsgroup.json
- 13:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 13:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69964 and previous config saved to /var/cache/conftool/dbconfig/20241015-135213-ladsgroup.json
- 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69963 and previous config saved to /var/cache/conftool/dbconfig/20241015-135208-ladsgroup.json
- 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69962 and previous config saved to /var/cache/conftool/dbconfig/20241015-135131-arnaudb.json
- 13:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1019.eqiad.wmnet
- 13:50 urbanecm@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
- 13:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
- 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69961 and previous config saved to /var/cache/conftool/dbconfig/20241015-133701-ladsgroup.json
- 13:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69960 and previous config saved to /var/cache/conftool/dbconfig/20241015-133624-arnaudb.json
- 13:32 urbanecm@deploy2002: Finished scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 44s)
- 13:27 urbanecm@deploy2002: migr, urbanecm, zabe: Continuing with sync
- 13:26 urbanecm@deploy2002: migr, urbanecm, zabe: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:24 urbanecm@deploy2002: Started scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 13:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) (duration: 19m 25s)
- 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69959 and previous config saved to /var/cache/conftool/dbconfig/20241015-132154-ladsgroup.json
- 13:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69958 and previous config saved to /var/cache/conftool/dbconfig/20241015-132117-arnaudb.json
- 13:19 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1018.eqiad.wmnet
- 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69957 and previous config saved to /var/cache/conftool/dbconfig/20241015-131901-arnaudb.json
- 13:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69956 and previous config saved to /var/cache/conftool/dbconfig/20241015-131839-arnaudb.json
- 13:16 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Continuing with sync
- 13:12 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1018.eqiad.wmnet
- 13:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69955 and previous config saved to /var/cache/conftool/dbconfig/20241015-131122-ladsgroup.json
- 13:11 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1017.eqiad.wmnet
- 13:11 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69954 and previous config saved to /var/cache/conftool/dbconfig/20241015-130647-ladsgroup.json
- 13:04 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1017.eqiad.wmnet
- 13:04 urbanecm@deploy2002: Started scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833)
- 13:03 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1016.eqiad.wmnet
- 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69953 and previous config saved to /var/cache/conftool/dbconfig/20241015-130332-arnaudb.json
- 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69952 and previous config saved to /var/cache/conftool/dbconfig/20241015-125748-ladsgroup.json
- 12:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 12:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 12:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1016.eqiad.wmnet
- 12:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69951 and previous config saved to /var/cache/conftool/dbconfig/20241015-125615-ladsgroup.json
- 12:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 12:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 12:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69950 and previous config saved to /var/cache/conftool/dbconfig/20241015-125203-ladsgroup.json
- 12:50 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
- 12:50 elukey: destroy old certs from puppetmaster1001's CA (parsoid.svc.{eqiad,codfw}.wmnet, debmonitor.discovery.wmnet)
- 12:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69949 and previous config saved to /var/cache/conftool/dbconfig/20241015-124825-arnaudb.json
- 12:46 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
- 12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69948 and previous config saved to /var/cache/conftool/dbconfig/20241015-124108-ladsgroup.json
- 12:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69947 and previous config saved to /var/cache/conftool/dbconfig/20241015-123656-ladsgroup.json
- 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69946 and previous config saved to /var/cache/conftool/dbconfig/20241015-123318-arnaudb.json
- 12:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69945 and previous config saved to /var/cache/conftool/dbconfig/20241015-123101-arnaudb.json
- 12:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 12:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69944 and previous config saved to /var/cache/conftool/dbconfig/20241015-123039-arnaudb.json
- 12:30 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:29 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69943 and previous config saved to /var/cache/conftool/dbconfig/20241015-122601-ladsgroup.json
- 12:24 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69942 and previous config saved to /var/cache/conftool/dbconfig/20241015-122251-ladsgroup.json
- 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69941 and previous config saved to /var/cache/conftool/dbconfig/20241015-122149-ladsgroup.json
- 12:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69940 and previous config saved to /var/cache/conftool/dbconfig/20241015-121706-ladsgroup.json
- 12:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 12:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69939 and previous config saved to /var/cache/conftool/dbconfig/20241015-121532-arnaudb.json
- 12:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69938 and previous config saved to /var/cache/conftool/dbconfig/20241015-121349-ladsgroup.json
- 12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69937 and previous config saved to /var/cache/conftool/dbconfig/20241015-120642-ladsgroup.json
- 12:03 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
- 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69936 and previous config saved to /var/cache/conftool/dbconfig/20241015-120025-arnaudb.json
- 11:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69935 and previous config saved to /var/cache/conftool/dbconfig/20241015-115842-ladsgroup.json
- 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69934 and previous config saved to /var/cache/conftool/dbconfig/20241015-115630-ladsgroup.json
- 11:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 11:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69933 and previous config saved to /var/cache/conftool/dbconfig/20241015-115606-ladsgroup.json
- 11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69932 and previous config saved to /var/cache/conftool/dbconfig/20241015-114518-arnaudb.json
- 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69931 and previous config saved to /var/cache/conftool/dbconfig/20241015-114336-ladsgroup.json
- 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69930 and previous config saved to /var/cache/conftool/dbconfig/20241015-114302-arnaudb.json
- 11:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69929 and previous config saved to /var/cache/conftool/dbconfig/20241015-114240-arnaudb.json
- 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69927 and previous config saved to /var/cache/conftool/dbconfig/20241015-114059-ladsgroup.json
- 11:34 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69926 and previous config saved to /var/cache/conftool/dbconfig/20241015-112829-ladsgroup.json
- 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69925 and previous config saved to /var/cache/conftool/dbconfig/20241015-112733-arnaudb.json
- 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69924 and previous config saved to /var/cache/conftool/dbconfig/20241015-112551-ladsgroup.json
- 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69923 and previous config saved to /var/cache/conftool/dbconfig/20241015-111226-arnaudb.json
- 11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69922 and previous config saved to /var/cache/conftool/dbconfig/20241015-111045-ladsgroup.json
- 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69921 and previous config saved to /var/cache/conftool/dbconfig/20241015-110741-ladsgroup.json
- 11:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 11:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69920 and previous config saved to /var/cache/conftool/dbconfig/20241015-110132-ladsgroup.json
- 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69919 and previous config saved to /var/cache/conftool/dbconfig/20241015-105719-arnaudb.json
- 10:53 tappof: expand LVs on prometheus instances (k8s-mlserve and k8s-staging) T377196
- 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69918 and previous config saved to /var/cache/conftool/dbconfig/20241015-105301-arnaudb.json
- 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69917 and previous config saved to /var/cache/conftool/dbconfig/20241015-105213-arnaudb.json
- 10:38 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
- 10:38 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
- 10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69915 and previous config saved to /var/cache/conftool/dbconfig/20241015-103706-arnaudb.json
- 10:34 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
- 10:30 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
- 10:26 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
- 10:25 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
- 10:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69914 and previous config saved to /var/cache/conftool/dbconfig/20241015-102159-arnaudb.json
- 10:21 brouberol@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
- 10:14 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
- 10:11 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
- 10:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69913 and previous config saved to /var/cache/conftool/dbconfig/20241015-100652-arnaudb.json
- 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69912 and previous config saved to /var/cache/conftool/dbconfig/20241015-100435-arnaudb.json
- 10:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 10:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69911 and previous config saved to /var/cache/conftool/dbconfig/20241015-100413-arnaudb.json
- 09:57 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 09:55 brouberol@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker
- 09:52 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69910 and previous config saved to /var/cache/conftool/dbconfig/20241015-094906-arnaudb.json
- 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69909 and previous config saved to /var/cache/conftool/dbconfig/20241015-093359-arnaudb.json
- 09:26 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69908 and previous config saved to /var/cache/conftool/dbconfig/20241015-091852-arnaudb.json
- 09:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69907 and previous config saved to /var/cache/conftool/dbconfig/20241015-091635-arnaudb.json
- 09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 09:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69906 and previous config saved to /var/cache/conftool/dbconfig/20241015-091502-arnaudb.json
- 09:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69905 and previous config saved to /var/cache/conftool/dbconfig/20241015-085955-arnaudb.json
- 08:47 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
- 08:46 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
- 08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69903 and previous config saved to /var/cache/conftool/dbconfig/20241015-084448-arnaudb.json
- 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69902 and previous config saved to /var/cache/conftool/dbconfig/20241015-082941-arnaudb.json
- 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:27 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69901 and previous config saved to /var/cache/conftool/dbconfig/20241015-082727-arnaudb.json
- 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69900 and previous config saved to /var/cache/conftool/dbconfig/20241015-082704-arnaudb.json
- 08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69899 and previous config saved to /var/cache/conftool/dbconfig/20241015-081157-arnaudb.json
- 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69898 and previous config saved to /var/cache/conftool/dbconfig/20241015-075650-arnaudb.json
- 07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69897 and previous config saved to /var/cache/conftool/dbconfig/20241015-074843-arnaudb.json
- 07:47 hashar: Restarted Gerrit - T373897
- 07:46 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897 (duration: 00m 09s)
- 07:46 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897
- 07:42 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897 (duration: 00m 07s)
- 07:42 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897
- 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69896 and previous config saved to /var/cache/conftool/dbconfig/20241015-074143-arnaudb.json
- 07:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897 (duration: 00m 07s)
- 07:40 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897
- 07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69895 and previous config saved to /var/cache/conftool/dbconfig/20241015-073928-arnaudb.json
- 07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69894 and previous config saved to /var/cache/conftool/dbconfig/20241015-073906-arnaudb.json
- 07:38 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
- 07:38 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
- 07:35 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69893 and previous config saved to /var/cache/conftool/dbconfig/20241015-073338-arnaudb.json
- 07:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69892 and previous config saved to /var/cache/conftool/dbconfig/20241015-072359-arnaudb.json
- 07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69891 and previous config saved to /var/cache/conftool/dbconfig/20241015-071833-arnaudb.json
- 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69890 and previous config saved to /var/cache/conftool/dbconfig/20241015-070852-arnaudb.json
- 07:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69889 and previous config saved to /var/cache/conftool/dbconfig/20241015-070327-arnaudb.json
- 06:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69888 and previous config saved to /var/cache/conftool/dbconfig/20241015-065345-arnaudb.json
- 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69887 and previous config saved to /var/cache/conftool/dbconfig/20241015-065130-arnaudb.json
- 06:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 06:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 06:30 kart_: Updated MinT to 2024-10-11-113932-production
- 06:27 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 06:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 06:16 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 06:08 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 05:38 _joe_: restart tomcat on idp1004
- 05:35 _joe_: restart tomcat on idp2004
- 05:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 04:00 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.24 (duration: 00m 56s)
- 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658 (duration: 48m 30s)
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658
- 02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69885 and previous config saved to /var/cache/conftool/dbconfig/20241015-024037-ladsgroup.json
- 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69884 and previous config saved to /var/cache/conftool/dbconfig/20241015-022530-ladsgroup.json
- 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69883 and previous config saved to /var/cache/conftool/dbconfig/20241015-021023-ladsgroup.json
- 01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69882 and previous config saved to /var/cache/conftool/dbconfig/20241015-015516-ladsgroup.json
- 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69881 and previous config saved to /var/cache/conftool/dbconfig/20241015-014831-ladsgroup.json
- 01:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 01:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69880 and previous config saved to /var/cache/conftool/dbconfig/20241015-014803-ladsgroup.json
- 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69879 and previous config saved to /var/cache/conftool/dbconfig/20241015-013257-ladsgroup.json
- 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69878 and previous config saved to /var/cache/conftool/dbconfig/20241015-011749-ladsgroup.json
- 01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69877 and previous config saved to /var/cache/conftool/dbconfig/20241015-010242-ladsgroup.json
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69876 and previous config saved to /var/cache/conftool/dbconfig/20241015-005551-ladsgroup.json
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69875 and previous config saved to /var/cache/conftool/dbconfig/20241015-005546-ladsgroup.json
- 00:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 00:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69874 and previous config saved to /var/cache/conftool/dbconfig/20241015-005525-ladsgroup.json
- 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69873 and previous config saved to /var/cache/conftool/dbconfig/20241015-004039-ladsgroup.json
- 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69872 and previous config saved to /var/cache/conftool/dbconfig/20241015-004018-ladsgroup.json
- 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69871 and previous config saved to /var/cache/conftool/dbconfig/20241015-002531-ladsgroup.json
- 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69870 and previous config saved to /var/cache/conftool/dbconfig/20241015-002511-ladsgroup.json
- 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69869 and previous config saved to /var/cache/conftool/dbconfig/20241015-001024-ladsgroup.json
- 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69868 and previous config saved to /var/cache/conftool/dbconfig/20241015-001004-ladsgroup.json
- 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69867 and previous config saved to /var/cache/conftool/dbconfig/20241015-000304-ladsgroup.json
- 00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69866 and previous config saved to /var/cache/conftool/dbconfig/20241015-000236-ladsgroup.json
2024-10-14
- 23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69865 and previous config saved to /var/cache/conftool/dbconfig/20241014-234729-ladsgroup.json
- 23:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69864 and previous config saved to /var/cache/conftool/dbconfig/20241014-233222-ladsgroup.json
- 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69863 and previous config saved to /var/cache/conftool/dbconfig/20241014-232857-ladsgroup.json
- 23:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 23:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69862 and previous config saved to /var/cache/conftool/dbconfig/20241014-232835-ladsgroup.json
- 23:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69861 and previous config saved to /var/cache/conftool/dbconfig/20241014-231715-ladsgroup.json
- 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69860 and previous config saved to /var/cache/conftool/dbconfig/20241014-231328-ladsgroup.json
- 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69859 and previous config saved to /var/cache/conftool/dbconfig/20241014-230903-ladsgroup.json
- 23:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 23:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 23:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69858 and previous config saved to /var/cache/conftool/dbconfig/20241014-230838-ladsgroup.json
- 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69857 and previous config saved to /var/cache/conftool/dbconfig/20241014-225818-ladsgroup.json
- 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69856 and previous config saved to /var/cache/conftool/dbconfig/20241014-225528-ladsgroup.json
- 22:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69855 and previous config saved to /var/cache/conftool/dbconfig/20241014-225331-ladsgroup.json
- 22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69854 and previous config saved to /var/cache/conftool/dbconfig/20241014-224311-ladsgroup.json
- 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69853 and previous config saved to /var/cache/conftool/dbconfig/20241014-224022-ladsgroup.json
- 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69852 and previous config saved to /var/cache/conftool/dbconfig/20241014-223824-ladsgroup.json
- 22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69851 and previous config saved to /var/cache/conftool/dbconfig/20241014-222515-ladsgroup.json
- 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69850 and previous config saved to /var/cache/conftool/dbconfig/20241014-222317-ladsgroup.json
- 22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69849 and previous config saved to /var/cache/conftool/dbconfig/20241014-222009-ladsgroup.json
- 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69848 and previous config saved to /var/cache/conftool/dbconfig/20241014-221508-ladsgroup.json
- 22:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 22:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69847 and previous config saved to /var/cache/conftool/dbconfig/20241014-221443-ladsgroup.json
- 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69846 and previous config saved to /var/cache/conftool/dbconfig/20241014-221008-ladsgroup.json
- 22:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69845 and previous config saved to /var/cache/conftool/dbconfig/20241014-220504-ladsgroup.json
- 22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69844 and previous config saved to /var/cache/conftool/dbconfig/20241014-220134-ladsgroup.json
- 22:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 22:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69843 and previous config saved to /var/cache/conftool/dbconfig/20241014-215936-ladsgroup.json
- 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69842 and previous config saved to /var/cache/conftool/dbconfig/20241014-214958-ladsgroup.json
- 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69841 and previous config saved to /var/cache/conftool/dbconfig/20241014-214515-ladsgroup.json
- 21:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 21:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69840 and previous config saved to /var/cache/conftool/dbconfig/20241014-214429-ladsgroup.json
- 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P69839 and previous config saved to /var/cache/conftool/dbconfig/20241014-213902-ladsgroup.json
- 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69838 and previous config saved to /var/cache/conftool/dbconfig/20241014-213453-ladsgroup.json
- 21:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69837 and previous config saved to /var/cache/conftool/dbconfig/20241014-212922-ladsgroup.json
- 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69836 and previous config saved to /var/cache/conftool/dbconfig/20241014-212001-ladsgroup.json
- 21:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 21:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69835 and previous config saved to /var/cache/conftool/dbconfig/20241014-211937-ladsgroup.json
- 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69834 and previous config saved to /var/cache/conftool/dbconfig/20241014-210430-ladsgroup.json
- 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69833 and previous config saved to /var/cache/conftool/dbconfig/20241014-204923-ladsgroup.json
- 20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69832 and previous config saved to /var/cache/conftool/dbconfig/20241014-203416-ladsgroup.json
- 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69831 and previous config saved to /var/cache/conftool/dbconfig/20241014-202504-ladsgroup.json
- 20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69830 and previous config saved to /var/cache/conftool/dbconfig/20241014-202439-ladsgroup.json
- 20:21 TheresNoTime: UTC late backport window done
- 20:18 samtar@deploy2002: Finished scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) (duration: 08m 14s)
- 20:14 samtar@deploy2002: samtar, pppery: Continuing with sync
- 20:12 samtar@deploy2002: samtar, pppery: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:10 samtar@deploy2002: Started scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)
- 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69829 and previous config saved to /var/cache/conftool/dbconfig/20241014-200932-ladsgroup.json
- 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69828 and previous config saved to /var/cache/conftool/dbconfig/20241014-195425-ladsgroup.json
- 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69827 and previous config saved to /var/cache/conftool/dbconfig/20241014-193918-ladsgroup.json
- 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69826 and previous config saved to /var/cache/conftool/dbconfig/20241014-192956-ladsgroup.json
- 19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 18:57 aqu@deploy2002: Finished deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8] (duration: 00m 29s)
- 18:57 aqu@deploy2002: Started deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8]
- 18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 18:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69825 and previous config saved to /var/cache/conftool/dbconfig/20241014-185225-ladsgroup.json
- 18:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8] (duration: 00m 13s)
- 18:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8]
- 18:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69824 and previous config saved to /var/cache/conftool/dbconfig/20241014-183718-ladsgroup.json
- 18:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69823 and previous config saved to /var/cache/conftool/dbconfig/20241014-182211-ladsgroup.json
- 18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69822 and previous config saved to /var/cache/conftool/dbconfig/20241014-180704-ladsgroup.json
- 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69821 and previous config saved to /var/cache/conftool/dbconfig/20241014-170647-ladsgroup.json
- 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69820 and previous config saved to /var/cache/conftool/dbconfig/20241014-170123-ladsgroup.json
- 16:51 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 16:50 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69819 and previous config saved to /var/cache/conftool/dbconfig/20241014-164616-ladsgroup.json
- 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69818 and previous config saved to /var/cache/conftool/dbconfig/20241014-163109-ladsgroup.json
- 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69817 and previous config saved to /var/cache/conftool/dbconfig/20241014-161602-ladsgroup.json
- 16:03 sergi0: Running `sgimeno@mwmaint2002:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
- 15:52 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:46 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 15:16 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69816 and previous config saved to /var/cache/conftool/dbconfig/20241014-151546-ladsgroup.json
- 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:15 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69815 and previous config saved to /var/cache/conftool/dbconfig/20241014-151521-ladsgroup.json
- 15:07 elukey@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
- 15:06 elukey@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
- 15:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69814 and previous config saved to /var/cache/conftool/dbconfig/20241014-150014-ladsgroup.json
- 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69813 and previous config saved to /var/cache/conftool/dbconfig/20241014-144507-ladsgroup.json
- 14:43 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 14:43 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:41 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:41 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:39 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69812 and previous config saved to /var/cache/conftool/dbconfig/20241014-143000-ladsgroup.json
- 14:16 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1177.eqiad.wmnet
- 14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:16 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:12 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
- 14:12 Lucas_WMDE: UTC afternoon backport+config window done
- 14:10 Lucas_WMDE: [untruncated duration: 06m 48s]
- 14:09 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) (duration: 0
- 14:07 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1177.eqiad.wmnet
- 14:07 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1176.eqiad.wmnet
- 14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:06 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
- 14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync
- 14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) synced to
- 14:03 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
- 14:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176)
- 13:58 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1176.eqiad.wmnet
- 13:46 ladsgroup@deploy2002: Finished scap sync-world: Backport for Update interwiki.php (duration: 07m 00s)
- 13:45 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@fbcf880]: T375480 (duration: 01m 07s)
- 13:44 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@fbcf880]: T375480
- 13:41 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 13:41 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Update interwiki.php
- 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1002.eqiad.wmnet
- 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:34 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:31 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69811 and previous config saved to /var/cache/conftool/dbconfig/20241014-132944-ladsgroup.json
- 13:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 13:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69810 and previous config saved to /var/cache/conftool/dbconfig/20241014-132918-ladsgroup.json
- 13:26 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1002.eqiad.wmnet
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1001.eqiad.wmnet
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:26 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69809 and previous config saved to /var/cache/conftool/dbconfig/20241014-132409-ladsgroup.json
- 13:22 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 13:18 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1001.eqiad.wmnet
- 13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
- 13:16 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
- 13:15 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
- 13:15 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
- 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69808 and previous config saved to /var/cache/conftool/dbconfig/20241014-131411-ladsgroup.json
- 13:13 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) (duration: 10m 19s)
- 13:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
- 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69807 and previous config saved to /var/cache/conftool/dbconfig/20241014-130904-ladsgroup.json
- 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695)
- 12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69806 and previous config saved to /var/cache/conftool/dbconfig/20241014-125904-ladsgroup.json
- 12:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69805 and previous config saved to /var/cache/conftool/dbconfig/20241014-125358-ladsgroup.json
- 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69804 and previous config saved to /var/cache/conftool/dbconfig/20241014-124554-arnaudb.json
- 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 12:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69803 and previous config saved to /var/cache/conftool/dbconfig/20241014-124532-arnaudb.json
- 12:44 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 12s)
- 12:44 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69802 and previous config saved to /var/cache/conftool/dbconfig/20241014-124357-ladsgroup.json
- 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-worker1001.eqiad.wmnet
- 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:41 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69801 and previous config saved to /var/cache/conftool/dbconfig/20241014-123853-ladsgroup.json
- 12:37 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 12:32 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-worker1001.eqiad.wmnet
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-ctrl1001.eqiad.wmnet
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:32 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
- 12:30 hnowlan: removed all aqsv1 service components from aqs* hosts
- 12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69800 and previous config saved to /var/cache/conftool/dbconfig/20241014-123025-arnaudb.json
- 12:28 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 12:23 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-ctrl1001.eqiad.wmnet
- 12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1001.eqiad.wmnet
- 12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1001.eqiad.wmnet
- 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69799 and previous config saved to /var/cache/conftool/dbconfig/20241014-121518-arnaudb.json
- 12:09 elukey: increase etcd k8s aux cluster from 3 -> 5 - T344230
- 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69798 and previous config saved to /var/cache/conftool/dbconfig/20241014-120011-arnaudb.json
- 11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private adddress - aborrero@cumin1002"
- 11:59 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private adddress - aborrero@cumin1002"
- 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69797 and previous config saved to /var/cache/conftool/dbconfig/20241014-115755-arnaudb.json
- 11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69796 and previous config saved to /var/cache/conftool/dbconfig/20241014-115732-arnaudb.json
- 11:56 Dreamy_Jazz: Started time limited scan on enwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
- 11:56 aborrero@cumin1002: START - Cookbook sre.dns.netbox
- 11:52 btullis@cumin1002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2194.codfw.wmnet onto db2227.codfw.wmnet
- 11:50 btullis@cumin1002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 11:50 hnowlan@deploy2002: Finished deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761) (duration: 15m 38s)
- 11:45 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69794 and previous config saved to /var/cache/conftool/dbconfig/20241014-114341-ladsgroup.json
- 11:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69793 and previous config saved to /var/cache/conftool/dbconfig/20241014-114316-ladsgroup.json
- 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69792 and previous config saved to /var/cache/conftool/dbconfig/20241014-114225-arnaudb.json
- 11:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69791 and previous config saved to /var/cache/conftool/dbconfig/20241014-113941-arnaudb.json
- 11:34 hnowlan@deploy2002: Started deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761)
- 11:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 08s)
- 11:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
- 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69790 and previous config saved to /var/cache/conftool/dbconfig/20241014-112809-ladsgroup.json
- 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69789 and previous config saved to /var/cache/conftool/dbconfig/20241014-112719-arnaudb.json
- 11:26 claime: Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
- 11:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69788 and previous config saved to /var/cache/conftool/dbconfig/20241014-112434-arnaudb.json
- 11:16 ladsgroup@deploy2002: Finished scap sync-world: Creating bclwikisource (T377084) (duration: 06m 49s)
- 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69787 and previous config saved to /var/cache/conftool/dbconfig/20241014-111302-ladsgroup.json
- 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69786 and previous config saved to /var/cache/conftool/dbconfig/20241014-111211-arnaudb.json
- 11:10 ladsgroup@deploy2002: Started scap sync-world: Creating bclwikisource (T377084)
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69785 and previous config saved to /var/cache/conftool/dbconfig/20241014-110956-arnaudb.json
- 11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69784 and previous config saved to /var/cache/conftool/dbconfig/20241014-110933-arnaudb.json
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69783 and previous config saved to /var/cache/conftool/dbconfig/20241014-110927-arnaudb.json
- 11:07 ladsgroup@deploy2002: Finished scap sync-world: Creating ibawiki (T376568) (duration: 06m 45s)
- 11:05 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 11:01 ladsgroup@deploy2002: Started scap sync-world: Creating ibawiki (T376568)
- 11:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 10:58 ladsgroup@deploy2002: Finished scap sync-world: Creating annwiki (T376332) (duration: 06m 45s)
- 10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69782 and previous config saved to /var/cache/conftool/dbconfig/20241014-105755-ladsgroup.json
- 10:55 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69781 and previous config saved to /var/cache/conftool/dbconfig/20241014-105426-arnaudb.json
- 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69780 and previous config saved to /var/cache/conftool/dbconfig/20241014-105421-arnaudb.json
- 10:52 ladsgroup@deploy2002: Started scap sync-world: Creating annwiki (T376332)
- 10:51 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69779 and previous config saved to /var/cache/conftool/dbconfig/20241014-104941-ladsgroup.json
- 10:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 10:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69778 and previous config saved to /var/cache/conftool/dbconfig/20241014-104916-ladsgroup.json
- 10:48 ladsgroup@deploy2002: Finished scap sync-world: Creating tddwiki (T375422) (duration: 06m 46s)
- 10:44 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:44 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:42 ladsgroup@deploy2002: Started scap sync-world: Creating tddwiki (T375422)
- 10:40 ladsgroup@deploy2002: Finished scap sync-world: Creating nrwiki (T375087) (duration: 06m 54s)
- 10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69777 and previous config saved to /var/cache/conftool/dbconfig/20241014-103919-arnaudb.json
- 10:35 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:35 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69776 and previous config saved to /var/cache/conftool/dbconfig/20241014-103410-ladsgroup.json
- 10:33 ladsgroup@deploy2002: Started scap sync-world: Creating nrwiki (T375087)
- 10:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) (duration: 06m 45s)
- 10:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 10:27 ladsgroup@deploy2002: ladsgroup: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:25 ladsgroup@deploy2002: Started scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421)
- 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69775 and previous config saved to /var/cache/conftool/dbconfig/20241014-102412-arnaudb.json
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69774 and previous config saved to /var/cache/conftool/dbconfig/20241014-102256-arnaudb.json
- 10:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69773 and previous config saved to /var/cache/conftool/dbconfig/20241014-102234-arnaudb.json
- 10:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69772 and previous config saved to /var/cache/conftool/dbconfig/20241014-101903-ladsgroup.json
- 10:17 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2227.codfw.wmnet
- 10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69771 and previous config saved to /var/cache/conftool/dbconfig/20241014-101354-ladsgroup.json
- 10:13 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1004.wikimedia.org
- 10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69770 and previous config saved to /var/cache/conftool/dbconfig/20241014-101246-ladsgroup.json
- 10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69769 and previous config saved to /var/cache/conftool/dbconfig/20241014-100727-arnaudb.json
- 10:06 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1004.wikimedia.org
- 10:06 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.wikimedia.org
- 10:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69768 and previous config saved to /var/cache/conftool/dbconfig/20241014-100356-ladsgroup.json
- 10:00 akosiaris: powercycle rdb1014 T376961
- 10:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.wikimedia.org
- 10:00 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:00 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 10:00 ladsgroup@deploy2002: Finished scap sync-world: Creating rskwiki (T374963) (duration: 18m 38s)
- 09:59 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 09:59 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
- 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69767 and previous config saved to /var/cache/conftool/dbconfig/20241014-095354-arnaudb.json
- 09:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 09:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69766 and previous config saved to /var/cache/conftool/dbconfig/20241014-095331-arnaudb.json
- 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69765 and previous config saved to /var/cache/conftool/dbconfig/20241014-095220-arnaudb.json
- 09:41 ladsgroup@deploy2002: Started scap sync-world: Creating rskwiki (T374963)
- 09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69764 and previous config saved to /var/cache/conftool/dbconfig/20241014-093824-arnaudb.json
- 09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69763 and previous config saved to /var/cache/conftool/dbconfig/20241014-093713-arnaudb.json
- 09:36 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69762 and previous config saved to /var/cache/conftool/dbconfig/20241014-093459-arnaudb.json
- 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69761 and previous config saved to /var/cache/conftool/dbconfig/20241014-093418-arnaudb.json
- 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69760 and previous config saved to /var/cache/conftool/dbconfig/20241014-092317-arnaudb.json
- 09:21 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69759 and previous config saved to /var/cache/conftool/dbconfig/20241014-091911-arnaudb.json
- 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69758 and previous config saved to /var/cache/conftool/dbconfig/20241014-090810-arnaudb.json
- 09:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69757 and previous config saved to /var/cache/conftool/dbconfig/20241014-090403-arnaudb.json
- 09:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69756 and previous config saved to /var/cache/conftool/dbconfig/20241014-090340-ladsgroup.json
- 09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 09:01 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
- 08:58 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 08:55 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
- 08:55 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
- 08:49 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
- 08:49 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
- 08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 08:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69755 and previous config saved to /var/cache/conftool/dbconfig/20241014-084856-arnaudb.json
- 08:48 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69754 and previous config saved to /var/cache/conftool/dbconfig/20241014-084643-arnaudb.json
- 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69753 and previous config saved to /var/cache/conftool/dbconfig/20241014-084620-arnaudb.json
- 08:43 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
- 08:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:40 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69752 and previous config saved to /var/cache/conftool/dbconfig/20241014-083113-arnaudb.json
- 08:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69751 and previous config saved to /var/cache/conftool/dbconfig/20241014-081606-arnaudb.json
- 08:13 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
- 08:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:12 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:10 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
- 08:08 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
- 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69750 and previous config saved to /var/cache/conftool/dbconfig/20241014-080744-arnaudb.json
- 08:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 08:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69749 and previous config saved to /var/cache/conftool/dbconfig/20241014-080721-arnaudb.json
- 08:07 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
- 08:02 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
- 08:01 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
- 08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69748 and previous config saved to /var/cache/conftool/dbconfig/20241014-080059-arnaudb.json
- 08:00 jayme@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM kubestagemaster2005.codfw.wmnet
- 08:00 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
- 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69747 and previous config saved to /var/cache/conftool/dbconfig/20241014-075845-arnaudb.json
- 07:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 07:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69746 and previous config saved to /var/cache/conftool/dbconfig/20241014-075823-arnaudb.json
- 07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69745 and previous config saved to /var/cache/conftool/dbconfig/20241014-075214-arnaudb.json
- 07:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69744 and previous config saved to /var/cache/conftool/dbconfig/20241014-074317-arnaudb.json
- 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69743 and previous config saved to /var/cache/conftool/dbconfig/20241014-073707-arnaudb.json
- 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69742 and previous config saved to /var/cache/conftool/dbconfig/20241014-072810-arnaudb.json
- 07:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69741 and previous config saved to /var/cache/conftool/dbconfig/20241014-072201-arnaudb.json
- 07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69740 and previous config saved to /var/cache/conftool/dbconfig/20241014-071302-arnaudb.json
- 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69739 and previous config saved to /var/cache/conftool/dbconfig/20241014-071048-arnaudb.json
- 07:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69738 and previous config saved to /var/cache/conftool/dbconfig/20241014-071026-arnaudb.json
- 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69737 and previous config saved to /var/cache/conftool/dbconfig/20241014-065519-arnaudb.json
- 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69736 and previous config saved to /var/cache/conftool/dbconfig/20241014-064012-arnaudb.json
- 06:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69735 and previous config saved to /var/cache/conftool/dbconfig/20241014-062505-arnaudb.json
- 06:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69734 and previous config saved to /var/cache/conftool/dbconfig/20241014-062249-arnaudb.json
- 06:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 06:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69733 and previous config saved to /var/cache/conftool/dbconfig/20241014-062135-arnaudb.json
- 06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 06:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 06:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 04:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 04:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69732 and previous config saved to /var/cache/conftool/dbconfig/20241014-042443-ladsgroup.json
- 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69731 and previous config saved to /var/cache/conftool/dbconfig/20241014-040936-ladsgroup.json
- 03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69730 and previous config saved to /var/cache/conftool/dbconfig/20241014-035429-ladsgroup.json
- 03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69729 and previous config saved to /var/cache/conftool/dbconfig/20241014-033922-ladsgroup.json
- 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69728 and previous config saved to /var/cache/conftool/dbconfig/20241014-033237-ladsgroup.json
- 03:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 03:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 03:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 03:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 03:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69727 and previous config saved to /var/cache/conftool/dbconfig/20241014-032710-ladsgroup.json
- 03:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69726 and previous config saved to /var/cache/conftool/dbconfig/20241014-031203-ladsgroup.json
- 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69725 and previous config saved to /var/cache/conftool/dbconfig/20241014-025656-ladsgroup.json
- 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69724 and previous config saved to /var/cache/conftool/dbconfig/20241014-024149-ladsgroup.json
- 02:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69723 and previous config saved to /var/cache/conftool/dbconfig/20241014-023616-ladsgroup.json
- 02:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 02:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69722 and previous config saved to /var/cache/conftool/dbconfig/20241014-023551-ladsgroup.json
- 02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69721 and previous config saved to /var/cache/conftool/dbconfig/20241014-022044-ladsgroup.json
- 02:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69720 and previous config saved to /var/cache/conftool/dbconfig/20241014-020537-ladsgroup.json
- 01:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69719 and previous config saved to /var/cache/conftool/dbconfig/20241014-015030-ladsgroup.json
- 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69718 and previous config saved to /var/cache/conftool/dbconfig/20241014-014435-ladsgroup.json
- 01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69717 and previous config saved to /var/cache/conftool/dbconfig/20241014-014410-ladsgroup.json
- 01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69716 and previous config saved to /var/cache/conftool/dbconfig/20241014-012903-ladsgroup.json
- 01:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69715 and previous config saved to /var/cache/conftool/dbconfig/20241014-011356-ladsgroup.json
- 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69714 and previous config saved to /var/cache/conftool/dbconfig/20241014-005849-ladsgroup.json
- 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69713 and previous config saved to /var/cache/conftool/dbconfig/20241014-005056-ladsgroup.json
- 00:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69712 and previous config saved to /var/cache/conftool/dbconfig/20241014-005042-ladsgroup.json
- 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69711 and previous config saved to /var/cache/conftool/dbconfig/20241014-003534-ladsgroup.json
- 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69710 and previous config saved to /var/cache/conftool/dbconfig/20241014-002027-ladsgroup.json
- 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69709 and previous config saved to /var/cache/conftool/dbconfig/20241014-000520-ladsgroup.json
2024-10-13
- 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69708 and previous config saved to /var/cache/conftool/dbconfig/20241013-235726-ladsgroup.json
- 23:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 23:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69707 and previous config saved to /var/cache/conftool/dbconfig/20241013-235701-ladsgroup.json
- 23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69706 and previous config saved to /var/cache/conftool/dbconfig/20241013-234154-ladsgroup.json
- 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69705 and previous config saved to /var/cache/conftool/dbconfig/20241013-232647-ladsgroup.json
- 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69704 and previous config saved to /var/cache/conftool/dbconfig/20241013-231140-ladsgroup.json
- 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69703 and previous config saved to /var/cache/conftool/dbconfig/20241013-230403-ladsgroup.json
- 23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 12:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
- 12:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
- 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2147', diff saved to https://phabricator.wikimedia.org/P69702 and previous config saved to /var/cache/conftool/dbconfig/20241013-121154-arnaudb.json
- 10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69701 and previous config saved to /var/cache/conftool/dbconfig/20241013-102205-ladsgroup.json
- 10:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69700 and previous config saved to /var/cache/conftool/dbconfig/20241013-100658-ladsgroup.json
- 09:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69699 and previous config saved to /var/cache/conftool/dbconfig/20241013-095151-ladsgroup.json
- 09:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69698 and previous config saved to /var/cache/conftool/dbconfig/20241013-093644-ladsgroup.json
2024-10-11
- 22:18 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
- 21:38 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
- 21:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
- 21:26 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
- 21:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
- 21:14 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:49 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
- 16:40 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 00m 42s)
- 16:39 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
- 16:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 01m 06s)
- 16:38 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
- 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
- 16:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
- 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 16:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 16:11 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@1fb69c4]: T376456 (duration: 01m 15s)
- 16:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:10 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@1fb69c4]: T376456
- 15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:40 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
- 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
- 15:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
- 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:48 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
- 14:48 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
- 14:47 urandom: upgrading data-gateway to v1.0.10
- 14:46 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
- 14:46 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
- 14:39 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
- 14:38 eevans@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
- 14:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 25s)
- 14:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
- 13:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: T376988', diff saved to https://phabricator.wikimedia.org/P69695 and previous config saved to /var/cache/conftool/dbconfig/20241011-135903-arnaudb.json
- 13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: T376988', diff saved to https://phabricator.wikimedia.org/P69694 and previous config saved to /var/cache/conftool/dbconfig/20241011-134357-arnaudb.json
- 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: T376988', diff saved to https://phabricator.wikimedia.org/P69693 and previous config saved to /var/cache/conftool/dbconfig/20241011-132852-arnaudb.json
- 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: T376988', diff saved to https://phabricator.wikimedia.org/P69692 and previous config saved to /var/cache/conftool/dbconfig/20241011-131347-arnaudb.json
- 13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
- 13:12 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
- 13:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: T376988', diff saved to https://phabricator.wikimedia.org/P69691 and previous config saved to /var/cache/conftool/dbconfig/20241011-125841-arnaudb.json
- 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: T376988', diff saved to https://phabricator.wikimedia.org/P69690 and previous config saved to /var/cache/conftool/dbconfig/20241011-124336-arnaudb.json
- 12:37 hashar: Restarting Gerrit
- 12:34 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scandium.eqiad.wmnet
- 12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
- 12:34 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
- 12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 2%: T376988', diff saved to https://phabricator.wikimedia.org/P69688 and previous config saved to /var/cache/conftool/dbconfig/20241011-122830-arnaudb.json
- 12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: T376988', diff saved to https://phabricator.wikimedia.org/P69687 and previous config saved to /var/cache/conftool/dbconfig/20241011-121325-arnaudb.json
- 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69686 and previous config saved to /var/cache/conftool/dbconfig/20241011-114446-ladsgroup.json
- 11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 11:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69685 and previous config saved to /var/cache/conftool/dbconfig/20241011-114424-ladsgroup.json
- 11:36 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
- 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69684 and previous config saved to /var/cache/conftool/dbconfig/20241011-112917-ladsgroup.json
- 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2092.codfw.wmnet
- 11:27 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2092.codfw.wmnet
- 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2092.codfw.wmnet
- 11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2092.codfw.wmnet
- 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2092.codfw.wmnet with OS bullseye
- 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69683 and previous config saved to /var/cache/conftool/dbconfig/20241011-111410-ladsgroup.json
- 11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
- 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69682 and previous config saved to /var/cache/conftool/dbconfig/20241011-105903-ladsgroup.json
- 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
- 10:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
- 10:56 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
- 10:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
- 10:53 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
- 10:50 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
- 10:50 fabfur: enabled puppet on R:acme_chief::cert for T376800
- 10:50 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 10:47 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host acmechief2002.codfw.wmnet
- 10:44 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief2002.codfw.wmnet
- 10:44 fabfur: rebooting acmechief1002|2002 (sequentially) (T376800)
- 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1002.eqiad.wmnet
- 10:37 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief1002.eqiad.wmnet
- 10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2092.codfw.wmnet with OS bullseye
- 10:34 fabfur: disabled puppet on acmechief1002 (T376800)
- 10:33 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
- 10:33 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
- 10:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 10:27 jynus@cumin1002: dbctl commit (dc=all): 'depool db2175', diff saved to https://phabricator.wikimedia.org/P69680 and previous config saved to /var/cache/conftool/dbconfig/20241011-102706-jynus.json
- 10:26 fabfur: disabling puppet on R:acme_chief::cert for T376800
- 10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69678 and previous config saved to /var/cache/conftool/dbconfig/20241011-095847-ladsgroup.json
- 09:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 09:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69677 and previous config saved to /var/cache/conftool/dbconfig/20241011-095826-ladsgroup.json
- 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69676 and previous config saved to /var/cache/conftool/dbconfig/20241011-094319-ladsgroup.json
- 09:41 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
- 09:38 akosiaris@cumin1002: START - Cookbook sre.hosts.decommission for hosts scandium.eqiad.wmnet
- 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69675 and previous config saved to /var/cache/conftool/dbconfig/20241011-092812-ladsgroup.json
- 09:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:18 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69674 and previous config saved to /var/cache/conftool/dbconfig/20241011-091305-ladsgroup.json
- 08:19 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 08:12 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 08:10 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 08:10 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 08:02 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 08:00 moritzm: upload ircstream 0.13.0+wmf12u2 to apt.wikimedia.org (sync to latest git and the async_broadcast feature branch) T376014
- 07:59 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 07:56 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 02:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69673 and previous config saved to /var/cache/conftool/dbconfig/20241011-021156-arnaudb.json
- 01:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69672 and previous config saved to /var/cache/conftool/dbconfig/20241011-015649-arnaudb.json
- 01:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69671 and previous config saved to /var/cache/conftool/dbconfig/20241011-014142-arnaudb.json
- 01:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69670 and previous config saved to /var/cache/conftool/dbconfig/20241011-012635-arnaudb.json
- 01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69669 and previous config saved to /var/cache/conftool/dbconfig/20241011-012424-arnaudb.json
- 01:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 01:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69668 and previous config saved to /var/cache/conftool/dbconfig/20241011-012401-arnaudb.json
- 01:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69667 and previous config saved to /var/cache/conftool/dbconfig/20241011-010854-arnaudb.json
- 00:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69666 and previous config saved to /var/cache/conftool/dbconfig/20241011-005347-arnaudb.json
- 00:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69665 and previous config saved to /var/cache/conftool/dbconfig/20241011-003840-arnaudb.json
2024-10-10
- 23:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69664 and previous config saved to /var/cache/conftool/dbconfig/20241010-233814-arnaudb.json
- 23:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 23:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69663 and previous config saved to /var/cache/conftool/dbconfig/20241010-233752-arnaudb.json
- 23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69662 and previous config saved to /var/cache/conftool/dbconfig/20241010-232245-arnaudb.json
- 23:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69661 and previous config saved to /var/cache/conftool/dbconfig/20241010-230738-arnaudb.json
- 22:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69660 and previous config saved to /var/cache/conftool/dbconfig/20241010-225231-arnaudb.json
- 22:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69659 and previous config saved to /var/cache/conftool/dbconfig/20241010-225019-arnaudb.json
- 22:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 22:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69658 and previous config saved to /var/cache/conftool/dbconfig/20241010-224957-arnaudb.json
- 22:37 cstone: payments-wiki upgraded from ebb42c67 to 40e4a592
- 22:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69657 and previous config saved to /var/cache/conftool/dbconfig/20241010-223450-arnaudb.json
- 22:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69656 and previous config saved to /var/cache/conftool/dbconfig/20241010-221943-arnaudb.json
- 22:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69655 and previous config saved to /var/cache/conftool/dbconfig/20241010-220437-arnaudb.json
- 22:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69654 and previous config saved to /var/cache/conftool/dbconfig/20241010-220125-arnaudb.json
- 22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 22:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 22:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 22:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69653 and previous config saved to /var/cache/conftool/dbconfig/20241010-220043-arnaudb.json
- 21:52 jforrester@deploy2002: Finished deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939 (duration: 00m 08s)
- 21:52 jforrester@deploy2002: Started deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939
- 21:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69652 and previous config saved to /var/cache/conftool/dbconfig/20241010-214536-arnaudb.json
- 21:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69651 and previous config saved to /var/cache/conftool/dbconfig/20241010-213029-arnaudb.json
- 21:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69650 and previous config saved to /var/cache/conftool/dbconfig/20241010-211522-arnaudb.json
- 21:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e] (duration: 00m 51s)
- 21:04 aqu@deploy2002: Started deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e]
- 21:04 thcipriani@deploy2002: Finished scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901) (duration: 08m 56s)
- 20:59 thcipriani@deploy2002: jforrester, thcipriani: Continuing with sync
- 20:57 thcipriani@deploy2002: jforrester, thcipriani: Backport for Update VE core submodule to master (c98f3a542) (T376901) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:55 thcipriani@deploy2002: Started scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901)
- 20:27 eileen: config revision changed from 150b02a9 to 3c6d2054
- 20:23 thcipriani@deploy2002: Finished scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) (duration: 08m 34s)
- 20:18 thcipriani@deploy2002: bpirkle, thcipriani: Continuing with sync
- 20:16 thcipriani@deploy2002: bpirkle, thcipriani: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69649 and previous config saved to /var/cache/conftool/dbconfig/20241010-201456-arnaudb.json
- 20:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 20:14 thcipriani@deploy2002: Started scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512)
- 20:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69648 and previous config saved to /var/cache/conftool/dbconfig/20241010-201433-arnaudb.json
- 20:05 eileen: civicrm upgraded from 07dee21c to ff3144dd
- 19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69647 and previous config saved to /var/cache/conftool/dbconfig/20241010-195926-arnaudb.json
- 19:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69646 and previous config saved to /var/cache/conftool/dbconfig/20241010-194419-arnaudb.json
- 19:43 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 19:43 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503]
- 19:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69645 and previous config saved to /var/cache/conftool/dbconfig/20241010-192912-arnaudb.json
- 19:23 rzl@deploy2002: Finished scap sync-world: chart version bump for 1078720 (duration: 02m 09s)
- 19:21 rzl@deploy2002: Started scap sync-world: chart version bump for 1078720
- 19:06 eileen: config revision changed from ae4a5be9 to 150b02a9
- 18:50 papaul: maintenance on mr1-eqiad complete
- 18:44 eileen: tools upgraded from 632bf430 to 62f2d170
- 18:29 eileen: tools upgraded from e9c05e30 to 632bf430
- 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69644 and previous config saved to /var/cache/conftool/dbconfig/20241010-182846-arnaudb.json
- 18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69643 and previous config saved to /var/cache/conftool/dbconfig/20241010-182808-arnaudb.json
- 18:14 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69642 and previous config saved to /var/cache/conftool/dbconfig/20241010-181301-arnaudb.json
- 18:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 18:00 papaul: ongoing maintenance on mr1-eqiad
- 17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69641 and previous config saved to /var/cache/conftool/dbconfig/20241010-175754-arnaudb.json
- 17:57 root@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 17:54 root@cumin1002: START - Cookbook sre.puppet.renew-cert for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 17:47 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
- 17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69640 and previous config saved to /var/cache/conftool/dbconfig/20241010-174247-arnaudb.json
- 17:42 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
- 17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
- 17:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
- 17:38 swfrench-wmf: removing echostore eqiad deployment (depooled) to unblock breaking change - T376766
- 17:34 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:34 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:34 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:33 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:33 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:32 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:25 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
- 17:20 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
- 17:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 17:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 17:04 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
- 16:59 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
- 16:58 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
- 16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:51 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
- 16:51 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
- 16:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
- 16:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
- 16:49 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
- 16:47 swfrench-wmf: removing echostore codfw deployment (depooled) to unblock breaking change - T376766
- 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69639 and previous config saved to /var/cache/conftool/dbconfig/20241010-164221-arnaudb.json
- 16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 16:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69638 and previous config saved to /var/cache/conftool/dbconfig/20241010-164159-arnaudb.json
- 16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
- 16:30 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
- 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69637 and previous config saved to /var/cache/conftool/dbconfig/20241010-162652-arnaudb.json
- 16:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 16:23 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
- 16:21 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 16:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
- 16:13 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
- 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69636 and previous config saved to /var/cache/conftool/dbconfig/20241010-161145-arnaudb.json
- 16:04 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
- 16:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
- 16:02 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
- 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69635 and previous config saved to /var/cache/conftool/dbconfig/20241010-155638-arnaudb.json
- 15:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69634 and previous config saved to /var/cache/conftool/dbconfig/20241010-155426-arnaudb.json
- 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 15:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 15:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69633 and previous config saved to /var/cache/conftool/dbconfig/20241010-155345-arnaudb.json
- 15:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
- 15:40 papaul: mr1-drmrs maintenance complete
- 15:39 dancy@deploy2002: Installation of scap version "4.110.0" completed for 211 hosts
- 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69632 and previous config saved to /var/cache/conftool/dbconfig/20241010-153838-arnaudb.json
- 15:35 dancy@deploy2002: Installing scap version "4.110.0" for 211 hosts
- 15:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:28 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 15:25 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 15:23 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
- 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69631 and previous config saved to /var/cache/conftool/dbconfig/20241010-152331-arnaudb.json
- 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69630 and previous config saved to /var/cache/conftool/dbconfig/20241010-150824-arnaudb.json
- 15:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69629 and previous config saved to /var/cache/conftool/dbconfig/20241010-150512-arnaudb.json
- 15:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69628 and previous config saved to /var/cache/conftool/dbconfig/20241010-150433-arnaudb.json
- 15:02 papaul: ongoing maintenance on mr1-drmrs
- 14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 14:56 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503]
- 14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69626 and previous config saved to /var/cache/conftool/dbconfig/20241010-144926-arnaudb.json
- 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69625 and previous config saved to /var/cache/conftool/dbconfig/20241010-143713-arnaudb.json
- 14:34 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
- 14:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69624 and previous config saved to /var/cache/conftool/dbconfig/20241010-143419-arnaudb.json
- 14:28 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
- 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69623 and previous config saved to /var/cache/conftool/dbconfig/20241010-142206-arnaudb.json
- 14:19 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 14:19 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69622 and previous config saved to /var/cache/conftool/dbconfig/20241010-141912-arnaudb.json
- 14:18 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:18 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69621 and previous config saved to /var/cache/conftool/dbconfig/20241010-141704-arnaudb.json
- 14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 14:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:16 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
- 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69620 and previous config saved to /var/cache/conftool/dbconfig/20241010-141642-arnaudb.json
- 14:16 moritzm: failover Ganeti masters in magru to secondary node
- 14:12 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69619 and previous config saved to /var/cache/conftool/dbconfig/20241010-140659-arnaudb.json
- 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
- 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
- 14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69618 and previous config saved to /var/cache/conftool/dbconfig/20241010-140135-arnaudb.json
- 13:59 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
- 13:59 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
- 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69617 and previous config saved to /var/cache/conftool/dbconfig/20241010-135152-arnaudb.json
- 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
- 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69616 and previous config saved to /var/cache/conftool/dbconfig/20241010-134926-arnaudb.json
- 13:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 13:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 13:48 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
- 13:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69615 and previous config saved to /var/cache/conftool/dbconfig/20241010-134628-arnaudb.json
- 13:46 Lucas_WMDE: UTC afternoon backport+config window done
- 13:45 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245) (duration: 07m 16s)
- 13:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
- 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
- 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
- 13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Continuing with sync
- 13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Backport for Use ?? instead of default value in getRawVal() (T376245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:38 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245)
- 13:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) (duration: 16m 09s)
- 13:36 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
- 13:35 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns4003.wikimedia.org
- 13:35 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns4003.wikimedia.org
- 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
- 13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Continuing with sync
- 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69613 and previous config saved to /var/cache/conftool/dbconfig/20241010-133121-arnaudb.json
- 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69612 and previous config saved to /var/cache/conftool/dbconfig/20241010-133113-arnaudb.json
- 13:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 13:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69611 and previous config saved to /var/cache/conftool/dbconfig/20241010-133049-arnaudb.json
- 13:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
- 13:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433)
- 13:17 dreamyjazz@deploy2002: Finished scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) (duration: 09m 12s)
- 13:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
- 13:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
- 13:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69610 and previous config saved to /var/cache/conftool/dbconfig/20241010-131542-arnaudb.json
- 13:12 dreamyjazz@deploy2002: dreamyjazz, kharlan: Continuing with sync
- 13:11 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1004.eqiad.wmnet
- 13:11 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1004.eqiad.wmnet
- 13:10 dreamyjazz@deploy2002: dreamyjazz, kharlan: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
- 13:08 dreamyjazz@deploy2002: Started scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517)
- 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
- 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
- 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
- 13:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69609 and previous config saved to /var/cache/conftool/dbconfig/20241010-130035-arnaudb.json
- 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
- 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
- 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
- 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
- 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
- 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69608 and previous config saved to /var/cache/conftool/dbconfig/20241010-124528-arnaudb.json
- 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69607 and previous config saved to /var/cache/conftool/dbconfig/20241010-124319-arnaudb.json
- 12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 12:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 12:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 12:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69606 and previous config saved to /var/cache/conftool/dbconfig/20241010-124241-arnaudb.json
- 12:38 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bookworm
- 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
- 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
- 12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69605 and previous config saved to /var/cache/conftool/dbconfig/20241010-122734-arnaudb.json
- 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
- 12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 12:16 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69604 and previous config saved to /var/cache/conftool/dbconfig/20241010-121227-arnaudb.json
- 12:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bookworm
- 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69603 and previous config saved to /var/cache/conftool/dbconfig/20241010-115720-arnaudb.json
- 11:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69599 and previous config saved to /var/cache/conftool/dbconfig/20241010-114042-arnaudb.json
- 11:34 zabe@deploy2002: Finished scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 06m 58s)
- 11:29 zabe@deploy2002: zabe: Continuing with sync
- 11:29 zabe@deploy2002: zabe: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:27 zabe@deploy2002: Started scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
- 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69598 and previous config saved to /var/cache/conftool/dbconfig/20241010-112535-arnaudb.json
- 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
- 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7001.magru.wmnet
- 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7001.magru.wmnet
- 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
- 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
- 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69597 and previous config saved to /var/cache/conftool/dbconfig/20241010-111028-arnaudb.json
- 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69596 and previous config saved to /var/cache/conftool/dbconfig/20241010-110920-arnaudb.json
- 11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 11:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69595 and previous config saved to /var/cache/conftool/dbconfig/20241010-110857-arnaudb.json
- 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
- 10:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
- 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
- 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69594 and previous config saved to /var/cache/conftool/dbconfig/20241010-105350-arnaudb.json
- 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
- 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
- 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
- 10:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
- 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69593 and previous config saved to /var/cache/conftool/dbconfig/20241010-103843-arnaudb.json
- 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
- 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69592 and previous config saved to /var/cache/conftool/dbconfig/20241010-102336-arnaudb.json
- 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
- 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69591 and previous config saved to /var/cache/conftool/dbconfig/20241010-102127-arnaudb.json
- 10:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 10:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69590 and previous config saved to /var/cache/conftool/dbconfig/20241010-102104-arnaudb.json
- 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69589 and previous config saved to /var/cache/conftool/dbconfig/20241010-100557-arnaudb.json
- 09:54 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubestage1004.eqiad.wmnet
- 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1002.wikimedia.org
- 09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1004.eqiad.wmnet
- 09:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
- 09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
- 09:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69587 and previous config saved to /var/cache/conftool/dbconfig/20241010-095050-arnaudb.json
- 09:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
- 09:49 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1002.wikimedia.org
- 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69586 and previous config saved to /var/cache/conftool/dbconfig/20241010-093544-arnaudb.json
- 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69585 and previous config saved to /var/cache/conftool/dbconfig/20241010-093335-arnaudb.json
- 09:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 09:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69584 and previous config saved to /var/cache/conftool/dbconfig/20241010-093313-arnaudb.json
- 09:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 09:30 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69583 and previous config saved to /var/cache/conftool/dbconfig/20241010-092735-arnaudb.json
- 09:21 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.26 refs T375657
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69582 and previous config saved to /var/cache/conftool/dbconfig/20241010-091806-arnaudb.json
- 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
- 09:14 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
- 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69581 and previous config saved to /var/cache/conftool/dbconfig/20241010-091228-arnaudb.json
- 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
- 09:10 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
- 09:10 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
- 09:07 aklapper@deploy2002: Finished scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) (duration: 12m 09s)
- 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
- 09:03 aklapper@deploy2002: hashar, aklapper: Continuing with sync
- 09:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69580 and previous config saved to /var/cache/conftool/dbconfig/20241010-090259-arnaudb.json
- 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
- 08:57 aklapper@deploy2002: hashar, aklapper: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69579 and previous config saved to /var/cache/conftool/dbconfig/20241010-085721-arnaudb.json
- 08:55 aklapper@deploy2002: Started scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)
- 08:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69578 and previous config saved to /var/cache/conftool/dbconfig/20241010-084752-arnaudb.json
- 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69577 and previous config saved to /var/cache/conftool/dbconfig/20241010-084543-arnaudb.json
- 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69576 and previous config saved to /var/cache/conftool/dbconfig/20241010-084521-arnaudb.json
- 08:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69575 and previous config saved to /var/cache/conftool/dbconfig/20241010-084214-arnaudb.json
- 08:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
- 08:41 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
- 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69574 and previous config saved to /var/cache/conftool/dbconfig/20241010-084003-arnaudb.json
- 08:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 08:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 08:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: T376868', diff saved to https://phabricator.wikimedia.org/P69573 and previous config saved to /var/cache/conftool/dbconfig/20241010-083347-arnaudb.json
- 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69572 and previous config saved to /var/cache/conftool/dbconfig/20241010-083013-arnaudb.json
- 08:21 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
- 08:21 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: T376868', diff saved to https://phabricator.wikimedia.org/P69571 and previous config saved to /var/cache/conftool/dbconfig/20241010-081841-arnaudb.json
- 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
- 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69570 and previous config saved to /var/cache/conftool/dbconfig/20241010-081506-arnaudb.json
- 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: T376867', diff saved to https://phabricator.wikimedia.org/P69569 and previous config saved to /var/cache/conftool/dbconfig/20241010-080711-arnaudb.json
- 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1002.eqiad.wmnet
- 08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: T376868', diff saved to https://phabricator.wikimedia.org/P69568 and previous config saved to /var/cache/conftool/dbconfig/20241010-080336-arnaudb.json
- 08:02 moritzm: irc.wikimedia.org now directs to the ircstream implementation on irc1003.wikimedia.org T376014
- 08:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69567 and previous config saved to /var/cache/conftool/dbconfig/20241010-075959-arnaudb.json
- 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69566 and previous config saved to /var/cache/conftool/dbconfig/20241010-075951-arnaudb.json
- 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
- 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69565 and previous config saved to /var/cache/conftool/dbconfig/20241010-075911-arnaudb.json
- 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: T376867', diff saved to https://phabricator.wikimedia.org/P69564 and previous config saved to /var/cache/conftool/dbconfig/20241010-075206-arnaudb.json
- 07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: T376868', diff saved to https://phabricator.wikimedia.org/P69563 and previous config saved to /var/cache/conftool/dbconfig/20241010-074831-arnaudb.json
- 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
- 07:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69562 and previous config saved to /var/cache/conftool/dbconfig/20241010-074404-arnaudb.json
- 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
- 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: T376867', diff saved to https://phabricator.wikimedia.org/P69561 and previous config saved to /var/cache/conftool/dbconfig/20241010-073700-arnaudb.json
- 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudidm2001-dev.codfw.wmnet
- 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
- 07:33 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
- 07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: T376868', diff saved to https://phabricator.wikimedia.org/P69560 and previous config saved to /var/cache/conftool/dbconfig/20241010-073326-arnaudb.json
- 07:33 awight: UTC morning deployments done.
- 07:32 hashar: Stopped gerrit service on gerrit2003.codfw.wmnet since it is not starting up properly | T372804
- 07:32 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:31 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:30 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
- 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69559 and previous config saved to /var/cache/conftool/dbconfig/20241010-072857-arnaudb.json
- 07:28 awight@deploy2002: Finished scap sync-world: Backport for [config] Rename moved gadget name setting (T362771) (duration: 09m 22s)
- 07:25 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
- 07:23 awight@deploy2002: awight, wmde-fisch: Continuing with sync
- 07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: T376867', diff saved to https://phabricator.wikimedia.org/P69558 and previous config saved to /var/cache/conftool/dbconfig/20241010-072155-arnaudb.json
- 07:21 awight@deploy2002: awight, wmde-fisch: Backport for [config] Rename moved gadget name setting (T362771) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
- 07:18 awight@deploy2002: Started scap sync-world: Backport for [config] Rename moved gadget name setting (T362771)
- 07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: T376868', diff saved to https://phabricator.wikimedia.org/P69557 and previous config saved to /var/cache/conftool/dbconfig/20241010-071820-arnaudb.json
- 07:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1236 T376868', diff saved to https://phabricator.wikimedia.org/P69556 and previous config saved to /var/cache/conftool/dbconfig/20241010-071721-arnaudb.json
- 07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:15 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cloudidm2001-dev.codfw.wmnet
- 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
- 07:15 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
- 07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary T376868', diff saved to https://phabricator.wikimedia.org/P69555 and previous config saved to /var/cache/conftool/dbconfig/20241010-071453-arnaudb.json
- 07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:14 arnaudb: Starting s7 eqiad failover from db1236 to db1181 - T376868
- 07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69554 and previous config saved to /var/cache/conftool/dbconfig/20241010-071350-arnaudb.json
- 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69553 and previous config saved to /var/cache/conftool/dbconfig/20241010-071242-arnaudb.json
- 07:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69552 and previous config saved to /var/cache/conftool/dbconfig/20241010-071219-arnaudb.json
- 07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T376868', diff saved to https://phabricator.wikimedia.org/P69551 and previous config saved to /var/cache/conftool/dbconfig/20241010-070843-arnaudb.json
- 07:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
- 07:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
- 07:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: T376867', diff saved to https://phabricator.wikimedia.org/P69550 and previous config saved to /var/cache/conftool/dbconfig/20241010-070650-arnaudb.json
- 06:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69549 and previous config saved to /var/cache/conftool/dbconfig/20241010-065712-arnaudb.json
- 06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: T376867', diff saved to https://phabricator.wikimedia.org/P69548 and previous config saved to /var/cache/conftool/dbconfig/20241010-065145-arnaudb.json
- 06:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1230 T376867', diff saved to https://phabricator.wikimedia.org/P69547 and previous config saved to /var/cache/conftool/dbconfig/20241010-065048-arnaudb.json
- 06:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary T376867', diff saved to https://phabricator.wikimedia.org/P69546 and previous config saved to /var/cache/conftool/dbconfig/20241010-064827-arnaudb.json
- 06:47 arnaudb: Starting s5 eqiad failover from db1230 to db1183 - T376867
- 06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T376867', diff saved to https://phabricator.wikimedia.org/P69545 and previous config saved to /var/cache/conftool/dbconfig/20241010-064219-arnaudb.json
- 06:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
- 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69544 and previous config saved to /var/cache/conftool/dbconfig/20241010-064206-arnaudb.json
- 06:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
- 06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 06:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69543 and previous config saved to /var/cache/conftool/dbconfig/20241010-062659-arnaudb.json
- 06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69542 and previous config saved to /var/cache/conftool/dbconfig/20241010-062450-arnaudb.json
- 06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 06:10 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 06:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 06:03 XioNoX: cr2-eqsin> request vmhost snapshot - T375961
- 03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69541 and previous config saved to /var/cache/conftool/dbconfig/20241010-031553-ladsgroup.json
- 03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69540 and previous config saved to /var/cache/conftool/dbconfig/20241010-031531-ladsgroup.json
- 03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69539 and previous config saved to /var/cache/conftool/dbconfig/20241010-030048-ladsgroup.json
- 03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69538 and previous config saved to /var/cache/conftool/dbconfig/20241010-030025-ladsgroup.json
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69537 and previous config saved to /var/cache/conftool/dbconfig/20241010-024543-ladsgroup.json
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69536 and previous config saved to /var/cache/conftool/dbconfig/20241010-024519-ladsgroup.json
- 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69535 and previous config saved to /var/cache/conftool/dbconfig/20241010-023037-ladsgroup.json
- 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69534 and previous config saved to /var/cache/conftool/dbconfig/20241010-023014-ladsgroup.json
- 02:02 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
- 02:02 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
- 01:50 sukhe: restart bird on doh5001 and dns5003 to resolve flapping BFD session after cr2-eqsin junos upgrade
- 01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
- 00:46 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1006.eqiad.wmnet
- 00:41 eileen: civicrm upgraded from 3b6a7cbb to 07dee21c
- 00:27 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
- 00:26 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
- 00:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
- 00:19 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus2005.codfw.wmnet
- 00:02 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
- 00:02 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
2024-10-09
- 23:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 23:51 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 23:49 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2003.wikimedia.org
- 23:43 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 23:41 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1005.eqiad.wmnet
- 23:26 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
- 23:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
- 23:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
- 23:07 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 23:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 22:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
- 22:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 22:51 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
- 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69532 and previous config saved to /var/cache/conftool/dbconfig/20241009-225055-ladsgroup.json
- 22:40 eileen: civicrm upgraded from cc7c7744 to 3b6a7cbb
- 22:35 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:30 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:28 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:28 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
- 22:01 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
- 22:00 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
- 21:57 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
- 21:57 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
- 21:55 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:54 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:48 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:47 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
- 21:45 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
- 21:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
- 21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:44 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:42 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:42 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69531 and previous config saved to /var/cache/conftool/dbconfig/20241009-214117-ladsgroup.json
- 21:41 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:32 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
- 21:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
- 21:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69530 and previous config saved to /var/cache/conftool/dbconfig/20241009-212612-ladsgroup.json
- 21:22 mutante: [apt1002:~] $ sudo -i reprepro --component thirdparty/gitlab-bullseye update bullseye-wikimedia
- 21:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
- 21:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69529 and previous config saved to /var/cache/conftool/dbconfig/20241009-211107-ladsgroup.json
- 21:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
- 20:56 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
- 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69528 and previous config saved to /var/cache/conftool/dbconfig/20241009-205601-ladsgroup.json
- 20:44 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
- 20:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
- 20:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
- 20:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
- 20:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
- 20:12 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 20:12 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 20:08 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and A:dnsbox
- 20:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
- 19:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
- 19:55 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and A:dnsbox
- 19:55 swfrench-wmf: removing echostore staging deployment to unblock breaking change - T376766
- 19:46 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and A:dnsbox
- 19:46 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
- 19:38 mforns@deploy2002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:38 mforns@deploy2002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
- 19:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
- 19:35 mforns@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:35 mforns@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2002.codfw.wmnet with OS bookworm
- 19:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:27 mforns@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:27 mforns@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 19:20 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
- 19:05 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
- 19:04 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and A:dnsbox
- 19:04 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
- 19:04 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
- 18:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
- 18:45 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 18:41 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
- 18:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 18:35 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
- 18:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
- 18:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
- 18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:29 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 18:29 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 18:28 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
- 18:26 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
- 18:26 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus5002.eqsin.wmnet
- 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69527 and previous config saved to /var/cache/conftool/dbconfig/20241009-182632-ladsgroup.json
- 18:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:24 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
- 18:24 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
- 18:19 eileen: config revision changed from 739e8794 to ae4a5be9
- 18:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
- 18:16 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
- 18:16 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
- 18:15 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs[5004-5006].eqsin.wmnet
- 18:15 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs[5004-5006].eqsin.wmnet
- 18:15 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
- 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
- 18:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
- 18:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
- 18:08 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
- 18:06 eileen: civicrm upgraded from ae54bd5e to cc7c7744
- 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
- 18:01 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 18:01 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 17:58 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s5.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
- 17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
- 17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
- 17:51 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 17:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69526 and previous config saved to /var/cache/conftool/dbconfig/20241009-174501-ladsgroup.json
- 17:44 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
- 17:41 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
- 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
- 17:40 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 17:38 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
- 17:34 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
- 17:31 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
- 17:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69525 and previous config saved to /var/cache/conftool/dbconfig/20241009-172956-ladsgroup.json
- 17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
- 17:23 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
- 17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
- 17:21 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
- 17:13 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
- 17:12 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
- 17:12 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
- 16:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69523 and previous config saved to /var/cache/conftool/dbconfig/20241009-165944-ladsgroup.json
- 16:50 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 16:50 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 16:48 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 16:48 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facing cloudsw - cmooney@cumin1002"
- 16:44 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facing cloudsw - cmooney@cumin1002"
- 16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
- 16:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:32 bvibber: starting requeueTranscodes on old school mwmaint2002 after the k8s blowup last night
- 16:23 sukhe: running authdns-update to fix broken zone files on dns2004
- 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
- 16:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
- 16:21 sukhe: forcing commit 95858ba through sre.dns.netbox
- 16:20 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:05 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
- 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2005.wikimedia.org
- 15:58 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns2005.wikimedia.org
- 15:54 sukhe@cumin1002: END (ERROR) - Cookbook sre.dns.roll-reboot (exit_code=97) rolling reboot on A:dnsbox
- 15:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:53 sukhe: running authdns-update
- 15:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:52 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in2001.wikimedia.org
- 15:49 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
- 15:49 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
- 15:48 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in2001.wikimedia.org
- 15:48 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in1001.wikimedia.org
- 15:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
- 15:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in1001.wikimedia.org
- 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:43 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
- 15:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
- 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp.wikimedia.org on all recursors
- 15:26 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache idp.wikimedia.org on all recursors
- 15:25 fabfur: eqsin depooled for T375961
- 15:24 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
- 15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
- 15:24 fabfur@cumin1002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
- 15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
- 15:23 mutante: stewards* - rebooting machines - T351202
- 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
- 15:22 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
- 15:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
- 15:20 sukhe: running dummy authdns-update
- 15:19 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:17 mutante: planet.wikimedia.org - rebooting backends
- 15:09 mutante: people.wikimedia.org - rebooting backends
- 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
- 15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns1006.wikimedia.org
- 15:07 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns1006.wikimedia.org
- 15:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
- 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
- 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host crm2001.codfw.wmnet
- 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
- 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host crm2001.codfw.wmnet
- 14:59 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:58 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
- 14:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
- 14:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
- 14:51 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:51 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
- 14:50 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:50 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
- 14:47 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:47 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
- 14:47 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:45 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:44 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:44 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum
- 14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
- 14:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
- 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
- 14:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
- 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:31 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
- 14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:29 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
- 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69522 and previous config saved to /var/cache/conftool/dbconfig/20241009-142848-ladsgroup.json
- 14:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 14:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69521 and previous config saved to /var/cache/conftool/dbconfig/20241009-142826-ladsgroup.json
- 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
- 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69520 and previous config saved to /var/cache/conftool/dbconfig/20241009-142404-ladsgroup.json
- 14:23 moritzm: failover master for ganeti/routed to ganeti2033
- 14:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
- 14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
- 14:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
- 14:21 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
- 14:21 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
- 14:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
- 14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:20 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 14:20 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:18 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
- 14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
- 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69519 and previous config saved to /var/cache/conftool/dbconfig/20241009-141319-ladsgroup.json
- 14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
- 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
- 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:11 moritzm: installing Apache security updates
- 14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
- 14:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2004.wikimedia.org
- 14:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
- 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
- 14:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
- 14:03 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp2004.wikimedia.org
- 14:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
- 14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:01 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
- 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69517 and previous config saved to /var/cache/conftool/dbconfig/20241009-135812-ladsgroup.json
- 13:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
- 13:56 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
- 13:55 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
- 13:54 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
- 13:53 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:53 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 13:53 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
- 13:52 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox
- 13:52 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
- 13:51 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 13:51 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
- 13:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2004.wikimedia.org
- 13:50 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
- 13:50 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
- 13:50 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
- 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
- 13:49 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
- 13:48 Lucas_WMDE: UTC afternoon backport+config window done
- 13:48 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2004.wikimedia.org
- 13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) (duration: 07m 04s)
- 13:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1004.wikimedia.org
- 13:45 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
- 13:45 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host flink-zk1001.eqiad.wmnet
- 13:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
- 13:44 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Continuing with sync
- 13:44 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp1004.wikimedia.org
- 13:43 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1004.wikimedia.org
- 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69516 and previous config saved to /var/cache/conftool/dbconfig/20241009-134305-ladsgroup.json
- 13:42 brouberol@cumin1002: END (ERROR) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=97) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
- 13:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1028.eqiad.wmnet
- 13:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747)
- 13:41 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
- 13:39 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1004.wikimedia.org
- 13:39 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum
- 13:39 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 $ printf 'https://en.wikipedia.org/static/images/%s\n' 'project-logos/sdwiki.png' 'project-logos/sdwiki-1.5x.png' 'project-logos/sdwiki-2x.png' 'mobile/copyright/wikipedia-wordmark-sd.svg' 'mobile/copyright/wikipedia-tagline-sd.svg' | mwscript-k8s --attach -- purgeList.php # T376536
- 13:35 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536) (duration: 19m 34s)
- 13:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
- 13:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gerrit2003.wikimedia.org
- 13:31 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
- 13:30 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Continuing with sync
- 13:30 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
- 13:28 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
- 13:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 13:23 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 13:22 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1004.eqiad.wmnet
- 13:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Backport for sdwiki: Add new logo and tagline (T376536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad1004.eqiad.wmnet
- 13:16 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad2002.codfw.wmnet
- 13:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536)
- 13:14 kharlan@deploy2002: Finished scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) (duration: 10m 37s)
- 13:12 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad2002.codfw.wmnet
- 13:09 kharlan@deploy2002: kharlan: Continuing with sync
- 13:06 kharlan@deploy2002: kharlan: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:03 kharlan@deploy2002: Started scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517)
- 12:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki2002.codfw.wmnet
- 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 12:41 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 12:38 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 12:33 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts rpki2002.codfw.wmnet
- 12:24 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:24 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:23 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:23 jelto@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:18 moritzm: installing initramfs-tools bugfix updates from Bookworm point release
- 12:16 jelto@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:15 jelto@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:15 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:15 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:54 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b2c30ad]: T375153 (duration: 02m 32s)
- 11:52 jynus: ran systemctl start wmf_auto_restart_routinator.service on rpki2003
- 11:52 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b2c30ad]: T375153
- 11:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69513 and previous config saved to /var/cache/conftool/dbconfig/20241009-111154-ladsgroup.json
- 11:04 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 11:00 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 11:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69511 and previous config saved to /var/cache/conftool/dbconfig/20241009-105647-ladsgroup.json
- 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
- 10:44 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 10:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69507 and previous config saved to /var/cache/conftool/dbconfig/20241009-104142-ladsgroup.json
- 10:35 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 10:28 elukey: roll restart swift-proxy on ms-fe* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1078380
- 10:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1027.eqiad.wmnet
- 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69506 and previous config saved to /var/cache/conftool/dbconfig/20241009-102636-ladsgroup.json
- 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
- 10:11 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 09:42 Dreamy_Jazz: Started time-limited MediaModeration scan on enwiki for 16hrs to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1026.eqiad.wmnet
- 08:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:51 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:48 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 08:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:23 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1005.eqiad.wmnet
- 08:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1005.eqiad.wmnet
- 08:12 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.26 refs T375657
- 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
- 08:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1021.eqiad.wmnet
- 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
- 07:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1011.eqiad.wmnet
- 07:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 07:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:22 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 07:13 moritzm: remove ganeti2010 from active nodes T376594
- 06:37 eileen: civicrm upgraded from 251e958f to ae54bd5e
- 06:08 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 06:06 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 03:36 eileen: civicrm upgraded from 61718eae to 251e958f
- 01:26 eileen: tools upgraded from 3f7b238d to e9c05e30
- 00:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm
2024-10-08
- 22:36 tzatziki: removing 1 file for legal compliance
- 22:32 tzatziki: removing 3 files for legal compliance
- 22:16 tzatziki: removing 1 file for legal compliance
- 22:11 tzatziki: removing 3 files for legal compliance
- 21:59 tzatziki: removing 3 files for legal compliance
- 21:41 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
- 21:41 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
- 21:35 bvibber: running requeueTranscodes in k8s maint to clean up ios video transcodes (T363966)
- 21:34 mutante: gerrit2003 - sudo -u gerrit-deploy /usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False (for some reason this fails in puppet but works manually) T372804 T257317 T317412
- 21:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
- 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1022.eqiad.wmnet with OS bullseye
- 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 21:06 eileen: config revision changed from 9ba217d2 to c84a1354
- 21:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
- 20:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
- 20:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
- 20:54 cjming: end of UTC late backport window
- 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
- 20:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
- 20:54 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:52 cjming@deploy2002: Finished scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) (duration: 07m 39s)
- 20:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:48 cjming@deploy2002: bvibber, cjming: Continuing with sync
- 20:47 cjming@deploy2002: bvibber, cjming: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:45 cjming@deploy2002: Started scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966)
- 20:42 cjming@deploy2002: Finished scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit (duration: 09m 58s)
- 20:37 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
- 20:34 cjming@deploy2002: jdlrobson, cjming: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:32 cjming@deploy2002: Started scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit
- 20:29 cjming@deploy2002: Finished scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549) (duration: 19m 28s)
- 20:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:26 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
- 20:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:24 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
- 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:12 cjming@deploy2002: jdlrobson, cjming: Backport for Expand Vector 2022 roll out and support local variants (T375549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1012
- 20:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host backup1012
- 20:10 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
- 20:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
- 20:10 cjming@deploy2002: Started scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549)
- 20:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 18:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:54 swfrench-wmf: ran authdns-update on dns1004 to pick up mwdebug-next record - T372604
- 18:50 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mwdebug-next,name=codfw [reason: pooling mwdebug-next in codfw to match mwdebug - T372604]
- 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
- 18:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
- 18:43 cdanis: 💔cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin -b1 -s120 A:dnsbox 'run-puppet-agent --enable "cdanis rolling out T344171 Ie7d5091bca40"'
- 18:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:40 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin A:dnsbox 'disable-puppet "cdanis rolling out T344171 Ie7d5091bca40"'
- 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 18:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:34 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 17:45 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
- 17:39 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
- 17:35 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
- 17:35 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
- 17:34 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:codfw' - T372604
- 17:27 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
- 17:21 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
- 17:17 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
- 17:12 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
- 17:09 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:eqiad' - T372604
- 17:04 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T372604
- 16:57 moritzm: enable Puppet fleet-wide for puppetmaster1001 hardware maintenance
- 16:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused (duration: 06m 50s)
- 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
- 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
- 16:44 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 16:44 dreamyjazz@deploy2002: dreamyjazz: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
- 16:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
- 16:42 dreamyjazz@deploy2002: Started scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused
- 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
- 16:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
- 16:37 moritzm: disable Puppet fleet-wide for puppetmaster1001 hardware maintenance
- 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
- 16:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 16:25 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 16:24 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad
- 16:23 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
- 16:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 16:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
- 16:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 16:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:41 papaul: mr1-magru end of maintenance
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
- 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
- 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
- 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
- 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
- 15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
- 15:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
- 15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
- 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
- 15:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
- 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 15:26 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
- 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 15:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
- 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720 (duration: 01m 07s)
- 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720
- 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720 (duration: 00m 26s)
- 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720
- 15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
- 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
- 15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
- 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
- 15:02 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
- 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
- 15:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
- 15:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
- 14:58 papaul: mr1-magru ongoing maintenance
- 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
- 14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:47 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
- 14:41 moritzm: installing python-aiosmtpd security updates
- 14:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
- 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
- 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
- 14:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1010.eqiad.wmnet
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
- 14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 14:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
- 14:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 14:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 14:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
- 14:15 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
- 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
- 14:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
- 14:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1009.eqiad.wmnet
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 14:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:59 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) (duration: 07m 03s)
- 13:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:49 zabe@deploy2002: zabe: Continuing with sync
- 13:48 zabe@deploy2002: zabe: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:46 zabe@deploy2002: Started scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180)
- 13:46 zabe@deploy2002: Finished scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 10s)
- 13:41 zabe@deploy2002: zabe: Continuing with sync
- 13:41 zabe@deploy2002: zabe: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 zabe@deploy2002: Started scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490)
- 13:33 Lucas_WMDE: UTC afternoon backport+config window done
- 13:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) (duration: 06m 56s)
- 13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Continuing with sync
- 13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795)
- 13:24 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 13:24 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) (duration: 08m 17s)
- 13:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:07 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Continuing with sync
- 13:06 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049)
- 12:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:57 Amir1: dropping povwatch_log on all.dblist (T54924 and T376627)
- 12:55 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti2036.codfw.wmnet
- 12:53 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Remove flow from techconductwiki (T332022) (duration: 09m 27s)
- 12:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:45 moritzm: installing lua5.4 bugfix updates
- 12:44 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:42 ladsgroup@deploy2002: ladsgroup: Backport for Remove flow from techconductwiki (T332022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Remove flow from techconductwiki (T332022)
- 12:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 12:29 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 12:26 moritzm: remove ganeti2009 from active nodes T376594
- 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
- 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
- 12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
- 12:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1008.eqiad.wmnet
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
- 12:01 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 11:56 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 11:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1007.eqiad.wmnet
- 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1006.eqiad.wmnet
- 11:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
- 11:33 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 11:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2002.codfw.wmnet
- 11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2002.codfw.wmnet
- 11:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1006.eqiad.wmnet
- 11:28 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
- 11:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 11:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 11:06 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 10:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
- 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
- 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
- 10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2009.codfw.wmnet
- 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
- 10:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
- 10:49 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 10:45 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
- 10:36 jayme: updated kubernetes 1.23.14-3 -> 1.23.14-4 on P:kubernetes::node - T362408
- 10:27 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:26 jayme: re-enable puppet on all P:kubernetes::node
- 10:26 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 10:09 jayme: disabled puppet on all P:kubernetes::node
- 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 10:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 09:52 moritzm: installing freetype bugfix updates from Bookworm point update
- 09:48 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 09:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
- 09:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:25 jayme: imported kubernetes 1.23.14-4 to component/kubernetes123 (buster, bullseye, bookworm) - T362408
- 09:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1005.eqiad.wmnet
- 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2036.codfw.wmnet to cluster codfw and group C
- 09:12 Dreamy_Jazz: Maintenance script for T376340 finished
- 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2036.codfw.wmnet to cluster codfw and group C
- 09:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:06 Dreamy_Jazz: Ran `mwscript-k8s --comment="T376340" -- extensions/GlobalBlocking/maintenance/UpdateAutoBlockParentIdColumn.php --wiki=aawikibooks`
- 09:01 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 08:55 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 08:55 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 08:54 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 08:53 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 08:53 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 08:20 dcausse: repooling wdqs1013
- 08:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
- 08:19 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.26 refs T375657
- 08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69498 and previous config saved to /var/cache/conftool/dbconfig/20241008-081620-arnaudb.json
- 08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69497 and previous config saved to /var/cache/conftool/dbconfig/20241008-080115-arnaudb.json
- 07:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69496 and previous config saved to /var/cache/conftool/dbconfig/20241008-074609-arnaudb.json
- 07:44 vgutierrez: uploaded golang-github-jvgutierrez-go-etcd-harness 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
- 07:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69495 and previous config saved to /var/cache/conftool/dbconfig/20241008-073104-arnaudb.json
- 07:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: T374215', diff saved to https://phabricator.wikimedia.org/P69494 and previous config saved to /var/cache/conftool/dbconfig/20241008-071559-arnaudb.json
- 07:10 dcausse: depooling wdqs1013 (lag)
- 07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69493 and previous config saved to /var/cache/conftool/dbconfig/20241008-070053-arnaudb.json
- 06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69492 and previous config saved to /var/cache/conftool/dbconfig/20241008-064548-arnaudb.json
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.23 (duration: 00m 58s)
- 03:50 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657 (duration: 47m 44s)
- 03:16 eileen: civicrm upgraded from 8b13ef22 to 61718eae
- 03:15 eileen: config revision changed from 6e649356 to 9ba217d2
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657
- 00:55 eileen: config revision changed from 856e4d99 to 6e649356
- 00:30 eileen: config revision changed from 856e4d99 to 4ab498d2 - disable process control to load triggers
2024-10-07
- 22:33 eileen: civicrm upgraded from f2095695 to 8b13ef22
- 22:09 eileen: config revision changed from a2ba4a8d to 856e4d99
- 21:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 20:20 urbanecm@deploy2002: Finished scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) (duration: 07m 41s)
- 20:16 urbanecm@deploy2002: esanders, derenrich, urbanecm: Continuing with sync
- 20:14 urbanecm@deploy2002: esanders, derenrich, urbanecm: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:12 urbanecm@deploy2002: Started scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022)
- 20:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 19:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 19:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 18:22 swfrench-wmf: running `git restore helmfile.d/services/thumbor/values.yaml` on deploy1003 to unblock git-pull timer
- 18:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
- 18:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:29 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-debug next bring-up - T372604 (duration: 02m 45s)
- 17:26 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-debug next bring-up - T372604
- 17:12 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:26 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
- 16:24 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 16:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
- 16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
- 15:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
- 15:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
- 15:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
- 15:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
- 15:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
- 15:00 papaul: ongoing maintenance on mr1-esams
- 14:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 14:40 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 14:18 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
- 14:16 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
- 14:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
- 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69489 and previous config saved to /var/cache/conftool/dbconfig/20241007-134950-ladsgroup.json
- 13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69488 and previous config saved to /var/cache/conftool/dbconfig/20241007-134929-ladsgroup.json
- 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 13:37 vgutierrez: switching to digicert-2024 certificates on esams, eqsin, drmrs and magru
- 13:36 Lucas_WMDE: UTC afternoon backport+config window done
- 13:35 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) (duration: 06m 49s)
- 13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69487 and previous config saved to /var/cache/conftool/dbconfig/20241007-133422-ladsgroup.json
- 13:31 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 13:30 dreamyjazz@deploy2002: dreamyjazz: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:28 dreamyjazz@deploy2002: Started scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052)
- 13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69486 and previous config saved to /var/cache/conftool/dbconfig/20241007-131915-ladsgroup.json
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402) (duration: 07m 14s)
- 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Continuing with sync
- 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Backport for scandium is being replaced by parsoidtest1001 (T363402) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69485 and previous config saved to /var/cache/conftool/dbconfig/20241007-130409-ladsgroup.json
- 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402)
- 13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 12:53 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/%s\n' 'mobile/copyright/wikimaniawiki-wordmark.svg' 'project-logos/wikimaniawiki-1.5x.png' 'project-logos/wikimaniawiki-2x.png' 'project-logos/wikimaniawiki.png' 'icons/wikimaniawiki.svg' | mwscript-k8s --attach -- purgeList enwiki # T376292
- 12:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 12:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 11:29 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:29 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:16 vgutierrez: uploaded golang-github-mtchavez-jenkins 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
- 11:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69484 and previous config saved to /var/cache/conftool/dbconfig/20241007-110430-arnaudb.json
- 10:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2002.codfw.wmnet
- 10:50 Dreamy_Jazz: Started 2-day scan on enwiki for MediaModeration to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
- 10:49 Dreamy_Jazz: Started MediaModeration scanning script after it crashed for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration
- 10:49 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2002.codfw.wmnet
- 10:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69483 and previous config saved to /var/cache/conftool/dbconfig/20241007-104925-arnaudb.json
- 10:47 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 10:47 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69482 and previous config saved to /var/cache/conftool/dbconfig/20241007-103420-arnaudb.json
- 10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69481 and previous config saved to /var/cache/conftool/dbconfig/20241007-101914-arnaudb.json
- 10:17 vgutierrez: uploaded golang-github-cloudflare-ipvs 0.10.2 to apt.wm.o (bookworm-wikimedia) - T376600
- 10:13 moritzm: installing Linux 6.1.112 on Bookworm systems
- 10:11 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 10:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69480 and previous config saved to /var/cache/conftool/dbconfig/20241007-100410-arnaudb.json
- 10:00 vgutierrez: uploaded golang-github-flyingmutant-rapid 1.1.0 to apt.wm.o (bookworm-wikimedia) - T376600
- 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69478 and previous config saved to /var/cache/conftool/dbconfig/20241007-094904-arnaudb.json
- 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 2%: T374215', diff saved to https://phabricator.wikimedia.org/P69477 and previous config saved to /var/cache/conftool/dbconfig/20241007-093359-arnaudb.json
- 09:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
- 09:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
- 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'missing commit', diff saved to https://phabricator.wikimedia.org/P69476 and previous config saved to /var/cache/conftool/dbconfig/20241007-092714-arnaudb.json
- 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69474 and previous config saved to /var/cache/conftool/dbconfig/20241007-091953-arnaudb.json
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69473 and previous config saved to /var/cache/conftool/dbconfig/20241007-091854-arnaudb.json
- 09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 08:37 aqu@deploy2002: Finished deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f] (duration: 04m 43s)
- 08:32 aqu@deploy2002: Started deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f]
- 08:24 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
- 08:24 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 08:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 18s)
- 08:02 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:02 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
- 08:02 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 08:01 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:01 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 08:00 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
- 07:57 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
- 07:56 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T374215 db1233 depool as clone source for db1246', diff saved to https://phabricator.wikimedia.org/P69471 and previous config saved to /var/cache/conftool/dbconfig/20241007-075611-arnaudb.json
- 07:56 hashar: UTC morning backport window completed
- 07:54 hashar@deploy2002: Finished scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) (duration: 11m 19s)
- 07:49 hashar@deploy2002: ammarpad, hashar: Continuing with sync
- 07:45 hashar@deploy2002: ammarpad, hashar: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:43 hashar@deploy2002: Started scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049)
- 07:42 hashar@deploy2002: Finished scap sync-world: Backport for Revert "wikimaniawiki: Update logos to 2024" (duration: 21m 40s)
- 07:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 07:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64315
- 07:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 64315
- 07:04 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
2024-10-06
2024-10-05
- 19:43 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 16:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 16:41 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 16:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 16:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 16:36 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 16:36 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69470 and previous config saved to /var/cache/conftool/dbconfig/20241005-133058-ladsgroup.json
- 13:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 13:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 13:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69469 and previous config saved to /var/cache/conftool/dbconfig/20241005-133036-ladsgroup.json
- 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69468 and previous config saved to /var/cache/conftool/dbconfig/20241005-131529-ladsgroup.json
- 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69467 and previous config saved to /var/cache/conftool/dbconfig/20241005-130022-ladsgroup.json
- 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69466 and previous config saved to /var/cache/conftool/dbconfig/20241005-124515-ladsgroup.json
2024-10-04
- 17:48 ejegg: fundraising civicrm upgraded from 90199f62 to 45855ff4
- 16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
- 16:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 14:29 mforns@deploy2002: Finished deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist (duration: 01m 48s)
- 14:28 mforns@deploy2002: Started deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist
- 13:54 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 13:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.categories-reload (exit_code=97) reloading categories to wdqs-categories1001.eqiad.wmnet
- 13:32 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 13:19 ayounsi@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 12:00 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 01m 13s)
- 11:59 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
- 11:47 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 00m 47s)
- 11:46 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
- 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
- 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
- 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1004.wikimedia.org
- 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1004.wikimedia.org
- 10:07 moritzm: upload ircstream 0.13.0+sse12u1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
- 09:43 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database shnwikinews (T375432)
- 09:35 moritzm: upload ircstream 0.13.0+wmf12u1 to apt.wikimedia.org T376014
- 09:18 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database shnwikinews (T375432)
- 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kgewiki (T374814)
- 09:17 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kgewiki (T374814)
- 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database gorwikiquote (T375094)
- 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database gorwikiquote (T375094)
- 09:16 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database madwiktionary (T375023)
- 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database madwiktionary (T375023)
- 09:15 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database moswiki (T375568)
- 09:15 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database moswiki (T375568)
- 09:09 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:58 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 07:51 oblivian@puppetserver1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
- 07:51 oblivian@puppetserver1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
- 07:30 hashar: upgrading Jenkins on CI Jenkins
- 07:04 moritzm: import jenkins 2.462.3 to thirdparty/ci T376449
- 01:45 ejegg: payments-wiki upgraded from e88750e6 to ed2d78b3
2024-10-03
- 22:37 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) (duration: 07m 04s)
- 22:33 brennen@deploy2002: brennen: Continuing with sync
- 22:32 brennen@deploy2002: brennen: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:30 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
- 22:18 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
- 22:18 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
- 22:15 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
- 22:15 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
- 21:39 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 21:39 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 21:28 brennen: end of UTC late backport & config window
- 21:28 brennen@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713) (duration: 15m 30s)
- 21:23 brennen@deploy2002: cscott, brennen: Continuing with sync
- 21:15 brennen@deploy2002: cscott, brennen: Backport for Turn on Parsoid Selective Update metrics (T371713) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:13 brennen@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713)
- 21:11 brennen@deploy2002: Finished scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) (duration: 10m 09s)
- 21:06 brennen@deploy2002: cscott, brennen: Continuing with sync
- 21:02 brennen@deploy2002: cscott, brennen: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:00 brennen@deploy2002: Started scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2)
- 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
- 20:44 brennen@deploy2002: Finished scap sync-world: Backport for Update jquery.ime from upstream (duration: 09m 25s)
- 20:39 brennen@deploy2002: brennen, amire80: Continuing with sync
- 20:37 brennen@deploy2002: brennen, amire80: Backport for Update jquery.ime from upstream synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:34 brennen@deploy2002: Started scap sync-world: Backport for Update jquery.ime from upstream
- 20:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 20:02 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 19:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 19:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:50 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:49 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
- 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:35 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
- 19:28 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS (duration: 03m 02s)
- 19:25 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS
- 19:25 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
- 19:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
- 19:22 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93]: 0.3.148 (duration: 08m 42s)
- 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:14 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.148` on canary `wdqs1016`; proceeding to rest of fleet
- 19:14 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93]: 0.3.148
- 19:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.148`. Pre-deploy tests passing on canary `wdqs1016`
- 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 19:05 dduvall@deploy2002: Installing scap version "4.109.0" for 210 hosts
- 18:51 cmooney@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: testing T344171]
- 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
- 18:31 cstone: SmashPig upgraded from df2a9c42 to eaa176f7
- 18:28 sukhe: depool dns1005 for all services for testing T344171
- 18:00 mutante: codesearch - ran out of disk due to 11G /var/log/account/pacct file - manually ran /etc/cron.daily/acct to rotate it, then deleted old file, back to 39% disk usage
- 17:41 mutante: codesearch was broken - VM was down - rebooted - restarting all the indices is a bit slow but mostly back up now
- 17:13 swfrench@deploy2002: Finished scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934 (duration: 02m 50s)
- 17:11 swfrench@deploy2002: Started scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934
- 15:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 59.75.192.10.in-addr.arpa on all recursors
- 15:53 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 59.75.192.10.in-addr.arpa on all recursors
- 15:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:50 topranks: merging patch to add k8s pod IP range reverse delegations to dns T376291
- 15:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:47 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:46 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:45 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
- 15:36 papaul: Junos upgrade on mr1-codfw complete
- 15:00 papaul: ongoing Junos upgrade on mr1-codfw
- 14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402 (duration: 03m 33s)
- 14:52 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402
- 14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:30 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1022
- 14:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
- 14:29 jclark@cumin1002: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host aqs1022
- 14:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
- 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
- 14:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
- 14:23 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
- 13:42 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
- 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2004.wikimedia.org
- 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2004.wikimedia.org with OS bookworm
- 13:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:31 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:30 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
- 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
- 13:23 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
- 13:10 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2004.wikimedia.org with OS bookworm
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:09 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2004.wikimedia.org on all recursors
- 13:09 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2004.wikimedia.org on all recursors
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
- 13:00 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 13:00 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2004.wikimedia.org
- 12:20 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124) (duration: 06m 47s)
- 12:14 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
- 12:13 urbanecm@deploy2002: scap failed: <UnboundLocalError> local variable 'e' referenced before assignment (scap version: 4.108.0-1) (duration: 08m 02s)
- 12:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:05 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
- 12:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69458 and previous config saved to /var/cache/conftool/dbconfig/20241003-111544-ladsgroup.json
- 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69457 and previous config saved to /var/cache/conftool/dbconfig/20241003-111522-ladsgroup.json
- 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69456 and previous config saved to /var/cache/conftool/dbconfig/20241003-110015-ladsgroup.json
- 10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69454 and previous config saved to /var/cache/conftool/dbconfig/20241003-104508-ladsgroup.json
- 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69453 and previous config saved to /var/cache/conftool/dbconfig/20241003-103001-ladsgroup.json
- 10:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124) (duration: 06m 54s)
- 10:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:22 urbanecm@deploy2002: Started scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124)
- 10:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1004.wikimedia.org
- 10:00 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b715af7]: T375153 (duration: 02m 44s)
- 10:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM irc1004.wikimedia.org
- 09:58 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b715af7]: T375153
- 09:42 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:41 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 09:38 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:38 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 09:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 08:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.25 refs T375656
- 08:25 hashar@deploy2002: Finished scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) (duration: 07m 07s)
- 08:20 hashar@deploy2002: hashar, cscott: Continuing with sync
- 08:20 hashar@deploy2002: hashar, cscott: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:18 hashar@deploy2002: Started scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323)
- 08:14 hashar@deploy2002: Finished scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) (duration: 08m 37s)
- 08:09 hashar@deploy2002: hashar, hamishz: Continuing with sync
- 08:07 hashar@deploy2002: hashar, hamishz: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:05 hashar@deploy2002: Started scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898)
- 08:03 hashar: Ran `mwscript resetAuthenticationThrottle.php --signup --ip 14.139.82.6` for `metawiki`, `mediawikiwiki` and `wikidatawiki` # T375794
- 07:59 hashar@deploy2002: Finished scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) (duration: 08m 41s)
- 07:54 hashar@deploy2002: anzx, hamishz, hashar: Continuing with sync
- 07:53 hashar@deploy2002: anzx, hamishz, hashar: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:50 hashar@deploy2002: Started scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794)
- 07:17 kartik@deploy2002: Finished scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) (duration: 10m 39s)
- 07:12 kartik@deploy2002: kartik: Continuing with sync
- 07:08 kartik@deploy2002: kartik: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:06 kartik@deploy2002: Started scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644)
- 06:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 06:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
2024-10-02
- 23:47 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124) (duration: 07m 07s)
- 23:39 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124)
- 22:35 urbanecm@deploy2002: Finished scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124) (duration: 06m 52s)
- 22:28 urbanecm@deploy2002: Started scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124)
- 21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
- 21:54 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
- 21:24 mutante: phab1004 - link=$(/usr/bin/readlink -f /srv/phab) ; /usr/bin/git config -f /etc/gitconfig.d/10-phab-deploy-safedir.gitconfig --add safe.directory $link ; /bin/cat /etc/gitconfig.d/*.gitconfig > /etc/gitconfig - T360756
- 20:57 eileen: civicrm upgraded from 28fd5e3b to 90199f62
- 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1001.eqiad.wmnet with OS bookworm
- 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1002.eqiad.wmnet with OS bookworm
- 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
- 19:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
- 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
- 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
- 19:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1002.eqiad.wmnet with OS bookworm
- 19:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1001.eqiad.wmnet with OS bookworm
- 19:23 cstone: SmashPig upgraded from 715e91fa to df2a9c42
- 19:21 brett: cumin -b11 "A:cp" "run-puppet-agent --enable 'rolling out 1038884'"
- 19:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:15 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 19:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
- 19:06 brett@cumin2002: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
- 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
- 18:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
- 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:21 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256 (duration: 00m 12s)
- 18:21 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256
- 18:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.25 refs T375656
- 18:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:22 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
- 17:20 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
- 17:02 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1003.eqiad.wmnet
- 17:02 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
- 17:01 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
- 17:00 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) (duration: 14m 42s)
- 16:58 btullis@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
- 16:56 urbanecm@deploy2002: urbanecm: Continuing with sync
- 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts alert[1001,2001].wikimedia.org
- 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
- 16:49 denisse@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
- 16:48 urbanecm@deploy2002: urbanecm: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:46 denisse@cumin2002: START - Cookbook sre.dns.netbox
- 16:46 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124)
- 16:38 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
- 16:33 taavi: start extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php on labswiki to backfill global usage information
- 16:31 taavi@deploy2002: Finished scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php (duration: 07m 13s)
- 16:31 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 16:27 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
- 16:27 denisse: Running the sre.hosts.decommission cookbook on the alert1001 and alert2001 hosts - T372607
- 16:27 taavi@deploy2002: matmarex, taavi: Continuing with sync
- 16:26 taavi@deploy2002: matmarex, taavi: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:24 taavi@deploy2002: Started scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php
- 16:16 taavi@deploy2002: Finished scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) (duration: 07m 01s)
- 16:11 taavi@deploy2002: zabe, taavi: Continuing with sync
- 16:11 taavi@deploy2002: zabe, taavi: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:09 taavi@deploy2002: Started scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707)
- 16:03 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host wdqs-categories1001.eqiad.wmnet
- 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs-categories1001.eqiad.wmnet with OS bullseye
- 15:46 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:43 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:43 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:41 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:41 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:38 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:38 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:37 cdanis@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:36 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:35 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:35 cdanis@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:34 cdanis@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:31 cdanis@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:31 cdanis@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:30 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3a7901e]: T375153 (duration: 01m 59s)
- 15:28 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 15:28 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 15:28 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3a7901e]: T375153
- 15:27 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - T370962
- 15:26 dancy@deploy2002: Finished scap sync-world: Testing T370934 (duration: 03m 19s)
- 15:24 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:23 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
- 15:22 dancy@deploy2002: Started scap sync-world: Testing T370934
- 15:18 dancy@deploy2002: Installation of scap version "4.108.0" completed for 210 hosts
- 15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
- 15:14 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
- 15:13 dancy@deploy2002: Installing scap version "4.108.0" for 210 hosts
- 15:12 cdanis@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:12 cdanis@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:07 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - T370962
- 15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:00 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 15:00 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:56 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs-categories1001.eqiad.wmnet with OS bullseye
- 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-categories1001.eqiad.wmnet on all recursors
- 14:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-categories1001.eqiad.wmnet on all recursors
- 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:45 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:44 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
- 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1004.wikimedia.org
- 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc1004.wikimedia.org with OS bookworm
- 14:30 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:30 bking@cumin2002: START - Cookbook sre.ganeti.makevm for new host wdqs-categories1001.eqiad.wmnet
- 14:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
- 14:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
- 14:22 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
- 14:21 urbanecm@deploy2002: Finished scap sync-world: Backport for labswiki: Disallow account autocreation (T161859) (duration: 07m 38s)
- 14:17 urbanecm@deploy2002: urbanecm: Continuing with sync
- 14:16 urbanecm@deploy2002: urbanecm: Backport for labswiki: Disallow account autocreation (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:14 urbanecm@deploy2002: Started scap sync-world: Backport for labswiki: Disallow account autocreation (T161859)
- 14:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc1004.wikimedia.org with OS bookworm
- 14:11 hashar@deploy2002: Finished scap sync-world: Backport for Remove Maintenance check (T376255) (duration: 07m 27s)
- 14:08 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1004.wikimedia.org on all recursors
- 14:07 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc1004.wikimedia.org on all recursors
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:07 hashar@deploy2002: hashar: Continuing with sync
- 14:06 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
- 14:04 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
- 14:03 hashar@deploy2002: Sync cancelled.
- 14:03 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 14:03 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc1004.wikimedia.org
- 14:01 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
- 13:31 Lucas_WMDE: UTC afternoon backport+config window done
- 13:28 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242) (duration: 10m 32s)
- 13:24 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Continuing with sync
- 13:20 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Backport for Improve sub-ref check to avoid false positives (T376242) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242)
- 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) (duration: 14m 45s)
- 13:16 moritzm: upload ircstream 0.13.0~dev+wmf1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
- 13:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
- 13:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821)
- 12:59 moritzm: upload python3-aiohttp-sse-client 0.2.1-0 to apt.wikimedia.org bookworm/ircstream-sse component (needed by the eventstream feature branch of ircstream) T376014
- 12:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
- 12:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
- 12:49 hashar@deploy2002: Finished scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) (duration: 07m 01s)
- 12:45 hashar@deploy2002: hashar, zabe: Continuing with sync
- 12:45 hashar@deploy2002: hashar, zabe: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:42 hashar@deploy2002: Started scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255)
- 12:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 12:35 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:14 zabe@deploy2002: Finished scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) (duration: 08m 50s)
- 12:13 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
- 12:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 12:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:09 zabe@deploy2002: zabe: Continuing with sync
- 12:09 zabe@deploy2002: zabe: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
- 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 12:05 zabe@deploy2002: Started scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129)
- 12:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:57 _joe_: restarted rsyslog on kubernetes1045
- 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1005.eqiad.wmnet
- 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
- 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
- 10:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
- 10:17 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:13 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1005.eqiad.wmnet on all recursors
- 10:13 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1005.eqiad.wmnet on all recursors
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:11 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
- 10:04 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 10:04 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1005.eqiad.wmnet
- 10:03 elukey@deploy2002: Finished scap sync-world: Backport for Add irc2003 to the irc settings (T376014) (duration: 07m 11s)
- 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1004.eqiad.wmnet
- 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
- 09:59 elukey@deploy2002: elukey: Continuing with sync
- 09:58 elukey@deploy2002: elukey: Backport for Add irc2003 to the irc settings (T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:56 elukey@deploy2002: Started scap sync-world: Backport for Add irc2003 to the irc settings (T376014)
- 09:54 elukey@deploy2002: Finished scap sync-world: Add irc2003 to the network policies (duration: 02m 15s)
- 09:53 elukey@deploy2002: Started scap sync-world: Add irc2003 to the network policies
- 09:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
- 09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
- 09:44 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:44 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:43 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:43 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:42 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:42 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
- 09:31 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to [php-1.43.0-wmf.24]" - T375656
- 09:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships" "Zabe" --reason "per request T376246"
- 09:23 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:23 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1004.eqiad.wmnet on all recursors
- 09:22 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1004.eqiad.wmnet on all recursors
- 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:21 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
- 09:17 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 09:17 jynus@cumin1002: dbctl commit (dc=all): 'Set es2024 to weight 10 as the rest of es-rw hosts T376249', diff saved to https://phabricator.wikimedia.org/P69443 and previous config saved to /var/cache/conftool/dbconfig/20241002-091754-jynus.json
- 09:17 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1004.eqiad.wmnet
- 09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-ctrl1004.eqiad.wmnet
- 09:16 elukey@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 09:16 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 09:16 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1004.eqiad.wmnet
- 09:13 vgutierrez: repooling cp3071 and cp3072 after HW maintenance - T374986
- 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[3071-3072].esams.wmnet
- 09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp[3071-3072].esams.wmnet
- 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1001.eqiad.wmnet
- 08:57 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1001.eqiad.wmnet
- 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1001.eqiad.wmnet
- 08:55 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1001.eqiad.wmnet
- 08:55 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided) (duration: 00m 52s)
- 08:54 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided)
- 08:36 jayme: removed the label node-role.kubernetes.io/master and the taint node-role.kubernetes.io/master:NoSchedule from all k8s apiservers - T334234
- 08:32 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to all k8s apiservers - T334234
- 08:29 hashar: Restarted stashbot based on instructions at https://wikitech.wikimedia.org/wiki/Tool:Stashbot
- 08:20 hashar@deploy2002: Finished scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967) (duration: 10m 27s)
- 08:16 hashar@deploy2002: hashar, sfaci: Continuing with sync
- 08:12 hashar@deploy2002: hashar, sfaci: Backport for Metrics Platform monotable: Base stream configuration (T373967) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:10 hashar@deploy2002: Started scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967)
- 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
- 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
- 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 07:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
- 07:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
- 06:50 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1497 hosts
- 06:49 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1497 hosts
- 06:48 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 706 hosts
- 06:48 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 706 hosts
- 02:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2005.codfw.wmnet with OS bookworm
2024-10-01
- 23:42 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s3.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
- 20:34 hashar: UTC late backport window completed
- 20:28 hashar: mwscript purgeList.php --wiki=tlywiki --namespace=4 # T367009
- 20:12 hashar@deploy2002: Finished scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009) (duration: 07m 21s)
- 20:07 hashar@deploy2002: nmw03, hashar: Continuing with sync
- 20:06 hashar@deploy2002: nmw03, hashar: Backport for Update wgMetaNamespace for tlywiki (T367009) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:04 hashar@deploy2002: Started scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009)
- 20:02 hashar: Restarting CI Jenkins
- 19:48 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 19:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140) (duration: 09m 03s)
- 17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 17:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:53 ladsgroup@deploy2002: ladsgroup: Backport for Allow storing of passwords for local users in wikitech (T376140) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:50 ladsgroup@deploy2002: Started scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140)
- 17:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
- 16:00 ladsgroup@deploy2002: taavi, ladsgroup: Continuing with sync
- 15:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:58 ladsgroup@deploy2002: taavi, ladsgroup: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
- 15:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374)
- 15:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
- 15:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
- 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:39 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149 (duration: 01m 07s)
- 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149
- 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149 (duration: 00m 30s)
- 15:03 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149
- 15:02 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:01 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 14:45 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to wikikube staging apiservers - T334234
- 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:15 jayme: added the label node-role.kubernetes.io/control-plane= to all k8s apiservers - T334234
- 14:10 moritzm: installing cups security updates
- 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 13:32 elukey@puppetserver1001: conftool action : set/weight=1; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 13:32 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
- 13:31 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 13:31 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
- 13:21 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 12:28 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859) (duration: 07m 51s)
- 12:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:23 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=labswiki --undo /home/ladsgroup/T376129.undo.sql DB cluster31 (T376129)
- 12:22 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Allow 'crats to rename local users (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:20 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859)
- 12:17 ladsgroup@deploy2002: Finished scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129) (duration: 09m 53s)
- 12:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:09 ladsgroup@deploy2002: ladsgroup: Backport for Wikitech: Connect wikitech to external storage (T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:07 ladsgroup@deploy2002: Started scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129)
- 12:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859) (duration: 09m 53s)
- 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:54 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Soft connect wikitech to SUL (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:52 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859)
- 11:51 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 11:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Drop wikitech.php (T371592 T371374) (duration: 07m 32s)
- 11:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:44 ladsgroup@deploy2002: ladsgroup: Backport for Drop wikitech.php (T371592 T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:42 ladsgroup@deploy2002: Started scap sync-world: Backport for Drop wikitech.php (T371592 T371374)
- 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2003.wikimedia.org
- 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2003.wikimedia.org with OS bookworm
- 11:16 effie: Switching wikitech to k8s - T292707
- 11:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
- 11:09 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
- 11:01 jiji@deploy2002: Finished scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) (duration: 08m 23s)
- 10:56 jiji@deploy2002: jiji: Continuing with sync
- 10:55 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:52 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
- 10:48 jiji@deploy2002: Sync cancelled.
- 10:44 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
- 10:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:40 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2003.wikimedia.org with OS bookworm
- 10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2003.wikimedia.org on all recursors
- 10:15 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2003.wikimedia.org on all recursors
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
- 10:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
- 10:11 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2003.wikimedia.org
- 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 10:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 09:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:24 jmm@deploy2002: Finished scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) (duration: 08m 07s)
- 09:19 jmm@deploy2002: jmm: Continuing with sync
- 09:19 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 09:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
- 09:18 jmm@deploy2002: jmm: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:16 jmm@deploy2002: Started scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014)
- 09:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69437 and previous config saved to /var/cache/conftool/dbconfig/20241001-090708-ladsgroup.json
- 09:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 08:58 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.25 refs T375656
- 08:46 urbanecm@deploy2002: Finished scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784) (duration: 06m 58s)
- 08:39 urbanecm@deploy2002: Started scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)
- 07:58 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
- 07:54 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
- 07:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
- 07:39 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
- 07:34 kartik@deploy2002: Finished scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979) (duration: 10m 05s)
- 07:30 kartik@deploy2002: kartik, melos: Continuing with sync
- 07:26 kartik@deploy2002: kartik, melos: Backport for Add namespace aliases for scn.wikipedia (T375979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:24 kartik@deploy2002: Started scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979)
- 07:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460) (duration: 18m 15s)
- 07:14 kartik@deploy2002: kartik, abi: Continuing with sync
- 07:09 kartik@deploy2002: kartik, abi: Backport for Enable translation settings banner for Test wikipedia (T372460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:03 kartik@deploy2002: Started scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460)
- 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 705 hosts
- 06:47 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 705 hosts
- 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 1497 hosts
- 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 1497 hosts
- 06:44 XioNoX: cr3-ulsfo> request vmhost snapshot - T375345
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.22 (duration: 00m 58s)
- 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656 (duration: 48m 36s)
- 03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656
- 02:47 eileen: civicrm upgraded from cf27c789 to 28fd5e3b
- 02:17 ejegg: email preference center upgraded from 8ff002ef to e88750e6
- 02:16 ejegg: payments-wiki upgraded from 8d3b8e94 to e88750e6