Nova Resource:Devtools/SAL

From Wikitech

2024-08-01

  • 19:18 mutante: gitlab-prod-1002 - moved contents of /var/log/gitlab out of the way temp, emptied it, letting wmcs-prepare-cinder-volume mount new volume into /var/log/gitlab and format it with ext4, moved old existing logs back into it T371066
  • 19:11 mutante: attaching volume 'gitlab-prod-logs' to instance 'gitlab-prod-1002' T371066 | running 'sudo wmcs-prepare-cinder-volume' manually and answering interactive questions to mount/format it (once T371573 is resolved will do this with puppet)
  • 19:07 mutante: creating new 20GB volume 'gitlab-prod-logs' T371066
  • 09:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
  • 09:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase

2024-07-24

  • 22:42 mutante: gitlab-prod-1002 out of disk again when attempting package upgrade - apt-get clean; rm /var/log/syslog.2*.gz; rm /var/log/messages.2*.gz to get space

2024-07-10

  • 21:46 mutante: gitlab-runner* instances: apt-get upgrade (upgrading gitlab-runner, exim*. downgrading lshw, python3-jinja2)
  • 21:34 mutante: gitlab-prod-1002 apt-get upgrade (upgrading gitlab-ce, exim-*. downgrading lshw, python3-jinja2)

2024-06-18

  • 10:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=0)
  • 10:18 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs

2024-05-06

  • 20:37 mutante: deleting buster deployment server deploy-1004, replaced by deploy-1006 - T360964

2024-05-02

  • 23:52 mutante: switching puppetmaster for deploy-1006 back to local project puppetmaster; rm -rf /var/lib/puppet/ssl that still referred to puppetmaster-1001, signing new request on puppetmaster-1003 T360470 T363415
  • 23:10 mutante: replacing deploy-1004 (buster) with deploy-1006 (bullseye) as new deployment server in both repo and Horizon hiera T360964 T363415

2024-05-01

2024-04-24

  • 20:28 mutante: changing puppetmaster of deploy-1006 to puppetmaster.cloudinfra.wmflabs.org instead of the local project one
  • 20:12 mutante: - puppet run on deploy-1006 deployment_server - scap init duplicate declaration fixed after adding "profile::mediawiki::scap_client::is_master: true" to new deploy-prefix Hiera - thanks to Jaime Nuche per comment on gerrit
  • 19:46 mutante: creating new prefix 'deploy' to apply needed Hiera keys to deployment hosts based on host name prefix (both deploy1004, deploy1006 and future deploy*)
  • 19:41 mutante: deleting instance puppetmaster-1001 that was > 4 years old, on buster and I had shut down a couple days ago. replaced by puppetmaster-1003 (bookworm, puppetserver) T360964 T360470
  • 19:21 mutante: gitlab-prod-1002; gitlab-runner-1003; gitlab-runner-1002 - apt-get update && apt-get upgrade

2024-04-17

  • 21:18 mutante: - resizing puppetmaster-1003 from g3.cores1.ram2.disk20 to g3.cores2.ram4.disk20 - T360470

2024-04-16

  • 17:09 mutante: - deleting devtools-puppetdb1001 instance (T360964)
  • 16:54 mutante: - soft rebooting puppetmaster-1003, shutting down puppetmaster-1001
  • 16:38 mutante: - can't ssh to new puppetmaster again
  • 16:33 mutante: - deleting deploy-1005 - don't try deployment server in bookworm, first bullseye
  • 16:29 mutante: - shutting down puppetdb instance again

2024-04-15

  • 17:52 mutante: - added profile::labs::cindermount::srv to puppetmaster-1003 in horizon to get missing cinder volume - T360470
  • 17:51 mutante: - added Notice: /Stage[main]/Profile::Labs::Cindermount::Srv/Cinderutils::Ensure[cinder_on_srv]/Exec[prepare_cinder_volume_/srv]/returns: executed successfully
  • 16:57 mutante: - puppetmaster-1003 reachable again but service fails to start and puppetserver-deploy-code fails
  • 16:50 mutante: - rebooted unreachable puppetmaster-1003 - was "no route to host" - but is back now, log had a " /dev/sdb: Can't open blockdev" as well

2024-04-12

  • 17:55 mutante: - changed both puppetmaster and puppetdb hiera setting back to puppetmaster-1001 for instance deploy-1004
  • 17:49 mutante: - deploy-1004 itself is on buster and buster and puppet 7 don't mix well - testing deployment role on bookworm, creating deploy-1005
  • 17:48 mutante: - deploy-1004 has a puppet problem when talking to new puppetmaster-1003 that goes away when switching back to puppetmaster-1001

2024-04-11

  • 20:08 mutante: zuul-1001 - switching to new puppetmaster-1003 in puppet.conf manually, switched project defaults in repo too
  • 19:58 mutante: manually editing puppet.conf to use puppetmaster-1003 instead of -1001 because you can't switch the puppetmaster via puppet if puppet is already broken :)
  • 19:41 mutante: switching gitlab-runner-1005 from puppetmaster-1001 to puppetmaster-1003 via web Hiera
  • 19:03 mutante: - deleting instance contint-bullseye which was only used by me for a test before we created contint1003 in prod T334517 T361224
  • 18:38 mutante: - attempting to fix puppet run on vrts-1001 related to switching prod to cfssl for SSL certs
  • 18:23 mutante: - shutting down puppetmaster-1001 on buster - should now be replaced by puppetmaster-1003 on bookworm (thanks brennen) T360964 T360470
  • 18:02 mutante: - shutting down instance devtools-puppetdb1001 - which is on buster - basically to see what breaks or complains, if anything

2024-04-09

  • 19:46 mutante: - soft rebooting gerrit-prod-1001 buster instance (to be removed)

2024-04-01

  • 17:27 mutante: - added profile::pki::client::ensure: present to instance hiera for etherpad-bookworm - fixing broken puppet run
  • 17:25 mutante: - attempting to fix puppet on instance etherpad-bookworm but SSL provider cfssl doesn't appear to work in cloud

2024-02-09

  • 19:08 mutante: deleting instance phabricator-prod-1001 (shut down a couple days ago, buster instance replaced by phabricator-bullseye instance) T356530

2024-02-07

  • 21:51 mutante: rebooting gitlab-prod-1002 T356906

2024-02-02

  • 23:37 mutante: phabricator-bullseye configured auth provider for simple passwords, letting users register users, locked auth config option again
  • 23:34 mutante: phabricator-bullseye:/srv/deployment/phabricator/deployment/phabricator/bin$ sudo ./auth unlock
  • 22:53 mutante: - phabricator-bullseye sudo ./config set phabricator.base-uri 'http://phabricator.wmcloud.org/' | sudo ./config set security.alternate-file-domain 'https://phab-usercontent.wmcloud.org' | delete proxy phab-prod-usercontent, create proxy phab-usercontent.wmcloud.org, restart apache T356530
  • 22:37 mutante: - phabricator-bullseye - /srv/deployment/phabricator/deployment/phabricator/bin$ sudo ./config set phabricator.base-uri 'http://phabricator.wmcloud.org/' T356530
  • 22:31 mutante: - phabricator-bullseye, instance hiera: setting phabricator_domain to phabricator.wmcloud.org and phabricator_altdomain to phab-usercontent.wmcloud.org T356530
  • 22:27 mutante: - deleted proxies phab.wmflabs.org, phab-prod-usercontent.wmflabs.org, phabricator.wmflabs.org - created proxies phabricator.wmcloud.org, phab-usercontent.wmcloud.org - wmflabs names are legacy and should migrate T356530
  • 20:59 mutante: - changing phabricator domain in instance Hiera of phabricator-bullseye to phab.wmflabs.org and running puppet to update apache config/rewrite rules T356530
  • 20:41 mutante: shutting down instance phabricator-prod-1001 (buster), replaced by phab-bullseye (bullseye) T356530
  • 20:38 mutante: deleting proxy phab-bull.wmcloud.org after previous proxy names are switched to bullseye backend T356530
  • 20:17 mutante: editing proxies phab.wmflabs.org and phab-prod-usercontent.wmflabs.org to point to bullseye instance instead of buster T356530
  • 20:13 mutante: running "scap deploy" in /srv/deployment/phabricator/deployment on deploy-1004 which deploys to phabricator-bullseye and phabricator-prod-1001 T356530
  • 19:29 mutante: deleting web proxy phabricator-prod.wmflabs.org which pointed to port 443 on the buster instance and timed out T356530
  • 19:12 mutante: deleting web proxy phorge.wmcloud.org which pointed to 172.16.7.98 which doesn't exist anymore T356530

2024-01-05

  • 21:30 mutante: contint-bullseye - sudo /usr/sbin/a2dismod mpm_event ; sudo /usr/sbin/a2enmod php74 - the usual issue we have had for years

2023-12-18

  • 21:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99)
  • 21:53 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
  • 21:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T353671)
  • 21:52 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase (T353671)

2023-12-13

  • 23:49 mutante: starting manual gitlab upgrade process on gitlab-prod-1002

2023-11-30

  • 21:03 mutante: - phabricator-bullseye - created user app_user and granted privileges in mysql for user from 127.0.0.1, ran ./phabricator/bin/storage upgrade --force; set 'phabricator_domain: phab-bull.wmcloud.org' in web Hiera
  • 20:33 mutante: - phabricator-bullseye - running 'mariadb-secure-installation' interactive script - this fixed mysql shell which previously exited with "bash: /nonexistent: No such file or directory"
  • 20:29 mutante: - phabricator-bullseye - running 'mariadb-install-db'
  • 20:27 mutante: - phabricator-bullseye - attempting to fix mariadb/mysql server, apt-get remove mariadb-server, running puppet, debugging why it won't start

2023-11-21

  • 21:52 mutante: - commit fake key for phabricator-bullseye host in git /var/lib/git/labs/private/modules/secret/secrets/ssl on puppetmaster-1001.devtools T327068
  • 21:41 mutante: - cert issue on new machine related to having local puppetmaster, like T349937#9288547 except "rm -rf /var/lib/puppet/ssl" was enough since puppetmaster did auto-sign new CSR - T327068
  • 21:24 mutante: - initial puppet run on newly created VM fails with "SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain): [self signed certificate in certificate chain for /CN=Puppet CA: puppetmaster-1001.devtools.eqiad.wmflabs" T327068
  • 21:05 mutante: - creating instance phabricator-bullseye g3.cores2.ram4.disk20 T327068
  • 21:00 mutante: - deleted instance phorge-1001 to get quota back and allow for creating new phabricator-on-bullseye instance T328595 T327068

2023-04-12

  • 19:26 mutante: - vrts-1001 - editing /etc/my.cnf to set mariadb datadir to /var/lib/mysql instead of /srv/sqldata and restart service, issue like T329571

2023-03-11

  • 00:20 mutante: - on phorge1001, enable general query log in mysql (mariadb), to learn about database scheme, don't forget to turn that off so VM doesn't run out of disk (SET GLOBAL general_log=1;) T328595

2023-03-07

  • 23:34 mutante: - phorge-1001 - MariaDB [(none)]> SET GLOBAL max_allowed_packet=33554432;
  • 23:33 mutante: - phorge-1001 - MariaDB [(none)]> SET GLOBAL local_infile=0;
  • 23:31 mutante: - phorge-1001 - MariaDB [(none)]> SET GLOBAL sql_mode = "STRICT_ALL_TABLES,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION";

2023-02-13

  • 23:09 mutante: - shutting down gerrit-prod-1001
  • 22:22 mutante: certbot renew --apache fixed cert issue - https://ldapauth-gitldap.wmflabs.org/ does not exist unrelatedly - T329444
  • 22:18 mutante: install package python3-certbot-apache on gerrit-prod-1001 - T329444
  • 22:03 mutante: - re-activating disabled puppet on gerrit-prod-1001 (reason given was 'gerrit deploy' but it was about 17 days ago)
  • 21:58 mutante: rebooting instance gerrit-prod-1001 which can't be reached T329444

2023-01-31

  • 22:39 mutante: remove role::gitlab from gitlab-prod-1001. to be replaced with gitlab-prod-1002. T318521

2023-01-28

  • 16:26 taavi: adjust gitlab-prod-1002 network port settings to allow adding the secondary IP, requested in T318521

2023-01-24

  • 09:47 wm-bot2: Increased quotas by 1 instances (T327750) - cookbook ran by arturo@nostromo

2022-11-30

  • 15:57 wm-bot2: Increased quotas by 1 floating-ips (T323986) - cookbook ran by dcaro@vulcanus

2022-11-17

  • 18:18 andrewbogott: committed a local puppet change on puppetprimary to fix upstream syncs

2022-10-28

  • 20:24 mutante: - removing from Horizon / project-wide hiera: profile::phabricator::main::manage_scap_user: true (set in the repo)
  • 20:23 mutante: - removing from Horizon / project-wide hiera: profile::keyholder::server::require_encrypted_keys: 'no' (set in the repo)
  • 20:19 mutante: - removing from Horizon / project-wide hiera: profile::gerrit::daemon_user: gerrit2, profile::gerrit::manage_scap_user: true, profile::gerrit::scap_user: gerrit-deploy (all of these are set in the repo)

2022-10-19

  • 19:21 mutante: - on puppetmaster-1001.devtools created /var/lib/puppet/volatile/GeoIP directory - to fix puppet error on deploy-1004.devtools - reacting to puppet-broken-nagging-emails

2022-06-24

  • 12:43 taavi: `os quota set devtools --ram 45056 --cores 22 --instances 9` # T311302

2022-06-15

  • 20:30 mutante: - created gitlab-runner-1002 - applied puppet role - attached cinder volume "docker" - running puppet again
  • 17:06 mutante: deleting instance gitlab-runner-1001 which just disconnects people. gut feeling is it has to do with the fact that a previous instance name was used again

2022-06-14

  • 23:29 mutante: - creating instance gitlab-runner-1001 since we did not have a test machine for gitlab-runners but need one to test things like gerrit:791655 before hitting prod T308271

2022-04-29

  • 20:59 mutante: - restarting instance gitlab-prod-1001 - No route to host
  • 20:55 mutante: - attempting to soft reboot instance deploy1004 (got the puppet fail mail and wasn't reachable by ssh), this happened lately as well to gitlab-prod-1001, same project, different instance, but this time it doesn't just come back yet

2022-04-20

  • 17:03 mutante: soft rebooting gitlab-prod-1001 which was sending "failed puppet" reports and was unreachable, just like the other day.

2022-04-18

  • 19:08 mutante: - gitlab-prod-1001 is indeed back after soft rebooting the instance. uptime 1 min T297411
  • 19:07 mutante: - gitlab-prod-1001 randomly stopped working. we got the "puppet failed" mails without having made changes and can't ssh to the instance anymore when trying to check out why. trying soft reboot via Horizon T297411

2022-04-15

  • 18:00 mutante: - deleting deploy-1002 - use deploy-1004 instead - T306069
  • 17:03 mutante: - not sure if possible (for me) to create a bullseye deployment server in cloud, using scap: failed: Execution of '/usr/bin/scap deploy --init', missing PHP packages, missing prometheus-mcrouter-exporter and more T306069
  • 17:02 mutante: - not sure if possible (for me) to create a deployment server in cloud, using scap: failed: Execution of '/usr/bin/scap deploy --init'
  • 16:40 mutante: : creating deploy1003 to replace deploy1002 T306069
  • 16:36 mutante: : deleting instance gitlab-runner-1001 - was just for testing, real runners are upgraded in their own project

2022-03-02

  • 22:22 mutante: - creating gitlab-runner-1001 on bullseye - purely test for T297659

2022-03-01

  • 18:16 taavi: allocated secondary IP for gitlab-prod-1001 per request on T302803

2022-02-15

  • 16:08 taavi: created devtools.wmcloud.org dns zone for the devtools project T301793

2022-01-26

  • 17:26 arturo: bump quota, floating IP from 1 to 2 (T299561)
  • 15:56 arturo: bump quota, RAM from 32 to 40, cores from 16 to 20 (T299561)

2022-01-21

  • 22:12 mutante: - created new instance gitlab-prod-1001 T297411
  • 22:11 mutante: - created new instance gitlab-prod-1001 T297411
  • 21:57 mutante: - deleted instances "doc" and "doc1002" to make room for gitlab instance T299561 - T297411

2022-01-19

  • 17:36 mutante: - added brennen, aokoth and jelto as users and projectadmins (T297411)

2021-11-10

  • 19:49 mutante: - removing manually added things in Horizon Hiera that were already in the repo, please don't keep adding in web UI, we don't want to repeat the same thing we did in deployment-prep

2021-07-28

  • 16:39 andrewbogott: rebooting gerrit-prod-1001; seemingly unreachable

2021-03-10

  • 10:58 arturo: briefly stopped VM 'doc' to disable VMX cpu flag and live-migrate it

2021-02-22

  • 20:58 mutante: fixed puppet run on deploy-1002 by adding empty array of wikimedia-sites to hiera
  • 20:01 mutante: deploy-1002 is broken because mediawiki::sites is not in Hiera (yet)

2020-10-28

  • 17:01 andrewbogott: fixed puppet runs on phabricator-stage-1001 (previously puppetmaster name mismatch)

2020-09-01

  • 00:16 mutante: - unbreaking puppet run on the local deployment after it was broken since July due to changes in prod deployment_server role

2020-06-30

  • 20:22 mutante: managed to let certbot get LE certs for gerrit.devtools.wmflabs.org and the floating IP

2020-06-17

  • 19:55 paladox: ran `iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT` on phabricator-prod-1001

2020-05-08

  • 07:01 mutante: phabricator-prod-1001 - removing cron for public task dump (though puppet should have removed it)

2020-05-07

2020-04-13

  • 10:00 mutante: - phabricator-stage-1001: replace deployment-tin.deployment-prep with deploy-1002.devtools in deployment-cache/.config
  • 09:40 mutante: set missing (and new) profile::tlsproxy::envoy::capitalize_headers: true to fix puppet errors
  • 09:35 mutante: set phabricator::vcs::address::v6 to fe80 local address to fix puppet error on phabricator-stage-1001

2020-01-16

  • 00:53 mutante: deploy-1002 - become 'trebuchet' user and ssh to phabricator scap targets. to fix ssh host key verification issue on first deploy
  • 00:30 mutante: deploy-1002 live hack /srv/deployment/phabricator/deployment/scap/phabricator-targets and replace prod server with cloud instances; scap deploy in phabricator repo

2020-01-15

  • 23:51 paladox: deploy-1002 rm -rf /srv/deployment
  • 23:44 mutante: deploy-1002 sudo git init in /srv/deployment ; scap deploy --init (now fails with 'fatal: Not a valid object name HEAD')
  • 23:42 mutante: deploy-1002 mkdir /srv/deployments/.git ; chown trebuchet:wikidev .git ; manually run "scap deploy --init" as trebuchet user in an attempt to fix initial puppet run on deployment_server

2020-01-14

  • 00:59 mutante: deleting instance codesearch-buster
  • 00:54 mutante: - deleting instance codesearch-stretch, creating codesearch-buster

2020-01-11

  • 00:35 mutante: deleting instance codesearch-buster, creating codesearch-stretch
  • 00:05 mutante: s/cloudsearch/codesearch/g
  • 00:04 mutante: creating throwaway instance "cloudsearch"
  • 00:04 mutante: deleting instance deploy1001 (buster), creating deploy-1002 (stretch) instead

2020-01-04

  • 16:01 bstorm_: moving vm puppetmaster-1001 from cloudvirt1024 to cloudvirt1009 due to hardware error T241884

2020-01-03

  • 22:37 mutante: - sudo vi /srv/deployment/phabricator/deployment-cache/.config on both phabricator instances to fix deployment server (remove deployment-tin (!))
  • 21:50 mutante: assigned 172.16.0.198/32 on eth0 on phabricator-prod-1001
  • 21:50 jeh: add secondary interface to phabricator-prod-1001
  • 21:32 mutante: configure 172.16.0.189 as "vcs" address v4 for phabricator-stage-1001
  • 21:24 jeh: add secondary interface to phabricator-stage-1001
  • 00:47 paladox: puppet cert generate puppetmaster-1001.devtools.eqiad.wmflabs
  • 00:30 paladox: set puppetmaster: puppetmaster-1001.devtools.eqiad.wmflabs in hiera

2020-01-02

  • 23:42 paladox: puppetmaster-1001: ln -s /var/lib/puppet/ssl/private_keys/puppetmaster-1001.devtools.eqiad.wmflabs.pem /var/lib/puppet/server/ssl/private_keys/puppetmaster-1001.devtools.eqiad.wmflabs.pem
  • 23:40 paladox: puppetmaster-1001: ln -s /var/lib/puppet/ssl/certs/puppetmaster-1001.devtools.eqiad.wmflabs.pem /var/lib/puppet/server/ssl/certs/puppetmaster-1001.devtools.eqiad.wmflabs.pem
  • 23:35 mutante: attempting to create local puppetmaster (formerly puppetmaster::self)

2019-12-23

  • 04:14 mutante: - turns out instance creation doesn't work using Brave browser but does work using Firefox (T241345) - created phabricator-prod-1001
  • 03:45 mutante: - creating new instance seems to fail - nothing shows up at all
  • 03:42 mutante: - launching medium sized buster instance gerrit-prod-01 (T236309)