Portal:Toolforge/Admin/Archive
This is documentation for Tool Labs admins.
A lot is still missing, so please add documentation of how things are set up here.
Tools
Creation of new tool
Users create tools themselves; just make sure that toolwatcher is running.
Removal of tool
Log in to tools-login and execute:
sudo su
cd /home/petrb/bin
./rmtool "<name of tool>"
Follow the instructions and answer any questions it asks. This is an interactive script, so don't run it under nohup.
Disabling tools running on -login
No bots or similar long-running processes should run directly on -login; they should run on the grid. If you see anyone running a bot on -login, do the following:
If they run them in cron
Comment out the jobs and leave a message with explanation why you did it.
If they run them in a screen
Kill them and execute: /usr/local/sbin/warn-screen <list of pts>
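For the cron case, the commenting-out step can be scripted. The following is a minimal sketch, not an established Tools procedure: the function name comment_out_cron and the #DISABLED# marker are hypothetical, and it assumes a crontab implementation that accepts `crontab -u <user> -l` and `crontab -u <user> -`.

```shell
# Read a crontab on stdin and write a disabled copy to stdout.
# $1 is a short explanation that is left as the first line.
# Blank lines and existing comments are passed through unchanged;
# every other line (including variable assignments) is commented out.
comment_out_cron() {
    printf '# Jobs below disabled by a Tools admin: %s\n' "$1"
    awk '/^[ \t]*(#|$)/ { print; next } { print "#DISABLED# " $0 }'
}
```

As root, it could be used like this (review the output before installing it):
crontab -u <user> -l | comment_out_cron "bots must run on the grid" | crontab -u <user> -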
Configuration of instance
Memory
Every instance has memory overcommit disabled. This is done by /etc/sysctl.d/60-vm.overcommit_memory.conf, which is included in init.pp so that every instance gets it.
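To verify the setting on a running instance, something like the sketch below may help. It assumes that "disabled" corresponds to vm.overcommit_memory = 2 (strict accounting in the kernel documentation); the actual contents of the sysctl file are not reproduced on this page. The helper name describe_overcommit is hypothetical.

```shell
# Translate a vm.overcommit_memory value into words.
# The three modes (0, 1, 2) are those described in the kernel docs.
describe_overcommit() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "overcommit disabled (strict accounting)" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the live setting on this host, if procfs exposes it.
if [ -r /proc/sys/vm/overcommit_memory ]; then
    describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
fi
```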
NFS
Every instance needs to use NFS. New instances use Gluster by default, so put the instance into the proper Puppet class, then force a Puppet run and reboot.
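After the reboot, one way to confirm the switch is to look at the filesystem type behind the shared storage. This is only a sketch: it assumes GNU stat, and it assumes /data/project (the path mentioned elsewhere on this page) is the shared mount to check. The helper name fs_type is hypothetical.

```shell
# Print the filesystem type name for the filesystem holding path $1.
# With GNU stat, an NFS mount is reported as "nfs".
fs_type() {
    stat -f -c %T "$1"
}

# Only report when the shared path exists on this host.
if [ -d /data/project ]; then
    echo "/data/project is on: $(fs_type /data/project)"
fi
```

Anything other than nfs on the shared path suggests the instance is still using the old default.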
New instance cookbook
Check whether the new instance provides a service that needs its own security group. If it does and you fail to add the group in time, you will have to delete the instance.
Exec nodes
- Make sure that exec nodes have their own external IP so that identd works and bots can, among other things, connect to IRC.
/var/mail is a symlink to /data/project/.system/mail on all servers, so the mailboxes are shared across the whole tools project.
There is also tools-mail; only Coren knows what it is for.
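A quick check of the mail symlink on a given server could look like this. It is a minimal sketch; the helper name is_symlink_to is hypothetical, and the comparison is against the literal link target rather than a resolved path.

```shell
# Succeed when path $1 is a symlink whose target is exactly $2.
is_symlink_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

if is_symlink_to /var/mail /data/project/.system/mail; then
    echo "/var/mail is the shared symlink"
else
    echo "/var/mail is NOT the shared symlink on $(hostname)"
fi
```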
Toolwatcher (tools-login)
There is a daemon on tools-login called toolwatcher. It creates folders for new tools, creates local databases and updates the webservers' configuration. If you reboot tools-login, or if the daemon has died, you need to (re-)start it:
sudo su
service toolwatcher start
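To avoid starting a second copy, it can be worth checking first whether the daemon is already up. The sketch below assumes that pgrep is installed and that the daemon's command line contains the string "toolwatcher"; neither is confirmed on this page, and both helper names are hypothetical.

```shell
# Succeed if any process whose command line matches $1 is running.
is_running() {
    pgrep -f "$1" >/dev/null 2>&1
}

# Start toolwatcher only when no matching process is found.
# Must be run as root on tools-login.
ensure_toolwatcher() {
    if is_running toolwatcher; then
        echo "toolwatcher is already running"
    else
        service toolwatcher start
    fi
}
```

Calling ensure_toolwatcher from a root shell after a reboot would then be equivalent to the manual steps above.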
Access to instances
There is a Puppet variable restricted_to. It is set to local-admin on all machines to which access should be restricted.
Howto
List all jobs
$ qstat -u '*'
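The full job list gets long, so a per-user count is often easier to read. The following sketch assumes the default qstat layout, with two header lines and the user name in the fourth column; the function name jobs_per_user is hypothetical.

```shell
# Read `qstat -u '*'` output on stdin and print "count user" lines,
# busiest user first. Skips the two header lines of the default layout.
jobs_per_user() {
    awk 'NR > 2 && NF >= 4 { n[$4]++ } END { for (u in n) print n[u], u }' | sort -rn
}
```

Usage: qstat -u '*' | jobs_per_user
Note that qstat may truncate long user names in this column.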
See how much memory is being used
$ qstat -F h_vmem
See information about jobs that finished
$ ssh tools-master
$ qacct --help
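In practice the most common query is for a single finished job, for example `qacct -j <job id>`. The record it prints is long, so the sketch below filters it down to a few fields. The field names (jobname, failed, exit_status, maxvmem) are standard in qacct output; the function name summarize_qacct is hypothetical.

```shell
# Read a `qacct -j <job id>` record on stdin and keep only the lines
# that usually explain why a job ended: name, failure code, exit
# status and peak virtual memory.
summarize_qacct() {
    awk '$1 == "jobname" || $1 == "failed" || $1 == "exit_status" || $1 == "maxvmem"'
}
```

Usage, on tools-master: qacct -j <job id> | summarize_qacct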
local-admin
There is an admin service group called local-admin. It has its own documentation page at http://tools.wmflabs.org/admin/. Here is a copy of it, in case the webservers are offline:
History of tools
Every hour, the list of tools (service groups) is dumped and committed to the repository at ~tools.admin/var/lib/git/servicegroups by the script ~tools.admin/bin/toolhistory (backup), which runs as the continuous job toolhistory.
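Since the history is an ordinary git repository, recent changes to the tool list can be read with git log. This is a sketch only: it assumes a git version that supports the -C option, and the function name recent_tool_changes is hypothetical.

```shell
# Show the latest commits, with changed files, of the repository at $1.
# $2 is the number of commits to show (defaults to 5).
recent_tool_changes() {
    git -C "$1" log --stat -n "${2:-5}"
}
```

Usage: recent_tool_changes ~tools.admin/var/lib/git/servicegroups 10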