Obsolete:Media server
Plans for the new architecture: Media server/2011 Media Storage plans
Deploying a new media server
Checklist of things that need to be done when setting up a media server (from scratch, without JumpStart):
- Base Solaris 10 install
  - Fix the timezone, get /tmp mounted, fix the dump and swap sizes
- Convert the root filesystem to ZFS
  - Fix the GRUB menu
- Set up RAID
- Install pkgtool from SourceForge
- Set root's home directory to /root and copy the SSH keys over
- Add a local non-root user with home directory /export/home/username and grant it install privileges
- Install from WMF spec files: pkgtool, pca
  - Set the proxy environment variable for this; do it as the non-root user
- Check all patches and install them using pca
  - Set the proxy environment variable for this; requires a Sun account with a support contract
- Install from WMF spec files: screen, netcat
  - Set the proxy environment variable for this; do it as the non-root user
- Copy the media data over and unpack it
- Aggregate nge0 and nge1
- Install Sun Java Web Server 7
  - Install 1,2,3,4; put it in /opt/webserver7/... with /export/upload as the path
  - Turn off the web admin server and enable the regular web server by editing the XML config files
- Install the 64-bit DTrace plugin
  - Copy the 64-bit version from ...
- Install from WMF spec files: php, ganglia, libogg, libvorbis
  - Copy /etc/gmond.conf from...
  - Enable gmetrics in crontab by...
- Set up cron jobs for replication and snapshots if this is a master
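The replication and snapshot cron jobs are not spelled out above. Below is a minimal sketch, assuming replication is done with ZFS snapshots shipped by incremental `zfs send -i` piped over ssh into `zfs receive`, of the two commands one cron run might issue. The dataset name `tank/upload`, the replica host `new-media-server` and the `repl-` snapshot naming scheme are all hypothetical; the script only builds and prints the command lines, it does not run them.

```python
import shlex
from datetime import datetime, timezone

# Hypothetical names: the real pool/dataset and replica host are not recorded here.
DATASET = "tank/upload"
REPLICA_HOST = "new-media-server"


def snapshot_name(when: datetime) -> str:
    """Sortable snapshot name, e.g. tank/upload@repl-20121001T0300Z."""
    return f"{DATASET}@repl-{when.strftime('%Y%m%dT%H%MZ')}"


def replication_commands(prev_snap: str, new_snap: str) -> list[str]:
    """Shell commands for one cron run: take a new snapshot, then ship only
    the changes made since prev_snap to the replica."""
    take = ["zfs", "snapshot", new_snap]
    send = ["zfs", "send", "-i", prev_snap, new_snap]
    # -F rolls the replica back to its most recent snapshot before receiving,
    # discarding any stray changes made there.
    recv = ["ssh", REPLICA_HOST, "zfs", "receive", "-F", DATASET]
    return [shlex.join(take), f"{shlex.join(send)} | {shlex.join(recv)}"]


prev = snapshot_name(datetime(2012, 10, 1, 3, 0, tzinfo=timezone.utc))
new = snapshot_name(datetime(2012, 10, 1, 4, 0, tzinfo=timezone.utc))
for cmd in replication_commands(prev, new):
    print(cmd)
```

An incremental send only works if the previous snapshot still exists on both sides, so any snapshot pruning job has to keep at least the most recent replicated snapshot. The same send/receive pair, run by hand, is one way to do the manual catch-up replication when switching servers.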
Transitioning from one server to another
Basic procedure for switching to another media server, assuming it has been receiving data via replication:
- Temporarily halt uploads, deletes and restores
- Turn off replication and snapshots on the live media server
- Run replication manually to the new host to copy over data changed since the last push
- Mount the new /export/upload filesystem everywhere with a new mount point
- Change /home/wikipedia/bin/scap so that it uses the new mount point
- Change the mount point everywhere in CommonSettings.php and InitialiseSettings.php
- Change the Squid settings in upload-settings.php to use the new host
- Check that media can still be read at this point
- Stop the webserver on the old media server; check that media can still be read
- Unmount the old /export/upload everywhere; check that media can still be read
- Turn on uploads
- Test image uploads
- Change the Puppet settings for ExtensionDistributor and the NFS mount point checks to use the new filesystem (misc-servers.pp, nfs.pp)
- Test ExtensionDistributor and make sure it works
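Several steps above come down to "check that media can still be read". A minimal sketch of one way to spot-check this from a client that has both filesystems mounted: read a sample of files through the old and the new mount points and compare checksums. This is an illustration only; the demo uses temporary directories, and on a real host you would pass the two mount points and a list of sample file paths (for example `compare_mounts("/mnt/upload-old", "/mnt/upload-new", sample)`, where both paths and `sample` are hypothetical).

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-1 of a file's contents, read in 1 MiB chunks so large media files are fine."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_mounts(old_root: str, new_root: str, rel_paths: list[str]) -> list[str]:
    """Return the relative paths that are unreadable on either side or whose
    contents differ between old_root and new_root."""
    problems = []
    for rel in rel_paths:
        try:
            old_sum = file_digest(os.path.join(old_root, rel))
            new_sum = file_digest(os.path.join(new_root, rel))
        except OSError:
            problems.append(rel)  # missing or unreadable
            continue
        if old_sum != new_sum:
            problems.append(rel)
    return problems


# Demo: two temporary directories stand in for the old and new mount points.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    for root in (old, new):
        with open(os.path.join(root, "a.jpg"), "wb") as f:
            f.write(b"same bytes")
    with open(os.path.join(old, "b.jpg"), "wb") as f:
        f.write(b"only on the old mount")
    demo_problems = compare_mounts(old, new, ["a.jpg", "b.jpg"])

print(demo_problems)  # ['b.jpg']
```

This only makes sense while uploads are halted; once writes resume, the two sides are expected to diverge.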
Transitioning from one NFS mount to another (i.e. ms7 to nas1), Oct 2012
If we go with a new mount point:
- Remove this line from /etc/apache2/wmf/wikimedia.conf (and make sure CentralNotice still works):
  Alias /centralnotice/ /mnt/upload6/centralnotice/
- Mount the new /export/upload filesystem everywhere with a new mount point, using Puppet
- Make sure the netapp exports are set up correctly for the volume
- Temporarily halt uploads, deletes and restores
- On fenari, make a copy of /home/w/common/wmf-config/InitialiseSettings.php
- In that same file, change wgEnableUploads from 'true' to 'false' for every wiki that has it set to 'true'
- Run sync-common-file to push the change out
- Wait 5-10 minutes for in-progress uploads to finish
- Change the mount point everywhere in CommonSettings.php, InitialiseSettings.php (saved copy and new copy), filebackend.php, extdist/svn-invoke.conf
- Change the settings for it in manifests/misc-servers.pp:misc::extension-distributor and push to fenari
- Check that public and private media, ext-dist, math, timeline (have those moved to Swift yet?) and captcha are working
- Stop the webserver on ms7 and check that the above still work. (Apparently not: captcha, plus some cached pages with math images, are still served by ms7.)
- Turn uploads, deletes and restores back on
- Copy the saved copy of InitialiseSettings.php on fenari back to /home/w/common/wmf-config/
- Run sync-common-file to push the change out
- Check that public and private media writes are working and that writes also reach the netapp
- Unmount the old /export/upload6 everywhere (possibly not yet, since captcha can't be moved)
(Please fill in with what's missing, etc)
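The save/flip/restore of wgEnableUploads in the steps above can be sketched as a pure function over the per-wiki settings. This models the relevant part of InitialiseSettings.php as a Python dict, which is an assumption for illustration only; the real edit is made to the PHP file on fenari, and the wiki names below are hypothetical.

```python
import copy


def disable_uploads(settings: dict) -> tuple[dict, dict]:
    """Return (saved, modified). In `modified`, every wiki whose
    wgEnableUploads is True is switched to False; wikis already False are
    untouched, so putting `saved` back restores exactly the original state."""
    saved = copy.deepcopy(settings)
    modified = copy.deepcopy(settings)
    uploads = modified["wgEnableUploads"]
    for wiki, enabled in uploads.items():
        if enabled is True:
            uploads[wiki] = False  # replacing a value while iterating is safe
    return saved, modified


# Hypothetical excerpt of the per-wiki settings, modelled as a dict.
settings = {"wgEnableUploads": {"default": True, "examplewiki": False, "otherwiki": True}}
saved, read_only = disable_uploads(settings)
print(read_only["wgEnableUploads"])  # {'default': False, 'examplewiki': False, 'otherwiki': False}
```

Keeping the untouched copy, rather than flipping the values back to 'true' by hand afterwards, is what guarantees that wikis which had uploads disabled beforehand stay disabled.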
Additional notes from Aaron's email:
- I'd use the filebackend.php 'readOnly' option instead of wgEnableUploads (which is less robust). You could use both, of course. 'readOnly' would need to be set for each multiwrite backend, and its value should be an explanatory English string.
- Someone probably just forgot to add this to the list, but a final rsync (with --update and *without* --delete) is needed afterwards, since the netapps are not fully up to date with ms7. This does not need to happen during the read-only window. It needs --update so that it won't clobber files that have changed on the netapps since the rsync. It can't use --delete, since the newest uploads will be on the netapps but not on ms7.
- After the rsync, I'll need to run syncFileBackend.php to pick up updates made since the first netapp rsync began. I've already generated position files for all wikis in my home directory, with the position from 10 days ago (the 25th, for good measure). This will basically clean up file deletions after the rsync above.
- This should be fine, since Mark started the rsync on the 29th according to the server admin log.
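Why the final rsync needs --update and must not use --delete can be shown with a small model. A minimal sketch, assuming each side is represented as a mapping from file path to modification time; this is a simplification (real rsync also compares sizes and handles directories, permissions and so on), and the file names and timestamps are made up.

```python
def rsync_plan(src: dict[str, int], dst: dict[str, int],
               update: bool, delete: bool) -> dict[str, list[str]]:
    """Model which files rsync would copy or delete.

    Files are compared by mtime only, a simplification of rsync's size+mtime
    quick check. With update=True, files that are newer on the destination
    are skipped; with delete=True, files that exist only on the destination
    are removed."""
    copy_paths: list[str] = []
    delete_paths: list[str] = []
    for path, mtime in sorted(src.items()):
        if path not in dst:
            copy_paths.append(path)
        elif dst[path] == mtime:
            continue  # quick check: assume unchanged
        elif update and dst[path] > mtime:
            continue  # --update: keep the newer destination copy
        else:
            copy_paths.append(path)
    if delete:
        delete_paths = sorted(p for p in dst if p not in src)
    return {"copy": copy_paths, "delete": delete_paths}


# ms7 (source) vs. netapp (destination), mtimes as plain integers.
ms7 = {"a.jpg": 100, "b.jpg": 200, "c.jpg": 300}
# b.jpg was rewritten and new.jpg uploaded on the netapp after the switch.
netapp = {"a.jpg": 100, "b.jpg": 250, "new.jpg": 400}

print(rsync_plan(ms7, netapp, update=True, delete=False))
# {'copy': ['c.jpg'], 'delete': []}
print(rsync_plan(ms7, netapp, update=False, delete=True))
# {'copy': ['b.jpg', 'c.jpg'], 'delete': ['new.jpg']}
```

The first call fills in the file ms7 still has (c.jpg) without touching anything newer; the second overwrites the newer b.jpg with the old copy and deletes the fresh upload, which is exactly what the notes warn against.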
Since reads basically come from Swift for uploads, timelines and math renderings, users should not notice the temporary inconsistencies between the netapps and Swift. Writes will transparently sync the netapps on the fly to match Swift for whatever files the write operation affects or needs.
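A toy model of the behaviour described in Aaron's notes: reads are served from the master backend (Swift), every write is applied to the master and then the affected paths are copied to the secondary (the netapps), and a non-empty 'readOnly' string refuses writes with that explanation. This is an illustrative sketch in Python under those assumptions, not MediaWiki's FileBackendMultiWrite code; the class, method and file names are hypothetical.

```python
class ReadOnlyError(Exception):
    """Raised when a write is attempted while the backend is read-only."""


class ToyMultiWriteBackend:
    """Illustrative model only; not MediaWiki's FileBackendMultiWrite."""

    def __init__(self, read_only: str = "") -> None:
        self.master: dict[str, bytes] = {}     # stands in for Swift
        self.secondary: dict[str, bytes] = {}  # stands in for the netapps
        self.read_only = read_only             # non-empty string => writes refused

    def read(self, path: str) -> bytes:
        # Reads only touch the master, so a lagging secondary is invisible to readers.
        return self.master[path]

    def _sync(self, paths: list[str]) -> None:
        # Make the secondary match the master for just the affected paths.
        for p in paths:
            if p in self.master:
                self.secondary[p] = self.master[p]
            else:
                self.secondary.pop(p, None)

    def _check_writable(self) -> None:
        if self.read_only:
            raise ReadOnlyError(self.read_only)

    def store(self, path: str, data: bytes) -> None:
        self._check_writable()
        self.master[path] = data
        self._sync([path])

    def delete(self, path: str) -> None:
        self._check_writable()
        self.master.pop(path, None)
        self._sync([path])


be = ToyMultiWriteBackend()
be.master["old.png"] = b"v1"     # current in Swift ...
be.secondary["old.png"] = b"v0"  # ... but stale on the netapps
be.store("old.png", b"v2")
print(be.secondary["old.png"])   # b'v2'

be.read_only = "Media storage is being migrated; uploads are disabled."
refused = None
try:
    be.store("new.png", b"data")
except ReadOnlyError as err:
    refused = str(err)
print(refused)  # Media storage is being migrated; uploads are disabled.
```

Syncing only the paths a write touches is what lets the netapps lag behind Swift for untouched files without users noticing, and why a separate bulk pass (the final rsync plus syncFileBackend.php) is still needed to catch everything else.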
Misc
To get live access logs on the media or thumb servers, you can run the command
dtrace -qs access_log.d
which relies on the DTrace NSAPI plugin and the file /opt/local/share/access_log.d.
These are built from mediawiki/trunk/tools/nsapi-dtrace in case you need the sources.