Swift/Setup New Swift Cluster
The steps necessary to set up owa1-3 and ms1-3 as a swift cluster (owa -> proxies, ms -> storage)
Setting up swift in labs is similar. Swift/Setup New Swift Cluster (labs) describes the differences from this document
update DNS
Create a name that will be balanced across all the proxy servers, using either round robin DNS or an LVS server. For testing, I have created things like msfe-pmtpa-test, an RRDNS entry pointing to the tampa proxies, owa1-3.
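For round robin, this is just several A records sharing one name; a minimal sketch of the zone file entry (the IP addresses here are made-up placeholders):

msfe-pmtpa-test    300  IN  A  10.0.0.1    ; owa1 (placeholder IP)
msfe-pmtpa-test    300  IN  A  10.0.0.2    ; owa2 (placeholder IP)
msfe-pmtpa-test    300  IN  A  10.0.0.3    ; owa3 (placeholder IP)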
Set up filesystems
Puppet will take care of any disk that uses a single partition for data - pass it all non-OS disks. You must create partitions on the OS disks for swift storage yourself. The following is what I ran on ms-be1 (where the bios partitions are sda1 and sdb1, the OS partition is raided across 120GB partitions on sda2 and sdb2, and sda3 and sdb3 are swap):
# parted
(parted) help
(parted) print free
(parted) mkpart swift-sda4 121GB 2000GB
(parted) select /dev/sdb
(parted) print free
(parted) mkpart swift-sdb4 121GB 2000GB
(parted) quit
# mkfs -t xfs -i size=512 -L swift-sda4 /dev/sda4
# mkfs -t xfs -i size=512 -L swift-sdb4 /dev/sdb4
# mkdir /srv/swift-storage/sd{a,b}4
# vi /etc/fstab    # <-- add in a line for sda4 and sdb4 with the same xfs options as the rest
# mount -a
# chown -R swift:swift /srv/swift-storage/sd{a,b}4
# chmod 750 /srv/swift-storage/sd{a,b}4
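The fstab lines should copy the xfs options already used by the puppet-managed swift filesystems; as a sketch, using the mount options the upstream swift docs usually recommend (verify against the existing entries before trusting these):

LABEL=swift-sda4  /srv/swift-storage/sda4  xfs  noatime,nodiratime,nobarrier,logbufs=8  0  0
LABEL=swift-sdb4  /srv/swift-storage/sdb4  xfs  noatime,nodiratime,nobarrier,logbufs=8  0  0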
create the cluster hash
Each cluster has a random string it uses to seed the hashes that determine where objects live. Generate this string for use in the puppet configs:
od -t x8 -N 8 -A n </dev/random
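This reads 8 bytes of randomness and prints them as a single 16-digit hex string; the output looks something like this (example value only - yours will differ):

 199bb8246d7b2e2e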
update puppet
Use ms-fe[12] and ms-be1-5 in puppet/manifests/site.pp as an example. You will have to create a class for your cluster following the examples in puppet/manifests/role/swift.pp
- make sure to define the list of drives for the storage nodes
- set up a class for your cluster's base, proxy, and storage configs in site.pp (model after pmtpa-test)
- make sure to set all variables for the proxy config, even if you don't have the real values yet
class { "swift::base": hash_path_suffix => "1234deadbeef5678" } <---- the cluster hash you just made class proxy inherits from swift-cluster::your-cluster { bind_port => "80", <---- the port on which swift will listen num_workers => "8", <---- should be double the number of cores proxy_address => "http://msfe-pmtpa-test.wikimedia.org", <---- the DNS entry you made super_admin_key => "some-secret-key", <---- choose a strong password here memcached_servers => [ "owa1.wikimedia.org:11211", "owa2.wikimedia.org:11211", "owa3.wikimedia.org:11211" ] <-- all proxy servers rewrite_account => "placeholder", <---- you will change this to its real value later rewrite_url => "http://127.0.0.1/auth/v1.0", <---- this should actually be localhost rewrite_user => "place:holder", <---- you will change this later rewrite_password => "placeholder", <---- you will change this later rewrite_thumb_server => "ms5.pmtpa.wmnet", <---- where swift goes to get thumbnails shard_containers => "some", <---- whether to shard any of the containers (all, some, none) shard_container_list => "wikipedia-commons-local-thumb" <---- comma separated list of containers to shard (or empty if none) }
- on all puppetmasters, create placeholder files for the rings in /var/lib/puppet/volatile/
cd /var/lib/puppet/volatile
mkdir swift/clustername
touch swift/clustername/{account,container,object}.{builder,ring.gz}
- load the new puppet configs onto each server
for host in owa{1..3} ms{1..3}
do
    ssh $host puppetd --test
    sleep 30 && ssh $host puppetd --test &    # run puppet twice just for good measure
done
build the rings
On any proxy server:
- create the ring files
- 16: indicates 2^16 partitions total. Partition count should be round-up(max num drives ever * 100)
- eg 50 servers * 12 drives each * 100 = 60,000 partitions; the next power of two is 2^16 = 65536, so 16 would be the partition power.
- 3: replica count - the number of copies of each piece of data to store
- 3: min-part-hours - the minimum number of hours before a partition can be moved again
cd /etc/swift
swift-ring-builder account.builder create 16 3 3
swift-ring-builder container.builder create 16 3 3
swift-ring-builder object.builder create 16 3 3
- add the storage nodes
- this setup is one zone per server
- assuming all storage devices are the same size, they all want weight 100
- the format for the command is
swift-ring-builder <ring-file> <command> z<zone-number>-<hostname>:<port>/<device-name> <weight>
- eg:
swift-ring-builder account.builder add z2-ms2.pmtpa.wmnet:6002/sde1 100
- device name is the basename of the path to the mountpoint; eg /srv/swift-storage/abcd -> abcd.
- rebalance the rings once they're created (this can take a while)
cd /etc/swift
for num in 1 2 3
do
    host="ms${num}.pmtpa.wmnet"
    hostip=$(dig +short $host)
    zone="${num}-${hostip}"
    weight=100
    for dev in $(ssh $host ls /srv/swift-storage/)
    do
        swift-ring-builder account.builder add z${zone}:6002/${dev} $weight
        swift-ring-builder container.builder add z${zone}:6001/${dev} $weight
        swift-ring-builder object.builder add z${zone}:6000/${dev} $weight
    done
done
swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance
chown swift:swift *.ring.gz
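To sanity-check a ring, run swift-ring-builder with just the builder file; it prints the partition count, replica count, and the device table with each device's weight and balance:

cd /etc/swift
swift-ring-builder account.builder
swift-ring-builder container.builder
swift-ring-builder object.builder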
distribute the rings
Copy the three .builder and the three .ring.gz files into puppet, which will distribute them to all nodes in the cluster. They live in the volatile section of puppet (for big binary files) on all puppetmasters. Within that, under swift/, they reside in a directory named for the location and role of the cluster (eg eqiad-test, pmtpa-prod, etc.)
cd /etc/swift
scp {account,container,object}.{builder,ring.gz} puppetmaster__node__name:/var/lib/puppet/volatile/swift/eqiad-test/
Check them in and do all the normal puppet stuff.
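Once puppet has run everywhere, it's worth spot-checking that the rings actually landed on the cluster nodes; comparing checksums against a storage node works (the hostname here is just an example):

md5sum /etc/swift/*.ring.gz
ssh ms1.pmtpa.wmnet md5sum /etc/swift/*.ring.gz    # should match the local checksums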
reboot
It's always good practice to reboot a new server as proof that it functions correctly on system start.
set up auth tokens for the cluster
- Initialize swauth using the super_admin_key from the config above
- swauth-prep -A http://127.0.0.1/auth/ -K xxxxxx
- add the user for thumbnails
- generate a password:
pass=$(pwgen -s 12 1)
- add the user:
swauth-add-user -A http://127.0.0.1/auth/ -K thisshouldbesecret -a mw thumb testing
- account: mw, user: thumb, password: testing (you should use your generated password instead)
- test it and retrieve the account id:
swauth-list -A http://127.0.0.1/auth/ -K thisshouldbesecret mw
- you're looking for:
"account_id": "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120"
tell puppet about the auth tokens
Update the puppet config with the authentication tokens you just made
rewrite_account => "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120",
rewrite_user => "mw:thumb",
rewrite_password => "testing",
update the proxies and restart the proxy service
- run puppet on all proxy servers
- reload the proxy config on all proxy servers
swift-init proxy reload
create dispersion objects and containers
- run swift-dispersion-populate on a proxy node once to populate the initial list of containers and objects for dispersion detection
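Once populated, swift-dispersion-report can be run at any time to check that copies of the dispersion objects are where the rings say they should be; healthy output looks roughly like this (numbers illustrative):

$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 19s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space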
make the containers necessary for thumbnails
Until BZ:33206 is resolved, we have to make all thumbnail containers by hand.
ssh ms5
cd /export/thumbs
for i in */*/*; do echo $i; done | grep "thumb$" | tr \/ - > /tmp/container-list
scp /tmp/container-list my-proxy-server:/tmp/
ssh my-proxy-server
for cont in $(cat /tmp/container-list)
do
    swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post $cont
    swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post ${cont/thumb/temp}
    swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post ${cont/thumb/public}
done
make the containers readable by anonymous users
The swift rewrite middleware doesn't authenticate requests to public containers, so mark them world-readable:
grep -v private /tmp/container-list > /tmp/public-container-list
for cont in $(cat /tmp/public-container-list)
do
    swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post -r '.r:*' ${cont}
    swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post -r '.r:*' ${cont/thumb/temp}
    swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing post -r '.r:*' ${cont/thumb/public}
done
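To verify the ACL took on a given container, swift stat should now show a Read ACL of .r:* (the container name here is just an example):

swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K testing stat wikipedia-commons-local-thumb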
test the cluster
todo
- all hosts:
- move the swift_hash_path_suffix into the passwords file.
- adjust tcp settings (from http://docs.openstack.org/bexar/openstack-object-storage/admin/content/ch04s06.html#d5e1206); see the sketch after this list
- proxy host:
- move super_admin_key into passwords file. [DONE]
- move the filter:rewrite login/key into passwords file. [DONE]
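For reference, the tuning section of the guide linked above suggests sysctl values along these lines - treat this as a sketch of that doc's recommendations rather than values validated on this cluster:

# /etc/sysctl.conf additions suggested by the swift docs
# reuse/recycle TIME_WAIT sockets faster
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
# disable syn cookies
net.ipv4.tcp_syncookies = 0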