Swift/Hackathon Installation Notes
This page describes how we set up the servers for Swift at the NOLA hackathon.
hardware
We have 3 misc servers: magnesium, copper, and zinc; each has 2 drives.
- 50GB on each drive in RAID 1 for the OS
- 450GB with no RAID for the storage bricks (swift likes direct access to the disks and the docs discourage RAID)
copper and zinc are configured as proxy nodes; all three are configured as storage nodes.
swift install
following the instructions on Swift's website: http://swift.openstack.org/howto_installmultinode.html
packages
after adding the recommended Debian repo, 'aptitude install swift' brought in the following packages from a non-wmf repository:
 Get:10 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-greenlet 0.3.1-1ubuntu1~lucid0 [15.1kB]
 Get:11 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-eventlet 0.9.13-0ubuntu1~lucid0 [115kB]
 Get:12 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-webob 1.0.8-0~ppalucid2 [283kB]
 Get:13 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main python-swift 1.4.3-0ubuntu1~lucid1~ppa1 [263kB]
 Get:14 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift 1.4.3-0ubuntu1~lucid1~ppa1 [39.2kB]
The rest of the packages needed are:
 Get:1 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-proxy 1.4.3-0ubuntu1~lucid1~ppa1 [8,568B]
 Get:1 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-account 1.4.3-0ubuntu1~lucid1~ppa1 [8,172B]
 Get:2 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-container 1.4.3-0ubuntu1~lucid1~ppa1 [8,000B]
 Get:3 http://ppa.launchpad.net/swift-core/release/ubuntu/ lucid/main swift-object 1.4.3-0ubuntu1~lucid1~ppa1 [9,534B]
I downloaded binary swauth packages from github and installed them in our repo. Only the python-swauth package is necessary; swauth-doc is only documentation. There are .debs in Ubuntu's precise repo, so eventually these will no longer need to be manually tracked.
 python-swauth 1.0.2-1
 swauth-doc 1.0.2-1
puppet
- created manifests/swift.pp
- created files/swift
- created private:files/swift
proxy nodes
TODO:
- memcached should only be available to other swift proxy servers - the port should be firewalled off to enforce that.
- move proxy-server.conf from files to template to templatize account, login, and password
- puppetize netfilter settings
high capacity conntracking
To let the firewall do stateful connection tracking, the kernel maintains a table of all open connections. If this table is too small, the server starts refusing connections well before exhausting other resources. The default should be raised to the point where the server can be saturated by real traffic first. See notes on high-performance Linux networking for more detail.
- /sbin/sysctl -w net.netfilter.nf_conntrack_max=262144
- echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
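The commands above take effect immediately but do not survive a reboot. Until the netfilter settings are puppetized (see the TODO above), one way to persist them might be the following sketch (an assumption, not what is currently deployed):
 # sketch only, not yet puppetized: persist the conntrack table size via sysctl
 echo 'net.netfilter.nf_conntrack_max = 262144' >> /etc/sysctl.conf
 # hashsize is a module parameter, so set it when the nf_conntrack module loads
 echo 'options nf_conntrack hashsize=32768' > /etc/modprobe.d/nf_conntrack.conf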
memcached
Memcached will run on all the proxy servers. /etc/swift/proxy-server.conf lists all the memcached servers, so it must (eventually) be updated when adding or removing proxy servers. Everything is supposed to keep working with a missing memcached shard, so the list does not need to be updated during an outage. (This still needs to be validated through testing.)
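For reference, the memcached list lives in the cache filter of the proxy config; it looks roughly like this (the addresses are placeholders for the proxy hosts, not the real values):
 [filter:cache]
 use = egg:swift#memcache
 memcache_servers = <copper-ip>:11211,<zinc-ip>:11211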
created a puppet class to differentiate configs for different clusters, with hardcoded values like the list of memcached servers for the proxy servers. Lame, but I'd rather get it running and then improve it than block on that.
rings
initial setup of the rings needs to be done by hand. This is more of a cluster operation than a per-server operation. Once built, the rings are modified when additional servers are added. See the ring-creation steps in the proxy node configuration section of http://swift.openstack.org/howto_installmultinode.html
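For later reference, adding a storage server to an existing cluster means adding its devices to these builders and rebalancing; roughly like the sketch below (hypothetical zone 4 and placeholder IP, not something we ran at the hackathon):
 # sketch only: one disk of a hypothetical new storage node in zone 4
 swift-ring-builder account.builder add z4-<new-node-ip>:6002/sda3 100
 swift-ring-builder container.builder add z4-<new-node-ip>:6001/sda3 100
 swift-ring-builder object.builder add z4-<new-node-ip>:6000/sda3 100
 swift-ring-builder account.builder rebalance
 swift-ring-builder container.builder rebalance
 swift-ring-builder object.builder rebalance
 # then copy the regenerated *.ring.gz files back out to all the nodes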
For the initial setup, ran the following (create's arguments are partition power, replica count, and minimum hours between moves of a partition: 2^18 = 262144 partitions, 3 replicas, 1 hour):
 swift-ring-builder account.builder create 18 3 1
 swift-ring-builder container.builder create 18 3 1
 swift-ring-builder object.builder create 18 3 1
then ran:
 cat ~/build-swift-rings
 #!/bin/bash
 for zone in 1-208.80.154.136 2-208.80.154.5 3-208.80.154.146
 do
   for dev in sda3 sdb3
   do
     weight=100
     swift-ring-builder account.builder add z${zone}:6002/$dev $weight
     swift-ring-builder container.builder add z${zone}:6001/$dev $weight
     swift-ring-builder object.builder add z${zone}:6000/$dev $weight
   done
 done
and got this output:
 root@copper:/etc/swift# bash ~/build-swift-rings
 Device z1-208.80.154.136:6002/sda3_"" with 100.0 weight got id 0
 Device z1-208.80.154.136:6001/sda3_"" with 100.0 weight got id 0
 Device z1-208.80.154.136:6000/sda3_"" with 100.0 weight got id 0
 Device z1-208.80.154.136:6002/sdb3_"" with 100.0 weight got id 1
 Device z1-208.80.154.136:6001/sdb3_"" with 100.0 weight got id 1
 Device z1-208.80.154.136:6000/sdb3_"" with 100.0 weight got id 1
 Device z2-208.80.154.5:6002/sda3_"" with 100.0 weight got id 2
 Device z2-208.80.154.5:6001/sda3_"" with 100.0 weight got id 2
 Device z2-208.80.154.5:6000/sda3_"" with 100.0 weight got id 2
 Device z2-208.80.154.5:6002/sdb3_"" with 100.0 weight got id 3
 Device z2-208.80.154.5:6001/sdb3_"" with 100.0 weight got id 3
 Device z2-208.80.154.5:6000/sdb3_"" with 100.0 weight got id 3
 Device z3-208.80.154.146:6002/sda3_"" with 100.0 weight got id 4
 Device z3-208.80.154.146:6001/sda3_"" with 100.0 weight got id 4
 Device z3-208.80.154.146:6000/sda3_"" with 100.0 weight got id 4
 Device z3-208.80.154.146:6002/sdb3_"" with 100.0 weight got id 5
 Device z3-208.80.154.146:6001/sdb3_"" with 100.0 weight got id 5
 Device z3-208.80.154.146:6000/sdb3_"" with 100.0 weight got id 5
After building them, the rings look like this (note - only showing account, but did this for all three):
 root@copper:/etc/swift# swift-ring-builder account.builder
 account.builder, build version 6
 262144 partitions, 3 replicas, 3 zones, 6 devices, 100.00 balance
 The minimum number of hours before a partition can be reassigned is 1
 Devices:    id  zone      ip address  port  name weight partitions balance meta
              0     1  208.80.154.136  6002  sda3 100.00          0 -100.00
              1     1  208.80.154.136  6002  sdb3 100.00          0 -100.00
              2     2    208.80.154.5  6002  sda3 100.00          0 -100.00
              3     2    208.80.154.5  6002  sdb3 100.00          0 -100.00
              4     3  208.80.154.146  6002  sda3 100.00          0 -100.00
              5     3  208.80.154.146  6002  sdb3 100.00          0 -100.00
The next step is rebalancing, after which they look like this (note - only showing account, but did this for all three):
 root@copper:/etc/swift# swift-ring-builder account.builder rebalance
 Reassigned 262144 (100.00%) partitions. Balance is now 0.00.
 root@copper:/etc/swift# swift-ring-builder account.builder
 account.builder, build version 6
 262144 partitions, 3 replicas, 3 zones, 6 devices, 0.00 balance
 The minimum number of hours before a partition can be reassigned is 1
 Devices:    id  zone      ip address  port  name weight partitions balance meta
              0     1  208.80.154.136  6002  sda3 100.00     131072    0.00
              1     1  208.80.154.136  6002  sdb3 100.00     131072    0.00
              2     2    208.80.154.5  6002  sda3 100.00     131072    0.00
              3     2    208.80.154.5  6002  sdb3 100.00     131072    0.00
              4     3  208.80.154.146  6002  sda3 100.00     131072    0.00
              5     3  208.80.154.146  6002  sdb3 100.00     131072    0.00
finally, copying the ring data files to the other proxy/proxies:
 root@copper:/etc/swift# scp *.ring.gz zinc:/etc/swift/
 account.ring.gz                  100%  308KB 308.0KB/s   00:00
 container.ring.gz                100%  308KB 308.0KB/s   00:00
 object.ring.gz                   100%  308KB 307.7KB/s   00:00
chown /etc/swift/* to swift:swift on both zinc and copper, then run 'swift-init proxy start'. ps shows it running and netstat shows it listening on port 8080.
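Concretely, on both copper and zinc that was roughly:
 chown swift:swift /etc/swift/*
 swift-init proxy start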
swauth
TODO:
- change the swauth master password
- got the swauth package from github: https://github.com/gholt/swauth
- added an swauth stanza to the proxy-server.conf file and changed the pipeline to use it:
 [filter:swauth]
 use = egg:swauth#swauth
 default_swift_cluster = local#http://127.0.0.1:8080/v1
 set log_name = swauth
 super_admin_key = mymadeupkey
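For reference, "changing the pipeline to use it" just means swapping swauth in where tempauth would otherwise sit; the result looks something like this (the exact filter list depends on what else is enabled, so treat this as a sketch):
 [pipeline:main]
 pipeline = healthcheck cache swauth proxy-server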
- restarted the proxy server
- prepped swauth with
swauth-prep -K mymadeupkey
- created a test user with
swauth-add-user -A http://127.0.0.1:8080/auth/ -K mymadeupkey -a test tester testing
- got the test user's credentials with:
 root@copper:/etc/swift# swauth-list -K mymadeupkey
 {"accounts": [{"name": "test"}]}
 root@copper:/etc/swift# swauth-list -K mymadeupkey test
 {"services": {"storage": {"default": "local", "local": "http://127.0.0.1:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881"}}, "account_id": "AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881", "users": [{"name": "tester"}]}
 root@copper:/etc/swift# swauth-list -K mymadeupkey test tester
 {"groups": [{"name": "test:tester"}, {"name": "test"}, {"name": ".admin"}], "auth": "plaintext:testing"}
- note that the password is stored in plaintext here
storage nodes
TODO:
- change mountpoints from /mnt/sd?3 to /srv/swift-storage/?
- get xfs options (noatime,nodiratime,nobarrier,logbufs=8) into puppet
- firewall off storage nodes so only the proxy nodes can speak to them via swift protocols (see the sketch after this list)
- investigate running rsync at different nice / ionice levels
- make sure the firewall covers both ipv4 and ipv6
- get puppet to start swift services
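A sketch of what the storage-node firewall rules might look like (placeholder proxy addresses, ipv4 only; nothing like this is puppetized or deployed yet):
 # allow the swift account/container/object ports and rsync only from the proxy nodes
 for src in <copper-ip> <zinc-ip> ; do
   iptables -A INPUT -p tcp -s $src -m multiport --dports 6000,6001,6002,873 -j ACCEPT
 done
 iptables -A INPUT -p tcp -m multiport --dports 6000,6001,6002,873 -j REJECT
 # equivalent ip6tables rules are needed as well (see the ipv4/ipv6 TODO above)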
filesystems
each host has /mnt/sda3 and /mnt/sdb3 (representing raw disk partitions /dev/sda3 and /dev/sdb3); these are already formatted xfs.
- remount them with appropriate xfs options:
 for dev in sda3 sdb3 ; do mount /mnt/${dev} -o remount,noatime,nodiratime,nobarrier,logbufs=8; done
moved the mountpoints to /srv/swift-storage/* because rsyncd bases its access on the parent directory of the mountpoints. Leaving the swift storage in /mnt would mean that anything else we might mount under /mnt/ would automatically be available to the rsync daemon. Moving the swift mountpoints also puts them in a more standard location.
make all the swift storage writable:
 chown -R swift:swift /srv/swift-storage/
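Until the xfs mount options make it into puppet (see the TODO above), the matching /etc/fstab entries would look roughly like this (assuming the mountpoints mirror the device names; a sketch, not the deployed file):
 /dev/sda3  /srv/swift-storage/sda3  xfs  noatime,nodiratime,nobarrier,logbufs=8  0 0
 /dev/sdb3  /srv/swift-storage/sdb3  xfs  noatime,nodiratime,nobarrier,logbufs=8  0 0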
rsyncd
used the rsyncd.conf from the installation manual, except that I removed the address line and changed the mountpoint. rsyncd then binds to the default address, which is fine, and means we don't need to puppet-template the config to insert each host's address (we can just use a file instead of a template).
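From memory of the install guide, with those two changes the rsyncd.conf ends up roughly like the sketch below (treat it as an approximation rather than the exact deployed file):
 uid = swift
 gid = swift
 log file = /var/log/rsyncd.log
 pid file = /var/run/rsyncd.pid

 [account]
 max connections = 2
 path = /srv/swift-storage/
 read only = false
 lock file = /var/run/account.lock

 [container]
 max connections = 2
 path = /srv/swift-storage/
 read only = false
 lock file = /var/run/container.lock

 [object]
 max connections = 2
 path = /srv/swift-storage/
 read only = false
 lock file = /var/run/object.lock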
used the rsync file from /etc/default with the one change the installation manual suggests.
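If I recall the manual's suggested change correctly, it is just enabling the daemon in /etc/default/rsync:
 RSYNC_ENABLE=true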
swift servers
used the templates from the install guide for /etc/swift/{account,container,object}-server.conf but added:
- bind_ip = 0.0.0.0
- devices = /srv/swift-storage/
to override the storage nodes' default device location of /srv/node/.
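In other words, the [DEFAULT] section of each of the three configs ends up containing something like this (only the lines we changed are shown):
 [DEFAULT]
 bind_ip = 0.0.0.0
 devices = /srv/swift-storage/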
testing
using tempauth
Note that there are two different sets of headers that you can pass in to swift for authentication:
- X-Storage-User and X-Storage-Pass
- X-Auth-User and X-Auth-Key (often referred to as user and api-key)
They are equivalent; when given the option, we should use X-Auth-User/X-Auth-Key rather than the X-Storage variants.
curl to test stuff (nice verbose output):
 curl -k -v -H 'X-Storage-User: xxx:yyy' -H 'X-Storage-Pass: zzz' https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0
 AUTH='AUTH_abcdef0123456789'
 curl -k -v -H "X-Auth-Token: $AUTH" https://copper.wikimedia.org:8080/v1/AUTH_system
various commands to do stuff:
 PROXY_LOCAL_NET_IP='copper.wikimedia.org'
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz stat
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz upload myfiles build-swift-rings
 find /srv/swift-storage/
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz upload builders /etc/swift/*.builder
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz list
 swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz list builders
the output of one of them:
 root@copper:~# swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U xxx:yyy -K zzz list
 builders
 myfiles
using swauth
testing the auth system
note that the password 'mymadeupkey' is in proxy-server.conf. The '&& echo' tacked onto the curl commands below is just to force a newline after the HTTP output. Watch /var/log/messages for output from swauth. The user '.super_admin' is built in as the master root-admin-style account. Be careful about when to use 127.0.0.1 and when to use copper while setting this up; the choice is sticky, but running the following after the fact can fix it:
 swauth-set-account-service -U .super_admin -K mymadeupkey test storage local http://copper.wikimedia.org:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881
- create an account/user/password combo - account: test, user: tester, password: testing.
- swauth-add-user -A http://copper.wikimedia.org:8080/auth/ -K mymadeupkey -a test tester testing
- list accounts
curl -D - -H "X-Auth-Admin-User: .super_admin" -H "X-Auth-Admin-Key: mymadeupkey" http://localhost:8080/auth/v2/ && echo
- swauth-list -K mymadeupkey
- examine that account
curl -D - -H "X-Auth-Admin-User: .super_admin" -H "X-Auth-Admin-Key: mymadeupkey" http://localhost:8080/auth/v2/test
- swauth-list -K mymadeupkey test
testing the object store
using the swift commands
- get statistics
swift -K testing -U test:tester -A http://127.0.0.1:8080/auth/v1.0/ stat
- get a list of buckets
swift -K testing -U test:tester -A http://127.0.0.1:8080/auth/v1.0/ list
- upload some files
- swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing upload builders /etc/swift/*.builder
- list those files
- swift -K testing -U test:tester -A http://127.0.0.1:8080/auth/v1.0/ list builders
using curl
- get a list of containers
curl -k -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
- take X-Auth-Token and X-Storage-URL from the return headers
curl -k -v -H 'X-Auth-Token: AUTH_tkfcdade108c0b4bdbb4cacb4f1c1ab128' http://127.0.0.1:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881
- get back a list of containers (builders if you already ran the test above)
- get a listing of a container (using the same auth token you just got)
curl -k -v -H 'X-Auth-Token: AUTH_tkfcdade108c0b4bdbb4cacb4f1c1ab128' http://127.0.0.1:8080/v1/AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881/builders
testing upload.wikimedia.org style URLs
The containers into which the images will be cached must already exist. For testing, create a few by hand; for real use we'll have to script it. NOTE: containers for public wikis must allow unauthenticated read (.r:*) in the ACL.
- create the container for wikipedia/commons
- swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post -r '.r:*' wikipedia-commons-thumb
- put a test file in the container
- mkdir -p d/d5/Foo.test; echo "foo" > d/d5/Foo.test/10px-Foo.test; swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing upload wikipedia-commons-thumb d/d5/Foo.test/10px-Foo.test
- request your test file
- curl -v -H "User-Agent:plain" http://localhost:8080/wikipedia/commons/thumb/d/d5/Foo.test/10px-Foo.test
- I'm setting User-Agent to plain just so it's easily recognizable in the logs; the User-Agent shouldn't matter.
Performance Testing
pmtpa test cluster
hardware setup:
- owa1-3 for proxy nodes
- ms1-3 for storage nodes
methods:
- geturls.py -t 30 filelists/wikipedia-filelist-urls.txt
- ab -n 10000 -c 50 -L filelists/wp19k.txt
Initial performance findings
- with ~1m objects, 50-60qps write throughput, 1100qps read throughput
- with ~6m objects, 50-60qps write throughput, 1100qps read throughput
Moved container storage onto ramdisks on owa1-3
- with ~11m objects, 50-60qps write throughput, 750qps read throughput
Moved container storage back onto ms1-3
- with ~11m objects, ~45qps write throughput, 500-700qps read throughput
debugging
To enable more logging in the swift middleware we've written, modify /usr/local/lib/python2.6/dist-packages/wmf/rewrite.py as follows:
- at the head of the file, add:
 from swift.common.utils import get_logger
- in WMFRewrite's __init__, add:
 self.logger = get_logger(conf)
- wherever you want logging (e.g. in WMFRewrite.__call__), add something like:
 self.logger.warn("Env: %s" % env)
Then tail /var/log/messages and look for messages from the proxy-server process.
if you get a 401 response, check the ACL on the container - make sure the Read ACL includes '.r:*':
 root@copper:/usr/share/pyshared/swift/common# swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing stat wikipedia-commons-thumb
   Account: AUTH_a6eb7b54-dafc-4311-84a2-9ebf12a7d881
 Container: wikipedia-commons-thumb
   Objects: 2
     Bytes: 4
  Read ACL: .r:*
 Write ACL:
   Sync To:
  Sync Key:
 Accept-Ranges: bytes