Nova Resource:Deployment-prep/Squid help

From Wikitech
This page may be outdated or contain incorrect details. Please update it if you can.

Why Squid?

Squid is a high-performance proxy server that can also be used as an HTTP accelerator in front of a webserver. In layman's terms, Squid stores a copy of each page served by the webserver, and the next time the same page is requested, Squid serves the copy. This process is called "caching". It removes the need for the webserver to regenerate the same page, resulting in a substantial performance boost.

Since MediaWiki websites are generated entirely dynamically, there is a substantial performance gain in running Squid as an HTTP accelerator for your webserver. In fact, sites like Wikipedia use several Squid caches to enhance their performance.

Because of this performance gain, MediaWiki has been designed to integrate closely with Squid. For example, MediaWiki will notify Squid when a page should be purged from the cache in order to be regenerated.
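The caching and purging behaviour described above can be illustrated with a toy model. This is only a conceptual sketch in Python, not how Squid is implemented; the names ToyCache and render_page and the page path are hypothetical.

```python
from typing import Callable, Dict


class ToyCache:
    """Conceptual stand-in for an HTTP accelerator (not Squid's real design)."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend            # the "webserver" that renders pages
        self.store: Dict[str, str] = {}   # cached copies keyed by URL path
        self.backend_hits = 0             # how often the backend had to render

    def get(self, path: str) -> str:
        # Serve the stored copy when there is one; otherwise ask the backend.
        if path not in self.store:
            self.backend_hits += 1
            self.store[path] = self.backend(path)
        return self.store[path]

    def purge(self, path: str) -> None:
        # Drop the stored copy so the next request is rendered afresh.
        self.store.pop(path, None)


def render_page(path: str) -> str:
    # Hypothetical backend: stands in for an expensive MediaWiki page render.
    return f"<html>rendered {path}</html>"


cache = ToyCache(render_page)
cache.get("/wiki/Main_Page")    # first request: rendered by the backend
cache.get("/wiki/Main_Page")    # second request: served from the cache
cache.purge("/wiki/Main_Page")  # e.g. after the page has been edited
cache.get("/wiki/Main_Page")    # rendered again by the backend
print(cache.backend_hits)
```

In this model the backend only does work on the first request and on the first request after a purge, which is the effect the accelerator is meant to have on Apache.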

The architecture

The setup of Squid, Apache and MediaWiki on a single server is outlined below. It is possible to use a more complex caching strategy or different port numbers and IP addresses, but this simple example aims for the following single-server architecture:

Outside world <---> [ Server: Squid accelerator (w.x.y.z:80) <---> Apache webserver (127.0.0.1:80) ]


To the outside world, Squid will seem to act as the webserver. In reality, it passes on requests to the Apache webserver, but only when necessary. Apache runs on the same server, but it only listens to requests from localhost (127.0.0.1). Rest assured, running both services on port 80 will not cause conflicts, since both services are bound to different IP addresses.

Setting it up like this means Apache cannot be accessed from the outside world directly, only through Squid; direct access is possible only from the server Apache is running on. To bypass Squid completely for testing and troubleshooting, use a text-mode browser such as ELinks (http://elinks.or.cz/) on the server's console and browse to http://127.0.0.1/

Configuring Squid 2.5

Due to its versatility, Squid has a very large "squid.conf" configuration file. However, only a few settings are relevant when using Squid in accelerator mode. For Squid 2.6, see the "Squid 2.6 Configuration Settings" section below.

First and foremost, Squid needs to know which IP address and port to listen on:

# Use your own external IP-address
http_port 207.142.131.205:80

(Note that multiple addresses - and a newer version of Squid - will be needed if you intend to support both IPv4 and IPv6).

Then, Squid needs to know what host to accelerate. In this case, Apache will be listening to port 80 on the localhost:

#Accelerating only Apache on localhost
httpd_accel_host 127.0.0.1
httpd_accel_port 80
httpd_accel_single_host on

If you run virtual domains, you also want this:

httpd_accel_uses_host_header on

For Squid 2.6, the httpd_accel_* directives above are replaced by options on http_port and by a cache_peer line:

http_port 207.142.131.205:80 transparent vhost defaultsite=<sitename>
cache_peer 127.0.0.1 parent 80 3130 originserver 

Now it is time to define what may be accessed and from where. This is done by defining access control lists ("acl"s) and allowing or denying http_access. Basically, there are a few things we need to allow in order for our setup to work:

  1. Access to the web port (80) must be allowed
  2. For maintenance purposes access to the cachemanager will be allowed for the localhost
  3. MediaWiki's requests to purge pages will be allowed for the localhost

All other access will be denied. This results in the following configuration:

# Minimum setup
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl purge method PURGE
acl web_ports port 80

# http_access rules are checked in order and the first match wins,
# so the manager and purge restrictions must precede the general allow.

# Only allow cachemgr access from localhost
http_access allow manager localhost
http_access deny manager

# Allow purge only from localhost
http_access allow purge localhost
http_access deny purge

# Allow access to the web ports
http_access allow web_ports

# And finally deny all other access to this proxy
http_access deny all
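To check the purge rule by hand, a PURGE request can be sent from the server itself. The following Python sketch is an illustration only: the function name purge and the example host, port and path are assumptions, and MediaWiki sends its own purge requests without any such script. Squid is generally expected to answer 200 when the object was removed, 404 when it was not cached, and 403 when the ACLs deny the request, but verify this against your own installation.

```python
import http.client


def purge(squid_host: str, squid_port: int, path: str, site: str) -> int:
    """Send a PURGE request for one URL path and return the HTTP status code."""
    conn = http.client.HTTPConnection(squid_host, squid_port, timeout=5)
    try:
        # The Host header names the virtual host the cached object belongs to.
        conn.request("PURGE", path, headers={"Host": site})
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed values: replace with the address Squid listens on and your site name.
    print(purge("207.142.131.205", 80, "/wiki/Main_Page", "meta.wikimedia.org"))
```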

Note: The function sendCacheControl() in OutputPage.php mentions additional rules that should be added to replace Cache-Control headers; see http://wiki.aulinx.de/Cache-Control.

Configuring Apache

The Apache webserver now needs to be configured to listen only on port 80 of the localhost address. The file httpd.conf (or possibly ports.conf) should contain the following line:

Listen 127.0.0.1:80

If you are using virtual hosts, it should also contain lines like:

NameVirtualHost 127.0.0.1:80

<VirtualHost 127.0.0.1:80>
    ServerName meta.wikimedia.org
    ...
</VirtualHost>

Please see http://wiki.apache.org/httpd/CouldNotBindToAddress for more on troubleshooting this step.

Configuring MediaWiki

When configuring MediaWiki, act as if there were no Squid: use the server name the outside world would use instead of the internal IP address. For example, use "meta.wikimedia.org" as the server name instead of "127.0.0.1".

Because Squid makes its requests from localhost, Apache receives "127.0.0.1" as the direct remote address. However, when Squid forwards a request to Apache, it adds the "X-Forwarded-For" header containing the remote address as received by Squid. This way the remote address from the outside world is preserved.

By default MediaWiki records the direct remote address for edits and similar actions, so it must be configured to use the "X-Forwarded-For" header instead in order to function correctly. Make sure the LocalSettings.php file contains the following lines:

$wgUseSquid = true;
$wgSquidServers = array('<your IPv4 address>');
$wgSquidServersNoPurge = array('127.0.0.1');

This ensures both that addresses internal to your network (such as the Squid server or the 127.0.0.1 loopback) do not appear in Special:Recentchanges and that notification to discard changed pages will be sent to Squid (not Apache).
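The idea behind trusting "X-Forwarded-For" only from known proxies can be sketched as follows. This Python function is a hypothetical illustration, not MediaWiki's actual code; the name client_address and the example addresses are assumptions.

```python
from typing import Iterable


def client_address(remote_addr: str, x_forwarded_for: str,
                   trusted_proxies: Iterable[str]) -> str:
    """Return the address to attribute a request to.

    The header is honoured only when the direct peer is a trusted proxy,
    because any client can send a forged X-Forwarded-For header.
    """
    trusted = set(trusted_proxies)
    if remote_addr not in trusted:
        return remote_addr

    # Each proxy appends the address it received the request from, so the
    # list is walked from right to left until an untrusted address appears.
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    address = remote_addr
    for hop in reversed(hops):
        address = hop
        if hop not in trusted:
            break
    return address


# Request passed on by the local Squid: the outside address is used.
print(client_address("127.0.0.1", "203.0.113.7", ["127.0.0.1"]))
# Request that did not come through a trusted proxy: the header is ignored.
print(client_address("198.51.100.9", "203.0.113.7", ["127.0.0.1"]))
```

Listing the proxy addresses explicitly, as the configuration above does, is what keeps a forged header from an arbitrary client from being believed.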

See also Manual:Configuration settings#Squid for all configuration settings related to Squid.

Some notes

In this setup, Squid shields Apache from most of the traffic. Therefore, if you need reliable web statistics from a statistics package such as AWStats, you will need to set it up to analyze Squid's access_log instead of Apache's.
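As a quick sanity check before setting up a full statistics package, Squid's log can be summarized with a few lines of Python. The sketch below assumes Squid's default native log format (timestamp, elapsed time, client, result code/status, size, method, URL, ...); the function names and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> Counter:
    """Count requests per Squid result code (TCP_HIT, TCP_MISS, ...)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        # The fourth field looks like "TCP_MISS/200": result code, then HTTP status.
        result_code = fields[3].split("/")[0]
        counts[result_code] += 1
    return counts


def hit_ratio(counts: Counter) -> float:
    """Fraction of requests whose result code contains HIT."""
    total = sum(counts.values())
    hits = sum(n for code, n in counts.items() if "HIT" in code)
    return hits / total if total else 0.0


# Made-up sample lines in the assumed native format.
sample = [
    "1286536309.450 120 203.0.113.7 TCP_MISS/200 5120 GET http://meta.wikimedia.org/wiki/Main_Page - FIRST_UP_PARENT/wiki text/html",
    "1286536310.101 2 203.0.113.8 TCP_HIT/200 5120 GET http://meta.wikimedia.org/wiki/Main_Page - NONE/- text/html",
    "1286536311.007 1 203.0.113.9 TCP_MEM_HIT/200 5120 GET http://meta.wikimedia.org/wiki/Main_Page - NONE/- text/html",
    "1286536312.900 98 203.0.113.7 TCP_MISS/200 2048 GET http://meta.wikimedia.org/wiki/Help - FIRST_UP_PARENT/wiki text/html",
]
counts = summarize(sample)
print(counts, hit_ratio(counts))
```

For real use, pass an open file object for the access log instead of the sample list; the log's location depends on the installation.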

If you plan to balance load across multiple servers, change the session.save_path PHP setting so that session data is shared by all your backend Apache servers (for example, on an NFS mount). Alternatively, consider storing your sessions in memcached; see the global variable $wgSessionsInMemcached.

Squid 2.6 Configuration Settings

Squid 2.6 has simplified the HTTP accelerator configuration, and these settings should work:

http_port <Your external IP>:80 defaultsite=<Your DNS sitename> vhost
cache_peer 127.0.0.1 parent 80 0 no-query originserver round-robin name=wiki
acl mySites dstdomain <Your DNS sitename> <any other vhosts>
cache_peer_access wiki allow mySites
cache_peer_access wiki deny all
http_access allow mySites

Also, a URL rewriter isn't necessary for redirecting from *.com and *.net domains to your *.org domain if you have $wgServer set in your LocalSettings.php, since MediaWiki will take care of this for you.

Squid 3.1.x Configuration Settings

Squid 3.1.5 is similar in configuration to Squid 2.6, except:

  • The "acl all" ACL is predefined, so it should not be defined again in squid.conf
  • The netmask should be given as a number of bits (not as a bitmask)
  • Using "none" as a log file name to turn off logging will generate warnings

so:

# Squid 2.6
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
cache_store_log none

becomes:

# Squid 3.1.x
acl manager proto cache_object
acl localhost src 127.0.0.1/32

While an unmodified Squid 2.6 configuration file may work, it will generate warnings in the system log.
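When migrating many ACL lines, the bitmask-to-bits conversion can be done mechanically. The following Python sketch uses the standard ipaddress module; the helper name to_cidr is hypothetical.

```python
import ipaddress


def to_cidr(acl_value: str) -> str:
    """Convert 'address/bitmask' (as in the Squid 2.6 examples) to 'address/bits'."""
    # strict=False accepts host bits in the address and masks them off,
    # so the result is always the network address plus the prefix length.
    network = ipaddress.ip_network(acl_value, strict=False)
    return f"{network.network_address}/{network.prefixlen}"


print(to_cidr("127.0.0.1/255.255.255.255"))
print(to_cidr("0.0.0.0/0.0.0.0"))
```

The first call yields the 127.0.0.1/32 form used in the Squid 3.1.x example above.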

Squid 3.1.5 also has the advantage of being able to listen for both IPv4 and IPv6 connections:

http_port  <Your external IPv4>:80  defaultsite=<Your DNS sitename> vhost
http_port [<Your external IPv6>]:80 defaultsite=<Your DNS sitename> vhost
cache_peer 127.0.0.1 parent 80 0 no-query originserver round-robin name=wiki

where multiple outside IP addresses may be listed, one per line, in either IPv4 or IPv6 protocol:

http_port [2001:db8::2]:80 vhost defaultsite=example.org
http_port [2001:db8::123:456]:80 vhost defaultsite=example.org

Note that Squid handles the task of listening for all outside connections, while Apache merely sits behind it on the local loopback address (127.0.0.1:80). It is therefore not necessary to configure Apache to be IPv6-aware in this setup.

Only your cache server (Squid in this instance), your domain name server (IN AAAA records) and your network (ipconfig, route) need to be modified to contain IPv6-specific information if you intend your wiki to be IPv6-compatible and are using Squid (or Varnish).

Apache 2.x Logfile Settings

Because all requests arrive via Squid, Apache only sees "127.0.0.1" as the client address. The "X-Forwarded-For" request header provided by Squid can be logged instead, e.g. in a custom log format. The sample below is similar to the "combined" format.

Settings

mod_log_config.conf

LogFormat "%{X-Forwarded-for}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" cached

squid.conf

forwarded_for on
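Log lines written with the "cached" format above can be parsed much like "combined" lines, with the first field taken from the X-Forwarded-For header. The Python sketch below is an illustration under that assumption; the function name parse_cached_line and the sample line are made up.

```python
import re
from typing import Dict, Optional

# Mirrors the LogFormat above. The first field may hold several
# comma-separated addresses when the request passed through more than one proxy.
CACHED_LINE = re.compile(
    r'^(?P<forwarded_for>.*?) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_cached_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = CACHED_LINE.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample line.
sample = ('203.0.113.7 - - [10/Oct/2010:13:55:36 +0000] '
          '"GET /wiki/Main_Page HTTP/1.0" 200 5120 "-" "Mozilla/5.0"')
print(parse_cached_line(sample))
```

Apache writes "-" in the first field when a request carries no X-Forwarded-For header, for example one made directly from the server's console.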