Requestctl/Overview
This page is currently a draft. More information and discussion about changes to this draft can be found on the talk page.
This page explains the basic concepts behind how the requestctl tool manages configuration to control access and routing of web requests.
TODO: all references to commands in this doc must link to the relevant entry in the command line reference.
Overview of http request control configuration
TODO: diagram?
etcd stores configuration that is:
- written/modified by conftool
- read by confd
- translated to VCL that gets loaded by varnish.
The configuration stored in etcd loads the following into the varnish configuration:
- The list of pooled ats backends to connect to in the same datacenter
- The list of IP ranges for every public cloud, in the form of a netmapper file
- A list of rate-limiting/ban rules for incoming traffic
How requestctl modifies configuration
requestctl modifies data that resides in the main etcd cluster. Specifically, the keys you'll modify are under /conftool/v1/request-{ipblock,action,pattern}s/.
requestctl enables you to manage the configuration for:
- IP ranges for every public cloud
- Rate-limiting/ban rules for incoming traffic.
Object model
To configure IP ranges and rate-limiting/ban rules, requestctl uses a custom schema that defines the following types of objects:
- pattern objects describe specific patterns of an HTTP request.
- ipblock objects group specific IP ranges into logical groups.
- action objects describe an action to be performed on a request that matches specific combinations of patterns and ipblocks. Actions are what is enabled on varnish.
- haproxy_action objects are similar to action objects, but allow a different set of actions because of the capabilities of haproxy.
Pattern
Describe specific patterns of an HTTP request. A pattern object should be able to describe, with good flexibility, the large majority of the characteristics we want to match in a request.
Each pattern has an associated “scope” tag. The fields of each record are:
- method: the HTTP method.
- request_body: a regex to match in the HTTP body. CURRENTLY UNSUPPORTED IN VARNISH.
- url_path: the path part of the URL; it will be used as a regexp.
- header: a header name to match, using the regexp at header_value.
- header_value: the regexp to match the value of header to. If left blank when a header is defined, the pattern means “the header is not present”.
- query_parameter and query_parameter_value: a parameter and a regexp for the value of a query parameter to match. An empty value will be interpreted as “for any value”.
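The field semantics above can be illustrated with a minimal Python sketch. It is not how requestctl works internally (the tool generates VCL and haproxy configuration rather than evaluating requests itself); the `Pattern` class and `matches` function are hypothetical names. The sketch assumes that an empty string means "field not set" and that regexps are applied with search semantics, and it omits request_body.

```python
import re
from dataclasses import dataclass
from typing import Dict
from urllib.parse import parse_qs, urlsplit


@dataclass
class Pattern:
    """Hypothetical in-memory view of a pattern object."""
    method: str = ""
    url_path: str = ""
    header: str = ""
    header_value: str = ""
    query_parameter: str = ""
    query_parameter_value: str = ""


def matches(p: Pattern, method: str, url: str, headers: Dict[str, str]) -> bool:
    """Return True if the request satisfies every field set on the pattern."""
    parts = urlsplit(url)
    if p.method and p.method.upper() != method.upper():
        return False
    if p.url_path and not re.search(p.url_path, parts.path):
        return False
    if p.header:
        # Header names are compared case-insensitively.
        lowered = {k.lower(): v for k, v in headers.items()}
        value = lowered.get(p.header.lower())
        if p.header_value == "":
            # Blank header_value: the header must NOT be present.
            if value is not None:
                return False
        elif value is None or not re.search(p.header_value, value):
            return False
    if p.query_parameter:
        query = parse_qs(parts.query, keep_blank_values=True)
        values = query.get(p.query_parameter)
        if values is None:
            return False
        # Empty query_parameter_value: any value is accepted.
        if p.query_parameter_value and not any(
            re.search(p.query_parameter_value, v) for v in values
        ):
            return False
    return True
```

For instance, `Pattern(header="User-Agent")` with no header_value matches only requests that carry no User-Agent header at all, while `Pattern(query_parameter="action")` matches any request whose query string contains an `action` parameter, whatever its value.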
Ipblock
Group specific IP ranges into logical groups.
- For example: the ipblock with scope=cloud,name=aws includes all the IP ranges used by AWS.
The ipblock object is very simple: it has an associated scope tag (which has semantic value, see above), a name and just two fields:
- comment: a comment describing the intended use of the ipblock.
- cidrs: a list of network ranges in CIDR notation.
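As a rough illustration of what an ipblock represents, the following Python sketch checks whether an address falls inside any of the block's ranges. The `IPBlock` class and its `contains` method are hypothetical, and the ranges used in the usage note are reserved documentation prefixes, not real cloud provider ranges.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class IPBlock:
    """Hypothetical in-memory view of an ipblock object."""
    scope: str
    name: str
    comment: str = ""
    cidrs: List[str] = field(default_factory=list)

    def contains(self, address: str) -> bool:
        """Return True if the address belongs to any range in cidrs."""
        ip = ipaddress.ip_address(address)
        for cidr in self.cidrs:
            net = ipaddress.ip_network(cidr, strict=False)
            # Only compare addresses and networks of the same IP version.
            if ip.version == net.version and ip in net:
                return True
        return False
```

With `IPBlock("cloud", "example", "documentation ranges", ["192.0.2.0/24", "2001:db8::/32"])`, calling `contains("192.0.2.55")` returns True and `contains("198.51.100.1")` returns False.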
Action
Describe an action to be:
- performed on a request that matches specific combinations of patterns and ipblocks
- enabled on varnish
An action object has two main functions:
- describing a composition of patterns and ipblocks to form a request pattern that we want to manage, and
- describing the actions we will take on matching requests.
The objects are associated with a specific cluster (cache-text or cache-upload at the time of writing) and have a name. Their fields are as follows:
- enabled: boolean. If false, the pattern will not be included in VCL.
- sites: a list of datacenters where to apply the rule. If empty, the rule will be applied to all datacenters.
- cache_miss_only: boolean. If false, the pattern will be applied also to cache hits.
- comment: a comment to describe what this action does.
- expression: a string describing the combination of patterns and ipblocks that should be matched. The BNF of the grammar is described in cli.Requestctl.grammar, but in short:
  - A pattern is referenced with the keyword pattern@<scope>/<name>
  - An ipblock is referenced with the keyword ipblock@<scope>/<name>
  - Patterns and ipblocks can be combined with AND/AND NOT and OR/OR NOT logic, and groups can be organized using parentheses.
So for example, a valid expression could look like:
( pattern@ua/requests OR pattern@ua/curl ) AND ipblock@cloud/aws AND NOT pattern@site/commons
- resp_status: the HTTP status code to send as a response.
- resp_reason: the text to send as a reason with the response.
- do_throttle: boolean to say if we should throttle requests matching the expression (true) or just respond with resp_status unconditionally (false).
- throttle_requests, throttle_interval, throttle_duration: the three arguments of vsthrottle in VCL to control the rate-limiting behaviour.
- throttle_per_ip: boolean; makes the rate-limiting per-ip rather than per-cache-server.
- log_matching: if true, it will record in X-Requestctl if a request matches the rule. It will thus be included into the vcl objects even if disabled; it will just not perform any banning / ratelimiting action.
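To make the expression field more concrete, here is a small Python sketch that evaluates an expression against a set of already-matched references. It is not the parser used by requestctl (the authoritative grammar is the one in cli.Requestctl.grammar). The `tokenize` and `evaluate` names are hypothetical, and the sketch assumes that AND and OR have equal precedence and are applied left to right, so parentheses should be used to group explicitly.

```python
from typing import Callable, List


def tokenize(expression: str) -> List[str]:
    """Split an expression into parentheses, operators, and references."""
    return expression.replace("(", " ( ").replace(")", " ) ").split()


def evaluate(expression: str, lookup: Callable[[str], bool]) -> bool:
    """Evaluate an expression; lookup(ref) says whether a reference matched."""
    tokens = tokenize(expression)
    pos = 0

    def operand() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok.startswith(("pattern@", "ipblock@")):
            return lookup(tok)
        raise ValueError(f"unexpected token: {tok}")

    def group() -> bool:
        nonlocal pos
        value = operand()
        # Assumption: operators are applied left to right, equal precedence.
        while pos < len(tokens) and tokens[pos] in ("AND", "OR"):
            op = tokens[pos]
            pos += 1
            negate = pos < len(tokens) and tokens[pos] == "NOT"
            if negate:
                pos += 1
            rhs = operand()
            if negate:
                rhs = not rhs
            value = (value and rhs) if op == "AND" else (value or rhs)
        return value

    result = group()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result
```

Given the example expression from the field list, a request that matched only pattern@ua/curl and ipblock@cloud/aws evaluates to True, while one that also matched pattern@site/commons evaluates to False.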
Haproxy_action
haproxy_action objects are similar to action objects, but allow a different set of actions because of the capabilities of haproxy.
This type of object describes actions on haproxy. While it shares most of the fields (and their meaning) with the action objects, there are some notable differences:
- No rate-limiting is enforced at this level, as we were wary of the performance implications of adding potentially one stick-table per rule. Rate-limiting will have to happen at the varnish layer, so none of the throttle_* fields are present.
- No log_matching is supported at this layer for now. It means that we won’t mangle X-Requestctl at this layer, at least for now.
- Given we’re not caching anything in haproxy, cache_miss_only has no meaning.
- Haproxy can either deny or silently drop a request. To silently drop, set the silent_drop field to True.
- Haproxy can limit the bandwidth that can be used for requests matching a certain pattern. This is controlled by the bw_throttle (boolean), bw_throttle_rate (the rate limit in bandwidth), and bw_duration (duration of the limit) fields.
Injection into varnish and haproxy
When you run the requestctl commit command, the tool automatically generates a derived data type for each type of action, called vcl and haproxy_dsl respectively. This derived object is then injected into varnish and haproxy.
TODO: the following two sections could use diagrams
A given varnish host has:
- One vcl condition per enabled action defined for the cluster the host is in.
  - The action is either enabled for all datacenters, or it’s enabled for the specific datacenter the host is in.
- A netmapper file containing all the ipblock entries defined under the cloud scope.
  - Possibly more netmapper files for things like crawlers.
- A vcl list of ACLs, one for each ipblock entry under the abuse scope.
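The selection rule for which actions reach a given varnish host can be sketched in Python as follows. The `Action` class, the `actions_for_host` function, and the datacenter names in the usage note are hypothetical. The sketch follows the field descriptions earlier on this page: an action is selected if it is enabled (or has log_matching set) and its sites list is either empty or contains the host's datacenter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """Hypothetical in-memory view of an action object."""
    cluster: str
    name: str
    enabled: bool = False
    log_matching: bool = False
    sites: List[str] = field(default_factory=list)


def actions_for_host(actions: List[Action], cluster: str, datacenter: str) -> List[str]:
    """Return the names of the actions that apply to a host."""
    selected = []
    for action in actions:
        if action.cluster != cluster:
            continue
        # Per the field description, log_matching keeps a disabled action
        # in the generated vcl, for logging only.
        if not (action.enabled or action.log_matching):
            continue
        # An empty sites list means "all datacenters".
        if action.sites and datacenter not in action.sites:
            continue
        selected.append(action.name)
    return selected
```

For example, an enabled action with `sites=["dc-b"]` is selected for a host in `dc-b` but not for one in `dc-a`, while an enabled action with an empty sites list is selected for both.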
HAproxy has:
- One or multiple ACL definitions per pattern/ipblock used in an enabled action.
- A map file for each ipblock scope.
- One http-request condition per haproxy_action that is enabled.
TODO: the following could maybe include a diagram
On all cache proxy servers, a Confd instance watches these keyspaces and generates the following files:
- /var/netmapper/public_clouds.json from the data at /conftool/v1/request-ipblocks/cloud/, using this template. This netmap is then used to add an X-Public-Cloud: <name> header to requests coming from any IP address in those ranges. This data is updated daily using a script that runs on the puppetservers.
- /etc/varnish/blocked-nets.inc.vcl from the data at /conftool/v1/request-ipblocks/abuse/, using another template. This generates a list of varnish acls that can be referenced later in the VCL. This data is currently duplicated from the private puppet hiera, and will need to be kept in sync somehow.
- /etc/varnish/requestctl-filters.inc.vcl from the data at /conftool/v1/request-vcl/cache-$CLUSTER/{global,$datacenter}, where vcl gets stored when we run requestctl commit. This code gets injected directly in the cluster_fe_ratelimit VCL subroutine, so it only applies to cache misses at the moment (see: phab:T317794).
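As a small illustration of the last key layout, the following Python sketch expands the /conftool/v1/request-vcl/cache-$CLUSTER/{global,$datacenter} template for one host. The `vcl_keys` function is a hypothetical helper, and it assumes the braces are meant as shell-style alternatives, that is, one key for global rules and one for the host's own datacenter.

```python
from typing import List


def vcl_keys(cluster: str, datacenter: str) -> List[str]:
    """Expand the request-vcl key template for a cluster and datacenter.

    cluster is the short cluster name, for example "text" or "upload".
    """
    base = f"/conftool/v1/request-vcl/cache-{cluster}"
    # Assumption: {global,$datacenter} stands for two separate keys.
    return [f"{base}/global", f"{base}/{datacenter}"]
```

Under that assumption, `vcl_keys("text", "dc-a")` yields the two keys a host in the hypothetical datacenter `dc-a` of the cache-text cluster would read its vcl from.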