Wikimedia Cloud Services team/EnhancementProposals/Third-party interaction consent tool

This page is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

Tracked in Phabricator
Task T130748

Services provided via the Wikimedia Cloud Services infrastructure should respect the broad protections for end-user privacy that are provided in the Wikimedia production web environment. At the same time, it should be possible to host tools on WMCS that interact with third-party services, especially when those services can be used to enhance or expand the Wikimedia free knowledge projects.

Toolforge currently uses a Content-Security-Policy (CSP) HTTP header in report-only configuration to monitor third-party interactions from the visiting browser. These reports are collected by and reviewable via the csp-report tool. This project would update the use of CSP to active enforcement. Once CSP is in enforcement mode, unapproved third-party content in a tool's HTML output will be blocked programmatically until the tool's maintainers have requested approval of the specific third-party sites and a visitor has granted it. When a visitor has granted consent to a particular tool, that tool's list of additional allowed domains will be added to the enforcement-mode CSP header. This consent will last no longer than 1 year.
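
The difference between the report-only and enforcement configurations, and the effect of consent on the header, can be sketched as follows. This is a minimal illustration, not the Toolforge implementation: the `build_csp_header` function, the `BASE_SOURCES` baseline, and the use of a single `default-src` directive are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Assumption: a placeholder baseline policy; the real Toolforge policy may differ.
BASE_SOURCES = ("'self'",)


def build_csp_header(extra_sources: Iterable[str] = (), enforce: bool = True) -> Tuple[str, str]:
    """Return a (header name, header value) pair for a tool response."""
    # Report-only mode uses a different header name but the same policy syntax.
    name = "Content-Security-Policy" if enforce else "Content-Security-Policy-Report-Only"
    # Consented sources are appended to the baseline in a stable order.
    sources = list(BASE_SOURCES) + sorted(set(extra_sources))
    return name, "default-src " + " ".join(sources)


if __name__ == "__main__":
    print(build_csp_header(enforce=False))
    print(build_csp_header(["https://maps.example", "https://fonts.example"]))
```

In this sketch a visitor who has not granted consent receives only the baseline policy, so the browser blocks any other origin.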

Use cases

As a Tool Maintainer

I want to receive positive consent before causing a web browser to share information with third parties

So I can respect the privacy thresholds of my fellow Wikimedians.

Add a content source
Delete a content source

As a Tool User

I want to be asked for consent before having third-party resources used by my browser

So I can decide if sharing my IP address, User-Agent header, and possibly other information with the third party is justified by the value I will receive from the tool.

See additional content sources requested by a tool
Consent to additional content sources for a specific tool
Revoke consent for a tool
See tools I have granted consent for
Revoke consent for all tools

As a Bot Maintainer

I want to pre-authorize third-party resource consent

So I can write bots that interact with Toolforge hosted webservices requiring consent without manual intervention.

Consent to additional sources a priori

General constraints

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Consent is tracked using HTTP cookies stored by the end user's browser.
- Consent cookies MUST be scoped to the affected tool's domain: $TOOL.toolforge.org
- Consent cookies MUST be marked Secure and HttpOnly so that they cannot be tampered with by JavaScript delivered by the tool
- The cookie MUST encode the approved sources in some way, so that if new sources are added the user is prompted for consent again, including information about their prior consent if that is reasonably possible
Personally identifiable information (PII) related to consent decisions MUST NOT be stored on the server.
Determining if consent is necessary MUST be an extremely lightweight operation for the proxy. It is more important that this operation is fast and low cost than that it is correct. The worst case for a false negative is that CSP blocks resources for the end user, possibly making the application appear broken.
Validating a consent cookie SHOULD be a lightweight operation for the proxy.
Consent MUST NOT be permanent.
- A maximum duration of 1 year for an individual consent decision is RECOMMENDED.
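
One way to satisfy the "encode the approved sources" constraint without storing anything per user is to make the cookie value a digest of the tool's registered source list. The sketch below is only one possible encoding and is not taken from the proposal; `consent_token` and `cookie_grants_consent` are hypothetical names. When a maintainer adds a source the digest changes, the old cookie no longer matches, and the user would be prompted again.

```python
import hashlib
import hmac
from typing import Iterable


def consent_token(sources: Iterable[str]) -> str:
    """Digest of a tool's registered sources, independent of list order."""
    canonical = "\n".join(sorted({s.strip().lower() for s in sources}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cookie_grants_consent(cookie_value: str, registered_sources: Iterable[str]) -> bool:
    """Cheap validation: compare the cookie with the digest of the current sources."""
    expected = consent_token(registered_sources)
    return hmac.compare_digest(cookie_value.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    token = consent_token(["https://maps.example", "https://fonts.example"])
    print(cookie_grants_consent(token, ["https://fonts.example", "https://maps.example"]))
    print(cookie_grants_consent(token, ["https://fonts.example", "https://new.example"]))
```

The digest depends only on the tool's source list, so it carries no PII, and the proxy could cache the expected value per tool to keep validation lightweight. A plain digest cannot describe the user's prior consent; that would need a richer encoding.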

Workflows

Normal request

User requests https://$TOOL.toolforge.org/... URL
Dynamicproxy checks to see if $TOOL has registered additional CSP sources
1. If no sources registered, process request normally
Dynamicproxy checks to see if a WMCS-CSP-CONSENT cookie is present
1. If not, 302 redirect to https://$TOOL.toolforge.org/.wmcs/csp-consent?url=... for additional processing
Dynamicproxy checks to see if the WMCS-CSP-CONSENT cookie grants needed consent
1. If no, return 403 Forbidden response explaining the block and how to grant consent
2. If yes, add approved sources to response CSP header
Dynamicproxy passes request to upstream tool service as per a normal request
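
The decision steps above can be sketched as a single function. This is an illustration under stated assumptions rather than Dynamicproxy code: `route_request`, its return values, and the `expected_value` parameter (whatever cookie value would grant consent for the tool's current sources) are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple
from urllib.parse import quote

COOKIE_NAME = "WMCS-CSP-CONSENT"


def route_request(
    tool: str,
    url: str,
    registered_sources: Sequence[str],
    cookies: Mapping[str, str],
    expected_value: str,
) -> Tuple[str, Optional[str]]:
    """Return (action, detail) following the normal request workflow."""
    # No registered sources: process the request normally.
    if not registered_sources:
        return ("proxy", None)
    # No consent cookie: redirect to the consent page, carrying the original URL.
    if COOKIE_NAME not in cookies:
        location = f"https://{tool}.toolforge.org/.wmcs/csp-consent?url={quote(url, safe='')}"
        return ("redirect-302", location)
    # Cookie present but it does not grant the needed consent.
    if cookies[COOKIE_NAME] != expected_value:
        return ("forbidden-403", None)
    # Consent granted: detail holds the sources to add to the CSP header.
    return ("proxy", " ".join(registered_sources))


if __name__ == "__main__":
    print(route_request("mytool", "https://mytool.toolforge.org/map?z=3",
                        ["https://maps.example"], {}, "token"))
```

A rejected consent (cookie value `false`) falls into the 403 branch, because `false` never equals the expected value.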

Consent redirect

Display:
1. The name of the tool being accessed
2. The additional sources requested by the tool maintainer
3. A "remember this decision" checkbox
4. "Allow my browser to access these websites when using $TOOL" button
5. "Cancel" button
6. ...
If approved:
1. If "remember this decision" is checked:
  - Set-Cookie: WMCS-CSP-CONSENT="..."; Path=/; Max-Age=$MAX_CONSENT_DURATION; Secure; HttpOnly; SameSite=None
2. Else:
  - Set-Cookie: WMCS-CSP-CONSENT="..."; Path=/; Secure; HttpOnly; SameSite=None
If rejected:
1. Set-Cookie: WMCS-CSP-CONSENT=false; Path=/; Secure; HttpOnly; SameSite=None
307 redirect to original url
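
The three cookie variants above differ only in the value and in whether Max-Age is present. A minimal sketch of building the header value follows; the `consent_set_cookie` helper is hypothetical, the value is assumed to need no quoting, and `MAX_CONSENT_DURATION` is set to the recommended maximum of 1 year.

```python
MAX_CONSENT_DURATION = 365 * 24 * 60 * 60  # 1 year in seconds (recommended maximum)


def consent_set_cookie(value: str, remember: bool) -> str:
    """Build the Set-Cookie header value for a consent decision."""
    attrs = [f"WMCS-CSP-CONSENT={value}", "Path=/"]
    if remember:
        # Only a remembered decision outlives the browser session.
        attrs.append(f"Max-Age={MAX_CONSENT_DURATION}")
    attrs += ["Secure", "HttpOnly", "SameSite=None"]
    return "; ".join(attrs)


if __name__ == "__main__":
    print(consent_set_cookie("abc123", remember=True))
    print(consent_set_cookie("false", remember=False))
```

Omitting Max-Age (and Expires) makes the cookie a session cookie, which matches the "Else" and "If rejected" branches.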

FAQs

What can be done about the typical "POST loophole" for redirecting?

Intercepting and redirecting without application-level context is not safe for the POST verb.
We will return a 403 Forbidden response for all POST requests without a WMCS-CSP-CONSENT cookie (or bot opt-in if implemented) to close the POST loophole while avoiding the complications of non-safe redirects.
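
The rule for requests that arrive without a consent cookie can be sketched as below. The `status_without_consent_cookie` helper is hypothetical, and treating every method other than GET and HEAD like POST is an assumption of this sketch; the proposal itself only names POST.

```python
from typing import Optional


def status_without_consent_cookie(method: str, bot_opt_in: bool = False) -> Optional[int]:
    """Status to return when no WMCS-CSP-CONSENT cookie is present.

    None means the request is passed through without interception.
    """
    if bot_opt_in:
        # Bot opt-in (if implemented) skips the consent flow entirely.
        return None
    if method.upper() in {"GET", "HEAD"}:
        # Safe to redirect to the consent page.
        return 302
    # Assumption: POST and all other methods are refused rather than redirected.
    return 403


if __name__ == "__main__":
    for verb in ("GET", "POST"):
        print(verb, status_without_consent_cookie(verb))
```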

Can we audit who has granted consent?

We do not want to store any consent information server side that is connected to PII.
We could increment numeric counters for things like consent forms served, consent granted, consent denied if there is a good reason to do so.

Can folks see all tools that are requesting consent?

This should be allowed via some well-known URL. It is probably easiest to make this a tool, and to make it the same tool that maintainers use to manage their additional sources.

How can I see all tools that I have granted consent to?

Would it be sufficient to show the user the list of all tools that they may have granted consent to?
This one is tricky with the proposed design.
Consent only exists as cookies in the user's cookie store.
Each consent cookie is scoped to a single tool sub-domain and marked HttpOnly.
toolforge.org, wmcloud.org, and wmflabs.org are all on the public suffix list, which means browsers should not allow us to set any cookies at the parent domain level, where they would be sent to all sub-domains.
To check the cookies, we would need to send an HTTPS request to every sub-domain that might have a cookie and get some signal back without violating third-party cookie constraints in the user-agent.

How does a user revoke consent for a tool?

Delete the WMCS-CSP-CONSENT cookie for the tool's domain
Do we need to provide a UI for doing this?
- See "How can I see all tools that I have granted consent to?" for some of the challenges here
- Fairly easy for a single tool, but more complicated in a bulk workflow
The opt-in workflow should include a "remember this decision" option to turn on far-future cookie expiration, with the default being session-scoped cookies.

Opt-in for robots?

Do we need to build a mechanism for bots to say "I approve any and all sources that may be requested"?
If desired, this is probably easiest to implement as a special request header, something like X-WMCS-CSP-CONSENT.