Nova Resource:Copypatrol
Project Name | copypatrol |
---|---|
Details, admins/members | openstack-browser |
Monitoring | |
CopyPatrol
Description
CopyPatrol is a tool that allows you to see recent Wikipedia edits that are flagged as possible copyright violations.
Project status
currently running
Contact address
- phab:tag/copypatrol
- copypatrol@toolforge.org
This page documents how to create a VPS instance of CopyPatrol. The setup consists of four components: the frontend web server (Apache), the frontend application (PHP/Symfony), the backend web server (nginx), and the backend application (Python).
Frontend
Web server
Install and configure Apache and PHP.
sudo apt -y install php php-common php-cli php-fpm php-json php-xml php-intl php-curl php-apcu php-mysql apache2 libapache2-mod-php cron
Create the web server configuration file at /etc/apache2/sites-available/copypatrol.conf with the following:
<VirtualHost *:80>
DocumentRoot /var/www/public
ServerName copypatrol.wmcloud.org
# Requests with these user agents are denied.
SetEnvIfNoCase User-Agent "(uCrawler|Baiduspider|CCBot|scrapy\.org|kinshoobot|YisouSpider|Sogou web spider|yandex\.com\/bots|twitterbot|TweetmemeBot|SeznamBot|datasift\.com\/bot|Googlebot|Yahoo! Slurp|Python-urllib|BehloolBot|MJ13bot|SemrushBot|facebookexternalhit|rcdtokyo\.com|Pcore-HTTP|yacybot|ltx71|RyteBot|bingbot|python-requests|Cloudflare-AMP|Mr\.4x3|MSIE 7\.0; AOL 9\.5|Acoo Browser|AcooBrowser|MSIE 6\.0; Windows NT 5\.1; SV1; QQDownload|\.NET CLR 2\.0\.50727|MSIE 7\.0; Windows NT 5\.1; Trident\/4\.0; SV1; QQDownload|Frontera|tigerbot|Slackbot|Discordbot|LinkedInBot|BLEXBot|filterdb\.iss\.net|SemanticScholarBot|FemtosearchBot|BrandVerity|Zuuk crawler|archive\.org_bot|mediawords bot|Qwantify\/Bleriot|Pinterestbot|EarwigBot|Citoid \(Wikimedia|GuzzleHttp|PageFreezer|Java\/|SiteCheckerBot|Re\-re Studio|^R \(|GoogleDocs|WinHTTP|cis455crawler|WhatsApp|Archive\-It|lua\-resty\-http|crawler4j|libcurl|dygg\-robot|GarlikCrawler|Gluten Free Crawler|WordPress|Paracrawl|7Siters|Microsoft Office Excel|MJ12bot|AhrefsBot|dotbot|amp-cloud|naver\.me\/spd|Adsbot|linkfluence|coccocbot|sqlmap)" bad_bot=yes
CustomLog ${APACHE_LOG_DIR}/access.log combined expr=!(reqenv('bad_bot')=='yes'||reqenv('dontlog')=='yes')
CustomLog ${APACHE_LOG_DIR}/denied.log combined expr=(reqenv('bad_bot')=='yes')
ErrorLog ${APACHE_LOG_DIR}/error.log
<Directory /var/www/public/>
Options Indexes FollowSymLinks
AllowOverride All
Require all granted
DirectoryIndex index.php
RewriteEngine On
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</Directory>
<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
Deny from env=bad_bot
</Directory>
ErrorDocument 403 "Access denied"
RewriteCond "%{HTTP_REFERER}" "^http://127\.0\.0\.1:(5500|8002)/index\.html" [NC]
RewriteRule .* - [R=403,L]
RewriteCond "%{HTTP_USER_AGENT}" "^[Ww]get"
RewriteRule .* - [R=403,L]
RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R=301,L]
</VirtualHost>
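To see how the SetEnvIfNoCase rule above classifies a request, the matching can be approximated offline. The following is a minimal sketch and not part of the CopyPatrol setup: it uses a shortened subset of the pattern, and `grep -Ei` (POSIX extended regex, case-insensitive) stands in for Apache's own regex engine. The `is_bad_bot` helper name is hypothetical.

```shell
#!/bin/sh
# Abbreviated subset of the User-Agent pattern from the VirtualHost above.
# (Assumption: a handful of entries is enough to illustrate the matching.)
BAD_BOT_PATTERN='(Googlebot|bingbot|python-requests|^R \(|Java/|sqlmap)'

# is_bad_bot is a hypothetical helper: exit status 0 means the
# User-Agent string matches the pattern and would be flagged bad_bot=yes.
is_bad_bot() {
    printf '%s\n' "$1" | grep -Eiq "$BAD_BOT_PATTERN"
}

for ua in 'python-requests/2.31.0' 'Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0'; do
    if is_bad_bot "$ua"; then
        echo "flagged:     $ua"
    else
        echo "not flagged: $ua"
    fi
done
```

Because the match is case-insensitive and unanchored (apart from entries such as `^R \(`), any User-Agent that merely contains one of the listed substrings is flagged.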
Enable the PHP and rewrite Apache modules and the site configuration, and disable the unused default site (adjust php8.2 to match the installed PHP version):
sudo a2enmod php8.2 rewrite
sudo a2ensite copypatrol
sudo a2dissite 000-default
sudo apache2ctl graceful
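After the reload, it may be worth confirming that the configuration parses and that the module and site are actually enabled. This is a minimal sketch using Debian's `a2query` and `apache2ctl configtest`; the `check_apache_setup` function name is hypothetical, while `rewrite` and `copypatrol` are the module and site names used above.

```shell
#!/bin/sh
# check_apache_setup is a hypothetical helper; it only reads state and
# changes nothing on the server.
check_apache_setup() {
    if ! command -v a2query >/dev/null 2>&1; then
        echo "a2query not found; is apache2 installed?" >&2
        return 1
    fi
    a2query -m rewrite || return 1       # is mod_rewrite enabled?
    a2query -s copypatrol || return 1    # is the copypatrol site enabled?
    sudo apache2ctl configtest           # syntax-check the configuration
}
```

Calling `check_apache_setup` on the instance returns a non-zero status at the first check that fails.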
Application
Install dependencies:
sudo apt install composer
Clone the repository, first removing the html/ directory created by Apache.
cd /var/www && sudo rm -rf html
sudo git clone https://github.com/wikimedia/CopyPatrol.git .
Copy .env to .env.local and adjust it as needed, making sure to set APP_ENV to prod. You can also remove all rows for REPLICAS_HOST_, REPLICAS_PORT_, TOOLSDB_HOST and TOOLSDB_PORT, as they are already properly defined in .env.prod.
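The edits described above can be scripted. This is a minimal sketch under stated assumptions: the `make_env_local` helper is hypothetical, it assumes .env contains plain `KEY=value` rows, and it only handles APP_ENV and the rows named above. Any other values (credentials, for example) still need to be adjusted by hand.

```shell
#!/bin/sh
# make_env_local is a hypothetical helper, not part of the CopyPatrol repository.
# Usage: make_env_local /var/www
make_env_local() {
    dir="$1"
    # Set APP_ENV to prod and drop the rows that .env.prod already defines.
    sed -E \
        -e 's/^APP_ENV=.*/APP_ENV=prod/' \
        -e '/^(REPLICAS_HOST_[A-Za-z0-9_]*|REPLICAS_PORT_[A-Za-z0-9_]*|TOOLSDB_HOST|TOOLSDB_PORT)=/d' \
        "$dir/.env" > "$dir/.env.local"
}
```

Since /var/www is owned by root at this stage, the function would need to run in a root shell when pointed at that directory.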
Install the Composer packages and give ownership of the files to www-data (the user Apache runs as):
sudo composer install --no-dev -o
sudo chown -R www-data:www-data .
Add a cron job that updates the app whenever there is a new tagged release. Run sudo crontab -e -u www-data, then add:
MAILTO=copypatrol@toolforge.org
*/10 * * * * /var/www/vendor/wikimedia/toolforge-bundle/bin/deploy.sh prod /var/www/
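The schedule `*/10 * * * *` runs the deploy script every ten minutes. As an alternative to editing the crontab interactively, the same entries could be installed non-interactively. This is a minimal sketch with a hypothetical `cron_entries` helper; piping into `crontab -` replaces the user's whole existing crontab, so it assumes www-data has no other jobs.

```shell
#!/bin/sh
# cron_entries is a hypothetical helper that prints the two crontab lines shown above.
cron_entries() {
    printf 'MAILTO=%s\n' 'copypatrol@toolforge.org'
    printf '*/10 * * * * %s prod %s\n' \
        '/var/www/vendor/wikimedia/toolforge-bundle/bin/deploy.sh' '/var/www/'
}

# Preview the entries; nothing is installed by this line.
cron_entries

# To install (this overwrites www-data's current crontab), one could run:
#   cron_entries | sudo crontab -u www-data -
# and then confirm the result with:
#   sudo crontab -l -u www-data
```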
Backend
See https://github.com/JJMC89/copypatrol-backend/tree/main/.vps.
Server admin log
2024-10-31
- 06:50 JJMC89: copypatrol-backend-prod-01 sudo /var/www/.venv/bin/pip install --upgrade --upgrade-strategy eager pip setuptools wheel /var/www/copypatrol-backend # apply security updates
2024-08-15
- 17:01 JJMC89: copypatrol-backend-prod-01 deploy ed212c4..81ec727
2024-07-22
- 16:58 andrewbogott: upgrading db servers copypatrol-prod-db-01 and copypatrol-dev-db-01 to latest Trove guest image
2024-06-23
- 00:31 JJMC89: copypatrol-backend-prod-01 deploy f61d2c0..ed212c4
2024-06-21
- 03:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.o...