PAWS/PAWS examples and recipes
Overview
This is a list of existing PAWS notebooks created by users that can serve as examples for others. The list includes notebooks that use database connections and API connections, and it is aimed at anyone working on research or on-wiki tasks.
If you want to download and reuse these examples, see the instructions on how to quickstart with PAWS.
The notebooks are marked with topic tags (see the key below) that show what each one covers and which tasks it is best suited for.
Key
A visual key to help keep track of which examples and tutorials are available. Tags: Example, Tutorial or How-to, API, Wikireplicas, Datadumps, Research & Analysis, On-Wiki tasks, Pywikibot, Wikidata, SPARQL.
Wiki replicas and datasets
Connecting to Wiki replicas
Note: As of April 2021, there is a new method for connecting to the Wiki replicas. See the following notebooks for examples of how to connect. If you plan to follow the other tutorials or how-tos, or to fork the examples below, make sure you use the current method of connecting to the Wiki replicas.
- Working with Wiki replicas and datasets
- Using Wikireplicas with PAWS and Python
- Accessing the new replicas & changes from the previous cluster
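The following is a minimal sketch of the post-2021 connection pattern, not a replacement for the notebooks above. It makes several assumptions. PyMySQL must be available, and PAWS must expose replica credentials in the MYSQL_USERNAME and MYSQL_PASSWORD environment variables. The host name must follow the pattern <wiki>.<cluster>.db.svc.wikimedia.cloud, where the cluster is "analytics" or "web". Database names must carry a "_p" suffix. Check these details against the linked notebooks. The helper names are hypothetical.

```python
import os


def replica_host(wiki: str, cluster: str = "analytics") -> str:
    """Host name of a wiki's replica, e.g. 'enwiki.analytics.db.svc.wikimedia.cloud'."""
    return f"{wiki}.{cluster}.db.svc.wikimedia.cloud"


def replica_db(wiki: str) -> str:
    """Replica database names use a '_p' suffix, e.g. 'enwiki_p'."""
    return f"{wiki}_p"


def connect_replica(wiki: str = "enwiki", cluster: str = "analytics"):
    """Open a PyMySQL connection to one wiki's replica (only reachable from PAWS/Cloud VPS)."""
    import pymysql  # imported here so the name helpers above work without PyMySQL

    return pymysql.connect(
        host=replica_host(wiki, cluster),
        # Assumption: PAWS provides replica credentials through these environment variables.
        user=os.environ["MYSQL_USERNAME"],
        password=os.environ["MYSQL_PASSWORD"],
        database=replica_db(wiki),
        charset="utf8mb4",
    )
```

Use "analytics" for long-running research queries and "web" for quick lookups. Close the connection with conn.close() when you are done.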
Tutorials and How-tos
Tutorial or How-to Research & Analysis Wikireplicas Wikidata SPARQL Datadumps
- Using Wikireplicas from PAWS with Python - A quick tutorial that explores how to connect to the Wikireplica databases and make a query.
- Revision histories - A lab that introduces basic data extraction and analysis using Wikipedia data
- Hyperlink networks (APIs)
- Collaboration networks - A lab that explores how to analyze the structure of collaborations in Wikipedia data about users' revisions across multiple articles
- Pageviews - A lab that explores how to analyze Wikipedia pageview data
- How to explore Wikimedia data using Python: XML, SQL dumps, and APIs - A Jupyter notebook tutorial that explores, analyzes, and visualizes data from the Simple English Wikipedia using Python, mediawiki-utilities, and pandas
- SQL Demo and examples - A variety of examples for working with SQL and PAWS
- How-to - Visualizing Wikipedia topics - Connect to the database and use several Python libraries to create visualizations of data from Wikipedia
- How-to - Teahouse question archive builder - This notebook will build a queryable data object out of a parsed thread dataset
- How-to - Event Stream, API, Database connections - A variety of methods for accessing data about revisions
- Querying Wikidata with SPARQL
Wikidata dumps tutorials
Tutorial or How-to Research & Analysis Wikidata Datadumps
Example notebooks
Note: This list only includes notebooks whose SQL queries do not use JOINs, because JOINs may not work correctly after the planned changes to the Wiki replicas. For a list of notebooks that include JOINs, by user ID, see: https://wikitech.wikimedia.org/wiki/User:SRodlund/PAWS_examples_lists/notebooks_with_joins
Wiki replicas
Wiki replicas Research & Analysis Wikidata Example
- Find Wikidata Q ids for all pages in category
- Curation log
- Get count of unreviewed pages per creation day, by autoconfirmed status
- Get the recent changes of the day
- Common edits by WMF staff
- How many NPP pages marked for deletion are actually deleted?
- Teahouse Answers
- Language revision counts per day
- SELECT page_title FROM page WHERE page_title LIKE '% %'
- Wikidata database - Names similar to Karl
- Number of pages with "Berlin" - Wikimedia DE
- Changes made to pages using PyMySQL and Pywikibot - HY Wikipedia
- User Ids and their edit counts - Teahouse
- Get top viewed categories
- Tables Download
- Querying Media Counts - WikiLovesAfrica
- Querying images and how often they were used - WikiLovesAfrica
- Functions for article comparison
- Edit notices - EN Wikipedia
- A look at Barnstars
- Images not marked for fair use
- Wiki abuse filter list
- What is the annual volume of patrolling?
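The examples above use single-table queries. The helper below is a minimal sketch of that style. It runs a parameterized query on an open DB-API connection, such as a PyMySQL connection to a replica opened as in the connection notebooks, and returns each row as a dictionary. `TITLE_LIKE_SQL` and `run_query` are hypothetical names. The bytes decoding assumes that text columns such as `page_title` come back as binary values, which is how PyMySQL usually returns the replicas' varbinary columns.

```python
from typing import Any, Sequence

# Single-table query (no JOINs). PyMySQL uses %s placeholders and escapes the values.
TITLE_LIKE_SQL = """
SELECT page_title
FROM page
WHERE page_namespace = %s AND page_title LIKE %s
LIMIT %s
"""


def run_query(conn, sql: str, params: Sequence[Any] = ()) -> list[dict[str, Any]]:
    """Run `sql` on any DB-API connection and return rows as {column: value} dicts.

    Binary values (bytes) are decoded as UTF-8 so titles come back as str.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        columns = [d[0] for d in cur.description]
        return [
            {
                col: val.decode("utf-8") if isinstance(val, (bytes, bytearray)) else val
                for col, val in zip(columns, row)
            }
            for row in cur.fetchall()
        ]
    finally:
        cur.close()
```

For example, `run_query(conn, TITLE_LIKE_SQL, (0, "%Berlin%", 10))` would return up to ten article titles containing "Berlin". Titles in the page table use underscores in place of spaces.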
Dumps
Research & Analysis Datadumps Example
- Edit summary data from MediaWiki history dump
- Accessing page protections
- Wikimedia - public dumps
- Inferring countries from articles - public dumps
- Pageviews - public dumps (note on the file paths: /public/dumps/pageviews/ is now /public/dumps/public/other/pageviews/ [1]) - A variety of tasks with dumps
- Public dumps
- Generic notebook for dump processing
- Simplified Wikidata dumps
- Extract pages containing a keyword from a dump
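The next block is a minimal sketch of reading the public pageviews dumps in place, using the mount point from the note above. The year/month/file-name layout (for example 2021/2021-01/pageviews-20210101-050000.gz) is an assumption based on the public pageviews dumps, so confirm it with `ls` before relying on it. The function names are hypothetical. Each line is assumed to have the form `domain_code page_title count_views total_response_size`.

```python
import gzip
from collections import Counter
from datetime import datetime
from pathlib import Path
from typing import Iterable

# Mount point from the note above; the subdirectory layout below is an assumption.
PAGEVIEWS_ROOT = Path("/public/dumps/public/other/pageviews")


def hourly_file(ts: datetime, root: Path = PAGEVIEWS_ROOT) -> Path:
    """Path of the hourly pageviews file for the hour starting at `ts`."""
    return root / f"{ts:%Y}" / f"{ts:%Y-%m}" / f"pageviews-{ts:%Y%m%d-%H}0000.gz"


def count_views(lines: Iterable[str], domain_code: str = "en") -> Counter:
    """Sum views per page title for one project, skipping other projects and bad lines."""
    views: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4 or parts[0] != domain_code or not parts[2].isdigit():
            continue
        views[parts[1]] += int(parts[2])
    return views


def read_hour(ts: datetime, domain_code: str = "en") -> Counter:
    """Stream one compressed hourly file without unpacking it to disk."""
    with gzip.open(hourly_file(ts), "rt", encoding="utf-8", errors="replace") as fh:
        return count_views(fh, domain_code)
```

`read_hour(datetime(2021, 1, 1, 5)).most_common(10)` would list the ten most-viewed English Wikipedia pages for that hour. Streaming the gzip file keeps memory use low on PAWS.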
SPARQL
Research & Analysis SPARQL Wikidata Example
- Call SPARQL with Python
- Building layered maps using SPARQL
- Add references to items already in Wikidata
- Get Wikipedia languages SPARQL query
- Exploring Smithsonian content on Wikidata - queries and stats
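Calling SPARQL from Python usually comes down to one HTTP request to the Wikidata Query Service plus some flattening of the JSON results. The sketch below assumes the `requests` library and the public endpoint https://query.wikidata.org/sparql. It uses the standard SPARQL JSON results shape, where each binding maps a variable to a {"type": ..., "value": ...} object. The helper names and the example query are illustrative only.

```python
from typing import Any

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: ten items that are instances of (P31) house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def bindings_to_rows(data: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dicts.

    A variable left unbound in a result row is simply absent from that dict.
    """
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run_sparql(query: str, user_agent: str) -> list[dict[str, str]]:
    """Send `query` to the Wikidata Query Service and return flattened rows."""
    import requests  # imported lazily so bindings_to_rows works without requests

    resp = requests.get(
        WDQS_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": user_agent},  # identify your notebook to Wikimedia
        timeout=60,
    )
    resp.raise_for_status()
    return bindings_to_rows(resp.json())
```

The flat rows can go straight into `pandas.DataFrame(rows)` for analysis or plotting.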
Wikidata Query
Research & Analysis Wikidata Example
- Run a Wikidata query in an iframe and display the results
- Get Wikidata info from an arbitrary URL
- Species without English descriptions - Wikidata
APIs
PAWS notebook: API Connections
Tutorials and How-tos
API Tutorial or How-to
- API Connections With PAWS - An overview of how to use PAWS with APIs (updated 2021)
- MediaWiki page history - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API page history endpoints to explore the history of articles on English Wikipedia.
- MediaWiki REST API examples - This notebook contains a variety of MediaWiki REST API examples: search pages, autocomplete page title, get page history, get page history counts, get revision, compare revisions, get page, get page offline, get page source, get languages, get files, get files on a page, create page, update page.
- Wikimedia Feeds intro - Many Wikipedias include daily featured articles and other curated content on their homepages. You can see an example of this content on the main page of English, German, and French Wikipedias. The Wikifeeds API lets you access this content programmatically and add high-quality, multilingual content to your apps.
- Create an image grid using free images from Wikimedia Commons - This guide uses the MediaWiki REST API to explore media files on Wikimedia Commons. Wikimedia Commons is a collection of over 60,000,000 freely usable media files, many of which are used in Wikipedia articles.
- Reuse free images from Wikimedia Commons - This guide uses the MediaWiki REST API to explore media files on Wikimedia Commons. Wikimedia Commons is a collection of over 60,000,000 freely usable media files, many of which are used in Wikipedia articles.
- Exploring page history - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API page history endpoints to explore the history of articles on English Wikipedia.
- Search Wikipedia articles - The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API search endpoints to search for articles about the Solar System on English Wikipedia.
- Retrieving free knowledge - This guide uses the MediaWiki REST API to explore articles on English Wikipedia.
- Wikipedia page stats comparison - This guide uses the MediaWiki REST API to explore articles on English Wikipedia.
- Get featured content from English Wikipedia - The Wikifeeds API provides convenient access to content featured on the Main Page of English Wikipedia.
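Several of the tutorials above use the REST API page history endpoint. The block below is a minimal sketch of calling it. It assumes the `requests` library, the endpoint path /w/rest.php/v1/page/{title}/history, and a response that holds a `revisions` list and an `older` URL for the next batch of up to 20 revisions. The function names and the User-Agent string are hypothetical.

```python
from typing import Any, Iterator
from urllib.parse import quote


def history_url(title: str, project: str = "en.wikipedia.org") -> str:
    """Page history endpoint URL; spaces become underscores, '/' is percent-encoded."""
    key = quote(title.replace(" ", "_"), safe="")
    return f"https://{project}/w/rest.php/v1/page/{key}/history"


def iter_revisions(title: str, user_agent: str, project: str = "en.wikipedia.org",
                   max_batches: int = 3) -> Iterator[dict[str, Any]]:
    """Yield revisions newest-first, following each batch's 'older' link."""
    import requests  # imported lazily so history_url works without requests

    url = history_url(title, project)
    for _ in range(max_batches):
        resp = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        yield from data["revisions"]
        url = data.get("older")  # absent on the last batch
        if not url:
            break
```

For example, `for rev in iter_revisions("Earth", "ExampleNotebook/0.1 (User:Example)"): print(rev["id"], rev["timestamp"])` prints up to 60 recent revisions. `max_batches` caps the number of requests so a long history does not trigger a flood of calls.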
Example notebooks
Various notebooks using APIs
API Example On-Wiki tasks
- Action API tests
- Article quality demo
- Blocked users Wikipedia DE
- Get namespace names - MediaWiki API
- Find vandalism on a given set of pages
- Find pages translated from English to Hindi
- Understand impact of the content translation tool
- Content translation exploration - A complex notebook exploring content translation
- Wikidata API example - update descriptions
- Extracting Covid-19 data from English Wikipedia
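Many of these notebooks call the Action API (api.php). The block below is a minimal sketch that follows the idea of "Get namespace names". It assumes the `requests` library and the formatversion=2 response shape, in which `query.namespaces` maps each namespace ID to an object with `id` and `name` fields. The helper names are hypothetical.

```python
from typing import Any


def siteinfo_params() -> dict[str, str]:
    """Action API parameters asking for namespace metadata as plain JSON."""
    return {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
        "formatversion": "2",
    }


def namespace_names(data: dict[str, Any]) -> dict[int, str]:
    """Map namespace IDs to local names; the main namespace (0) has an empty name."""
    return {int(ns["id"]): ns["name"] for ns in data["query"]["namespaces"].values()}


def fetch_namespace_names(api_url: str, user_agent: str) -> dict[int, str]:
    """Query one wiki's api.php, e.g. 'https://de.wikipedia.org/w/api.php'."""
    import requests  # imported lazily so the pure helpers work without requests

    resp = requests.get(api_url, params=siteinfo_params(),
                        headers={"User-Agent": user_agent}, timeout=30)
    resp.raise_for_status()
    return namespace_names(resp.json())
```

The same request-then-parse split works for other `action=query` modules. Parsing is kept separate from the HTTP call, so you can test it on saved responses.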
Pywikibot (Uses MediaWiki API)
Pywikibot API Example On-Wiki tasks
- Add copyright to items in Wikidata
- Add copyright, creator to items in Wikidata
- Add awards to Wikidata category Sports Hall of Fame
- Add references to items already in Wikidata
- Auto Wikiproject
- Add short descriptions to biographies on Wikipedia EN
- Add items to Wikidata
- Change qualifier in P39 statements - Wikidata
- Make changes to pages using PyMySQL and Pywikibot - HY Wikipedia - On-wiki task using replicas and the API
- Remove broken files
- Investigate bot issues
- Policy changes - ZH Wikipedia - Uses databases, Pywikibot, JSON files, etc.
- Teahouse archives answers - Uses databases, Pywikibot, JSON files, etc.
- Analyze number of new editors per month
- Categorize images after the end of Wiki Loves Love
- Clean history merge list - WikiProject history
- Categorize images from Wiki Loves Earth
- Move and recategorize patronymic names on Commons
- Dead interlanguage links
- Fix BDA IDs on Wikidata
- Fix titles on Wikidata
- Get articles without images
- Global replace in Wikipedia DE
- Categorize graves in cemeteries - Commons
- Mass remove claims - Wikidata
- A script to move pages
- Get files with NASA image template - Commons
- Remove redirect class
- Check userpage authorship - RU Wikipedia
- Fix bad interwiki links
- Upload text
- Parse data from talk pages
- Add a property to a category - Wikidata
- Autostatus update for WikiProject
- Batch delete and unlink images
- Identify unhelpful file names on Commons
- Bulk deprecate a template
- Bulk deprecate an index parameter
- Add statements to candidates in Canada elections - Wikidata
- Move all pages from one subcategory to another
- Create new user pages
- Redirect a talk page
- Relicense uploads to Wikimedia Commons
- Replace page text
- Update a redirect
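Most of the Pywikibot notebooks above repeat one cycle: load a page, change its text, and save it with an edit summary. The block below is a minimal sketch of that cycle for a plain text replacement, in the spirit of "Replace page text". It assumes Pywikibot is installed and logged in for the target wiki. `replace_literal` and `replace_on_page` are hypothetical helpers, not part of Pywikibot. The sketch defaults to a dry run, so nothing is saved until you pass `dry_run=False`.

```python
from typing import Tuple


def replace_literal(text: str, old: str, new: str) -> Tuple[str, int]:
    """Replace every non-overlapping occurrence of `old`; return (new_text, count)."""
    if not old:
        raise ValueError("old must be a non-empty string")
    return text.replace(old, new), text.count(old)


def replace_on_page(title: str, old: str, new: str, summary: str,
                    code: str = "en", family: str = "wikipedia",
                    dry_run: bool = True) -> int:
    """Load `title`, replace text, and save only if something changed and dry_run is off."""
    import pywikibot  # assumes a configured, logged-in Pywikibot

    site = pywikibot.Site(code, family)
    page = pywikibot.Page(site, title)
    new_text, count = replace_literal(page.text, old, new)
    if count and not dry_run:
        page.text = new_text
        page.save(summary=summary)
    return count
```

Run it once in dry-run mode to see how many replacements it would make, then rerun with `dry_run=False`. Saving only when `count` is non-zero avoids null edits.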
Further resources
- PAWS is a Jupyter Notebook installation hosted by Wikimedia Cloud Services, so the existing Jupyter Notebook documentation is an excellent resource for PAWS users.
- Check out the PAWS README on GitHub for information on useful libraries and storage space.
- To install Python packages in PAWS, see PAWS/Python with Pip.