WMDE/Wikidata/PropertySuggester update

From Wikitech

Occasionally, the data for the property suggester needs to be updated from the latest JSON dumps.

This process requires access to a Wikimedia deploy host (deployment.eqiad.wmnet).

Note that this property suggester is occasionally referred to as the 'legacy property suggester'. However, as of December 2024, the new property suggester has not yet been deployed.

One-time setup

Each update

  • Find the latest wbs_propertypairs on https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/wbs_propertypairs/ (generated on stat1005 by a cron from ladsgroup). We’ll use yyyymmdd as a placeholder for its name below.
  • Download analyzed-out.gz to your local machine, run refine.py from wbs_propertypairs-refine on it (see the README), and commit the result to the wbs_propertypairs repo with the commit message “Add propertypairs from the yyyymmdd dump”.
  • Download it to the deployment host with https_proxy=http://webproxy.eqiad.wmnet:8080 wget 'https://github.com/wmde/wbs_propertypairs/raw/master/yyyymmdd/wbs_propertypairs.csv.gz'.
  • Unpack it: gzip -d wbs_propertypairs.csv.gz (this leaves wbs_propertypairs.csv in place of the compressed file).
  • Update the actual table: mwscript-k8s --attach -- extensions/PropertySuggester/maintenance/UpdateTable.php --wiki wikidatawiki --file php://stdin < wbs_propertypairs.csv.
    • This takes about 10 minutes.
    • It will first log (to your terminal) a bunch of “Deleting a batch” lines, then “X rows inserted” up to the total number of lines in the CSV file (which you can count with wc -l wbs_propertypairs.csv beforehand).
  • Log your changes: !log Updated the Wikidata property suggester with data from yyyymmdd's JSON dump
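
The download, unpack, and row-count steps above can be wrapped in a few small shell helpers. This is only a sketch: the function names (propertypairs_url, fetch_propertypairs, count_rows) are hypothetical and not part of any existing tooling, and it assumes wget and gzip are available on the deployment host, as the steps above imply. It deliberately stops before the UpdateTable.php step, which should still be run by hand as described.

```shell
# Hypothetical helpers for the download/unpack/count steps; not existing tooling.

# Build the download URL for a dump named yyyymmdd (must be eight digits).
propertypairs_url() {
    case "$1" in
        ''|*[!0-9]*) echo "expected yyyymmdd, got '$1'" >&2; return 1 ;;
    esac
    [ "${#1}" -eq 8 ] || { echo "expected yyyymmdd, got '$1'" >&2; return 1; }
    printf 'https://github.com/wmde/wbs_propertypairs/raw/master/%s/wbs_propertypairs.csv.gz\n' "$1"
}

# Download the file through the web proxy, then unpack it in place.
fetch_propertypairs() {
    url=$(propertypairs_url "$1") || return 1
    https_proxy=http://webproxy.eqiad.wmnet:8080 wget "$url" || return 1
    gzip -d wbs_propertypairs.csv.gz
}

# Count the lines in the CSV, to compare against the "rows inserted" log later.
count_rows() {
    wc -l < "$1"
}
```

For example, with 20240101 standing in for the real dump name, fetch_propertypairs 20240101 followed by count_rows wbs_propertypairs.csv gives the number to watch for in the “X rows inserted” output.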