Summary

The year-end vacation season and deployment freeze are coming, so work is expected to slow down somewhat until January. We're still on track to deliver what we expected.

What we've accomplished

Improve multilingual zero-results rate

Finished heuristics for merging Type and Script attributes (e.g., <ALPHANUM>/Latin + <NUM> = <ALPHANUM>/Latin; Latin + Cyrillic = Unknown; etc.). Abandoned making Script attribute merging more configurable (e.g., keep first, keep last, count characters), so every mixed-script token gets "Unknown" (we're limited to ICU script types, otherwise I'd go with "Mixed") - https://phabricator.wikimedia.org/T332337
Lots of thinking about configurability of scripts & types to merge (e.g., don't merge <EMOJI> types; only merge <ALPHANUM> types; don't merge CJK scripts, etc.). Still thinking about "numbers only" option (because current behavior is an error wrt UAX #29) - https://phabricator.wikimedia.org/T332337
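
As a rough illustration of the merge heuristics described above, here is a minimal Python sketch. It is not the actual plugin code: the Token tuple, the merge_tokens function, and the choice to keep the first token's type when two non-numeric types differ are all assumptions made for the example. Only the two worked cases (<ALPHANUM>/Latin + <NUM>, and Latin + Cyrillic) come from the notes above.

```python
from typing import NamedTuple

NUM = "<NUM>"
UNKNOWN = "Unknown"  # stands in for the ICU "Unknown" script label


class Token(NamedTuple):
    """Hypothetical token record: surface text plus Type and Script attributes."""
    text: str
    type: str
    script: str


def merge_tokens(a: Token, b: Token) -> Token:
    """Merge two adjacent tokens into one, combining Type and Script."""
    text = a.text + b.text
    # A <NUM> token defers to its non-numeric neighbor for both attributes.
    if a.type == NUM and b.type != NUM:
        return Token(text, b.type, b.script)
    if b.type == NUM:
        return Token(text, a.type, a.script)
    # Two non-numeric tokens: identical scripts are kept, anything else is "Unknown".
    script = a.script if a.script == b.script else UNKNOWN
    # Assumption: when the types differ, keep the first token's type.
    return Token(text, a.type, script)


latin = Token("abc", "<ALPHANUM>", "Latin")
digits = Token("123", NUM, "Latin")
cyrillic = Token("где", "<ALPHANUM>", "Cyrillic")

print(merge_tokens(latin, digits))
print(merge_tokens(latin, cyrillic))
```

The configurability options still under consideration (e.g., don't merge <EMOJI> types, only merge <ALPHANUM> types) would sit in front of a function like this, as a check on whether two tokens should be merged at all.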

WDQS graph splitting

Investigated TFT (https://github.com/BorderCloud/TFT) and found that it might not suit our needs for the graph splitting analysis: it does not have a handy way to generate "test scenarios" and seems to be designed to work only against the tests created by the W3C working group (https://github.com/w3c/rdf-tests). It's written in PHP and I don't think it'd be wise to add such functionality there - https://phabricator.wikimedia.org/T349519
Started investigating Iguana (https://iguana-benchmark.eu/), which looks more promising but needs more investigation before a decision.
Scholarly Article Split job is now deployed via Airflow and generating a working graph split - https://phabricator.wikimedia.org/T347989
Work started on converting the graph split to a format that can be exported and ingested into Blazegraph - https://phabricator.wikimedia.org/T350106

Search Update Pipeline

Starting a backfill test to validate functional correctness and to check that the load on backend systems is appropriate - https://phabricator.wikimedia.org/T350826
There are open questions about failure modes. Currently, some failures related to bad input data require manual intervention to recover. Automated recovery in a robust way isn't trivial. Note that at the moment, SUP has been running with production data for multiple days without issues, so failures due to data are at least somewhat rare.
Helm charts created and validated by deployment - https://phabricator.wikimedia.org/T326328
Improve the flink-app chart to provide more useful defaults - https://phabricator.wikimedia.org/T346315

Misc

Java restarts for security updates - https://phabricator.wikimedia.org/T350703
Deployment of Mjolnir on Python 3.9 - https://phabricator.wikimedia.org/T346373
Deployment of WDQS Streaming Updater with Flink / k8s Operators in staging (not in production yet) - https://phabricator.wikimedia.org/T326409
Decommission search-loader VMs (part of the migration to Debian Bullseye) - https://phabricator.wikimedia.org/T351123
VisualEditor's Add a link should suggest a redirect with exact case match - https://phabricator.wikimedia.org/T346920