Search Platform/Weekly Updates/2023-10-06
Summary
We're starting a new quarter. Our goals for this quarter are almost ready and will be published on wiki shortly (keep an eye on https://wikitech.wikimedia.org/wiki/Search_Platform/Goals).
Overall, this was a short week due to Wikimedia Connect. We're making good progress towards deploying the Search Update Pipeline, with the testing of standard operations completed. We've identified a number of performance improvements to our work on the multilingual zero-results rate. And we're getting started on experimenting with the WDQS graph split.
What we've accomplished
Search Update Pipeline
- We have tested all relevant operations and are ready for a production deployment of the Search Update Pipeline on Flink, managed by the Kubernetes (k8s) operator (a minimal job sketch follows this list) - https://phabricator.wikimedia.org/T342149
- Migration of the WDQS updater to use newer Flink connectors
- Started work on better isolation of the WDQS updater error streams; a quick patch disables them to unblock testing of the Flink k8s operator, with a better solution still in progress - https://phabricator.wikimedia.org/T347515
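For context on the deployment model: the Flink Kubernetes operator runs an ordinary Flink streaming job packaged as a container. The sketch below shows roughly what a consumer for such a pipeline can look like; the broker address, topic name, consumer group, and class name are hypothetical, and the real Search Update Pipeline enriches change events and writes them to Elasticsearch rather than printing them.

<syntaxhighlight lang="java">
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Illustrative sketch only - not the Search Update Pipeline code.
public class SearchUpdateJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical broker/topic names; the real pipeline consumes MediaWiki change events.
        KafkaSource<String> changes = KafkaSource.<String>builder()
                .setBootstrapServers("kafka.example.org:9092")
                .setTopics("mediawiki.page-change")
                .setGroupId("search-update-pipeline")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> events =
                env.fromSource(changes, WatermarkStrategy.noWatermarks(), "page-changes");

        // The real job would enrich these events and write them to Elasticsearch;
        // printing keeps the sketch self-contained.
        events.print();

        env.execute("search-update-pipeline-sketch");
    }
}
</syntaxhighlight>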
Improve multilingual zero-results rate
- Performance optimization is in progress. In particular, consolidating character mappings brings a 9.3% improvement to indexing times, and implementing custom mapping code instead of the heavyweight Elasticsearch machinery is ~50% faster (a single-pass sketch follows this list).
- A new Elasticsearch plugin will be created to isolate this logic and allow for an easier rollout
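As a rough illustration of the consolidation idea (not the actual plugin code; the class name and mapping entries below are made up), rules that would otherwise be applied as separate char_filter stages can be merged into one lookup table and applied in a single pass over the input:

<syntaxhighlight lang="java">
import java.util.HashMap;
import java.util.Map;

// Illustrative only: applies a consolidated code point -> replacement map in one pass.
public final class ConsolidatedCharMapper {

    private final Map<Integer, String> mappings = new HashMap<>();

    public ConsolidatedCharMapper() {
        // In practice the table would be built by merging the per-language mapping rules
        // that would otherwise run as separate character filter stages.
        mappings.put((int) 'ß', "ss");   // German sharp s
        mappings.put((int) 'œ', "oe");   // Latin small ligature oe
        mappings.put(0x2019, "'");       // right single quotation mark -> apostrophe
    }

    // Single pass over the input; unmapped code points are copied through unchanged.
    public String map(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String replacement = mappings.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "Strasse d'Orléans"
        System.out.println(new ConsolidatedCharMapper().map("Straße d’Orléans"));
    }
}
</syntaxhighlight>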
WDQS Graph Split
- New servers being provisioned - https://phabricator.wikimedia.org/T347505
Misc
- New partman recipe for cloudelastic, which will allow better use of disk space - https://phabricator.wikimedia.org/T342463
- Search Platform office hours had Andrea (https://wikitech.wikimedia.org/wiki/User:AndreaWest) joining, with good discussions around WDQS, the graph split, and the future of RDF and SPARQL.
- Document process for getting JNL files/consider automation - https://phabricator.wikimedia.org/T347605
- Reviewed the Elasticsearch incident during the DC switchover. We've identified an issue with RESTBase that might have contributed to overloading the cluster - https://wikitech.wikimedia.org/wiki/Incidents/2023-09-20_Elasticsearch_unavailable