Search Platform/Weekly Updates/2023-10-06
Summary
We're starting a new quarter. Our goals for this quarter are almost ready and will be published on wiki shortly (keep an eye on https://wikitech.wikimedia.org/wiki/Search_Platform/Goals).
Overall, this was a short week due to Wikimedia Connect. We're making good progress towards deploying the Search Update Pipeline, with the testing of standard operations completed. We've identified a number of performance improvements to our work on the multilingual zero-results rate. And we're getting started on experimenting with the WDQS graph split.
What we've accomplished
Search Update Pipeline
- We have tested all relevant operations and are ready for a production deployment of the Search Update Pipeline on Flink, managed by the Kubernetes (k8s) operator (a minimal job sketch follows this list) - https://phabricator.wikimedia.org/T342149
- Migration of the WDQS updater to use newer Flink connectors
- Started work on better isolation of the WDQS updater error streams; a quick patch disables them to unblock testing of the Flink k8s operator, with a better solution still in progress - https://phabricator.wikimedia.org/T347515
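For context on the deployment model: the Flink Kubernetes operator runs an ordinary Flink streaming job packaged as a container. The sketch below shows roughly what a consumer for such a pipeline can look like; the broker address, topic name, consumer group, and class name are hypothetical, and the real Search Update Pipeline enriches change events and writes them to Elasticsearch rather than printing them.

<syntaxhighlight lang="java">
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Illustrative sketch only - not the Search Update Pipeline code.
public class SearchUpdateJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical broker/topic names; the real pipeline consumes MediaWiki change events.
        KafkaSource<String> changes = KafkaSource.<String>builder()
                .setBootstrapServers("kafka.example.org:9092")
                .setTopics("mediawiki.page-change")
                .setGroupId("search-update-pipeline")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> events =
                env.fromSource(changes, WatermarkStrategy.noWatermarks(), "page-changes");

        // The real job would enrich these events and write them to Elasticsearch;
        // printing keeps the sketch self-contained.
        events.print();

        env.execute("search-update-pipeline-sketch");
    }
}
</syntaxhighlight>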
Improve multilingual zero-results rate
- Performance optimization is in progress. In particular, consolidating character mappings brings a 9.3% improvement to indexing times, and implementing custom mapping code instead of the heavyweight Elasticsearch machinery is ~50% faster (a single-pass sketch follows this list).
- A new Elasticsearch plugin will be created to isolate this logic and allow for an easier rollout
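As a rough illustration of the consolidation idea (not the actual plugin code; the class name and mapping entries below are made up), rules that would otherwise be applied as separate char_filter stages can be merged into one lookup table and applied in a single pass over the input:

<syntaxhighlight lang="java">
import java.util.HashMap;
import java.util.Map;

// Illustrative only: applies a consolidated code point -> replacement map in one pass.
public final class ConsolidatedCharMapper {

    private final Map<Integer, String> mappings = new HashMap<>();

    public ConsolidatedCharMapper() {
        // In practice the table would be built by merging the per-language mapping rules
        // that would otherwise run as separate character filter stages.
        mappings.put((int) 'ß', "ss");   // German sharp s
        mappings.put((int) 'œ', "oe");   // Latin small ligature oe
        mappings.put(0x2019, "'");       // right single quotation mark -> apostrophe
    }

    // Single pass over the input; unmapped code points are copied through unchanged.
    public String map(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String replacement = mappings.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "Strasse d'Orléans"
        System.out.println(new ConsolidatedCharMapper().map("Straße d’Orléans"));
    }
}
</syntaxhighlight>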
WDQS Graph Split
- New servers being provisioned - https://phabricator.wikimedia.org/T347505
Misc
- New partman recipe for cloudelastic, which will allow better use of disk space - https://phabricator.wikimedia.org/T342463
- Search Platform office hours had Andrea (https://wikitech.wikimedia.org/wiki/User:AndreaWest) joining, with good discussions around WDQS, the graph split, and the future of RDF and SPARQL.
- Document process for getting JNL files/consider automation - https://phabricator.wikimedia.org/T347605
- Reviewed the Elasticsearch incident during the DC switchover. We've identified an issue with RESTBase that might have contributed to overloading the cluster - https://wikitech.wikimedia.org/wiki/Incidents/2023-09-20_Elasticsearch_unavailable