Search Platform/Weekly Updates/2023-06-02
Summary
We spent some time this week dealing with the aftermath of our recent WDQS outages. In particular, we are now equipped to diagnose and block problematic queries faster.
Our work on SLOs for Search is unlikely to be completed this quarter. We now have a good working definition of what we want to measure, but we still need help to implement metric collection and create the appropriate dashboards.
What we've accomplished
Search Analysis
- Reindex Estonian wikis to enable new unpacked analyzer - https://phabricator.wikimedia.org/T335704
- Deploy Turkish Analyzer Plugin - https://phabricator.wikimedia.org/T332355
- Reindex Turkish wikis to enable improved apostrophe handling; full write-up on MediaWiki: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Unpacking_Notes#better_apostrophe_Impact_on_Turkish_Wikipedia_(T337064) - https://phabricator.wikimedia.org/T337064
- [EPIC] Unpack all Elasticsearch analyzers - https://phabricator.wikimedia.org/T272606
Search Update Pipeline
- The WDQS/WCQS Flink job was upgraded to Flink 1.16.1 in all k8s environments - https://phabricator.wikimedia.org/T334244 / https://phabricator.wikimedia.org/T289836
- Finalizing the EventBus patch to encode redirect link information - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/913030
- Deep dive into different types of links, to come up with a shared schema that supports almost all of them.
WDQS graph splitting
- Started writing a small doc about a possible plan
Search SLOs
- SLIs for Search SLOs are defined. Next steps: ensure metrics are collected, and create dashboards. https://phabricator.wikimedia.org/T335498 (see parent task for more context)
Operations / SRE
- Fixed missing metrics for WDQS rdf-streaming-updater - https://phabricator.wikimedia.org/T336872
- Investigate puppet failure on cirrus-integ03.search.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T336519
- Missing CirrusSearch dump (enwiki and wikidata) - https://phabricator.wikimedia.org/T330936
- Special:Search broken on Beta Wikidata for entity namespaces - https://phabricator.wikimedia.org/T335873
- On-wiki search is failing to find relatively new titles on enwiki; adding a new check to alert if the title suggester isn't updated every 24h - https://phabricator.wikimedia.org/T327199
- Federated queries to Lingua Libre time out in the Commons query service - https://phabricator.wikimedia.org/T334470
- Add https://opendata.aragon.es/sparql to the list of federated endpoints for WDQS and WCQS - https://phabricator.wikimedia.org/T334823
- Reduce the load of CirrusSearch update jobs on MW jobrunners - https://phabricator.wikimedia.org/T336698
Misc
- Ongoing discussions around Search and ChatGPT. It seems that the current search ranking isn't appropriate for this use case; we need to clarify what the problem is and what we could do to help.